Inspect DS9 production safely from raoDesktop using Azure, App Insights, Key Vault, and the readonly Postgres lane. Use when Sunil asks Linus to investigate a production Tribble / DS9 issue without making product changes.
Use this skill for production debugging on raoDesktop.
Goals:
Do not use this skill to:
Run this skill from raoDesktop WSL, not the droplet.
Expected prerequisites:
- az is installed
- KV-tribble-prod is reachable from the current workstation IP
Current known production context:
- SUBSCRIPTION_ID_PROD from /home/sunil/ds9/scripts/.env
- RG-prod
- AI-tribble-prod
- KV-tribble-prod
- DATABASEURL-READONLY
- DATABASEURL
- vm-prod-pg-tunnel
Default to readonly operations.
Allowed:
- az resource show
- az webapp config appsettings list
- az functionapp config appsettings list
- az rest against App Insights query API
- az keyvault secret show for readonly connection discovery
- az vm run-command invoke only for readonly psql queries via DATABASEURL-READONLY
Not allowed unless Sunil explicitly asks:
- INSERT, UPDATE, DELETE, ALTER, DROP, TRUNCATE
Never allowed through this skill:
- az webapp deploy
If production diagnosis reveals a code bug:
Important current state:
From raoDesktop:
python3 /home/sunil/.local/share/linus/ds9-prod-debug/scripts/verify_prod_debug_access.py
That should confirm:
Use:
python3 /home/sunil/.local/share/linus/ds9-prod-debug/scripts/query_prod_app_insights.py \
"requests | where timestamp > ago(30m) | summarize count()"
Examples:
python3 /home/sunil/.local/share/linus/ds9-prod-debug/scripts/query_prod_app_insights.py \
"exceptions | where timestamp > ago(24h) | order by timestamp desc | take 50"
python3 /home/sunil/.local/share/linus/ds9-prod-debug/scripts/query_prod_app_insights.py \
"traces | where timestamp > ago(2h) and message has 'findAllowedBot' | order by timestamp desc | take 100"
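The internals of query_prod_app_insights.py are not shown here. As a rough sketch of what an `az rest` call against the App Insights query API could look like, the helper below builds the command line and flattens the JSON `tables` response into one dict per row. It assumes the public endpoint `https://api.applicationinsights.io/v1/apps/{app-id}/query` and an Application ID for AI-tribble-prod obtained separately (for example from the component's `AppId` property via `az resource show`). The function names are hypothetical; this is not the script's actual implementation.

```python
import json
import subprocess
from typing import Any

# Assumption: the public App Insights query endpoint. The real
# query_prod_app_insights.py may use a different URL or auth path.
AI_QUERY_URL = "https://api.applicationinsights.io/v1/apps/{app_id}/query"
AI_RESOURCE = "https://api.applicationinsights.io"


def build_az_rest_argv(app_id: str, kql: str) -> list[str]:
    """Build an `az rest` command that POSTs one KQL query."""
    return [
        "az", "rest",
        "--method", "post",
        "--url", AI_QUERY_URL.format(app_id=app_id),
        # Ask az for a token scoped to the App Insights API rather than ARM.
        "--resource", AI_RESOURCE,
        "--headers", "Content-Type=application/json",
        "--body", json.dumps({"query": kql}),
    ]


def rows_as_dicts(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the first result table into one dict per row."""
    table = response["tables"][0]
    names = [col["name"] for col in table["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]


def run_query(app_id: str, kql: str) -> list[dict[str, Any]]:
    """Run the query through az and return rows (requires az login)."""
    out = subprocess.run(
        build_az_rest_argv(app_id, kql),
        check=True, capture_output=True, text=True,
    ).stdout
    return rows_as_dicts(json.loads(out))
```

Keeping argv construction and response parsing separate from the subprocess call lets both be checked without touching production.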
Use:
python3 /home/sunil/.local/share/linus/ds9-prod-debug/scripts/query_prod_read_db.py \
"select current_user, current_database();"
This runs the query through Azure Run Command on vm-prod-pg-tunnel using DATABASEURL-READONLY.
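A hypothetical sketch of that lane follows. How query_prod_read_db.py actually obtains DATABASEURL-READONLY and calls psql is not documented here, so this version simply takes the connection URL as an argument and assumes psql is installed on vm-prod-pg-tunnel. It adds `PGOPTIONS='-c default_transaction_read_only=on'` as an extra guard on top of the readonly role; that is this sketch's choice, not necessarily the script's. The stdout/stderr split assumes the `[stdout]` / `[stderr]` layout that Linux `RunShellScript` returns in `value[0].message`. All function names are hypothetical.

```python
import json
import shlex
import subprocess

# Names from the skill's known production context.
RESOURCE_GROUP = "RG-prod"
TUNNEL_VM = "vm-prod-pg-tunnel"


def build_psql_script(db_url: str, sql: str) -> str:
    """Shell line that runs one statement with psql in a read-only session."""
    # PGOPTIONS makes every transaction read-only as a second guard;
    # -X skips ~/.psqlrc, -A/-F give unaligned, pipe-separated output.
    return (
        "PGOPTIONS='-c default_transaction_read_only=on' "
        "psql -X -A -F '|' -v ON_ERROR_STOP=1 "
        f"-c {shlex.quote(sql)} {shlex.quote(db_url)}"
    )


def build_run_command_argv(script: str) -> list[str]:
    """Build the az vm run-command invoke call for the tunnel VM."""
    return [
        "az", "vm", "run-command", "invoke",
        "--resource-group", RESOURCE_GROUP,
        "--name", TUNNEL_VM,
        "--command-id", "RunShellScript",
        "--scripts", script,
    ]


def split_run_command_message(message: str) -> tuple[str, str]:
    """Split an assumed '[stdout]\\n...[stderr]\\n...' message into parts."""
    _, _, rest = message.partition("[stdout]\n")
    stdout, _, stderr = rest.partition("[stderr]\n")
    return stdout.strip(), stderr.strip()


def run_readonly_query(db_url: str, sql: str) -> str:
    """Run one query on the tunnel VM; raise if psql wrote to stderr."""
    argv = build_run_command_argv(build_psql_script(db_url, sql))
    result = json.loads(
        subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    )
    stdout, stderr = split_run_command_message(result["value"][0]["message"])
    if stderr:
        raise RuntimeError(stderr)
    return stdout
```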
Examples:
python3 /home/sunil/.local/share/linus/ds9-prod-debug/scripts/query_prod_read_db.py \
"select * from tribble.allowed_bot limit 20;"
python3 /home/sunil/.local/share/linus/ds9-prod-debug/scripts/query_prod_read_db.py \
"select slack_bot_id, slack_user_id, slack_team_id, tribble_user_id from tribble.allowed_bot order by created_at desc limit 20;"
python3 /home/sunil/.local/share/linus/ds9-prod-debug/scripts/query_prod_read_db.py \
"show transaction_read_only;"
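The readonly role behind DATABASEURL-READONLY is the real safeguard, and `show transaction_read_only;` confirms the session state. A cheap client-side check can still refuse the statements this skill forbids before they ever reach Run Command. The helper below is a hypothetical sketch: it matches whole keywords only (so `updated_at` passes), does not parse SQL, and may reject harmless text that contains those words in a string literal.

```python
import re

# Statement keywords this skill forbids unless Sunil explicitly asks.
FORBIDDEN = ("INSERT", "UPDATE", "DELETE", "ALTER", "DROP", "TRUNCATE")
_FORBIDDEN_RE = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)


def forbidden_keywords(sql: str) -> list[str]:
    """Return forbidden keywords found in sql, upper-cased, in order."""
    return [m.group(1).upper() for m in _FORBIDDEN_RE.finditer(sql)]


def require_readonly(sql: str) -> str:
    """Return sql unchanged, or raise if it contains a forbidden keyword."""
    found = forbidden_keywords(sql)
    if found:
        raise ValueError(f"refusing non-readonly SQL (found {', '.join(found)})")
    return sql
```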
For the common Slack bot allow-list issue:
python3 /home/sunil/.local/share/linus/ds9-prod-debug/scripts/debug_allowed_bot.py U0AEX96E0SK T05261TL2EP
This does two things:
- tribble.allowed_bot through the readonly prod DB lane
- findAllowedBot
Use that before speculating about bot ID mismatches or client context.
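debug_allowed_bot.py's source is not shown here. A hypothetical sketch of how its two lookups could be composed from the two arguments: it assumes the first argument is a Slack user or bot ID and the second a Slack team ID, and matching the first ID against both `slack_user_id` and `slack_bot_id` is a guess. Because the SQL and KQL are built as strings, the IDs are validated against a Slack-ID shape first so they cannot inject extra statements.

```python
import re

# Slack IDs such as U0AEX96E0SK or T05261TL2EP: an uppercase letter
# followed by uppercase letters and digits.
_SLACK_ID = re.compile(r"[A-Z][A-Z0-9]{2,}")


def _checked(slack_id: str) -> str:
    """Reject anything that is not shaped like a Slack ID."""
    if not _SLACK_ID.fullmatch(slack_id):
        raise ValueError(f"not a Slack ID: {slack_id!r}")
    return slack_id


def allowed_bot_sql(user_id: str, team_id: str) -> str:
    """Readonly query for allow-list rows matching the IDs (assumed mapping)."""
    user_id, team_id = _checked(user_id), _checked(team_id)
    return (
        "select slack_bot_id, slack_user_id, slack_team_id, tribble_user_id "
        "from tribble.allowed_bot "
        f"where slack_team_id = '{team_id}' "
        f"and (slack_user_id = '{user_id}' or slack_bot_id = '{user_id}') "
        "order by created_at desc limit 20;"
    )


def find_allowed_bot_kql(user_id: str, hours: int = 2) -> str:
    """KQL for recent findAllowedBot traces mentioning the ID."""
    user_id = _checked(user_id)
    return (
        f"traces | where timestamp > ago({int(hours)}h) "
        f"and message has 'findAllowedBot' and message has '{user_id}' "
        "| order by timestamp desc | take 100"
    )
```

The two strings could then be passed to query_prod_read_db.py and query_prod_app_insights.py respectively.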
In shared Slack channels:
In DM with Sunil:
verify_prod_debug_access.py.