Scan ArgoCD application logs, triage deploy/sync issues, check container health, and diagnose migration failures across ops clusters. Use when the user says "argocd", "check deploys", "check pods", "what's failing", "triage", or asks about app health.
Scan and triage ArgoCD-managed applications across the ops production and staging clusters. This skill checks application sync/health status, pulls pod logs, inspects Kubernetes events, and surfaces actionable issues.
| Alias | Cluster | API Server |
|---|---|---|
| eks-ops-production | Production | https://5B9C5E61FDD7E90F0CF230129CD5D4B6.gr7.us-east-2.eks.amazonaws.com |
| eks-ops-staging | Staging | https://4DB97F6B7EEB40C519A2845E2656BDFB.yl4.us-west-1.eks.amazonaws.com |
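
All commands below target the ops project/namespace on the ArgoCD server argocd.flybreeze.team.

Ask the user what they want to check, for example a single cluster (production or staging) or a general scan across both.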
If the user's initial message already specifies scope (e.g., "check staging" or "what's failing in prod"), skip this prompt and proceed.
List all applications in the target scope:
argocd app list --server argocd.flybreeze.team -o wide
To filter by cluster, pipe through grep for the relevant API server URL. To filter by project/namespace:
argocd app list --server argocd.flybreeze.team --project ops -o wide
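The cluster filter described above can be sketched as a small shell function. This is a minimal sketch, not part of the original skill: the name filter_by_cluster is hypothetical, and awk is swapped in for plain grep so that the header row survives the filter.

```shell
# Hypothetical helper: keep the header row plus every row that mentions the
# given API server URL. Reads `argocd app list -o wide` output on stdin.
filter_by_cluster() {
  cluster_url="$1"
  # NR == 1 keeps the header; index() is a literal substring match, so the
  # dots and slashes in the URL are not interpreted as regex characters.
  awk -v url="$cluster_url" 'NR == 1 || index($0, url) > 0'
}

# Example (not executed here):
#   argocd app list --server argocd.flybreeze.team -o wide \
#     | filter_by_cluster "https://5B9C5E61FDD7E90F0CF230129CD5D4B6.gr7.us-east-2.eks.amazonaws.com"
```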
Parse the output and build a summary table with columns: App Name, Sync Status, Health Status, Cluster, Last Sync Time.
Flag any app whose sync status is not Synced or whose health status is not Healthy.

If everything is green and the user asked for a general scan, report that all apps are healthy and stop unless the user wants deeper inspection.
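The flagging rule can be sketched as follows. This is an illustrative helper, not the skill's own tooling: flag_apps is a hypothetical name, and it assumes the wide output has a header row containing STATUS and HEALTH columns and that no earlier column is empty (whitespace splitting would otherwise shift the fields).

```shell
# Hypothetical helper: print "name sync health" for every app that is not
# both Synced and Healthy. Reads `argocd app list -o wide` output on stdin.
flag_apps() {
  awk '
    NR == 1 {
      # Locate the columns by header name instead of hard-coding positions.
      for (i = 1; i <= NF; i++) {
        if ($i == "STATUS") s = i
        if ($i == "HEALTH") h = i
      }
      next
    }
    # Only evaluate rows once both columns were found in the header.
    s && h && ($s != "Synced" || $h != "Healthy") { print $1, $s, $h }
  '
}
```

An empty result means every listed app is green, which matches the stop condition above.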
For each flagged app, get detailed status:
argocd app get <app-name> --server argocd.flybreeze.team
This shows the app's overall sync and health status along with the status of each managed resource.
Report the resource-level health breakdown. Identify which specific resources are degraded.
For apps with unhealthy pods or containers, pull recent logs:
argocd app logs <app-name> --server argocd.flybreeze.team --namespace ops --tail 100
If specific container or pod names are known from Step 3, narrow the log query:
argocd app logs <app-name> --server argocd.flybreeze.team --namespace ops --container <container> --tail 150
For crash-looping containers, pull previous container logs:
argocd app logs <app-name> --server argocd.flybreeze.team --namespace ops --container <container> --previous --tail 150
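The three log commands above can be combined into one helper. This is a rough sketch under stated assumptions: fetch_app_logs is a hypothetical name, and tolerating a failed --previous call is a choice made here for containers that may have no earlier instance, not something the skill specifies.

```shell
# Hypothetical helper: fetch recent logs for an app and, when a container is
# named, the logs of its previous instance as well (useful for crash loops).
fetch_app_logs() {
  app="$1"
  container="${2:-}"
  server="argocd.flybreeze.team"

  # No container given: fall back to the app-wide query.
  if [ -z "$container" ]; then
    argocd app logs "$app" --server "$server" --namespace ops --tail 100
    return
  fi

  echo "== current logs: $container =="
  argocd app logs "$app" --server "$server" --namespace ops \
    --container "$container" --tail 150

  echo "== previous logs: $container =="
  # Do not abort if there is no previous container instance to read from.
  argocd app logs "$app" --server "$server" --namespace ops \
    --container "$container" --previous --tail 150 || true
}
```

Calling it as fetch_app_logs <app-name> runs the app-wide query; adding a container name runs the narrowed and --previous queries in sequence.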
Scan logs for:
For apps that are out-of-sync or recently failed:
argocd app history <app-name> --server argocd.flybreeze.team
This shows recent deploy attempts with revisions and status. Identify:
If ArgoCD logs alone don't explain the issue, check resource events:
argocd app resources <app-name> --server argocd.flybreeze.team
This lists all managed resources and their health. For specific failing resources, the user may need kubectl access — note this as a next step if ArgoCD tooling isn't sufficient.
Present findings as a structured report:
Cluster Health Overview:
Issues Found (for each issue):
Migration Issues (if any):
If no issues are found, confirm all apps are healthy with last-sync timestamps.
- Always pass --server argocd.flybreeze.team on every argocd command.
- Run argocd commands in parallel where possible (e.g., fetching details for multiple apps simultaneously).
- Always use --tail to avoid dumping thousands of lines. Start with 100-150 lines; fetch more only if the issue isn't visible.
- If argocd commands fail with auth errors, prompt the user to re-login: argocd login argocd.flybreeze.team --sso
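
The parallel-execution guideline could look like the sketch below, using background jobs and wait. The function name fetch_details_parallel and the per-app output files are assumptions made for illustration; app names are assumed to contain no slashes, since each name is used as a file name.

```shell
# Hypothetical helper: run `argocd app get` for several apps at once and
# store each result in its own file under the given directory.
fetch_details_parallel() {
  outdir="$1"
  shift
  mkdir -p "$outdir"

  for app in "$@"; do
    # Each call runs in the background; stdout and stderr go to a per-app file
    # so that auth errors can be spotted afterwards.
    argocd app get "$app" --server argocd.flybreeze.team \
      > "$outdir/$app.txt" 2>&1 &
  done

  # Block until every background job has finished.
  wait
}

# Example (not executed here):
#   fetch_details_parallel /tmp/argocd-details app-one app-two
```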