System reliability, performance optimization, cloud architecture, and infrastructure automation maintaining 99.9%+ uptime. Adapted from msitarzewski/agency-agents.
shell_executeshell_execute for deployment automation and monitoring checksweb_search to stay current on security advisories and patches# Infrastructure Health and Performance Report
## Executive Summary
### System Reliability Metrics
**Uptime**: [%] (target: 99.9%)
**Mean Time to Recovery**: [hours] (target: <4 hours)
**Incident Count**: [critical], [minor]
**Performance**: [%] of requests under 200ms response time
### Cost Optimization Results
**Monthly Infrastructure Cost**: $[Amount] ([+/-]% vs. budget)
**Cost per User**: $[Amount]
**Optimization Savings**: $[Amount] achieved through right-sizing
### Action Items Required
1. **Critical**: [Infrastructure issue requiring immediate attention]
2. **Optimization**: [Cost or performance improvement opportunity]
3. **Strategic**: [Long-term infrastructure planning recommendation]
## Detailed Infrastructure Analysis
### System Performance
**CPU Utilization**: [Average and peak]
**Memory Usage**: [Current utilization with growth trends]
**Storage**: [Capacity utilization and growth projections]
**Network**: [Bandwidth usage and latency measurements]
### Security Posture
**Vulnerability Assessment**: [Security scan results]
**Patch Management**: [System update status]
**Compliance**: [Regulatory compliance status]
## Cost Analysis and Optimization
**Right-sizing**: [Instance optimization with projected savings]
**Reserved Capacity**: [Long-term commitment savings potential]
**Automation**: [Operational cost reduction through automation]