System health monitoring — run health checks, track errors, and auto-remediate issues
Proactive system monitoring: runs health checks across all services, tracks errors in SQLite, scores system health, and auto-remediates common issues.
cd /opt/PROJECT && /opt/PROJECT/venv/bin/python -c "
import sys; sys.path.insert(0, '/opt/PROJECT'); sys.path.insert(0, '/opt/PROJECT/SKILLS')
from dotenv import load_dotenv; load_dotenv('/opt/PROJECT/.env')
from SKILLS.monitoring import health_monitor, error_tracker
# Run full health check (10 checks)
results = health_monitor.run_health_monitor(dry_run=False)
print(health_monitor.format_report(results))
# Telegram push was removed 2026-04-17 (spammy). Use format_report() above
# or run with --json for machine-readable output.
# Error tracking
summary = error_tracker.get_error_summary(days=7)
trends = error_tracker.get_error_trends(days=30)
recent = error_tracker.get_recent_errors(limit=20)
uptime = error_tracker.get_uptime_stats(days=7)
# Print text report of errors
error_tracker.print_text_report(days=7)
# Generate HTML error report
html = error_tracker.generate_html_report(days=7)
"
run_health_monitor(dry_run=False): Run all health checks; returns a results dict
format_report(results): Human-readable report string
track_error(skill, error_type, message, ...): Log an error to SQLite
track_errors(skill): Decorator to auto-track errors from a skill
get_error_summary(days=7): Summary of errors by skill/type
get_error_trends(days=30): Error trends over time
get_recent_errors(limit=20): Most recent errors
get_uptime_stats(days=7): Uptime statistics per skill
print_text_report(days=7): Print formatted error report
generate_html_report(days=7): HTML error dashboard

Automatically fixes common issues: OAuth token refresh, service restarts, rate limit backoff, stale JSON cleanup, import errors, disk space.
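
The usage snippet does not exercise track_error or the track_errors(skill) decorator from the function list. The sketch below illustrates how a decorator of that kind might record exceptions in SQLite. It is not the module's actual implementation: the table schema, the make_tracker helper, the tb parameter, and the in-memory database are all assumptions made for illustration, and the two functions are stand-ins that only share names with the real ones.

```python
import functools
import sqlite3
import traceback
from datetime import datetime, timezone

# Hypothetical schema: the real error_tracker table layout is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS errors (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts TEXT NOT NULL,
    skill TEXT NOT NULL,
    error_type TEXT NOT NULL,
    message TEXT NOT NULL,
    traceback TEXT
)
"""


def make_tracker(conn):
    """Return stand-in (track_error, track_errors) bound to an open SQLite connection."""
    conn.execute(SCHEMA)

    def track_error(skill, error_type, message, tb=None):
        # Insert one error row with a UTC timestamp.
        conn.execute(
            "INSERT INTO errors (ts, skill, error_type, message, traceback) "
            "VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), skill, error_type, message, tb),
        )
        conn.commit()

    def track_errors(skill):
        # Decorator factory: log any exception from the wrapped function, then re-raise it.
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    track_error(
                        skill, type(exc).__name__, str(exc), traceback.format_exc()
                    )
                    raise
            return wrapper
        return decorator

    return track_error, track_errors


# Demo against an in-memory database (assumption: the real tracker uses a file on disk).
conn = sqlite3.connect(":memory:")
track_error, track_errors = make_tracker(conn)


@track_errors("demo_skill")
def flaky():
    raise ValueError("boom")


try:
    flaky()
except ValueError:
    pass  # The exception still propagates; the decorator only records it.

rows = conn.execute("SELECT skill, error_type, message FROM errors").fetchall()
print(rows)  # [('demo_skill', 'ValueError', 'boom')]
```

Re-raising after logging keeps the decorator transparent to callers: the skill fails exactly as it would without tracking, and the error row is a side effect that reports such as get_recent_errors could later read.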