Autonomous setup and management of external service monitoring using UptimeRobot (HTTP endpoint monitoring) and Healthchecks.io (heartbeat/Dead Man's Switch monitoring). Use when setting up monitoring for Cloud Run Jobs, VM services, or investigating monitoring configuration issues. Includes Pushover integration, validated API patterns, and dual-service architecture.
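This skill provides validated patterns for setting up external service monitoring using two complementary services: UptimeRobot for HTTP endpoint monitoring and Healthchecks.io for heartbeat (Dead Man's Switch) monitoring.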
Key Principle: Never monitor from the same infrastructure being monitored. Both services run externally on separate infrastructure to avoid single points of failure.
Cost: $0/month. The free tiers of both services are enough to monitor both ephemeral jobs and persistent endpoints.
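Use this skill when setting up monitoring for Cloud Run Jobs or VM services, or when investigating monitoring configuration issues.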
Choosing a service:

```text
Is the workload ephemeral (runs and exits)?
├─ YES → Use Healthchecks.io (Dead Man's Switch)
│  ├─ Job pings on success
│  ├─ No ping within timeout + grace → Alert
│  └─ Free tier: 20 checks, user-defined timeouts
│
└─ NO → Is it a persistent HTTP endpoint?
   └─ YES → Use UptimeRobot (HTTP polling)
      ├─ UptimeRobot polls the endpoint every N minutes
      ├─ No response → Alert
      └─ Free tier: 50 monitors, 5-minute intervals
```
Recommended Architecture: Use BOTH services for dual-pipeline monitoring
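The decision tree reduces to a small lookup. The sketch below is a hypothetical helper (`choose_monitor` is not part of the skill's scripts). It returns which service fits a workload and encodes only the branches the tree actually shows. Persistent non-HTTP workloads have no branch, so they raise an error instead of being guessed at.

```python
from enum import Enum


class Monitor(Enum):
    HEALTHCHECKS = "Healthchecks.io (heartbeat / Dead Man's Switch)"
    UPTIMEROBOT = "UptimeRobot (HTTP polling)"


def choose_monitor(ephemeral: bool, http_endpoint: bool) -> Monitor:
    """Pick a monitoring service following the decision tree (hypothetical helper)."""
    if ephemeral:
        # Runs and exits: there is nothing to poll, so the job must ping out
        return Monitor.HEALTHCHECKS
    if http_endpoint:
        # Always-on endpoint: an external poller can check it
        return Monitor.UPTIMEROBOT
    # The tree has no branch for persistent non-HTTP workloads
    raise ValueError("no branch in the decision tree for this workload")


# A Cloud Run Job runs and exits; a VM API serves HTTP continuously
print(choose_monitor(ephemeral=True, http_endpoint=False).name)
print(choose_monitor(ephemeral=False, http_endpoint=True).name)
```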
Healthchecks.io quick start (Cloud Run Job):

```python
# /// script
# dependencies = ["requests"]
# ///
import subprocess

from scripts.healthchecks_client import HealthchecksClient

# Get API key from Doppler; check=True raises if the CLI call fails
# instead of silently passing an empty key to the client
api_key = subprocess.run(
    ["doppler", "secrets", "get", "HEALTHCHECKS_API_KEY",
     "--project", "claude-config", "--config", "dev", "--plain"],
    capture_output=True, text=True, check=True,
).stdout.strip()

client = HealthchecksClient(api_key)

# Create check for Cloud Run Job
result = client.create_check(
    name="Ethereum Collector Job",
    timeout=7200,  # 2 hours: expected time between pings
    grace=600,  # 10 minutes grace period
    tags="production ethereum",
    channels="*",  # All notification channels
)
ping_url = result["ping_url"]
print(f"Add to Cloud Run Job environment: HEALTHCHECK_PING_URL={ping_url}")

# In your Cloud Run Job script:
# import os, requests
# requests.get(os.environ["HEALTHCHECK_PING_URL"], timeout=10)  # On success
# requests.get(f"{os.environ['HEALTHCHECK_PING_URL']}/fail", timeout=10)  # On failure
```
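The job-side lines commented out above can be wrapped so that every exit path reports to Healthchecks.io. This is a minimal sketch that assumes the ping URL arrives via `HEALTHCHECK_PING_URL`, as printed by the quick start. `run_with_heartbeat` and `_ping` are hypothetical names. The sketch uses the standard library's `urllib.request` instead of `requests` so that it has no dependencies.

```python
import urllib.request
from typing import Callable


def _ping(url: str) -> None:
    """GET the ping URL; log and continue if Healthchecks.io is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=10):
            pass
    except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
        print(f"Healthchecks ping failed: {exc}")


def run_with_heartbeat(
    job: Callable[[], None],
    ping_url: str,
    ping: Callable[[str], None] = _ping,
) -> None:
    """Run job(); ping ping_url on success or ping_url + "/fail" on error."""
    try:
        job()
    except Exception:
        # Signal failure immediately instead of waiting out timeout + grace,
        # then re-raise so the job still exits non-zero
        ping(f"{ping_url}/fail")
        raise
    ping(ping_url)
```

In the job entry point, call it as `run_with_heartbeat(main, os.environ["HEALTHCHECK_PING_URL"])`. Swallowing ping errors is deliberate. A Healthchecks.io outage should not mark an otherwise successful job as failed, and a ping that never arrives still produces an alert once timeout + grace elapses.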
UptimeRobot quick start (VM service):

```python
# /// script
# dependencies = ["requests"]
# ///
import subprocess

from scripts.uptimerobot_client import UptimeRobotClient

# Get API key from Doppler; check=True raises if the CLI call fails
api_key = subprocess.run(
    ["doppler", "secrets", "get", "UPTIMEROBOT_API_KEY",
     "--project", "claude-config", "--config", "dev", "--plain"],
    capture_output=True, text=True, check=True,
).stdout.strip()

client = UptimeRobotClient(api_key)

# Create HTTP monitor for VM service
result = client.create_monitor(
    friendly_name="Production API Endpoint",
    url="https://your-vm-ip:8000/health",  # Placeholder: your service's health URL
    type=1,  # HTTP(S)
    interval=300,  # 5 minutes (free tier)
    alert_contacts=client.get_pushover_contact_id(),  # Type 9 Pushover contact
)
print(f"Monitor created: {result['monitor']['id']}")
```
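The free tier only supports 5-minute intervals, so checking the parameters locally before calling the API can catch mistakes early. The sketch below is a hypothetical helper and is not part of `scripts/uptimerobot_client.py`. Its field names mirror the keyword arguments used in the quick start, and it assumes the API accepts them as form-encoded strings.

```python
from typing import Dict, Optional

# UptimeRobot monitor types 1-4 (HTTP(S), Keyword, Ping, Port)
MONITOR_TYPES = {1: "HTTP(S)", 2: "Keyword", 3: "Ping", 4: "Port"}
FREE_TIER_MIN_INTERVAL = 300  # seconds (5 minutes)


def build_monitor_payload(
    friendly_name: str,
    url: str,
    monitor_type: int = 1,
    interval: int = FREE_TIER_MIN_INTERVAL,
    alert_contacts: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble form fields for a new monitor (hypothetical helper)."""
    if monitor_type not in MONITOR_TYPES:
        raise ValueError(f"unknown monitor type {monitor_type}")
    if interval < FREE_TIER_MIN_INTERVAL:
        raise ValueError("free tier supports intervals of 300 s or more")
    payload = {
        "friendly_name": friendly_name,
        "url": url,
        "type": str(monitor_type),
        "interval": str(interval),
    }
    if alert_contacts:  # Omit the field when Pushover is not configured
        payload["alert_contacts"] = alert_contacts
    return payload
```

The returned dict has the shape expected by the `data` argument of a `_request`-style method like the one shown below. Validation happens before any network call, so a bad interval never consumes rate-limited requests.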
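UptimeRobot API v2 authenticates with an `api_key` field in the form-encoded POST body. Every v2 method is a POST to `https://api.uptimerobot.com/v2/<method>`: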
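```python
# UptimeRobotClient._request: requires `import requests` and `from typing import Dict`
def _request(self, endpoint: str, data: Dict) -> Dict:
    data["api_key"] = self.api_key  # Auth travels in the POST body, not a header
    data["format"] = "json"
    response = requests.post(f"{self.base_url}/{endpoint}", data=data, timeout=30)
    response.raise_for_status()
    result = response.json()
    # API-level errors are reported as "stat": "fail" in the JSON body
    if result.get("stat") != "ok":
        raise RuntimeError(f"UptimeRobot API error: {result}")
    return result
```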
List Monitors:
```python
monitors = client.get_monitors()
for monitor in monitors:
    # status is a numeric code (e.g. 2 = up, 9 = down), not a label
    print(f"{monitor['friendly_name']}: {monitor['url']} - {monitor['status']}")
```
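The `status` field printed above is numeric. This sketch maps the UptimeRobot v2 monitor status codes to labels and picks out failing monitors. `status_label`, `failing_monitors` and the `sample` data are hypothetical and for illustration only.

```python
from typing import Any, Dict, List

# UptimeRobot API v2 monitor status codes
STATUS_LABELS = {0: "paused", 1: "not checked yet", 2: "up", 8: "seems down", 9: "down"}


def status_label(code: int) -> str:
    """Translate a numeric monitor status into a readable label."""
    return STATUS_LABELS.get(code, f"unknown ({code})")


def failing_monitors(monitors: List[Dict[str, Any]]) -> List[str]:
    """Names of monitors currently reported as 'seems down' or 'down'."""
    return [m["friendly_name"] for m in monitors if m["status"] in (8, 9)]


# Illustrative input shaped like entries from client.get_monitors()
sample = [
    {"friendly_name": "API Health Check", "status": 2},
    {"friendly_name": "Production API Endpoint", "status": 9},
]
print(failing_monitors(sample))
```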
Create Monitor:
```python
result = client.create_monitor(
    friendly_name="API Health Check",
    url="https://api.example.com/health",
    type=1,  # HTTP(S)
    interval=300,  # 5 minutes
    alert_contacts=pushover_id,
)
```
Delete Monitor:
```python
client.delete_monitor(monitor_id="801762241")
```
Get Pushover Contact ID:
```python
pushover_id = client.get_pushover_contact_id()
if not pushover_id:
    print("⚠️ Pushover not configured - see Pushover Integration Setup")
```
Monitor types (`type` parameter):
- 1 = HTTP(S) - checks endpoint response
- 2 = Keyword - checks for specific text in response
- 3 = Ping - ICMP ping
- 4 = Port - TCP port check

Healthchecks.io API v3 uses `X-Api-Key` header authentication:
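```python
import requests

base_url = "https://healthchecks.io/api/v3"
headers = {
    "X-Api-Key": api_key,  # Project API key (fetched from Doppler in the quick start)
    "Content-Type": "application/json",
}
response = requests.get(f"{base_url}/checks/", headers=headers, timeout=30)
response.raise_for_status()
```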
List Checks:
```python
checks = client.get_checks()
for check in checks:
    print(f"{check['name']}: {check['status']} - {check['ping_url']}")
```
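Healthchecks.io reports each check's `status` as a string such as `"new"`, `"started"`, `"up"`, `"grace"`, `"down"` or `"paused"`. This hypothetical helper groups the list output into checks that have alerted (`"down"`) and checks that are late but still inside their grace period (`"grace"`).

```python
from typing import Any, Dict, List


def triage_checks(checks: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Split check names by status: 'down' has alerted, 'grace' is late."""
    result: Dict[str, List[str]] = {"down": [], "grace": []}
    for check in checks:
        # Statuses other than down/grace (up, new, started, paused) are skipped
        if check["status"] in result:
            result[check["status"]].append(check["name"])
    return result
```

Usage: `triage_checks(client.get_checks())`.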
Create Check:
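```python
# Healthchecks.io channels are UUIDs, not UptimeRobot's numeric contact IDs
pushover_channel = client.get_pushover_channel_id()
result = client.create_check(
    name="Daily Backup Job",
    timeout=86400,  # 24 hours
    grace=3600,  # 1 hour grace
    tags="backup production",
    channels=pushover_channel,  # Or "*" for all channels
)
ping_url = result["ping_url"]
```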
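The `timeout` and `grace` values are bounded by the Healthchecks.io Management API. As far as its documentation states, each must be between 60 seconds and 365 days, but confirm this against the current docs. This sketch assembles and validates a create-check JSON body locally. `build_check_payload` is a hypothetical helper, and the `"*"` default for `channels` follows the quick start.

```python
import json
from typing import Any, Dict

MIN_SECONDS, MAX_SECONDS = 60, 31_536_000  # 1 minute .. 365 days (assumed API bounds)


def build_check_payload(name: str, timeout: int, grace: int,
                        tags: str = "", channels: str = "*") -> Dict[str, Any]:
    """JSON body for creating a check (hypothetical helper)."""
    for field, value in (("timeout", timeout), ("grace", grace)):
        if not MIN_SECONDS <= value <= MAX_SECONDS:
            raise ValueError(f"{field}={value} outside {MIN_SECONDS}..{MAX_SECONDS}")
    return {"name": name, "tags": tags, "timeout": timeout,
            "grace": grace, "channels": channels}


print(json.dumps(build_check_payload("Daily Backup Job", 86400, 3600, "backup production")))
```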
Ping Check (Success):
```python
# From your job/script
import requests

requests.get(ping_url, timeout=10)
```
Ping Check (Failure):
```python
requests.get(f"{ping_url}/fail", timeout=10)
```
Delete Check:
```python
client.delete_check(check_uuid="6a991157-552d-4c2c-b972-d43de0a96bff")
```
The Dead Man's Switch pattern is well suited to ephemeral workloads (Cloud Run Jobs, cron jobs):
Job Lifecycle:
1. Job starts
2. Job executes work
3. Job pings Healthchecks.io on success
4. If no ping arrives within the timeout plus grace period → Alert
Advantages:
- No always-on endpoint needed
- Works with ephemeral infrastructure
- Simple integration (one HTTP GET)
- Catches job crashes, hangs, or scheduling failures
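The lifecycle can be modeled as a function of the time since the last ping. This sketch assumes a simple period-based check (timeout + grace, not a cron schedule) and ignores the `paused` and `started` states. `heartbeat_status` is a hypothetical helper that approximates how a check moves from up to late to alerting.

```python
from datetime import datetime, timedelta


def heartbeat_status(last_ping: datetime, timeout: timedelta,
                     grace: timedelta, now: datetime) -> str:
    """Approximate status of a period-based check at time `now`."""
    elapsed = now - last_ping
    if elapsed <= timeout:
        return "up"
    if elapsed <= timeout + grace:
        return "grace"  # Late, but no alert yet
    return "down"  # Alert fires once timeout + grace has passed


# Ethereum Collector Job settings: 2-hour timeout, 10-minute grace
last = datetime(2024, 1, 1, 0, 0)
for minutes in (60, 125, 135):
    status = heartbeat_status(last, timedelta(hours=2), timedelta(minutes=10),
                              last + timedelta(minutes=minutes))
    print(minutes, status)
```

This is why a hung job is caught. It never pings, so its elapsed time keeps growing until the check reaches `down`.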
Important: Pushover integration must first be configured in each service's web UI. The APIs can only list and assign existing integrations, not create them.
UptimeRobot: `client.get_pushover_contact_id()` should return a numeric ID.
API note: Pushover contacts appear as `type=9` in UptimeRobot API responses.

Healthchecks.io: `client.get_pushover_channel_id()` should return a UUID.
API note: Pushover channels use the abbreviated kind code `"po"`, not `"pushover"`.
Common issue: if `get_pushover_contact_id()` or `get_pushover_channel_id()` returns `None`, Pushover is not configured in that service. Complete the Pushover setup in that service's web UI.
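Lookups like these might be implemented as follows. This is a sketch, not the skill's actual `get_pushover_contact_id()` / `get_pushover_channel_id()` code. It assumes the list responses from UptimeRobot's `getAlertContacts` method (an `alert_contacts` array with `type`) and Healthchecks.io's channel listing (a `channels` array with `kind`). Both function names are hypothetical.

```python
from typing import Any, Dict, Optional


def find_pushover_contact(response: Dict[str, Any]) -> Optional[str]:
    """UptimeRobot alert-contact list -> Pushover contact ID (type 9)."""
    for contact in response.get("alert_contacts", []):
        if int(contact["type"]) == 9:
            return str(contact["id"])
    return None  # Pushover not configured in the UptimeRobot web UI


def find_pushover_channel(response: Dict[str, Any]) -> Optional[str]:
    """Healthchecks.io channel list -> Pushover channel UUID (kind "po")."""
    for channel in response.get("channels", []):
        if channel["kind"] == "po":  # Abbreviated kind code, not "pushover"
            return channel["id"]
    return None  # Pushover not configured in the Healthchecks.io web UI
```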
429 Too Many Requests: rate-limited responses carry these headers:
- `Retry-After` (observed: 47 seconds)
- `X-RateLimit-Remaining` (counts down from 9 to 0)

Rate limit example:
import time
from requests.exceptions import HTTPError