Name: Service Monitoring Setup
Author: terrylica

Service Monitoring Setup | Skills Pool

Is the workload ephemeral (runs and exits)?
├─ YES → Use Healthchecks.io (Dead Man's Switch)
│   ├─ Job pings on success
│   ├─ No ping within timeout → Alert
│   └─ Free tier: 20 checks, user-defined timeouts
│
└─ NO → Is it a persistent HTTP endpoint?
    └─ YES → Use UptimeRobot (HTTP polling)
        ├─ UptimeRobot polls the endpoint every N minutes
        ├─ No response → Alert
        └─ Free tier: 50 monitors, 5-minute intervals
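The decision tree reduces to two yes/no questions, so it can be encoded directly, for instance in a deployment script that has to pick a backend. This is a hypothetical sketch (`Backend` and `choose_monitor` are illustrative names, not part of the skill's `scripts/`). Persistent workloads without an HTTP endpoint fall outside the tree, so the function returns `None` for them.

```python
from enum import Enum
from typing import Optional


class Backend(Enum):
    HEALTHCHECKS = "Healthchecks.io (dead man's switch)"
    UPTIMEROBOT = "UptimeRobot (HTTP polling)"


def choose_monitor(ephemeral: bool, http_endpoint: bool) -> Optional[Backend]:
    """Walk the decision tree: ephemeral jobs push pings, HTTP services get polled."""
    if ephemeral:
        # Runs and exits: there is nothing to poll, so the job must ping on success
        return Backend.HEALTHCHECKS
    if http_endpoint:
        # Always-on endpoint: an external poller can probe it
        return Backend.UPTIMEROBOT
    # Persistent but not HTTP: not covered by this tree
    return None


print(choose_monitor(ephemeral=True, http_endpoint=False).value)
# → Healthchecks.io (dead man's switch)
```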

# /// script
# dependencies = ["requests"]
# ///

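import subprocess

from scripts.healthchecks_client import HealthchecksClient

# Get API key from Doppler (check=True raises if the CLI call fails,
# instead of silently passing an empty key to the client)
api_key = subprocess.run(
    ["doppler", "secrets", "get", "HEALTHCHECKS_API_KEY",
     "--project", "claude-config", "--config", "dev", "--plain"],
    capture_output=True, text=True, check=True,
).stdout.strip()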

client = HealthchecksClient(api_key)

# Create check for Cloud Run Job
result = client.create_check(
    name="Ethereum Collector Job",
    timeout=7200,  # 2 hours
    grace=600,     # 10 minutes grace period
    tags="production ethereum",
    channels="*"   # All notification channels
)

ping_url = result["ping_url"]
print(f"Add to Cloud Run Job environment: HEALTHCHECK_PING_URL={ping_url}")

# In your Cloud Run Job script:
# import os, requests
# ping_url = os.environ["HEALTHCHECK_PING_URL"]
# requests.get(ping_url, timeout=10)            # On success
# requests.get(f"{ping_url}/fail", timeout=10)  # On failure
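`HealthchecksClient` lives in the skill's own `scripts/` directory and its source is not shown here. To give a rough idea of what `create_check` has to send, here is a standard-library sketch against the Healthchecks.io Management API. The `/api/v3/checks/` path and `X-Api-Key` header follow the public API; `build_create_check` and `create_check` are hypothetical names, and the API version is an assumption.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "https://healthchecks.io/api/v3"  # assumed API version


def build_create_check(api_key: str, name: str, timeout: int, grace: int,
                       tags: str = "", channels: str = "*") -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a check."""
    body = {
        "name": name,
        "timeout": timeout,    # expected seconds between pings
        "grace": grace,        # extra seconds tolerated before alerting
        "tags": tags,          # space-separated tags
        "channels": channels,  # "*" = all integrations, or comma-separated UUIDs
    }
    return urllib.request.Request(
        f"{API_BASE}/checks/",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def create_check(api_key: str, **fields: Any) -> Dict[str, Any]:
    """Send the request; the response JSON describes the check, including ping_url."""
    req = build_create_check(api_key, **fields)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect without touching the network.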

# /// script
# dependencies = ["requests"]
# ///

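import subprocess

from scripts.uptimerobot_client import UptimeRobotClient

# Get API key from Doppler (check=True raises if the CLI call fails,
# instead of silently passing an empty key to the client)
api_key = subprocess.run(
    ["doppler", "secrets", "get", "UPTIMEROBOT_API_KEY",
     "--project", "claude-config", "--config", "dev", "--plain"],
    capture_output=True, text=True, check=True,
).stdout.strip()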

client = UptimeRobotClient(api_key)

# Create HTTP monitor for VM service
result = client.create_monitor(
    friendly_name="Production API Endpoint",
    url="https://your-vm-ip:8000/health",
    type=1,  # HTTP(S)
    interval=300,  # 5 minutes (free tier)
    alert_contacts=client.get_pushover_contact_id()  # Type 9 Pushover contact
)

print(f"Monitor created: {result['monitor']['id']}")
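`UptimeRobotClient.create_monitor` is not shown either. Assuming it targets UptimeRobot's v2 API (`POST https://api.uptimerobot.com/v2/newMonitor`, form-encoded, like the `_request` helper in this document), the form it posts would look roughly like this. `new_monitor_form` is a hypothetical name, and the `_0_0` suffix (v2 `<id>_<threshold>_<recurrence>` syntax, here: no delay, no repeats) is an assumption about how the client formats `alert_contacts`.

```python
import urllib.parse
from typing import Dict

NEW_MONITOR_URL = "https://api.uptimerobot.com/v2/newMonitor"


def new_monitor_form(api_key: str, friendly_name: str, url: str,
                     contact_id: str, interval: int = 300) -> Dict[str, str]:
    """Form fields for a v2 newMonitor call: one HTTP(S) monitor, one alert contact."""
    return {
        "api_key": api_key,
        "format": "json",
        "type": "1",                # 1 = HTTP(S)
        "friendly_name": friendly_name,
        "url": url,
        "interval": str(interval),  # seconds; 300 (5 minutes) on the free tier
        # <contact id>_<threshold>_<recurrence>; 0_0 assumed = alert at once, no repeats
        "alert_contacts": f"{contact_id}_0_0",
    }


form = new_monitor_form("u123-key", "API Health Check",
                        "https://api.example.com/health", contact_id="7654321")
body = urllib.parse.urlencode(form).encode("ascii")  # the form-encoded request body
```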

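import requests
from typing import Any, Dict

# Method of UptimeRobotClient: every API call is a form-encoded POST carrying the key
def _request(self, endpoint: str, data: Dict[str, Any]) -> Dict[str, Any]:
    payload = {**data, "api_key": self.api_key, "format": "json"}  # leave caller's dict untouched
    response = requests.post(f"{self.base_url}/{endpoint}", data=payload, timeout=30)
    response.raise_for_status()  # HTTP-level errors
    result = response.json()
    # API-level errors come back in the JSON body as stat != "ok"
    if result.get("stat") != "ok":
        raise RuntimeError(f"UptimeRobot API error: {result}")
    return result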

monitors = client.get_monitors()
for monitor in monitors:
    # status is a numeric code (e.g. 2 = up, 9 = down)
    print(f"{monitor['friendly_name']}: {monitor['url']} - {monitor['status']}")
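Because `monitor['status']` is a numeric code rather than a label, a small lookup makes the listing readable. The codes below are UptimeRobot's v2 monitor statuses; `describe_status` and `needs_attention` are hypothetical helpers.

```python
MONITOR_STATUS = {
    0: "paused",
    1: "not checked yet",
    2: "up",
    8: "seems down",
    9: "down",
}


def describe_status(code: int) -> str:
    """Map a v2 monitor status code to a label, tolerating unknown codes."""
    return MONITOR_STATUS.get(int(code), f"unknown ({code})")


def needs_attention(code: int) -> bool:
    """True for the two 'down' states that warrant investigation."""
    return int(code) in (8, 9)


print(describe_status(2))  # → up
```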

pushover_id = client.get_pushover_contact_id()  # Pushover alert contact (type 9)
result = client.create_monitor(
    friendly_name="API Health Check",
    url="https://api.example.com/health",
    type=1,  # HTTP(S)
    interval=300,  # 5 minutes
    alert_contacts=pushover_id
)

client.delete_monitor(monitor_id="801762241")  # ID from get_monitors() or create_monitor()

pushover_id = client.get_pushover_contact_id()
if not pushover_id:
    print("⚠️ Pushover not configured - see Pushover Integration Setup")
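`get_pushover_contact_id` belongs to the skill's client and is not shown. One plausible implementation, assuming the v2 `getAlertContacts` response shape (`{"stat": "ok", "alert_contacts": [{"id": ..., "type": ...}, ...]}`), picks the first contact of type 9, the Pushover type used in the quick start. `find_pushover_contact` is a hypothetical name and the sample data is made up.

```python
from typing import Any, Dict, Optional

PUSHOVER_TYPE = 9  # UptimeRobot alert-contact type for Pushover


def find_pushover_contact(response: Dict[str, Any]) -> Optional[str]:
    """Return the id of the first Pushover contact, or None if there is none."""
    for contact in response.get("alert_contacts", []):
        if int(contact.get("type", -1)) == PUSHOVER_TYPE:
            return str(contact["id"])
    return None


sample = {  # illustrative data, not a real account
    "stat": "ok",
    "alert_contacts": [
        {"id": "1111111", "type": 2, "friendly_name": "ops@example.com"},
        {"id": "2222222", "type": 9, "friendly_name": "Pushover"},
    ],
}
print(find_pushover_contact(sample))  # → 2222222
```

Returning `None` rather than raising lets the caller print a setup hint, as the check for a missing Pushover contact does.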

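import requests

base_url = "https://healthchecks.io/api/v3"  # Management API (v3; the client may pin another version)
headers = {
    "X-Api-Key": api_key,  # project API key from Doppler
    "Content-Type": "application/json"
}
response = requests.get(f"{base_url}/checks/", headers=headers, timeout=30)
response.raise_for_status()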

checks = client.get_checks()
for check in checks:
    # status is a string such as "up", "grace", "down" or "paused"
    print(f"{check['name']}: {check['status']} - {check['ping_url']}")
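Healthchecks.io reports each check's `status` as a string. To surface only checks that are late or already alerting, filter on `grace` and `down`. This is a sketch over the list returned by `get_checks()`; `failing_checks` is a hypothetical helper and the sample data is made up.

```python
from typing import Any, Dict, List

ATTENTION = {"grace", "down"}  # late but within grace, or already alerting


def failing_checks(checks: List[Dict[str, Any]]) -> List[str]:
    """Names of checks whose status says they are overdue or down."""
    return [c["name"] for c in checks if c.get("status") in ATTENTION]


sample = [
    {"name": "Daily Backup Job", "status": "up"},
    {"name": "Ethereum Collector Job", "status": "grace"},
    {"name": "Nightly Report", "status": "down"},
    {"name": "Old Job", "status": "paused"},
]
print(failing_checks(sample))  # → ['Ethereum Collector Job', 'Nightly Report']
```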

result = client.create_check(
    name="Daily Backup Job",
    timeout=86400,  # 24 hours
    grace=3600,     # 1 hour grace
    tags="backup production",
    channels=pushover_id  # Healthchecks.io integration UUID(s), comma-separated, or "*" for all
)
ping_url = result["ping_url"]
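`timeout` and `grace` add up: Healthchecks.io marks a check late once `timeout` seconds pass without a ping, and only alerts after `grace` more seconds. For the daily backup above, the alert fires 86400 + 3600 = 90000 seconds (25 hours) after the last successful ping. A quick sketch of that arithmetic (`alert_deadline` is a hypothetical helper; the timestamp is an example):

```python
from datetime import datetime, timedelta, timezone


def alert_deadline(last_ping: datetime, timeout: int, grace: int) -> datetime:
    """Latest moment the next ping can arrive before the check goes down."""
    return last_ping + timedelta(seconds=timeout + grace)


last = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)  # example: backup pinged at 02:00 UTC
print(alert_deadline(last, timeout=86400, grace=3600))  # → 2024-01-02 03:00:00+00:00
```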

# From your job/script
import requests

requests.get(ping_url, timeout=10)  # On success

requests.get(f"{ping_url}/fail", timeout=10)  # On failure
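Healthchecks.io also accepts a process exit status appended to the ping URL (`/0` for success, a non-zero code for failure), which suits wrapping a shell command. A standard-library sketch, assuming the endpoint accepts codes 0 to 255; `run_and_report` and `http_get` are hypothetical helpers, and `send` is injectable so pinging can be swapped out or tested.

```python
import subprocess
import urllib.request
from typing import Callable, List


def http_get(url: str) -> None:
    """Best-effort GET: a monitoring outage must not fail the job."""
    try:
        with urllib.request.urlopen(url, timeout=10):
            pass
    except OSError:  # URLError, HTTPError and socket timeouts subclass OSError
        pass


def run_and_report(cmd: List[str], ping_url: str,
                   send: Callable[[str], None] = http_get) -> int:
    """Run cmd, then ping <ping_url>/<exit status> so failures alert immediately."""
    code = subprocess.run(cmd).returncode
    # A negative code (killed by a signal on POSIX) is reported as a generic failure
    status = code if 0 <= code <= 255 else 1
    send(f"{ping_url.rstrip('/')}/{status}")
    return code
```

Usage: `run_and_report(["./backup.sh"], ping_url)`, where `./backup.sh` stands in for the real job.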

client.delete_check(check_uuid="6a991157-552d-4c2c-b972-d43de0a96bff")

Job Lifecycle:
1. Job starts
2. Job executes work
3. Job pings Healthchecks.io on success
4. If no ping within timeout → Alert

Advantages:
- No always-on endpoint needed
- Works with ephemeral infrastructure
- Simple integration (one HTTP GET)
- Catches job crashes, hangs, or scheduling failures
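The job lifecycle maps naturally onto a Python context manager. This sketch additionally sends Healthchecks.io's optional `/start` signal (so run duration is tracked), pings success only if the block finishes, and sends `/fail` before re-raising any exception. `dead_mans_switch` and `http_get` are hypothetical names; the ping URL is the one returned by `create_check`.

```python
import urllib.request
from contextlib import contextmanager
from typing import Callable, Iterator


def http_get(url: str) -> None:
    """Best-effort GET: a monitoring hiccup should never crash the job itself."""
    try:
        with urllib.request.urlopen(url, timeout=10):
            pass
    except OSError:  # URLError, HTTPError and socket timeouts subclass OSError
        pass


@contextmanager
def dead_mans_switch(ping_url: str,
                     send: Callable[[str], None] = http_get) -> Iterator[None]:
    base = ping_url.rstrip("/")
    send(f"{base}/start")     # 1. job starts
    try:
        yield                 # 2. job executes work
    except Exception:
        send(f"{base}/fail")  # explicit failure: alert without waiting for the timeout
        raise
    send(base)                # 3. success ping; silence past timeout + grace = alert
```

Usage: `with dead_mans_switch(os.environ["HEALTHCHECK_PING_URL"]): run_collector()`, where `run_collector` stands in for the job's real work. A hang or crash that kills the process before the success ping is still caught, because the timeout simply expires.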

import time
from requests.exceptions import HTTPError
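These imports point at retrying API calls on transient HTTP errors. A minimal exponential-backoff sketch, assuming HTTP 429 (rate limited) and 5xx responses are the ones worth retrying. `with_backoff` and `is_rate_limited` are hypothetical helpers; the retry test is passed in, and `is_rate_limited` reads the status from `exc.response.status_code` as carried by `requests.exceptions.HTTPError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T],
                 is_retryable: Callable[[Exception], bool],
                 attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying retryable errors after delays of 1s, 2s, 4s, ..."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            if attempt == attempts - 1 or not is_retryable(exc):
                raise  # out of attempts, or a permanent error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # the loop always returns or raises


def is_rate_limited(exc: Exception) -> bool:
    """True for an HTTPError-style exception carrying a 429 or 5xx response."""
    response = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    return status == 429 or (status is not None and status >= 500)
```

Usage: `monitors = with_backoff(client.get_monitors, is_rate_limited)`.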

Service Monitoring Setup

Overview

When to Use This Skill

Monitoring Architecture Decision Tree

Quick Start

Healthchecks.io (Dead Man's Switch)

UptimeRobot (HTTP Monitoring)

UptimeRobot Operations

Authentication

Free Tier Capabilities

Common Operations

Monitor Types

Healthchecks.io Operations

Authentication

Free Tier Capabilities

Common Operations

Dead Man's Switch Pattern

Pushover Integration Setup

Prerequisites

UptimeRobot Pushover Setup

Healthchecks.io Pushover Setup

Troubleshooting

UptimeRobot Issues