Load testing methodology: tool selection, baseline establishment, threshold definition, and reporting
Load this skill whenever planning, running, or reviewing a load test against an Instinct SRE service.
Load testing at Instinct SRE is an enabler of SLO confidence, not a one-time gate. The goal is to understand where the limits are before users discover them.
When NOT to load test: Static sites (instinctsre.ai) served by Azure SWA with global CDN do not require load testing — Azure SWA auto-scales globally. Focus load testing on the SRE Agent API and Ultra Instinct Terminal server-side services (Phase 2+).
Before writing a single line of test configuration, answer: What traffic level (RPS, concurrency) must be supported? Which SLO thresholds apply? Which environment will be tested, and is it production-like?
Test types and when to use them:
| Test Type | Purpose | Duration | When to Run |
|---|---|---|---|
| Smoke test | Validates test setup works; 1-2 VUs | 1-2 min | Before every load test run |
| Load test | Validates SLO under expected traffic | 5-30 min | Pre-deploy to prod for new APIs |
| Stress test | Finds breaking point | Increase until failure | Once per major release |
| Soak test | Detects memory leaks, slow degradation | 1-8 hours | Before long-running service launch |
| Spike test | Validates behaviour under sudden surge | 1-5 min | Pre-launch for anticipated viral moments |
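The spike-test row above can be sketched as a k6 stage profile. The helper below is hypothetical (plain Node, no k6 runtime needed); it builds a stage array matching the surge-hold-recover shape and totals its wall-clock duration:

```javascript
// Hypothetical helper: build a spike-test stage profile and total its
// duration. Plain Node — no k6 runtime required.
function spikeStages(baseVus, spikeVus) {
  return [
    { duration: '1m', target: baseVus },   // steady baseline
    { duration: '30s', target: spikeVus }, // sudden surge
    { duration: '1m', target: spikeVus },  // hold the spike
    { duration: '30s', target: baseVus },  // recover
  ];
}

// Total duration in seconds, assuming 'Xm' / 'Xs' stage strings.
function totalSeconds(stages) {
  return stages.reduce((sum, s) => {
    const n = parseFloat(s.duration);
    return sum + (s.duration.endsWith('m') ? n * 60 : n);
  }, 0);
}

console.log(totalSeconds(spikeStages(5, 50))); // 180
```

A real spike test would pass the stage array straight into k6's `options.stages`; the duration total is useful for fitting spike tests into CI time budgets.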
Instinct SRE standard: k6
| Tool | Use Case | Why |
|---|---|---|
| k6 | HTTP API load testing, scripted user journeys | JavaScript API, built-in metrics, Azure-native integration possible |
| Playwright | Browser-based load testing for web | Same framework as E2E tests; consistent toolchain |
| wrk / hey | Quick one-liner baseline benchmarks | Minimal setup, useful for fast smoke checks |
| Azure Load Testing | Scale to thousands of VUs, Azure-native | Managed service; use for launch-scale tests |
Standard k6 install:

```shell
# Windows
choco install k6
# Or via winget
winget install k6 --source winget
```
Run a baseline before any optimisation or change. The baseline is the control measurement.
```javascript
// baseline.js — k6 smoke test
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 1,          // 1 virtual user
  duration: '30s', // for 30 seconds
  thresholds: {
    http_req_duration: ['p(99)<500'], // 99th percentile under 500ms
    http_req_failed: ['rate<0.01'],   // <1% error rate
  },
};

export default function () {
  const res = http.get('https://dev.instinctsre.ai/api/health');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
  sleep(1);
}
```
```shell
k6 run baseline.js
```
Record the baseline: note p50, p95, p99, error rate, and RPS achieved. Store in .squad/docs/performance/baselines/{service}-{date}.md.
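When only raw latency samples are at hand (for example, extracted from `results.json`), the baseline percentiles can be derived directly. A minimal sketch in plain Node using nearest-rank percentiles; note that k6's own aggregation may differ slightly at small sample sizes:

```javascript
// Nearest-rank percentile over raw latency samples (ms).
// k6 computes these internally; this is only for ad-hoc analysis.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [12, 15, 18, 22, 25, 30, 45, 60, 120, 480];
console.log(percentile(latencies, 50)); // 25
console.log(percentile(latencies, 99)); // 480
```

With only 10 samples, p95 and p99 both land on the single slowest request, which is why baselines should be recorded from runs long enough to collect thousands of samples.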
Thresholds must be derived from SLO targets (see slo-sli-sla-definitions skill):
```javascript
export const options = {
  thresholds: {
    // From SLO: p99 latency < 500ms for SRE Agent API
    http_req_duration: ['p(99)<500', 'p(95)<200', 'p(50)<100'],
    // From SLO: < 0.1% error rate
    http_req_failed: ['rate<0.001'],
    // From SLO: > 99.9% requests handled
    checks: ['rate>0.999'],
  },
};
```
Threshold failure = SLO risk. A load test that fails its thresholds means the service will breach its SLO under that traffic level.
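The link between a threshold and the SLO can be made concrete with error-budget arithmetic. A sketch, assuming the 99.9% SLO above:

```javascript
// Error budget arithmetic: a 99.9% SLO leaves a 0.1% failure budget.
// Math.round guards against floating-point drift in the percentage math.
function allowedFailures(sloPercent, totalRequests) {
  return Math.round((1 - sloPercent / 100) * totalRequests);
}

// A 10-minute load test at 100 RPS issues 60,000 requests; the
// threshold http_req_failed: ['rate<0.001'] allows at most:
const total = 100 * 60 * 10;
console.log(allowedFailures(99.9, total)); // 60
```

Sixty failed requests in a 10-minute run is the entire budget for that traffic level; anything above it is an SLO breach waiting to happen in production.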
Standard load test ramp pattern:
```javascript
export const options = {
  stages: [
    { duration: '2m', target: 10 }, // Ramp up to 10 VUs
    { duration: '5m', target: 10 }, // Hold at 10 VUs
    { duration: '2m', target: 50 }, // Ramp to 50 VUs (stress)
    { duration: '5m', target: 50 }, // Hold at 50 VUs
    { duration: '2m', target: 0 },  // Ramp down
  ],
  thresholds: { /* from step 4 */ },
};
```
Execution:
```shell
# Run and output results to JSON for reporting
k6 run --out json=results.json load-test.js

# Stream real-time metrics to StatsD for live dashboards
# (in recent k6 versions this output moved to the xk6-output-statsd extension)
k6 run --out statsd load-test.js
```
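Pass/fail can also be checked programmatically against the exported summary. A sketch in plain Node, where the object shape mirrors k6's summary-export format but the metric values are illustrative:

```javascript
// Illustrative summary object; a real one would be read from the file
// produced by k6's --summary-export flag.
const summary = {
  metrics: {
    http_req_duration: { 'p(95)': 180.4, 'p(99)': 460.2 },
    http_req_failed: { value: 0.0005 },
  },
};

// Check the exported percentiles against the SLO-derived thresholds.
function passes(s) {
  const d = s.metrics.http_req_duration;
  return d['p(99)'] < 500 &&
         d['p(95)'] < 200 &&
         s.metrics.http_req_failed.value < 0.001;
}

console.log(passes(summary)); // true
```

A check like this can gate a CI pipeline step so a threshold breach fails the build rather than relying on someone reading the report.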
Metrics to always include in the report:
| Metric | Target | Actual | Pass/Fail |
|---|---|---|---|
| p50 latency | < 100ms | | |
| p95 latency | < 200ms | | |
| p99 latency | < 500ms | | |
| Error rate | < 0.1% | | |
| Peak RPS handled | > X | | |
| Breaking point RPS | N/A (informational) | | |
Report format:

```markdown
# Load Test Report — {Service} — {Date}

**Test type:** Load / Stress / Soak / Spike
**Duration:** X minutes
**Peak VUs:** X
**Tool:** k6 vX.X.X

## Results Summary

| Metric | Target | Actual | Pass/Fail |
...

## Observations
- [Notable behaviour at specific load levels]
- [Any errors observed and their types]
- [Performance cliff (if found)]

## Recommendations
- [Changes needed to meet SLO under target load]
- [Infrastructure scaling required]

## Attached
- results.json — full k6 output
```
Store report in .squad/docs/performance/reports/{service}-{date}.md.
```javascript
// sre-agent-load.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';

const errorRate = new Rate('errors');

export const options = {
  stages: [
    { duration: '1m', target: 5 },
    { duration: '3m', target: 20 },
    { duration: '1m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(99)<500'],
    errors: ['rate<0.01'],
  },
};

const BASE_URL = __ENV.BASE_URL || 'https://api.instinctsre.ai';

export default function () {
  // Health check
  const health = http.get(`${BASE_URL}/health`);
  errorRate.add(health.status !== 200);
  check(health, { 'health 200': (r) => r.status === 200 });
  sleep(0.5);
}
```
Think time matters: 10 VUs with no sleep() can generate 100+ RPS against the target. Always define think time with sleep() so the load profile reflects real user pacing.
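The think-time effect can be estimated with Little's Law: throughput is roughly VUs divided by the time each iteration takes (response time plus sleep). A sketch:

```javascript
// Approximate throughput per Little's Law: each VU completes one
// iteration every (responseTime + thinkTime) seconds.
function estimatedRps(vus, responseTimeSec, thinkTimeSec) {
  return vus / (responseTimeSec + thinkTimeSec);
}

console.log(estimatedRps(10, 0.1, 1)); // ~9.1 RPS with 1s think time
console.log(estimatedRps(10, 0.1, 0)); // 100 RPS with no think time
```

Working the estimate backwards also helps sizing: to generate a target of 100 RPS with 1s think time and ~100ms responses, plan for roughly 110 VUs.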