Perpetual mining orchestration (24/7), queue governance, Hall of Fame promotion
NOTE: This skill operates on a local Ubuntu workstation. VPS deployment is DEFERRED. See
docs/ops/local_only_policy.md.
OMP Operator / Perpetual Mining Orchestrator (24/7). Expert in continuous strategy mining, campaign queue management, and Hall of Fame governance.
- repeat: true auto-re-queue
- dashboard/omp_config.toml for defaults
- dashboard/campaign_queue.json for pending campaigns
- /api/omp/promote-check endpoint
- ompState.activityLog with timestamps

INVOKE this skill when:
DO NOT use this skill when:
- /quant-researcher
- /quant-engineer
- /risk-analyst
- /data-engineer
- /devops-infra

1. Never run a campaign without a versioned config
2. Never exceed the resource budget
   - Check omp_config.toml limits before starting
3. Never promote without risk-analyst gates
4. Never mine if data readiness is uncertain (fail closed)
5. Never accept partial validation as complete
6. Never delete artifacts without a retention policy
7. Never reorder the queue without logging the reason
8. Never repeat a run with a different seed without registering it
9. Never run the daemon without a health check
   - health-check.sh must include OMP status
10. Never let the disk fill (< 1GB triggers auto-stop)
    - Monitor diskFreeGb continuously
    - Run cleanup_old_runs.sh proactively
11. Never ignore determinism divergence
12. Never apply a hotfix without a postmortem
13. Never mix environments without marking
14. Never bypass execution realism (trader-expert)
| File | Purpose |
|---|---|
| dashboard/omp_config.toml | Main OMP config (resource limits, promotion criteria, paths) |
| dashboard/campaign_queue.json | Campaign queue (priority, enabled, repeat) |

| File | Purpose |
|---|---|
| dashboard/server/routes/omp.js | OMP API endpoints (start/stop/pause/resume/queue/status) |
| dashboard/server/state.js | OMP state management (status, resources, activityLog) |

| File | Purpose |
|---|---|
| dashboard/src/pages/MinerControl.tsx | Mining control interface |
| dashboard/src/pages/HallOfFame.tsx | Hall of Fame display |
| dashboard/src/stores/ompStore.ts | Frontend OMP store |

| File | Purpose |
|---|---|
| crates/combiner_cli/src/commands/factory/run_campaign.rs | Execute campaign |
| crates/combiner_cli/src/commands/factory/promote.rs | Promote to HoF |
| crates/combiner_cli/src/commands/factory/audit.rs | Audit runs |
| crates/combiner_cli/src/commands/factory/export_top.rs | Export top candidates |
| crates/combiner_cli/src/commands/factory/validate_config.rs | Validate config |

| File | Purpose |
|---|---|
| dashboard/server/services/hofSync.js | Sync local HoF to Neon |

| File | Purpose |
|---|---|
| scripts/cleanup_old_runs.sh | Cleanup old runs (keep N recent) |
| scripts/vps/health-check.sh | Health checks including OMP status |

| File | Purpose |
|---|---|
| docs/architecture/omp-specification.md | Complete OMP specification |
| docs/dashboard/miner-control.md | Miner control documentation |
| docs/dashboard/hall-of-fame.md | Hall of Fame documentation |

| Path | Purpose |
|---|---|
| output/scg/run_{id}/ | Run artifacts |
| output/scg/run_{id}/hall_of_fame/ | Local elite strategies |
| artifacts/hall_of_fame/ | Promoted strategies (permanent) |
1. Daemon starts (POST /api/omp/start)
   │
2. Main loop (every 30s)
   ├── Check resources (CPU, RAM, Disk)
   ├── Load campaign queue
   └── If resources OK and queue not empty:
       │
3. Start campaign
   ├── Spawn: combiner factory run --campaign {config_path}
   ├── Monitor stdout for progress
   └── Track: generation, best_sharpe, candidates
       │
4. Campaign completes
   ├── Mark completed/failed
   ├── Trigger promotion check
   └── If repeat: re-queue campaign
       │
5. Promotion pipeline
   ├── Variance sanity gate
   ├── Threshold checks (Sharpe, PBO, DSR, DD)
   ├── Copy artifacts to hall_of_fame
   └── Insert to Neon database
| State | Description | Transitions |
|---|---|---|
| offline | Daemon stopped | -> running (start) |
| running | Active, processing | -> paused, offline |
| paused | Temporarily paused | -> running (resume), offline |

| State | Description | Next States |
|---|---|---|
| queued | In queue, awaiting execution | -> running |
| running | Currently executing | -> completed, failed |
| completed | Finished successfully | (terminal) |
| failed | Execution failed | (terminal) |
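The two state tables above can be read as transition maps. The following is a minimal sketch, not the daemon's actual implementation: the names `DAEMON_TRANSITIONS`, `CAMPAIGN_TRANSITIONS`, `can_transition`, and `apply_transition` are hypothetical. It treats `completed` and `failed` as terminal, so a `repeat` campaign is assumed to re-enter the queue as a fresh entry.

```python
# Transition maps transcribed from the daemon and campaign state tables.
# All names here are illustrative and not taken from the OMP codebase.
DAEMON_TRANSITIONS = {
    "offline": {"running"},
    "running": {"paused", "offline"},
    "paused": {"running", "offline"},
}

CAMPAIGN_TRANSITIONS = {
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}


def can_transition(table: dict, current: str, target: str) -> bool:
    """Return True if `current -> target` is listed in the given table."""
    return target in table.get(current, set())


def apply_transition(table: dict, current: str, target: str) -> str:
    """Return the new state, or raise ValueError for an unlisted transition."""
    if not can_transition(table, current, target):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Rejecting unlisted transitions (for example `offline -> paused`) keeps the activity log consistent with the tables.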
{
  "version": "1.0",
  "updated_at": "ISO8601",
  "campaigns": [
    {
      "id": "camp_{unique_id}",
      "name": "Campaign Name",
      "config_path": "configs/campaigns/{name}.toml",
      "market": "br|us",
      "priority": 1,
      "enabled": true,
      "repeat": false,
      "tags": ["momentum", "intraday"],
      "created_at": "ISO8601"
    }
  ]
}
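As an illustration of how a consumer might pick the next campaign from this structure, here is a small sketch. It assumes that a lower `priority` number runs first and that ties are broken by `created_at`; neither rule is stated in this document. The function name `next_campaign` and the sample entries are hypothetical.

```python
from typing import Optional


def next_campaign(queue: dict) -> Optional[dict]:
    """Return the enabled campaign to run next, or None if none qualify.

    Assumption: lower `priority` runs first; ties fall back to the
    ISO8601 `created_at` string, which sorts chronologically when all
    entries use the same format.
    """
    enabled = [c for c in queue.get("campaigns", []) if c.get("enabled", False)]
    if not enabled:
        return None
    return min(enabled, key=lambda c: (c["priority"], c.get("created_at", "")))


# Illustrative queue contents, not real campaigns.
example_queue = {
    "version": "1.0",
    "campaigns": [
        {"id": "camp_a", "priority": 2, "enabled": True,
         "created_at": "2024-01-01T00:00:00Z"},
        {"id": "camp_b", "priority": 1, "enabled": False,
         "created_at": "2024-01-02T00:00:00Z"},
        {"id": "camp_c", "priority": 1, "enabled": True,
         "created_at": "2024-01-03T00:00:00Z"},
    ],
}

selected = next_campaign(example_queue)
```

Here `camp_b` is skipped because it is disabled, so `selected` is the `camp_c` entry.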
| Operation | Endpoint | Description |
|---|---|---|
| List | GET /api/omp/queue | View all campaigns |
| Add | POST /api/omp/queue | Add new campaign |
| Update | PATCH /api/omp/queue/:id | Modify campaign |
| Remove | DELETE /api/omp/queue/:id | Remove campaign |

| Resource | Limit | Action if Exceeded |
|---|---|---|
| CPU | max_cpu_util_pct = 85% | Block new campaign |
| Memory | min_mem_available_mb = 400 | Block new campaign |
| Disk | min_disk_free_gb = 1.0 | Auto-stop mining |
| Concurrency | max_concurrent_campaigns = 1 | Queue additional |
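The limits in the table above can be expressed as a single admission check. This is a sketch under stated assumptions: the field names mirror the table, but the `Limits` class, the `admission_decision` function, its return strings, and the exact boundary comparisons (strict versus inclusive) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Defaults copied from the resource limit table."""
    max_cpu_util_pct: float = 85.0
    min_mem_available_mb: float = 400.0
    min_disk_free_gb: float = 1.0
    max_concurrent_campaigns: int = 1


def admission_decision(cpu_pct: float, mem_available_mb: float,
                       disk_free_gb: float, running: int,
                       limits: Limits = Limits()) -> str:
    """Return 'auto_stop', 'block', 'queue', or 'start'.

    Disk is checked first because it is the only limit whose action
    stops mining entirely rather than delaying one campaign.
    """
    if disk_free_gb < limits.min_disk_free_gb:
        return "auto_stop"
    if cpu_pct > limits.max_cpu_util_pct:
        return "block"
    if mem_available_mb < limits.min_mem_available_mb:
        return "block"
    if running >= limits.max_concurrent_campaigns:
        return "queue"
    return "start"
```

For example, `admission_decision(50, 2000, 10, 0)` returns `"start"`, while the same call with `disk_free_gb=0.5` returns `"auto_stop"`.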
| Condition | Detection | Action |
|---|---|---|
| Stuck Run | No progress for 10+ minutes | Kill process, log incident |
| Memory Runaway | Process exceeding available RAM | Kill, restart daemon |
| Disk Pressure | Free < 2GB | Warn, trigger cleanup |
| Disk Critical | Free < 1GB | Auto-stop mining |
| Error Burst | 3+ failures in 1 hour | Pause, investigate |

| Action | Implementation | Recovery |
|---|---|---|
| Throttle | Increase loop interval | Auto after 5 min |
| Pause | Set status = 'paused' | Manual resume |
| Kill | SIGTERM to process | Auto restart next loop |
| Quarantine | Disable campaign | Manual review |
| Auto-stop | Stop daemon | Manual restart |
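Two of the watchdog conditions above, Stuck Run and Error Burst, reduce to simple time-window checks. The sketch below is illustrative only: `is_stuck` and `ErrorBurstDetector` are hypothetical names, timestamps are plain seconds supplied by the caller, and the thresholds come from the conditions table.

```python
from collections import deque


def is_stuck(last_progress_s: float, now_s: float,
             timeout_s: float = 600.0) -> bool:
    """True when no progress has been seen for 10+ minutes."""
    return now_s - last_progress_s >= timeout_s


class ErrorBurstDetector:
    """Sliding-window counter for '3+ failures in 1 hour'."""

    def __init__(self, max_failures: int = 3, window_s: float = 3600.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self._failures = deque()  # failure timestamps, oldest first

    def record_failure(self, now_s: float) -> bool:
        """Record a failure and return True if the daemon should pause."""
        self._failures.append(now_s)
        # Drop failures that have fallen out of the window.
        while self._failures and now_s - self._failures[0] > self.window_s:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures
```

A caller would pause the daemon when `record_failure` returns `True` and kill the run when `is_stuck` returns `True`, matching the Pause and Kill rows of the actions table.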
# Check OMP status
curl -s http://localhost:3001/api/omp/status | jq
# Check resources
curl -s http://localhost:3001/api/omp/status | jq '.resources'
# Health check (includes OMP)
./scripts/vps/health-check.sh --json
1. Variance Sanity Gate (SEV-0)
   - GET /api/omp/promote-check
2. Threshold Checks (from omp_config.toml)
3. Completeness Check

| Criterion | Threshold | Action |
|---|---|---|
| OOS Sharpe | < 0.2 | Auto-reject |
| PBO | > 0.40 | Auto-reject |
| Max Drawdown | > 50% | Auto-reject |
| Trades OOS | < 10 | Auto-reject |
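The variance sanity gate and the auto-reject thresholds can be sketched as two small functions. This document does not describe how `/api/omp/promote-check` computes variance, so the sketch assumes the gate measures the population variance of candidate Sharpe ratios and uses the `1e-6` floor that the integrity checklist applies to `sharpeVar`. The function names and parameters are hypothetical.

```python
from statistics import pvariance


def variance_sanity_ok(sharpes: list, min_var: float = 1e-6) -> bool:
    """Fail when candidate Sharpe ratios have collapsed to one value.

    Assumption: fewer than two candidates cannot be assessed, so the
    gate fails closed in that case.
    """
    if len(sharpes) < 2:
        return False
    return pvariance(sharpes) >= min_var


def auto_reject_reasons(oos_sharpe: float, pbo: float,
                        max_drawdown_pct: float, trades_oos: int) -> list:
    """Return the auto-reject criteria a candidate violates."""
    reasons = []
    if oos_sharpe < 0.2:
        reasons.append("oos_sharpe")
    if pbo > 0.40:
        reasons.append("pbo")
    if max_drawdown_pct > 50.0:
        reasons.append("max_drawdown")
    if trades_oos < 10:
        reasons.append("trades_oos")
    return reasons
```

An empty list from `auto_reject_reasons` means the candidate survives auto-rejection. It still has to pass the promotion criteria.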
1. Export top candidates
   combiner factory export-top --run {run_id} --top 100
2. Apply variance sanity gate
   GET /api/omp/promote-check?runId={run_id}
3. Filter by thresholds
   (automated by OMP daemon)
4. Generate promotion packet
   (for manual review if needed)
5. Promote to Hall of Fame
   POST /api/omp/hof-sync
| Criterion | Threshold | Source |
|---|---|---|
| OOS Sharpe Net | >= 0.5 | Stage B validation |
| PBO | <= 0.20 | Walk-forward analysis |
| DSR | >= 0.4 | Deflated Sharpe |
| Max Drawdown | <= 30% | Risk constraint |
| Stress Passed | Configurable | Stress suite |
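A minimal sketch of applying the numeric promotion criteria above, assuming metrics arrive as a flat dictionary. The key names (`oos_sharpe_net`, `pbo`, `dsr`, `max_drawdown_pct`) and the functions `gate_results` and `passes_promotion` are hypothetical. The Stress Passed criterion is left out because its threshold is configurable.

```python
# (operator, limit) pairs copied from the promotion criteria table.
THRESHOLDS = {
    "oos_sharpe_net": (">=", 0.5),
    "pbo": ("<=", 0.20),
    "dsr": (">=", 0.4),
    "max_drawdown_pct": ("<=", 30.0),
}


def gate_results(metrics: dict) -> dict:
    """Return a per-criterion pass/fail map.

    A missing metric counts as a failure (fail closed), in line with
    the rule that partial validation is never accepted as complete.
    """
    results = {}
    for name, (op, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            results[name] = False
        elif op == ">=":
            results[name] = value >= limit
        else:
            results[name] = value <= limit
    return results


def passes_promotion(metrics: dict) -> bool:
    """True only if every numeric gate passes."""
    return all(gate_results(metrics).values())
```

The per-criterion map can be copied into the Validation Gates section of a promotion packet.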
Every promoted strategy must have:
| Field | Description | Required |
|---|---|---|
| candidate_id | Unique identifier | Yes |
| genome_hash | Hash of strategy genome | Yes |
| run_id | Source run identifier | Yes |
| campaign_id | Source campaign | Yes |
| config_hash | Configuration hash | Yes |
| git_sha | Code version | Yes |
| promoted_at | Promotion timestamp | Yes |
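The required provenance fields can be checked before promotion. A short sketch, assuming each Hall of Fame record is available as a dictionary; `missing_provenance` is a hypothetical helper, not an existing function.

```python
# Field names copied from the provenance table; all are required.
REQUIRED_PROVENANCE = (
    "candidate_id",
    "genome_hash",
    "run_id",
    "campaign_id",
    "config_hash",
    "git_sha",
    "promoted_at",
)


def missing_provenance(record: dict) -> list:
    """Return required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_PROVENANCE if not record.get(field)]
```

A non-empty result should block promotion, since every field is marked as required.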
| Trigger | Action | Frequency |
|---|---|---|
| Data Update | Re-run on new data | Monthly |
| Code Change | Re-validate affected | On release |
| Performance Decay | Review and demote | Quarterly |
| Threshold Change | Re-apply gates | Immediate |

| Condition | Action |
|---|---|
| OOS Sharpe drops below 0.3 on new data | Flag for review |
| PBO exceeds 0.30 on re-validation | Demote to research |
| Strategy logic bug discovered | Remove from HoF |
| Data quality issue affects results | Quarantine |
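The demotion conditions above can be folded into one decision function. The table does not say which action wins when several conditions hold at once, so the ordering below (remove, then quarantine, then demote, then flag) is an assumption, as are the function name and its return strings.

```python
def revalidation_action(oos_sharpe: float, pbo: float,
                        logic_bug: bool = False,
                        data_quality_issue: bool = False) -> str:
    """Map re-validation findings to an action.

    Assumed precedence: the most severe action wins when more than
    one condition holds.
    """
    if logic_bug:
        return "remove_from_hof"
    if data_quality_issue:
        return "quarantine"
    if pbo > 0.30:
        return "demote_to_research"
    if oos_sharpe < 0.3:
        return "flag_for_review"
    return "keep"
```

For instance, `revalidation_action(0.25, 0.10)` returns `"flag_for_review"`, and `revalidation_action(0.25, 0.35)` returns `"demote_to_research"`.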
| Artifact Type | Retention | Location |
|---|---|---|
| Run outputs | 5 most recent | output/scg/run_*/ |
| Hall of Fame | Permanent | artifacts/hall_of_fame/ |
| Logs | 30 days | PM2 managed |
| Database records | Permanent | Neon PostgreSQL |
# Cleanup old runs (keep 5, trigger if < 2GB free)
./scripts/cleanup_old_runs.sh /path/to/output/scg 5 2
artifacts/hall_of_fame/

| Age | Action |
|---|---|
| < 7 days | Keep uncompressed |
| 7-30 days | Compress with zstd |
| > 30 days | Archive to cold storage (if configured) |
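The retention rules (keep the 5 most recent runs, then treat artifacts by age) can be sketched as below. This does not describe how `cleanup_old_runs.sh` works internally. The function names, the mapping from run name to modification time, and the handling of the 7-day and 30-day boundaries are assumptions.

```python
def runs_to_delete(run_mtimes: dict, keep: int = 5) -> list:
    """Return run names to delete, oldest first, keeping the `keep` newest.

    `run_mtimes` maps a run directory name to its modification time.
    """
    newest_first = sorted(run_mtimes, key=run_mtimes.get, reverse=True)
    return sorted(newest_first[keep:], key=run_mtimes.get)


def age_tier(age_days: float) -> str:
    """Classify an artifact by age according to the Age/Action table."""
    if age_days < 7:
        return "keep_uncompressed"
    if age_days <= 30:
        return "compress_zstd"
    return "archive_cold_storage"
```

Hall of Fame artifacts are marked Permanent in the retention table, so they should never be passed to `runs_to_delete`.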
## Campaign Spec Card
**Campaign ID:** {camp_id}
**Name:** {name}
**Created:** YYYY-MM-DD
**Owner:** {researcher}
### Mandate
- Market: {BR/US}
- Universe: {ibov/sp500/custom}
- Timeframe: {1min/5min/1h/daily}
- Objective: {description}
### Configuration
- Config Path: {path}
- Population: {size}
- Generations: {max}
- Seeds: [{list}]
- Workers: {count}
### Constraints
- Max Runtime: {seconds}
- Max Drawdown: {percent}
- Min Sharpe: {threshold}
### Expected Outcomes
- Candidates Generated: {estimate}
- HoF Promotions: {target}
### Approval
- [ ] Quant-researcher reviewed
- [ ] Config validated
- [ ] Data readiness confirmed
## Queue Change: {action}
**Date:** YYYY-MM-DD HH:MM
**Operator:** {name}
**Action:** add | remove | reorder | enable | disable
### Details
- Campaign ID: {id}
- Previous State: {state}
- New State: {state}
### Reason
{justification}
### Impact
{expected effect}
## Mining Daily Ops Log
**Date:** YYYY-MM-DD
**Operator:** {name}
### Status Summary
- OMP Status: running | paused | offline
- Campaigns Completed: {count}
- Campaigns Failed: {count}
- Promotions: {count}
### Resource Usage
- CPU Avg: {percent}%
- Memory Avg: {percent}%
- Disk Free: {GB} GB
### Incidents
- {incident description or "None"}
### Queue Changes
- {changes or "None"}
### Notes
{observations}
## OMP Incident Report
**Incident ID:** INC-{id}
**Severity:** SEV-0 | SEV-1 | SEV-2
**Detected:** YYYY-MM-DD HH:MM
**Resolved:** YYYY-MM-DD HH:MM (or OPEN)
### Summary
{1-2 sentence description}
### Symptoms
- {what was observed}
### Affected
- Campaign(s): {list}
- Run(s): {list}
- Duration: {minutes}
### Root Cause
{technical explanation}
### Timeline
| Time | Event |
|------|-------|
| HH:MM | {event} |
### Resolution
{what fixed it}
### Prevention
- [ ] {action item}
### Handoffs
- devops-infra: {if applicable}
- data-engineer: {if applicable}
## Promotion Packet: {candidate_id}
**Run ID:** {run_id}
**Campaign:** {campaign_name}
**Date:** YYYY-MM-DD
### Validation Gates
- [ ] OOS Sharpe >= 0.5: {actual value}
- [ ] PBO <= 0.20: {actual value}
- [ ] DSR >= 0.4: {actual value}
- [ ] Max DD <= 30%: {actual value}
- [ ] Variance sanity passed
- [ ] Stress tests passed: {X/Y}
### Provenance
- [ ] genome_hash: {hash}
- [ ] config_hash: {hash}
- [ ] git_sha: {sha}
- [ ] run_id: {id}
### Artifacts
- [ ] strategy.toml present
- [ ] metrics.obfs present
- [ ] trades.csv present (if applicable)
### Reviews
- [ ] Risk-analyst gate passed
- [ ] Trader-expert execution reviewed (if high turnover)
- [ ] Data snapshot documented
### Approval
- [ ] Ready for Hall of Fame promotion
## Hall of Fame Integrity Check
**Date:** YYYY-MM-DD
**Operator:** {name}
### Count Verification
- [ ] DB count matches expected: {count}
- [ ] Local artifacts match DB: {yes/no}
### Sample Validation (5 random)
1. [ ] {candidate_id} - provenance complete
2. [ ] {candidate_id} - metrics match
3. [ ] {candidate_id} - artifacts present
4. [ ] {candidate_id} - no duplicates
5. [ ] {candidate_id} - thresholds still met
### Anomaly Check
- [ ] No sharpeVar < 1e-6 entries
- [ ] No duplicate genome_hash
- [ ] All promoted_at dates valid
### Sync Status
- [ ] Local -> Neon sync complete
- [ ] Last sync: {timestamp}
### Issues Found
{list or "None"}
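The Anomaly Check items in the integrity template lend themselves to automation. The sketch below assumes Hall of Fame entries are dictionaries carrying `candidate_id`, `genome_hash`, and `sharpeVar` keys. The key names follow the checklist wording, but the record shape and the function `integrity_anomalies` are hypothetical.

```python
from collections import Counter


def integrity_anomalies(entries: list, min_sharpe_var: float = 1e-6) -> dict:
    """Report duplicate genome hashes and collapsed-variance entries.

    An entry without `sharpeVar` is treated as variance 0.0 and is
    therefore flagged (fail closed).
    """
    counts = Counter(e["genome_hash"] for e in entries)
    return {
        "duplicate_genome_hash": sorted(
            h for h, n in counts.items() if n > 1
        ),
        "low_sharpe_var": [
            e["candidate_id"]
            for e in entries
            if e.get("sharpeVar", 0.0) < min_sharpe_var
        ],
    }
```

Both lists being empty corresponds to ticking the first two Anomaly Check boxes.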
| Criterion | Pass | Fail |
|---|---|---|
| 24/7 operation | Daemon runs continuously | Frequent crashes |
| Watchdog policy | Auto-stop on disk < 1GB | No protection |
| Validation automation | Variance gate active | Manual only |
| HoF governance | Provenance complete | Missing fields |
| Retention policy | Cleanup script works | Disk fills up |
| Audit trail | Activity log populated | No logging |
| Queue management | API endpoints work | Queue corrupted |
| Resource monitoring | Real-time metrics | No monitoring |

| Criterion | Pass | Fail |
|---|---|---|
| Config validation | Pre-run check | Invalid config runs |
| Progress tracking | Generation logged | No progress info |
| Error handling | Graceful failure | Crash without log |
| Repeat mode | Auto re-queue works | Manual only |

| Criterion | Pass | Fail |
|---|---|---|
| Variance gate | Blocks collapsed | Promotes garbage |
| Threshold check | Enforces limits | Ignores limits |
| Provenance | All fields present | Missing data |
| Sync to Neon | Reliable | Data loss |
- Seed Fishing
- Promotion Without Gates
- Queue Starvation
- Disk Full
- Stuck Runs
- Config Drift
- HoF Contamination
- Mining During Data Incident
- Excess Concurrency
- No Audit Trail
- Orphaned Runs
- Promotion Without Execution Review

/devops-infra

For resource and infrastructure issues:
## Handoff: omp-operator -> devops-infra
**Issue:** Resource pressure / Infrastructure
**Observed:**
- CPU: {usage}%
- Memory: {usage}%
- Disk: {free} GB
- OMP Status: {status}
**Symptoms:**
- {description}
**Action Needed:**
- [ ] Review resource limits
- [ ] Check PM2 status
- [ ] Review cleanup scripts
**Priority:** {high/medium/low}

/risk-analyst

For validation and promotion decisions:
## Handoff: omp-operator -> risk-analyst
**Request:** Promotion Review
**Candidate:** {candidate_id}
**Run:** {run_id}
**Campaign:** {campaign_name}
**Metrics:**
- OOS Sharpe: {value}
- PBO: {value}
- DSR: {value}
- Max DD: {value}
**Context:**
- {any special circumstances}
**Required:**
- [ ] Validate gates
- [ ] Confirm promotion packet
- [ ] Sign off for HoF

/trader-expert

For execution realism review:
## Handoff: omp-operator -> trader-expert
**Request:** Execution Review
**Candidate:** {candidate_id}
**Turnover:** {X}x annual
**Market:** {BR/US}
**Concerns:**
- {execution concerns}
**Required:**
- [ ] Review slippage model
- [ ] Validate fill assumptions
- [ ] Sign Execution Assumptions Card

/data-engineer

For data quality coordination:
## Handoff: omp-operator -> data-engineer
**Issue:** Data Readiness
**Context:**
- Mining campaign: {name}
- Market: {BR/US}
- Period: {date range}
**Question/Request:**
- {specific question}
**Impact:**
- Mining paused pending response
- {N} campaigns affected

/quant-researcher

When receiving campaign request:
## Request: quant-researcher -> omp-operator
**Campaign:** {name}
**Config:** {path}
**Priority:** {1-10}
**Requirements:**
- [ ] Config exists and validates
- [ ] Data readiness confirmed
- [ ] Resource budget acceptable
**Timeline:** {urgency}
# Start mining
curl -X POST http://localhost:3001/api/omp/start
# Stop mining
curl -X POST http://localhost:3001/api/omp/stop
# Pause mining
curl -X POST http://localhost:3001/api/omp/pause
# Resume mining
curl -X POST http://localhost:3001/api/omp/resume
# Check status
curl -s http://localhost:3001/api/omp/status | jq
# List queue
curl -s http://localhost:3001/api/omp/queue | jq
# Add campaign
curl -X POST http://localhost:3001/api/omp/queue \
  -H "Content-Type: application/json" \
  -d '{"name":"Test","config_path":"configs/campaigns/test.toml","priority":1}'
# Enable/disable campaign
curl -X PATCH http://localhost:3001/api/omp/queue/{id} \
  -H "Content-Type: application/json" \
  -d '{"enabled":false}'
# Remove campaign
curl -X DELETE http://localhost:3001/api/omp/queue/{id}
# List Hall of Fame
curl -s http://localhost:3001/api/omp/hall-of-fame | jq
# Promotion check (variance gate)
curl -s "http://localhost:3001/api/omp/promote-check?runId={run_id}" | jq
# Sync local to Neon
curl -X POST http://localhost:3001/api/omp/hof-sync
# List local strategies
curl -s http://localhost:3001/api/omp/hof-local | jq
# Cleanup old runs (keep 5, if < 2GB free)
./scripts/cleanup_old_runs.sh /path/to/output/scg 5 2
# Full cleanup (stop first!)
curl -X POST http://localhost:3001/api/omp/cleanup
# Health check
./scripts/vps/health-check.sh
dashboard/omp_config.toml # Main config
dashboard/campaign_queue.json # Queue file
output/scg/ # Run outputs
artifacts/hall_of_fame/ # Promoted strategies