Multi-agent orchestration patterns for complex tasks across microservices
This skill describes patterns for coordinating complex CSKU Lab tasks across multiple agents. Use it when you need to delegate work to specialized agents or when a task spans multiple microservices.
The orchestrator pattern lets a primary agent break down complex work and delegate specialized tasks to other agents. Use it when:
Task involves multiple services
Different expertise areas needed
Parallel work opportunities exist
Task requires specific tools/context
Sequential pattern. Use Case: Changes that depend on each other (schema → API → frontend)
Task Analysis
↓
Backend Dev: Add database column
↓
Backend Dev: Update API response
↓
Frontend Dev: Update UI
↓
Integration Testing
Implementation:
Parallel pattern. Use Case: Independent changes that can happen simultaneously
                         Task Analysis
                               ↓
        ┌──────────────────────┬───────────────────────┐
        │                      │                       │
Service A Changes      Service B Changes         Documentation
        │                      │                       │
        └──────────────────────┴───────────────────────┘
                               ↓
                      Integration Testing
Implementation:
Hierarchical pattern. Use Case: Complex tasks with sub-tasks (multi-service gRPC changes)
Main Task: Add new gRPC service feature
│
├─ Skill Load: grpc-api-design
│
├─ Proto Definition
│ └─ Backend Dev: Define proto
│
├─ Service Implementation
│ ├─ Backend Dev: Implement gRPC server
│ └─ Backend Dev: Add gRPC client
│
├─ Integration
│ └─ Integration Tests
│
└─ Documentation
└─ Update API docs
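The breakdown above is a tree whose leaves are the tasks that actually get dispatched. A minimal sketch of that idea, assuming a hypothetical `Node` type and `Leaves` helper (neither comes from the project):

```go
package main

import "fmt"

// Node is a task that is either dispatched directly (no children)
// or broken down into sub-tasks. Hypothetical type for illustration.
type Node struct {
	Title    string
	Children []*Node
}

// Leaves returns the titles of all leaf tasks in depth-first order,
// which matches the top-to-bottom order of the tree diagram.
func Leaves(n *Node) []string {
	if len(n.Children) == 0 {
		return []string{n.Title}
	}
	var out []string
	for _, c := range n.Children {
		out = append(out, Leaves(c)...)
	}
	return out
}

func main() {
	plan := &Node{Title: "Add new gRPC service feature", Children: []*Node{
		{Title: "Skill Load: grpc-api-design"},
		{Title: "Proto Definition", Children: []*Node{
			{Title: "Backend Dev: Define proto"},
		}},
		{Title: "Service Implementation", Children: []*Node{
			{Title: "Backend Dev: Implement gRPC server"},
			{Title: "Backend Dev: Add gRPC client"},
		}},
		{Title: "Integration", Children: []*Node{
			{Title: "Integration Tests"},
		}},
		{Title: "Documentation", Children: []*Node{
			{Title: "Update API docs"},
		}},
	}}
	for i, task := range Leaves(plan) {
		fmt.Printf("%d. %s\n", i+1, task)
	}
}
```

Intermediate nodes such as "Service Implementation" only group work; the orchestrator dispatches the leaves.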
Task: Add JWT authentication to config-server
Analysis:
Orchestration Plan:
1. Load Skills
- Load go-microservices skill
- Load grpc-api-design skill (for gRPC auth)
2. Backend Work (Sequential)
- Dispatch: "Add authentication middleware to config-server"
- Reference main-server implementation
- Update gRPC interceptors
- Add auth headers to requests
- Wait for completion
- Dispatch: "Add JWT validation to requests"
- Parse and verify tokens
- Add to context
- Wait for completion
- Dispatch: "Write integration tests"
3. Documentation
- Dispatch: "Document auth changes"
Task: Support video submissions (currently text-based code)
Analysis:
Orchestration Plan:
1. Load Skills
- go-microservices (backend changes)
- docker-compose (test environment)
- database-migrations (schema changes)
2. Parallel Phase 1 (Independent)
- Backend Dev: "Update assignment schema in task-server"
- Add submission_type field
- Update MongoDB schema
- Backend Dev: "Add submission type enum to main-server"
- Create enums
- Update validation
3. Sequential Phase 2 (Depends on Phase 1)
- Wait for Phase 1 completion
- Backend Dev: "Implement MinIO upload handling"
- Add file upload to submission
- Store in MinIO
- Frontend Dev: "Add video upload UI"
- Video preview
- File validation
4. Integration
- Run end-to-end tests
- Test file upload flow
- Verify storage
5. Documentation
- Update API documentation
- Update architecture docs
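A phased plan like this one can be derived from task dependencies rather than written by hand. The sketch below groups tasks into rounds where every task depends only on earlier rounds. The `Waves` function and the task names are hypothetical, and the dependency edges in `main` are one reading of the plan above (in particular, making the frontend wait for the upload handling is an assumption).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Waves groups tasks into rounds. Every task in a round depends only on
// tasks from earlier rounds, so tasks within one round can run in parallel.
// deps maps a task name to the tasks it must wait for.
func Waves(deps map[string][]string) ([][]string, error) {
	done := make(map[string]bool)
	var waves [][]string
	for len(done) < len(deps) {
		var ready []string
		for task, needs := range deps {
			if done[task] {
				continue
			}
			ok := true
			for _, n := range needs {
				if !done[n] {
					ok = false
					break
				}
			}
			if ok {
				ready = append(ready, task)
			}
		}
		if len(ready) == 0 {
			return nil, errors.New("cycle or unknown dependency among remaining tasks")
		}
		sort.Strings(ready) // map iteration order is random; sort for stable output
		for _, t := range ready {
			done[t] = true
		}
		waves = append(waves, ready)
	}
	return waves, nil
}

func main() {
	schema := "task-server: add submission_type"
	enum := "main-server: add submission type enum"
	upload := "backend: MinIO upload handling"
	ui := "frontend: video upload UI"
	e2e := "integration: end-to-end tests"
	docs := "documentation"

	waves, err := Waves(map[string][]string{
		schema: nil,
		enum:   nil,
		upload: {schema, enum},
		ui:     {upload}, // assumption: the UI needs the upload endpoint first
		e2e:    {ui},
		docs:   {e2e},
	})
	if err != nil {
		fmt.Println("plan error:", err)
		return
	}
	for i, w := range waves {
		fmt.Printf("round %d: %v\n", i+1, w)
	}
}
```

A round with more than one task is a parallel phase; a cycle in the dependencies surfaces as an error instead of a plan that can never finish.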
Task: Investigate and fix timeout issues in go-grader
Analysis:
Orchestration Plan:
1. Load Skills
- docker-compose (service inspection)
- code-sandbox (isolate configs)
- go-microservices (worker code)
2. Investigation (Parallel)
- Explore: "Check go-grader worker logs"
- Find timeout pattern
- Identify affected tasks
- Explore: "Check isolate configuration"
- Review resource limits
- Check wall-time settings
- Explore: "Check RabbitMQ queue health"
- Queue depth
- Connection issues
3. Root Cause Analysis
- Synthesize findings
- Determine likely cause
4. Implementation (Varies)
- If Docker: Update docker-compose timeout
- If Isolate: Adjust sandbox config
- If Worker: Optimize grading code
- If Queue: Add monitoring
5. Testing
- Run grading tasks
- Monitor execution time
- Verify fixes
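The conditional implementation step in this plan amounts to a lookup from root cause to follow-up task. A small sketch, where the cause keys (`docker`, `isolate`, `worker`, `queue`) and the `NextTask` helper are hypothetical labels chosen for illustration:

```go
package main

import "fmt"

// remediation maps a root-cause category to the follow-up task from the plan.
// The keys are illustrative labels, not identifiers used by the project.
var remediation = map[string]string{
	"docker":  "Update docker-compose timeout",
	"isolate": "Adjust sandbox config",
	"worker":  "Optimize grading code",
	"queue":   "Add monitoring",
}

// NextTask returns the task to dispatch for a root cause. The boolean is
// false when the plan did not anticipate the cause, so the caller can
// escalate rather than guess.
func NextTask(cause string) (string, bool) {
	task, ok := remediation[cause]
	return task, ok
}

func main() {
	for _, cause := range []string{"isolate", "network"} {
		if task, ok := NextTask(cause); ok {
			fmt.Printf("%s -> dispatch %q\n", cause, task)
		} else {
			fmt.Printf("%s -> no planned fix, escalate\n", cause)
		}
	}
}
```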
When dispatching agents, provide:
What to do: Clear, specific task
Why it matters: Context and dependencies
What to reference: Related code/services
How to verify: Success criteria and tests
Dependencies: What must complete first
Timeline: Urgency and deadlines
Example:
Task: Update main-server to call new config-server endpoint
Context: We added GetStatus() to config-server. main-server health checks
need to verify config service health.
Reference:
- Main-server health endpoint: internal/adapters/http/health.go
- New config-server endpoint: config-server/protos/config_service.proto (GetStatus RPC)
- Existing config-server client: main-server/internal/adapters/grpc/config_client.go
Verification:
- Health endpoint returns 500 if config-server unavailable
- Health checks include config-server status
- Tests pass: go test ./internal/adapters/http
- Manual: curl localhost:8080/health
Dependencies: config-server GetStatus must be deployed first
Timeline: Needed for release next week
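The six parts of a dispatch brief can be checked mechanically before a task is handed off. This is a sketch under assumptions: the `Brief` struct and its `Missing` method are hypothetical and only mirror the labels used in the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// Brief mirrors the fields of the example dispatch message.
// Hypothetical type for illustration.
type Brief struct {
	Task         string
	Context      string
	Reference    []string
	Verification []string
	Dependencies string
	Timeline     string
}

// Missing returns the names of empty fields, in the order the brief lists them.
func (b Brief) Missing() []string {
	var missing []string
	check := func(name string, empty bool) {
		if empty {
			missing = append(missing, name)
		}
	}
	check("Task", strings.TrimSpace(b.Task) == "")
	check("Context", strings.TrimSpace(b.Context) == "")
	check("Reference", len(b.Reference) == 0)
	check("Verification", len(b.Verification) == 0)
	check("Dependencies", strings.TrimSpace(b.Dependencies) == "")
	check("Timeline", strings.TrimSpace(b.Timeline) == "")
	return missing
}

func main() {
	b := Brief{
		Task:    "Update main-server to call new config-server endpoint",
		Context: "main-server health checks need to verify config service health",
		Verification: []string{
			"Tests pass: go test ./internal/adapters/http",
			"Manual: curl localhost:8080/health",
		},
	}
	if m := b.Missing(); len(m) > 0 {
		fmt.Println("brief incomplete, missing:", strings.Join(m, ", "))
		return
	}
	fmt.Println("brief complete, ready to dispatch")
}
```

A check like this catches the "agents don't know project state" problem early: an empty Reference or Dependencies field is a sign the brief leans on context the receiving agent does not have.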
When dispatching multiple agents in parallel:
Escalate if:
Example:
You: Working on main-server submission API
Issue: Need to understand task-server schema
Action: Dispatch task to @explore to search task-server proto definitions
All agents must:
Run the test suite (go test ./...)
Before merging parallel work:
After all work is merged:
Don't give one agent too much:
Bad: "Update config-server to support new features, improve performance,
add monitoring, and integrate with new logging service"
Good: "Add GetStatus() RPC to config-server to support health checks"
(other changes in separate tasks)
Don't leave ordering ambiguous:
Bad: "Update authentication in main-server and config-server"
Good: "First: Update config-server auth middleware"
"Then: Update main-server to send auth headers to config-server"
Don't assume agents know project state:
Bad: "Add the new field"
Good: "Add 'submission_type' field to Submission message in task-server
to support video submissions alongside text submissions"
Don't skip testing:
Bad: "Update the database schema"
Good: "Add 'status' column with default 'active' to submissions table.
Verify: Existing code still works without modification, and the migration is backward compatible"
When to use this skill: Use this when designing complex multi-service changes, coordinating parallel work across agents, or planning large feature implementations involving multiple microservices.
Related Skills: