Design a system — service boundaries, data flow, API contracts, and non-functional requirements.
Design a system for $ARGUMENTS.
Before designing anything, extract and document requirements explicitly.
Functional requirements:
Non-functional requirements (MUST quantify — no vague adjectives):
| Dimension | Question | Bad answer | Good answer |
|---|---|---|---|
| Scale | How many users/requests? | "High traffic" | "10K concurrent users, 500 req/s peak" |
| Latency | How fast? | "Fast" | "p95 < 200ms for reads, p95 < 500ms for writes" |
| Availability | How reliable? | "Always up" | "99.9% (8.76h downtime/year), degraded mode acceptable" |
| Durability | What if data is lost? | "Don't lose data" | "Zero tolerance for financial records, eventual consistency OK for analytics" |
| Security | Who can access what? | "Secure" | "OAuth2 + RBAC, PII encrypted at rest, SOC 2 compliant" |
| Cost | What's the budget? | "Cheap" | "< $5K/month infrastructure at projected scale" |
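The downtime figure in the Availability row follows directly from the target percentage. A minimal sketch of that arithmetic, assuming a 365-day year; the function name `allowed_downtime_hours` is illustrative and not part of any template:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8,760 hours


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Print the budget for a few common targets.
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} h/year")
```

Stating the budget in hours next to the percentage makes it easier for stakeholders to judge whether the target is realistic.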
Every design is built on assumptions. Document them explicitly so they can be validated.
### Assumptions
| # | Assumption | Impact if wrong | Confidence | Validation method |
|---|---|---|---|---|
| A1 | Peak traffic is 500 req/s | Over-/under-provisioned infra | Medium | Load test before launch |
| A2 | Users are in US/EU only | CDN placement and data residency design change | High | Product confirms |
| A3 | Write volume is 10% of reads | Read replica strategy changes | Low | Instrument and measure |
Flag assumptions with confidence below "High" — these are design risks.
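The flagging rule can be applied mechanically once the ledger is held as data. A rough sketch, assuming the three confidence labels used in the table; the `Assumption` class and `design_risks` function are hypothetical names introduced for illustration:

```python
from dataclasses import dataclass

CONFIDENCE_LEVELS = ("Low", "Medium", "High")


@dataclass(frozen=True)
class Assumption:
    id: str
    statement: str
    impact_if_wrong: str
    confidence: str
    validation: str

    def __post_init__(self):
        # Reject labels outside the three levels used in the ledger.
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence!r}")


def design_risks(ledger):
    """Return every assumption whose confidence is below "High"."""
    return [a for a in ledger if a.confidence != "High"]


# The example ledger from the table above.
LEDGER = [
    Assumption("A1", "Peak traffic is 500 req/s",
               "Over-/under-provisioned infra", "Medium", "Load test before launch"),
    Assumption("A2", "Users are in US/EU only",
               "CDN and data residency design", "High", "Product confirms"),
    Assumption("A3", "Write volume is 10% of reads",
               "Read replica strategy changes", "Low", "Instrument and measure"),
]
```

With the example ledger, `design_risks(LEDGER)` returns A1 and A3, which are the two rows that need validation before the design is finalised.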
Define what is inside the system and what is outside.
Each component has ONE responsibility. For every component, specify its purpose, API, data, failure mode, and scaling approach.
For every key workflow, trace the data path:
```mermaid
sequenceDiagram
    participant Client
    participant API Gateway
    participant Service A
    participant Database
    participant Queue
    participant Service B
    Client->>API Gateway: POST /orders
    API Gateway->>Service A: Validate + create order
    Service A->>Database: Write order (ACID)
    Service A->>Queue: Publish OrderCreated
    Queue->>Service B: Process async
    Service B->>Database: Update inventory
```
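The same workflow can be traced in code to check that every step has an owner. This is an in-memory sketch only: the dictionaries stand in for the database, `queue.Queue` stands in for the message queue, and all names (`create_order`, `process_events`, `sku-1`) are hypothetical.

```python
import queue

orders = {}                  # stand-in for the orders table
inventory = {"sku-1": 10}    # stand-in for the inventory table
events = queue.Queue()       # stand-in for the message queue


def create_order(order_id, sku, quantity):
    """Service A: validate, write the order, then publish OrderCreated."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    if sku not in inventory:
        raise KeyError(f"unknown sku: {sku}")
    orders[order_id] = {"sku": sku, "quantity": quantity}
    events.put({"type": "OrderCreated", "order_id": order_id})


def process_events():
    """Service B: drain the queue and update inventory for each order."""
    while not events.empty():
        event = events.get()
        if event["type"] != "OrderCreated":
            continue  # ignore event types this consumer does not handle
        order = orders[event["order_id"]]
        inventory[order["sku"]] -= order["quantity"]
```

Calling `create_order("o-1", "sku-1", 3)` followed by `process_events()` leaves 7 units of `sku-1`. In a real system the database write and the publish are separate operations, so a crash between them can lose the event; the sketch ignores that failure mode.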
For each data flow, annotate:
For each data store:
| Property | Decision |
|---|---|
| Type | Relational, document, key-value, time-series, graph |
| Engine | PostgreSQL, MongoDB, Redis, ClickHouse, etc. |
| Schema | Key entities, relationships, indexes |
| Access patterns | Read-heavy vs write-heavy, query shapes, hot keys |
| Retention | How long is data kept? Archival strategy? |
| Backup/recovery | RPO (how much data can you lose), RTO (how fast to recover) |
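RPO and RTO targets can be checked against the backup plan with simple comparisons. A minimal sketch, assuming periodic snapshot backups with no continuous log shipping, so the worst-case data loss equals one full backup interval; the function names are illustrative:

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case loss is one full interval, so the interval must not exceed the RPO."""
    return backup_interval <= rpo


def meets_rto(measured_restore_time: timedelta, rto: timedelta) -> bool:
    """The restore time measured in a recovery drill must not exceed the RTO."""
    return measured_restore_time <= rto


if __name__ == "__main__":
    # Nightly snapshots cannot satisfy a one-hour RPO.
    print(meets_rpo(timedelta(hours=24), timedelta(hours=1)))
```

The restore time should come from an actual recovery drill rather than an estimate, since the RTO check is only as good as that measurement.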
For every significant design decision, present at least 2 options:
### Decision: Message broker selection
| Criterion | Option A: RabbitMQ | Option B: Kafka | Option C: SQS |
|---|---|---|---|
| Ordering guarantee | Per-queue FIFO | Per-partition | FIFO queues only (per message group, with dedup) |
| Throughput | 10K msg/s | 100K+ msg/s | Variable (managed) |
| Operational cost | Self-managed | Self-managed | Fully managed |
| Replay capability | No (classic queues) | Yes (log retention) | No |
| Team expertise | Low | Medium | High |
| **Recommendation** | | **Selected** | |
**Rationale:** [Why this option wins given our specific constraints]
**Trade-off acknowledged:** [What we sacrifice by choosing this]
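When the criteria carry different importance, a weighted score can make the comparison explicit. A sketch under stated assumptions: each option is scored 1 to 5 per criterion, the weights are chosen by the team, and every number below is a placeholder rather than a measurement of any real broker.

```python
def weighted_scores(weights, scores):
    """Return option -> weighted total.

    weights: criterion -> weight
    scores:  option -> {criterion -> score}
    """
    totals = {}
    for option, per_criterion in scores.items():
        missing = set(weights) - set(per_criterion)
        if missing:
            raise ValueError(f"{option} is missing scores for {sorted(missing)}")
        totals[option] = sum(weights[c] * per_criterion[c] for c in weights)
    return totals


# Placeholder inputs for illustration only.
WEIGHTS = {"ordering": 3, "operational cost": 2, "team expertise": 1}
SCORES = {
    "Option A": {"ordering": 3, "operational cost": 2, "team expertise": 1},
    "Option B": {"ordering": 4, "operational cost": 2, "team expertise": 3},
}

if __name__ == "__main__":
    totals = weighted_scores(WEIGHTS, SCORES)
    print(max(totals, key=totals.get), totals)
```

The score supports the rationale but does not replace it: the written trade-off still has to explain what the selected option gives up.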
Before finalising, assess what happens when things change:
Rate confidence in each component of the design, giving both a label and a 0-100 score:
| Component | Confidence | Reason | Risk mitigation |
|---|---|---|---|
| API Gateway | High (90) | Standard pattern, team has experience | None needed |
| Event pipeline | Medium (60) | First time using Kafka, ordering assumptions untested | Spike in week 1, load test |
| Search service | Low (40) | Requirements unclear, scale unknown | Prototype before committing |
Rule: Any component with confidence below 60 must have a spike or prototype planned before implementation begins.
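The rule is a strict threshold, so a component scored exactly 60 does not trigger it. A small sketch of the check, assuming scores are integers from 0 to 100; `SPIKE_THRESHOLD` and `needs_spike` are illustrative names:

```python
SPIKE_THRESHOLD = 60  # from the rule above: strictly below 60 requires a spike


def needs_spike(confidence_by_component):
    """Return the components that need a spike or prototype, sorted by name."""
    return sorted(
        name
        for name, score in confidence_by_component.items()
        if score < SPIKE_THRESHOLD
    )


# Scores from the example table.
EXAMPLE = {"API Gateway": 90, "Event pipeline": 60, "Search service": 40}
```

For the example table, only the search service is returned; the event pipeline sits on the boundary, and its planned spike is a precaution rather than a requirement of the rule.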
Every system design includes at minimum:
Use Mermaid syntax for all diagrams.
Structure diagrams using the C4 model by Simon Brown. Each level zooms in from the previous: Level 1 (System Context), Level 2 (Container), Level 3 (Component), Level 4 (Code).
Every system design should include at least Level 1 and Level 2 diagrams. Level 3 is recommended for complex or high-risk containers.
The output format aligns with arc42 architecture documentation, covering context, building blocks, runtime views, deployment, and cross-cutting concerns. Use the system-design template (templates/system-design.md) for arc42-aligned output structure.
# System Design: [name]
## Requirements
### Functional
[Bulleted list]
### Non-Functional
[Table with quantified targets]
## Assumptions
[Assumption ledger table]
## Architecture
### Component Diagram
[Mermaid diagram]
### Components
[For each: purpose, API, data, failure mode, scaling]
## Data Flows
[Sequence diagrams for key workflows]
## Storage Design
[Per-store specification table]
## Key Decisions
[Options analysis for each major decision]
## Change Impact
[What-if analysis]
## Confidence Assessment
[Component confidence table]
## Risks and Mitigations
[Prioritised list]
## Recommended ADRs
[List of decisions that warrant an ADR — suggest via `/architect:write-adr`]