Apply first-principles thinking to break down complex production systems problems into fundamental truths. Use when facing ambiguous requirements, architectural decisions, performance bottlenecks, or when existing solutions don't fit. Helps identify root causes and build solutions from foundational understanding rather than copying patterns.
"Everyone uses PostgreSQL" (not a fundamental truth)
"NoSQL is web scale" (marketing, not engineering)
"Our team knows MongoDB" (valid constraint, but separate from technical requirements)
Step 3: Reason Up from Fundamentals
Build the solution from first principles.
Example: Database choice
From fundamentals:
We need 10K writes/second → rules out single-node SQL with HDD
We need strong consistency for financial data → rules out eventually-consistent stores
Our access pattern is mostly key-value lookups → don't need complex joins
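We can tolerate 50ms P99 read latency → indexed reads from disk can meet this; an in-memory cache adds headroom for hot keys
Conclusion: Start with PostgreSQL + read replicas + Redis cache. This satisfies our fundamentals, provided that reads which must be strongly consistent go to the primary, since asynchronous replicas and caches can serve stale data. Only switch to a distributed database if we exceed single-node capacity (which we can measure).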
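The elimination reasoning above can be sketched as a filter over candidate stores. This is a minimal illustration, not a benchmark: the `Requirements` and `Candidate` types, the candidate names, and every capacity figure are hypothetical placeholders that you would replace with your own measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    writes_per_second: int
    needs_strong_consistency: bool
    needs_complex_joins: bool


@dataclass(frozen=True)
class Candidate:
    name: str
    max_writes_per_second: int  # placeholder figure, measure your own hardware
    strong_consistency: bool
    supports_joins: bool


def viable(req, candidates):
    """Return the names of candidates that satisfy every hard requirement."""
    result = []
    for c in candidates:
        if c.max_writes_per_second < req.writes_per_second:
            continue  # cannot sustain the write rate
        if req.needs_strong_consistency and not c.strong_consistency:
            continue  # would risk stale or conflicting data
        if req.needs_complex_joins and not c.supports_joins:
            continue  # access pattern not supported
        result.append(c.name)
    return result


req = Requirements(
    writes_per_second=10_000,
    needs_strong_consistency=True,
    needs_complex_joins=False,
)

# Hypothetical candidates with made-up capacities, for illustration only.
candidates = [
    Candidate("single-node SQL on HDD", 1_000, True, True),
    Candidate("single-node SQL on SSD", 20_000, True, True),
    Candidate("eventually consistent KV store", 100_000, False, False),
]

print(viable(req, candidates))  # ['single-node SQL on SSD']
```

Writing the requirements down as data keeps soft constraints such as team familiarity out of the hard filter, so they can be weighed separately afterwards.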
Common Patterns
Pattern 1: Performance Investigation
Don't assume: "The API is slow, we need faster servers"
Baseline: What should it be? (Based on workload, not vibes)
Bottleneck: What's the limiting factor?
Solution: Address the actual constraint
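As a rough sketch of the baseline and bottleneck steps, the snippet below summarizes latency samples into percentiles and finds the stage that dominates a request. The sample values, stage names, and the helper names `percentile` and `bottleneck` are assumptions made up for illustration; in practice the numbers would come from your own metrics or tracing.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, which avoids floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def bottleneck(stage_ms):
    """Return the slowest stage and its share of the total time."""
    total = sum(stage_ms.values())
    name = max(stage_ms, key=stage_ms.get)
    return name, stage_ms[name] / total


# Hypothetical latency samples in milliseconds: 98 fast requests, 2 slow ones.
samples = [50] * 98 + [500] * 2
print(percentile(samples, 50), percentile(samples, 99))  # 50 500

# Hypothetical per-stage timings for one slow request.
stages = {"db_query": 400, "serialization": 60, "network": 40}
print(bottleneck(stages))  # ('db_query', 0.8)
```

Looking at P99 alongside P50 matters here: the median hides the slow requests that a tail percentile exposes.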
Pattern 2: Scalability Decisions
Don't assume: "We'll need Kubernetes because we're building a SaaS"
First principles:
Current load: What's our actual traffic? (RPS, concurrent users, data volume)
Growth rate: How fast are we growing? (10x in 1 year? 2x in 5 years?)
Failure modes: What happens if a server dies?
Operational capacity: Can our team run Kubernetes?
Cost: What's the TCO of each option?
Often: A single server with good monitoring and backups beats a complex distributed system for most early-stage startups.
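The current-load and growth-rate questions can be combined into a quick runway estimate. The sketch below assumes smooth compound growth, which real traffic rarely follows exactly; the function name `months_until_capacity` and the example figures are hypothetical.

```python
import math


def months_until_capacity(current_rps, capacity_rps, annual_growth_factor):
    """Months until load reaches capacity, assuming compound growth."""
    if current_rps >= capacity_rps:
        return 0.0  # already at or over capacity
    if annual_growth_factor <= 1:
        return math.inf  # flat or shrinking load never reaches capacity
    years = math.log(capacity_rps / current_rps) / math.log(annual_growth_factor)
    return years * 12


# Hypothetical: 100 RPS today, and the server has been measured to handle 1,000 RPS.
fast = months_until_capacity(100, 1_000, annual_growth_factor=10)  # 10x in 1 year
slow = months_until_capacity(100, 1_000, annual_growth_factor=2 ** (1 / 5))  # 2x in 5 years

print(fast)         # 12.0
print(round(slow))  # roughly 199 months, i.e. more than 16 years
```

Under these assumptions, the same server lasts one year in the first scenario and well over a decade in the second, which is why the growth rate needs to be measured rather than assumed.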
Pattern 3: Technology Selection
Don't assume: "We need microservices and event-driven architecture"
First principles:
Team size: Can we maintain multiple services? (Each service needs on-call)
Change frequency: Do components really need independent deployment?
Failure isolation: Do failures need to be contained?
Data consistency: Can we tolerate eventual consistency?
Network reliability: How do we handle network partitions?
Tradeoffs:
Monolith: Simple ops, shared database, easier transactions, harder to scale team
Microservices: Complex ops, distributed transactions, easier to scale team, harder to debug
Choose based on your actual constraints, not industry trends.
Reliability and Performance Trade-offs
CAP Theorem in Practice
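You can't have all three: Consistency, Availability, Partition tolerance. Network partitions will happen in any distributed system, so the real choice is between consistency and availability while a partition lasts.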
First principles:
Banking: Consistency > Availability (Better to be down than show wrong balance)
Social media: Availability > Consistency (Stale like counts are fine)
E-commerce: Depends on operation (Inventory: Consistency, Reviews: Availability)
Latency Numbers Every Programmer Should Know
Use these approximate, order-of-magnitude figures to reason about performance (exact values vary by hardware):
L1 cache: 0.5 ns
L2 cache: 7 ns
RAM: 100 ns
SSD read: 16 µs
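Network round trip within datacenter: 500 µs
Disk seek: 10 ms
Network round trip cross-continent: 150 ms
Implication: If your API takes 200 ms and a request does a single disk seek (10 ms, about 5% of the total), switching to an SSD won't help much. The bottleneck is elsewhere.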
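A back-of-the-envelope latency budget makes this kind of reasoning concrete. The sketch below reuses the figures listed above; the `estimate_ms` helper and the example request profile (ten SSD reads and four datacenter round trips) are hypothetical.

```python
# Approximate costs in nanoseconds, taken from the list above.
LATENCY_NS = {
    "l1_cache": 0.5,
    "l2_cache": 7,
    "ram": 100,
    "ssd_read": 16_000,
    "datacenter_network": 500_000,
    "disk_seek": 10_000_000,
    "cross_continent_network": 150_000_000,
}


def estimate_ms(operations):
    """Estimate total time in ms for sequential operations given as {name: count}."""
    total_ns = sum(LATENCY_NS[op] * count for op, count in operations.items())
    return total_ns / 1_000_000


# Hypothetical request profile: 10 SSD reads and 4 round trips inside the datacenter.
request = {"ssd_read": 10, "datacenter_network": 4}
estimated = estimate_ms(request)
observed_ms = 200  # hypothetical measured API latency

print(estimated)                          # 2.16
print(round(observed_ms - estimated, 2))  # 197.84 ms unexplained by storage and network
```

If the estimate explains only a small fraction of the observed latency, the time is going somewhere the model does not cover, such as query execution, lock contention, or application code, and that is where to profile next.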
Examples
Example 1: "Make the API Faster"
Bad approach: Add caching, switch to NoSQL, rewrite in Go
First principles:
1. Measure actual latency: P50=50ms, P99=500ms
2. Profile: 80% of time in database query
3. Analyze query: Full table scan on 10M rows
4. Root cause: Missing database index
5. Solution: Add index, latency drops to P99=50ms
Cost: 5 minutes to add index vs. weeks to rewrite
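The analyze-then-index steps can be reproduced on a small scale. The sketch below uses Python's built-in `sqlite3` module as a self-contained stand-in for the production database; the `orders` table, its columns, and the index name are hypothetical. On PostgreSQL the equivalent check would use `EXPLAIN` or `EXPLAIN ANALYZE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(connection):
    """Return SQLite's plan description for QUERY as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # expected to report a full scan of the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn)   # expected to report a search using the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, but the change from a scan to an index search is the signal to look for before and after the fix.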
Example 2: "We Need a Message Queue"
Question: Do you need a message queue, or do you need async processing?
First principles:
Need: Process 1000 jobs/hour without blocking API responses
You've quantified the actual constraints and requirements
You've reasoned up from fundamentals rather than copying patterns
You can explain your decision to someone unfamiliar with the domain
You have measurable success criteria and a rollback plan
Architecture Decision Records (ADR)
For significant decisions, use the ADR format, popularized by Michael Nygard and recommended in AWS, Microsoft, and Google architecture guidance:
ADR Template
# ADR-001: [Decision Title]
## Status
[Proposed | Accepted | Deprecated | Superseded by ADR-XXX]
## Context
[What is the issue that we're seeing that is motivating this decision?]
## Decision
[What is the change that we're proposing and/or doing?]
## Consequences
[What becomes easier or more difficult because of this change?]
## Alternatives Considered
[What other options were evaluated?]
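To keep ADRs consistent, the template can be filled in programmatically. This is a minimal sketch: the `render_adr` helper is hypothetical, and the example content is invented to echo the database example earlier in this guide.

```python
def render_adr(number, title, status, context, decision, consequences, alternatives):
    """Render an ADR as markdown using the section order of the template above."""
    sections = [
        ("Status", status),
        ("Context", context),
        ("Decision", decision),
        ("Consequences", consequences),
        ("Alternatives Considered", alternatives),
    ]
    lines = [f"# ADR-{number:03d}: {title}"]
    for heading, body in sections:
        lines.append(f"## {heading}")
        lines.append(body)
    return "\n".join(lines) + "\n"


# Hypothetical example content, for illustration only.
adr = render_adr(
    number=2,
    title="Use PostgreSQL with read replicas",
    status="Proposed",
    context="We need 10K writes/second with strong consistency for financial data.",
    decision="Start with a single PostgreSQL primary plus read replicas.",
    consequences="Simple operations now; revisit if we exceed single-node capacity.",
    alternatives="Distributed SQL; eventually consistent key-value store.",
)
print(adr)
```

Because accepted ADRs are treated as immutable, a generator like this would only create new records; superseding an old decision means rendering a new ADR and updating the old one's status line.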
ADR Best Practices (AWS 2024)
Keep ADRs focused: One decision per ADR
Timely decisions: 1-3 review sessions max; most decisions are reversible
Immutable once accepted: Create new ADR to supersede, don't modify
Root in requirements: Justify with data and engineering principles, not opinions
Focus on "why": Reasoning matters more than implementation details