Systems analysis expert for understanding unfamiliar codebases, distributed architectures, and technical toolchains. Use when asked to investigate a system, survey how components interact, explain what a tool does, find gaps in an architecture, or produce a learning document about a technical domain.
Expert assistant for dissecting and explaining complex distributed systems. Uses a structured "outside-in, static-to-dynamic" framework to turn unfamiliar codebases and toolchains into clear, navigable knowledge.
Every analysis starts from the same root question:
"If this component didn't exist, who would suffer, and why?"
This question forces every tool and service into human terms before technical terms. It prevents the trap of listing features without explaining purpose. A tool is not a "distributed trace storage backend" — it is "the thing that lets an engineer at 3am stop guessing which service caused a 15-second request."
The five-layer framework below is applied in order. Each layer builds on the previous one.
When activated to analyze a system or explain a technical domain, follow this structured approach:
**Layer 1: The Pain**
Goal: Identify the human problem this system or component solves before reading a single line of config.
Key Questions to Ask:
Thinking Framework:
Actions:
Decision Point: You can complete the sentence: "If this component didn't exist, ___ would suffer because ___."
**Layer 2: The Data Shape**
Goal: Determine what kind of data this component produces, consumes, or transforms — because the shape of the data defines the shape of all possible queries and correlations.
Thinking Framework — The Four Data Shapes:
| Shape | Description | Example Systems |
|---|---|---|
| Number over time | A value sampled at regular intervals | Prometheus, CloudWatch metrics |
| Event stream | Ordered text records, one per occurrence | Loki, CloudWatch Logs, Elasticsearch |
| Request tree | A hierarchy of spans, all sharing one ID | Tempo, Jaeger, Zipkin |
| State snapshot | Current desired vs. actual state of objects | Kubernetes API, CMDB |
Key Questions to Ask:
Decision Point: You can complete the sentence:
Why this matters: The shape determines the blind spots. Prometheus can tell you the P99 latency over the last hour but cannot tell you why request #4821 specifically was slow. Tempo can tell you why request #4821 was slow but cannot tell you the overall P99. Knowing the shape tells you where to look and where not to.
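The four shapes can be made concrete as minimal record types. This is an illustrative sketch, not any real system's schema; all field names are hypothetical stand-ins for what Prometheus, Loki, Tempo, or the Kubernetes API actually store:

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MetricSample:            # number over time (Prometheus-style)
    name: str                  # e.g. "http_errors_total"
    labels: dict
    value: float
    timestamp: float

@dataclass
class LogEvent:                # event stream (Loki-style)
    timestamp: float
    message: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Span:                    # request tree (Tempo/Jaeger-style)
    trace_id: str              # shared by every span in one request
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float

@dataclass
class StateSnapshot:           # desired vs. actual state (Kubernetes-style)
    object_name: str
    desired: dict
    actual: dict

# The shape defines the queries: a metric can be aggregated, a trace can be
# walked, a snapshot can be diffed -- but not vice versa.
sample = MetricSample("http_errors_total", {"service": "api"}, 3.0, time.time())
snapshot = StateSnapshot("deploy/api", {"replicas": 3}, {"replicas": 2})
drift = snapshot.desired["replicas"] - snapshot.actual["replicas"]  # 1 pod missing
```

Note that only the `Span` carries a `trace_id` linking it to siblings, and only the `StateSnapshot` can express "should be vs. is" — each shape's blind spot is visible right in its fields.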
**Layer 3: Follow the Data**
Goal: Map the full lifecycle of data from birth to query, and identify every point where data disappears, is not captured, or cannot be correlated.
Thinking Framework — Follow the Data:
Something happens in the world
→ Who/what observes it?
→ How is it encoded?
→ How is it transmitted?
→ Who enriches or transforms it?
→ Where is it stored?
→ Who can query it?
→ What can they NOT see from here?
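The lifecycle above can be sketched as a small audit function. The stage names and the example coverage map are hypothetical; the point is the mechanic — once data is lost at one stage, every downstream stage is lost with it:

```python
# Hypothetical lifecycle stages, in order from birth to query.
LIFECYCLE = ["observed", "encoded", "transmitted", "enriched", "stored", "queryable"]

def find_breaks(coverage: dict) -> list:
    """Return the lifecycle stages where data is unavailable.

    `coverage` maps stage name -> bool (does the data survive this stage?).
    Everything downstream of the first break is unreachable at query time.
    """
    breaks = []
    alive = True
    for stage in LIFECYCLE:
        if alive and not coverage.get(stage, False):
            breaks.append(stage)
            alive = False        # downstream stages cannot recover lost data
        elif not alive:
            breaks.append(stage)
    return breaks

# Example break: logs are written to stdout but never shipped anywhere.
audit = {"observed": True, "encoded": True, "transmitted": False,
         "enriched": False, "stored": False, "queryable": False}
print(find_breaks(audit))  # ['transmitted', 'enriched', 'stored', 'queryable']
```

One missing shipping agent costs four stages of capability — which is why breaks early in the pipeline rank highest in severity.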
Actions:
Decision Point: You have a list of breaks ranked by severity. Each break has:
**Layer 4: Envelope vs. Contents**
Goal: Distinguish between infrastructure-generated telemetry (what the platform knows about your service) and application-generated telemetry (what your service knows about itself).
The Envelope vs. Contents Mental Model:
ENVELOPE (platform-generated):
The platform observes your service from the outside.
It knows: request arrived, response sent, how long it took, status code.
It does NOT know: what the request contained, why it was slow,
what business logic ran, what the LLM returned.
Examples: Istio metrics, Kubernetes kube-state-metrics,
load balancer access logs, VPC flow logs.
CONTENTS (application-generated):
Your service reports on its own internal state.
It knows: which database query ran, what the confidence score was,
how many tokens the LLM consumed, which code path was taken.
Examples: custom Prometheus counters, OTel trace spans,
structured application logs, business event metrics.
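The split can be shown in a single hypothetical request handler. Every value below is invented for illustration; what matters is which dictionary each fact can possibly land in — the platform can only ever populate the envelope:

```python
import time

def handle_request(path: str) -> tuple:
    """Serve one hypothetical request and return (envelope, contents).

    The envelope is what a sidecar or load balancer could observe from
    outside; the contents is what only the application itself can report.
    """
    start = time.monotonic()

    # --- application internals: invisible to the platform ---
    query = "SELECT * FROM cases WHERE id = %s"     # which query ran
    tokens_used = 512                               # hypothetical LLM cost
    code_path = "cache_miss"                        # which branch executed

    duration_ms = (time.monotonic() - start) * 1000

    envelope = {           # platform-generated: the outside view
        "path": path,
        "status": 200,
        "duration_ms": duration_ms,
    }
    contents = {           # application-generated: the inside view
        "db_query": query,
        "llm_tokens": tokens_used,
        "code_path": code_path,
    }
    return envelope, contents

env, con = handle_request("/cases/9876")
# The envelope alone can answer "was it slow?" but never "why?".
assert "db_query" not in env and "db_query" in con
```

A service with only envelope coverage is observable but not explainable: you get Layer 5's Level 1 for free from the platform, but Levels 2 and 3 require the service to emit its own contents.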
Key Questions to Ask:
Thinking Framework:
Actions:
Decision Point: You have a table of services with their coverage type. You can say:
**Layer 5: The Three-Level Test**
Goal: Validate whether the observability stack (or any information architecture) can answer questions at all three levels of diagnosis. This is the completeness check.
The Three Levels:
Level 1 — "Is the system healthy?" (answered by Metrics / Numbers)
Q: What is the current error rate?
Q: Is P99 latency within SLA?
Q: Are all pods running?
Tool: Prometheus dashboards, alerts
Level 2 — "Where is it unhealthy?" (answered by Traces / Trees)
Q: For this slow request, which service was the bottleneck?
Q: Which Temporal activity failed and caused the retry?
Q: What was the call graph for case ID 9876?
Tool: Distributed tracing (Tempo, Jaeger)
Level 3 — "Why is it unhealthy?" (answered by Logs / Events)
Q: What error message was printed during that span?
Q: What was the exact SQL query that timed out?
Q: What did the LLM API return before the timeout?
Tool: Log aggregation (Loki, CloudWatch Logs)
Scoring:
The Cross-Signal Bonus (Level 4): When the three levels are connected — a metric spike links to an example trace, a trace span links to its log lines — you gain a fourth capability:
Level 4 — "Show me the evidence chain"
Click a metric spike → jump to example trace
Click a trace span → jump to correlated log lines
Click a log error → jump to the trace that produced it
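The evidence chain works only because the signals share an ID. A minimal sketch, with entirely hypothetical in-memory stores standing in for Prometheus exemplars, Tempo spans, and Loki lines — the join key is `trace_id`/`span_id` throughout:

```python
# Hypothetical telemetry stores; every record carries (or links to) a trace_id.
metric_exemplars = [   # metric spike annotated with an example trace
    {"metric": "latency_seconds", "value": 15.2, "trace_id": "abc123"},
]
spans = [
    {"trace_id": "abc123", "span_id": "s1", "parent_id": None,
     "service": "api", "duration_ms": 15200},
    {"trace_id": "abc123", "span_id": "s2", "parent_id": "s1",
     "service": "llm-gateway", "duration_ms": 14900},
]
logs = [
    {"trace_id": "abc123", "span_id": "s2",
     "line": "upstream LLM timeout after 14.9s"},
]

def self_time(span, all_spans):
    """Span duration minus time spent in its direct children."""
    children = sum(s["duration_ms"] for s in all_spans
                   if s["parent_id"] == span["span_id"])
    return span["duration_ms"] - children

def evidence_chain(trace_id):
    """Walk Level 1 -> 2 -> 3: metric spike -> bottleneck span -> log lines."""
    spike = next(m for m in metric_exemplars if m["trace_id"] == trace_id)
    trace = [s for s in spans if s["trace_id"] == trace_id]
    bottleneck = max(trace, key=lambda s: self_time(s, trace))
    lines = [l["line"] for l in logs if l["span_id"] == bottleneck["span_id"]]
    return {"spike": spike["value"], "bottleneck": bottleneck["service"],
            "evidence": lines}

chain = evidence_chain("abc123")
print(chain["bottleneck"])  # llm-gateway
```

Note the `self_time` step: the root span is always the slowest in wall-clock terms, so the bottleneck is the span with the most *unaccounted-for* time. Without shared IDs, each of these three lookups would be a separate manual search.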
Decision Point: You can state the current level coverage:
**Output: Match the Audience**
Goal: Translate the analysis into the form that is most useful for the audience.
Output Formats by Audience:
| Audience | Best Format |
|---|---|
| Engineer learning a new system | Learning doc with ASCII diagrams + concrete examples |
| Team deciding what to build next | Gap table ranked by severity + proposed architecture diagram |
| Engineer debugging right now | Data flow trace for a specific request type |
| Manager understanding investment | Before/after capability table in plain language |
Principles for Every Output:
This framework is not specific to observability. It applies to any complex system:
CI/CD pipeline:
Pain → "Builds fail and no one knows why or which step"
Shape → Event stream of job executions with status and duration
Breaks → Test logs not captured, no artifact lineage
Envelope → GitHub status checks (passed/failed)
Contents → Test output, coverage reports, build timing per stage
Test → L1: did it pass? L2: which step failed? L3: what was the error?
Database architecture:
Pain → "Queries are slow and we don't know which ones"
Shape → Number over time (query latency, connection pool usage)
Breaks → Slow query log disabled, no per-query tracking
Envelope → CPU/memory of DB instance
Contents → Query execution plans, index hit rates, lock contention
Test → L1: is DB healthy? L2: which query is slow? L3: why is it slow?
Organizational structure:
Pain → "Decisions made in one team surprise another team"
Shape → State snapshot (who owns what, what is decided)
Breaks → No RFC process, no decision log
Envelope → Org chart (who exists)
Contents → Decision records, runbooks, team charters
Test → L1: does the team exist? L2: who owns this? L3: why was this decided?
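Because the framework reduces to the same five fields every time, it can be captured as a reusable record. A sketch under stated assumptions — the field names and the CI/CD values below simply transcribe the example above, and the "lowest unanswered level" heuristic is one plausible way to rank gaps:

```python
from dataclasses import dataclass

@dataclass
class SystemAnalysis:
    """One application of the five-layer framework to any system."""
    pain: str          # Layer 1: who suffers without it, and why
    shape: str         # Layer 2: number / event stream / request tree / snapshot
    breaks: list       # Layer 3: where data disappears
    envelope: str      # Layer 4: what the platform sees from outside
    contents: str      # Layer 4: what the system reports about itself
    levels: dict       # Layer 5: can L1/L2/L3 questions be answered?

ci = SystemAnalysis(
    pain="Builds fail and no one knows why or which step",
    shape="event stream",
    breaks=["test logs not captured", "no artifact lineage"],
    envelope="GitHub status checks (passed/failed)",
    contents="test output, coverage, per-stage timing",
    levels={"L1": True, "L2": True, "L3": False},  # cannot answer "why?"
)

# The highest-leverage gap is the lowest unanswered diagnostic level.
next_gap = next(level for level, ok in sorted(ci.levels.items()) if not ok)
print(next_gap)  # L3
```

Filling in this record for a database or an org chart works identically; only the values change, never the fields.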
The framework is universal because the underlying question is always the same:
Where does information exist, where does it disappear, and who suffers from not having it?
When analysis is complete, present in this order:
Always end with: "The highest-leverage next action is [specific thing] because it unblocks [Level N] questions for [most critical service/path]."