Name: The Flywheel
Author: BertCalm

The Flywheel

The Flywheel — continuous skill improvement engine that passively observes session friction, analyzes accumulated usage signals, tracks skill maturity, and triggers refinement cycles. Combines lightweight post-session logging with real-time behavioral signal detection (redirects, confusion, abandonment, resignation, delight). Use when: user says 'flywheel', 'improve my skills', 'skill analysis', 'what skills need work', 'optimize skills', 'skill health', 'library health', 'friction analysis', 'signal detection', 'skill maturity', or when the session log has 10+ entries for any single skill. Also invoke proactively at the start of sessions when the log shows patterns worth acting on, and after any session where multiple skills were invoked. The always-running quality engine for the skill ecosystem.

BertCalm, 0 stars, March 24, 2026

Occupation
Category: Machine Learning

Every skill invocation is a data point. Every user redirect is a signal. Every abandoned session is a confession. The Flywheel turns all of it into skill improvements — continuously, without the user having to remember to optimize anything.

The user's silence is data. Premature closure is data. Frustration language is data. None of it requires the user to do anything.

Architecture

┌─────────────────────────────────────────────────────┐
│                    THE FLYWHEEL                       │
├─────────────────────────────────────────────────────┤
│                                                       │
│  ┌────────────────────┐                               │
│  │ LAYER 1: OBSERVE    │  ← runs during every skill   │
│  │  (always on)        │    session automatically     │
│  │  Self-corrects when │                               │
│  │  possible, logs     │                               │
│  │  what it can't fix  │                               │
│  └──────┬─────────────┘                               │
│          │ logs signals                                │
│          ▼                                             │
│  ┌────────────────────┐                               │
│  │ LAYER 2: LOG        │  session-log.jsonl            │
│  │  (append-only)      │  (existing, expanded schema) │
│  └──────┬─────────────┘                               │
│          │ periodic analysis                           │
│          ▼                                             │
│  ┌────────────────────┐                               │
│  │ LAYER 3: ANALYZE    │  triage → propose → report   │
│  └──────┬─────────────┘                               │
│          │ human approves                              │
│          ▼                                             │
│  ┌────────────────────┐                               │
│  │ LAYER 4: IMPROVE    │  edit SKILL.md, version bump │
│  └────────────────────┘                               │
└─────────────────────────────────────────────────────┘

Self-Correction vs. Logged Signal

Situation	Action	Example
User confused by a question	Self-correct: rephrase immediately	"Let me ask that differently..."
User redirects on a misunderstanding	Self-correct: acknowledge, course-correct	"Got it — adjusting."
Same question causes confusion across 3+ sessions	Log signal: needs SKILL.md edit	Proposal generated
User abandons mid-phase	Log signal: cannot fix a closed session	Abandonment record written
User's replies get shorter after long output	Self-correct: shorten subsequent outputs	Observation layer trims verbosity
Output format consistently rejected	Log signal: structural mismatch	Proposal generated

Field Reference

Field	Type	Description
`skill`	string	Which skill was invoked
`date`	string	ISO date
`context`	string	Brief description (2-5 words)
`duration_signal`	`short` | `medium` | `long`	Rough work volume
`followed_by_correction`	bool	Did the user correct/redo something after?
`user_sentiment`	`positive` | `neutral` | `negative`	Inferred from user response
`note`	string	Specific feedback (often empty)
`termination_type`	`clean` | `abandoned` | `timeout` | `error`	How the session ended
`last_phase_reached`	string	Where the skill got to
`friction_events`	array	Structured friction records (see taxonomy below)
`positive_signals`	array	`delight` | `re-use` | `recommendation` | `scope-expansion-post-completion`
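
One record under the field reference above could look like the following sketch. It assumes Python and that `session-log.jsonl` holds one JSON object per line. The `SessionEntry` class and `append_entry` helper are hypothetical names used for illustration, not part of the skill itself.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class SessionEntry:
    # Field names follow the field reference; the class itself is hypothetical.
    skill: str
    date: str                          # ISO date string
    context: str                       # brief description, 2-5 words
    duration_signal: str               # "short" | "medium" | "long"
    followed_by_correction: bool
    user_sentiment: str                # "positive" | "neutral" | "negative"
    note: str = ""                     # often empty
    termination_type: str = "clean"    # "clean" | "abandoned" | "timeout" | "error"
    last_phase_reached: str = ""
    friction_events: list = field(default_factory=list)
    positive_signals: list = field(default_factory=list)


def append_entry(log_path: Path, entry: SessionEntry) -> None:
    """Append one record as a single JSON line; existing lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(entry)) + "\n")
```

Opening the file in append mode keeps the log append-only, which matches how Layer 2 is described in the architecture diagram.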

Friction Signals (inferred from language)

Signal	Detection Pattern	What It Means
Redirect	"no", "wait", "that's not", "I meant", "go back"	Agent misunderstood; wrong branch taken
Confusion	"what does that mean", "I don't understand", "which one"	Skill language or concept unclear
Frustration	"ugh", "why is it", "this is wrong", "come on"	Accumulated friction hitting threshold
Resignation	Very short replies after long agent output	User disengaging; going through motions
Restart	"let's start over", "forget that", "begin again"	Phase or intake fundamentally failed
Clarification loop	Same concept clarified 3+ times	Intake variable or description is broken
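
The phrase-based rows of the friction table can be approximated with a small matcher. This is only a sketch: whole-phrase, case-insensitive matching is an assumption, and `FRICTION_PHRASES` and `detect_friction` are hypothetical names. The Resignation and Clarification loop rows are left out because they depend on behavior across turns rather than on wording.

```python
import re

# Phrases copied from the friction table; how they are matched is an assumption.
FRICTION_PHRASES = {
    "redirect": ["no", "wait", "that's not", "i meant", "go back"],
    "confusion": ["what does that mean", "i don't understand", "which one"],
    "frustration": ["ugh", "why is it", "this is wrong", "come on"],
    "restart": ["let's start over", "forget that", "begin again"],
}


def detect_friction(message: str) -> list[str]:
    """Return the friction signal types whose phrases appear in the message."""
    # Lowercase and normalize curly apostrophes so "that’s" matches "that's".
    text = message.lower().replace("\u2019", "'")
    found = []
    for signal, phrases in FRICTION_PHRASES.items():
        for phrase in phrases:
            # Word boundaries stop "no" from matching inside "know" or "note".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.append(signal)
                break  # one hit per signal type is enough
    return found
```

Short phrases such as "no" and "wait" will still produce false positives in ordinary conversation, so a match is better treated as a candidate signal than as a confirmed one.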

Flow Signals (inferred from behavior)

Signal	Detection Pattern	What It Means
Intake re-collection	A variable answered, then re-answered	Intake question was ambiguous
Phase skip request	"skip that", "just do X", "move on"	Phase is blocking or irrelevant
Phase repeat request	"redo that", "can you do that again"	Output was wrong or incomplete
Output rejection	"that's not right", "not what I asked for"	Phase logic or output format mismatch
Scope expansion	User adds context mid-run that should have been in intake	Intake didn't capture enough upfront
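
Of the flow signals, intake re-collection is the easiest to detect mechanically. The sketch below assumes the session keeps an ordered list of `(variable, answer)` pairs as intake proceeds; that structure and the `recollected_variables` name are illustrative assumptions.

```python
from collections import Counter


def recollected_variables(intake_answers: list[tuple[str, str]]) -> list[str]:
    """Return intake variables that were answered more than once, in first-seen order."""
    counts = Counter(name for name, _ in intake_answers)
    return [name for name in counts if counts[name] > 1]
```

Each returned variable points at an intake question that may have been ambiguous.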

Termination Signals (inferred from session end)

Signal	Severity	What It Means
Clean completion	Low	Skill worked. Still capture friction events en route.
Silent completion	Medium-low	Skill finished but output may not have landed.
Mid-phase abandonment	High	Skill broke at this phase. Highest-value signal.
Premature closure	Medium	Last agent message didn't land — too long? Too complex?
Frustration + completion	High	User finished but experience was bad. They won't come back.

The Flywheel

Architecture

Layer 1: Real-Time Observation

What It Does During a Session

Self-Correction vs. Logged Signal

Layer 2: Signal Logging

Expanded Schema

Field Reference

Signal Taxonomy

Friction Signals (inferred from language)

Flow Signals (inferred from behavior)

Termination Signals (inferred from session end)

Positive Signals (also captured)

Layer 3: Analysis

When to Run

Analysis Modes

Per-Skill Health Metrics

Pattern → Action Map

Cross-Skill Patterns

Layer 4: Improvement

Flywheel Report

Improvement Proposals

Executing Improvements

Skill Maturity Tracking

Maturity Levels

Promotion Criteria (Beta → Stable)

Regression Alert Triggers (Stable → At Risk)

Library Health Dashboard

Log Maintenance

Relationship to Other Skills

Values

Positive Signals (also captured)

Signal	Detection Pattern	What It Means
Delight	"this is great", "exactly", "perfect"	Phase or output exceeded expectations
Re-use	Same user activates same skill multiple times	Skill has become part of workflow
Recommendation	User describes sharing the skill output	High-value signal
Post-completion expansion	User adds more work after clean completion	Skill built trust and appetite

Per-Skill Health Metrics

Metric	How to Compute	Action Threshold
Invocation count	Count entries	Low count + high value = undertriggering
Correction rate	% with `followed_by_correction: true`	> 30% = skill body revision needed
Abandonment rate	% with `termination_type: abandoned`	> 25% = critical
Redirect rate	Friction events of type `redirect` / sessions	> 2/session = intake broken
Sentiment distribution	positive / neutral / negative ratio	Mostly negative = investigate
Context diversity	Unique contexts / total invocations	Low = too narrow, consider generalizing
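
The metrics above can be computed directly from parsed log records. The sketch assumes the records have already been filtered to a single skill and that each friction event is a dict with a `type` key; the event shape is not specified in this section, so that key is a guess. The function names are hypothetical.

```python
from collections import Counter


def skill_metrics(entries: list[dict]) -> dict:
    """Compute health metrics from the log records of ONE skill."""
    n = len(entries)
    if n == 0:
        return {"invocations": 0}
    corrections = sum(1 for e in entries if e.get("followed_by_correction"))
    abandoned = sum(1 for e in entries if e.get("termination_type") == "abandoned")
    redirects = sum(
        1
        for e in entries
        for ev in e.get("friction_events", [])
        if ev.get("type") == "redirect"  # assumed event shape
    )
    contexts = {e.get("context", "") for e in entries}
    return {
        "invocations": n,
        "correction_rate": corrections / n,
        "abandonment_rate": abandoned / n,
        "redirects_per_session": redirects / n,
        "sentiment": dict(Counter(e.get("user_sentiment", "neutral") for e in entries)),
        "context_diversity": len(contexts) / n,
    }


def threshold_flags(m: dict) -> list[str]:
    """Apply the numeric action thresholds from the metrics table."""
    out = []
    if m.get("correction_rate", 0.0) > 0.30:
        out.append("correction rate > 30%: skill body revision needed")
    if m.get("abandonment_rate", 0.0) > 0.25:
        out.append("abandonment rate > 25%: critical")
    if m.get("redirects_per_session", 0.0) > 2:
        out.append("redirects > 2/session: intake broken")
    return out
```

The qualitative thresholds ("mostly negative", "low" diversity) are left to human judgment here because the table gives no numbers for them.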

Pattern → Action Map

Pattern	Action
Correction rate > 30%	Flag for skill body revision
Sentiment mostly negative	Read notes, identify root cause, propose rewrite
Low invocations but high value when used	Description optimization (undertriggering)
High invocations + low correction	Healthy — leave it alone
Same context every time	Too narrow — consider generalizing
Never invoked despite existing	Description is wrong OR skill is redundant
Abandonment at same phase across sessions	That phase is broken; review it after even one repeat
3+ redirects on same intake variable	Rewrite that intake question
Resignation signals after long output	Break output into smaller confirmed steps
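
The repeated-abandonment row is one pattern that can be checked mechanically. This sketch reads "even one repeat" as two or more abandoned sessions ending at the same `last_phase_reached` value; that reading and the `broken_phases` name are assumptions.

```python
from collections import Counter


def broken_phases(entries: list[dict]) -> list[str]:
    """Return phases where abandonment occurred in 2+ sessions (at least one repeat)."""
    counts = Counter(
        e.get("last_phase_reached", "")
        for e in entries
        if e.get("termination_type") == "abandoned"
    )
    # Skip records with no recorded phase; sort for a stable report order.
    return sorted(phase for phase, c in counts.items() if phase and c >= 2)
```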

Cross-Skill Patterns

Pattern	Location	Fix
Confusion at intake	Across multiple skills	Shared intake conventions needed
Abandonment at output	Final phase across skills	Output format not matching expectation
"What does X mean"	Phase descriptions	Jargon in skill language; simplify
Resignation after long output	Any long response	Break into smaller steps
Scope expansion mid-session	Intake across skills	Skills need better upfront capture

Pattern	Version Bump	Who Validates
Wording confusion in intake (1-2 fields)	`patch` x.x.1	User review
Phase logic produces wrong output	`patch` x.x.1	User review
Entire phase causes abandonment	`minor` x.1.x	User review + `/skill-creator` eval
Core pipeline assumption is wrong	`major` x+1.x.x	User review + `/skill-creator` eval
Cross-skill shared pattern found	New shared convention	All affected skills
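
The version bumps in the table can be sketched as a small helper. Resetting the lower components to zero on a minor or major bump follows Semantic Versioning and is an assumption here, since the table writes the minor bump as "x.1.x". The `bump` function is hypothetical and expects a plain `major.minor.patch` string.

```python
def bump(version: str, level: str) -> str:
    """Return the next version string for a 'patch', 'minor', or 'major' bump."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if level == "minor":
        return f"{major}.{minor + 1}.0"   # assumed reset of the patch component
    if level == "major":
        return f"{major + 1}.0.0"         # assumed reset of minor and patch
    raise ValueError(f"unknown bump level: {level!r}")
```

For example, a wording fix in one intake field would take a skill from 1.2.3 to 1.2.4, while a rewritten phase would take it to 1.3.0.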

Maturity Levels

Level	Criteria	What It Means
New	< 3 uses	Not enough data. Observe.
Beta	3+ uses, still showing friction	Working but rough edges
Stable	5+ clean completions, abandonment rate < 20%, correction rate < 20%, 1+ positive signal	Reliable. Core of the ecosystem.
At Risk	Was Stable, now regressing (abandonment rate spiking, corrections increasing)	Something changed. Investigate.
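
The maturity criteria translate into a short classifier. The sketch simplifies "regressing" to "was Stable and no longer meets the Stable criteria", which is an assumption; the table describes regression in terms of trends (spiking, increasing) that a single snapshot cannot capture. The `maturity` function and its parameters are hypothetical.

```python
def maturity(
    uses: int,
    clean_completions: int,
    abandonment_rate: float,
    correction_rate: float,
    positive_signals: int,
    was_stable: bool = False,
) -> str:
    """Classify a skill as New, Beta, Stable, or At Risk from the table's criteria."""
    meets_stable = (
        clean_completions >= 5
        and abandonment_rate < 0.20
        and correction_rate < 0.20
        and positive_signals >= 1
    )
    if meets_stable:
        return "Stable"
    if was_stable:
        return "At Risk"  # simplified: previously Stable, criteria no longer met
    if uses < 3:
        return "New"
    return "Beta"
```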

Relationship to Other Skills

Skill	Relationship
`/skill-creator`	Creates and evaluates skills. Flywheel tells it WHICH skills need attention and WHY.
`/sisters`	Process optimization. Sisters optimize workflows; Flywheel optimizes the skills themselves.
`/adaptive-workflow-architect`	Designs entropy-aware architectures for new skills. Flywheel monitors whether those architectures hold up in practice.
`/ringleader`	Orchestrates skill pipelines. Flywheel data informs which skills the Ringleader should trust and which need caveats.
`/historical-society`	Documents. Flywheel surfaces when skill documentation has drifted from actual behavior.