Skill File

Learn Batch Extraction

Name: Learn Batch Extraction
Author: dotcommander

Novelty classification framework for extracting genuinely new knowledge from documentation. Use when: processing docs with /learn, extracting insights from large codebases, batch knowledge extraction. NOT for: summarizing documentation, generating training data, producing general overviews.

dotcommander · 3 stars · March 5, 2026

Category: Documents

Skill Content

Batch Learning Extraction

Extracts Tier 2-4 insights from documentation via parallel batch processing. Filters out training data synthesis (Tier 1).

Quick Reference

Intent	Action
"Extract insights from docs"	Apply Tier 1-4 classification, return JSON
"What tier is this insight?"	Use decision tests below
"Build learning document"	Read(references/document-template.md)
"Handle >500 files"	Warn, cap at 500, suggest --focus
"All insights in batch are Tier 1"	Return empty insights array, report excluded_count

MUST (BLOCKING):

ONLY return Tier 2-4 insights — never include Tier 1
Return JSON only from batch subagents — no prose, no explanations
Include source attribution (file + lines) for every insight
When uncertain between tiers, EXCLUDE (bias toward high-value novelty)
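The blocking rules above could be enforced mechanically before a batch response leaves the subagent. The following is a minimal sketch, not the skill's actual implementation: `enforce_must_rules` and `to_json_only` are hypothetical names, and dropping insights that lack attribution (rather than rejecting the whole batch) is an assumption. An insight whose tier is unknown is treated as uncertain and excluded.

```python
import json
from typing import Any, Dict, List

INCLUDED_TIERS = {2, 3, 4}  # Tier 1 is never returned


def enforce_must_rules(insights: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Keep only Tier 2-4 insights that carry source attribution."""
    kept: List[Dict[str, Any]] = []
    excluded = 0
    for insight in insights:
        has_source = bool(insight.get("source_file")) and bool(insight.get("source_lines"))
        # A missing or unknown tier counts as "uncertain" and is excluded.
        if insight.get("tier") in INCLUDED_TIERS and has_source:
            kept.append(insight)
        else:
            excluded += 1
    return {"insights": kept, "excluded_count": excluded}


def to_json_only(result: Dict[str, Any]) -> str:
    """Serialize the result with no surrounding prose."""
    return json.dumps(result)
```

If every candidate is Tier 1, this returns an empty `insights` array together with the `excluded_count`, matching the last row of the Quick Reference table.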

Batch JSON Schema

{
  "batch_number": 1,
  "files_analyzed": 12,
  "insights": [
    {
      "tier": 2,
      "pattern_name": "Laravel tap() helper",
      "what_i_learned": "tap() enables method chaining on methods that return void",
      "why_it_matters": "Allows fluent interfaces even when underlying methods don't support chaining",
      "code_example": "return tap($user, fn($u) => $u->save());",
      "source_file": "docs/laravel/helpers.md",
      "source_lines": "145-152",
      "when_to_apply": ["Building fluent APIs", "Chaining operations with side effects"],
      "when_not_to_apply": ["Method already returns $this"],
      "confidence": 85
    },
    {
      "tier": 4,
      "pattern_name": "PSR-4 vs Classmap Performance",
      "what_i_learned": "PSR-4 autoloading has filesystem overhead; classmap is 10x faster in production",
      "why_it_matters": "Contradicts assumption that autoloading is free — requires explicit optimization",
      "code_example": "composer dump-autoload -o",
      "source_file": "docs/performance/autoloading.md",
      "source_lines": "23-45",
      "anti_pattern": "Using PSR-4 in production without classmap optimization",
      "blast_radius": "Application-wide (every class load)",
      "severity": "high",
      "confidence": 90
    }
  ],
  "focus_filter_applied": false,
  "focus_topic": null,
  "excluded_count": 3
}
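A coordinator consuming this response might parse it into typed records before synthesis. The sketch below is one possible shape under stated assumptions: the `Insight` and `BatchResult` classes and the `parse_batch` function are hypothetical, and only the fields shared by both example insights are modeled. Tier-specific fields such as `when_to_apply` or `anti_pattern` are left out for brevity.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insight:
    tier: int
    pattern_name: str
    source_file: str
    source_lines: str
    confidence: int


@dataclass
class BatchResult:
    batch_number: int
    files_analyzed: int
    insights: List[Insight]
    excluded_count: int
    focus_topic: Optional[str] = None


def parse_batch(payload: str) -> BatchResult:
    """Parse a JSON-only batch response; raises KeyError on a missing field."""
    data = json.loads(payload)
    insights = [
        Insight(
            tier=item["tier"],
            pattern_name=item["pattern_name"],
            source_file=item["source_file"],
            source_lines=item["source_lines"],
            confidence=item["confidence"],
        )
        for item in data["insights"]
    ]
    return BatchResult(
        batch_number=data["batch_number"],
        files_analyzed=data["files_analyzed"],
        insights=insights,
        excluded_count=data["excluded_count"],
        focus_topic=data.get("focus_topic"),
    )
```

Failing loudly on a missing `source_file` or `source_lines` is a deliberate choice here, since every insight is required to carry attribution.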

Edge Cases

Scenario	Response
>500 files discovered	Warn, cap at 500, suggest `--focus` for remainder
Batch returns Tier 1 only	Return `{"insights": [], "excluded_count": N}`, continue
Context >90% in coordinator	Stop batching, synthesize what's collected, note files skipped
Same pattern in multiple batches	Keep highest confidence, merge when_to_apply
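Two of the rows above lend themselves to a short sketch: capping discovery at 500 files, and merging a pattern reported by more than one batch. The helper names `cap_files` and `merge_duplicates` are hypothetical, and matching duplicates by exact `pattern_name` is an assumption, since the table does not say how "same pattern" is detected.

```python
from typing import Any, Dict, List, Optional, Tuple

MAX_FILES = 500  # cap stated in the table


def cap_files(paths: List[str], limit: int = MAX_FILES) -> Tuple[List[str], Optional[str]]:
    """Return at most `limit` paths, plus a warning when files were dropped."""
    if len(paths) <= limit:
        return list(paths), None
    skipped = len(paths) - limit
    warning = (
        f"Found {len(paths)} files; processing the first {limit}. "
        f"Re-run with --focus to cover the remaining {skipped}."
    )
    return list(paths[:limit]), warning


def merge_duplicates(insights: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the highest-confidence copy of each pattern and merge when_to_apply."""
    merged: Dict[str, Dict[str, Any]] = {}
    for insight in insights:
        key = insight["pattern_name"]  # assumption: exact name match
        current = merged.get(key)
        if current is None:
            merged[key] = dict(insight, when_to_apply=list(insight.get("when_to_apply", [])))
            continue
        # Union of when_to_apply entries, preserving first-seen order.
        combined = list(current["when_to_apply"])
        for item in insight.get("when_to_apply", []):
            if item not in combined:
                combined.append(item)
        winner = insight if insight["confidence"] > current["confidence"] else current
        merged[key] = dict(winner, when_to_apply=combined)
    return list(merged.values())
```

On a confidence tie the earlier copy wins, which keeps the result stable regardless of how many later batches repeat the pattern.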

Examples

Insight: "Use interfaces for testability"
Test: "Could I have written this without reading the docs?" → YES
Decision: Tier 1 — EXCLUDE

Insight: "Laravel's tap() returns the first argument after invoking callback"
Test: "Does this show HOW to do something specific?" → YES
Decision: Tier 2 — INCLUDE

Insight: "Symfony chose synchronous events to enforce explicit async via Messenger"
Test: "Does this explain WHY architects made a trade-off?" → YES
Decision: Tier 3 — INCLUDE

Insight: "DB::transaction() does NOT retry on deadlock — manual retry loop required"
Test: "Does this contradict common assumption?" → YES
Decision: Tier 4 — INCLUDE (highest priority)
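The four decision tests above can be read as a small decision procedure. The sketch below assumes the caller has already answered each test as a yes/no; `classify_tier` and `should_include` are hypothetical names, and the evaluation order (Tier 1 first, then the highest-value tier that matches) is an assumption rather than something the examples spell out.

```python
from typing import Optional


def classify_tier(
    writable_without_docs: bool,
    shows_specific_how: bool,
    explains_tradeoff_why: bool,
    contradicts_assumption: bool,
) -> Optional[int]:
    """Map decision-test answers to a tier; None means uncertain."""
    if writable_without_docs:
        return 1  # training data synthesis
    if contradicts_assumption:
        return 4  # counter-intuitive / corrective
    if explains_tradeoff_why:
        return 3  # architectural decision insight
    if shows_specific_how:
        return 2  # implementation-specific detail
    return None


def should_include(tier: Optional[int]) -> bool:
    """Only Tier 2-4 is returned; Tier 1 and uncertain cases are excluded."""
    return tier in (2, 3, 4)
```

For the `DB::transaction()` example, the answers would be `classify_tier(False, True, False, True)`, which yields Tier 4 because the corrective test takes precedence over the how-to test.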

Anti-Patterns

Anti-Pattern	Problem	Fix
Including Tier 1 insights	Dilutes signal with training data	Apply "could I have written this?" test
Prose in batch output	Bloats subagent response, harder to parse	Enforce JSON-only output
More than 10 insights per batch	Quality drops, context fills	Cap at 10, prefer Tier 3-4
Missing source attribution	Cannot verify or locate insight	Always include source_file + source_lines
Sequential batch processing	Slow for large doc sets	Dispatch all batches in parallel
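The last three fixes in the table (cap at 10, prefer Tier 3-4, dispatch in parallel) can be sketched together. This is an illustration under assumptions: `run_batch` is a hypothetical callable standing in for a batch subagent, threads stand in for whatever parallel mechanism the host provides, the batch size of 12 is borrowed from the 12-file batches mentioned elsewhere in this skill, and breaking tier ties by `confidence` is a guess.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

BATCH_SIZE = 12    # files per batch (assumed from the 12-file batches in this skill)
MAX_INSIGHTS = 10  # per-batch cap from the table above

Batch = Dict[str, Any]


def chunk(paths: List[str], size: int = BATCH_SIZE) -> List[List[str]]:
    """Split the file list into consecutive batches of at most `size`."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def trim_insights(insights: List[Dict[str, Any]], limit: int = MAX_INSIGHTS) -> List[Dict[str, Any]]:
    """Keep at most `limit` insights, highest tier first, then highest confidence."""
    ranked = sorted(insights, key=lambda item: (item["tier"], item["confidence"]), reverse=True)
    return ranked[:limit]


def dispatch_all(paths: List[str], run_batch: Callable[[int, List[str]], Batch]) -> List[Batch]:
    """Run every batch concurrently and return results in batch order."""
    batches = chunk(paths)
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        results = list(pool.map(run_batch, range(1, len(batches) + 1), batches))
    for result in results:
        result["insights"] = trim_insights(result["insights"])
    return results
```

`pool.map` returns results in submission order, so batch numbers stay aligned with their file groups even though the batches finish at different times.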

Context Efficiency

Approach	Tokens for 500 files
Direct reading	~2,500k (exceeds budget)
Batch subagents (12 files, JSON response)	~50k (about 50x savings)
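The arithmetic behind the table can be worked through directly. The sketch below takes the table's two token totals at face value and assumes 12 files per batch; `budget_summary` is a hypothetical helper, and the per-file and per-batch figures are simple averages derived from those totals, not measured values.

```python
import math
from typing import Dict

FILES = 500
BATCH_SIZE = 12
DIRECT_TOKENS = 2_500_000  # "~2,500k" from the table
BATCHED_TOKENS = 50_000    # "~50k" from the table


def budget_summary(
    files: int = FILES,
    batch_size: int = BATCH_SIZE,
    direct: int = DIRECT_TOKENS,
    batched: int = BATCHED_TOKENS,
) -> Dict[str, float]:
    """Derive batch count, average token costs, and the savings ratio."""
    batches = math.ceil(files / batch_size)
    return {
        "batches": batches,
        "tokens_per_file_direct": direct // files,
        "tokens_per_batch_response": batched // batches,
        "savings_factor": direct / batched,
    }
```

With the defaults this gives 42 batches, about 5,000 tokens per file when read directly, about 1,190 tokens per batch response, and a ratio of 50 between the two totals.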

Learn Batch Extraction

Batch Learning Extraction

Quick Reference

Workflow

Novelty Classification

Tier 1: Training Data Synthesis — EXCLUDE

Tier 2: Implementation-Specific Details — INCLUDE

Tier 3: Architectural Decision Insights — HIGH VALUE

Tier 4: Counter-Intuitive / Corrective — HIGHEST VALUE

Batch JSON Schema

Focus Filter

Context Efficiency

Edge Cases

Document Template

Examples

Anti-Patterns

Success Criteria

Related Skills

Feishu Doc

Summarize

Nano Pdf

Diffs

Customs Trade Compliance

Nutrient Document Processing