Complete end-to-end pipeline for transforming Excel customer support data into production-ready Agent SOP documents and flowcharts through Clustering, Pattern Extraction, SOP Generation, and Flowchart Generation stages.
This SOP orchestrates the complete end-to-end pipeline for transforming Excel customer support data into production-ready Agent SOP documents with visual flowcharts. It integrates all four stages: Clustering (Python), Pattern Extraction (LLM), SOP Generation (LLM), and Flowchart Generation (LLM + Mermaid).
Language: Detect the language from the user's first message and respond in that language throughout. Support Korean (한국어) and Japanese (日本語). Default to Korean if language is unclear.
Pipeline Flow:
Excel Input (customer support data)
↓
Stage 1: Clustering (Python) [~3 min]
→ clustered_data.xlsx, cluster_tags.xlsx, analysis_report.md
↓
Stage 2: Pattern Extraction (LLM) [5-10 min]
→ patterns.json, faq.json, response_strategies.json, keywords.json
↓
Stage 3: SOP Generation (LLM) [~5 min]
→ TS_*.sop.md, HT_*.sop.md, metadata.json
↓
Stage 4: Flowchart Generation (LLM) [~4 min, enabled by default]
→ *_FLOWCHART.md (Mermaid markdown, SVG optional)
↓
Output: Ready-to-deploy Agent SOP + Visual Flowcharts
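Total Time: ~20-30 minutes (Stage 1-4 full pipeline)
"ko": Korean (한국어)
"ja": Japanese (日本語)
Stage 1 invokes the /stage1-clustering skill, which collects its parameters automatically.
input_file: auto-detected from the data/ directory (selected automatically when there is only one file)
company: extracted automatically from the filename
output_dir: set automatically to results/{company}
sample_size: default 3000 (all records are used if the data has fewer than 3000)
tagging_mode: default "agent" (no need to change)
k: default "auto"
Note: The pipeline proceeds with the defaults without asking for parameters. It asks for a selection only when there are multiple files.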
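The parameter auto-detection described above could look roughly like the following. This is a minimal sketch, not the actual /stage1-clustering implementation: the function name `resolve_stage1_params` and the `user_chat_<company>.xlsx` naming convention are assumptions made for illustration.

```python
from pathlib import Path

DEFAULT_SAMPLE_SIZE = 3000  # Stage 1 default stated in this SOP


def resolve_stage1_params(data_dir="data", n_records=None):
    """Resolve Stage 1 parameters without prompting the user (hypothetical helper)."""
    files = sorted(Path(data_dir).glob("*.xlsx"))
    if not files:
        raise FileNotFoundError(f"No Excel files found in {data_dir}/")
    if len(files) > 1:
        # The only case in which the user is asked to choose a file.
        names = ", ".join(f.name for f in files)
        raise ValueError(f"Multiple Excel files found, selection required: {names}")

    input_file = files[0]
    # Assumption: files are named user_chat_<company>.xlsx.
    prefix = "user_chat_"
    stem = input_file.stem
    company = stem[len(prefix):] if stem.startswith(prefix) else stem

    # Use every record when the dataset has fewer than 3000 rows.
    sample_size = DEFAULT_SAMPLE_SIZE
    if n_records is not None:
        sample_size = min(DEFAULT_SAMPLE_SIZE, n_records)

    return {
        "input_file": input_file.as_posix(),
        "company": company,
        "output_dir": f"results/{company}",
        "sample_size": sample_size,
        "tagging_mode": "agent",
        "k": "auto",
    }
```

Raising on multiple files keeps the "ask only when ambiguous" rule explicit: the caller catches the error and prompts for a selection.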
generate_flowcharts (default: true): Whether to run Stage 4 (Flowchart Generation)
true: Generate Mermaid flowcharts after SOP generation (recommended)
false: Skip Stage 4 (flowchart generation)
Note: Stage 4 (Flowchart Generation) is enabled by default but can be disabled as an option. To skip flowchart generation, set generate_flowcharts=false or decline generation at the review step.
flowchart_target (default: "all"): Which SOPs to generate flowcharts for
"all": Generate for all SOPs (both TS and HT, recommended)
"ts_only": Only Troubleshooting SOPs
"ht_only": Only How-To SOPs
flowchart_format (default: "markdown"): Flowchart output format
"markdown": Mermaid markdown only (recommended, no CLI needed)
"svg": SVG images only (requires Mermaid CLI)
"both": Both markdown and SVG
auto_proceed (default: true): Automatic stage progression
true: Auto-proceed through stages without manual review (recommended)
false: Pause after each stage for review
Validate environment and prepare for execution.
Actions:
Set the language variable ("ko" or "ja"): all subsequent Python script executions MUST be prefixed with LANGUAGE={language}
Check the API key: grep -s "UPSTAGE_API_KEY" .env
If the key is missing or is still the placeholder up_YOUR_API_KEY_HERE:
Run the request-api-key skill flow inline (send Channel.io message to groupId "531940", sleep 10, check thread, write key to .env)
Expected Output:
✅ Pipeline initialized
- Python package validated
- API key confirmed
- Ready for Stage 1 (interactive file selection)
📋 Stage 1 will prompt you to:
1. Select Excel data file (from data/ directory scan)
2. Choose company name (auto-extracted from filename)
3. Confirm output directory (auto-suggested)
4. (Optional) Adjust clustering parameters
Proceeding to Stage 1...
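The API key check and the LANGUAGE prefix from the initialization actions can be sketched in Python as follows. This is only an illustration under assumptions: the helper names `api_key_status` and `script_env` are hypothetical, and the sketch assumes a plain `NAME=value` .env format.

```python
import os
from pathlib import Path

PLACEHOLDER = "up_YOUR_API_KEY_HERE"  # placeholder value named in this SOP


def api_key_status(env_path=".env"):
    """Return "ok", "placeholder", or "missing" for UPSTAGE_API_KEY (hypothetical helper)."""
    path = Path(env_path)
    if not path.is_file():
        return "missing"
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "UPSTAGE_API_KEY":
            value = value.strip().strip('"').strip("'")
            # An empty value or the template placeholder means no real key yet.
            return "placeholder" if value in ("", PLACEHOLDER) else "ok"
    return "missing"


def script_env(language):
    """Environment for child Python scripts, equivalent to a LANGUAGE=... prefix."""
    if language not in ("ko", "ja"):
        raise ValueError(f"Unsupported language: {language}")
    return dict(os.environ, LANGUAGE=language)
```

A status other than "ok" is the point at which the request-api-key flow would be triggered. The dictionary from `script_env` can be passed as the `env` argument of `subprocess.run`.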
Run clustering via /stage1-clustering skill with auto-detected parameters.
Documentation: See stage1-clustering SKILL.md
Execution:
# Execute Stage 1 skill
/stage1-clustering
# Stage 1 will:
# 1. Scan data/ directory for Excel files (auto-select if single file)
# 2. Extract company name from filename
# 3. Set output_dir to results/{company}
# 4. Run clustering with defaults (sample_size=3000, k=auto, tagging_mode=agent)
# 5. Generate analysis report
Outputs (Auto-detected for next stages):
results/{company}/01_clustering/{company}_clustered.xlsx - Full dataset with cluster assignments
results/{company}/01_clustering/{company}_tags.xlsx - Cluster summary
results/{company}/01_clustering/analysis_report.md - Comprehensive analysis for Stage 2
After Stage 1 Completion:
Pipeline automatically detects company and output_base_dir from Stage 1 outputs:
# Auto-detection logic
output_base_dir="results/{company}" # From Stage 1 output path
company="{company}" # From Stage 1 output prefix
# Validate Stage 1 outputs exist
✓ {output_base_dir}/01_clustering/{company}_clustered.xlsx
✓ {output_base_dir}/01_clustering/{company}_tags.xlsx
✓ {output_base_dir}/01_clustering/analysis_report.md
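The existence check on the three Stage 1 outputs can be written as a small Python function. This is a sketch only: the helper name `missing_stage1_outputs` is an assumption, while the paths follow the layout listed above.

```python
from pathlib import Path


def missing_stage1_outputs(output_base_dir, company):
    """Return the expected Stage 1 output files that do not exist (hypothetical helper)."""
    base = Path(output_base_dir) / "01_clustering"
    expected = [
        base / f"{company}_clustered.xlsx",
        base / f"{company}_tags.xlsx",
        base / "analysis_report.md",
    ]
    return [p.as_posix() for p in expected if not p.is_file()]
```

An empty list means Stage 2 can start. A non-empty list tells the operator exactly which file to regenerate.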
Quality Checks:
Stage Transition:
IF auto_proceed=true (default):
IF auto_proceed=false:
Use LLM to extract patterns, FAQs, and response strategies from clusters.
Documentation: See stage2-extraction SKILL.md
Execution:
# In Claude Code:
/stage2-extraction
# Parameters (auto-detected from Stage 1):
# - clustering_output_dir: $output_base_dir/01_clustering
# - company: $company
# - n_samples_per_cluster: 20
Inputs:
From $output_base_dir/01_clustering/: {company}_clustered.xlsx, {company}_tags.xlsx, analysis_report.md
Outputs:
patterns.json - Extracted patterns per cluster
faq.json - FAQ pairs for common inquiries
response_strategies.json - Response strategies and escalation rules
keywords.json - Keyword taxonomy
extraction_summary.md - Summary and recommendations
Expected Duration: ~5-10 minutes
Quality Checks:
Stage Transition:
IF auto_proceed=true (default):
IF auto_proceed=false:
Use LLM to generate Agent SOP documents from extracted patterns, organized by customer journey stages.
Documentation: See stage3-sop-generation SKILL.md
Execution:
# In Claude Code:
/stage3-sop-generation
# Parameters (auto-detected from Stage 2):
# - extraction_output_dir: $output_base_dir/02_extraction
# - company: $company
Inputs:
patterns_enriched.json, faq.json, response_strategies.json, keywords.json
Outputs:
03_sop/HT_*.sop.md - 10-15 SOP files (organized by customer journey)
03_sop/TS_*.sop.md
metadata.json - SOP metadata
Expected Duration: ~5-10 minutes
Quality Checks:
metadata.json is complete and accurate
Stage Transition:
IF auto_proceed=true (default):
IF auto_proceed=false:
Generate Mermaid flowcharts from SOP documents.
Documentation: See stage4-flowchart-generation SKILL.md
Execution:
# In Claude Code:
/stage4-flowchart-generation
# Parameters:
# - sop_dir: $output_base_dir/03_sop
# - target_sops: "all"
# - output_format: "markdown"
Inputs:
TS_*.sop.md, HT_*.sop.md
Outputs:
*_FLOWCHART.md - Mermaid flowchart markdown (required)
*_flowchart.svg - SVG images (optional, only if user has Mermaid CLI)
flowchart_generation_summary.md - Summary report
Expected Duration: ~3-5 minutes
Quality Checks:
Skip Conditions (Stage 4 is optional; default: enabled):
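generate_flowcharts=false (explicitly disabled by the user)
The user declines flowchart generation at the review step
Explanation: Stage 4 is enabled by default but can be skipped when needed. If either of the two conditions above is met, flowchart generation is not performed.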
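The skip condition and the flowchart_target options can be combined into one selection step. The sketch below is illustrative and makes two assumptions: the function name `select_sops` is hypothetical, and TS/HT SOPs are told apart purely by their filename prefix.

```python
from pathlib import Path

# Filename prefixes selected by each flowchart_target value.
TARGET_PREFIXES = {
    "all": ("TS_", "HT_"),
    "ts_only": ("TS_",),
    "ht_only": ("HT_",),
}


def select_sops(sop_dir, generate_flowcharts=True, flowchart_target="all"):
    """Return the SOP filenames Stage 4 would process (hypothetical helper)."""
    if not generate_flowcharts:
        return []  # Stage 4 skipped entirely
    if flowchart_target not in TARGET_PREFIXES:
        raise ValueError(f"Unknown flowchart_target: {flowchart_target}")
    prefixes = TARGET_PREFIXES[flowchart_target]
    return sorted(
        p.name for p in Path(sop_dir).glob("*.sop.md") if p.name.startswith(prefixes)
    )
```

Returning an empty list for the skipped case lets the caller treat "nothing to do" and "disabled" the same way.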
Perform final validation of all outputs.
Actions:
Files to Check:
$output_base_dir/
├── 01_clustering/
│ ├── {company}_clustered.xlsx
│ ├── {company}_tags.xlsx
│ └── analysis_report.md
├── 02_extraction/
│ ├── patterns.json
│ ├── faq.json
│ ├── response_strategies.json
│ ├── keywords.json
│ └── extraction_summary.md
├── 03_sop/
│ ├── TS_*.sop.md (multiple files)
│ ├── HT_*.sop.md (multiple files)
│ ├── *_FLOWCHART.md (if Stage 4 executed)
│ ├── *_flowchart.svg (if Stage 4 executed with CLI)
│ ├── metadata.json
│ └── generation_summary.md
└── pipeline_summary.md (generated in next step)
Expected Output (without Stage 4):
✅ All output files validated
- Stage 1: 3 files
- Stage 2: 5 files
- Stage 3: 10+ files (multiple SOPs)
Total: 18+ files, pipeline complete (Stage 1-3)
Expected Output (with Stage 4):
✅ All output files validated
- Stage 1: 3 files
- Stage 2: 5 files
- Stage 3: 10+ files (multiple SOPs)
- Stage 4: 14+ files (7 FLOWCHART.md + 7 SVG)
Total: 32+ files, full pipeline complete (Stage 1-4)
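The per-stage file counts in the expected output could be produced by a short directory walk. This is a minimal sketch under the directory layout listed above. The helper name `count_outputs` is an assumption, and flowcharts are separated from SOP files by filename suffix because both live in 03_sop/.

```python
from pathlib import Path


def count_outputs(output_base_dir):
    """Count output files per stage (hypothetical helper)."""
    base = Path(output_base_dir)

    def files_in(subdir):
        folder = base / subdir
        return [p for p in folder.iterdir() if p.is_file()] if folder.is_dir() else []

    sop_files = files_in("03_sop")
    # Stage 4 writes into 03_sop/, so split its files out by suffix.
    flowcharts = [
        p for p in sop_files if p.name.endswith(("_FLOWCHART.md", "_flowchart.svg"))
    ]
    counts = {
        "stage1": len(files_in("01_clustering")),
        "stage2": len(files_in("02_extraction")),
        "stage3": len(sop_files) - len(flowcharts),
        "stage4": len(flowcharts),
    }
    counts["total"] = sum(counts.values())
    return counts
```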
Create comprehensive summary of pipeline execution and results.
Summary Contents:
Output: {output_base_dir}/pipeline_summary.md
Template:
# Userchat-to-SOP Pipeline Summary
## Execution Information
- Company: {company}
- Execution Date: {timestamp}
- Total Duration: {duration}
- Stages Executed: {stages} (1-3 or 1-4)
## Stage Results
### Stage 1: Clustering
- Records: {N}, Clusters: {K}, Score: {score}
### Stage 2: Pattern Extraction
- Patterns: {P}, FAQs: {F}, Strategies: {S}
### Stage 3: SOP Generation
- TS SOPs: {TS_count} files
- HT SOPs: {HT_count} files
- Total Lines: {total_lines}
### Stage 4: Flowchart Generation (if executed)
- Flowcharts: {flowchart_count} (TS: {ts_fc}, HT: {ht_fc})
- SVG Images: {svg_count} (CLI: {cli_status})
## Key Insights
1. {insight_1}
2. {insight_2}
3. {insight_3}
## Next Steps
1. Review SOPs: {sop_dir}
2. Review flowcharts (if generated): {flowchart_dir}
3. Test with sample inquiries
4. Deploy to Claude Skills
5. Monitor metrics
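The Stage 3 placeholders in the template ({TS_count}, {HT_count}, {total_lines}) can be filled from the SOP directory itself. The following is a sketch, assuming the TS_*.sop.md and HT_*.sop.md naming used throughout this SOP. The helper name `sop_stats` is hypothetical.

```python
from pathlib import Path


def sop_stats(sop_dir):
    """Collect Stage 3 values for pipeline_summary.md (hypothetical helper)."""
    folder = Path(sop_dir)
    ts_files = sorted(folder.glob("TS_*.sop.md"))
    ht_files = sorted(folder.glob("HT_*.sop.md"))
    # Total line count across all SOP documents.
    total_lines = sum(
        len(p.read_text(encoding="utf-8").splitlines()) for p in ts_files + ht_files
    )
    return {
        "TS_count": len(ts_files),
        "HT_count": len(ht_files),
        "total_lines": total_lines,
    }
```

The returned keys match the template placeholders, so the dictionary can be merged into whatever values are used to render the summary.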
Present pipeline results to stakeholders.
Communication Template:
✅ Userchat-to-SOP Pipeline Complete: {Company}
📊 Pipeline Results
- Total Records: {N:,}
- Clusters: {K}
- Patterns: {P}
- FAQ Pairs: {F}
- SOP Files: {sop_count} (TS: {ts_count}, HT: {ht_count})
- Flowcharts: {flowchart_count} (if Stage 4 executed)
💡 Key Insights
1. {insight from analysis report}
2. {insight from extraction}
3. {automation opportunity}
📁 Output Files
- Analysis Report: {path}/01_clustering/analysis_report.md
- Extraction Summary: {path}/02_extraction/extraction_summary.md
- SOP Documents: {path}/03_sop/TS_*.sop.md, HT_*.sop.md
- Flowcharts: {path}/03_sop/*_FLOWCHART.md (if generated)
🚀 Next Steps
1. Review analysis report and SOPs
2. Review flowcharts (if generated)
3. Test SOPs with sample inquiries
4. Deploy via Claude Skills
5. Monitor key metrics
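The placeholders in the communication template follow Python's format-string syntax, including the `{N:,}` thousands separator, so a shortened version can be rendered with `str.format`. This is only an illustration: the function name `render_results` is hypothetical and the numbers in the usage note are made-up sample values, not pipeline results.

```python
# Shortened excerpt of the communication template above.
RESULTS_TEMPLATE = (
    "Pipeline Results\n"
    "- Total Records: {N:,}\n"
    "- Clusters: {K}\n"
    "- SOP Files: {sop_count} (TS: {ts_count}, HT: {ht_count})\n"
)


def render_results(N, K, ts_count, ht_count):
    """Fill the template; sop_count is derived from the TS and HT counts."""
    return RESULTS_TEMPLATE.format(
        N=N, K=K, sop_count=ts_count + ht_count, ts_count=ts_count, ht_count=ht_count
    )
```

For example, `render_results(12345, 12, 5, 7)` yields a "Total Records: 12,345" line and "SOP Files: 12 (TS: 5, HT: 7)".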
Scenario: Complete pipeline with flowcharts (default configuration)
Execution:
/userchat-to-sop-pipeline
Stage 1 (interactive parameter collection):
data/user_chat_channelcorp.xlsx (selected)
results/channelcorp (auto-suggested)
Optional Parameters (Stage 2-4):
min_total_samples=300 # Stage 2 (default, dynamically adjusts per-cluster count)
sop_detail_level="standard" # Stage 3 (default)
generate_flowcharts=true # Stage 4 (default)
flowchart_target="all" # Stage 4 (default)
auto_proceed=false # Pause for review
Timeline:
Results:
Scenario: Fast pipeline execution without manual review pauses
Execution:
/userchat-to-sop-pipeline
Stage 1 (interactive):
Optional Parameters:
auto_proceed=true # No pauses (automatic progression)
Timeline:
Results:
Scenario: Quick validation before full run
Execution:
/userchat-to-sop-pipeline
Stage 1 (interactive):
data/user_chat_test.xlsx (selected)
results/testco (automatic)
Optional Parameters:
n_samples_per_cluster=10 # Stage 2 (quick analysis)
sop_detail_level="concise" # Stage 3 (simplified)
flowchart_target="ts_only" # Stage 4 (TS only)
auto_proceed=true # Auto-proceed
Timeline:
Results:
Solution:
pip install -r requirements.txt
/stage1-clustering
Solution:
Set n_samples_per_cluster to 10 or 20 (standard)
Use focus_clusters="top_10" to analyze only top clusters
Solution:
n_samples_per_cluster=30
Solution: Each stage is independent and can be resumed:
# Resume from Stage 2
/stage2-extraction
# (provide clustering_output_dir parameter)
# Resume from Stage 3
/stage3-sop-generation
# (provide extraction_output_dir parameter)
# Resume from Stage 4
/stage4-flowchart-generation
# (provide sop_dir parameter)
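Because each stage writes to its own directory, the stage to resume from can be inferred from which outputs already exist. The sketch below is an assumption-laden illustration: the helper name `next_stage` is hypothetical, and it treats one file per stage as evidence that the stage completed.

```python
from pathlib import Path


def next_stage(output_base_dir, company):
    """Return the first stage (1-4) whose outputs are missing, or None if all exist."""
    base = Path(output_base_dir)
    # Assumption: one representative file per stage signals completion.
    if not (base / "01_clustering" / f"{company}_clustered.xlsx").is_file():
        return 1
    if not (base / "02_extraction" / "patterns.json").is_file():
        return 2
    if not (base / "03_sop" / "metadata.json").is_file():
        return 3
    if not any((base / "03_sop").glob("*_FLOWCHART.md")):
        return 4
    return None
```

A stricter variant would check every expected file per stage. A single marker file keeps the sketch short but can misreport a stage that failed partway through.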
Solution:
mmdc --version
npm install -g @mermaid-js/mermaid-cli
output_format="markdown" for markdown-only
/stage4-flowchart-generation
templates/HT_template.md, templates/TS_template.md
../docs/clustering-guide.md
Python (Stage 1):
LLM (Stage 2, 3):
Hybrid = Best of Both Worlds
Important Implementation Details:
⚠️ Stage 2 uses sequential processing in main agent
✅ Enrichment is always generated
patterns_enriched.json file is automatically generated in Stage 2 Step 7
See /stage2-extraction skill for detailed execution strategy
Default settings (runs immediately, without asking for parameters):
| Stage | Parameter | Default |
|---|---|---|
| Stage 1 | sample_size | 3000 |
| Stage 1 | umap | true (4096D → 30D) |
| Stage 1 | k_range | 8,10,12,15,20,25 |
| Stage 1 | tagging_mode | agent |
| Stage 2 | min_total_samples | 300 |
| Stage 2 | n_samples_per_cluster | max(20, ceil(300/K)) |
| Stage 3 | SOP composition | ~10-15 SOPs, organized by customer journey |
| Stage 4 | flowchart_target | all |
| Stage 4 | flowchart_format | markdown |
To skip Stage 4, set generate_flowcharts=false.
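The n_samples_per_cluster default in the table is a formula, max(20, ceil(300/K)), so the per-cluster sample count depends on the number of clusters K. A small sketch makes the behavior concrete. The function name is chosen for illustration, and the loop evaluates it over the k_range values from the table.

```python
import math


def n_samples_per_cluster(k, min_total_samples=300, floor=20):
    """Per-cluster sample count: max(20, ceil(300 / K)) with the table defaults."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return max(floor, math.ceil(min_total_samples / k))


for k in (8, 10, 12, 15, 20, 25):
    print(k, n_samples_per_cluster(k))
# 8 38
# 10 30
# 12 25
# 15 20
# 20 20
# 25 20
```

With 15 or more clusters the floor of 20 takes over, so the total number of samples grows beyond 300 rather than shrinking the per-cluster count.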
Stage 1 (Upstage Solar):
Stage 2 & 3 (Claude Sonnet 4.5):
Total: ~$0.60-2.50 per 1000 records
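The per-1000-record total can be turned into a rough estimate for a given dataset size. This sketch assumes cost scales linearly with record count, which the SOP does not state. The function name `estimate_cost` is hypothetical.

```python
def estimate_cost(n_records, low_per_1000=0.60, high_per_1000=2.50):
    """Rough (low, high) cost range in USD, assuming linear scaling with record count."""
    if n_records < 0:
        raise ValueError("n_records must be non-negative")
    scale = n_records / 1000
    return (round(low_per_1000 * scale, 2), round(high_per_1000 * scale, 2))
```

For the default sample_size of 3000 this gives a range of about $1.80 to $7.50. Actual cost depends on how each stage samples the data, so treat the result as an order-of-magnitude figure.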
After SOP deployment:
Metrics to Track:
This is an orchestration SOP. For detailed implementation of each stage, refer to the stage-specific SOPs linked above.