Build and run LLM-powered data processing pipelines with DocETL. Use when users say "docetl", want to analyze unstructured data, process documents, extract information, or run ETL tasks on text. Helps with data collection, pipeline creation, execution, and optimization.
DocETL is a system for creating LLM-powered data processing pipelines. This skill helps you build end-to-end pipelines: from data preparation to execution and optimization.
Work like a data analyst: write → run → inspect → iterate. Never write all the scripts up front and run them in one batch. Complete and validate each phase before moving to the next.
While developing, set sample: 10-20 on operations for testing; once results look good, remove the sample parameter and run the full pipeline.

When producing final reports, pay attention to:
- Visualization aesthetics
- Report structure
- Interactive tables
- Source document links
Key principle: The user should see results at every step. Don't proceed to the next phase until the current phase produces good results.
DocETL datasets must be JSON arrays or CSV files.
JSON array:

```json
[
  {"id": 1, "text": "First document content...", "metadata": "value"},
  {"id": 2, "text": "Second document content...", "metadata": "value"}
]
```

CSV:

```csv
id,text,metadata
1,"First document content...","value"
2,"Second document content...","value"
```
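Either format works; if the data arrives as CSV but you prefer the JSON dataset format, a small conversion script does the job. A minimal sketch (the example.csv filename is just for illustration):

```python
import csv
import json

def csv_to_dataset(csv_path, json_path):
    """Convert a CSV file into the JSON-array dataset format DocETL expects."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # each CSV row becomes one document dict
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)
    return rows

# Demonstration with a tiny hypothetical CSV
with open("example.csv", "w", encoding="utf-8") as f:
    f.write('id,text\n1,"First document content..."\n2,"Second document content..."\n')

docs = csv_to_dataset("example.csv", "example.json")
print(f"Converted {len(docs)} rows")
```

Note that csv.DictReader yields string values; cast fields like id to int if downstream operations expect numbers.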
If the user needs to collect data, write a Python script:

```python
import json

# Collect/transform data (replace `sources` with your actual data source)
documents = []
for source in sources:
    documents.append({
        "id": source.id,
        "text": source.content,  # DO NOT truncate text
        # Add relevant fields
    })

# Save as a DocETL dataset
with open("dataset.json", "w") as f:
    json.dump(documents, f, indent=2)
```
Important: Never truncate document text in collection scripts. DocETL operations like split handle long documents properly. Truncation loses information.
Always run the collection script and inspect results before proceeding. Show the user:
```python
import json

data = json.load(open("dataset.json"))
print(f"Total documents: {len(data)}")
print(f"Keys: {list(data[0].keys())}")
print(f"Avg length: {sum(len(str(d)) for d in data) // len(data)} chars")

# Show a sample document
print("\nSample document:")
print(json.dumps(data[0], indent=2)[:500])
```
Only proceed to pipeline development once the data looks correct.
CRITICAL: Before writing any prompts, READ the actual input data to understand its structure, content, and quirks:
```python
import json

with open("dataset.json") as f:
    data = json.load(f)

# Examine several examples
for doc in data[:5]:
    print(doc)
```
This understanding is essential for writing specific, effective prompts.
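Beyond eyeballing a few examples, it helps to check which fields are actually populated across the whole dataset, since sparse or inconsistent fields are easy to miss in a handful of samples. A sketch using an inline stand-in dataset (in practice, load your dataset.json):

```python
import json

# Tiny stand-in dataset for illustration
data = [
    {"id": 1, "text": "First document...", "author": "A"},
    {"id": 2, "text": "Second document...", "author": ""},
]

# For each key, count how many documents carry a non-empty value
keys = {k for doc in data for k in doc}
coverage = {
    key: sum(1 for doc in data if doc.get(key) not in (None, "", []))
    for key in sorted(keys)
}
for key, filled in coverage.items():
    print(f"{key}: {filled}/{len(data)} documents have a value")
```

A field that is only half-populated usually needs a conditional prompt or a fallback value in the output schema.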
Create a YAML file with this structure:

```yaml
default_model: gpt-5-nano
system_prompt:
  dataset_description: <describe the data based on what you observed>
  persona: <role for the LLM to adopt>
```
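The file then continues with datasets, operations, and pipeline sections. A hedged sketch of how a minimal map pipeline might look — the operation name, prompt, and schema types here are illustrative, so check the exact syntax against the DocETL documentation:

```yaml
datasets:
  docs:
    type: file
    path: dataset.json

operations:
  - name: extract_themes        # illustrative name
    type: map
    prompt: |
      Here is a document:
      {{ input.text }}
      List the main themes it discusses.
    output:
      schema:
        themes: "list[str]"

pipeline:
  steps:
    - name: analysis
      input: docs
      operations:
        - extract_themes
  output:
    type: file
    path: output.json
```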