Skill Profile

Data Flow Diagram Availability Notation

Name: Data Flow Diagram Availability Notation
Author: rengotaku

Document the data available at each pipeline stage in flow diagrams so that dependencies and contradictions become visible.

rengotaku · 0 stars · February 22, 2026

Occupation
Category: Data Engineering

Skill Content

Problem

Traditional data flow diagrams show processing order but not data availability:

The diagram shows that step A → step B is the correct order
But it doesn't show whether step B has the data it needs
Contradictions exist but remain invisible
Reviewers approve diagrams that contain impossible flows

Example: A diagram showed [yomitoku] → [code_detector] → [OCR], which was the intended processing order, but code_detector requires text that isn't available until OCR runs. The diagram was accurate but incomplete.
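The contradiction in this example can also be caught mechanically. The following is a minimal sketch, assuming each step declares the fields it provides and requires; the `Step` class and `find_unmet` function are hypothetical names introduced for illustration, and only the step and field names come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One box in the flow diagram (hypothetical model, not from the original)."""
    name: str
    provides: set[str] = field(default_factory=set)
    requires: set[str] = field(default_factory=set)


def find_unmet(steps: list[Step]) -> list[tuple[str, set[str]]]:
    """Walk steps in diagram order and report requirements no earlier step provides."""
    available: set[str] = set()
    problems = []
    for step in steps:
        missing = step.requires - available
        if missing:
            problems.append((step.name, missing))
        # Assumption: data produced by a step stays available to all later steps.
        available |= step.provides
    return problems


pipeline = [
    Step("yomitoku", provides={"bbox", "type", "label"}),
    Step("code_detector", requires={"text"}),
    Step("OCR", provides={"text"}),
]

print(find_unmet(pipeline))
# → [('code_detector', {'text'})]
```

The processing order alone passes review; only the declared `requires` set exposes that code_detector runs before its input exists.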

Solution

Add Data Availability Annotations to Flow Diagrams

Original Diagram (Processing Order Only):

[yomitoku] → [layout.json v1]
                   ↓
           [code_detector] ← NEW
                   ↓
             [book.md via OCR]

Annotated Diagram (With Data Availability):

[yomitoku] → [layout.json]
                  │
                  │ ✅ fields: bbox, type, label
                  │ ❌ missing: text content ← EXPLICIT
                  ↓
           [code_detector]
                  │
                  │ 📋 requires: text content ← EXPLICIT
                  │ ⚠️  ERROR: not available! ← CONTRADICTION VISIBLE
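Annotations like these could be generated rather than drawn by hand. The sketch below is one possible approach, assuming each step is given as a `(name, provides, requires)` tuple; the `annotate` function and its output layout are hypothetical and only approximate the notation above.

```python
def annotate(steps):
    """Render a text diagram with availability annotations.

    steps: list of (name, provides, requires) tuples in diagram order.
    """
    lines, available = [], set()
    for i, (name, provides, requires) in enumerate(steps):
        unmet = sorted(requires - available)
        lines.append(f"[{name}]")
        if requires:
            lines.append(f"   │ 📋 requires: {', '.join(sorted(requires))}")
        if unmet:
            lines.append(f"   │ ⚠️ ERROR: not available: {', '.join(unmet)}")
        # Assumption: outputs accumulate and stay available downstream.
        available |= provides
        if i + 1 < len(steps):
            lines.append(f"   │ ✅ available: {', '.join(sorted(available)) or '-'}")
            # Flag what the next step needs but the data so far lacks.
            next_missing = sorted(steps[i + 1][2] - available)
            if next_missing:
                lines.append(f"   │ ❌ missing: {', '.join(next_missing)}")
            lines.append("   ↓")
    return "\n".join(lines)


diagram = annotate([
    ("yomitoku", {"bbox", "type", "label"}, set()),
    ("code_detector", set(), {"text"}),
    ("OCR", {"text"}, set()),
])
print(diagram)
```

Generating the annotations from declared inputs and outputs keeps the diagram and the dependency data from drifting apart.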

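| Step | Input | Output | Available Data | Required Data | Status |
|------|-------|--------|----------------|---------------|--------|
| yomitoku | image | layout.json | bbox, type, label | - | ✅ OK |
| code_detector | layout.json | CODE regions | bbox only | text | ❌ Where does text come from? |
| OCR | image + layout | rover/*.txt | text content | - | ✅ OK |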
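The Status column can be derived from the Available Data and Required Data cells instead of being filled in by hand. This is a rough sketch, assuming cells are comma-separated field names and `-` means "nothing"; the `status` helper is a hypothetical name, and the rows are simplified from the table above.

```python
def status(available: str, required: str) -> str:
    """Return the Status cell for one row of the dependency table."""
    def parse(cell: str) -> set[str]:
        # Treat "-" and empty entries as "no fields".
        return {f.strip() for f in cell.split(",") if f.strip() not in ("", "-")}

    missing = parse(required) - parse(available)
    if not missing:
        return "✅ OK"
    return "❌ missing: " + ", ".join(sorted(missing))


rows = [
    ("yomitoku", "bbox, type, label", "-"),
    ("code_detector", "bbox", "text"),
    ("OCR", "text", "-"),
]
for step, available, required in rows:
    print(f"| {step} | {available} | {required} | {status(available, required)} |")
```

This only compares field names as written, so the table has to use the same name for the same data in every row.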

Before (Hidden Contradiction)

## Data Flow

[Image input]
    ↓
[yomitoku] → generate layout.json
    ↓
[Code detection] → mark CODE regions
    ↓
[Run OCR] → extract text
    ↓
[Generate Markdown]

After (Visible Contradiction)

## Data Flow (with Data Availability)

[Image input]
    ↓
[yomitoku] → generate layout.json
    │
    │ ✅ Available: bbox, type, label
    │ ❌ Missing: text, confidence
    ↓
[Code detection] ← 📋 Requires: text content
    │              ⚠️ ERROR: text not available!
    │
    │ ❌ Cannot proceed (dependency unmet)
    ↓
[Run OCR] → extract text
    │
    │ ✅ Available: text content (NOW available)
    ↓
[Code detection] ← Can only run AFTER OCR
    ↓
[Generate Markdown]

Dependency Table Format

## Pipeline Dependencies

| Stage | Input Files | Output Files | Data Available | Data Required | Issues |
|-------|-------------|--------------|----------------|---------------|--------|
| 1. yomitoku | page.jpg | layout.json | bbox, type, label | - | - |
| 2. code_detect | layout.json | layout_v2.json | bbox, type, label | **text** | ❌ text not in layout.json |
| 3. OCR | page.jpg + layout.json | rover/*.txt | text content | - | - |
| 4. convert | rover/*.txt + layout.json | book.md | text + bbox | - | ✅ All available |

**Contradiction**: Stage 2 requires `text`, but it is not available until Stage 3.

**Resolution**: Move code detection to Stage 4 (after OCR) or integrate it into the converter.
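Once every stage declares what it provides and requires, a valid order can be computed instead of guessed. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the `valid_order` function is a hypothetical helper, and the assumption that OCR depends on `bbox` from layout.json is mine, inferred from its input files rather than stated in the table.

```python
from graphlib import TopologicalSorter


def valid_order(provides: dict[str, set[str]], requires: dict[str, set[str]]) -> list[str]:
    """Order stages so that every required field is produced before it is used."""
    # Map each field to the stage that produces it.
    producer = {f: stage for stage, fields in provides.items() for f in fields}
    # A KeyError here means no stage produces a required field at all,
    # which is a contradiction that reordering cannot fix.
    graph = {
        stage: {producer[f] for f in needed}
        for stage, needed in requires.items()
    }
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


provides = {
    "yomitoku": {"bbox", "type", "label"},
    "code_detect": set(),
    "OCR": {"text"},
    "convert": set(),
}
requires = {
    "yomitoku": set(),
    "code_detect": {"text"},
    "OCR": {"bbox"},  # assumption: OCR uses the layout regions
    "convert": {"text", "bbox"},
}

print(valid_order(provides, requires))
```

Any order this returns places OCR before code_detect, which matches the resolution above; the relative order of code_detect and convert is left open because neither depends on the other in this model.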

