Validate PDF to Markdown conversion quality using multi-dimensional metrics. Assess table accuracy, style preservation (bold/italic/headings), robustness, and performance with standardized F1-scoring methodology.
Validate PDF to Markdown conversion quality using a comprehensive, multi-dimensional evaluation framework. This skill provides standardized metrics, evaluation harnesses, and reporting tools to assess conversion fidelity across table accuracy, style preservation, robustness, and performance.
Use this skill when you need to:
The validation framework computes a composite quality score (0–100) combining four independent metric dimensions:
FinalScore = (0.40 × TableAccuracy) + (0.40 × StyleAccuracy)
+ (0.10 × Robustness) + (0.10 × Performance)
Each dimension is independent and can be evaluated separately or together.
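The weighted combination above can be sketched in Python (the function name is hypothetical; the bundled scripts may structure this differently):

```python
def composite_score(table_acc, style_acc, robustness, performance):
    """Combine the four dimension scores (each in [0, 100]) into the final 0-100 score."""
    return (0.40 * table_acc + 0.40 * style_acc
            + 0.10 * robustness + 0.10 * performance)
```

Because the weights sum to 1.0, a perfect score on every dimension yields exactly 100.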
Measures how accurately tables are detected and their cell content extracted.
Components:
Table Detection F1: IoU-based matching of predicted vs. gold tables (IoU ≥ 0.5 threshold)
Cell Content Accuracy: Token-level F1 averaging across matched table cells
Formula:
TableAccuracy = (0.5 × TableDetectionF1) + (0.5 × CellContentAccuracy)
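A minimal sketch of the IoU-matched detection F1, assuming tables are represented as axis-aligned bounding boxes `(x0, y0, x1, y1)` and matched greedily one-to-one (the real harness may use a different matching strategy):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def table_detection_f1(predicted, gold, threshold=0.5):
    """Greedily match each predicted box to an unused gold box at IoU >= threshold."""
    used, tp = set(), 0
    for p in predicted:
        best_i, best_v = None, threshold
        for i, g in enumerate(gold):
            if i not in used:
                v = iou(p, g)
                if v >= best_v:
                    best_i, best_v = i, v
        if best_i is not None:
            used.add(best_i)
            tp += 1
    denom = len(predicted) + len(gold)
    return 2 * tp / denom if denom else 1.0
```

A spurious extra table hurts precision: one correct match plus one false positive against a single gold table gives F1 = 2/3.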
Interpretation:
Measures how accurately text formatting (bold, italic, heading levels) is preserved.
Components:
Formula:
StyleAccuracy = macro_average(BoldF1, ItalicF1, HeadingF1)
= (BoldF1 + ItalicF1 + HeadingF1) / 3
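The per-category F1 and its macro average can be sketched as set-level F1 over extracted styled spans (a simplification; the actual scorer may align spans positionally rather than by exact text):

```python
def span_f1(predicted, gold):
    """Set-level F1 over styled spans, e.g. the set of texts marked bold."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # nothing to find, nothing predicted
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def style_accuracy(bold_f1, italic_f1, heading_f1):
    """Unweighted macro average across the three style categories."""
    return (bold_f1 + italic_f1 + heading_f1) / 3
```

Macro averaging means losing all italics costs a third of the style score regardless of how rare italics are in the document.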
Interpretation:
Measures system stability and validity across a test corpus, particularly edge cases.
Components:
- Markdown validity: pandoc syntax validation

Formula:
Robustness = (CrashFreeRate + MarkdownValidityRate + CompletenessRate) / 3
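A sketch of the three-rate average, assuming each document in the corpus reports three booleans (field names hypothetical):

```python
def robustness(results):
    """results: one dict per document with booleans 'crashed', 'valid_markdown', 'complete'."""
    n = len(results)
    crash_free = sum(not r["crashed"] for r in results) / n
    validity = sum(r["valid_markdown"] for r in results) / n
    completeness = sum(r["complete"] for r in results) / n
    return 100 * (crash_free + validity + completeness) / 3
```

With 30 documents, one crash, and two incomplete outputs, this reproduces the (96.7 + 100 + 93.3) / 3 = 96.7 arithmetic used in the worked example below.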
Interpretation:
Measures processing speed relative to a baseline; targets 1-page PDF ≈ 200–500ms.
Components:
Formula:
Performance = 0.5 × min(1.0, baseline_median / run_median)
+ 0.5 × min(1.0, baseline_p95 / run_p95)
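The formula can be sketched directly; the `min(1.0, …)` cap means a run faster than baseline scores at most 100, so Performance rewards not regressing rather than raw speed:

```python
def performance(baseline_median, baseline_p95, run_median, run_p95):
    """Speed score vs. baseline; each ratio is capped at 1.0 (no extra credit for being faster)."""
    return 100 * (0.5 * min(1.0, baseline_median / run_median)
                  + 0.5 * min(1.0, baseline_p95 / run_p95))
```

A run exactly twice as slow as baseline on both statistics scores 50.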
Interpretation:
Create .gold.md files for each test PDF:
# Copy reference Markdown next to PDF with .gold.md extension
cp reference_output.md test.gold.md
# Format: Each section annotated with metadata
# Gold format example:
# # Heading 1
# **bold text** and *italic text*
#
# | Column A | Column B |
# |----------|----------|
# | cell 1 | cell 2 |
# Evaluate against ground truth
cargo run -p edgequake-pdf --example real_dataset_eval -- \
--input crates/edgequake-pdf/test-data/real_dataset \
--gold \
--metrics
# Generate detailed report
python3 .github/skills/pdf-markdown-validator/scripts/validate.py \
--pdf-dir crates/edgequake-pdf/test-data/real_dataset \
--gold-dir . \
--output-report metrics_report.json
# View summary scores
jq '.summary' metrics_report.json
# Analyze failures by category
python3 .github/skills/pdf-markdown-validator/scripts/analyze_failures.py \
metrics_report.json
# Embed metrics in standard cargo test output
cargo test -p edgequake-pdf -- --nocapture
# Fail CI if composite score below threshold
cargo test -p edgequake-pdf --features ci-strict
from pdf_validator import PDFValidator

validator = PDFValidator(
    pdf_dir="test-data/real_dataset",
    gold_dir="test-data/gold",
    metrics=["table", "style", "robustness", "performance"],
)

score = validator.evaluate()
print(f"Composite Score: {score.composite}/100")
# GitHub Actions example
- name: Validate PDF → Markdown
  run: |
    cargo run -p edgequake-pdf --example real_dataset_eval -- --metrics
    python .github/skills/pdf-markdown-validator/scripts/validate.py \
      --ci-mode --fail-below 75
Input PDF: 2×3 table with headers "Name, Age" and row "John, 25"
Gold Markdown:
| Name | Age |
| ---- | --- |
| John | 25 |
Generated Markdown (Perfect):
| Name | Age |
| ---- | --- |
| John | 25 |
Scores:
Generated Markdown (Partial Match):
| Name | Age |
| ---- | ---- |
| John | 25.0 |
Scores:
Gold Markdown:
# Main Heading
This is **bold** and _italic_ text.
## Sub Heading
More content here.
Generated Markdown (Perfect):
# Main Heading
This is **bold** and _italic_ text.
## Sub Heading
More content here.
Scores:
Generated Markdown (Partial):
# Main Heading
This is bold and italic text.
## Sub Heading
More content here.
Scores:
Test corpus: 30 PDFs (including 5 edge cases: corrupted, multilingual, scanned, etc.)
Results:
Robustness Score: (96.7 + 100 + 93.3) / 3 = 96.7%
Baseline (previous release):
Current run:
Scores:
# Navigate to PDF crate
cd edgequake/crates/edgequake-pdf
# Ensure ground-truth annotations exist
# Files should be named: <pdf_name>.gold.md
ls -1 test-data/real_dataset/*.gold.md
# Convert all PDFs to Markdown
cargo run -p edgequake-pdf --example real_dataset_eval -- --write
# Outputs written to: test-data/real_dataset/*.md
# Compute all metrics
python3 ../../.github/skills/pdf-markdown-validator/scripts/validate.py \
--pdf-dir test-data/real_dataset \
--gold-dir test-data/real_dataset \
--output-report validation_report.json \
--verbose
# View summary
jq '.summary' validation_report.json
# Detailed per-document breakdown
jq '.documents | .[] | {name, scores}' validation_report.json
# Identify failure patterns
python3 ../../.github/skills/pdf-markdown-validator/scripts/analyze_failures.py \
validation_report.json --group-by failure_type
The `.gold.md` files serve as the ground-truth references. Use this structure:
# Document Title (H1)
## Section Heading (H2)
This paragraph contains **bold text** and _italic text_ and **_bold-italic text_**.
### Subsection (H3)
#### Sub-subsection (H4)
**Note:** Use standard Markdown syntax. Be precise with:
- Bold: **text**
- Italic: _text_
- Bold-Italic: **_text_**
- Headings: # through #### for H1–H4
### Tables
| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
| Cell 1 | Cell 2 | Cell 3 |
| Cell 4 | Cell 5 | Cell 6 |
Ensure:
- Pipes align properly
- Headers separated by `---|---` row
- No trailing spaces (can affect parsing)
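When authoring gold tables, a quick consistency check like the following (a hypothetical helper, not part of the bundled scripts) catches the most common mistake, rows with mismatched cell counts:

```python
def table_column_counts_ok(lines):
    """True if every non-empty row of a pipe table splits into the same number of cells."""
    counts = {len(line.strip().strip("|").split("|"))
              for line in lines if line.strip()}
    return len(counts) == 1
```

Run it over each table block before committing a `.gold.md` file; a row with an extra or missing pipe will make the set contain more than one count.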
### Code Blocks
\`\`\`python
def hello():
    print("world")
\`\`\`
Use triple backticks with language identifier.
### Lists
Bullet list:
- Item 1
- Item 2
  - Nested item
- Item 3
Numbered list:
1. First
2. Second
3. Third
### Edge Cases
- **Multi-line table cells**: Not standard Markdown; flatten to single line
- **Merged cells**: Not representable in Markdown tables; split into separate rows
- **Vertical headers**: Use first row convention (all cells with **bold**)
# Full validation pipeline
python3 scripts/validate.py \
--pdf-dir <path/to/pdfs> \
--gold-dir <path/to/gold> \
[--output-report <report.json>] \
[--metrics table,style,robustness,performance] \
[--ci-mode] \
[--fail-below 75]
Options:
- `--pdf-dir`: Directory containing PDFs and generated `.md` files
- `--gold-dir`: Directory containing `.gold.md` reference files
- `--output-report`: JSON file for machine-readable results (default: `validation_report.json`)
- `--metrics`: Comma-separated metrics to compute (default: all)
- `--ci-mode`: Fail with non-zero exit code if score below threshold
- `--fail-below`: Minimum acceptable score (default: 75)

# Identify and categorize failures
python3 scripts/analyze_failures.py \
<report.json> \
[--group-by failure_type|document|metric] \
[--export <output.csv>]
# Compare two validation runs
python3 scripts/compare_runs.py \
<baseline_report.json> \
<current_report.json> \
[--show-improvements] \
[--show-regressions]