Preferred workflow:
- Extract full text into a stable workspace artifact such as
artifacts/raw_text.txt.
- Recover figure-adjacent context into a smaller artifact such as
artifacts/figure_context.txt.
- Render the figure-relevant pages into image artifacts under
artifacts/page_previews/.
- Crop or isolate the target figure into a stable reference image such as
artifacts/figure_reference.png.
- Turn recovered evidence into explicit intermediate files:
  - caption.txt
  - equations.txt or equations.md
  - parameters.json
  - plot_semantics.json
  - figure_reference_context.json
- If direct text extraction is weak, use OCR on rendered pages or cropped figure regions and persist the OCR outputs under
artifacts/ocr/.
- Preserve the exact commands and intermediate files so that a reviewer can audit provenance.
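
The workspace layout and command logging described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a prescribed implementation. The helper names `init_workspace` and `run_logged` and the log file `artifacts/commands.jsonl` are assumptions made for this example; only the directory names come from the workflow itself.

```python
import json
import subprocess
import sys
import time
from pathlib import Path

# Directory names follow the workflow above.
ARTIFACT_DIRS = ["artifacts/page_previews", "artifacts/ocr"]


def init_workspace(root):
    """Create the artifact directories and return the workspace root as a Path."""
    root = Path(root)
    for rel in ARTIFACT_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    return root


def run_logged(root, cmd):
    """Run cmd (a list of arguments) inside root and append an audit record.

    The log file name commands.jsonl is hypothetical; any stable path works.
    """
    root = Path(root)
    result = subprocess.run(cmd, cwd=root, capture_output=True, text=True)
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "cmd": cmd,
        "returncode": result.returncode,
        "stdout_tail": result.stdout[-500:],
        "stderr_tail": result.stderr[-500:],
    }
    log_path = root / "artifacts" / "commands.jsonl"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append one JSON object per line so the log stays easy to diff and audit.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return result


if __name__ == "__main__":
    workspace = init_workspace("figure_workspace")
    # Placeholder command; a real run would call the chosen extraction tool here.
    run_logged(workspace, [sys.executable, "-c", "print('extraction step')"])
```

Logging each command as one JSON line keeps the record append-only, so a reviewer can replay the steps in order and match each one to the artifact it produced.
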
For paper figure reproduction, visual evidence is mandatory:
- preserve the original plotted image or the best available crop from the paper
- capture visible axis labels, tick marks, legends, annotations, and panel labels
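
One way to make the visual evidence above checkable is to record it in plot_semantics.json and refuse to write the file while any category is unaccounted for. The sketch below assumes a hypothetical schema: the field names, `missing_evidence`, and `write_plot_semantics` are illustrative choices, not a defined format.

```python
import json
from pathlib import Path

# Hypothetical field names mirroring the evidence categories listed above.
REQUIRED_FIELDS = [
    "reference_image",
    "axis_labels",
    "tick_marks",
    "legends",
    "annotations",
    "panel_labels",
]


def missing_evidence(semantics):
    """Return the required fields that are absent from the record."""
    return [field for field in REQUIRED_FIELDS if field not in semantics]


def write_plot_semantics(path, semantics):
    """Write the record as JSON, failing if any evidence category is missing.

    An empty list is accepted: it records that the category was checked and
    nothing was visible (for example, a figure with no legend).
    """
    missing = missing_evidence(semantics)
    if missing:
        raise ValueError("missing visual evidence fields: " + ", ".join(missing))
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(semantics, indent=2), encoding="utf-8")
    return path
```

Requiring every key while still allowing empty values separates "checked and not present in the figure" from "never checked", which is the distinction a reviewer needs when auditing a reproduction.
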