Convert local files or URLs with a locally deployed Docling Gradio service into Markdown, JSON, HTML, text, or DocTags, with OCR and image export support. Use when handling `.docx`, `.pdf`, `.pptx`, `.xlsx`, `.html`, images, or web pages for document-to-Markdown conversion, batch conversion, image extraction, or Docling-based parsing through `http://localhost:5001`.
Use this skill to run document conversion through a local Docling service instead of ad-hoc parsing.
Use the local Docling service at `http://localhost:5001`.
Use `scripts/docling_gradio_convert.py` for repeatable work. It wraps the documented Gradio API and handles submission, waiting, and archive extraction.
If `gradio_client` is missing, install it: `pip install gradio_client`
If `beautifulsoup4` is missing, install it: `pip install beautifulsoup4 lxml`
Read `references/gradio-api-workflow.md` only when changing endpoints, tuning advanced options, or debugging output layouts.

Classify the inputs. Use the file flow for local paths and the URL flow for web pages. Do not mix files and URLs in one API request; if the user gives both, run two jobs.
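
The classification step can be sketched in a few lines of Python. This is a minimal illustration, not the wrapper's actual code: the helper names `split_inputs` and `plan_jobs` are hypothetical, and it assumes that anything with an `http` or `https` scheme is a URL and everything else (including glob patterns) is a local path.

```python
from urllib.parse import urlparse


def split_inputs(inputs):
    """Partition raw inputs into (files, urls). Hypothetical helper."""
    files, urls = [], []
    for item in inputs:
        # Assumption: only http/https inputs take the URL flow.
        if urlparse(item).scheme.lower() in ("http", "https"):
            urls.append(item)
        else:
            files.append(item)
    return files, urls


def plan_jobs(inputs):
    """Return one job per input kind so files and URLs never share a request."""
    files, urls = split_inputs(inputs)
    jobs = []
    if files:
        jobs.append(("files", files))
    if urls:
        jobs.append(("urls", urls))
    return jobs


if __name__ == "__main__":
    for kind, items in plan_jobs(["report.pdf", "https://example.com/article"]):
        print(kind, items)
```

Mixed input therefore yields two jobs, one per flow, which matches the rule of running files and URLs separately.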
Choose the outputs.
Default to `md`.
Add `json` when the user also needs structured output.
Add `html`, `text`, or `doctags` only when the task explicitly needs them.
Choose the processing options.
Keep `pipeline=standard`, `ocr=true`, `force_ocr=false`, `pdf_backend=dlparse_v4`, and `table_mode=accurate` unless the task calls for a change.
Keep `image_export_mode=embedded` when the goal is to preserve extracted images. The wrapper post-processes embedded Markdown images into real files under `images/`.
Turn on enrichment flags only when the user explicitly wants code, formulas, picture classification, or picture descriptions.
For URL jobs, the wrapper also normalizes Markdown output by injecting stable front matter, preserving unknown existing front matter keys, and prepending `# title` only when the document body does not already start with that title.
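
The URL normalization described above could look roughly like the following. This is a sketch under stated assumptions, not the wrapper's implementation: `split_front_matter` and `normalize_url_markdown` are hypothetical names, front matter is treated as flat `key: value` lines between `---` markers, and no YAML quoting or escaping is attempted.

```python
def split_front_matter(markdown):
    """Return (existing_keys, body) for Markdown with optional '---' front matter."""
    lines = markdown.split("\n")
    if lines and lines[0].strip() == "---":
        for end in range(1, len(lines)):
            if lines[end].strip() == "---":
                meta = {}
                for raw in lines[1:end]:
                    key, sep, value = raw.partition(":")
                    if sep:
                        meta[key.strip()] = value.strip()
                return meta, "\n".join(lines[end + 1:])
    # No front matter, or an unterminated block: treat everything as body.
    return {}, markdown


def normalize_url_markdown(markdown, known):
    """Inject known keys first, keep unknown existing keys, add a title heading if needed."""
    existing, body = split_front_matter(markdown)
    merged = dict(known)  # stable order: known keys first
    for key, value in existing.items():
        merged.setdefault(key, value)  # unknown keys are preserved, known keys win
    body = body.lstrip("\n")
    title = known.get("title", "")
    first_line = body.split("\n", 1)[0].strip() if body else ""
    # Prepend the heading only when the body does not already start with the title.
    if title and first_line.lstrip("#").strip() != title:
        body = f"# {title}\n\n{body}"
    front = "\n".join(f"{k}: {v}" for k, v in merged.items())
    return f"---\n{front}\n---\n\n{body}"
```

Running it twice on the same document leaves a single heading and a single front matter block, which is the property that makes the output stable.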
Run the wrapper script.

    # Single file
    python scripts/docling_gradio_convert.py report.pdf

    # Batch files with Markdown + JSON
    python scripts/docling_gradio_convert.py "*.pdf" --to-format md --to-format json

    # Single URL
    python scripts/docling_gradio_convert.py https://example.com/article --output-dir ./article

    # Single URL with optional sidecar files
    python scripts/docling_gradio_convert.py https://example.com/article --save-source-html --save-manifest

    # Alternate service URL
    python scripts/docling_gradio_convert.py slides.pptx --service-url http://localhost:5001

The wrapper sets `return_as_file=true`, downloads the returned artifact, extracts it into the chosen output directory, rewrites embedded Markdown images into local files when needed, and for URL conversions can backfill Docling image placeholders from the source page.
URL Markdown outputs are post-processed after extraction so the final .md contains normalized front matter plus a title heading when needed.
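
As an illustration of rewriting embedded Markdown images into local files, the sketch below extracts base64 data URIs into an `images/` folder. It is an assumption-laden example rather than the wrapper's code: the function name `externalize_images`, the regular expression, and the `image-001.png` naming scheme are all hypothetical.

```python
import base64
import re
from pathlib import Path

# Matches ![alt](data:image/<subtype>;base64,<payload>)
DATA_URI = re.compile(
    r"!\[([^\]]*)\]\(data:image/([a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=\s]+)\)"
)


def externalize_images(markdown, output_dir):
    """Write each embedded image to output_dir/images and point the link at the file."""
    images_dir = Path(output_dir) / "images"
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        alt, subtype, payload = match.groups()
        subtype = subtype.lower()
        ext = {"jpeg": "jpg", "svg+xml": "svg"}.get(subtype, subtype)
        name = f"image-{counter:03d}.{ext}"  # hypothetical naming scheme
        images_dir.mkdir(parents=True, exist_ok=True)
        (images_dir / name).write_bytes(base64.b64decode(payload))
        return f"![{alt}](images/{name})"

    rewritten = DATA_URI.sub(replace, markdown)
    return rewritten, counter
```

The function returns the rewritten Markdown together with the number of images written, so a caller can tell whether anything needs inspecting under `images/`.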
Inspect the produced Markdown plus any extracted image assets before presenting the result to the user.

Single-input jobs write to `docling-<slug>` under the current working directory.
Batch jobs write to `docling-files-batch` or `docling-urls-batch` under the current working directory, unless `--output-dir` is supplied.
If the user supplies `--output-dir` and both file and URL jobs are needed, the script creates `files/` and `urls/` subdirectories to keep the results separate.

Run `scripts/docling_gradio_convert.py --dry-run ...` to verify grouping, endpoint selection, and destination paths without contacting the service.
The wrapper normalizes the service URL, so `http://localhost:5001` becomes `http://localhost:5001/ui/`.
Query `/change_ocr_lang` for the default OCR language set when `--ocr-lang` is not provided. Fall back to `en,fr,de,es` if the endpoint is unavailable.
Treat a missing `gradio_client` installation as an environment issue and fix it with `pip install gradio_client` instead of rewriting the workflow.
If the URL Markdown contains `<!-- 🖼️❌ Image not available ... -->`, let the wrapper fetch the source page, collect article images, download them into `images/`, and replace placeholders in order.
The front matter keys are `url`, `title`, `description`, `author`, `published`, `cover_image`, `language`, `captured_at`, `converter`, `pipeline`, `ocr`, and `ocr_lang`.
Use `--save-source-html` to write `source.html` for single URL jobs, and `--save-manifest` to write `manifest.json` with the conversion settings and output summary.

`scripts/docling_gradio_convert.py`
Use this wrapper for deterministic Docling conversions. It supports:

`references/gradio-api-workflow.md`
Read this reference when you need:
wait_task_finish tuple layout