Extracts text (with locations) from images and PDF documents using PaddleOCR.
Invoke this skill in the following situations:
Do not use this skill in the following situations:
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
python scripts/ocr_caller.py
If the script execution fails (API not configured, network error, etc.):
Identify the input source:
- --file-url parameter
- --file-path parameter
- --file-path
Input type note:
Execute OCR:
python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
Or for local files:
python scripts/ocr_caller.py --file-path "file path" --pretty
Default behavior: save raw JSON to a temp file:
- If --output is omitted, the script saves automatically under the system temp directory
- Default path pattern: <system-temp>/paddleocr/text-recognition/results/result_<timestamp>_<id>.json
- If --output is provided, it overrides the default temp-file destination
- If --stdout is provided, JSON is printed to stdout and no file is saved
- In save mode, the script prints the saved path on stderr: Result saved to: /absolute/path/...
- Use --stdout only when you explicitly want to skip file persistence
Parse JSON response:
- Check the ok field: true means success, false means error
- The text field contains all recognized text
- If --stdout is used, parse the stdout JSON directly
- If ok is false, display error.message
Present results to user:
CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.
- Display the text field
- Show all the text content to the user, no matter how long it is
Correct approach:
I've extracted the text from the image. Here's the complete content:
[Display the entire text here]
Incorrect approach:
I found some text in the image. Here's a preview:
"The quick brown fox..." (truncated)
Example 1: URL OCR:
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
Example 2: Local File OCR:
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
Example 3: OCR With Explicit File Type:
python scripts/ocr_caller.py --file-url "https://example.com/input" --file-type 1 --pretty
Example 4: Print JSON Without Saving:
python scripts/ocr_caller.py --file-url "https://example.com/input" --stdout --pretty
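The four examples differ only in their flags, so the command line could also be assembled programmatically. The following is a minimal sketch, assuming the flags behave as described above; `build_ocr_command` and `find_saved_path` are hypothetical helper names and are not part of the skill's scripts.

```python
import re


def build_ocr_command(file_url=None, file_path=None, file_type=None,
                      output=None, stdout=False, pretty=True):
    """Assemble an argv list for scripts/ocr_caller.py (hypothetical helper)."""
    # Exactly one input source must be given.
    if (file_url is None) == (file_path is None):
        raise ValueError("provide exactly one of file_url or file_path")

    cmd = ["python", "scripts/ocr_caller.py"]
    if file_url is not None:
        cmd += ["--file-url", file_url]
    else:
        cmd += ["--file-path", file_path]
    if file_type is not None:
        cmd += ["--file-type", str(file_type)]
    if output is not None:
        cmd += ["--output", output]
    if stdout:
        cmd.append("--stdout")
    if pretty:
        cmd.append("--pretty")
    return cmd


def find_saved_path(stderr_text):
    """Return the path from a 'Result saved to: ...' stderr line, or None."""
    match = re.search(r"^Result saved to:\s*(.+)$", stderr_text, re.MULTILINE)
    return match.group(1).strip() if match else None
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; in save mode, `find_saved_path(proc.stderr)` would then recover the result file location, assuming the stderr line has the shape shown earlier.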
The output JSON structure is as follows:
{
"ok": true,
"text": "All recognized text here...",
"result": { ... },
"error": null
}
Key fields:
- ok: true for success, false for error
- text: Complete recognized text
- result: Raw API response (for debugging)
- error: Error details if ok is false
Raw result location (default): the temp-file path printed by the script on stderr
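Reading this envelope could look like the sketch below. It assumes `error` is either null or an object carrying a `message` key, as implied by the `error.message` instruction; `extract_text` is a hypothetical helper name.

```python
import json


def extract_text(raw_json):
    """Return the full recognized text, or raise with the reported error."""
    payload = json.loads(raw_json)
    if payload.get("ok"):
        # Return the text untouched: no truncation, no summarizing.
        return payload.get("text", "")

    error = payload.get("error")
    # Assumption: error is a dict with a "message" key; fall back to str() otherwise.
    if isinstance(error, dict):
        message = error.get("message", "unknown error")
    else:
        message = str(error) if error else "unknown error"
    raise RuntimeError(message)
```

The same function works for the saved temp file (read its contents first) and for `--stdout` output.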
You can generally assume that the required environment variables have already been configured. Only when an OCR task fails should you analyze the error message to determine whether it is caused by a configuration issue. If it is indeed a configuration problem, you should notify the user to fix it.
When API is not configured:
The error will show:
CONFIG_ERROR: PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com
Configuration workflow:
Show the exact error message to the user (including the URL).
Guide the user to configure securely:
- PADDLEOCR_OCR_API_URL
- PADDLEOCR_ACCESS_TOKEN
- Optional: PADDLEOCR_OCR_TIMEOUT
If the user provides credentials in chat anyway (accept any reasonable format), for example:
- PADDLEOCR_OCR_API_URL=https://xxx.paddleocr.com/ocr, PADDLEOCR_ACCESS_TOKEN=abc123...
- Here's my API: https://xxx and token: abc123
Then parse and validate the values:
- Extract PADDLEOCR_OCR_API_URL (look for URLs with paddleocr.com or similar)
- Check that PADDLEOCR_OCR_API_URL is a full endpoint ending with /ocr
- Extract PADDLEOCR_ACCESS_TOKEN (long alphanumeric string, usually 40+ chars)
Ask the user to confirm the environment is configured.
Retry only after confirmation:
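The value checks in the workflow above (a full endpoint ending with /ocr, a long alphanumeric token) could be sketched as follows. This is an illustration only: `check_credentials` is a hypothetical helper, and the 40-character threshold is the document's "usually" heuristic rather than a hard rule of the API.

```python
import re
from urllib.parse import urlparse


def check_credentials(api_url, token):
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []

    parsed = urlparse(api_url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("API URL is not an http(s) URL")
    elif not parsed.path.rstrip("/").endswith("/ocr"):
        problems.append("API URL should be a full endpoint ending with /ocr")

    # Heuristic from the text: long alphanumeric string, usually 40+ characters.
    if not re.fullmatch(r"[A-Za-z0-9]{40,}", token.strip()):
        problems.append("token does not look like a 40+ character alphanumeric string")

    return problems
```

The function reports problems without echoing the token, so its output is safe to show in chat.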
Authentication failed:
API_ERROR: Authentication failed (403). Check your token.
Quota exceeded:
API_ERROR: API rate limit exceeded (429)
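The error strings shown so far share a `PREFIX: message` shape with an optional HTTP status in parentheses. A minimal sketch of routing on that shape is given below; the format is inferred from the three examples only, `classify_error` is a hypothetical helper, and the suggested action for 429 is an assumption rather than something the skill prescribes.

```python
import re


def classify_error(message):
    """Map an error string to (kind, http_status, suggested_action)."""
    # Everything before the first colon is treated as the error prefix.
    prefix, _, _ = message.partition(":")
    status_match = re.search(r"\((\d{3})\)", message)
    status = int(status_match.group(1)) if status_match else None

    if prefix == "CONFIG_ERROR":
        return ("config", status, "ask the user to configure the environment")
    if prefix == "API_ERROR" and status == 403:
        return ("auth", status, "ask the user to check the access token")
    if prefix == "API_ERROR" and status == 429:
        # Assumed remedy; the source does not state one for this case.
        return ("quota", status, "wait before retrying")
    return ("unknown", status, "show the full error message to the user")
```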
No text detected:
- text field is empty
If recognition quality is poor, suggest:
For in-depth understanding of the OCR system, refer to:
- references/output_schema.md - Output format specification
Note: Model version, capabilities, and supported file formats are determined by your API endpoint (PADDLEOCR_OCR_API_URL) and its official API documentation.
To verify the skill is working properly:
python scripts/smoke_test.py
This tests configuration and API connectivity.
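What smoke_test.py checks internally is not shown here. As a rough local pre-check, the presence of the two required environment variables named earlier could be verified with a sketch like this (`missing_config` is a hypothetical helper and does not test connectivity):

```python
import os

REQUIRED_VARS = ("PADDLEOCR_OCR_API_URL", "PADDLEOCR_ACCESS_TOKEN")


def missing_config(environ=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

An empty list means both variables are set; it says nothing about whether the values are valid.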