Use this skill when users need to extract text from images, PDFs, or documents. Supports URLs and local files. Returns structured JSON containing recognized text.
Invoke this skill in the following situations:
Do not use this skill in the following situations:
MANDATORY RESTRICTIONS - DO NOT VIOLATE
python scripts/ocr_caller.py
If the script execution fails (API not configured, network error, etc.):
Identify the input source:
URL input: use the --file-url parameter
Local file input: use the --file-path parameter
Execute OCR:
python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
Or for local files:
python scripts/ocr_caller.py --file-path "file path" --pretty
Save result to file (recommended):
python scripts/ocr_caller.py --file-url "URL" --output result.json --pretty
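The input-source step and the commands above can be sketched as a small Python helper that builds the argument list. This is a minimal sketch. The function name build_ocr_command is hypothetical, and the helper assumes that any input with an http or https scheme is a URL and that everything else is a local path. It uses only the flags documented here: --file-url, --file-path, --output, and --pretty.

```python
import shlex
from typing import List, Optional
from urllib.parse import urlparse


def build_ocr_command(source: str, output: Optional[str] = None) -> List[str]:
    """Build an argv list for scripts/ocr_caller.py (hypothetical helper)."""
    # Assumption: http(s) inputs are URLs; anything else is a local file path.
    scheme = urlparse(source).scheme.lower()
    flag = "--file-url" if scheme in ("http", "https") else "--file-path"
    cmd = ["python", "scripts/ocr_caller.py", flag, source]
    if output is not None:
        # Save the JSON result to a file, as recommended above.
        cmd += ["--output", output]
    cmd.append("--pretty")
    return cmd


if __name__ == "__main__":
    cmd = build_ocr_command("https://example.com/invoice.jpg", output="result.json")
    print(shlex.join(cmd))
```

You can pass the list directly to subprocess.run(cmd, capture_output=True, text=True). Because no shell parses the arguments, paths that contain spaces or quotes need no extra escaping.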
Parse JSON response:
ok field: true means success, false means error
text field: contains all recognized text
If ok is false, display error.message
Present results to user:
CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.
Display everything in the text field
Show the entire text content to the user, no matter how long it is
Correct approach:
I've extracted the text from the image. Here's the complete content:
[Display the entire text here]
Incorrect approach:
I found some text in the image. Here's a preview:
"The quick brown fox..." (truncated)
URL OCR:
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
Local File OCR:
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
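The parse-and-present steps described earlier can be sketched in Python as follows. The function name render_ocr_response and the wording of its messages are hypothetical. The sketch also assumes that error is either an object with a message key or a plain string. It shows one point in particular: the text field is returned whole and never sliced.

```python
import json


def render_ocr_response(raw_json: str) -> str:
    """Build the user-facing message from ocr_caller.py JSON output."""
    resp = json.loads(raw_json)
    if not resp.get("ok"):
        # Assumption: error is an object with "message", or a plain string.
        error = resp.get("error")
        message = error.get("message") if isinstance(error, dict) else error
        return f"OCR failed: {message or 'unknown error'}"
    text = resp.get("text") or ""
    if not text.strip():
        return "No text was detected in the input."
    # Return the text field in full: never slice, truncate, or summarize it.
    return "I've extracted the text. Here's the complete content:\n\n" + text
```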
The script outputs JSON with the following structure:
{
"ok": true,
"text": "All recognized text here...",
"result": { ... },
"error": null
}
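To sanity-check a response before presenting it, you can test it against the shape above with a few lines of Python. This sketch assumes that all four keys are always present, as in the sample. Its rules come from this document's field descriptions, not from references/output_schema.md, which remains the authoritative specification. The name check_ocr_envelope is hypothetical.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("ok", "text", "result", "error")


def check_ocr_envelope(resp: Dict[str, Any]) -> List[str]:
    """Return problems found in an ocr_caller.py response; empty means OK."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in resp]
    if problems:
        return problems
    if not isinstance(resp["ok"], bool):
        problems.append("ok must be a boolean")
    if resp["ok"] and not isinstance(resp["text"], str):
        problems.append("text must be a string when ok is true")
    if resp["ok"] and resp["error"] is not None:
        problems.append("error should be null when ok is true")
    if not resp["ok"] and resp["error"] is None:
        problems.append("error details expected when ok is false")
    return problems


if __name__ == "__main__":
    sample = {"ok": True, "text": "All recognized text here...", "result": {}, "error": None}
    print(check_ocr_envelope(sample))
```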
Key fields:
ok: true for success, false for error
text: Complete recognized text
result: Raw API response (for debugging)
error: Error details if ok is false
When API is not configured:
The error will show:
CONFIG_ERROR: PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com
Configuration workflow:
Show the exact error message to the user (including the URL)
Tell the user to provide credentials:
Please visit the URL above to get your PADDLEOCR_OCR_API_URL and PADDLEOCR_ACCESS_TOKEN.
Once you have them, send them to me and I'll configure it automatically.
When the user provides credentials (accept any format):
PADDLEOCR_OCR_API_URL=https://xxx.paddleocr.com/ocr, PADDLEOCR_ACCESS_TOKEN=abc123...
Here's my API: https://xxx and token: abc123
Parse credentials from user's message:
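One way to pull both values out of a free-form message is to use a pair of regular expressions. This is a heuristic sketch, not part of configure.py, and parse_credentials is a hypothetical helper. It assumes that the first http(s) URL in the message is the API URL and that the token follows either PADDLEOCR_ACCESS_TOKEN= or the word "token". If either value comes back as None, ask the user rather than guessing.

```python
import re
from typing import Optional, Tuple

# Heuristic patterns (assumptions): first http(s) URL, and the value after
# PADDLEOCR_ACCESS_TOKEN= / "token:" / "token is".
_URL_RE = re.compile(r"https?://[^\s,;'\"]+")
_TOKEN_RE = re.compile(
    r"(?:PADDLEOCR_ACCESS_TOKEN\s*[=:]\s*|\btoken\b\s*(?:is\s*)?[=:]?\s*)([A-Za-z0-9._\-]+)",
    re.IGNORECASE,
)


def parse_credentials(message: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (api_url, token); either is None if not found."""
    url_match = _URL_RE.search(message)
    token_match = _TOKEN_RE.search(message)
    # rstrip(".") drops sentence-ending punctuation caught by the patterns.
    url = url_match.group(0).rstrip(".") if url_match else None
    token = token_match.group(1).rstrip(".") if token_match else None
    return url, token
```

You can then pass the parsed values to scripts/configure.py as separate list elements after --api-url and --token. That way the token never goes through shell quoting.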
Configure automatically:
python scripts/configure.py --api-url "PARSED_URL" --token "PARSED_TOKEN"
If configuration succeeds:
If configuration fails:
Authentication failed:
API_ERROR: Authentication failed (403). Check your token.
Quota exceeded:
API_ERROR: API rate limit exceeded (429)
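The error strings shown above share a recognizable shape, so a caller can branch on them. This sketch assumes that every failure message starts with CONFIG_ERROR or API_ERROR and embeds the HTTP status in parentheses, as in the examples here. The name classify_ocr_error and its labels are hypothetical. Unrecognized messages fall through to "other", and you should show them to the user verbatim.

```python
import re


def classify_ocr_error(message: str) -> str:
    """Label an ocr_caller.py error message by failure type (hypothetical labels)."""
    if message.startswith("CONFIG_ERROR"):
        return "config_missing"  # run the configuration workflow
    # Pull a three-digit HTTP status such as (403) or (429) out of the message.
    status = re.search(r"\((\d{3})\)", message)
    code = int(status.group(1)) if status else None
    if code == 403:
        return "auth_failed"  # credentials rejected: check the token
    if code == 429:
        return "quota_exceeded"  # rate limit or quota hit
    return "other"  # show the message to the user verbatim
```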
No text detected:
text field is empty
If recognition quality is poor, suggest:
For in-depth understanding of the OCR system, refer to:
references/output_schema.md - Output format specification
references/provider_api.md - Provider API contract
Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_OCR_API_URL).
To verify the skill is working properly:
python scripts/smoke_test.py
This tests configuration and API connectivity.