Automates the process of removing specific pages and redacting confidential text from PDF documents using the core logic scripts.
This skill explains how to use cli.py, or the core logic in the workspace, to perform batch PDF editing tasks.
Use it to redact specific text or remove unwanted pages from a set of PDF files, recursively if needed, without any manual UI interaction (e.g. for CI/CD or agentic automation).
Verify Dependencies: Ensure the virtual environment has all requirements installed from requirements.txt (specifically pymupdf).
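The dependency check can also be done from a script by asking Python whether the module can be found. This is a minimal sketch: the helper names missing_modules and has_pymupdf are hypothetical, and it assumes PyMuPDF is importable either as fitz (its long-standing import name) or as pymupdf (exposed by recent releases).

```python
import importlib.util


def missing_modules(candidates):
    """Return the names in `candidates` that cannot be found in this environment."""
    return [name for name in candidates if importlib.util.find_spec(name) is None]


def has_pymupdf():
    # Assumption: PyMuPDF installs under at least one of these two import names.
    return len(missing_modules(["pymupdf", "fitz"])) < 2


if __name__ == "__main__":
    print("PyMuPDF available:", has_pymupdf())
```

If this reports False, install the requirements into the virtual environment before running cli.py.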
Execution via CLI:
You can run the script located at cli.py for any target PDF.
Remove single or multiple pages (1-indexed):
python cli.py --input path/source.pdf --output path/dest.pdf --remove-pages 1,4,5
Redact specific text:
python cli.py --input path/source.pdf --output path/dest.pdf --redact-text "Confidential"
Both operations:
python cli.py --input path/source.pdf --output path/dest.pdf --remove-pages 2 --redact-text "Secret"
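When the commands above need to be launched from Python (for example in a CI job), the argument list can be assembled in one place. The sketch below uses only the flags shown above. The function names build_command and run_cli are hypothetical, and it assumes cli.py sits in the current working directory.

```python
import subprocess
import sys


def build_command(src, dest, remove_pages=None, redact_text=None, script="cli.py"):
    """Assemble the argument list for cli.py from the flags documented above."""
    cmd = [sys.executable, script, "--input", str(src), "--output", str(dest)]
    if remove_pages:
        # cli.py takes 1-indexed page numbers as a comma-separated list.
        cmd += ["--remove-pages", ",".join(str(p) for p in remove_pages)]
    if redact_text:
        cmd += ["--redact-text", redact_text]
    return cmd


def run_cli(src, dest, **options):
    # check=True raises CalledProcessError when the process exits with a non-zero status.
    return subprocess.run(build_command(src, dest, **options), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the redaction text contains spaces.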
Batch Processing (Agent task):
If you (the Agent) need to iterate over a folder, write a standard Python script that traverses the folder and calls core.pdf_processor.delete_pages / redact_text programmatically, rather than spawning a separate CLI call per file. This avoids the overhead of starting a new process for each PDF.
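A folder traversal of this kind might look like the sketch below. The exact signatures of delete_pages and redact_text are not given here, so the per-file work is passed in as a callable. batch_process and process_pdf are hypothetical names, and the callable is assumed to take a source path and a destination path.

```python
from pathlib import Path


def batch_process(in_root, out_root, process_pdf):
    """Call process_pdf(src, dest) for every PDF under in_root, mirroring the tree in out_root."""
    in_root, out_root = Path(in_root), Path(out_root)
    processed = []
    # sorted() collects the file list up front, so files written during the run are not revisited.
    for src in sorted(in_root.rglob("*.pdf")):
        dest = out_root / src.relative_to(in_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        process_pdf(src, dest)
        processed.append(dest)
    return processed
```

In practice process_pdf would wrap the calls into core.pdf_processor. Check core/pdf_processor.py for the actual parameters before wiring it up. Note that the "*.pdf" pattern is case-sensitive on some platforms, so files named .PDF may be skipped.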
cli.py and the UI are 1-indexed (for humans). core/pdf_processor.py expects 0-indexed page arguments (as PyMuPDF does). Make sure you handle this conversion when calling the core functions from Python code.
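The conversion from 1-indexed to 0-indexed pages is a simple subtraction, but it is easy to forget. A minimal sketch follows. The helper name parse_pages_one_indexed is hypothetical, and it assumes the same comma-separated format that --remove-pages accepts.

```python
def parse_pages_one_indexed(spec):
    """Turn a CLI-style string such as "1,4,5" into 0-indexed page numbers."""
    pages = []
    for token in spec.split(","):
        number = int(token.strip())
        if number < 1:
            # Page 0 does not exist in the human-facing numbering.
            raise ValueError(f"page numbers start at 1, got {number}")
        pages.append(number - 1)
    return pages
```

For example, parse_pages_one_indexed("1,4,5") yields [0, 3, 4], which is the form the core functions expect.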