Intelligent screenshot and screen capture with OCR, markdown conversion, and annotation. Triggered by the PrtSc key, it captures screen regions, extracts text with OCR, converts the result to markdown using MarkItDown, and saves it with auto-formatting. Similar to the Windows Snipping Tool, with AI enhancements for text extraction and document processing.
Intelligent screen capture with OCR, markdown conversion, and smart formatting. Capture screen regions, extract text, convert images/PDFs to markdown, and save with automated formatting.
Trigger methods:
PrtSc (customizable)
python scripts/capture.py
Workflow:
Core (required):
# Screenshot and OCR
pip install pillow pyautogui mss pytesseract pyscreenshot --break-system-packages
# MarkItDown (Microsoft's converter)
pip install markitdown --break-system-packages
# Keyboard hooks
pip install keyboard pynput --break-system-packages
# GUI for dialogs
# tkinter cannot be installed with pip; it ships with most Python installers.
# If it is missing on Debian/Ubuntu:
sudo apt-get install python3-tk
OCR engine (Tesseract):
Windows:
# Download installer from:
# https://github.com/UB-Mannheim/tesseract/wiki
# Install to: C:\Program Files\Tesseract-OCR\
# Add to PATH
macOS:
brew install tesseract
Linux:
sudo apt-get install tesseract-ocr
# or
sudo dnf install tesseract
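Once Tesseract is installed, pytesseract still has to find the binary. The following is a minimal sketch of one way to do that: it checks PATH first and then falls back to the Windows install directory given above. The helper names `find_tesseract` and `configure_pytesseract` are illustrative and are not taken from the project's scripts.

```python
import os
import shutil

# Default location used by the Windows installer mentioned above.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(extra_candidates=(WINDOWS_DEFAULT,)):
    """Return a path to the tesseract binary, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the located binary (hypothetical helper)."""
    import pytesseract  # imported lazily so the lookup works without it

    path = find_tesseract()
    if path is None:
        raise RuntimeError("Tesseract not found; install it or add it to PATH")
    # pytesseract reads the binary location from this module attribute.
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at startup avoids the common "tesseract is not installed or it's not in your PATH" error when the installer did not update PATH.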
Optional enhancements:
# Better OCR (EasyOCR - slower but more accurate)
pip install easyocr --break-system-packages
# PDF handling
pip install pdf2image pypdf2 --break-system-packages
# Image enhancement
pip install opencv-python --break-system-packages
# Clipboard integration
pip install pyperclip --break-system-packages
See reference/setup-guide.md for detailed installation.
1. Region Selection
2. Window Capture
3. Full Screen
4. Scrolling Capture
OCR Engines:
Smart text processing:
Using MarkItDown (Microsoft):
Conversion features:
Keyboard shortcut:
# Run as background service
python scripts/screenshot_service.py
# Now press PrtSc anytime:
# 1. Screen freezes
# 2. Choose "Image" or "Text"
# 3. Select region
# 4. Auto-process and save
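The background service can be pictured as a global key listener that reacts only to PrtSc. This is a rough sketch using pynput, one of the two hook libraries installed above. Whether `screenshot_service.py` actually uses pynput or `keyboard` is an assumption, and `make_on_press` and `run_service` are hypothetical names.

```python
def make_on_press(trigger, on_trigger):
    """Build a pynput-style on_press callback that reacts to one key only."""

    def on_press(key):
        if key == trigger:
            on_trigger()
        # Always return None: returning False would stop a pynput listener.
        return None

    return on_press


def run_service(on_trigger):
    """Block forever, calling on_trigger each time PrtSc is pressed."""
    from pynput import keyboard  # imported lazily; needs a desktop session

    handler = make_on_press(keyboard.Key.print_screen, on_trigger)
    with keyboard.Listener(on_press=handler) as listener:
        listener.join()


if __name__ == "__main__":
    run_service(lambda: print("capture requested"))
```

The callback runs on the listener's thread, so a real service would probably hand the capture off to another thread or process rather than run it inline.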
Command line:
# Capture with UI
python scripts/capture.py
# Capture full screen immediately
python scripts/capture.py --fullscreen --output screenshot.png
# Capture region with coordinates
python scripts/capture.py --region 100,100,800,600 --output region.png
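As an illustration of what `--region` might do internally, the sketch below parses the coordinate string and grabs that rectangle with mss. It assumes the four numbers mean left, top, width, and height. The source does not say whether the last two are a size or a second corner, so check the script before relying on this. The function names are illustrative.

```python
def parse_region(spec):
    """Parse "left,top,width,height" into the dict form that mss expects."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated integers")
    left, top, width, height = (int(p) for p in parts)
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {"left": left, "top": top, "width": width, "height": height}


def capture_region(spec, output):
    """Grab the region described by spec and write it to output as PNG."""
    import mss  # imported lazily; needs a display to run
    import mss.tools

    region = parse_region(spec)
    with mss.mss() as sct:
        shot = sct.grab(region)
        mss.tools.to_png(shot.rgb, shot.size, output=output)
    return output
```

For example, `capture_region("100,100,800,600", "region.png")` would save an 800x600 image whose top-left corner is at (100, 100) under the stated assumption.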
Interactive:
# Start capture
python scripts/capture.py --mode text
# Process:
# 1. Select region
# 2. OCR extracts text
# 3. MarkItDown formats
# 4. Save dialog opens
# 5. Save as .md file
Automatic:
# Capture and OCR
python scripts/capture_text.py --output extracted.md
# With specific language
python scripts/capture_text.py --lang eng+fra --output text.md
# With enhancement
python scripts/capture_text.py --enhance --output clean.md
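The `--lang` and `--enhance` options map naturally onto pytesseract and Pillow. The sketch below shows one plausible shape for that step, not the script's actual code. The enhancement (grayscale, autocontrast, 2x upscale) and the text cleanup rules are assumptions chosen for illustration, and `tidy_ocr_text` and `ocr_image` are hypothetical names.

```python
import re


def tidy_ocr_text(raw):
    """Light cleanup of raw OCR output before it is saved as markdown."""
    text = raw.replace("\r\n", "\n")
    # Rejoin words split by a hyphen at a line break (may also join
    # genuinely hyphenated compounds; acceptable for a sketch).
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop trailing whitespace on every line.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Collapse runs of blank lines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


def ocr_image(path, lang="eng", enhance=False):
    """Run Tesseract on an image file and return tidied text."""
    from PIL import Image, ImageOps  # imported lazily
    import pytesseract

    image = Image.open(path)
    if enhance:
        # Assumed enhancement: grayscale, stretch contrast, upscale 2x.
        image = ImageOps.autocontrast(ImageOps.grayscale(image))
        image = image.resize((image.width * 2, image.height * 2), Image.LANCZOS)
    # Tesseract accepts several languages joined with "+", e.g. "eng+fra".
    return tidy_ocr_text(pytesseract.image_to_string(image, lang=lang))
```

Each language passed in `lang` needs its trained data file installed alongside Tesseract. Otherwise the call fails.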
Interactive:
# Start capture
python scripts/capture.py --mode image
# Process:
# 1. Select region
# 2. Annotation tools appear
# 3. Add arrows, boxes, text
# 4. Save dialog opens
With annotations:
# Capture and annotate
python scripts/capture_annotate.py --output annotated.png
# Annotation tools:
# - Arrow
# - Rectangle
# - Circle
# - Text
# - Highlight
# - Blur (redact sensitive info)
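The blur tool can be approximated with Pillow by blurring a cropped patch and pasting it back. This is a sketch under assumptions: the box format (two drag corners in pixels), the default radius, and the function names are illustrative and do not come from `capture_annotate.py`.

```python
def clamp_box(box, size):
    """Normalise a dragged (x0, y0, x1, y1) box and clip it to the image."""
    x0, y0, x1, y1 = box
    width, height = size
    left, right = sorted((min(max(x0, 0), width), min(max(x1, 0), width)))
    top, bottom = sorted((min(max(y0, 0), height), min(max(y1, 0), height)))
    if left == right or top == bottom:
        raise ValueError("selection has no area inside the image")
    return (left, top, right, bottom)


def blur_region(in_path, out_path, box, radius=12):
    """Blur one rectangle of an image and save the result."""
    from PIL import Image, ImageFilter  # imported lazily

    with Image.open(in_path) as source:
        image = source.convert("RGB")
    area = clamp_box(box, image.size)
    # Blur only the selected patch, then paste it back in place.
    patch = image.crop(area).filter(ImageFilter.GaussianBlur(radius))
    image.paste(patch, area)
    image.save(out_path)
    return area
```

A light blur can sometimes be partially reversed, so for truly sensitive text a solid filled rectangle is the safer form of redaction.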
Convert PDF to markdown:
# Using MarkItDown
python scripts/pdf_to_markdown.py --input document.pdf --output document.md
# With OCR for scanned PDFs
python scripts/pdf_to_markdown.py --input scanned.pdf --ocr --output text.md
# Batch convert folder
python scripts/batch_pdf_convert.py --input ./pdfs/ --output ./markdown/
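For the batch case, the core loop is small. The sketch below uses MarkItDown's `convert()` call and mirrors the input folder layout under the output folder. The function names are hypothetical, the `--ocr` path for scanned PDFs is not covered, and recent MarkItDown releases may need the PDF extra installed for `.pdf` input.

```python
from pathlib import Path


def markdown_target(pdf, input_dir, output_dir):
    """Map an input PDF path to its mirrored .md path under output_dir."""
    relative = Path(pdf).relative_to(input_dir)
    return Path(output_dir) / relative.with_suffix(".md")


def convert_batch(input_dir, output_dir):
    """Convert every PDF under input_dir; return the written paths."""
    from markitdown import MarkItDown  # imported lazily

    converter = MarkItDown()
    written = []
    for pdf in sorted(Path(input_dir).rglob("*.pdf")):
        target = markdown_target(pdf, input_dir, output_dir)
        result = converter.convert(str(pdf))
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(result.text_content, encoding="utf-8")
        written.append(target)
    return written
```

With this layout, `./pdfs/reports/q1.pdf` would end up as `./markdown/reports/q1.md`. A scanned PDF with no text layer would likely produce an empty or near-empty file without a separate OCR step.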
Process existing image:
# Extract text to markdown
python scripts/image_to_markdown.py --input screenshot.png --output text.md
# Clean up image first
python scripts/enhance_and_extract.py --input noisy.png --output clean.md
Settings file: config.yaml
# Keyboard shortcut