Automatically process unprocessed audio and image files in Gastrohem daily WhatsApp folders. This skill should be used when the user asks to transcribe audio files, perform OCR on images, or process media in daily folders (e.g., "Process media in today's folder", "Transcribe audio and OCR images in 24.10 folder"). Handles audio transcription using insanely-fast-whisper (parallelized, creates .json) and image OCR using Claude's vision capabilities (creates natural .md summaries with Gastrohem-relevant info).
Automatically processes WhatsApp media files (audio and images) in Gastrohem daily folders.
What it does:
Audio: transcribed to .json files
Images: OCR'd to .md files with a natural-language summary
Performance: audio files are transcribed in parallel.
Note: The skill creates .json for audio and .md for images; adding them to chat.md is a separate step.
User says: "Process media in today's folder" or "Transcribe audio and OCR images in 24.10 folder".
Default behavior: Uses today's date, scans all departments automatically.
Process all media for today (DEFAULT):
python .claude/skills/gastrohem-media-processor/scripts/process_media.py
Process specific date:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py --scan-date 24.10
Process specific folder:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py --folder "gastrohem whatsapp/administracija/20.10 - 27.10/24.10"
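The date scan can be pictured with a short sketch. It assumes the layout implied by the --folder example (base path / department / week range / DD.MM); the function name `find_day_folders` is hypothetical and is not taken from the actual script.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def find_day_folders(base_path: str, scan_date: Optional[str] = None) -> List[Path]:
    """Return every <base>/<department>/<week range>/<DD.MM> folder for one day."""
    if scan_date is None:
        # Default to today, formatted like the folder names (e.g. "24.10").
        scan_date = date.today().strftime("%d.%m")
    base = Path(base_path)
    # Assumed layout: department / week range / day, as in the --folder example.
    return sorted(p for p in base.glob(f"*/*/{scan_date}") if p.is_dir())
```

Calling `find_day_folders("gastrohem whatsapp", "24.10")` would then return one folder per department that has a 24.10 day folder, and an empty list when none exists.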
Audio Processing:
Finds audio files (.mp3, .ogg, .m4a, .wav, .opus)
Skips files that already have .json transcriptions
Output: {audio_filename}.json
{
"speakers": [],
"chunks": [...],
"text": "Full transcribed text here"
}
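Since adding transcripts to chat.md is a separate step, a later step needs to read the text back out of the .json file. A minimal sketch, assuming only the three keys shown above; the helper name `load_transcript_text` is hypothetical.

```python
import json
from pathlib import Path


def load_transcript_text(json_path: str) -> str:
    """Return the full transcribed text from one transcription .json file."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # "text" holds the whole transcript; "chunks" and "speakers" are ignored here.
    return data.get("text", "").strip()
```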
Image Processing:
Finds image files (.png, .jpg, .jpeg, .webp, .bmp)
Skips images that already have .md files
Output: {image_filename}.md
# image.png
**Poslao:** Mahir Kadic
**Datum:** 26.10.2025 13:58
---
[Natural language summary focusing on Gastrohem-relevant information:
contacts, names, emails, phone numbers, business details, etc.]
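The "which images still need OCR" check can be sketched as below. This is an illustration rather than the skill's own code: the function name `images_needing_ocr` is hypothetical, and it assumes the summary is stored next to the image as `<image name>.md` (for example image.png.md).

```python
from pathlib import Path
from typing import List

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def images_needing_ocr(folder_path: str) -> List[Path]:
    """List images in the folder that have no matching .md summary yet."""
    pending = []
    for path in sorted(Path(folder_path).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Assumption: the summary sits next to the image as "<image name>.md",
        # e.g. image.png -> image.png.md.
        if not Path(str(path) + ".md").exists():
            pending.append(path)
    return pending
```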
The skill now uses three modular scripts for better organization:
process_audio.py
Purpose: Audio transcription only (parallelized)
Usage:
python scripts/process_audio.py "path/to/folder" [--max-workers 3] [--output-json results.json]
Arguments:
folder - Path to folder containing audio files
--max-workers N - Max parallel processes (default: 3)
--no-skip-existing - Re-transcribe files with existing JSON
--output-json FILE - Save results to JSON
What it does:
Finds audio files (.mp3, .ogg, .m4a, .wav, .opus)
Skips files that already have .json transcriptions
Output: {audio_filename}.json
Requirements: insanely-fast-whisper in PATH
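The parallel transcription step might look roughly like the sketch below. It is not the script's actual code: the function names are hypothetical, the output name `<audio name>.json` is an assumption, and the CLI call uses the `--file-name` and `--transcript-path` options of insanely-fast-whisper. The `transcribe` parameter exists only so the worker can be swapped out, for example in a dry run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List, Tuple

AUDIO_EXTENSIONS = {".mp3", ".ogg", ".m4a", ".wav", ".opus"}


def run_whisper(audio: Path, out_json: Path) -> bool:
    """Transcribe one file with the insanely-fast-whisper CLI; True on success."""
    cmd = ["insanely-fast-whisper", "--file-name", str(audio),
           "--transcript-path", str(out_json)]
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except FileNotFoundError:
        # The CLI is not on PATH.
        return False


def transcribe_folder(folder: str, max_workers: int = 3, skip_existing: bool = True,
                      transcribe: Callable[[Path, Path], bool] = run_whisper) -> Dict[str, bool]:
    """Transcribe every audio file in the folder, a few at a time."""
    jobs: List[Tuple[Path, Path]] = []
    for audio in sorted(Path(folder).iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        # Assumption: output is "<audio name>.json" next to the audio file.
        out_json = Path(str(audio) + ".json")
        if skip_existing and out_json.exists():
            continue
        jobs.append((audio, out_json))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda job: transcribe(*job), jobs)
        return {audio.name: ok for (audio, _), ok in zip(jobs, results)}
```

Threads are enough here because each worker only waits on an external process; the heavy work happens inside the CLI.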
process_images.py
Purpose: Image OCR helper functions
Usage:
# List images needing OCR
python scripts/process_images.py "path/to/folder" [--output-json images.json]
# Use in Python for batch processing
from process_images import save_ocr_md, batch_save_ocr, get_images_needing_ocr
Key functions:
get_images_needing_ocr(folder_path) - Returns list of images without .md files
save_ocr_md(image_file, summary, sender) - Save natural summary to .md file
batch_save_ocr(ocr_results) - Save multiple OCR results at once
Markdown structure:
# image.png
**Poslao:** Mahir Kadic
**Datum:** 26.10.2025 13:58
---
Natural language summary focusing on Gastrohem-relevant information:
contacts, names, emails, phone numbers, business details, etc.
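To make the structure concrete, here is a small sketch that renders it as a string. The function `render_ocr_md` and its `sent_at` parameter are hypothetical: the documented `save_ocr_md(image_file, summary, sender)` signature does not say where the date comes from, so this example takes it as an explicit argument.

```python
from datetime import datetime


def render_ocr_md(image_name: str, summary: str, sender: str, sent_at: datetime) -> str:
    """Build the .md text for one image in the structure shown above."""
    return (
        f"# {image_name}\n"
        f"**Poslao:** {sender}\n"
        f"**Datum:** {sent_at.strftime('%d.%m.%Y %H:%M')}\n"
        "---\n"
        f"{summary.strip()}\n"
    )
```

The returned text can then be written to the image's .md file with `Path.write_text(..., encoding="utf-8")`.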
process_media.py
Purpose: Master script combining audio + images
Usage:
# Scan all folders for a specific date
python scripts/process_media.py --scan-date DD.MM [--output-json results.json]
# Process a specific folder
python scripts/process_media.py --folder "path/to/folder" [--output-json results.json]
Arguments:
--scan-date DD.MM - Scan all departments for this date
--folder PATH - Process specific folder
--base-path PATH - Base path (default: gastrohem whatsapp)
--no-skip-existing - Re-process all files
--output-json FILE - Save results to JSON
What it does:
Runs process_audio.py for audio files
Uses process_images.py to find images needing OCR

Use --scan-date for daily processing - Automatically finds all folders for a specific date across all departments
Use --output-json to keep a record of processing results
Adding results to chat.md is a separate workflow

If transcription fails:
Check that insanely-fast-whisper is installed
If image OCR is unclear:
User: "Process media"
Claude:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py
Reports the .json files created
Reports the .md files created
User: "Process media for 24.10"
Claude:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py --scan-date 24.10
1. Parallel audio transcription:
Uses ThreadPoolExecutor for parallel execution
2. Batch OCR for images:
One .md file per image: image.png.md
save_ocr_md() for easy saving
3. Automatic scan of all folders:
Script structure:
process_audio.py - Audio only (parallel)
process_images.py - Image OCR helper functions
process_media.py - Master script (combines both)
Typical speed: