Convert lecture recordings (.vtt transcripts, optional .mp4 video and slides) into polished Markdown lecture notes with auto-extracted screenshots, embedded slides, speaker quotes, and Q&A sections. Use for any lecture, class, or talk recording.
Transform lecture recordings into polished, tutorial-style Markdown notes. Given a transcript (and optionally video, slides, and supplementary materials), produce a complete Markdown document with embedded images — including auto-extracted screenshots from demo moments in the video.
Execute these four phases in order. After Phase 0, confirm the inventory with the user before proceeding.
Scan the working directory and classify all relevant files:
| Type | Extensions | Role |
|---|---|---|
| Transcript | .vtt | Required — WebVTT from Zoom |
| Video | .mp4 | Optional — needed for auto-screenshots |
| Slides | .pptx, .pdf | Optional — embedded in notes |
| Supplementary | .md, .txt, .docx, papers | Optional — extra context |
| Manual screenshots | Images in screenshots/ | Optional — included alongside auto-extracted |
Check for required tools and Python packages:
- pymupdf — needed for PDF slide conversion. Check with python3 -c "import fitz"; install with pip install pymupdf if missing.
- opencv-python — needed for screenshot extraction from video. Check with python3 -c "import cv2"; install with pip install opencv-python if missing.
- python-pptx — needed for PPTX slide text extraction; only required if PPTX files are present. Check with python3 -c "import pptx"; install with pip install python-pptx if missing.

Install missing packages with pip. All core functionality uses Python packages to avoid system dependency issues.
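The availability checks above can be sketched in Python without actually importing the packages. This is an illustrative helper (check_packages is a hypothetical name, not part of any bundled script); note that the pip package names differ from the import names for all three packages:

```python
import importlib.util

# pip install names mapped to their import names (they differ for all three)
PACKAGES = {"pymupdf": "fitz", "opencv-python": "cv2", "python-pptx": "pptx"}

def check_packages(packages=PACKAGES):
    """Return {pip_name: 'available' | 'missing'} without importing anything."""
    return {pip: "available" if importlib.util.find_spec(mod) else "missing"
            for pip, mod in packages.items()}

if __name__ == "__main__":
    for pkg, state in check_packages().items():
        print(f"- {pkg} (Python): {state}")
```

Using find_spec avoids the side effects of a real import while still reporting what is installed.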
Present inventory to the user:
## Discovered Files
- Transcript: [filename.vtt]
- Video: [filename.mp4] (or "not found")
- Slides: [filename.pptx] (or "not found")
- Supplementary: [list or "none"]
- Manual screenshots: [count] images in screenshots/ (or "none")
## Tool Availability
- pymupdf (Python): [available/installed/missing]
- opencv-python (Python): [available/installed/missing — needed for video screenshots]
- python-pptx (Python): [available/installed/missing — only needed for PPTX slides]
Wait for user confirmation before proceeding.
If the transcript .vtt file is not found, stop and ask the user to provide one.
Skip this phase if no slide deck is found.
Goal: Convert slides into individual PNG images at slides/slide-NNN.png.
For PDF slides (preferred — Python-based, no external tools needed):
Use the bundled script (auto-installs pymupdf if missing):
python3 <skill-path>/scripts/pdf_to_slides.py "slides.pdf" slides/
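The core of such a conversion can be sketched as follows. This is an illustrative sketch, not the bundled script itself (pdf_to_slides and slide_name are hypothetical names); it assumes pymupdf is installed:

```python
from pathlib import Path

def slide_name(index):
    """Zero-padded slide filename, e.g. slide-001.png."""
    return f"slide-{index:03d}.png"

def pdf_to_slides(pdf_path, out_dir, dpi=200):
    """Render each PDF page to out_dir/slide-NNN.png at the given DPI."""
    import fitz  # pymupdf; imported lazily so slide_name works without it
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    matrix = fitz.Matrix(dpi / 72, dpi / 72)  # PDF points are 72 per inch
    doc = fitz.open(pdf_path)
    for i, page in enumerate(doc, start=1):
        page.get_pixmap(matrix=matrix).save(str(out / slide_name(i)))
    return len(doc)
```

Rendering at 200 DPI keeps slide text legible when the images are embedded in the notes.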
For PPTX slides:
First check if libreoffice is available (which libreoffice or which soffice). If so, convert to PDF then to PNGs for high-quality slide images:
libreoffice --headless --convert-to pdf presentation.pptx --outdir .
Then use the pymupdf PDF-to-PNG conversion above on the resulting PDF.
If libreoffice is not available, use the bundled script to extract structured text from each slide for transcript alignment (auto-installs python-pptx if missing):
python3 <skill-path>/scripts/pptx_to_text.py "presentation.pptx" -o slides_text.json
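The extraction step can be sketched like this (an illustrative sketch, not the bundled script; pptx_to_text and slide_record are hypothetical names, and it assumes python-pptx is installed):

```python
import json

def slide_record(number, texts):
    """One JSON-serializable record per slide."""
    return {"slide": number, "text": texts}

def pptx_to_text(pptx_path):
    """Collect the text of every shape with a text frame, slide by slide."""
    from pptx import Presentation  # python-pptx; imported lazily
    prs = Presentation(pptx_path)
    return [slide_record(i, [sh.text_frame.text
                             for sh in slide.shapes if sh.has_text_frame])
            for i, slide in enumerate(prs.slides, start=1)]

def save_json(records, out_path):
    with open(out_path, "w") as f:
        json.dump(records, f, indent=2)
```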
Since python-pptx cannot render slides as images, rely on video screenshots to provide visuals for PPTX-based sessions when LibreOffice is unavailable.
Fallback chain for PDF slides if pymupdf is unavailable:
1. pdftoppm (from poppler): pdftoppm -png -r 200 slides.pdf slides/slide, then rename the output to the zero-padded slide-NNN.png format
2. convert (ImageMagick): convert -density 200 slides.pdf slides/slide-%03d.png

After conversion, report the number of slides extracted.
Read the .vtt file and parse it into structured entries:
{ index, start_time, end_time, speaker, text }
VTT format rules:
- Cue lines contain timestamps: HH:MM:SS.mmm --> HH:MM:SS.mmm
- The speaker appears as Speaker Name: text on the text line
- Skip the WEBVTT header and any NOTE blocks

Merge consecutive entries from the same speaker into coherent paragraphs for analysis.
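The parsing step can be sketched as follows — a minimal sketch that keeps only the first text line of each cue and assumes the Zoom-style "Speaker Name: text" convention (parse_vtt is an illustrative name):

```python
import re

TIMESTAMP = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")

def parse_vtt(text):
    """Parse WEBVTT text into {index, start_time, end_time, speaker, text} dicts."""
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        m = TIMESTAMP.search(line)
        if not m:
            continue  # skips the WEBVTT header, NOTE blocks, cue numbers, blanks
        cue = next(lines, "").strip()
        speaker, _, spoken = cue.partition(": ")
        if not spoken:  # no "Speaker Name: " prefix on this cue
            speaker, spoken = None, cue
        entries.append({"index": len(entries), "start_time": m.group(1),
                        "end_time": m.group(2), "speaker": speaker,
                        "text": spoken})
    return entries
```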
Identify timestamps where visual content is important using three methods:
Method 1 — Regex triggers. Flag entries containing phrases like:
Method 2 — Gap detection. Flag gaps where start_time(entry N+1) - end_time(entry N) > 3 seconds. Silent pauses often indicate visual activity (typing, navigating, waiting for output).
Method 3 — Slide transitions. Flag phrases indicating slide changes:
Collect all flagged timestamps into a deduplicated list.
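Methods 1 and 2 plus the deduplication can be sketched together. The trigger phrases passed in below are only example assumptions, and flag_timestamps is an illustrative name:

```python
import re

def to_seconds(ts):
    """Convert 'HH:MM:SS.mmm' to seconds as a float."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def flag_timestamps(entries, trigger_phrases, min_gap=3.0):
    """Regex triggers plus silent-gap detection, returned sorted and deduplicated."""
    pattern = re.compile("|".join(map(re.escape, trigger_phrases)), re.IGNORECASE)
    flagged = set()
    for i, entry in enumerate(entries):
        if pattern.search(entry["text"]):
            flagged.add(to_seconds(entry["start_time"]))
        if i + 1 < len(entries):
            gap_start = to_seconds(entries[i + 1]["start_time"])
            if gap_start - to_seconds(entry["end_time"]) > min_gap:
                flagged.add(gap_start)  # silence often covers on-screen activity
    return sorted(flagged)
```

Returning a sorted set gives the deduplicated timestamp list that Phase 2.2 consumes.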
Skip if no .mp4 video is found.
For each detected timestamp, extract a frame with a 2-second offset (to let the screen settle).
Preferred method — Python with opencv-python (no system dependencies):
Use the bundled script (auto-installs opencv-python if missing). Pass timestamps as a JSON array of seconds:
python3 <skill-path>/scripts/extract_frames.py "recording.mp4" screenshots/ '[2700, 2850, 3000]'
Fallback — ffmpeg (if opencv-python is unavailable):
mkdir -p screenshots
ffmpeg -ss <seconds + 2> -i recording.mp4 -frames:v 1 -q:v 2 screenshots/auto_<timestamp>.png
Use the format auto_HHMMSS.png for filenames (e.g., auto_004523.png for 00:45:23).
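The preferred extraction path can be sketched as follows — an illustrative sketch of the approach, not the bundled script's actual implementation (extract_frames and auto_name are hypothetical names); it assumes opencv-python is installed:

```python
from pathlib import Path

def auto_name(seconds):
    """Format seconds as auto_HHMMSS.png, e.g. 2723 -> auto_004523.png."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"auto_{h:02d}{m:02d}{s:02d}.png"

def extract_frames(video_path, out_dir, timestamps, offset=2.0):
    """Grab one frame per flagged timestamp, offset to let the screen settle."""
    import cv2  # opencv-python; imported lazily so auto_name works without it
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    for t in timestamps:
        cap.set(cv2.CAP_PROP_POS_MSEC, (t + offset) * 1000)
        ok, frame = cap.read()
        if ok:
            cv2.imwrite(str(out / auto_name(t)), frame)
    cap.release()
```

Note the filename uses the original flagged timestamp, not the offset one, so it can still be matched back to the transcript.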
Deduplication:
Slide vs. screenshot selection: When both a slide image and a video screenshot exist for the same moment, do not use both. The goal is one image per moment:
If a screenshots/ folder exists with pre-existing images, include them in the output. Sort manual screenshots by filename and interleave them with auto-extracted screenshots based on best-guess chronological order.
If slides were extracted in Phase 1:
map each slide-NNN.png to the transcript segment at timestamp T.

This mapping is used in Phase 3 to place slides at the correct positions in the notes.
All outputs (the Markdown file, slides/, and screenshots/) should be written into the session's subdirectory — the directory where the source materials (VTT, slides) are located. Name the Markdown file [session-name]-lecture-notes.md, deriving the session name from the subdirectory name or the VTT filename (e.g., if the VTT is 03-how-llms-actually-work.vtt, the output is 03-how-llms-actually-work-lecture-notes.md).
The Markdown file should have this structure:
---