Generate Anki flashcards (Basic and Cloze) from source materials (PDFs, lecture slides, text files). Uses dual-pass PDF analysis with pdftotext and Ollama vision models. Outputs import-ready .txt files and figures.md for manual image occlusion.
Generate high-quality Anki flashcards from educational materials through manual curation by the LLM.
Understanding Anki's data model is essential for card generation:
Note: A single record in Anki's database. A note contains fields (text content) and can generate one or more cards depending on the note type.
Card: An individual flashcard that the user answers. Cards are generated from notes based on the note type:
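- Basic note: 1 card, or 2 cards when Reverse is set to `y`
- Cloze note: 1 card per distinct cloze number (`{{c1::...}}`, `{{c2::...}}`, etc.)
- Cloze deletions sharing a number (`{{c1::...}}` appears multiple times) create only one card - they are revealed together

Basic Note (1 card):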
DMA short <b>Direct Memory Access</b> Hardware mechanism for transferring data...
→ 1 card asking "What is DMA?"
Basic Note with Reverse=y (2 cards):
WWAN short <b>Wireless Wide Area Network</b> Cellular networks... y
→ Card 1: "WWAN → What does it stand for and what is it?" → Card 2: "Wireless Wide Area Network → What is the acronym?"
Cloze Note (2 cards):
IoT - everything connected to the Internet, equipped with {{c1::sensors}} and {{c2::actuators}}.
→ Card 1: Tests "sensors" → Card 2: Tests "actuators"
Cloze Note with same number (1 card):
The {{c1::Application}} and {{c1::Transport}} layers are end-to-end.
→ 1 card that reveals both Application AND Transport together
When checking for duplicate concepts, compare cards (what the user actually answers), not just notes. A single cloze note with multiple deletions creates multiple cards, and each must be checked against Basic cards.
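The card-counting rule for cloze notes can be sketched in a few lines. This is a minimal illustration, assuming cloze deletions always use the `{{cN::...}}` syntax shown above; the function name `cloze_card_count` is hypothetical and not part of the skill.

```python
import re

# Matches the opening of a cloze deletion such as {{c1:: or {{c12::
CLOZE_OPEN = re.compile(r"\{\{c(\d+)::")


def cloze_card_count(text: str) -> int:
    """Return how many cards a cloze note generates: one per distinct cloze number."""
    return len({int(number) for number in CLOZE_OPEN.findall(text)})


# Two distinct numbers -> two cards
print(cloze_card_count("IoT is equipped with {{c1::sensors}} and {{c2::actuators}}."))
# The same number twice -> one card that reveals both deletions together
print(cloze_card_count("The {{c1::Application}} and {{c1::Transport}} layers are end-to-end."))
```

Counting cards rather than notes is what matters for the duplicate check: each distinct cloze number is a separate question that has to be compared against the Basic cards.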
When the user asks to generate Anki cards:
Ask the user for:
For PDFs, run the analysis script, which writes `<basename>_analysis.json`:
python /path/to/skill/pdf_reader.py <pdf_path>
The script outputs JSON to <pdf_dir>/<pdf_basename>_analysis.json with:
Read directly and analyze content structure.
CRITICAL: Cards must be manually curated by the LLM reading the JSON analysis. Do NOT use automated scripts.
The LLM should:
- The `handwritten_annotations` field in the JSON may contain important clarifications, corrections, or exam hints added by the lecturer

Name[tab]Alternative Names[tab]Front Text[tab]Back Text[tab]Reverse[tab]Notes[tab]Tags
| Field | Content |
|---|---|
| Name | Short concept/term (1-4 words, title case) |
| Alternative Names | short <b>ACRONYM EXPANSION</b> for acronyms - shows what the acronym stands for |
| Front Text | Additional context (often empty - Name field serves as question) |
| Back Text | Answer content with HTML formatting |
| Reverse | y ONLY for bidirectional recall (term↔definition, acronym↔long form) |
| Notes | Images via <img src="paste-...">, examples |
| Tags | Hierarchical: source::section::subsection |
Create TWO cards for each acronym:
- `DMA` with `short <b>Direct Memory Access</b>` in Alt Names, definition in Back Text, Reverse=y
- `Long form of DMA` with empty Alt Names, empty Front Text, `<b>Direct Memory Access</b>` in Back Text, empty Reverse

IMPORTANT: The `short <b>...` format in Alt Names is ONLY for acronyms. For regular concept definitions, put the term in the Name field and the definition in Back Text, leaving Alt Names empty.
Good:
- `WWAN` with `short <b>Wireless Wide Area Network</b>` in Alt Names, Reverse=y (acronym with bidirectional recall)
- `Long form of WWAN` with `<b>Wireless Wide Area Network</b>` in Back Text (standalone lookup)
- `Mobile Computing` with definition in Back Text, Alt Names empty (concept)

Bad:
- `Mobile Computing` with `long <b>computing environment allowing...` in Alt Names (NOT an acronym)

#separator:Tab
#html:true
#notetype:Basic
#deck:<DeckName>
#tags column:7
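Once the notes have been curated by hand, writing them to disk is mechanical. The sketch below only illustrates the layout (five header lines, then seven tab-separated fields per note); it does not select or write card content. The names `basic_line` and `basic_file` are hypothetical helpers, not part of the skill.

```python
HEADER = (
    "#separator:Tab",
    "#html:true",
    "#notetype:Basic",
    "#deck:{deck}",
    "#tags column:7",
)
# Field order of a Basic note, matching the table above
FIELDS = ("name", "alt_names", "front", "back", "reverse", "notes", "tags")


def basic_line(**fields: str) -> str:
    """Join the seven fields with tabs; fields that are not given stay empty."""
    unknown = set(fields) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    values = [fields.get(name, "") for name in FIELDS]
    if any("\t" in value or "\n" in value for value in values):
        raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(values)


def basic_file(deck: str, lines: list[str]) -> str:
    """Prepend the import header to already formatted note lines."""
    header = [line.replace("{deck}", deck) for line in HEADER]
    return "\n".join(header + list(lines)) + "\n"


dma = basic_line(
    name="DMA",
    alt_names="short <b>Direct Memory Access</b>",
    back="Hardware mechanism for transferring data between I/O devices &amp; main memory.",
    reverse="y",
    tags="rts::introduction::challenges",
)
print(basic_file("RTOS Introduction", [dma]), end="")
```

Keeping empty fields as empty strings guarantees exactly six tabs per line, which is what the import header's `#tags column:7` relies on.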
Text with {{c1::cloze deletions}}[tab]Tags
- Use the same number (`{{c1::}}`) for the same concept expressed differently
- Use different numbers (`{{c1::}}` `{{c2::}}` `{{c3::}}`) for related but distinct concepts

Real-time = {{c1::reaction in time}}, not {{c2::fast}}.

#separator:Tab
#html:true
#notetype:Cloze
#deck:<DeckName>
#tags column:2
List all figures with page numbers and descriptions for MANUAL image occlusion creation.
DO NOT create "describe the figure" Basic cards. Figures are for image occlusion cards that the user creates manually.
IMPORTANT: Add TODO: Insert <description> from p.XY to the Notes field (column 6) of Basic cards that have relevant diagrams. This serves as a reminder to the user to insert images manually.
`TODO: Insert ...` notes

# Figures for Manual Image Occlusion
**Note:** Cards with diagrams already have `TODO: Insert ...` notes in their Notes field in `<deck>_basic.txt`. This file provides additional context for manual image occlusion cards.
Generate three files:
- `<output_dir>/<deck_name>_basic.txt`: Basic cards
- `<output_dir>/<deck_name>_cloze.txt`: Cloze cards
- `<output_dir>/<deck_name>_figures.md`: Figures for manual image occlusion

After generating output files, run the following verification checks:
# Verify Basic cards have exactly 7 fields (6 tabs)
awk -F'\t' '{print NF}' <deck>_basic.txt | sort | uniq -c
# Expected: every card line shows 7; the 5 header lines show 1
# Verify Cloze cards have exactly 2 fields (1 tab)
awk -F'\t' '{print NF}' <deck>_cloze.txt | sort | uniq -c
# Expected: every card line shows 2; the 5 header lines show 1
# Check for unescaped ampersands (should use &amp;)
grep -n '&' <deck>_basic.txt | grep -v '&amp;' | grep -v '\\('
# Check for incorrectly escaped HTML tags (should NOT use &lt; &gt; for tags)
# Hits are only a problem where &lt; wraps a tag name; &lt; in comparisons is correct
grep -n '&lt;' <deck>_basic.txt
# Check for < in comparisons that should be &lt; (not HTML tags)
grep -n '< ' <deck>_basic.txt
# If found, replace with &lt; (e.g., "t_d < t_b" → "t_d &lt; t_b")
# Check Reverse field contains only "y" or empty
awk -F'\t' 'NR>5 {print $5}' <deck>_basic.txt | sort | uniq -c
# Expected: only empty lines and "y"
# Cards with Reverse=y should have acronym expansion in Alt Names (field 2)
awk -F'\t' 'NR>5 && $5=="y" && $2=="" {print $1}' <deck>_basic.txt
# These should be acronym cards with "short <b>EXPANSION</b>" in Alt Names
# Alt Names should use correct format: "short <b>...</b>" for acronyms only
# Check for "short <b>" format (correct for acronyms)
awk -F'\t' 'NR>5 && $2!="" && $2 !~ /^short <b>/ && $2 !~ /^long <b>/ {print NR": "$1" | "$2}' <deck>_basic.txt
# "Long form of X" cards should have:
# - Empty Alt Names (field 2)
# - Empty Reverse (field 5)
# - Just "<b>EXPANSION</b>" in Back Text (field 4)
grep "^Long form of" <deck>_basic.txt
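For environments where `awk` and `grep` are inconvenient, the structural checks above can be approximated in Python. This is a sketch under the assumptions already stated in this document (five header lines, seven tab-separated fields, Reverse either empty or `y`); the function name `check_basic` is hypothetical.

```python
import re

HEADER_LINES = 5
# Alt Names, when present, should start with "short <b>" or "long <b>"
ALT_NAMES_OK = re.compile(r"^(short|long) <b>")


def check_basic(text: str) -> list[str]:
    """Return human-readable problems found in the contents of a Basic deck file."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if lineno <= HEADER_LINES:
            continue  # skip the import header
        fields = line.split("\t")
        if len(fields) != 7:
            problems.append(f"{lineno}: expected 7 fields, found {len(fields)}")
            continue
        name, alt_names, _front, _back, reverse, _notes, _tags = fields
        if reverse not in ("", "y"):
            problems.append(f"{lineno}: Reverse must be empty or 'y', found {reverse!r}")
        if alt_names and not ALT_NAMES_OK.match(alt_names):
            problems.append(f"{lineno}: unexpected Alt Names format: {alt_names!r}")
        if name.startswith("Long form of") and (alt_names or reverse):
            problems.append(f"{lineno}: 'Long form of' cards need empty Alt Names and Reverse")
    return problems
```

A typical call would be `check_basic(open(path, encoding="utf-8").read())`, with an empty list meaning that none of these structural checks failed. It does not replace the manual review of card content.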
If generating from multiple PDFs in the same course, check for semantic duplicates:
# Compare with existing output files (FNR skips the 5 header lines of every file;
# adjust the glob to your output directory layout)
awk -F'\t' 'FNR>5 {print $1}' outputs/*_basic.txt | sort | uniq -c | sort -rn
# Names appearing multiple times may indicate duplicates
# Check for acronym expansions in cloze that duplicate basic
grep -E "{{c[0-9]+::.*Network}}|{{c[0-9]+::.*System}}|{{c[0-9]+::.*Multiplexing}}" <deck>_cloze.txt
Review the generated cards and eliminate redundancy:
Example of semantic duplicate:
- `MANET` with `short <b>Mobile Ad-hoc Network</b>` in Alt Names, Reverse=y
- `Long form of MANET` with `<b>Mobile Ad-hoc Network</b>` in Back Text
- `MANET = {{c1::Mobile Ad-hoc Network}}`

All three test the same knowledge (acronym↔expansion). Keep the two Basic cards and drop the cloze.
Avoid in Cloze:
- `MANET = {{c1::Mobile Ad-hoc Network}}`: redundant with the "Long form of MANET" Basic card
- `The acronym WWAN stands for {{c1::Wireless Wide Area Network}}`: redundant

OK in Cloze (different enough):
- `Wireless network types by range: {{c1::WWAN}} (widest), {{c2::WMAN}}, {{c3::WLAN}}, {{c4::WPAN}} (narrowest).`: tests ordering, not just expansion

Correct any issues found before finalizing.
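The acronym-expansion overlap described above can be flagged for review with a short script. This is a sketch that only catches exact (case-insensitive) matches between a cloze deletion and an expansion stored as `short <b>...</b>` in Alt Names; it assumes five header lines per file, and the function names are hypothetical. Semantic duplicates that are worded differently still need manual review.

```python
import re

CLOZE = re.compile(r"\{\{c\d+::(.*?)\}\}")
SHORT_EXPANSION = re.compile(r"^short <b>(.*?)</b>")


def acronym_expansions(basic_text: str, header_lines: int = 5) -> set[str]:
    """Collect lower-cased expansions from the Alt Names field of Basic notes."""
    expansions = set()
    for line in basic_text.splitlines()[header_lines:]:
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        match = SHORT_EXPANSION.match(fields[1])
        if match:
            expansions.add(match.group(1).strip().lower())
    return expansions


def redundant_clozes(basic_text: str, cloze_text: str, header_lines: int = 5) -> list[str]:
    """Return cloze texts in which a deletion is exactly a known acronym expansion."""
    known = acronym_expansions(basic_text, header_lines)
    hits = []
    for line in cloze_text.splitlines()[header_lines:]:
        for deletion in CLOZE.findall(line):
            answer = deletion.split("::")[0].strip().lower()  # drop an optional hint
            if answer in known:
                hits.append(line.split("\t")[0])
                break
    return hits
```

A cloze that tests ordering, such as the WWAN/WMAN/WLAN/WPAN example, is not flagged because its deletions are the acronyms themselves, not their expansions.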
Based on review of generated card decks, these content types are frequently missed or underrepresented:
- Content from the `handwritten_annotations` field.

Alt Names:
- `long <b>Full Term</b>` or `short <b>ACRONYM</b>` format

HTML formatting:
- `<ul><li>...</li></ul>` for unordered lists
- `<ol><li>...</li></ol>` for ordered lists
- `<b>...</b>` for bold
- `<br>` for line breaks
- `&amp;` for ampersand (IMPORTANT: this is the ONLY escaped character)

Escaping:
- Do NOT escape `<` and `>` as `&lt;` and `&gt;` - HTML tags like `<ul>`, `<li>`, `<b>`, `<br>` must be plain
- DO escape `&` as `&amp;` - This is required for valid HTML
- For `<` in comparisons (e.g., `<1s`), use `&lt;` since it's not an HTML tag

Math:
- `\(...\)` for LaTeX, e.g., `\(\neq\)` for ≠, `\(\mu\)` for μ
- `\[...\]` or `$$...$$` for display equations
- Variables in running text: `\(d^2\)`, `\(c\)`, `\(n\)`
- Example: Transmission power scales with \(d^2\) in free space.

Reverse:
- `y` ONLY when bidirectional recall makes sense

Notes:
- `TODO: Insert <description> from p.XY` for supplementary visuals (not tested)

Tags:
- `source::section::subsection`
- Examples: `rts::introduction::definition`, `rts::challenges`

#separator:Tab
#html:true
#notetype:Basic
#deck:RTOS Introduction
#tags column:7
DMA short <b>Direct Memory Access</b> Hardware mechanism for transferring data between I/O devices & main memory, relieving CPU from controlling I/O data transfer. y rts::introduction::challenges
Long form of DMA <b>Direct Memory Access</b> rts::acronyms
Long form of WCET <b>Worst-Case Execution Time</b> rts::acronyms
#separator:Tab
#html:true
#notetype:Cloze
#deck:RTOS Introduction
#tags column:2
Real-time = {{c1::reaction in time}}, not {{c2::fast}}. rts::introduction::definition
RT tasks are classified as {{c1::hard}}, {{c2::firm}}, or {{c3::soft}} based on consequences of missed deadline. rts::introduction::classification
{{c1::Testing and Debugging}} don't show correctness. Programs must be {{c1::composed correctly}} and need {{c1::sophisticated design, rigorous analysis methods}} to {{c1::prove correctness incl. timing behavior}}. rts::introduction::lessons
# Figures for Manual Image Occlusion
**Note:** Cards with diagrams already have `TODO: Insert ...` notes in their Notes field in `<deck>_basic.txt`. This file provides additional context for manual image occlusion cards.
## Source: Lecture Slides
### Real-Time Systems Definition
**Page 4: Controlled System Diagram**
- Block diagram showing Input → Real-time Computing System → Output
- Callout about late reactions being dangerous
### DMA Architecture
**Page 17: DMA Block Diagram**
- CPU, Memory, I/O Device, DMA Device connections
- Memory bus sharing concept
## Cards Requiring Manual Image Insertion
### Basic Cards (Notes field)
| Card Name | Image Reference | Page |
|-----------|----------------|------|
| DMA | DMA block diagram | p.17 |
### Image Occlusion Cards (for manual creation)
| Card Name | Image Reference | Page |
|-----------|----------------|------|
| RTS Definition | Controlled System diagram | p.4 |
| DMA Architecture | CPU-Memory-I/O-DMA diagram | p.17 |
Note: Basic cards that reference figures have TODO: Insert <description> from p.XY in their Notes field, e.g.:
DMA short <b>Direct Memory Access</b> Hardware mechanism for transferring data between I/O devices & main memory, relieving CPU from controlling I/O data transfer. y TODO: Insert DMA block diagram from p.17 rts::introduction::challenges
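The `TODO: Insert ...` convention makes it possible to list every Basic card that still needs an image, for example to cross-check the "Cards Requiring Manual Image Insertion" table. The sketch below assumes the Notes field is column 6, that the file has five header lines, and that page references are plain digits after `p.`; the function name `figure_todos` is hypothetical.

```python
import re

TODO = re.compile(r"TODO: Insert (?P<description>.+?) from p\.(?P<page>\d+)")


def figure_todos(basic_text: str, header_lines: int = 5) -> list[tuple[str, str, int]]:
    """Return (card name, image description, page) for Basic notes with a TODO note."""
    rows = []
    for line in basic_text.splitlines()[header_lines:]:
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # malformed lines are reported by the field-count check instead
        match = TODO.search(fields[5])  # Notes is the sixth field
        if match:
            rows.append((fields[0], match["description"], int(match["page"])))
    return rows
```

Each returned tuple corresponds to one row of the Basic Cards table in `figures.md`.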
- ollama library
- pdftotext CLI tool (poppler-utils)
- pdf2image package for rendering pages

pip install ollama pdf2image
# Ubuntu/Debian:
sudo apt-get install poppler-utils
Check if a .venv or venv virtual environment exists in the workspace. If it exists and contains the required packages (ollama, pdf2image), use it:
<venv_path>/bin/python <skill_path>/pdf_reader.py <pdf_path>
If no venv exists or dependencies are missing, create one and install packages.
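The environment check can be sketched as follows. This assumes a POSIX-style layout (`bin/python` inside the environment, as in the command above); the helper names `find_venv_python` and `has_modules` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Optional

REQUIRED = ("ollama", "pdf2image")


def find_venv_python(workspace: Path) -> Optional[Path]:
    """Return the interpreter of .venv or venv in the workspace, if one exists."""
    for name in (".venv", "venv"):
        candidate = workspace / name / "bin" / "python"
        if candidate.is_file():
            return candidate
    return None


def has_modules(python: Path, modules: tuple[str, ...] = REQUIRED) -> bool:
    """Check whether the given interpreter can import all listed modules."""
    result = subprocess.run(
        [str(python), "-c", "import " + ", ".join(modules)],
        capture_output=True,
    )
    return result.returncode == 0
```

If `find_venv_python` returns `None`, or `has_modules` returns `False` for the interpreter it found, that is the case in which a new environment has to be created and the packages installed.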
From PDF:
User: Create Anki cards from lecture/Chapter1-Introduction.pdf and put the outputs in outputs/
Multiple files:
User: Generate cards from lecture/Chapter3.pdf and lecture/Chapter4.pdf
With explicit deck name:
User: Generate cards from slides.pdf with deck name "RTOS Final Exam"
- Plain HTML tags such as `<ul>`, `<li>`, `<b>`, `<br>` are correct
- `&` needs to be escaped as `&amp;`
- For `<` in comparisons, use `&lt;` (e.g., `<1s` → `&lt;1s`)
- Write variables as LaTeX, e.g. `\(d^2\)`, `\(c\)`, `\(\mu\)` - this makes them visually distinct
- Use `short <b>EXPANSION</b>` in the Alt Names field for acronyms; do NOT use the `long <b>...` format for non-acronym concepts
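The escaping rules can be expressed as two substitutions. This is a minimal sketch, assuming the only tags used in card fields are `ul`, `ol`, `li`, `b`, `br` and `img`; the function name `escape_field` is hypothetical, and the result should still be spot-checked by hand.

```python
import re

# "&" that is not already part of &amp;, &lt; or &gt;
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt);)")
# "<" that does not open or close one of the plain tags used in card fields
NON_TAG_LT = re.compile(r"<(?!/?(?:ul|ol|li|b|br|img)\b)")


def escape_field(text: str) -> str:
    """Escape bare ampersands and comparison '<' while leaving HTML tags plain."""
    text = BARE_AMP.sub("&amp;", text)
    return NON_TAG_LT.sub("&lt;", text)


print(escape_field("latency <1s & <b>hard</b> deadline"))
# latency &lt;1s &amp; <b>hard</b> deadline
```

Ampersands are handled first so that the `&lt;` entities introduced by the second substitution are not escaped again, which also makes the function safe to run twice on the same text.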