Import vocabulary from Word documents (.docx) containing Arabic words. Use when the user wants to import, add, or extract vocabulary from a Word document, docx file, or Arabic learning materials. Extracts Arabic-English word pairs and creates/updates decks based on categories.
This skill extracts Arabic vocabulary from .docx files and imports it into the flashcard system.
- Move processed files to the processed/ folder
Use pandoc to convert the Word document to markdown:
pandoc "path/to/document.docx" -t markdown
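Pandoc's docx reader may wrap right-to-left runs in bracketed spans such as [جديد]{dir="rtl"}. The following is a minimal sketch of a filter that strips those wrappers from the markdown output. The function name strip_rtl is hypothetical, and the exact span syntax is an assumption that should be checked against the real pandoc output.

```shell
# strip_rtl (hypothetical helper): reads markdown on stdin and removes
# RTL span wrappers, turning [text]{dir="rtl"} into plain text.
strip_rtl() {
  sed -E \
    -e 's/\[([^]]*)\]\{dir="rtl"\}/\1/g' \
    -e 's/\{dir="rtl"\}//g'
}

# Usage (path is a placeholder):
# pandoc "path/to/document.docx" -t markdown | strip_rtl
```

The second expression catches any attribute left behind when the brackets were already removed or split across a line.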
From the extracted content, identify vocabulary entries and create a markdown file in the same directory as the source document.
Filename pattern: {original_name}.vocab.md
Example: صفات.docx → صفات.vocab.md
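As a small sketch, the filename pattern can be derived with POSIX parameter expansion. The helper name vocab_path is hypothetical.

```shell
# vocab_path (hypothetical helper): map a .docx path to its .vocab.md sibling
# by dropping the .docx suffix and appending .vocab.md.
vocab_path() {
  printf '%s\n' "${1%.docx}.vocab.md"
}

vocab_path "path/to/صفات.docx"   # prints path/to/صفات.vocab.md
```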
Markdown format:
# Vocabulary: صفات (Adjectives)
Source: صفات.docx
Extracted: 2024-01-15
Category: Adjectives
## Words
| # | Arabic | English | Notes | Import |
|---|--------|---------|-------|--------|
| 1 | جديد / جديدة | new | masc/fem | ✓ |
| 2 | قديم / قديمة | old | masc/fem | ✓ |
| 3 | ثقيل / ثقيلة | heavy | masc/fem | ✓ |
| 4 | خفيف / خفيفة | light | masc/fem | ✓ |
## Summary
- Total words found: 12
- Ready to import: 12
- Skipped: 0
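The Summary counts can be recomputed from the Words table instead of being maintained by hand. The sketch below assumes that every data row starts with "| <number> |" and that a ✓ in the Import column means the row is ready to import. The helper name summarize_vocab is hypothetical.

```shell
# summarize_vocab (hypothetical helper): print the Summary lines for a .vocab.md file.
# With "|" as the field separator, the Import column is field 6
# (field 1 is the empty text before the leading pipe).
summarize_vocab() {
  awk -F'|' '
    /^\|[ ]*[0-9]+[ ]*\|/ {
      total++
      if (index($6, "✓") > 0) ready++
    }
    END {
      printf "- Total words found: %d\n", total
      printf "- Ready to import: %d\n", ready
      printf "- Skipped: %d\n", total - ready
    }
  ' "$1"
}
```

Rows without a ✓ are counted as skipped, so clearing the mark is enough to exclude a word.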
Parsing rules:
- Remove {dir="rtl"} markers from Arabic text
Present the generated markdown to the user and ask:
I've extracted 12 vocabulary items from "صفات.docx"
Category detected: Adjectives (صفات)
Target deck: [New deck "Adjectives (صفات)" / Existing deck "..."]
Preview:
1. جديد / جديدة - new
2. قديم / قديمة - old
3. ثقيل / ثقيلة - heavy
...
Would you like to:
- Import all words
- Review the full list in صفات.vocab.md first
- Skip certain words (specify which)
Check existing decks:
curl -s http://localhost:3000/api/decks
Determine deck:
Create new deck (if needed):
curl -X POST "http://localhost:3000/api/decks" \
-H "Content-Type: application/json" \
-d '{"name": "Adjectives (صفات)", "description": "Imported from صفات.docx"}'
Import cards into the deck:
curl -X POST "http://localhost:3000/api/decks/{deck_id}/cards" \
-H "Content-Type: application/json" \
-d '[
{"front": "جديد / جديدة", "back": "new", "notes": "masc/fem"},
{"front": "قديم / قديمة", "back": "old", "notes": "masc/fem"}
]'
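The request body above can be generated from the .vocab.md table rather than typed by hand. This is a sketch under assumptions: the helper name cards_json is hypothetical, the column order is the one shown in the Words table, and only rows marked ✓ are exported. It escapes quotes and backslashes but does not handle control characters.

```shell
# cards_json (hypothetical helper): emit a JSON array of cards from a .vocab.md file.
cards_json() {
  awk -F'|' '
    # Trim whitespace, then backslash-escape quotes and backslashes for JSON.
    function clean(s,   i, c, out) {
      gsub(/^[ \t]+|[ \t]+$/, "", s)
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\"
        out = out c
      }
      return out
    }
    # Data rows start with "| <number> |"; field 6 is the Import column.
    /^\|[ ]*[0-9]+[ ]*\|/ && index($6, "✓") > 0 {
      printf "%s{\"front\": \"%s\", \"back\": \"%s\", \"notes\": \"%s\"}",
        (n++ ? ",\n" : "[\n"), clean($3), clean($4), clean($5)
    }
    END { print (n ? "\n]" : "[]") }
  ' "$1"
}

# Usage (deck id and path are placeholders):
# cards_json "path/to/document.vocab.md" |
#   curl -X POST "http://localhost:3000/api/decks/{deck_id}/cards" \
#     -H "Content-Type: application/json" -d @-
```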
After successful import, move both files to a processed/ subfolder:
# Create processed folder if it doesn't exist
mkdir -p "path/to/processed"
# Move the original docx
mv "path/to/document.docx" "path/to/processed/"
# Move the generated vocab markdown
mv "path/to/document.vocab.md" "path/to/processed/"
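The archive commands above can be wrapped in one function that refuses to move anything unless both files exist. This is a sketch only: the name archive_processed is hypothetical, and it assumes the .vocab.md file sits next to the .docx, as described earlier.

```shell
# archive_processed (hypothetical helper): move a .docx and its .vocab.md
# into a processed/ folder in the same directory.
archive_processed() {
  docx=$1
  vocab="${docx%.docx}.vocab.md"
  dest="$(dirname "$docx")/processed"

  # Stop early if either file is missing, so nothing is half-archived.
  [ -f "$docx" ] && [ -f "$vocab" ] || {
    echo "missing file for $docx" >&2
    return 1
  }

  mkdir -p "$dest" && mv "$docx" "$vocab" "$dest/"
}
```

Calling it only after the import request succeeds keeps unimported documents out of processed/.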
Folder structure after processing:
example-arabic-docs/
├── processed/
│ ├── صفات.docx
│ ├── صفات.vocab.md
│ ├── الأسماء.docx
│ └── الأسماء.vocab.md
└── new-document.docx (not yet processed)
Report the result to the user:
✓ Imported 12 cards to deck "Adjectives (صفات)" (ID: 5)
Files archived:
→ processed/صفات.docx
→ processed/صفات.vocab.md
View deck: http://localhost:3000/deck/5
If the API is not running, start the dev server:
npm run dev