Vocabulary learning strategies and retention science for any language -- frequency-based word selection, spaced repetition systems (Ebbinghaus forgetting curve, Leitner system, SM-2 algorithm), cognate exploitation, word family networks, context-based learning, collocations, depth of word knowledge (form, meaning, use), reading progression from controlled to authentic texts, and productive vs. receptive vocabulary thresholds. Use when building vocabulary plans, optimizing review schedules, selecting learning materials, or diagnosing vocabulary gaps.
Vocabulary is one of the strongest predictors of reading comprehension and communicative competence in a second language. As Wilkins (1972) put it, without grammar very little can be conveyed; without vocabulary, nothing can be conveyed. This skill covers the science and practice of vocabulary learning as a meta-skill applicable to any target language.
Agent affinity: krashen (comprehensible input, incidental acquisition), bruner-l (scaffolding, pedagogical sequencing)
Concept IDs: lang-high-frequency-words, lang-spaced-repetition, lang-cognates-word-families, lang-reading-progression
Research by Nation (2001, 2006) establishes frequency-based thresholds:
| Words Known | Coverage of Running Text | Practical Ability |
|---|---|---|
| 1,000 word families | ~78-80% | Basic survival communication |
| 2,000 word families | ~85-90% | Simple conversation, graded readers |
| 3,000 word families | ~93-95% | Unassisted reading of simplified texts |
| 5,000 word families | ~97-98% | Independent reading of most authentic texts |
| 8,000-9,000 word families | ~98-99% | Near-native reading comprehension |
A word family includes a base word plus all its inflected and derived forms: "nation, national, nationality, nationalize, internationally" = one word family. The 2,000 most frequent word families in any language cover a disproportionate share of everyday text, making frequency-based learning the most efficient strategy.
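To make the coverage idea concrete, here is a minimal sketch that measures how much of a token stream is covered by its most frequent items. The function name `coverage_by_rank` and the toy sentence are illustrative assumptions, and the sketch counts surface word forms rather than word families (grouping forms into families would need a lemmatizer or a family list, which is not shown).

```python
from collections import Counter

def coverage_by_rank(tokens, cutoffs):
    """Return {k: share of running tokens covered by the k most frequent types}.

    Assumption: `tokens` are already lowercased word forms, not word families.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in cutoffs}
    # Frequencies sorted from most to least frequent.
    ranked = [count for _, count in counts.most_common()]
    return {k: sum(ranked[:k]) / total for k in cutoffs}

# Toy corpus (hypothetical); real estimates need a large, representative corpus.
sample = "the cat sat on the mat and the dog sat on the rug".split()
cov = coverage_by_rank(sample, [1, 3, 5])
for k, share in cov.items():
    print(f"top {k} types cover {share:.0%} of running tokens")
```

Even in this tiny sample, a handful of types accounts for most running tokens, which is the same skew that makes the first 2,000 families so valuable.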
Receptive vocabulary is typically much larger than productive vocabulary. A learner may recognize 5,000 word families but actively produce only 2,000. Teaching methods that target recognition (reading, listening) build receptive vocabulary faster; production practice (speaking, writing) converts receptive knowledge to productive use.
Hermann Ebbinghaus (1885) demonstrated that memory decays rapidly, roughly exponentially, after initial learning. In his experiments with nonsense syllables, roughly two-thirds of newly learned material was forgotten within 24 hours and about three-quarters within a week. The critical insight: a single well-timed review dramatically flattens the forgetting curve.
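The flattening effect can be illustrated with a simple exponential model, R = exp(-t / S), where S is a "stability" parameter in days. This is a common simplification, not Ebbinghaus's own fitted formula, and the function names and stability values below are hypothetical choices for illustration only.

```python
import math

def retention(t_days, stability_days):
    """Estimated recall probability after t_days under R = exp(-t / S)."""
    return math.exp(-t_days / stability_days)

def days_until(threshold, stability_days):
    """Days until estimated recall drops to `threshold` (0 < threshold <= 1)."""
    return -stability_days * math.log(threshold)

# Hypothetical stabilities: 1 day before any review, 5 days after one review.
before_review = retention(7, stability_days=1.0)
after_review = retention(7, stability_days=5.0)
print(f"recall after 7 days, no review:  {before_review:.3f}")
print(f"recall after 7 days, one review: {after_review:.3f}")
print(f"days until recall falls to 90% (S=5): {days_until(0.9, 5.0):.2f}")
```

A larger S means a flatter curve, so the next review can be scheduled later without recall dropping below the chosen threshold.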
Reviewing information at increasing intervals produces stronger retention than massed practice (cramming). Each successful retrieval at a longer interval strengthens the memory trace.
Optimal review schedule (simplified):
Leitner system (physical flashcards). Cards are sorted into boxes. Correct answers advance a card to the next box (longer review interval). Incorrect answers send the card back to Box 1 (immediate re-review). Simple and effective with physical materials.
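A minimal sketch of the Leitner box logic described above. The five-box layout and the doubling intervals in `INTERVALS_DAYS` are assumptions for illustration; the source does not fix the number of boxes or their intervals.

```python
from dataclasses import dataclass

# Assumed schedule: box number -> days until the next review.
INTERVALS_DAYS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}
TOP_BOX = max(INTERVALS_DAYS)

@dataclass
class Card:
    front: str
    back: str
    box: int = 1  # new cards start in Box 1

def review(card, correct):
    """Move the card between boxes and return days until its next review."""
    if correct:
        # Advance one box, but never past the last box.
        card.box = min(card.box + 1, TOP_BOX)
    else:
        # Any miss sends the card back to Box 1.
        card.box = 1
    return INTERVALS_DAYS[card.box]

card = Card(front="chien", back="dog")
for answer in (True, True, False):
    next_in = review(card, answer)
    print(f"correct={answer}: box {card.box}, next review in {next_in} day(s)")
```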
SM-2 algorithm (Anki and similar). SuperMemo's SM-2 algorithm computes per-item review intervals from a recall-quality grade (0-5) assigned at each review. Each item carries its own ease factor: items recalled easily get geometrically increasing intervals, while items that are failed restart at short intervals. Anki, the most widely used implementation, is based on a modified variant of SM-2.
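The sketch below follows the published SM-2 description as closely as a short example allows: grades run 0-5, the first two successful intervals are 1 and 6 days, later intervals multiply by the ease factor, and the ease factor never drops below 1.3. The names `ItemState` and `sm2_review` are hypothetical, and details such as whether the ease factor changes on a failed review differ between implementations (Anki's scheduler, for instance, deviates from this).

```python
from dataclasses import dataclass

@dataclass
class ItemState:
    repetitions: int = 0      # consecutive successful reviews
    interval_days: int = 0    # current interval
    ease: float = 2.5         # SM-2 starting ease factor

def sm2_review(state, quality):
    """Return the next ItemState after a review graded 0-5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease factor.
        return ItemState(repetitions=0, interval_days=1, ease=state.ease)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        # Design choice in this sketch: use the ease factor from before this review.
        interval = round(state.interval_days * state.ease)
    miss = 5 - quality
    ease = max(1.3, state.ease + (0.1 - miss * (0.08 + miss * 0.02)))
    return ItemState(state.repetitions + 1, interval, ease)

state = ItemState()
for grade in (5, 5, 5, 2):
    state = sm2_review(state, grade)
    print(f"grade {grade}: next review in {state.interval_days} day(s), ease {state.ease:.2f}")
```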
Practical guidance:
Cognates are words in two languages that share a common etymological origin and similar form: English "telephone" / French "téléphone" / Spanish "teléfono" / German "Telefon."
Cognate density by language pair:
False cognates (false friends): Words that look similar but differ in meaning. English "actually" vs. French "actuellement" (currently). English "embarrassed" vs. Spanish "embarazada" (pregnant). Systematic cataloging of false friends prevents interference.
Words are not isolated units but members of morphological families:
Root + affixes:
Semantic fields:
Collocations:
Knowing a word is not binary. Nation (2001) identifies nine aspects of word knowledge, grouped under form, meaning, and use:
| Aspect | Receptive | Productive |
|---|---|---|
| Form: spoken | Recognize the word when heard | Pronounce the word correctly |
| Form: written | Recognize the word when read | Spell the word correctly |
| Form: word parts | Recognize root, prefix, suffix | Use word parts to build forms |
| Meaning: form-meaning link | Recall meaning from form | Recall form from meaning |
| Meaning: concepts | Understand the concept behind the word | Use the word to express the concept |
| Meaning: associations | Know related words | Produce appropriate related words |
| Use: grammar | Recognize grammatical patterns | Use in correct grammatical patterns |
| Use: collocations | Recognize natural word combinations | Produce natural combinations |
| Use: register | Know formality constraints | Use at appropriate formality level |
Most learners have shallow knowledge of many words (form + basic meaning) and deep knowledge of few. Deepening word knowledge -- learning collocations, register, and productive use -- is a distinct learning task from adding new words.
Krashen's Input Hypothesis argues that most vocabulary is acquired incidentally through comprehensible input -- reading and listening to meaningful content where the learner encounters new words in context.
Conditions for successful incidental acquisition:
Extensive reading is the most powerful vehicle for incidental vocabulary acquisition. Graded readers (controlled vocabulary texts at the learner's level) provide the right density of known-to-unknown words.
Deliberate study (flashcards, word lists, exercises) is more efficient per unit of time for initial acquisition but less effective for depth and retention than incidental learning in context.
Optimal strategy: Use intentional learning for the first 2,000 high-frequency word families (these need to be acquired quickly for basic comprehension), then shift to extensive reading for incidental acquisition of the next 3,000-6,000 families.
Texts written within a strict vocabulary limit. Every unknown word is glossed or repeated. Purpose: build basic sight vocabulary and decoding confidence.
Simplified authentic stories or purpose-written narratives. Unknown word density: 2-5%. Purpose: extensive reading for pleasure and incidental vocabulary acquisition.
News articles, blog posts, children's literature adapted to reduce low-frequency vocabulary. Purpose: bridge from controlled to authentic input.
Unmodified texts with dictionary access or glossary. Unknown word density: 3-5%. Purpose: develop strategies for handling unknown vocabulary in real texts.
Native-audience materials: novels, newspapers, academic texts. Unknown word density: 1-2%. Purpose: native-like reading experience.
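One way to check whether a text suits a learner is to estimate its unknown-word density against the learner's known vocabulary. The sketch below is a rough illustration: `unknown_density` and `in_band` are hypothetical helper names, the tokenizer simply extracts runs of letters, and it compares surface forms rather than word families, so it will overestimate density for inflected forms of known words.

```python
import re

def unknown_density(text, known_words):
    """Share of running tokens in `text` that are not in `known_words`."""
    # Runs of Unicode letters only; digits, underscores, and punctuation are skipped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in known_words)
    return unknown / len(tokens)

def in_band(density, low, high):
    """True if the density falls inside a target band, e.g. 0.02-0.05."""
    return low <= density <= high

# Hypothetical learner vocabulary and sample sentence.
known = {"the", "cat", "sat", "on", "mat"}
density = unknown_density("The cat sat on the sofa.", known)
print(f"unknown-word density: {density:.1%}")
print("within a 2-5% band:", in_band(density, 0.02, 0.05))
```

On a real text of a few hundred words or more, the resulting figure can be compared with the density ranges given for each reading stage above.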