General quality standards for all je-dict-1 dictionary entries. Use when creating or revising any entry type.
When creating or revising dictionary entries for je-dict-1, follow these quality standards:
DO NOT use Python scripts or automation to mass-produce entries.
Each dictionary entry must be written individually by hand, using:
verb-entry, adjective-entry, particle-entry, other-entries, vocabulary-notes)
Why this matters:
The correct workflow:
candidate_words.json or user request
python3 build/validate.py
After finishing all entries for a session:
python3 build/validate.py # Validate all entries
python3 build/add_conjugations.py # Add conjugation to any new verbs
python3 build/add_adjective_conjugations.py # Add conjugation to any new i-adjectives
python3 build/update_indexes.py # Update indexes and sync candidates
python3 build/build_flat.py # Rebuild website (REQUIRED for GitHub Pages)
git add entries/ docs/ *.json PROJECT_STATUS.md
git commit -m "Add N new dictionary entries"
git push
Recent Changes rotation: PROJECT_STATUS.md keeps only the 5 most recent change entries. When adding a new entry to the "Recent Changes" section, move the oldest one to PROJECT_STATUS-archive.md.
The build_flat.py step is critical: without it, new entries won't appear on the live site. The build is atomic (it builds into a temporary directory, then swaps that directory into place), so a failed build cannot leave the site in a broken state.
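The build-then-swap idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in build_flat.py: the function name swap_in_build and the build callback are hypothetical, and the real script may handle the swap differently.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def swap_in_build(target: Path, build: Callable[[Path], None]) -> None:
    """Build into a temporary directory, then swap it into place.

    `build` is a hypothetical callback that writes the site into the
    directory it receives. If it raises, `target` is left untouched.
    """
    target = Path(target)
    # Create the temp directory beside the target so renames stay on one filesystem.
    tmp = Path(tempfile.mkdtemp(prefix=target.name + ".tmp-", dir=target.parent))
    try:
        build(tmp)
    except Exception:
        # Failed build: discard the partial output and keep the old site.
        shutil.rmtree(tmp, ignore_errors=True)
        raise
    backup = target.with_name(target.name + ".old")
    if backup.exists():
        shutil.rmtree(backup)
    if target.exists():
        os.rename(target, backup)  # move the old site out of the way
    os.rename(tmp, target)         # move the finished build into place
    shutil.rmtree(backup, ignore_errors=True)
```

The point of the design is that the live directory is only touched after the build has finished successfully.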
Never create scripts that generate entry content programmatically.
IMPORTANT: Always check if an entry already exists before creating a new one.
A word is a duplicate ONLY if BOTH the headword AND reading match exactly.
Run the duplicate check script:
python3 build/check_duplicate.py "食べる" "たべる"
If homophones are reported, check for spelling variants: The script reports entries with the same reading but different headwords. Most are genuine homophones (different words), but some may be spelling variants of the same word (e.g., a kana-only form alongside a kanji form). Before creating the entry, verify:
See the consolidate-entries skill for the full decision framework.
Batch checking (optional, to plan which candidates to work on):
python3 build/check_duplicate.py --batch "食べる:たべる" "飲む:のむ" "書く:かく"
If the word was in candidate_words.json: It will be automatically removed when you run python3 build/update_indexes.py after creating the entry.
Only create new entries for words that pass the duplicate check AND the variant check.
This prevents duplicate entries and wasted effort on entries that must later be deleted or merged.
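The duplicate rule (headword AND reading must both match) can be illustrated with a short sketch. It is not the logic of check_duplicate.py; the function name classify_candidate and the representation of existing entries as plain (headword, reading) pairs are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str]  # (headword, reading), both as plain strings


def classify_candidate(
    headword: str, reading: str, existing: Iterable[Entry]
) -> Dict[str, List[Entry]]:
    """Separate exact duplicates from entries that only share the reading."""
    duplicates: List[Entry] = []
    same_reading: List[Entry] = []
    for hw, rd in existing:
        if rd != reading:
            continue
        if hw == headword:
            duplicates.append((hw, rd))    # headword AND reading match
        else:
            same_reading.append((hw, rd))  # homophone or possible spelling variant
    return {"duplicates": duplicates, "same_reading": same_reading}
```

Entries in same_reading still need the manual variant check described above; the sketch cannot tell a homophone from a spelling variant.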
All verb entries must include a conjugation field with the full set of conjugated forms hard-coded in the JSON. See the verb-conjugations skill for the complete specification.
After creating verb entries, run python3 build/add_conjugations.py to automatically generate and write the conjugation data. Alternatively, include the full conjugation field directly when writing the entry JSON.
All i-adjective entries must include a conjugation field with 6 conjugated forms (Present, Past, て form, Adverbial, Conditional ば, Conditional たら). See the verb-conjugations skill for the JSON structure (same format, type is "i-adjective" or "ii" for いい compounds).
After creating i-adjective entries, run python3 build/add_adjective_conjugations.py to automatically generate and write the conjugation data. Alternatively, include the full conjugation field directly when writing the entry JSON.
Na-adjectives do NOT have a conjugation field. Their conjugation is shown in the notes field instead.
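For reference when checking conjugation data by hand, the six i-adjective forms can be derived as in the sketch below. This is not add_adjective_conjugations.py, and the returned dictionary is not the project's JSON structure (see the verb-conjugations skill for that); the function name and the keys are illustrative assumptions.

```python
def conjugate_i_adjective(word: str, kind: str = "i-adjective") -> dict:
    """Return the six forms for a kana i-adjective in dictionary form.

    kind is "i-adjective" for regular adjectives, or "ii" for いい and
    its compounds, whose stem changes from い to よ.
    """
    if kind == "ii":
        if not word.endswith("いい"):
            raise ValueError("'ii' type must end in いい")
        stem = word[:-2] + "よ"
    else:
        if not word.endswith("い"):
            raise ValueError("i-adjective must end in い")
        stem = word[:-1]
    return {
        "Present": word,
        "Past": stem + "かった",
        "て form": stem + "くて",
        "Adverbial": stem + "く",
        "Conditional ば": stem + "ければ",
        "Conditional たら": stem + "かったら",
    }
```

For example, conjugate_i_adjective("たかい")["Past"] gives たかかった, and conjugate_i_adjective("かっこいい", "ii")["Past"] gives かっこよかった.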
See the example-sentences skill for complete guidelines on:
Every example sentence must have a sense_numbers field that links it to the definition(s) it illustrates:
"examples": [
{
"id": "00001_word_ex1",
"japanese": "...",
"english": "...",
"sense_numbers": [1]
}
]
Rules:
[1] for all examples
[1, 2] format
sense_number values in definitions
The validation script checks that all examples in multi-sense entries have valid sense_numbers.
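The link between sense_numbers and the definitions can be checked along these lines. The sketch assumes an entry loaded as a Python dict with the definitions and examples fields described in this document; it is not the check implemented in validate.py, and the function name is hypothetical.

```python
def invalid_sense_refs(entry: dict) -> list:
    """Return (example_id, problem) pairs for bad sense_numbers fields."""
    valid = {d["sense_number"] for d in entry.get("definitions", [])}
    problems = []
    for ex in entry.get("examples", []):
        refs = ex.get("sense_numbers")
        if not isinstance(refs, list) or not refs:
            problems.append((ex.get("id"), "missing sense_numbers"))
            continue
        # Every referenced number must exist as a sense_number in definitions.
        unknown = [n for n in refs if n not in valid]
        if unknown:
            problems.append((ex.get("id"), f"unknown sense numbers {unknown}"))
    return problems
```

An empty result means every example points at an existing definition.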
All kanji MUST have furigana in ALL fields, including notes.
Format: {漢字|かんじ}
This applies to:
Common mistakes to avoid:
✗ WRONG: 暖簾に腕押し
✓ RIGHT: {暖簾|のれん}に{腕押|うでお}し
✗ WRONG: 安堵の息をつく
✓ RIGHT: {安堵|あんど}の{息|いき}をつく
✗ WRONG: Sometimes written as 家鴨
✓ RIGHT: Sometimes written as {家鴨|あひる}
Use compound readings for jukugo: {友達|ともだち} not {友|とも}{達|だち}
Verify before finalizing:
python3 build/verify_furigana.py <entry_id>
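The furigana rule can be approximated with a small check that strips every {漢字|かんじ} span and reports any kanji left over. This is a rough sketch, not the implementation of verify_furigana.py; the function name and the choice of Unicode ranges (CJK Unified Ideographs, Extension A, and 々) are assumptions.

```python
import re

# A well-formed annotation: {base|reading}
ANNOTATED = re.compile(r"\{[^{}|]+\|[^{}|]+\}")
# Kanji outside any annotation (assumed ranges; rare extensions are not covered)
KANJI = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff\u3005]")


def bare_kanji(text: str) -> list:
    """Return the kanji in `text` that are not inside a {kanji|reading} span."""
    stripped = ANNOTATED.sub("", text)
    return KANJI.findall(stripped)
```

An empty list means no unannotated kanji were found; it does not confirm that the readings themselves are correct.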
Every entry must include:
id: Format {5-digit-number}_{romaji} (e.g., 00396_taberu). See Romaji/ID Format below for critical rules.
headword: With furigana notation
reading: Hiragana only (see Reading Format below)
romaji: Must match the full reading, concatenated without internal underscores
part_of_speech: Consistent terminology
gloss: Brief English equivalent
definitions: Array with sense_number, gloss, explanation
examples: 2-3 minimum, with id, Japanese, English, sense_numbers, and optional notes
notes: Usage notes, grammar patterns, common mistakes (see vocabulary-notes skill for formatting requirements)
schema_version: Set to "2.0" for all new entries (top-level field, optional for existing entries)
metadata: Including vocabulary_tier (always "general" for new entries), created, modified timestamps
All readings MUST be in hiragana, never katakana.
This applies to ALL entries, including:
Examples:
✓ CORRECT:
headword: "スキー"
reading: "すきー"
✓ CORRECT:
headword: "DM"
reading: "でぃーえむ"
✗ WRONG:
headword: "スキー"
reading: "スキー" ← Katakana readings cause duplicates!
Why this matters:
Note: The long vowel mark ー is acceptable in hiragana readings (e.g., すきー, すとれーじ) since there is no hiragana equivalent.
The validation script (validate.py) will report errors for entries with katakana readings.
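The reading rule amounts to a character-class check: hiragana plus the long vowel mark ー. The sketch below is illustrative only and is not taken from validate.py; the function name is hypothetical.

```python
import re

# Hiragana block (ぁ through ゖ) plus the long vowel mark ー
HIRAGANA_READING = re.compile(r"[\u3041-\u3096ー]+")


def is_valid_reading(reading: str) -> bool:
    """True if the reading consists only of hiragana and ー."""
    return HIRAGANA_READING.fullmatch(reading) is not None
```

With this check, すきー and でぃーえむ pass, while スキー is rejected.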
The entry ID and romaji field must follow this format. The schema regex is: ^[0-9]{5}_[a-z]+(_[a-z]+)?$
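The schema regex can be tried directly in Python. The sketch below only applies the pattern quoted above; the function name is hypothetical. Note that the pattern permits one optional second segment, so an ID such as 21391_kasoku_suru passes the regex even though this section lists it as wrong. The regex is therefore a necessary check, not a sufficient one.

```python
import re

# Schema regex quoted in this section
ID_PATTERN = re.compile(r"^[0-9]{5}_[a-z]+(_[a-z]+)?$")


def matches_schema(entry_id: str) -> bool:
    """True if the ID has the shape required by the schema regex."""
    return ID_PATTERN.match(entry_id) is not None


for candidate in ("21022_ketteisuru", "21391_kasoku_suru", "21399_koe_wo_dasu"):
    print(candidate, matches_schema(candidate))
```

Of the three IDs printed, only 21399_koe_wo_dasu is rejected by the pattern itself; the concatenation rule has to be checked by eye.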
Rules:
Correct examples:
21022_ketteisuru ← 決定する (けっていする), suru concatenated
06899_kaowodasu ← 顔を出す (かおをだす), particles concatenated
21019_shitekina ← 私的な (してきな), na concatenated
21409_moushiwakearimasen ← 申し訳ありません (もうしわけありません)
Wrong examples:
21391_kasoku_suru ← splits "suru" as a second segment (use kasokusuru)
21399_koe_wo_dasu ← three segments after the number (use koewodasu)
21410_fushizen_na ← splits "na" as a second segment (use fushizenna)
Entries MUST be placed in the correct numeric range directory.
The path follows this pattern: entries/{range}/{entry_id}.json
The range directory is determined by the numeric portion of the entry ID, rounded down to the nearest 500:
entries/00000/
entries/00500/
entries/01000/
00396_taberu → entries/00000/00396_taberu.json
00538_aruku → entries/00500/00538_aruku.json
01186_mukau → entries/01000/01186_mukau.json
06237_fumikiru → entries/06000/06237_fumikiru.json
ALWAYS run this command to determine the correct path before writing:
python3 build/get_entry_path.py <reading> <entry_id>
Example:
python3 build/get_entry_path.py ふみきる 06237_fumikiru
# Output: entries/06000/06237_fumikiru.json
python3 build/get_entry_path.py こうりつてき 06240_kouritsuteki
# Output: entries/06000/06240_kouritsuteki.json
The validate.py script checks for directory mismatches and will report errors.
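The range arithmetic behind the directory rule is simple enough to sketch. This is only an illustration of "round down to the nearest 500"; it is not get_entry_path.py, which also takes the reading as an argument, and the function name is hypothetical. Keep using the script to determine real paths.

```python
def expected_entry_path(entry_id: str) -> str:
    """Compute entries/{range}/{entry_id}.json from the numeric ID prefix."""
    number = int(entry_id.split("_", 1)[0])
    range_start = (number // 500) * 500  # round down to the nearest 500
    return f"entries/{range_start:05d}/{entry_id}.json"
```

For example, expected_entry_path("06237_fumikiru") returns entries/06000/06237_fumikiru.json, matching the script output shown above.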
CRITICAL: Timestamps MUST be actual current UTC time. The website converts UTC to JST (+9 hours) for display. Incorrect timestamps will show as wrong dates/times (often appearing hours or days in the future).
ALWAYS run this command to get the current UTC timestamp before writing each entry:
python3 build/get_timestamp.py
This outputs the current UTC time, e.g.: 2026-01-12T10:45:30Z
Copy this exact output into both created and modified fields (for new entries) or just modified (for revisions).
Z suffix means UTC (not local time, not JST)
16:00:00Z when actual UTC is 10:00, it displays as 01:00 JST next day (wrong!)
10:00:00Z when actual UTC is 10:00, it displays as 19:00 JST same day (correct!)
12:00:00Z or 15:00:00Z (these are almost certainly wrong)
Run python3 build/validate.py to check for:
:00:00 seconds, likely not from the script)
Note: The validator allows a 24-hour grace period for timestamps to accommodate CI/CD clock drift.
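The UTC-to-JST arithmetic can be reproduced with the standard library. The sketch below is not get_timestamp.py and not the website's display code; the function names and the display format are assumptions. It only shows how a Z-suffixed timestamp shifts by nine hours.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def utc_now_stamp() -> str:
    """Current UTC time in the format used by created/modified."""
    return datetime.now(timezone.utc).strftime(STAMP_FORMAT)


def to_jst(stamp: str) -> str:
    """Convert a Z-suffixed UTC timestamp to a JST display string."""
    utc = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return utc.astimezone(JST).strftime("%Y-%m-%d %H:%M JST")
```

Here to_jst("2026-01-12T10:00:00Z") gives 2026-01-12 19:00 JST, while to_jst("2026-01-12T16:00:00Z") gives 2026-01-13 01:00 JST, which is the next-day display described above.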
All new entries must be assigned to the "general" tier.
As of January 2026, the vocabulary tier realignment is complete:
Do NOT assign new entries to basic or core tiers unless explicitly instructed by the user. The basic and core tiers have been curated to meet specific word count targets and maintain semantic group integrity.
In metadata.vocabulary_tier, always use "general":
"metadata": {
"vocabulary_tier": "general",
"created": "...",
"modified": "..."
}
All entries must have properly structured tags in metadata.tags. This enables search, filtering, and export functionality.
"metadata": {
"vocabulary_tier": "general",
"tags": {
"pos": ["noun"], // REQUIRED: Part of speech (array)
"formality": "neutral", // REQUIRED: formal/neutral/informal/vulgar
"politeness": "plain", // REQUIRED: honorific/humble/polite/plain
"semantic": ["food"] // REQUIRED: Semantic category (array)
},
"created": "...",
"modified": "..."
}
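The four required tags can be sanity-checked with a few lines of Python. The sketch assumes the tags object has been loaded as a dict and uses the formality and politeness values listed in the comments above; it is not the validator in validate.py, the function name is hypothetical, and it does not check pos or semantic values against their allowed lists.

```python
FORMALITY = {"formal", "neutral", "informal", "vulgar"}
POLITENESS = {"honorific", "humble", "polite", "plain"}


def tag_problems(tags: dict) -> list:
    """Return a list of human-readable problems with the required tags."""
    problems = []
    # pos and semantic are arrays and must not be empty
    for key in ("pos", "semantic"):
        value = tags.get(key)
        if not isinstance(value, list) or not value:
            problems.append(f"{key} must be a non-empty array")
    # formality and politeness are single string values
    if tags.get("formality") not in FORMALITY:
        problems.append("formality must be one of " + ", ".join(sorted(FORMALITY)))
    if tags.get("politeness") not in POLITENESS:
        problems.append("politeness must be one of " + ", ".join(sorted(POLITENESS)))
    return problems
```

An empty list means the required tags are present and well-formed.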
pos)
Valid values: noun, verb-godan, verb-ichidan, verb-suru, verb-kuru, verb-irregular, adjective-i, adjective-na, adjective-no, adjective-taru, adverb, particle, conjunction, interjection, pronoun, counter, prefix, suffix, expression, pre-noun-adjectival, number, onomatopoeia, auxiliary
["noun", "verb-suru"]
formal: Used in formal/written contexts (敬語, 硬い表現)
neutral: Standard usage appropriate for most contexts (default)
informal: Casual/colloquial usage (くだけた表現)
vulgar: Strong/offensive language (use sparingly)
honorific: 尊敬語 - Elevates the subject (いらっしゃる, おっしゃる)
humble: 謙譲語 - Lowers the speaker (申す, 参る)
polite: 丁寧語 - General polite forms (です/ます base forms)
plain: 普通体 - Plain/dictionary forms (default for most entries)
Choose the most appropriate category(ies) for the word's meaning:
Specific categories (use when applicable):
time-day-of-week, time-month, time-season, time-period, time-general
animal-mammal, animal-bird, animal-fish, animal-insect, animal-general, plant-tree, plant-flower, plant-general, weather, geography
body-part, body-internal, family, person, occupation
food, clothing, building, transportation, tool, furniture, electronics
emotion, color, number, direction, size, quantity
movement, communication, cognition, existence, consumption
greeting, education, work, leisure
Fallback categories (when no specific category fits):
general: For nouns without a specific semantic category
action: For verbs not fitting other action categories
descriptive: For adjectives and adverbs
grammatical: For particles and conjunctions
expression: For fixed expressions and interjections
onomatopoeia: For mimetic words
"tags": {
// ... required tags above ...
"transitivity": "transitive", // For verbs: transitive/intransitive/both
"style": ["spoken"], // written/spoken/literary/archaic/slang
"domain": ["business"] // business/academic/technical/legal/medical/etc.
}
transitivity: Required for verbs - indicates if verb takes a direct object
style: Use when word is strongly associated with a medium
domain: Use when word is specialized/technical
food not general for 寿司
["food", "time-period"]
Before finalizing any entry, verify:
python3 build/get_entry_path.py <reading> <entry_id>)
python3 build/verify_furigana.py <entry_id> shows "✓ OK"
vocabulary-notes skill)
python3 build/validate.py to catch any directory or other errors