Create meaning-based alignments between Hebrew source and English UST text. Handles radical restructuring and implied information. Use when asked to align UST or produce aligned UST USFM.
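This skill maps English UST phrases to Hebrew source words. The workflow is two-step:
1. Create a mapping JSON that links Hebrew word indices to English phrases.
2. Convert that JSON to aligned USFM with the conversion script.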
UST alignment is fundamentally different from ULT alignment. Where ULT aims for word-level precision, UST shows which Hebrew word(s) contribute meaning to each English phrase. The mapping is concept-to-phrase, not word-to-word.
You need:
- Hebrew source: data/hebrew_bible/*.usfm - contains Strong's numbers, lemmas, morphology
- UST text: if --ust <path> is provided, use that file. Otherwise look in output/AI-UST/ or use user-provided text. When the orchestrator provides an explicit path, that is the authoritative source; do not fetch from Door43.
- --verses START-END (optional): process only this verse range (e.g. --verses 12-22). When provided:
  - Output filename: BOOK-CH-vSTART-vEND-aligned.usfm (e.g. LAM-04-v12-v22-aligned.usfm)
  - Output contains the header (\id, \usfm, \ide, \h, \mt, \c) followed by only the verses in range

If output/AI-UST/hints/<BOOK>-<CH>.json exists, read it first. These hints are a rough mapping from the UST generator showing which Hebrew words contributed to each English phrase. The generator wrote them while the translation decisions were fresh, so they capture the "why" behind each phrase.
Use hints as your starting point:
- Entries with "implied": true correspond to bracketed content with hebrew_indices: []

The hints give you the translator's intent. You provide the precision.
Same JSON structure as ULT alignment:
{
"reference": "PSA 1:1",
"hebrew_words": [
{"index": 0, "word": "אַ֥שְֽׁרֵי", "strong": "H0835", "lemma": "אֶשֶׁר"},
{"index": 1, "word": "הָאִ֗ישׁ", "strong": "d:H0376", "lemma": "אִישׁ"},
{"index": 2, "word": "אֲשֶׁ֤ר", "strong": "H0834a", "lemma": "אֲשֶׁר"},
{"index": 3, "word": "לֹ֥א", "strong": "H3808", "lemma": "לֹא"},
{"index": 4, "word": "הָלַךְ֮", "strong": "H1980", "lemma": "הָלַךְ"},
{"index": 5, "word": "בַּעֲצַ֪ת", "strong": "b:H6098", "lemma": "עֵצָה"},
{"index": 6, "word": "רְשָׁ֫עִ֥ים", "strong": "H7563", "lemma": "רָשָׁע"}
],
"english_text": "The person who will have a truly good life is the person who does not do what evil people tell him to do,",
"alignments": [
{"hebrew_indices": [0], "english": ["The", "person", "who", "will", "have", "a", "truly", "good", "life"]},
{"hebrew_indices": [1], "english": ["is", "the", "person"]},
{"hebrew_indices": [2], "english": ["who"]},
{"hebrew_indices": [3, 4, 5, 6], "english": ["does", "not", "do", "what", "evil", "people", "tell", "him", "to", "do,"]}
]
}
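As a quick sanity check on this structure, the sketch below confirms that every value in hebrew_indices points at an entry in hebrew_words. The function name check_index_range and the toy data are hypothetical; this is not the workspace validator, only an illustration of the 0-based indexing.

```python
def check_index_range(mapping):
    """Return a list of problems with Hebrew indices in a mapping dict."""
    valid = {w["index"] for w in mapping["hebrew_words"]}
    problems = []
    for n, entry in enumerate(mapping["alignments"]):
        for idx in entry["hebrew_indices"]:
            if idx not in valid:
                problems.append(
                    f"alignment {n}: Hebrew index {idx} not in hebrew_words"
                )
    return problems


# Toy mapping (hypothetical data, shaped like the example above).
mapping = {
    "hebrew_words": [{"index": 0}, {"index": 1}],
    "alignments": [
        {"hebrew_indices": [0], "english": ["a"]},
        {"hebrew_indices": [1, 2], "english": ["b"]},  # 2 is out of range
        {"hebrew_indices": [], "english": ["{c}"]},    # empty list is allowed
    ],
}
problems = check_index_range(mapping)
print(problems)
```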
| Aspect | ULT | UST |
|---|---|---|
| English arrays | 1-3 words typical | 5-15 words common |
| Hebrew indices | Usually 1 per entry | Often 3-5 per entry |
| Split alignment | Rare | Common (same Hebrew index in multiple entries) |
| {brackets} | Grammar additions | Implied information |
| hebrew_indices: [] | Not used | Used for purely implied info |
| Alignment granularity | Word-level | Phrase-level |
| Field | Description |
|---|---|
| reference | Book chapter:verse (e.g., "PSA 1:1") |
| hebrew_words | Array of Hebrew words with index, word form, Strong's, lemma |
| english_text | Complete English UST translation - authoritative word order |
| alignments | Array mapping Hebrew indices to English word arrays |
| d_text | (Optional) Superscription text for \d line |
- hebrew_indices: Array of Hebrew word indices (0-based)
  - [3, 4, 5, 6] for meaning groups (very common in UST)
  - [] for purely implied information with no Hebrew correspondent
- english: Array of English words that render this Hebrew meaning
  - {word} for implied information
- section: (Optional) Set to "d" for superscription entries

There are two cases depending on how the UST source structures the superscription:
When the UST source has the superscription inside \v 1 (with an empty \d marker):
\d
\v 1 A song of ascents.
\q1 Yahweh, remember David
Do NOT use d_text or section: "d". Include all words (superscription + body) in english_text and use normal alignment entries:
{
"reference": "PSA 132:1",
"hebrew_words": [
{"index": 0, "word": "...", "strong": "H7892a", "lemma": "שִׁיר"},
{"index": 1, "word": "...", "strong": "d:H4609b", "lemma": "מַעֲלָה"},
{"index": 2, "word": "...", "strong": "H2142", "lemma": "זָכַר"},
{"index": 3, "word": "...", "strong": "H3068", "lemma": "יְהֹוָה"}
],
"english_text": "A song of ascents. Yahweh, remember David and all of the difficulties that he had.",
"alignments": [
{"hebrew_indices": [0], "english": ["A", "song"]},
{"hebrew_indices": [1], "english": ["of", "ascents."]},
{"hebrew_indices": [2], "english": ["remember"]},
{"hebrew_indices": [3], "english": ["Yahweh"]}
]
}
The script picks up the empty \d and \q1/\q2 markers from the UST source file automatically.
Superscription on the \d line, separate from verse 1 (d_text and section)

When the UST source has the superscription text on the \d line (Hebrew v1 is only superscription, body starts at Hebrew v2 mapped to English v1):
\d This is for the chief musician. It is a psalm of David.
\v 1 All the people on the earth, shout joyfully to God!
Use the d_text field and "section": "d" on relevant alignment entries:
{
"reference": "PSA 66:1",
"hebrew_words": [
{"index": 0, "word": "...", "strong": "l:H5329", "lemma": "נָצַח"},
{"index": 1, "word": "...", "strong": "H7892a", "lemma": "שִׁיר"},
{"index": 2, "word": "...", "strong": "H4210", "lemma": "מִזְמוֹר"},
{"index": 3, "word": "...", "strong": "H7321", "lemma": "רוּעַ"},
{"index": 4, "word": "...", "strong": "l:H0430", "lemma": "אֱלֹהִים"},
{"index": 5, "word": "...", "strong": "H3605", "lemma": "כֹּל"},
{"index": 6, "word": "...", "strong": "d:H0776", "lemma": "אֶרֶץ"}
],
"d_text": "This is for the chief musician. It is a psalm of David.",
"english_text": "All the people on the earth, shout joyfully to God!",
"alignments": [
{"hebrew_indices": [0], "english": ["for", "the", "chief", "musician."], "section": "d"},
{"hebrew_indices": [1], "english": ["a"], "section": "d"},
{"hebrew_indices": [2], "english": ["a", "psalm", "of", "David."], "section": "d"},
{"hebrew_indices": [3], "english": ["shout", "joyfully"]},
{"hebrew_indices": [4], "english": ["to", "God"]},
{"hebrew_indices": [5], "english": ["All", "the"]},
{"hebrew_indices": [6], "english": ["people", "on", "the", "earth"]}
]
}
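A minimal sketch of how the "section": "d" marker separates superscription entries from verse-body entries. The helper name split_by_section is hypothetical, and treating a missing d_text as an error is an assumption based on the pairing described above, not confirmed behavior of the conversion script.

```python
def split_by_section(mapping):
    """Separate superscription (d-line) entries from verse-body entries."""
    d_entries = [a for a in mapping["alignments"] if a.get("section") == "d"]
    body_entries = [a for a in mapping["alignments"] if a.get("section") != "d"]
    # Assumption: entries tagged "d" only make sense when d_text is present.
    if d_entries and not mapping.get("d_text"):
        raise ValueError('entries use "section": "d" but d_text is missing')
    return d_entries, body_entries


# Toy data shaped like the PSA 66:1 example (abbreviated).
mapping = {
    "d_text": "This is for the chief musician.",
    "english_text": "shout joyfully to God",
    "alignments": [
        {"hebrew_indices": [0], "english": ["for", "the", "chief", "musician."], "section": "d"},
        {"hebrew_indices": [3], "english": ["shout", "joyfully"]},
        {"hebrew_indices": [4], "english": ["to", "God"]},
    ],
}
d_entries, body_entries = split_by_section(mapping)
print(len(d_entries), len(body_entries))
```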
english_text Controls Output Word Order

The english_text field is authoritative for the final output word order, just as with ULT. Alignment array order doesn't matter.
Brackets in UST mark implied information -- content not directly present in the Hebrew but added for clarity. There are two categories:
- Implied info connected to Hebrew: {word} + hebrew_indices: [N]
  - {when Yahweh} linked to the Hebrew word for "judgment"
- Purely implied info: {word} + hebrew_indices: []
  - {It seems like} with no Hebrew source

Brackets are pre-determined by the UST-gen skill. The alignment skill preserves them; it does not add or remove brackets.
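The two bracket categories can be told apart mechanically. The sketch below is illustrative only: classify_entry is a hypothetical helper, and it assumes the brace characters stay attached to the tokens in the english array (for example "{It", "seems", "like}"), which the source does not spell out.

```python
def classify_entry(entry):
    """Label one alignment entry by bracket use and Hebrew linkage."""
    has_braces = any("{" in w or "}" in w for w in entry["english"])
    if not entry["hebrew_indices"]:
        # [] is reserved for purely implied (bracketed) content.
        return "purely implied" if has_braces else "check: [] without brackets"
    return "implied, linked to Hebrew" if has_braces else "direct"


print(classify_entry({"hebrew_indices": [4], "english": ["{when", "Yahweh}"]}))
print(classify_entry({"hebrew_indices": [], "english": ["{It", "seems", "like}"]}))
```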
Group Hebrew words that collectively produce a single English phrase. "Smaller" in UST context means phrase-level, not word-level.
Typical UST alignment:
{"hebrew_indices": [3, 4, 5, 6], "english": ["does", "not", "do", "what", "evil", "people", "tell", "him", "to", "do,"]}
This groups a negation + verb + preposition-noun + adjective because the UST restructures them into a single clause.
Only merge Hebrew words when the meaning genuinely cannot be divided:
Good -- separated when meaning allows:
{"hebrew_indices": [1], "english": ["wicked", "people"]},
{"hebrew_indices": [2], "english": ["are", "like"]}
Avoid -- unnecessarily large groups:
{"hebrew_indices": [1, 2, 3, 4], "english": ["wicked", "people", "are", "like", "chaff", "..."]}
Use the same Hebrew index in multiple entries when its meaning maps to non-contiguous English. This is very common in UST:
{"hebrew_indices": [5], "english": ["in", "what"]},
{"hebrew_indices": [6], "english": ["Yahweh"]},
{"hebrew_indices": [5], "english": ["teaches"]}
Here Hebrew בְּתוֹרַ֥ת (in-law-of) splits to "in what...teaches" because UST restructures "in the law of Yahweh" to "in what Yahweh teaches."
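To review split alignments in a draft, a small helper can list every Hebrew index that appears in more than one entry together with its English pieces. find_split_indices is a hypothetical name used for illustration.

```python
from collections import defaultdict


def find_split_indices(alignments):
    """Map each Hebrew index used in more than one entry to its English pieces."""
    pieces = defaultdict(list)
    for entry in alignments:
        for idx in entry["hebrew_indices"]:
            pieces[idx].append(" ".join(entry["english"]))
    return {idx: parts for idx, parts in pieces.items() if len(parts) > 1}


alignments = [
    {"hebrew_indices": [5], "english": ["in", "what"]},
    {"hebrew_indices": [6], "english": ["Yahweh"]},
    {"hebrew_indices": [5], "english": ["teaches"]},
]
print(find_split_indices(alignments))  # {5: ['in what', 'teaches']}
```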
Brackets from UST-gen are preserved:
- {word} + hebrew_indices: [N] = implied info connected to specific Hebrew
- {word} + hebrew_indices: [] = purely implied, no Hebrew correspondent

Not every Hebrew index must appear. Some Hebrew words may be left unaligned, and the validation script in --ust mode allows unaligned Hebrew indices.
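Even though unaligned Hebrew is allowed, it helps to see which words were skipped so that each omission is deliberate. A minimal sketch, with unaligned_indices as a hypothetical helper name:

```python
def unaligned_indices(mapping):
    """Return the Hebrew indices that no alignment entry references."""
    used = {i for a in mapping["alignments"] for i in a["hebrew_indices"]}
    return sorted(
        w["index"] for w in mapping["hebrew_words"] if w["index"] not in used
    )


# Toy data: index 2 is never referenced.
mapping = {
    "hebrew_words": [{"index": 0}, {"index": 1}, {"index": 2}],
    "alignments": [
        {"hebrew_indices": [0, 1], "english": ["some", "phrase"]},
        {"hebrew_indices": [], "english": ["{implied}"]},
    ],
}
print(unaligned_indices(mapping))  # [2]
```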
Read the Hebrew USFM for the verse:
grep -A 20 "\\\\v 1$" data/hebrew_bible/19-PSA.usfm | head -20
Parse each \w tag to extract word form, Strong's number, lemma, and morphology.
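A rough sketch of that parsing step follows. It assumes the Hebrew USFM writes each word as \w word|lemma="..." strong="..." x-morph="..."\w*; the helper name parse_w_tags and the morphology codes in the sample are illustrative placeholders, so check them against the real file before relying on this.

```python
import re

# \w <word>|<attributes>\w*  (attribute layout is an assumption, see above)
W_TAG = re.compile(r'\\w\s+([^|\\]+)\|([^\\]*)\\w\*')
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def parse_w_tags(verse_text):
    """Extract word form, Strong's number, lemma and morphology per \\w tag."""
    words = []
    for index, match in enumerate(W_TAG.finditer(verse_text)):
        attrs = dict(ATTR.findall(match.group(2)))
        words.append({
            "index": index,  # 0-based, as used in hebrew_indices
            "word": match.group(1).strip(),
            "strong": attrs.get("strong", ""),
            "lemma": attrs.get("lemma", ""),
            "morph": attrs.get("x-morph", ""),
        })
    return words


sample = (
    '\\v 1 \\w אַ֥שְֽׁרֵי|lemma="אֶשֶׁר" strong="H0835" x-morph="He,Ncmpc"\\w* '
    '\\w הָאִ֗ישׁ|lemma="אִישׁ" strong="d:H0376" x-morph="He,Td:Ncmsa"\\w*'
)
words = parse_w_tags(sample)
for w in words:
    print(w["index"], w["strong"], w["lemma"])
```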
Read the UST text, noting any {bracketed} content. Each bracketed word will need hebrew_indices (either a specific index or []).
Map proper nouns first -- these are direct 1:1 mappings.
Find the English verb or clause that renders each Hebrew verb. In UST this often involves significant restructuring.
Find English noun phrases that render Hebrew nouns. In UST, a single Hebrew noun may expand to a full clause.
Prepositions, particles, conjunctions. These often get absorbed into larger phrase groups.
Remaining {bracketed} English words that don't map to any Hebrew get hebrew_indices: [].
Every English word must appear in exactly one alignment entry. Run validation:
Use mcp__workspace-tools__validate_alignment_json with files set to the array of alignment JSON paths and ust=true.
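Before calling the validator, a local pre-check can catch missed or doubled words early. The sketch below is not the validator itself: coverage_problems is a hypothetical helper that compares whitespace-separated tokens of english_text with the words in non-superscription entries, and it assumes punctuation and braces are written identically in both places.

```python
from collections import Counter


def coverage_problems(mapping):
    """Return (missing, extra) word counts for the verse body."""
    expected = Counter(mapping["english_text"].split())
    aligned = Counter(
        word
        for entry in mapping["alignments"]
        if entry.get("section") != "d"  # d_text words are not in english_text
        for word in entry["english"]
    )
    missing = dict(expected - aligned)  # in the text, not aligned often enough
    extra = dict(aligned - expected)    # aligned, but absent or used too often
    return missing, extra


mapping = {
    "english_text": "wicked people are like chaff",
    "alignments": [
        {"hebrew_indices": [1], "english": ["wicked", "people"]},
        {"hebrew_indices": [2], "english": ["are", "like"]},
    ],
}
missing, extra = coverage_problems(mapping)
print(missing, extra)  # {'chaff': 1} {}
```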
Write to scratchpad or output directory:
/tmp/claude-*/scratchpad/alignments/PSA-001-001.json
After creating the mapping JSON, convert it to aligned USFM using the conversion tool. This step is mandatory -- never write aligned USFM directly, as manual occurrence counting is error-prone.
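To illustrate why hand counting goes wrong: aligned USFM records, for each word, which occurrence it is within the verse (the x-occurrence and x-occurrences attributes). The sketch below computes those numbers for whitespace tokens. occurrence_table is a hypothetical helper, and the real conversion tool may tokenize differently (for example by separating punctuation), so treat the counts as illustrative.

```python
from collections import Counter


def occurrence_table(english_text):
    """Return (token, occurrence, occurrences) for each token in order."""
    tokens = english_text.split()
    totals = Counter(tokens)
    seen = Counter()
    table = []
    for token in tokens:
        seen[token] += 1
        table.append((token, seen[token], totals[token]))
    return table


table = occurrence_table("the person who is the person")
for row in table:
    print(row)
```

In the PSA 1:1 example, "person" and "who" each occur twice, so every alignment entry that uses them has to carry the right occurrence number. That is the bookkeeping the conversion tool handles for you.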
Critical: source MUST point to the UST file (output/AI-UST/BOOK/...). Never pass the ULT file here. If you pass the ULT file, the output will silently contain ULT English text and look structurally valid, so the error will not be obvious.
Option A -- MCP tool (preferred, works without Bash):
mcp__workspace-tools__create_aligned_usfm({
hebrew: "data/hebrew_bible/19-PSA.usfm",
mapping: "/tmp/alignments/PSA-001-001.json",
source: "output/AI-UST/PSA/PSA-001.usfm",
ust: true,
chapter: 1, verse: 1
})
Option B -- Bash (when available):
node .claude/skills/utilities/scripts/usfm/create_aligned_usfm.js \
--hebrew data/hebrew_bible/19-PSA.usfm \
--mapping /tmp/alignments/PSA-001-001.json \
--source output/AI-UST/PSA/PSA-001.usfm \
--ust \
--chapter 1 --verse 1
The --ust flag tells the script to:
- Place brackets around milestone groups (not inside \w tags)
- Use EN_UST in the header id tag

In UST mode, brackets wrap milestone groups:
{\zaln-s |x-strong="H1234" ...\*\w when\w*
\w Yahweh\w*}\w judges\w*\zaln-e\*
vs ULT mode where brackets go inside \w tags:
\w {when}|...\w*
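A quick way to spot ULT-style bracket placement in a UST output is to search for braces between \w and the following pipe. This is a sketch under the assumption that aligned \w tags carry attributes after a pipe, as in the ULT example above; ult_style_brackets is a hypothetical helper, not part of the conversion script.

```python
import re

# A brace between "\w " and the next "|" means the bracket sits inside the tag.
BRACE_INSIDE_W = re.compile(r'\\w\s+[^|\\]*[{}][^|\\]*\|')


def ult_style_brackets(usfm_text):
    """Return the \\w fragments that contain a brace (ULT-style placement)."""
    return [m.group(0) for m in BRACE_INSIDE_W.finditer(usfm_text)]


ust_line = '{\\zaln-s |x-strong="H1234"\\*\\w when|x-occurrence="1" x-occurrences="1"\\w*\\zaln-e\\*}'
ult_line = '\\w {when}|x-occurrence="1" x-occurrences="1"\\w*'
print(ult_style_brackets(ust_line))
print(ult_style_brackets(ult_line))
```

An empty list for a UST file is the expected result; any hit suggests the conversion ran without --ust.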
# Create header
cat > output/AI-UST/PSA/PSA-001-aligned.usfm << 'EOF'
\id PSA EN_UST - Aligned
\usfm 3.0
\ide UTF-8
\h Psalms
\mt Psalms
\c 1
EOF
# Append each verse
for v in $(seq 1 6); do
  vpad=$(printf "%03d" "$v")
  node .claude/skills/utilities/scripts/usfm/create_aligned_usfm.js \
    --hebrew data/hebrew_bible/19-PSA.usfm \
    --mapping "alignments/PSA-001-${vpad}.json" \
    --source output/AI-UST/PSA/PSA-001.usfm \
    --ust \
    --chapter 1 --verse "$v" 2>/dev/null \
    | sed -n '/^\\[vqdsb]/,/^$/p' >> output/AI-UST/PSA/PSA-001-aligned.usfm
done
output/AI-UST/{BOOK}/{BOOK}-{CHAPTER}-aligned.usfm # whole chapter
output/AI-UST/{BOOK}/{BOOK}-{CHAPTER}-v{START}-v{END}-aligned.usfm # partial chapter (e.g. LAM-04-v12-v22-aligned.usfm)
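A small sketch of building these paths consistently. aligned_output_path is a hypothetical helper; the partial-chapter name follows the LAM-04-v12-v22-aligned.usfm example given for --verses, and the chapter padding (three digits for PSA, two otherwise) is an assumption inferred from the PSA-001 and LAM-04 examples in this document.

```python
def aligned_output_path(book, chapter, start=None, end=None, root="output/AI-UST"):
    """Build the aligned USFM path for a whole or partial chapter."""
    width = 3 if book == "PSA" else 2  # assumption: inferred from the examples
    name = f"{book}-{str(chapter).zfill(width)}"
    if start is not None and end is not None:
        name += f"-v{start}-v{end}"
    return f"{root}/{book}/{name}-aligned.usfm"


print(aligned_output_path("PSA", 1))
print(aligned_output_path("LAM", 4, 12, 22))
```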
After creating the final aligned USFM, normalize quotes:
mcp__workspace-tools__curly_quotes({
input: "output/AI-UST/{BOOK}/{BOOK}-{CHAPTER}-aligned.usfm",
inPlace: true
})
Before anything else, confirm the aligned UST contains different English than the aligned ULT. If these are the same, the alignment was run against the ULT source — discard and redo.
# Extract English words from both aligned files and compare
BOOK=HOS; CH=01
diff \
<(grep -oE '\\w [^|]+\|' output/AI-UST/$BOOK/$BOOK-$CH-aligned.usfm | sed 's/\\w //;s/|//') \
<(grep -oE '\\w [^|]+\|' output/AI-ULT/$BOOK/$BOOK-$CH-aligned.usfm | sed 's/\\w //;s/|//') \
> /dev/null && echo "ERROR: UST and ULT aligned text are identical — wrong --source file was used" || echo "OK: UST and ULT text differ as expected"
Use mcp__workspace-tools__extract_ult_english with inputDir="output/AI-UST", outputDir="/tmp/verify-alignment", force=true.
Then compare extracted text with original unaligned UST: diff /tmp/verify-alignment/BOOK.usfm against output/AI-UST/BOOK/BOOK-unaligned.usfm.
In UST mode, brackets should appear OUTSIDE milestones:
{\zaln-s ...\*\w word\w*...\zaln-e\*} # correct (UST)
\zaln-s ...\*{word}\zaln-e\* # wrong (ULT style)
Quick check:
grep -oE '\\w [^|]+\|' aligned.usfm | sed 's/\\w //;s/|//' | tr '\n' ' '
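The same extraction can be done in Python when a word-by-word comparison against the unaligned text is more convenient than reading a diff. This is a sketch only: aligned_english and plain_words are hypothetical helpers, and the punctuation set stripped from the unaligned text is an assumption about what sits outside \w tags.

```python
import re

W_WORD = re.compile(r'\\w\s+([^|\\]+)\|')


def aligned_english(usfm_text):
    """Return the English words found inside \\w tags, in file order."""
    return [m.group(1).strip() for m in W_WORD.finditer(usfm_text)]


def plain_words(text):
    """Split unaligned text on whitespace and strip surrounding punctuation."""
    stripped = (t.strip('.,;:!?"“”‘’(){}') for t in text.split())
    return [t for t in stripped if t]


aligned = (
    '\\zaln-s |x-strong="H3068"\\*\\w Yahweh|x-occurrence="1" x-occurrences="1"\\w*\\zaln-e\\*, '
    '\\zaln-s |x-strong="H2142"\\*\\w remember|x-occurrence="1" x-occurrences="1"\\w*\\zaln-e\\*'
)
words = aligned_english(aligned)
print(words == plain_words("Yahweh, remember"))
```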
Before finalizing alignment JSON:
- hebrew_indices: [] used only for purely implied info (no Hebrew source)
- english_text contains the exact English UST text with correct word order

After validation, run Gemini as an independent reviewer. Only run if --gemini is explicitly passed. Skip by default.
python3 .claude/skills/utilities/scripts/gemini_review.py --stage alignment-ust --book <BOOK> --chapter <CHAPTER>
Output: output/review/<BOOK>/<BOOK>-<CH>-alignment-ust-gemini.md

See reference/ust_alignment_rules.md for detailed alignment rules and pattern catalog.