Name: Pos Disambiguator
Author: DigitalHumanitiesCraft

Pos Disambiguator | Skills Pool

#	Error	Wrong Tag	Correct Tag	Rule Reference
1	niht, nit, nich, ne, en etc. tagged as pronoun	PRO	NEG (always!)	See "MHG Negation Patterns" below
2	sant before proper names tagged as adjective	ADJ	NAM	See "sant: Always NAM" below
3	Deictic daz (pointing to prior content) tagged as pronoun	PRO	DET	See "DET vs PRO vs SCNJ" below
4	kein/dekein/dehein before noun tagged as pronoun	PRO	DET	See "kein, dekein, dehein" below
5	wâr in vür wâr tagged as noun	NOM	ADV	See "Fixed Phrases" below

Indicator	Action
Non-normalized spelling	Slow down, verify context
Complex syntax (hypotaxis)	Analyze full clause structure
Literary/poetic texts	Consider stylistic variations
Religious/philosophical texts	Check specialized vocabulary
Fragmentary context	Assign best guess with `confidence='low'`

Tag	Name	Examples
NOM	Nomen (Noun)	acker, zît, minne
NAM	Name (Proper noun)	Uolrîch, Wiene, Rhîn, sant (before names)
ADJ	Adjektiv (Adjective)	grôz, schoene, guot, wâr
ADV	Adverb	schone, vil, sêre, gar, als (komparativ), wie (komparativ)
DET	Determinante (Determiner)	der, diu, daz, ein, eine, diser, jener, kein, dekein, dehein
POS	Possessivpronomen	mîn, dîn, unser
PRO	Pronomen (Pronoun)	ich, ez, wir, Relativpronomen, swer (indefinit)
PRP	Präposition (Preposition)	ûf, zuo, under, durch
NEG	Negation	nie, niht, nit, nich, nieht, niet, niut, nyt, ne, en, âne
NUM	Numeral	zwô, drî, zweinzegest
CNJ	Konjunktion (general)	danne (additiv: er sanc, danne si spilten)
SCNJ	Subordinierende Konj.	daz (clause), ob, swenne, sît, als (temporal), wie (subordinierend)
CCNJ	Koordinierende Konj.	und, oder, aber, ouch, noch
IPA	Interrogativpartikel	wie (interrogativ), war (wohin?), swer (interrogativ)
VRB	Verb (Full verb)	liuhten, varn, machen, haben/sîn/werden (lexikalisch)
VEX	Hilfsverb (Auxiliary)	haben/sîn/werden (mit Partizip II)
VEM	Modalverb (Modal verb)	müezen, suln, kunnen
INJ	Interjektion	ahî, owê
DIG	Zahl (Roman numeral)	IX, XVII, III

Word	Tag	Reasoning
ne / en / n	NEG	Negation particle (often proclitic on verb)
niht	NEG	Negation particle (sentence negation) - NEVER PRO!
nit, nich, nieht	NEG	Variant spellings - NEVER PRO!
vil	ADV	Intensifier, remains adverbial even in negation context
ensanc	VRB	Full verb (the en- is fused NEG, but verb stays VRB)

Context	Tag	Example
Temporal/causal subordination	SCNJ	als er kam (when he came)
Comparative (Vergleichspartikel)	ADV	grœzer als ein man (larger than a man)
Subordinating comparison	SCNJ	als ob er slâfe (as if he slept)
Direct question	IPA	wie tuost du daz? (how do you do that?)
Comparative (Vergleichspartikel)	ADV	schoener wie er (more beautiful than he)
Subordinating (indirect)	SCNJ	ich weiz wie er daz tet (I know how he did that)
Ambiguous/unclear	CNJ	fallback when context insufficient

Meaning	Tag	Example
"wohin" (interrogative)	IPA	war gât er? (where is he going?)
"wahr" (true)	ADJ	diu war rede (the true speech)
"woher/wo" (locative)	ADV	war kom er her? (where did he come from?)
Form of sîn/wesen (full verb)	VRB	er war dort (he was there)
Form of sîn/wesen (auxiliary)	VEX	er war komen (he had come)

xml_id | old_pos → new_pos | confidence | reason

xml_id | old_pos → new_pos | confidence | reason | reason="value"

ABS_11010_0 | PRO VEM → VEM | high | modal verb wilt in contraction
ABS_11010_1 | DET NUM → DET | high | indefinite article before noun
ABS_12010_15 | VRB VEX → VEX | high | auxiliary haben with participle gesehen
ABS_11020_7 | PRP CNJ → PRP | high | preposition ze governing noun

ABS_14040_5 | PRO VRB → VRB PRO | high | enclitic contraction | reason="färbe+ez"

ABS_11010_7 |  → DET | high | indefinite article ainen

ABS_15030_2 | ADJ → NOM | high | substantivized adjective, no following noun

Compound	Resolution	Reasoning
`DET NUM`	Usually `DET`	ein as indefinite article, not numeral
`ADJ ADV`	Context	Modifies noun → ADJ; modifies verb → ADV
`NOM ADJ`	Context	Substantivized → NOM; attributive → ADJ
`DET CNJ`	Context	daz is either determiner OR conjunction, not both
`DET PRO`	Context	Attribuierend → DET; substituierend → PRO
`VRB VEX`	Context	With Partizip II → VEX; lexical meaning → VRB
`ADV NEG`	Usually `NEG`	niht, nie negating → NEG

Pattern	Tag	Example
With Partizip II (Perfect)	VEX	hât gesehen, ist komen
With Partizip II (Passive)	VEX	wirt geslagen
Copula + NP/ADJ (no Partizip)	VRB	ist guot, ist ein man
Possession/lexical meaning	VRB	hân ein hûs
Main action verb	VRB	er sach
After modal	VRB	mac sehen

Context	Analysis	Tag
daz kumet von abegescheidenheit	Points to prior content, main clause verb	DET
unum est necessarium, daz ist...	Points to Latin quote, main clause	DET
daz ist wâr	Points to prior statement	DET

python --version          # Verify Python 3.13+
pip install lxml          # Install if needed

Text Type	Difficulty	Processing Strategy
Cookbooks, practical texts	LOW	Standard processing
Early NHG tendency, normalized	LOW	Standard processing
Literary prose	MEDIUM	Check more context
Religious/philosophical	HIGH	Slow, careful analysis
Complex poetry (Minnesang)	HIGH	Full clause analysis
Non-normalized, archaic MHG	VERY HIGH	Maximum scrutiny, but ALWAYS assign a tag (use 'low' confidence if unsure)

# Dry run first to see what would be fixed
python .gemini/skills/pos-disambiguator/scripts/find-and-fix-malformed-results.py temp/disambiguation --dry-run

# Apply fixes
python .gemini/skills/pos-disambiguator/scripts/find-and-fix-malformed-results.py temp/disambiguation

python .gemini/skills/pos-disambiguator/scripts/merge-pos-validation-results.py temp/disambiguation {SIGLE} tei/{SIGLE}.xml

python .gemini/skills/pos-disambiguator/scripts/validate-disambiguation.py

Detect Missing Decisions: Run the detection script to identify which chunks have unresolved items (skipped decisions):
```
python .gemini/skills/pos-disambiguator/scripts/find-missing-decisions.py temp/disambiguation {SIGLE}
```
This will list chunks sorted by the number of missing decisions.
Batch Fix (Top Offenders): Prioritize the chunks with the highest missing counts. For each target chunk:
- Prepare Fix Task: Run the preparation script to extract the context and the specific missing items:
```
python .gemini/skills/pos-disambiguator/scripts/prepare-fix-task.py temp/disambiguation/{SIGLE}-chunk-{NUM}.md
```
- Generate Fix: Use the output to create a FIX file {SIGLE}-chunk-{NUM}-result_FIX-01.md containing ALL missing decisions.
- Format: Same as standard results (xml_id | old_pos → new_pos | confidence | reason).

Re-Merge:

python .gemini/skills/pos-disambiguator/scripts/merge-pos-validation-results.py temp/disambiguation {SIGLE} tei/{SIGLE}.xml

The script uses "Last-Write-Wins", so your new FIX files will automatically overwrite missing or incorrect entries.

python .gemini/skills/pos-disambiguator/scripts/split-tei-for-pos-validation.py tei/{SIGLE}.xml

python .gemini/skills/pos-disambiguator/scripts/find-and-fix-malformed-results.py temp/disambiguation --dry-run
python .gemini/skills/pos-disambiguator/scripts/find-and-fix-malformed-results.py temp/disambiguation

python .gemini/skills/pos-disambiguator/scripts/merge-pos-validation-results.py temp/disambiguation {SIGLE} tei/{SIGLE}.xml

python .gemini/skills/pos-disambiguator/scripts/validate-disambiguation.py

python .gemini/skills/pos-disambiguator/scripts/find-missing-decisions.py temp/disambiguation {SIGLE}

python .gemini/skills/pos-disambiguator/scripts/prepare-fix-task.py temp/disambiguation/{SIGLE}-chunk-{NUM}.md

✓ {SIGLE}.tei COMPLETE
  - Chunks processed: X/X
  - Words validated: N
  - Changes made: M
  - Refinement iterations: N/3
  - Validation: CLEAN

⚠️ {SIGLE}.tei INCOMPLETE (after 3 refinement attempts)
  - Remaining errors: X compound tags, Y empty tags
  - Failure report: temp/disambiguation/{SIGLE}-FAILURE-REPORT.md

Function	Tag	Examples
Attribuierend (modifies noun)	DET	der man, diu frouwe, ein hûs, diser tac
Substituierend (replaces noun)	PRO	der (= he/that one), daz (= that), swer (whoever)

Sequence	Tags	Note
sant Paulus	NAM + NAM	Onomastic unit
sant Johans	NAM + NAM	Onomastic unit
sant Marîe	NAM + NAM	Onomastic unit

Phrase	Meaning	Tag for adjective
vür wâr	"truly, verily"	wâr = ADV
ze wâre	"truly"	wâre = ADV

Pattern	Tag
DET + X + noun	ADJ (attributive)
DET + X (no noun)	NOM (substantivized)
After copula	ADJ (predicative)

Pos Disambiguator

Middle High German PoS Disambiguator Workflow

Your Primary Goal: Semantic Analysis

Pos Disambiguator

Middle High German PoS Disambiguator Workflow

Your Primary Goal: Semantic Analysis

Forbidden Actions (Critical!)

Known Error Patterns (Critical Watch List!)

Valid PoS Tags (19 Tags)

Important Distinctions

DET vs PRO (Functional Distinction)

POS as Separate Class

sant: Always NAM (before proper names)

kein, dekein, dehein: DET (when modifying noun)

swer: PRO vs IPA

vil, sêre, gar: Always ADV

Fixed Phrases: vür wâr, ze wâre, etc.

MHG Negation Patterns (CRITICAL - Common Error Source!)

als, wie: Context-Dependent

war: Highly Variable Surface Form

haben, sîn, werden: VRB vs VEX

Output Format

Output ONLY changes - skip unchanged tags

Standard Format (one line per changed word):

For Compound POS Exceptions (add reason attribute):

Examples

When to Keep Compound POS Tags

DEFAULT BEHAVIOR: Resolve to SINGLE POS tag

EXCEPTION: Keep TWO tags only for morphological fusions

NOT Exceptions (always resolve to single):

Disambiguation Guidelines

CNJ vs SCNJ vs CCNJ

VRB vs VEX (Verb vs Auxiliary)

DET vs PRO vs SCNJ (daz, der, etc.)

NOM vs ADJ

Confidence Levels

Worked Examples

Workflow Phases

Phase 0: Environment Setup (once per session)

Phase 1: Discovery

Phase 2: Processing (Linguistic Analysis)

Phase 2.5: Fix Malformed Output

Phase 3: Merge Results

Phase 4: Validation

Phase 5: Refinement (Batch Strategy)

Script Reference

split-tei-for-pos-validation.py

find-and-fix-malformed-results.py

merge-pos-validation-results.py

validate-disambiguation.py

find-missing-decisions.py

prepare-fix-task.py

Progress Reporting

Feishu Doc

Summarize

Nano Pdf

Diffs

Customs Trade Compliance

Nutrient Document Processing

For Compound POS Exceptions (add `reason` attribute):