Convert uploaded music into canonical, arrangement-ready structured JSON for downstream guitar workflows. Use when the task should extract stable song structure from audio or mixed song-source inputs before chord-sheet generation, harmonic simplification, capo suggestion, lyric alignment, browser editing, rendering, export, or any other downstream arrangement step that should rely on an inspectable intermediate representation instead of jumping directly from raw music to final sheet output.
Convert messy song inputs into a stable JSON contract that downstream guitar-arrangement tools can inspect, edit, diff, and reuse. Treat this as the structural analysis layer, not the final arranging or rendering layer.
Identify what kind of source was provided:
Capture source provenance first. Record what was directly observed, what was inferred, and what remains uncertain.
- source metadata so later systems can track where the analysis came from.
- analysis, including tempo, meter, key, groove, and pickup behavior.
- sections, using conservative boundaries when the evidence is weak.
- harmony, including held spans, slash chords, alternates, and ambiguity notes.
- phrases
- arrangement_signals without collapsing this stage into full arranging.
- quality information that makes gaps, conflicts, and review needs explicit.

Prefer editable structure over presentation. Emit machine-usable timelines and normalized labels, not renderer-specific output.
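As a rough illustration of the output domains named above, the sketch below builds an empty document skeleton and reports which domains are missing from a partial result. The key names come from the text; the container types (object versus list) and the helper names `empty_document` and `missing_domains` are assumptions made for this example, not part of any confirmed schema.

```python
# Top-level domain names as they appear in the text above.
TOP_LEVEL_KEYS = (
    "source",
    "analysis",
    "sections",
    "harmony",
    "phrases",
    "arrangement_signals",
    "quality",
)


def empty_document():
    """Return a skeleton with every domain present, in a fixed order.

    Container types are assumed: timeline-like domains are lists,
    the rest are objects.
    """
    return {
        "source": {},
        "analysis": {},
        "sections": [],
        "harmony": [],
        "phrases": [],
        "arrangement_signals": {},
        "quality": {},
    }


def missing_domains(doc):
    """List the top-level domains that a document does not contain."""
    return [key for key in TOP_LEVEL_KEYS if key not in doc]
```

Keeping every domain present, even when empty, lets downstream tools tell "analyzed and found nothing" apart from "not analyzed" only if the quality domain records the difference, so a real contract would likely say so explicitly.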
Do not skip directly to a final score, tabs, or chord-sheet rendering. Those are downstream products built on top of this structured layer.
Do not hide uncertainty. If a field is unclear, keep the leading estimate and attach confidence, alternates, or review flags.
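One possible way to keep a leading estimate while exposing its uncertainty is to wrap each unclear field in a small record. This is a minimal sketch: the field names (`value`, `confidence`, `alternates`, `needs_review`), the 0 to 1 confidence scale, and the 0.6 review threshold are all hypothetical choices for illustration.

```python
def estimate(value, confidence, alternates=None, review_threshold=0.6):
    """Wrap a leading estimate with confidence, alternates, and a review flag.

    The 0..1 scale and the default threshold are assumptions for this sketch.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return {
        "value": value,                          # the leading estimate is always kept
        "confidence": confidence,
        "alternates": list(alternates or []),    # other plausible readings
        "needs_review": confidence < review_threshold,
    }


# Example: a key estimate that could plausibly be the relative minor.
key = estimate("G major", 0.55, alternates=["E minor"])
```

Here `key` still carries "G major" as the working answer, but a downstream editor can see the alternate and the review flag instead of receiving a bare string.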
Keep naming and units stable across runs so later tools can diff the output.
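A sketch of one way to make output diff-friendly: serialize with sorted keys and round floating-point values to a fixed precision so that tiny numeric jitter between runs does not show up as a change. The helper name `canonical_dump` and the choice of three decimal places are assumptions; note that this version rounds every float in the document, not only time values.

```python
import json


def canonical_dump(doc, decimals=3):
    """Serialize a document deterministically for diffing.

    Rounds all floats to `decimals` places and sorts object keys.
    """
    def normalize(node):
        if isinstance(node, float):
            return round(node, decimals)
        if isinstance(node, dict):
            return {key: normalize(value) for key, value in node.items()}
        if isinstance(node, list):
            return [normalize(value) for value in node]
        return node

    return json.dumps(normalize(doc), sort_keys=True, indent=2, ensure_ascii=False)
```

Two runs that differ only in key order or sub-millisecond timing noise would then produce identical text.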
Record provenance for the major output domains. Downstream tools should be able to tell whether a signal came from provided metadata, direct audio measurement, decoded audio, or a coarse heuristic layer.
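The four origins mentioned above could be modeled as a closed set of labels attached per domain. The sketch below is hypothetical: the enum values, the top-level `provenance` key, and the helper `tag_provenance` are illustrative names, not a confirmed part of the contract.

```python
from enum import Enum


class Provenance(str, Enum):
    """Where a signal came from, following the four origins in the text."""
    PROVIDED_METADATA = "provided_metadata"
    AUDIO_MEASUREMENT = "audio_measurement"
    DECODED_AUDIO = "decoded_audio"
    HEURISTIC = "heuristic"


def tag_provenance(doc, domain, origin):
    """Record the origin of one output domain; rejects unknown origins."""
    origin = Provenance(origin)  # raises ValueError for unknown labels
    doc.setdefault("provenance", {})[domain] = origin.value
    return doc


doc = tag_provenance({}, "harmony", "heuristic")
doc = tag_provenance(doc, "analysis", Provenance.AUDIO_MEASUREMENT)
```

Restricting origins to a fixed vocabulary keeps the labels stable across runs, which matters if later tools filter or diff on them.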
If plain-text lyrics are provided, attach them as coarse structural hooks rather than pretending they are fully time-aligned. Prefer phrase-level anchors when phrase objects exist, otherwise fall back to section-level anchors.
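The phrase-then-section fallback described above might look like the following sketch. It assumes, purely for illustration, that lyrics arrive as an ordered list of text blocks, that phrases and sections are dicts with an `id` field, and that blocks map to targets one-to-one in order; a real matcher would need something less naive.

```python
def attach_lyrics(lyric_blocks, sections, phrases=None):
    """Pair lyric blocks with coarse anchors, preferring phrases over sections.

    Blocks are matched to targets by position (an assumption). Blocks with no
    matching target are kept with anchor_id None rather than dropped.
    """
    targets = phrases if phrases else sections
    level = "phrase" if phrases else "section"
    anchors = []
    for index, text in enumerate(lyric_blocks):
        target = targets[index] if index < len(targets) else None
        anchors.append({
            "text": text,
            "anchor_level": level,
            "anchor_id": target["id"] if target else None,
            "time_aligned": False,  # coarse hook only, never a claimed alignment
        })
    return anchors


sections = [{"id": "sec_1"}, {"id": "sec_2"}]
phrases = [{"id": "ph_1"}, {"id": "ph_2"}, {"id": "ph_3"}]
by_phrase = attach_lyrics(["line one", "line two"], sections, phrases)
by_section = attach_lyrics(["verse text", "chorus text", "extra"], sections)
```

The explicit `time_aligned: False` marker keeps these anchors from being mistaken for real alignment data by downstream tools.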
This skill should answer:
This skill should not decide: