Model-specific operator skill for Qwen3-TTS fine-tuning on Hemma and Colab. Use when the task is specifically about Qwen TTS training, Swedish language expansion with Qwen, Qwen preprocessing or runtime policy, or deciding whether a fine-tuned Qwen model should enter the Sir Convert-a-Lot sidecar candidate lane.
This skill specializes the broader `speech-model-finetuning-on-hemma` skill for `Qwen/Qwen3-TTS-12Hz-1.7B-Base`. Use it together with the broader local skill and these companion references:
- `.codex/skills/speech-model-finetuning-on-hemma/SKILL.md`
- `.codex/skills/sir-convert-a-lot-colab-hemma/SKILL.md`
- `docs/runbooks/runbook-qwen3-swedish-finetuning-on-hemma-and-colab.md`
- `docs/runbooks/runbook-hemma-devops-and-gpu.md`
- `docs/backlog/epics/epic-08-qwen3-tts-swedish-language-expansion-fine-tuning-on-hemma-and-colab.md`
- `docs/backlog/stories/story-24-swedish-multi-speaker-corpus-preprocessing-and-evaluation-for-qwen3-tts.md`
- `docs/backlog/tasks/task-116-expand-rixvox-staging-and-run-a-sustained-detached-row-processing-window-for-the-bounded-hemma-pilot.md`
- `docs/backlog/stories/story-25-containerized-qwen3-tts-swedish-full-finetune-baseline-on-hemma-and-colab.md`
- `docs/backlog/tasks/task-141-define-frozen-qwen-pilot-dataset-use-for-finetuning.md`
- `docs/backlog/tasks/task-142-materialize-frozen-qwen-pilot-training-bundle-for-task-101.md`
- `docs/backlog/stories/story-32-consolidate-qwen-experiment-governance-and-surface-taxonomy.md`
- `.codex/rules/096-qwen-experiment-governance.md`
- `docs/decisions/0006-hemma-sidecar-tts-architecture-and-non-pdf-gpu-governance.md`
- `docs/decisions/0007-reusable-multi-backend-tts-sidecar-capability-contract.md`
Verify upstream truth before making major claims or proposing runtime changes.
Before proposing anything, classify the request into one of these lanes:

1. Benchmark lane
2. Single-speaker adaptation lane
3. Language-expansion lane

If the user says "general Swedish support," always choose lane 3 unless they explicitly narrow the scope.
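For illustration, a minimal lane chooser under the rule above. The lane set and the lane-3 default come from this skill; the keyword matching is a hypothetical stand-in, not a committed heuristic:

```python
LANES = {1: "benchmark", 2: "single-speaker adaptation", 3: "language expansion"}

def classify_request(text: str) -> int:
    """Map a request to a lane; lane 3 is the default for broad Swedish asks."""
    t = text.lower()
    if "general swedish support" in t:
        return 3  # always lane 3 unless the user explicitly narrows scope
    if "benchmark" in t:
        return 1
    if "single-speaker" in t or "voice clone" in t:
        return 2
    return 3  # unnarrowed requests fall into the broadest lane
```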
Qwen runtime and governance posture:

- Target model: the `Qwen/Qwen3-TTS-12Hz-1.7B-Base` 1.7B base model.
- Build the frozen pilot bundle with `pdm run task-101-pilot-bundle build`.
- `sft_12hz.py` is still train-only and does not perform in-training evaluation.
- 500/100/3 posture: durable checkpoint every 500 optimizer steps, held-out eval every 100 steps, retain the newest 3 durable trainer-state checkpoints (see the sketch after this list).
- Under a `--pilot-bundle-root` override, do not assume the saved intra-epoch cursor is still meaningful; treat any impossible cursor as a fail-closed condition, not a warning.
- Keep the 500/100/3 scheduled posture across resumes; on non-finite loss, follow `status -> diagnose-non-finite -> fix -> bounded retry`.
- Classify every experiment surface as provenance, mechanism, or recovery:
  - `qwen-t221-historical-control`: provenance
  - `qwen-story31-stability-lab`: mechanism
  - `qwen-train` launch/status fresh-start proof lane: recovery, blocked until promotion
  - `qwen-story30-freshstart-proof` and `qwen-story30-backward-lineage`: legacy-readonly
  - `qwen-t197-proof` and `qwen-t198-proof`: deprecated for new work
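A minimal sketch of the 500/100/3 posture and the fail-closed cursor rule. `save_checkpoint` and `run_eval` are hypothetical callables and checkpoints are assumed to be directories; this is not the repo's trainer code:

```python
import shutil
from pathlib import Path

CKPT_EVERY, EVAL_EVERY, KEEP_LAST = 500, 100, 3  # the 500/100/3 posture

def after_optimizer_step(step: int, ckpts: list[Path], save_checkpoint, run_eval):
    if step % EVAL_EVERY == 0:
        run_eval(step)                       # held-out eval every 100 steps
    if step % CKPT_EVERY == 0:
        ckpts.append(save_checkpoint(step))  # durable checkpoint every 500 steps
        while len(ckpts) > KEEP_LAST:        # retain only the newest 3
            shutil.rmtree(ckpts.pop(0))      # prune the oldest durable checkpoint

def check_resume_cursor(cursor: int, steps_per_epoch: int) -> None:
    # fail-closed: an impossible intra-epoch cursor aborts the resume outright
    if not 0 <= cursor < steps_per_epoch:
        raise RuntimeError(
            f"impossible resume cursor {cursor} for epoch of {steps_per_epoch} steps"
        )
```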
Current experiment ledger, part 1 (T221-T235; the first-break localization these entries report is sketched after this list):

- T221 is resolved as negative recreated-control evidence: the recreated original-recipe shape plus only the T206 token-span fix still fails immediately under the current trainer/runtime.
- T225 is complete as the exact parity contract.
- T226 is complete as the committed local parity-probe surface: the `pdm run qwen-story31-parity-probe` run `task226-20260317t224307Z` found no meaningful checkpoint divergence between the current and intended paths.
- T219 is recorded as negative bounded evidence under `task219-20260317t180700z-a1`.
- T228 is complete as the ranked closure of that family.
- T229 is complete as the narrowed rerun under `task229-20260318t064712z-a1`; the `sub_talker_loss` family localizes to `talker_core.layer_16.input_layernorm`.
- T230 is complete as the negative bounded normalization-entry rerun under `task230-20260318t082049z-a1`.
- T231 is complete as the explicit no-winner promotion decision.
- T232 is complete as the lane decision to stay in mechanism.
- T233 is complete as the normalization-internal rerun under `task233-20260318t112544z-a1`; it localizes to `talker_core.layer_16.input_layernorm.output`.
- T234 is complete under `task234-20260318t123644z-a1`: the 0p5 member shifted the pair and line-13 `sub_talker_loss` cases to `talker_core.layer_15.output`, while line-4 still first broke at `talker_core.layer_16.input_layernorm`.
- T235 is complete under `task235-20260318t140352z-a1`: the `sub_talker_loss` result is repeatable; pair and line-13 stay at `talker_core.layer_15.output`, while line-4 stays at `talker_core.layer_16.input_layernorm`.
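The "first broke at" localizations above come from first-break probing. Here is a minimal sketch of that technique, assuming a standard PyTorch module tree; the hook wiring is illustrative, not the stability lab's actual probe:

```python
import torch

def first_nonfinite_module(model: torch.nn.Module, batch: dict) -> str | None:
    """Return the name of the first module whose forward output goes
    non-finite during one forward pass (hooks fire in execution order,
    so the first recorded name is the earliest, deepest break)."""
    hit: list[str] = []
    handles = []

    def make_hook(name: str):
        def hook(module, inputs, output):
            outs = output if isinstance(output, tuple) else (output,)
            if not hit and any(
                torch.is_tensor(t) and not torch.isfinite(t).all() for t in outs
            ):
                hit.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the unnamed root module
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        model(**batch)
    finally:
        for h in handles:
            h.remove()
    return hit[0] if hit else None
```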
T187-T191 is the permanent anti-god-file architecture lane for the Qwen training control plane and is now delivered. Keep new host-side logic in `ml/qwen/training/control_plane/`, detached launch logic in `ml/qwen/training/detached_runtime/`, reporting logic in `ml/qwen/training/reporting/`, and patched runtime logic in the bounded `sft_12hz_*` runtime modules. `orchestrator.py` and `reporting.py` are gone and must not be reintroduced (a guard sketch follows). Do not treat `status.json` or `report.json` artifacts as live evidence unless they clearly belong to the active resumed container.
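A hedged guard-test sketch that pins this layout; it is illustrative only, not a claim about the repo's actual test suite:

```python
from pathlib import Path

ROOT = Path("ml/qwen/training")
FORBIDDEN = [ROOT / "orchestrator.py", ROOT / "reporting.py"]  # god-files, banned
REQUIRED_DIRS = [ROOT / "control_plane", ROOT / "detached_runtime", ROOT / "reporting"]

def test_anti_god_file_layout():
    for path in FORBIDDEN:
        assert not path.exists(), f"{path} must not be reintroduced"
    for path in REQUIRED_DIRS:
        assert path.is_dir(), f"expected bounded module package at {path}"
```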
clearly belong to the active resumed container./srv/scratch for Docker root, HF/model caches, and hot generated
preprocessing/training artifacts/srv/storage for raw Swedish corpora and colder retained datasets/srv/scratch/sir-convert-a-lot/{build,cache} remains the canonical SSD
storage truth/home/paunchygent/.data/sir-convert-a-lot/{build,cache} is the normal
Docker-visible bind source under snap Dockerpdm run run-hemma -- pdm run qwen-docker-bind-roots statuspdm run run-hemma -- pdm run qwen-docker-bind-roots probeT242The broader Hemma speech-model skill covers the generic training workflow. This Qwen skill adds the model-family-specific decisions:
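To make the contract concrete, a small sketch of the canonical-to-bind translation. The helper name is hypothetical; the `status`/`probe` commands above remain the source of truth:

```python
# Canonical SSD truth on the host -> bind source snap Docker can actually see.
CANONICAL_TO_BIND = {
    "/srv/scratch/sir-convert-a-lot/build":
        "/home/paunchygent/.data/sir-convert-a-lot/build",
    "/srv/scratch/sir-convert-a-lot/cache":
        "/home/paunchygent/.data/sir-convert-a-lot/cache",
}

def effective_bind_source(canonical: str) -> str:
    """Translate a canonical /srv/scratch path into its effective bind root."""
    for canon, bind in CANONICAL_TO_BIND.items():
        if canonical.startswith(canon):
            return canonical.replace(canon, bind, 1)
    raise ValueError(f"{canonical} is outside the governed bind roots")
```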
The broader Hemma speech-model skill covers the generic training workflow. This Qwen skill adds the model-family-specific decisions:

- Target the 1.7B base model.
- `sft_12hz.py` must be patched to preserve the `speaker_encoder` and `tts_model_type="base"` to avoid collapsing into a single-speaker state.
- Extend `dataset.py` to parse multiple speakers, build a `spk_id_map`, and carry a dataset-scoped `speaker_id` through the manifest and batch surfaces (see the sketch after this list). In the current base-model path, this is metadata for governance, eval, and optional future speaker-bank export, not the primary conditioning signal.

Treat Swedish data as three different roles:

- `KBLab/rixvox`
- `google/fleurs` Swedish
- `KTH/waxholm`

Never treat "available Swedish data" as a single undifferentiated pool.
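A minimal sketch of the dataset-scoped `spk_id_map` described above, assuming manifest rows are dicts with a `speaker` field (the field name is an assumption):

```python
def attach_speaker_ids(manifest_rows: list[dict]) -> dict[str, int]:
    """Build a deterministic dataset-scoped spk_id_map and stamp each row.

    speaker_id is governance/eval metadata here, not the conditioning signal.
    """
    speakers = sorted({row["speaker"] for row in manifest_rows})  # stable order
    spk_id_map = {spk: i for i, spk in enumerate(speakers)}
    for row in manifest_rows:
        row["speaker_id"] = spk_id_map[row["speaker"]]
    return spk_id_map
```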
When planning the corpus, always answer the corpus questions from the broader speech-model skill. For the current pilot lane, also answer: are the `ref_audio` anchors materialized inside the bundle?

After the broader speech-model skill has set the runtime/data/eval frame, apply this Qwen-specific order:
1. Pin the base model to `Qwen/Qwen3-TTS-12Hz-1.7B-Base` and keep the manifest schema intact (including the dataset-scoped `speaker_id`).
2. `speaker_id` note: track it for metadata, splits, and optional future speaker-bank export; current conditioning still comes from `ref_audio -> ref_mel -> speaker_encoder` (sketched below).
3. Use the `summary` surface for median/min/max host CPU, host RAM, GPU busy, and VRAM evidence.
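A sketch of the conditioning path named in step 2. `mel_frontend` and `speaker_encoder` are passed in as hypothetical callables; only the `ref_audio -> ref_mel -> speaker_encoder` shape is asserted by this skill:

```python
import torch
import torchaudio

def reference_speaker_embedding(ref_audio_path: str, mel_frontend, speaker_encoder):
    """The voice comes from the reference audio, not from speaker_id."""
    wav, sample_rate = torchaudio.load(ref_audio_path)  # ref_audio
    ref_mel = mel_frontend(wav)                         # ref_mel (hypothetical extractor)
    with torch.no_grad():
        return speaker_encoder(ref_mel)                 # embedding that conditions TTS
```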
Watch for these specifically:

- Unverified claims that rixvox is too noisy.
- Treating the 20s clip target as a hard upstream Qwen rule instead of a conservative repo heuristic that must be checked against live runtime and duration evidence.
- Assuming journald alone gives historical GPU monitoring when no periodic GPU sampler is actually writing to the journal (a sampler sketch follows this list).
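A minimal periodic sampler sketch for the journald point above: run something like this under systemd so samples actually land in the journal. The 30-second cadence is an arbitrary choice; the `nvidia-smi` query flags are standard:

```python
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-gpu=timestamp,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

while True:
    # print to stdout so a systemd unit captures each sample in the journal
    print(subprocess.check_output(QUERY, text=True).strip(), flush=True)
    time.sleep(30)
```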
A fine-tuned Qwen model does not become a production candidate just because it trained successfully. Before recommending it as a sidecar candidate, require explicit promotion evidence, not just a completed training run.
Current experiment ledger, part 2 (T236-T246 and open lanes):

- T236 is complete under `task236-20260318t145434z-a1`: pair and line-13 stay at `talker_core.layer_15.output`, while line-4 stays at `talker_core.layer_16.input_layernorm.output`.
- T237 is complete under `task237-20260318t154708z-a1`: the 1e3 fp32-output-cap winner converged the pair, line-13, and line-4 `sub_talker_loss` cases to `talker_core.layer_15.output` (see the sketch after this list).
- T240 is complete under `task240-20260318t165458z-a1`: `sub_talker_loss` rows first broke at `talker_core.layer_15.output`, so the convergence class is `converged_layer15_output`.
- T241 is complete under `task241-20260318t175714z-a1`: `sub_talker_loss` rows still first broke at `talker_core.layer_15.output`, so the classification is `converged_layer15_output_residual`.
- T242 is complete as the permanent Hemma bind-root contract: the repo-rendered service is installed and active, `status` now proves the home roots are mounted onto the canonical `/srv/scratch` trees, and `probe` confirms Docker must use `/home/paunchygent/.data/sir-convert-a-lot/{build,cache}` as the effective bind roots.
- T243 is complete under `task243-20260318t190832z-a1`: `sub_talker_loss` rows first broke at `talker_core.layer_15.output`, so the classification is `converged_layer15_output_return`.
- T244 is complete under `task244-20260318t193736z-a1`: `sub_talker_loss` rows still first broke at `talker_core.layer_15.output`, so the classification is `converged_output_return`.
- T245 is complete under `task245-20260318t202916z-a1`: `sub_talker_loss` rows still first broke at `talker_core.layer_15.output`, so the classification is `multiply_not_causal`.
- T246 is now the immediate diagnosis-only mechanism slice; it must split the fp32-scaled layer-15 output result from the final emitted tensor before any new stabilizer family is considered.
- T227 is contingent: pick it up only if a later verified trainer/runtime divergence appears.
- T217 remains the blocked recovery lane until a mechanism candidate passes the local promotion gate.
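For orientation, one plausible reading of the 1e3 fp32-output-cap named in T237, expressed as a PyTorch forward hook. The actual stabilizer lives in the bounded `sft_12hz_*` runtime modules, so treat this strictly as a sketch:

```python
import torch

def attach_fp32_output_cap(module: torch.nn.Module, cap: float = 1e3):
    """Clamp the module's forward output to +/-cap in fp32, then restore dtype."""
    def hook(mod, inputs, output):
        if torch.is_tensor(output):
            return output.float().clamp(-cap, cap).to(output.dtype)
        return output
    # a forward hook that returns a value replaces the module's output;
    # keep the handle so a bounded run can .remove() it afterwards
    return module.register_forward_hook(hook)
```

A caller would attach this to a suspect module (for example the layer named in the ledger), run the bounded diagnosis, and remove the handle before any promotion decision.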