Generate high-quality training data for ABLE model distillation.

Drives corpus generation sessions that run prompts from the prompt bank through a teacher model and save the responses in ChatML format, producing teacher-response training pairs for fine-tuning Qwen 3.5 local models.
Permission level: L3 (Act) — writes training data files to disk.
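At its core, the command loops over the selected prompts, sends each to the teacher model, wraps the exchange as a ChatML pair, and appends it to the corpus file. A minimal sketch of that loop, assuming a hypothetical `ask_teacher()` callable (the real teacher client is not part of this spec):

```python
import json

def generate_pairs(prompts, ask_teacher, out_path="data/distillation_corpus.jsonl"):
    """Run each prompt through the teacher and append ChatML pairs to the corpus."""
    pairs = []
    with open(out_path, "a", encoding="utf-8") as f:
        for prompt in prompts:
            response = ask_teacher(prompt["text"])  # hypothetical teacher call
            if response is None:                    # refusal: skip and continue
                continue
            pair = {
                "messages": [
                    {"role": "user", "content": prompt["text"]},
                    {"role": "assistant", "content": response},
                ],
                "source": "corpus_generator",       # tag from the notes below
            }
            f.write(json.dumps(pair) + "\n")        # one JSON object per line
            pairs.append(pair)
    return pairs
```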
## Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| domain | string | no | Filter prompts by domain (coding, security, reasoning, creative, agentic) |
| difficulty | string | no | Filter by difficulty (easy, medium, hard) |
| count | int | no | Number of prompts to run (default: 10) |
| from_failures | bool | no | Generate prompts from known failure patterns instead |
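These parameters map naturally onto a small request object. A sketch under assumed names (`CorpusRequest` and `select_prompts` are illustrative, not the actual ABLE API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CorpusRequest:
    """Mirrors the parameter table above; names are illustrative."""
    domain: Optional[str] = None      # coding, security, reasoning, creative, agentic
    difficulty: Optional[str] = None  # easy, medium, hard
    count: int = 10                   # default from the parameter table
    from_failures: bool = False      # draw prompts from known failure patterns instead

def select_prompts(bank: list, req: CorpusRequest) -> list:
    """Apply the optional domain/difficulty filters, then truncate to count."""
    matches = [
        p for p in bank
        if (req.domain is None or p.get("domain") == req.domain)
        and (req.difficulty is None or p.get("difficulty") == req.difficulty)
    ]
    return matches[: req.count]
```

For example, `select_prompts(bank, CorpusRequest(domain="coding", count=25))` corresponds to the first usage example below.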
## Returns

| Name | Type | Description |
|---|---|---|
| pairs | list | Training pairs in ChatML format |
| stats | dict | Count by domain, difficulty, quality score distribution |
| output_path | string | Path to generated JSONL file |
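For concreteness, a single entry in `pairs`, as serialized into the output JSONL file, might look like the record below. The `messages` structure is standard ChatML; the `source` and `teacher_model` tags match the notes further down, while the remaining metadata keys are assumptions:

```python
import json

# Illustrative training pair; prompt text and extra metadata keys are assumed.
pair = {
    "messages": [
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Reverse a singly linked list in Python."},
        {"role": "assistant", "content": "def reverse(head):\n    prev = None\n    ..."},
    ],
    "source": "corpus_generator",       # applied to every generated pair
    "teacher_model": "<from session>",  # recorded from the current session
    "domain": "coding",                 # assumed metadata key
    "difficulty": "medium",             # assumed metadata key
}
print(json.dumps(pair))  # one JSON object per line in the corpus file
```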
## Examples

/generate-corpus --domain coding --count 25 -- Generate 25 coding prompts
/generate-corpus --domain security --difficulty hard --count 10 -- Hard security prompts
/generate-corpus --from-failures --count 15 -- Generate from known failure patterns
/generate-corpus --status -- Show corpus statistics

## Notes

Prompts are drawn from the prompt bank (able/core/distillation/prompt_bank.py). Each generated pair is tagged with source="corpus_generator" and the teacher_model from the current session, then appended to the corpus file (data/distillation_corpus.jsonl).

## Error Handling

| Error | Response |
|---|---|
| Empty prompt bank | Report count, suggest adding prompts |
| Model refuses prompt | Skip, log refusal, continue with next |
| Low-quality response | Flag for review, include but mark as needs_review |
| Disk write failure | Buffer in memory, retry, alert operator (see the sketch below) |
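A minimal sketch of that last failure mode, buffering pairs in memory and retrying the append before surfacing the failure (the retry count, backoff, and error type are assumptions):

```python
import json
import time

def write_with_retry(pairs: list, path: str, retries: int = 3) -> None:
    """Append pairs to the corpus file, retrying on disk write failure."""
    buffered = [json.dumps(p) for p in pairs]  # hold in memory until written
    for attempt in range(retries):
        try:
            with open(path, "a", encoding="utf-8") as f:
                f.write("\n".join(buffered) + "\n")
            return
        except OSError:
            time.sleep(2 ** attempt)  # back off before retrying
    # Retries exhausted: stand-in for alerting the operator.
    raise RuntimeError(f"could not write {len(buffered)} pairs to {path}")
```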
## Related

- able/core/distillation/prompt_bank_data
- data/distillation_*.jsonl
- /generate-corpus --status
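The `--status` view presumably aggregates over those corpus files; a sketch of computing the per-domain and per-difficulty counts reported in `stats` (the metadata keys are the assumed ones from the record example above):

```python
import glob
import json
from collections import Counter

def corpus_stats(pattern: str = "data/distillation_*.jsonl") -> dict:
    """Count pairs by domain and difficulty across all corpus files."""
    by_domain, by_difficulty = Counter(), Counter()
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                by_domain[record.get("domain", "unknown")] += 1
                by_difficulty[record.get("difficulty", "unknown")] += 1
    return {"by_domain": dict(by_domain), "by_difficulty": dict(by_difficulty)}

print(corpus_stats())
```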