Generate high-quality training data for ABLE model distillation.

Drives corpus generation sessions that run prompts from the prompt bank through a teacher model and save the responses in ChatML format, producing teacher-response training pairs for fine-tuning Qwen 3.5 local models.
Permission level: L3 (Act) — writes training data files to disk.
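At its core, the command loops over the selected prompts, sends each to the teacher model, wraps the exchange as a ChatML pair, and appends it to the corpus file. A minimal sketch of that loop, assuming a hypothetical `ask_teacher()` callable (the real teacher client is not part of this spec):

```python
import json

def generate_pairs(prompts, ask_teacher, out_path="data/distillation_corpus.jsonl"):
    """Run each prompt through the teacher and append ChatML pairs to the corpus."""
    pairs = []
    with open(out_path, "a", encoding="utf-8") as f:
        for prompt in prompts:
            response = ask_teacher(prompt["text"])  # hypothetical teacher call
            if response is None:                    # refusal: skip and continue
                continue
            pair = {
                "messages": [
                    {"role": "user", "content": prompt["text"]},
                    {"role": "assistant", "content": response},
                ],
                "source": "corpus_generator",       # tag from the notes below
            }
            f.write(json.dumps(pair) + "\n")        # one JSON object per line
            pairs.append(pair)
    return pairs
```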
## Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| domain | string | no | Filter prompts by domain (coding, security, reasoning, creative, agentic) |
| difficulty | string | no | Filter by difficulty (easy, medium, hard) |
| count | int | no | Number of prompts to run (default: 10) |
| from_failures | bool | no | Generate prompts from known failure patterns instead |
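These parameters map naturally onto a small request object. A sketch under assumed names (`CorpusRequest` and `select_prompts` are illustrative, not the actual ABLE API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CorpusRequest:
    """Mirrors the parameter table above; names are illustrative."""
    domain: Optional[str] = None      # coding, security, reasoning, creative, agentic
    difficulty: Optional[str] = None  # easy, medium, hard
    count: int = 10                   # default from the parameter table
    from_failures: bool = False      # draw prompts from known failure patterns instead

def select_prompts(bank: list, req: CorpusRequest) -> list:
    """Apply the optional domain/difficulty filters, then truncate to count."""
    matches = [
        p for p in bank
        if (req.domain is None or p.get("domain") == req.domain)
        and (req.difficulty is None or p.get("difficulty") == req.difficulty)
    ]
    return matches[: req.count]
```

For example, `select_prompts(bank, CorpusRequest(domain="coding", count=25))` corresponds to the first usage example below.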
## Returns

| Name | Type | Description |
|---|---|---|
| pairs | list | Training pairs in ChatML format |
| stats | dict | Count by domain, difficulty, quality score distribution |
| output_path | string | Path to generated JSONL file |
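For concreteness, a single entry in `pairs`, as serialized into the output JSONL file, might look like the record below. The `messages` structure is standard ChatML; the `source` and `teacher_model` tags match the notes further down, while the remaining metadata keys are assumptions:

```python
import json

# Illustrative training pair; prompt text and extra metadata keys are assumed.
pair = {
    "messages": [
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Reverse a singly linked list in Python."},
        {"role": "assistant", "content": "def reverse(head):\n    prev = None\n    ..."},
    ],
    "source": "corpus_generator",       # applied to every generated pair
    "teacher_model": "<from session>",  # recorded from the current session
    "domain": "coding",                 # assumed metadata key
    "difficulty": "medium",             # assumed metadata key
}
print(json.dumps(pair))  # one JSON object per line in the corpus file
```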
## Examples

/generate-corpus --domain coding --count 25 -- Generate 25 coding prompts
/generate-corpus --domain security --difficulty hard --count 10 -- Hard security prompts
/generate-corpus --from-failures --count 15 -- Generate from known failure patterns
/generate-corpus --status -- Show corpus statistics

## Notes

Prompts are drawn from the prompt bank (able/core/distillation/prompt_bank.py). Each generated pair is tagged with source="corpus_generator" and the teacher_model from the current session, then appended to the corpus file (data/distillation_corpus.jsonl).

## Error Handling

| Error | Response |
|---|---|
| Empty prompt bank | Report count, suggest adding prompts |
| Model refuses prompt | Skip, log refusal, continue with next |
| Low-quality response | Flag for review, include but mark as needs_review |
| Disk write failure | Buffer in memory, retry, alert operator (see the sketch below) |
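A minimal sketch of that last failure mode, buffering pairs in memory and retrying the append before surfacing the failure (the retry count, backoff, and error type are assumptions):

```python
import json
import time

def write_with_retry(pairs: list, path: str, retries: int = 3) -> None:
    """Append pairs to the corpus file, retrying on disk write failure."""
    buffered = [json.dumps(p) for p in pairs]  # hold in memory until written
    for attempt in range(retries):
        try:
            with open(path, "a", encoding="utf-8") as f:
                f.write("\n".join(buffered) + "\n")
            return
        except OSError:
            time.sleep(2 ** attempt)  # back off before retrying
    # Retries exhausted: stand-in for alerting the operator.
    raise RuntimeError(f"could not write {len(buffered)} pairs to {path}")
```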
## Related

- able/core/distillation/prompt_bank_data
- data/distillation_*.jsonl
- /generate-corpus --status
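The `--status` view presumably aggregates over those corpus files; a sketch of computing the per-domain and per-difficulty counts reported in `stats` (the metadata keys are the assumed ones from the record example above):

```python
import glob
import json
from collections import Counter

def corpus_stats(pattern: str = "data/distillation_*.jsonl") -> dict:
    """Count pairs by domain and difficulty across all corpus files."""
    by_domain, by_difficulty = Counter(), Counter()
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                by_domain[record.get("domain", "unknown")] += 1
                by_difficulty[record.get("difficulty", "unknown")] += 1
    return {"by_domain": dict(by_domain), "by_difficulty": dict(by_difficulty)}

print(corpus_stats())
```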