Create, format, and validate instruction-tuning and preference datasets for fine-tuning and RLHF alignment.
Use this skill when the user needs to create or curate labeled datasets for fine-tuning, instruction tuning, or RLHF alignment.
Support output in standard formats:
{"instruction": "...", "input": "...", "output": "..."}
{"conversations": [{"from": "human", "value": "..."}, ...]}
{"messages": [{"role": "user", "content": "..."}, ...]}
{"prompt": "...", "chosen": "...", "rejected": "..."}

Always validate schema before export. Report and quarantine malformed entries.
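One way to approach schema validation and quarantining is sketched below. It assumes the dataset arrives as JSON Lines (one record per line), which the text above does not state. The schema labels (`instruction`, `conversations`, `messages`, `preference`) and the helper names `detect_format`, `is_valid`, and `split_valid` are hypothetical, chosen only for this illustration. The checks cover required keys and string types only; they do not verify role names or turn ordering.

```python
import json

# Required top-level keys for each of the four formats listed above.
# The dictionary keys are illustrative labels, not official format names.
SCHEMAS = {
    "instruction": {"instruction", "input", "output"},
    "conversations": {"conversations"},
    "messages": {"messages"},
    "preference": {"prompt", "chosen", "rejected"},
}

# Per-turn keys for the two multi-turn formats.
TURN_KEYS = {
    "conversations": ("from", "value"),
    "messages": ("role", "content"),
}


def detect_format(record):
    """Return the first schema label whose required keys are all present, else None."""
    if not isinstance(record, dict):
        return None
    for name, keys in SCHEMAS.items():
        if keys.issubset(record):
            return name
    return None


def is_valid(record, fmt):
    """Check value types for a record already matched to schema `fmt`."""
    if fmt in TURN_KEYS:
        turns = record[fmt]
        if not isinstance(turns, list) or not turns:
            return False
        speaker_key, text_key = TURN_KEYS[fmt]
        return all(
            isinstance(turn, dict)
            and isinstance(turn.get(speaker_key), str)
            and isinstance(turn.get(text_key), str)
            for turn in turns
        )
    # Flat formats: every required field must be a string (empty is allowed).
    return all(isinstance(record[key], str) for key in SCHEMAS[fmt])


def split_valid(lines):
    """Separate JSONL lines into valid records and quarantined entries with reasons."""
    valid, quarantined = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            quarantined.append(
                {"line": lineno, "reason": f"invalid JSON: {exc.msg}", "raw": line}
            )
            continue
        fmt = detect_format(record)
        if fmt is None:
            quarantined.append({"line": lineno, "reason": "unknown schema", "raw": line})
        elif not is_valid(record, fmt):
            quarantined.append(
                {"line": lineno, "reason": f"malformed {fmt} record", "raw": line}
            )
        else:
            valid.append(record)
    return valid, quarantined
```

Keeping the line number and raw text with each quarantined entry makes it possible to report what was rejected and why, and to repair entries later without rereading the source file.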
Guidelines for building instruction datasets:
When manual annotation volume is insufficient:
For alignment datasets:
Run validation checks on completed datasets:
annotation_guidelines.md (format spec, quality criteria)
dataset_quality_report.md (stats, coverage, quality metrics)
category_distribution.json (task type breakdown)

datasets
pandas
langdetect
regex
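The task type breakdown in category_distribution.json could be produced along these lines. This is a minimal sketch using only the standard library. It assumes each record carries a `category` field, which the text above does not specify; the function names and the `uncategorized` fallback label are likewise hypothetical.

```python
import json
from collections import Counter


def category_distribution(records, field="category"):
    """Count records per task type and compute each type's share of the total.

    `field` is an assumed key name; records lacking it are counted
    under the placeholder label "uncategorized".
    """
    counts = Counter(record.get(field, "uncategorized") for record in records)
    total = sum(counts.values())
    return {
        name: {"count": n, "share": round(n / total, 4)}
        for name, n in counts.most_common()
    }


def write_distribution(records, path="category_distribution.json"):
    """Write the breakdown to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(category_distribution(records), handle, indent=2)
```

Reporting shares alongside raw counts makes an underrepresented task type easier to spot when the dataset grows.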