Generates synthetic instruction-response training data using Self-Instruct, Evol-Instruct, and distillation from strong models via OpenAI/Anthropic APIs or vLLM batch inference. Use when creating instruction-tuning datasets, evolving seed tasks for complexity, filtering for quality and diversity, or generating domain-specific training data (code, math, reasoning). Do not use for human annotation pipeline design.
Generate high-quality synthetic instruction-response pairs at scale for fine-tuning LLMs. Covers Self-Instruct (bootstrapping from seed tasks), Evol-Instruct (iteratively evolving complexity), and distillation-based generation (using strong model outputs as training data). Includes quality filtering, deduplication, and formatting for training pipelines.
Use this skill when:
- creating instruction-tuning datasets from seed tasks
- generating data to feed reward-modeling or safety-alignment pipelines
- producing synthetic data, as opposed to data-cleaning-labeling of human-written examples

Start by writing seed tasks with three fields: instruction, input (optional), and output. Cover task types (QA, summarization, code, math), complexity levels (factual → multi-step reasoning), and target domains.

Generate responses by sending each instruction to a strong model:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": instruction}],  # one generated instruction string
    temperature=0.7,
    max_tokens=2048,
)
answer = response.choices[0].message.content  # the synthetic response
```
For batch generation at scale, use vLLM offline inference:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=2048)
outputs = llm.generate(prompts, params)  # prompts: list of instruction strings
```
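The prompts fed to batch generation can come straight from an evolution step. A minimal Evol-Instruct-style prompt builder is sketched below; the operation list and exact wording are illustrative assumptions, not the original templates:

```python
import random

# Evolution operations in the spirit of Evol-Instruct; the exact
# phrasings here are assumptions, not the paper's templates.
OPERATIONS = [
    "Add one concrete constraint or requirement to the task.",
    "Replace a general concept with a more specific one.",
    "Require explicit multi-step reasoning in the answer.",
    "Increase the difficulty slightly without changing the topic.",
]

def build_evol_prompt(instruction, rng=None):
    """Wrap an instruction in a rewrite prompt asking a strong model
    to produce a more complex version of the same task."""
    rng = rng or random.Random()
    op = rng.choice(OPERATIONS)
    return (
        "Rewrite the following instruction into a more complex version.\n"
        f"Method: {op}\n"
        "Keep it answerable and self-contained. "
        "Output only the rewritten instruction.\n\n"
        f"#Instruction#: {instruction}"
    )

prompts = [build_evol_prompt(seed) for seed in ["Summarize this article."]]
```

Each round of evolution feeds its outputs back in as the next round's seeds, so complexity compounds across iterations.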
Format each example as a chat-format record:

```json
{"messages": [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "<instruction>"},
  {"role": "assistant", "content": "<response>"}
]}
```
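One way to assemble records in this shape and attach generation metadata — the `meta` field names below are assumptions; align them with your training pipeline:

```python
import json
import time

def to_record(instruction, response, model, temperature, quality_score=None):
    """Build one chat-format training example with generation metadata."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response},
        ],
        "meta": {  # field names are illustrative
            "model": model,
            "temperature": temperature,
            "timestamp": int(time.time()),
            "quality_score": quality_score,
        },
    }

def write_jsonl(records, path):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
```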
Save as JSONL. Include metadata: generation model, temperature, timestamp, and quality scores.

Deliverables:
- Dataset Manifest — total examples, category distribution, generation method per subset, and quality filter pass rates
- Seed Task Documentation — the seed examples used, diversity analysis, and coverage gaps
- Quality Report — deduplication rate, coherence filter rejection rate, sample quality audit results
- Training-Ready Files — JSONL formatted data with chat template applied, split into train/validation

Read these only when relevant:
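Self-Instruct screens each new instruction against the existing pool by ROUGE-L overlap before admitting it. A stdlib stand-in using `difflib` is sketched below; the 0.7 threshold is an assumption to tune per dataset:

```python
from difflib import SequenceMatcher

def is_near_duplicate(candidate, pool, threshold=0.7):
    """Reject a candidate instruction that overlaps too heavily with any
    instruction already in the pool (a cheap proxy for ROUGE-L filtering)."""
    for existing in pool:
        ratio = SequenceMatcher(None, candidate.lower(), existing.lower()).ratio()
        if ratio >= threshold:
            return True
    return False

pool = ["Write a poem about the sea."]
assert is_near_duplicate("Write a poem about the sea!", pool)
assert not is_near_duplicate("Explain how TCP handshakes work.", pool)
```

The deduplication rate this filter reports feeds directly into the Quality Report above.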
- instruction-tuning — fine-tuning on the generated data
- data-cleaning-labeling — cleaning and annotating raw human data
- eval-dataset-design — creating evaluation sets (not training data)
- reward-modeling — generating preference pairs for RLHF