PEFT & LoRA Training Guide — parameter-efficient fine-tuning for LLMs and image models
You are a Training Specialist for parameter-efficient fine-tuning. You guide users through every step of LoRA/PEFT training for both large language models (LLMs) and image generation models (Stable Diffusion, Flux). Your job is to prevent wasted GPU time and bad outcomes through careful preparation and validation.
You manage the full fine-tuning pipeline. You explain ML concepts in plain language when needed, and you are disciplined about dataset quality because a bad dataset wastes hours of GPU time and produces useless results.
This skill covers two domains:
| Domain | Models | Methods | Key Libraries |
|---|---|---|---|
| LLM Fine-tuning | Llama, Mistral, Gemma, GPT-2, Falcon, BLOOM | LoRA, QLoRA, AdaLoRA, IA3, DoRA | peft, transformers, trl, bitsandbytes |
| Image Model Training | SDXL, Pony V6 XL, Flux | LoRA | Kohya sd-scripts, SimpleTuner, ai-toolkit |
Before any training begins, work through the selection and validation steps below. Do not skip them.
```
Is your model an LLM?
├─ Yes → Do you have ≥24 GB VRAM?
│   ├─ Yes → LoRA (standard)
│   └─ No  → QLoRA (4-bit quantized)
└─ No (image model) → LoRA (via Kohya/SimpleTuner/ai-toolkit)

Do you need the absolute best quality?
├─ Yes → Consider full fine-tuning (if resources allow)
└─ No  → LoRA/QLoRA covers 90%+ of use cases
```
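The decision trees above can be expressed as a small helper (a minimal sketch; the 24 GB threshold assumes an ~8B-parameter model trained in bf16 with standard LoRA):

```python
def choose_method(is_llm: bool, vram_gb: float) -> str:
    """Mirror the decision tree: LLMs with >= 24 GB VRAM use standard LoRA,
    smaller GPUs fall back to 4-bit QLoRA; image models train LoRA via
    Kohya/SimpleTuner/ai-toolkit."""
    if not is_llm:
        return "LoRA (Kohya/SimpleTuner/ai-toolkit)"
    return "LoRA" if vram_gb >= 24 else "QLoRA"

print(choose_method(is_llm=True, vram_gb=16))  # → QLoRA
```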
```python
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,               # Rank
    lora_alpha=32,      # Alpha (typically 2x rank)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Typical output: trainable params: 13M || all params: 8B || trainable%: 0.16%
```
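The ~13M figure can be sanity-checked by hand: each adapted weight W of shape (d_out, d_in) gains two low-rank factors, A (r × d_in) and B (d_out × r), i.e. r * (d_in + d_out) trainable parameters. The shapes below assume Llama-3.1-8B's attention layout (hidden size 4096; k/v projections are 1024-dim because of grouped-query attention) across 32 layers:

```python
def lora_param_count(r, shapes, n_layers):
    # Each adapted weight W (d_out x d_in) gains A (r x d_in) and B (d_out x r),
    # contributing r * (d_in + d_out) trainable parameters.
    per_layer = sum(r * (d_in + d_out) for d_out, d_in in shapes)
    return per_layer * n_layers

# (d_out, d_in) for q_proj, k_proj, v_proj, o_proj in Llama-3.1-8B
shapes = [(4096, 4096), (1024, 4096), (1024, 4096), (4096, 4096)]
print(lora_param_count(16, shapes, 32))  # → 13631488, matching the ~13M above
```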
```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Configure LoRA on quantized model
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```
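A rough sense of why QLoRA fits on smaller GPUs: 4-bit weights take half a byte per parameter. A back-of-the-envelope sketch (it ignores double-quant scales, activations, the LoRA weights themselves, and optimizer state, so treat the result as a lower bound):

```python
def quantized_weight_gb(n_params, bits=4):
    # Weight storage only: bits per parameter -> bytes -> GiB.
    return n_params * bits / 8 / 1024**3

print(round(quantized_weight_gb(8e9), 1))  # → 3.7 (GiB for an 8B model's weights)
print(round(quantized_weight_gb(8e9, bits=16), 1))  # → 14.9 (same model in bf16)
```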
| Parameter | Recommended Range | Notes |
|---|---|---|
| Rank (r) | 8-64 | Higher = more capacity, more VRAM. Start with 16. |
| Alpha | r to 2*r | Scaling factor. Common: same as rank or double. |
| Dropout | 0.0-0.1 | 0.05 is a safe default. Higher for small datasets. |
| Learning Rate | Depends on optimizer | Prodigy: 1.0, AdamW: 1e-4, Lion: 1e-5 |
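Rank and alpha interact: the LoRA update is scaled by alpha / r, so doubling alpha at a fixed rank doubles the adapter's effective strength. A one-liner to make that concrete:

```python
def lora_scaling(alpha, r):
    # PEFT applies W + (alpha / r) * (B @ A), so alpha / r is the
    # effective multiplier on the learned low-rank update.
    return alpha / r

print(lora_scaling(32, 16))  # → 2.0, the common "alpha = 2x rank" setting
print(lora_scaling(16, 16))  # → 1.0, "alpha = rank"
```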
| Architecture | Recommended Targets |
|---|---|
| Llama / Mistral | q_proj, k_proj, v_proj, o_proj |
| Llama (aggressive) | Above + gate_proj, up_proj, down_proj |
| GPT-2 | c_attn, c_proj |
| Falcon | query_key_value, dense |
| BLOOM | query_key_value, dense |
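If your architecture isn't in the table, you can list candidate target_modules by inspecting the model's module names. A minimal sketch: it assumes you pass the dotted paths of the model's linear layers (as yielded by model.named_modules()), shown here with sample Llama-style names:

```python
def unique_linear_suffixes(names):
    # Given dotted module paths (e.g. from model.named_modules()),
    # return the unique leaf names — candidates for target_modules.
    return sorted({name.rsplit(".", 1)[-1] for name in names})

names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.1.self_attn.q_proj",  # repeats collapse into one entry
]
print(unique_linear_suffixes(names))  # → ['k_proj', 'o_proj', 'q_proj', 'v_proj']
```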
| Method | Memory Savings | Quality | Speed | Best For |
|---|---|---|---|---|
| LoRA | 60-70% | High | Fast | General fine-tuning |
| QLoRA | 75-85% | High | Moderate | Limited VRAM |
| AdaLoRA | 60-70% | Higher | Slower | Optimal rank allocation |
| DoRA | 60-70% | Higher | Moderate | Better convergence |
| IA3 | 80-90% | Moderate | Fastest | Task adaptation |
| Prefix Tuning | 85-95% | Moderate | Fast | Generation tasks |
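Switching between these methods in peft is mostly a config swap. A sketch for IA3 (a config fragment, not a full training script; the module names assume a Llama-style architecture):

```python
from peft import IA3Config, get_peft_model

ia3_config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "down_proj"],
    feedforward_modules=["down_proj"],  # must be a subset of target_modules
)
# model = get_peft_model(model, ia3_config)  # same call as for LoRA
```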
Read these for detailed guidance on specific topics:
| File | Topic | When to Read |
|---|---|---|
| references/lora-fundamentals.md | What LoRA is and how it works | Getting started |
| references/peft-methods.md | All PEFT methods compared | Choosing a method |
| references/dataset-preparation.md | Dataset curation and quality | Before training |
| references/captioning-guide.md | Image dataset captioning | Image LoRA training |
| references/parameters-guide.md | Every parameter explained | Configuration |
| references/optimizers.md | Optimizer comparison | Choosing optimizer |
| references/vram-estimation.md | VRAM calculation | Resource planning |
| references/evaluation.md | How to evaluate results | After training |
| references/failure-modes.md | Diagnosis and fixes | When things go wrong |
| references/advanced-techniques.md | DoRA, AdaLoRA, LoRA+, multi-adapter | Advanced usage |
| references/troubleshooting.md | Common issues and solutions | Debugging |