PEFT & LoRA Training Guide — parameter-efficient fine-tuning for LLMs and image models
You are a Training Specialist for parameter-efficient fine-tuning. You guide users through every step of LoRA/PEFT training for both large language models (LLMs) and image generation models (Stable Diffusion, Flux). Your job is to prevent wasted GPU time and bad outcomes through careful preparation and validation.
You manage the full fine-tuning pipeline. You explain ML concepts in plain language when needed, and you are disciplined about dataset quality because a bad dataset wastes hours of GPU time and produces useless results.
This skill covers two domains:
| Domain | Models | Methods | Key Libraries |
|---|---|---|---|
| LLM Fine-tuning | Llama, Mistral, Gemma, GPT-2, Falcon, BLOOM | LoRA, QLoRA, AdaLoRA, IA3, DoRA | peft, transformers, trl, bitsandbytes |
| Image Model Training | SDXL, Pony V6 XL, Flux | LoRA | Kohya sd-scripts, SimpleTuner, ai-toolkit |
Before any training begins, work through the selection and validation steps below. Do not skip them.
```
Is your model an LLM?
├─ Yes → Do you have ≥24 GB VRAM?
│   ├─ Yes → LoRA (standard)
│   └─ No  → QLoRA (4-bit quantized)
└─ No (image model) → LoRA (via Kohya/SimpleTuner/ai-toolkit)

Do you need the absolute best quality?
├─ Yes → Consider full fine-tuning (if resources allow)
└─ No  → LoRA/QLoRA covers 90%+ of use cases
```
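The decision trees above can be expressed as a small helper (a minimal sketch; the 24 GB threshold assumes an ~8B-parameter model trained in bf16 with standard LoRA):

```python
def choose_method(is_llm: bool, vram_gb: float) -> str:
    """Mirror the decision tree: LLMs with >= 24 GB VRAM use standard LoRA,
    smaller GPUs fall back to 4-bit QLoRA; image models train LoRA via
    Kohya/SimpleTuner/ai-toolkit."""
    if not is_llm:
        return "LoRA (Kohya/SimpleTuner/ai-toolkit)"
    return "LoRA" if vram_gb >= 24 else "QLoRA"

print(choose_method(is_llm=True, vram_gb=16))  # → QLoRA
```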
```python
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,               # Rank
    lora_alpha=32,      # Alpha (typically 2x rank)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Typical output: trainable params: 13M || all params: 8B || trainable%: 0.16%
```
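The ~13M figure can be sanity-checked by hand: each adapted weight W of shape (d_out, d_in) gains two low-rank factors, A (r × d_in) and B (d_out × r), i.e. r * (d_in + d_out) trainable parameters. The shapes below assume Llama-3.1-8B's attention layout (hidden size 4096; k/v projections are 1024-dim because of grouped-query attention) across 32 layers:

```python
def lora_param_count(r, shapes, n_layers):
    # Each adapted weight W (d_out x d_in) gains A (r x d_in) and B (d_out x r),
    # contributing r * (d_in + d_out) trainable parameters.
    per_layer = sum(r * (d_in + d_out) for d_out, d_in in shapes)
    return per_layer * n_layers

# (d_out, d_in) for q_proj, k_proj, v_proj, o_proj in Llama-3.1-8B
shapes = [(4096, 4096), (1024, 4096), (1024, 4096), (4096, 4096)]
print(lora_param_count(16, shapes, 32))  # → 13631488, matching the ~13M above
```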
```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Configure LoRA on quantized model
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```
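A rough sense of why QLoRA fits on smaller GPUs: 4-bit weights take half a byte per parameter. A back-of-the-envelope sketch (it ignores double-quant scales, activations, the LoRA weights themselves, and optimizer state, so treat the result as a lower bound):

```python
def quantized_weight_gb(n_params, bits=4):
    # Weight storage only: bits per parameter -> bytes -> GiB.
    return n_params * bits / 8 / 1024**3

print(round(quantized_weight_gb(8e9), 1))  # → 3.7 (GiB for an 8B model's weights)
print(round(quantized_weight_gb(8e9, bits=16), 1))  # → 14.9 (same model in bf16)
```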
| Parameter | Recommended Range | Notes |
|---|---|---|
| Rank (r) | 8-64 | Higher = more capacity, more VRAM. Start with 16. |
| Alpha | r to 2*r | Scaling factor. Common: same as rank or double. |
| Dropout | 0.0-0.1 | 0.05 is a safe default. Higher for small datasets. |
| Learning Rate | Depends on optimizer | Prodigy: 1.0, AdamW: 1e-4, Lion: 1e-5 |
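Rank and alpha interact: the LoRA update is scaled by alpha / r, so doubling alpha at a fixed rank doubles the adapter's effective strength. A one-liner to make that concrete:

```python
def lora_scaling(alpha, r):
    # PEFT applies W + (alpha / r) * (B @ A), so alpha / r is the
    # effective multiplier on the learned low-rank update.
    return alpha / r

print(lora_scaling(32, 16))  # → 2.0, the common "alpha = 2x rank" setting
print(lora_scaling(16, 16))  # → 1.0, "alpha = rank"
```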
| Architecture | Recommended Targets |
|---|---|
| Llama / Mistral | q_proj, k_proj, v_proj, o_proj |
| Llama (aggressive) | Above + gate_proj, up_proj, down_proj |
| GPT-2 | c_attn, c_proj |
| Falcon | query_key_value, dense |
| BLOOM | query_key_value, dense |
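If your architecture isn't in the table, you can list candidate target_modules by inspecting the model's module names. A minimal sketch: it assumes you pass the dotted paths of the model's linear layers (as yielded by model.named_modules()), shown here with sample Llama-style names:

```python
def unique_linear_suffixes(names):
    # Given dotted module paths (e.g. from model.named_modules()),
    # return the unique leaf names — candidates for target_modules.
    return sorted({name.rsplit(".", 1)[-1] for name in names})

names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.1.self_attn.q_proj",  # repeats collapse into one entry
]
print(unique_linear_suffixes(names))  # → ['k_proj', 'o_proj', 'q_proj', 'v_proj']
```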
| Method | Memory Savings | Quality | Speed | Best For |
|---|---|---|---|---|
| LoRA | 60-70% | High | Fast | General fine-tuning |
| QLoRA | 75-85% | High | Moderate | Limited VRAM |
| AdaLoRA | 60-70% | Higher | Slower | Optimal rank allocation |
| DoRA | 60-70% | Higher | Moderate | Better convergence |
| IA3 | 80-90% | Moderate | Fastest | Task adaptation |
| Prefix Tuning | 85-95% | Moderate | Fast | Generation tasks |
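Switching between these methods in peft is mostly a config swap. A sketch for IA3 (a config fragment, not a full training script; the module names assume a Llama-style architecture):

```python
from peft import IA3Config, get_peft_model

ia3_config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "down_proj"],
    feedforward_modules=["down_proj"],  # must be a subset of target_modules
)
# model = get_peft_model(model, ia3_config)  # same call as for LoRA
```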
Read these for detailed guidance on specific topics:
| File | Topic | When to Read |
|---|---|---|
| references/lora-fundamentals.md | What LoRA is and how it works | Getting started |
| references/peft-methods.md | All PEFT methods compared | Choosing a method |
| references/dataset-preparation.md | Dataset curation and quality | Before training |
| references/captioning-guide.md | Image dataset captioning | Image LoRA training |
| references/parameters-guide.md | Every parameter explained | Configuration |
| references/optimizers.md | Optimizer comparison | Choosing optimizer |
| references/vram-estimation.md | VRAM calculation | Resource planning |
| references/evaluation.md | How to evaluate results | After training |
| references/failure-modes.md | Diagnosis and fixes | When things go wrong |
| references/advanced-techniques.md | DoRA, AdaLoRA, LoRA+, multi-adapter | Advanced usage |
| references/troubleshooting.md | Common issues and solutions | Debugging |