Use this skill when the user wants to gather, collect, or prepare training images for LoRA fine-tuning. Covers the full pipeline: web research, image downloading, video frame extraction, dataset curation, image processing, and auto-captioning. Use it when the user mentions collecting images from the web, building a training dataset, finding reference images for a style or subject, or preparing images they already have into a LoRA-ready dataset. Also use it when they say things like "I want to train a LoRA but I don't have images yet" or "help me find training data." Do NOT use this skill if the user already has a prepared dataset and just wants to start training — that's comfyui-lora-training.
Research, collect, and prepare training datasets for LoRA fine-tuning. Handles the full pipeline from web research through image download, video frame extraction, and auto-captioning — delivering a ready-to-train dataset folder.
If the user already has their images and just needs training guidance, redirect to comfyui-lora-training instead.
datasets/{project-name}/ folder with images, captions, and a dataset-report.json manifest.

| Tool | Purpose |
|---|---|
| WebSearch | Find image sources, art communities, video tutorials, reference sheets |
| WebFetch | Crawl pages to extract image URLs and metadata |
| list_models | Check base model to determine target resolution |
| Scripts in scripts/ | Download images, extract video frames, process images, generate captions |
External tools required: yt-dlp, ffmpeg, curl, plus Python packages for auto-captioning.
Before running any script: Read references/scripts-reference.md for exact CLI flags, usage examples, and install commands. Each script step below will remind you, but don't skip it — the reference doc has details not repeated here (resolution bucket tables, captioner comparison, interval recommendations).
This skill deals with subjective, creative goals — the more context gathered upfront, the better the dataset. When a request is vague or exploratory, ask clarifying questions before researching. The goal is to understand what visual qualities the user wants the LoRA to learn.
Don't dump all of these at once. Pick the most relevant 2-3 based on what the user said, and infer what you can from context.
Always ask (if not already clear):

1. What should the LoRA learn? An art style, a specific person or character, or a type of object? This determines the captioning mode.
2. Which base model will it be trained on — FLUX, SDXL, or SD 1.5? This determines the target resolution.

Ask when relevant:

3. Can you name specific examples? Artists, games, shows, products — concrete references help narrow the search enormously.
4. What visual qualities matter most? Color palette, line weight, composition, texture, lighting style?
5. What should it NOT look like? Knowing the boundaries of the style helps filter out near-misses.
6. How many images are you aiming for? Default is 20-30, but some subjects need more variety.
If the user's request is already specific enough to act on (e.g., "Gather 20 images of Yoshitaka Amano's JRPG artwork from Final Fantasy"), go straight to research. You can always ask follow-up questions as you discover things.
Use WebSearch to find high-quality sources for the target subject/style. Search strategically based on what the user is training:
For art styles:
For people/characters:
For objects/products:
For aesthetics/moods:
Tier your sources as you go. Not all sources are equal — prioritize by likely quality and relevance:
Build a source list as you go:
```json
{
  "sources": [
    {
      "url": "https://...",
      "type": "gallery|video|article|reference-sheet",
      "tier": "A|B|C",
      "description": "What this source contains",
      "estimated_usable_images": 10
    }
  ]
}
```
Present the source list to the user for approval before downloading. The user should confirm which sources to pull from, since they know best what matches their vision — and some sources may have licensing they care about. Aim for at least 2-3 Tier A sources before relying on lower tiers.
This step turns the approved source list into actual image files. It has two phases: extracting direct image URLs from source pages, then downloading them.
Most sources from Step 1 are page URLs (galleries, portfolios), not direct image links. For each approved source, use WebFetch to load the page and extract direct image URLs:
Look for <img> tags, OpenGraph images, and links to .png/.jpg/.webp files. For ArtStation, append .json to the URL or look for API endpoints in the page source.

Build a flat URL list grouped by tier and save it:
```json
{
  "urls": [
    {"url": "https://direct-image-link.jpg", "source_page": "https://gallery-page", "tier": "A"},
    {"url": "https://another-image.png", "source_page": "https://portfolio-page", "tier": "B"}
  ]
}
```
Save this as datasets/{project-name}/image-urls.json.
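Extracting direct image URLs from a fetched page can be sketched with the standard library's html.parser. This is an illustrative sketch of the extraction described above (real pages often need per-site handling); extract_image_urls is a hypothetical helper, not one of the skill's scripts:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".webp")

class ImageURLExtractor(HTMLParser):
    """Collect candidate image URLs from <img src>, OpenGraph
    meta tags, and <a href> links that point at image files."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "meta" and attrs.get("property") == "og:image" and attrs.get("content"):
            self.urls.append(urljoin(self.base_url, attrs["content"]))
        elif tag == "a" and attrs.get("href", "").lower().endswith(IMAGE_EXTS):
            self.urls.append(urljoin(self.base_url, attrs["href"]))

def extract_image_urls(html, base_url):
    parser = ImageURLExtractor(base_url)
    parser.feed(html)
    # De-duplicate while preserving document order
    return list(dict.fromkeys(parser.urls))
```

Pages that render images via JavaScript won't yield anything this way, which is one reason some sources fail (see the troubleshooting notes below).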
Download in tier order — Tier A first, then B, then C — checking progress between each tier. Use scripts/download_images.py with --tier A, then --tier B with --start-index to continue numbering.
Important: Set --start-index to the actual file count from the previous tier, not an estimate. Check how many files are in raw/ before each tier download. Read references/scripts-reference.md before running — it has the full flag reference and usage examples.
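Counting the existing files can be sketched as below; next_start_index is a hypothetical helper, the real flag belongs to scripts/download_images.py:

```python
from pathlib import Path

def next_start_index(raw_dir):
    """Count image files already in raw/ so the next tier's download
    continues the sequential numbering instead of colliding."""
    exts = {".png", ".jpg", ".jpeg", ".webp"}
    return sum(1 for p in Path(raw_dir).iterdir()
               if p.is_file() and p.suffix.lower() in exts)
```

Pass the result as the value of --start-index for the next tier.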
After Tier A completes, report the count to the user before moving on to Tier B.
The script handles downloading, minimum resolution filtering, dedup by file hash, sequential naming, and writing download-log.json with source attribution.
Some common issues and how to handle them:
If WebFetch couldn't extract image URLs from a page, flag it to the user and try the next source. Don't spend excessive time on uncooperative sources — there are usually easier alternatives.
When to use this step: Only when the approved source list from Step 1 includes video sources (type "video"). If all sources are image galleries or pages, skip to Step 4.
Video sources are useful for art compilations, speed-paint timelapses, game footage, and "art of" showcase reels. The goal is to extract the distinct, high-quality frames — not every frame of the video.
Download each approved video source using yt-dlp:
```bash
yt-dlp -f "bestvideo[height>=720]" -o "datasets/{project-name}/video/%(title)s.%(ext)s" "{video_url}"
```
Only download videos from approved sources. Respect the tier system — prioritize Tier A video sources (official art reels, artist channels) over Tier C (random compilations).
Use scripts/extract_frames.py, setting --start-index to the actual number of files currently in raw/ (not an estimate — count them). Read references/scripts-reference.md before running — it has interval recommendations by video type, max-frames guidance, and all CLI flags.
The script extracts frames at regular intervals, skips low-resolution and black/white frames (transitions), deduplicates by hash, and caps total frames via --max-frames. For most LoRA training, 10-30 frames from video sources is plenty — they supplement the higher-quality static images from Step 2.
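The interaction between the sampling interval and the --max-frames cap can be illustrated with a small timestamp planner. This is a sketch of the idea only, not the actual logic in extract_frames.py; plan_frame_timestamps is hypothetical:

```python
def plan_frame_timestamps(duration_s, interval_s, max_frames):
    """Choose evenly spaced timestamps to sample from a video.
    If the requested interval would exceed max_frames, widen the
    interval so the samples still span the whole video."""
    count = int(duration_s // interval_s) + 1
    if count > max_frames:
        interval_s = duration_s / (max_frames - 1)
        count = max_frames
    return [round(i * interval_s, 2) for i in range(count)]
```

For a 10-minute video with a 5-second interval and a 30-frame cap, the planner stretches the interval to about 20.7 seconds rather than truncating at the 150-second mark.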
This is a collaborative step. You can't display images directly — so your job is to build a clear picture of the dataset from the metadata, flag issues, and guide the user through reviewing the actual files in their file browser.
Read download-log.json (from Step 2) and extraction-log.json (from Step 3, if applicable) to compile a dataset overview:
Count check: Report total images gathered vs. the target (20-30 typical). Break down by source tier and source type (downloaded vs. video frames).
Quality scan: Flag specific files that might cause training issues. Reference filenames so the user can find them:
Near-duplicate detection: Images that are very similar but not byte-identical (e.g., slightly different crops, video frames from adjacent timestamps) waste training budget. To catch these:
Browse the raw/ folder sorted by name — images from the same source are grouped together, which makes spotting near-duplicates easier.

Diversity checklist: Score each criterion as PASS / WEAK / FAIL:
| Criterion | PASS | WEAK | FAIL |
|---|---|---|---|
| Subject variety | 5+ distinct subjects/scenes | 3-4 subjects | <3 (dataset will overfit to specific subjects) |
| Composition range | Has close-ups, medium, and wide shots | Missing one type | All same framing |
| Color distribution | Varied palette (or single palette IS the style) | Slightly skewed | Dominated by one palette unintentionally |
| Background variety | 3+ different settings/contexts | 2 settings | All same background |
| People (if applicable) | 3+ expressions, 3+ lighting setups, varied clothing | Partial variety | All same pose/expression/lighting |
| Styles (if applicable) | 5+ different subjects in the style | 3-4 subjects | <3 (will learn subject, not style) |
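The near-duplicate detection described above can be sketched in pure Python with a tiny average hash. Real pipelines typically use a library such as imagehash over PIL images; the helpers below operate on plain 2-D pixel lists purely to illustrate the hash-and-Hamming-distance idea:

```python
def average_hash(gray):
    """Average-hash a small grayscale image given as a 2-D list of
    0-255 values: each pixel becomes 1 if it is >= the mean."""
    pixels = [v for row in gray for v in row]
    mean = sum(pixels) / len(pixels)
    return [1 if v >= mean else 0 for v in pixels]

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def near_duplicates(hashes, threshold):
    """Return index pairs whose hashes differ in <= threshold bits."""
    pairs = []
    for i in range(len(hashes)):
        for j in range(i + 1, len(hashes)):
            if hamming(hashes[i], hashes[j]) <= threshold:
                pairs.append((i, j))
    return pairs
```

Adjacent video frames typically land within a few bits of each other, while genuinely different images diverge widely; tune the threshold to the hash size.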
Present the summary to the user with:
Close with a message like: "Take a look at datasets/{project-name}/raw/ and let me know if you agree with these recommendations, or if you'd like to adjust."

After curation, step back and assess the dataset as a whole. This is where you catch problems that per-image review misses.
Use the diversity scorecard from Step 4 to decide:
Always present the analysis to the user and let them override — they may know that a "gap" is intentional (e.g., "I only want daytime scenes").
When looping back, don't redo the broad search from Step 1. Instead:
- Run targeted searches for the missing qualities, e.g. "{artist name}" close-up detail or "{art style}" character portrait.
- Add new sources to sources.json with a note like "added_in": "iteration-2" so you can track what came from which round.
- Download into the existing raw/ folder using --start-index set to the actual file count in raw/ (not an estimate). The dedup in the download script will catch any exact duplicates with existing images.

Cap at 2-3 iteration rounds. After that, diminishing returns kick in — the user is better off training with what they have and doing a second LoRA training run if needed, rather than endlessly perfecting the dataset.
Prepare the curated images for training. Use the target resolution from the user's base model choice (asked in the clarification interview):
| Base Model | Target Resolution |
|---|---|
| FLUX | 1024 |
| SDXL | 1024 |
| SD 1.5 | 512 |
If Step 4 recommended removing images, act on that first: either delete the rejected files from raw/, or create a keep.txt (one filename per line) and pass it via --keep-list.
Run scripts/prepare_dataset.py with --target-resolution matching the base model (1024 for FLUX/SDXL, 512 for SD 1.5). Read references/scripts-reference.md before running — it has the full flag reference, resolution bucket tables, and upscaling warnings.
The script resizes to resolution buckets, converts to PNG, strips metadata, and writes processing-report.json. It warns if any image needs >2x upscale — consider replacing those with higher-res versions or pre-upscaling with Real-ESRGAN.
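The bucketing idea can be illustrated with a small chooser. This assumes the common kohya-style scheme (64-px-aligned dimensions, total area near the target resolution squared); pick_bucket is a hypothetical helper, not the algorithm in prepare_dataset.py:

```python
def pick_bucket(width, height, base=1024, step=64):
    """Pick a bucket (w, h) whose aspect ratio is closest to the
    image's, keeping area near base*base and dims step-aligned."""
    target_area = base * base
    aspect = width / height
    best = None
    w = step
    while w <= 2 * base:
        h = round(target_area / w / step) * step
        if h >= step:
            diff = abs((w / h) - aspect)
            if best is None or diff < best[0]:
                best = (diff, (w, h))
        w += step
    return best[1]
```

A square image maps to the square bucket, while a 16:9 frame lands in a wide bucket such as 1344x768, so nothing is distorted to fit.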
Generate captions following the comfyui-lora-training skill's captioning strategy. This step has three parts: choose the trigger word, run auto-captioning, and clean up the results.
The mode maps directly to the user's answer from the clarification interview ("What should the LoRA learn?"):
| User wants to learn... | Mode | Caption format |
|---|---|---|
| An art style / aesthetic | style | {trigger_word} style, {description of content} |
| A specific person / character | subject | {trigger_word}, {description of everything except the subject} |
| A type of object / product | object | {trigger_word}, {description of context and setting} |
The trigger word is what activates the LoRA during inference. Good trigger words are:

- Unique made-up tokens the base model has no prior associations with (e.g., amanoart, cyberpunkanime)
- Short, lowercase, and easy to type consistently (e.g., vintage_kb)

Suggest a trigger word to the user and let them confirm or pick their own. If they already specified one, use it.
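Those properties can be encoded as a quick heuristic check before suggesting a word. A sketch only; the common-word list and thresholds are illustrative assumptions, not part of the skill:

```python
import re

# Illustrative blocklist: words the base model already has strong priors for
COMMON_WORDS = {"art", "style", "photo", "image", "anime", "painting"}

def check_trigger_word(word):
    """Heuristic checks for a trigger-word candidate. Returns a list
    of warnings; an empty list means the word looks safe."""
    warnings = []
    if not re.fullmatch(r"[a-z0-9_]+", word):
        warnings.append("use only lowercase letters, digits, underscores")
    if word.lower() in COMMON_WORDS:
        warnings.append("too common; the base model already associates it with content")
    if len(word) < 4:
        warnings.append("very short; may collide with existing tokens")
    return warnings
```

Surface any warnings to the user along with the suggestion rather than silently rejecting their choice.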
Run scripts/auto_caption.py with the trigger word and mode. Default captioner is Florence-2 (good all-rounder); use WD14 for anime, BLIP-2 for premium realistic photos. Read references/scripts-reference.md before running — it has the full captioner comparison, timing expectations, and all CLI flags.
WD14 output format: WD14 produces comma-separated booru-style tags (e.g., 1girl, blue_hair, cityscape, neon_lights), not natural-language descriptions like Florence-2 and BLIP-2. When showing sample captions from a WD14 run, present them as tag lists — don't rewrite them into prose. The tag format works well for training; it doesn't need conversion.
The script auto-strips style-leaking words ("watercolor painting", "anime style", "beautiful", "masterpiece") because they teach the model to associate the trigger word with text tokens instead of visual features. Use --extra-strip-words for project-specific terms. See references/scripts-reference.md for the full blocklist.
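The stripping step amounts to phrase removal plus cleanup of the punctuation left behind. A minimal sketch; strip_caption_words is a hypothetical helper and the real blocklist lives in the script:

```python
import re

def strip_caption_words(caption, strip_words):
    """Remove style-leaking phrases from a caption, then tidy the
    comma and whitespace debris left behind."""
    for phrase in strip_words:
        caption = re.sub(re.escape(phrase), "", caption, flags=re.IGNORECASE)
    caption = re.sub(r"(\s*,\s*)+", ", ", caption)  # collapse duplicate commas
    caption = re.sub(r"\s{2,}", " ", caption).strip(" ,")
    return caption
```

Without the cleanup pass, removed phrases leave dangling ", ," fragments that would end up in the training captions.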
The caption should describe what's in the image (subject, composition, colors, setting), not how it was made. The LoRA learns the "how" from the pixels; the caption teaches it what content to apply the style to.
Each caption is saved as a .txt file alongside its image (e.g., 001.png and 001.txt). After auto-captioning, present 3-5 example captions to the user so they can check for:
- Style-leaking words that slipped past the blocklist: add them to --extra-strip-words and re-run.

Auto-captions are a starting point — the user may want to edit some manually for better results.
Aggregate the pipeline's existing reports into a single manifest. Read these files to populate the report — do not guess values:
| Field | Source |
|---|---|
| Image count, resolution buckets | processed/processing-report.json |
| Captioning method, trigger word, mode | processed/caption-report.json |
| Source URLs and attribution | download-log.json, extraction-log.json |
| Source tiers | sources.json |
| Quality scorecard | Your Step 4 curation notes |
Build the manifest:
```json
{
  "project_name": "jrpg-art-style",
  "created": "2026-04-04T12:00:00Z",
  "target_architecture": "FLUX",
  "trigger_word": "jrpgart",
  "captioning_mode": "style",
  "captioning_method": "florence2",
  "caption_strip_words": ["final fantasy", "square enix"],
  "target_resolution": 1024,
  "image_count": 25,
  "resolution_buckets": {"1024x1024": 10, "1152x896": 8, "896x1152": 7},
  "sources_summary": {
    "tier_a": 3,
    "tier_b": 2,
    "total_urls": 40,
    "images_kept": 25,
    "video_frames_used": 8,
    "iterations": 1
  },
  "quality_scorecard": {
    "subject_variety": "PASS",
    "composition_range": "PASS",
    "color_distribution": "PASS",
    "background_variety": "PASS"
  },
  "dataset_path": "datasets/jrpg-art-style/processed/",
  "ready_for_training": true,
  "notes": "Focus on Amano-style watercolor JRPG illustrations. 25 images with Florence-2 captions."
}
```
Save to datasets/{project-name}/dataset-report.json.
The notes field is free-form — summarize what the dataset is for, any caveats (e.g., "3 images needed >2x upscale"), and anything the training skill should know.
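The aggregation can be sketched as below. Note this simplified version nests each report under its own key rather than flattening to the exact schema above; build_manifest is a hypothetical helper and the report filenames follow the table in this step:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def build_manifest(project_dir):
    """Aggregate per-step reports into dataset-report.json.
    Reads only files that exist; missing reports are left out
    rather than having their values guessed."""
    project_dir = Path(project_dir)
    manifest = {
        "project_name": project_dir.name,
        "created": datetime.now(timezone.utc).isoformat(),
    }
    report_files = {
        "processing": project_dir / "processed" / "processing-report.json",
        "captions": project_dir / "processed" / "caption-report.json",
        "downloads": project_dir / "download-log.json",
    }
    for key, path in report_files.items():
        if path.exists():
            manifest[key] = json.loads(path.read_text())
    (project_dir / "dataset-report.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Skipping absent reports keeps the manifest honest: a missing caption-report.json means captioning was never run, which the training skill should see.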
Present a summary to the user and point them to the training skill. Include the report path so the training skill can read settings automatically:
```
Dataset ready: datasets/{project-name}/processed/
- {N} images across {bucket_count} resolution buckets
- Captioned with {captioner} ({mode} mode)
- Trigger word: {trigger_word}
- Quality: all scorecard criteria PASS
- Report: datasets/{project-name}/dataset-report.json

To start training, say:
"Train a LoRA using the {project-name} dataset"

The training skill will read dataset-report.json to pick up your
trigger word, resolution, and captioning settings automatically.
```
If any scorecard criteria were WEAK (not FAIL — FAIL should have been resolved in Step 4b), mention them here so the user can decide whether to proceed or gather more data:
```
Note: "variety" scored WEAK — most images are front-facing portraits.
The LoRA may struggle with other poses. You can proceed, or gather
more varied images first.
```
Read references/folder-structure.md when setting up a new project. Key directories: raw/ (originals), processed/ (training-ready images + captions), plus JSON reports at each step.
When gathering training data from the web, keep these principles in mind:
Source attribution is recorded for every image (in download-log.json). If the user plans to distribute or commercialize a trained LoRA, they should review the source licenses.

These aren't blockers — the user makes the final call. But surface the information so they can make informed decisions.
| Capability | Online | Offline |
|---|---|---|
| Web research & source discovery | Yes | No — need internet |
| Image downloading | Yes | No — need internet |
| Video downloading | Yes | No — need internet |
| Frame extraction from local video | Yes | Yes (ffmpeg is local) |
| Image processing & resizing | Yes | Yes (local scripts) |
| Auto-captioning | Yes | Yes (local models) |
| Dataset report generation | Yes | Yes |
In offline mode, the skill can process and prepare images the user already has locally, but cannot perform web research or downloads. If the user has raw images in a folder, skip to Step 5 (Process Images).