Comprehensive skill for ML/AI research experiments and finetuning. Use when: (1) Setting up new ML research project ("create ML project", "init experiment") (2) Finetuning models ("finetune LLM", "adapt pretrained model", "LoRA", "QLoRA") (3) Training from scratch ("train model", "run experiment") (4) Debugging ML issues ("model not converging", "loss exploding", "GPU OOM") (5) Setting up experiment tracking ("add W&B", "setup MLflow") (6) Optimizing GPU usage ("batch size tuning", "memory optimization") (7) Creating visualizations ("plot training curves", "confusion matrix") (8) Auditing ML code ("check reproducibility", "review experiment") Triggers: "ML", "machine learning", "deep learning", "training", "finetuning", "PyTorch", "TensorFlow", "experiment", "GPU", "CUDA", "model", "neural network", "W&B", "MLflow", "reproducibility", "learning rate", "checkpoint", "epoch"
This skill provides comprehensive support for ML/AI research experiments, finetuning, and training — from compute-environment detection and project setup through experiment tracking, GPU optimization, and debugging.
Before any ML work, detect the compute environment:
1. GPU Detection

```bash
# Check for NVIDIA GPU
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader 2>/dev/null || echo "No NVIDIA GPU detected"

# Check CUDA version
nvcc --version 2>/dev/null | grep "release" || echo "CUDA not found"

# Check cuDNN (if accessible)
grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn_version.h 2>/dev/null || echo "cuDNN version not directly accessible"
```
2. Python Environment

```bash
# Python version
python3 --version

# Virtual environment detection
echo "$VIRTUAL_ENV" "$CONDA_DEFAULT_ENV"

# Check for common ML frameworks
python3 -c "import torch; print(f'PyTorch {torch.__version__}, CUDA: {torch.cuda.is_available()}')" 2>/dev/null
python3 -c "import tensorflow as tf; print(f'TensorFlow {tf.__version__}')" 2>/dev/null
python3 -c "import jax; print(f'JAX {jax.__version__}')" 2>/dev/null
```
3. Memory Detection

```bash
# System RAM
free -h 2>/dev/null || sysctl hw.memsize 2>/dev/null

# GPU memory (via torch, if available)
python3 -c "import torch; print(f'GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')" 2>/dev/null
```
4. ML Tools Detection

```bash
# Check for experiment tracking
python3 -c "import wandb; print(f'W&B {wandb.__version__}')" 2>/dev/null
python3 -c "import mlflow; print(f'MLflow {mlflow.__version__}')" 2>/dev/null

# Check for config management
python3 -c "import hydra; print(f'Hydra {hydra.__version__}')" 2>/dev/null
```
Run comprehensive detection:

```bash
python3 scripts/detect_system.py
```
See references/gpu-management.md for compute optimization.
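The detection steps above can be sketched as a single Python probe. This is a minimal illustration of what a script like `scripts/detect_system.py` might do (the actual script in this skill may differ): import each framework defensively and shell out to `nvidia-smi` only if it exists.

```python
import importlib
import shutil
import subprocess


def detect_frameworks() -> dict:
    """Return {name: version} for whichever ML libraries import cleanly."""
    found = {}
    for name in ("torch", "tensorflow", "jax", "wandb", "mlflow", "hydra"):
        try:
            mod = importlib.import_module(name)
            found[name] = getattr(mod, "__version__", "unknown")
        except ImportError:
            pass  # library absent; skip rather than fail
    return found


def detect_gpus() -> list:
    """Query nvidia-smi if present; return a list of (name, memory) tuples."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return [tuple(line.split(", ")) for line in result.stdout.strip().splitlines()]


if __name__ == "__main__":
    print("Frameworks:", detect_frameworks() or "none found")
    print("GPUs:", detect_gpus() or "none detected")
```

The graceful fallbacks matter: the same script should run unmodified on a CPU-only laptop and a multi-GPU cluster node.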
Identify the ML task type:
| Task Type | Key Indicators | Critical Checks |
|---|---|---|
| Training from scratch | New model, random init | Data size, compute budget |
| Finetuning | Pretrained model, adaptation | Base model selection, LR schedule |
| Evaluation | Metrics, benchmarking | No data leakage, proper splits |
| Inference | Deployment, serving | Batch size, latency requirements |
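The decision table above can be approximated with a simple keyword heuristic. This is a hypothetical helper for illustration only (not part of the skill's scripts); checking finetuning indicators before generic training words avoids misclassifying "finetune a pretrained model" as training from scratch.

```python
# Indicators drawn from the task-type table; order matters (most specific first).
TASK_INDICATORS = {
    "finetuning": ("pretrained", "lora", "qlora", "adapt"),
    "evaluation": ("benchmark", "metric", "eval"),
    "inference": ("deploy", "serving", "latency"),
    "training_from_scratch": ("from scratch", "random init", "train"),
}


def classify_task(description: str) -> str:
    """Return the first task type whose indicator appears in the description."""
    text = description.lower()
    for task, keywords in TASK_INDICATORS.items():
        if any(k in text for k in keywords):
            return task
    return "unknown"
```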
Also identify the model domain (e.g. vision, NLP, speech, tabular) and assess the scale of the run (parameter count, dataset size, compute budget), since both drive framework, batch-size, and hardware choices.
Check the project against ML best practices, then run validation:

```bash
python3 scripts/validate_experiment.py
```
See references/common-mistakes.md for detailed issues.
When setting up an ML project, discuss with the user what structure fits their needs. A typical ML research project includes:
```
my_experiment/
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── dataset.py           # Dataset classes
│   │   └── preprocessing.py     # Data transforms
│   ├── models/
│   │   ├── __init__.py
│   │   └── model.py             # Model definitions
│   ├── training/
│   │   ├── __init__.py
│   │   ├── trainer.py           # Training loop
│   │   └── losses.py            # Custom losses
│   └── utils/
│       ├── __init__.py
│       ├── logging.py           # Logging utilities
│       └── reproducibility.py   # Seed setting
├── configs/
│   ├── config.yaml              # Main Hydra config
│   ├── model/
│   │   └── default.yaml
│   ├── data/
│   │   └── default.yaml
│   └── training/
│       └── default.yaml
├── scripts/
│   ├── train.py                 # Main training entry
│   ├── evaluate.py              # Evaluation script
│   └── inference.py             # Inference script
├── tests/
│   ├── __init__.py
│   ├── test_data.py
│   ├── test_model.py
│   └── conftest.py              # pytest fixtures
├── notebooks/
│   └── exploration.ipynb
├── data/                        # .gitignored
├── outputs/                     # .gitignored
├── experiments/                 # .gitignored
├── CLAUDE.md
├── AGENTS.md
├── README.md
├── pyproject.toml
├── .gitignore
└── .env.example
```
See references/project-structure.md for templates.
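The `src/utils/reproducibility.py` module in the tree above might look like the following minimal sketch, assuming PyTorch as the framework; the torch and numpy calls are skipped gracefully when those libraries are absent.

```python
import os
import random


def set_seed(seed: int = 42) -> None:
    """Seed every RNG the experiment may touch, for reproducible runs."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade kernel-selection speed for deterministic cuDNN behavior
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Call `set_seed(cfg.seed)` once at the top of the training entry point, before any data loading or model construction.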
Hydra Config Template:
# configs/config.yaml