Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.
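A minimal sketch of training a custom BPE vocabulary with the Hugging Face `tokenizers` Python bindings; the in-memory toy corpus and the vocab size are illustrative, not recommendations:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Build a BPE tokenizer and train it on an in-memory toy corpus
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=1000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(["hello world", "hello tokenizers"] * 100, trainer)

# Encodings carry alignment info (offsets) back to the original text
enc = tokenizer.encode("hello world")
```

For large corpora, `train_from_iterator` accepts any iterator of strings, so files can be streamed without loading them fully into memory.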
Use when building ML/AI apps in Rust. Keywords: machine learning, ML, AI, tensor, model, inference, neural network, deep learning, training, prediction, ndarray, tch-rs, burn, candle, 机器学习, 人工智能, 模型推理
Add support for a new Vision-Language Model (VLM) to AutoRound, including multimodal block handler, calibration dataset template, and special model handling. Use when integrating a new VLM like LLaVA, Qwen2-VL, GLM-Image, Phi-Vision, or similar multi-modal models for quantization.
Adapt AutoRound to support a new LLM architecture that doesn't work out-of-the-box. Use when quantization fails for a new model type, block detection doesn't find layers, MoE models need unfusing, custom forward passes are needed, or non-standard linear layer types need handling.
GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.
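Fuzzy deduplication of the kind described typically rests on MinHash signatures. A stdlib-only sketch of the estimate (NeMo Curator's GPU implementation differs substantially; this only illustrates why near-duplicates collide):

```python
import hashlib

def shingles(text, k=5):
    # Character k-grams of the document
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, num_hashes=64):
    # One minimum per seeded hash function; equal slots approximate Jaccard similarity
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set
        ))
    return sig

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two near-duplicate documents share most shingles, so their signatures agree in most slots, while unrelated documents agree in almost none.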
OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.
Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.
Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support
Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
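GRPO's core idea, normalizing each sampled completion's reward against its own group rather than a learned value function, can be sketched in plain PyTorch (a simplified illustration; TRL's trainer additionally handles token masking, KL regularization, and clipping):

```python
import torch

def grpo_advantages(rewards):
    # rewards: (num_prompts, group_size) rewards for completions sampled per prompt.
    # Each completion's advantage is its reward standardized within its group.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

adv = grpo_advantages(torch.tensor([[1.0, 2.0, 3.0]]))
```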
Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library integrated with transformers ecosystem.
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when you want simpler, faster training than DPO/PPO.
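The reference-free SimPO objective can be sketched in a few lines of PyTorch, following the paper's formulation (length-normalized average log-probability as the implicit reward, minus a target margin gamma); the default beta and gamma values here are illustrative:

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps_sum, chosen_len, rejected_logps_sum, rejected_len,
               beta=2.0, gamma=0.5):
    # Length-normalized average log-probs replace the DPO reference model;
    # gamma is the target reward margin between chosen and rejected.
    margin = (beta * chosen_logps_sum / chosen_len
              - beta * rejected_logps_sum / rejected_len
              - gamma)
    return -F.logsigmoid(margin).mean()
```

Because no reference-model log-probs are needed, one full forward pass per batch is saved relative to DPO.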
Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with HuggingFace Transformers.
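The DPO objective at the heart of preference alignment is compact enough to sketch directly in PyTorch (a simplified illustration; TRL's `DPOTrainer` handles batching, reference-model management, and logging):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # -log sigmoid(beta * ((policy margin) - (reference margin)));
    # pushes the policy to prefer chosen over rejected more than the reference does.
    logits = ((policy_chosen_logps - policy_rejected_logps)
              - (ref_chosen_logps - ref_rejected_logps))
    return -F.logsigmoid(beta * logits).mean()
```

The log-probabilities are sequence sums over completion tokens; the reference model stays frozen and only anchors the margin.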
Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization
Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2
Skill: lazy-prefetch-pattern
Skill: ultracite
Skill: trpc-patterns
Darwin Skill (达尔文.skill): autonomous skill optimizer inspired by Karpathy's autoresearch. Evaluates SKILL.md files using an 8-dimension rubric (structure + effectiveness), runs hill-climbing with git version control, validates improvements through test prompts, and generates visual result cards. Use when user mentions "优化skill", "skill评分", "自动优化", "auto optimize", "skill质量检查", "达尔文", "darwin", "帮我改改skill", "skill怎么样", "提升skill质量", "skill review", "skill打分".
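The hill-climbing loop the skill describes can be sketched generically in Python (a toy illustration; the real skill scores SKILL.md files against its 8-dimension rubric and checkpoints candidates in git rather than in memory):

```python
import random

def hill_climb(score, mutate, initial, rounds=20, seed=0):
    # Keep a candidate only when it strictly improves the score
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    for _ in range(rounds):
        candidate = mutate(best, rng)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score
```

Validation through test prompts plays the role of `score` here: an edit is kept only if the evaluated quality actually goes up.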
bug-hunter stage 2 skill. Dispatches the randomized diffs to 8 sub-agents for parallel review according to the persona matrix, and collects unified JSON results.
Distributed multi-agent defect detection orchestration skill. Produces high signal-to-noise code review reports through input randomization, persona-based parallel review, semantic bucketing, weighted consensus, and adjudication re-review. Use for large PRs, complex logic changes, security-sensitive modifications, or scenarios where single-agent review recall is insufficient.
bug-hunter stage 3 skill. Performs semantic deduplication, bucket clustering, and conflict detection on the raw multi-agent findings to form a votable pool of defect candidates.
bug-hunter stage 4 skill. Runs weighted consensus voting over the defect buckets, filters issues that pass the threshold, and outputs an adjudication-grade structured review report.
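The weighted-vote step in the bug-hunter pipeline can be sketched as follows (the reviewer names, weights, and the 0.6 threshold are illustrative assumptions, not the skill's actual configuration):

```python
def weighted_consensus(ballots, weights, threshold=0.6):
    # ballots: {bucket_id: {reviewer: bool vote}}; weights: {reviewer: float}.
    # A bucket passes when the weighted support fraction reaches the threshold.
    accepted = []
    for bucket, votes in ballots.items():
        total = sum(weights.get(r, 1.0) for r in votes)
        support = sum(weights.get(r, 1.0) for r, v in votes.items() if v)
        if total and support / total >= threshold:
            accepted.append(bucket)
    return accepted
```

Weighting lets high-precision personas (e.g. a security reviewer) outvote noisier ones, which is what raises the signal-to-noise ratio of the final report.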
Generate release notes for the new NNCF release.
Add a new Numerai model type to the agents training pipeline. Use when you need to register a model in `agents/code/modeling/utils/model_factory.py`, handle fit/predict quirks in `agents/code/modeling/utils/numerai_cv.py`, and update configs so the model can run via `python -m agents.code.modeling`.
Evaluate a trained checkpoint with visualization
Launch a training run for a robot environment using PPO
Implementation details for EF Core Roslyn analyzers. Use when changing analyzers, fix providers, or diagnostic suppressors.
Propose high-fitness and high-diversity mutants of the VP1 capsid protein of Adeno-Associated Virus (AAV) through multi-round iterative optimization.
Propose high-fluorescence and high-diversity mutants of Green Fluorescent Protein (GFP) through multi-round iterative optimization.
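Both the AAV and GFP briefs share the same multi-round propose-score-select loop, which can be sketched generically (the fitness function below is a toy stand-in for a learned predictor; the mutation scheme, pool size, and elitism are illustrative choices):

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, rng, n_mut=1):
    # Substitute random residues at randomly chosen positions
    seq = list(seq)
    for pos in rng.sample(range(len(seq)), n_mut):
        seq[pos] = rng.choice(AMINO_ACIDS)
    return "".join(seq)

def propose_round(parents, score, rng, pool_size=50, keep=5):
    # Sample mutants of the current parents, dedupe for diversity, keep top scorers
    mutants = {mutate(rng.choice(parents), rng) for _ in range(pool_size)}
    mutants.update(parents)  # elitism: never lose the current best
    return sorted(mutants, key=score, reverse=True)[:keep]
```

Keeping a set of distinct top candidates (rather than a single best sequence) is what preserves the diversity objective across rounds.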