Use when working with the full Paddle 3.0 compiler pipeline: SOT (Symbolic Opcode Translator) for bytecode-level dy2st graph capture, PIR (Paddle IR) for SSA-based intermediate representation, CINN for fused CUDA kernel generation, operator decomposition (Prim), or the end-to-end flow from Python eager code to optimized GPU execution.

PaddlePaddle 23.8k

Framework Internals

Paddle Phi Kernel

Use when working with Paddle's PHI kernel system: registering new kernels, debugging kernel selection/dispatch, understanding code auto-generation from YAML, or implementing operator decomposition via the combination mechanism.

PaddlePaddle 23.8k

Framework Internals

Paddle Op Dev

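PaddlePaddle (飞桨) C++ operator development guide. Provides guidance on the complete operator development workflow, from YAML configuration, InferMeta functions, Kernel implementation, Python API wrapping, and unit tests through to build verification. Use this skill when: (1) adding a new C++ operator to the Paddle framework (2) modifying or debugging an existing Paddle operator (3) writing an operator's YAML configuration, InferMeta, Kernel, Python API, or unit tests (4) understanding Paddle's operator development architecture and workflow (5) building Paddle and verifying operator correctness.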

PaddlePaddle 23.8k

Framework Internals

Token Efficiency

Activate ultra-compressed output mode for maximum token efficiency. Use when context is running low, user requests brevity, or dealing with large-scale operations.

SuperClaude-Org 22.3k

Framework Internals

TypeScript SDK

TypeScript SDK patterns for Opik. Use when working in sdks/typescript.

comet-ml 18.9k

Framework Internals

Python SDK

Python SDK patterns for Opik. Use when working in sdks/python, on SDK APIs, integrations, or message processing.

comet-ml 18.9k

Framework Internals

PennyLane

Hardware-agnostic quantum ML framework with automatic differentiation. Use when training quantum circuits via gradients, building hybrid quantum-classical models, or needing device portability across IBM/Google/Rigetti/IonQ. Best for variational algorithms (VQE, QAOA), quantum neural networks, and integration with PyTorch/JAX/TensorFlow. For hardware-specific optimizations use qiskit (IBM) or cirq (Google); for open quantum systems use qutip.

K-Dense-AI 18.8k

Framework Internals

Pymoo

Multi-objective optimization framework. Covers NSGA-II, NSGA-III, MOEA/D, Pareto fronts, constraint handling, and benchmarks (ZDT, DTLZ) for engineering design and optimization problems.

K-Dense-AI 18.8k

Framework Internals

PyTorch Lightning

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, and implement data pipelines, callbacks, logging (W&B, TensorBoard), and distributed training (DDP, FSDP, DeepSpeed) for scalable neural network training.

K-Dense-AI 18.8k

Framework Internals

Add New JIT-EE API

Add a new API to the JIT-VM (aka JIT-EE) interface in the codebase.

dotnet 17.8k

Framework Internals

Writing Extensions & Compression Code

Guidance for writing and modifying Microsoft.Extensions.* and System.IO.Compression code in dotnet/runtime. Covers DI lifetime management, configuration binding, options validation, logging provider patterns, caching semantics, compression format compliance, and host lifecycle. For full code review, delegates to the @extensions-reviewer agent. Trigger words: Microsoft.Extensions, IServiceCollection, IConfiguration, ILogger, IHost, IMemoryCache, IOptions, ZipArchive, HttpClientFactory, IFileProvider, IChangeToken.

dotnet 17.8k

Framework Internals

Frontend Forge FI Operations

Operate FrontendIntegration resources and the frontend-forge extension. Use when Codex needs to create a FrontendIntegration from FrontendIntegration YAML, update or patch FI lifecycle state, inspect or troubleshoot FI build output, or create, enable, disable, uninstall, and inspect the frontend-forge extension through its InstallPlan and extension resources.

kubesphere 16.9k

Framework Internals

Golang Expert

Go programming expert for goroutines, channels, interfaces, modules, and concurrency patterns.

RightNow-AI 16.8k

Framework Internals

iii Rust SDK

Rust SDK for the iii engine. Use when building high-performance workers, registering functions, or invoking triggers in Rust.

iii-hq 15.3k

Framework Internals

DSPy Ruby

Build type-safe LLM applications with DSPy.rb — Ruby's programmatic prompt framework with signatures, modules, agents, and optimization. Use when implementing predictable AI features, creating LLM signatures and modules, configuring language model providers, building agent systems with tools, optimizing prompts, or testing LLM-powered functionality in Ruby applications.

EveryInc 14.7k

Framework Internals

Tooling

Implementation details for the EF Core dotnet-ef CLI and tooling. Use when changing dotnet-ef commands, the ef wrapper, EFCore.Tools (PMC), or EFCore.Tasks MSBuild integration.

dotnet 14.6k

Framework Internals

Marko Best Practices

Apply Marko syntax and best practices when editing `.marko` files and building Marko components.

marko-js 14.4k

Framework Internals

Kernel Triton Writing

ONLY for OpenAI Triton (@triton.jit) kernel development. NEVER use for CUDA C++ kernels, TileIR, or profiling tools (ncu, nsys). The user's request must involve Triton explicitly. Covers Triton-specific patterns: fused elementwise, reductions (softmax, LayerNorm, RMSNorm), tiled GEMM with triton.autotune, and flash attention. Workflow: design, write, verify (with fast-path for explicit requests).

NVIDIA 13.4k

Framework Internals

Perf Optimization

Performance optimization coordination playbook. Contains specialist routing table, TileIR two-step pipeline, kernel generation specialist selection, prioritization criteria, and safe modification workflow. Use when the user asks to apply optimizations, write kernels, or improve performance. Covers both user-specified optimization and autopilot-driven iterative optimization.

NVIDIA 13.4k

Framework Internals

Workload Profiling

Code instrumentation for timing workloads. Two scenarios: (1) Training loop — inject manual timing to report per-iteration latency, throughput (samples/sec), and data load time. (2) Standalone kernel/op — write CUDA event timing code with warmup, per-iteration statistics, and anti-pattern avoidance. Also covers NVTX annotation for labeling profiler timelines. NOT for: running or analyzing profiler tools (nsys, ncu, Nsight Systems, Nsight Compute), writing kernels (Triton, CuTe, CUDA), applying optimizations (CUDA Graphs, gradient checkpointing, fusion), or interpreting roofline/SOL% metrics. Triggers: "measure throughput", "benchmark this function", "time my training loop", "samples per second", "NVTX annotate", "instrument my dataloader", "data load time", "kernel timing", "how do I time".

NVIDIA 13.4k

Framework Internals

CuTe DSL

Write and implement GPU kernels using NVIDIA CuTe DSL (CUTLASS 4.x Python API) — NOT for Triton, CUDA C++, or conceptual explanations. Trigger only when the user wants to write or implement a kernel, not when asking questions about CuTe DSL concepts or layouts. CuTe DSL uses cute.jit/cute.kernel decorators and cutlass.cute imports. Covers element-wise kernels, GEMM patterns, reductions, memory hierarchy (global/shared/register/TMA), MMA tensor core operations, software pipelining, and framework integration.

NVIDIA 13.4k

Framework Internals

Triton TileIR Optimization

Optimize existing Triton kernels for the NVIDIA TileIR backend on Blackwell GPUs (sm_100+). Adds TileIR-specific autotune configs: occupancy, num_ctas, TMA descriptors. Covers kernel classification (dot-related, norm-like, elementwise, reduction), type-specific transformations, and PTX-vs-TileIR benchmarking. Triggered by: "optimize for TileIR", "add TileIR configs", "Blackwell optimization", "TMA descriptors", "2CTA mode", "occupancy tuning". Kernels use standard `import triton`; TileIR activates via ENABLE_TILE=1 when nvtriton is installed.

NVIDIA 13.4k

Framework Internals

CUDA Graphs for PyTorch

Apply CUDA Graphs to PyTorch workloads — API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, manual torch.cuda.graph), code compatibility, capture workflows, dynamic pattern handling, and troubleshooting. Triggers: CUDA graph, torch.cuda.graph, make_graphed_callables, reduce-overhead, graph capture, graph replay, kernel launch overhead, CudaGraphManager, FullCudaGraphWrapper, full-iteration graph, stream capture.

NVIDIA 13.4k

Framework Internals

Optimize

Solve constrained optimization problems using Z3. Supports minimization and maximization of objective functions over integer, real, and bitvector domains.

Z3Prover 12.2k

Framework Internals

Distributed Training

Multi-GPU and distributed training patterns with PyTorch DDP. Use when scaling training across GPUs.

aiming-lab 11.3k

Framework Internals

Model Redux Statebuild Slices And Selectors

Use this when authoring or refactoring slices with createSlice, selectors, create.asyncThunk, entity adapters, or lazy reducer injection. Covers Immer-backed mutation syntax, slice selectors, getSelectors, injectInto, withLazyLoadedSlices, and current RTK 2 slice patterns.

reduxjs 11.2k

Framework Internals

Rust Engineer

Writes, reviews, and debugs idiomatic Rust code with memory safety and zero-cost abstractions. Implements ownership patterns, manages lifetimes, designs trait hierarchies, builds async applications with tokio, and structures error handling with Result/Option. Use when building Rust applications, solving ownership or borrowing issues, designing trait-based APIs, implementing async/await concurrency, creating FFI bindings, or optimizing for performance and memory safety. Invoke for Rust, Cargo, ownership, borrowing, lifetimes, async Rust, tokio, zero-cost abstractions, memory safety, systems programming.

Jeffallan 8.3k

Framework Internals

Cpp Pro

Writes, optimizes, and debugs C++ applications using modern C++20/23 features, template metaprogramming, and high-performance systems techniques. Use when building or refactoring C++ code requiring concepts, ranges, coroutines, SIMD optimization, or careful memory management — or when addressing performance bottlenecks, concurrency issues, and build system configuration with CMake.

Jeffallan 8.3k

Framework Internals

Embedded Systems

Use when developing firmware for microcontrollers, implementing RTOS applications, or optimizing power consumption. Invoke for STM32, ESP32, FreeRTOS, bare-metal, power optimization, real-time systems, configuring peripherals, writing interrupt handlers, implementing DMA transfers, and debugging timing issues.

Jeffallan 8.3k

Framework Internals Skills | Skills Pool

Framework Internals
