Framework Internals
Port To MLX
Use when porting CUDA-first or Linux-first code to MLX/Metal on Apple Silicon. Covers porting Triton/CUDA kernels to MLX Metal (sparse ops, atomics, custom VJP, mx.fast.metal_kernel); converting PyTorch nn.Module to mlx.nn.Module; 3D Gaussian Splatting rasterization (tile-based rendering, SH evaluation, EWA splatting, GLM convention handling, hybrid autodiff backward); vision foundation models (SAM + CLIP multi-backend with an auto-detect factory pattern, mlx_clip, mlx_sam3, PyTorch MPS fallback); IsaacLab simulator backends; Stereolabs ZED stereo cameras; Intel RealSense depth processing (CUDA→MLX filters, point clouds, alignment); moving from NVIDIA Triton Inference Server to MLX (server architecture, mlx-lm integration, OpenAI-compatible API, production hardening); ragged tensor patterns; packaging splits; benchmark harnesses; and macOS install/CI paths. Apply whenever replacing CUDA, Triton, Warp, or Linux assumptions with MLX, Metal, CPU fallback, or mac-native adapters.
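The auto-detect factory pattern mentioned above (MLX first, PyTorch MPS fallback, then CPU) might be sketched as follows. This is only an illustration under stated assumptions: the names `PREFERENCE`, `module_available`, and `detect_backend` are hypothetical and do not come from any of the listed libraries, and the check only tests whether a module is installed, not whether Metal or MPS is actually usable at runtime.

```python
import importlib.util
from typing import Callable, Optional, Sequence

# Hypothetical preference order: native MLX, then PyTorch (for MPS), then plain CPU.
PREFERENCE = ("mlx", "torch", "cpu")


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def detect_backend(
    requested: Optional[str] = None,
    probe: Callable[[str], bool] = module_available,
    preference: Sequence[str] = PREFERENCE,
) -> str:
    """Pick a backend name: honor an explicit request, else the first available one."""
    if requested is not None:
        if requested not in preference:
            raise ValueError(f"unknown backend: {requested!r}")
        # "cpu" is the assumed always-available fallback, so it is never probed.
        if requested != "cpu" and not probe(requested):
            raise RuntimeError(f"backend {requested!r} is not installed")
        return requested
    for name in preference:
        if name == "cpu" or probe(name):
            return name
    return "cpu"


if __name__ == "__main__":
    print("selected backend:", detect_backend())
```

Keeping the probe injectable lets the selection logic be exercised on machines without MLX or PyTorch installed, which is useful for CI runs on non-Apple hosts.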