Transcribes audio and video files to text using Qwen3-ASR. Supports two modes — local MLX inference on macOS Apple Silicon (no API key, 15-27x realtime) and remote API via vLLM/OpenAI-compatible endpoints. Auto-detects platform and recommends the best path. Triggers when the user wants to transcribe recordings, convert audio/video to text, do speech-to-text, or mentions ASR, Qwen ASR, 转录, 语音转文字, 录音转文字. Also triggers for meeting recordings, lectures, interviews, podcasts, screen recordings, or any audio/video file the user wants converted to text.
Transcribe audio/video files to text using Qwen3-ASR. Two inference paths:
| Mode | When | Speed | Cost |
|---|---|---|---|
| Local MLX | macOS Apple Silicon | 15-27x realtime | Free |
| Remote API | Any platform, or when local unavailable | Depends on GPU | API/self-hosted |
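To get a feel for what the local speed range means in practice, the arithmetic can be sketched as below. The helper name `estimated_seconds` is hypothetical, and the estimate assumes the 15-27x figure applies uniformly to the whole file and ignores model load time.

```python
def estimated_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock transcription time for a given realtime factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 60 * 60
slow = estimated_seconds(one_hour, 15)  # lower end of the quoted range
fast = estimated_seconds(one_hour, 27)  # upper end of the quoted range
print(f"1 h of audio: about {fast / 60:.1f} to {slow / 60:.1f} minutes")
```

Under these assumptions, a one-hour recording would take roughly 2 to 4 minutes on the local MLX path.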
Configuration persists in ${CLAUDE_PLUGIN_DATA}/config.json.
cat "${CLAUDE_PLUGIN_DATA}/config.json" 2>/dev/null
If config exists, read values and proceed to Step 1.
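The existence check above can also be sketched in Python. This is a minimal illustration, not the skill's own implementation: the function name `load_config` is hypothetical, no particular config keys are assumed, and treating a malformed file the same as a missing one is an assumption made for this sketch.

```python
import json
import os
from pathlib import Path


def load_config(data_dir=None):
    """Return the parsed config dict, or None if no usable config exists."""
    # Fall back to the CLAUDE_PLUGIN_DATA environment variable.
    base = data_dir or os.environ.get("CLAUDE_PLUGIN_DATA")
    if not base:
        return None
    path = Path(base) / "config.json"
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None  # no config yet: caller should run platform detection
    except json.JSONDecodeError:
        return None  # assumption: a corrupt file is treated as missing


config = load_config()
if config is None:
    print("No config found; run platform auto-detection")
else:
    print("Config loaded; proceed to Step 1")
```

Returning `None` rather than raising keeps the two branches (config present, config absent) explicit at the call site.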
If config does not exist, auto-detect platform first:
python3 -c "
import sys, platform
# Apple Silicon Macs report 'darwin' with an arm64 machine type
is_mac_arm = sys.platform == 'darwin' and platform.machine() in ('arm64', 'aarch64')
print(f'Platform: {sys.platform} {platform.machine()}')
print(f'Apple Silicon: {is_mac_arm}')
if is_mac_arm:
    print('RECOMMEND: local-mlx')