Use this skill when a user provides an audio file (local file or URL) and wants it transcribed into Traditional Chinese text. Prefer this skill for requests like "transcribe this audio", "convert speech to text", "轉錄這段錄音", "語音轉文字", or when converting meeting recordings, voice memos, or podcast clips to text. The output can be further processed by meeting-note-formatter for structured meeting notes.
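Uses the native audio understanding of the Gemini Interactions API to transcribe audio files into a verbatim Traditional Chinese transcript. Both local files and remote URLs are supported, and speakers are labeled automatically when more than one is detected. The transcript can be passed to the meeting-note-formatter skill to produce structured meeting notes.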
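The GEMINI_API_KEY environment variable must be set.
Supported audio formats: .mp3, .wav, .ogg, .flac, .m4a, .aac, .webm, .wma.
Run the prebuilt script directly; no npm install or extra setup is needed: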
node .agents/skills/audio-transcriber/scripts/transcribe.js <audio-path-or-url>
Remote URL:
GEMINI_API_KEY=your_api_key node .agents/skills/audio-transcriber/scripts/transcribe.js "https://example.com/audio/meeting.mp3"
Local file:
GEMINI_API_KEY=your_api_key node .agents/skills/audio-transcriber/scripts/transcribe.js "./recordings/meeting.m4a"
Together with meeting-note-formatter:
# Transcribe the audio first
GEMINI_API_KEY=your_api_key node .agents/skills/audio-transcriber/scripts/transcribe.js "recording.mp3" > transcript.md
# Then use meeting-note-formatter to turn the transcript into meeting notes
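The same transcribe-then-save step can be driven from Node instead of a shell redirect. The following is a minimal sketch, not part of the skill: `transcribeToFile` and `DEFAULT_SCRIPT` are hypothetical names, and it assumes the script prints the transcript to stdout, as the `> transcript.md` redirect suggests.

```javascript
// Hypothetical helper, not shipped with the audio-transcriber skill.
const { spawnSync } = require('node:child_process');
const fs = require('node:fs');

// Script path taken from the commands above, relative to the repo root.
const DEFAULT_SCRIPT = '.agents/skills/audio-transcriber/scripts/transcribe.js';

// Runs the transcriber on `input` and writes its stdout to `outFile`.
// Assumption: the transcript is written to stdout and errors to stderr.
function transcribeToFile(input, outFile, scriptPath = DEFAULT_SCRIPT) {
  const result = spawnSync(process.execPath, [scriptPath, input], {
    encoding: 'utf8',
    env: process.env, // GEMINI_API_KEY must already be present here
  });
  if (result.error) {
    throw result.error; // the child process could not be started
  }
  if (result.status !== 0) {
    throw new Error(
      `transcribe.js exited with status ${result.status}: ${result.stderr}`
    );
  }
  fs.writeFileSync(outFile, result.stdout);
  return outFile;
}
```

Calling `transcribeToFile('recording.mp3', 'transcript.md')` from the repo root would mirror the shell command above. Arguments are passed as an array, so no shell quoting or `$(...)` substitution is involved.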
Set AUDIO_TRANSCRIBER_DRY_RUN=1 to preview how the input is parsed without calling the Gemini API:
AUDIO_TRANSCRIBER_DRY_RUN=1 node .agents/skills/audio-transcriber/scripts/transcribe.js "https://example.com/test.mp3"
Example output:
{
"source": "remote-url",
"mimeType": "audio/mpeg",
"localPath": null,
"uriPreview": "data:audio/mpeg;base64,..."
}
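To make the dry-run fields more concrete, here is a rough sketch of how an input might be classified before any API call. It is not the actual transcribe.js logic: `describeInput`, `MIME_BY_EXT`, and the `'local-file'` label are hypothetical, and only the `.mp3` to `audio/mpeg` mapping is confirmed by the output above; the other MIME types are assumed values.

```javascript
// Hypothetical sketch: not the actual transcribe.js implementation.
const path = require('node:path');

// Extension-to-MIME table for the supported formats. Only the .mp3 entry
// is confirmed by the dry-run output; the rest are assumptions.
const MIME_BY_EXT = {
  '.mp3': 'audio/mpeg',
  '.wav': 'audio/wav',
  '.ogg': 'audio/ogg',
  '.flac': 'audio/flac',
  '.m4a': 'audio/mp4',
  '.aac': 'audio/aac',
  '.webm': 'audio/webm',
  '.wma': 'audio/x-ms-wma',
};

// Classifies an http(s) URL or a plain local path (file:// is not handled here).
function describeInput(input) {
  const isRemote = /^https?:\/\//i.test(input);
  // For URLs, read the extension from the pathname so query strings are ignored.
  const name = isRemote ? new URL(input).pathname : input;
  const ext = path.extname(name).toLowerCase();
  const mimeType = MIME_BY_EXT[ext];
  if (!mimeType) {
    throw new Error(`Unsupported audio format: ${ext || '(none)'}`);
  }
  return {
    source: isRemote ? 'remote-url' : 'local-file', // 'local-file' is an assumed label
    mimeType,
    localPath: isRemote ? null : path.resolve(input),
  };
}

console.log(describeInput('https://example.com/test.mp3'));
```

Rejecting unknown extensions up front keeps a typo in the file name from turning into a less obvious API error later.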
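⚠️ The skill script path is relative to the repo root. If the cwd is not the repo root, first run git rev-parse --show-toplevel on its own to get the path, then cd to that path before running the script. Do not use the $(...) syntax.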
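Make sure GEMINI_API_KEY is set.
Run node .agents/skills/audio-transcriber/scripts/transcribe.js "<audio-path-or-url>".
Given a file:// URL, the script automatically converts the input into a Base64 data URI.
Use the meeting-note-formatter skill to process the transcription result.
If GEMINI_API_KEY is missing, the script exits with status code 1 and prints an error message.
To modify the script, edit src/transcribe.js and then rebuild: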
cd .agents/skills/audio-transcriber
bun install
bun build src/transcribe.js --outfile scripts/transcribe.js --target node --minify
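When editing src/transcribe.js, two behaviours from the notes above are worth preserving: the Base64 data URI conversion and the exit with status 1 when the API key is missing. The sketch below shows one way they could look. It is an illustration under assumptions, not the real source: `requireApiKey`, `toDataUri`, and `main` are hypothetical names, the error text is invented, and accepting plain paths alongside file:// URLs is assumed.

```javascript
// Hypothetical sketch; the real src/transcribe.js may be structured differently.
const fs = require('node:fs');
const { fileURLToPath } = require('node:url');

// Returns the API key, or throws if it is not set.
function requireApiKey(env = process.env) {
  const key = env.GEMINI_API_KEY;
  if (!key) {
    throw new Error('GEMINI_API_KEY environment variable is required');
  }
  return key;
}

// Reads a local audio file (plain path or file:// URL) into a Base64 data URI.
function toDataUri(input, mimeType) {
  const filePath = input.startsWith('file://') ? fileURLToPath(input) : input;
  const base64 = fs.readFileSync(filePath).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// Entry point: report a missing key on stderr and exit with status 1.
function main() {
  try {
    requireApiKey();
  } catch (err) {
    console.error(`Error: ${err.message}`);
    process.exit(1);
  }
}
```

Keeping the key check in a function that throws, and calling `process.exit(1)` only in `main`, lets the check be exercised in tests without terminating the process.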