Transcribes audio files using Google Cloud Speech-to-Text. Use as a background capability in agentic workflows. Returns structured transcript data for programmatic use.
A non-interactive transcription capability for use in agentic workflows. Transcribes audio files and returns structured output without user prompts.
For interactive use, use the /vmemo command instead.
Requires the GOOGLE_APPLICATION_CREDENTIALS environment variable to be set.

| Parameter | Required | Description |
|---|---|---|
| audio_path | Yes | Absolute path to the audio file |
| format | No | Output format: text (default), json, or srt |
| language | No | BCP-47 language code (default: en-GB) |
| output_path | No | Path to save the output file |

Returns a structured result:

    TRANSCRIPTION RESULT
    ====================
    Source: <audio_path>
    Duration: <seconds>
    Status: success | error
    Transcript:
    -----------
    <transcript_text>

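
A calling workflow will usually want the fields of this result rather than the raw text. The following is a minimal sketch of a parser for the layout shown above. The names `TranscriptionResult` and `parse_result` are hypothetical and not part of the skill, and the sketch assumes the result always arrives in exactly this plain-text layout.

```python
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    """Hypothetical container for the fields of the result block."""
    source: str
    duration: str  # kept as text; the exact numeric format is not documented
    status: str
    transcript: str


def parse_result(raw: str) -> TranscriptionResult:
    """Parse the plain-text result block into a TranscriptionResult."""
    fields = {}
    transcript_lines = []
    in_transcript = False
    for line in raw.splitlines():
        if in_transcript:
            # Skip the dashed rule directly under "Transcript:".
            if not transcript_lines and set(line.strip()) == {"-"}:
                continue
            transcript_lines.append(line)
        elif line.strip() == "Transcript:":
            in_transcript = True
        elif ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
    return TranscriptionResult(
        source=fields.get("source", ""),
        duration=fields.get("duration", ""),
        status=fields.get("status", ""),
        transcript="\n".join(transcript_lines).strip(),
    )
```

Missing fields come back as empty strings, so a caller should check `status` before trusting `transcript`.
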
Run the script with the desired output format:

    python scripts/google_transcribe.py "<audio_path>" --format text
    python scripts/google_transcribe.py "<audio_path>" --format json
    python scripts/google_transcribe.py "<audio_path>" --format srt --output "<output_path>"

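
From an agentic workflow, these commands would typically be launched as a subprocess rather than typed by hand. The sketch below shows one way to do that, using only the `--format` and `--output` flags that appear in the commands above. The helper names `build_command` and `run_transcription` are hypothetical. It also assumes that the script is run from the repository root, that it writes its result to standard output, and that `--output` may be combined with any format; the source only shows it with srt.

```python
import subprocess
import sys
from typing import List, Optional

SCRIPT = "scripts/google_transcribe.py"
FORMATS = ("text", "json", "srt")  # the formats listed in the parameter table


def build_command(
    audio_path: str,
    fmt: str = "text",
    output_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for one transcription run."""
    if fmt not in FORMATS:
        raise ValueError(f"Unsupported output format {fmt!r}; expected one of {FORMATS}")
    cmd = [sys.executable, SCRIPT, audio_path, "--format", fmt]
    if output_path:
        cmd += ["--output", output_path]
    return cmd


def run_transcription(
    audio_path: str,
    fmt: str = "text",
    output_path: Optional[str] = None,
) -> subprocess.CompletedProcess:
    """Run the script without prompting and capture its output as text."""
    return subprocess.run(
        build_command(audio_path, fmt, output_path),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.
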
When using this skill from another workflow:
| Condition | Behaviour |
|---|---|
| File not found | Return error with clear message |
| Unsupported format | Return error listing supported formats |
| API quota exceeded | Return error with retry suggestion |
| Credentials missing | Return error with setup instructions |
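
Two of these conditions can be detected before any API call is made. The sketch below checks for a missing file and missing credentials and returns a message in the spirit of the table. The function name `preflight_error` and the message wording are assumptions, not the skill's actual output. The unsupported-format and quota conditions are left out because the source does not list the supported audio formats or describe the quota response.

```python
import os
from typing import Mapping, Optional


def preflight_error(
    audio_path: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return an error message if a precondition fails, otherwise None."""
    env = os.environ if env is None else env
    if not os.path.isfile(audio_path):
        return f"File not found: {audio_path}"
    if not env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return (
            "Credentials missing: set GOOGLE_APPLICATION_CREDENTIALS "
            "to the path of a service account key file."
        )
    return None
```

The `env` parameter exists only so the check can be exercised without touching the real environment.
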
Last updated: January 2025