Name: Audio Transcription
Author: hasna

Audio Transcription Skill

This skill provides high-quality speech-to-text transcription using multiple AI providers. It automatically handles large files through compression and chunking.
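The skill's actual chunking strategy is not described here, so the following is only a minimal sketch of how chunk boundaries could be planned. It assumes a roughly constant bitrate (so file size scales linearly with duration) and uses the 25 MB Whisper limit quoted later in this document. The function name `plan_chunks` and the `safety` margin are hypothetical and not part of the skill.

```python
import math

# 25 MB upload limit quoted for OpenAI Whisper in this document.
# Treated here as 25 * 1024 * 1024 bytes (an assumption about the unit).
WHISPER_MAX_BYTES = 25 * 1024 * 1024


def plan_chunks(file_size_bytes, duration_s,
                max_chunk_bytes=WHISPER_MAX_BYTES, safety=0.9):
    """Return (start_s, end_s) ranges whose estimated size fits the limit.

    Hypothetical helper: assumes a constant bitrate, so each chunk's size
    is proportional to its duration. `safety` leaves headroom for container
    overhead and bitrate variation.
    """
    if file_size_bytes <= 0 or duration_s <= 0:
        raise ValueError("file size and duration must be positive")
    budget = max_chunk_bytes * safety
    n_chunks = max(1, math.ceil(file_size_bytes / budget))
    chunk_len = duration_s / n_chunks
    return [
        (i * chunk_len, min((i + 1) * chunk_len, duration_s))
        for i in range(n_chunks)
    ]


if __name__ == "__main__":
    # A 100 MB, one-hour recording is split into equal time ranges.
    for start, end in plan_chunks(100 * 1024 * 1024, 3600):
        print(f"{start:8.1f}s -> {end:8.1f}s")
```

Equal-length chunks keep the arithmetic simple. A real implementation would probably prefer to cut at silences so that words are not split across chunk boundaries.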

Supported Providers

ElevenLabs Scribe

Accuracy: 96.7% for English (industry-leading)
Max file size: 3GB / 10 hours
Features: Speaker diarization (up to 32 speakers), word-level timestamps
Cost: $0.40/hour
Best for: Multi-speaker recordings, highest accuracy needs

OpenAI Whisper

Accuracy: Excellent
Max file size: 25MB (automatic chunking for larger files)
Features: Segment timestamps, language detection
Cost: $0.006/min ($0.003/min with GPT-4o Mini)
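The per-hour and per-minute prices above are easier to compare once converted to the same unit. The sketch below does that arithmetic using only the rates quoted in this document. The dictionary keys and the function name `estimate_cost` are illustrative labels, not identifiers used by the skill or by either provider.

```python
# Rates quoted in this document, converted to USD per minute of audio.
# The keys are hypothetical labels chosen for this example.
RATES_PER_MIN = {
    "elevenlabs_scribe": 0.40 / 60,  # $0.40 per hour
    "openai_whisper": 0.006,         # $0.006 per minute
    "openai_gpt4o_mini": 0.003,      # $0.003 per minute
}


def estimate_cost(duration_min):
    """Return the estimated USD cost per provider for a recording length."""
    if duration_min < 0:
        raise ValueError("duration must not be negative")
    return {
        name: round(rate * duration_min, 4)
        for name, rate in RATES_PER_MIN.items()
    }


if __name__ == "__main__":
    for name, cost in estimate_cost(90).items():
        print(f"{name:20s} ${cost:.2f}")
```

At these rates ElevenLabs Scribe works out to roughly $0.0067 per minute, so it is slightly more expensive than Whisper at $0.006 per minute.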

Output Formats

Format	Extension	Description
text	.txt	Plain text transcript
srt	.srt	SubRip subtitle format
vtt	.vtt	WebVTT subtitle format
json	.json	Full structured data with metadata
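To make the srt row of the table concrete, here is a minimal sketch that renders timestamped segments as SubRip text. The segment shape (dictionaries with `start`, `end` in seconds, and `text`) is an assumption made for illustration, since the skill's actual JSON schema is not shown here. The function names are hypothetical as well.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as SubRip cues.

    Each segment is assumed to be a dict with 'start', 'end' (seconds)
    and 'text'. Cues are numbered from 1 and separated by a blank line.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": "Hello and welcome."},
        {"start": 2.5, "end": 5.0, "text": "Let's get started."},
    ]
    print(to_srt(demo))
```

WebVTT output would be similar, but it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.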

Google Gemini

Usage

Basic Transcription

With Speaker Diarization

Export to Subtitles

View Provider Info

Output Formats

Large File Handling

Configuration

Dependencies
