Respond to audio messages with audio-only responses. Use when: (1) the user sends an audio message, (2) the protocol requires an audio-only response, (3) text needs to be converted to speech, (4) audio responses need useful labels.
This skill implements the audio-only response protocol: When user inputs audio, respond with audio only. It handles transcription, processing, TTS conversion, and labeled audio delivery.
Audio responses are labeled with:
```
[Topic] [Duration] [Key Point]
```
Examples:
```
[Disk Space] [15s] EBS volume solved capacity crisis
[Transcription] [8s] Whisper now working with 95% accuracy
[TTS Setup] [12s] espeak configured for basic audio responses
```

espeak:

```bash
echo "Response text" | espeak --stdout > response.wav
```
OpenAI TTS:

```python
from openai import OpenAI

client = OpenAI()
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Response text",
)
response.stream_to_file("response.mp3")  # write the generated audio to disk
```

ElevenLabs:

```python
from elevenlabs import generate, play

audio = generate(text="Response text", voice="Rachel")
play(audio)  # play through the default audio device
```
Google TTS (gTTS):

```python
from gtts import gTTS

tts = gTTS(text="Response text", lang="en")
tts.save("response.mp3")
```
- `scripts/audio_responder.py` — main script that runs the full workflow
- `scripts/tts_engine.py` — TTS conversion with engine fallback
- `scripts/label_generator.py` — generates useful labels
User Audio → Transcription → Processing → Text Response → TTS → Labeled Audio Response
```bash
python3 scripts/audio_responder.py \
  --audio /path/to/user_audio.ogg \
  --tts-engine espeak \
  --label-format "[{topic}] [{duration}s] {key_point}"
```
```bash
# Test TTS
echo "Audio response protocol is now active" | espeak --stdout > test.wav

# Test full workflow
python3 scripts/audio_responder.py --test
```
```python
def handle_telegram_audio(audio_path):
    # Helpers below are provided by the scripts above
    transcript = transcribe_audio(audio_path)          # transcribe
    response_text = generate_response(transcript)      # generate response
    audio_response = text_to_speech(response_text)     # convert to audio
    label = generate_label(transcript, response_text)  # build label
    send_audio_response(audio_response, label)         # send with label
```
Script checks and configures best available TTS:
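A minimal sketch of such a fallback check, assuming availability is inferred from installed binaries, importable packages, and API keys (the function names and the exact checks are illustrative, not the script's actual implementation):

```python
import os
import shutil

# Hypothetical fallback order, mirroring the "fallback_engines" config key.
ENGINE_ORDER = ["elevenlabs", "openai", "google", "espeak"]

def engine_available(name):
    """Best-effort availability check for a single TTS engine."""
    if name == "espeak":
        return shutil.which("espeak") is not None      # CLI binary on PATH
    if name == "openai":
        return bool(os.environ.get("OPENAI_API_KEY"))  # needs an API key
    if name == "elevenlabs":
        return bool(os.environ.get("ELEVENLABS_API_KEY"))
    if name == "google":
        try:
            import gtts  # noqa: F401 -- gTTS needs no API key, only the package
            return True
        except ImportError:
            return False
    return False

def pick_engine(order=ENGINE_ORDER):
    """Return the first usable engine, or None if nothing is available."""
    for name in order:
        if engine_available(name):
            return name
    return None
```

The same ordering can then be driven by the `fallback_engines` list from the configuration file.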
Input: "How did you fix the disk space issue?"
Output Label: [Infrastructure] [18s] Added 50GB EBS volume, now 47GB free
Input: "Can you transcribe audio now?"
Output Label: [Capabilities] [10s] Whisper transcription working with 9 files processed
Input: "What's next?"
Output Label: [Planning] [14s] Fix Moltbook cron, set up auto-transcription
Fallback label:

```
[Response] [{duration}s] Audio reply
```

Environment variables:

```bash
export ELEVENLABS_API_KEY="..."
export OPENAI_API_KEY="..."
export TTS_ENGINE="elevenlabs"  # or openai, google, espeak
```
Configuration file (`~/.config/audio-response.json`):

```json
{
  "tts_engine": "espeak",
  "label_format": "[{topic}] [{duration}s] {key_point}",
  "fallback_engines": ["openai", "google", "espeak"],
  "max_duration": 30
}
```
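A sketch of loading this configuration and falling back to defaults when the file is absent (the defaults mirror the sample above; the helper name `load_config` is hypothetical):

```python
import json
import os

# Defaults mirroring the sample configuration file above.
DEFAULTS = {
    "tts_engine": "espeak",
    "label_format": "[{topic}] [{duration}s] {key_point}",
    "fallback_engines": ["openai", "google", "espeak"],
    "max_duration": 30,
}

def load_config(path="~/.config/audio-response.json"):
    """Merge the user's JSON config over the defaults; missing file = defaults."""
    config = dict(DEFAULTS)
    expanded = os.path.expanduser(path)
    if os.path.exists(expanded):
        with open(expanded) as f:
            config.update(json.load(f))
    return config
```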
Send an audio message to test:
The first audio response will explain the TTS setup status.