AI-powered audio generation using the ElevenLabs API: text-to-speech with lifelike voices, sound effects generation, and music creation from text descriptions. Generate natural-sounding speech in 32 languages, create custom sound effects for games and videos, and compose royalty-free music tracks.
Use this skill when the user requests:
- Voice generation or text-to-speech conversion
- Audio narration for content (videos, audiobooks, podcasts)
- Sound effects for games, videos, or applications
- Music generation from text descriptions
- Multi-speaker dialogue or conversation audio
- Voice cloning or custom voice creation
- Audio streaming for real-time applications
Capabilities: Text-to-speech (32 languages, 100+ voices), sound effects generation, music composition, voice cloning, real-time audio streaming
Python SDK: elevenlabs (pip install elevenlabs)
This skill enables AI-powered audio generation through the ElevenLabs API. Create lifelike text-to-speech in 32 languages, generate custom sound effects for games and videos, and compose royalty-free music from text descriptions. It supports 100+ professional voices, custom voice cloning, real-time streaming, and multi-speaker dialogue.
This skill should be invoked when the user asks to:
Models:
- eleven_multilingual_v2: Highest quality, 29 languages
- eleven_flash_v2_5: Ultra-low 75 ms latency, 32 languages, 50% cheaper
- eleven_turbo_v2_5: Balanced quality and latency
Features:
Output Formats:
Model:
- eleven_text_to_sound_v2
Features:
Use Cases:
Features:
Parameters:
Requirements:
Content Policy:
Analyze the user's request to determine:
For Text-to-Speech:
- eleven_multilingual_v2
- eleven_flash_v2_5
- eleven_turbo_v2_5
For Sound Effects:
- eleven_text_to_sound_v2 model
For Music:
import os
from elevenlabs.client import ElevenLabs
# Initialize client with API key
client = ElevenLabs(api_key=os.environ.get("ELEVENLABS_API_KEY"))
The API key should be set as an environment variable:
export ELEVENLABS_API_KEY="your-api-key-here"
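Because `os.environ.get` returns `None` when the variable is unset, a missing key would likely only surface later as an authentication failure. The following is a minimal sketch of failing early instead. `require_api_key` is a hypothetical helper introduced here for illustration; it is not part of the elevenlabs SDK.

```python
import os


def require_api_key(env=None, name="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    (Hypothetical helper, not an SDK function.)
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

It could then be used as `client = ElevenLabs(api_key=require_api_key())`.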
Basic Speech Generation:
import os
from pathlib import Path

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate speech
audio = client.text_to_speech.convert(
    text="Your text content here",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # Default voice (George)
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

# Save to file (the result is consumed as an iterable of byte chunks)
output_path = Path("speech_output.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)

print(f"Audio saved to: {output_path}")
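The chunk-writing loop above recurs in every example in this document, so it may be worth factoring out. A minimal sketch follows; `save_audio` is a hypothetical helper name, and it only assumes what the example already shows, namely that the result can be iterated as byte chunks.

```python
from pathlib import Path
from typing import Iterable


def save_audio(chunks: Iterable[bytes], path) -> int:
    """Write an iterable of byte chunks to `path` and return the bytes written.

    Hypothetical helper: creates missing parent directories and skips empty chunks.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    total = 0
    with out.open("wb") as f:
        for chunk in chunks:
            if chunk:
                f.write(chunk)
                total += len(chunk)
    return total
```

With this in place, the save step reduces to `save_audio(audio, "speech_output.mp3")`.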
Streaming Speech (Real-time):
import os

from elevenlabs import stream
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Stream audio in real-time
# Note: the method name can differ between SDK releases (newer versions may
# expose this as client.text_to_speech.stream); check your installed version.
audio_stream = client.text_to_speech.convert_as_stream(
    text="This will be streamed as it generates",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",  # Low latency model for streaming
    output_format="mp3_44100_128",
)

# Stream to speakers
stream(audio_stream)
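The two examples above choose different models: `eleven_multilingual_v2` for quality and `eleven_flash_v2_5` for latency. A small sketch of making that choice explicit, based only on the trade-offs in the model list earlier in this document, is shown here. `choose_tts_model` and its flags are hypothetical names for illustration.

```python
def choose_tts_model(low_latency: bool = False, highest_quality: bool = False) -> str:
    """Pick a text-to-speech model ID from the trade-offs listed in this document.

    Hypothetical helper: the two flags are mutually exclusive; with neither set,
    the balanced model is returned.
    """
    if low_latency and highest_quality:
        raise ValueError("choose either low_latency or highest_quality, not both")
    if low_latency:
        return "eleven_flash_v2_5"       # ultra-low latency, suited to streaming
    if highest_quality:
        return "eleven_multilingual_v2"  # highest quality
    return "eleven_turbo_v2_5"           # balanced quality and latency
```

The returned string can be passed directly as `model_id`.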
Multi-Speaker Dialogue:
from pathlib import Path

from pydub import AudioSegment  # pydub needs ffmpeg to decode MP3

# Generate conversation with multiple voices
# (reuses the `client` created in the earlier examples)
speakers = [
    {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb",  # Speaker 1
        "text": "Hello, how are you today?",
    },
    {
        "voice_id": "21m00Tcm4TlvDq8ikWAM",  # Speaker 2 (Rachel)
        "text": "I'm doing great, thanks for asking!",
    },
]

# Generate each speaker's audio and combine
combined = AudioSegment.empty()
for speaker in speakers:
    audio = client.text_to_speech.convert(
        text=speaker["text"],
        voice_id=speaker["voice_id"],
        model_id="eleven_multilingual_v2",
    )

    # Save temp file
    temp_path = Path(f"temp_{speaker['voice_id']}.mp3")
    with temp_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)

    # Add to combined audio
    segment = AudioSegment.from_mp3(str(temp_path))
    combined += segment
    temp_path.unlink()  # Clean up

# Export final dialogue
combined.export("dialogue.mp3", format="mp3")
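For longer conversations, writing the `speakers` list by hand gets tedious. The sketch below builds the same list-of-dicts structure from a plain-text script. The `Name: text` line format, the `parse_script` helper, and the `voice_map` argument are all assumptions made for illustration, not part of the ElevenLabs API.

```python
def parse_script(script: str, voice_map: dict) -> list:
    """Turn 'Name: text' lines into the speakers list used in the dialogue example.

    Hypothetical helper: `voice_map` maps speaker names to voice IDs.
    Blank lines are ignored; malformed lines raise ValueError.
    """
    speakers = []
    for line_no, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so the spoken text may contain colons.
        name, sep, text = line.partition(":")
        name, text = name.strip(), text.strip()
        if not sep or not name or not text:
            raise ValueError(f"line {line_no}: expected 'Name: text'")
        if name not in voice_map:
            raise KeyError(f"line {line_no}: no voice configured for {name!r}")
        speakers.append({"voice_id": voice_map[name], "text": text})
    return speakers
```

The result can be fed into the same generation loop in place of the hand-written list.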
List Available Voices:
# Get all available voices
voices = client.voices.get_all()

print("Available voices:")
for voice in voices.voices:
    print(f"- {voice.name} (ID: {voice.voice_id})")
    print(f"  Labels: {voice.labels}")
    print(f"  Description: {voice.description}")
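Once the voice list is available, it is often useful to narrow it down by label. The sketch below assumes each voice object has a `labels` attribute that behaves like a dict; the label keys used in the usage note (such as `gender`) are assumptions about your account's data, and `filter_voices` is a hypothetical helper.

```python
def filter_voices(voices, **wanted):
    """Return the voices whose labels match every key=value pair in `wanted`.

    Hypothetical helper: assumes `voice.labels` is a dict-like mapping or None.
    """
    matches = []
    for voice in voices:
        labels = getattr(voice, "labels", None) or {}
        if all(labels.get(key) == value for key, value in wanted.items()):
            matches.append(voice)
    return matches
```

For example, `filter_voices(voices.voices, gender="female")` would keep only voices labelled that way, if that label key is present.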
Common Voice IDs:
- JBFqnCBsd6RMkjVDRZzb - George (male, English, middle-aged)
- 21m00Tcm4TlvDq8ikWAM - Rachel (female, English, young)
- AZnzlk1XvdvUeBnXmlld - Domi (female, English, young)
- EXAVITQu4vr4xnSDxMaL - Bella (female, English, young)
- ErXwobaYiN019PkySvjV - Antoni (male, English, young)
- MF3mGyEYCl7XYWbV9V6O - Elli (female, English, young)
- TxGEqnHWrfWFTfGW9XjX - Josh (male, English, young)
Basic Sound Effect Generation:
import os
from pathlib import Path

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate sound effect
audio = client.text_to_sound_effects.convert(
    text="footsteps on wooden floor, slow paced walking",
    duration_seconds=5.0,
    prompt_influence=0.5,  # How closely to follow prompt (0.0-1.0)
)

# Save to file
output_path = Path("footsteps.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)

print(f"Sound effect saved to: {output_path}")
Looping Sound Effect:
# Generate seamlessly looping audio
audio = client.text_to_sound_effects.convert(
    text="gentle rain falling on leaves, ambient nature sound",
    duration_seconds=10.0,
    prompt_influence=0.5,
    # Note: loop parameter may be available in newer API versions
)

output_path = Path("rain_loop.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)
Multiple Sound Effects:
# Generate various sound effects for a game
sound_effects = [
    {
        "name": "explosion",
        "description": "large explosion, debris falling, action movie style",
        "duration": 3.0,
    },
    {
        "name": "door_open",
        "description": "creaky wooden door slowly opening, horror atmosphere",
        "duration": 2.0,
    },
    {
        "name": "ui_click",
        "description": "soft button click, UI feedback sound, pleasant tone",
        "duration": 0.5,
    },
]

for sfx in sound_effects:
    audio = client.text_to_sound_effects.convert(
        text=sfx["description"],
        duration_seconds=sfx["duration"],
    )

    output_path = Path(f"{sfx['name']}.mp3")
    with output_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)

    print(f"Generated: {output_path}")
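When generating a batch, one malformed entry would otherwise cost an API call before failing. A minimal sketch of validating each entry up front follows. `validate_sfx_spec` is a hypothetical helper; the only range it enforces from this document is `prompt_influence` in 0.0-1.0, and it deliberately does not guess at an upper limit for duration.

```python
def validate_sfx_spec(spec: dict) -> dict:
    """Check one sound-effect entry and return keyword arguments for convert().

    Hypothetical helper: expects the 'name', 'description', and 'duration' keys
    used in the batch example, plus an optional 'prompt_influence'.
    """
    name = str(spec.get("name", "")).strip()
    description = str(spec.get("description", "")).strip()
    if not name:
        raise ValueError("sound effect needs a non-empty 'name'")
    if not description:
        raise ValueError(f"{name}: 'description' must not be empty")

    duration = float(spec.get("duration", 0))
    if duration <= 0:
        raise ValueError(f"{name}: 'duration' must be a positive number of seconds")

    kwargs = {"text": description, "duration_seconds": duration}

    influence = spec.get("prompt_influence")
    if influence is not None:
        influence = float(influence)
        if not 0.0 <= influence <= 1.0:
            raise ValueError(f"{name}: 'prompt_influence' must be between 0.0 and 1.0")
        kwargs["prompt_influence"] = influence
    return kwargs
```

Inside the loop, the call would then become `client.text_to_sound_effects.convert(**validate_sfx_spec(sfx))`.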
Basic Music Composition:
import os
from pathlib import Path

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
# Generate music from prompt
prompt = """Upbeat indie pop song with acoustic guitar, light drums, and cheerful
melody. Modern and energetic feel, perfect for background music in a lifestyle video.
Instrumental only, no vocals."""