Skill Profile

ElevenLabs Audio Generation

Name: ElevenLabs Audio Generation
Author: jkitchin

AI-powered audio generation using the ElevenLabs API: text-to-speech with lifelike voices, sound effects generation, and music creation from text descriptions. Generate natural-sounding speech in 32 languages, create custom sound effects for games and videos, and compose royalty-free music tracks.

Use this skill when the user requests:

Voice generation or text-to-speech conversion
Audio narration for content (videos, audiobooks, podcasts)
Sound effects for games, videos, or applications
Music generation from text descriptions
Multi-speaker dialogue or conversation audio
Voice cloning or custom voice creation
Audio streaming for real-time applications

Capabilities: Text-to-speech (32 languages, 100+ voices), sound effects generation, music composition, voice cloning, real-time audio streaming

Python SDK: elevenlabs (pip install elevenlabs)

jkitchin · 25 stars · January 30, 2026

Occupation
Category: Content Creation

Skill Content

Purpose

This skill enables AI-powered audio generation through the ElevenLabs API. Create lifelike text-to-speech in 32 languages, generate custom sound effects for games and videos, and compose royalty-free music from text descriptions. It supports more than 100 professional voices, custom voice cloning, real-time streaming, and multi-speaker dialogue.

When to Use

This skill should be invoked when the user asks to:

Generate speech from text ("convert this to speech", "create audio narration...")
Create voiceovers for videos, presentations, or content
Generate audio in specific voices or languages
Create sound effects ("generate footstep sounds", "create explosion audio...")
Compose music from descriptions ("generate upbeat background music...")
Build multi-speaker dialogue or conversations
Clone voices from audio samples
Stream audio in real-time applications
Create audiobooks, podcasts, or audio content

Available Capabilities

1. Text-to-Speech (Voice Generation)


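Set the API key in the shell before running any Python code:

export ELEVENLABS_API_KEY="your-api-key-here"

Then initialize the client:

import os
from elevenlabs.client import ElevenLabs

# Read the API key from the environment instead of hard-coding it
client = ElevenLabs(api_key=os.environ.get("ELEVENLABS_API_KEY"))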
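The client setup reads the key with os.environ.get, which quietly returns None when the variable is unset, so the failure only surfaces later as an authentication error. A minimal sketch of an early check follows; the helper name get_api_key is hypothetical and not part of the elevenlabs SDK.

```python
import os


def get_api_key(env_var: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key from the environment or raise a readable error.

    Hypothetical helper for illustration; it only wraps os.environ.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before creating the client."
        )
    return key
```

The result can be passed straight to the constructor, for example ElevenLabs(api_key=get_api_key()).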

import os
from elevenlabs.client import ElevenLabs
from pathlib import Path

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate speech
audio = client.text_to_speech.convert(
    text="Your text content here",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # Default voice (George)
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128"
)

# Save to file
output_path = Path("speech_output.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)

print(f"Audio saved to: {output_path}")
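The same write loop recurs in every example on this page, so it may be worth factoring out. The sketch below assumes only that the SDK call returns an iterable of bytes chunks, as the loop above already does; save_audio is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, Union


def save_audio(chunks: Iterable[bytes], path: Union[str, Path]) -> int:
    """Write audio chunks to `path` and return the number of bytes written.

    Hypothetical helper: creates parent directories and skips empty chunks.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with out.open("wb") as f:
        for chunk in chunks:
            if chunk:
                f.write(chunk)
                written += len(chunk)
    return written
```

A return value of 0 is a cheap signal that the request produced no audio.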

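import os
from elevenlabs.client import ElevenLabs
from elevenlabs import stream

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Stream audio in real-time
# Note: convert_as_stream is the method name in older SDK releases; newer
# releases may expose the same call as client.text_to_speech.stream(...).
audio_stream = client.text_to_speech.convert_as_stream(
    text="This will be streamed as it generates",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",  # Low latency model for streaming
    output_format="mp3_44100_128"
)

# Stream to speakers (the SDK's stream helper may need a local player such as mpv)
stream(audio_stream)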
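The examples use eleven_multilingual_v2 for file output and eleven_flash_v2_5 for streaming. That choice can be made explicit with a small selector; this is a sketch of the pattern shown here, not an official recommendation, and pick_model is a hypothetical name.

```python
# Model IDs taken from the examples above.
QUALITY_MODEL = "eleven_multilingual_v2"
LOW_LATENCY_MODEL = "eleven_flash_v2_5"


def pick_model(streaming: bool) -> str:
    """Return the low-latency model for streaming, the quality model otherwise."""
    return LOW_LATENCY_MODEL if streaming else QUALITY_MODEL
```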

import os
from pathlib import Path

from elevenlabs.client import ElevenLabs
from pydub import AudioSegment  # pydub needs ffmpeg to read and write MP3

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate conversation with multiple voices
speakers = [
    {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb",  # Speaker 1
        "text": "Hello, how are you today?"
    },
    {
        "voice_id": "21m00Tcm4TlvDq8ikWAM",  # Speaker 2 (Rachel)
        "text": "I'm doing great, thanks for asking!"
    }
]

# Generate each speaker's audio and combine
combined = AudioSegment.empty()

for index, speaker in enumerate(speakers):
    audio = client.text_to_speech.convert(
        text=speaker["text"],
        voice_id=speaker["voice_id"],
        model_id="eleven_multilingual_v2"
    )

    # Save temp file (indexed so every line gets its own file name)
    temp_path = Path(f"temp_{index}_{speaker['voice_id']}.mp3")
    with temp_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)

    # Add to combined audio
    segment = AudioSegment.from_mp3(str(temp_path))
    combined += segment
    temp_path.unlink()  # Clean up

# Export final dialogue
combined.export("dialogue.mp3", format="mp3")
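Writing the speakers list by hand gets tedious for longer conversations. A minimal sketch of building it from a plain "Name: text" script follows; the script format, the parse_script helper, and the name-to-voice mapping are assumptions for illustration, with the two voice IDs taken from the examples above.

```python
from typing import Dict, List

# Voice IDs from the examples above; the lowercase keys are arbitrary labels.
VOICE_IDS = {
    "george": "JBFqnCBsd6RMkjVDRZzb",
    "rachel": "21m00Tcm4TlvDq8ikWAM",
}


def parse_script(script: str, voice_ids: Dict[str, str] = VOICE_IDS) -> List[Dict[str, str]]:
    """Turn 'Name: text' lines into the speakers list used by the dialogue loop."""
    turns = []
    for line_no, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines separate turns and carry no audio
        name, sep, text = line.partition(":")
        if not sep or not text.strip():
            raise ValueError(f"line {line_no}: expected 'Name: text'")
        key = name.strip().lower()
        if key not in voice_ids:
            raise ValueError(f"line {line_no}: unknown speaker {name.strip()!r}")
        turns.append({"voice_id": voice_ids[key], "text": text.strip()})
    return turns
```

Only the first colon splits name from text, so colons inside the spoken line are preserved.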

# Get all available voices
voices = client.voices.get_all()

print("Available voices:")
for voice in voices.voices:
    print(f"- {voice.name} (ID: {voice.voice_id})")
    print(f"  Labels: {voice.labels}")
    print(f"  Description: {voice.description}")
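Once the voice list is loaded, a common next step is narrowing it by label. The sketch below assumes each voice exposes labels as a plain mapping of strings, which should be checked against the installed SDK version; find_voices is a hypothetical helper, and the sample objects are stand-ins, not real voices.

```python
from types import SimpleNamespace


def find_voices(voices, **wanted):
    """Return voices whose labels match every requested key/value (case-insensitive)."""
    matches = []
    for voice in voices:
        labels = voice.labels or {}  # tolerate voices without labels
        if all(
            str(labels.get(key, "")).lower() == str(value).lower()
            for key, value in wanted.items()
        ):
            matches.append(voice)
    return matches


# Stand-in objects shaped like the entries printed above (illustrative data only).
sample_voices = [
    SimpleNamespace(name="Voice A", voice_id="id-a", labels={"accent": "British", "use_case": "narration"}),
    SimpleNamespace(name="Voice B", voice_id="id-b", labels={"accent": "American", "use_case": "narration"}),
    SimpleNamespace(name="Voice C", voice_id="id-c", labels=None),
]

british = find_voices(sample_voices, accent="british")
```

With a live client, the same call would take voices.voices from the listing above as its first argument.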

import os
from elevenlabs.client import ElevenLabs
from pathlib import Path

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate sound effect
audio = client.text_to_sound_effects.convert(
    text="footsteps on wooden floor, slow paced walking",
    duration_seconds=5.0,
    prompt_influence=0.5  # How closely to follow prompt (0.0-1.0)
)

# Save to file
output_path = Path("footsteps.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)

print(f"Sound effect saved to: {output_path}")

# Generate ambient audio intended for looping
audio = client.text_to_sound_effects.convert(
    text="gentle rain falling on leaves, ambient nature sound",
    duration_seconds=10.0,
    prompt_influence=0.5
    # Note: this call does not request a seamless loop by itself; a loop
    # parameter may be available in newer API versions
)

output_path = Path("rain_loop.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)

# Generate various sound effects for a game
sound_effects = [
    {
        "name": "explosion",
        "description": "large explosion, debris falling, action movie style",
        "duration": 3.0
    },
    {
        "name": "door_open",
        "description": "creaky wooden door slowly opening, horror atmosphere",
        "duration": 2.0
    },
    {
        "name": "ui_click",
        "description": "soft button click, UI feedback sound, pleasant tone",
        "duration": 0.5
    }
]

for sfx in sound_effects:
    audio = client.text_to_sound_effects.convert(
        text=sfx["description"],
        duration_seconds=sfx["duration"]
    )

    output_path = Path(f"{sfx['name']}.mp3")
    with output_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)

    print(f"Generated: {output_path}")
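In a batch like this, one malformed entry can waste API calls or produce an awkward file name. A minimal sketch of validating each entry before the request is sent follows; the helper names are hypothetical, and the default duration bounds are assumptions that should be checked against the current API limits.

```python
import re


def safe_filename(name: str, suffix: str = ".mp3") -> str:
    """Reduce a display name to letters, digits, '_' and '-' plus a suffix."""
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip()).strip("_")
    if not stem:
        raise ValueError("name has no usable characters")
    return stem + suffix


def validate_sfx(spec: dict, min_seconds: float = 0.5, max_seconds: float = 22.0) -> dict:
    """Check one sound-effect entry and return request-ready values.

    The bounds are assumed defaults, not values confirmed by the source.
    """
    description = str(spec.get("description", "")).strip()
    if not description:
        raise ValueError("description must not be empty")
    duration = float(spec["duration"])
    if not min_seconds <= duration <= max_seconds:
        raise ValueError(f"duration {duration} outside {min_seconds}-{max_seconds} s")
    return {
        "text": description,
        "duration_seconds": duration,
        "filename": safe_filename(spec["name"]),
    }
```

The returned text and duration_seconds values map directly onto the convert call in the loop above.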

import os
from elevenlabs.client import ElevenLabs
from pathlib import Path

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate music from prompt
prompt = """Upbeat indie pop song with acoustic guitar, light drums, and cheerful
melody. Modern and energetic feel, perfect for background music in a lifestyle video.
Instrumental only, no vocals."""
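The prompt still has to be sent to the music endpoint and the result written to disk. A hedged sketch follows, assuming the installed SDK exposes client.music.compose(prompt=..., music_length_ms=...) and returns an iterable of bytes chunks; both points should be verified against the SDK version in use. The compose_to_file helper and its 30-second default are hypothetical.

```python
from pathlib import Path


def compose_to_file(client, prompt: str, path, length_ms: int = 30_000) -> Path:
    """Request a music track and write it to `path`.

    Assumption: client.music.compose exists with these keyword arguments
    and yields bytes chunks; adjust the call if the SDK differs.
    """
    audio = client.music.compose(prompt=prompt, music_length_ms=length_ms)
    out = Path(path)
    with out.open("wb") as f:
        for chunk in audio:
            f.write(chunk)
    return out
```

With the client and prompt defined above, a call would look like compose_to_file(client, prompt, "background_music.mp3").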

ElevenLabs Audio Generation

Purpose

When to Use

Available Capabilities

1. Text-to-Speech (Voice Generation)

2. Sound Effects Generation

3. Music Generation

Instructions

Step 1: Understand the Request

Step 2: Select Appropriate Model/Capability

Step 3: Set Up API Authentication

Step 4: Implement Based on Task Type

Text-to-Speech Implementation

Sound Effects Implementation

Music Generation Implementation

Related Skills

Openai Whisper

Clawhub

Sherpa Onnx Tts

Openai Whisper Api

Model Usage

Sag