When to Use

You are building a real-time voice or multimodal AI application that uses Daily or Pipecat-style transports.
You need guidance on low-latency audio, video, text, and AI service orchestration in one pipeline.
You want a capability reference before choosing services, transports, or workflow patterns for an interactive agent.

Capabilities

Pipecat enables agents to build production-ready voice and multimodal AI applications with real-time processing. Agents can orchestrate complex AI service pipelines that handle audio, video, and text simultaneously while maintaining ultra-low latency (500-800ms round-trip). The framework abstracts away the complexity of coordinating multiple AI services, network transports, and audio processing, allowing agents to focus on application logic.

Key capabilities include:

Real-time voice conversations with natural turn-taking and interruption handling
Multimodal processing combining audio, video, images, and text
Integration with 50+ AI services (LLMs, speech recognition, text-to-speech, vision models)
Function calling for external API integration and tool use

When to Use

You are building a real-time voice or multimodal AI application that uses Daily or Pipecat-style transports.
You need guidance on low-latency audio, video, text, and AI service orchestration in one pipeline.
You want a capability reference before choosing services, transports, or workflow patterns for an interactive agent.

Capabilities

Key capabilities include:

Real-time voice conversations with natural turn-taking and interruption handling
Multimodal processing combining audio, video, images, and text
Integration with 50+ AI services (LLMs, speech recognition, text-to-speech, vision models)
Function calling for external API integration and tool use

Daily

When to Use

Capabilities

Daily

When to Use

Capabilities

Skills

Pipeline Architecture & Frame Processing

Speech Recognition & Audio Input

Text-to-Speech & Audio Output

Language Model Integration

Function Calling & Tool Integration

Context Management & Conversation History

Voice Activity Detection & Turn Management

Transport & Connection Management

Multimodal Processing

Custom Frame Processors

Structured Conversations with Pipecat Flows

Metrics & Observability

Client SDKs for Frontend Integration

Deployment & Scaling

Workflows

Building a Voice Assistant

Implementing Function Calling

Building a Phone Agent with Twilio

Handling Interruptions & Turn-Taking

Managing Long Conversations

Deploying to Pipecat Cloud

Integration

Context

Limitations

My Workflow

Create Instructions

Init

Everything Claude Code Conventions

Codebase Onboarding

Ui Demo