**GEMINI LIVE API SKILL** — Real-time Gemini Live API implementation patterns for this Next.js 16 + React 19 + TypeScript project. USE FOR: voice/video streaming architecture; Live API WebSocket sessions; setup/realtimeInput/clientContent/toolResponse message design; interruption handling; VAD tuning; session resumption; context window compression; GoAway handling; ephemeral token security for browser-direct connections; model migration caveats for gemini-3.1-flash-live-preview. DO NOT USE FOR: Firestore CRUD/domain modeling (use data-layer skill); Mantine component styling (use frontend skill); generic non-realtime chat flows (use ai-sdk skill).
Last reviewed: 2026-03-30 (Gemini Live docs/API reference/best practices + @google/genai SDK migration)
**Quick reference:**
- `@google/genai` SDK — always use `ai.live.connect()` from the official SDK instead of raw WebSocket connections.
- Put `systemInstruction`, `speechConfig`, etc. in `LiveConnectConfig`; wait for `setupComplete` in the `onmessage` callback.
- `session.sendRealtimeInput()` for ongoing conversation (audio/video/text). For `gemini-3.1-flash-live-preview`, `sendClientContent` is mainly for initial seeded history.
- `LiveServerMessage.serverContent.modelTurn.parts` can contain multiple parts (audio + transcript). Iterate all parts.
- `serverContent.interrupted === true` → immediately clear the output audio queue.
- Handle `goAway` and `sessionResumptionUpdate`; reconnect with the resumption handle.
- `contextWindowCompression` with `triggerTokens` and `slidingWindow.targetTokens`.
- Audio input: raw PCM 16 kHz (`audio/pcm;rate=16000`); output: raw PCM 24 kHz.
- `session.sendToolResponse()` — execute locally, then send typed `functionResponses` back.
- Put tool invocation conditions in the system instruction, e.g. "Call `get_client_profile` only after user has confirmed their account ID".
- Set `language_code` in the API request to match the expected language.
- Reinforce language in the system instruction: "RESPOND IN {LANGUAGE}. YOU MUST RESPOND UNMISTAKABLY IN {LANGUAGE}."
- `interrupted: true` in a server event → immediately clear the output audio queue.
- Store each `SessionResumptionUpdate` handle and reconnect with it.
- Use `goAway.timeLeft` to gracefully close or reconnect.

Use this skill when a task asks for any of the following:
- Migrating from `gemini-2.5-flash-native-audio-preview-12-2025` to `gemini-3.1-flash-live-preview`

**Pattern: ephemeral token, browser-direct connection**

1. Server: `ai.authTokens.create()`
2. Browser: `new GoogleGenAI({ apiKey: token.name, apiVersion: "v1alpha" })`
3. `ai.live.connect({ model, callbacks, config })` — the SDK auto-detects the ephemeral token and uses the constrained endpoint
4. `session.sendRealtimeInput()` directly

Security note: never ship long-lived API keys to the browser. The SDK handles the constrained WebSocket endpoint when `apiKey` starts with `auth_tokens/`.
Clear, well-structured system instructions are the foundation of high-quality Live API interactions. To get the best performance, follow this order:
1. Agent persona (50–100 words)
2. Conversational rules (map the workflow)
3. Tool invocation rules (if using tools) — e.g. "Call `get_user_info` with these details."
4. Guardrails (what not to do)
PERSONA:
You are Laura, a career coach from Brooklyn, NY. You specialize in data-driven career advice using statistics and research. You speak only in English, regardless of user language.
CONVERSATIONAL RULES:
1. Greet the user warmly and introduce yourself.
2. Intake: Ask for full name, date of birth, and state. Call `create_profile`.
3. Main discussion: Listen to their career concern without repeating it back. Provide data-driven insights.
4. Action items: If they mention desired actions, call `add_actions_to_profile`.
5. Next steps: Call `get_next_appointment` to check for existing bookings, or `get_available_appointments` if none exist.
6. Scheduling: After user picks a time, call `schedule_appointment`.
GUARDRAILS:
- Do not be a therapist; focus on career/data insights only.
- If the user is self-critical, never reinforce negativity.
- Do not offer platitudes; provide specific, research-backed guidance.
- Keep responses short and progressive; don't recap what they said.
For workflows with many steps, avoid packing everything into a single system instruction. Instead:
- Send updated step instructions via `clientContent` when transitioning major steps

**Key insight:** the Live API expects user input before it starts responding. To have the agent initiate the conversation:
- Avoid `clientContent` synthetic prompts: the model should greet naturally when the conversation starts, not via artificial user input

**SDK connection pattern (`@google/genai`)**

Use the SDK's `live.connect()` (on a `GoogleGenAI` instance) instead of a raw WebSocket. The SDK handles protocol negotiation, setup messages, and typed message parsing automatically.
import {
GoogleGenAI,
Modality,
type Session,
type LiveServerMessage,
} from "@google/genai";
// For ephemeral tokens (browser-direct), apiKey is the token.name (starts with "auth_tokens/")
const ai = new GoogleGenAI({ apiKey: ephemeralToken, apiVersion: "v1alpha" });
const session: Session = await ai.live.connect({
model: "models/gemini-3.1-flash-live-preview",
callbacks: {
onopen: () => {
/* connection established */
},
onmessage: (message: LiveServerMessage) => {
/* handle typed message */
},
onerror: (event: ErrorEvent) => {
/* handle error */
},
onclose: (event: CloseEvent) => {
/* handle close, reconnect if needed */
},
},
config: {
responseModalities: [Modality.AUDIO],
temperature: 0.6,
speechConfig: {
voiceConfig: { prebuiltVoiceConfig: { voiceName: "Peri" } },
},
systemInstruction: { parts: [{ text: "..." }] },
contextWindowCompression: {
triggerTokens: 104857,
slidingWindow: { targetTokens: 52428 },
},
realtimeInputConfig: {
automaticActivityDetection: { disabled: false },
},
inputAudioTranscription: {},
outputAudioTranscription: {},
sessionResumption: {}, // or { handle: previousHandle }
},
});
// Send audio chunks
session.sendRealtimeInput({ audio: { data: base64Pcm, mimeType: "audio/pcm;rate=16000" } });
// Send text input
session.sendRealtimeInput({ text: "Hello" });
// Signal mic off (flush VAD)
session.sendRealtimeInput({ audioStreamEnd: true });
// Send conversation history
session.sendClientContent({ turns: [...], turnComplete: true });
// Respond to tool calls
session.sendToolResponse({ functionResponses: [{ id, name, response }] });
// Close session
session.close();
**Server messages (`LiveServerMessage`)**

The SDK callback receives a typed `LiveServerMessage` with these fields:
- `setupComplete` — session ready
- `serverContent` — model output (audio, text, transcriptions)
- `toolCall` — function call request
- `toolCallCancellation` — cancel pending tool calls
- `goAway` — server disconnecting soon
- `sessionResumptionUpdate` — new resumption handle
- `usageMetadata` — token usage

**`session.sendClientContent()` (history/context)**

- For `gemini-3.1-flash-live-preview`, use primarily for initial seeded history
- Send `{ turnComplete: true }` before switching to ongoing `sendRealtimeInput`

**`session.sendRealtimeInput()` (ongoing conversation)**

- Send `{ audioStreamEnd: true }` when the mic is turned off to flush VAD
- For `gemini-3.1-flash-live-preview`, use this for ongoing text turns

**`LiveServerMessage.serverContent`**

Important fields:
- `generationComplete`
- `turnComplete`
- `interrupted`
- `modelTurn` (contains `parts[]` — iterate all)
- `inputTranscription` / `outputTranscription`

Implementation detail: do not assume one part per event; process all `modelTurn.parts`.
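A sketch of a `serverContent` handler that follows these rules — the `Sinks` interface and the narrowed `ServerContentLike` type are illustrative application-side constructs, not SDK types:

```typescript
// Narrowed structural type for illustration — the real LiveServerMessage
// serverContent from @google/genai is broader.
interface ServerContentLike {
  interrupted?: boolean;
  inputTranscription?: { text?: string };
  outputTranscription?: { text?: string };
  modelTurn?: { parts?: { text?: string; inlineData?: { data?: string } }[] };
}

// Application-side sinks: wire these to your audio queue and transcript UI.
interface Sinks {
  enqueueAudio(base64Pcm: string): void;
  clearAudioQueue(): void;
  appendTranscript(role: "user" | "model", text: string): void;
}

function handleServerContent(sc: ServerContentLike | undefined, sinks: Sinks): void {
  if (!sc) return;
  if (sc.interrupted) {
    // Barge-in: drop queued audio immediately, before anything else.
    sinks.clearAudioQueue();
    return;
  }
  // Transcriptions can arrive alongside audio parts.
  if (sc.inputTranscription?.text) sinks.appendTranscript("user", sc.inputTranscription.text);
  if (sc.outputTranscription?.text) sinks.appendTranscript("model", sc.outputTranscription.text);
  // Never assume one part per event — iterate them all.
  for (const part of sc.modelTurn?.parts ?? []) {
    if (part.inlineData?.data) sinks.enqueueAudio(part.inlineData.data); // 24 kHz output PCM
    if (part.text) sinks.appendTranscript("model", part.text);
  }
}
```

In the real `onmessage` callback, pass `message.serverContent` through this kind of handler after checking `setupComplete` and `toolCall`.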
**Audio formats**

- Input: raw PCM 16 kHz (`audio/pcm;rate=16000`)
- Output: raw PCM 24 kHz

**VAD tuning**

Tune automatic VAD via `realtimeInputConfig.automaticActivityDetection`:

- `startOfSpeechSensitivity`
- `endOfSpeechSensitivity`
- `prefixPaddingMs`
- `silenceDurationMs`

When the mic stream pauses, send `audioStreamEnd: true` to flush.
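Mic capture via the Web Audio API yields Float32 samples, which must be converted to 16-bit PCM before base64-encoding and sending as `audio/pcm;rate=16000`. A minimal converter sketch (the base64 step, e.g. `btoa` over the bytes in the browser, is omitted):

```typescript
// Convert Web Audio Float32 samples (range [-1, 1]) to 16-bit little-endian
// PCM, the sample format expected for audio/pcm;rate=16000 input.
function floatTo16BitPcm(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;      // asymmetric scaling to int16 range
  }
  return out;
}
```

Note this converts sample format only; resampling to 16 kHz (e.g. via an `AudioContext` at that rate) is a separate step.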
If automatic VAD is disabled, the client must send explicit activity signals:

- `activityStart`
- `activityEnd`

**Interruption handling**

On `serverContent.interrupted === true`:

- Immediately clear the output audio playback queue
- On `toolCallCancellation`, abandon local tool execution

This is essential for natural multi-turn conversation and barge-in UX.
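One way to make "clear immediately" concrete is a playback queue that can drop pending chunks and cut off the in-flight source on barge-in. An illustrative sketch (not an SDK API — wire `play`/`stopCurrent` to Web Audio, e.g. `AudioBufferSourceNode.start()`/`.stop()`):

```typescript
// Interruption-safe playback queue: chunks play strictly in order, and
// clear() drops everything on serverContent.interrupted.
class PlaybackQueue<T> {
  private queue: T[] = [];
  private playing = false;

  constructor(
    private play: (chunk: T, done: () => void) => void,
    private stopCurrent?: () => void, // e.g. stop the active AudioBufferSourceNode
  ) {}

  enqueue(chunk: T): void {
    this.queue.push(chunk);
    this.pump();
  }

  // Barge-in: drop pending chunks and cut off whatever is playing now.
  clear(): void {
    this.queue.length = 0;
    if (this.playing) this.stopCurrent?.();
    this.playing = false;
  }

  private pump(): void {
    if (this.playing || this.queue.length === 0) return;
    this.playing = true;
    this.play(this.queue.shift() as T, () => {
      this.playing = false;
      this.pump(); // chain into the next queued chunk
    });
  }
}
```

The injected `play` callback keeps the scheduling backend swappable, which also makes the interruption logic unit-testable without an `AudioContext`.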
Live API tool loop with SDK:
1. `onmessage` receives a `LiveServerMessage` with `toolCall.functionCalls[]`
2. Execute the calls locally
3. Reply with `session.sendToolResponse({ functionResponses: [{ id, name, response }] })`

Notes:

- Use clear, descriptive tool names (`get_user_profile`, `submit_order`)
- For `gemini-3.1-flash-live-preview`: function calling is synchronous (the conversation pauses until the tool response is received)

**Session management**

- Enable `sessionResumption` in the connect config
- Store each `sessionResumptionUpdate.newHandle`
- Reconnect with `sessionResumption.handle`
- Handle `goAway.timeLeft` proactively

Enable `contextWindowCompression` with a sliding-window config for long sessions. This prevents abrupt termination due to context growth.

**Ephemeral tokens (browser-direct security)**
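The tool loop above can be sketched as a small dispatcher that executes each `functionCalls[]` entry locally and builds the payload for `session.sendToolResponse()` — the registry and the narrowed `FunctionCallLike` type are illustrative application code, not SDK exports:

```typescript
// Narrowed shape of a call from LiveServerMessage.toolCall.functionCalls
// (illustrative; the SDK's type is broader).
interface FunctionCallLike {
  id?: string;
  name?: string;
  args?: Record<string, unknown>;
}

type ToolImpl = (args: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Application-side registry — names must match the declared tools.
const tools: Record<string, ToolImpl> = {
  get_next_appointment: async () => ({ appointment: null }), // stub implementation
};

// Execute each call locally and build the typed functionResponses payload
// for session.sendToolResponse({ functionResponses }).
async function buildToolResponses(calls: FunctionCallLike[]) {
  return Promise.all(
    calls.map(async ({ id, name, args }) => {
      const impl = name ? tools[name] : undefined;
      const response = impl
        ? await impl(args ?? {})
        : { error: `unknown tool: ${name}` };
      return { id, name, response };
    }),
  );
}
```

In the `onmessage` callback: on `message.toolCall`, send `session.sendToolResponse({ functionResponses: await buildToolResponses(message.toolCall.functionCalls ?? []) })`; if a `toolCallCancellation` arrives first, abandon the execution instead.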
Ephemeral tokens reduce risk versus embedding long-lived API keys client-side.
Use @google/genai SDK on the server:
import { GoogleGenAI, Modality } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: serverApiKey, apiVersion: "v1alpha" });
const token = await ai.authTokens.create({
config: {
expireTime: new Date(Date.now() + 30 * 60_000).toISOString(),
newSessionExpireTime: new Date(Date.now() + 60_000).toISOString(),
liveConnectConstraints: {
model: "gemini-3.1-flash-live-preview",
config: {
temperature: 0.6,
responseModalities: [Modality.AUDIO],
sessionResumption: {},
contextWindowCompression: { slidingWindow: {} },
inputAudioTranscription: {},
outputAudioTranscription: {},
realtimeInputConfig: {
automaticActivityDetection: { disabled: false },
},
},
},
},
});
// Return token.name to the browser (starts with "auth_tokens/")
// Browser receives token.name from backend
const ai = new GoogleGenAI({ apiKey: token.name, apiVersion: "v1alpha" });
const session = await ai.live.connect({ model, callbacks, config });
// SDK auto-detects "auth_tokens/" prefix → uses constrained BidiGenerateContent endpoint
- `uses: 1` is usually fine (session resumption does not count as an additional use)
- Lock the session config server-side via `liveConnectConstraints`
- The SDK auto-detects ephemeral tokens when `apiKey` starts with `auth_tokens/`

**Model notes: `gemini-3.1-flash-live-preview`**

Current high-impact behavior:
- AUDIO response modality; use output transcription for text UX
- `thinkingLevel` replaces `thinkingBudget`
- `clientContent` behavior changed for ongoing turns
- `turnCoverage` changed

**This repo's implementation**

For this repo's browser-direct architecture:
- Token route (`src/app/api/live/token/route.ts`): creates an ephemeral token via `GeminiLiveTokenService` → returns `{ accessToken, model }`
- System instruction service (`src/data/live/service/gemini-live-system-instruction.service.ts`): builds a personalized SI following the Career Coach bold-Markdown structure (**PERSONA:** → **CONVERSATIONAL RULES:** → **GENERAL GUIDELINES:** → **GUARDRAILS:**)
- Client component (`src/ui/ai/components/gemini-live-standalone.tsx`): uses the `@google/genai` SDK's `ai.live.connect()` with an ephemeral token + typed `LiveServerMessage` callbacks
- Audio input: `session.sendRealtimeInput({ audio: { data, mimeType } })`

// Token API returns the ephemeral token name
const tokenPayload = await fetch("/api/live/token", { method: "POST", ... });
const { accessToken, model } = await tokenPayload.json();
// SDK auto-detects ephemeral token → uses constrained endpoint
const ai = new GoogleGenAI({ apiKey: accessToken, apiVersion: "v1alpha" });
const session = await ai.live.connect({
model: `models/${model}`,
callbacks: { onopen, onmessage, onerror, onclose },
config: {
responseModalities: [Modality.AUDIO],
temperature: 0.6,
speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: "Peri" } } },
systemInstruction: { parts: [{ text: builtInstruction }] },
contextWindowCompression: { triggerTokens: 104857, slidingWindow: { targetTokens: 52428 } },
realtimeInputConfig: { automaticActivityDetection: { disabled: false } },
inputAudioTranscription: {},
outputAudioTranscription: {},
sessionResumption: {},
},
});
Include clear loading/connecting/error states and interruption-safe audio UX in live streaming pages.
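Part of that UX is reconnecting gracefully. A sketch of the state a client might keep for resumption — the helper names and the 2-second safety margin are illustrative, and `timeLeftMs` is assumed to be parsed from `goAway.timeLeft`:

```typescript
// Illustrative resumption bookkeeping — not an SDK API.
interface ResumptionState {
  handle?: string;      // latest handle from sessionResumptionUpdate.newHandle
  reconnectAt?: number; // epoch-ms deadline derived from goAway.timeLeft
}

function onResumptionUpdate(
  state: ResumptionState,
  newHandle?: string,
  resumable?: boolean,
): ResumptionState {
  // Only keep handles the server marked as resumable.
  return resumable && newHandle ? { ...state, handle: newHandle } : state;
}

function onGoAway(state: ResumptionState, timeLeftMs: number, now: number): ResumptionState {
  // Schedule the reconnect with a safety margin before the server closes.
  return { ...state, reconnectAt: now + Math.max(0, timeLeftMs - 2_000) };
}
```

On reconnect, pass `{ sessionResumption: { handle: state.handle } }` in the connect config, and surface a "reconnecting" state in the UI until `setupComplete` arrives again.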
**Review checklist**

- Does `onmessage` receive `setupComplete`? (SDK sends setup automatically)
- Handling `goAway` and reconnecting with session resumption?
- On `serverContent.interrupted`, is the playback queue cleared instantly?
- Iterating all `serverContent.modelTurn.parts`?
- Browser connects with `new GoogleGenAI({ apiKey: token.name, apiVersion: "v1alpha" })`?
- Token name starts with `auth_tokens/` for SDK auto-detection?
- Ongoing conversation goes through `sendRealtimeInput`?
- `contextWindowCompression` with explicit `triggerTokens` + `targetTokens`?
- `sessionResumption` enabled?
- Personalization lives in the **PERSONA:** section, NOT in a separate labeled data block?

**Full example: career coach system instruction**

This example combines persona, conversational rules, tool flow, and guardrails for a career coach agent. It demonstrates best practices for clarity and user experience:
**PERSONA:**
You are Laura, a career coach from Brooklyn, NY. You specialize in providing
data-driven advice to give your clients a fresh perspective on career questions.
Your special strength is quantitative insights rooted in statistics, research, and psychology.
You speak only in English, no matter what language a client speaks to you in.
**CONVERSATIONAL RULES:**
1. **Introduce yourself:** Warmly greet the client and briefly explain your approach.
2. **Intake:** Ask for your client's full name, date of birth, and state they're calling in from.
Call `create_client_profile` to create a new client profile. Do this only once at the start.
3. **Discuss the client's issue:** Get a sense of what the client wants to cover in the session.
DO NOT repeat what the client is saying back to them in your response. Don't ask more than a few questions here.
4. **Reframe the client's issue with real data:** NO PLATITUDES. Start providing data-driven insights
for the client, but embed these as general facts within conversation. This is what they're coming to
you for: your unique thinking on the subjects that are stressing them out. Show them a new way of
thinking about something. Let this step go on for as long as the client wants. As part of this,
if the client mentions wanting to take any actions, call `add_action_items_to_profile` so you can remind
the client later.
5. **Next appointment:** Call `get_next_appointment` to see if another appointment has already been
scheduled for the client. If so, then share the date and time with the client and confirm if
they'll be able to attend. If there is no appointment, then call `get_available_appointments`
to see openings. Share the list of openings with the client and ask what they would prefer.
Save their preference with `schedule_appointment`. If the client prefers to schedule offline,
then let them know that's perfectly fine and to use the client portal.
**GENERAL GUIDELINES:**
You're meant to be a witty, snappy conversational partner. Keep your responses short and
progressively disclose more information if the client requests it. Don't repeat what the
client says back to them. Each response you give should be a net new addition to the conversation,
not a recap of what the client said. Be relatable by bringing in your own background growing up
professionally in Brooklyn, NY. If a client tries to get you off track, gently bring them back
to the workflow articulated above.
**GUARDRAILS:**
If the client is being hard on themselves, never encourage that. Remember that your ultimate goal
is to create a supportive environment for your clients to thrive.
Each tool definition includes a clear description and invocation condition to guide the model:
[
{
"name": "create_client_profile",
"description": "Creates a new client profile with their personal details. Returns a unique client ID. \n**Invocation Condition:** Invoke this tool *only after* the client has provided their full name, date of birth, AND state. This should only be called once at the beginning of the 'Intake' step.",
"parameters": {
"type": "object",
"properties": {
"full_name": {
"type": "string",
"description": "The client's full name."
},
"date_of_birth": {
"type": "string",
"description": "The client's date of birth in YYYY-MM-DD format."
},
"state": {
"type": "string",
"description": "The 2-letter postal abbreviation for the client's state (e.g., 'NY', 'CA')."
}
},
"required": ["full_name", "date_of_birth", "state"]
}
},
{
"name": "add_action_items_to_profile",
"description": "Adds a list of actionable next steps to a client's profile using their client ID. \n**Invocation Condition:** Invoke this tool *only after* a list of actionable next steps has been discussed and agreed upon with the client during the main discussion phase. Requires the `client_id` obtained from the start of the session.",
"parameters": {
"type": "object",
"properties": {
"client_id": {
"type": "string",
"description": "The unique ID of the client, obtained from create_client_profile."
},
"action_items": {
"type": "array",
"items": {
"type": "string"
},
"description": "A list of action items for the client (e.g., ['Update resume', 'Research three companies'])."
}
},
"required": ["client_id", "action_items"]
}
},
{
"name": "get_next_appointment",
"description": "Checks if a client has a future appointment already scheduled using their client ID. Returns the appointment details or null. \n**Invocation Condition:** Invoke this tool at the *start* of the 'Next Appointment' workflow step, immediately after the main discussion phase is complete. This is used to check if an appointment *already exists*.",
"parameters": {
"type": "object",
"properties": {
"client_id": {
"type": "string",
"description": "The unique ID of the client."
}
},
"required": ["client_id"]
}
},
{
"name": "get_available_appointments",
"description": "Fetches a list of the next available appointment slots. \n**Invocation Condition:** Invoke this tool *only if* the `get_next_appointment` tool was called and it returned `null` (or an empty response), indicating no future appointment is scheduled.",
"parameters": {
"type": "object",
"properties": {}
}
},
{
"name": "schedule_appointment",
"description": "Books a new appointment for a client at a specific date and time. \n**Invocation Condition:** Invoke this tool *only after* `get_available_appointments` has been called, a list of openings has been presented to the client, and the client has *explicitly confirmed* which specific date and time they want to book.",
"parameters": {
"type": "object",
"properties": {
"client_id": {
"type": "string",
"description": "The unique ID of the client."
},
"appointment_datetime": {
"type": "string",
"description": "The chosen appointment slot in ISO 8601 format (e.g., '2025-10-30T14:30:00')."
}
},
"required": ["client_id", "appointment_datetime"]
}
}
]
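To use declarations like these with the Live API, they are passed in the connect config as `tools: [{ functionDeclarations: [...] }]`. A small wrapper sketch (`careerCoachTools` is assumed to be the JSON array above, loaded into your code):

```typescript
// Shape of the declarations above (structural subset of the SDK's
// FunctionDeclaration type, for illustration).
interface FunctionDeclarationLike {
  name: string;
  description: string;
  parameters?: Record<string, unknown>;
}

// Wrap flat declarations into the shape LiveConnectConfig expects:
// config.tools = [{ functionDeclarations: [...] }]
function toLiveTools(declarations: FunctionDeclarationLike[]) {
  return [{ functionDeclarations: declarations }];
}
```

Then connect with something like `ai.live.connect({ model, callbacks, config: { ...config, tools: toLiveTools(careerCoachTools) } })`, keeping tool names in sync with the registry used for `sendToolResponse`.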