**GEMINI LIVE API SKILL** — Real-time Gemini Live API implementation patterns for this Next.js 16 + React 19 + TypeScript project. USE FOR: voice/video streaming architecture; Live API WebSocket sessions; setup/realtimeInput/clientContent/toolResponse message design; interruption handling; VAD tuning; session resumption; context window compression; GoAway handling; ephemeral token security for browser-direct connections; model migration caveats for gemini-3.1-flash-live-preview. DO NOT USE FOR: Firestore CRUD/domain modeling (use data-layer skill); Mantine component styling (use frontend skill); generic non-realtime chat flows (use ai-sdk skill).
Last reviewed: 2026-03-30 (Gemini Live docs/API reference/best practices + @google/genai SDK migration)
**Quick reference:**
- `@google/genai` SDK — always use `ai.live.connect()` from the official SDK instead of raw WebSocket connections.
- Put `systemInstruction`, `speechConfig`, etc. in `LiveConnectConfig`; wait for `setupComplete` in the `onmessage` callback.
- `session.sendRealtimeInput()` for ongoing conversation (audio/video/text). For `gemini-3.1-flash-live-preview`, `sendClientContent` is mainly for initial seeded history.
- `LiveServerMessage.serverContent.modelTurn.parts` can contain multiple parts (audio + transcript). Iterate all parts.
- `serverContent.interrupted === true` → immediately clear the output audio queue.
- Handle `goAway` and `sessionResumptionUpdate`; reconnect with the resumption handle.
- `contextWindowCompression` with `triggerTokens` and `slidingWindow.targetTokens`.
- Audio input: raw PCM 16 kHz (`audio/pcm;rate=16000`); output: raw PCM 24 kHz.
- `session.sendToolResponse()` — execute locally, then send typed `functionResponses` back.
- Put tool invocation conditions in the system instruction, e.g. "Call `get_client_profile` only after user has confirmed their account ID".
- Set `language_code` in the API request to match the expected language.
- Reinforce language in the system instruction: "RESPOND IN {LANGUAGE}. YOU MUST RESPOND UNMISTAKABLY IN {LANGUAGE}."
- `interrupted: true` in a server event → immediately clear the output audio queue.
- Store each `SessionResumptionUpdate` handle and reconnect with it.
- Use `goAway.timeLeft` to gracefully close or reconnect.

Use this skill when a task asks for any of the following:
- Migrating from `gemini-2.5-flash-native-audio-preview-12-2025` to `gemini-3.1-flash-live-preview`

**Pattern: ephemeral token, browser-direct connection**

1. Server: `ai.authTokens.create()`
2. Browser: `new GoogleGenAI({ apiKey: token.name, apiVersion: "v1alpha" })`
3. `ai.live.connect({ model, callbacks, config })` — the SDK auto-detects the ephemeral token and uses the constrained endpoint
4. `session.sendRealtimeInput()` directly

Security note: never ship long-lived API keys to the browser. The SDK handles the constrained WebSocket endpoint when `apiKey` starts with `auth_tokens/`.
Clear, well-structured system instructions are the foundation of high-quality Live API interactions. To get the best performance, follow this order:
1. Agent persona (50–100 words)
2. Conversational rules (map the workflow)
3. Tool invocation rules (if using tools) — e.g. "Call `get_user_info` with these details."
4. Guardrails (what not to do)
PERSONA:
You are Laura, a career coach from Brooklyn, NY. You specialize in data-driven career advice using statistics and research. You speak only in English, regardless of user language.
CONVERSATIONAL RULES:
1. Greet the user warmly and introduce yourself.
2. Intake: Ask for full name, date of birth, and state. Call `create_profile`.
3. Main discussion: Listen to their career concern without repeating it back. Provide data-driven insights.
4. Action items: If they mention desired actions, call `add_actions_to_profile`.
5. Next steps: Call `get_next_appointment` to check for existing bookings, or `get_available_appointments` if none exist.
6. Scheduling: After user picks a time, call `schedule_appointment`.
GUARDRAILS:
- Do not be a therapist; focus on career/data insights only.
- If the user is self-critical, never reinforce negativity.
- Do not offer platitudes; provide specific, research-backed guidance.
- Keep responses short and progressive; don't recap what they said.
For workflows with many steps, avoid packing everything into a single system instruction. Instead:
- Send updated step instructions via `clientContent` when transitioning major steps

**Key insight:** the Live API expects user input before it starts responding. To have the agent initiate the conversation:
- Avoid `clientContent` synthetic prompts: the model should greet naturally when the conversation starts, not via artificial user input

**SDK connection pattern (`@google/genai`)**

Use the SDK's `live.connect()` (on a `GoogleGenAI` instance) instead of a raw WebSocket. The SDK handles protocol negotiation, setup messages, and typed message parsing automatically.
import {
GoogleGenAI,
Modality,
type Session,
type LiveServerMessage,
} from "@google/genai";
// For ephemeral tokens (browser-direct), apiKey is the token.name (starts with "auth_tokens/")
const ai = new GoogleGenAI({ apiKey: ephemeralToken, apiVersion: "v1alpha" });
const session: Session = await ai.live.connect({
model: "models/gemini-3.1-flash-live-preview",
callbacks: {
onopen: () => {
/* connection established */
},
onmessage: (message: LiveServerMessage) => {
/* handle typed message */
},
onerror: (event: ErrorEvent) => {
/* handle error */
},
onclose: (event: CloseEvent) => {
/* handle close, reconnect if needed */
},
},
config: {
responseModalities: [Modality.AUDIO],
temperature: 0.6,
speechConfig: {
voiceConfig: { prebuiltVoiceConfig: { voiceName: "Peri" } },
},
systemInstruction: { parts: [{ text: "..." }] },
contextWindowCompression: {
triggerTokens: 104857,
slidingWindow: { targetTokens: 52428 },
},
realtimeInputConfig: {
automaticActivityDetection: { disabled: false },
},
inputAudioTranscription: {},
outputAudioTranscription: {},
sessionResumption: {}, // or { handle: previousHandle }
},
});
// Send audio chunks
session.sendRealtimeInput({ audio: { data: base64Pcm, mimeType: "audio/pcm;rate=16000" } });
// Send text input
session.sendRealtimeInput({ text: "Hello" });
// Signal mic off (flush VAD)
session.sendRealtimeInput({ audioStreamEnd: true });
// Send conversation history
session.sendClientContent({ turns: [...], turnComplete: true });
// Respond to tool calls
session.sendToolResponse({ functionResponses: [{ id, name, response }] });
// Close session
session.close();
**Server messages (`LiveServerMessage`)**

The SDK callback receives a typed `LiveServerMessage` with these fields:
- `setupComplete` — session ready
- `serverContent` — model output (audio, text, transcriptions)
- `toolCall` — function call request
- `toolCallCancellation` — cancel pending tool calls
- `goAway` — server disconnecting soon
- `sessionResumptionUpdate` — new resumption handle
- `usageMetadata` — token usage

**`session.sendClientContent()` (history/context)**

- For `gemini-3.1-flash-live-preview`, use primarily for initial seeded history
- Send `{ turnComplete: true }` before switching to ongoing `sendRealtimeInput`

**`session.sendRealtimeInput()` (ongoing conversation)**

- Send `{ audioStreamEnd: true }` when the mic is turned off to flush VAD
- For `gemini-3.1-flash-live-preview`, use this for ongoing text turns

**`LiveServerMessage.serverContent`**

Important fields:
- `generationComplete`
- `turnComplete`
- `interrupted`
- `modelTurn` (contains `parts[]` — iterate all)
- `inputTranscription` / `outputTranscription`

Implementation detail: do not assume one part per event; process all `modelTurn.parts`.
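A sketch of a `serverContent` handler that follows these rules — the `Sinks` interface and the narrowed `ServerContentLike` type are illustrative application-side constructs, not SDK types:

```typescript
// Narrowed structural type for illustration — the real LiveServerMessage
// serverContent from @google/genai is broader.
interface ServerContentLike {
  interrupted?: boolean;
  inputTranscription?: { text?: string };
  outputTranscription?: { text?: string };
  modelTurn?: { parts?: { text?: string; inlineData?: { data?: string } }[] };
}

// Application-side sinks: wire these to your audio queue and transcript UI.
interface Sinks {
  enqueueAudio(base64Pcm: string): void;
  clearAudioQueue(): void;
  appendTranscript(role: "user" | "model", text: string): void;
}

function handleServerContent(sc: ServerContentLike | undefined, sinks: Sinks): void {
  if (!sc) return;
  if (sc.interrupted) {
    // Barge-in: drop queued audio immediately, before anything else.
    sinks.clearAudioQueue();
    return;
  }
  // Transcriptions can arrive alongside audio parts.
  if (sc.inputTranscription?.text) sinks.appendTranscript("user", sc.inputTranscription.text);
  if (sc.outputTranscription?.text) sinks.appendTranscript("model", sc.outputTranscription.text);
  // Never assume one part per event — iterate them all.
  for (const part of sc.modelTurn?.parts ?? []) {
    if (part.inlineData?.data) sinks.enqueueAudio(part.inlineData.data); // 24 kHz output PCM
    if (part.text) sinks.appendTranscript("model", part.text);
  }
}
```

In the real `onmessage` callback, pass `message.serverContent` through this kind of handler after checking `setupComplete` and `toolCall`.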
**Audio formats**

- Input: raw PCM 16 kHz (`audio/pcm;rate=16000`)
- Output: raw PCM 24 kHz

**VAD tuning**

Tune automatic VAD via `realtimeInputConfig.automaticActivityDetection`:

- `startOfSpeechSensitivity`
- `endOfSpeechSensitivity`
- `prefixPaddingMs`
- `silenceDurationMs`

When the mic stream pauses, send `audioStreamEnd: true` to flush.
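Mic capture via the Web Audio API yields Float32 samples, which must be converted to 16-bit PCM before base64-encoding and sending as `audio/pcm;rate=16000`. A minimal converter sketch (the base64 step, e.g. `btoa` over the bytes in the browser, is omitted):

```typescript
// Convert Web Audio Float32 samples (range [-1, 1]) to 16-bit little-endian
// PCM, the sample format expected for audio/pcm;rate=16000 input.
function floatTo16BitPcm(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;      // asymmetric scaling to int16 range
  }
  return out;
}
```

Note this converts sample format only; resampling to 16 kHz (e.g. via an `AudioContext` at that rate) is a separate step.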
If automatic VAD is disabled, the client must send explicit activity signals:

- `activityStart`
- `activityEnd`

**Interruption handling**

On `serverContent.interrupted === true`:

- Immediately clear the output audio playback queue
- On `toolCallCancellation`, abandon local tool execution

This is essential for natural multi-turn conversation and barge-in UX.
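One way to make "clear immediately" concrete is a playback queue that can drop pending chunks and cut off the in-flight source on barge-in. An illustrative sketch (not an SDK API — wire `play`/`stopCurrent` to Web Audio, e.g. `AudioBufferSourceNode.start()`/`.stop()`):

```typescript
// Interruption-safe playback queue: chunks play strictly in order, and
// clear() drops everything on serverContent.interrupted.
class PlaybackQueue<T> {
  private queue: T[] = [];
  private playing = false;

  constructor(
    private play: (chunk: T, done: () => void) => void,
    private stopCurrent?: () => void, // e.g. stop the active AudioBufferSourceNode
  ) {}

  enqueue(chunk: T): void {
    this.queue.push(chunk);
    this.pump();
  }

  // Barge-in: drop pending chunks and cut off whatever is playing now.
  clear(): void {
    this.queue.length = 0;
    if (this.playing) this.stopCurrent?.();
    this.playing = false;
  }

  private pump(): void {
    if (this.playing || this.queue.length === 0) return;
    this.playing = true;
    this.play(this.queue.shift() as T, () => {
      this.playing = false;
      this.pump(); // chain into the next queued chunk
    });
  }
}
```

The injected `play` callback keeps the scheduling backend swappable, which also makes the interruption logic unit-testable without an `AudioContext`.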
Live API tool loop with SDK:
1. `onmessage` receives a `LiveServerMessage` with `toolCall.functionCalls[]`
2. Execute the calls locally
3. Reply with `session.sendToolResponse({ functionResponses: [{ id, name, response }] })`

Notes:

- Use clear, descriptive tool names (`get_user_profile`, `submit_order`)
- For `gemini-3.1-flash-live-preview`: function calling is synchronous (the conversation pauses until the tool response is received)

**Session management**

- Enable `sessionResumption` in the connect config
- Store each `sessionResumptionUpdate.newHandle`
- Reconnect with `sessionResumption.handle`
- Handle `goAway.timeLeft` proactively

Enable `contextWindowCompression` with a sliding-window config for long sessions. This prevents abrupt termination due to context growth.

**Ephemeral tokens (browser-direct security)**
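The tool loop above can be sketched as a small dispatcher that executes each `functionCalls[]` entry locally and builds the payload for `session.sendToolResponse()` — the registry and the narrowed `FunctionCallLike` type are illustrative application code, not SDK exports:

```typescript
// Narrowed shape of a call from LiveServerMessage.toolCall.functionCalls
// (illustrative; the SDK's type is broader).
interface FunctionCallLike {
  id?: string;
  name?: string;
  args?: Record<string, unknown>;
}

type ToolImpl = (args: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Application-side registry — names must match the declared tools.
const tools: Record<string, ToolImpl> = {
  get_next_appointment: async () => ({ appointment: null }), // stub implementation
};

// Execute each call locally and build the typed functionResponses payload
// for session.sendToolResponse({ functionResponses }).
async function buildToolResponses(calls: FunctionCallLike[]) {
  return Promise.all(
    calls.map(async ({ id, name, args }) => {
      const impl = name ? tools[name] : undefined;
      const response = impl
        ? await impl(args ?? {})
        : { error: `unknown tool: ${name}` };
      return { id, name, response };
    }),
  );
}
```

In the `onmessage` callback: on `message.toolCall`, send `session.sendToolResponse({ functionResponses: await buildToolResponses(message.toolCall.functionCalls ?? []) })`; if a `toolCallCancellation` arrives first, abandon the execution instead.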
Ephemeral tokens reduce risk versus embedding long-lived API keys client-side.
Use @google/genai SDK on the server:
import { GoogleGenAI, Modality } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: serverApiKey, apiVersion: "v1alpha" });
const token = await ai.authTokens.create({
config: {
expireTime: new Date(Date.now() + 30 * 60_000).toISOString(),
newSessionExpireTime: new Date(Date.now() + 60_000).toISOString(),
liveConnectConstraints: {
model: "gemini-3.1-flash-live-preview",
config: {
temperature: 0.6,
responseModalities: [Modality.AUDIO],
sessionResumption: {},
contextWindowCompression: { slidingWindow: {} },
inputAudioTranscription: {},
outputAudioTranscription: {},
realtimeInputConfig: {
automaticActivityDetection: { disabled: false },
},
},
},
},
});
// Return token.name to the browser (starts with "auth_tokens/")
// Browser receives token.name from backend
const ai = new GoogleGenAI({ apiKey: token.name, apiVersion: "v1alpha" });
const session = await ai.live.connect({ model, callbacks, config });
// SDK auto-detects "auth_tokens/" prefix → uses constrained BidiGenerateContent endpoint
- `uses: 1` is usually fine (session resumption does not count as an additional use)
- Lock the session config server-side via `liveConnectConstraints`
- The SDK auto-detects ephemeral tokens when `apiKey` starts with `auth_tokens/`

**Model notes: `gemini-3.1-flash-live-preview`**

Current high-impact behavior:
- AUDIO response modality; use output transcription for text UX
- `thinkingLevel` replaces `thinkingBudget`
- `clientContent` behavior changed for ongoing turns
- `turnCoverage` changed

**This repo's implementation**

For this repo's browser-direct architecture:
- Token route (`src/app/api/live/token/route.ts`): creates an ephemeral token via `GeminiLiveTokenService` → returns `{ accessToken, model }`
- System instruction service (`src/data/live/service/gemini-live-system-instruction.service.ts`): builds a personalized SI following the Career Coach bold-Markdown structure (**PERSONA:** → **CONVERSATIONAL RULES:** → **GENERAL GUIDELINES:** → **GUARDRAILS:**)
- Client component (`src/ui/ai/components/gemini-live-standalone.tsx`): uses the `@google/genai` SDK's `ai.live.connect()` with an ephemeral token + typed `LiveServerMessage` callbacks
- Audio input: `session.sendRealtimeInput({ audio: { data, mimeType } })`

// Token API returns the ephemeral token name
const tokenPayload = await fetch("/api/live/token", { method: "POST", ... });
const { accessToken, model } = await tokenPayload.json();
// SDK auto-detects ephemeral token → uses constrained endpoint
const ai = new GoogleGenAI({ apiKey: accessToken, apiVersion: "v1alpha" });
const session = await ai.live.connect({
model: `models/${model}`,
callbacks: { onopen, onmessage, onerror, onclose },
config: {
responseModalities: [Modality.AUDIO],
temperature: 0.6,
speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: "Peri" } } },
systemInstruction: { parts: [{ text: builtInstruction }] },
contextWindowCompression: { triggerTokens: 104857, slidingWindow: { targetTokens: 52428 } },
realtimeInputConfig: { automaticActivityDetection: { disabled: false } },
inputAudioTranscription: {},
outputAudioTranscription: {},
sessionResumption: {},
},
});
Include clear loading/connecting/error states and interruption-safe audio UX in live streaming pages.
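Part of that UX is reconnecting gracefully. A sketch of the state a client might keep for resumption — the helper names and the 2-second safety margin are illustrative, and `timeLeftMs` is assumed to be parsed from `goAway.timeLeft`:

```typescript
// Illustrative resumption bookkeeping — not an SDK API.
interface ResumptionState {
  handle?: string;      // latest handle from sessionResumptionUpdate.newHandle
  reconnectAt?: number; // epoch-ms deadline derived from goAway.timeLeft
}

function onResumptionUpdate(
  state: ResumptionState,
  newHandle?: string,
  resumable?: boolean,
): ResumptionState {
  // Only keep handles the server marked as resumable.
  return resumable && newHandle ? { ...state, handle: newHandle } : state;
}

function onGoAway(state: ResumptionState, timeLeftMs: number, now: number): ResumptionState {
  // Schedule the reconnect with a safety margin before the server closes.
  return { ...state, reconnectAt: now + Math.max(0, timeLeftMs - 2_000) };
}
```

On reconnect, pass `{ sessionResumption: { handle: state.handle } }` in the connect config, and surface a "reconnecting" state in the UI until `setupComplete` arrives again.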
**Review checklist**

- Does `onmessage` receive `setupComplete`? (SDK sends setup automatically)
- Handling `goAway` and reconnecting with session resumption?
- On `serverContent.interrupted`, is the playback queue cleared instantly?
- Iterating all `serverContent.modelTurn.parts`?
- Browser connects with `new GoogleGenAI({ apiKey: token.name, apiVersion: "v1alpha" })`?
- Token name starts with `auth_tokens/` for SDK auto-detection?
- Ongoing conversation goes through `sendRealtimeInput`?
- `contextWindowCompression` with explicit `triggerTokens` + `targetTokens`?
- `sessionResumption` enabled?
- Personalization lives in the **PERSONA:** section, NOT in a separate labeled data block?

**Full example: career coach system instruction**

This example combines persona, conversational rules, tool flow, and guardrails for a career coach agent. It demonstrates best practices for clarity and user experience:
**PERSONA:**
You are Laura, a career coach from Brooklyn, NY. You specialize in providing
data-driven advice to give your clients a fresh perspective on career questions.
Your special strength is quantitative insights rooted in statistics, research, and psychology.
You speak only in English, no matter what language a client speaks to you in.
**CONVERSATIONAL RULES:**
1. **Introduce yourself:** Warmly greet the client and briefly explain your approach.
2. **Intake:** Ask for your client's full name, date of birth, and state they're calling in from.
Call `create_client_profile` to create a new client profile. Do this only once at the start.
3. **Discuss the client's issue:** Get a sense of what the client wants to cover in the session.
DO NOT repeat what the client is saying back to them in your response. Don't ask more than a few questions here.
4. **Reframe the client's issue with real data:** NO PLATITUDES. Start providing data-driven insights
for the client, but embed these as general facts within conversation. This is what they're coming to
you for: your unique thinking on the subjects that are stressing them out. Show them a new way of
thinking about something. Let this step go on for as long as the client wants. As part of this,
if the client mentions wanting to take any actions, call `add_action_items_to_profile` so you can remind
the client later.
5. **Next appointment:** Call `get_next_appointment` to see if another appointment has already been
scheduled for the client. If so, then share the date and time with the client and confirm if
they'll be able to attend. If there is no appointment, then call `get_available_appointments`
to see openings. Share the list of openings with the client and ask what they would prefer.
Save their preference with `schedule_appointment`. If the client prefers to schedule offline,
then let them know that's perfectly fine and to use the client portal.
**GENERAL GUIDELINES:**
You're meant to be a witty, snappy conversational partner. Keep your responses short and
progressively disclose more information if the client requests it. Don't repeat what the
client says back to them. Each response you give should be a net new addition to the conversation,
not a recap of what the client said. Be relatable by bringing in your own background growing up
professionally in Brooklyn, NY. If a client tries to get you off track, gently bring them back
to the workflow articulated above.
**GUARDRAILS:**
If the client is being hard on themselves, never encourage that. Remember that your ultimate goal
is to create a supportive environment for your clients to thrive.
Each tool definition includes a clear description and invocation condition to guide the model:
[
{
"name": "create_client_profile",
"description": "Creates a new client profile with their personal details. Returns a unique client ID. \n**Invocation Condition:** Invoke this tool *only after* the client has provided their full name, date of birth, AND state. This should only be called once at the beginning of the 'Intake' step.",
"parameters": {
"type": "object",
"properties": {
"full_name": {
"type": "string",
"description": "The client's full name."
},
"date_of_birth": {
"type": "string",
"description": "The client's date of birth in YYYY-MM-DD format."
},
"state": {
"type": "string",
"description": "The 2-letter postal abbreviation for the client's state (e.g., 'NY', 'CA')."
}
},
"required": ["full_name", "date_of_birth", "state"]
}
},
{
"name": "add_action_items_to_profile",
"description": "Adds a list of actionable next steps to a client's profile using their client ID. \n**Invocation Condition:** Invoke this tool *only after* a list of actionable next steps has been discussed and agreed upon with the client during the main discussion phase. Requires the `client_id` obtained from the start of the session.",
"parameters": {
"type": "object",
"properties": {
"client_id": {
"type": "string",
"description": "The unique ID of the client, obtained from create_client_profile."
},
"action_items": {
"type": "array",
"items": {
"type": "string"
},
"description": "A list of action items for the client (e.g., ['Update resume', 'Research three companies'])."
}
},
"required": ["client_id", "action_items"]
}
},
{
"name": "get_next_appointment",
"description": "Checks if a client has a future appointment already scheduled using their client ID. Returns the appointment details or null. \n**Invocation Condition:** Invoke this tool at the *start* of the 'Next Appointment' workflow step, immediately after the main discussion phase is complete. This is used to check if an appointment *already exists*.",
"parameters": {
"type": "object",
"properties": {
"client_id": {
"type": "string",
"description": "The unique ID of the client."
}
},
"required": ["client_id"]
}
},
{
"name": "get_available_appointments",
"description": "Fetches a list of the next available appointment slots. \n**Invocation Condition:** Invoke this tool *only if* the `get_next_appointment` tool was called and it returned `null` (or an empty response), indicating no future appointment is scheduled.",
"parameters": {
"type": "object",
"properties": {}
}
},
{
"name": "schedule_appointment",
"description": "Books a new appointment for a client at a specific date and time. \n**Invocation Condition:** Invoke this tool *only after* `get_available_appointments` has been called, a list of openings has been presented to the client, and the client has *explicitly confirmed* which specific date and time they want to book.",
"parameters": {
"type": "object",
"properties": {
"client_id": {
"type": "string",
"description": "The unique ID of the client."
},
"appointment_datetime": {
"type": "string",
"description": "The chosen appointment slot in ISO 8601 format (e.g., '2025-10-30T14:30:00')."
}
},
"required": ["client_id", "appointment_datetime"]
}
}
]
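To use declarations like these with the Live API, they are passed in the connect config as `tools: [{ functionDeclarations: [...] }]`. A small wrapper sketch (`careerCoachTools` is assumed to be the JSON array above, loaded into your code):

```typescript
// Shape of the declarations above (structural subset of the SDK's
// FunctionDeclaration type, for illustration).
interface FunctionDeclarationLike {
  name: string;
  description: string;
  parameters?: Record<string, unknown>;
}

// Wrap flat declarations into the shape LiveConnectConfig expects:
// config.tools = [{ functionDeclarations: [...] }]
function toLiveTools(declarations: FunctionDeclarationLike[]) {
  return [{ functionDeclarations: declarations }];
}
```

Then connect with something like `ai.live.connect({ model, callbacks, config: { ...config, tools: toLiveTools(careerCoachTools) } })`, keeping tool names in sync with the registry used for `sendToolResponse`.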