The Gemini Live API enables low-latency, bidirectional text and voice interactions with Gemini. Using the Live API, you can offer end users natural, human-like voice conversations, including the ability to interrupt the model's responses with text or voice commands. The model can process text and audio input (video coming soon!), and it can provide text and audio output.
You can prototype with prompts and the Live API in Google AI Studio or Vertex AI Studio.
The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation (Gemini Developer API | Vertex AI Gemini API).
Before you begin
| Click your Gemini API provider to view provider-specific content and code on this page. |
If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a LiveModel instance.
Models that support this capability
Which models support the Live API depends on your chosen Gemini API provider.
Gemini Developer API
- gemini-live-2.5-flash (private GA*)
- gemini-live-2.5-flash-preview
- gemini-2.0-flash-live-001
- gemini-2.0-flash-live-preview-04-09
Vertex AI Gemini API
- gemini-live-2.5-flash (private GA*)
- gemini-2.0-flash-live-preview-04-09 (accessible only in us-central1)
Note that in the 2.5 model names for the Live API, the live segment comes immediately after the gemini segment.
* Contact your Google Cloud account team representative to request access.
Use the standard capabilities of the Live API
This section describes how to use the standard capabilities of the Live API, specifically how to stream various types of input and output:
Generate streamed text from streamed text input
| Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page. |
You can send streamed text input and receive streamed text output. Make sure to create a LiveModel instance and set the response modality to Text.
Swift
import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: LiveGenerationConfig(
    responseModalities: [.text]
  )
)

do {
  let session = try await model.connect()

  // Provide a text prompt
  let text = "tell a short story"

  await session.sendTextRealtime(text)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? TextPart {
          outputText += part.text
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }

  // Output received from the server.
  print(outputText)
} catch {
  fatalError(error.localizedDescription)
}
Kotlin
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"
session.send(text)

var outputText = ""

session.receive().collect {
    if (it.turnComplete) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving();
    }
    outputText = outputText + it.text
}

// Output received from the server.
println(outputText)
Java
ExecutorService executor = Executors.newFixedThreadPool(1);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = model.connect();

class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }

    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
        System.out.println(liveContentResponse.getText());
    }

    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }

    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);

        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);

        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);
Web
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: {
    responseModalities: [ResponseModality.TEXT],
  },
});

const session = await model.connect();

// Provide a text prompt
const prompt = "tell a short story";
session.send(prompt);

// Collect text from model's turn
let text = "";

const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        console.log(text);
      } else {
        const parts = message.modelTurn?.parts;
        if (parts) {
          text += parts.map((part) => part.text).join("");
        }
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

late LiveModelSession _session;

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.text]),
);

_session = await model.connect();

// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);

// In a separate thread, receive the response
await for (final message in _session.receive()) {
  // Process the received message
}
Unity
using Firebase;
using Firebase.AI;

async Task SendTextReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}
Generate streamed audio from streamed audio input
| Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page. |
You can send streamed audio input and receive streamed audio output. Make sure to create a LiveModel instance and set the response modality to Audio.
See later on this page to learn how to configure and customize the response voice.
Swift
import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)

do {
  let session = try await model.connect()

  // Load the audio file, or tap a microphone
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }

  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16bit pcm audio data at 24khz
          playAudio(part.data)
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}
Kotlin
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)

val session = model.connect()

// This is the recommended way.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()
Java
ExecutorService executor = Executors.newFixedThreadPool(1);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with audio
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);
Web
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await model.connect();

// Start the audio conversation
const audioConversationController = await startAudioConversation(session);

// ... Later, to stop the audio conversation
// await audioConversationController.stop()
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with audio
  liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.audio]),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();

// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
  // Process the received message
}
Unity
using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock (audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}
Build more engaging and interactive experiences
This section describes how to create and manage more engaging and interactive capabilities of the Live API.
Change the response voice
The Live API uses Chirp 3 to support synthesized speech responses. When using Firebase AI Logic, you can send audio in a variety of HD voices across languages. For a complete list of the voices and demos of what each voice sounds like, see Chirp 3: HD voices.
To specify a voice, set the voice name within the speechConfig object as part of the model configuration. If you don't specify a voice, the default is Puck.
| Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page. |
Swift
import FirebaseAILogic

// ...

let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...
Kotlin
// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...
Java
// ...

LiveModel model = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to use a specific voice for its audio response
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
                .build()
);

// ...
Web
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});
Dart
// ...

final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...
Unity
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME"))
);

For the best results when prompting the model in a non-English language and having it respond in that language, include the following in your system instructions:
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
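For example, the following Kotlin sketch wires that hint into the model's system instructions. It is illustrative only: it assumes the liveModel builder accepts a systemInstruction parameter (as the standard model builders do), and FRENCH is a placeholder for whichever language you want.
Kotlin
// Illustrative sketch, not the official sample.
// Assumption: `liveModel` accepts a `systemInstruction` parameter like the
// standard model builders; check the SDK reference for your version.
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    },
    // Replace FRENCH with the target language.
    systemInstruction = content {
        text("RESPOND IN FRENCH. YOU MUST RESPOND UNMISTAKABLY IN FRENCH.")
    }
)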
Maintain context across sessions and requests
You can use a chat structure to maintain context across sessions and requests. Note that this only works with text input and text output.
This approach is best for short contexts; you can send turn-by-turn interactions to represent the exact sequence of events. For longer contexts, we recommend providing a single message summary to free up the context window for subsequent interactions.
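As a minimal sketch of this pattern in Kotlin (reusing the text-only session calls shown earlier), you could replay a short summary of the earlier conversation as the first message of a new session; the summary text here is hypothetical.
Kotlin
// Illustrative sketch: carry context into a new session by sending a short
// summary of the previous conversation before the user's next prompt.
val session = model.connect()

// Keep the summary short so it doesn't consume the context window.
val conversationSummary = """
    Summary of the previous session:
    - The user asked for a short story about a lighthouse keeper.
    - The model told a story about a keeper named Mara and a storm.
""".trimIndent()

// Send the summary first, then continue the conversation as usual.
session.send(conversationSummary)
session.send("Continue the story, but from the storm's point of view.")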
Handle interruptions
Firebase AI Logic does not yet support handling interruptions. Check back soon!
Use function calling (tools)
You can define tools (like available functions) to use with the Live API just as you can with the standard content generation methods. This section describes some nuances of using the Live API with function calling. For a complete description and examples of function calling, see the function calling guide.
From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Execution pauses until the results of each function call are available, which ensures sequential processing.
Additionally, using the Live API with function calling is particularly powerful because the model can request follow-up or clarifying information from the user. For example, if the model doesn't have enough information to provide a parameter value to a function it wants to call, it can ask the user to provide more or clarifying information.
The client should respond with BidiGenerateContentToolResponse.
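For orientation, here is a hedged Kotlin sketch of declaring a function and attaching it to the live model. It assumes the liveModel builder accepts a tools parameter like the standard model builders, and the fetchWeather function and its schema are hypothetical; see the function calling guide for the authoritative end-to-end flow, including how to return the function result to the model.
Kotlin
// Hypothetical tool the model can call to get data your app controls.
val fetchWeatherTool = FunctionDeclaration(
    name = "fetchWeather",
    description = "Get the current weather for a city",
    parameters = mapOf(
        "city" to Schema.string(description = "The name of the city")
    )
)

// Assumption: `liveModel` accepts a `tools` parameter like the standard
// model builders; check the SDK reference for your version.
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    },
    tools = listOf(Tool.functionDeclarations(listOf(fetchWeatherTool)))
)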
Limitations and requirements
Keep in mind the following limitations and requirements of the Live API.
Transcription
Firebase AI Logic does not yet support transcription. Check back soon!
Languages
- Input languages: See the complete list of supported input languages for Gemini models
- Output languages: See Chirp 3: HD voices for the complete list of available output languages
Audio formats
The Live API supports the following audio formats:
- Input audio format: Raw 16-bit PCM audio at 16 kHz, little-endian
- Output audio format: Raw 16-bit PCM audio at 24 kHz, little-endian
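To make the input format concrete, here is a small Kotlin helper (not part of the SDK) that converts normalized floating-point microphone samples into raw 16-bit little-endian PCM bytes; it assumes the samples are already mono at 16 kHz.
Kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Convert normalized float samples (-1.0..1.0) into raw 16-bit PCM bytes,
// little-endian, matching the Live API input audio format (16 kHz, mono).
fun floatsToPcm16LittleEndian(samples: FloatArray): ByteArray {
    val buffer = ByteBuffer.allocate(samples.size * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (sample in samples) {
        // Clamp to the valid range, then scale to the signed 16-bit range.
        val clamped = sample.coerceIn(-1.0f, 1.0f)
        buffer.putShort((clamped * Short.MAX_VALUE).toInt().toShort())
    }
    return buffer.array()
}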
Rate limits
The Live API has rate limits on the number of concurrent sessions per Firebase project and on tokens per minute (TPM).
Gemini Developer API:
- Limits vary depending on your project's Gemini Developer API usage tier (see its rate limits documentation)
Vertex AI Gemini API:
- 5,000 concurrent sessions per Firebase project
- 4 million tokens per minute
Session duration
The default length of a session is 10 minutes. When the session duration exceeds the limit, the connection is terminated.
The model is also limited by the context size. Sending large chunks of input may result in earlier session termination.
Voice activity detection (VAD)
The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.
Token counting
You cannot use the CountTokens API with the Live API.