Bidirectional streaming using the Gemini Live API


The Gemini Live API enables low-latency, bidirectional text and voice interactions with Gemini. Using the Live API, you can offer end users natural, human-like voice conversations, with the ability to interrupt the model's responses using text or voice commands. The model can process text and audio input (video coming soon!), and it can provide text and audio output.

You can prototype with prompts and the Live API in Google AI Studio or Vertex AI Studio.

The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation (Gemini Developer API | Vertex AI Gemini API).

Before you begin

Click your Gemini API provider to view provider-specific content and code on this page.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a LiveModel instance.

Models that support this capability

The models that support the Live API depend on your chosen Gemini API provider.

  • Gemini Developer API

    • gemini-live-2.5-flash (private GA*)
    • gemini-live-2.5-flash-preview
    • gemini-2.0-flash-live-001
    • gemini-2.0-flash-live-preview-04-09
  • Vertex AI Gemini API

    • gemini-live-2.5-flash (private GA*)
    • gemini-2.0-flash-live-preview-04-09 (available only in us-central1)

Note that for the 2.5 model names for the Live API, the live segment immediately follows the gemini segment.

* Contact your Google Cloud account team representative to request access.

Use the standard features of the Live API

This section describes how to use the standard features of the Live API, specifically how to stream various types of input and output:

Generate streamed text from streamed text input

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can send streamed text input and receive streamed text output. Make sure to create a LiveModel instance and set the response modality to Text.

Swift

import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: LiveGenerationConfig(
    responseModalities: [.text]
  )
)

do {
  let session = try await model.connect()

  // Provide a text prompt
  let text = "tell a short story"
  await session.sendTextRealtime(text)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? TextPart {
          outputText += part.text
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }

  // Output received from the server.
  print(outputText)
} catch {
  fatalError(error.localizedDescription)
}

Kotlin

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"
session.send(text)

var outputText = ""
session.receive().collect {
    if (it.turnComplete) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving()
    }
    outputText = outputText + it.text
}

// Output received from the server.
println(outputText)

Java

ExecutorService executor = Executors.newFixedThreadPool(1);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    new LiveGenerationConfig.Builder()
        .setResponseModalities(ResponseModality.TEXT)
        .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);

ListenableFuture<LiveSession> sessionFuture = model.connect();

class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }

    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
        System.out.println(liveContentResponse.getText());
    }

    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }

    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);

        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);

        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: {
    responseModalities: [ResponseModality.TEXT],
  },
});

const session = await model.connect();

// Provide a text prompt
const prompt = "tell a short story";
session.send(prompt);

// Collect text from model's turn
let text = "";
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        console.log(text);
      } else {
        const parts = message.modelTurn?.parts;
        if (parts) {
          text += parts.map((part) => part.text).join("");
        }
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

late LiveModelSession _session;

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.text]),
);

_session = await model.connect();

// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);

// In a separate thread, receive the response
await for (final message in _session.receive()) {
  // Process the received message
}

Unity

using Firebase;
using Firebase.AI;

async Task SendTextReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}

Generate streamed audio from streamed audio input

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can send streamed audio input and receive streamed audio output. Make sure to create a LiveModel instance and set the response modality to Audio.

See later on this page to learn how to configure and customize the voice of the responses.

Swift

import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)

do {
  let session = try await model.connect()

  // Load the audio file, or tap a microphone
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }

  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16bit pcm audio data at 24khz
          playAudio(part.data)
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}

Kotlin

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)

val session = model.connect()

// This is the recommended way.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()

Java

ExecutorService executor = Executors.newFixedThreadPool(1);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    new LiveGenerationConfig.Builder()
        .setResponseModalities(ResponseModality.AUDIO)
        .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);

ListenableFuture<LiveSession> sessionFuture = model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await model.connect();

// Start the audio conversation
const audioConversationController = await startAudioConversation(session);

// ... Later, to stop the audio conversation
// await audioConversationController.stop()

Dart

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with audio
  liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.audio]),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();

// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
  // Process the received message
}

Unity

using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
      recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
      sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock (audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}



Build more engaging and interactive experiences

This section describes how to create and manage features of the Live API that are more engaging or interactive.

Change the response voice

The Live API uses Chirp 3 to support synthesized speech responses. When using Firebase AI Logic, audio responses are available in a variety of HD voices and languages. For the full list and demos of what each voice sounds like, see Chirp 3: HD voices.

To specify a voice, set the voice name within the speechConfig object as part of the model configuration. If you don't specify a voice, the default is Puck.

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

Swift

import FirebaseAILogic

// ...

let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...

Kotlin

// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java

// ...

LiveModel model = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModalities(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

Dart

// ...

final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity

var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME"))
);

For the best results when prompting the model in a non-English language and requiring it to respond in that language, include the following as part of your system instructions:

RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE. 
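
For example, the system instructions can be supplied when you create the live model. The following Kotlin sketch assumes the liveModel() builder accepts a systemInstruction parameter like the standard generativeModel() builder does; verify the parameter against the SDK version you're using, and replace Spanish with your target language.

Kotlin

// Hedged sketch: assumes `liveModel()` accepts a `systemInstruction` parameter
// (as the standard `generativeModel()` builder does). Replace "Spanish" with
// the language you want the model to respond in.
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    },
    systemInstruction = content {
        text("RESPOND IN Spanish. YOU MUST RESPOND UNMISTAKABLY IN Spanish.")
    }
)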

Maintain context across sessions and requests

You can use a chat structure to maintain context across sessions and requests. Note that this works only for text input and text output.

This approach is best for short contexts; you can send turn-by-turn interactions to represent the exact sequence of events. For longer contexts, we recommend providing a single message summary to free up the context window for subsequent interactions.
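
As an illustration of the short-context approach, the following Kotlin sketch replays a brief turn-by-turn transcript as plain text at the start of a new session before sending the new request. The transcript content and the "User:"/"Model:" labels are purely hypothetical conventions; only session.send() comes from the SDK usage shown earlier on this page.

Kotlin

// Illustrative only: replay a short prior transcript as text so the model has
// the earlier context, then send the new request. The "User:"/"Model:" labels
// are a hypothetical convention, not an SDK feature.
val previousTurns = listOf(
    "User: My name is Sam and I'm planning a trip to Kyoto in late November.",
    "Model: Great, Sam! Late November is a beautiful time to visit Kyoto."
)

val session = model.connect()
session.send(previousTurns.joinToString("\n"))

// Continue the conversation with the earlier context available to the model.
session.send("Suggest three things to do on that trip.")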

Handle interruptions

Firebase AI Logic doesn't yet support handling interruptions. Check back soon!

Use function calling (tools)

You can define tools (like available functions) to use with the Live API just as you can with the standard content generation methods. This section describes some nuances of using the Live API with function calling. For a complete description and examples of function calling, see the function calling guide.

From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Execution pauses until the results of each function call are available, which ensures sequential processing.

Also, using the Live API with function calling is particularly powerful because the model can request follow-up or clarifying information from the user. For example, if the model doesn't have enough information to provide a parameter value for a function it wants to call, it can ask the user to provide more or clarifying information.

The client should respond with BidiGenerateContentToolResponse.
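
As a rough sketch, tools are declared and attached to the live model much like they are for standard content generation. The Kotlin below uses a hypothetical fetchWeather function and assumes the liveModel() builder accepts a tools parameter; check the function calling guide and your SDK version for the authoritative shapes, including how to return the function result to the session.

Kotlin

// Hedged sketch: a hypothetical `fetchWeather` tool attached to the live model.
// Assumes `liveModel()` accepts a `tools` parameter like `generativeModel()` does.
val fetchWeatherTool = FunctionDeclaration(
    name = "fetchWeather",
    description = "Get the current weather conditions for a city.",
    parameters = mapOf("city" to Schema.string("The city to look up."))
)

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    },
    tools = listOf(Tool.functionDeclarations(listOf(fetchWeatherTool)))
)

// When the model emits a tool call during the session, run the matching function
// and send its result back so the model can continue its turn.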



Limitations and requirements

Keep in mind the following limitations and requirements of the Live API.

Transcription

Firebase AI Logic doesn't yet support transcription. Check back soon!

Languages

Audio formats

The Live API supports the following audio formats:

  • Input audio format: Raw 16-bit PCM audio at 16 kHz, little-endian
  • Output audio format: Raw 16-bit PCM audio at 24 kHz, little-endian
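
As an illustration of these formats on Android (not part of the Firebase AI Logic SDK), the Kotlin sketch below configures an AudioRecord for 16-bit, 16 kHz mono PCM capture and an AudioTrack for 16-bit, 24 kHz mono PCM playback; recording also requires the RECORD_AUDIO permission.

Kotlin

import android.media.AudioFormat
import android.media.AudioRecord
import android.media.AudioTrack
import android.media.MediaRecorder

// Capture microphone input in the format the Live API expects:
// raw 16-bit PCM, 16 kHz, mono (requires the RECORD_AUDIO permission).
val inputBufferSize = AudioRecord.getMinBufferSize(
    16000, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT)
val recorder = AudioRecord(
    MediaRecorder.AudioSource.VOICE_COMMUNICATION,
    16000,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT,
    inputBufferSize)

// Play back the model's audio output: raw 16-bit PCM, 24 kHz, mono.
val outputBufferSize = AudioTrack.getMinBufferSize(
    24000, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT)
val player = AudioTrack.Builder()
    .setAudioFormat(
        AudioFormat.Builder()
            .setSampleRate(24000)
            .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
            .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
            .build())
    .setBufferSizeInBytes(outputBufferSize)
    .build()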

Rate limits

The Live API has rate limits on both the number of concurrent sessions per Firebase project and the number of tokens per minute (TPM).

  • Gemini Developer API

    • Limits vary based on the project's Gemini Developer API usage tier (see its rate limits documentation)
  • Vertex AI Gemini API

    • 5,000 concurrent sessions per Firebase project
    • 4 million tokens per minute

Session length

The default length of a session is 10 minutes. When the session length exceeds the limit, the connection is terminated.

The model is also limited by the context size. Sending large chunks of input may result in earlier session termination.

Voice activity detection (VAD)

The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.

Token counting

You can't use the CountTokens API with the Live API.

