Qwen-Omni server events

Server events for the Qwen-Omni-Realtime API.

error

An error message from the server.

Example

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text'].",
    "param": "session.modalities"
  }
}

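event_id (string)

Unique event identifier.

type (string)

Always error.

error (object)

Error details.

error.type (string)

Error type.

error.code (string)

Error code.

error.message (string)

Error message.

error.param (string)

The related parameter (for example, session.modalities).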
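
The following is a minimal sketch of how a client might turn an error event into a single log line. The function name describe_error is hypothetical, and the sketch assumes the event arrives as a JSON string shaped like the example above.

```python
import json


def describe_error(raw: str) -> str:
    """Format an error event as one readable line (hypothetical helper)."""
    event = json.loads(raw)
    if event.get("type") != "error":
        raise ValueError("not an error event")
    err = event.get("error", {})
    # "param" names the offending parameter, e.g. session.modalities.
    param = err.get("param")
    where = f" (param: {param})" if param else ""
    return f"{err.get('type')}/{err.get('code')}: {err.get('message')}{where}"
```

Keeping the type, code, and param together in one line makes it easier to trace a failure back to the client request that caused it.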

session.created

The first event received after connecting. It contains the default session configuration.

Example

{
  "event_id": "event_RdvlSpbBb2ssyBjYrDHjt",
  "type": "session.created",
  "session": {
    "object": "realtime.session",
    "model": "qwen3-omni-flash-realtime",
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm24",
    "input_audio_transcription": {
      "model": "gummy-realtime-v1"
    },
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 800,
      "create_response": true,
      "interrupt_response": true
    },
    "tools": [],
    "tool_choice": "auto",
    "temperature": 0.8,
    "id": "sess_Ov7GOXoNXhNjlxXtOGKQS"
  }
}

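event_id (string)

Unique event identifier.

type (string)

Always session.created.

session (object)

Session configuration.

session.object (string)

Always realtime.session.

session.model (string)

Model name.

session.modalities (array)

Output modalities.

session.voice (string)

Voice used for audio output.

session.input_audio_format (string)

Format of the user's input audio. Always pcm16. Input audio must be a PCM stream sampled at 16 kHz.

session.output_audio_format (string)

Format of the model's output audio. Output audio is a PCM stream sampled at 24 kHz. Custom output sample rates are not currently supported. Formats supported by each model:

Qwen3.5-Omni-Realtime: pcm24 only
Qwen3-Omni-Flash-Realtime: pcm24 only
Qwen-Omni-Turbo-Realtime: pcm16 only

session.input_audio_transcription (object)

Speech transcription configuration.

session.input_audio_transcription.model (string)

Transcription model. Always gummy-realtime-v1.

session.turn_detection (object)

Voice activity detection (VAD) configuration.

session.turn_detection.type (string)

Always server_vad.

session.turn_detection.threshold (float)

VAD detection threshold.

session.turn_detection.silence_duration_ms (integer)

Duration of silence, in milliseconds, before speech is judged to have ended.

integer

Silence timeout, in milliseconds. Returned only in server_vad mode when the qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime model is used.

session.turn_detection.prefix_padding_ms (integer)

Duration of audio, in milliseconds, retained from before the start of speech.

session.turn_detection.create_response (boolean)

Whether a response is created automatically once the end of speech is detected.

session.turn_detection.interrupt_response (boolean)

Whether the current response is interrupted when new speech is detected.

session.tools (array)

List of tools the model can call.

session.tool_choice (string)

Tool-calling strategy.

session.id (string)

Unique session ID.

session.temperature (float)

Sampling temperature.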
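
Because session.created is the first event on a new connection, a client can use it as a handshake check. The sketch below is one possible way to do that; the helper name expect_session_created and the shape of its return value are assumptions made for illustration.

```python
def expect_session_created(first_event: dict) -> dict:
    """Validate the first server event and keep the defaults a client usually needs."""
    if first_event.get("type") != "session.created":
        raise RuntimeError(f"expected session.created, got {first_event.get('type')!r}")
    session = first_event["session"]
    vad = session.get("turn_detection") or {}
    return {
        "session_id": session["id"],
        "model": session.get("model"),
        "voice": session.get("voice"),
        # True when the server creates a response automatically after speech ends.
        "auto_response": bool(vad.get("create_response", False)),
    }
```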

session.updated

Sent after a session.update request is processed successfully. If an error occurs, the server sends an error event.

Example

{
  "event_id": "event_X1HsXS4b4uptp6yo1LgKd",
  "type": "session.updated",
  "session": {
    "id": "sess_Aih6vAcY5Ddt6jwFx1tCa",
    "object": "realtime.session",
    "model": "qwen3-omni-flash-realtime",
    "modalities": [
      "text",
      "audio"
    ],
    "instructions": "You are a personal assistant named Xiaoyun. Please answer user questions accurately and in a friendly manner, always responding with a helpful attitude.",
    "voice": "Cherry",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm24",
    "input_audio_transcription": {
      "model": "gummy-realtime-v1"
    },
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.1,
      "prefix_padding_ms": 500,
      "silence_duration_ms": 900,
      "create_response": true,
      "interrupt_response": true
    },
    "temperature": 0.8,
    "max_response_output_token": "inf",
    "max_tokens": 16384,
    "repetition_penalty": 1.05,
    "presence_penalty": 0.0,
    "top_k": 50,
    "top_p": 1.0,
    "seed": -1
  }
}

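event_id (string)

Unique event identifier.

type (string)

Always session.updated.

session (object)

Session configuration.

session.id (string)

Unique session ID.

session.object (string)

Always realtime.session.

session.model (string)

Model name.

session.temperature (float)

Sampling temperature.

session.modalities (array)

Output modalities.

session.voice (string)

Voice used for audio output.

session.instructions (string)

System instructions for the model.

session.input_audio_format (string)

Format of the user's input audio. Always pcm16. Input audio must be a PCM stream sampled at 16 kHz.

session.output_audio_format (string)

Format of the model's output audio. Defaults to pcm24; the actual value depends on the model. Output audio is a PCM stream sampled at 24 kHz. Custom output sample rates are not currently supported.

session.input_audio_transcription (object)

Speech transcription configuration.

session.input_audio_transcription.model (string)

Transcription model. Always gummy-realtime-v1.

session.turn_detection (object)

VAD configuration.

session.turn_detection.type (string)

Always server_vad.

session.turn_detection.threshold (float)

VAD detection threshold.

session.turn_detection.silence_duration_ms (integer)

Duration of silence, in milliseconds, before speech is judged to have ended.

integer

Silence timeout, in milliseconds. Returned only in server_vad mode when the qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime model is used.

session.turn_detection.prefix_padding_ms (integer)

Duration of audio, in milliseconds, retained from before the start of speech.

session.turn_detection.create_response (boolean)

Whether a response is created automatically once the end of speech is detected.

session.turn_detection.interrupt_response (boolean)

Whether the current response is interrupted when new speech is detected.

session.max_response_output_token (string)

Maximum number of tokens in the response output. "inf" means no limit.

session.top_p (float)

Probability threshold for nucleus sampling.

session.top_k (integer)

Number of candidate tokens considered during sampling.

session.max_tokens (integer)

Maximum number of tokens in the response.

session.repetition_penalty (float)

Penalty coefficient for repeated sequences.

session.presence_penalty (float)

Penalty coefficient for repeated content.

session.seed (integer)

Random seed for reproducible results.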
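
A client that sent session.update may want to confirm that the returned configuration matches what it asked for. The sketch below compares a dictionary of requested settings against the session object of a session.updated event. The function name unapplied_settings is hypothetical, and the sketch assumes the request uses the same field names as the session object shown above.

```python
def unapplied_settings(requested: dict, session: dict, prefix: str = "session") -> list[str]:
    """Return dotted paths of requested settings that the returned session does not reflect."""
    mismatches = []
    for key, want in requested.items():
        path = f"{prefix}.{key}"
        got = session.get(key)
        if isinstance(want, dict) and isinstance(got, dict):
            # Recurse into nested objects such as turn_detection.
            mismatches.extend(unapplied_settings(want, got, path))
        elif got != want:
            mismatches.append(path)
    return mismatches
```

An empty list means every requested value appears unchanged in the returned session.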

input_audio_buffer.speech_started

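In VAD mode, sent when the start of speech is detected in the audio buffer.

Before speech is detected, this event may also be triggered each time audio is added to the buffer.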

Example

{
  "event_id": "event_Pvp8nEhsQuGCQbFJ9x58n",
  "type": "input_audio_buffer.speech_started",
  "audio_start_ms": 3647,
  "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

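event_id (string)

Unique event identifier.

type (string)

Always input_audio_buffer.speech_started.

audio_start_ms (integer)

Milliseconds from the start of audio input to the first detection of speech.

item_id (string)

ID of the user message item, which is created when speech ends. This item appends the user's input to the conversation history for inference.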

input_audio_buffer.speech_stopped

In VAD mode, sent when speech ends in the audio buffer. The server also sends conversation.item.created to create the user message item.

Example

{
  "event_id": "event_UhQiqNVRsgUiq4KUS5Xb5",
  "type": "input_audio_buffer.speech_stopped",
  "audio_end_ms": 4453,
  "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

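event_id (string)

Unique event identifier.

type (string)

Always input_audio_buffer.speech_stopped.

audio_end_ms (integer)

Milliseconds from the start of the session to the end of speech.

item_id (string)

ID of the user message item (to be created).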
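
The speech_started and speech_stopped events share an item_id, so a client can pair them to estimate how long the user spoke. The sketch below does this for a list of parsed events. The function name speech_spans is hypothetical, and the sketch assumes audio_start_ms and audio_end_ms are measured on the same timeline, which the field descriptions do not state explicitly.

```python
STARTED = "input_audio_buffer.speech_started"
STOPPED = "input_audio_buffer.speech_stopped"


def speech_spans(events: list[dict]) -> dict[str, int]:
    """Map item_id to speech duration in milliseconds (assumes a shared timeline)."""
    starts: dict[str, int] = {}
    spans: dict[str, int] = {}
    for event in events:
        if event["type"] == STARTED:
            starts[event["item_id"]] = event["audio_start_ms"]
        elif event["type"] == STOPPED:
            start = starts.pop(event["item_id"], None)
            if start is not None:
                spans[event["item_id"]] = event["audio_end_ms"] - start
    return spans
```

With the two example events above (audio_start_ms 3647 and audio_end_ms 4453), this yields 806 ms for item_YbAiGvK2H7YaS34o4R6Ba.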

input_audio_buffer.committed

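Sent when the input audio buffer is committed.

In VAD mode, the buffer is committed automatically after the user stops speaking.
In manual mode, the event is triggered after the client sends input_audio_buffer.commit.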

Example

{
  "event_id": "event_Iy6sUzL1nmdFgshFYxJEz",
  "type": "input_audio_buffer.committed",
  "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

event_id (string)

Unique event identifier.

type (string)

Always input_audio_buffer.committed.

item_id (string)

ID of the user message item (to be created).

input_audio_buffer.cleared

Triggered after the client sends input_audio_buffer.clear.

Example

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "input_audio_buffer.cleared"
}

event_id (string)

Unique event identifier.

type (string)

Always input_audio_buffer.cleared.

conversation.item.created

Sent when a conversation item is created.

Example

{
  "event_id": "event_JEfkrr9gO3Ny7Xcv9bGVd",
  "type": "conversation.item.created",
  "item": {
    "id": "item_YbAiGvK2H7YaS34o4R6Ba",
    "object": "realtime.item",
    "type": "message",
    "status": "in_progress",
    "role": "user",
    "content": [
      {
        "type": "input_audio"
      }
    ]
  }
}

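event_id (string)

Unique event identifier.

type (string)

Always conversation.item.created.

item (object)

The conversation item.

item.id (string)

Unique ID of the conversation item.

item.object (string)

Always realtime.item.

item.type (string)

Type of the conversation item. Currently message.

item.status (string)

Status of the conversation item.

item.role (string)

Role of the message.

item.content (array)

Content of the message.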

conversation.item.input_audio_transcription.delta

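When input audio transcription is enabled, this event is sent at high frequency while the user is speaking and carries intermediate real-time recognition results. Concatenate text + stash to get the most complete preview of the current sentence.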

Example

{
  "event_id": "event_C7jzoeSFuiwOZS6tR14yx",
  "type": "conversation.item.input_audio_transcription.delta",
  "item_id": "item_ThVYhLHOdeXb4bBSvzSFF",
  "content_index": 0,
  "text": "",
  "stash": "今天天气怎么样？",
  "language": "zh",
  "emotion": "neutral",
  "obfuscation": "ABEXGYmxdmc97u"
}

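At any moment, the most complete preview of the current sentence is the concatenation of these two fields: live preview = text + stash.

Suppose the user is saying "今天天气不错，阳光明媚。" ("The weather is nice today, bright and sunny."). The following shows the event stream you might receive and how to interpret it:

Time	User's speech progress	API response (text and stash)	Client UI should display (text + stash)
T1	"今天……"	text: "" / stash: "今天"	今天
T2	"……天气……"	text: "" / stash: "今天天气"	今天天气
T3	"……不错"	text: "今天" / stash: "天气不错"	今天天气不错 ("今天" has been confirmed and moved into text)
T4	(brief pause)	text: "今天天气不错，" / stash: ""	今天天气不错， (the first half of the sentence is fully confirmed)
T5	"……阳光……"	text: "今天天气不错，" / stash: "阳光"	今天天气不错，阳光
T6	"……明媚。"	text: "今天天气不错，" / stash: "阳光明媚。"	今天天气不错，阳光明媚。
T7	(speech ends)	-	Use the transcript from conversation.item.input_audio_transcription.completed as the final result.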

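event_id (string)

Unique identifier of this event.

type (string)

Event type. Always conversation.item.input_audio_transcription.delta.

item_id (string)

ID of the associated conversation item.

content_index (integer)

Index of the content part that contains the audio.

text (string)

The confirmed text prefix: the part of the current sentence that the model has confirmed and will not change.

stash (string)

The tentatively recognized text suffix: a provisional draft that immediately follows the confirmed part, which the model is still processing and may revise.

language (string)

Code of the language currently recognized (for example, zh or en).

emotion (string)

User emotion currently detected (for example, neutral or happy).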
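
The text + stash rule can be sketched in a few lines. The helper name live_preview is hypothetical; the replayed frames are the T1 to T6 values from the walkthrough above.

```python
def live_preview(event: dict) -> str:
    """Return the current sentence preview: confirmed prefix plus provisional suffix."""
    return event.get("text", "") + event.get("stash", "")


# Replay of the T1..T6 (text, stash) pairs from the walkthrough.
frames = [
    ("", "今天"),
    ("", "今天天气"),
    ("今天", "天气不错"),
    ("今天天气不错，", ""),
    ("今天天气不错，", "阳光"),
    ("今天天气不错，", "阳光明媚。"),
]
previews = [live_preview({"text": text, "stash": stash}) for text, stash in frames]
```

Each delta event replaces the previous preview rather than appending to it, so a UI only needs to redraw the latest value.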

conversation.item.input_audio_transcription.completed

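Sent after the audio has been buffered and transcribed. Transcription uses a separate model (gummy-realtime-v1).

The transcript may differ from the text that Qwen-Omni-Realtime processes and is for reference only.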

Example

{
  "event_id": "event_FrrZcxiDfTB9LD9p4pVng",
  "type": "conversation.item.input_audio_transcription.completed",
  "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
  "content_index": 0,
  "transcript": "Hello."
}

event_id (string)

Unique event identifier.

type (string)

Always conversation.item.input_audio_transcription.completed.

item_id (string)

ID of the user message item.

content_index (integer)

Always 0.

transcript (string)

The transcribed text.

conversation.item.input_audio_transcription.failed

Sent when input audio transcription fails (transcription must be enabled). This event is independent of the error event.

Example

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "<item_id>",
  "content_index": 0,
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}

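event_id (string)

Unique event identifier.

type (string)

Always conversation.item.input_audio_transcription.failed.

item_id (string)

ID of the user message item.

content_index (integer)

Always 0.

error (object)

Error details.

error.code (string)

Error code.

error.message (string)

Error message.

error.param (string)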
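
Since transcription ends with either a completed or a failed event for a given item_id, a client can record both outcomes in one place. The following is a sketch under that assumption; the function name record_transcription and the layout of the result dictionary are hypothetical.

```python
COMPLETED = "conversation.item.input_audio_transcription.completed"
FAILED = "conversation.item.input_audio_transcription.failed"


def record_transcription(results: dict, event: dict) -> None:
    """Store the final transcription outcome for a user message item, keyed by item_id."""
    if event["type"] == COMPLETED:
        results[event["item_id"]] = {"ok": True, "transcript": event["transcript"]}
    elif event["type"] == FAILED:
        err = event.get("error", {})
        results[event["item_id"]] = {
            "ok": False,
            "code": err.get("code"),
            "message": err.get("message"),
        }
    # Other event types are ignored so the function can sit behind a generic event loop.
```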

response.created

Sent when the model starts generating a response.

Example

{
  "event_id": "event_XuDavMzQN3KKepqGu3KRh",
  "type": "response.created",
  "response": {
    "id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "object": "realtime.response",
    "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
    "status": "in_progress",
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "output_audio_format": "pcm24",
    "output": []
  }
}

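event_id (string)

Unique event identifier.

type (string)

Always response.created.

response (object)

The response object.

response.id (string)

Unique response ID.

response.conversation_id (string)

Conversation ID.

response.object (string)

Always realtime.response.

response.status (string)

Response status: completed, failed, in_progress, or incomplete.

response.modalities (array)

Response modalities.

response.voice (string)

Voice used for audio output.

response.output_audio_format (string)

Output audio format.

response.output (array)

Empty in this event.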

response.done

Sent when response generation is complete. The response object contains all output items but not the raw audio data.

Example

{
  "event_id": "event_CSaxRRYLvbrfexDXAEuDG",
  "type": "response.done",
  "response": {
    "id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "object": "realtime.response",
    "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
    "status": "completed",
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "output_audio_format": "pcm24",
    "output": [
      {
        "id": "item_Ls6MtCUWO7LM4E59QziNv",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "audio",
            "transcript": "Hello! Is there anything I can help you with?"
          }
        ]
      }
    ],
    "usage": {
      "total_tokens": 377,
      "input_tokens": 336,
      "output_tokens": 41,
      "input_tokens_details": {
        "text_tokens": 228,
        "audio_tokens": 108
      },
      "output_tokens_details": {
        "text_tokens": 9,
        "audio_tokens": 32
      }
    }
  }
}

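event_id (string)

Unique event identifier.

type (string)

Always response.done.

response (object)

The response object.

response.id (string)

Unique response ID.

response.conversation_id (string)

Conversation ID.

response.object (string)

Always realtime.response.

response.status (string)

Response status.

response.modalities (array)

Response modalities.

response.voice (string)

Voice used for audio output.

response.output_audio_format (string)

Output audio format.

response.output (array)

Response output.

response.output[].id (string)

Output item ID.

response.output[].type (string)

Output item type. Currently message.

response.output[].object (string)

Object type of the output item. Currently realtime.item.

response.output[].status (string)

Output item status.

response.output[].role (string)

Output item role.

response.output[].content (array)

Output item content.

response.output[].content[].type (string)

Content type: text for plain text, audio for audio output.

response.output[].content[].text (string)

Text content.

response.output[].content[].transcript (string)

Transcript of the audio.

response.usage (object)

Token usage for this response.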
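
The response.done payload is a convenient place to collect the final text and token usage in one step. The sketch below is one way to do so; the function name summarize_response is hypothetical, and the check that total_tokens equals input_tokens plus output_tokens is an assumption based only on the example above (336 + 41 = 377).

```python
def summarize_response(event: dict) -> dict:
    """Collect the final text and token usage from a response.done event."""
    response = event["response"]
    pieces = []
    for item in response.get("output", []):
        for part in item.get("content", []):
            # Audio parts carry "transcript"; text parts carry "text".
            value = part.get("transcript") if part.get("type") == "audio" else part.get("text")
            if value:
                pieces.append(value)
    usage = response.get("usage", {})
    total = usage.get("total_tokens")
    return {
        "status": response.get("status"),
        "text": "".join(pieces),
        "total_tokens": total,
        # Assumption taken from the example: total = input + output.
        "usage_consistent": usage.get("input_tokens", 0) + usage.get("output_tokens", 0) == total,
    }
```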

response.text.delta

Sent when the model generates a text fragment and the output modality is text only.

Example

{
  "delta": "Hello",
  "event_id": "event_TH49MauuPmRo1RGaMSlP7",
  "type": "response.text.delta",
  "response_id": "resp_PrRSvPVpnCExdUOGHHLuP",
  "item_id": "item_L8IRm9kRXFpxoOjDqDC96",
  "output_index": 0,
  "content_index": 0
}

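event_id (string)

Unique event identifier.

type (string)

Always response.text.delta.

delta (string)

Incremental text fragment.

response_id (string)

Response ID.

item_id (string)

ID of the message item, used to associate the content parts of the same message.

output_index (integer)

Index of the output item. Always 0.

content_index (integer)

Index of the content part. Always 0.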

response.text.done

Sent when text-only output generation is complete.

This event is also sent when the response is interrupted, incomplete, or canceled.

Example

{
  "event_id": "event_B1lIeE2Nac33zn5V7h2mm",
  "type": "response.text.done",
  "response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
  "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
  "output_index": 0,
  "content_index": 0,
  "text": "How can I assist you today?"
}

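event_id (string)

Unique event identifier.

type (string)

Always response.text.done.

response_id (string)

Response ID.

item_id (string)

ID of the message item.

output_index (integer)

Index of the output item.

content_index (integer)

Index of the content part.

text (string)

The complete text output.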
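
A client that renders text as it streams typically accumulates response.text.delta fragments and then settles on the text carried by response.text.done. The class below is a minimal sketch of that pattern; the name TextAssembler and the choice to key buffers by (response_id, item_id) are assumptions for illustration.

```python
class TextAssembler:
    """Accumulate response.text.delta fragments and finalize on response.text.done."""

    def __init__(self) -> None:
        self._parts: dict[tuple, list[str]] = {}
        self.finished: dict[tuple, str] = {}

    def feed(self, event: dict):
        """Return the final text when a done event arrives, otherwise None."""
        kind = event.get("type")
        if kind == "response.text.delta":
            key = (event["response_id"], event["item_id"])
            self._parts.setdefault(key, []).append(event["delta"])
        elif kind == "response.text.done":
            key = (event["response_id"], event["item_id"])
            streamed = "".join(self._parts.pop(key, []))
            # Prefer the server's complete text; fall back to what was streamed.
            final = event.get("text", streamed)
            self.finished[key] = final
            return final
        return None
```

Because the done event is also sent for interrupted or canceled responses, the buffer is released on every done event rather than only on successful completion.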

response.audio.delta

Sent when the model generates an audio fragment and the output modalities include audio.

Example

{
  "event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
  "type": "response.audio.delta",
  "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
  "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
  "output_index": 0,
  "content_index": 0,
  "delta": "{base64 audio}"
}

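event_id (string)

Unique event identifier.

type (string)

Always response.audio.delta.

response_id (string)

Response ID.

item_id (string)

ID of the message item.

output_index (integer)

Index of the output item.

content_index (integer)

Index of the content part.

delta (string)

Base64-encoded audio fragment.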
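
Each delta must be Base64-decoded before playback or storage. The sketch below appends decoded fragments to a buffer and estimates the buffered duration. The 24 kHz sample rate comes from the session documentation above; 16-bit mono samples are an assumption, and the helper names are hypothetical.

```python
import base64

SAMPLE_RATE_HZ = 24_000   # output audio is documented as 24 kHz PCM
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono samples


def append_audio(buffer: bytearray, event: dict) -> int:
    """Decode one response.audio.delta event into the buffer; return bytes added."""
    chunk = base64.b64decode(event["delta"])
    buffer.extend(chunk)
    return len(chunk)


def duration_ms(buffer: bytearray) -> float:
    """Estimate playback length of the buffered PCM under the assumptions above."""
    return len(buffer) * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
```

If the actual sample width or channel count differs, only the two constants need to change.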

response.audio.done

Sent when audio output generation is complete.

This event is also sent when the response is interrupted, incomplete, or canceled.

Example

{
  "event_id": "event_Le1TDl7VfyHQxl47DtGxI",
  "type": "response.audio.done",
  "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
  "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
  "output_index": 0,
  "content_index": 0
}

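event_id (string)

Unique event identifier.

type (string)

Always response.audio.done.

response_id (string)

Response ID.

item_id (string)

ID of the message item.

output_index (integer)

Index of the output item.

content_index (integer)

Index of the content part.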

response.audio_transcript.delta

Sent when the model generates a transcript fragment and the output modalities include audio.

Example

{
  "event_id": "event_BksW7fOwnyavZdDxIzZYM",
  "type": "response.audio_transcript.delta",
  "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
  "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
  "output_index": 0,
  "content_index": 0,
  "delta": "Is there anything"
}

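event_id (string)

Unique event identifier.

type (string)

Always response.audio_transcript.delta.

response_id (string)

Response ID.

item_id (string)

ID of the message item.

output_index (integer)

Index of the output item.

content_index (integer)

Index of the content part.

delta (string)

Incremental transcript text.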

response.audio_transcript.done

Sent when generation of the audio transcript is complete.

Example

{
  "event_id": "event_X49tL2WerT4WjxcmH16lS",
  "type": "response.audio_transcript.done",
  "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
  "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
  "output_index": 0,
  "content_index": 0,
  "transcript": "Hello! Is there anything I can help you with?"
}

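event_id (string)

Unique event identifier.

type (string)

Always response.audio_transcript.done.

response_id (string)

Response ID.

item_id (string)

ID of the message item.

output_index (integer)

Index of the output item.

content_index (integer)

Index of the content part.

transcript (string)

The complete transcript.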

response.output_item.added

Sent when a new output item is created during response generation.

Example

{
  "event_id": "event_DsCO341DEVtiATtCB6BUY",
  "type": "response.output_item.added",
  "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
  "output_index": 0,
  "item": {
    "id": "item_Ls6MtCUWO7LM4E59QziNv",
    "object": "realtime.item",
    "type": "message",
    "status": "in_progress",
    "role": "assistant",
    "content": []
  }
}

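event_id (string)

Unique event identifier.

type (string)

Always response.output_item.added.

response_id (string)

Response ID.

output_index (integer)

Index of the output item.

item (object)

The output item.

item.id (string)

Unique ID of the output item.

item.object (string)

Always realtime.item.

item.status (string)

Status of the output item.

item.role (string)

Role of the sender.

item.content (array)

Content of the message.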

response.output_item.done

Sent when an output item is complete.

Example

{
  "event_id": "event_MEu5nlLw1LsOguHiehIP8",
  "type": "response.output_item.done",
  "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
  "output_index": 0,
  "item": {
    "id": "item_Ls6MtCUWO7LM4E59QziNv",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "audio",
        "transcript": "Hello! Is there anything I can help you with?"
      }
    ]
  }
}

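event_id (string)

Unique event identifier.

type (string)

Always response.output_item.done.

response_id (string)

Response ID.

output_index (integer)

Index of the output item.

item (object)

The output item.

item.id (string)

Unique ID of the output item.

item.object (string)

Always realtime.item.

item.status (string)

Status of the output item.

item.role (string)

Role of the sender.

item.content (array)

Content of the message.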

response.content_part.added

Sent when a new content part is added to the assistant message during response generation.

Example

{
  "event_id": "event_AVBOmrgY3C8bjlRajfSUT",
  "type": "response.content_part.added",
  "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
  "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "audio",
    "text": ""
  }
}

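event_id (string)

Unique event identifier.

type (string)

Always response.content_part.added.

response_id (string)

Response ID.

item_id (string)

ID of the message item.

output_index (integer)

Index of the output item. Always 0.

content_index (integer)

Index of the content part. Always 0.

part (object)

The content part.

part.type (string)

Content type.

part.text (string)

Text content.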

response.content_part.done

Sent when a content part in the assistant message has finished streaming.

Example

{
  "event_id": "event_Il8HD19v58Qr5IBkw7LtN",
  "type": "response.content_part.done",
  "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
  "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "audio",
    "text": "Hello! Is there anything I can help you with?"
  }
}

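event_id (string)

Unique event identifier.

type (string)

Always response.content_part.done.

response_id (string)

Response ID.

item_id (string)

ID of the message item.

output_index (integer)

Index of the output item. Always 0.

content_index (integer)

Index of the content part. Always 0.

part (object)

The content part.

part.type (string)

Content type.

part.text (string)

Text content.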
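
Every server event carries a type field, so a client can route incoming messages with a simple lookup table. The class below is a minimal sketch of such a router. EventRouter and its methods are hypothetical names, and the sketch assumes each message arrives as one JSON string.

```python
import json
from collections import defaultdict
from typing import Callable


class EventRouter:
    """Dispatch server events to handlers registered per event type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.unhandled: list[str] = []

    def on(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, raw: str) -> dict:
        event = json.loads(raw)
        handlers = self._handlers.get(event.get("type"), [])
        if not handlers:
            # Remember unknown types instead of failing, so new events do not break the client.
            self.unhandled.append(event.get("type"))
        for handler in handlers:
            handler(event)
        return event
```

A typical setup registers one handler for error, one for each delta type the UI renders, and one for response.done.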
