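Real-time speech recognition (Qwen-ASR-Realtime) server events

Events that the server sends during a WebSocket session.

User guide: for a feature overview and sample code, see Real-time speech recognition.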

error

Sent when a client-side or server-side error occurs.

Example

{
  "event_id": "event_B2uoU7VOt1AAITsPRPH9n",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid value: 'pcm16'. Supported values are: 'pcm', 'opus'.",
    "param": "session.input_audio_format",
    "event_id": "event_123"
  }
}

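event_id (string): Unique identifier of the event.
type (string): Event type. Always error.
error (object): Error details.
error.type (string): Error type.
error.code (string): Error code.
error.message (string): Error message. For solutions, see Error messages.
error.param (string): Parameter related to the error.
error.event_id (string): ID of the event related to the error.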
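As an illustration of how a client might surface this event, the following minimal sketch turns an error event into a single log line. The function name describe_error is hypothetical, and treating param and event_id as optional is an assumption rather than documented behavior.

```python
import json


def describe_error(raw_message: str) -> str:
    """Summarize a server `error` event as a single log line."""
    event = json.loads(raw_message)
    if event.get("type") != "error":
        raise ValueError(f"expected an error event, got {event.get('type')!r}")
    err = event.get("error") or {}
    line = f"[{err.get('type')}/{err.get('code')}] {err.get('message')}"
    # Assumption: `param` and `event_id` may be missing or null.
    if err.get("param"):
        line += f" (param: {err['param']})"
    if err.get("event_id"):
        line += f" (event: {err['event_id']})"
    return line


# The sample payload mirrors the example shown for this event.
sample = json.dumps({
    "event_id": "event_B2uoU7VOt1AAITsPRPH9n",
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_value",
        "message": "Invalid value: 'pcm16'. Supported values are: 'pcm', 'opus'.",
        "param": "session.input_audio_format",
        "event_id": "event_123",
    },
})
print(describe_error(sample))
```

Logging error.event_id alongside the message makes it easier to match the failure to the event that triggered it.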

session.created

The first event sent after the connection is established. It contains the default session configuration.

Example

{
  "event_id": "event_1234",
  "type": "session.created",
  "session": {
    "id": "sess_001",
    "object": "realtime.session",
    "model": "qwen3-asr-flash-realtime",
    "modalities": ["text"],
    "input_audio_format": "pcm",
    "input_audio_transcription": null,
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.2,
      "silence_duration_ms": 800
    }
  }
}

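event_id (string): Unique identifier of the event.
type (string): Event type. Always session.created.
session (object): Session configuration.
session.id (string): ID of the current WebSocket session.
session.object (string): Always realtime.session.
session.model (string): Model name.
session.modalities (array): Output modalities. Always ["text"].
session.input_audio_format (string): Input audio format.
session.input_audio_transcription (object): Speech recognition settings. See the input_audio_transcription parameter of the session.update client event.
session.turn_detection (object): Voice activity detection (VAD) settings.
session.turn_detection.type (string): Always server_vad.
session.turn_detection.threshold (float): VAD detection threshold.
session.turn_detection.silence_duration_ms (integer): Duration of silence, in milliseconds, required before a sentence break is detected.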
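A client may want to check the effective VAD settings when the session starts. The sketch below reads them from a session.created (or session.updated) event that has already been parsed into a dict. The helper name vad_summary is hypothetical, and interpreting a null turn_detection as manual mode is an assumption, not something this reference states.

```python
from typing import Optional


def vad_summary(event: dict) -> Optional[str]:
    """Describe the VAD settings carried by a session event."""
    if event.get("type") not in ("session.created", "session.updated"):
        raise ValueError("not a session event")
    turn_detection = event["session"].get("turn_detection")
    if turn_detection is None:
        # Assumption: a null turn_detection means manual (non-VAD) mode.
        return None
    return (
        f"{turn_detection['type']}: threshold={turn_detection['threshold']}, "
        f"silence={turn_detection['silence_duration_ms']} ms"
    )


# Trimmed copy of the session.created example.
created = {
    "event_id": "event_1234",
    "type": "session.created",
    "session": {
        "id": "sess_001",
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.2,
            "silence_duration_ms": 800,
        },
    },
}
print(vad_summary(created))
```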

session.updated

Sent after a session.update event has been processed. If processing fails, an error event is sent instead. For descriptions of the other parameters, see session.created.

Example

{
  "event_id": "event_1234",
  "type": "session.updated",
  "session": {
    "id": "sess_001",
    "object": "realtime.session",
    "model": "qwen3-asr-flash-realtime",
    "modalities": ["text"],
    "input_audio_format": "pcm",
    "input_audio_transcription": null,
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.2,
      "silence_duration_ms": 800
    }
  }
}

event_id (string): Unique identifier of the event.
type (string): Event type. Always session.updated.

input_audio_buffer.speech_started

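In VAD mode, sent when the start of speech is detected in the audio buffer.

It may be triggered whenever audio is appended to the buffer, unless the start of speech has already been detected.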

Example

{
  "event_id": "event_B1lV7FPbgTv9qGxPI1tH4",
  "type": "input_audio_buffer.speech_started",
  "audio_start_ms": 64,
  "item_id": "item_B1lV7jWLscp4mMV8hSs8c"
}

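event_id (string): Unique identifier of the event.
type (string): Event type. Always input_audio_buffer.speech_started.
audio_start_ms (integer): Time from the start of the buffer to the point where speech was detected, in milliseconds.
item_id (string): ID of the user message item that will be created.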

input_audio_buffer.speech_stopped

In VAD mode, sent when the end of speech is detected in the audio buffer. A conversation.item.created event containing the user message item follows immediately.

Example

{
  "event_id": "event_B3GGEYh2orwNIdhUagZPz",
  "type": "input_audio_buffer.speech_stopped",
  "audio_end_ms": 28128,
  "item_id": "item_B3GGE8ry4yqbqJGzrVhEM"
}

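event_id (string): Unique identifier of the event.
type (string): Event type. Always input_audio_buffer.speech_stopped.
audio_end_ms (integer): Time from the start of the session to the end of speech, in milliseconds.
item_id (string): ID of the user message item created when speech ends.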

input_audio_buffer.committed

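Sent after the input audio buffer is committed.

VAD mode: triggered automatically after the server detects the end of a speech segment.
Manual mode: triggered after the client sends all audio with input_audio_buffer.append and then sends input_audio_buffer.commit.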

Example

{
  "event_id": "event_1121",
  "type": "input_audio_buffer.committed",
  "previous_item_id": "msg_001",
  "item_id": "msg_002"
}

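event_id (string): Unique identifier of the event.
type (string): Event type. Always input_audio_buffer.committed.
previous_item_id (string): ID of the previous conversation item.
item_id (string): ID of the user conversation item that will be created.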

conversation.item.created

Sent when a conversation item is created.

Example

{
  "type": "conversation.item.created",
  "event_id": "event_B3GGKbCfBZTpqFHZ0P8vg",
  "previous_item_id": "item_B3GGE8ry4yqbqJGzrVhEM",
  "item": {
    "id": "item_B3GGEPlolCqdMiVbYIf5L",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": null
      }
    ]
  }
}

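event_id (string): Unique identifier of the event.
type (string): Event type. Always conversation.item.created.
previous_item_id (string): ID of the previous conversation item.
item (object): The conversation item.
item.id (string): Unique ID of the conversation item.
item.object (string): Always realtime.item.
item.type (string): Always message.
item.status (string): Status of the conversation item.
item.role (string): Role of the message sender.
item.content (array): Message content.
item.content[].type (string): Always input_audio.
item.content[].transcript (string): Always null. The final result is returned in the conversation.item.input_audio_transcription.completed event.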

conversation.item.input_audio_transcription.text

Sent at high frequency. Carries the real-time recognition results.

Example

{
  "event_id": "event_R7Pfu8QVBfP5HmpcbEFSd",
  "type": "conversation.item.input_audio_transcription.text",
  "item_id": "item_MpJQPNQzqVRc9aC9zMwSj",
  "content_index": 0,
  "language": "en",
  "emotion": "neutral",
  "text": "",
  "stash": "Beijing's"
}

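event_id (string): Unique identifier of the event.
type (string): Event type. Always conversation.item.input_audio_transcription.text.
item_id (string): ID of the associated conversation item.
content_index (integer): Index of the content part that contains the audio.
language (string): Detected language. If you set the language request parameter, this value matches that setting. Possible values:

zh: Chinese (Mandarin, Sichuanese, Minnan, Wu)
yue: Cantonese
en: English
ja: Japanese
de: German
ko: Korean
ru: Russian
fr: French
pt: Portuguese
ar: Arabic
it: Italian
es: Spanish
hi: Hindi
id: Indonesian
th: Thai
tr: Turkish
uk: Ukrainian
vi: Vietnamese
cs: Czech
da: Danish
fil: Filipino
fi: Finnish
is: Icelandic
ms: Malay
no: Norwegian
pl: Polish
sv: Swedish

emotion (string): Detected emotion. Possible values: surprised, neutral, happy, sad, disgusted, angry, fearful.
text (string): Confirmed text prefix. The model has finished recognizing this part and will not change it.
stash (string): Pre-recognized text suffix. A provisional draft that follows the confirmed part and that the model may still revise.

Concatenate text + stash to get the most complete real-time preview.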

Example

Suppose the user says "今天天气真不错，阳光明媚" (roughly, "The weather is really nice today, bright and sunny"). The following table shows the events you might receive:

Time	User speech progress	API response (`text` and `stash`)	UI display (`text + stash`)
T1	"今天..."	`text`: `""` / `stash`: `"今天"`	今天
T2	"...天气真..."	`text`: `""` / `stash`: `"今天天气真"`	今天天气真
T3	"...不错"	`text`: `"今天"` / `stash`: `"天气真不错"`	今天天气真不错
T4	(brief pause)	`text`: `"今天天气真不错，"` / `stash`: `""`	今天天气真不错，
T5	"...阳光..."	`text`: `"今天天气真不错，"` / `stash`: `"阳光"`	今天天气真不错，阳光
T6	"...明媚。"	`text`: `"今天天气真不错，"` / `stash`: `"阳光明媚。"`	今天天气真不错，阳光明媚。
T7	(user stops speaking)	-	Use the `transcript` in the conversation.item.input_audio_transcription.completed event as the final result.
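The display rule in the table can be sketched in a few lines. The helper name live_preview is hypothetical, and the events are reduced to the two fields that matter here; a real event carries the other fields listed above.

```python
def live_preview(event: dict) -> str:
    """Return the confirmed text followed by the provisional stash."""
    return event.get("text", "") + event.get("stash", "")


# The text/stash pairs for T1 to T6 from the table.
events = [
    {"text": "", "stash": "今天"},
    {"text": "", "stash": "今天天气真"},
    {"text": "今天", "stash": "天气真不错"},
    {"text": "今天天气真不错，", "stash": ""},
    {"text": "今天天气真不错，", "stash": "阳光"},
    {"text": "今天天气真不错，", "stash": "阳光明媚。"},
]

for event in events:
    print(live_preview(event))
```

Because stash can be revised, a UI would typically replace the whole displayed line on each event rather than append to it.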

conversation.item.input_audio_transcription.completed

Sends the final recognition result and marks the end of a conversation item.

Example

{
  "event_id": "event_B3GGEjPT2sLzjBM74W6kB",
  "type": "conversation.item.input_audio_transcription.completed",
  "item_id": "item_B3GGC53jGOuIFcjZkmEQ9",
  "content_index": 0,
  "language": "en",
  "emotion": "neutral",
  "transcript": "What's the weather like today?"
}

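event_id (string): Unique identifier of the event.
type (string): Event type. Always conversation.item.input_audio_transcription.completed.
item_id (string): ID of the associated conversation item.
content_index (integer): Index of the content part that contains the audio.
language (string): Detected language. If you set the language request parameter, this value matches that setting. Possible values:

zh: Chinese (Mandarin, Sichuanese, Minnan, Wu)
yue: Cantonese
en: English
ja: Japanese
de: German
ko: Korean
ru: Russian
fr: French
pt: Portuguese
ar: Arabic
it: Italian
es: Spanish
hi: Hindi
id: Indonesian
th: Thai
tr: Turkish
uk: Ukrainian
vi: Vietnamese
cs: Czech
da: Danish
fil: Filipino
fi: Finnish
is: Icelandic
ms: Malay
no: Norwegian
pl: Polish
sv: Swedish

emotion (string): Detected emotion. Possible values: surprised, neutral, happy, sad, disgusted, angry, fearful.
transcript (string): Transcription result.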
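Since the completed event supersedes any earlier previews for the same item, a client could keep previews and final transcripts in separate maps keyed by item_id. The following is a minimal sketch under that assumption; the class name TranscriptTracker and the item ID item_1 are hypothetical.

```python
TEXT = "conversation.item.input_audio_transcription.text"
COMPLETED = "conversation.item.input_audio_transcription.completed"


class TranscriptTracker:
    """Keep the latest preview per item and the final transcript once known."""

    def __init__(self) -> None:
        self.previews: dict = {}
        self.finals: dict = {}

    def handle(self, event: dict) -> None:
        item_id = event.get("item_id")
        if event.get("type") == TEXT:
            # Latest preview replaces the previous one for this item.
            self.previews[item_id] = event.get("text", "") + event.get("stash", "")
        elif event.get("type") == COMPLETED:
            # The final transcript supersedes the preview.
            self.previews.pop(item_id, None)
            self.finals[item_id] = event["transcript"]


tracker = TranscriptTracker()
tracker.handle({"type": TEXT, "item_id": "item_1", "text": "", "stash": "What's the"})
preview_before = dict(tracker.previews)
tracker.handle({
    "type": COMPLETED,
    "item_id": "item_1",
    "transcript": "What's the weather like today?",
})
print(tracker.finals)
```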

conversation.item.input_audio_transcription.failed

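Sent when recognition of the input audio fails. This event is separate from other error events so that the specific failed item can be identified.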

Example

{
  "event_id": "event_B4KHRpC2nXs7dLmqTVo1f",
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "item_B4KHRmVbcQwp9yZk2UeN3",
  "content_index": 0,
  "error": {
    "code": "audio_unintelligible",
    "message": "The audio could not be transcribed.",
    "param": null
  }
}

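event_id (string): Unique identifier of the event.
type (string): Event type. Always conversation.item.input_audio_transcription.failed.
item_id (string): ID of the associated conversation item.
content_index (integer): Index of the content part that contains the audio.
error (object): Error details.
error.code (string): Error code.
error.message (string): Error message. For solutions, see Error messages.
error.param (string): Parameter related to the error.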

session.finished

Confirms that all recognition is complete. Returned after you send session.finish. You can disconnect once you receive this event.

Example

{
  "event_id": "event_2239",
  "type": "session.finished"
}

event_id (string): Unique identifier of the event.
type (string): Event type. Always session.finished.
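Putting the events together, a receive loop might dispatch on type and stop at session.finished. The sketch below works on any iterable of raw JSON strings, so it makes no assumption about a particular WebSocket library. The function name drain_session, the item IDs, and the shape of the returned lists are hypothetical.

```python
import json
from typing import Iterable, List, Tuple

COMPLETED = "conversation.item.input_audio_transcription.completed"
FAILED = "conversation.item.input_audio_transcription.failed"


def drain_session(messages: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Collect final transcripts and problems until session.finished arrives."""
    transcripts: List[str] = []
    problems: List[str] = []
    for raw in messages:
        event = json.loads(raw)
        kind = event.get("type")
        if kind == COMPLETED:
            transcripts.append(event["transcript"])
        elif kind == FAILED:
            problems.append(f"{event['item_id']}: {event['error']['code']}")
        elif kind == "error":
            problems.append(f"error: {event['error']['code']}")
        elif kind == "session.finished":
            break  # all recognition is complete; the caller can disconnect
        # Other event types are ignored in this sketch.
    return transcripts, problems


stream = [
    json.dumps({"type": "session.created", "session": {}}),
    json.dumps({"type": COMPLETED, "item_id": "item_1",
                "transcript": "What's the weather like today?"}),
    json.dumps({"type": FAILED, "item_id": "item_2", "content_index": 0,
                "error": {"code": "audio_unintelligible",
                          "message": "The audio could not be transcribed.",
                          "param": None}}),
    json.dumps({"type": "session.finished"}),
    # Hypothetical trailing message, included only to show that the loop stops.
    json.dumps({"type": COMPLETED, "item_id": "item_3", "transcript": "ignored"}),
]
transcripts, problems = drain_session(stream)
print(transcripts, problems)
```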
