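Transcribe a continuous audio stream into text in real time
For model availability, supported languages, and a feature comparison, see Speech-to-text models.
Quick start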
- Fun-ASR
- Qwen-ASR
For more code examples, see GitHub. Get an API key and set it as an environment variable. To use the SDK, install it first.
Install dependencies
Model availability
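| Model | Version | Unit price | Free quota (note) |
|---|---|---|---|
| fun-asr-realtime (current version: fun-asr-realtime-2025-11-07) | Stable | CNY 0.00033/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-realtime-2025-11-07 | Snapshot | CNY 0.00033/second | 36,000 seconds (10 hours), valid for 90 days |
- Supported languages: Mandarin, Cantonese, Wu, Minnan (Southern Min), Hakka, Gan, Xiang, and Jin, plus Mandarin accents from the Zhongyuan (Central Plains), Southwestern, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeastern, Beijing, and Hong Kong/Taiwan regions, covering Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. English and Japanese are also supported.
- Sample rate: 16 kHz
- Audio formats: pcm, wav, mp3, opus, speex, aac, amr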
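The examples in this guide send audio in fixed-size chunks paced to real time. As a rough sketch of the arithmetic behind those chunk sizes, assuming uncompressed 16-bit mono PCM (the helper name `chunk_bytes` is illustrative and not part of any SDK):

```python
def chunk_bytes(sample_rate_hz: int, duration_ms: int,
                bits_per_sample: int = 16, channels: int = 1) -> int:
    """Return the number of raw PCM bytes that cover duration_ms of audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * duration_ms // 1000
```

For example, `chunk_bytes(16000, 100)` gives 3200, which matches the 3,200-byte chunks sent every 100 ms in the WebSocket examples. The calculation does not apply to compressed formats such as mp3 or opus, where the byte count per second varies with the encoding.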
Real-time recognition from a microphone
Capture audio from a microphone and output the recognition results in real time.
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Main {
public static void main(String[] args) throws InterruptedException {
Constants.baseWebsocketApiUrl = "wss://dashscope.aliyuncs.com/api-ws/v1/inference";
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.submit(new RealtimeRecognitionTask());
executorService.shutdown();
executorService.awaitTermination(1, TimeUnit.MINUTES);
System.exit(0);
}
}
class RealtimeRecognitionTask implements Runnable {
@Override
public void run() {
RecognitionParam param = RecognitionParam.builder()
.model("fun-asr-realtime")
// If the environment variable is not set, replace the next line with your API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.format("wav")
.sampleRate(16000)
.build();
Recognition recognizer = new Recognition();
ResultCallback<RecognitionResult> callback = new ResultCallback<RecognitionResult>() {
@Override
public void onEvent(RecognitionResult result) {
if (result.isSentenceEnd()) {
System.out.println("Final Result: " + result.getSentence().getText());
} else {
System.out.println("Intermediate Result: " + result.getSentence().getText());
}
}
@Override
public void onComplete() {
System.out.println("Recognition complete");
}
@Override
public void onError(Exception e) {
System.out.println("RecognitionCallback error: " + e.getMessage());
}
};
try {
recognizer.call(param, callback);
// Create the audio format
AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
// Match the default recording device for this format
TargetDataLine targetDataLine =
AudioSystem.getTargetDataLine(audioFormat);
targetDataLine.open(audioFormat);
// Start recording
targetDataLine.start();
ByteBuffer buffer = ByteBuffer.allocate(1024);
long start = System.currentTimeMillis();
// Record for 50 seconds and transcribe in real time
while (System.currentTimeMillis() - start < 50000) {
int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
if (read > 0) {
buffer.limit(read);
// Send the recorded audio data to the streaming recognition service
recognizer.sendAudioFrame(buffer);
buffer = ByteBuffer.allocate(1024);
// Throttle the recording loop with a short sleep to keep CPU usage low
Thread.sleep(20);
}
}
recognizer.stop();
} catch (Exception e) {
e.printStackTrace();
} finally {
// Close the WebSocket connection after the task completes
recognizer.getDuplexApi().close(1000, "bye");
}
System.out.println(
"[Metric] requestId: "
+ recognizer.getLastRequestId()
+ ", first package delay ms: "
+ recognizer.getFirstPackageDelay()
+ ", last package delay ms: "
+ recognizer.getLastPackageDelay());
}
}
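Before running the Python example, run pip install pyaudio to install the third-party audio playback and capture package. pyaudio depends on the portaudio library: on Ubuntu/Debian, run sudo apt-get install libportaudio2 portaudio19-dev; on macOS, run brew install portaudio.
Recognize a local audio file
This feature recognizes and transcribes a local audio file. It suits near-real-time processing of short audio, such as voice chat, voice commands, voice input, and voice search. The following example uses the audio file asr_example.wav.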
import com.alibaba.dashscope.api.GeneralApi;
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.base.HalfDuplexParamBase;
import com.alibaba.dashscope.common.GeneralListParam;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.protocol.GeneralServiceOption;
import com.alibaba.dashscope.protocol.HttpMethod;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.protocol.StreamingMode;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
class TimeUtils {
private static final DateTimeFormatter formatter =
DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
public static String getTimestamp() {
return LocalDateTime.now().format(formatter);
}
}
public class Main {
public static void main(String[] args) throws InterruptedException {
Constants.baseWebsocketApiUrl = "wss://dashscope.aliyuncs.com/api-ws/v1/inference";
// In a real application, call this method only once at program startup
warmUp();
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.submit(new RealtimeRecognitionTask(Paths.get(System.getProperty("user.dir"), "asr_example.wav")));
executorService.shutdown();
// Wait for all tasks to finish
executorService.awaitTermination(1, TimeUnit.MINUTES);
System.exit(0);
}
public static void warmUp() {
try {
// Pre-establish the connection with a lightweight GET request
GeneralServiceOption warmupOption = GeneralServiceOption.builder()
.protocol(Protocol.HTTP)
.httpMethod(HttpMethod.GET)
.streamingMode(StreamingMode.OUT)
.path("assistants")
.build();
warmupOption.setBaseHttpUrl(Constants.baseHttpApiUrl);
GeneralApi<HalfDuplexParamBase> api = new GeneralApi<>();
api.get(GeneralListParam.builder().limit(1L).build(), warmupOption);
} catch (Exception e) {
// A warm-up failure is not fatal; a retry is allowed
}
}
}
class RealtimeRecognitionTask implements Runnable {
private Path filepath;
public RealtimeRecognitionTask(Path filepath) {
this.filepath = filepath;
}
@Override
public void run() {
RecognitionParam param = RecognitionParam.builder()
.model("fun-asr-realtime")
// If the environment variable is not set, replace the next line with your API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.format("wav")
.sampleRate(16000)
.build();
Recognition recognizer = new Recognition();
String threadName = Thread.currentThread().getName();
ResultCallback<RecognitionResult> callback = new ResultCallback<RecognitionResult>() {
@Override
public void onEvent(RecognitionResult message) {
if (message.isSentenceEnd()) {
System.out.println(TimeUtils.getTimestamp()+" "+
"[process " + threadName + "] Final Result:" + message.getSentence().getText());
} else {
System.out.println(TimeUtils.getTimestamp()+" "+
"[process " + threadName + "] Intermediate Result: " + message.getSentence().getText());
}
}
@Override
public void onComplete() {
System.out.println(TimeUtils.getTimestamp()+" "+"[" + threadName + "] Recognition complete");
}
@Override
public void onError(Exception e) {
System.out.println(TimeUtils.getTimestamp()+" "+
"[" + threadName + "] RecognitionCallback error: " + e.getMessage());
}
};
try {
recognizer.call(param, callback);
// Replace the path with the path to your audio file
System.out.println(TimeUtils.getTimestamp()+" "+"[" + threadName + "] Input file_path is: " + this.filepath);
// Read the file and send the audio in chunks
FileInputStream fis = new FileInputStream(this.filepath.toFile());
byte[] allData = new byte[fis.available()];
int ret = fis.read(allData);
fis.close();
int sendFrameLength = 3200;
for (int i = 0; i * sendFrameLength < allData.length; i ++) {
int start = i * sendFrameLength;
int end = Math.min(start + sendFrameLength, allData.length);
ByteBuffer byteBuffer = ByteBuffer.wrap(allData, start, end - start);
recognizer.sendAudioFrame(byteBuffer);
Thread.sleep(100);
}
System.out.println(TimeUtils.getTimestamp()+" "+LocalDateTime.now());
recognizer.stop();
} catch (Exception e) {
e.printStackTrace();
} finally {
// Close the WebSocket connection after the task completes
recognizer.getDuplexApi().close(1000, "bye");
}
System.out.println(
"["
+ threadName
+ "][Metric] requestId: "
+ recognizer.getLastRequestId()
+ ", first package delay ms: "
+ recognizer.getFirstPackageDelay()
+ ", last package delay ms: "
+ recognizer.getLastPackageDelay());
}
}
WebSocket API
The following example shows how to send a local audio file over a raw WebSocket connection and get the recognition results. It uses the audio file asr_example.wav.
pip uninstall websocket-client
pip uninstall websocket
pip install websocket-client
Do not name your sample code file websocket.py. Otherwise, you may see the following error: AttributeError: module 'websocket' has no attribute 'WebSocketApp'. Did you mean: 'WebSocket'?
# pip install websocket-client
import os
import json
import time
import uuid
import threading
import websocket

# If the environment variable is not set, replace the next line with your API key: api_key = "sk-xxx"
api_key = os.environ.get('DASHSCOPE_API_KEY')
url = 'wss://dashscope.aliyuncs.com/api-ws/v1/inference/'
audio_file = 'asr_example.wav'  # Replace with the path to your audio file
TASK_ID = uuid.uuid4().hex[:32]
task_started = False


def send_run_task(ws):
    run_task_message = {
        'header': {
            'action': 'run-task',
            'task_id': TASK_ID,
            'streaming': 'duplex'
        },
        'payload': {
            'task_group': 'audio',
            'task': 'asr',
            'function': 'recognition',
            'model': 'fun-asr-realtime',
            'parameters': {
                'sample_rate': 16000,
                'format': 'wav'
            },
            'input': {}
        }
    }
    ws.send(json.dumps(run_task_message))


def send_finish_task(ws):
    finish_task_message = {
        'header': {
            'action': 'finish-task',
            'task_id': TASK_ID,
            'streaming': 'duplex'
        },
        'payload': {
            'input': {}
        }
    }
    ws.send(json.dumps(finish_task_message))


def send_audio_stream(ws):
    chunk_size = 3200  # 100 ms at 16 kHz, 16-bit, mono
    try:
        with open(audio_file, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                ws.send(chunk, opcode=websocket.ABNF.OPCODE_BINARY)
                time.sleep(0.1)
        print('Audio stream finished')
        send_finish_task(ws)
    except Exception as e:
        print('Error reading audio file:', e)
        ws.close()


def on_open(ws):
    print('Connected to server')
    send_run_task(ws)


def on_message(ws, data):
    global task_started
    message = json.loads(data)
    event = message['header']['event']
    if event == 'task-started':
        print('Task started')
        task_started = True
        # Start streaming audio only after the server confirms the task
        threading.Thread(target=send_audio_stream, args=(ws,), daemon=True).start()
    elif event == 'result-generated':
        print('Recognition result:', message['payload']['output']['sentence']['text'])
        if message['payload'].get('usage'):
            print('Billed duration (seconds):', message['payload']['usage']['duration'])
    elif event == 'task-finished':
        print('Task finished')
        ws.close()
    elif event == 'task-failed':
        print('Task failed:', message['header'].get('error_message'))
        ws.close()
    else:
        print('Unknown event:', event)


def on_close(ws, close_status_code, close_msg):
    if not task_started:
        print('Task was not started; closing connection')


def on_error(ws, error):
    print('WebSocket error:', error)


if __name__ == '__main__':
    ws = websocket.WebSocketApp(
        url,
        header={'Authorization': f'bearer {api_key}'},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close
    )
    ws.run_forever()
- DashScope SDK
- WebSocket API
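1. Install the SDK
Install the SDK. Make sure the DashScope SDK version is at least 2.22.5 (Java) or 1.25.6 (Python).
2. Get an API key
Get an API key. For security, set the API key as an environment variable rather than hard-coding it.
3. Run the sample code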
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import javax.sound.sampled.LineUnavailableException;
import java.io.File;
import java.io.FileInputStream;
import java.util.Base64;
import java.util.Collections;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
public class Qwen3AsrRealtimeUsage {
private static final Logger log = LoggerFactory.getLogger(Qwen3AsrRealtimeUsage.class);
private static final int AUDIO_CHUNK_SIZE = 1024; // Audio chunk size (bytes)
private static final int SLEEP_INTERVAL_MS = 30; // Sleep interval (ms)
public static void main(String[] args) throws InterruptedException, LineUnavailableException {
CountDownLatch finishLatch = new CountDownLatch(1);
OmniRealtimeParam param = OmniRealtimeParam.builder()
.model("qwen3-asr-flash-realtime")
.url("wss://dashscope.aliyuncs.com/api-ws/v1/realtime")
// If the environment variable is not set, replace the next line with your API key: .apikey("sk-xxx")
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
OmniRealtimeConversation conversation = null;
final AtomicReference<OmniRealtimeConversation> conversationRef = new AtomicReference<>(null);
conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
@Override
public void onOpen() {
System.out.println("connection opened");
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
break;
case "conversation.item.input_audio_transcription.completed":
System.out.println("transcription: " + message.get("transcript").getAsString());
finishLatch.countDown();
break;
case "input_audio_buffer.speech_started":
System.out.println("======VAD Speech Start======");
break;
case "input_audio_buffer.speech_stopped":
System.out.println("======VAD Speech Stop======");
break;
case "conversation.item.input_audio_transcription.text":
System.out.println("transcription: " + message.get("text").getAsString());
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
System.out.println("connection closed code: " + code + ", reason: " + reason);
}
});
conversationRef.set(conversation);
try {
conversation.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
OmniRealtimeTranscriptionParam transcriptionParam = new OmniRealtimeTranscriptionParam();
transcriptionParam.setLanguage("zh");
transcriptionParam.setInputAudioFormat("pcm");
transcriptionParam.setInputSampleRate(16000);
OmniRealtimeConfig config = OmniRealtimeConfig.builder()
.modalities(Collections.singletonList(OmniRealtimeModality.TEXT))
.transcriptionConfig(transcriptionParam)
.build();
conversation.updateSession(config);
String filePath = "your_audio_file.pcm";
File audioFile = new File(filePath);
if (!audioFile.exists()) {
log.error("Audio file not found: {}", filePath);
return;
}
try (FileInputStream audioInputStream = new FileInputStream(audioFile)) {
byte[] audioBuffer = new byte[AUDIO_CHUNK_SIZE];
int bytesRead;
int totalBytesRead = 0;
log.info("Starting to send audio data from: {}", filePath);
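// Read and send the audio data in chunks
while ((bytesRead = audioInputStream.read(audioBuffer)) != -1) {
totalBytesRead += bytesRead;
// Encode only the bytes actually read, so a short final chunk carries no stale data
String audioB64 = Base64.getEncoder().encodeToString(java.util.Arrays.copyOf(audioBuffer, bytesRead));
// Send the audio chunk to the session
conversation.appendAudio(audioB64);
// Short delay to simulate a real-time audio stream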
Thread.sleep(SLEEP_INTERVAL_MS);
}
log.info("Finished sending audio data. Total bytes sent: {}", totalBytesRead);
} catch (Exception e) {
log.error("Error sending audio from file: {}", filePath, e);
}
// Send session.finish, then close the connection after the session ends
conversation.endSession();
log.info("Task finished");
System.exit(0);
}
}
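The following example shows how to send a local audio file over a WebSocket connection and get the recognition results.
1. Get an API key
Get an API key. For security, set the API key as an environment variable.
2. Install dependencies
Python: before running the example, install the following dependency:
pip uninstall websocket-client
pip uninstall websocket
pip install websocket-client
Do not name your sample code file websocket.py. Otherwise, you may see the following error: AttributeError: module 'websocket' has no attribute 'WebSocketApp'. Did you mean: 'WebSocket'?
Java: add the Java-WebSocket dependency:
<dependency>
<groupId>org.java-websocket</groupId>
<artifactId>Java-WebSocket</artifactId>
<version>1.5.6</version>
</dependency>
Node.js:
npm install ws
3. Write and run the code
Implement the full flow: authentication, connection, sending audio, and receiving results. For details, see Interaction flow.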
# pip install websocket-client
import os
import time
import json
import threading
import base64
import websocket
import logging
import logging.handlers
from datetime import datetime

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

# If the environment variable is not set, replace the next line with your API key: API_KEY = "sk-xxx"
API_KEY = os.environ.get("DASHSCOPE_API_KEY", "sk-xxx")
QWEN_MODEL = "qwen3-asr-flash-realtime"
baseUrl = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"
url = f"{baseUrl}?model={QWEN_MODEL}"
print(f"Connecting to server: {url}")

# Note: in non-VAD mode, the cumulative duration of continuously sent audio must not exceed 60 seconds
enableServerVad = True
is_running = True  # Run flag checked by the audio-sending thread

headers = [
    "Authorization: Bearer " + API_KEY,
    "OpenAI-Beta: realtime=v1"
]


def init_logger():
    formatter = logging.Formatter('%(asctime)s|%(levelname)s|%(message)s')
    f_handler = logging.handlers.RotatingFileHandler(
        "omni_tester.log", maxBytes=100 * 1024 * 1024, backupCount=3
    )
    f_handler.setLevel(logging.DEBUG)
    f_handler.setFormatter(formatter)
    console = logging.StreamHandler()
    console.setLevel(logging.DEBUG)
    console.setFormatter(formatter)
    logger.addHandler(f_handler)
    logger.addHandler(console)


def on_open(ws):
    logger.info("Connected to server.")
    # Session update event for manual mode
    event_manual = {
        "event_id": "event_123",
        "type": "session.update",
        "session": {
            "modalities": ["text"],
            "input_audio_format": "pcm",
            "sample_rate": 16000,
            "input_audio_transcription": {
                # Language identifier, optional. Set it if the language is known
                "language": "zh"
            },
            "turn_detection": None
        }
    }
    # Session update event for VAD mode
    event_vad = {
        "event_id": "event_123",
        "type": "session.update",
        "session": {
            "modalities": ["text"],
            "input_audio_format": "pcm",
            "sample_rate": 16000,
            "input_audio_transcription": {
                "language": "zh"
            },
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.0,
                "silence_duration_ms": 400
            }
        }
    }
    if enableServerVad:
        logger.info(f"Sending event: {json.dumps(event_vad, indent=2)}")
        ws.send(json.dumps(event_vad))
    else:
        logger.info(f"Sending event: {json.dumps(event_manual, indent=2)}")
        ws.send(json.dumps(event_manual))


def on_message(ws, message):
    global is_running
    try:
        data = json.loads(message)
        logger.info(f"Received event: {json.dumps(data, ensure_ascii=False, indent=2)}")
        if data.get("type") == "session.finished":
            logger.info(f"Final transcript: {data.get('transcript')}")
            logger.info("Closing WebSocket connection after session finished...")
            is_running = False  # Stop the audio-sending thread
            ws.close()
    except json.JSONDecodeError:
        logger.error(f"Failed to parse message: {message}")


def on_error(ws, error):
    logger.error(f"Error: {error}")


def on_close(ws, close_status_code, close_msg):
    logger.info(f"Connection closed: {close_status_code} - {close_msg}")


def send_audio(ws, local_audio_path):
    time.sleep(3)  # Wait for the session update to complete
    global is_running
    with open(local_audio_path, 'rb') as audio_file:
        logger.info(f"Start reading the file: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
        while is_running:
            audio_data = audio_file.read(3200)  # About 0.1 s of PCM16 audio at 16 kHz
            if not audio_data:
                logger.info(f"Finished reading the file: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
                if ws.sock and ws.sock.connected:
                    if not enableServerVad:
                        # Manual mode: commit the buffered audio before finishing
                        commit_event = {
                            "event_id": "event_789",
                            "type": "input_audio_buffer.commit"
                        }
                        ws.send(json.dumps(commit_event))
                    finish_event = {
                        "event_id": "event_987",
                        "type": "session.finish"
                    }
                    ws.send(json.dumps(finish_event))
                break
            if not ws.sock or not ws.sock.connected:
                logger.info("The WebSocket is closed. Stop sending audio.")
                break
            encoded_data = base64.b64encode(audio_data).decode('utf-8')
            eventd = {
                "event_id": f"event_{int(time.time() * 1000)}",
                "type": "input_audio_buffer.append",
                "audio": encoded_data
            }
            ws.send(json.dumps(eventd))
            logger.info(f"Sending audio event: {eventd['event_id']}")
            time.sleep(0.1)  # Simulate real-time capture


# Initialize logging
init_logger()
logger.info(f"Connecting to WebSocket server at {url}...")
local_audio_path = "your_audio_file.pcm"
ws = websocket.WebSocketApp(
    url,
    header=headers,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
    on_close=on_close
)
thread = threading.Thread(target=send_audio, args=(ws, local_audio_path))
thread.start()
ws.run_forever()
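Going to production
Improve recognition accuracy
- Choose a model that matches the sample rate: for 8 kHz telephone audio, use an 8 kHz model directly instead of upsampling to 16 kHz before recognition. Upsampling distorts the signal and degrades recognition.
- Use custom vocabulary: for business-specific terms, personal names, and brand names, configure a custom vocabulary to improve recognition accuracy noticeably. For details, see Custom vocabulary.
- Improve input audio quality: use a high-quality microphone where possible, and aim for a high signal-to-noise ratio (SNR) and an echo-free recording environment. At the application layer, you can preprocess the audio with noise suppression (such as RNNoise) and acoustic echo cancellation (AEC) to obtain a cleaner signal.
- Specify the recognition language: for multilingual models, stating the audio language up front when it is known helps the model converge quickly and avoids confusion between similar-sounding languages, which improves accuracy.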
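Set up fault tolerance
- Client reconnection: the client should implement automatic reconnection to handle network jitter. For the Python SDK, we recommend the following:
  - Catch exceptions: implement the on_error method in your Callback class. The dashscope SDK calls this method when a network error or other exception occurs.
  - Signal the state: when on_error fires, set a reconnect signal. In Python, you can use the thread-safe flag threading.Event.
  - Reconnect loop: wrap the main logic in a for loop (for example, with 3 retries). When the reconnect signal is detected, abort the current recognition, clean up resources, wait a few seconds, and restart the loop to establish a new connection.
- Set a heartbeat to keep the connection open: to maintain a persistent connection to the server, set the heartbeat parameter to true. The connection then stays open even when the audio is silent for a long time.
- Rate limiting: when calling the model API, observe the model's rate limits.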
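The reconnection steps above can be sketched as a small retry wrapper. This is a minimal illustration rather than SDK code: `run_with_reconnect` and the `run_session` callable are hypothetical names, and `run_session` stands in for whatever function starts one recognition session and sets the event from its `on_error` callback.

```python
import threading
import time
from typing import Callable


def run_with_reconnect(run_session: Callable[[threading.Event], None],
                       max_retries: int = 3,
                       wait_seconds: float = 2.0) -> bool:
    """Run run_session, restarting it whenever it sets the reconnect signal.

    run_session is a hypothetical function that performs one recognition
    session and sets the given event (for example, from on_error) when the
    connection fails. Returns True if a session ended without signalling.
    """
    for attempt in range(1, max_retries + 1):
        reconnect = threading.Event()  # fresh thread-safe flag per attempt
        try:
            run_session(reconnect)
        except Exception as exc:  # treat unexpected exceptions as failures too
            print(f"attempt {attempt} raised: {exc}")
            reconnect.set()
        if not reconnect.is_set():
            return True  # session completed normally
        if attempt < max_retries:
            time.sleep(wait_seconds)  # back off before reconnecting
    return False
```

Inside `run_session`, resources such as the recognizer and the audio stream should be released before returning, so that each attempt starts from a clean state.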
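Core feature: context enhancement (Qwen-ASR)
Providing context improves recognition of domain-specific vocabulary, such as personal names, place names, and product terms.
Length limit: the context must not exceed 10,000 tokens.
How to use:
- WebSocket API: set the session.input_audio_transcription.corpus.text parameter in the session.update event.
- Python SDK: set the corpus_text parameter.
- Java SDK: set the corpusText parameter.
The context text can take forms such as:
- Hotword lists with any kind of delimiter, for example: hotword 1, hotword 2, hotword 3, hotword 4
- Text passages or chapters of any format and length
- Mixed content: any combination of word lists and passages
- Irrelevant or meaningless text, including garbled characters. The feature is highly fault-tolerant, and irrelevant text has almost no negative effect.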
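For the WebSocket API, the context goes into the session.update event. The sketch below builds such an event, reusing the session fields from the WebSocket example in this guide and adding the corpus text at session.input_audio_transcription.corpus.text. The function name `build_session_update` and the event ID are illustrative assumptions.

```python
import json


def build_session_update(corpus_text: str, language: str = "zh") -> str:
    """Return a session.update event (as JSON) that carries context text."""
    event = {
        "event_id": "event_corpus_example",  # arbitrary illustrative ID
        "type": "session.update",
        "session": {
            "modalities": ["text"],
            "input_audio_format": "pcm",
            "sample_rate": 16000,
            "input_audio_transcription": {
                "language": language,
                # Context used to bias recognition toward domain terms
                "corpus": {"text": corpus_text},
            },
        },
    }
    return json.dumps(event, ensure_ascii=False)
```

A call such as `build_session_update("Bulge Bracket, Boutique, Middle Market")` returns a string that can be passed to `ws.send(...)` once the connection is open. The sketch does not check the 10,000-token limit, which the caller must respect.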
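| Without context enhancement | With context enhancement |
|---|---|
| Without context enhancement, some investment bank names may be misrecognized. For example, "Bulge Bracket" is recognized as "鸟石". Recognition result: "你了解哪些投行圈的内部黑话?首先是九大外资投行,即鸟石,BB……" | With context enhancement, the investment bank names are recognized correctly. Recognition result: "你了解哪些投行圈的内部黑话?首先是九大外资投行,即 Bulge Bracket,BB……" |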
- Vocabulary lists:
- Vocabulary list 1:
Bulge Bracket, Boutique, Middle Market, domestic securities firms
- Vocabulary list 2:
Bulge Bracket Boutique Middle Market domestic securities firms
- Vocabulary list 3:
['Bulge Bracket', 'Boutique', 'Middle Market', 'domestic securities firms']
- Natural language:
Investment Banking Categories Revealed!
Recently, many friends from Australia have asked me, what exactly is an investment bank? Today, I'll explain it. For international students, investment banks can be mainly divided into four categories: Bulge Bracket, Boutique, Middle Market, and domestic securities firms.
Bulge Bracket Investment Banks: These are what we often call the nine major investment banks, including Goldman Sachs, Morgan Stanley, etc. These large banks are enormous in both business scope and scale.
Boutique Investment Banks: These banks are relatively small but highly specialized in their business areas. For example, Lazard, Evercore, etc., have deep professional knowledge and experience in specific fields.
Middle Market Investment Banks: This type of bank mainly serves medium-sized companies, providing services such as mergers and acquisitions, and IPOs. Although not as large as the major banks, they have a high influence in specific markets.
Domestic Securities Firms: With the rise of the Chinese market, domestic securities firms are also playing an increasingly important role in the international market.
In addition, there are some Position and business divisions, you can refer to the relevant charts. I hope this information helps you better understand investment banking and prepare for your future career!
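- Natural language with distracting content: part of the text is unrelated to the audio being recognized, such as the list of personal names in the following example.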
Investment Banking Categories Revealed!
Recently, many friends from Australia have asked me, what exactly is an investment bank? Today, I'll explain it. For international students, investment banks can be mainly divided into four categories: Bulge Bracket, Boutique, Middle Market, and domestic securities firms.
Bulge Bracket Investment Banks: These are what we often call the nine major investment banks, including Goldman Sachs, Morgan Stanley, etc. These large banks are enormous in both business scope and scale.
Boutique Investment Banks: These banks are relatively small but highly specialized in their business areas. For example, Lazard, Evercore, etc., have deep professional knowledge and experience in specific fields.
Middle Market Investment Banks: This type of bank mainly serves medium-sized companies, providing services such as mergers and acquisitions, and IPOs. Although not as large as the major banks, they have a high influence in specific markets.
Domestic Securities Firms: With the rise of the Chinese market, domestic securities firms are also playing an increasingly important role in the international market.
In addition, there are some Position and business divisions, you can refer to the relevant charts. I hope this information helps you better understand investment banking and prepare for your future career!
Wang Haoxuan, Li Zihan, Zhang Jingxing, Liu Xinyi, Chen Junjie, Yang Siyuan, Zhao Yutong, Huang Zhiqiang, Zhou Zimo, Wu Yajing, Xu Ruoxi, Sun Haoran, Hu Jinyu, Zhu Chenxi, Guo Wenbo, He Jingshu, Gao Yuhang, Lin Yifei,
Zheng Xiaoyan, Liang Bowen, Luo Jiaqi, Song Mingzhe, Xie Wanting, Tang Ziqian, Han Mengyao, Feng Yiran, Cao Qinxue, Deng Zirui, Xiao Wangshu, Xu Jiashu,
Cheng Yinuo, Yuan Zhiruo, Peng Haoyu, Dong Simiao, Fan Jingyu, Su Zijin, Lv Wenxuan, Jiang Shihan, Ding Muchen,
Wei Shuyao, Ren Tianyou, Jiang Yichen, Hua Qingyu, Shen Xinghe, Fu Jinyu, Yao Xingchen, Zhong Lingyu, Yan Licheng, Jin Ruoshui, Taoranting, Qi Shaoshang, Xue Zhilan, Zou Yunfan, Xiong Ziang, Bai Wenfeng, Yi Qianfan
API reference
- Fun-ASR
- Qwen-ASR
Interaction flow (Qwen-ASR-Realtime)
Qwen real-time speech recognition streams audio over WebSocket. It offers two modes: VAD mode (default) and manual mode.
URL
Replace <model_name> with your model name.
wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=<model_name>
Request header
"Authorization": "Bearer $DASHSCOPE_API_KEY"
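As a small sketch of how the URL and request header fit together on the client side (the helper name `build_connection` is an assumption, and the header list uses the "Name: value" string form that the websocket-client example in this guide passes to `WebSocketApp`):

```python
from urllib.parse import urlencode

BASE_URL = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"


def build_connection(model_name: str, api_key: str):
    """Return the connection URL and header list for a realtime session."""
    # The model is selected through the "model" query parameter
    url = f"{BASE_URL}?{urlencode({'model': model_name})}"
    headers = [f"Authorization: Bearer {api_key}"]
    return url, headers
```

The returned values can be passed as `websocket.WebSocketApp(url, header=headers, ...)`. Read the API key from the DASHSCOPE_API_KEY environment variable rather than hard-coding it.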
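VAD mode (default)
The server detects speech boundaries and splits sentences automatically. The client streams audio, and the server returns the recognition result at the end of each sentence. This mode suits conversation and meeting transcription.
To enable it: set session.turn_detection in the session.update event.
- The client sends input_audio_buffer.append to append audio to the buffer.
- When the server detects speech, it returns input_audio_buffer.speech_started. If the client sent session.finish before this event, the server returns session.finished, and the client must disconnect.
- The client continues to send input_audio_buffer.append.
- After all audio has been sent, the client sends session.finish to end the session.
- When the server detects the end of speech, it returns input_audio_buffer.speech_stopped.
- The server returns input_audio_buffer.committed.
- The server returns conversation.item.created.
- The server returns conversation.item.input_audio_transcription.text, which contains the real-time transcription result.
- The server returns conversation.item.input_audio_transcription.completed, which contains the final transcription result.
- After recognition completes, the server returns session.finished, and the client must disconnect.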
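On the client side, the server events in this flow can be handled with a small dispatcher. The sketch below is illustrative: the class name `VadTranscriptCollector` is an assumption, and the `text` and `transcript` field names follow the Java SDK sample in this guide rather than a full event schema.

```python
from typing import List, Optional


class VadTranscriptCollector:
    """Track partial and final transcripts from VAD-mode server events."""

    def __init__(self) -> None:
        self.partial: Optional[str] = None   # latest real-time result
        self.sentences: List[str] = []       # final result per sentence
        self.finished = False                # True once the client must disconnect

    def handle(self, event: dict) -> None:
        event_type = event.get("type")
        if event_type == "conversation.item.input_audio_transcription.text":
            self.partial = event.get("text")
        elif event_type == "conversation.item.input_audio_transcription.completed":
            self.sentences.append(event.get("transcript", ""))
            self.partial = None  # the sentence is final; clear the partial
        elif event_type == "session.finished":
            self.finished = True
        # Other events (speech_started, speech_stopped, committed, ...) are ignored
```

In an `on_message` callback, parse each message with `json.loads`, pass it to `handle`, and close the WebSocket once `finished` becomes true.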
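Manual mode
The client controls sentence boundaries: it sends the complete audio for one sentence and then sends input_audio_buffer.commit. This mode suits scenarios where the client already knows the sentence boundaries, such as voice messages in a chat application.
To enable it: set session.turn_detection to null in the session.update event.
- The client sends input_audio_buffer.append to append audio to the buffer.
- The client sends input_audio_buffer.commit to create a new user message.
- The client sends session.finish to end the session.
- The server returns input_audio_buffer.committed.
- The server returns conversation.item.input_audio_transcription.text, which contains the real-time transcription result.
- The server returns conversation.item.input_audio_transcription.completed, which contains the final transcription result.
- After recognition completes, the server returns session.finished, and the client must disconnect.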
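The client half of the manual-mode flow is a fixed sequence: append events for the audio, then a commit, then session.finish. A minimal sketch of that sequence follows; the generator name `manual_mode_events` and the event IDs are illustrative assumptions.

```python
import base64
import json
from typing import Iterable, Iterator


def manual_mode_events(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the JSON events a client sends for one sentence in manual mode."""
    for index, chunk in enumerate(chunks):
        # Audio is sent Base64-encoded inside input_audio_buffer.append
        yield json.dumps({
            "event_id": f"event_append_{index}",
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    # Mark the end of the sentence, then end the session
    yield json.dumps({"event_id": "event_commit", "type": "input_audio_buffer.commit"})
    yield json.dumps({"event_id": "event_finish", "type": "session.finish"})
```

Each yielded string can be sent with `ws.send(...)`, with a short sleep between append events to pace the stream. The sample code in this guide notes that in non-VAD mode the cumulative duration of continuously sent audio must not exceed 60 seconds.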
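Alternative: use Qwen-Omni
You can also use Qwen-Omni (qwen3-omni-flash-realtime) for real-time speech recognition over WebSocket. Qwen-Omni is a large language model that understands audio, so you can supply domain context through a system prompt instead of a hotword list.
When to use Omni for ASR: the input audio is clean (microphone, voice calls), and you need prompts to handle domain-specific terms.
When to use a dedicated ASR model: the audio is noisy or mixed (meetings with background music, videos with sound effects), or you need features such as hotwords, speaker diarization, or timestamps.
Qwen-Omni processes all audio content, not just speech. Music, typing sounds, or ambient noise may produce descriptive text instead of a transcription. For mixed audio, isolate the speech with VAD beforehand or use a dedicated ASR model.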
messages = [
{"role": "system", "content": "Transcribe the following audio exactly as spoken. Output only the transcription text. Ignore non-speech sounds."},
{"role": "user", "content": [{"type": "input_audio", "input_audio": {"data": audio_data, "format": "wav"}}]}
]
Qwen-Omni-Realtime uses WebSocket for bidirectional streaming. For the full API and SDK reference, see Real-time conversation.