Real-time streaming speech synthesis
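Real-time speech synthesis converts text into natural-sounding speech in real time over the WebSocket protocol. Qianwen Cloud provides the CosyVoice, Qwen-TTS, and Sambert model series. These models support streaming input and output, and the service also offers voice cloning, voice design, and fine-grained audio control. Typical scenarios include voice assistants, audiobooks, and intelligent customer service.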
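For the lowest latency, use streaming output with the PCM format. PCM has no encoding overhead and can be sent directly to an audio device for playback.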
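PCM is headerless sample data, so a `.pcm` file saved from the stream usually will not open in ordinary media players until it is wrapped in a container. The following is a minimal sketch using Python's standard `wave` module. It assumes 16-bit little-endian mono PCM (matching what the Java examples configure with `AudioFormat(..., 16, 1, true, false)`) and a sample rate that matches the requested format, for example 22050 Hz for `PCM_22050HZ_MONO_16BIT`. The helper name `pcm_to_wav` is hypothetical:

```python
import wave


def pcm_to_wav(pcm_bytes: bytes, wav_path: str, sample_rate: int = 22050) -> None:
    """Wrap raw 16-bit little-endian mono PCM in a WAV container (hypothetical helper)."""
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 16-bit samples = 2 bytes each
        wav_file.setframerate(sample_rate)  # must match the format you requested
        wav_file.writeframes(pcm_bytes)


if __name__ == "__main__":
    # One second of silence at 22050 Hz: 22050 samples x 2 bytes
    pcm_to_wav(b"\x00\x00" * 22050, "silence.wav")
    with wave.open("silence.wav", "rb") as check:
        print(check.getnframes(), check.getframerate())  # 22050 22050
```

To wrap the `result_24k.pcm` files written by the Qwen-TTS examples, call the same helper with `sample_rate=24000`.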
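The following sections provide sample code for calling the API. Before you start, obtain an API Key and set it as an environment variable. If you use the SDK, install it first. For complete code examples, see Quick start. For examples that cover more common scenarios, see GitHub.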
Core features
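- Generates high-fidelity speech in real time, with natural pronunciation in Chinese, English, and other languages
- Offers two ways to customize voices: voice cloning and voice design
- Supports streaming input and output with low first-packet latency, which suits real-time conversation
- Lets you adjust speech rate, pitch, volume, and bitrate for fine-grained control over the output
- Supports mainstream audio formats (PCM, WAV, MP3, Opus), with output sample rates up to 48 kHz
- Supports instruction control: describe the desired expressiveness in natural language (Qwen-TTS Instruct series and some CosyVoice models only)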
Availability
Supported models (use an API Key when calling these models):
- CosyVoice: cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-plus, cosyvoice-v3-flash, cosyvoice-v2, cosyvoice-v1
- Qwen-TTS: qwen3-tts-flash-realtime, qwen3-tts-instruct-flash-realtime, qwen3-tts-vd-realtime, qwen3-tts-vc-realtime, qwen-tts-realtime
- Sambert: see the speech synthesis model list
Quick start
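Before writing code, choose the invocation mode that fits your scenario:

| Invocation mode | Use case | Streaming support |
|---|---|---|
| Non-streaming (synchronous) | Batch jobs, short text, generating complete audio files | No |
| Streaming output (unidirectional) | Real-time applications that are sensitive to first-packet latency | Yes |
| Streaming input + output (bidirectional, WebSocket) | Conversational AI, LLM voice output, interactive voice assistants | Yes |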
Examples are provided for two model families: CosyVoice and Qwen-TTS-Realtime.
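CosyVoice

cosyvoice-v3.5-plus and cosyvoice-v3.5-flash are intended only for voice design and voice cloning, and they have no system voices. Before you use them for speech synthesis, create the target voice with the CosyVoice voice cloning/design API. Then set the voice field in the code to your voice ID and the model field to the corresponding model. The code will then run normally.

Synthesize speech with a system voice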
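The following examples use system voices (see the CosyVoice voice list). For non-real-time synthesis, where you send the complete text and receive the complete audio, see Non-real-time speech synthesis.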
Convert LLM-generated text to speech and play it in real time
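Synthesize the output text of a Qwen model (qwen3.5-flash) into speech in real time and play it on the local device. Before you run the Python example, use pip to install the third-party audio playback library (PyAudio).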
```python
# coding=utf-8
# PyAudio installation:
# macOS
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio
import os
from http import HTTPStatus

import pyaudio
import dashscope
from dashscope import Generation
from dashscope.audio.tts_v2 import *

# If the environment variable is not set, replace the next line with your API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
dashscope.base_websocket_api_url = 'wss://dashscope.aliyuncs.com/api-ws/v1/inference'

# cosyvoice-v3-flash/cosyvoice-v3-plus: voices such as longanyang are available.
# Each voice supports different languages. To synthesize Japanese, Korean, or other non-Chinese
# languages, choose a voice that supports the target language. See the CosyVoice voice list.
model = "cosyvoice-v3-flash"
voice = "longanyang"


class Callback(ResultCallback):
    _player = None
    _stream = None

    def on_open(self):
        print("websocket is open.")
        # Open an output stream that matches PCM_22050HZ_MONO_16BIT
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=22050, output=True
        )

    def on_complete(self):
        print("speech synthesis task completed successfully.")

    def on_error(self, message: str):
        print(f"speech synthesis task failed, {message}")

    def on_close(self):
        print("websocket is closed.")
        # Stop playback and release the audio device
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()

    def on_event(self, message):
        print(f"recv speech synthesis message {message}")

    def on_data(self, data: bytes) -> None:
        print("audio result length:", len(data))
        self._stream.write(data)


def synthesizer_with_llm():
    callback = Callback()
    synthesizer = SpeechSynthesizer(
        model=model,
        voice=voice,
        format=AudioFormat.PCM_22050HZ_MONO_16BIT,
        callback=callback,
    )
    messages = [{"role": "user", "content": "Please introduce yourself"}]
    responses = Generation.call(
        model="qwen3.5-flash",
        messages=messages,
        result_format="message",  # return results in message format
        stream=True,  # enable streaming output
        incremental_output=True,  # each chunk contains only the newly generated text
    )
    for response in responses:
        if response.status_code == HTTPStatus.OK:
            text = response.output.choices[0]["message"]["content"]
            print(text, end="")
            if text:  # skip empty chunks
                synthesizer.streaming_call(text)
        else:
            print(
                "Request id: %s, Status code: %s, error code: %s, error message: %s"
                % (
                    response.request_id,
                    response.status_code,
                    response.code,
                    response.message,
                )
            )
    # Signal the end of the text stream and wait for the remaining audio
    synthesizer.streaming_complete()
    print('requestId: ', synthesizer.get_last_request_id())


if __name__ == "__main__":
    synthesizer_with_llm()
```
```java
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import javax.sound.sampled.*;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

public class Main {
    private static String ttsModel = "cosyvoice-v3-flash";
    private static String voice = "longanyang";

    public static void synthesizerWithLlm() throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        // Configure audio playback (PCM, 22050 Hz, mono, 16-bit, little-endian)
        AudioFormat audioFormat = new AudioFormat(22050, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
        SourceDataLine speakers = (SourceDataLine) AudioSystem.getLine(info);
        speakers.open(audioFormat);
        speakers.start();
        // Configure the TTS callback
        ResultCallback<SpeechSynthesisResult> callback = new ResultCallback<SpeechSynthesisResult>() {
            @Override
            public void onEvent(SpeechSynthesisResult result) {
                if (result.getAudioFrame() != null) {
                    // Write each audio frame straight to the speakers
                    byte[] audio = result.getAudioFrame().array();
                    speakers.write(audio, 0, audio.length);
                }
            }

            @Override
            public void onComplete() {
                System.out.println("Speech synthesis completed.");
                latch.countDown();
            }

            @Override
            public void onError(Exception e) {
                System.err.println("TTS error: " + e.getMessage());
                latch.countDown();
            }
        };
        // Initialize the TTS synthesizer
        SpeechSynthesisParam ttsParam = SpeechSynthesisParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(ttsModel)
                .voice(voice)
                .format(SpeechSynthesisAudioFormat.PCM_22050HZ_MONO_16BIT)
                .build();
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(ttsParam, callback);
        // Create the Qwen LLM client (HTTP)
        Generation gen = new Generation(
                Protocol.HTTP.getValue(),
                "https://dashscope.aliyuncs.com/api/v1");
        Message userMsg = Message.builder()
                .role(Role.USER.getValue())
                .content("Please introduce yourself")
                .build();
        GenerationParam llmParam = GenerationParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3.5-flash")
                .messages(Arrays.asList(userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .incrementalOutput(true)
                .build();
        // Stream the LLM output into TTS as it arrives
        gen.streamCall(llmParam).blockingForEach(result -> {
            String text = result.getOutput().getChoices().get(0).getMessage().getContent();
            if (text != null && !text.isEmpty()) {
                System.out.print(text);
                synthesizer.streamingCall(text);
            }
        });
        // Signal the end of the text stream
        synthesizer.streamingComplete();
        latch.await();
        // Release resources
        speakers.drain();
        speakers.close();
        synthesizer.getDuplexApi().close(1000, "bye");
    }

    public static void main(String[] args) throws Exception {
        Constants.baseWebsocketApiUrl = "wss://dashscope.aliyuncs.com/api-ws/v1/inference";
        synthesizerWithLlm();
        System.exit(0);
    }
}
```
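Both LLM examples follow one contract: call `streaming_call` once for each non-empty text delta, then call `streaming_complete` exactly once at the end. The following minimal sketch isolates that forwarding logic so you can check it without network access. `RecordingSynthesizer` and `forward_deltas` are hypothetical names. The stub only mimics the two `SpeechSynthesizer` methods used above and performs no synthesis:

```python
from typing import Iterable, List, Optional


class RecordingSynthesizer:
    """Hypothetical stand-in for SpeechSynthesizer that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def streaming_call(self, text: str) -> None:
        self.calls.append(f"streaming_call:{text}")

    def streaming_complete(self) -> None:
        self.calls.append("streaming_complete")


def forward_deltas(deltas: Iterable[Optional[str]], synthesizer) -> None:
    """Forward each non-empty LLM delta, then signal the end of input once."""
    for delta in deltas:
        if delta:  # incremental output may yield empty (or None) content
            synthesizer.streaming_call(delta)
    synthesizer.streaming_complete()


if __name__ == "__main__":
    stub = RecordingSynthesizer()
    forward_deltas(["Hello", "", ", I am", " Qwen."], stub)
    print(stub.calls)
```

Because `forward_deltas` depends only on those two methods, you can pass the real `SpeechSynthesizer` instance in its place.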
Receive audio incrementally through callbacks
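Send the complete text and receive audio data incrementally through callback functions. This approach suits short text and delivers low-latency audio output without blocking the main thread.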
```python
# coding=utf-8
import os
from datetime import datetime

import dashscope
from dashscope.audio.tts_v2 import *


def get_timestamp():
    now = datetime.now()
    formatted_timestamp = now.strftime("[%Y-%m-%d %H:%M:%S.%f]")
    return formatted_timestamp


# If the environment variable is not set, replace the next line with your API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
dashscope.base_websocket_api_url = 'wss://dashscope.aliyuncs.com/api-ws/v1/inference'

# Model
model = "cosyvoice-v3-flash"
# Voice
voice = "longanyang"


# Define the callback interface
class Callback(ResultCallback):
    def on_open(self):
        self.file = open("output.mp3", "wb")
        print("Connection established: " + get_timestamp())

    def on_complete(self):
        print("Speech synthesis complete; all results received: " + get_timestamp())
        # get_first_package_delay can be called only after on_complete has fired.
        # For the first request, the first-packet delay includes WebSocket connection setup.
        print('[Metric] requestId: {}, first-package delay: {} ms'.format(
            synthesizer.get_last_request_id(),
            synthesizer.get_first_package_delay()))

    def on_error(self, message: str):
        print(f"Speech synthesis error: {message}")

    def on_close(self):
        print("Connection closed: " + get_timestamp())
        self.file.close()

    def on_event(self, message):
        pass

    def on_data(self, data: bytes) -> None:
        print(get_timestamp() + " audio data length: " + str(len(data)))
        self.file.write(data)


callback = Callback()
# Instantiate SpeechSynthesizer and pass request parameters such as model and voice to the constructor
synthesizer = SpeechSynthesizer(
    model=model,
    voice=voice,
    callback=callback,
)
# Send the text to synthesize; binary audio arrives in real time through the callback's on_data method
synthesizer.call("How is the weather today?")
```
```java
import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.CountDownLatch;

class TimeUtils {
    private static final DateTimeFormatter formatter =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    public static String getTimestamp() {
        return LocalDateTime.now().format(formatter);
    }
}

public class Main {
    private static String model = "cosyvoice-v3-flash";
    private static String voice = "longanyang";

    public static void streamAudioDataToSpeaker() {
        CountDownLatch latch = new CountDownLatch(1);
        // Implement the ResultCallback interface
        ResultCallback<SpeechSynthesisResult> callback = new ResultCallback<SpeechSynthesisResult>() {
            @Override
            public void onEvent(SpeechSynthesisResult result) {
                if (result.getAudioFrame() != null) {
                    // Add your audio processing logic here
                    System.out.println(TimeUtils.getTimestamp() + " Audio received");
                }
            }

            @Override
            public void onComplete() {
                System.out.println(TimeUtils.getTimestamp() + " All audio received; speech synthesis finished.");
                latch.countDown();
            }

            @Override
            public void onError(Exception e) {
                System.out.println("Exception: " + e.toString());
                latch.countDown();
            }
        };
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        // If the environment variable is not set, replace the next line with your API key: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model(model)
                        .voice(voice)
                        .build();
        // Passing the callback as the second argument enables asynchronous mode
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, callback);
        try {
            // Non-blocking call: returns null immediately; results arrive through onEvent
            synthesizer.call("What's the weather like today?");
            latch.await();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Close the WebSocket connection when the task is done
            synthesizer.getDuplexApi().close(1000, "bye");
        }
        // For the first call, the first-packet latency includes WebSocket connection setup
        System.out.println(
                "[Metric] Request ID: "
                        + synthesizer.getLastRequestId()
                        + ", First-packet latency (ms): "
                        + synthesizer.getFirstPackageDelay());
    }

    public static void main(String[] args) {
        Constants.baseWebsocketApiUrl = "wss://dashscope.aliyuncs.com/api-ws/v1/inference";
        streamAudioDataToSpeaker();
        System.exit(0);
    }
}
```
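`get_first_package_delay()` reports the SDK's own measurement. If you also want a client-side figure, for example to log next to your own timestamps, a small timer can record the interval between sending text and the first `on_data` call. The following is a minimal sketch. `FirstChunkTimer` is a hypothetical helper, and because it measures on the client, its value may not match the SDK's number exactly:

```python
import time
from typing import Optional


class FirstChunkTimer:
    """Hypothetical helper: time from sending text to the first audio chunk, in ms."""

    def __init__(self) -> None:
        self._start: Optional[float] = None
        self.first_chunk_ms: Optional[float] = None

    def mark_sent(self) -> None:
        # Call right before synthesizer.call(...) or the first streaming_call(...)
        self._start = time.perf_counter()
        self.first_chunk_ms = None

    def on_data(self, data: bytes) -> None:
        # Call from the callback's on_data; only the first chunk after mark_sent counts
        if self._start is not None and self.first_chunk_ms is None:
            self.first_chunk_ms = (time.perf_counter() - self._start) * 1000


if __name__ == "__main__":
    timer = FirstChunkTimer()
    timer.mark_sent()
    time.sleep(0.05)               # stands in for network and synthesis time
    timer.on_data(b"\x00" * 320)
    timer.on_data(b"\x00" * 320)   # later chunks do not change the measurement
    print(f"first chunk after {timer.first_chunk_ms:.1f} ms")
```

In the Python callback example, you would call `timer.mark_sent()` just before `synthesizer.call(...)` and `timer.on_data(data)` inside `Callback.on_data`.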
Synthesize streaming text in real time
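Send text fragments incrementally and receive audio in real time through callbacks. This bidirectional streaming approach suits long text and other scenarios where text arrives in segments, such as integration with large language models.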
```python
# coding=utf-8
#
# PyAudio installation:
# macOS:
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu:
#   sudo apt-get install python3-pyaudio
#   or
#   pip install pyaudio
# CentOS:
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Windows:
#   python -m pip install pyaudio
import os
import time
from datetime import datetime

import pyaudio
import dashscope
from dashscope.audio.tts_v2 import *


def get_timestamp():
    now = datetime.now()
    formatted_timestamp = now.strftime("[%Y-%m-%d %H:%M:%S.%f]")
    return formatted_timestamp


# If the environment variable is not set, replace the next line with your API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
dashscope.base_websocket_api_url = 'wss://dashscope.aliyuncs.com/api-ws/v1/inference'

# Model
model = "cosyvoice-v3-flash"
# Voice
voice = "longanyang"


# Define the callback interface
class Callback(ResultCallback):
    _player = None
    _stream = None

    def on_open(self):
        print("Connection established: " + get_timestamp())
        # Open an output stream that matches PCM_22050HZ_MONO_16BIT
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=22050, output=True
        )

    def on_complete(self):
        print("Speech synthesis complete; all results received: " + get_timestamp())

    def on_error(self, message: str):
        print(f"Speech synthesis error: {message}")

    def on_close(self):
        print("Connection closed: " + get_timestamp())
        # Stop the player
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()

    def on_event(self, message):
        pass

    def on_data(self, data: bytes) -> None:
        print(get_timestamp() + " audio data length: " + str(len(data)))
        self._stream.write(data)


callback = Callback()
test_text = [
    "The streaming text-to-speech SDK ",
    "can convert input text ",
    "into binary audio data. ",
    "Compared with non-streaming speech synthesis, ",
    "streaming synthesis offers better real-time performance. ",
    "Users hear nearly synchronous audio output while text is still being entered, ",
    "which greatly improves the interactive experience ",
    "and reduces waiting time. ",
    "It is ideal for integrating with large language models (LLMs), ",
    "streaming their text into speech synthesis.",
]
# Instantiate SpeechSynthesizer and pass request parameters such as model and voice to the constructor
synthesizer = SpeechSynthesizer(
    model=model,
    voice=voice,
    format=AudioFormat.PCM_22050HZ_MONO_16BIT,
    callback=callback,
)
# Send the text in fragments; binary audio arrives in real time through the callback's on_data method
for text in test_text:
    synthesizer.streaming_call(text)
    time.sleep(0.1)
# End the streaming synthesis and wait for the remaining audio
synthesizer.streaming_complete()
# For the first request, the first-packet delay includes WebSocket connection setup
print('[Metric] requestId: {}, first-package delay: {} ms'.format(
    synthesizer.get_last_request_id(),
    synthesizer.get_first_package_delay()))
```
```java
import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class TimeUtils {
    private static final DateTimeFormatter formatter =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    public static String getTimestamp() {
        return LocalDateTime.now().format(formatter);
    }
}

public class Main {
    private static String[] textArray = {"The streaming text-to-speech SDK ",
            "can convert input text ", "into binary audio data. ",
            "Compared with non-streaming speech synthesis, ",
            "streaming synthesis offers better real-time performance. ",
            "Users hear nearly synchronous audio output while text is still being entered, ",
            "which greatly improves the interactive experience ", "and reduces waiting time. ",
            "It is ideal for integrating with large language models ", "(LLMs), ",
            "streaming their text into speech synthesis."};
    private static String model = "cosyvoice-v3-flash";
    private static String voice = "longanyang";

    public static void streamAudioDataToSpeaker() {
        // Configure the callback
        ResultCallback<SpeechSynthesisResult> callback = new ResultCallback<SpeechSynthesisResult>() {
            @Override
            public void onEvent(SpeechSynthesisResult result) {
                if (result.getAudioFrame() != null) {
                    // Add your audio processing logic here
                    System.out.println(TimeUtils.getTimestamp() + " Audio received");
                }
            }

            @Override
            public void onComplete() {
                System.out.println(TimeUtils.getTimestamp() + " All audio received; speech synthesis finished.");
            }

            @Override
            public void onError(Exception e) {
                System.out.println("Exception: " + e.toString());
            }
        };
        // Request parameters
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        // If the environment variable is not set, replace the next line with your API key: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model(model)
                        .voice(voice)
                        .format(SpeechSynthesisAudioFormat
                                .PCM_22050HZ_MONO_16BIT) // use PCM or MP3 for streaming synthesis
                        .build();
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, callback);
        try {
            for (String text : textArray) {
                // Send a text fragment; audio is returned in real time through onEvent
                synthesizer.streamingCall(text);
            }
            // Signal the end of the text and wait for streaming synthesis to finish
            synthesizer.streamingComplete();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Close the WebSocket connection when the task is done
            synthesizer.getDuplexApi().close(1000, "bye");
        }
        // For the first call, the first-packet latency includes WebSocket connection setup
        System.out.println(
                "[Metric] Request ID: "
                        + synthesizer.getLastRequestId()
                        + ", First-packet latency (ms): "
                        + synthesizer.getFirstPackageDelay());
    }

    public static void main(String[] args) {
        Constants.baseWebsocketApiUrl = "wss://dashscope.aliyuncs.com/api-ws/v1/inference";
        streamAudioDataToSpeaker();
        System.exit(0);
    }
}
```
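The examples above send hand-picked fragments. When the text comes from another source, one option is to cut it at punctuation before passing the pieces to `streaming_call`. The following is a minimal sketch that assumes clause-level fragments are a reasonable unit for your use case. `split_for_streaming` and its 50-character cap are arbitrary illustrative choices, not SDK limits. The rule is deliberately naive, so it also splits inside numbers such as `3.5`:

```python
import re
from typing import List

# Zero-width split point after Chinese or English clause/sentence punctuation (Python 3.7+)
_BOUNDARY = re.compile(r"(?<=[,。!?;,.!?;])")


def split_for_streaming(text: str, max_len: int = 50) -> List[str]:
    """Split text after punctuation, then hard-wrap any piece longer than max_len."""
    pieces = [p for p in _BOUNDARY.split(text) if p.strip()]
    fragments: List[str] = []
    for piece in pieces:
        while len(piece) > max_len:
            fragments.append(piece[:max_len])
            piece = piece[max_len:]
        fragments.append(piece)
    return fragments


if __name__ == "__main__":
    for fragment in split_for_streaming("It is fast, and it pairs well with LLMs!"):
        print(repr(fragment))
```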
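Qwen-TTS-Realtime

SDK version requirements: Python SDK 1.25.11 or later; Java SDK 2.22.7 or later. Before running the code, obtain an API Key and install the SDK. For more code examples, see GitHub.

You can synthesize speech with a system voice, a cloned voice, or a designed voice. The examples below use a system voice.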
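To check the Python SDK requirement programmatically, compare version components as integers rather than as strings. As strings, `"1.3.0"` sorts after `"1.25.11"`. The following is a minimal sketch that assumes the package is installed under the distribution name `dashscope` and uses plain numeric versions such as `1.25.11`. `parse_version` and `meets_minimum` are hypothetical helpers:

```python
from importlib import metadata
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.25.11' into (1, 25, 11); assumes purely numeric, dot-separated versions."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str = "1.25.11") -> bool:
    # Tuples compare element by element, so (1, 25, 11) > (1, 3, 0)
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("dashscope")
    except metadata.PackageNotFoundError:
        print("dashscope is not installed")
    else:
        print(installed, "OK" if meets_minimum(installed) else "too old")
```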
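For available voices, see the supported voice list. To use instruction control, set the model parameter to qwen3-tts-instruct-flash-realtime and specify the instruction with the instructions parameter.

The examples below cover the DashScope SDK (Python and Java) and the WebSocket API. The SDK examples come in two session modes. In server_commit mode, the server decides when to synthesize the appended text. In commit mode, the client calls commit to submit the buffered text for synthesis.

Python, server_commit mode: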
```python
import os
import base64
import threading
import time

import dashscope
from dashscope.audio.qwen_tts_realtime import *

qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
    'Right? I love supermarkets like this.',
    'Especially during Chinese New Year,',
    'I go shopping at supermarkets.',
    'And I feel',
    'absolutely thrilled!',
    'I want to buy so many things!'
]


def init_dashscope_api_key():
    """
    Set the DashScope API Key. For details, see:
    https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
    """
    # If the environment variable is not set, replace with: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')


class MyCallback(QwenTtsRealtimeCallback):
    def __init__(self):
        super().__init__()
        self.complete_event = threading.Event()
        self.file = open('result_24k.pcm', 'wb')

    def on_open(self) -> None:
        print('Connection opened')

    def on_close(self, close_status_code, close_msg) -> None:
        self.file.close()
        print('Connection closed, code: {}, message: {}'.format(close_status_code, close_msg))

    def on_event(self, response: dict) -> None:
        try:
            event_type = response['type']
            if event_type == 'session.created':
                print('Session started: {}'.format(response['session']['id']))
            if event_type == 'response.audio.delta':
                # Audio arrives Base64-encoded; decode it to raw PCM before writing
                recv_audio_b64 = response['delta']
                self.file.write(base64.b64decode(recv_audio_b64))
            if event_type == 'response.done':
                print(f'response {qwen_tts_realtime.get_last_response_id()} done')
            if event_type == 'session.finished':
                print('Session finished')
                self.complete_event.set()
        except Exception as e:
            print('[Error] {}'.format(e))

    def wait_for_finished(self):
        self.complete_event.wait()


if __name__ == '__main__':
    init_dashscope_api_key()
    print('Initializing...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # To use instruction control, change the model to qwen3-tts-instruct-flash-realtime
        model='qwen3-tts-flash-realtime',
        callback=callback,
        url='wss://dashscope.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice='Cherry',
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        # To use instruction control, uncomment the following lines and change the model to qwen3-tts-instruct-flash-realtime
        # instructions='Speak quickly with a rising intonation, suitable for introducing fashion products.',
        # optimize_instructions=True,
        mode='server_commit'
    )
    for text_chunk in text_to_synthesize:
        print(f'Sending text: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print('[Metric] session: {}, first audio delay: {}'.format(
        qwen_tts_realtime.get_session_id(),
        qwen_tts_realtime.get_first_audio_delay(),
    ))
```
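In these callbacks, audio arrives as Base64 text in the `delta` field of `response.audio.delta` events, and you must decode it before writing or playing it. The following is a minimal, network-free sketch of that dispatch step. The event dictionaries are hand-built samples that contain only the fields the examples read (`type`, `delta`, `session.id`), not captured server output. `handle_event` is a hypothetical name:

```python
import base64
from typing import Any, Dict


def handle_event(event: Dict[str, Any], audio: bytearray) -> str:
    """Append decoded PCM for response.audio.delta events and return the event type."""
    event_type = event.get("type", "")
    if event_type == "response.audio.delta":
        # The delta field carries Base64 text, not raw bytes
        audio.extend(base64.b64decode(event["delta"]))
    elif event_type == "session.created":
        print("session:", event["session"]["id"])
    return event_type


if __name__ == "__main__":
    # Hand-built sample events (not real server output)
    samples = [
        {"type": "session.created", "session": {"id": "sess_demo"}},
        {"type": "response.audio.delta", "delta": base64.b64encode(b"\x01\x00\x02\x00").decode()},
        {"type": "response.audio.delta", "delta": base64.b64encode(b"\x03\x00").decode()},
        {"type": "response.done"},
    ]
    pcm = bytearray()
    types = [handle_event(e, pcm) for e in samples]
    print(types)
    print(len(pcm), "bytes of PCM")  # 6 bytes of PCM
```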
Python, commit mode:

```python
import base64
import os
import threading

import dashscope
from dashscope.audio.qwen_tts_realtime import *

qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
    'This is the first sentence.',
    'This is the second sentence.',
    'This is the third sentence.',
]


def init_dashscope_api_key():
    """
    Set the DashScope API Key. For details, see:
    https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
    """
    # If the environment variable is not set, replace with: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')


class MyCallback(QwenTtsRealtimeCallback):
    def __init__(self):
        super().__init__()
        self.response_counter = 0
        self.file = None
        self.response_done = threading.Event()
        self.session_finished = threading.Event()

    def start_response(self):
        # Call before each commit: write each response to its own file and reset the event
        self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
        self.response_counter += 1
        self.response_done.clear()

    def on_open(self) -> None:
        print('Connection opened')

    def on_close(self, close_status_code, close_msg) -> None:
        print('Connection closed, code: {}, message: {}'.format(close_status_code, close_msg))

    def on_event(self, response: dict) -> None:
        try:
            event_type = response['type']
            if event_type == 'session.created':
                print('Session started: {}'.format(response['session']['id']))
            if event_type == 'response.audio.delta':
                # Audio arrives Base64-encoded; decode it to raw PCM before writing
                self.file.write(base64.b64decode(response['delta']))
            if event_type == 'response.done':
                print(f'response {qwen_tts_realtime.get_last_response_id()} done')
                self.file.close()
                self.response_done.set()
            if event_type == 'session.finished':
                print('Session finished')
                self.session_finished.set()
        except Exception as e:
            print('[Error] {}'.format(e))

    def wait_for_response_done(self):
        self.response_done.wait()

    def wait_for_session_finished(self):
        self.session_finished.wait()


if __name__ == '__main__':
    init_dashscope_api_key()
    print('Initializing...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # To use instruction control, change the model to qwen3-tts-instruct-flash-realtime
        model='qwen3-tts-flash-realtime',
        callback=callback,
        url='wss://dashscope.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice='Cherry',
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        # To use instruction control, uncomment the following lines and change the model to qwen3-tts-instruct-flash-realtime
        # instructions='Speak quickly with a rising intonation, suitable for introducing fashion products.',
        # optimize_instructions=True,
        mode='commit'
    )
    # In commit mode, each commit() submits the buffered text for synthesis
    for text in text_to_synthesize:
        callback.start_response()
        print(f'Sending text: {text}')
        qwen_tts_realtime.append_text(text)
        qwen_tts_realtime.commit()
        callback.wait_for_response_done()
    qwen_tts_realtime.finish()
    callback.wait_for_session_finished()
    print('[Metric] session: {}, first audio delay: {}'.format(
        qwen_tts_realtime.get_session_id(),
        qwen_tts_realtime.get_first_audio_delay(),
    ))
```
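In commit mode, the client waits for one `response.done` per `commit()` before sending the next text, as the commit examples above do. The essential ordering is: reset the signal, commit, then wait. If you reset after committing, you could erase a `response.done` that arrives quickly and leave the client waiting indefinitely. The following self-contained simulation of that handshake uses a thread to stand in for the server. `TurnGate`, `run_demo`, and `fake_server` are hypothetical names, and no SDK calls are made:

```python
import threading
import time
from typing import List


class TurnGate:
    """Hypothetical helper: one response.done signal per commit."""

    def __init__(self) -> None:
        self._done = threading.Event()

    def before_commit(self) -> None:
        # Reset BEFORE committing so a fast response.done cannot be erased
        self._done.clear()

    def on_response_done(self) -> None:
        # In real code this is called from the SDK's callback thread
        self._done.set()

    def wait(self, timeout: float = 30.0) -> bool:
        return self._done.wait(timeout)


def run_demo(texts: List[str]) -> List[str]:
    """Simulate commit mode: a fake 'server' thread answers each commit."""
    gate = TurnGate()
    log: List[str] = []

    def fake_server(text: str) -> None:
        time.sleep(0.01)  # stands in for synthesis time
        log.append(f"audio for {text}")
        gate.on_response_done()

    for text in texts:
        gate.before_commit()
        log.append(f"commit {text}")  # real code: append_text(text); commit()
        threading.Thread(target=fake_server, args=(text,)).start()
        if not gate.wait(timeout=1.0):
            raise TimeoutError(f"no response.done for {text!r}")
    return log


if __name__ == "__main__":
    print(run_demo(["first", "second"]))
```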
Java, server_commit mode:

```java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    static String[] textToSynthesize = {
            "Right? I just really love this kind of supermarket",
            "Especially during the New Year",
            "Going to the supermarket",
            "Makes me feel",
            "Super, super happy!",
            "I want to buy so many things!"
    };

    // Real-time PCM audio player
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor: initialize the audio format and the audio output line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            // Decoder thread: Base64 text -> raw PCM bytes
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            // Player thread: raw PCM bytes -> audio output line
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException | InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play one audio chunk and block until it has (roughly) finished playing
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;
            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            // Chunk duration in ms: 16-bit mono uses 2 bytes per sample
            int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the buffered audio to play; clamp so very short chunks never cause a negative sleep
            Thread.sleep(Math.max(0, audioLength - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            // Wait until both queues are empty, then until the output line has played everything
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            // isOpen (not isRunning): a drained line may no longer be running but still needs closing
            if (line != null && line.isOpen()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException, LineUnavailableException {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // To use instruction control, change the model to qwen3-tts-instruct-flash-realtime
                .model("qwen3-tts-flash-realtime")
                .url("wss://dashscope.aliyuncs.com/api-ws/v1/realtime")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        // Create the real-time audio player (24 kHz to match PCM_24000HZ_MONO_16BIT)
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle the connection-opened event
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "session.created":
                        // Handle the session-created event
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play the audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle the response-done event
                        break;
                    case "session.finished":
                        // Handle the session-finished event
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) {
                // Handle the connection-closed event
            }
        });
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("Cherry")
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                // To use instruction control, uncomment the following lines and change the model to qwen3-tts-instruct-flash-realtime.
                // .instructions("")
                // .optimizeInstructions(true)
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text : textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();
        qwenTtsRealtime.close();
        // Wait for playback to finish, then shut down the player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}
```
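`playChunk` in the Java player estimates how long a chunk plays from its size. 16-bit mono PCM uses 2 bytes per sample, so at 24 kHz one millisecond is 48 bytes. The following small Python sketch reproduces the same arithmetic with hypothetical helper names. It includes the clamp that keeps the post-write wait from going negative for chunks shorter than the 10 ms margin. In both Java (`Thread.sleep`) and Python (`time.sleep`), a negative sleep raises an exception:

```python
def chunk_duration_ms(num_bytes: int, sample_rate: int,
                      sample_width: int = 2, channels: int = 1) -> float:
    """Playback time of a raw PCM chunk: bytes / (rate x bytes per sample x channels)."""
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes * 1000 / bytes_per_second


def sleep_after_write_ms(num_bytes: int, sample_rate: int, margin_ms: int = 10) -> int:
    """Wait time after writing a chunk, clamped so it is never negative."""
    return max(0, int(chunk_duration_ms(num_bytes, sample_rate)) - margin_ms)


if __name__ == "__main__":
    # 24 kHz, 16-bit mono: 48,000 bytes per second
    print(chunk_duration_ms(4800, 24000))     # 100.0
    print(sleep_after_write_ms(4800, 24000))  # 90
    print(sleep_after_write_ms(240, 24000))   # 0 (unclamped it would be -5)
```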
复制
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.Scanner;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class commit {
// 实时 PCM 音频播放器类
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// 构造方法:初始化音频格式和音频输出线路
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// 播放音频片段,阻塞至播放完成
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// 等待缓冲区中的音频播放完毕
Thread.sleep(audioLength - 10);
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
// 等待缓冲区中所有音频数据播放完毕
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
// 等待音频输出线路播放完毕
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
Scanner scanner = new Scanner(System.in);
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
// To use the instruction control feature, replace the model with qwen3-tts-instruct-flash-realtime.
.model("qwen3-tts-flash-realtime")
.url("wss://dashscope.aliyuncs.com/api-ws/v1/realtime")
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
// 创建实时播放器实例
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
// File file = new File("result_24k.pcm");
// FileOutputStream fos = new FileOutputStream(file);
@Override
public void onOpen() {
System.out.println("connection opened");
System.out.println("Enter text and press Enter to send. Enter 'quit' to exit the program.");
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
byte[] rawAudio = Base64.getDecoder().decode(recvAudioB64);
// fos.write(rawAudio);
// 实时播放音频
audioPlayer.write(recvAudioB64);
break;
case "response.done":
System.out.println("response done");
// 等待音频播放完成
try {
audioPlayer.waitForComplete();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
// Ready to accept the next input
completeLatch.get().countDown();
break;
case "session.finished":
System.out.println("session finished");
if (qwenTtsRef.get() != null) {
System.out.println("[Metric] response: " + qwenTtsRef.get().getResponseId() +
", first audio delay: " + qwenTtsRef.get().getFirstAudioDelay() + " ms");
}
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
System.out.println("connection closed code: " + code + ", reason: " + reason);
try {
// fos.close();
// Wait for playback to finish, then shut down the player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("Cherry")
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("commit")
// To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash-realtime.
// .instructions("")
// .optimizeInstructions(true)
.build();
qwenTtsRealtime.updateSession(config);
// Read user input in a loop
while (true) {
System.out.print("Enter the text to synthesize: ");
String text = scanner.nextLine();
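// Exit the program when the user enters quit
if ("quit".equalsIgnoreCase(text.trim())) {
System.out.println("Closing the connection...");
// Use a fresh latch so that await() blocks until session.finished arrives
completeLatch.set(new CountDownLatch(1));
qwenTtsRealtime.finish();
completeLatch.get().await();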
break;
}
// Skip empty input
if (text.trim().isEmpty()) {
continue;
}
// Reset the CountDownLatch for this request
completeLatch.set(new CountDownLatch(1));
// Send the text and commit it for synthesis
qwenTtsRealtime.appendText(text);
qwenTtsRealtime.commit();
// Wait for the current synthesis to finish
completeLatch.get().await();
}
// Release resources
audioPlayer.waitForComplete();
audioPlayer.shutdown();
scanner.close();
System.exit(0);
}
}
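The Java loop above resets a CountDownLatch before each commit and blocks until the callback counts it down on response.done. As a rough Python sketch of the same per-turn handshake, a threading.Event can play the latch's role. TurnWaiter is a hypothetical helper, not part of the DashScope SDK, and the server event is simulated with a timer.

```python
import threading


class TurnWaiter:
    """Hypothetical helper: a per-turn completion signal, like the Java completeLatch."""

    def __init__(self) -> None:
        self._done = threading.Event()

    def start_turn(self) -> None:
        # Reset before sending text, as the Java code does with completeLatch.set(new CountDownLatch(1))
        self._done.clear()

    def on_event(self, event: dict) -> None:
        # Called from the callback thread; release the waiter when the turn (or session) ends
        if event.get("type") in ("response.done", "session.finished"):
            self._done.set()

    def wait(self, timeout=None) -> bool:
        # True if the turn finished, False if the timeout expired first
        return self._done.wait(timeout)


if __name__ == "__main__":
    waiter = TurnWaiter()
    waiter.start_turn()
    # Simulate the server's response.done arriving on another thread after 50 ms
    threading.Timer(0.05, waiter.on_event, args=({"type": "response.done"},)).start()
    print("turn finished:", waiter.wait(timeout=1.0))
```

Unlike a CountDownLatch, an Event can be cleared and reused, so there is no need to swap in a new object for every turn.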
1. Prepare the runtime environment
Install pyaudio for your operating system, then install the WebSocket dependencies with pip:
macOS:
brew install portaudio && pip install pyaudio
Debian/Ubuntu:
sudo apt-get install python3-pyaudio
or
pip install pyaudio
CentOS:
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows:
pip install pyaudio
WebSocket dependencies (all platforms; the client below passes additional_headers to websockets.connect, which requires websockets 14.0 or later):
pip install websocket-client==1.8.0 websockets
2. Create the client
Create a local Python file named tts_realtime_client.py and copy the following code into it:
# -*- coding: utf-8 -*-
import asyncio
import websockets
import json
import base64
import time
from typing import Optional, Callable, Dict, Any
from enum import Enum
class SessionMode(Enum):
    SERVER_COMMIT = "server_commit"
    COMMIT = "commit"
class TTSRealtimeClient:
    """
    TTS Realtime API client.
    Provides methods for connecting to the TTS Realtime API, sending text, receiving audio output, and managing the WebSocket connection.
    Attributes:
        base_url (str):
            Base URL of the Realtime API.
        api_key (str):
            API Key used for authentication.
        voice (str):
            Voice that the server uses for speech synthesis.
        mode (SessionMode):
            Session mode: server_commit or commit.
        audio_callback (Callable[[bytes], None]):
            Callback that receives audio data.
        language_type (str):
            Language of the synthesized speech. Valid values: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian, Auto
    """
    def __init__(
            self,
            base_url: str,
            api_key: str,
            voice: str = "Cherry",
            mode: SessionMode = SessionMode.SERVER_COMMIT,
            audio_callback: Optional[Callable[[bytes], None]] = None,
            language_type: str = "Auto"):
        self.base_url = base_url
        self.api_key = api_key
        self.voice = voice
        self.mode = mode
        self.ws = None
        self.audio_callback = audio_callback
        self.language_type = language_type
        # Current response state
        self._current_response_id = None
        self._current_item_id = None
        self._is_responding = False
        self._response_done_future = None
    async def connect(self) -> None:
        """Open a WebSocket connection to the TTS Realtime API."""
        headers = {
            "Authorization": f"Bearer {self.api_key}"
        }
        self.ws = await websockets.connect(self.base_url, additional_headers=headers)
        # Send the default session configuration
        await self.update_session({
            "mode": self.mode.value,
            "voice": self.voice,
            # To use instruction control, uncomment the next two lines and replace the model in server_commit.py or commit.py with qwen3-tts-instruct-flash-realtime
            # "instructions": "Speak quickly with a noticeably rising intonation, suitable for introducing fashion products.",
            # "optimize_instructions": True,
            "language_type": self.language_type,
            "response_format": "pcm",
            "sample_rate": 24000
        })
    async def send_event(self, event) -> None:
        """Send an event to the server."""
        event['event_id'] = "event_" + str(int(time.time() * 1000))
        print(f"Sending event: type={event['type']}, event_id={event['event_id']}")
        await self.ws.send(json.dumps(event))
    async def update_session(self, config: Dict[str, Any]) -> None:
        """Update the session configuration."""
        event = {
            "type": "session.update",
            "session": config
        }
        print("Updating session configuration: ", event)
        await self.send_event(event)
    async def append_text(self, text: str) -> None:
        """Send text data to the API."""
        event = {
            "type": "input_text_buffer.append",
            "text": text
        }
        await self.send_event(event)
    async def commit_text_buffer(self) -> None:
        """Commit the text buffer to trigger synthesis."""
        event = {
            "type": "input_text_buffer.commit"
        }
        await self.send_event(event)
    async def clear_text_buffer(self) -> None:
        """Clear the text buffer."""
        event = {
            "type": "input_text_buffer.clear"
        }
        await self.send_event(event)
    async def finish_session(self) -> None:
        """Finish the session."""
        event = {
            "type": "session.finish"
        }
        await self.send_event(event)
    async def wait_for_response_done(self):
        """Wait for the response.done event."""
        if self._response_done_future:
            await self._response_done_future
    async def handle_messages(self) -> None:
        """Handle messages from the server."""
        try:
            async for message in self.ws:
                event = json.loads(message)
                event_type = event.get("type")
                if event_type != "response.audio.delta":
                    print(f"Received event: {event_type}")
                if event_type == "error":
                    print("Error: ", event.get('error', {}))
                    continue
                elif event_type == "session.created":
                    print("Session created, ID: ", event.get('session', {}).get('id'))
                elif event_type == "session.updated":
                    print("Session updated, ID: ", event.get('session', {}).get('id'))
                elif event_type == "input_text_buffer.committed":
                    print("Text buffer committed, item ID: ", event.get('item_id'))
                elif event_type == "input_text_buffer.cleared":
                    print("Text buffer cleared")
                elif event_type == "response.created":
                    self._current_response_id = event.get("response", {}).get("id")
                    self._is_responding = True
                    # Create a new Future to wait for response.done
                    self._response_done_future = asyncio.get_running_loop().create_future()
                    print("Response created, ID: ", self._current_response_id)
                elif event_type == "response.output_item.added":
                    self._current_item_id = event.get("item", {}).get("id")
                    print("Output item added, ID: ", self._current_item_id)
                # Handle incremental audio data
                elif event_type == "response.audio.delta" and self.audio_callback:
                    audio_bytes = base64.b64decode(event.get("delta", ""))
                    self.audio_callback(audio_bytes)
                elif event_type == "response.audio.done":
                    print("Audio generation completed")
                elif event_type == "response.done":
                    self._is_responding = False
                    self._current_response_id = None
                    self._current_item_id = None
                    # Mark the Future as done
                    if self._response_done_future and not self._response_done_future.done():
                        self._response_done_future.set_result(True)
                    print("Response completed")
                elif event_type == "session.finished":
                    print("Session ended")
        except websockets.exceptions.ConnectionClosed:
            print("Connection closed")
        except Exception as e:
            print("Error handling messages: ", str(e))
    async def close(self) -> None:
        """Close the WebSocket connection."""
        if self.ws:
            await self.ws.close()
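handle_messages passes only decoded bytes to audio_callback. Each response.audio.delta event carries a Base64 string in its delta field. Below is a minimal standalone sketch of that decoding step that runs without a server. decode_audio_delta is an illustrative name, and the message is hand-built, not captured server output.

```python
import base64
import json
from typing import Optional


def decode_audio_delta(message: str) -> Optional[bytes]:
    """Return the raw PCM bytes carried by a response.audio.delta event, or None for other events."""
    event = json.loads(message)
    if event.get("type") != "response.audio.delta":
        return None
    # The delta field holds Base64-encoded audio, as in handle_messages above
    return base64.b64decode(event.get("delta", ""))


if __name__ == "__main__":
    pcm = b"\x00\x01\x02\x03"  # four bytes = two 16-bit samples
    sample = json.dumps({"type": "response.audio.delta",
                         "delta": base64.b64encode(pcm).decode()})
    print(decode_audio_delta(sample) == pcm)
    print(decode_audio_delta(json.dumps({"type": "response.done"})))
```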
3. Choose a speech synthesis mode
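The Realtime API supports two modes:
- Server commit mode: the client only sends text, and the server decides how to split it into sentences and when to synthesize. Suitable for low-latency scenarios that do not need manual control over synthesis, such as GPS navigation.
- Commit mode: the client first appends text to a buffer and then explicitly tells the server to synthesize that text. Suitable for scenarios that need fine-grained control over pauses and sentence breaks, such as news broadcasting.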
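The difference between the two modes shows up in which client events are sent. Below is a rough sketch that assumes the event types used by tts_realtime_client.py. make_event and plan_events are hypothetical helpers. The UUID-based event_id is also an assumption: the client derives event_id from a millisecond timestamp, which can repeat when two events are sent within the same millisecond.

```python
import json
import uuid


def make_event(event_type: str, **fields) -> str:
    """Serialize a client event with a unique event_id (UUID-based; an assumption, see above)."""
    event = {"type": event_type, "event_id": f"event_{uuid.uuid4().hex}", **fields}
    return json.dumps(event)


def plan_events(mode: str, fragments: list) -> list:
    """List the serialized events a client sends for one synthesis turn in the given mode."""
    events = [make_event("session.update", session={"mode": mode, "voice": "Cherry"})]
    for text in fragments:
        events.append(make_event("input_text_buffer.append", text=text))
    if mode == "commit":
        # Commit mode: nothing is synthesized until the client commits the buffer
        events.append(make_event("input_text_buffer.commit"))
    # server_commit mode (as in server_commit.py): no commit event is sent
    events.append(make_event("session.finish"))
    return events


if __name__ == "__main__":
    for mode in ("server_commit", "commit"):
        types = [json.loads(e)["type"] for e in plan_events(mode, ["Hello,", "world."])]
        print(mode, types)
```

In the commit-mode example, entering an empty line corresponds to the input_text_buffer.commit step.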
Server commit mode
In the directory that contains tts_realtime_client.py, create a Python file named server_commit.py and copy the following code into it:
import os
import asyncio
import logging
import wave
from tts_realtime_client import TTSRealtimeClient, SessionMode
import pyaudio
# QwenTTS service configuration
# To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime and uncomment the instruction-related lines in tts_realtime_client.py
URL = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime"
# If the environment variable is not set, replace this with your 千问云 API Key: API_KEY = "sk-xxx"
API_KEY = os.getenv("DASHSCOPE_API_KEY")
if not API_KEY:
    raise ValueError("Please set DASHSCOPE_API_KEY environment variable")
# Collected audio data
_audio_chunks = []
# Real-time playback settings
_AUDIO_SAMPLE_RATE = 24000
_audio_pyaudio = pyaudio.PyAudio()
_audio_stream = None  # Opened at runtime
def _audio_callback(audio_bytes: bytes):
    """TTSRealtimeClient audio callback: play the audio in real time and buffer it."""
    global _audio_stream
    if _audio_stream is not None:
        try:
            _audio_stream.write(audio_bytes)
        except Exception as exc:
            logging.error(f"PyAudio playback error: {exc}")
    _audio_chunks.append(audio_bytes)
    logging.info(f"Received audio chunk: {len(audio_bytes)} bytes")
def _save_audio_to_file(filename: str = "output.wav", sample_rate: int = 24000) -> bool:
    """Save the collected audio data as a WAV file."""
    if not _audio_chunks:
        logging.warning("No audio data to save")
        return False
    try:
        audio_data = b"".join(_audio_chunks)
        with wave.open(filename, 'wb') as wav_file:
            wav_file.setnchannels(1)  # Mono
            wav_file.setsampwidth(2)  # 16-bit
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(audio_data)
        logging.info(f"Audio saved to: {filename}")
        return True
    except Exception as exc:
        logging.error(f"Failed to save audio: {exc}")
        return False
async def _produce_text(client: TTSRealtimeClient):
    """Send text fragments to the server."""
    text_fragments = [
        # Chinese sample text: "千问云 is a platform that integrates model development and application building."
        "千问云是一个集模型开发与应用构建于一体的平台。",
        "Both developers and business personnel can deeply participate in designing and building model applications.",
        "You can develop a model application in just 5 minutes through simple UI operations,",
        "or train a custom model within hours, allowing you to focus more on application innovation.",
    ]
    logging.info("Sending text fragments…")
    for text in text_fragments:
        logging.info(f"Sending fragment: {text}")
        await client.append_text(text)
        await asyncio.sleep(0.1)  # Short delay between fragments
    # Give the server time to finish internal processing before ending the session
    await asyncio.sleep(1.0)
    await client.finish_session()
async def _run_demo():
    """Run the complete demo."""
    global _audio_stream
    # Open the PyAudio output stream
    _audio_stream = _audio_pyaudio.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=_AUDIO_SAMPLE_RATE,
        output=True,
        frames_per_buffer=1024
    )
    client = TTSRealtimeClient(
        base_url=URL,
        api_key=API_KEY,
        voice="Cherry",
        mode=SessionMode.SERVER_COMMIT,
        audio_callback=_audio_callback
    )
    # Connect
    await client.connect()
    # Run message handling and text sending concurrently
    consumer_task = asyncio.create_task(client.handle_messages())
    producer_task = asyncio.create_task(_produce_text(client))
    await producer_task  # Wait until all text has been sent
    # Wait for response.done
    await client.wait_for_response_done()
    # Close the connection and cancel the consumer task
    await client.close()
    consumer_task.cancel()
    # Close the audio stream
    if _audio_stream is not None:
        _audio_stream.stop_stream()
        _audio_stream.close()
    _audio_pyaudio.terminate()
    # Save the audio data
    os.makedirs("outputs", exist_ok=True)
    _save_audio_to_file(os.path.join("outputs", "qwen_tts_output.wav"))
def main():
    """Synchronous entry point."""
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s [%(levelname)s] %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    logging.info("Starting QwenTTS Realtime Client demo…")
    asyncio.run(_run_demo())
if __name__ == "__main__":
    main()
Run server_commit.py to hear the audio generated by the Realtime API in real time.
Commit mode
In the directory that contains tts_realtime_client.py, create a Python file named commit.py and copy the following code into it:
import os
import asyncio
import logging
import wave
from tts_realtime_client import TTSRealtimeClient, SessionMode
import pyaudio
# QwenTTS service configuration
# To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime and uncomment the instruction-related lines in tts_realtime_client.py
URL = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime"
# If the environment variable is not set, replace this with your 千问云 API Key: API_KEY = "sk-xxx"
API_KEY = os.getenv("DASHSCOPE_API_KEY")
if not API_KEY:
    raise ValueError("Please set DASHSCOPE_API_KEY environment variable")
# Collected audio data
_audio_chunks = []
_AUDIO_SAMPLE_RATE = 24000
_audio_pyaudio = pyaudio.PyAudio()
_audio_stream = None
def _audio_callback(audio_bytes: bytes):
    """TTSRealtimeClient audio callback: play the audio in real time and buffer it."""
    global _audio_stream
    if _audio_stream is not None:
        try:
            _audio_stream.write(audio_bytes)
        except Exception as exc:
            logging.error(f"PyAudio playback error: {exc}")
    _audio_chunks.append(audio_bytes)
    logging.info(f"Received audio chunk: {len(audio_bytes)} bytes")
def _save_audio_to_file(filename: str = "output.wav", sample_rate: int = 24000) -> bool:
    """Save the collected audio data as a WAV file."""
    if not _audio_chunks:
        logging.warning("No audio data to save")
        return False
    try:
        audio_data = b"".join(_audio_chunks)
        with wave.open(filename, 'wb') as wav_file:
            wav_file.setnchannels(1)  # Mono
            wav_file.setsampwidth(2)  # 16-bit
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(audio_data)
        logging.info(f"Audio saved to: {filename}")
        return True
    except Exception as exc:
        logging.error(f"Failed to save audio: {exc}")
        return False
async def _user_input_loop(client: TTSRealtimeClient):
    """Read user input and send it as text. An empty line sends the commit event and ends the current session."""
    print("Enter text (press Enter on an empty line to send the commit event and end the current session; press Ctrl+C or Ctrl+D to exit the program):")
    while True:
        try:
            # Note: input() blocks the event loop, so server events are processed once the loop yields again
            user_text = input("> ")
            if not user_text:  # User entered an empty line
                # Empty input ends the conversation: commit the buffer, end the session, leave the loop
                logging.info("Empty input, sending commit event and ending current session")
                await client.commit_text_buffer()
                # Wait briefly for the server to process the commit so that ending the session early does not lose audio
                await asyncio.sleep(0.3)
                await client.finish_session()
                break  # Leave the input loop without requiring another Enter
            else:
                logging.info(f"Sending text: {user_text}")
                await client.append_text(user_text)
        except EOFError:  # User pressed Ctrl+D
            break
        except KeyboardInterrupt:  # User pressed Ctrl+C
            break
    # End session
    logging.info("Ending session...")
async def _run_demo():
    """Run the complete demo."""
    global _audio_stream
    # Open the PyAudio output stream
    _audio_stream = _audio_pyaudio.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=_AUDIO_SAMPLE_RATE,
        output=True,
        frames_per_buffer=1024
    )
    client = TTSRealtimeClient(
        base_url=URL,
        api_key=API_KEY,
        voice="Cherry",
        mode=SessionMode.COMMIT,  # Use commit mode
        audio_callback=_audio_callback
    )
    # Connect
    await client.connect()
    # Run message handling and user input concurrently
    consumer_task = asyncio.create_task(client.handle_messages())
    producer_task = asyncio.create_task(_user_input_loop(client))
    await producer_task  # Wait for user input to complete
    # Wait for response.done
    await client.wait_for_response_done()
    # Close the connection and cancel the consumer task
    await client.close()
    consumer_task.cancel()
    # Close the audio stream
    if _audio_stream is not None:
        _audio_stream.stop_stream()
        _audio_stream.close()
    _audio_pyaudio.terminate()
    # Save the audio data
    os.makedirs("outputs", exist_ok=True)
    _save_audio_to_file(os.path.join("outputs", "qwen_tts_output.wav"))
def main():
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s [%(levelname)s] %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    logging.info("Starting QwenTTS Realtime Client demo…")
    asyncio.run(_run_demo())
if __name__ == "__main__":
    main()
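Run commit.py and enter several lines of text to synthesize. Press Enter on an empty line (without typing any text) to hear the audio returned by the Realtime API through your speakers.
Speech synthesis with a cloned voice
The voice cloning service does not provide preview audio. Test and evaluate the cloned voice through the speech synthesis API, starting with short text. This example adapts the server commit mode code by replacing the voice parameter with the cloned voice.
- Key principle: the voice cloning model (target_model) must match the speech synthesis model (model). Otherwise, synthesis fails.
- The example clones the voice from the local audio file voice.mp3. Replace it with your own file when you run the code.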
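You can check the key principle above before sending any request. The minimal sketch below builds the same enrollment payload as create_voice in the example that follows (a data URI plus target_model) and fails fast if the models do not match. build_clone_payload and check_models_match are hypothetical helpers, and the audio bytes are placeholders, not a real recording.

```python
import base64


def build_clone_payload(audio_bytes: bytes, target_model: str, preferred_name: str,
                        mime_type: str = "audio/mpeg") -> dict:
    """Build a voice enrollment request body shaped like the one in create_voice."""
    data_uri = f"data:{mime_type};base64,{base64.b64encode(audio_bytes).decode()}"
    return {
        "model": "qwen-voice-enrollment",  # Fixed enrollment model, as in the examples
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri},
        },
    }


def check_models_match(payload: dict, synthesis_model: str) -> None:
    """Raise if the cloning target_model differs from the model used for synthesis."""
    target = payload["input"]["target_model"]
    if target != synthesis_model:
        raise ValueError(f"target_model {target!r} does not match synthesis model {synthesis_model!r}")


if __name__ == "__main__":
    model = "qwen3-tts-vc-realtime-2026-01-15"
    payload = build_clone_payload(b"placeholder-audio", model, "guanyu")
    check_models_match(payload, model)  # Same model for cloning and synthesis: no error
    print(payload["input"]["audio"]["data"][:23])
```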
Python example:
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio
import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope  # DashScope Python SDK version must be at least 1.23.9
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
# ======= Constants =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15"  # Use the same model for voice cloning and speech synthesis
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Relative path to the local audio file used for voice cloning
TEXT_TO_SYNTHESIZE = [
    'Right? I really love this kind of supermarket,',
    'especially during Chinese New Year',
    'when I go shopping',
    'I feel',
    'super super happy!',
    'I want to buy so many things!'
]
def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a cloned voice and return its voice parameter
    """
    # Replace with your 千问云 API Key if the environment variable is not configured: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")
    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"
    url = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment",  # Do not modify this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Voice creation failed: {resp.status_code}, {resp.text}")
    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")
def init_dashscope_api_key():
    """
    Initialize the DashScope SDK API key
    """
    # Replace with your 千问云 API Key if the environment variable is not configured: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )
    def on_open(self) -> None:
        print('[TTS] Connection established')
    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed code={close_status_code}, msg={close_msg}')
    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response completed, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session ended')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Error handling callback event: {e}')
    def wait_for_finished(self):
        self.complete_event.wait()
# ======= Main execution logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')
    # Create the cloned voice before opening the WebSocket so the connection is not left idle during enrollment
    cloned_voice = create_voice(VOICE_FILE_PATH)
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model=DEFAULT_TARGET_MODEL,
        callback=callback,
        url='wss://dashscope.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice=cloned_voice,  # Use the cloned custom voice
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
Java example: the code requires the Gson dependency. If you use Maven or Gradle, add it as follows.
Maven: add the following to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle: add the following to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
Java code:
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
// ===== Constants =====
// Use the same model for voice cloning and speech synthesis
private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
private static final String PREFERRED_NAME = "guanyu";
// Relative path to local audio file for voice cloning
private static final String AUDIO_FILE = "voice.mp3";
private static final String AUDIO_MIME_TYPE = "audio/mpeg";
private static String[] textToSynthesize = {
"Right? I really love this kind of supermarket",
"especially during Chinese New Year",
"when I go shopping",
"I feel",
"super super happy!",
"I want to buy so many things!"
};
// Generate data URI
public static String toDataUrl(String filePath) throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(filePath));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
// Call API to create voice
public static String createVoice() throws Exception {
// Replace with your 千问云 API Key if environment variable is not configured: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
String jsonPayload =
"{"
+ "\"model\": \"qwen-voice-enrollment\"," // Do not modify this value
+ "\"input\": {"
+ "\"action\": \"create\","
+ "\"target_model\": \"" + TARGET_MODEL + "\","
+ "\"preferred_name\": \"" + PREFERRED_NAME + "\","
+ "\"audio\": {"
+ "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
+ "}"
+ "}"
+ "}";
HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Authorization", "Bearer " + apiKey);
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
try (OutputStream os = con.getOutputStream()) {
os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
}
int status = con.getResponseCode();
System.out.println("HTTP status code: " + status);
try (BufferedReader br = new BufferedReader(
new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
StandardCharsets.UTF_8))) {
StringBuilder response = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
response.append(line);
}
System.out.println("Response content: " + response);
if (status == 200) {
JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
return jsonObj.getAsJsonObject("output").get("voice").getAsString();
}
throw new IOException("Voice creation failed: " + status + " - " + response);
}
}
// Real-time PCM audio player
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// Constructor to initialize audio format and audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play an audio chunk and block until playback completes
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for the buffered audio to finish playing; clamp at 0 because chunks shorter than 10 ms would give a negative sleep time
Thread.sleep(Math.max(0, audioLength - 10));
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
// isRunning() turns false once playback completes, so check isOpen() to make sure the line is closed
if (line != null && line.isOpen()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws Exception {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model(TARGET_MODEL)
.url("wss://dashscope.aliyuncs.com/api-ws/v1/realtime")
// Replace with your 千问云 API Key if environment variable is not configured: .apikey("sk-xxx")
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
// Create real-time audio player instance
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
// Handle connection established
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
// Handle session created
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
// Play audio in real time
audioPlayer.write(recvAudioB64);
break;
case "response.done":
// Handle response completed
break;
case "session.finished":
// Handle session finished
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
// Handle connection closed
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice(createVoice()) // Replace voice parameter with cloned custom voice
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
// Wait for audio playback to complete and shut down player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
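Both players above assume raw PCM at 24 kHz, mono, 16-bit (PCM_24000HZ_MONO_16BIT), which is 48,000 bytes per second. The Java playChunk derives its sleep time from that rate. The minimal sketch below shows the arithmetic. It also wraps PCM in a WAV header the same way _save_audio_to_file does in the server commit example, but in memory instead of on disk. The function names are illustrative.

```python
import io
import wave


def pcm_duration_ms(num_bytes: int, sample_rate: int = 24000,
                    channels: int = 1, sample_width: int = 2) -> float:
    """Playback time of raw PCM: bytes / (sample_rate * channels * bytes_per_sample)."""
    frame_size = channels * sample_width
    if num_bytes % frame_size:
        raise ValueError("byte count is not a whole number of frames")
    return num_bytes * 1000 / (sample_rate * frame_size)


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM in a WAV header in memory, mirroring _save_audio_to_file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    chunk = bytes(4800)  # 4800 bytes of silence at 24 kHz, mono, 16-bit
    print(pcm_duration_ms(len(chunk)))  # 100.0
    print(len(pcm_to_wav_bytes(chunk)))
```

At this rate, a 300-byte chunk lasts about 6 ms. For short chunks like that, the Java expression audioLength - 10 goes negative.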
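Speech synthesis with a designed voice
Voice design returns preview audio. Listen to the preview first and confirm that it meets your expectations before using the voice for speech synthesis.
1. Generate a custom voice and preview it
If you are satisfied with the result, continue to the next step. Otherwise, generate the voice again.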
Python example:
import requests
import base64
import os
def create_voice_and_play():
    # If the environment variable is not set, replace the following line with your API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
        return None, None, None
    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }
    url = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization"
    try:
        # Send the request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Request timeout in seconds
        )
        if response.status_code == 200:
            result = response.json()
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get the preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]
            # Decode the Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)
            # Save the audio file locally
            filename = f"{voice_name}_preview.wav"
            # Write the audio data to a local file
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"A network request error occurred: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response data format error, missing required field: {e}")
        print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"An unknown error occurred: {e}")
        return None, None, None
if __name__ == "__main__":
    print("Starting to create voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved as: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
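The next step needs only the response fields parsed above: output.voice and output.preview_audio.data. The sketch below extracts them and builds a session.update event that selects the new voice, using the same session fields as tts_realtime_client.py. It assumes, as with voice cloning, that the synthesis session must use the model named in target_model (here qwen3-tts-vd-realtime-2026-01-15). extract_preview and session_update_for_voice are hypothetical helpers, and the sample response is made up.

```python
import base64
import json


def extract_preview(result: dict) -> tuple:
    """Return (voice name, decoded preview audio bytes) from a voice design response."""
    output = result["output"]
    voice = output["voice"]
    audio = base64.b64decode(output["preview_audio"]["data"])
    return voice, audio


def session_update_for_voice(voice: str) -> dict:
    """Build a session.update event selecting the designed voice (fields mirror tts_realtime_client.py)."""
    return {
        "type": "session.update",
        "session": {
            "mode": "server_commit",
            "voice": voice,
            "response_format": "pcm",
            "sample_rate": 24000,
        },
    }


if __name__ == "__main__":
    # Hypothetical response shaped like the one parsed above; not real server output
    sample = {"output": {"voice": "my-announcer-voice",
                         "preview_audio": {"data": base64.b64encode(b"RIFF....").decode()}}}
    voice, audio = extract_preview(sample)
    print(voice, len(audio))
    print(json.dumps(session_update_for_voice(voice)))
```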
Java example: the code requires the Gson dependency. If you use Maven or Gradle, add it as follows.
Maven: add the following to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle: add the following to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
Java code:
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;
public class Main {
public static void main(String[] args) {
Main example = new Main();
example.createVoice();
}
public void createVoice() {
// If the environment variable is not set, replace the following line with your API key: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
// Create the JSON request body string
String jsonBody = "{\n" +
" \"model\": \"qwen-voice-design\",\n" +
" \"input\": {\n" +
" \"action\": \"create\",\n" +
" \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
" \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
" \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
" \"preferred_name\": \"announcer\",\n" +
" \"language\": \"en\"\n" +
" },\n" +
" \"parameters\": {\n" +
" \"sample_rate\": 24000,\n" +
" \"response_format\": \"wav\"\n" +
" }\n" +
"}";
HttpURLConnection connection = null;
try {
URL url = new URL("https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization");
connection = (HttpURLConnection) url.openConnection();
// Set the request method and headers
connection.setRequestMethod("POST");
connection.setRequestProperty("Authorization", "Bearer " + apiKey);
connection.setRequestProperty("Content-Type", "application/json");
connection.setDoOutput(true);
connection.setDoInput(true);
// Send the request body
try (OutputStream os = connection.getOutputStream()) {
byte[] input = jsonBody.getBytes("UTF-8");
os.write(input, 0, input.length);
os.flush();
}
// Get the response
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
// Read the response content
StringBuilder response = new StringBuilder();
try (BufferedReader br = new BufferedReader(
new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
String responseLine;
while ((responseLine = br.readLine()) != null) {
response.append(responseLine.trim());
}
}
// Parse the JSON response
JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
JsonObject outputObj = jsonResponse.getAsJsonObject("output");
JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
// Get the voice name
String voiceName = outputObj.get("voice").getAsString();
System.out.println("Voice name: " + voiceName);
// Get the Base64-encoded audio data
String base64Audio = previewAudioObj.get("data").getAsString();
// Decode the Base64 audio data
byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
// Save the audio to a local file
String filename = voiceName + "_preview.wav";
saveAudioToFile(audioBytes, filename);
System.out.println("Audio saved to local file: " + filename);
} else {
// Read the error response
StringBuilder errorResponse = new StringBuilder();
try (BufferedReader br = new BufferedReader(
new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
String responseLine;
while ((responseLine = br.readLine()) != null) {
errorResponse.append(responseLine.trim());
}
}
System.out.println("Request failed with status code: " + responseCode);
System.out.println("Error response: " + errorResponse.toString());
}
} catch (Exception e) {
System.err.println("An error occurred during the request: " + e.getMessage());
e.printStackTrace();
} finally {
if (connection != null) {
connection.disconnect();
}
}
}
private void saveAudioToFile(byte[] audioBytes, String filename) {
try {
File file = new File(filename);
try (FileOutputStream fos = new FileOutputStream(file)) {
fos.write(audioBytes);
}
System.out.println("Audio saved to: " + file.getAbsolutePath());
} catch (IOException e) {
System.err.println("An error occurred while saving the audio file: " + e.getMessage());
e.printStackTrace();
}
}
}
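If you work in Python, you can prepare the same request without an HTTP client. The sketch below assembles the JSON body used by the Java example above. It also decodes a Base64 preview from a response shaped like the one the Java code parses (output.voice, output.preview_audio.data). It makes no network call. build_voice_design_body and decode_preview are hypothetical helper names. The 500-character check reflects the voice_prompt limit stated in the voice design section.

```python
import base64
import json


def build_voice_design_body(voice_prompt: str, preview_text: str, preferred_name: str,
                            target_model: str = "qwen3-tts-vd-realtime-2026-01-15",
                            language: str = "en", sample_rate: int = 24000,
                            response_format: str = "wav") -> str:
    """Build the voice-design request body used by the Java example (hypothetical helper)."""
    if len(voice_prompt) > 500:
        raise ValueError("voice_prompt must not exceed 500 characters")
    body = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": target_model,  # must match the model used for synthesis later
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
            "preferred_name": preferred_name,
            "language": language,
        },
        "parameters": {"sample_rate": sample_rate, "response_format": response_format},
    }
    return json.dumps(body, ensure_ascii=False)


def decode_preview(response_json: str) -> tuple:
    """Return (voice_name, audio_bytes) from output.voice and output.preview_audio.data."""
    output = json.loads(response_json)["output"]
    return output["voice"], base64.b64decode(output["preview_audio"]["data"])
```

The body would be POSTed to the same customization endpoint with a Bearer token, as the Java code does. Write the decoded bytes to a .wav file to listen to the preview.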
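Step 2: Synthesize speech with a custom voice
This example is based on the "server commit mode" sample from the DashScope SDK example that synthesizes speech with a system voice. The only change is that the voice parameter is set to a custom voice generated through voice design. Key principle: the model used for voice design (target_model) must be the same model used for subsequent speech synthesis (model); otherwise, synthesis fails.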
Python example:
```python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio
import pyaudio
import os
import base64
import threading
import time
import dashscope  # DashScope Python SDK version must be 1.23.9 or later
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constant Configuration =======
TEXT_TO_SYNTHESIZE = [
    'Right? I really like this kind of supermarket,',
    'especially during the New Year.',
    'Going to the supermarket',
    'just makes me feel',
    'super, super happy!',
    'I want to buy so many things!'
]


def init_dashscope_api_key():
    """
    Initialize the API key for the DashScope SDK.
    """
    # If the environment variable is not set, replace the following line with your API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")


# ======= Callback Class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback: plays each audio chunk as it arrives.
    """

    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        # 16-bit mono PCM at 24 kHz, matching PCM_24000HZ_MONO_16BIT below
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                # Each delta carries a Base64-encoded PCM chunk
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                # qwen_tts_realtime is the module-level client created in __main__
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception processing callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()


# ======= Main Execution Logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # Use the same model for voice design and speech synthesis
        model="qwen3-tts-vd-realtime-2026-01-15",
        callback=callback,
        url='wss://dashscope.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice="myvoice",  # Replace the voice parameter with the custom voice generated by voice design
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
```
Java example:
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
// ===== Constant Definitions =====
private static String[] textToSynthesize = {
"Right? I really like this kind of supermarket,",
"especially during the New Year.",
"Going to the supermarket",
"just makes me feel",
"super, super happy!",
"I want to buy so many things!"
};
// Real-time audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// Constructor initializes audio format and audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Plays an audio chunk and blocks until playback is complete
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for the audio in the buffer to finish playing
Thread.sleep(Math.max(0, audioLength - 10)); // Thread.sleep throws IllegalArgumentException for negative values
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws Exception {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
// Use the same model for voice design and speech synthesis
.model("qwen3-tts-vd-realtime-2026-01-15")
.url("wss://dashscope.aliyuncs.com/api-ws/v1/realtime")
// If the environment variable is not set, replace the following line with your API key: .apikey("sk-xxx")
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
// Create a real-time audio player instance
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
// Handling for when the connection is established
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
// Handling for when the session is created
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
// Play audio in real time
audioPlayer.write(recvAudioB64);
break;
case "response.done":
// Handling for when the response is complete
break;
case "session.finished":
// Handling for when the session is finished
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
// Handling for when the connection is closed
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("myvoice") // Replace the voice parameter with the custom voice generated by voice design
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
// Wait for audio playback to complete and shut down the player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
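The playChunk method above estimates a chunk's playback time from its byte length. 16-bit mono PCM uses 2 bytes per sample, so at 24 kHz one millisecond is 24000 × 2 / 1000 = 48 bytes. Here is the same arithmetic as a small Python sketch (the function name is illustrative):

```python
def pcm_duration_ms(num_bytes: int, sample_rate: int = 24000,
                    sample_width: int = 2, channels: int = 1) -> float:
    """Playback duration in milliseconds of a raw PCM chunk."""
    bytes_per_ms = sample_rate * sample_width * channels / 1000
    return num_bytes / bytes_per_ms


# A 4800-byte chunk of 24 kHz, 16-bit mono PCM lasts 100 ms.
print(pcm_duration_ms(4800))  # 100.0
```

Chunks shorter than 10 ms do occur. Any sleep derived from audioLength - 10 therefore must not go below zero.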
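Qwen-TTS advanced features
The following features apply only to Qwen-TTS models.
Qwen-TTS interaction modes
The Qwen-TTS Realtime API offers two WebSocket interaction modes, selected with the session.mode parameter:
- server_commit mode: the server decides how to segment the text and when to synthesize it. This mode suits continuous synthesis of long text. The client only needs to keep appending text, with no segmentation or commits of its own.
- commit mode: the client explicitly commits the text buffer to trigger synthesis. This mode suits scenarios that need precise control over synthesis timing, such as turn-by-turn synthesis in conversational AI.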
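In commit mode the client decides when to commit. One common approach is to buffer streamed LLM text and commit at sentence-ending punctuation. The sketch below shows that approach as a hypothetical helper, not an SDK feature. It yields the pieces at which a client would send input_text_buffer.commit. The punctuation set is an assumption.

```python
from typing import Iterable, Iterator

# Sentence-ending punctuation (Chinese full-width and ASCII); an assumption, tune as needed
SENTENCE_END = set("。!?;!?.;\n")


def commit_points(chunks: Iterable[str]) -> Iterator[str]:
    """Group streamed text chunks into sentence-sized pieces to append and commit one by one."""
    buffer = ""
    for chunk in chunks:
        for ch in chunk:
            buffer += ch
            if ch in SENTENCE_END:
                yield buffer
                buffer = ""
    if buffer:  # flush whatever remains when the stream ends
        yield buffer


print(list(commit_points(["Hello wor", "ld. How are", " you?", " Bye"])))
# ['Hello world.', ' How are you?', ' Bye']
```

A plain "." also splits decimals such as "3.14". A production splitter would need to handle abbreviations and numbers.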
Interaction flow
CosyVoice: CosyVoice uses a WebSocket-based streaming protocol. For protocol details, see the CosyVoice WebSocket API reference.
Connecting to the API (Qwen-TTS-Realtime)
To use Qwen-TTS-Realtime, open a WebSocket connection with the following parameters:
| Parameter | Value |
|---|---|
| WebSocket URL | wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=<model_name> |
| Authentication | Bearer token in the Authorization request header |
| Model parameter | Replace <model_name> with a supported model. See Supported models. |
# Example connection URL
wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime
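Here is a minimal sketch that assembles the connection parameters from the table above using only the Python standard library. It prepares the URL and headers only; you would pass them to the WebSocket client of your choice. realtime_connection is a hypothetical helper name.

```python
import os
from urllib.parse import urlencode

BASE_URL = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"


def realtime_connection(model: str, api_key: str) -> tuple:
    """Return (url, headers) for a Qwen-TTS-Realtime WebSocket connection."""
    url = f"{BASE_URL}?{urlencode({'model': model})}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers


url, headers = realtime_connection("qwen3-tts-flash-realtime",
                                   os.environ.get("DASHSCOPE_API_KEY", "sk-xxx"))
print(url)  # wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime
```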
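Server commit mode
To enable this mode, set the session.mode property of the session.update event to "server_commit". The server decides how to segment the text and when to synthesize it. Interaction flow:
- The client sends a session.update event. The server returns session.created and session.updated events.
- The client sends input_text_buffer.append events to append text to the server-side buffer.
- The server segments the text and schedules synthesis on its own, returning response.created, response.output_item.added, response.content_part.added, and response.audio.delta events.
- When the response is complete, the server returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.
- The server returns session.finished to end the session.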
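| Lifecycle | Client events | Server events |
|---|---|---|
| Session initialization | session.update: configures the session | session.created: the session has been created |
| | | session.updated: the session configuration has been updated |
| User text input | input_text_buffer.append: appends text to the server | |
| | input_text_buffer.commit: immediately synthesizes the text cached on the server | |
| | session.finish: tells the server that no more text is coming | input_text_buffer.committed: the server has received the committed text |
| Server audio output | None | response.created: the server starts generating a response |
| | | response.output_item.added: the response has new output content |
| | | response.content_part.added: new output content was added to the assistant message |
| | | response.audio.delta: incremental audio generated by the model |
| | | response.content_part.done: the text or audio content stream of the assistant message is complete |
| | | response.output_item.done: the entire output item stream of the assistant message is complete |
| | | response.audio.done: audio generation is complete |
| | | response.done: the response is complete |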
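In server commit mode, the client sends three kinds of JSON messages. This sketch builds them. Only the type values and session.mode come from this page. The text and voice field names are assumptions, modeled on the SDK calls append_text and update_session(voice=...), so check them against the API reference before relying on them.

```python
import json


def session_update(mode: str, voice: str = "Cherry") -> str:
    """session.update event; mode is "server_commit" or "commit"."""
    if mode not in ("server_commit", "commit"):
        raise ValueError(f"unsupported mode: {mode}")
    # The "voice" field name is an assumption; session.mode is documented above
    return json.dumps({"type": "session.update", "session": {"mode": mode, "voice": voice}})


def append_text(text: str) -> str:
    # The "text" field name is an assumption
    return json.dumps({"type": "input_text_buffer.append", "text": text})


def session_finish() -> str:
    return json.dumps({"type": "session.finish"})


# Server commit mode: configure, stream text, then signal the end of input.
messages = [session_update("server_commit")]
messages += [append_text(t) for t in ["Right? I really like", " this kind of supermarket."]]
messages.append(session_finish())
print([json.loads(m)["type"] for m in messages])
```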
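Client commit mode
To enable this mode, set the session.mode property of the session.update event to "commit". The client must explicitly commit the text buffer to the server to get a response. Interaction flow:
- The client sends a session.update event. The server returns session.created and session.updated events.
- The client sends input_text_buffer.append events to append text to the server-side buffer.
- The client sends an input_text_buffer.commit event to commit the buffer to the server. It then sends a session.finish event to indicate that no more text is coming.
- The server returns response.created and starts generating the response.
- The server returns response.output_item.added, response.content_part.added, and response.audio.delta events.
- When the response is complete, the server returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.
- The server returns session.finished to end the session.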
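| Lifecycle | Client events | Server events |
|---|---|---|
| Session initialization | session.update: configures the session | session.created: the session has been created |
| | | session.updated: the session configuration has been updated |
| User text input | input_text_buffer.append: appends text to the buffer | |
| | input_text_buffer.commit: commits the buffer to the server | |
| | input_text_buffer.clear: clears the buffer | input_text_buffer.committed: the server has received the committed text |
| Server audio output | None | response.created: the server starts generating a response |
| | | response.output_item.added: the response has new output content |
| | | response.content_part.added: new output content was added to the assistant message |
| | | response.audio.delta: incremental audio generated by the model |
| | | response.content_part.done: the text or audio content stream of the assistant message is complete |
| | | response.output_item.done: the entire output item stream of the assistant message is complete |
| | | response.audio.done: audio generation is complete |
| | | response.done: the response is complete |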
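On the receiving side, both modes emit the server events listed in the tables. The hypothetical handler below decodes response.audio.delta payloads (Base64 PCM, as in the SDK examples), counts response.done events, and notes when session.finished arrives. The class is illustrative and not part of the SDK.

```python
import base64
from typing import List


class ServerEventCollector:
    """Accumulates PCM audio from server events and records lifecycle state."""

    def __init__(self) -> None:
        self.pcm = bytearray()
        self.responses_done = 0
        self.finished = False
        self.seen: List[str] = []

    def handle(self, event: dict) -> None:
        event_type = event.get("type", "")
        self.seen.append(event_type)
        if event_type == "response.audio.delta":
            self.pcm.extend(base64.b64decode(event["delta"]))
        elif event_type == "response.done":
            self.responses_done += 1
        elif event_type == "session.finished":
            self.finished = True


collector = ServerEventCollector()
for ev in [
    {"type": "input_text_buffer.committed"},
    {"type": "response.created"},
    {"type": "response.audio.delta", "delta": base64.b64encode(b"\x01\x00\x02\x00").decode()},
    {"type": "response.audio.done"},
    {"type": "response.done"},
    {"type": "session.finished"},
]:
    collector.handle(ev)
print(len(collector.pcm), collector.responses_done, collector.finished)  # 4 1 True
```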
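Instruction control
CosyVoice
Supported models: cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash
- cosyvoice-v3.5-plus, cosyvoice-v3.5-flash: no system voices. Only voices created through voice design or voice cloning are supported, and any instruction can be used to control the synthesis (for example, emotion or speaking rate).
- cosyvoice-v3-flash with voice-design or voice-cloning voices: any instruction can be used to control the synthesis.
- cosyvoice-v3-flash with system voices: instructions must use a fixed format and content. For details, see the CosyVoice voice list.
Supported languages:
- cosyvoice-v3.5-plus, cosyvoice-v3.5-flash: Chinese, English, French, German, Japanese, Korean, Russian, Portuguese, Thai, Indonesian, Vietnamese
- cosyvoice-v3-flash: Chinese, English, French, German, Japanese, Korean, Russian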
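The model-and-voice rules above fit naturally into a lookup. This hypothetical helper encodes only the rules listed on this page. "custom" stands for voices created by voice design or voice cloning. Models not listed above are reported as unsupported for instruction control.

```python
def instruction_policy(model: str, voice_kind: str) -> str:
    """Return how instructions may be used: "free-form", "fixed-format", or "unsupported".

    voice_kind is "system" or "custom" (voice design / voice cloning).
    """
    if model in ("cosyvoice-v3.5-plus", "cosyvoice-v3.5-flash"):
        if voice_kind == "system":
            raise ValueError(f"{model} has no system voices; create a custom voice first")
        return "free-form"
    if model == "cosyvoice-v3-flash":
        return "free-form" if voice_kind == "custom" else "fixed-format"
    return "unsupported"


print(instruction_policy("cosyvoice-v3-flash", "system"))  # fixed-format
```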
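Qwen-TTS-Realtime
Control tone, speaking rate, emotion, and voice characteristics with natural-language descriptions, without setting audio parameters.
- Supported models: Qwen3-TTS-Instruct-Flash-Realtime only.
- Usage: specify the instruction with the instructions parameter, for example: "Fast speaking rate with a clearly rising intonation, suitable for introducing fashion products."
- Supported languages: the description text must be in Chinese or English.
- Length limit: no more than 1,600 tokens.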
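The sketch below attaches an instruction to session settings. The instructions parameter name and the model restriction come from the notes above. The dictionary layout, the default voice, and the prefix check on the model name (which accepts dated snapshots) are assumptions. The 1,600-token limit cannot be checked exactly without the model's tokenizer, so the sketch leaves that check to the server.

```python
def instruct_session(model: str, instructions: str, voice: str = "Cherry") -> dict:
    """Build session settings that carry a natural-language instruction (hypothetical layout)."""
    if not model.startswith("qwen3-tts-instruct-flash-realtime"):
        raise ValueError("instructions are only supported by Qwen3-TTS-Instruct-Flash-Realtime")
    if not instructions.strip():
        raise ValueError("instructions must not be empty")
    return {"voice": voice, "instructions": instructions}


cfg = instruct_session("qwen3-tts-instruct-flash-realtime-2026-01-22",
                       "Fast speaking rate with a clearly rising intonation, "
                       "suitable for introducing fashion products.")
print(cfg["instructions"])
```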
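Typical use cases:
- Audiobook and radio drama dubbing
- Advertising and promotional video voice-over
- Game character and animation dubbing
- Emotionally intelligent voice assistants
- Documentaries and news broadcasting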
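Guidelines for writing instructions:
- Specific, not vague: use words that describe concrete voice characteristics, such as "deep", "crisp", or "fast-paced". Avoid subjective words that carry little information, such as "nice-sounding" or "ordinary".
- Multi-dimensional, not one-dimensional: good descriptions combine several dimensions (such as pitch, speaking rate, and emotion, described below). A single-dimension description (such as just "high-pitched") is too broad to produce a distinctive result.
- Objective, not subjective: focus on the physical and perceptual characteristics of the voice rather than personal preference. For example, use "slightly high-pitched and energetic" instead of "my favorite voice".
- Original, not imitative: describe voice characteristics instead of asking to imitate a specific person (such as a celebrity or actor). Such requests carry copyright risk, and the model does not support direct imitation.
- Concise, not redundant: make every word count. Avoid repeating synonyms or using meaningless intensifiers (such as "a very, very nice voice").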
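| Dimension | Example descriptors |
|---|---|
| Pitch | high, medium, low, slightly high, slightly low |
| Speaking rate | fast, medium, slow, slightly fast, slightly slow |
| Emotion | cheerful, composed, gentle, serious, lively, calm, soothing |
| Characteristics | magnetic, crisp, husky, mellow, sweet, deep, powerful |
| Use | news broadcasting, ad voice-over, audiobooks, animated characters, voice assistants, documentary narration |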
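To follow the multi-dimensional guideline, you can assemble an instruction from one descriptor per dimension in the table. This is a hypothetical helper. The dimension keys mirror the table rows, and the ordering is a stylistic choice, not a model requirement.

```python
# Dimension keys mirror the table rows; the order is a stylistic choice.
DIMENSIONS = ("pitch", "speed", "emotion", "quality", "use")


def compose_instruction(**parts: str) -> str:
    """Join per-dimension descriptors into one instruction, rejecting unknown keys."""
    unknown = set(parts) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return ", ".join(parts[d] for d in DIMENSIONS if d in parts)


print(compose_instruction(pitch="slightly high pitch", speed="medium speed",
                          emotion="lively", use="suitable for advertising"))
# slightly high pitch, medium speed, lively, suitable for advertising
```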
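Example instructions:
- Standard broadcast style: clear, accurate pronunciation with well-rounded articulation
- Escalating emotion: volume rises quickly from normal conversation to shouting; a straightforward personality, easily excited and highly expressive
- Special emotional state: slightly slurred pronunciation from crying, a little hoarse, with a noticeably tense, tearful tone
- Advertising style: slightly high pitch, moderate speaking rate, full of energy and appeal, suited to advertising
- Gentle, soothing style: slightly slow speaking rate, gentle and sweet intonation, warm and caring like a close friend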
Voice customization
CosyVoice
Voice cloning: input audio requirements
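High-quality input audio is the foundation of a good clone.
| Item | Requirement |
|---|---|
| Supported formats | WAV (16-bit), MP3, M4A |
| Audio duration | Recommended: 10 to 20 seconds. Maximum: 60 seconds. |
| File size | ≤ 10 MB |
| Sample rate | ≥ 16 kHz |
| Channels | Mono or stereo. For stereo audio, only the first channel is processed, so make sure it contains a clear voice. |
| Content | The audio must contain at least 5 seconds of continuous, clear speech without background sound. Elsewhere, only brief pauses (≤ 2 seconds) are allowed. The whole recording should be free of background music, noise, and other voices so that the core speech is high quality. Use recordings of normal speech; do not upload songs or singing, so that the cloned voice is accurate and usable. |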
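You can check the format rules in the table locally before uploading. This sketch uses Python's standard wave module, so it handles only WAV input; MP3 and M4A would need a decoder. It treats 10 MB as 10 × 1024 × 1024 bytes, which is an assumption. It cannot judge content requirements such as background noise or the 5-second continuous-speech rule. check_clone_wav is a hypothetical name.

```python
import io
import wave

MIN_RATE = 16000
MAX_SECONDS = 60
MAX_BYTES = 10 * 1024 * 1024  # assumption: 10 MB = 10 MiB


def check_clone_wav(data: bytes) -> list:
    """Return problems with a WAV file against the table above (empty list = OK)."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file larger than 10 MB")
    with wave.open(io.BytesIO(data), "rb") as w:
        rate = w.getframerate()
        seconds = w.getnframes() / rate
        if w.getsampwidth() != 2:
            problems.append("WAV must be 16-bit")
        if rate < MIN_RATE:
            problems.append(f"sample rate {rate} Hz below 16 kHz")
        if seconds > MAX_SECONDS:
            problems.append(f"duration {seconds:.1f}s exceeds 60 s")
        elif not 10 <= seconds <= 20:
            problems.append(f"duration {seconds:.1f}s outside recommended 10-20 s")
    return problems
```

For a file on disk, call check_clone_wav(open(path, "rb").read()).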
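Voice design: writing a high-quality voice description
Constraints
When writing a voice description (voice_prompt), follow these technical constraints:
- Length limit: voice_prompt must not exceed 500 characters.
- Supported languages: the description text must be in Chinese or English.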
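A rough pre-flight check for these two constraints is sketched below. Python's len counts Unicode code points, which may not match how the service counts characters. The language test is only a heuristic: it accepts ASCII, CJK ideographs, and common CJK and full-width punctuation, and flags anything else (for example, Japanese kana) as a likely third language. check_voice_prompt is a hypothetical name.

```python
import re

# Heuristic character classes for Chinese/English text plus common punctuation
_ALLOWED = re.compile(r"^[\u0000-\u007F\u2000-\u206F\u3000-\u303F\u4E00-\u9FFF\uFF00-\uFFEF]*$")


def check_voice_prompt(prompt: str) -> list:
    """Return a list of likely constraint violations (empty list = looks fine)."""
    problems = []
    if len(prompt) > 500:
        problems.append(f"voice_prompt has {len(prompt)} characters; the limit is 500")
    if not _ALLOWED.match(prompt):
        problems.append("voice_prompt appears to contain text other than Chinese or English")
    return problems


print(check_voice_prompt("A calm, deep male voice with a steady speaking rate."))  # []
```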
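Core principles
voice_prompt guides the model to generate a voice with specific characteristics. When writing a voice description, follow these core principles:
- Specific, not vague: use words that describe concrete voice qualities, such as "deep", "crisp", or "slightly fast". Avoid subjective, uninformative words such as "nice-sounding" or "ordinary".
- Multi-dimensional, not one-dimensional: good descriptions combine several dimensions (such as gender, age, and emotion). A single-dimension description (such as just "female voice") is too broad to produce a distinctive result.
- Objective, not subjective: focus on the physical and perceptual characteristics of the voice rather than personal preference. For example, use "slightly high-pitched, energetic" instead of "my favorite voice".
- Original, not imitative: describe the qualities of the voice instead of asking to imitate a specific person (such as a celebrity or actor). Such requests carry copyright risk, and the model does not support direct imitation.
- Concise, not redundant: make every word count. Avoid repeating synonyms or using meaningless intensifiers (such as "a very, very great voice").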
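Description dimensions
| Dimension | Examples |
|---|---|
| Gender | male, female, gender-neutral |
| Age | child (5 to 12), teenager (13 to 18), young adult (19 to 35), middle-aged (36 to 55), elderly (over 55) |
| Pitch | high, medium, low, slightly high, slightly low |
| Speaking rate | fast, medium, slow, slightly fast, slightly slow |
| Emotion | cheerful, composed, gentle, serious, lively, cold, soothing |
| Quality | magnetic, crisp, husky, deep, sweet, rich, powerful |
| Use | news broadcasting, ad voice-over, audiobooks, animated characters, voice assistants, documentary narration |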
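Examples
Good examples:
- "A young, lively female voice with a fast speaking rate and a clearly rising intonation, suitable for introducing fashion products."
  - Analysis: combines age, personality, speaking rate, and intonation, and names a use case, forming a clear voice profile.
- "A composed middle-aged male voice, slightly slow, deep and magnetic, suitable for news broadcasting or documentary narration."
  - Analysis: clearly defines gender, age group, speaking rate, voice quality, and use.
- "A cute child's voice, a girl of about 8, speaking with a slightly childish tone, suitable for animated character dubbing."
  - Analysis: pins down the age and a vocal quality (childishness), with a clear use.
- "A gentle, intellectual woman of about 30 with a calm tone, suitable for reading audiobooks."
  - Analysis: words such as "intellectual" and "calm" effectively convey the emotion and style of the voice.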
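| Poor example | Main problem | Suggested improvement |
|---|---|---|
| "A nice-sounding voice" | Too vague and subjective; no actionable detail. | Add specific dimensions, e.g. "A young female voice with a bright timbre and a soft intonation." |
| "Sounds like a certain celebrity" | Copyright risk; the model does not support direct imitation. | Describe the voice characteristics instead, e.g. "A mature, magnetic male voice with a steady speaking rate." |
| "A very very very nice female voice" | Redundant; repeated words do not help define the voice. | Remove the repetition and add useful descriptors, e.g. "A female voice, age 20 to 24, with a light timbre, lively intonation, and sweet tone." |
| 123456 | Invalid input; cannot be parsed into voice characteristics. | Provide a meaningful text description; see the recommended examples above. |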
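Some of these problems can be caught mechanically before you submit a description. The hypothetical lint below uses simple heuristics. It flags input with no letters or Chinese characters, back-to-back repeated words or two-character Chinese words, and descriptions too short to cover several dimensions (the 10-character threshold is arbitrary). It cannot detect imitation requests such as the celebrity example.

```python
import re


def lint_voice_prompt(prompt: str) -> list:
    """Heuristic checks for the poor-example patterns in the table above."""
    issues = []
    text = prompt.strip()
    if not re.search(r"[A-Za-z\u4e00-\u9fff]", text):
        issues.append("no descriptive words (e.g. digits only)")
    # The same English word twice in a row, or a two-character Chinese word doubled
    if (re.search(r"\b(\w{2,})\s+\1\b", text, re.IGNORECASE)
            or re.search(r"([\u4e00-\u9fff]{2})\1", text)):
        issues.append("redundant repetition")
    if len(text) < 10:
        issues.append("too short to cover several dimensions")
    return issues


print(lint_voice_prompt("123456"))
# ['no descriptive words (e.g. digits only)', 'too short to cover several dimensions']
```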
Qwen-TTS-Realtime: Qwen3-TTS supports voice cloning (Qwen3-TTS-VC) and voice design (Qwen3-TTS-VD). For details, see the voice cloning guide.
API reference
- CosyVoice
- Qwen-TTS-Realtime
System voices
- CosyVoice
- Qwen-TTS-Realtime
Different models support different voices. In a request, set the voice request parameter to the value in the voice parameter column of the voice list.
| voice parameter | Details | Supported languages | Supported models |
|---|---|---|---|
| Cherry | Voice name: 芊悦. A sunny, positive, friendly and natural young woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18; Qwen-TTS-Realtime: qwen-tts-realtime |
| Serena | Voice name: 苏瑶. A gentle young woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27; Qwen-TTS-Realtime: qwen-tts-realtime |
| Ethan | Voice name: 晨煦. Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and youthful (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18; Qwen-TTS-Realtime: qwen-tts-realtime |
| Chelsie | Voice name: 千雪. An anime-style virtual girlfriend (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27; Qwen-TTS-Realtime: qwen-tts-realtime |
| Momo | Voice name: 茉兔. Playful and coquettish, here to cheer you up (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Vivian | Voice name: 十三. Sassy, cute, and a little hot-tempered (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Moon | Voice name: 月白. Free-spirited and handsome (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Maia | Voice name: 四月. Where intellect meets gentleness (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Kai | Voice name: 凯. A spa for your ears (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Nofish | Voice name: 不吃鱼. A designer who can't pronounce retroflex sounds (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Bella | Voice name: 萌宝. A cute little girl who drinks but never throws drunken-fist punches (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Jennifer | Voice name: 詹妮弗. A premium, cinematic American English female voice (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Ryan | Voice name: 甜茶. Full of rhythm and dramatic flair, where realism and intensity dance together (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Katerina | Voice name: 卡捷琳娜. A mature, commanding female voice with a memorable rhythm (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Aiden | Voice name: 艾登. An American guy who's great at cooking (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Eldric Sage | Voice name: 沧明子. A calm, wise elder, weathered like an old pine yet clear-minded as a mirror (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Mia | Voice name: 乖小妹. Gentle as spring water, sweet as first snow (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Mochi | Voice name: 沙小弥. A clever little grown-up, childlike yet wise beyond his years (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Bellona | Voice name: 燕铮莺. A loud, clear voice with crisp articulation and vivid characters that get your blood pumping; war epics come alive, and her precise delivery voices a whole world of characters (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Vincent | Voice name: 田叔. A distinctive husky, smoky voice that evokes vast armies and the heroic spirit of the jianghu the moment he speaks (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Bunny | Voice name: 萌小姬. An ultra-cute little girl (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Neil | Voice name: 阿闻. A flat, steady baseline intonation and precise, well-rounded articulation: the consummate professional news anchor (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Elias | Voice name: 墨讲师. Keeps academic rigor while using storytelling to turn complex knowledge into digestible chunks (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Arthur | Voice name: 须弥. A plain, deep voice steeped in years and everyday life, unhurriedly telling the village's strange tales (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Nini | Voice name: 年糕. Soft, sweet, and clingy like a little rice cake (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Ebona | Voice name: 黑羽. Her whisper is like a rusty key slowly turning in the darkest corner of your mind (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Seren | Voice name: 安睡. A gentle, soothing voice to help you fall asleep fast. Good night, sweet dreams (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Pip | Voice name: 皮蛋. A lively, mischievous little boy full of childlike curiosity (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Stella | Voice name: 星拉. A usually saccharine, bewildered young girl's voice that bursts with determination for love and justice when she shouts (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22; Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Bodega | Voice name: 波特加. A passionate, fiery Spanish man (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Sonrisa | Voice name: Sonrisa. A cheerful, outgoing Latin American woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Alek | Voice name: 阿列克. Cold as the soul of a warrior nation, warm as the lining of a wool coat (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Dolce | Voice name: 多尔切. A laid-back Italian man (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Sohee | Voice name: 素熙. A warm, cheerful, emotionally expressive young Korean woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Ono Anna | Voice name: 小野安娜. A clever, cute childhood friend (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Lenn | Voice name: Lenn. Rational at heart, rebellious in the details: a young German who wears a suit and listens to post-punk (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Emilien | Voice name: 埃米利安. A romantic French big brother (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Andre | Voice name: 安德烈. A magnetic, natural, composed male voice | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Radio Gol | Voice name: 拉迪奥·高尔. Radio Gol, the poet of football (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 |
| Jada | Voice name: 嘉嘉. Shanghainese: a fast-talking, energetic Shanghai auntie (female) | Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Dylan | Voice name: 迪伦. Beijing dialect: a young man who grew up in Beijing's hutongs (male) | Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Li | Voice name: 小李. Nanjing dialect: a patient yoga teacher (male) | Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Marcus | Voice name: 马库斯. Shaanxi dialect: broad face, few words, honest heart, deep voice; authentic Shaanxi flavor (male) | Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Roy | Voice name: 小罗. Hokkien: a humorous, cheerful, lively, and straightforward Taiwanese guy (male) | Hokkien (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Peter | Voice name: 彼得. Tianjin dialect: Tianjin-style crosstalk from a professional supporting performer (male) | Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Sunny | Voice name: 小阳. Sichuanese: a Sichuan girl as sweet as can be (female) | Sichuanese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Eric | Voice name: 埃里克. Sichuanese: a Chengdu native who stands out even in everyday life (male) | Sichuanese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Rocky | Voice name: 阿强. Cantonese: the witty, humorous host of A-Qiang's livestream (male) | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
| Kiki | Voice name: 琪琪. Cantonese: a sweet Hong Kong best friend (female) | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 |
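If your application lets users pick a voice, it can help to validate the voice/model pair before opening a session. This sketch hard-codes a small excerpt of the table above, using only the undated model aliases. Extend it with the voices and dated snapshots you actually use. VOICE_MODELS and voice_supported are hypothetical names.

```python
# Excerpt of the table above (undated model aliases only); extend as needed.
VOICE_MODELS = {
    "Cherry": {"qwen3-tts-instruct-flash-realtime", "qwen3-tts-flash-realtime", "qwen-tts-realtime"},
    "Momo": {"qwen3-tts-instruct-flash-realtime", "qwen3-tts-flash-realtime"},
    "Jennifer": {"qwen3-tts-flash-realtime"},
    "Dylan": {"qwen3-tts-flash-realtime"},
}


def voice_supported(voice: str, model: str) -> bool:
    """True if the excerpt lists the voice for the given model alias."""
    return model in VOICE_MODELS.get(voice, set())


print(voice_supported("Jennifer", "qwen3-tts-instruct-flash-realtime"))  # False
```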
FAQ
- CosyVoice
- Qwen-TTS-Realtime
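How do I fix inaccurate pronunciation, and how do I control the pronunciation of polyphonic characters?
- Replace the polyphonic character with a homophone to quickly fix the pronunciation.
- Use Speech Synthesis Markup Language (SSML) to control pronunciation. Both Sambert and CosyVoice support SSML.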
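How do I troubleshoot silent audio generated with a cloned voice?
- Check the voice status: call the CosyVoice voice cloning/design API and confirm that the voice status is OK.
- Check model consistency: make sure the target_model parameter used when cloning the voice is exactly the same as the model parameter used for synthesis. For example, if you cloned the voice with cosyvoice-v3-plus, you must also synthesize with cosyvoice-v3-plus.
- Verify the source audio quality: check that the source audio used for cloning meets the audio requirements (10 to 20 seconds long, clear, with no background noise).
- Check the request parameters: make sure the voice request parameter for synthesis is set to the ID of the cloned voice.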
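You can run the checklist above in code once you have fetched the voice's details. The sketch below is hypothetical. The keys status, target_model, and voice_id in voice_info are assumptions standing in for whatever fields your query of the voice cloning/design API returns. Only the status value OK and the model-consistency rule come from this page.

```python
def diagnose_clone(voice_info: dict, synth_model: str, synth_voice: str) -> list:
    """Return checklist failures for a cloned voice (hypothetical field names)."""
    problems = []
    if voice_info.get("status") != "OK":
        problems.append(f"voice status is {voice_info.get('status')!r}, expected 'OK'")
    if voice_info.get("target_model") != synth_model:
        problems.append(f"target_model {voice_info.get('target_model')!r} != model {synth_model!r}")
    if voice_info.get("voice_id") != synth_voice:
        problems.append("voice parameter does not match the cloned voice ID")
    return problems


info = {"status": "OK", "target_model": "cosyvoice-v3-plus", "voice_id": "v-123"}
print(diagnose_clone(info, "cosyvoice-v3-flash", "v-123"))
```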
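What should I do if synthesis with a cloned voice is unstable or the speech is incomplete?
If speech synthesized with a cloned voice plays incompletely, sounds unstable, or contains unexpected pauses or silent segments, the source audio may not meet the quality requirements. Solutions:
- Check audio continuity: make sure the speech in the source audio is continuous, with no long pauses or silent segments (over 2 seconds).
- Check the voice activity ratio: make sure effective speech makes up more than 60% of the total audio duration.
- Verify audio quality: the audio should be 10 to 20 seconds long (about 15 seconds recommended), with clear pronunciation, a steady speaking rate, and no background noise, echo, or other artifacts.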
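You can estimate the 60% voice-activity ratio and the 2-second pause rule with a simple energy-based check. The sketch below is a crude stand-in for a real voice activity detector. It frames 16-bit mono PCM samples into 20 ms windows and counts a frame as voiced when its mean absolute amplitude exceeds an arbitrary threshold. Tune the threshold for your recordings. speech_stats is a hypothetical name.

```python
import array
from typing import Tuple


def speech_stats(samples: array.array, sample_rate: int = 16000,
                 frame_ms: int = 20, threshold: int = 500) -> Tuple[float, float]:
    """Return (voiced_ratio, longest_silence_seconds) for 16-bit mono PCM samples."""
    frame_len = sample_rate * frame_ms // 1000
    voiced = total = run = longest = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        total += 1
        if sum(abs(s) for s in frame) / frame_len > threshold:
            voiced += 1
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    if total == 0:
        return 0.0, 0.0
    return voiced / total, longest * frame_ms / 1000


# 1 s of "speech", 3 s of silence, 1 s of "speech" at 16 kHz
loud = array.array("h", [3000, -3000] * 8000)
quiet = array.array("h", [0] * 48000)
ratio, gap = speech_stats(loud + quiet + loud)
print(ratio, gap)  # 0.4 3.0
```

This recording would fail both checks: the voiced ratio is below 0.6, and the longest silence exceeds 2 seconds.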
Q: How long is an audio file URL valid?
A: Audio file URLs expire after 24 hours.
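If you store audio URLs, record when you received each one so you can re-download or regenerate before it expires. Here is a minimal sketch. It assumes the 24-hour window starts when the URL is returned to you; the service's exact start time is not stated here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

URL_TTL = timedelta(hours=24)


def url_expired(received_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once 24 hours have passed since the audio URL was received."""
    now = now or datetime.now(timezone.utc)
    return now - received_at >= URL_TTL


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(url_expired(t0, t0 + timedelta(hours=23)))  # False
```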

