Create a cloned voice

POST

/services/audio/tts/customization

cURL

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "create",
    "target_model": "qwen3-tts-vc-realtime-2026-01-15",
    "preferred_name": "guanyu",
    "audio": {
      "data": "https://xxx.wav"
    }
  }
}'

{
  "output": {
    "voice": "qwen-tts-vc-guanyu-voice-20250812105009984-838b",
    "target_model": "qwen3-tts-vc-realtime-2026-01-15",
    "fallback_mode": false,
    "fallback_reason": ""
  },
  "usage": {
    "count": 1
  },
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

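model is the voice cloning model and is always qwen-voice-enrollment. target_model is the speech synthesis model that will synthesize speech with the cloned voice. The model used in later synthesis calls must be identical to the target_model given here; otherwise those calls fail.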
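The cURL request above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the API key is stored in the DASHSCOPE_API_KEY environment variable as in the cURL example. The helper names build_enrollment_request and create_voice are hypothetical and not part of any SDK.

```python
import json
import os
import urllib.request

ENDPOINT = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization"


def build_enrollment_request(api_key, target_model, preferred_name, audio_data):
    """Build (but do not send) the POST request shown in the cURL example."""
    payload = {
        "model": "qwen-voice-enrollment",  # fixed cloning model
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": audio_data},  # public URL or Data URL
        },
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_voice(target_model, preferred_name, audio_data):
    """Send the request and return the parsed JSON response (hypothetical helper)."""
    request = build_enrollment_request(
        os.environ["DASHSCOPE_API_KEY"], target_model, preferred_name, audio_data
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keep the target_model value you pass here, because the same value has to be used as model in later synthesis calls.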

Base64 encoding examples

Python:

import base64, pathlib

# Replace input.mp3 with the path to your audio file
file_path = pathlib.Path("input.mp3")
base64_str = base64.b64encode(file_path.read_bytes()).decode()
data_uri = f"data:audio/mpeg;base64,{base64_str}"

Java:

import java.nio.file.*;
import java.util.Base64;

public class Main {
  public static String toDataUrl(String filePath) throws Exception {
    byte[] bytes = Files.readAllBytes(Paths.get(filePath));
    String encoded = Base64.getEncoder().encodeToString(bytes);
    return "data:audio/mpeg;base64," + encoded;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(toDataUrl("input.mp3"));
  }
}

You must hold the legal ownership of, and the right to use, any audio you provide. Read the terms of service before using this API.

Authentication

Authorization

string

header

Required

Qwen Cloud API key, sent as Bearer $DASHSCOPE_API_KEY. See "Get an API key".

Request body

application/json

model

enum<string>

Required

The voice cloning model. Always qwen-voice-enrollment.

Allowed values: qwen-voice-enrollment

Example: qwen-voice-enrollment

input

object

Required

input.action

enum<string>

Required

The operation type. Always create.

Allowed values: create

Example: create

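input.target_model

enum<string>

Required

The speech synthesis model that the cloned voice is bound to. It must be the same model used in later synthesis calls. qwen3-tts-vc-realtime-2026-01-15 and qwen3-tts-vc-realtime-2025-11-27 are realtime models; qwen3-tts-vc-2026-01-22 is non-realtime.

Allowed values: qwen3-tts-vc-realtime-2026-01-15, qwen3-tts-vc-realtime-2025-11-27, qwen3-tts-vc-2026-01-22

Example: qwen3-tts-vc-realtime-2026-01-15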
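Because a mismatch between target_model and the later synthesis model makes the synthesis call fail, it can help to check the pair on the client before sending anything. The sketch below copies the three allowed values from the description above; the function name check_synthesis_model and the idea of a client-side check are illustrative assumptions, not part of the API.

```python
# Allowed target_model values from the parameter description.
# The boolean records whether the description marks the model as realtime.
TARGET_MODELS = {
    "qwen3-tts-vc-realtime-2026-01-15": True,
    "qwen3-tts-vc-realtime-2025-11-27": True,
    "qwen3-tts-vc-2026-01-22": False,
}


def check_synthesis_model(enrolled_target_model, synthesis_model):
    """Return True for a realtime model, False for non-realtime.

    Raises ValueError if the enrolled model is not an allowed value or if
    the synthesis model differs from the model the voice was enrolled for.
    """
    if enrolled_target_model not in TARGET_MODELS:
        raise ValueError(f"unknown target_model: {enrolled_target_model!r}")
    if synthesis_model != enrolled_target_model:
        raise ValueError(
            f"voice was enrolled for {enrolled_target_model!r}, "
            f"but synthesis uses {synthesis_model!r}"
        )
    return TARGET_MODELS[enrolled_target_model]
```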

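input.preferred_name

string

Required

A keyword to include in the voice name. Digits, letters, and underscores are allowed, up to 16 characters. The keyword appears in the generated voice name; for example, guanyu produces qwen-tts-vc-guanyu-voice-20250812105009984-838b.

Example: guanyu

Constraints: length <= 16; pattern: ^[a-zA-Z0-9_]+$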
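The length and pattern constraints can be checked locally before calling the API. This is a small sketch; the function name is_valid_preferred_name is hypothetical. It uses re.fullmatch rather than the anchored pattern because in Python `$` also matches just before a trailing newline.

```python
import re

# Same character class as the documented pattern ^[a-zA-Z0-9_]+$
_PREFERRED_NAME = re.compile(r"[a-zA-Z0-9_]+")


def is_valid_preferred_name(name):
    """Check the documented constraints: 1 to 16 letters, digits, or underscores."""
    return len(name) <= 16 and _PREFERRED_NAME.fullmatch(name) is not None
```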

input.audio

object

Required

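input.audio.data

string

Required

The audio to clone. Two formats are supported. Data URL: data:<mediatype>;base64,<data>, where <mediatype> is audio/wav, audio/mpeg, or audio/mp4 and the Base64-encoded data must be smaller than 10 MB. Audio URL: a publicly accessible URL that requires no authentication.

Example: https://xxx.wav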
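The Data URL rules above (allowed media types and the size limit on the encoded data) can be enforced while encoding. The sketch below makes two assumptions that the documentation does not state: that "10 MB" means 10 * 1024 * 1024 bytes, and that the media type can be inferred from the file extension. The function names are hypothetical.

```python
import base64
import pathlib

# Assumption: map file extensions to the three documented media types.
MEDIA_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg", ".mp4": "audio/mp4"}

# Assumption: "10 MB" is interpreted as 10 MiB.
MAX_BASE64_BYTES = 10 * 1024 * 1024


def encode_audio(raw, media_type):
    """Return a Data URL for raw audio bytes, enforcing the documented limits."""
    if media_type not in MEDIA_TYPES.values():
        raise ValueError(f"unsupported media type: {media_type!r}")
    encoded = base64.b64encode(raw).decode("ascii")
    # The limit applies to the Base64 text, which is about 4/3 of the file size.
    if len(encoded) >= MAX_BASE64_BYTES:
        raise ValueError("Base64-encoded audio must be smaller than 10 MB")
    return f"data:{media_type};base64,{encoded}"


def file_to_data_url(path):
    """Read an audio file and return its Data URL."""
    path = pathlib.Path(path)
    media_type = MEDIA_TYPES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError(f"cannot infer media type from {path.suffix!r}")
    return encode_audio(path.read_bytes(), media_type)
```

Since Base64 inflates data by roughly one third, a file of about 7.5 MB already reaches the limit; for larger recordings the public URL form is the alternative.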

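string

The text that corresponds to the audio content. The server checks how closely the text matches the audio and returns Audio.PreprocessError if they differ too much.

Example: Optional. The text that corresponds to the audio content.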

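enum<string>

The language of the audio. If specified, it must match the language actually spoken in the audio.

Allowed values: zh, en, de, it, pt, es, ja, ko, fr, ru

Example: zh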

Response

200-application/json

output

object

output.voice

string

The generated voice name. Pass this value as the voice parameter in synthesis calls.

Example: qwen-tts-vc-guanyu-voice-20250812105009984-838b

output.target_model

string

The speech synthesis model bound to this voice.

Example: qwen3-tts-vc-realtime-2026-01-15

output.fallback_mode

boolean

true when the audio does not fully match the model, which means the cloning quality may be poor.

Example: false

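output.fallback_reason

string

The reason for the fallback. Possible values: no_merged_segments (audio segments could not be merged), no_valid_asr_segments (no valid speech recognition segments). Returned only when fallback_mode is true.

Example: "" (empty string)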

usage

object

usage.count

integer

The number of voice creations billed. Always 1 for a successful creation ($0.01 per creation).

Example: 1

request_id

string

The request ID, used for troubleshooting.

Example: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
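Taken together, the response fields give the caller everything needed for the next step: the voice name to synthesize with, the model it is bound to, and a warning flag when the clone may be degraded. The following sketch shows one way to read a parsed response; the function name summarize_enrollment and the keys of the returned dictionary are hypothetical.

```python
def summarize_enrollment(response):
    """Extract the fields a caller typically needs from the parsed JSON response."""
    output = response["output"]
    return {
        # Pass this as the voice parameter in synthesis calls.
        "voice": output["voice"],
        # Use this as the model in synthesis calls.
        "target_model": output["target_model"],
        # True means the cloning quality may be poor.
        "degraded": bool(output.get("fallback_mode", False)),
        # An empty or missing reason is normalized to None.
        "fallback_reason": output.get("fallback_reason") or None,
        "billed_creations": response.get("usage", {}).get("count", 0),
        "request_id": response.get("request_id"),
    }
```

When degraded is true, consider re-recording the sample and creating the voice again, and keep request_id in your logs for troubleshooting.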