
Audio and video file translation

Supports translation in 18 languages.

Model information

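| Model | Version | Context window | Max input | Max output |
| --- | --- | --- | --- | --- |
| qwen3-livetranslate-flash | Stable | 53,248 tokens | 49,152 tokens | 4,096 tokens |
| qwen3-livetranslate-flash-2025-12-01 | Snapshot | 53,248 tokens | 49,152 tokens | 4,096 tokens |

qwen3-livetranslate-flash currently has the same capabilities as qwen3-livetranslate-flash-2025-12-01.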

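Quick start

Prerequisites

  1. Get an API key
  2. Set it as an environment variable (the examples below read DASHSCOPE_API_KEY)
  3. (Optional) If you use the OpenAI SDK, install the SDK
The following examples all use the OpenAI-compatible streaming API and set the source and target languages through translation_options. The default input is audio. To translate a video file, uncomment the video input block in each example.
Specifying source_lang can improve translation accuracy. If the parameter is omitted, the source language is detected automatically.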
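The optional nature of source_lang can be sketched with a small helper. This is only an illustration: build_translation_options is a hypothetical name, and treating target_lang as mandatory is an assumption, since the page only states that source_lang may be omitted.

```python
def build_translation_options(target_lang, source_lang=None):
    """Return a translation_options dict; leave out source_lang for auto-detection."""
    # Assumption: target_lang is always required; the page does not say so explicitly.
    if not target_lang:
        raise ValueError("target_lang is required")
    options = {"target_lang": target_lang}
    if source_lang:
        # Only include source_lang when the caller knows the input language.
        options["source_lang"] = source_lang
    return options


# Value to pass as extra_body in client.chat.completions.create(...)
extra_body = {"translation_options": build_translation_options("en", source_lang="zh")}
```

Calling build_translation_options("en") without a source language yields a dict containing only target_lang, which corresponds to the auto-detection case described above.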
  • Python
import os
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# --- Audio input ---
messages = [
  {
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "input_audio": {
          "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
          "format": "wav",
        },
      }
    ],
  }
]

# --- Video input (uncomment to use) ---
# messages = [
#     {
#         "role": "user",
#         "content": [
#             {
#                 "type": "video_url",
#                 "video_url": {
#                     "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
#                 },
#             }
#         ],
#     },
# ]

completion = client.chat.completions.create(
  model="qwen3-livetranslate-flash",
  messages=messages,
  modalities=["text", "audio"],
  audio={"voice": "Cherry", "format": "wav"},
  stream=True,
  stream_options={"include_usage": True},
  # translation_options is not a standard OpenAI parameter, so pass it through extra_body
  extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
  print(chunk)
The example above uses a publicly accessible file URL.
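The example prints each streamed chunk as-is. A rough sketch of how the pieces might be accumulated instead is shown here. It rests on several unconfirmed assumptions: that each chunk, converted to a plain dict, carries text in choices[0].delta.content, a transcript in choices[0].delta.audio.transcript, and Base64 audio in choices[0].delta.audio.data, and that each audio fragment decodes on its own. None of these field layouts is stated on this page, and collect_stream is a hypothetical helper name.

```python
import base64


def collect_stream(chunks):
    """Accumulate text, transcript, audio bytes and usage from chunk dicts."""
    text_parts, transcript_parts, audio_parts = [], [], []
    usage = None
    for chunk in chunks:
        # With include_usage enabled, usage is assumed to arrive on a late chunk.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            audio = delta.get("audio") or {}  # assumed field, not confirmed here
            if audio.get("transcript"):
                transcript_parts.append(audio["transcript"])
            if audio.get("data"):
                # Assumes every fragment is independently valid Base64.
                audio_parts.append(base64.b64decode(audio["data"]))
    return {
        "text": "".join(text_parts),
        "transcript": "".join(transcript_parts),
        "audio": b"".join(audio_parts),
        "usage": usage,
    }


# Possible use with the SDK stream, assuming chunks expose model_dump():
# result = collect_stream(chunk.model_dump() for chunk in completion)
```

Working on plain dicts keeps the helper independent of the SDK's response classes, so it can be checked against recorded chunks before being pointed at a live stream.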

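Sending a Base64-encoded local file

To translate a local audio file, first read the file and encode it as Base64. Pass the data as a data URI: data:audio/<format>;base64,<base64_data> (for example, data:audio/wav;base64,UklGRiQAAABXQVZFZm10...).
Supported audio formats: WAV, MP3, FLAC, AAC, OGG, OPUS, M4A, WMA, AMR. Sample rate: 8 kHz to 48 kHz.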
  • Python
import os
import base64
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# Read and Base64-encode the local audio file
with open("local_audio.wav", "rb") as f:
  audio_base64 = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
  model="qwen3-livetranslate-flash",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": f"data:audio/wav;base64,{audio_base64}",
            "format": "wav",
          },
        }
      ],
    }
  ],
  modalities=["text", "audio"],
  audio={"voice": "Cherry", "format": "wav"},
  stream=True,
  stream_options={"include_usage": True},
  extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
  print(chunk)
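The data URI format and the format and sample-rate limits listed above can be checked locally before a request is sent. The following is a minimal sketch, not an official utility: audio_to_data_uri is a hypothetical name, inferring the format from the file extension is an assumption, and only WAV files have their sample rate inspected (using the standard-library wave module).

```python
import base64
import wave
from pathlib import Path

# Formats listed in the text above, lowercased to compare with file extensions.
SUPPORTED_FORMATS = {"wav", "mp3", "flac", "aac", "ogg", "opus", "m4a", "wma", "amr"}
MIN_RATE_HZ, MAX_RATE_HZ = 8_000, 48_000


def audio_to_data_uri(path):
    """Return (data_uri, fmt) for a local audio file, or raise ValueError."""
    file_path = Path(path)
    # Assumption: the file extension names the audio format.
    fmt = file_path.suffix.lstrip(".").lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {fmt!r}")
    if fmt == "wav":
        # Only WAV headers are read here; other formats are passed through unchecked.
        with wave.open(str(file_path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
            raise ValueError(f"sample rate {rate} Hz is outside 8 kHz to 48 kHz")
    encoded = base64.b64encode(file_path.read_bytes()).decode("utf-8")
    return f"data:audio/{fmt};base64,{encoded}", fmt
```

The returned pair maps directly onto the "data" and "format" fields of the input_audio object used in the example above.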

List voices

Returns a paginated list of the voices under your account.

POST /services/audio/tts/customization
cURL
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-voice-design",
  "input": {
    "action": "list",
    "page_size": 10,
    "page_index": 0
  }
}'
Example response
{
  "output": {
    "page_index": 0,
    "page_size": 10,
    "total_count": 26,
    "voice_list": [
      {
        "voice": "qwen-tts-vd-announcer-voice-20251210170454-a1b2",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "language": "en",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "gmt_create": "2025-12-10 17:04:54",
        "gmt_modified": "2025-12-10 17:04:54"
      }
    ]
  },
  "usage": {},
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Authentication

Authorization (string, header, required)

Qwen Cloud API key, sent as Authorization: Bearer <API key>. See Get an API key.

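Request body

application/json

model (enum<string>, required)

Voice design model. The value is fixed to qwen-voice-design.

input (object, required)

Query parameters. In the example request it contains action, page_size, and page_index.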

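Response

200 - application/json

output (object)

  page_size
  Number of entries per page. Example: 10

  total_count (integer)
  Total number of voices under the account. Example: 26

  voice_list (object[])
  Array of voice objects.

usage (object)

Usage information (empty for query operations).

request_id (string)

Request ID, useful for troubleshooting. Example: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx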
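
Because the list is paginated, retrieving every voice means repeating the request with an increasing page_index. The sketch below shows one way this might be done with the Python standard library. It assumes page_index is zero-based and advances by one per page (the example request uses 0), and that total_count in the response is reliable enough to stop on. fetch_page and list_all_voices are hypothetical helper names, and error handling is omitted.

```python
import json
import os
import urllib.request

URL = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization"


def fetch_page(page_index, page_size=10):
    """POST one 'list' request and return the parsed JSON response."""
    payload = {
        "model": "qwen-voice-design",
        "input": {"action": "list", "page_size": page_size, "page_index": page_index},
    }
    request = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_all_voices(fetch=fetch_page, page_size=10):
    """Collect voice objects from every page until total_count is reached."""
    voices = []
    page_index = 0  # assumption: pages are numbered from 0
    while True:
        output = fetch(page_index, page_size)["output"]
        page = output.get("voice_list", [])
        voices.extend(page)
        # Also stop on an empty page so a wrong total_count cannot loop forever.
        if not page or len(voices) >= output.get("total_count", 0):
            return voices
        page_index += 1
```

Passing the fetch function as a parameter keeps the paging loop separate from the HTTP call, so the loop can be exercised with a stub before it is run against the real endpoint.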