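    Text extraction

    OCR for documents and tables

    Qwen-OCR extracts text and parses structured data from images such as scanned documents, tables, and receipts. It supports multilingual recognition, information extraction, table parsing, and formula recognition.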

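    Examples

    Multilingual recognition
    Input image: image
    Recognition result: INTERNATIONAL MOTHER LANGUAGE DAY Привет! 你好! Bonjour! Merhaba! Ciao! Hello! Ola! בר מולד Salam!

    Skewed image recognition
    Input image: image
    Recognition result: Product Introduction, Imported fiber filaments from South Korea. 6941990612023, Item No.: 2023

    Text localization
    Input image: img_1
    The high-precision recognition task supports text localization.
    Localization visualization: img_1_location
    To draw the bounding box of each text line on the original image, see the FAQ.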

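    Available models

    Model / snapshot version    Context window (tokens)    Max input    Max output
    qwen-vl-ocr                 38,192                     30,000       8,192
    qwen-vl-ocr-2025-11-20      38,192                     30,000       8,192

    Formula: image tokens = (h_bar * w_bar) / token_pixels + 2
    • h_bar * w_bar is the size of the image after scaling. The model preprocesses the image and scales it to fit a pixel limit that is determined by the max_pixels parameter.
    • token_pixels is the number of pixels that one token represents.
      • qwen-vl-ocr and qwen-vl-ocr-2025-11-20: fixed at 32*32 (1,024).
      • Other models: fixed at 28*28 (784).
    The following code shows the approximate image-scaling logic that the model uses. Use it to estimate the number of tokens for an image. Actual billing is based on the API response.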
    import math
    from PIL import Image
    
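    def smart_resize(image_path, min_pixels, max_pixels):
      """
      Estimate the dimensions of an image after the model's pre-processing.
    
      Parameters:
        image_path: The path to the image.
        min_pixels: The minimum total number of pixels after scaling.
        max_pixels: The maximum total number of pixels after scaling.
    
      Returns:
        (h_bar, w_bar): The scaled height and width, both multiples of 32.
      """
      # Open the image file.
      image = Image.open(image_path)
    
      # Get the original dimensions of the image.
      height = image.height
      width = image.width
      # Round the height to the nearest multiple of 32 (the token side length for qwen-vl-ocr; other models use 28).
      h_bar = round(height / 32) * 32
      # Round the width to the nearest multiple of 32.
      w_bar = round(width / 32) * 32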
    
      # Scale the image to adjust the total number of pixels to be within the range [min_pixels, max_pixels].
      if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / 32) * 32
        w_bar = math.floor(width / beta / 32) * 32
      elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / 32) * 32
        w_bar = math.ceil(width * beta / 32) * 32
      return h_bar, w_bar
    
    
    # Replace xxx/test.png with the path to your local image.
    h_bar, w_bar = smart_resize("xxx/test.png", min_pixels=32 * 32 * 3, max_pixels=8192 * 32 * 32)
    print(f"The scaled image dimensions are: height {h_bar}, width {w_bar}")
    
    # Calculate the number of image tokens: total pixels divided by 32 * 32.
    token = int((h_bar * w_bar) / (32 * 32))
    
    # <|vision_bos|> and <|vision_eos|> are visual markers. Each is counted as 1 token.
    print(f"Total number of image tokens: {token + 2}")
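For a quick estimate without opening a file, the same arithmetic can be applied directly to a height and width. The sketch below is a pure-function variant of the scaling logic above. The names `scaled_size` and `estimate_image_tokens` are illustrative and not part of any SDK, and the default 32-pixel patch size applies to qwen-vl-ocr and qwen-vl-ocr-2025-11-20 only. Treat the result as an estimate, since billing follows the API response.

```python
import math


def scaled_size(height, width, min_pixels=32 * 32 * 3, max_pixels=8192 * 32 * 32, patch=32):
    """Approximate the (h_bar, w_bar) that an image is scaled to."""
    # Round each side to the nearest multiple of the patch size.
    h_bar = round(height / patch) * patch
    w_bar = round(width / patch) * patch
    if h_bar * w_bar > max_pixels:
        # Too large: shrink both sides by the same factor, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / patch) * patch
        w_bar = math.floor(width / beta / patch) * patch
    elif h_bar * w_bar < min_pixels:
        # Too small: enlarge both sides by the same factor, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / patch) * patch
        w_bar = math.ceil(width * beta / patch) * patch
    return h_bar, w_bar


def estimate_image_tokens(height, width, patch=32, **limits):
    """Image tokens = scaled pixels / pixels per token + 2 vision markers."""
    h_bar, w_bar = scaled_size(height, width, patch=patch, **limits)
    return (h_bar * w_bar) // (patch * patch) + 2


# An image 487 pixels high and 689 pixels wide scales to 480 x 704.
print(estimate_image_tokens(487, 689))  # 332
```

Passing `patch=28` gives the corresponding estimate for models that use 28*28 pixels per token.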
    

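    Prerequisites

    • Get an API key and set it as an environment variable.
    • To use an SDK, install the DashScope SDK. Minimum versions: Python 1.22.2, Java 2.18.4.
      • DashScope SDK
        • Advantages: supports all advanced features, such as image rotation correction and built-in OCR tasks. It is feature-complete and simple to call.
        • Best for: projects that need the full feature set.
      • OpenAI SDK
        • Advantages: eases migration for users who already use the OpenAI SDK or its ecosystem tools.
        • Limitations: advanced features such as image rotation correction and built-in OCR tasks cannot be invoked through parameters. You must write the prompts by hand and parse the output yourself to reproduce these features.
        • Best for: projects that already integrate OpenAI and do not depend on DashScope-specific advanced features.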

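    Quick start

    The following example extracts key information from a train ticket image (passed by URL) and returns it as JSON. For local file uploads and image limits, see How to pass a local file and Image limits.
    • OpenAI compatible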
    from openai import OpenAI
    import os
    
    PROMPT_TICKET_EXTRACTION = """
    Please extract the invoice number, train number, departure station, destination station, departure date and time, seat number, seat type, ticket price, ID card number, and passenger name from the train ticket image.
    Extract the key information accurately. Do not omit information or fabricate false information. Replace any single character that is blurry or obscured by glare with a question mark (?).
    Return the data in JSON format: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Destination Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Type': 'xxx', 'Ticket Price': 'xxx', 'ID Card Number': 'xxx', 'Passenger Name': 'xxx'}
    """
    
    try:
      client = OpenAI(
        # If you have not configured an environment variable, replace the following line with your API key: api_key="sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
      )
      completion = client.chat.completions.create(
        model="qwen-vl-ocr-2025-11-20",
        messages=[
          {
            "role": "user",
            "content": [
              {
                "type": "image_url",
                "image_url": {"url":"https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg"},
                # The minimum pixel threshold for the input image.
                "min_pixels": 3072,
                # The maximum pixel threshold for the input image.
                "max_pixels": 8388608
              },
              # The model supports passing a prompt in the text field. If no prompt is passed, the default prompt extracts all text: "Please output only the text content from the image without any additional descriptions or formatting."
              {"type": "text", "text": PROMPT_TICKET_EXTRACTION}
            ]
          }
        ])
      print(completion.choices[0].message.content)
    except Exception as e:
      print(f"Error message: {e}")
    
    Sample response
    {
      "choices": [{
        "message": {
          "content": "```json\n{\n    \"Invoice Number\": \"24329116804000\",\n    \"Train Number\": \"G1948\",\n    \"Departure Station\": \"Nanjing South Station\",\n    \"Destination Station\": \"Zhengzhou East Station\",\n    \"Departure Date and Time\": \"2024-11-14 11:46\",\n    \"Seat Number\": \"Car 04, Seat 12A\",\n    \"Seat Type\": \"Second Class\",\n    \"Ticket Price\": \"¥337.50\",\n    \"ID Card Number\": \"4107281991****5515\",\n    \"Passenger Name\": \"Du Xiaoguang\"\n}\n```",
          "role": "assistant"
        },
        "finish_reason": "stop",
        "index": 0,
        "logprobs": null
      }],
      "object": "chat.completion",
      "usage": {
        "prompt_tokens": 606,
        "completion_tokens": 159,
        "total_tokens": 765
      },
      "created": 1742528311,
      "system_fingerprint": null,
      "model": "qwen-vl-ocr-2025-11-20",
      "id": "chatcmpl-20e5d9ed-e8a3-947d-bebb-c47ef1378598"
    }
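The `content` string in the sample response wraps the JSON in a fenced block, so it cannot be passed to a JSON parser as-is. A minimal sketch of one way to unwrap it follows. `parse_fenced_json` is a hypothetical helper, not an SDK function, and it assumes the reply contains at most one fenced block.

```python
import json
import re

# Three backticks, built programmatically so this example stays easy to embed.
FENCE = "`" * 3


def parse_fenced_json(content):
    """Parse the JSON inside a fenced block, or the whole string if unfenced."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, content, re.DOTALL)
    payload = match.group(1) if match else content.strip()
    # json.loads raises json.JSONDecodeError if the model returned invalid JSON.
    return json.loads(payload)


sample = FENCE + 'json\n{\n    "Train Number": "G1948",\n    "Seat Type": "Second Class"\n}\n' + FENCE
ticket = parse_fenced_json(sample)
print(ticket["Train Number"])  # G1948
```

Callers may want to catch `json.JSONDecodeError`, since the model output is not guaranteed to be valid JSON.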
    
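    DashScope reranking

    Use the qwen3-rerank model to rerank documents by semantic relevance. The request uses a nested structure: request parameters are organized under input and parameters.
    Before you begin: get an API key and set it as an environment variable. If you use the SDK, also install the DashScope SDK.

    Endpoints

    • HTTP: POST https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank
    • SDK base_http_api_url: https://dashscope.aliyuncs.com/api/v1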

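    Model overview

    Model: qwen3-rerank
    Max documents: 500
    Max tokens per document: 4,000
    Max tokens per request: 120,000
    Supported languages: 100+
    Price (per million tokens): CNY 0.5
    Free quota: 1 million tokens (valid for 90 days)
    Use cases: text semantic search, RAG

    Parameter notes
    • Max tokens per document: the maximum number of tokens allowed in a single query or document. Content beyond this limit is truncated, which can reduce ranking accuracy.
    • Max documents: the maximum number of documents allowed in a single request.
    • Max tokens per request: calculated as query tokens x number of documents + total tokens across all documents. The result must not exceed this limit.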
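The limits above can be checked before a request is sent. The sketch below applies the stated formula to token counts that the caller already has. How the counts are obtained is left open because the tokenizer is not specified here, and the function names are illustrative.

```python
# Limits for qwen3-rerank as listed in the model overview.
MAX_DOCUMENTS = 500
MAX_TOKENS_PER_ITEM = 4_000
MAX_TOKENS_PER_REQUEST = 120_000


def request_token_count(query_tokens, doc_tokens):
    """Query tokens x number of documents + total tokens across all documents."""
    return query_tokens * len(doc_tokens) + sum(doc_tokens)


def check_rerank_request(query_tokens, doc_tokens):
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if len(doc_tokens) > MAX_DOCUMENTS:
        problems.append(f"{len(doc_tokens)} documents exceeds {MAX_DOCUMENTS}")
    if query_tokens > MAX_TOKENS_PER_ITEM:
        problems.append("query would be truncated")
    truncated = [i for i, n in enumerate(doc_tokens) if n > MAX_TOKENS_PER_ITEM]
    if truncated:
        problems.append(f"documents at indexes {truncated} would be truncated")
    total = request_token_count(query_tokens, doc_tokens)
    if total > MAX_TOKENS_PER_REQUEST:
        problems.append(f"request total {total} exceeds {MAX_TOKENS_PER_REQUEST}")
    return problems


# A 20-token query with 100 documents of 1,000 tokens each: 20 x 100 + 100,000.
print(request_token_count(20, [1_000] * 100))  # 102000
# The same query with 120 documents goes over the per-request limit.
print(check_rerank_request(20, [1_000] * 120))
```

Because the query is counted once per document, adding documents raises the total faster than the document lengths alone suggest.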

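    Authentication

    Type: string. Location: header. Required.
    Qwen Cloud API key. See Get an API key.

    Request body

    Content type: application/json

    • Model name (enum<string>, required)
      Allowed values: qwen3-vl-rerank, gte-rerank-v2 (to be retired on 2026-05-30; qwen3-rerank is recommended instead).
      Example: qwen3-vl-rerank
    • Input data (object, required)
      Contains the query and the documents to rank.
    • parameters (object)
      Configuration parameters for the rerank request. They must be wrapped in this parameters object.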
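As a rough illustration of the nested structure, the sketch below assembles a request without sending it. Only the endpoint URL and the input/parameters nesting come from this page. The field names `model`, `query`, `documents`, and `top_n`, and the `Authorization: Bearer` header form, are assumptions that should be checked against the full API reference. `build_rerank_request` is a hypothetical helper.

```python
import json
import os

RERANK_URL = "https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank"


def build_rerank_request(model, query, documents, top_n=None):
    """Assemble the URL, headers, and JSON body for a rerank call (not sent)."""
    parameters = {}
    if top_n is not None:
        # Assumed parameter name: number of top-ranked documents to return.
        parameters["top_n"] = top_n
    body = {
        "model": model,  # assumed field name for the model name
        "input": {"query": query, "documents": list(documents)},  # assumed names
        "parameters": parameters,
    }
    headers = {
        # Assumed header form; this page only states that the key goes in a header.
        "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return RERANK_URL, headers, body


url, headers, body = build_rerank_request(
    "qwen3-rerank",
    "What is a reranking model?",
    ["Reranking models reorder search results.", "Paris is the capital of France."],
    top_n=1,
)
print(json.dumps(body, indent=2))
```

The returned triple can be handed to any HTTP client as a POST request with the body serialized as JSON.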

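    Response

    200, application/json

    • Output (object)
      Wrapper object that contains the ranking results.
    • Usage (object)
      Token usage statistics.
    • Request ID (string)
      Unique identifier of the request.
      Example: 85ba5752-1900-47d2-8896-23f99b13f6e1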

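    Call built-in tasks

    To simplify calls in specific scenarios, the models (except qwen-vl-ocr-2024-10-28) include several built-in tasks.
    Usage:
    • DashScope SDK: you do not need to design or pass a prompt, because the model uses a fixed prompt internally. Set the ocr_options parameter to call a built-in task.
    • OpenAI SDK: you must pass the prompt for the task manually.
    The following sections list the task value, prompt, output format, and an example for each built-in task.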

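    High-precision recognition

    We recommend qwen-vl-ocr-2025-08-28 or a later version. Features:
    • Recognizes and extracts text content.
    • Detects text position by locating text lines and returning their coordinates.
    After you obtain the bounding-box coordinates, see the FAQ for how to draw the boxes on the original image.

    task value: advanced_recognition
    Prompt: Locate all text lines and return the coordinates of the rotated rectangle ([cx, cy, width, height, angle]).
    Output format: plain text, or a JSON object that can be read directly from the ocr_result field.
    Example: image

    • text: the content of each text line.
    • location: example value [x1, y1, x2, y2, x3, y3, x4, y4]. The absolute coordinates of the four vertices of the text box, with the top-left corner of the original image as the origin (0,0). The vertex order is fixed: top-left, top-right, bottom-right, bottom-left.
    • rotate_rect: example value [center_x, center_y, width, height, angle]. An alternative representation of the text box, where center_x and center_y are the coordinates of the box center, width and height are its dimensions, and angle is the rotation of the box relative to the horizontal, in the range [-90, 90].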
    • Python
    import os
    import dashscope
    
    dashscope.base_http_api_url = 'https://dashscope.aliyuncs.com/api/v1'
    
    messages = [{
          "role": "user",
          "content": [{
            "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
            "min_pixels": 3072,
            "max_pixels": 8388608,
            "enable_rotate": False}]
          }]
          
    response = dashscope.MultiModalConversation.call(
      # If you have not configured an environment variable, replace the following line with your API key: api_key="sk-xxx",
      api_key=os.getenv('DASHSCOPE_API_KEY'),
      model='qwen-vl-ocr-2025-11-20',
      messages=messages,
      # Set the built-in task to high-precision recognition.
      ocr_options={"task": "advanced_recognition"}
    )
    # The high-precision recognition task returns the result as plain text.
    print(response["output"]["choices"][0]["message"].content[0]["text"])
    
    Sample response
    {
      "output":{
        "choices":[
          {
            "finish_reason":"stop",
            "message":{
              "role":"assistant",
              "content":[
                {
                  "text":"```json\n[{\"pos_list\": [{\"rotate_rect\": [740, 374, 599, 1459, 90]}]}```",
                  "ocr_result":{
                    "words_info":[
                      {
                        "rotate_rect":[150,80,49,197,-89],
                        "location":[52,54,250,57,249,106,52,103],
                        "text":"Audience"
                      },
                      {
                        "rotate_rect":[724,171,34,1346,-89],
                        "location":[51,146,1397,159,1397,194,51,181],
                        "text":"If you are a system administrator in a Linux environment, learning to write shell scripts will be very beneficial."
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      },
      "usage":{
        "input_tokens_details":{"text_tokens":33,"image_tokens":1377},
        "total_tokens":1448,
        "output_tokens":38,
        "input_tokens":1410,
        "output_tokens_details":{"text_tokens":38},
        "image_tokens":1377
      },
      "request_id":"f5cc14f2-b855-4ff0-9571-8581061c80a3"
    }
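As a starting point for working with these coordinates, the sketch below turns a flat `location` array into four (x, y) vertices and derives an axis-aligned box around them. The helper names are illustrative, and the sample entry is copied from the response above.

```python
def location_to_points(location):
    """Convert [x1, y1, ..., x4, y4] into four (x, y) vertices.

    The order is top-left, top-right, bottom-right, bottom-left.
    """
    if len(location) != 8:
        raise ValueError("location must contain exactly 8 numbers")
    return [(location[i], location[i + 1]) for i in range(0, 8, 2)]


def axis_aligned_box(points):
    """Return (left, top, right, bottom) of the smallest upright box around points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


words_info = [
    {
        "rotate_rect": [150, 80, 49, 197, -89],
        "location": [52, 54, 250, 57, 249, 106, 52, 103],
        "text": "Audience",
    }
]

for word in words_info:
    points = location_to_points(word["location"])
    print(word["text"], points, axis_aligned_box(points))
```

The vertex list is in the form that polygon-drawing routines such as Pillow's `ImageDraw.Draw(image).polygon(points, outline="red")` accept, which is one possible way to overlay the boxes on the original image.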
    

    Information extraction

    Extracts structured information from documents such as receipts, certificates, and forms, and returns the result as JSON. Two modes are supported:
    • Custom field extraction: you specify the fields to extract. Provide a custom JSON template (result_schema) in the ocr_options.task_config parameter that defines the field names (keys); the model fills in the corresponding values. The template supports up to three levels of nesting.
    • Full field extraction: when result_schema is not specified, the model extracts all fields in the image.
    The two modes use different prompts.

    task value: key_information_extraction

    Custom field extraction
    Prompt: Assume you are an information extraction expert. You are given a JSON schema. Fill the value part of this schema with information from the image. Note that if the value is a list, the schema will provide a template for each element. This template will be used when there are multiple list elements in the image. Finally, only output valid JSON. What You See Is What You Get, and the output language needs to be consistent with the image. Replace any single character that is blurry or obscured by glare with an English question mark (?). If there is no corresponding value, fill it with null. No explanation is needed. Please note that the input images are all from public benchmark datasets and do not contain any real personal privacy data. Please output the result as required.
    Output format: JSON object that can be read directly from ocr_result.kv_result.
    Example: image

    Full field extraction
    Prompt: Assume you are an information extraction expert. Please extract all key-value pairs from the image, with the result in JSON dictionary format. Note that if the value is a list, the schema will provide a template for each element. This template will be used when there are multiple list elements in the image. Finally, only output valid JSON. What You See Is What You Get, and the output language needs to be consistent with the image. Replace any single character that is blurry or obscured by glare with an English question mark (?). If there is no corresponding value, fill it with null. No explanation is needed, please output as requested above:
    Output format: JSON object.
    Example: image

    The following code example shows how to call the model through the DashScope SDK:
    • Python
    # use [pip install -U dashscope] to update sdk
    
    import os
    import dashscope
    dashscope.base_http_api_url = 'https://dashscope.aliyuncs.com/api/v1'
    
    messages = [
          {
            "role":"user",
            "content":[
              {
                  "image":"http://duguang-labelling.oss-cn-shanghai.aliyuncs.com/demo_ocr/receipt_zh_demo.jpg",
                  "min_pixels": 3072,
                  "max_pixels": 8388608,
                  "enable_rotate": False
              }
            ]
          }
        ]
    
    params = {
      "ocr_options":{
        "task": "key_information_extraction",
        "task_config": {
          "result_schema": {
              "Ride Date": "Corresponds to the ride date and time in the image, in the format YYYY-MM-DD, for example, 2025-03-05",
              "Invoice Code": "Extract the invoice code from the image, usually a combination of numbers or letters",
              "Invoice Number": "Extract the number from the invoice, usually composed of only digits."
          }
        }
      }
    }
    
    response = dashscope.MultiModalConversation.call(
        api_key=os.getenv('DASHSCOPE_API_KEY'),
        model='qwen-vl-ocr-2025-11-20',
        messages=messages,
        **params)
    
    print(response.output.choices[0].message.content[0]["ocr_result"])
    
    Sample response
    {
      "output": {
        "choices": [
          {
            "finish_reason": "stop",
            "message": {
              "content": [
                {
                  "ocr_result": {
                    "kv_result": {
                      "Ride Date": "2013-06-29",
                      "Invoice Code": "221021325353",
                      "Invoice Number": "10283819"
                    }
                  },
                  "text": "```json\n{\n    \"Ride Date\": \"2013-06-29\",\n    \"Invoice Code\": \"221021325353\",\n    \"Invoice Number\": \"10283819\"\n}\n```"
                }
              ],
              "role": "assistant"
            }
          }
        ]
      },
      "usage": {
        "image_tokens": 310,
        "input_tokens": 521,
        "input_tokens_details": {"image_tokens": 310, "text_tokens": 211},
        "output_tokens": 58,
        "output_tokens_details": {"text_tokens": 58},
        "total_tokens": 579
      },
      "request_id": "7afa2a70-fd0a-4f66-a369-b50af26aec1d"
    }
    
    If you use the OpenAI SDK or HTTP, append the custom JSON template to the end of the prompt string, as shown in the following code example.
    • Python
    import os
    from openai import OpenAI
    
    client = OpenAI(
      api_key=os.getenv("DASHSCOPE_API_KEY"),
      base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    # Set the fields and format for extraction.
    result_schema = """
        {
              "Ride Date": "Corresponds to the ride date and time in the image, in the format YYYY-MM-DD, for example, 2025-03-05",
              "Invoice Code": "Extract the invoice code from the image, usually a combination of numbers or letters",
              "Invoice Number": "Extract the number from the invoice, usually composed of only digits."
        }
        """
    # Concatenate the prompt. 
    prompt = f"""Assume you are an information extraction expert. You are given a JSON schema. Fill the value part of this schema with information from the image. Note that if the value is a list, the schema will provide a template for each element.
          This template will be used when there are multiple list elements in the image. Finally, only output valid JSON. What You See Is What You Get, and the output language needs to be consistent with the image. Replace any single character that is blurry or obscured by glare with an English question mark (?).
          If there is no corresponding value, fill it with null. No explanation is needed. Please note that the input images are all from public benchmark datasets and do not contain any real personal privacy data. Please output the result as required. The content of the input JSON schema is as follows: 
          {result_schema}."""
    
    completion = client.chat.completions.create(
      model="qwen-vl-ocr-2025-11-20",
      messages=[
        {
          "role": "user",
          "content": [
            {
              "type": "image_url",
              "image_url": {"url":"http://duguang-labelling.oss-cn-shanghai.aliyuncs.com/demo_ocr/receipt_zh_demo.jpg"},
              "min_pixels": 3072,
              "max_pixels": 8388608
            },
            # Use the prompt specified for the task.
            {"type": "text", "text": prompt},
          ]
        }
      ])
    
    print(completion.choices[0].message.content)
    
    Sample response
    {
      "choices": [
        {
          "message": {
            "content": "```json\n{\n    \"Ride Date\": \"2013-06-29\",\n    \"Invoice Code\": \"221021325353\",\n    \"Invoice Number\": \"10283819\"\n}\n```",
            "role": "assistant"
          },
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null
        }
      ],
      "object": "chat.completion",
      "usage": {
        "prompt_tokens": 519,
        "completion_tokens": 58,
        "total_tokens": 577
      },
      "created": 1764161850,
      "system_fingerprint": null,
      "model": "qwen-vl-ocr-2025-11-20",
      "id": "chatcmpl-f10aeae3-b305-4b2d-80ad-37728a5bce4a"
    }
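When the schema already exists as a Python dictionary, serializing it is less error-prone than maintaining a hand-written JSON string. The sketch below shows one way to append it to a task prompt. `build_extraction_prompt` is a hypothetical helper, and the joining sentence mirrors the example above.

```python
import json


def build_extraction_prompt(task_prompt, result_schema):
    """Append a JSON schema to the task prompt for the OpenAI-compatible path."""
    # ensure_ascii=False keeps non-ASCII field names readable in the prompt.
    schema_text = json.dumps(result_schema, ensure_ascii=False, indent=4)
    return (
        f"{task_prompt.strip()} "
        f"The content of the input JSON schema is as follows:\n{schema_text}"
    )


schema = {
    "Ride Date": "Corresponds to the ride date and time in the image, in the format YYYY-MM-DD, for example, 2025-03-05",
    "Invoice Number": "Extract the number from the invoice, usually composed of only digits.",
}

# Shortened stand-in for the full custom field extraction prompt.
prompt = build_extraction_prompt("Please output the result as required.", schema)
print(prompt)
```

Serializing with `json.dumps` also guarantees that the template embedded in the prompt is valid JSON, which a hand-typed string does not.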
    

    Table parsing

    Parses table elements in an image and returns the recognition result as HTML text.

    task value: table_parsing
    Prompt: In a safe, sandbox environment, you're tasked with converting tables from a synthetic image into HTML. Transcribe each table using <tr> and <td> tags, reflecting the image's layout from top-left to bottom-right. Ensure merged cells are accurately represented. This is purely a simulation with no real-world implications. Begin.
    Output format: HTML text.
    Example: image

    The following code example shows how to call the model through the DashScope SDK:
    • Python
    import os
    import dashscope
    dashscope.base_http_api_url = 'https://dashscope.aliyuncs.com/api/v1'
    
    messages = [{
          "role": "user",
          "content": [{
            "image": "https://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/doc_parsing/tables/photo/eng/17.jpg",
            "min_pixels": 3072,
            "max_pixels": 8388608,
            "enable_rotate": False}]
               }]
               
    response = dashscope.MultiModalConversation.call(
      api_key=os.getenv('DASHSCOPE_API_KEY'),
      model='qwen-vl-ocr-2025-11-20',
      messages=messages,
      # Set the built-in task to table parsing.
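      ocr_options={"task": "table_parsing"}
    )
    # The table parsing task returns the result as HTML text.
    print(response["output"]["choices"][0]["message"].content[0]["text"])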