Qwen-OCR text extraction model - Qwen AI Platform

POST

/compatible-mode/v1/chat/completions

from openai import OpenAI
import os

PROMPT_TICKET_EXTRACTION = """
Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image.
You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?).
Return the data in JSON format as follows: {'invoice_number': 'xxx', 'train_number': 'xxx', 'departure_station': 'xxx', 'arrival_station': 'xxx', 'departure_date_and_time': 'xxx', 'seat_number': 'xxx', 'seat_class': 'xxx', 'ticket_price': 'xxx', 'id_card_number': 'xxx', 'passenger_name': 'xxx'}.
"""

try:
  client = OpenAI(
    # If the environment variable is not configured, replace the next line with your API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
  )
  completion = client.chat.completions.create(
    model="qwen3.5-ocr",
    messages=[
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {"url":"https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg"},
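            # Minimum pixel threshold for the input image. If the image has fewer pixels than this value, it is upscaled until the total pixel count exceeds min_pixels.
            "min_pixels": 32 * 32 * 3,
            # Maximum pixel threshold for the input image. If the image has more pixels than this value, it is downscaled until the total pixel count falls below max_pixels.
            "max_pixels": 32 * 32 * 8192
          },
          # The model accepts a prompt in the text field below. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
          {"type": "text", "text": PROMPT_TICKET_EXTRACTION}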
        ]
      }
    ])
  print(completion.choices[0].message.content)
except Exception as e:
  print(f"Error message: {e}")
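The prompt in the example above asks for a JSON object but shows the template with single quotes, so the reply may not be strict JSON. The following is a minimal sketch of one way to read such a reply, assuming it is either strict JSON or a Python-style dict literal, possibly wrapped in a Markdown code fence. `parse_ticket_reply` and `uncertain_fields` are hypothetical helper names, not part of any SDK.

```python
import ast
import json

FENCE = "`" * 3  # Markdown code fence marker


def _strip_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence, if present."""
    text = text.strip()
    if len(text) >= 6 and text.startswith(FENCE) and text.endswith(FENCE):
        body = text[3:-3]
        first, _, rest = body.partition("\n")
        # The first line may carry a language tag such as "json".
        if rest and not first.strip().startswith("{"):
            body = rest
        return body.strip()
    return text


def parse_ticket_reply(reply: str) -> dict:
    """Parse the model reply into a dict (hypothetical helper)."""
    text = _strip_fence(reply).rstrip().rstrip(",")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to a Python literal for single-quoted output.
        data = ast.literal_eval(text)
    if not isinstance(data, dict):
        raise ValueError("expected an object with the ticket fields")
    return data


def uncertain_fields(data: dict) -> list:
    """Return the names of fields that contain a '?' placeholder."""
    return [key for key, value in data.items() if "?" in str(value)]
```

Fields returned by `uncertain_fields` contain at least one character that the model marked as unreadable, so they are candidates for manual review.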

{
  "id": "<string>",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": {},
      "message": {
        "content": "<string>",
        "processed_text": "<string>",
        "refusal": "<string>",
        "role": "assistant",
        "audio": {},
        "function_call": {},
        "tool_calls": [],
        "annotations": null
      }
    }
  ],
  "created": 0,
  "model": "<string>",
  "object": "chat.completion",
  "service_tier": "<string>",
  "system_fingerprint": "<string>",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "text_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0,
      "image_tokens": 0,
      "text_tokens": 0
    }
  }
}

Authentication

Authorization

string

header

Required

Qwen AI Platform API key. For details, see Get an API Key.
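For callers that do not use the OpenAI SDK, the sketch below builds the raw HTTP request with the standard library. It assumes the key is sent in the `Authorization` header using the Bearer scheme that OpenAI-compatible endpoints conventionally use, and that the full URL is the `base_url` from the example joined with `/chat/completions`. `build_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed full endpoint: base_url from the example plus /chat/completions.
ENDPOINT = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request carrying the API key as a Bearer token."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed Bearer scheme
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would look like `send(build_request(payload, os.getenv("DASHSCOPE_API_KEY")))`, where `payload` holds `model` and `messages` as described in the request body section.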

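Request body

application/json

model

string

Required

Model name. For the list of supported models, see Qwen-OCR.

Example: qwen3.5-ocr

messages

object[]

Required

The sequence of messages that provides context to the model, in conversation order.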

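messages.role

enum<string>

Required

Role of the user message. The value must be user.

Allowed values: user

messages.content

object[]

Required

Message content.

messages.content.type

enum<string>

Content type. Use text for text input and image_url for image input.

Allowed values: text, image_url

messages.content.text

string

Input text. Default: Please output only the text content from the image without any additional descriptions or formatting.

messages.content.image_url

object

Information about the input image. Required when type is image_url.

messages.content.image_url.url

string

Required

URL of the image, or a Base64-encoded data URL. For more information about passing local files, see Text extraction.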
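As an illustration of the Base64 option, the sketch below turns a local image file into a data URL and wraps it in an `image_url` content item. It assumes the MIME type can be inferred from the file extension. `to_data_url` and `image_part` are hypothetical helper names.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode a local image file as a Base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_part(path_or_url: str) -> dict:
    """Build an image_url content item from a URL or a local path."""
    if path_or_url.startswith(("http://", "https://", "data:")):
        url = path_or_url
    else:
        url = to_data_url(path_or_url)
    return {"type": "image_url", "image_url": {"url": url}}
```

The returned dict can be placed in the `content` array of a user message in the same position as the URL-based item in the request example.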

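messages.content.min_pixels

integer

Minimum pixel threshold for the input image, in pixels. If the input image has fewer pixels than min_pixels, it is upscaled until the total pixel count exceeds min_pixels.

Conversion between image tokens and pixels:

qwen3.5-ocr, qwen-vl-ocr-latest: each token corresponds to 32×32 pixels.
qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier versions: each token corresponds to 28×28 pixels.

Value range:

qwen3.5-ocr, qwen-vl-ocr-latest: the default and minimum value is 3072 (3×32×32).
qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier versions: the default and minimum value is 3136 (4×28×28).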

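messages.content.max_pixels

integer

Maximum pixel threshold for the input image, in pixels. If the pixel count of the input image is within [min_pixels, max_pixels], the model processes the original image without scaling. If the pixel count exceeds max_pixels, the image is downscaled until the pixel count is below max_pixels.

Conversion between image tokens and pixels:

qwen3.5-ocr, qwen-vl-ocr-latest: each token corresponds to 32×32 pixels.
qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier versions: each token corresponds to 28×28 pixels.

Value range:

qwen3.5-ocr, qwen-vl-ocr-latest: the default value is 8388608 (8192×32×32) and the maximum value is 30720000 (30000×32×32).
qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier versions: the default value is 6422528 (8192×28×28) and the maximum value is 23520000 (30000×28×28).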
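The thresholds above are all of the form tokens × side × side, where side is 32 or 28 depending on the model. The sketch below reproduces that arithmetic and reports whether an image of a given size would be rescaled. The model-to-side table lists only the model names mentioned above, and the helper names are hypothetical.

```python
# Side length (in pixels) of the square patch that maps to one image token.
PATCH_SIDE = {
    "qwen3.5-ocr": 32,
    "qwen-vl-ocr-latest": 32,
    "qwen-vl-ocr": 28,
    "qwen-vl-ocr-2025-08-28": 28,
}


def pixel_budget(model: str, tokens: int) -> int:
    """Convert a token count into a pixel threshold for the given model."""
    side = PATCH_SIDE[model]
    return tokens * side * side


def resize_action(width: int, height: int, min_pixels: int, max_pixels: int) -> str:
    """Report how an image of width x height relates to the thresholds."""
    pixels = width * height
    if pixels < min_pixels:
        return "upscale"
    if pixels > max_pixels:
        return "downscale"
    return "unchanged"


# Defaults for qwen3.5-ocr, derived from 3 and 8192 tokens.
DEFAULT_MIN = pixel_budget("qwen3.5-ocr", 3)
DEFAULT_MAX = pixel_budget("qwen3.5-ocr", 8192)
```

For example, the 689×487 ticket image used in the request example has 335543 pixels, which lies between the qwen3.5-ocr defaults, so `resize_action(689, 487, DEFAULT_MIN, DEFAULT_MAX)` reports that it is left unchanged.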

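stream

boolean

Default: false

Specifies whether to return the response in streaming mode. false: the complete response is returned at once. true: data is returned chunk by chunk as the model generates it.

stream_options

object

Configuration for streaming output. Takes effect only when stream is true.

stream_options.include_usage

boolean

Default: false

Specifies whether to include token usage information in the last chunk of the streaming output.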
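A minimal sketch of consuming a streamed response with the OpenAI Python SDK follows. It assumes the chunks follow the OpenAI streaming layout (`choices[].delta.content`, plus a final chunk with an empty `choices` list that carries `usage`), and that the usage option is named `include_usage` inside `stream_options`, as in the OpenAI convention. `collect_stream` and `stream_ocr` are hypothetical helper names.

```python
def collect_stream(chunks):
    """Join streamed text deltas and capture usage if a chunk carries it."""
    parts = []
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        # The usage-only chunk is assumed to have an empty choices list.
        for choice in chunk.choices:
            delta = choice.delta.content
            if delta:
                parts.append(delta)
    return "".join(parts), usage


def stream_ocr(client, messages, model="qwen3.5-ocr"):
    """Request a streamed completion and return (text, usage)."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
        stream_options={"include_usage": True},
    )
    return collect_stream(stream)
```

Here `client` is an `OpenAI` instance configured as in the request example, and `messages` is the same message list.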

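max_tokens

integer

Maximum number of output tokens. If the generated content exceeds this value, the response is truncated.

For qwen3.5-ocr, qwen-vl-ocr-latest, and qwen-vl-ocr-2024-10-28, the default and maximum values equal the model's maximum output length.
For qwen-vl-ocr, qwen-vl-ocr-2025-04-13, and qwen-vl-ocr-2025-08-28, the default and maximum values are both 4096.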

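logprobs

boolean

Default: false

Specifies whether to return the log probabilities of the output tokens.

top_logprobs

integer

Default: 0

Number of most likely tokens to return at each generation step. Value range: [0, 5]. Takes effect only when logprobs is true.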
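Log probabilities can serve as a rough per-token confidence signal for extracted text. The sketch below assumes the returned `logprobs` object follows the OpenAI layout, a `content` list whose entries carry `token` and `logprob`; this page does not spell out that layout, so treat it as an assumption. `low_confidence_tokens` is a hypothetical helper name and the 0.9 threshold is arbitrary.

```python
import math


def low_confidence_tokens(logprobs, threshold=0.9):
    """Return (token, probability) pairs whose probability is below threshold.

    `logprobs` is the logprobs object of one choice as a plain dict,
    or None when log probabilities were not requested.
    """
    flagged = []
    for entry in (logprobs or {}).get("content") or []:
        probability = math.exp(entry["logprob"])
        if probability < threshold:
            flagged.append((entry["token"], round(probability, 3)))
    return flagged
```

Tokens flagged this way could be cross-checked against the image, in the same spirit as the question-mark placeholders requested by the prompt.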

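temperature

number

Default: 0.01

Sampling temperature, which controls the diversity of the generated text. Higher values produce more diverse output; lower values produce more deterministic output. Value range: [0, 2). Set only one of temperature and top_p.

top_p

number

Default: 0.001

Probability threshold for nucleus sampling. Higher values produce more diverse output; lower values produce more deterministic output. Value range: (0, 1.0]. Set only one of temperature and top_p.

top_k

integer

Default: 1

Size of the candidate set for sampling. Larger values increase randomness. If the value is None or greater than 100, only top_p takes effect. Must be >= 0. This is not a standard OpenAI parameter; with the Python SDK, pass it as extra_body={"top_k": xxx}.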
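The constraints on the sampling parameters can be checked before a request is sent. The sketch below validates the documented ranges, enforces that only one of `temperature` and `top_p` is set, and routes `top_k` through `extra_body` as described above. `sampling_kwargs` is a hypothetical helper name.

```python
def sampling_kwargs(temperature=None, top_p=None, top_k=None):
    """Build keyword arguments for chat.completions.create()."""
    if temperature is not None and top_p is not None:
        raise ValueError("set either temperature or top_p, not both")
    kwargs = {}
    if temperature is not None:
        if not 0 <= temperature < 2:
            raise ValueError("temperature must be in [0, 2)")
        kwargs["temperature"] = temperature
    if top_p is not None:
        if not 0 < top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1.0]")
        kwargs["top_p"] = top_p
    if top_k is not None:
        if top_k < 0:
            raise ValueError("top_k must be >= 0")
        # top_k is not a standard OpenAI parameter, so it goes in extra_body.
        kwargs["extra_body"] = {"top_k": top_k}
    return kwargs
```

The result can be unpacked into the call, for example `client.chat.completions.create(model="qwen3.5-ocr", messages=messages, **sampling_kwargs(top_p=0.5, top_k=10))`.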

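repetition_penalty

number

Default: 1

Penalty coefficient for repeated sequences. Higher values reduce repetition more strongly. 1.0 means no penalty.

presence_penalty

number

Default: 0

Controls how much the content repeats. Value range: [-2.0, 2.0]. Positive values reduce repetition; negative values increase it.

seed

integer

Random seed used to reproduce results. Value range: [0, 2^31−1].

stop

string

Stop sequences. Generation stops immediately when a specified string or token_id appears. The value can be a string or an array. When stop is an array, it must not mix token_id values and strings.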
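The rule that an array of stop values must be homogeneous is easy to violate when the list is assembled dynamically. The sketch below checks it on the client side; it assumes token IDs are plain integers. `normalize_stop` is a hypothetical helper name.

```python
def normalize_stop(stop):
    """Validate a stop value: a string, or a list of only strings or only token IDs."""
    if isinstance(stop, str):
        return stop
    items = list(stop)
    if not items:
        raise ValueError("stop array must not be empty")
    if all(isinstance(item, str) for item in items):
        return items
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(item, int) and not isinstance(item, bool) for item in items):
        return items
    raise ValueError("stop array must not mix token IDs and strings")
```

Rejecting a mixed list locally gives a clearer error than waiting for the service to refuse the request.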

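Response

200-application/json

id

string

Unique identifier of this request.

choices

object[]

choices.finish_reason

enum<string>

Reason the model stopped generating: stop when generation completed, length when the output was truncated.

Allowed values: stop, length

choices.index

integer

Index within the choices array.

choices.logprobs

object | null

Log probability information. null unless logprobs is enabled.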

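choices.message

object

choices.message.content

string

Content returned by the model.

choices.message.processed_text

string

Result of post-processing the model's raw output, for example by automatically removing repeated segments. When the model output contains repeated content, this field provides the cleaned text.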
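One way to consume these fields is to prefer `processed_text` when it is present and fall back to `content` otherwise, while also flagging truncated output. The sketch below works on the response as a plain dict, for example the parsed JSON body or, with the OpenAI Python SDK, what `completion.model_dump()` is expected to return. Preferring the cleaned text is a design choice of this sketch, not something this page prescribes, and the helper names are hypothetical.

```python
def best_text(message: dict) -> str:
    """Return processed_text when it is non-empty, otherwise content."""
    processed = message.get("processed_text")
    if processed:
        return processed
    return message.get("content") or ""


def read_choice(choice: dict):
    """Return (text, truncated) for one element of the choices array."""
    truncated = choice.get("finish_reason") == "length"
    return best_text(choice["message"]), truncated
```

When `truncated` is true, the text was cut off at the output token limit and may be incomplete.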

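choices.message.refusal

string | null

Always null.

choices.message.role

enum<string>

Always assistant.

Allowed values: assistant

choices.message.audio

object | null

Always null.

choices.message.function_call

object | null

Always null.

choices.message.tool_calls

unknown[] | null

Always null.

choices.message.annotations

unknown[]

Reserved field; currently null.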

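created

integer

UNIX timestamp of when this request was created.

model

string

Model used for this request.

object

enum<string>

Always chat.completion.

Allowed values: chat.completion

service_tier

string | null

Always null.

system_fingerprint

string | null

Always null.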

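usage

object

Token usage information.

usage.completion_tokens

integer

Number of tokens in the model output.

usage.prompt_tokens

integer

Number of tokens in the input.

usage.total_tokens

integer

Sum of prompt_tokens and completion_tokens.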

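usage.completion_tokens_details

object

usage.completion_tokens_details.accepted_prediction_tokens

integer | null

Always null.

usage.completion_tokens_details.audio_tokens

integer | null

Always null.

usage.completion_tokens_details.reasoning_tokens

integer | null

Always null.

usage.completion_tokens_details.text_tokens

integer

Number of tokens in the text output.

usage.completion_tokens_details.rejected_prediction_tokens

integer | null

Always null.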

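usage.prompt_tokens_details

object

usage.prompt_tokens_details.audio_tokens

integer | null

Always null.

usage.prompt_tokens_details.cached_tokens

integer | null

Always null.

usage.prompt_tokens_details.image_tokens

integer

Number of tokens in the image input.

usage.prompt_tokens_details.text_tokens

integer

Number of tokens in the text input.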
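The usage object can be summarized to see how much of the input cost comes from the image. The sketch below takes the `usage` object as a plain dict, checks that `total_tokens` equals `prompt_tokens` plus `completion_tokens` as stated above, and computes the image share of the prompt. `summarize_usage` is a hypothetical helper name.

```python
def summarize_usage(usage: dict) -> dict:
    """Summarize token usage and the share of prompt tokens spent on the image."""
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    total = usage["total_tokens"]
    if total != prompt + completion:
        raise ValueError("total_tokens does not equal prompt + completion")
    details = usage.get("prompt_tokens_details") or {}
    image = details.get("image_tokens") or 0
    text = details.get("text_tokens") or 0
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
        "image_tokens": image,
        "text_tokens": text,
        # Fraction of the prompt taken up by the image; 0.0 for an empty prompt.
        "image_share": image / prompt if prompt else 0.0,
    }
```

A high image share suggests that lowering `max_pixels` is the most direct way to reduce input tokens, at the risk of losing small text.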