Text-to-image

Generate images from text prompts.

To compare models and choose a suitable one, see Image models. To try the models, go to Qwen Cloud.

Model showcase

Qwen-Image

Complex text: Bookstore window display. A sign displays "New Arrivals This Week". Below, a shelf tag with the text "Best-Selling Novels Here". To the side, a colorful poster advertises "Author Meet And Greet on Saturday" with a central portrait of the author. There are four books on the bookshelf, namely "The light between worlds" "When stars are scattered" "The silent patient" "The night circus"

Long paragraphs: A young girl dressed in a school uniform stands in a classroom, writing on the blackboard. Centered on the board, neatly inscribed in white chalk, is the text: "Introducing Qwen-Image, a foundational image generation model that excels in complex text rendering and precise image editing." Soft natural light streams through the windows, casting gentle shadows. The scene is rendered in a realistic photographic style, with finely detailed textures, shallow depth of field, and warm tonal hues. The girl's focused expression and the chalk dust suspended in the air add a sense of movement and vitality. Background elements-including student desks and educational posters-are slightly blurred to emphasize the central action. Ultra-high 32K resolution, DSLR-quality imagery, soft bokeh effect, and documentary-style composition.

Complex layouts: Create a classroom PPT slide for a speech. It features artistic, decorative shapes framing neatly arranged textual info as an elegant infographic. Center title: 'Habits for Emotional Wellbeing', surrounded by a symmetrical floral pattern. Left upper: 'Practice Mindfulness' + minimalist lotus icon + text 'Be present, observe without judging, accept without resisting'. Downward: 'Cultivate Gratitude' + open hand illustration + text 'Appreciate simple joys and acknowledge positivity daily'. Bottom - left: 'Stay Connected' + minimalistic chat bubble icon + text 'Build and maintain meaningful relationships to sustain emotional energy'. Bottom right: 'Prioritize Sleep' + crescent moon illustration + text 'Quality sleep benefits both body and mind'. Upward right: 'Regular Physical Activity' + jogging runner icon + text 'Exercise boosts mood and relieves anxiety'. Top right: 'Continuous Learning' + book icon + text 'Engage in new skill and knowledge for growth'. The layout balances clarity & artistry, guiding viewers naturally. --ar 16:9 --style clean - presentation.

Poster creation: Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title "Come Play Ball!" is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle "Come [Show Off Your Skills]!" appears in green font. A speech bubble adds playful charm with the text: "Hehe, watch me amaze my little friends next!" At the bottom, supplementary text reads: "We get to play ball with our friends again!" The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere.

Illustration design: A vibrant and lively illustration of a sunny, bustling commercial street scene, slice of life. In the foreground, a young boy in a white shirt and shorts is intently choosing items from a market stall. The stall is filled with snacks, drinks, and daily goods. The stall owner, a middle-aged man in an apron, is organizing the products. A wooden sign with "Qwen-Image" in a handwritten style hangs above the stall. The background features modern, colorful buildings with prominent signs for "Qwen Cloud" "Text-to-Image". The sky is azure blue with fluffy white clouds and soaring seagulls. Art Style: Realism illustration, delicate and soft, vibrant colors, rich layers, subtle hand-drawn texture, detailed, strong light and shadow, full composition, strong sense of depth, cheerful and relaxing atmosphere.

Realistic photography: A realistic, high-fashion street-style photograph of a young Asian woman. She stands confidently on a vibrant, neon-lit city street at night. She is wearing a sleek black bomber jacket with a subtle white geometric logo and the word "Qwen" embroidered on the back, paired with dark cargo pants. The background is filled with the glowing signs and soft bokeh of city lights, creating a cinematic and atmospheric mood. The lighting is dramatic, with highlights from the neon signs casting colors onto her face and jacket. In the bottom-right corner, overlayed text reads "Neon Dreams" and "Urban Pulse". The text is in a modern, stylish, sans-serif font with a slight neon glow effect, seamlessly integrated into the composition. The entire image should be a masterpiece, ultra-detailed, 8K, UHD, with sharp focus and professional photographic quality, capturing a candid yet powerful urban moment.

Wan series

Portrait photography: hyper-realistic Scandinavian woman portrait, flowing platinum blonde hair and piercing blue eyes with prominent freckles, sharp intellectual gaze, Nordic cold-toned directional lighting creating icy atmosphere, minimalist modern styling with clean lines, shallow depth-of-field with a blurred, cold-gradient background, authentic Nordic facial features and porcelain skin texture.

Realistic photography: a fish-eye perspective forest scene with dramatic perspective distortion, ultra-detailed red fox staring into lens with piercing amber eyes, hyper-realistic fur texture showing individual guard hairs and undercoat layers, radially warped trees forming circular background patterns, watercolor painting style with translucent washes and organic pigment bleeding, soft pastel palette of moss green and earth ochre tones, painterly lighting with atmospheric glow through canopy gaps

Painting styles: Vintage oil painting style pastoral scene, a farmer herding sheep across a meadow full of wildflowers, a windmill in the distance turning under blue sky and white clouds, smoke curling from the chimney of a wooden house, bright and soft colors, full of tranquility and comfort.

Text rendering: A page from a botanical illustration book, hand-drawn watercolor style, depicting a "dandelion" and labeling its various parts.

Poster design: Cinematic poster scene: Extreme macro close-up of eye in wooden crack. Minimalist monochrome, watercolor-CGI fusion, low saturation. Slow push-in with tremor for surreal intensity. Vast negative space, hidden title. Optimized for immersive video generation.

Image set generation: Memories of an old man's life, four portraits in different frames, depicting his childhood (black and white photo), youth (military uniform photo), middle age (business suit work photo), and old age (photo with his wife).

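Model availability

For model details and pricing, see Image models.

Quick start

Prerequisites

Get an API key and set it as an environment variable. To use the SDK, install it first.
The Python SDK requires version 1.25.15 or later, and the Java SDK requires version 2.22.13 or later.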
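
The prerequisites above can be checked with a short script before you run the examples. This is only a sketch: the environment variable name DASHSCOPE_API_KEY is an assumption (use whichever name you configured), and the helper names are illustrative rather than part of the SDK.

```python
import os
import re

MIN_PYTHON_SDK = (1, 25, 15)  # minimum Python SDK version stated above


def parse_version(version):
    """Turn '1.25.15' into (1, 25, 15), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def sdk_is_new_enough(installed, minimum=MIN_PYTHON_SDK):
    """Return True if the installed version string meets the minimum."""
    return parse_version(installed) >= minimum


def api_key_is_set(env=None, name="DASHSCOPE_API_KEY"):
    """Return True if the (assumed) API key variable is present and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get(name))
```

To check a real installation, pass the result of importlib.metadata.version("dashscope") to sdk_is_new_enough.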

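Sample code

All Wan models support asynchronous calls. wan2.7-image-pro, wan2.7-image, wan2.6-image, and wan2.6-t2i also support synchronous calls. All Qwen-Image models support synchronous calls; of these, qwen-image-plus and qwen-image also support asynchronous calls.
  • Synchronous call (Qwen-Image)
  • Asynchronous call (Wan)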
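Request example
import json
import os
import dashscope
from dashscope import MultiModalConversation

dashscope.base_http_api_url = 'https://dashscope.aliyuncs.com/api/v1'

messages = [
  {
    "role": "user",
    "content": [
      {"text": "Healing-style hand-drawn poster featuring three puppies playing with a ball on lush green grass, adorned with decorative elements such as birds and stars. The main title \"Come Play Ball!\" is prominently displayed at the top in bold, blue cartoon font. Below it, the subtitle \"Come [Show Off Your Skills]!\" appears in green font. A speech bubble adds playful charm with the text: \"Hehe, watch me amaze my little friends next!\" At the bottom, supplementary text reads: \"We get to play ball with our friends again!\" The color palette centers on fresh greens and blues, accented with bright pink and yellow tones to highlight a cheerful, childlike atmosphere."}
    ]
  }
]

# The original example is cut off inside the prompt string. The prompt is completed
# from the showcase above; the call below is a minimal reconstruction, and the model
# name and the DASHSCOPE_API_KEY variable name are assumptions.
response = MultiModalConversation.call(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  model="qwen-image-plus",
  messages=messages)
print(json.dumps(response, ensure_ascii=False))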

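    Token calculation

    Learn how token usage is counted and billed for text, vision, and audio models.

    Tokens are the basic unit of billing and context management for text and vision models on Qwen Cloud. Understanding how tokens are counted helps you estimate cost, control context length, and optimize prompts. Audio and image generation models use different billing units (seconds, characters, or image count), which this page also covers.

    Text tokens

    Text models split input and output into subword units. As a rough estimate, 1 token is about 4 English characters or about 1.5 Chinese characters. The actual count depends on the tokenizer and vocabulary.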
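
The rule of thumb above can be turned into a quick estimator. The sketch below is only a heuristic: it treats characters in the CJK Unified Ideographs range as "Chinese characters" and everything else as "English characters", which is a simplifying assumption, and the function name is illustrative.

```python
import math


def estimate_text_tokens(text):
    """Rough estimate: ~4 non-CJK characters or ~1.5 CJK characters per token."""
    cjk_chars = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other_chars = len(text) - cjk_chars
    return math.ceil(other_chars / 4 + cjk_chars / 1.5)


print(estimate_text_tokens("Generate an image of three puppies."))
```

Use this only for budgeting; the exact count comes from the usage object in the API response.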

    Read token usage from the API response

    Every text generation response contains a usage object with exact token counts:
    {
      "usage": {
        "prompt_tokens": 34,
        "completion_tokens": 89,
        "total_tokens": 123
      }
    }
    
    When context caching is used, more details are returned:
    {
      "usage": {
        "prompt_tokens": 1520,
        "completion_tokens": 85,
        "total_tokens": 1605,
        "prompt_tokens_details": {
          "cached_tokens": 1480,
          "cache_creation_input_tokens": 0
        }
      }
    }
    
    When reasoning mode is used, the response includes reasoning tokens:
    {
      "usage": {
        "prompt_tokens": 50,
        "completion_tokens": 300,
        "total_tokens": 350,
        "completion_tokens_details": {
          "reasoning_tokens": 245
        }
      }
    }
    
    Reasoning tokens are counted in completion_tokens and billed at the output token rate. For complex reasoning tasks, reasoning tokens can significantly increase total token usage.
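
The three usage shapes above can be read with one helper. This is a sketch, not an SDK function: the name summarize_usage is illustrative, and it assumes cached_tokens is a subset of prompt_tokens, as the example figures suggest (1480 of 1520).

```python
def summarize_usage(usage):
    """Flatten a usage object, filling in zeros for absent detail fields."""
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    reasoning = (usage.get("completion_tokens_details") or {}).get("reasoning_tokens", 0)
    return {
        "prompt_tokens": prompt,
        "cached_tokens": cached,
        "uncached_prompt_tokens": prompt - cached,
        "completion_tokens": completion,
        "reasoning_tokens": reasoning,
        # Reasoning tokens are part of completion_tokens, so subtract them
        # to get the tokens that appear in the visible answer.
        "visible_completion_tokens": completion - reasoning,
        "total_tokens": usage.get("total_tokens", prompt + completion),
    }


summary = summarize_usage({
    "prompt_tokens": 50,
    "completion_tokens": 300,
    "total_tokens": 350,
    "completion_tokens_details": {"reasoning_tokens": 245},
})
print(summary["visible_completion_tokens"])
```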

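    Estimate the token count before sending a request

    Before calling the API, you can estimate the token count with a tokenizer. Qwen models use a tokenizer that is compatible with tiktoken. The example below uses tiktoken's o200k_base encoding, which is an OpenAI vocabulary rather than the Qwen vocabulary, so treat the result as an approximation:
    # pip install tiktoken
    import tiktoken
    
    # o200k_base is used as a stand-in; it is not the Qwen tokenizer
    encoding = tiktoken.get_encoding("o200k_base")
    tokens = encoding.encode("Your prompt text here")
    print(f"Token count: {len(tokens)}")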
    
    Token estimates help you manage the context window and forecast cost. For exact values, rely on the usage field in the API response.

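    Vision tokens

    Vision models (the Qwen-VL series) convert images and video frames into tokens alongside text. The token count depends on the image resolution.

    Image token formula

    image_tokens = ceil(height / 28) × ceil(width / 28) / 4 + 2
    
    Where:
    • The image is scaled to fit within max_pixels (default 1003520 pixels) with its aspect ratio preserved, and each dimension is rounded to a multiple of 28
    • / 4 corresponds to the 2×2 merge operation in the vision encoder
    • + 2 accounts for the two special tokens <vision_bos> and <vision_eos>
    Example: a 1024×1024 image ≈ (1008/28) × (1008/28) / 4 + 2 = 326 tokens

    Estimate image tokens in Python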

    import math
    
    def estimate_image_tokens(width, height, max_pixels=1003520, min_pixels=3136):
      """Estimate the image token count for Qwen vision models."""
      # Scale down to fit the pixel budget
      # (min_pixels is accepted but not applied in this simplified estimate)
      total_pixels = width * height
      if total_pixels > max_pixels:
        scale = math.sqrt(max_pixels / total_pixels)
        width = int(width * scale)
        height = int(height * scale)
    
      # Round each dimension to the nearest multiple of 28
      width = max(28, round(width / 28) * 28)
      height = max(28, round(height / 28) * 28)
    
      # Compute the token count
      return (height // 28) * (width // 28) // 4 + 2
    
    # Examples: all three exceed max_pixels, so each is scaled to the same budget
    print(estimate_image_tokens(1024, 1024))   # ~326 tokens
    print(estimate_image_tokens(1920, 1080))   # ~326 tokens
    print(estimate_image_tokens(4096, 4096))   # ~326 tokens (after scaling)
    

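    High-resolution mode

    Enabling vl_high_resolution_images processes images at higher fidelity (each token block corresponds to 28×28 pixels rather than the default equivalent ratio). This increases the token count (up to 16,384 tokens per image) but improves detail recognition, which suits scenarios such as OCR or reading small text.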

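    Video tokens

    Video input is sampled into individual frames, and each frame's tokens are computed with the same image formula. The total video token count is the sum over all sampled frames. The frame sampling rate depends on the model and the video duration.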
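
The per-frame summation described above can be sketched as follows. Because the sampling rate depends on the model and the video duration, the fps argument here is an assumption you supply yourself, and tokens_per_frame is the value produced by the image formula for one frame. The function name is illustrative.

```python
import math


def estimate_video_tokens(duration_seconds, fps, tokens_per_frame):
    """Estimate video tokens as sampled frame count times tokens per frame."""
    # At least one frame is assumed to be sampled, even for very short clips.
    frame_count = max(1, math.floor(duration_seconds * fps))
    return frame_count * tokens_per_frame


# A 10-second clip sampled at an assumed 2 frames per second,
# with 326 tokens per frame taken from the image example above.
print(estimate_video_tokens(10, 2, 326))
```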

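    Audio billing units

    Audio APIs do not use tokens. They are billed by duration or by character count:
    API | Billing unit | Description
    Speech recognition (ASR) | Audio seconds | Billed by the number of seconds of input audio
    Text-to-speech (TTS) | Characters | Billed by the number of characters of input text
    Voice conversation | Audio seconds | Varies by model
    ASR and TTS responses do not contain a usage.prompt_tokens field. For current unit prices, see the pricing page.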

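    Image and video generation billing

    Image and video generation APIs do not use tokens either:
    API | Billing unit | Description
    Image generation | Per image | Each generated image is billed separately, regardless of resolution
    Video generation | Per second of video | Billed by the duration and resolution of the output video
    In image generation API responses, the usage field contains image_count rather than token counts. The input_tokens and output_tokens fields may show 0.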

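    Cost estimation

    Cost estimation formula for text and vision API calls:
    cost = (input_tokens × input unit price) + (output_tokens × output unit price)
    
    Cached tokens are billed at a discounted rate (explicit cache hits at 10% of the original price, implicit cache hits at 20%). For per-model unit prices, see Pricing. For ways to reduce cost, see Cost optimization.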
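
The formula and cache discounts above can be combined in a small helper. This is a sketch under stated assumptions: prices are expressed per token, the prices in the usage example are placeholders rather than real ones, and cached tokens are assumed to be a subset of input_tokens.

```python
EXPLICIT_CACHE_RATE = 0.10  # explicit cache: 10% of the input price
IMPLICIT_CACHE_RATE = 0.20  # implicit cache: 20% of the input price


def estimate_cost(input_tokens, output_tokens, input_price, output_price,
                  cached_tokens=0, cache_rate=IMPLICIT_CACHE_RATE):
    """Estimate the cost of one call; prices are per token."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    input_cost = (uncached_tokens + cached_tokens * cache_rate) * input_price
    return input_cost + output_tokens * output_price


# Placeholder prices for illustration only.
print(estimate_cost(1520, 85, input_price=1.0, output_price=2.0,
                    cached_tokens=1480, cache_rate=EXPLICIT_CACHE_RATE))
```

Keeping the cache rate as a parameter lets the same function cover explicit caching, implicit caching, and calls with no cache hits.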

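    Context window limits

    Every model has a maximum context window that caps the total number of input tokens:
    Model | Max input tokens | Max output tokens
    qwen3.6-plus | 1M | 64K
    qwen3.5-flash | 1M | 64K
    qwen3-max | 256K | 64K
    When the input approaches the context limit, consider:
    • Trimming earlier messages in multi-turn conversations
    • Using context caching to avoid repeated computation (this does not reduce the token count, but it lowers cost and latency)
    • Compressing older context into a concise system message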
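
The first option in the list above, trimming history, can be sketched as follows. The helper assumes each message is a dict with a "role" and a string "content", and it takes a count_tokens callable that you supply (for example a tokenizer-based counter or a rough character heuristic). Both the function name and this message shape are assumptions for illustration.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep system messages plus the most recent turns that fit max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)

    kept = []
    # Walk from the newest message backwards and stop at the first one
    # that no longer fits, so the kept history stays contiguous.
    for message in reversed(others):
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + kept[::-1]


history = [
    {"role": "system", "content": "SYS"},
    {"role": "user", "content": "aaaa"},
    {"role": "assistant", "content": "bbbbbb"},
    {"role": "user", "content": "cc"},
]
# len is used as a stand-in token counter for this demonstration.
print(trim_history(history, max_tokens=12, count_tokens=len))
```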

    Response example (text-to-image request)
    {
      "status_code": 200,
      "request_id": "d2d1a8c0-325f-9b9d-8b90-xxxxxx",
      "code": "",
      "message": "",
      "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
          {
            "finish_reason": "stop",
            "message": {
              "role": "assistant",
              "content": [
                {
                  "image": "https://dashscope-result.oss-cn-shanghai.aliyuncs.com/xxx.png?Expires=xxx"
                }
              ]
            }
          }
        ]
      },
      "usage": {
        "input_tokens": 0,
        "output_tokens": 0,
        "width": 2048,
        "image_count": 1,
        "height": 2048
      }
    }
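
To use the result, the image URLs have to be pulled out of the nested choices structure shown above. The sketch below works on a plain dict with that shape; the function name is illustrative. The sample URL carries an Expires parameter, which suggests the link is time-limited, so downloading the image soon after generation is a reasonable precaution.

```python
def extract_image_urls(response):
    """Collect every image URL from output.choices[].message.content[]."""
    urls = []
    choices = (response.get("output") or {}).get("choices") or []
    for choice in choices:
        content = (choice.get("message") or {}).get("content") or []
        for item in content:
            if isinstance(item, dict) and "image" in item:
                urls.append(item["image"])
    return urls


sample_response = {
    "status_code": 200,
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [{"image": "https://example.com/xxx.png"}],
                },
            }
        ]
    },
    "usage": {"image_count": 1, "width": 2048, "height": 2048},
}
print(extract_image_urls(sample_response))
```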