跳转到主要内容
GUI-Plus

GUI-Plus DashScope

通过 DashScope 原生 HTTP API 调用 GUI-Plus 界面交互专用模型。

POST
/api/v1/services/aigc/multimodal-generation/generation
import os
import dashscope

system_prompt = """# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "computer_use", "description": "Use a mouse and keyboard to interact with a computer, and take screenshots.\n* This is an interface to a desktop GUI. You do not have access to a terminal or applications menu. You must click on desktop icons to start applications.\n* Some applications may take time to start or process actions, so you may need to wait and take successive screenshots to see the results of your actions. E.g. if you click on Firefox and a window doesn't open, try wait and taking another screenshot.\n* The screen's resolution is 1000x1000.\n* Make sure to click any buttons, links, icons, etc with the cursor tip in the center of the element. Don't click boxes on their edges unless asked.", "parameters": {"properties": {"action": {"description": "The action to perform. The available actions are:\n* `key`: Performs key down presses on the arguments passed in order, then performs key releases in reverse order.\n* `type`: Type a string of text on the keyboard.\n* `mouse_move`: Move the cursor to a specified (x, y) pixel coordinate on the screen.\n* `left_click`: Click the left mouse button at a specified (x, y) pixel coordinate on the screen.\n* `left_click_drag`: Click and drag the cursor to a specified (x, y) pixel coordinate on the screen.\n* `right_click`: Click the right mouse button at a specified (x, y) pixel coordinate on the screen.\n* `middle_click`: Click the middle mouse button at a specified (x, y) pixel coordinate on the screen.\n* `double_click`: Double-click the left mouse button at a specified (x, y) pixel coordinate on the screen.\n* `triple_click`: Triple-click the left mouse button at a specified (x, y) pixel coordinate on the screen (simulated as double-click since it's the closest action).\n* `scroll`: Performs a scroll of the mouse scroll wheel.\n* `hscroll`: Performs a horizontal scroll (mapped to regular scroll).\n* `wait`: Wait specified seconds for the change to happen.\n* `terminate`: Terminate the current task and report its completion status.\n* `answer`: Answer a question.\n* `interact`: Resolve the blocking window by interacting with the user.", "enum": ["key", "type", "mouse_move", "left_click", "left_click_drag", "right_click", "middle_click", "double_click", "triple_click", "scroll", "hscroll", "wait", "terminate", "answer", "interact"], "type": "string"}, "keys": {"description": "Required only by `action=key`.", "type": "array"}, "text": {"description": "Required only by `action=type`, `action=answer` and `action=interact`.", "type": "string"}, "coordinate": {"description": "(x, y): The x (pixels from the left edge) and y (pixels from the top edge) coordinates to move the mouse to. Required only by `action=mouse_move` and `action=left_click_drag`.", "type": "array"}, "pixels": {"description": "The amount of scrolling to perform. Positive values scroll up, negative values scroll down. Required only by `action=scroll` and `action=hscroll`.", "type": "number"}, "time": {"description": "The seconds to wait. Required only by `action=wait`.", "type": "number"}, "status": {"description": "The status of the task. Required only by `action=terminate`.", "type": "string", "enum": ["success", "failure"]}}, "required": ["action"], "type": "object"}}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>

# Response format

Response format for every step:
1) Action: a short imperative describing what to do in the UI.
2) A single <tool_call>...</tool_call> block containing only the JSON: {"name": <function-name>, "arguments": <args-json-object>}.

Rules:
- Output exactly in the order: Action, <tool_call>.
- Be brief: one for Action.
- Do not output anything else outside those two parts.
- If finishing, use action=terminate in the tool call."""

messages = [
  {
    "role": "system",
    "content": system_prompt
  },
  {
    "role": "user",
    "content": [
      {"image": "https://img.alicdn.com/imgextra/i2/O1CN016iJ8ob1C3xP1s2M6z_!!6000000000026-2-tps-3008-1758.png"},
      {"text": "帮我打开浏览器。"}]
  }]

response = dashscope.MultiModalConversation.call(
  api_key=os.getenv('DASHSCOPE_API_KEY'),
  model='gui-plus-2026-02-26',
  messages=messages,
  vl_high_resolution_images=True
)

print(response.output.choices[0].message.content[0]["text"])
{
  "status_code": 200,
  "request_id": "b74b3a25-3968-4059-8c44-63d793c07f02",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "finish_reason": null,
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "```json\n{\"thought\": \"用户想要打开浏览器,我观察到屏幕截图中有一个Google Chrome的图标,其位置在右上角一排的最后一个。因此,下一步操作应该是点击这个Chrome浏览器图标来启动它。\", \"action\": \"CLICK\", \"parameters\": {\"x\": 1086, \"y\": 127}}\n```"
            }
          ]
        }
      }
    ],
    "audio": null
  },
  "usage": {
    "input_tokens": 2021,
    "output_tokens": 78,
    "characters": 0,
    "image_tokens": 1244,
    "input_tokens_details": {
      "image_tokens": 1244,
      "text_tokens": 777
    },
    "output_tokens_details": {
      "text_tokens": 78
    },
    "total_tokens": 2099
  }
}

鉴权

string
header
必填

千问云 API Key。详见获取 API Key

请求体

application/json
enum<string>
必填

模型名称。

gui-plus,gui-plus-2026-02-26
object
必填
object

响应

200-application/json
integer

本次请求的状态码。200 表示成功。Java SDK 不返回该参数,调用失败会抛出异常。

string

本次调用的唯一标识符。Java SDK 返回参数为 requestId

string

错误码,调用成功时为空值。仅 Python SDK 返回该参数。

string

错误信息。

object
object