FEB 27, 2026

Gemini Computer Use & Playwright

ADK, Google, Agents, Python, Computer Use


I wrote this document to outline how Playwright (a browser testing library) is used with the Gemini Computer Use model. Specifically, it explains the structure and behavior of the Playwright tool calls issued by the Computer Use agent (gemini-2.5-computer-use-preview-10-2025).

I’m following the sample demo the ADK team provides at https://github.com/google/adk-python/tree/main/contributing/samples/computer_use

Agent Interaction Loop

When the agent operates the browser to complete a user task, it follows a standard Observation-Thought-Action loop:

  1. LLM generateContent Call: The agent sends the current state to the LLM. The LLM processes the prompt, its own internal thoughts, and previous tool outputs, reasoning about the next logical computer action.
  2. Tool Invocation: The LLM issues a function call matching one of the Playwright tools (e.g., click_at, type_text_at).
  3. Playwright Execution: The PlaywrightComputer underlying the ComputerUseToolset performs the action natively via Playwright APIs (mouse clicks, keyboard sequences, JavaScript execution).
  4. Tool Response: After the action finishes and the page’s load state has settled, the tool captures the current context by calling current_state(). This returns a ComputerState object to the LLM containing:
  • screenshot: A Base64-encoded PNG of the updated viewport.
  • url: The browser’s active URL string.

The agent uses this screenshot and url to verify whether its action succeeded before proposing the next step.
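
To make the loop above concrete, here is a minimal sketch of it in plain Python. It is not the ADK implementation: FakeComputer, scripted_model, and run_loop are hypothetical names I made up for illustration, the "model" just replays a fixed plan, and no real browser or LLM is involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputerState:
    screenshot: bytes  # placeholder for the PNG screenshot
    url: str


class FakeComputer:
    """Hypothetical stand-in for the browser: records actions, drives nothing."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.log = []

    def navigate(self, url: str) -> ComputerState:
        self.url = url
        self.log.append(f"navigate({url})")
        return self.current_state()

    def click_at(self, x: int, y: int) -> ComputerState:
        self.log.append(f"click_at({x}, {y})")
        return self.current_state()

    def current_state(self) -> ComputerState:
        return ComputerState(screenshot=b"fake-png", url=self.url)


def scripted_model(history: list) -> Optional[dict]:
    """Hypothetical model stub: replays a fixed plan, one call per turn."""
    plan = [
        {"name": "navigate", "args": {"url": "https://example.com"}},
        {"name": "click_at", "args": {"x": 100, "y": 200}},
    ]
    steps_done = len(history) - 1  # history[0] is the user task
    return plan[steps_done] if steps_done < len(plan) else None


def run_loop(model, computer, task: str, max_steps: int = 10) -> list:
    history = [task]
    for _ in range(max_steps):
        call = model(history)                    # 1. model reasons over the history
        if call is None:                         # no further action proposed
            break
        tool = getattr(computer, call["name"])   # 2. tool invocation by name
        state = tool(**call["args"])             # 3. the action is executed
        history.append(state)                    # 4. new state goes back to the model
    return history
```

The point of the sketch is the shape of the loop: every tool call ends by appending a fresh state to the history, which is what the model sees on its next turn.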

Playwright Tool Operations

In this demo, the PlaywrightComputer class exposes methods for navigating and interacting with the page. Nearly every action returns the latest ComputerState when it completes.

  • open_web_browser(): Initializes or resets the environment, returning the base computer state.
  • navigate(url: str): Calls Playwright’s page.goto(url) to open a specific webpage.
  • search(): Navigates to the session-configured default search engine.
  • go_back() / go_forward(): Triggers browser history traversal.

Interaction Tools

  • click_at(x: int, y: int): Places a visual highlight at the target coordinates, fires page.mouse.click(x, y), and awaits document load.
  • hover_at(x: int, y: int): Calls page.mouse.move(x, y) to activate CSS hover states or JavaScript tooltips.
  • type_text_at(x: int, y: int, text: str): Clicks the target input, optionally clears it via Control+A and Delete, then enters the string with page.keyboard.type(text). Pressing Enter at the end can be toggled via arguments.
  • key_combination(keys: list[str]): Presses a combination of keys mapped directly to canonical Playwright key names (e.g., ‘Shift’ + ‘Tab’, or ‘Control’ + ‘C’).
  • drag_and_drop(x: int, y: int, destination_x: int, destination_y: int): Moves the mouse to the origin, fires mouse.down(), moves to the destination, and fires mouse.up().
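
As an illustration of how one of these tools composes lower-level calls, here is a rough sketch of the type_text_at sequence described above. To keep it runnable without a browser, it uses RecordingPage, a hypothetical stub I wrote that mimics the call shapes of Playwright's page.mouse.click, page.keyboard.press, and page.keyboard.type. The argument names press_enter and clear_before_typing are my assumptions for the toggles, and the demo itself is async, so treat this as a simplified synchronous approximation.

```python
class RecordingMouse:
    """Hypothetical stub for page.mouse: records calls instead of clicking."""

    def __init__(self, calls):
        self.calls = calls

    def click(self, x, y):
        self.calls.append(("mouse.click", x, y))


class RecordingKeyboard:
    """Hypothetical stub for page.keyboard: records calls instead of typing."""

    def __init__(self, calls):
        self.calls = calls

    def press(self, key):
        self.calls.append(("keyboard.press", key))

    def type(self, text):
        self.calls.append(("keyboard.type", text))


class RecordingPage:
    """Hypothetical stub with the same mouse/keyboard attributes as a page."""

    def __init__(self):
        self.calls = []
        self.mouse = RecordingMouse(self.calls)
        self.keyboard = RecordingKeyboard(self.calls)


def type_text_at(page, x, y, text, press_enter=True, clear_before_typing=True):
    page.mouse.click(x, y)                # focus the input at the coordinates
    if clear_before_typing:
        page.keyboard.press("Control+A")  # select any existing content
        page.keyboard.press("Delete")     # and remove it
    page.keyboard.type(text)              # enter the new text key by key
    if press_enter:
        page.keyboard.press("Enter")      # optionally submit
```

Passing a real synchronous Playwright page instead of RecordingPage should work the same way, since only those three methods are used.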

Scrolling Tools

  • scroll_document(direction): The broadest scroll implementation. It mimics PageUp/PageDown key presses for vertical scrolls and runs JavaScript window.scrollBy for horizontal shifts.
  • scroll_at(x: int, y: int, direction, magnitude: int): Moves the mouse to a precise coordinate and fires mouse.wheel(dx, dy) to scroll a targeted container (like a modal or sidebar).
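
For scroll_at, the interesting part is turning a direction and magnitude into the (dx, dy) pair that mouse.wheel expects. Below is a small sketch of that mapping. The helper name wheel_deltas and the direction strings "up", "down", "left", and "right" are my assumptions, not taken from the demo code.

```python
def wheel_deltas(direction: str, magnitude: int) -> tuple:
    """Map a scroll direction and magnitude to (dx, dy) wheel deltas.

    Positive dy scrolls down and positive dx scrolls right, matching the
    sign convention of wheel events.
    """
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    deltas = {
        "up": (0, -magnitude),
        "down": (0, magnitude),
        "left": (-magnitude, 0),
        "right": (magnitude, 0),
    }
    if direction not in deltas:
        raise ValueError(f"unsupported direction: {direction!r}")
    return deltas[direction]


# With a real Playwright page, the result would be used roughly like this:
#   page.mouse.move(x, y)        # hover over the scrollable container
#   page.mouse.wheel(dx, dy)     # scroll whatever is under the cursor
```

Moving the mouse first matters because the wheel event goes to the element under the cursor, which is how a modal or sidebar can be scrolled without moving the whole document.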

Verification

  • wait(seconds: int): Calls asyncio.sleep to give dynamically rendered or AJAX-heavy pages time to settle, then returns the updated state.
  • current_state(): Can be called at any time. It captures a PNG screenshot of the page and returns it together with the current URL.
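
To show what the returned state might look like, here is a sketch that packs raw PNG bytes and a URL into a ComputerState with a Base64-encoded screenshot, following the description earlier in this post. The field layout and the helpers make_state and decode_screenshot are my assumptions for illustration. The real ComputerState class in ADK may differ. In a real run the bytes would come from Playwright's page.screenshot().

```python
import base64
from dataclasses import dataclass

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


@dataclass(frozen=True)
class ComputerState:
    screenshot: str  # Base64 text of the PNG bytes
    url: str


def make_state(png_bytes: bytes, url: str) -> ComputerState:
    """Build a state from raw screenshot bytes, rejecting non-PNG input."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot is not PNG data")
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return ComputerState(screenshot=encoded, url=url)


def decode_screenshot(state: ComputerState) -> bytes:
    """Recover the raw PNG bytes, e.g. to save them for debugging."""
    return base64.b64decode(state.screenshot)
```

Decoding the screenshot and writing it to disk after each step is a cheap way to see exactly what the model saw when it chose its next action.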