Computer Use Tool

The ComputerUseTool allows LLMs to perform automated, screen-based interactions. It bridges text-based AI models with graphical user interfaces, enabling the agent to "see" the screen, click elements, and type text.

Overview

The ComputerUseTool relies on an IComputerDriver that abstracts browser automation. The following drivers are available:

- PlaywrightComputerDriver: Drives Chromium, captures viewports, and executes real mouse/keyboard events.
- ConsoleComputerDriver: A lightweight, text-only driver for HTTP scraping.
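
Code that depends only on the driver abstraction should, in principle, work with any implementation. That includes a fake driver that records actions for unit tests instead of launching a browser. The sketch below uses a hypothetical IComputerDriverSketch interface. The method names InitializeAsync, ClickAsync, TypeAsync, PressKeyAsync, and CloseAsync appear on this page, but the parameters and return types shown here are assumptions. The real IComputerDriver probably also has members this sketch omits, such as screenshot capture.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var recorder = new RecordingComputerDriver();
await RunScriptAsync(recorder);
Console.WriteLine(string.Join("; ", recorder.Actions));
// prints: click 450,300; type hello; key Enter

// Written only against the interface, so any driver implementation can be passed in.
async Task RunScriptAsync(IComputerDriverSketch driver)
{
    await driver.InitializeAsync();
    await driver.ClickAsync(450, 300);
    await driver.TypeAsync("hello");
    await driver.PressKeyAsync("Enter");
    await driver.CloseAsync();
}

// Hypothetical stand-in for IComputerDriver: member names come from the docs,
// but these parameters and return types are assumptions.
public interface IComputerDriverSketch
{
    Task InitializeAsync();
    Task ClickAsync(int x, int y);
    Task TypeAsync(string text);
    Task PressKeyAsync(string key);
    Task CloseAsync();
}

// Fake driver that records each action as a string instead of touching a screen.
public sealed class RecordingComputerDriver : IComputerDriverSketch
{
    public List<string> Actions { get; } = new();
    public bool IsOpen { get; private set; }

    public Task InitializeAsync() { IsOpen = true; return Task.CompletedTask; }

    public Task ClickAsync(int x, int y) { Actions.Add($"click {x},{y}"); return Task.CompletedTask; }

    public Task TypeAsync(string text) { Actions.Add($"type {text}"); return Task.CompletedTask; }

    public Task PressKeyAsync(string key) { Actions.Add($"key {key}"); return Task.CompletedTask; }

    public Task CloseAsync() { IsOpen = false; return Task.CompletedTask; }
}
```

With a fake like this, you can check that an agent issued the expected sequence of actions without starting Chromium.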

Usage

Instantiate a driver, wrap it in a ComputerUseToolset, and assign it to the agent.

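```csharp
using GoogleAdk.Core.Agents;
using GoogleAdk.Core.Tools;
using GoogleAdk.Samples.ComputerUse.Drivers;

// 1. Initialize the driver (starts the browser session)
var driver = new PlaywrightComputerDriver();
await driver.InitializeAsync();

try
{
    // 2. Attach the driver to the agent through the toolset
    var agent = new LlmAgent(new LlmAgentConfig
    {
        Name = "automation_agent",
        Model = "gemini-2.5-flash",
        Instruction = "You are a web automation bot. Use computer_use to interact with elements.",
        Tools = [ new ComputerUseToolset(driver) ]
    });

    // 3. Run the agent here (runner setup not shown)
}
finally
{
    // 4. Always release the driver's processes, even if the run throws
    await driver.CloseAsync();
}
```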

LLM Commands

The LLM issues JSON commands (e.g., {"action": "left_click", "coordinate": [450, 300]}). The ComputerUseToolset automatically maps these commands to the driver's ClickAsync, TypeAsync, and PressKeyAsync methods.
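
The mapping step can be pictured as a small dispatcher. It parses each command and calls the matching driver method. This is a minimal sketch using System.Text.Json, not the toolset's actual implementation. Only the left_click command shape comes from this page. The type and key action names, the text field, and the IClickTypeKeyDriver interface are hypothetical. The single-line raw string literals require C# 11 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

var driver = new LoggingDriver();
await CommandDispatcher.DispatchAsync(driver, """{"action": "left_click", "coordinate": [450, 300]}""");
await CommandDispatcher.DispatchAsync(driver, """{"action": "type", "text": "hello"}""");
await CommandDispatcher.DispatchAsync(driver, """{"action": "key", "text": "Enter"}""");
Console.WriteLine(string.Join("; ", driver.Log));
// prints: click 450,300; type hello; key Enter

// Hypothetical minimal driver surface used by the dispatcher.
public interface IClickTypeKeyDriver
{
    Task ClickAsync(int x, int y);
    Task TypeAsync(string text);
    Task PressKeyAsync(string key);
}

public static class CommandDispatcher
{
    // Parses one JSON command and forwards it to the matching driver method.
    public static async Task DispatchAsync(IClickTypeKeyDriver driver, string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        string action = root.GetProperty("action").GetString()
            ?? throw new ArgumentException("Command has a null 'action'.");

        switch (action)
        {
            case "left_click":
                JsonElement xy = root.GetProperty("coordinate");
                await driver.ClickAsync(xy[0].GetInt32(), xy[1].GetInt32());
                break;
            case "type": // hypothetical action name
                await driver.TypeAsync(root.GetProperty("text").GetString() ?? "");
                break;
            case "key": // hypothetical action name
                await driver.PressKeyAsync(root.GetProperty("text").GetString() ?? "");
                break;
            default:
                throw new NotSupportedException($"Unknown action '{action}'.");
        }
    }
}

// Fake driver that logs calls instead of moving a real mouse or keyboard.
public sealed class LoggingDriver : IClickTypeKeyDriver
{
    public List<string> Log { get; } = new();
    public Task ClickAsync(int x, int y) { Log.Add($"click {x},{y}"); return Task.CompletedTask; }
    public Task TypeAsync(string text) { Log.Add($"type {text}"); return Task.CompletedTask; }
    public Task PressKeyAsync(string key) { Log.Add($"key {key}"); return Task.CompletedTask; }
}
```

Rejecting unknown actions with an exception, rather than ignoring them, makes a malformed model response visible immediately instead of failing silently.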

Important: Always call await driver.CloseAsync(); when shutting down. This lets the driver release the processes it started, such as the Chromium instance behind PlaywrightComputerDriver.
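
One way to make the CloseAsync call hard to forget is to wrap it in an IAsyncDisposable, so that await using runs it on every exit path, including exceptions. DriverSession and FakeDriver below are hypothetical helpers. The sketch assumes CloseAsync takes no arguments and returns a Task, so adapt it if the real driver's signature differs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var driver = new FakeDriver();
try
{
    // Register cleanup right after the driver is ready.
    await using var session = new DriverSession(driver.CloseAsync);
    throw new InvalidOperationException("simulated failure mid-run");
}
catch (InvalidOperationException)
{
    // The failure still reaches the caller...
}
// ...but the driver was closed on the way out.
Console.WriteLine(driver.Closed); // prints: True

// Stand-in for a real driver: only tracks whether CloseAsync ran.
public sealed class FakeDriver
{
    public bool Closed { get; private set; }
    public Task CloseAsync() { Closed = true; return Task.CompletedTask; }
}

// Hypothetical helper: invokes the supplied close callback at most once on dispose.
public sealed class DriverSession : IAsyncDisposable
{
    private Func<Task>? _close;

    public DriverSession(Func<Task> close) => _close = close;

    public async ValueTask DisposeAsync()
    {
        // Take the callback atomically so a second dispose does nothing.
        Func<Task>? close = Interlocked.Exchange(ref _close, null);
        if (close is not null)
        {
            await close();
        }
    }
}
```

With the real driver, this would be a single line placed right after InitializeAsync, for example await using var session = new DriverSession(driver.CloseAsync);.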