Computer Use Assistant (CUA)

Important: You must apply for access before you can use the Computer Use model. Apply here: https://aka.ms/oai/cuaaccess

This is a sample repository demonstrating how to use the Computer Use model, an AI model capable of interacting with graphical user interfaces (GUIs) through natural language instructions. The Computer Use model can understand visual interfaces, take actions, and complete tasks by controlling a computer just like a human would.

This framework provides a bridge between the Computer Use model and computer control, allowing for automated task execution while maintaining safety checks and user consent. It serves as a practical example of how to integrate the Computer Use model into applications that require GUI interaction.

Features

- Natural language computer control through AI models
- Screenshot capture and analysis
- Mouse and keyboard control
- Safety checks and user consent mechanisms
- Support for both OpenAI and Azure OpenAI endpoints
- Cross-platform compatibility (Windows, macOS, Linux)
- Screen resolution scaling for consistent AI model input
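
To illustrate the screen resolution scaling feature, here is a minimal sketch of how a screenshot size could be fitted to a fixed model input size and how model coordinates could be mapped back to real screen pixels. The function names and the 1024x768 target are assumptions for illustration only; the repository's actual scaling code may differ.

```python
from typing import Tuple


def scaled_size(screen_w: int, screen_h: int,
                max_w: int = 1024, max_h: int = 768) -> Tuple[int, int]:
    """Fit the screen inside max_w x max_h while keeping the aspect ratio.

    The 1024x768 default is an assumed target, not a documented value.
    """
    ratio = min(max_w / screen_w, max_h / screen_h, 1.0)  # never upscale
    return max(1, round(screen_w * ratio)), max(1, round(screen_h * ratio))


def to_screen_coords(x: int, y: int,
                     scaled: Tuple[int, int],
                     screen: Tuple[int, int]) -> Tuple[int, int]:
    """Map a point in the scaled screenshot back to real screen pixels."""
    scale_x = screen[0] / scaled[0]
    scale_y = screen[1] / scaled[1]
    return round(x * scale_x), round(y * scale_y)


if __name__ == "__main__":
    screen = (1920, 1080)
    scaled = scaled_size(*screen)
    # A click the model places at the center of the scaled image
    # lands at the center of the real screen.
    print(scaled, to_screen_coords(scaled[0] // 2, scaled[1] // 2, scaled, screen))
```

Scaling down before sending the screenshot keeps the model input consistent across displays, at the cost of having to convert every coordinate the model returns.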

Getting Started

Prerequisites

- Python 3.7 or higher
- Operating System: Windows, macOS, or Linux
- OpenAI API key or Azure OpenAI credentials

Installation

Clone the repository:

git clone https://github.com/Azure-Samples/computer-use-model.git
cd computer-use-model/computer-use

Install the required packages:

pip install -r requirements.txt

Set up your environment variables:

# For Azure OpenAI
export AZURE_OPENAI_ENDPOINT="your-azure-endpoint"
export AZURE_OPENAI_API_KEY="your-azure-api-key"

# For OpenAI
export OPENAI_API_KEY="your-openai-api-key"
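
Before running the application, it can help to confirm that the variables for the chosen endpoint are actually set. The sketch below is one possible check; `load_credentials` is a hypothetical helper, not part of this repository, and it only uses the variable names listed above.

```python
import os
from typing import Dict, Mapping

# Variable names come from the setup step above; the grouping by endpoint
# is an assumption about how they are used.
REQUIRED_VARS = {
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
}


def load_credentials(endpoint: str,
                     env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the variables needed for `endpoint`, or raise if any are missing."""
    if endpoint not in REQUIRED_VARS:
        raise ValueError(f"Unknown endpoint: {endpoint!r}")
    missing = [name for name in REQUIRED_VARS[endpoint] if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS[endpoint]}


if __name__ == "__main__":
    try:
        creds = load_credentials("azure")
        print("Found:", ", ".join(creds))
    except RuntimeError as err:
        print(err)
```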

Usage

Local Computer Control

The framework is designed to work directly with your local computer. Here's how to use it:

Run the example application:

python app.py --instructions "Open web browser and go to microsoft.com"

The AI model will:
- Take screenshots of your screen
- Analyze the visual information
- Execute appropriate actions to complete the task
- Request user consent for safety-critical actions
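
The steps above form a loop: capture, analyze, act, and pause for consent when needed. The following is a simplified sketch of that control flow, with the screenshot, model, and input-control pieces passed in as plain functions. `Action`, `run_task`, and the callback names are hypothetical and are not taken from the repository's code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    kind: str                    # e.g. "click" or "type"
    detail: str                  # human-readable description of the action
    needs_consent: bool = False  # True for safety-critical actions


def run_task(take_screenshot: Callable[[], bytes],
             next_action: Callable[[bytes], Optional[Action]],
             execute: Callable[[Action], None],
             ask_consent: Callable[[Action], bool],
             max_steps: int = 20) -> List[Action]:
    """Run the screenshot -> analyze -> act loop until the task ends."""
    performed: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = next_action(screenshot)  # the model decides what to do next
        if action is None:                # the model reports the task is done
            break
        if action.needs_consent and not ask_consent(action):
            break                         # the user declined, so stop here
        execute(action)
        performed.append(action)
    return performed


if __name__ == "__main__":
    # Stubbed run: a scripted "model" and a consent callback that always agrees.
    script = iter([Action("click", "open browser"),
                   Action("type", "microsoft.com", needs_consent=True),
                   None])
    done = run_task(lambda: b"", lambda shot: next(script),
                    lambda action: None, lambda action: True)
    print([a.kind for a in done])
```

Passing the pieces in as callbacks keeps the loop testable without a real screen or model, and the `max_steps` cap guards against a task that never finishes.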

Command Line Arguments

- --instructions: The task to perform (default: "Open web browser and go to microsoft.com")
- --model: The AI model to use (default: "computer-use-preview")
- --endpoint: The API endpoint to use ("azure" or "openai", default: "azure")
- --autoplay: Automatically execute actions without confirmation (default: true)
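
For reference, the arguments listed above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented names and defaults; how app.py actually parses them, in particular whether --autoplay accepts a true/false value, is an assumption.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse true/false style strings (assumed format for --autoplay)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"Expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Computer Use demo (sketch)")
    parser.add_argument("--instructions",
                        default="Open web browser and go to microsoft.com",
                        help="The task to perform")
    parser.add_argument("--model", default="computer-use-preview",
                        help="The AI model to use")
    parser.add_argument("--endpoint", choices=["azure", "openai"],
                        default="azure", help="The API endpoint to use")
    parser.add_argument("--autoplay", type=str_to_bool, default=True,
                        help="Execute actions without confirmation")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Under this sketch, passing `--autoplay false` would turn confirmation prompts back on, which is the safer choice when trying a new task for the first time.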

VM/Remote Control

For scenarios requiring remote computer control or VM automation, we recommend using Playwright. Playwright provides robust browser automation capabilities and is well-suited for VM-based testing and automation scenarios.

For more information on VM automation with Playwright, refer to the Playwright documentation.

Demo

The included demo application (app.py) demonstrates how to use the CUA framework:

Start the demo:

python app.py

Enter your instructions when prompted, or use the --instructions parameter to provide them directly.
Watch as the AI model:
- Captures and analyzes your screen
- Performs mouse and keyboard actions
- Requests consent for safety-critical operations
- Provides reasoning for its actions
