Skip to content

Azure OpenAI code resources for using the Computer Use model

License

Notifications You must be signed in to change notification settings

Azure-Samples/computer-use-model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Computer Use Assistant (CUA)

Important: You must apply for access in order to use the Computer Use model. Apply here: https://aka.ms/oai/cuaaccess

This is a sample repository demonstrating how to use the Computer Use model, an AI model capable of interacting with graphical user interfaces (GUIs) through natural language instructions. The Computer Use model can understand visual interfaces, take actions, and complete tasks by controlling a computer just like a human would.

This framework provides a bridge between the Computer Use model and computer control, allowing for automated task execution while maintaining safety checks and user consent. It serves as a practical example of how to integrate the Computer Use model into applications that require GUI interaction.

Features

  • Natural language computer control through AI models
  • Screenshot capture and analysis
  • Mouse and keyboard control
  • Safety checks and user consent mechanisms
  • Support for both OpenAI and Azure OpenAI endpoints
  • Cross-platform compatibility (Windows, macOS, Linux)
  • Screen resolution scaling for consistent AI model input

Getting Started

Prerequisites

  • Python 3.7 or higher
  • Operating System: Windows, macOS, or Linux
  • OpenAI API key or Azure OpenAI credentials

Installation

  1. Clone the repository:
git clone [repository-url]
cd computer-use
  1. Install the required packages:
pip install -r requirements.txt
  1. Set up your environment variables:
# For Azure OpenAI
export AZURE_OPENAI_ENDPOINT="your-azure-endpoint"
export AZURE_OPENAI_API_KEY="your-azure-api-key"

# For OpenAI
export OPENAI_API_KEY="your-openai-api-key"

Usage

Local Computer Control

The framework is designed to work directly with your local computer. Here's how to use it:

  1. Run the example application:
python app.py --instructions "Open web browser and go to microsoft.com"
  1. The AI model will:
    • Take screenshots of your screen
    • Analyze the visual information
    • Execute appropriate actions to complete the task
    • Request user consent for safety-critical actions

Command Line Arguments

  • --instructions: The task to perform (default: "Open web browser and go to microsoft.com")
  • --model: The AI model to use (default: "computer-use-preview")
  • --endpoint: The API endpoint to use ("azure" or "openai", default: "azure")
  • --autoplay: Automatically execute actions without confirmation (default: true)

VM/Remote Control

For scenarios requiring remote computer control or VM automation, we recommend using Playwright. Playwright provides robust browser automation capabilities and is well-suited for VM-based testing and automation scenarios.

For more information on VM automation with Playwright, please refer to:

Demo

The included demo application (app.py) demonstrates how to use the CUA framework:

  1. Start the demo:
python app.py
  1. Enter your instructions when prompted, or use the --instructions parameter to provide them directly.

  2. Watch as the AI model:

    • Captures and analyzes your screen
    • Performs mouse and keyboard actions
    • Requests consent for safety-critical operations
    • Provides reasoning for its actions

Resources

About

Azure OpenAI code resources for using the Computer Use model

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages