Deprecated openai wcgw relay #6

Merged · 5 commits · Dec 1, 2024
130 changes: 49 additions & 81 deletions README.md
@@ -1,30 +1,28 @@
# Shell and Coding agent on Chatgpt and Claude desktop apps
# Shell and Coding agent on Claude desktop app

- An MCP server on claude desktop for autonomous shell, coding and desktop control agent.
- A custom gpt on chatgpt web/desktop apps to interact with your local shell, edit files, run code, etc.

[![Tests](https://github.com/rusiaaman/wcgw/actions/workflows/python-tests.yml/badge.svg?branch=main)](https://github.com/rusiaaman/wcgw/actions/workflows/python-tests.yml)
[![Mypy strict](https://github.com/rusiaaman/wcgw/actions/workflows/python-types.yml/badge.svg?branch=main)](https://github.com/rusiaaman/wcgw/actions/workflows/python-types.yml)
[![Build](https://github.com/rusiaaman/wcgw/actions/workflows/python-publish.yml/badge.svg)](https://github.com/rusiaaman/wcgw/actions/workflows/python-publish.yml)

[New feature] [26-Nov-2024] Claude desktop support for shell, computer-control, coding agent.
[src/wcgw/client/mcp_server/Readme.md](src/wcgw/client/mcp_server/Readme.md)
## Updates

### 🚀 Highlights
- [01 Dec 2024] Deprecated chatgpt app support

- [26 Nov 2024] Introduced claude desktop support through mcp

## 🚀 Highlights

- ⚡ **Full Shell Access**: No restrictions, complete control.
- ⚡ **Desktop control on Claude**: Screen capture, mouse control, and keyboard control on Claude desktop (on Mac, using a Linux environment in docker)
- ⚡ **Create, Execute, Iterate**: Ask the gpt to keep running compiler checks till all errors are fixed, or ask it to keep checking for the status of a long running command till it's done.
- ⚡ **Create, Execute, Iterate**: Ask Claude to keep running compiler checks till all errors are fixed, or to keep checking the status of a long-running command till it's done.
- ⚡ **Interactive Command Handling**: Supports interactive commands using arrow keys, interrupts, and ANSI escape sequences.
- ⚡ **REPL support**: [beta] Supports python/node and other REPL execution.

## Claude

Full readme [src/wcgw/client/mcp_server/Readme.md](src/wcgw/client/mcp_server/Readme.md)
## Setup

### Setup

Update `claude_desktop_config.json`
Update `claude_desktop_config.json` (~/Library/Application Support/Claude/claude_desktop_config.json)

```json
{
@@ -46,99 +44,69 @@ Update `claude_desktop_config.json`
```

Then restart the Claude app.
You can then ask Claude to execute shell commands, read files, edit files, run your code, etc.
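
The full JSON block is collapsed in this diff; for reference, a minimal entry would look roughly like the `--computer-use` variant shown further below, just without that flag — a sketch, not the exact contents of the collapsed hunk:

```json
{
  "mcpServers": {
    "wcgw": {
      "command": "uv",
      "args": [
        "tool",
        "run",
        "--from",
        "wcgw@latest",
        "--python",
        "3.12",
        "wcgw_mcp"
      ]
    }
  }
}
```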

## ChatGPT

### 🪜 Steps:
## [Optional] Computer use support using desktop on docker

1. Run the [cli client](https://github.com/rusiaaman/wcgw?tab=readme-ov-file#client) in any directory of choice.
2. Share the generated id with this GPT: `https://chatgpt.com/g/g-Us0AAXkRh-wcgw-giving-shell-access`
3. The custom GPT can now run any command on your cli
Computer use is disabled by default. Add `--computer-use` to enable it. This adds the necessary tools to Claude, including screenshot, mouse, and keyboard control.

### Client

You need to keep running this client for GPT to access your shell. Run it in a version controlled project's root.

#### Option 1: using uv [Recommended]

```sh
$ curl -LsSf https://astral.sh/uv/install.sh | sh
$ uvx wcgw@latest
```
```json
{
"mcpServers": {
"wcgw": {
"command": "uv",
"args": [
"tool",
"run",
"--from",
"wcgw@latest",
"--python",
"3.12",
"wcgw_mcp",
"--computer-use"
]
}
}
}
```

#### Option 2: using pip
Claude will be able to connect to any docker container with a Linux environment. Native system control isn't supported outside docker.

Supports python >=3.10 and <3.13
You'll need to run a docker image with a desktop environment and an optional VNC connection. Here's a demo image:

```sh
$ pip3 install wcgw
$ wcgw
docker run -p 6080:6080 ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
```

This will print a UUID that you need to share with the gpt.
Then ask the Claude desktop app to control the docker OS. It'll connect to the docker container and control it.

### Chat
Connect to `http://localhost:6080/vnc.html` for a desktop view (VNC) of the system running in docker.

Open the following link or search the "wcgw" custom gpt using "Explore GPTs" on chatgpt.com
## Usage

https://chatgpt.com/g/g-Us0AAXkRh-wcgw-giving-shell-access
Wait for a few seconds. You should be able to see this icon if everything goes right.

Finally, let the chatgpt know your user id in any format. E.g., "user_id=<your uuid>" followed by rest of your instructions.
![mcp icon](https://github.com/rusiaaman/wcgw/blob/main/static/rocket-icon.png?raw=true)
over here

NOTE: you can resume a broken connection
`wcgw --client-uuid $previous_uuid`
![mcp icon](https://github.com/rusiaaman/wcgw/blob/main/static/claude-ss.jpg?raw=true)

### How it works on chatgpt app?
Then ask claude to execute shell commands, read files, edit files, run your code, etc.

Your commands are relayed through a server to the terminal client. [You could host the server on your own](https://github.com/rusiaaman/wcgw?tab=readme-ov-file#creating-your-own-custom-gpt-and-the-relay-server). For public convenience I've hosted one at https://wcgw.arcfu.com thanks to the gcloud free tier plan.
If you've started the docker container for the LLM to access, you can ask it to control the "docker os". If you don't provide the docker container id, it'll try to find an available container using the `docker ps` command.
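
If you want to pick the container id yourself rather than letting Claude search, standard docker CLI is enough (nothing wcgw-specific here):

```sh
# List running containers so you can hand a specific id to Claude
docker ps --format '{{.ID}}  {{.Image}}  {{.Names}}'
```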

Chatgpt sends a request to the relay server using the user id that you share with it. The relay server holds a websocket with the terminal client against the user id and acts as a proxy to pass the request.
## Example

It's secure in both the directions. Either a malicious actor or a malicious Chatgpt has to correctly guess your UUID for any security breach.
### Computer use example

# Showcase
![computer-use](https://github.com/rusiaaman/wcgw/blob/main/static/computer-use.jpg?raw=true)

## Claude desktop

### Resize image and move it to a new dir
### Shell example

![example](https://github.com/rusiaaman/wcgw/blob/main/static/example.jpg?raw=true)

## Chatgpt app

### Unit tests and github actions

[The first version of unit tests and github workflow to test on multiple python versions were written by the custom chatgpt](https://chatgpt.com/share/6717f922-8998-8005-b825-45d4b348b4dd)

### Create a todo app using react + typescript + vite

![Screenshot](https://github.com/rusiaaman/wcgw/blob/main/static/ss1.png?raw=true)

# Privacy

The relay server doesn't store any data. I can't access any information passing through it and only secure channels are used to communicate.

You may host the server on your own and create a custom gpt using the following section.

# Creating your own custom gpt and the relay server.

I've used the following instructions and action json schema to create the custom GPT. (Replace wcgw.arcfu.com with the address to your server)

https://github.com/rusiaaman/wcgw/blob/main/gpt_instructions.txt
https://github.com/rusiaaman/wcgw/blob/main/gpt_action_json_schema.json

Run the server
`gunicorn --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:443 src.wcgw.relay.serve:app --certfile fullchain.pem --keyfile privkey.pem`

If you don't have public ip and domain name, you can use `ngrok` or similar services to get a https address to the api.

The specify the server url in the `wcgw` command like so
`wcgw --server-url https://your-url/v1/register`

# [Optional] Local shell access with openai API key or anthropic API key
## [Optional] Local shell access with openai API key or anthropic API key

## Openai
### Openai

Add `OPENAI_API_KEY` and `OPENAI_ORG_ID` env variables.

@@ -148,7 +116,7 @@ Then run

You can now directly write messages or press enter key to open vim for multiline message and text pasting.
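
The actual run command sits in the collapsed part of this hunk; based on the invocation shown in the new openai.md added by this PR, it would look roughly like this (a sketch; the `--limit` value is just an example cost cap in dollars):

```sh
# Assumed invocation, mirroring openai.md in this PR
export OPENAI_API_KEY=sk-...        # your API key
export OPENAI_ORG_ID=org-...        # your org id
uvx --from wcgw@latest wcgw_local --limit 0.1   # cost limit of $0.1
```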

## Anthropic
### Anthropic

Add `ANTHROPIC_API_KEY` env variable.

6 changes: 0 additions & 6 deletions add.py

This file was deleted.

71 changes: 71 additions & 0 deletions openai.md
@@ -0,0 +1,71 @@
# ChatGPT Integration Guide

## 🪜 Steps:

1. Run a relay server with a domain name and https support (or use ngrok), following the instructions in the next section.
2. Create a custom GPT that connects to the relay server (instructions in the sections below).
3. Run the [cli client](https://github.com/rusiaaman/wcgw?tab=readme-ov-file#client) in any directory of choice.
4. The custom GPT can now run any command on your CLI (a consolidated command sketch follows below).
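
Putting the steps together, a minimal end-to-end run looks like the following (commands consolidated from the sections below; the ngrok hostname is a placeholder):

```sh
# 1. Start the relay server locally (listens on port 8000)
uv tool run --python 3.12 --from wcgw@latest wcgw_relay

# 2. Expose it over https/wss (or use your own domain and certificate)
ngrok http 8000

# 3. Run the cli client, pointing it at the relay
uv tool run --python 3.12 wcgw@latest --server-url wss://<your-ngrok-host>/v1/register
```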

## Creating the relay server

### If you have a domain name and an SSL certificate

Run the server
`gunicorn --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:443 src.wcgw.relay.serve:app --certfile fullchain.pem --keyfile privkey.pem`

If you don't have a public IP and domain name, you can use `ngrok` or similar services to get an https address to the API.

Then specify the server url in the `wcgw` command like so:
`uv tool run --python 3.12 wcgw@latest --server-url wss://your-url/v1/register`

### Using ngrok

Run the server
`uv tool run --python 3.12 --from wcgw@latest wcgw_relay`

This will start a uvicorn server on port 8000. You can use ngrok to get a public address for the server.

`ngrok http 8000`

Then specify the ngrok address in the `wcgw` command like so:
`uv tool run --python 3.12 wcgw@latest --server-url wss://4900-1c2c-6542-b922-a596-f8f8.ngrok-free.app/v1/register`

## Creating the custom gpt

I've used the following instructions and action JSON schema to create the custom GPT. (Replace wcgw.arcfu.com with the address of your server.)

https://github.com/rusiaaman/wcgw/blob/main/gpt_instructions.txt
https://github.com/rusiaaman/wcgw/blob/main/gpt_action_json_schema.json

### Chat

Let ChatGPT know your user id in any format, e.g., "user_id=<your uuid>", followed by the rest of your instructions.

### How does it work on the ChatGPT app?

Your commands are relayed through a server to the terminal client.

ChatGPT sends a request to the relay server using the user id that you share with it. The relay server holds a websocket connection to the terminal client, keyed by that user id, and acts as a proxy to pass the request along.

It's secure in both directions: a malicious actor or a malicious ChatGPT would have to correctly guess your UUID to cause a security breach.

## Showcase

### Unit tests and github actions

[The first version of unit tests and the GitHub workflow to test on multiple python versions was written by the custom ChatGPT](https://chatgpt.com/share/6717f922-8998-8005-b825-45d4b348b4dd)

### Create a todo app using react + typescript + vite

![Screenshot](https://github.com/rusiaaman/wcgw/blob/main/static/ss1.png?raw=true)

## Local shell access with OpenAI API key

Add `OPENAI_API_KEY` and `OPENAI_ORG_ID` env variables.

Then run:

`uvx --from wcgw@latest wcgw_local --limit 0.1` # Cost limit $0.1

You can now directly write messages or press enter key to open vim for multiline message and text pasting.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[project]
authors = [{ name = "Aman Rusia", email = "gapypi@arcfu.com" }]
name = "wcgw"
version = "1.5.4"
version = "2.0.0"
description = "What could go wrong giving full shell access to chatgpt?"
readme = "README.md"
requires-python = ">=3.11, <3.13"
2 changes: 1 addition & 1 deletion src/wcgw/client/computer_use.py
@@ -161,7 +161,7 @@ def get_screen_info(self) -> tuple[int, int, Optional[int]]:
assert not result.error, result.error
assert result.output, "Could not get screen info"
width, height, display_num = map(
lambda x: None if not x else int(x), result.output.split(",")
lambda x: None if not x else int(x), result.output.strip().split(",")
)
if width is None:
width = 1080
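
My reading of why the added `.strip()` matters (an interpretation, not stated in the PR): the shell command's output ends in a newline, and when the display-number field is empty the last comma-separated piece becomes "\n", which is truthy but not a valid integer. A small illustration:

```python
raw = "1024,768,\n"  # hypothetical output with an empty display-number field

# Without strip(): the last field is "\n" -> truthy -> int("\n") raises ValueError
print(raw.split(","))                 # ['1024', '768', '\n']

# With strip(): the last field is "" -> falsy -> mapped to None
fields = raw.strip().split(",")       # ['1024', '768', '']
parsed = [None if not x else int(x) for x in fields]
print(parsed)                         # [1024, 768, None]
```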
15 changes: 13 additions & 2 deletions src/wcgw/client/tools.py
Expand Up @@ -962,15 +962,26 @@ def register_client(server_url: str, client_uuid: str = "") -> None:

@run.command()
def app(
server_url: str = "wss://wcgw.arcfu.com/v1/register",
server_url: str = "",
client_uuid: Optional[str] = None,
version: bool = typer.Option(False, "--version", "-v"),
) -> None:
if version:
version_ = importlib.metadata.version("wcgw")
print(f"wcgw version: {version_}")
exit()

if not server_url:
server_url = os.environ.get("WCGW_RELAY_SERVER", "")
if not server_url:
print(
"Error: Please provide relay server url using --server_url or WCGW_RELAY_SERVER environment variable"
)
print(
"\tNOTE: you need to run a relay server first, author doesn't host a relay server anymore."
)
print("\thttps://github.com/rusiaaman/wcgw/blob/main/openai.md")
print("\tExample `--server-url=ws://localhost:8000/v1/register`")
raise typer.Exit(1)
register_client(server_url, client_uuid or "")
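
With this change the relay client needs the URL supplied explicitly; in practice that means something along these lines (a sketch using the flag and environment variable introduced in the diff above, with the client entry point from the README/openai.md):

```sh
# Option A: pass the relay URL on the command line
uvx wcgw@latest --server-url ws://localhost:8000/v1/register

# Option B: rely on the new environment-variable fallback
export WCGW_RELAY_SERVER=ws://localhost:8000/v1/register
uvx wcgw@latest
```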

