The Open-sourced Multimodal AI Agent Stack connecting Cutting-edge AI Models and Agent Infra.
Updated Aug 25, 2025 - TypeScript
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools.
An open-sourced end-to-end VLM-based GUI Agent
This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use" (ACL 2025 Oral).
Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
Official implementation of "GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents"
Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
Official repository of the paper "Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms"
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.
Official repository for InfiGUI-G1. We introduce Adaptive Exploration Policy Optimization (AEPO) to overcome semantic alignment bottlenecks in GUI agents through efficient, guided exploration.
This is the official website for TuriX Computer-use-Agent
Release of code, datasets and model for our work TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials
Source code of the paper "V-Droid: Advancing Mobile GUI Agent Through Generative Verifiers"
This is a quick test of Chinese Scripting Language powered by AI. You can use it to open any text file. No illegal use is allowed! Free for commercial use and academic use.
Control group for my future paper: no task planning or exception handling, and fully based on LLMs
🐙 SEAgent is a self-evolving agent with autonomous learning from experience, providing SEAgent-1.0-7B and World State Model for adaptive decision-making.
Create your self-hosted, open-source Operator model.