2025-03-02

Automatic operation of on-screen GUI.

omniparser-autogui-mcp

A Japanese version is available here.

This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
It has been confirmed to work on Windows.

License notes

This project is released under the MIT license, excluding submodules and sub-packages.
The OmniParser repository is licensed under CC-BY-4.0.
Each OmniParser model has its own license (reference).

Installation

  1. Clone the repository, install the dependencies, and download the models:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py

(On platforms other than Windows, use export instead of set.)
(If you want langchain_example.py to work, run uv sync --extra langchain instead of uv sync.)

  2. Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}

(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
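Backslashes in a Windows path must be doubled inside JSON, which is easy to get wrong by hand. As a minimal sketch, the fragment above could be generated with Python's standard json module, which handles the escaping. The helper name build_server_config is hypothetical and not part of this repository; the path is the same placeholder used above.

```python
import json


def build_server_config(clone_dir: str, ocr_lang: str = "en") -> dict:
    """Return a claude_desktop_config.json fragment for this server.

    Hypothetical helper: it only mirrors the structure shown in the README.
    """
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {
                "command": "uv",
                "args": ["--directory", clone_dir, "run", "omniparser-autogui-mcp"],
                "env": {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang},
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path; replace it with the directory you cloned.
    config = build_server_config(r"D:\CLONED_PATH\omniparser-autogui-mcp")
    # json.dumps doubles the backslashes so the output is valid JSON.
    print(json.dumps(config, indent=2))
```

The printed fragment can then be merged into an existing claude_desktop_config.json; the sketch does not read or overwrite that file.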

The env block also accepts the following optional settings:

  • OMNI_PARSER_BACKEND_LOAD
    Set this to 1 if the server does not work with other clients (such as LibreChat).

  • TARGET_WINDOW_NAME
    Set this to the name of the window you want to operate.
    If it is not specified, the server operates on the entire screen.

  • OMNI_PARSER_SERVER
    To run OmniParser processing on another device, specify that server's address and port, such as 127.0.0.1:8000.
    The server can be started with uv run omniparserserver.

  • SSE_HOST, SSE_PORT
    If these are specified, the server communicates via SSE instead of stdio.

  • SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
    These configure OmniParser itself.
    They are usually not necessary.
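The optional settings above can be assembled programmatically before being written into the env block. The following is a rough sketch under stated assumptions: build_env and split_host_port are hypothetical helpers, not part of this repository; all values are written as strings; and the window name and port numbers are illustrative only. How the server itself validates these variables is not described here, so the host:port check is just a convenience.

```python
from typing import Optional


def split_host_port(address: str) -> tuple[str, int]:
    """Split 'host:port' (e.g. '127.0.0.1:8000') and validate the port."""
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError(f"port out of range: {port_number}")
    return host, port_number


def build_env(
    ocr_lang: str = "en",
    *,
    backend_load: bool = False,
    target_window_name: Optional[str] = None,
    omni_parser_server: Optional[str] = None,
    sse_host: Optional[str] = None,
    sse_port: Optional[int] = None,
) -> dict[str, str]:
    """Build the env mapping, adding only the options that were given."""
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    if backend_load:
        env["OMNI_PARSER_BACKEND_LOAD"] = "1"
    if target_window_name:
        env["TARGET_WINDOW_NAME"] = target_window_name
    if omni_parser_server:
        split_host_port(omni_parser_server)  # raises ValueError if malformed
        env["OMNI_PARSER_SERVER"] = omni_parser_server
    if sse_host is not None:
        env["SSE_HOST"] = sse_host
    if sse_port is not None:
        env["SSE_PORT"] = str(sse_port)
    return env


if __name__ == "__main__":
    # Illustrative values only: operate a single window, parse on another host.
    print(build_env(target_window_name="Notepad", omni_parser_server="127.0.0.1:8000"))
```

Leaving unset options out of the mapping, rather than writing empty strings, matches the behavior described above where an unspecified TARGET_WINDOW_NAME means the entire screen is used.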

Usage Examples

  • Search for "MCP server" in the on-screen browser.

etc.

