Cover image
Try Now
2025-03-10

Un servidor de protocolo de contexto modelo para el análisis de la visión de IA utilizando la API de Vision Gemini

3 years

Works with Finder

1

Github Watches

1

Github Forks

0

Github Stars

AI Vision MCP Server

A Model Context Protocol (MCP) server that provides AI-powered visual analysis capabilities for Claude and other MCP-compatible AI assistants.

Features

  • Screenshot URL: Capture screenshots of any website by providing a URL
  • Visual Analysis: Analyze UI elements, layouts, and content in screenshots
  • File Operations: Read and modify files with line-specific precision
  • Report Generation: Create comprehensive UI/UX analysis reports
  • Debugging Session: Maintain context across multiple analysis steps

Installation

# Clone the repository
git clone https://github.com/samihalawa/mcp-server-ai-vision.git
cd mcp-server-ai-vision

# Install dependencies
npm install

# Build the server
npm run build

Usage

Starting the Server

npm start

Configuration

Add the server to your MCP configuration:

{
  "servers": {
    "ai-vision": {
      "command": "/path/to/node",
      "args": ["/path/to/mcp-server-ai-vision/build/index.js"],
      "enabled": true,
      "port": 3005,
      "environment": {
        "NODE_PATH": "/path/to/node_modules",
        "PATH": "/usr/local/bin:/usr/bin:/bin",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

Available Tools

screenshot_url

Take a screenshot of a URL using a web browser.

Parameters:

  • url (string, required): URL to capture a screenshot of (e.g., http://localhost:4999, https://google.com)
  • fullPage (boolean, optional): Whether to capture full page or just viewport. Default: false
  • waitForSelector (string, optional): CSS selector to wait for before taking screenshot
  • waitTime (number, optional): Time to wait in milliseconds before taking screenshot. Default: 1000

analyze_screen

Analyze a screenshot with AI vision.

Parameters: None (uses the most recent screenshot)

read_file

Read content from a file between specified line numbers.

Parameters:

  • path (string): Path to the file
  • startLine (number): Starting line number (1-indexed)
  • endLine (number): Ending line number (1-indexed)

modify_file

Modify content in a file between specified line numbers.

Parameters:

  • path (string): Path to the file
  • startLine (number): Starting line number to replace (1-indexed)
  • endLine (number): Ending line number to replace (1-indexed)
  • content (string): New content to replace the specified lines

generate_report

Generate a comprehensive UI/UX analysis report.

Parameters:

  • testUrl (string): URL of the application being tested
  • appName (string, optional): Name of the application being analyzed
  • date (string, optional): Date of the analysis (YYYY-MM-DD)
  • observations (object): Observations structured as components, data state, interactions, etc.

Example Workflow

  1. Take a screenshot of a website:

    screenshot_url(url: "https://example.com")
    
  2. Analyze the screenshot:

    analyze_screen()
    
  3. Generate a report based on the analysis:

    generate_report(testUrl: "https://example.com", observations: {...})
    

Requirements

  • Node.js 14+
  • Playwright for browser automation
  • Gemini API key for AI vision analysis

License

MIT

相关推荐

  • NiKole Maxwell
  • I craft unique cereal names, stories, and ridiculously cute Cereal Baby images.

  • Joshua Armstrong
  • Confidential guide on numerology and astrology, based of GG33 Public information

  • https://suefel.com
  • Latest advice and best practices for custom GPT development.

  • Callycode Limited
  • A geek-themed horoscope generator blending Bitcoin prices, tech jargon, and astrological whimsy.

  • Emmet Halm
  • Converts Figma frames into front-end code for various mobile frameworks.

  • Elijah Ng Shi Yi
  • Advanced software engineer GPT that excels through nailing the basics.

  • https://maiplestudio.com
  • Find Exhibitors, Speakers and more

  • apappascs
  • Descubra la colección más completa y actualizada de servidores MCP en el mercado. Este repositorio sirve como un centro centralizado, que ofrece un extenso catálogo de servidores MCP de código abierto y propietarios, completos con características, enlaces de documentación y colaboradores.

  • ShrimpingIt
  • Manipulación basada en Micrypthon I2C del expansor GPIO de la serie MCP, derivada de AdaFruit_MCP230xx

  • OffchainLabs
  • Implementación de la prueba de estaca Ethereum

  • huahuayu
  • Una puerta de enlace de API unificada para integrar múltiples API de explorador de blockchain similar a Esterscan con soporte de protocolo de contexto modelo (MCP) para asistentes de IA.

  • deemkeen
  • Controle su MBOT2 con un combo de potencia: MQTT+MCP+LLM

    Reviews

    3 (1)
    Avatar
    user_qqHrFQon
    2025-04-16

    I'm thoroughly impressed with DiffuGen - Advanced Local Image Generator with MCP Integration by CLOUDWERX-DEV. It seamlessly integrates with MCP, making image generation fast and efficient. The user-friendly interface and advanced features have notably enhanced my productivity. Highly recommend!