mcp-server-ai-vision
A Model Context Protocol server for AI vision analysis using Gemini Vision API
AI Vision MCP Server
A Model Context Protocol (MCP) server that provides AI-powered visual analysis capabilities for Claude and other MCP-compatible AI assistants.
Features
- Screenshot URL: Capture screenshots of any website by providing a URL
- Visual Analysis: Analyze UI elements, layouts, and content in screenshots
- File Operations: Read and modify files with line-specific precision
- Report Generation: Create comprehensive UI/UX analysis reports
- Debugging Session: Maintain context across multiple analysis steps
Installation
# Clone the repository
git clone https://github.com/samihalawa/mcp-server-ai-vision.git
cd mcp-server-ai-vision
# Install dependencies
npm install
# Build the server
npm run build
Usage
Starting the Server
npm start
Configuration
Add the server to your MCP configuration:
{
  "servers": {
    "ai-vision": {
      "command": "/path/to/node",
      "args": ["/path/to/mcp-server-ai-vision/build/index.js"],
      "enabled": true,
      "port": 3005,
      "environment": {
        "NODE_PATH": "/path/to/node_modules",
        "PATH": "/usr/local/bin:/usr/bin:/bin",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
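Before starting the server, it can help to sanity-check the entry. The sketch below is only an illustration: the field names are copied from the example above, the types are inferred from the example values, and `checkAiVisionConfig` is a hypothetical helper that is not part of this project.

```typescript
// Shape of the configuration entry shown above. Field names come from the
// README example; the types are inferred from the example values.
interface ServerEntry {
  command: string;
  args: string[];
  enabled: boolean;
  port: number;
  environment: Record<string, string>;
}

interface McpConfig {
  servers: Record<string, ServerEntry>;
}

// Hypothetical helper (not part of the server): lists problems found in the
// "ai-vision" entry, or returns an empty array if none were detected.
function checkAiVisionConfig(config: McpConfig): string[] {
  const entry = config.servers["ai-vision"];
  if (entry === undefined) {
    return ['missing "ai-vision" server entry'];
  }
  const problems: string[] = [];
  const key = entry.environment["GEMINI_API_KEY"];
  if (!key || key === "your-gemini-api-key") {
    problems.push("GEMINI_API_KEY is missing or still the placeholder");
  }
  if (entry.args.length === 0) {
    problems.push("args must include the path to build/index.js");
  }
  return problems;
}
```

An empty result only means these two checks passed; it does not confirm that the paths exist or that the key is valid.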
Available Tools
screenshot_url
Take a screenshot of a URL using a web browser.
Parameters:
- url (string, required): URL to capture a screenshot of (e.g., http://localhost:4999, https://google.com)
- fullPage (boolean, optional): Whether to capture full page or just viewport. Default: false
- waitForSelector (string, optional): CSS selector to wait for before taking screenshot
- waitTime (number, optional): Time to wait in milliseconds before taking screenshot. Default: 1000
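The parameters above can be written down as a TypeScript type, with the documented defaults applied when a caller omits them. This is a minimal sketch: the type and function names are hypothetical, and the server's own handling of missing values may differ.

```typescript
// Arguments for screenshot_url, as documented above.
interface ScreenshotUrlArgs {
  url: string;
  fullPage?: boolean;
  waitForSelector?: string;
  waitTime?: number; // milliseconds
}

// Same arguments after the documented defaults have been filled in.
interface ResolvedScreenshotUrlArgs {
  url: string;
  fullPage: boolean;
  waitForSelector?: string;
  waitTime: number;
}

// Hypothetical helper: applies the defaults listed in the README
// (fullPage: false, waitTime: 1000) and rejects an empty url.
function resolveScreenshotArgs(args: ScreenshotUrlArgs): ResolvedScreenshotUrlArgs {
  if (args.url.trim() === "") {
    throw new Error("url is required");
  }
  return {
    url: args.url,
    fullPage: args.fullPage ?? false,
    waitForSelector: args.waitForSelector,
    waitTime: args.waitTime ?? 1000,
  };
}
```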
analyze_screen
Analyze a screenshot with AI vision.
Parameters: None (uses the most recent screenshot)
read_file
Read content from a file between specified line numbers.
Parameters:
- path (string): Path to the file
- startLine (number): Starting line number (1-indexed)
- endLine (number): Ending line number (1-indexed)
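To illustrate the 1-indexed line range, here is a small sketch that extracts lines from a string already held in memory. It assumes that endLine is inclusive, which the README does not state explicitly. `readLineRange` is a hypothetical name, not the server's implementation.

```typescript
// Hypothetical helper: returns lines startLine..endLine of `text`.
// Both bounds are 1-indexed; endLine is ASSUMED to be inclusive.
function readLineRange(text: string, startLine: number, endLine: number): string {
  if (!Number.isInteger(startLine) || !Number.isInteger(endLine)) {
    throw new RangeError("line numbers must be integers");
  }
  if (startLine < 1 || endLine < startLine) {
    throw new RangeError("expected 1 <= startLine <= endLine");
  }
  const lines = text.split("\n");
  // slice() takes a 0-indexed start and an exclusive end, so startLine - 1
  // and endLine select exactly the requested inclusive range.
  return lines.slice(startLine - 1, endLine).join("\n");
}
```

A range that runs past the end of the text simply returns the lines that exist.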
modify_file
Modify content in a file between specified line numbers.
Parameters:
- path (string): Path to the file
- startLine (number): Starting line number to replace (1-indexed)
- endLine (number): Ending line number to replace (1-indexed)
- content (string): New content to replace the specified lines
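The replacement semantics can be sketched on an in-memory string. As with read_file, this assumes endLine is inclusive, and it assumes the new content may span a different number of lines than the range it replaces. `replaceLineRange` is a hypothetical name, not the server's implementation.

```typescript
// Hypothetical helper: replaces lines startLine..endLine (1-indexed, endLine
// ASSUMED inclusive) of `text` with `content` and returns the new text.
function replaceLineRange(
  text: string,
  startLine: number,
  endLine: number,
  content: string
): string {
  const lines = text.split("\n");
  if (!Number.isInteger(startLine) || !Number.isInteger(endLine)) {
    throw new RangeError("line numbers must be integers");
  }
  if (startLine < 1 || endLine < startLine || endLine > lines.length) {
    throw new RangeError("expected 1 <= startLine <= endLine <= line count");
  }
  const before = lines.slice(0, startLine - 1); // lines above the range
  const after = lines.slice(endLine);           // lines below the range
  return before.concat(content.split("\n"), after).join("\n");
}
```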
generate_report
Generate a comprehensive UI/UX analysis report.
Parameters:
- testUrl (string): URL of the application being tested
- appName (string, optional): Name of the application being analyzed
- date (string, optional): Date of the analysis (YYYY-MM-DD)
- observations (object): Observations structured as components, data state, interactions, etc.
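A caller might check its arguments before invoking generate_report. The sketch below assumes testUrl and observations are required because they are not marked optional, and it checks only the shape of the date, not whether it is a real calendar date. The inner structure of observations is not specified in the README, so it is left as an open record here. All names are hypothetical.

```typescript
// Arguments for generate_report, as documented above. The structure of
// `observations` is not specified in the README, so it is typed loosely.
interface GenerateReportArgs {
  testUrl: string;
  appName?: string;
  date?: string; // YYYY-MM-DD
  observations: Record<string, unknown>;
}

// Checks only the YYYY-MM-DD shape; "2024-99-99" would still pass.
function isIsoDateShape(value: string): boolean {
  return /^\d{4}-\d{2}-\d{2}$/.test(value);
}

// Hypothetical helper: returns a list of problems, empty if none were found.
function validateReportArgs(args: GenerateReportArgs): string[] {
  const problems: string[] = [];
  if (args.testUrl.trim() === "") {
    problems.push("testUrl must not be empty");
  }
  if (args.date !== undefined && !isIsoDateShape(args.date)) {
    problems.push("date must use the YYYY-MM-DD format");
  }
  return problems;
}
```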
Example Workflow
- Take a screenshot of a website:
  screenshot_url(url: "https://example.com")
- Analyze the screenshot:
  analyze_screen()
- Generate a report based on the analysis:
  generate_report(testUrl: "https://example.com", observations: {...})
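MCP clients invoke tools through JSON-RPC 2.0 requests with the method `tools/call`, so the three steps above correspond roughly to the messages built below. This is a sketch of the message shape only: `buildToolCall` is a hypothetical helper, the ids are arbitrary, and the empty `observations` object is a placeholder for the content the README elides.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper: wraps a tool name and its arguments in a request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {}
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: name, arguments: args },
  };
}

// The example workflow as a sequence of requests. `observations` is an empty
// placeholder; a real call would pass the findings from the analysis step.
const exampleWorkflow: ToolCallRequest[] = [
  buildToolCall(1, "screenshot_url", { url: "https://example.com" }),
  buildToolCall(2, "analyze_screen"),
  buildToolCall(3, "generate_report", {
    testUrl: "https://example.com",
    observations: {},
  }),
];
```

In practice an MCP client library would construct and send these messages; building them by hand is mainly useful for debugging.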
Requirements
- Node.js 14+
- Playwright for browser automation
- Gemini API key for AI vision analysis
License
MIT