
mcp-server-webcrawl
Bridge the gap between your web crawl and AI language models using Model Context Protocol (MCP). With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously. The server includes a full-text search interface with boolean support, resource filtering by type, HTTP status, and more.
mcp-server-webcrawl provides the LLM with a complete menu for searching your web content, and works with a variety of web crawlers: wget, WARC-producing tools, InterroBot, Katana, and SiteOne (see MCP Configuration below).
mcp-server-webcrawl is free and open source, and requires Claude Desktop and Python (>=3.10). It is installed from the command line via pip:
pip install mcp_server_webcrawl
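After installation, a quick sanity check is to invoke the server's entry point directly. This assumes pip placed the mcp-server-webcrawl script on your PATH, and that the standard --help behavior applies:
# should print usage; confirms the entry point Claude Desktop will invoke is available
mcp-server-webcrawl --help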
Features
- Claude Desktop ready
- Full-text search support
- Filter by type, status, and more
- Multi-crawler compatible
- Quick MCP configuration
- ChatGPT support coming soon
MCP Configuration
From the Claude Desktop menu, navigate to File > Settings > Developer. Click Edit Config to locate the configuration file, open it in the editor of your choice, and modify the example to reflect your datasrc path.
You can set up additional mcp-server-webcrawl connections under mcpServers as needed; a filled-in example follows the template below.
{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": [varies by crawler, see below]
    }
  }
}
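For example, a complete configuration defining two connections, one against a wget mirror and one against a directory of WARC files (the paths are illustrative; substitute your own datasrc locations):
{
  "mcpServers": {
    "webcrawl-wget": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "wget", "--datasrc", "/path/to/wget/archives/"]
    },
    "webcrawl-warc": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "warc", "--datasrc", "/path/to/warc/archives/"]
    }
  }
}
Restart Claude Desktop after saving; the configuration is read at startup.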
wget (using --mirror)
The datasrc argument should be set to the parent directory of the mirrors.
"args": ["--crawler", "wget", "--datasrc", "/path/to/wget/archives/"]
WARC
The datasrc argument should be set to the parent directory of the WARC files.
"args": ["--crawler", "warc", "--datasrc", "/path/to/warc/archives/"]
InterroBot
The datasrc argument should be set to the direct path to the database.
"args": ["--crawler", "interrobot", "--datasrc", "/path/to/Documents/InterroBot/interrobot.v2.db"]
Katana
The datasrc argument should be set to the parent directory of the text cache files.
"args": ["--crawler", "katana", "--datasrc", "/path/to/katana/archives/"]
SiteOne (using archiving)
The datasrc argument should be set to the parent directory of the archives; archiving must be enabled in SiteOne.
"args": ["--crawler", "siteone", "--datasrc", "/path/to/SiteOne/archives/"]