mcp_server_webcrawl
An MCP server for connecting web crawler data and archives to AI language models
Bridge the gap between your web crawl and AI language models using Model Context Protocol (MCP). With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously. The server includes a full-text search interface with boolean support and filtering by resource type, HTTP status, and more.
mcp-server-webcrawl provides the LLM with a complete menu with which to search your web content, and works with a variety of web crawlers, including wget, WARC files, InterroBot, Katana, and SiteOne; configuration for each is covered below.
mcp-server-webcrawl is free and open source, and requires Claude Desktop and Python (>= 3.10). It is installed from the command line via pip:
pip install mcp_server_webcrawl
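Once installed, you can check that the server command is available from your shell. This is a minimal sanity check; the --help flag is assumed to follow standard CLI conventions and is not documented above:
mcp-server-webcrawl --help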
Features
- Claude Desktop ready
- Full-text search support
- Filter by type, status, and more
- Multi-crawler compatible
- Quick MCP configuration
- ChatGPT support coming soon
MCP Configuration
From the Claude Desktop menu, navigate to File > Settings > Developer. Click Edit Config to locate the configuration file, open it in the editor of your choice, and modify the example to reflect your datasrc path.
You can set up additional mcp-server-webcrawl connections under mcpServers as needed.
{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": [varies by crawler, see below]
    }
  }
}
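As a concrete example, a complete configuration for a wget mirror might look like the following; the datasrc path is a placeholder and should point at your own archives:
{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": ["--crawler", "wget", "--datasrc", "/path/to/wget/archives/"]
    }
  }
}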
wget (using --mirror)
The datasrc argument should be set to the parent directory of the mirrors.
"args": ["--crawler", "wget", "--datasrc", "/path/to/wget/archives/"]
WARC
The datasrc argument should be set to the parent directory of the WARC files.
"args": ["--crawler", "warc", "--datasrc", "/path/to/warc/archives/"]
InterroBot
The datasrc argument should be set to the direct path to the database.
"args": ["--crawler", "interrobot", "--datasrc", "/path/to/Documents/InterroBot/interrobot.v2.db"]
Katana
The datasrc argument should be set to the parent directory of the text cache files.
"args": ["--crawler", "katana", "--datasrc", "/path/to/katana/archives/"]
SiteOne (using archiving)
The datasrc argument should be set to the parent directory of the archives; archiving must be enabled in SiteOne.
"args": ["--crawler", "siteone", "--datasrc", "/path/to/SiteOne/archives/"]