Confidential guide on numerology and astrology, based of GG33 Public information

mcp_server_webcrawl
用于连接Web爬网数据和档案的MCP服务器
3 years
Works with Finder
1
Github Watches
0
Github Forks
0
Github Stars
mcp-server-webcrawl
Bridge the gap between your web crawl and AI language models using Model Context Protocol (MCP). With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously. The server includes a full-text search interface with boolean support, resource filtering by type, HTTP status, and more.
mcp-server-webcrawl provides the LLM a complete menu with which to search your web content, and works with a variety of web crawlers:
mcp-server-webcrawl is free and open source, and requires Claude Desktop, Python (>=3.10). It is installed on the command line, via pip install:
pip install mcp_server_webcrawl
Features
- Claude Desktop ready
- Fulltext search support
- Filter by type, status, and more
- Multi-crawler compatible
- Quick MCP configuration
- ChatGPT support coming soon
MCP Configuration
From the Claude Desktop menu, navigate to File > Settings > Developer. Click Edit Config to locate the configuration file, open in the editor of your choice and modify the example to reflect your datasrc path.
You can set up more mcp-server-webcrawl connections under mcpServers as needed.
{
"mcpServers": {
"webcrawl": {
"command": "mcp-server-webcrawl",
"args": [varies by crawler, see below]
}
}
}
wget (using --mirror)
The datasrc argument should be set to the parent directory of the mirrors.
"args": ["--crawler", "wget", "--datasrc", "/path/to/wget/archives/"]
WARC
The datasrc argument should be set to the parent directory of the WARC files.
"args": ["--crawler", "warc", "--datasrc", "/path/to/warc/archives/"]
InterroBot
The datasrc argument should be set to the direct path to the database.
"args": ["--crawler", "interrobot", "--datasrc", "/path/to/Documents/InterroBot/interrobot.v2.db"]
Katana
The datasrc argument should be set to the parent directory of the text cache files.
"args": ["--crawler", "katana", "--datasrc", "/path/to/katana/archives/"]
SiteOne (using archiving)
The datasrc argument should be set to the parent directory of the archives, archiving must be enabled.
"args": ["--crawler", "siteone", "--datasrc", "/path/to/SiteOne/archives/"]
相关推荐
Take an adjectivised noun, and create images making it progressively more adjective!
Embark on a thrilling diplomatic quest across a galaxy on the brink of war. Navigate complex politics and alien cultures to forge peace and avert catastrophe in this immersive interstellar adventure.
Reviews

user_Vr7sBXwu
As a dedicated user of mcp_server_webcrawl by pragmar, I find it to be an exceptionally reliable and efficient web crawling tool. The seamless integration and user-friendly interface make it ideal for diverse web scraping tasks. Highly recommend it for anyone looking to streamline their data extraction process. Check it out here: https://github.com/pragmar/mcp_server_webcrawl.