2025-04-10

An MCP server tailored to connecting web crawler data and archives.

mcp-server-webcrawl

Bridge the gap between your web crawl and AI language models using Model Context Protocol (MCP). With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously. The server includes a full-text search interface with boolean support, resource filtering by type, HTTP status, and more.

mcp-server-webcrawl provides the LLM a complete menu with which to search your web content, and works with a variety of web crawlers: wget, WARC, InterroBot, Katana, and SiteOne.

mcp-server-webcrawl is free and open source, and requires Claude Desktop and Python (>=3.10). Install it from the command line with pip:

pip install mcp_server_webcrawl
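
Before installing, it can help to confirm that the interpreter meets the Python (>=3.10) requirement stated above. The following is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of mcp-server-webcrawl.

```python
import sys

# Minimum version taken from the requirement stated above (Python >= 3.10).
MINIMUM = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the major.minor version satisfies the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "ok" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```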

Features

  • Claude Desktop ready
  • Full-text search support
  • Filter by type, status, and more
  • Multi-crawler compatible
  • Quick MCP configuration
  • ChatGPT support coming soon

MCP Configuration

From the Claude Desktop menu, navigate to File > Settings > Developer. Click Edit Config to locate the configuration file, open it in the editor of your choice, and modify the example to reflect your datasrc path.

You can set up more mcp-server-webcrawl connections under mcpServers as needed.

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": [varies by crawler, see below]
    }
  }
}
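
As a sketch of what several connections under mcpServers might look like, the Python below builds a configuration with two entries and prints it as JSON. The entry names (`webcrawl_wget`, `webcrawl_warc`) and the helper functions are hypothetical, chosen for illustration; the paths are the placeholder paths used in this README, and only the `--crawler` and `--datasrc` flags shown in the crawler sections below are used.

```python
import json


def server_entry(crawler, datasrc):
    """Build one mcpServers entry using the flags shown in this README."""
    return {
        "command": "mcp-server-webcrawl",
        "args": ["--crawler", crawler, "--datasrc", datasrc],
    }


def build_config(connections):
    """Map each (hypothetical) entry name to a server entry.

    connections: dict of name -> (crawler, datasrc)
    """
    return {
        "mcpServers": {
            name: server_entry(crawler, datasrc)
            for name, (crawler, datasrc) in connections.items()
        }
    }


# Placeholder paths; replace them with your own datasrc locations.
config = build_config({
    "webcrawl_wget": ("wget", "/path/to/wget/archives/"),
    "webcrawl_warc": ("warc", "/path/to/warc/archives/"),
})
print(json.dumps(config, indent=2))
```

The printed JSON can be compared against the configuration file opened via Edit Config; merging it by hand keeps any other servers already defined there intact.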

wget (using --mirror)

The datasrc argument should be set to the parent directory of the mirrors.

"args": ["--crawler", "wget", "--datasrc", "/path/to/wget/archives/"]

WARC

The datasrc argument should be set to the parent directory of the WARC files.

"args": ["--crawler", "warc", "--datasrc", "/path/to/warc/archives/"]

InterroBot

The datasrc argument should be set to the direct path to the database.

"args": ["--crawler", "interrobot", "--datasrc", "/path/to/Documents/InterroBot/interrobot.v2.db"]

Katana

The datasrc argument should be set to the parent directory of the text cache files.

"args": ["--crawler", "katana", "--datasrc", "/path/to/katana/archives/"]

SiteOne (using archiving)

The datasrc argument should be set to the parent directory of the archives. Archiving must be enabled.

"args": ["--crawler", "siteone", "--datasrc", "/path/to/SiteOne/archives/"]
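
The datasrc expectations above differ by crawler: InterroBot takes a direct path to a database file, while the others take a parent directory. A quick pre-flight check along these lines might catch a mismatched path before editing the configuration. This is only a sketch; `check_datasrc` and `EXPECTED_KIND` are hypothetical names, and the check is not something mcp-server-webcrawl itself is known to perform.

```python
import os

# Expected kind of datasrc per crawler, as described in the sections above.
# InterroBot points at a database file; the others point at a directory.
EXPECTED_KIND = {
    "wget": "dir",
    "warc": "dir",
    "interrobot": "file",
    "katana": "dir",
    "siteone": "dir",
}


def check_datasrc(crawler, datasrc):
    """Return None if datasrc looks right for the crawler, else a message."""
    kind = EXPECTED_KIND.get(crawler)
    if kind is None:
        return f"unknown crawler: {crawler}"
    if kind == "dir" and not os.path.isdir(datasrc):
        return f"{crawler} expects a directory, got: {datasrc}"
    if kind == "file" and not os.path.isfile(datasrc):
        return f"{crawler} expects a file, got: {datasrc}"
    return None


if __name__ == "__main__":
    # Placeholder path from this README; replace with your own.
    problem = check_datasrc("wget", "/path/to/wget/archives/")
    print(problem or "datasrc looks plausible")
```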

    Reviews

    4 (1)
    user_Vr7sBXwu
    2025-04-17

    As a dedicated user of mcp_server_webcrawl by pragmar, I find it to be an exceptionally reliable and efficient web crawling tool. The seamless integration and user-friendly interface make it ideal for diverse web scraping tasks. Highly recommend it for anyone looking to streamline their data extraction process. Check it out here: https://github.com/pragmar/mcp_server_webcrawl.