Cover image
Try Now
5 天前

MCP Server pour récupérer le contenu de la page Web à l'aide du navigateur sans tête du dramwright.

3 years

Works with Finder

4

Github Watches

37

Github Forks

567

Github Stars

Fetcher MCP Icon

Fetcher MCP

MCP server for fetch web page content using Playwright headless browser.

Advantages

  • JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.

  • Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.

  • Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.

  • Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.

  • Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.

  • Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.

  • Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.

Quick Start

Run directly with npx:

npx -y fetcher-mcp

First time setup - install the required browser by running the following command in your terminal:

npx playwright install chromium

Debug Mode

Run with the --debug option to show the browser window for debugging:

npx -y fetcher-mcp --debug

Configuration MCP

Configure this MCP server in Claude Desktop:

On MacOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}

Features

  • fetch_url - Retrieve web page content from a specified URL

    • Uses Playwright headless browser to parse JavaScript
    • Supports intelligent extraction of main content and conversion to Markdown
    • Supports the following parameters:
      • url: The URL of the web page to fetch (required parameter)
      • timeout: Page loading timeout in milliseconds, default is 30000 (30 seconds)
      • waitUntil: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
      • extractContent: Whether to intelligently extract the main content, default is true
      • maxLength: Maximum length of returned content (in characters), default is no limit
      • returnHtml: Whether to return HTML content instead of Markdown, default is false
      • waitForNavigation: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
      • navigationTimeout: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
      • disableMedia: Whether to disable media resources (images, stylesheets, fonts, media), default is true
      • debug: Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified
  • fetch_urls - Batch retrieve web page content from multiple URLs in parallel

    • Uses multi-tab parallel fetching for improved performance
    • Returns combined results with clear separation between webpages
    • Supports the following parameters:
      • urls: Array of URLs to fetch (required parameter)
      • Other parameters are the same as fetch_url

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

  • Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:

    Please wait for the page to fully load
    

    This will use the waitForNavigation: true parameter.

  • Increase Timeout Duration: For websites that load slowly:

    Please set the page loading timeout to 60 seconds
    

    This adjusts both timeout and navigationTimeout parameters accordingly.

Content Retrieval Adjustments

  • Preserve Original HTML Structure: When content extraction might fail:

    Please preserve the original HTML content
    

    Sets extractContent: false and returnHtml: true.

  • Fetch Complete Page Content: When extracted content is too limited:

    Please fetch the complete webpage content instead of just the main content
    

    Sets extractContent: false.

  • Return Content as HTML: When HTML format is needed instead of default Markdown:

    Please return the content in HTML format
    

    Sets returnHtml: true.

Debugging and Authentication

Enabling Debug Mode

  • Dynamic Debug Activation: To display the browser window during a specific fetch operation:
    Please enable debug mode for this fetch operation
    
    This sets debug: true even if the server was started without the --debug flag.

Using Custom Cookies for Authentication

  • Manual Login: To login using your own credentials:

    Please run in debug mode so I can manually log in to the website
    

    Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.

  • Interacting with Debug Browser: When debug mode is enabled:

    1. The browser window remains open
    2. You can manually log into the website using your credentials
    3. After login is complete, content will be fetched with your authenticated session
  • Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request:

    Please enable debug mode for this authentication step
    

    Sets debug: true for this specific request only, opening the browser window for manual login.

Development

Install Dependencies

npm install

Install Playwright Browser

Install the browsers needed for Playwright:

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

You can also enable visible browser mode for debugging:

node build/index.js --debug

Related Projects

  • g-search-mcp: A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.

License

Licensed under the MIT License

相关推荐

  • Emmet Halm
  • Converts Figma frames into front-end code for various mobile frameworks.

  • Elijah Ng Shi Yi
  • Advanced software engineer GPT that excels through nailing the basics.

  • https://maiplestudio.com
  • Find Exhibitors, Speakers and more

  • https://zenepic.net
  • Embark on a thrilling diplomatic quest across a galaxy on the brink of war. Navigate complex politics and alien cultures to forge peace and avert catastrophe in this immersive interstellar adventure.

  • lumpenspace
  • Take an adjectivised noun, and create images making it progressively more adjective!

  • pontusab
  • La communauté du curseur et de la planche à voile, recherchez des règles et des MCP

  • av
  • Exécutez sans effort LLM Backends, API, Frontends et Services avec une seule commande.

  • 1Panel-dev
  • 🔥 1Panel fournit une interface Web intuitive et un serveur MCP pour gérer des sites Web, des fichiers, des conteneurs, des bases de données et des LLM sur un serveur Linux.

  • Mintplex-Labs
  • L'application tout-en-un desktop et Docker AI avec chiffon intégré, agents AI, constructeur d'agent sans code, compatibilité MCP, etc.

  • GeyserMC
  • Une bibliothèque de communication avec un client / serveur Minecraft.

  • awslabs
  • Serveurs AWS MCP - Serveurs MCP spécialisés qui apportent les meilleures pratiques AWS directement à votre flux de travail de développement

  • WangRongsheng
  • 🧑‍🚀 全世界最好的 LLM 资料总结 (数据处理、模型训练、模型部署、 O1 模型、 MCP 、小语言模型、视觉语言模型) | Résumé des meilleures ressources LLM du monde.

  • appcypher
  • Serveurs MCP géniaux - une liste organisée de serveurs de protocole de contexte de modèle

  • chongdashu
  • Activer les clients adjoints AI comme Cursor, Windsurf et Claude Desktop pour contrôler le moteur Unreal à travers le langage naturel à l'aide du Protocole de contexte modèle (MCP).

  • patruff
  • Pont entre les serveurs Olllama et MCP, permettant aux LLM locaux d'utiliser des outils de protocole de contexte de modèle

    Reviews

    4 (1)
    Avatar
    user_qALXI9eR
    2025-04-17

    Fetcher-mcp by jae-jae is an exceptional tool for MCP enthusiasts. It efficiently fetches and processes data, significantly improving workflow. The GitHub link, https://github.com/jae-jae/fetcher-mcp, provides clear guidance and resources. I highly recommend it for anyone looking to enhance their MCP experience.