2025-03-30

GitHub repository for the Unstructured MCP Hackathon.

Unstructured API MCP Server for Research Paper Data Processing

This server uses the Unstructured API to expose a set of tools that extract structured information from research papers. The extracted data can then be used to fine-tune a large language model (LLM) and reduce the time researchers spend on literature reviews.

Check out the Blog here:

Table of Contents:

  1. Setup
  2. Requirements
  3. Project Flow
  4. Available Tools
  5. Follow Along
  6. Claude Desktop Integration
  7. Debugging Tools
  8. Running a minimal client locally with the server

Setup

Install dependencies:

  • uv add "mcp[cli]"
  • uv pip install --upgrade unstructured-client python-dotenv

or use uv sync.
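
To confirm the install worked, a small check like the one below can help. It is a minimal sketch that only tests whether the import names for the three packages above (mcp, unstructured_client, dotenv) can be found in the active environment; it does not verify versions.

```python
import importlib.util

# Import names for the packages installed above:
# mcp[cli] -> "mcp", unstructured-client -> "unstructured_client",
# python-dotenv -> "dotenv".
REQUIRED_MODULES = ("mcp", "unstructured_client", "dotenv")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print(f"Missing: {', '.join(missing)}")
    else:
        print("All dependencies found.")
```

Run it with uv run python so that it checks the same environment the server will use.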

Requirements

Before you begin working with the UNS_MCP project, make sure you have the following set up:

  1. UNSTRUCTURED_API_KEY

  2. GOOGLEDRIVE_SERVICE_ACCOUNT_KEY

    • Set up a Google Cloud project and create a service account to enable access to Google Drive for reading PDFs. Check the setup process here.
    • Save the JSON credentials for your service account and use them to set GOOGLEDRIVE_SERVICE_ACCOUNT_KEY.
  3. MONGO_DB_CONNECTION_STRING

    • Set up a MongoDB database (cloud) and get the connection string for connecting to it. Check out the setup process here.
  4. .env.template

    • The .env.template file includes all the required environment variables. Copy this file to .env and set the necessary values for the keys mentioned above.

    Example .env file:

    UNSTRUCTURED_API_KEY="<key-here>"
    MONGO_DB_CONNECTION_STRING="<CONNECTION_STRING>"
    GOOGLEDRIVE_SERVICE_ACCOUNT_KEY="<converted string>"
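
Before starting the server, it can be useful to check that the three variables are actually set. The following is a minimal sketch, not part of the project: it reads from the process environment (so the .env file must already be loaded, for example with python-dotenv), and it assumes that a value still wrapped in angle brackets is an unfilled placeholder from the template.

```python
import os

REQUIRED_ENV_VARS = (
    "UNSTRUCTURED_API_KEY",
    "MONGO_DB_CONNECTION_STRING",
    "GOOGLEDRIVE_SERVICE_ACCOUNT_KEY",
)


def missing_env_vars(env=None):
    """Return required variable names that are unset, empty, or placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_ENV_VARS:
        value = env.get(name, "").strip()
        # Assumption: values like "<key-here>" are template placeholders.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_env_vars():
        print(f"{name} is not configured")
```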
    
    
    

Project Flow

  1. User Query to MCP Client

  2. Claude Interacts with UNS_MCP Server

    • Claude forwards the user's query to the custom MCP server named UNS_MCP.
  3. MCP Tool Executes Unstructured API

    • UNS_MCP interacts with the Unstructured API to process the research paper PDF, extract relevant information, and convert it into structured JSON data.
  4. Structured Data (JSON) Output Is Stored in the Destination

    • The result from the Unstructured API is transformed into JSON and written to the destination connector. It can then be used to fine-tune LLMs, helping researchers find relevant information quickly without reading the entire paper.
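
To give a feel for the structured output in step 4, here is a small sketch that summarizes a list of elements. The sample data is invented for illustration; it only follows the general shape of Unstructured element JSON ("type", "text", "metadata"), and real output carries more fields.

```python
import json
from collections import Counter

# Invented sample in the general shape of Unstructured element JSON.
SAMPLE_OUTPUT = """
[
  {"type": "Title", "text": "A Study of Widgets", "metadata": {"page_number": 1}},
  {"type": "NarrativeText", "text": "Widgets are useful.", "metadata": {"page_number": 1}},
  {"type": "Title", "text": "Methods", "metadata": {"page_number": 2}},
  {"type": "Table", "text": "n=10 mean=3.2", "metadata": {"page_number": 2}}
]
"""


def summarize(elements):
    """Count elements per type and collect the title texts in document order."""
    counts = Counter(el.get("type", "Unknown") for el in elements)
    titles = [el["text"] for el in elements if el.get("type") == "Title"]
    return {"counts": dict(counts), "titles": titles}


summary = summarize(json.loads(SAMPLE_OUTPUT))
print(summary["titles"])
print(summary["counts"])
```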

Available Tools

| Tool | Description |
| --- | --- |
| list_sources | Lists available sources from the Unstructured API. |
| get_source_info | Get detailed information about a specific source connector. |
| create_gdrive_source | Create a Google Drive source connector. |
| update_gdrive_source | Update an existing Google Drive source connector by params. |
| delete_gdrive_source | Delete a source connector by source id. |
| list_destinations | Lists available destinations from the Unstructured API. |
| get_destination_info | Get detailed info about a specific destination connector. Currently, we have s3/weaviate/astra/neo4j/mongo DB (more to come!) |
| create_mongodb_destination | Create a MongoDB destination connector by params. |
| update_mongodb_destination | Update an existing MongoDB destination connector by destination id. |
| delete_mongodb_destination | Delete a MongoDB destination connector by destination id. |
| list_workflows | Lists workflows from the Unstructured API. |
| get_workflow_info | Get detailed information about a specific workflow. |
| create_workflow | Create a new workflow with source, destination id, etc. |
| run_workflow | Run a specific workflow by workflow id. |
| update_workflow | Update an existing workflow by params. |
| delete_workflow | Delete a specific workflow by id. |
| list_jobs | Lists jobs for a specific workflow from the Unstructured API. |
| get_job_info | Get detailed information about a specific job by job id. |
| cancel_job | Cancel a specific job by id. |
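
Under the hood, an MCP client invokes each of these tools with a JSON-RPC 2.0 request whose method is tools/call. The sketch below builds such a request as a plain dict. The tool names come from the list above, but the argument name "source_id" is a hypothetical placeholder; check the server's tool schema for the real parameter names.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call_request(tool_name, arguments=None):
    """Build an MCP tools/call JSON-RPC 2.0 request as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


print(json.dumps(tool_call_request("list_sources")))
# "source_id" is a hypothetical argument name used for illustration only.
print(json.dumps(tool_call_request("get_source_info", {"source_id": "<source-id>"})))
```

In practice Claude Desktop or the MCP SDK client builds and sends these messages for you; the sketch is only meant to show what a tool invocation looks like on the wire.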

Follow Along

1. Set Up Required Connectors

Google Drive Source Connector:

  • Create a Google Drive Source Connector to connect your service account with Google Drive and retrieve PDFs.
  • Test the connection to ensure accessibility.

MongoDB Destination Connector:

  • Set up the MongoDB Destination Connector to store processed data.
  • Test the connection to ensure accessibility.

2. Develop the Workflow

  1. Define Connectors: Set up the Google Drive source and MongoDB destination connectors.

  2. Partitioning: Use Auto partitioning for optimal document splitting.

  3. Chunking: Apply by-page chunking for manageable text segments.

  4. Enrichment: Use NER to extract entities and table enrichment for any tables.

  5. Embedding: Convert text into embeddings for querying or analysis.

Note: You can tweak the flow by adjusting any step (partitioning, chunking, enrichment, embedding) as needed.
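
To illustrate what by-page chunking means in step 3, here is a plain-Python sketch that joins element texts sharing a page number into one chunk per page. It is not the Unstructured implementation, which runs server-side as part of the workflow; the element shape ("text", "metadata.page_number") and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def chunk_by_page(elements):
    """Join element texts that share a page number into one chunk per page."""
    pages = defaultdict(list)
    for el in elements:
        page = el.get("metadata", {}).get("page_number")
        text = el.get("text", "").strip()
        if text:
            pages[page].append(text)
    # Sort pages numerically; elements without a page number go last.
    ordered = sorted(pages, key=lambda p: (p is None, p if p is not None else 0))
    return [{"page_number": p, "text": "\n".join(pages[p])} for p in ordered]


# Invented sample elements.
elements = [
    {"text": "Intro", "metadata": {"page_number": 1}},
    {"text": "Body text.", "metadata": {"page_number": 1}},
    {"text": "Results", "metadata": {"page_number": 2}},
]
chunks = chunk_by_page(elements)
for chunk in chunks:
    print(chunk["page_number"], repr(chunk["text"]))
```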


3. Set Up Claude Desktop

  1. Install Claude Desktop and integrate it with the UNS_MCP server by following the steps in the Claude Desktop Integration section.
  2. Restart Claude so that it connects to the MCP server, then confirm the workflow tools are available.

4. Query and Run the Workflow

  • Use Claude to interact with the system and execute queries to list, create, edit, delete and run the workflow. Many other tasks are possible; see the Available Tools section above.

5. Results

Claude Desktop Integration

To install in Claude Desktop:

  1. Open claude_desktop_config.json by running the command below.
# For macOS:
code ~/Library/Application\ Support/Claude/claude_desktop_config.json

# For Windows:
code $env:AppData\Claude\claude_desktop_config.json
  2. In that file add:
{
    "mcpServers":
    {
        "UNS_MCP":
        {
            "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
            "args":
            [
                "--directory",
                "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
                "run",
                "server.py"
            ],
            "env":
            {
                "UNSTRUCTURED_API_KEY": "<your key>"
            },
            "disabled": false
        }
    }
}
  3. Restart Claude Desktop.

  4. Example issues seen from Claude Desktop.

    • You will see No destinations found when you query for a list of destination connectors. Check your API key in .env or in your config JSON. It needs to be your personal key from https://platform.unstructured.io/app/account/api-keys.
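
Hand-editing JSON is error-prone, so one option is to generate the server entry from Python and paste the output into claude_desktop_config.json. This is a minimal sketch: the helper name build_server_entry is made up for illustration, and the paths and key are the same placeholders used in the steps above.

```python
import json


def build_server_entry(uv_path, repo_dir, api_key):
    """Return the "mcpServers" fragment for the UNS_MCP server."""
    return {
        "mcpServers": {
            "UNS_MCP": {
                "command": uv_path,
                "args": ["--directory", repo_dir, "run", "server.py"],
                # "env" must be a JSON object (name -> value), not an array.
                "env": {"UNSTRUCTURED_API_KEY": api_key},
                "disabled": False,
            }
        }
    }


config = build_server_entry(
    "ABSOLUTE/PATH/TO/.local/bin/uv",
    "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
    "<your key>",
)
print(json.dumps(config, indent=4))
```

Because json.dumps always emits valid JSON, this avoids syntax slips such as a missing comma or the wrong bracket type.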

Debugging Tools

Anthropic provides the MCP Inspector tool to debug and test your MCP server. Run the following command to spin up a debugging UI. From there, you can add environment variables (pointing to your local env) in the left pane; include your personal API key there as an environment variable. Under Tools, you can test the capabilities you add to the MCP server.

mcp dev uns_mcp/server.py

If you need to log request call parameters to UnstructuredClient, set the environment variable DEBUG_API_REQUESTS=true. The logs are stored in a file named unstructured-client-{date}.log, which you can examine to debug the parameters passed to UnstructuredClient functions.
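
To find and inspect the newest of these log files, a helper like the following may be handy. It is a sketch under two assumptions: the logs are written to the directory you pass in (the current working directory by default), and the files match the unstructured-client-*.log pattern. It picks the most recently modified file rather than guessing the date format.

```python
from pathlib import Path


def latest_client_log(directory="."):
    """Return the most recently modified unstructured-client-*.log, or None."""
    logs = list(Path(directory).glob("unstructured-client-*.log"))
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)


def tail(path, n=20):
    """Return the last n lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.readlines()[-n:]


if __name__ == "__main__":
    log = latest_client_log()
    if log is None:
        print("No unstructured-client log files found.")
    else:
        print(f"Last lines of {log}:")
        print("".join(tail(log)), end="")
```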

Running a minimal client locally, accessing the local MCP server over HTTP + SSE

The main difference here is that the client and server are decoupled, which makes it easier to set breakpoints on the server side during development.

# in one terminal, run the server:
uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080
# or
make sse-server

# in another terminal, run the client:
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"
# or
make sse-client

Hint: Press Ctrl+C in the client first, then in the server. Otherwise the server appears to hang.
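
If the client fails to connect, it can help to confirm that the server is actually listening before digging further. The sketch below parses the SSE URL used above and attempts a plain TCP connection to it. The function names are illustrative, and a successful connection only shows that something is listening on that port, not that it is the MCP server.

```python
import socket
from urllib.parse import urlsplit


def parse_sse_url(url):
    """Split an SSE endpoint URL into (host, port, path)."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port, parts.path


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port, path = parse_sse_url("http://127.0.0.1:8080/sse")
    state = "listening" if is_port_open(host, port) else "not reachable"
    print(f"{host}:{port}{path} is {state}")
```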
