pyLLMSearch - Advanced RAG

Querying local documents, powered by LLMs.


The purpose of this package is to offer an advanced question-answering (RAG) system, driven by a simple YAML-based configuration, that enables interaction with a collection of local documents. Special attention is given to improving the individual components of the pipeline beyond a basic LLM-based RAG: better document parsing, hybrid search, HyDE, chat history, deep linking, re-ranking, customizable embeddings, and more. The package is designed to work with custom Large Language Models (LLMs), whether from OpenAI or installed locally.
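
For a concrete sense of what a YAML-driven setup can look like, here is a minimal configuration sketch. The field names below are illustrative assumptions, not the package's actual schema; consult the documentation for the real keys.

```yaml
# Hypothetical configuration sketch - field names are illustrative,
# not pyLLMSearch's actual schema.
embeddings:
  doc_path: /data/docs                  # folder of documents to index
  embedding_model: intfloat/multilingual-e5-base
  vector_store: chromadb

semantic_search:
  hyde: false                           # hypothetical document embeddings
  multiquery: false                     # RAG-Fusion-style query variants
  reranker: BAAI/bge-reranker-base

llm:
  provider: openai                      # or any OpenAI-compatible local endpoint
  model: gpt-4o-mini
```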

Interaction with the package is supported through the built-in frontend, or by exposing an MCP server, which allows clients like Cursor, Windsurf, or VSCode GitHub Copilot to interact with the RAG system.

Features

  • Fast, incremental parsing and embedding of medium-sized document bases (tested on up to a few gigabytes of Markdown and PDFs)

  • Supported document formats

    • Built-in parsers:
      • .md - splits files on logical components such as headings, subheadings, and code blocks (a minimal splitting sketch follows this list). Supports additional features like cleaning image links, adding custom metadata, and more.
      • .pdf - MuPDF-based parser.
      • .docx - custom parser that supports nested tables.
    • Other common formats are supported via the Unstructured pre-processor:
      • See the list of supported formats here.
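
Since the Markdown parser works on heading boundaries, the core idea can be sketched as below. This is a simplification of what the package does (it ignores code fences and the extra metadata handling):

```python
# Sketch: split a Markdown document into chunks at heading boundaries.
import re

def split_markdown(text: str) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # Start a new chunk whenever a heading ("#" .. "######") begins.
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```
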
  • Allows interaction with embedded documents, internally supporting the following models and methods (including locally hosted):

    • OpenAI compatible models and APIs.
    • HuggingFace models.
  • Interoperability with LiteLLM + Ollama via the OpenAI API, supporting hundreds of different models (see Model configuration for LiteLLM)

  • SSE MCP server, enabling integration with popular MCP clients (a minimal sketch follows).
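
To illustrate what exposing the system over MCP can look like, here is a minimal SSE server sketch using the official MCP Python SDK. The tool name and the `rag_answer` stub are hypothetical, not pyLLMSearch's actual interface.

```python
# Minimal SSE MCP server sketch (tool name and stub are hypothetical).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rag-server")

def rag_answer(question: str) -> str:
    # Placeholder standing in for the actual RAG pipeline.
    return f"(answer to: {question})"

@mcp.tool()
def ask_documents(question: str) -> str:
    """Answer a question from the local document index."""
    return rag_answer(question)

if __name__ == "__main__":
    # Serve over SSE so MCP clients (Cursor, Windsurf, VSCode) can connect.
    mcp.run(transport="sse")
```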

  • Generates dense embeddings from a folder of documents and stores them in a vector database (ChromaDB).

    • The following embedding models are supported:
      • Hugging Face embeddings.
      • Sentence-transformers-based models, e.g., multilingual-e5-base.
      • Instructor-based models, e.g., instructor-large.
      • OpenAI embeddings.
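
As a rough sketch of this embed-and-store step (the model name and collection layout are assumptions for illustration):

```python
# Sketch: embed document chunks and store/query them in ChromaDB.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")
client = chromadb.PersistentClient(path="./index")
collection = client.get_or_create_collection("documents")

chunks = ["First document chunk.", "Second document chunk."]
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=model.encode(chunks).tolist(),
)

# Dense retrieval: embed the query and fetch the nearest chunks.
hits = collection.query(
    query_embeddings=[model.encode("my question").tolist()],
    n_results=2,
)
```
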
  • Generates sparse embeddings using SPLADE (https://github.com/naver/splade) to enable hybrid search (sparse + dense).
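
How the sparse and dense scores are combined is an implementation detail; a common approach is a normalized weighted sum, sketched below (the scheme is an assumption, not necessarily what the package uses):

```python
# Sketch: fuse dense and sparse retrieval scores with a weighted sum.
def hybrid_scores(dense: dict[str, float],
                  sparse: dict[str, float],
                  alpha: float = 0.5) -> dict[str, float]:
    """dense/sparse map document ids to scores; alpha weights the dense side."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    d, s = normalize(dense), normalize(sparse)
    return {doc_id: alpha * d.get(doc_id, 0.0) + (1 - alpha) * s.get(doc_id, 0.0)
            for doc_id in set(d) | set(s)}
```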

  • Ability to update the embeddings incrementally, without re-indexing the entire document base.
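
A common way to implement incremental updates is to fingerprint each file and re-embed only what changed; a minimal sketch (the manifest format is an assumption):

```python
# Sketch: detect changed files by content hash so only they are re-embedded.
import hashlib
import json
import pathlib

def changed_files(doc_dir: str, manifest_path: str = "manifest.json") -> list[pathlib.Path]:
    manifest_file = pathlib.Path(manifest_path)
    manifest = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    changed = []
    for f in pathlib.Path(doc_dir).rglob("*"):
        if not f.is_file():
            continue
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        if manifest.get(str(f)) != digest:   # new or modified file
            changed.append(f)
            manifest[str(f)] = digest
    manifest_file.write_text(json.dumps(manifest, indent=2))
    return changed
```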

  • Support for table parsing via open-source gmft (https://github.com/conjuncts/gmft) or Azure Document Intelligence.

  • Optional support for image parsing using Gemini API.

  • Supports the "Retrieve and Re-rank" strategy for semantic search; see here.

    • Besides the original ms-marco-MiniLM cross-encoder, the more modern bge-reranker is also supported.
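
In code, the re-rank step typically looks like the sketch below, using a cross-encoder from sentence-transformers (the exact model and call site in the package may differ):

```python
# Sketch: re-rank retrieved chunks with a cross-encoder.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    # Score each (query, chunk) pair jointly - slower than bi-encoders, but more accurate.
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]
```
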
  • Supports HyDE (Hypothetical Document Embeddings) - see here.

    • WARNING: Enabling HyDE (via the config or the web app) can significantly alter the quality of the results. Please make sure to read the paper before enabling it.
    • In my own experiments, enabling HyDE significantly boosts the quality of the output on topics where the user can't formulate the question using the domain-specific language of the topic - e.g., when learning new subjects.
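
Conceptually, HyDE asks the LLM to write a hypothetical answer first and then searches with that answer's embedding instead of the raw question. A minimal sketch against an OpenAI-compatible API (the model name and prompt are illustrative):

```python
# Sketch of HyDE: search with a hypothetical answer instead of the question.
from openai import OpenAI

client = OpenAI()  # works against any OpenAI-compatible endpoint

def hyde_query(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{
            "role": "user",
            "content": f"Write a short passage that plausibly answers: {question}",
        }],
    )
    # The returned passage is then embedded and used for the vector search
    # in place of the original question.
    return response.choices[0].message.content
```
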
  • Support for multi-querying, inspired by RAG Fusion - https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1

    • When multi-querying is turned on (via the config or the web app), the original query is replaced by three variants of the same query, helping to bridge gaps in terminology and to "offer different angles or perspectives", as the article puts it.
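
The result lists from the query variants then have to be merged; the RAG Fusion article does this with reciprocal rank fusion, sketched below (whether the package uses exactly this scheme is not stated here):

```python
# Sketch: reciprocal rank fusion (RRF) over result lists from query variants.
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # Documents ranked highly in several lists accumulate the most weight.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```
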
  • Supports optional chat history with question contextualization.
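
Question contextualization usually means rewriting a follow-up into a standalone question before retrieval; a minimal sketch (prompt and model are illustrative assumptions):

```python
# Sketch: condense chat history plus a follow-up into a standalone question.
from openai import OpenAI

client = OpenAI()

def contextualize(history: list[tuple[str, str]], followup: str) -> str:
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{
            "role": "user",
            "content": (f"Given this conversation:\n{transcript}\n\n"
                        f"Rewrite the follow-up as a standalone question: {followup}"),
        }],
    )
    return response.choices[0].message.content
```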

  • Other features

    • Simple web interfaces.
    • Deep linking into document sections - jump to an individual PDF page or a header in a markdown file.
    • Ability to save responses to an offline database for future analysis.
    • FastAPI-based API + MCP server, allowing communication with the RAG system via any MCP client, including VSCode, Windsurf, Cursor, and others.

Demo


Documentation

Browse Documentation
