
llm-search
Querying local documents, powered by LLM
pyLLMSearch - Advanced RAG
The purpose of this package is to offer an advanced question-answering (RAG) system with a simple YAML-based configuration that enables interaction with a collection of local documents. Special attention is given to improving the various components of the system beyond a basic LLM-based RAG - better document parsing, hybrid search, HyDE, chat history, deep linking, re-ranking, customizable embeddings, and more. The package is designed to work with custom Large Language Models (LLMs), whether from OpenAI or installed locally.
Interaction with the package is supported through the built-in frontend, or by exposing an MCP server, allowing clients like Cursor, Windsurf or VSCode GH Copilot to interact with the RAG system.
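To make the YAML-driven workflow concrete, here is a minimal configuration sketch loaded from Python. Every key below (documents, embeddings, search, llm and their children) is an illustrative assumption, not the package's actual schema - see the Documentation section for the real field names.

```python
# Hypothetical sketch of a YAML-driven RAG configuration.
# All keys below are illustrative assumptions, NOT the actual
# pyLLMSearch schema - check the project documentation for real field names.
import yaml

config_text = """
documents:
  folder: ./docs            # local collection to index
  formats: [md, pdf, docx]
embeddings:
  model: multilingual-e5-base
  store: chromadb
search:
  hybrid: true              # dense + SPLADE sparse
  reranker: bge-reranker
  hyde: false               # read the HyDE paper before enabling
llm:
  provider: openai          # or a locally hosted model
"""

config = yaml.safe_load(config_text)
print(config["search"]["reranker"])  # -> bge-reranker
```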
Features
- Fast, incremental parsing and embedding of medium-size document bases (tested on up to a few gigabytes of markdown and PDFs).
- Supported document formats:
  - Built-in parsers:
    - .md - divides files based on logical components such as headings, subheadings, and code blocks. Supports additional features like cleaning image links, adding custom metadata, and more.
    - .pdf - MuPDF-based parser.
    - .docx - custom parser, supports nested tables.
  - Other common formats are supported by the Unstructured pre-processor (for the list of formats, see here).
- Allows interaction with embedded documents, internally supporting the following models and methods (including locally hosted):
  - OpenAI-compatible models and APIs.
  - HuggingFace models.
- Interoperability with LiteLLM + Ollama via the OpenAI API, supporting hundreds of different models (see Model configuration for LiteLLM).
- SSE MCP Server enabling integration with popular MCP clients.
- Generates dense embeddings from a folder of documents and stores them in a vector database (ChromaDB). The following embedding models are supported:
  - Hugging Face embeddings.
  - Sentence-transformers-based models, e.g., multilingual-e5-base.
  - Instructor-based models, e.g., instructor-large.
  - OpenAI embeddings.
- Generates sparse embeddings using SPLADE (https://github.com/naver/splade) to enable hybrid search (sparse + dense); a toy score-fusion sketch follows this list.
- Ability to update the embeddings incrementally, without the need to re-index the entire document base.
- Support for table parsing via the open-source gmft (https://github.com/conjuncts/gmft) or Azure Document Intelligence.
- Optional support for image parsing using the Gemini API.
- Supports the "Retrieve and Re-rank" strategy for semantic search, see here (a cross-encoder sketch follows this list).
  - Besides the original ms-marco-MiniLM cross-encoder, the more modern bge-reranker is supported.
- Supports HyDE (Hypothetical Document Embeddings) - see here (a minimal sketch follows this list).
  - WARNING: Enabling HyDE (via config or webapp) can significantly alter the quality of the results. Please make sure to read the paper before enabling it.
  - From my own experiments, enabling HyDE significantly boosts the quality of the output on topics where the user can't formulate the question using the domain-specific language of the topic - e.g., when learning new topics.
- Support for multi-querying, inspired by RAG Fusion - https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1 (a fusion sketch follows this list).
  - When multi-querying is turned on (either via config or webapp), the original query is replaced by 3 variants of the same query, helping bridge gaps in terminology and "offer different angles or perspectives", as the article puts it.
- Supports optional chat history with question contextualization.
- Other features:
  - Simple web interfaces.
  - Deep linking into document sections - jump to an individual PDF page or a header in a markdown file.
  - Ability to save responses to an offline database for future analysis.
  - FastAPI-based API + MCP server, allowing communication with the RAG system via any MCP client, including VSCode, Windsurf, Cursor and others.
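The hybrid search feature above combines dense and sparse signals. The toy sketch below shows one common way such scores can be fused - a weighted sum of a cosine similarity and a SPLADE-style sparse dot product; it illustrates the concept only, not pyLLMSearch's actual scoring code.

```python
# Toy illustration of hybrid (dense + sparse) scoring - a concept sketch,
# not pyLLMSearch's actual code. Dense scores come from cosine similarity
# of embedding vectors; sparse scores from a SPLADE-style term-weight dot product.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sparse_dot(q_weights, d_weights):
    # SPLADE-style sparse vectors represented as {term: weight} dictionaries
    return sum(w * d_weights.get(t, 0.0) for t, w in q_weights.items())

def hybrid_score(q_dense, d_dense, q_sparse, d_sparse, alpha=0.5):
    # Weighted fusion; alpha balances dense vs. sparse evidence
    return alpha * cosine(q_dense, d_dense) + (1 - alpha) * sparse_dot(q_sparse, d_sparse)

q_dense, d_dense = np.array([0.1, 0.9]), np.array([0.2, 0.8])
q_sparse, d_sparse = {"splade": 1.2, "hybrid": 0.7}, {"hybrid": 0.9}
print(hybrid_score(q_dense, d_dense, q_sparse, d_sparse))
```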
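The "Retrieve and Re-rank" strategy can be sketched with the CrossEncoder class from sentence-transformers, using the ms-marco-MiniLM model named in the feature list; the candidate passages here are hard-coded stand-ins for real retrieval output.

```python
# Minimal "Retrieve and Re-rank" sketch using sentence-transformers.
# The retrieval step is faked with a static candidate list; only the
# re-ranking call reflects a real library API.
from sentence_transformers import CrossEncoder

query = "How do I enable hybrid search?"
candidates = [
    "Hybrid search combines dense and sparse (SPLADE) scores.",
    "The parser splits markdown files on headings and code blocks.",
    "Re-ranking uses a cross-encoder over (query, passage) pairs.",
]

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, c) for c in candidates])

# Highest cross-encoder score first
for score, passage in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {passage}")
```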
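HyDE works by embedding a hypothetical LLM-written answer instead of the raw question. Below is a minimal sketch assuming an OpenAI-compatible endpoint; the model names are illustrative choices, not the package's defaults.

```python
# HyDE concept sketch: embed a hypothetical LLM-generated answer instead of
# the raw query, then use that embedding for dense retrieval.
# Assumes the openai>=1.0 client and illustrative model names.
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

def hyde_embedding(query: str) -> list[float]:
    # 1. Ask the LLM to write a plausible (possibly hallucinated) answer.
    draft = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short passage answering: {query}"}],
    ).choices[0].message.content
    # 2. Embed the hypothetical answer, not the original question.
    return client.embeddings.create(
        model="text-embedding-3-small", input=draft
    ).data[0].embedding

vector = hyde_embedding("What is incremental re-indexing?")
print(len(vector))
```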
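Finally, multi-querying in the RAG Fusion style runs several rewrites of a query and merges their rankings. Reciprocal rank fusion is a common merge rule for this; the sketch below hard-codes the per-query rankings, and the package may fuse results differently.

```python
# Reciprocal Rank Fusion (RRF) over rankings produced by several query
# variants - a common fusion rule for RAG Fusion-style multi-querying.
# The rankings below are hard-coded stand-ins for real retrieval results.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of doc ids, best first; k dampens rank impact.
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-3 results for the original query and two rewrites
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
print(rrf(rankings))  # doc_b benefits from ranking high in every list
```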
Demo
Documentation
Related Projects
An easy-to-use dynamic service discovery, configuration and service management platform for building cloud-native AI applications.
Memory for AI Agents; SOTA in AI Agent Memory; Announcing OpenMemory MCP - local and secure memory management.
The open source ChatGPT alternative for developers. Fast, multi-model AI chat. Agents + MCP coming soon.
FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration.
🧑🚀 A summary of the world's best LLM resources (video generation, agents, coding assistance, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).
AI's query engine - Platform for building AI that can answer questions over large scale federated data. - The only MCP Server you'll ever need
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
ChatGPT CLI is a versatile tool for interacting with LLMs through OpenAI, Azure, and other popular providers like Perplexity AI and Llama. It supports prompt files, history tracking, and live data injection via MCP (Model Context Protocol), making it ideal for both casual users and developers seeking a powerful, customizable GPT experience.