udditdocgpt

Private document Analysis with no risk to exposure to internet.

https://github.com/udditwork/udditdocgpt

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: researchgate.net
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

Private document Analysis with no risk to exposure to internet.

Basic Info
  • Host: GitHub
  • Owner: UDDITwork
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 491 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License Citation

README.md

UdditdocGPT

Apr 22, 2025, 03_36_46 PM

Website LinkedIn GitHub YouTube ResearchGate

UdditdocGPT UI

UdditdocGPT is a production-ready AI project that allows you to ask questions about your documents using the power of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% private, no data leaves your execution environment at any point.

[!TIP] If you are looking for an enterprise-ready, fully private AI workspace check out my Portfolio or contact me for a demo. Crafted by Uddit Kant Sinha, UdditdocGPT is a best-in-class AI document analyzer that can be easily deployed on-premise (data center, bare metal...) or in your private cloud (AWS, GCP, Azure...).

The project provides an API offering all the primitives required to build private, context-aware AI applications. It follows and extends the OpenAI API standard, and supports both normal and streaming responses.

The API is divided into two logical blocks:

High-level API, which abstracts all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation: - Ingestion of documents: internally managing document parsing, splitting, metadata extraction, embedding generation and storage. - Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt engineering and the response generation.

Low-level API, which allows advanced users to implement their own complex pipelines: - Embeddings generation: based on a piece of text. - Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested documents.

In addition to this, a working Gradio UI client is provided to test the API, together with a set of useful tools such as bulk model download script, ingestion script, documents folder watch, etc.

🎞️ Overview

[!WARNING] This README is not updated as frequently as the documentation. Please check the docs for the latest updates!

Motivation behind UdditdocGPT

Generative AI is a game changer for our society, but adoption in companies of all sizes and data-sensitive domains like healthcare or legal is limited by a clear concern: privacy. Not being able to ensure that your data is fully under your control when using third-party AI tools is a risk those industries cannot take.

As a PatentTech innovator and AI Engineer, I created UdditdocGPT to address these concerns head-on.

Evolution of UdditdocGPT

UdditdocGPT is evolving towards becoming a gateway to generative AI models and primitives, including completions, document ingestion, RAG pipelines and other low-level building blocks. I want to make it easier for any developer to build AI applications and experiences, as well as provide a suitable extensive architecture for the community to keep contributing.

Stay tuned to the releases to check out all the new features and changes included.

📄 Documentation

Full documentation on installation, dependencies, configuration, running the server, deployment options, ingesting local documents, API details and UI features can be found in the project documentation.

🧩 Architecture

Conceptually, UdditdocGPT is an API that wraps a RAG pipeline and exposes its primitives. * The API is built using FastAPI and follows OpenAI's API scheme. * The RAG pipeline is based on LlamaIndex.

The design of UdditdocGPT allows to easily extend and adapt both the API and the RAG implementation. Some key architectural decisions are: * Dependency Injection, decoupling the different components and layers. * Usage of LlamaIndex abstractions such as LLM, BaseEmbedding or VectorStore, making it immediate to change the actual implementations of those abstractions. * Simplicity, adding as few layers and new abstractions as possible. * Ready to use, providing a full implementation of the API and RAG pipeline.

Main building blocks: * APIs are defined in udditdoc_gpt:server:<api>. Each package contains an <api>_router.py (FastAPI layer) and an <api>_service.py (the service implementation). Each Service uses LlamaIndex base abstractions instead of specific implementations, decoupling the actual implementation from its usage. * Components are placed in udditdoc_gpt:components:<component>. Each Component is in charge of providing actual implementations to the base abstractions used in the Services - for example LLMComponent is in charge of providing an actual implementation of an LLM (for example LlamaCPP or OpenAI).

💡 Contributing

Contributions are welcomed! To ensure code quality I have enabled several format and typing checks, just run make check before committing to make sure your code is ok. Remember to test your code! You'll find a tests folder with helpers, and you can run tests using make test command.

Don't know what to contribute? Check out the Project Board with several ideas.

Head over to our communication channels and ask for write permissions on the GitHub project.

💬 Community

Join the conversation around UdditdocGPT on: - LinkedIn - GitHub - YouTube

📖 Citation

If you use UdditdocGPT in a paper, please cite it appropriately.

Here are a couple of examples:

BibTeX

bibtex @software{Uddit_UdditdocGPT_2025, author = {Uddit Kant Sinha}, license = {Apache-2.0}, month = april, title = {{UdditdocGPT}}, url = {https://github.com/UDDITwork/UdditdocGPT}, year = {2025} }

APA

Uddit Kant Sinha (2025). UdditdocGPT [Computer software]. https://github.com/UDDITwork/UdditdocGPT

📑 Related Projects

UdditdocGPT complements my other AI initiatives:

  • Patent Diagram Generator Tool: Converts patent text into FreeCAD/AutoCAD diagrams
  • ABC_AI: Smart AI Assistant for Hiring & Task Automation
  • Document Analyzer with OCR + Gemini API: Chat with PDFs, DOCX, TXT using Tesseract + NLP

🤗 Acknowledgments

UdditdocGPT is actively supported by various technologies: * Qdrant, providing the default vector database * LlamaIndex, providing the base RAG framework and abstractions

This project has been strongly influenced and supported by other amazing projects like LangChain, GPT4All, LlamaCpp, Chroma and SentenceTransformers.

📞 Contact

Want to collaborate or hire me for your AI/ML projects? - Email: udditkantsinha2@gmail.com - LinkedIn: lorduddit- - Portfolio: udditwork.github.io/PORTFOLIO-Uddit

Owner

  • Login: UDDITwork
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: PrivateGPT
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: Zylon by PrivateGPT
    address: hello@zylon.ai
    website: 'https://www.zylon.ai/'
repository-code: 'https://github.com/zylon-ai/private-gpt'
license: Apache-2.0
date-released: '2023-05-02'

GitHub Events

Total
  • Push event: 1
  • Create event: 2
Last Year
  • Push event: 1
  • Create event: 2

Dependencies

.github/workflows/actions/install_dependencies/action.yml actions
  • actions/setup-python v4 composite
  • snok/install-poetry v1 composite
.github/workflows/fern-check.yml actions
  • actions/checkout v4 composite
.github/workflows/generate-release.yml actions
  • actions/checkout v4 composite
  • docker/build-push-action v6 composite
  • docker/login-action v3 composite
  • docker/metadata-action v5 composite
  • docker/setup-buildx-action v3 composite
  • docker/setup-qemu-action v3 composite
  • jlumbroso/free-disk-space main composite
.github/workflows/preview-docs.yml actions
  • actions/checkout v4 composite
  • actions/github-script v7 composite
  • actions/setup-node v4 composite
.github/workflows/publish-docs.yml actions
  • actions/checkout v4 composite
  • actions/setup-node v3 composite
.github/workflows/release-please.yml actions
  • google-github-actions/release-please-action v4 composite
.github/workflows/stale.yml actions
  • actions/stale v8 composite
.github/workflows/tests.yml actions
  • ./.github/workflows/actions/install_dependencies * composite
  • actions/checkout v4 composite
  • actions/upload-artifact v3 composite
poetry.lock pypi
  • 255 dependencies
pyproject.toml pypi
  • black ^24 develop
  • mypy ^1.11 develop
  • pre-commit ^3 develop
  • pytest ^8 develop
  • pytest-asyncio ^0.24.0 develop
  • pytest-cov ^5 develop
  • ruff ^0 develop
  • types-pyyaml ^6.0.12.20240917 develop
  • asyncpg ^0.29.0
  • boto3 ^1.35.26
  • clickhouse-connect ^0.7.19
  • cryptography ^3.1
  • docx2txt ^0.8
  • einops ^0.8.0
  • fastapi ^0.115.0
  • ffmpy ^0.4.0
  • gradio ^4.44.0
  • injector ^0.22.0
  • llama-index-core >=0.11.2,<0.12.0
  • llama-index-embeddings-azure-openai *
  • llama-index-embeddings-gemini *
  • llama-index-embeddings-huggingface *
  • llama-index-embeddings-mistralai *
  • llama-index-embeddings-ollama *
  • llama-index-embeddings-openai *
  • llama-index-llms-azure-openai *
  • llama-index-llms-gemini *
  • llama-index-llms-llama-cpp *
  • llama-index-llms-ollama *
  • llama-index-llms-openai *
  • llama-index-llms-openai-like *
  • llama-index-readers-file *
  • llama-index-storage-docstore-postgres *
  • llama-index-storage-index-store-postgres *
  • llama-index-vector-stores-chroma *
  • llama-index-vector-stores-clickhouse *
  • llama-index-vector-stores-milvus *
  • llama-index-vector-stores-postgres *
  • llama-index-vector-stores-qdrant *
  • psycopg2-binary ^2.9.9
  • python >=3.11,<3.12
  • python-multipart ^0.0.10
  • pyyaml ^6.0.2
  • retry-async ^0.1.4
  • sentence-transformers ^3.1.1
  • torch ^2.4.1
  • transformers ^4.44.2
  • watchdog ^4.0.1