Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: tungduong0708
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 193 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 12 months ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License Citation

README.md

UniDocR: Universal Multimodal Document Retrieval

Overview

UniDocR is a retrieval system designed to enhance document search capabilities by incorporating both text and image-based queries. It extends traditional text-based retrieval methods by integrating Vision-Language Models (VLMs) and Optical Character Recognition (OCR) to improve the understanding and retrieval of multimodal documents.

Features

  • Multimodal Query Support: Supports text-only, image-only, and combined text-image queries.
  • Vision-Language Integration: Utilizes Qwen2-VL for enhanced document understanding.
  • Late-Interaction Mechanism: Employs ColBERT and ColPali for efficient and precise retrieval.
  • Optimized for M2KR Benchmark: Evaluated on Q2A, I2A, and IQ2A tasks.
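The late-interaction mechanism listed above can be illustrated with ColBERT's standard MaxSim scoring rule: each query token embedding is matched against its most similar document token embedding, and the per-token maxima are summed. This is a minimal sketch of the general technique, not UniDocR's actual implementation:

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    For each query token embedding, take its maximum cosine similarity
    over all document token embeddings, then sum over query tokens.
    """
    # Normalize rows so the dot product equals cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                        # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim per query token, then sum

# Toy example: 2 query tokens, 3 document tokens, 4-dim embeddings
rng = np.random.default_rng(0)
score = maxsim_score(rng.normal(size=(2, 4)), rng.normal(size=(3, 4)))
```

Because document token embeddings can be precomputed and indexed, only the cheap similarity lookup happens at query time, which is what makes late interaction efficient.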

Technical Details

  • Retrieval Architecture: Uses a hybrid approach combining neural retrieval (ColBERT) and vision-language processing.
  • OCR Integration: Extracts textual information from images for improved search accuracy.
  • Neural Ranking: Enhances document ranking through multimodal feature fusion.
  • Frameworks Used: ColBERT, ColPali, Vision Transformers (ViTs), and Qwen2-VL.
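One simple way to realize the "multimodal feature fusion" described above is a weighted combination of a text-retrieval score (e.g. over OCR-extracted text) and a vision-language score per document. The linear fusion rule and the `alpha` weight below are assumptions for illustration; the README does not specify UniDocR's fusion scheme:

```python
def fuse_and_rank(text_scores: dict, vision_scores: dict, alpha: float = 0.5) -> list:
    """Rank document ids by a weighted sum of text and vision scores.

    Documents missing from one modality default to a score of 0.0.
    The linear weighting is illustrative, not UniDocR's actual method.
    """
    docs = set(text_scores) | set(vision_scores)
    fused = {
        doc: alpha * text_scores.get(doc, 0.0)
             + (1.0 - alpha) * vision_scores.get(doc, 0.0)
        for doc in docs
    }
    return sorted(fused, key=fused.get, reverse=True)

# A document weak on OCR text can still rank first via its vision score.
ranking = fuse_and_rank({"doc1": 0.9, "doc2": 0.4}, {"doc1": 0.2, "doc2": 0.8})
```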

Installation

To set up the UniDocR environment, follow these steps:

```bash
# Clone the repository
git clone https://github.com/yourusername/unidocr.git
cd unidocr
```

Usage

  1. Indexing Documents

     ```bash
     python index.py --data_path /path/to/documents
     ```

  2. Querying

     ```bash
     python search.py --query "Your search text" --image_path /path/to/query/image
     ```

Dataset

UniDocR is tested on the M2KR benchmark, supporting:

  • Q2A (Text-to-Text Retrieval)
  • I2A (Image-to-Text Retrieval)
  • IQ2A (Image + Text to Text Retrieval)
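The three M2KR task settings map naturally onto which query modalities are present: text only is Q2A, image only is I2A, and both together is IQ2A. A small sketch of that dispatch, with hypothetical field names (the real query schema is not shown in the README):

```python
from typing import Optional

def build_query(text: Optional[str] = None,
                image_path: Optional[str] = None) -> dict:
    """Map the provided modalities to an M2KR task label.

    text only -> Q2A, image only -> I2A, both -> IQ2A.
    The returned payload fields are illustrative assumptions.
    """
    if text and image_path:
        task = "IQ2A"
    elif image_path:
        task = "I2A"
    elif text:
        task = "Q2A"
    else:
        raise ValueError("Provide text, image_path, or both")
    return {"task": task, "text": text, "image_path": image_path}
```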

Contributions

If you'd like to contribute, feel free to submit a pull request or open an issue.

Contact

For questions, reach out at tungduong0708@gmail.com.

Owner

  • Login: tungduong0708
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Faysse"
  given-names: "Manuel"
  email: "manuel.faysse@illuin.tech"
- family-names: "Sibille"
  given-names: "Hugues"
  email: "hugues.sibille@illuin.tech"
- family-names: "Wu"
  given-names: "Tony"
  email: "tony.wu@illuin.tech"
title: "Vision Document Retrieval (ViDoRe): Benchmark"
date-released: 2024-06-26
url: "https://github.com/illuin-tech/vidore-benchmark"
preferred-citation:
  type: article
  authors:
  - family-names: "Faysse"
    given-names: "Manuel"
  - family-names: "Sibille"
    given-names: "Hugues"
  - family-names: "Wu"
    given-names: "Tony"
  - family-names: "Omrani"
    given-names: "Bilel"
  - family-names: "Viaud"
    given-names: "Gautier"
  - family-names: "Hudelot"
    given-names: "Céline"
  - family-names: "Colombo"
    given-names: "Pierre"
  doi: "arXiv.2407.01449"
  month: 6
  title: "ColPali: Efficient Document Retrieval with Vision Language Models"
  year: 2024
  url: "https://arxiv.org/abs/2407.01449"

GitHub Events

Total
  • Member event: 1
  • Push event: 28
  • Create event: 3
Last Year
  • Member event: 1
  • Push event: 28
  • Create event: 3

Dependencies

pyproject.toml pypi
  • GPUtil *
  • numpy *
  • peft >=0.14.0,<0.15.0
  • pillow >=10.0.0
  • requests *
  • scipy *
  • torch >=2.2.0
  • transformers >=4.49.0,<4.50.0