Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: tungduong0708
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 193 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 12 months ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License Citation

README.md

UniDocR: Universal Multimodal Document Retrieval

Overview

UniDocR is a retrieval system designed to enhance document search capabilities by incorporating both text and image-based queries. It extends traditional text-based retrieval methods by integrating Vision-Language Models (VLMs) and Optical Character Recognition (OCR) to improve the understanding and retrieval of multimodal documents.

Features

  • Multimodal Query Support: Supports text-only, image-only, and combined text-image queries.
  • Vision-Language Integration: Utilizes Qwen2-VL for enhanced document understanding.
  • Late-Interaction Mechanism: Employs ColBERT and ColPali for efficient and precise retrieval.
  • Optimized for M2KR Benchmark: Evaluated on Q2A, I2A, and IQ2A tasks.
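The late-interaction mechanism listed above can be illustrated with ColBERT's standard MaxSim scoring rule: each query token embedding is matched against its most similar document token embedding, and the per-token maxima are summed. This is a minimal sketch of the general technique, not UniDocR's actual implementation:

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    For each query token embedding, take its maximum cosine similarity
    over all document token embeddings, then sum over query tokens.
    """
    # Normalize rows so the dot product equals cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                        # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim per query token, then sum

# Toy example: 2 query tokens, 3 document tokens, 4-dim embeddings
rng = np.random.default_rng(0)
score = maxsim_score(rng.normal(size=(2, 4)), rng.normal(size=(3, 4)))
```

Because document token embeddings can be precomputed and indexed, only the cheap similarity lookup happens at query time, which is what makes late interaction efficient.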

Technical Details

  • Retrieval Architecture: Uses a hybrid approach combining neural retrieval (ColBERT) and vision-language processing.
  • OCR Integration: Extracts textual information from images for improved search accuracy.
  • Neural Ranking: Enhances document ranking through multimodal feature fusion.
  • Frameworks Used: ColBERT, ColPali, Vision Transformers (ViTs), and Qwen2-VL.
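One simple way to realize the "multimodal feature fusion" described above is a weighted combination of a text-retrieval score (e.g. over OCR-extracted text) and a vision-language score per document. The linear fusion rule and the `alpha` weight below are assumptions for illustration; the README does not specify UniDocR's fusion scheme:

```python
def fuse_and_rank(text_scores: dict, vision_scores: dict, alpha: float = 0.5) -> list:
    """Rank document ids by a weighted sum of text and vision scores.

    Documents missing from one modality default to a score of 0.0.
    The linear weighting is illustrative, not UniDocR's actual method.
    """
    docs = set(text_scores) | set(vision_scores)
    fused = {
        doc: alpha * text_scores.get(doc, 0.0)
             + (1.0 - alpha) * vision_scores.get(doc, 0.0)
        for doc in docs
    }
    return sorted(fused, key=fused.get, reverse=True)

# A document weak on OCR text can still rank first via its vision score.
ranking = fuse_and_rank({"doc1": 0.9, "doc2": 0.4}, {"doc1": 0.2, "doc2": 0.8})
```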

Installation

To set up the UniDocR environment, follow these steps:

```bash
# Clone the repository
git clone https://github.com/yourusername/unidocr.git
cd unidocr
```

Usage

  1. Indexing Documents

     ```bash
     python index.py --data_path /path/to/documents
     ```

  2. Querying

     ```bash
     python search.py --query "Your search text" --image_path /path/to/query/image
     ```

Dataset

UniDocR is tested on the M2KR benchmark, supporting:

  • Q2A (Text-to-Text Retrieval)
  • I2A (Image-to-Text Retrieval)
  • IQ2A (Image + Text to Text Retrieval)
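The three M2KR task settings map naturally onto which query modalities are present: text only is Q2A, image only is I2A, and both together is IQ2A. A small sketch of that dispatch, with hypothetical field names (the real query schema is not shown in the README):

```python
from typing import Optional

def build_query(text: Optional[str] = None,
                image_path: Optional[str] = None) -> dict:
    """Map the provided modalities to an M2KR task label.

    text only -> Q2A, image only -> I2A, both -> IQ2A.
    The returned payload fields are illustrative assumptions.
    """
    if text and image_path:
        task = "IQ2A"
    elif image_path:
        task = "I2A"
    elif text:
        task = "Q2A"
    else:
        raise ValueError("Provide text, image_path, or both")
    return {"task": task, "text": text, "image_path": image_path}
```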

Contributions

If you'd like to contribute, feel free to submit a pull request or open an issue.

Contact

For questions, reach out at tungduong0708@gmail.com.

Owner

  • Login: tungduong0708
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Faysse"
  given-names: "Manuel"
  email: "manuel.faysse@illuin.tech"
- family-names: "Sibille"
  given-names: "Hugues"
  email: "hugues.sibille@illuin.tech"
- family-names: "Wu"
  given-names: "Tony"
  email: "tony.wu@illuin.tech"
title: "Vision Document Retrieval (ViDoRe): Benchmark"
date-released: 2024-06-26
url: "https://github.com/illuin-tech/vidore-benchmark"
preferred-citation:
  type: article
  authors:
  - family-names: "Faysse"
    given-names: "Manuel"
  - family-names: "Sibille"
    given-names: "Hugues"
  - family-names: "Wu"
    given-names: "Tony"
  - family-names: "Omrani"
    given-names: "Bilel"
  - family-names: "Viaud"
    given-names: "Gautier"
  - family-names: "Hudelot"
    given-names: "Céline"
  - family-names: "Colombo"
    given-names: "Pierre"
  doi: "arXiv.2407.01449"
  month: 6
  title: "ColPali: Efficient Document Retrieval with Vision Language Models"
  year: 2024
  url: "https://arxiv.org/abs/2407.01449"

GitHub Events

Total
  • Member event: 1
  • Push event: 28
  • Create event: 3
Last Year
  • Member event: 1
  • Push event: 28
  • Create event: 3

Dependencies

pyproject.toml pypi
  • GPUtil *
  • numpy *
  • peft >=0.14.0,<0.15.0
  • pillow >=10.0.0
  • requests *
  • scipy *
  • torch >=2.2.0
  • transformers >=4.49.0,<4.50.0