Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: tungduong0708
- License: MIT
- Language: Python
- Default Branch: main
- Size: 193 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
UniDocR: Universal Multimodal Document Retrieval
Overview
UniDocR is a retrieval system designed to enhance document search capabilities by incorporating both text and image-based queries. It extends traditional text-based retrieval methods by integrating Vision-Language Models (VLMs) and Optical Character Recognition (OCR) to improve the understanding and retrieval of multimodal documents.
Features
- Multimodal Query Support: Supports text-only, image-only, and combined text-image queries.
- Vision-Language Integration: Utilizes Qwen2-VL for enhanced document understanding.
- Late-Interaction Mechanism: Employs ColBERT and ColPali for efficient and precise retrieval.
- Optimized for M2KR Benchmark: Evaluated on Q2A, I2A, and IQ2A tasks.
Technical Details
- Retrieval Architecture: Uses a hybrid approach combining neural retrieval (ColBERT) and vision-language processing.
- OCR Integration: Extracts textual information from images for improved search accuracy.
- Neural Ranking: Enhances document ranking through multimodal feature fusion.
- Frameworks Used: ColBERT, ColPali, Vision Transformers (ViTs), and Qwen2-VL.
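The late-interaction mechanism mentioned above can be sketched with a minimal MaxSim scorer. This is a generic ColBERT/ColPali-style illustration, not code from the repository; it assumes query and document token (or image-patch) embeddings are already L2-normalized:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) relevance score.

    query_vecs: (num_query_tokens, dim) L2-normalized token embeddings
    doc_vecs:   (num_doc_tokens, dim)   L2-normalized token/patch embeddings
    Each query token is matched against its most similar document token
    (or image patch, in the ColPali setting), and the maxima are summed.
    """
    sim = query_vecs @ doc_vecs.T          # (Q, D) cosine similarities
    return float(sim.max(axis=1).sum())    # best match per query token, summed

# Toy example: 2 query tokens, 3 document tokens, embedding dim 2.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]])
print(maxsim_score(q, d))  # → 2.0 (each query token finds an exact match)
```

Because the per-document score is a sum over independent per-token maxima, documents can be scored in parallel and pruned cheaply, which is what makes late interaction efficient at scale.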
Installation
To set up the UniDocR environment, follow these steps:
```bash
# Clone the repository
git clone https://github.com/yourusername/unidocr.git
cd unidocr
```
Usage
- Indexing documents:
```bash
python index.py --data_path /path/to/documents
```
- Querying:
```bash
python search.py --query "Your search text" --image_path /path/to/query/image
```
Dataset
UniDocR is tested on the M2KR benchmark, supporting:
- Q2A (Text-to-Text Retrieval)
- I2A (Image-to-Text Retrieval)
- IQ2A (Image + Text to Text Retrieval)
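Retrieval on tasks like these is typically scored with rank-based metrics such as recall@k. A minimal sketch for a single query (illustrative only, not the benchmark's official scorer):

```python
def recall_at_k(ranked_doc_ids: list[str], relevant_id: str, k: int) -> float:
    """1.0 if the relevant document appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_doc_ids[:k] else 0.0

# One query; the system returned documents in this order:
ranking = ["doc7", "doc3", "doc9", "doc1"]
print(recall_at_k(ranking, "doc3", 1))  # → 0.0 (not the top hit)
print(recall_at_k(ranking, "doc3", 5))  # → 1.0 (within the top 5)
```

Averaging this value over all benchmark queries gives the recall@k usually reported for Q2A, I2A, and IQ2A.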
Contributions
If you'd like to contribute, feel free to submit a pull request or open an issue.
Contact
For questions, reach out at tungduong0708@gmail.com.
Owner
- Login: tungduong0708
- Kind: user
- Repositories: 1
- Profile: https://github.com/tungduong0708
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Faysse"
    given-names: "Manuel"
    email: "manuel.faysse@illuin.tech"
  - family-names: "Sibille"
    given-names: "Hugues"
    email: "hugues.sibille@illuin.tech"
  - family-names: "Wu"
    given-names: "Tony"
    email: "tony.wu@illuin.tech"
title: "Vision Document Retrieval (ViDoRe): Benchmark"
date-released: 2024-06-26
url: "https://github.com/illuin-tech/vidore-benchmark"
preferred-citation:
  type: article
  authors:
    - family-names: "Faysse"
      given-names: "Manuel"
    - family-names: "Sibille"
      given-names: "Hugues"
    - family-names: "Wu"
      given-names: "Tony"
    - family-names: "Omrani"
      given-names: "Bilel"
    - family-names: "Viaud"
      given-names: "Gautier"
    - family-names: "Hudelot"
      given-names: "Céline"
    - family-names: "Colombo"
      given-names: "Pierre"
  doi: "arXiv.2407.01449"
  month: 6
  title: "ColPali: Efficient Document Retrieval with Vision Language Models"
  year: 2024
  url: "https://arxiv.org/abs/2407.01449"
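For convenience, the preferred-citation block above corresponds roughly to the following BibTeX entry. This is a hand-written translation of the fields listed in the CITATION.cff; the citation key is made up for illustration and does not come from any file in the repository:

```python
# Assemble a BibTeX entry from the preferred-citation fields above.
authors = ["Manuel Faysse", "Hugues Sibille", "Tony Wu", "Bilel Omrani",
           "Gautier Viaud", "Céline Hudelot", "Pierre Colombo"]
entry = "\n".join([
    "@article{faysse2024colpali,",  # hypothetical citation key
    "  title  = {ColPali: Efficient Document Retrieval"
    " with Vision Language Models},",
    "  author = {" + " and ".join(authors) + "},",
    "  year   = {2024},",
    "  url    = {https://arxiv.org/abs/2407.01449},",
    "}",
])
print(entry)
```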
GitHub Events
Total
- Member event: 1
- Push event: 28
- Create event: 3
Last Year
- Member event: 1
- Push event: 28
- Create event: 3
Dependencies
- GPUtil *
- numpy *
- peft >=0.14.0,<0.15.0
- pillow >=10.0.0
- requests *
- scipy *
- torch >=2.2.0
- transformers >=4.49.0,<4.50.0