bentocolpali
Repository
Basic Info
- Host: GitHub
- Owner: bentoml
- Language: Python
- Default Branch: main
- Size: 837 KB
Statistics
- Stars: 20
- Watchers: 6
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Serving ColPali with BentoML
ColPali leverages VLMs to construct efficient multi-vector embeddings in the visual space for document retrieval. By feeding the ViT output patches from PaliGemma-3B to a linear projection, ColPali creates a multi-vector representation of documents. The model is trained to maximize the similarity between these document embeddings and the query embeddings, following the ColBERT method.
Using ColPali removes the need for potentially complex and brittle layout recognition and OCR pipelines with a single model that can take into account both the textual and visual content (layout, charts, ...) of a document.

This is a BentoML example project that demonstrates how to build an inference API server for ColPali. See here for a full list of BentoML example projects.
[!NOTE] The recommended ColPali checkpoint for this repository is vidore/colpali-v1.2.
For more information on ColPali, please refer to:
- The original ColPali arXiv paper: ColPali: Efficient Document Retrieval with Vision Language Models 📝
- The official ColPali blog post: HuggingFace Blog 🤗
- The code/package for ColPali: colpali-engine. 🧑🏻💻
Install dependencies
```bash
git clone https://github.com/bentoml/BentoColPali.git
cd BentoColPali

# Supports Python 3.9+
pip install -r requirements.txt
```
Build the model
Before running the BentoML service, you need to download the ColPali model checkpoint and build the model using the following command:
bash
python bentocolpali/models.py --model-name vidore/colpali-v1.2 --hf-token <YOUR_TOKEN>
[!IMPORTANT] Because ColPali uses PaliGemma (Gemma-licensed) as its VLM backbone, the account associated with the HuggingFace token you pass must have accepted the terms and conditions of google/paligemma-3b-mix-448.
Run the BentoML Service
We have defined a BentoML Service in service.py. Run bentoml serve in your project directory to start the Service.
bash
bentoml serve .
The Service is accessible at http://localhost:3000. You can interact with it using the Swagger UI or in the other ways detailed in the Examples section.
API Routes
| Route | Input | Output | Description |
| ------------------- | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ |
| `/embed_images` | - `items`: List of `ImagePayload` | Multi-vector embeddings | Generates image embeddings with shape `(batch_size, sequence_length, embedding_dim)`. |
| `/embed_queries` | - `items`: List of strings | Multi-vector embeddings | Generates query embeddings with shape `(batch_size, sequence_length, embedding_dim)`. |
| `/score_embeddings` | - `image_embeddings`: List of 2D-arrays<br>- `query_embeddings`: List of 2D-arrays | Scores | Computes late-interaction/MaxSim scores between pre-computed embeddings. Returns scores with shape `(num_queries, num_images)`. |
| `/score` | - `images`: List of `ImagePayload`<br>- `queries`: List of strings | Scores | One-shot computation of similarity scores between images and queries, i.e. run the 3 routes above in the right order.<br>Returns scores with shape `(num_queries, num_images)`. |
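To make the late-interaction/MaxSim scoring mentioned in the table more concrete, here is a minimal pure-Python sketch of the general ColBERT-style computation. It is an illustration only, not the code the Service runs: the helper names `dot`, `maxsim_score` and `score_matrix` are hypothetical, and the toy embeddings are made up.

```python
from typing import List, Sequence

# One multi-vector embedding: a (sequence_length, embedding_dim) list of vectors.
Matrix = Sequence[Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product between two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_emb: Matrix, image_emb: Matrix) -> float:
    """For each query vector, keep its best match among the image vectors, then sum."""
    return sum(max(dot(q, p) for p in image_emb) for q in query_emb)


def score_matrix(query_embs: Sequence[Matrix], image_embs: Sequence[Matrix]) -> List[List[float]]:
    """Return scores laid out as (num_queries, num_images)."""
    return [[maxsim_score(q, d) for d in image_embs] for q in query_embs]


# Toy data (hypothetical): 1 query with 2 vectors, 2 images with 2 and 3 vectors.
query_embs = [
    [[1.0, 0.0], [0.0, 1.0]],
]
image_embs = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 2.0], [1.0, 1.0], [3.0, 0.0]],
]

scores = score_matrix(query_embs, image_embs)
print(scores)  # [[1.5, 5.0]]
```

Note that the images may have different numbers of vectors: MaxSim only needs the embedding dimension to match between queries and images.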
An ImagePayload is a JSON object with a single field url that contains a base64-encoded image. The url field should be formatted like this:
json
{
"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEU..."
}
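If you are not using the repository's own helpers, the `url` value can be built with the standard library alone. The sketch below assumes you already have the raw bytes of an image file; `to_image_payload` is a hypothetical name used for illustration and is not part of this repository.

```python
import base64


def to_image_payload(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes in a data URL, matching the ImagePayload JSON shape."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"url": f"data:{mime_type};base64,{encoded}"}


# The 8-byte PNG signature stands in for real file contents here.
# In practice you would read the bytes from disk, e.g. open(path, "rb").read().
png_signature = b"\x89PNG\r\n\x1a\n"
payload = to_image_payload(png_signature)
print(payload["url"])  # data:image/png;base64,iVBORw0KGgo=
```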
Examples
With a Python client
```python
import bentoml
from PIL import Image

from bentocolpali.interfaces import ImagePayload
from bentocolpali.utils import convert_pil_to_b64_image

image_filepaths = ["page1.jpg", "page2.jpg"]
image_payloads = []
for filepath in image_filepaths:
    image = Image.open(filepath)
    image_payloads.append(ImagePayload(url=convert_pil_to_b64_image(image)))

queries = [
    "How does the positional encoding work?",
    "How does the scaled dot attention product work?",
]

with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    image_embeddings = client.embed_images(items=image_payloads)
    query_embeddings = client.embed_queries(items=queries)

    scores = client.score_embeddings(
        image_embeddings=image_embeddings,
        query_embeddings=query_embeddings,
    )

print(scores)
```
You should get a response similar to:
json
[
[15.25727272, 6.47964382],
[11.67781448, 16.54862022]
]
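Since the scores have shape (num_queries, num_images), each row corresponds to one query and each column to one image. A minimal sketch of turning the response above into a "best page per query" lookup, assuming that layout (the variable names are illustrative):

```python
# Example response copied from above: rows are queries, columns are images.
scores = [
    [15.25727272, 6.47964382],
    [11.67781448, 16.54862022],
]
queries = [
    "How does the positional encoding work?",
    "How does the scaled dot attention product work?",
]

# For every query, pick the index of the image with the highest score.
best_pages = [max(range(len(row)), key=row.__getitem__) for row in scores]

for query, page_index in zip(queries, best_pages):
    print(f"{query!r} -> image #{page_index}")
```

With the sample values, the first query matches the first image and the second query matches the second image.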
With CURL
Note: the base64 strings in the url fields are dummy examples.
bash
curl -X POST -H "content-type: application/json" --data '{
"queries": [
"How does the positional encoding work?",
"How does the scaled dot attention product work?"
],
"images": [
{
"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEU..."
},
{
"url": "data:image/png;base64,iVBORw0KGFEWAAAANSUhU..."
}
]
}' http://localhost:3000/score
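The same request can be prepared from Python with only the standard library. This is a sketch under the assumption that the Service runs at http://localhost:3000 as described above; `build_score_request` is a hypothetical helper, and the actual network call is left commented out because it needs a running Service.

```python
import json
import urllib.request


def build_score_request(queries, image_urls, base_url="http://localhost:3000"):
    """Prepare a POST request for the /score route with the JSON body shown in the curl example."""
    body = {
        "queries": list(queries),
        "images": [{"url": url} for url in image_urls],
    }
    return urllib.request.Request(
        url=f"{base_url}/score",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_score_request(
    queries=["How does the positional encoding work?"],
    # Dummy data URL, as in the curl example.
    image_urls=["data:image/png;base64,iVBORw0KGgoAAAANSUhEU..."],
)

# Sending requires a running Service:
# with urllib.request.urlopen(request) as response:
#     scores = json.loads(response.read())
```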
Deploy to BentoCloud
After the Service is ready, you can deploy the application to BentoCloud for better management and scalability. Sign up if you haven't got a BentoCloud account.
Make sure you have logged in to BentoCloud, then run the following command to deploy it.
bash
bentoml deploy bento
Once the application is up and running on BentoCloud, you can access it via the exposed URL.
Note: For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.
Citation
ColPali: Efficient Document Retrieval with Vision Language Models
Authors: Manuel Faysse*, Hugues Sibille*, Tony Wu*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo (* denotes equal contribution)
latex
@misc{faysse2024colpaliefficientdocumentretrieval,
title={ColPali: Efficient Document Retrieval with Vision Language Models},
author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
year={2024},
eprint={2407.01449},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2407.01449},
}
Owner
- Name: BentoML
- Login: bentoml
- Kind: organization
- Location: San Francisco
- Website: https://bentoml.com
- Twitter: bentomlai
- Repositories: 76
- Profile: https://github.com/bentoml
The most flexible way to serve AI models in production
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Faysse"
    given-names: "Manuel"
    email: "manuel.faysse@illuin.tech"
  - family-names: "Sibille"
    given-names: "Hugues"
    email: "hugues.sibille@illuin.tech"
  - family-names: "Wu"
    given-names: "Tony"
    email: "tony.wu@illuin.tech"
title: "Vision Document Retrieval (ViDoRe): Benchmark"
date-released: 2024-06-26
url: "https://github.com/illuin-tech/vidore-benchmark"
preferred-citation:
  type: article
  authors:
    - family-names: "Faysse"
      given-names: "Manuel"
    - family-names: "Sibille"
      given-names: "Hugues"
    - family-names: "Wu"
      given-names: "Tony"
    - family-names: "Omrani"
      given-names: "Bilel"
    - family-names: "Viaud"
      given-names: "Gautier"
    - family-names: "Hudelot"
      given-names: "Céline"
    - family-names: "Colombo"
      given-names: "Pierre"
  doi: "arXiv.2407.01449"
  month: 6
  title: "ColPali: Efficient Document Retrieval with Vision Language Models"
  year: 2024
  url: "https://arxiv.org/abs/2407.01449"
GitHub Events
Total
- Watch event: 16
- Delete event: 3
- Push event: 6
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 3
- Create event: 2
Last Year
- Watch event: 16
- Delete event: 3
- Push event: 6
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 3
- Create event: 2
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Tony Wu | 2****1 | 4 |
| dependabot[bot] | 4****] | 1 |
| Sherlock Xu | 6****3 | 1 |
| Sean Sheng | s****g@g****m | 1 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Total issue authors: 0
- Total pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.2
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 0
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Issue authors: 0
- Pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.2
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
Pull Request Authors
- tonywu71 (5)
- dependabot[bot] (1)
- bojiang (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- pytest * development
- pytest-asyncio * development
- ruff * development
- bentoml >=1.3,<1.4
- colpali-engine >=0.3.0,<0.4.0
- pydantic >=2.8,<3