Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Keywords from Contributors

mesh
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bentoml
  • Language: Python
  • Default Branch: main
  • Size: 837 KB
Statistics
  • Stars: 20
  • Watchers: 6
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 12 months ago
Metadata Files
Readme Citation

README.md

Serving ColPali with BentoML

ColPali leverages VLMs to construct efficient multi-vector embeddings in the visual space for document retrieval. By feeding the ViT output patches from PaliGemma-3B to a linear projection, ColPali creates a multi-vector representation of documents. The model is trained to maximize the similarity between these document embeddings and the query embeddings, following the ColBERT method.

Using ColPali replaces potentially complex and brittle layout recognition and OCR pipelines with a single model that can take into account both the textual and visual content (layout, charts, ...) of a document.

ColPali Architecture

This is a BentoML example project, demonstrating how to build an inference API server for ColPali. See here for a full list of BentoML example projects.

> [!NOTE]
> The recommended ColPali checkpoint for this repository is `vidore/colpali-v1.2`.

For more information on ColPali, please refer to:

Install dependencies

```bash
git clone https://github.com/bentoml/BentoColPali.git
cd BentoColPali

# Supports Python 3.9+
pip install -r requirements.txt
```

Build the model

Before running the BentoML service, you need to download the ColPali model checkpoint and build the model using the following command:

```bash
python bentocolpali/models.py --model-name vidore/colpali-v1.2 --hf-token <YOUR_TOKEN>
```

> [!IMPORTANT]
> Because ColPali uses PaliGemma (Gemma-licensed) as its VLM backbone, the account associated with the Hugging Face token you provide must have accepted the terms and conditions of `google/paligemma-3b-mix-448`.

Run the BentoML Service

We have defined a BentoML Service in `service.py`. Run `bentoml serve` in your project directory to start the Service.

```bash
bentoml serve .
```

The Service is accessible at http://localhost:3000. You can interact with it through the Swagger UI or in the other ways detailed in the Examples section.

API Routes

| Route | Input | Output | Description |
| --- | --- | --- | --- |
| `/embed_images` | - `items`: List of `ImagePayload` | Multi-vector embeddings | Generates image embeddings with shape `(batch_size, sequence_length, embedding_dim)`. |
| `/embed_queries` | - `items`: List of strings | Multi-vector embeddings | Generates query embeddings with shape `(batch_size, sequence_length, embedding_dim)`. |
| `/score_embeddings` | - `image_embeddings`: List of 2D-arrays<br>- `query_embeddings`: List of 2D-arrays | Scores | Computes late-interaction/MaxSim scores between pre-computed embeddings. Returns scores with shape `(num_queries, num_images)`. |
| `/score` | - `images`: List of `ImagePayload`<br>- `queries`: List of strings | Scores | One-shot computation of similarity scores between images and queries, i.e. run the 3 routes above in the right order.<br>Returns scores with shape `(num_queries, num_images)`. |
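To make the late-interaction/MaxSim scoring concrete, here is a minimal pure-Python sketch. It is not the Service's implementation: the function names (`dot`, `maxsim_score`, `score_matrix`) and the toy 2-dimensional embeddings are assumptions made for illustration. For each query token it takes the highest dot product over all image patches, then sums those maxima over the query tokens.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_emb: Sequence[Vector], image_emb: Sequence[Vector]) -> float:
    """For each query token, keep its best-matching image patch
    (max dot product), then sum these maxima over all query tokens."""
    return sum(max(dot(q, p) for p in image_emb) for q in query_emb)


def score_matrix(
    query_embs: Sequence[Sequence[Vector]],
    image_embs: Sequence[Sequence[Vector]],
) -> List[List[float]]:
    """Return scores with shape (num_queries, num_images)."""
    return [[maxsim_score(q, d) for d in image_embs] for q in query_embs]


# Toy embeddings for illustration only (hypothetical values).
query_embs = [
    [[1.0, 0.0], [0.0, 1.0]],  # query 0: two token vectors
    [[1.0, 1.0]],              # query 1: one token vector
]
image_embs = [
    [[1.0, 0.0], [0.5, 0.5]],              # image 0: two patch vectors
    [[0.0, 2.0], [1.0, 0.0], [0.0, 0.0]],  # image 1: three patch vectors
]

scores = score_matrix(query_embs, image_embs)
print(scores)  # [[1.5, 3.0], [1.0, 2.0]]
```

Each row corresponds to one query and each column to one image, which matches the `(num_queries, num_images)` shape described in the table.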

An ImagePayload is a JSON object with a single field url that contains a base64-encoded image. The url field should be formatted like this:

```json
{
  "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEU..."
}
```
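If you are not using the repository's own helpers, a payload in this format can be built with the standard library alone. The sketch below is an assumption-laden illustration: `to_data_url` and `make_image_payload` are hypothetical names, not functions from this project, and the input is only the 8-byte PNG signature rather than a real image.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def make_image_payload(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a JSON-serializable object with the single `url` field."""
    return {"url": to_data_url(image_bytes, mime_type)}


# Stand-in input: the 8-byte PNG file signature, not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"
payload = make_image_payload(png_signature)
print(payload["url"])
```

In practice you would read the bytes from a file, for example with `open("page1.png", "rb").read()`, and pass the matching MIME type.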

Examples

With a Python client

```python
import bentoml
from PIL import Image

from bentocolpali.interfaces import ImagePayload
from bentocolpali.utils import convert_pil_to_b64_image

# Load each page image and wrap it in an ImagePayload.
image_filepaths = ["page1.jpg", "page2.jpg"]
image_payloads = []
for filepath in image_filepaths:
    image = Image.open(filepath)
    image_payloads.append(ImagePayload(url=convert_pil_to_b64_image(image)))

queries = [
    "How does the positional encoding work?",
    "How does the scaled dot attention product work?",
]

with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    # Embed images and queries separately, then score the pre-computed embeddings.
    image_embeddings = client.embed_images(items=image_payloads)
    query_embeddings = client.embed_queries(items=queries)

    scores = client.score_embeddings(
        image_embeddings=image_embeddings,
        query_embeddings=query_embeddings,
    )

print(scores)
```

You should get a response similar to:

```json
[
  [15.25727272, 6.47964382],
  [11.67781448, 16.54862022]
]
```
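Since the scores have shape `(num_queries, num_images)`, each row can be sorted to find the best-matching pages for a query. The following sketch uses the sample response above; `rank_images` is a hypothetical helper, not part of this repository.

```python
from typing import List, Sequence, Tuple


def rank_images(
    scores: Sequence[Sequence[float]], top_k: int = 1
) -> List[List[Tuple[int, float]]]:
    """For each query (row), return the top_k (image_index, score) pairs,
    sorted from highest to lowest score."""
    ranked = []
    for row in scores:
        order = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)
        ranked.append(order[:top_k])
    return ranked


# Sample response from the example above.
scores = [
    [15.25727272, 6.47964382],
    [11.67781448, 16.54862022],
]

for query_index, best in enumerate(rank_images(scores)):
    image_index, score = best[0]
    print(f"query {query_index}: best image is {image_index} (score {score})")
```

With these sample values, the first query matches the first image best and the second query matches the second image best.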

With CURL

Note: the base64 strings in the `url` fields are dummy examples.

```bash
curl -X POST -H "content-type: application/json" --data '{
  "queries": [
    "How does the positional encoding work?",
    "How does the scaled dot attention product work?"
  ],
  "images": [
    { "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEU..." },
    { "url": "data:image/png;base64,iVBORw0KGFEWAAAANSUhU..." }
  ]
}' http://localhost:3000/score
```
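The same request can be assembled in Python with only the standard library. This is a sketch under stated assumptions: `build_score_request` is a hypothetical helper, the image strings are the same dummy placeholders as in the curl command, and the request is only built here, not sent.

```python
import json
import urllib.request
from typing import Sequence


def build_score_request(
    queries: Sequence[str],
    image_urls: Sequence[str],
    base_url: str = "http://localhost:3000",
) -> urllib.request.Request:
    """Build a POST request for the /score route with a JSON body."""
    body = {
        "queries": list(queries),
        "images": [{"url": url} for url in image_urls],
    }
    return urllib.request.Request(
        url=f"{base_url}/score",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_score_request(
    queries=[
        "How does the positional encoding work?",
        "How does the scaled dot attention product work?",
    ],
    # Dummy placeholders, as in the curl example; replace with real data URLs.
    image_urls=[
        "data:image/png;base64,iVBORw0KGgoAAAANSUhEU...",
        "data:image/png;base64,iVBORw0KGFEWAAAANSUhU...",
    ],
)

# Sending the request requires a running Service and valid images:
# with urllib.request.urlopen(request) as response:
#     scores = json.loads(response.read())
```

Separating request construction from sending keeps the payload easy to inspect before it reaches the Service.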

Deploy to BentoCloud

After the Service is ready, you can deploy the application to BentoCloud for better management and scalability. Sign up if you haven't got a BentoCloud account.

Make sure you have logged in to BentoCloud, then run the following command to deploy it.

```bash
bentoml deploy bento
```

Once the application is up and running on BentoCloud, you can access it via the exposed URL.

Note: For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.

Citation

ColPali: Efficient Document Retrieval with Vision Language Models

Authors: Manuel Faysse*, Hugues Sibille*, Tony Wu*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo (* denotes equal contribution)

```latex
@misc{faysse2024colpaliefficientdocumentretrieval,
  title={ColPali: Efficient Document Retrieval with Vision Language Models},
  author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
  year={2024},
  eprint={2407.01449},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2407.01449},
}
```

Owner

  • Name: BentoML
  • Login: bentoml
  • Kind: organization
  • Location: San Francisco

The most flexible way to serve AI models in production

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Faysse"
  given-names: "Manuel"
  email: "manuel.faysse@illuin.tech"
- family-names: "Sibille"
  given-names: "Hugues"
  email: "hugues.sibille@illuin.tech"
- family-names: "Wu"
  given-names: "Tony"
  email: "tony.wu@illuin.tech"
title: "Vision Document Retrieval (ViDoRe): Benchmark"
date-released: 2024-06-26
url: "https://github.com/illuin-tech/vidore-benchmark"
preferred-citation:
  type: article
  authors:
  - family-names: "Faysse"
    given-names: "Manuel"
  - family-names: "Sibille"
    given-names: "Hugues"
  - family-names: "Wu"
    given-names: "Tony"
  - family-names: "Omrani"
    given-names: "Bilel"
  - family-names: "Viaud"
    given-names: "Gautier"
  - family-names: "Hudelot"
    given-names: "Céline"
  - family-names: "Colombo"
    given-names: "Pierre"
  doi: "arXiv.2407.01449"
  month: 6
  title: "ColPali: Efficient Document Retrieval with Vision Language Models"
  year: 2024
  url: "https://arxiv.org/abs/2407.01449"

GitHub Events

Total
  • Watch event: 16
  • Delete event: 3
  • Push event: 6
  • Pull request review event: 1
  • Pull request review comment event: 1
  • Pull request event: 3
  • Create event: 2
Last Year
  • Watch event: 16
  • Delete event: 3
  • Push event: 6
  • Pull request review event: 1
  • Pull request review comment event: 1
  • Pull request event: 3
  • Create event: 2

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 7
  • Total Committers: 4
  • Avg Commits per committer: 1.75
  • Development Distribution Score (DDS): 0.429
Past Year
  • Commits: 7
  • Committers: 4
  • Avg Commits per committer: 1.75
  • Development Distribution Score (DDS): 0.429
Top Committers
Name Email Commits
Tony Wu 2****1 4
dependabot[bot] 4****] 1
Sherlock Xu 6****3 1
Sean Sheng s****g@g****m 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 days
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.2
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 days
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.2
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
Pull Request Authors
  • tonywu71 (5)
  • dependabot[bot] (1)
  • bojiang (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (1) python (1)

Dependencies

requirements-dev.txt pypi
  • pytest * development
  • pytest-asyncio * development
  • ruff * development
requirements.txt pypi
  • bentoml >=1.3,<1.4
  • colpali-engine >=0.3.0,<0.4.0
  • pydantic >=2.8,<3