DeBEIR

DeBEIR: A Python Package for Dense Bi-Encoder Information Retrieval - Published in JOSS (2023)

https://github.com/ayuei/debeir

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

dense-retrieval python transformers-ranking

Scientific Fields

Mathematics Computer Science - 84% confidence
Last synced: 4 months ago

Repository

Dense Bi-Encoder Retrieval for Rapid Experimentation

Basic Info
  • Host: GitHub
  • Owner: Ayuei
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 23.8 MB
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 0
  • Open Issues: 2
  • Releases: 3
Topics
dense-retrieval python transformers-ranking
Created over 4 years ago · Last pushed 8 months ago
Metadata Files
Readme Contributing License

README.md

DeBEIR

A Dense Bi-Encoder for Information Retrieval (DeBEIR) library for experimenting with and using neural models (with a particular emphasis on bi-encoder models) for end-to-end document ranking.
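To illustrate what "bi-encoder" ranking means, here is a minimal pure-Python sketch. It is not DeBEIR's API: the embeddings are hypothetical toy vectors standing in for encoder output, and `cosine` and `rank` are illustrative helpers. The key property is that queries and documents are embedded independently, so document vectors can be computed once and reused, and ranking reduces to a vector similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank(query_vec, doc_vecs):
    """Score every document against the query and return (doc_id, score)
    pairs sorted from most to least similar."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings standing in for encoder output. In a bi-encoder,
# documents are encoded independently of the query (e.g. offline, at index
# time); only the query needs to be encoded at search time.
documents = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.8, 0.3],
    "d3": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

for doc_id, score in rank(query, documents):
    print(doc_id, round(score, 3))
```

Because document vectors do not depend on the query, they can be stored in an index; this is what makes bi-encoders cheaper at query time than cross-encoders, which must jointly process every query-document pair.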

Requirements

  • Python >= 3.10
  • GPU hardware compatible with PyTorch is encouraged
  • Beyond this, consider the resource requirements of your index, such as storage and CPU usage
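The requirements above can be checked before installing. The following is a small sketch under two assumptions: that "GPU hardware compatible with PyTorch" means a CUDA device visible to `torch.cuda.is_available()`, and that PyTorch may not yet be installed (in which case the GPU check simply reports `False`). The helper name `check_environment` is hypothetical.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10)):
    """Return (python_ok, gpu_available).

    python_ok: the running interpreter meets the minimum version.
    gpu_available: PyTorch is installed and reports a usable CUDA device.
    """
    python_ok = sys.version_info >= min_python
    gpu_available = False
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so this check works without PyTorch

        gpu_available = bool(torch.cuda.is_available())
    return python_ok, gpu_available


if __name__ == "__main__":
    python_ok, gpu = check_environment()
    print(f"Python >= 3.10: {python_ok}")
    print(f"CUDA GPU visible to PyTorch: {gpu}")
```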

Setup and installation

It is recommended to set up a virtual environment and install from source:

```bash
python3 -m venv venv
source venv/bin/activate

pip install git+https://github.com/Ayuei/DeBEIR.git

# Sentence segmentation model install
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_md-0.5.0.tar.gz
```
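After installation, a quick way to confirm that both packages are importable is sketched below. It assumes the library imports as `debeir` (as in the usage example) and that the scispaCy model tarball installs a package named `en_core_sci_md`; the helper `missing_modules` is hypothetical and only uses the standard library.

```python
import importlib.util

# Module names assumed from the install commands: the library imports as
# `debeir`, and the scispaCy model installs as the package `en_core_sci_md`.
REQUIRED_MODULES = ("debeir", "en_core_sci_md")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("DeBEIR and the sentence segmentation model are installed.")
```

`find_spec` only locates the modules without importing them, so the check is fast and does not load any model weights.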

Usage

The library has an emphasis on reproducibility and experimentation. With this in mind, settings are stored in configuration files, from which the pipeline is built.

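```python3
import asyncio

from debeir.interfaces.pipeline import NIRPipeline


async def main():
    p = NIRPipeline.build_from_config(config_fp="./tests/config.toml",
                                      engine="elasticsearch",
                                      nir_config_fp="./tests/nir_config.toml")

    # The cosine offset ensures a non-negative score.
    return await p.run_pipeline(cosine_offset=1.0)


# run_pipeline is a coroutine, so drive it with an event loop from a script.
results = asyncio.run(main())
```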
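Why a cosine offset of 1.0 guarantees a non-negative score can be seen from the arithmetic: cosine similarity lies in [-1, 1], so adding 1.0 shifts it into [0, 2]. The sketch below assumes the offset is simply added to the cosine similarity; the README does not show how DeBEIR applies it internally, and `offset_score` is a hypothetical helper. This matches Elasticsearch's documented practice of adding 1.0 to `cosineSimilarity` in `script_score` queries, since Elasticsearch rejects negative scores, though whether that is DeBEIR's exact motivation is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def offset_score(u, v, cosine_offset=1.0):
    """Shift cosine similarity by `cosine_offset`.

    With an offset of 1.0 the result lies in [0, 2], so it is never negative.
    """
    return cosine(u, v) + cosine_offset


same = offset_score([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = offset_score([1.0, 0.0], [0.0, 1.0])  # unrelated
opposite = offset_score([1.0, 0.0], [-1.0, 0.0])   # worst case
print(same, orthogonal, opposite)  # prints: 2.0 1.0 0.0
```

Adding a constant does not change the relative order of documents, so the ranking is unaffected; only the absolute score range moves.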

See the examples/ directory for more use cases and a good place to get started.

Documentation

Rendered HTML API documentation for the library is available at https://ayuei.github.io/DeBEIR/debeir.html. It is built with the pdoc3 library and redeployed to GitHub Pages with every commit.

Statically compiled documentation (updated less frequently) can be found at docs/index.html in the top-level directory.

You can also build this documentation with the pdoc library by executing the following commands:

```
pip install -r requirements-docs.txt

pdoc -o docs/ src/debeir/
```

Development

If you wish to help with development of the library, first set up a development environment and verify that the test cases pass. This takes approximately 30 minutes on a mid-range system.

Requires: Docker, plus the packages in requirements-dev.txt (installed with pip).

```bash
virtualenv venv
source venv/bin/activate

pip install -r requirements-dev.txt

cd tests/
./build_test_env.sh

pytest .
```

A helper script for removing the development environment is provided in tests/cleanup.sh.

Community Guidelines

An Issue?

If you have any issues with the current library, please create an issue.

Contributing

For those wanting to contribute to the library, please see CONTRIBUTING.md and submit a pull request!

Support

If you wish to reach out to the author and maintainer of this library, please email vincent.nguyen@csiro.au

Owner

  • Login: Ayuei
  • Kind: user
  • Location: Australia
  • Company: Australian National University

JOSS Publication

DeBEIR: A Python Package for Dense Bi-Encoder Information Retrieval
Published
July 05, 2023
Volume 8, Issue 87, Page 5017
Authors
Vincent Nguyen ORCID
Australian National University, School of Computing, Commonwealth Scientific and Industrial Research Organisation, Data61
Sarvnaz Karimi ORCID
Commonwealth Scientific and Industrial Research Organisation, Data61
Zhenchang Xing ORCID
Australian National University, School of Computing, Commonwealth Scientific and Industrial Research Organisation, Data61
Editor
Arfon Smith ORCID
Tags
information retrieval dense retrieval bi-encoder transformers pytorch python deep learning neural networks machine learning natural language processing

GitHub Events

Total
  • Push event: 3
  • Pull request event: 4
  • Create event: 1
Last Year
  • Push event: 3
  • Pull request event: 4
  • Create event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 158
  • Total Committers: 2
  • Avg Commits per committer: 79.0
  • Development Distribution Score (DDS): 0.006
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Ayuei s****t@h****m 157
ngu143 n****3@c****u 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 13
  • Total pull requests: 5
  • Average time to close issues: 24 days
  • Average time to close pull requests: less than a minute
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 1.62
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • KonradHoeffner (13)
Pull Request Authors
  • Ayuei (6)
Top Labels
Issue Labels
enhancement (1)

Dependencies

.github/workflows/docs.yml actions
  • actions/checkout v3 composite
  • actions/deploy-pages v1 composite
  • actions/setup-python v4 composite
  • actions/upload-pages-artifact v1 composite
tests/Dockerfile docker
  • golang latest build
docs/requirements-docs.txt pypi
  • datasets ==2.4.0
  • dill *
  • elasticsearch ==8.3.1
  • jupyterlab ==3.4.7
  • loguru *
  • numpy *
  • optuna ==3.0.2
  • pandas *
  • plac *
  • requests *
  • scikit-learn ==1.1.2
  • scipy *
  • scispacy *
  • sentence-transformers ==2.2.2
  • shutup *
  • spacy *
  • toml *
  • torch ==1.12.0
  • torch_optimizer *
  • tqdm *
  • transformers ==4.22.0
  • trectools *
  • wandb ==0.13.3
requirements-dev.txt pypi
  • pytest * development
  • pytest-asyncio * development
requirements-docs.txt pypi
  • datasets ==2.4.0
  • dill *
  • elasticsearch ==8.3.1
  • jupyterlab ==3.4.7
  • loguru *
  • numpy *
  • optuna ==3.0.2
  • pandas *
  • plac *
  • requests *
  • scikit-learn ==1.1.2
  • scipy *
  • scispacy *
  • sentence-transformers ==2.2.2
  • shutup *
  • spacy *
  • toml *
  • torch ==1.12.0
  • torch_optimizer *
  • tqdm *
  • transformers ==4.22.0
  • trectools *
  • wandb ==0.13.3
requirements.txt pypi
  • datasets ==2.4.0
  • dill *
  • elasticsearch ==8.3.1
  • joblib *
  • jupyterlab ==3.4.7
  • loguru *
  • numpy *
  • optuna ==3.0.2
  • pandas *
  • plac *
  • requests *
  • scikit-learn ==1.1.2
  • scipy *
  • scispacy *
  • sentence-transformers ==2.2.2
  • shutup *
  • spacy *
  • toml *
  • torch *
  • torch_optimizer *
  • tqdm *
  • transformers ==4.22.0
  • trectools *
  • wandb ==0.13.3
setup.py pypi
  • elasticsearch *
  • torch ==1.12.1