DeBEIR
DeBEIR: A Python Package for Dense Bi-Encoder Information Retrieval - Published in JOSS (2023)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in JOSS metadata
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
Dense Bi-Encoder Retrieval for Rapid Experimentation
Statistics
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 2
- Releases: 3
Metadata Files
README.md
DeBEIR
DeBEIR (Dense Bi-Encoder Information Retrieval) is a library for experimenting with and using neural models, with a particular emphasis on bi-encoder models, for end-to-end ranking of documents.
Requirements
- Python >= 3.10
- A GPU compatible with PyTorch is recommended
- Also consider the requirements of your index, such as storage and CPU usage
Setup and installation
It is recommended to set up a virtual environment and install from source:
```bash
python3 -m venv venv
source venv/bin/activate

pip install git+https://github.com/Ayuei/DeBEIR.git

# Sentence Segmentation Model install
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_md-0.5.0.tar.gz
```
Usage
The library has an emphasis on reproducibility and experimentation. With this in mind, settings are placed into configuration files to be used to build the pipeline.
```python3
from debeir.interfaces.pipeline import NIRPipeline

p = NIRPipeline.build_from_config(config_fp="./tests/config.toml",
                                  engine="elasticsearch",
                                  nir_config_fp="./tests/nir_config.toml")

# The cosine offset ensures a non-negative score.
results = await p.run_pipeline(cosine_offset=1.0)
```
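To illustrate why a cosine offset of 1.0 yields non-negative scores: cosine similarity lies in [-1, 1], so adding 1.0 shifts it into [0, 2]. This matters for engines that reject negative scores, as Elasticsearch's `script_score` query does. The following is a minimal, library-independent sketch of that arithmetic. The helper names `cosine_similarity` and `offset_score` are illustrative assumptions and are not taken from DeBEIR, whose internal scoring may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def offset_score(a, b, cosine_offset=1.0):
    """Hypothetical helper: shift cosine similarity by a constant offset."""
    return cosine_similarity(a, b) + cosine_offset


query = [1.0, 0.0]
opposite = [-1.0, 0.0]   # points the other way: similarity -1
orthogonal = [0.0, 3.0]  # unrelated direction: similarity 0
aligned = [2.0, 0.0]     # same direction: similarity 1

for name, doc in [("opposite", opposite), ("orthogonal", orthogonal), ("aligned", aligned)]:
    print(name, offset_score(query, doc))
```

Because the offset is a constant added to every score, it changes the score range without changing the relative order of the documents.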
See examples/ for more use cases and where to get started.
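The bare `await` in the usage snippet only works where an event loop is already running, for example in a Jupyter notebook. In a plain script, one common approach is to wrap the call in a coroutine and hand it to `asyncio.run`. The sketch below shows the pattern with `StandInPipeline`, a hypothetical placeholder class that is not part of DeBEIR; in real use you would replace it with the pipeline object built from your configuration files.

```python
import asyncio


class StandInPipeline:
    """Hypothetical stand-in for a pipeline object; not a DeBEIR class."""

    async def run_pipeline(self, cosine_offset=0.0):
        # Yield to the event loop once, as a real network call would.
        await asyncio.sleep(0)
        return {"cosine_offset": cosine_offset, "hits": ["doc-1", "doc-2"]}


async def main():
    pipeline = StandInPipeline()
    # Mirrors the awaited call in the usage snippet.
    return await pipeline.run_pipeline(cosine_offset=1.0)


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
results = asyncio.run(main())
print(results["hits"])
```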
Documentation
API documentation for the library, rendered as HTML, is available at https://ayuei.github.io/DeBEIR/debeir.html. It is built with the pdoc3 library and rebuilt on every commit via gh-pages.
Statically compiled documentation, which is updated less frequently, can be found at docs/index.html in the top-level directory.
You can also build this documentation with the pdoc library by executing the following commands:
```
pip install -r requirements-docs.txt

pdoc -o docs/ src/debeir/
```
Development
If you want to help with development of the library, first set up a development environment and verify that the test cases pass. This will take approximately 30 minutes to complete on a mid-range system.
Requires Docker and a pip installation of the packages in requirements-dev.txt.
```bash
virtualenv venv
source venv/bin/activate
pip install -r requirements-dev.txt

cd tests/
./build_test_env.sh

pytest .
```
A helper script for removing the development environment is provided in tests/cleanup.sh.
Community Guidelines
An Issue?
If you have any issue with the current library, please create an issue.
Contributing
For those wanting to contribute to the library, please see CONTRIBUTING.md and submit a pull request!
Support
If you wish to reach out to the author and maintainer of this library, please email vincent.nguyen@csiro.au
Owner
- Login: Ayuei
- Kind: user
- Location: Australia
- Company: Australian National University
- Repositories: 5
- Profile: https://github.com/Ayuei
JOSS Publication
DeBEIR: A Python Package for Dense Bi-Encoder Information Retrieval
Authors
Tags
information retrieval, dense retrieval, bi-encoder, transformers, pytorch, python, deep learning, neural networks, machine learning, natural language processing
GitHub Events
Total
- Push event: 3
- Pull request event: 4
- Create event: 1
Last Year
- Push event: 3
- Pull request event: 4
- Create event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Ayuei | s****t@h****m | 157 |
| ngu143 | n****3@c****u | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 13
- Total pull requests: 5
- Average time to close issues: 24 days
- Average time to close pull requests: less than a minute
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 1.62
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- KonradHoeffner (13)
Pull Request Authors
- Ayuei (6)
Dependencies
- actions/checkout v3 composite
- actions/deploy-pages v1 composite
- actions/setup-python v4 composite
- actions/upload-pages-artifact v1 composite
- golang latest build
- datasets ==2.4.0
- dill *
- elasticsearch ==8.3.1
- jupyterlab ==3.4.7
- loguru *
- numpy *
- optuna ==3.0.2
- pandas *
- plac *
- requests *
- scikit-learn ==1.1.2
- scipy *
- scispacy *
- sentence-transformers ==2.2.2
- shutup *
- spacy *
- toml *
- torch ==1.12.0
- torch_optimizer *
- tqdm *
- transformers ==4.22.0
- trectools *
- wandb ==0.13.3
- pytest * development
- pytest-asyncio * development
- datasets ==2.4.0
- dill *
- elasticsearch ==8.3.1
- joblib *
- jupyterlab ==3.4.7
- loguru *
- numpy *
- optuna ==3.0.2
- pandas *
- plac *
- requests *
- scikit-learn ==1.1.2
- scipy *
- scispacy *
- sentence-transformers ==2.2.2
- shutup *
- spacy *
- toml *
- torch *
- torch_optimizer *
- tqdm *
- transformers ==4.22.0
- trectools *
- wandb ==0.13.3
- elasticsearch *
- torch ==1.12.1
