Recent Releases of https://github.com/bioscan-ml/barcodebert

https://github.com/bioscan-ml/barcodebert - v1.0.0

This repository contains all the code accompanying the paper BarcodeBERT: Transformers for Biodiversity Analysis (Millan Arias et al., 2025).

BarcodeBERT is a BERT-style transformer model trained exclusively on a dataset of DNA barcode sequences extracted from a reference library of Canadian invertebrates. In addition to the full pretraining pipeline, you’ll find scripts and notebooks for evaluating BarcodeBERT (and several off-the-shelf DNA foundation models) in various downstream tasks:

  • Fine-tuning for supervised species-level classification.
  • Similarity retrieval for labelling rare or unseen species via nearest neighbour search in the embedding space.
  • BIN reconstruction, where BarcodeBERT embeddings are used to group sequences into putative Barcode Index Numbers.
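As a rough illustration of the similarity-retrieval task listed above, the sketch below labels a query embedding by a majority vote over its nearest neighbours under cosine similarity. It is not the repository's implementation: the function names, the toy three-dimensional vectors, and the labels are hypothetical, and real embeddings would come from the model rather than being written by hand.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_label(query, reference_embeddings, reference_labels, k=1):
    """Return the most common label among the k reference embeddings closest to the query."""
    scored = sorted(
        zip(reference_embeddings, reference_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,  # most similar first
    )
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical toy data: stand-ins for embeddings of labelled reference barcodes.
reference_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
reference_labels = ["taxon_A", "taxon_A", "taxon_B"]

# Hypothetical embedding of an unlabelled query sequence.
query = [0.1, 0.95, 0.0]
predicted = knn_label(query, reference_embeddings, reference_labels, k=1)
print(predicted)
```

With `k=1` the query takes the label of its single closest reference. A larger `k` trades sensitivity to individual neighbours for a vote across several, which can matter when a rare taxon has only a few reference sequences.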

What's Changed

  • BZSL implementation for BarcodeBERT: Transformers for Biodiversity Analysis by @atwang16 in https://github.com/bioscan-ml/BarcodeBERT/pull/1
  • DOC: Fix paths to scripts shown in README by @scottclowe in https://github.com/bioscan-ml/BarcodeBERT/pull/3
  • RF: Standardize requirements.txt into a single file by @scottclowe in https://github.com/bioscan-ml/BarcodeBERT/pull/5
  • Bump transformers from 4.29.2 to 4.36.0 by @dependabot in https://github.com/bioscan-ml/BarcodeBERT/pull/7
  • Bump black from 23.11.0 to 24.3.0 by @dependabot in https://github.com/bioscan-ml/BarcodeBERT/pull/8
  • Bump transformers from 4.36.0 to 4.38.0 by @dependabot in https://github.com/bioscan-ml/BarcodeBERT/pull/9
  • Bump scikit-learn from 1.3.0 to 1.5.0 by @dependabot in https://github.com/bioscan-ml/BarcodeBERT/pull/10
  • MNT: Fix issues flagged by pre-commit by @scottclowe in https://github.com/bioscan-ml/BarcodeBERT/pull/11
  • DOC: Update citation by @scottclowe in https://github.com/bioscan-ml/BarcodeBERT/pull/12
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/bioscan-ml/BarcodeBERT/pull/13
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/bioscan-ml/BarcodeBERT/pull/15
  • Added conditional check for use_cuda before using torch.cuda by @NotMyLyfe in https://github.com/bioscan-ml/BarcodeBERT/pull/14
  • Camera Ready Version for Bioinformatics by @millanp95 in https://github.com/bioscan-ml/BarcodeBERT/pull/19

New Contributors

  • @atwang16 made their first contribution in https://github.com/bioscan-ml/BarcodeBERT/pull/1
  • @dependabot made their first contribution in https://github.com/bioscan-ml/BarcodeBERT/pull/7
  • @pre-commit-ci made their first contribution in https://github.com/bioscan-ml/BarcodeBERT/pull/13
  • @NotMyLyfe made their first contribution in https://github.com/bioscan-ml/BarcodeBERT/pull/14
  • @millanp95 made their first contribution in https://github.com/bioscan-ml/BarcodeBERT/pull/19

Full Changelog: https://github.com/bioscan-ml/BarcodeBERT/commits/v1.0.0

Published by millanp95 about 1 year ago