BERT

Batch-Effect Reduction Trees

https://github.com/hsu-hpc/bert

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization hsu-hpc has institutional domain (www.hsu-hh.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary

Keywords

batch-effect bioconductor-package bioinformatics data-integration data-science nature-communications
Last synced: 6 months ago

Repository

Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Created about 3 years ago · Last pushed 6 months ago

README.md

BERT: Batch-Effect Reduction Trees

Data from high-throughput technologies that assess global patterns of biomolecules (omic data) is often afflicted with missing values and with measurement-specific biases (batch-effects) that hinder the quantitative comparison of independently acquired datasets. This repository provides the BERT algorithm, a high-performance method for data integration of incomplete omic profiles.

[!IMPORTANT] This repository is primarily intended for development purposes. For typical users, BERT is provided via Bioconductor. Note that repository badges refer to the release version of BERT, which may be multiple commits behind the source code provided here. The latest CI/CD results for BERT may be obtained here.

[!WARNING] The R package provided here is neither affiliated with nor related to Bidirectional Encoder Representations from Transformers, as published by Devlin et al. in 2019 (arXiv:1810.04805).

Installation

[!TIP] It is recommended to install BERT via Bioconductor as described here.
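
As a minimal sketch of the Bioconductor route, the commands below use the standard `BiocManager` workflow. The package name `BERT` is assumed from the bioconductor.org listing for this repository; the authoritative instructions are those on the Bioconductor package page.

```R
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install the release version of BERT from Bioconductor
# (package name assumed from the Bioconductor listing)
BiocManager::install("BERT")
```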

For development purposes, the BERT package can be installed directly from this repository using devtools.

```R
if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c('S4Vectors', 'S4Arrays', 'XVector', 'genefilter', 'SparseArray'))
devtools::install_github('HSU-HPC/BERT')
```

Please compare the installed version of R to the required version for Bioconductor and install all build dependencies if compilation from source is required for your target[^1].
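
One way to run that comparison from within R is sketched below. The threshold `4.3.0` is taken from the `R >= 4.3.0` entry in the package's DESCRIPTION dependencies; the Bioconductor release you target may require a newer R, so treat the value as an assumption to adjust.

```R
# Minimum R version, taken from the DESCRIPTION dependency "R >= 4.3.0"
# (assumption: your Bioconductor release does not require a newer R)
required_r <- "4.3.0"

# getRversion() returns a version object that compares correctly with strings
r_ok <- getRversion() >= required_r

if (r_ok) {
    message("R ", getRversion(), " satisfies the requirement (>= ", required_r, ").")
} else {
    message("R ", getRversion(), " is older than ", required_r, "; please upgrade first.")
}
```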

Usage

The BERT library is designed to offer high user friendliness whilst providing maximum flexibility. The following example demonstrates how to use the software on a simulated dataset with batch-effects and missing values:

```R
# import library
library(BERT)

# simulate dataset with 10% missing values
dataset_raw <- generate_dataset(features=60, batches=10, samplesperbatch=10, mvstmt=0.1, classes=2)

# apply BERT with default arguments
dataset_corrected <- BERT(dataset_raw)
```

[!TIP] A detailed explanation of all available parameters, their default values and optimal configurations for typical scenarios can be found in the Bioconductor vignette.
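
When working with incomplete profiles, it can help to quantify missingness overall and per batch. The sketch below uses base R only and a hand-made toy data frame. The layout (one row per sample, one column per feature, plus a `Batch` column) and the names `toy`, `F1` and `F2` are assumptions made for illustration, not a specification of BERT's input format.

```R
# Toy input in the layout assumed here: one row per sample, one column per
# feature, plus a "Batch" column (all names are hypothetical)
set.seed(1)
toy <- data.frame(
    Batch = rep(1:2, each = 4),
    F1 = rnorm(8),
    F2 = rnorm(8)
)
toy$F1[c(1, 6)] <- NA   # inject missing values
toy$F2[3] <- NA

feature_cols <- setdiff(names(toy), "Batch")

# Overall fraction of missing values across all feature columns
overall_missing <- mean(is.na(toy[, feature_cols]))

# Fraction of missing values within each batch
per_batch_missing <- sapply(
    split(toy[, feature_cols], toy$Batch),
    function(block) mean(is.na(block))
)

print(overall_missing)
print(per_batch_missing)
```

The same two calculations can be applied to the simulated dataset and to the corrected output to compare missingness before and after integration, provided both are data frames in this layout.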

Support

Users may ask for assistance via the Bioconductor support site. Bug reports may be filed via the Issues tab of this repository. For confidential or security-related problems, please send an email to

ju [dot] neumann [at] uke [dot] de or philipp [dot] neumann [at] desy [dot] de

[!WARNING] As of October 2025, this repository will no longer be actively maintained.

License

This code is published under the GPLv3.0 License.

References

Citations make research visible. If you use BERT for your research, please cite the following publication:

  • Schumann, Y., Gocke, A., Neumann, J.E. Computational Methods for Data Integration and Imputation of Missing Values in Omics Datasets. PROTEOMICS, Wiley (2024-12). https://doi.org/10.1002/pmic.202400100
  • Schumann, Y., Schlumbohm, S., Neumann, J.E. et al. High performance data integration for large-scale analyses of incomplete Omic profiles using Batch-Effect Reduction Trees (BERT). Nat Commun 16, 7104 (2025). https://doi.org/10.1038/s41467-025-62237-4

[^1]: On Ubuntu 24.04, a complete list of dependencies would be: wget, curl, build-essential, libssl-dev, libcurl4-openssl-dev, pkg-config, git, ca-certificates, libxml2, libxml2-dev, gnupg, software-properties-common, libfontconfig1-dev, libharfbuzz-dev, libfribidi-dev, libfreetype6-dev, libpng-dev, libtiff5-dev, libjpeg-dev

Owner

  • Name: Chair for High Performance Computing
  • Login: HSU-HPC
  • Kind: organization
  • Email: philipp.neumann@hsu-hh.de
  • Location: Hamburg, Germany

GitHub Events

Total
  • Release event: 2
  • Watch event: 1
  • Push event: 13
  • Pull request event: 1
  • Create event: 2
Last Year
  • Release event: 2
  • Watch event: 1
  • Push event: 13
  • Pull request event: 1
  • Create event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • deryannis (2)

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 3,532 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
bioconductor.org: BERT

High Performance Data Integration for Large-Scale Analyses of Incomplete Omic Profiles Using Batch-Effect Reduction Trees (BERT)

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 3,532 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 31.3%
Average: 42.5%
Downloads: 96.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.3.0 depends
  • Rmpi * enhances
  • doMPI * enhances
  • BiocStyle * imports
  • SummarizedExperiment * imports
  • cluster * imports
  • comprehenr * imports
  • doParallel >= 1.0.17 imports
  • foreach >= 1.5.2 imports
  • invgamma * imports
  • iterators >= 1.0.14 imports
  • janitor >= 2.2.0 imports
  • limma >= 3.46.0 imports
  • logging >= 0.10 imports
  • methods * imports
  • parallel * imports
  • sva >= 3.38.0 imports
  • utils * imports
  • knitr * suggests
  • rmarkdown * suggests
  • testthat >= 3.0.0 suggests
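
The version constraints listed above can be checked against a local R library with a short script. This is a sketch using base R only; the helper name `check_pkg` is hypothetical, and the minimum versions are copied from the listing rather than read from the installed package.

```R
# Minimum versions copied from the DESCRIPTION listing above
min_versions <- c(
    doParallel = "1.0.17",
    foreach    = "1.5.2",
    iterators  = "1.0.14",
    janitor    = "2.2.0",
    limma      = "3.46.0",
    logging    = "0.10",
    sva        = "3.38.0"
)

# Hypothetical helper: NA if the package is not installed,
# otherwise TRUE/FALSE depending on whether the version is sufficient
check_pkg <- function(pkg, min_version) {
    if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
    packageVersion(pkg) >= min_version
}

status <- mapply(check_pkg, names(min_versions), min_versions)
print(status)
```

An `NA` entry marks a package that is not installed, and `FALSE` marks an installed version older than the listed minimum.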