bioinfojava-utils

BioInfoJava-Utils

https://github.com/gkanogiannis/bioinfojava-utils

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.0%) to scientific vocabulary

Repository

BioInfoJava-Utils

Basic Info
  • Host: GitHub
  • Owner: gkanogiannis
  • License: gpl-3.0
  • Language: Java
  • Default Branch: main
  • Homepage:
  • Size: 22.3 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 11
Created over 4 years ago · Last pushed 9 months ago
Metadata Files

README.md

BioInfoJava-Utils

BioInfoJava-Utils is a modular Java library providing high-performance implementations of core bioinformatics algorithms, such as distance matrix computation and phylogenetic tree construction from VCF and FASTA files.

This library serves as the computational backend for the fastreeR software suite, which offers a flexible and user-friendly interface to these tools across multiple platforms and environments.

Integration and Accessibility

The functionality of BioInfoJava-Utils is exposed through the fastreeR interface, which is accessible in the following ways:

  • New Java backend (v2.y.z): about 100x faster and needs only a few hundred MB of RAM. Java 11+ recommended.
  • Bioconda: install with conda install -c bioconda fastreer
  • Docker: available on DockerHub and GHCR for containerized execution
  • PyPI: install with pip install fastreer
  • Python CLI: through a lightweight Python wrapper that calls the Java backend
  • R / Bioconductor: via rJava
  • Galaxy: available on the Galaxy ToolShed
  • Pure Java API: developers can integrate this library directly in Java-based pipelines or software.

Overview

BioInfoJava-Utils provides scalable, parallel implementations of widely used bioinformatics algorithms. It is designed to process large-scale genomic datasets efficiently in both research and production environments.

Features

  • 🚀 Now much faster, with an improved multithreaded concurrency model and RAM usage reduced from GBs to MBs
  • ⚙️ Compute sample-wise distance matrices from VCF (cosine) or FASTA (D2S) files
  • 🌳 Build phylogenetic trees using the neighbor-joining algorithm
  • 🧬 Support for hierarchical clustering with dynamic tree pruning
  • 🔄 Multithreaded processing for large input files
  • 📦 Integrates seamlessly into diverse environments (R, Python, Docker, Java)
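
To make the cosine-based distance matrix more concrete, here is a minimal sketch of the general technique. It is not the library's actual code: the class and method names are hypothetical, and it assumes each sample's genotypes are already encoded as alternate-allele counts (0, 1, 2) with no missing calls, which this README does not specify.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the BioInfoJava-Utils API.
final class CosineDistanceSketch {

    // Cosine distance = 1 - (a . b) / (|a| * |b|).
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Assumption: an all-zero vector is treated as maximally distant.
        if (normA == 0.0 || normB == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    }

    // Symmetric sample-by-sample matrix; the diagonal stays 0.
    static double[][] distanceMatrix(double[][] samples) {
        int n = samples.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                d[i][j] = cosineDistance(samples[i], samples[j]);
                d[j][i] = d[i][j];
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // Three toy samples scored at four variant sites.
        double[][] genotypes = {
            {0, 1, 2, 0},
            {0, 1, 2, 0},
            {2, 0, 0, 1}
        };
        System.out.println(Arrays.deepToString(distanceMatrix(genotypes)));
    }
}
```

A matrix of this shape is the usual input to neighbor-joining, which repeatedly merges the closest pair of nodes until a single tree remains.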

Installation

Prerequisites

  • Java 11 or higher
  • Maven (for building the project)

Building from Source

  1. Clone the repository:

     git clone https://github.com/gkanogiannis/BioInfoJava-Utils.git

  2. Navigate to the project directory:

     cd BioInfoJava-Utils

  3. Build the project using Maven:

     mvn clean package install

This will generate the JAR files in the bin directory.

Usage

The main class for executing the utilities is:

com.gkano.bioinfo.javautils.JavaUtils

You can run the utilities via the command line or integrate them into other Java applications.

java -jar bin/BioInfoJavaUtils-VERSION-jar-with-dependencies.jar --help
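
One way to integrate the utilities into another Java application, without relying on internal classes that this README does not document, is to launch the packaged JAR as a subprocess. The sketch below is an illustration under assumptions: the class and method names are hypothetical, the JAR path keeps the VERSION placeholder from the command above, and only the `--help` flag shown in this README is used.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the BioInfoJava-Utils API.
final class JarLauncherSketch {

    // Builds the argument list: java -jar <jarPath> <toolArgs...>
    static List<String> buildCommand(String jarPath, String... toolArgs) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-jar");
        command.add(jarPath);
        command.addAll(Arrays.asList(toolArgs));
        return command;
    }

    // Runs the JAR, forwards its output to this process, and returns its exit code.
    static int run(String jarPath, String... toolArgs)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(jarPath, toolArgs))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder path: replace VERSION with the version you built.
        String jar = args.length > 0
                ? args[0]
                : "bin/BioInfoJavaUtils-VERSION-jar-with-dependencies.jar";
        System.exit(run(jar, "--help"));
    }
}
```

Running the tool in a separate process keeps its memory use and dependencies isolated from the calling application, at the cost of JVM startup time for each invocation.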

License

This project is licensed under the GNU General Public License v3.0.

Citation

If you use BioInfoJava-Utils in your research, please cite the following:

Gkanogiannis, A. et al. A scalable assembly-free variable selection algorithm for biomarker discovery from metagenomes. BMC Bioinformatics 17, 311 (2016). https://doi.org/10.1186/s12859-016-1186-3

Author

Anestis Gkanogiannis
Bioinformatics/ML Scientist
Website: https://www.gkanogiannis.com
ORCID: 0000-0002-6441-0688

Owner

  • Name: Anestis Gkanogiannis
  • Login: gkanogiannis
  • Kind: user
  • Location: Barcelona, Spain

Bioinformatics/ML/AI Scientist

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Gkanogiannis
    given-names: Anestis
    orcid: https://orcid.org/0000-0002-6441-0688
title: "A scalable assembly-free variable selection algorithm for biomarker discovery from metagenomes."
version: 1.13.0
doi: 10.1186/s12859-016-1186-3
date-released: 2021-12-03
url: "https://github.com/gkanogiannis/BioInfoJava-Utils"

GitHub Events

Total
  • Release event: 28
  • Delete event: 23
  • Push event: 44
  • Create event: 29
Last Year
  • Release event: 28
  • Delete event: 23
  • Push event: 44
  • Create event: 29

Dependencies

pom.xml maven
  • com.beust:jcommander 1.72
  • jri:jri 1.0
  • net.sf.trove4j:trove4j 3.0.3
  • org.jfree:jfreechart 1.5.3
.github/workflows/release-on-version-change.yml actions
  • actions/checkout v4 composite
  • actions/setup-java v4 composite
  • softprops/action-gh-release v2 composite