graphsim
graphsim: An R package for simulating gene expression data from graph structures of biological pathways - Published in JOSS (2020)
TipToft
TipToft: detecting plasmids contained in uncorrected long read sequencing data - Published in JOSS (2019)
Multilocus sequence typing by blast from de novo assemblies against PubMLST
Multilocus sequence typing by blast from de novo assemblies against PubMLST - Published in JOSS (2016)
CheckQC
CheckQC: Quick quality control of Illumina sequencing runs - Published in JOSS (2018)
Pynteny
Pynteny: a Python package to perform synteny-aware, profile HMM-based searches in sequence databases - Published in JOSS (2023)
SaffronTree
SaffronTree: Fast, reference-free pseudo-phylogenomic trees from reads or contigs - Published in JOSS (2017)
GFAKluge
GFAKluge: A C++ library and command line utilities for the Graphical Fragment Assembly formats - Published in JOSS (2019)
org.biojava
:book::microscope::coffee: BioJava is an open-source project dedicated to providing a Java library for processing biological data.
WGS2NCBI - Toolkit for preparing genomes for submission to NCBI
WGS2NCBI - Toolkit for preparing genomes for submission to NCBI - Published in JOSS (2019)
Baargin
Baargin: a Nextflow workflow for the automatic analysis of bacterial genomics data with a focus on Antimicrobial Resistance - Published in JOSS (2023)
cazy-webscraper
Web scraper to retrieve protein data catalogued by the CAZy, UniProt, NCBI, GTDB and PDB websites/databases.
biocommons.seqrepo
non-redundant, compressed, journalled, file-based storage for biological sequences
adapt-diagnostics
A package for designing activity-informed nucleic acid diagnostics for viruses.
jcvi
Python library to facilitate genome assembly, annotation, and comparative genomics
OpenOmics
OpenOmics: A bioinformatics API to integrate multi-omics datasets and interface with public databases. - Published in JOSS (2021)
biopython
Official git repository for Biopython (originally converted from CVS)
maftools
Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
minimap2
A versatile pairwise aligner for genomic and spliced nucleotide sequences
sarek
Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
nplinker
A Python framework for microbial natural products data mining by integrating genomics and metabolomics data
https://github.com/althonos/lightmotif
A lightweight platform-accelerated library for biological motif scanning using position weight matrices.
msprime
Simulate genealogical trees and genomic sequence data using population genetic models
filtersam
Tools to filter SAM/BAM files by percent identity and percent of matched sequence
GFF3toEMBL
GFF3toEMBL: Preparing annotated assemblies for submission to EMBL - Published in JOSS (2016)
circdna
Pipeline for the identification of extrachromosomal circular DNA (ecDNA) from Circle-seq, WGS, and ATAC-seq data that were generated from cancer and other eukaryotic cells.
coolpuppy
A versatile tool to perform pile-up analysis on Hi-C data in .cool format.
https://github.com/althonos/pyopal
Cython bindings and Python interface to Opal, a SIMD-accelerated database search aligner.
zol
zol (& fai): large-scale targeted detection and evolutionary investigation of gene clusters (e.g. BGCs, phages)
smoove
structural variant calling and genotyping with existing tools, but, smoothly.
staramr
Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
cnr-flow
CUT&RUN-Flow, a Nextflow pipeline for QC, tag trimming, normalization, and peak calling for data from CUT&RUN experiments.
https://github.com/brentp/vcfanno
annotate a VCF with other VCFs/BEDs/tabixed files
divbrowse
A web application for interactive visualization and exploratory data analysis of variant call matrices
circrna
circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data
clipkit
a multiple sequence alignment-trimming algorithm for accurate phylogenomic inference
https://github.com/althonos/pyfamsa
Cython bindings and Python interface to FAMSA, an algorithm for ultra-scale multiple sequence alignments.
pycirclize
Circular visualization in Python (Circos Plot, Chord Diagram, Radar Chart)
data-management-resources
Repo supporting the manuscript 'Journeying towards best practice data management in biodiversity genomics'.
orthofisher
a broadly applicable tool for automated gene identification and retrieval
https://github.com/ay-lab/mustache
Multi-scale Detection of Chromatin Loops from Hi-C and Micro-C Maps using Scale-Space Representation
hgvs
Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
pymsaviz
MSA (Multiple Sequence Alignment) visualization Python package for sequence analysis
https://github.com/akikuno/dajin2
🔬 Genotyping tool for genome-edited samples using nanopore-targeted sequencing
recognizer
A tool for domain-based annotation with databases from the Conserved Domains Database
mungesumstats
Rapid standardisation and quality control of GWAS or QTL summary statistics
https://github.com/althonos/pytantan
Cython bindings and Python interface to Tantan, a fast method for identifying repeats in DNA and protein sequences.
https://github.com/althonos/diced
A Rust reimplementation of the MinCED method for identifying CRISPRs in full or assembled genomes.
cpg-gnomad
Hail helper functions for the gnomAD project and Translational Genomics Group
cerebra
cerebra: A tool for fast and accurate summarizing of variant calling format (VCF) files - Published in JOSS (2020)
gcmodeller
GCModeller: genomics CAD (Computer-Aided Design) Modeller system in the .NET language
gdsfmt
R Interface to CoreArray Genomic Data Structure (GDS) Files (Development version only)
https://github.com/biocommons/bioutils
provides common tools and lookup tables used primarily by the hgvs and uta packages
cogclassifier
A tool for classifying prokaryote protein sequences into COG (Cluster of Orthologous Genes) functional categories
eutils
simplified searching, fetching, and parsing records from NCBI using their E-utilities interface