StabilitySort

Stability Sort: detect instability

https://gitlab.com/baaron/StabilitySort

Science Score: 41.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov, nature.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary

Keywords

ClickHouse R RShiny Scalable Workflows for Analyzing Genomes Variant effect prediction genomics variant interpretation vcf

Repository

Basic Info
  • Host: gitlab.com
  • Owner: baaron
  • Default Branch: main
Statistics
  • Stars: 1
  • Forks: 1
  • Open Issues: 2
  • Releases: 0
Created over 4 years ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Stability Sort: detect instability

MIT License

About The Project

This tool aims to assist in pathogenic mutation prioritisation through the integration of protein stability measures from tools such as MAESTRO applied to AlphaFold predicted structures of all human proteins.

The first version of this tool has been published as a Bioinformatics Application Note at https://doi.org/10.1093/bioinformatics/btac465, and its backend has been replaced recently to further enhance performance for massively-parallel processing.

Built With

StabilitySort takes a multi-modal approach. It is available as a web-accessible portal that lets users manipulate prioritised variants from an uploaded Variant Call Format (VCF) file, and as a collection of scripts for higher-throughput analysis on modest computational hardware. The scripts should work in principle on any POSIX-compatible operating system, though they have only been thoroughly tested on Ubuntu Linux 16 through 20.

Getting Started

If you simply wish to use the tool, a demo test instance is accessible here. Please email us if you have any access issues.

You may also find the Quick Start Guide and Glossary useful.

The remaining instructions are for users who wish to install Stability Sort locally on a server, so that they can interrogate their own private data without the risk of sending entire VCFs across the internet. Data bundles for local hosting are being prepared.

Prerequisites

Stability Sort is built on:

* R Shiny, with optional deployment on RStudio Server
* DuckDB, a multi-threaded performant in-process OLAP column-store
  * directly accessing data within ZSTD-compressed Apache Parquet files
* BCFtools, an efficient VCF manipulation tool based on HTSlib

Please follow the documentation for each of the packages above to install them on your relevant operating system.
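
Once the packages are installed, a quick check that the command-line tools are visible on your `PATH` can save some debugging later. The following is a minimal sketch rather than part of Stability Sort itself. The executable names `Rscript` and `bcftools` are assumptions about a typical installation, and `find_missing` is a hypothetical helper. DuckDB is left out because it may be installed as an R package rather than as a standalone executable.

```python
import shutil

# Assumed executable names for a typical installation; adjust as needed.
REQUIRED_TOOLS = ["Rscript", "bcftools"]


def find_missing(tools, which=shutil.which):
    """Return the subset of `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The `which` parameter exists only so that the lookup can be swapped out when testing; in normal use the default `shutil.which` is sufficient.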

Public genomic data

Stability Sort also depends on several files from publicly available human datasets. These currently do not need to be downloaded, as we will be providing a database resource bundle that collates information from all of them:

* AlphaFold predicted 3D structures for the human proteome (UP000005640_9606_HUMAN.tar, 4.8G) from https://alphafold.ebi.ac.uk/download
* GRCh38 human reference FASTA sequence and GFF3 annotation from http://ensembl.org/Homo_sapiens
* gnomAD 3.1.2 chromosomal VCFs from https://gnomad.broadinstitute.org/downloads#v3-variants
* gnomAD 2.1.1 pLoF Metrics by Gene TSV (gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz, 4.4M) from https://gnomad.broadinstitute.org/downloads#v2-constraint

A data bundle has been prepared to share all precomputed stability values across the human proteome in tabular form for analysis with other tools.
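
As an illustration of how tabular stability values might be consumed by other tools, here is a minimal sketch that ranks variants by the magnitude of their predicted stability change. It does not reflect the actual layout of the data bundle or the ranking used by Stability Sort: the column names `gene`, `variant` and `ddg`, the function `rank_by_stability_change`, and the example rows are all hypothetical. Sorting by absolute value is a deliberate simplification, since sign conventions for stability changes differ between prediction tools.

```python
import csv
import io


def rank_by_stability_change(tsv_text, value_column="ddg"):
    """Parse tab-separated text and sort rows by |value_column|, largest first."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    for row in rows:
        # Convert the stability column from text to a number before sorting.
        row[value_column] = float(row[value_column])
    return sorted(rows, key=lambda r: abs(r[value_column]), reverse=True)


# Made-up illustrative rows, not real predictions.
EXAMPLE = (
    "gene\tvariant\tddg\n"
    "GENE1\tA10V\t0.4\n"
    "GENE2\tG55R\t-2.1\n"
    "GENE3\tL7P\t1.3\n"
)

ranked = rank_by_stability_change(EXAMPLE)
for row in ranked:
    print(row["gene"], row["variant"], row["ddg"])
```

For a real table, read the file contents into a string (or adapt the function to accept a file handle) and set `value_column` to whichever column holds the stability measure.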

StabilitySort is designed to be dual-modal software: it is directly accessible online at https://stabilitysort.org/app/ and can also be run locally on any POSIX-compatible operating system (the scripts are functional and produce output identical to that of the web app). A detailed user guide for the command-line interface (CLI) version of StabilitySort is being written.

Usage

Please read the Quick Start Guide for an easy introduction to Stability Sort.

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. See CONTRIBUTING.md for more information.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Acknowledgements

GIL.JCSMR

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Chuah"
  given-names: "Aaron"
  orcid: "https://orcid.org/0000-0003-3066-701X"
- family-names: "Andrews"
  given-names: "T. Daniel"
  orcid: "https://orcid.org/0000-0003-3922-6376"
title: "StabilitySort"
version: 0.2
date-released: 2022-10-07
url: "https://gitlab.com/baaron/StabilitySort/"
preferred-citation:
  type: article
  authors:
  - family-names: "Chuah"
    given-names: "Aaron"
    orcid: "https://orcid.org/0000-0003-3066-701X"
  - family-names: "Li"
    given-names: "Sean Xi"
    orcid: "https://orcid.org/0000-0001-6325-3230"
  - family-names: "Do"
    given-names: "Ngoc Linh (Andrea)"
    orcid: "https://orcid.org/0000-0002-6897-8553"
  - family-names: "Field"
    given-names: "Matt A"
    orcid: "https://orcid.org/0000-0003-0788-6513"
  - family-names: "Andrews"
    given-names: "T Daniel"
    orcid: "https://orcid.org/0000-0003-3922-6376"
  doi: "https://doi.org/10.1093/bioinformatics/btac465"
  journal: "Bioinformatics"
  title: "StabilitySort: assessment of protein stability changes on a genome-wide scale to prioritise potentially pathogenic genetic variation"
  year: 2022
  month: 7