https://github.com/biomedsciai/geno4sd

A Python omics data toolkit for analysis across biological scales

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (13.0%) to scientific vocabulary

Keywords

cancer-genomics dna machine-learning omics omics-data population-genetics population-genomics

Last synced: 11 months ago

Repository

A Python omics data toolkit for analysis across biological scales

Basic Info

Host: GitHub
Owner: BiomedSciAI
License: apache-2.0
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 78.1 MB

Statistics

Stars: 11
Watchers: 3
Forks: 1
Open Issues: 0
Releases: 1

Topics

cancer-genomics dna machine-learning omics omics-data population-genetics population-genomics

Created almost 4 years ago · Last pushed over 2 years ago

Metadata Files

Readme Contributing License Code of conduct

Geno4SD

Geno4SD is a toolkit for the analysis of omics data across biological scales, from single-cell analysis to large patient cohorts, and over multiple modalities, including genomics, transcriptomics, clinical medical data, and patient demographics. The toolkit includes analytic methods that span phylogenetics, epidemiology, topological data analysis, and ML/AI frameworks for omics-scale data.

Geno4SD provides access to individual tools as well as detailed use cases for analyses that demonstrate how multiple methodologies can be leveraged together.


Analytic tools included in Geno4SD

ReVeaL: Rare Variant Learning is a stochastic regularization-based learning algorithm. It partitions the genome into non-overlapping, possibly non-contiguous, windows (w) and then aggregates samples into possibly overlapping subsets, using subsampling with replacement (stochastic), giving units called shingles that are utilized by a statistical learning algorithm. Each shingle captures a distribution of the mutational load (the number of mutations in the window w of a given sample), and the first four moments are used as an approximation of the distribution.

ReVeaL tutorial can be found here: tutorial
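
The ReVeaL description above can be illustrated with a small, standard-library-only sketch. This is not the Geno4SD API: the function names (`mutational_load`, `moments`, `make_shingles`), the window representation as half-open `(start, end)` intervals, and the choice of mean, variance, skewness, and excess kurtosis as the "first four moments" are all assumptions made for illustration.

```python
import random
from statistics import fmean


def mutational_load(positions, windows):
    """Count mutations falling in each half-open window (start, end)."""
    return [sum(1 for p in positions if start <= p < end) for start, end in windows]


def moments(values):
    """Return (mean, variance, skewness, excess kurtosis) of a list of numbers."""
    n = len(values)
    mean = fmean(values)
    centered = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centered) / n
    if m2 == 0:
        # Degenerate distribution: higher standardized moments are undefined,
        # so this sketch reports them as zero.
        return (mean, 0.0, 0.0, 0.0)
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return (mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0)


def make_shingles(loads, n_shingles, shingle_size, seed=0):
    """Build shingles by subsampling samples with replacement.

    loads maps sample id -> list of per-window mutational loads.
    Each shingle is a list with one 4-moment tuple per window.
    """
    rng = random.Random(seed)
    sample_ids = sorted(loads)
    n_windows = len(loads[sample_ids[0]])
    shingles = []
    for _ in range(n_shingles):
        subset = rng.choices(sample_ids, k=shingle_size)  # with replacement
        shingles.append(
            [moments([loads[s][w] for s in subset]) for w in range(n_windows)]
        )
    return shingles


# Hypothetical toy data: three non-overlapping, non-contiguous windows.
windows = [(0, 100), (100, 200), (500, 600)]
mutations = {"s1": [5, 50, 150], "s2": [120, 130, 550], "s3": [10], "s4": []}
loads = {s: mutational_load(p, windows) for s, p in mutations.items()}
shingles = make_shingles(loads, n_shingles=5, shingle_size=3)
```

Each shingle could then be flattened into a feature vector for a downstream statistical learning algorithm; that step is omitted here because the source does not specify which learner is used.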
LSM: Lesion Shedding Model orders a patient's lesions from the highest to the lowest ctDNA shedding, using liquid (cfDNA) and lesion biopsies. The framework intrinsically accounts for missing or hidden lesions and operates on blood and lesion cfDNA assays to estimate the relative levels at which lesions shed into the blood. Characterizing lesion-specific cfDNA shedding levels helps us better understand the mechanisms of shedding and more accurately contextualize and interpret cfDNA assays, improving their clinical impact.

LSM tutorial can be found here: tutorial
CuNA: Cumulant-based Network Analysis is a toolkit for integrating and analyzing multi-omics data. It finds higher-order relationships from multi-omic data with EHR information across different thresholds of statistical significance. CuNA provides two components:
1. A network with nodes representing multi-omics variables and edges reflecting their strength in higher-order interactions.
2. A risk score, CuRES, which is a holistic view of the risk or liability of a target trait or disease, per individual.

CuNA tutorial can be found here: tutorial
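
To make the idea of a higher-order relationship concrete, the following sketch computes third-order joint cumulants and turns them into a small network. It is an illustration under stated assumptions rather than CuNA's implementation: the function names (`third_cumulant`, `cumulant_network`) are hypothetical, and a plain magnitude threshold stands in for the statistical significance testing mentioned above.

```python
from itertools import combinations
from statistics import fmean


def third_cumulant(x, y, z):
    """Third-order joint cumulant: the mean of the product of centered values."""
    mx, my, mz = fmean(x), fmean(y), fmean(z)
    return fmean([(a - mx) * (b - my) * (c - mz) for a, b, c in zip(x, y, z)])


def cumulant_network(data, threshold):
    """Build edge weights from variable triples with a large third cumulant.

    data maps variable name -> list of values (same length for all variables).
    An edge's weight counts how many retained triples contain both endpoints.
    The magnitude threshold is an assumption standing in for a significance test.
    """
    edges = {}
    for triple in combinations(sorted(data), 3):
        k3 = third_cumulant(*(data[name] for name in triple))
        if abs(k3) >= threshold:
            for pair in combinations(triple, 2):
                edges[pair] = edges.get(pair, 0) + 1
    return edges


# Toy data: z is the product of x and y, so every pairwise covariance among
# x, y, z is zero while the three-way interaction is strong. w is constant.
data = {
    "x": [1, -1, 1, -1],
    "y": [1, 1, -1, -1],
    "z": [1, -1, -1, 1],
    "w": [0, 0, 0, 0],
}
network = cumulant_network(data, threshold=0.5)
```

In this toy example only the triple (x, y, z) survives the threshold, which shows why pairwise correlation alone would miss the relationship.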

CuNAviz, the visualization tool for CuNA, can be found here:

CuNAviz for Parkinson's Disease
CuNAviz for Breast Cancer, scenario I
CuNAviz for Breast Cancer, scenario II
CuNAviz for Breast Cancer, scenario III
CuNAviz for Breast Cancer, scenario IV
RubricOE: A Rubric for Omics Epidemiology is a cross-validated machine learning framework that uses feature ranking and multiple levels of cross-validation to obtain interpretable genetic and non-genetic features from combined multi-omics data.

RubricOE tutorial can be found here: tutorial

StatGen: Statistical Genetics toolkit is a toolkit for performing quality control on imputed genotype data, running principal component analysis (using TeraPCA), and then running genome-wide association studies (using PLINK).

StatGen tutorial can be found here: tutorial

MaSk-LMM: Matrix Sketching-based Linear Mixed Models is a method for computing linear mixed models, which are widely used for genome-wide association studies, on large biobank-scale genotype data using advances in randomized numerical linear algebra.

MaSk-LMM tutorial can be found here: tutorial

Delta: Significantly associates patients without a known mechanism of resistance (MoR) with patients who have one, through an analysis of the changes in alterations between timepoints, to suggest alternative treatment options based on the known MoR.

Installation and Tutorials

In our detailed Online Documentation you'll find:
* Installation instructions.
* An overview of Geno4SD's main components and API.
* An end-to-end tutorial using a publicly available dataset.

Owner

Name: BiomedSciAI
Login: BiomedSciAI
Kind: organization

Repositories: 6
Profile: https://github.com/BiomedSciAI

GitHub Events

Total

Watch event: 2
Fork event: 1

Last Year

Watch event: 2
Fork event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 20
Average time to close issues: N/A
Average time to close pull requests: 2 days
Total issue authors: 0
Total pull request authors: 5
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 15
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Pull Request Authors

aritra90 (7)
myson-burch (4)
f-utro (4)
futro (1)
KahnR (1)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

.github/workflows/workflow.yml actions

actions/checkout v2 composite

requirements.txt pypi

argparse *
better_apidoc *
coverage ==4.5.4
joblib *
matplotlib *
myst_parser *
netgraph *
networkx *
nose ==1.3.7
numpy *
pandas *
pickle-mixin *
pinocchio ==0.4.2
pysnptools *
pytest-shutil *
pyvis *
qmplot *
scanpy *
scikit-learn *
scipy *
seaborn *
sphinx *
sphinx-autodoc-typehints *
sphinx_rtd_theme *
statsmodels *
