reveal

ReVeaL, Rare Variant Learning, is a stochastic regularization-based learning algorithm.

https://github.com/computationalgenomics/reveal

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: ComputationalGenomics
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 8.2 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme Contributing License Citation

README.md

ReVeaL

ReVeaL, Rare Variant Learning, is a stochastic regularization-based learning algorithm. It partitions the genome into non-overlapping, possibly non-contiguous, windows (w) and then aggregates samples into possibly overlapping subsets, using subsampling with replacement (stochastic), giving units called shingles that are utilized by a statistical learning algorithm. Each shingle captures a distribution of the mutational load (the number of mutations in the window w of a given sample), and the first four moments are used as an approximation of the distribution.
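
The description above can be illustrated with a minimal sketch. It is not the code in ReVeaL.py. It assumes that the "first four moments" are the mean, variance, skewness and kurtosis, that an alteration is assigned to a window by its start coordinate, and that a shingle is a fixed-size draw of samples with replacement. The function names (mutational_load, four_moments, make_shingle) and the shingle_size parameter are hypothetical.

```python
import random


def mutational_load(alterations, window):
    """Count the alterations of one sample whose start lies in the window.

    Assumption: windows are half-open (chrom, start, stop) intervals.
    """
    chrom, w_start, w_stop = window
    return sum(1 for c, start, _stop in alterations
               if c == chrom and w_start <= start < w_stop)


def four_moments(values):
    """Return mean, variance, skewness and kurtosis of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        # Standardized moments are undefined here; report zeros by convention.
        return mean, 0.0, 0.0, 0.0
    std = var ** 0.5
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return mean, var, skew, kurt


def make_shingle(samples, window, shingle_size, rng):
    """Draw samples with replacement and summarize their loads in one window."""
    drawn = rng.choices(list(samples), k=shingle_size)
    loads = [mutational_load(samples[s], window) for s in drawn]
    return four_moments(loads)


# Toy data mirroring the sample_info.tsv example: (chromosome, start, stop).
samples = {
    "sample_1": [(1, 20, 21), (2, 2, 10)],
    "sample_2": [(1, 5, 10)],
}
rng = random.Random(0)  # fixed seed so the draw is reproducible
features = make_shingle(samples, window=(1, 0, 50000), shingle_size=4, rng=rng)
print(features)
```

In this sketch each shingle contributes four features per window, so a learning algorithm would see a feature vector of length four times the number of windows.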

Flowchart of ReVeaL

Usage

Here is a basic command-line example:

python3 ReVeaL.py -s sample_info.tsv -l labels.csv -nf 10 -tt train_test.csv -w 50000 -r regions.tsv -o output_folder

For a full description of the options, run:

python3 ReVeaL.py --help

Citation

Please cite the following article if you use ReVeaL:

Parida L, Haferlach C, Rhrissorrakrai K, Utro F, Levovitz C, Kern W, et al. (2019) Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA. PLoS Comput Biol 15(8): e1007332. https://doi.org/10.1371/journal.pcbi.1007332

Dependencies

All dependencies are listed in the requirements file.

pip install -r requirements.txt

Input file and formats

The sample_info.tsv file is a tab-separated file in which each row contains the sample ID, the chromosome, and the start and stop coordinates of the alteration.

Example:

samples   chr  start  stop
sample_1  1    20     21
sample_1  2    2      10
sample_2  1    5      10

Only chromosomes 1 to 22 are considered.
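
As a rough illustration of this format, the following sketch reads a sample_info.tsv-style table with the standard library and keeps only chromosomes 1 to 22. It is not taken from ReVeaL.py. The function name read_sample_info is hypothetical, the column names are assumed to match the example header, and the extra row on chromosome X is invented here only to show the filter.

```python
import csv
import io

# Chromosome labels accepted by this sketch (assumption: plain "1".."22", no "chr" prefix).
AUTOSOMES = {str(c) for c in range(1, 23)}


def read_sample_info(handle):
    """Yield (sample, chromosome, start, stop) for rows on chromosomes 1-22."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row["chr"] not in AUTOSOMES:
            continue  # rows on other chromosomes are skipped
        yield row["samples"], int(row["chr"]), int(row["start"]), int(row["stop"])


# In-memory stand-in for sample_info.tsv; the last row is a made-up non-autosomal entry.
example = io.StringIO(
    "samples\tchr\tstart\tstop\n"
    "sample_1\t1\t20\t21\n"
    "sample_1\t2\t2\t10\n"
    "sample_2\t1\t5\t10\n"
    "sample_2\tX\t7\t9\n"
)
records = list(read_sample_info(example))
print(records)
```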

The labels.csv file is a comma-separated file in which each row contains a sample and its phenotype.

Example

samples, phenotype
sample_1, CASE
sample_2, CONTROL
sample_3, CASE2
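
A small sketch of how such a labels file could be read and grouped by phenotype is shown here. It is an illustration, not the tool's own parser. The function name read_labels is hypothetical, and the example assumes, as printed above, that a space may follow each comma.

```python
import csv
import io
from collections import defaultdict


def read_labels(handle):
    """Return a dict mapping each phenotype to the list of its sample IDs."""
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.DictReader(handle, skipinitialspace=True)
    by_phenotype = defaultdict(list)
    for row in reader:
        by_phenotype[row["phenotype"]].append(row["samples"])
    return dict(by_phenotype)


# In-memory stand-in for labels.csv.
example = io.StringIO(
    "samples, phenotype\n"
    "sample_1, CASE\n"
    "sample_2, CONTROL\n"
    "sample_3, CASE2\n"
)
groups = read_labels(example)
print(groups)
```

Grouping by phenotype this way makes it easy to check that every phenotype requested in train_test.csv has at least one labeled sample.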

The train_test.csv file is a comma-separated file containing, for each phenotype, the number of shingles to generate for the training and test sets.

Example

phenotype, Train, Test
CASE, 10, 5
CASE2, 10, 5
CONTROL, 10, 5

The regions.tsv file is a tab-separated file containing the regions of interest. Regions for GRCh38.p14 are provided in the regions folder.

Example

chr  start  stop
1    0      224999719
2    0      237712649
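
To make the relation between regions and the window size (-w) concrete, here is a minimal sketch that splits each region into consecutive, non-overlapping windows. It is an assumption about how windowing might work, not the logic of ReVeaL.py: intervals are treated as half-open, the last window of a region is truncated at the region end, and the helper names windows and window_index are hypothetical.

```python
def windows(start, stop, size):
    """Yield consecutive, non-overlapping [w_start, w_stop) windows over a region."""
    for w_start in range(start, stop, size):
        yield w_start, min(w_start + size, stop)


def window_index(position, region_start, size):
    """Return the index of the window that contains a genomic position."""
    return (position - region_start) // size


# Regions from the example above: chromosome -> (start, stop).
regions = {1: (0, 224999719), 2: (0, 237712649)}
size = 50000  # same value as -w in the command-line example

# Number of windows per chromosome under these assumptions.
counts = {chrom: sum(1 for _ in windows(a, b, size))
          for chrom, (a, b) in regions.items()}
print(counts)
print(window_index(20, 0, size))
```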

Owner

  • Name: Computational Genomics
  • Login: ComputationalGenomics
  • Kind: organization

The Computational Genomics group at the IBM T.J. Watson Research Center pursues basic and exploratory research at the interface of algorithmics and genomics.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "ReVeaL"
authors:
 - family-names: "Utro"
   given-names: "Filippo"
   orcid: "https://orcid.org/0000-0003-3226-7642"
preferred-citation:
  authors:
  - family-names: "Parida"
    given-names: "Laxmi"
    orcid: "https://orcid.org/0000-0002-7872-5074"
  - family-names: "Haferlach"
    given-names: "Claudia"
  - family-names: "Rhrissorrakrai"
    given-names: "Kahn"
    orcid: "https://orcid.org/0000-0002-1567-9090"
  - family-names: "Utro"
    given-names: "Filippo"
    orcid: "https://orcid.org/0000-0003-3226-7642"
  - family-names: "Levovitz"
    given-names: "Chaya"
  - family-names: "Kern"
    given-names: "Wolfgang"
  - family-names: "Nadarajah"
    given-names: "Niroshan"
  - family-names: "Twardziok"
    given-names: "Sven"
  - family-names: "Hutter"
    given-names: "Stephan"
  - family-names: "Meggendorfer"
    given-names: "Manja"
  - family-names: "Walter"
    given-names: "Wencke"
  - family-names: "Baer"
    given-names: "Constance"
  - family-names: "Haferlach"
    given-names: "Torsten"
  title: "Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA"
  type: article
  #doi: "journal.pcbi.1007332"
  url: "https://doi.org/10.1371/journal.pcbi.1007332"
  journal: "PLOS Computational Biology"
  month: 8
  start: 1 # First page number
  end: 12 # Last page number
  issue: 8
  volume: 15
  year: 2019

Dependencies

requirements.txt pypi
  • coverage ==4.5.4
  • joblib *
  • myst_parser *
  • nose ==1.3.7
  • numpy *
  • pandas *
  • pinocchio ==0.4.2
  • sphinx *
  • sphinx-autodoc-typehints *
  • sphinx_rtd_theme *