reveal
ReVeaL, Rare Variant Learning, is a stochastic regularization-based learning algorithm.
Repository
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
ReVeaL
ReVeaL, Rare Variant Learning, is a stochastic regularization-based learning algorithm. It partitions the genome into non-overlapping, possibly non-contiguous windows (w), and then aggregates samples into possibly overlapping subsets by stochastic subsampling with replacement. The resulting units, called shingles, are used by a statistical learning algorithm. Each shingle captures the distribution of the mutational load (the number of mutations a given sample carries in window w), and the first four moments of that distribution are used as its approximation.
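The shingle construction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ReVeaL's own implementation: it assumes the per-window mutational load has already been computed for every sample, that one shingle is drawn from samples sharing a single phenotype, and that the four moments are mean, variance, skewness and excess kurtosis. The names `moments`, `make_shingle` and `subset_size` are hypothetical.

```python
import random
from statistics import fmean


def moments(values):
    """Mean, variance, skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = fmean(values)
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        # Constant load: skewness and kurtosis are undefined; report 0 by convention.
        return (mean, 0.0, 0.0, 0.0)
    sd = var ** 0.5
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * var ** 2) - 3.0
    return (mean, var, skew, kurt)


def make_shingle(load, samples, subset_size, rng):
    """Build one shingle by subsampling `samples` with replacement.

    `load` maps a sample id to its list of per-window mutation counts
    (hypothetical layout; all lists have the same length). The result holds
    one 4-moment tuple per window, approximating that window's load distribution.
    """
    drawn = rng.choices(samples, k=subset_size)  # sampling with replacement
    n_windows = len(load[samples[0]])
    return [moments([load[s][w] for s in drawn]) for w in range(n_windows)]


# Toy example: three samples of one phenotype, three windows each.
load = {"sample_1": [2, 0, 1], "sample_2": [0, 3, 1], "sample_3": [1, 1, 1]}
rng = random.Random(0)
shingle = make_shingle(load, sorted(load), subset_size=5, rng=rng)
```

Because samples are drawn with replacement, repeated calls yield different, possibly overlapping shingles from the same pool; in this sketch the flattened moment tuples would form one feature vector for the learner.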

Usage
Here is a basic command-line example:
python3 ReVeaL.py -s sample_info.tsv -l labels.csv -nf 10 -tt train_test.csv -w 50000 -r regions.tsv -o output_folder
For a full description of the options, please run:
python3 ReVeaL.py --help
Citation
Please cite the following article if you use ReVeaL:
Parida L, Haferlach C, Rhrissorrakrai K, Utro F, Levovitz C, Kern W, et al. (2019) Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA. PLoS Comput Biol 15(8): e1007332. https://doi.org/10.1371/journal.pcbi.1007332
Dependencies
All dependencies are listed in the requirements file.
pip install -r requirements.txt
Input file and formats
The sample_info.tsv file is a tab-separated file containing, in each row, the sample ID, the chromosome, and the start and stop coordinates of the alteration.
Example:
samples chr start stop
sample_1 1 20 21
sample_1 2 2 10
sample_2 1 5 10
Only chromosomes 1 to 22 are considered.
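A minimal reader for this format, using only the Python standard library, might look like the following. It is a sketch that assumes the header row shown above is present and that chromosomes are written as bare numbers (no `chr` prefix); `read_sample_info` is a hypothetical helper, not part of ReVeaL.

```python
import csv
import io

# Chromosomes kept by the README rule: 1 to 22 only.
AUTOSOMES = {str(c) for c in range(1, 23)}


def read_sample_info(handle):
    """Return (sample, chrom, start, stop) tuples from a sample_info.tsv stream.

    The header row is skipped and rows on other chromosomes (X, Y, MT, ...) are dropped.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # header: samples chr start stop
    records = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        sample, chrom, start, stop = row[:4]
        if chrom in AUTOSOMES:
            records.append((sample, chrom, int(start), int(stop)))
    return records


example = "samples\tchr\tstart\tstop\nsample_1\t1\t20\t21\nsample_1\t2\t2\t10\nsample_2\t1\t5\t10\n"
records = read_sample_info(io.StringIO(example))
```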
The labels.csv file is a comma-separated file containing, in each row, a sample ID and its phenotype.
Example:
samples, phenotype
sample_1, CASE
sample_2, CONTROL
sample_3, CASE2
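Because the example puts a space after each comma, a reader should skip that leading whitespace. The sketch below is hypothetical (`read_labels` and `group_by_phenotype` are illustrative names, not ReVeaL functions); it also groups samples by phenotype, which is a convenient starting point if shingles are drawn within each phenotype.

```python
import csv
import io
from collections import defaultdict


def read_labels(handle):
    """Map each sample id to its phenotype; spaces after commas are ignored."""
    reader = csv.reader(handle, skipinitialspace=True)
    next(reader)  # header: samples, phenotype
    return {row[0].strip(): row[1].strip() for row in reader if row}


def group_by_phenotype(labels):
    """Invert the mapping: phenotype -> sorted list of sample ids."""
    groups = defaultdict(list)
    for sample, phenotype in labels.items():
        groups[phenotype].append(sample)
    return {p: sorted(s) for p, s in groups.items()}


example = "samples, phenotype\nsample_1, CASE\nsample_2, CONTROL\nsample_3, CASE2\n"
labels = read_labels(io.StringIO(example))
groups = group_by_phenotype(labels)
```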
The train_test.csv file is a comma-separated file containing, for each phenotype, the number of shingles to generate for the training and test sets.
Example:
phenotype, Train, Test
CASE, 10, 5
CASE2, 10, 5
CONTROL, 10, 5
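One way to load these counts, and to catch a phenotype from labels.csv that has no row here, is sketched below. Both helpers (`read_train_test`, `missing_phenotypes`) are hypothetical; whether ReVeaL itself performs such a check is not stated in this README.

```python
import csv
import io


def read_train_test(handle):
    """Map phenotype -> (train_shingles, test_shingles) from train_test.csv."""
    reader = csv.reader(handle, skipinitialspace=True)
    next(reader)  # header: phenotype, Train, Test
    return {row[0].strip(): (int(row[1]), int(row[2])) for row in reader if row}


def missing_phenotypes(counts, phenotypes):
    """Return phenotypes that have no shingle counts, sorted for stable output."""
    return sorted(set(phenotypes) - set(counts))


example = "phenotype, Train, Test\nCASE, 10, 5\nCASE2, 10, 5\nCONTROL, 10, 5\n"
counts = read_train_test(io.StringIO(example))
```

With the example above, 30 training and 15 test shingles would be requested in total across the three phenotypes.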
The regions.tsv file is a tab-separated file containing the regions of interest. Regions for GRCh38.p14 are provided in the regions folder.
Example:
chr start stop
1 0 224999719
2 0 237712649
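The sketch below shows how regions and alterations could be combined into a per-window mutational load. It rests on several assumptions not confirmed by this README: that `-w` sets the window width in base pairs, that windows are contiguous fixed-width tiles starting at each region's start (ReVeaL's windows may be non-contiguous, so this is a simplification), that regions are half-open intervals, and that each alteration is counted in the window holding its start coordinate. All function names are hypothetical.

```python
import csv
import io
from collections import Counter


def read_regions(handle):
    """Return (chrom, start, stop) tuples from a regions.tsv stream."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # header: chr start stop
    return [(row[0], int(row[1]), int(row[2])) for row in reader if row]


def window_of(chrom, pos, regions, width):
    """Locate the fixed-width window holding `pos`, as (chrom, region_start, index).

    Regions are treated as half-open [start, stop); None means outside every region.
    """
    for r_chrom, r_start, r_stop in regions:
        if r_chrom == chrom and r_start <= pos < r_stop:
            return (r_chrom, r_start, (pos - r_start) // width)
    return None


def mutational_load(records, regions, width):
    """Count alterations per (sample, window), placing each by its start coordinate."""
    load = Counter()
    for sample, chrom, start, _stop in records:
        window = window_of(chrom, start, regions, width)
        if window is not None:
            load[(sample, window)] += 1
    return load


regions = read_regions(io.StringIO("chr\tstart\tstop\n1\t0\t224999719\n2\t0\t237712649\n"))
records = [("sample_1", "1", 20, 21), ("sample_1", "2", 2, 10), ("sample_2", "1", 60000, 60001)]
load = mutational_load(records, regions, width=50000)
```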
Owner
- Name: Computational Genomics
- Login: ComputationalGenomics
- Kind: organization
- Website: http://researcher.watson.ibm.com/researcher/view_group.php?id=1179
- Repositories: 3
- Profile: https://github.com/ComputationalGenomics
The Computational Genomics group at the IBM T.J. Watson Research Center pursues basic and exploratory research at the interface of algorithmics and genomics.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "ReVeaL"
authors:
  - family-names: "Utro"
    given-names: "Filippo"
    orcid: "https://orcid.org/0000-0003-3226-7642"
preferred-citation:
  authors:
    - family-names: "Parida"
      given-names: "Laxmi"
      orcid: "https://orcid.org/0000-0002-7872-5074"
    - family-names: "Haferlach"
      given-names: "Claudia"
    - family-names: "Rhrissorrakrai"
      given-names: "Kahn"
      orcid: "https://orcid.org/0000-0002-1567-9090"
    - family-names: "Utro"
      given-names: "Filippo"
      orcid: "https://orcid.org/0000-0003-3226-7642"
    - family-names: "Levovitz"
      given-names: "Chaya"
    - family-names: "Kern"
      given-names: "Wolfgang"
    - family-names: "Nadarajah"
      given-names: "Niroshan"
    - family-names: "Twardziok"
      given-names: "Sven"
    - family-names: "Hutter"
      given-names: "Stephan"
    - family-names: "Meggendorfer"
      given-names: "Manja"
    - family-names: "Walter"
      given-names: "Wencke"
    - family-names: "Baer"
      given-names: "Constance"
    - family-names: "Haferlach"
      given-names: "Torsten"
  title: "Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA"
  type: article
  #doi: "journal.pcbi.1007332"
  url: "https://doi.org/10.1371/journal.pcbi.1007332"
  journal: "PLOS Computational Biology"
  month: 08
  start: 1 # First page number
  end: 12 # Last page number
  issue: 8
  volume: 15
  year: 2019
Dependencies
- coverage ==4.5.4
- joblib *
- myst_parser *
- nose ==1.3.7
- numpy *
- pandas *
- pinocchio ==0.4.2
- sphinx *
- sphinx-autodoc-typehints *
- sphinx_rtd_theme *