https://github.com/alexandrovlab/sigprofilerextractorr
An R wrapper for SigProfilerExtractor that allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.
SigProfilerExtractorR
An R wrapper for running the SigProfilerExtractor framework.
INTRODUCTION
The purpose of this document is to provide a guide for using the SigProfilerExtractor framework to extract de novo mutational signatures from a set of samples and to decompose the de novo signatures into COSMIC signatures. An extensive Wiki page detailing the usage of this tool can be found at https://osf.io/t6j7u/wiki/home/. The underlying tool is written in Python; users who prefer working in a Python environment can find and install it from: https://github.com/AlexandrovLab/SigProfilerExtractor
Installation
PREREQUISITES
devtools (R)
```R
install.packages("devtools")
```
reticulate* (R)
```R
install.packages("reticulate")
```
*Reticulate has a known bug that prevents Python print statements from flushing to standard output. As a result, some of the typical progress messages are delayed.
QUICK START GUIDE
This section will guide you through the minimum steps required to extract mutational signatures from genomes:
1. First, install the python package using pip. The R wrapper still requires the python package:
pip install SigProfilerExtractor
2. Open an R session and ensure that your R interpreter recognizes the path to your python installation:
```R
$ R
library(reticulate)
use_python("path_to_your_python")
py_config()
python:         /anaconda3/bin/python
libpython:      /anaconda3/lib/libpython3.6m.dylib
pythonhome:     /anaconda3:/anaconda3
version:        3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
numpy:          /anaconda3/lib/python3.6/site-packages/numpy
numpy_version:  1.16.1
```
If you do not see your python path listed, restart your R session and rerun the above commands in order.
3. Install SigProfilerExtractorR using devtools:
```R
library(devtools)
install_github("AlexandrovLab/SigProfilerExtractorR")
```
4. Load the package in the same R session and install your desired reference genome as follows (available reference genomes are: GRCh37, GRCh38, mm9, and mm10):
```R
library(SigProfilerExtractorR)
install("GRCh37", rsync=FALSE, bash=TRUE)
```
This will install the human 37 assembly as a reference genome.
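Before running an extraction, a small pre-flight check can confirm that reticulate sees a Python installation with the SigProfilerExtractor package. The sketch below is illustrative only: `check_python_backend` is a hypothetical helper and is not part of SigProfilerExtractorR. It relies on `reticulate::py_available` and `reticulate::py_module_available`, and returns FALSE instead of failing when something is missing.

```R
# Hypothetical helper (not part of SigProfilerExtractorR): returns TRUE only
# if reticulate can start Python and import the given Python module.
check_python_backend <- function(module = "SigProfilerExtractor") {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    message("reticulate is not installed; run install.packages(\"reticulate\") first.")
    return(FALSE)
  }
  if (!reticulate::py_available(initialize = TRUE)) {
    message("reticulate could not find a usable Python installation.")
    return(FALSE)
  }
  ok <- reticulate::py_module_available(module)
  if (!ok) {
    message("Python module '", module, "' was not found; try: pip install ", module)
  }
  ok
}

ready <- check_python_backend()
```

If `ready` is FALSE, restarting R and calling `use_python()` before anything else initializes Python is usually the first thing to try.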
SUPPORTED GENOMES
Other available reference genomes are GRCh38, mm9, and mm10 (and the genomes supported by SigProfilerMatrixGenerator). Information about supported genomes can be found at https://github.com/AlexandrovLab/SigProfilerMatrixGeneratorR
Quick Example:
Signatures can be extracted from vcf files or from a tab-delimited mutational table using the sigprofilerextractor function.
```R
help(sigprofilerextractor)
```
This will show the details about the sigprofilerextractor function.
```R
library(SigProfilerExtractorR)
path_to_example_data <- importdata("matrix")
data <- path_to_example_data # here you can provide the path of your own data
sigprofilerextractor("matrix",
                     "example_output",
                     data,
                     minimum_signatures=2,
                     maximum_signatures=3,
                     nmf_replicates=5,
                     min_nmf_iterations = 1000,
                     max_nmf_iterations = 100000,
                     nmf_test_conv = 1000,
                     nmf_tolerance = 0.00000001)
```
The example output will be generated in the working directory. Note that the parameters used in the above example are not optimal for extracting accurate signatures; they are used only for a quick example.
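To try the "matrix" input type with your own data, you need a tab-delimited mutational table on disk. The sketch below writes a toy table in base R. The layout it uses (a first column named "Mutation Types" holding the 96 single-base substitution contexts, followed by one column of counts per sample) is an assumption based on the usual SigProfiler matrix convention, so compare it against the file returned by `importdata("matrix")` before relying on it. The sample names and counts are made up.

```R
set.seed(1)

# Build the 96 SBS context labels, e.g. "A[C>A]A" (assumed row naming).
bases <- c("A", "C", "G", "T")
subs <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
grid <- expand.grid(five = bases, sub = subs, three = bases,
                    stringsAsFactors = FALSE)
mutation_types <- paste0(grid$five, "[", grid$sub, "]", grid$three)

# Random toy counts for three hypothetical samples.
counts <- matrix(rpois(96 * 3, lambda = 20), nrow = 96,
                 dimnames = list(NULL, c("Sample_1", "Sample_2", "Sample_3")))

catalog <- data.frame("Mutation Types" = mutation_types, counts,
                      check.names = FALSE)

# Write a tab-delimited file; this path could then be passed as input_data.
matrix_path <- file.path(tempdir(), "toy_catalog.txt")
write.table(catalog, matrix_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

Three random samples are far too few for a meaningful extraction; the file only illustrates the shape of the input.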
Functions
The available functions are:
- importdata
- sigprofilerextractor
- estimate_solution
importdata
Imports the path of example data.
```R
importdata(datatype)
```
datatype: Type of example data. There are two types: 1. "vcf", 2. "matrix".
importdata Example
```R
library(SigProfilerExtractorR)
path_to_example_table = importdata("matrix")
data = path_to_example_table
```
This "data" variable can be used as the value of the "input_data" argument of the sigprofilerextractor function.
To get help on the parameters and outputs of the "importdata" function, please use the following:
```R
help(importdata)
```
sigprofilerextractor
Extracts mutational signatures from an array of samples.
```R
sigprofilerextractor(input_type, output, input_data, reference_genome="GRCh37",
                     opportunity_genome = "GRCh37", context_type = "default",
                     exome = FALSE, minimum_signatures=1, maximum_signatures=10,
                     nmf_replicates=100, resample = T, batch_size=1, cpu=-1,
                     gpu=F, nmf_init="random", precision= "single",
                     matrix_normalization= "gmm", seeds= "random",
                     min_nmf_iterations= 10000, max_nmf_iterations=1000000,
                     nmf_test_conv= 10000, nmf_tolerance= 1e-15,
                     nnls_add_penalty=0.05, nnls_remove_penalty=0.01,
                     initial_remove_penalty=0.05, get_all_signature_matrices= FALSE)
```
| Category | Parameter | Variable Type | Parameter Description |
| --------- | --------------------- | -------- |-------- |
| Input Data | | | |
| | input_type | String | The type of input:<br>- "vcf": used for vcf format inputs.<br>- "matrix": used for table format inputs using a tab-separated file.<br>- "bedpe": used for a bedpe file with each SV annotated with its type, size bin, and clustered/non-clustered status.<br>- "seg:TYPE": used for a multi-sample segmentation file for copy number analysis. The accepted callers for TYPE are {"ASCAT", "ASCATNGS", "SEQUENZA", "ABSOLUTE", "BATTENBERG", "FACETS", "PURPLE", "TCGA"}. For example, when using a segmentation file from BATTENBERG, set input_type to "seg:BATTENBERG". |
| | input_data | String | Path to input folder for input_type:<br>- vcf<br>- bedpe<br>- matrix<br>- seg:TYPE |
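Because the "seg:TYPE" form is easy to mistype, it can help to build the input_type string programmatically. The sketch below is one way to do that in base R. `make_input_type` is a hypothetical helper, not a function provided by SigProfilerExtractorR; the caller names are taken from the table above, and upper-casing the caller is a convenience added here.

```R
# Callers listed above as accepted values of TYPE in "seg:TYPE".
seg_callers <- c("ASCAT", "ASCATNGS", "SEQUENZA", "ABSOLUTE",
                 "BATTENBERG", "FACETS", "PURPLE", "TCGA")

# Hypothetical helper: returns a valid input_type string or stops with an error.
make_input_type <- function(kind, caller = NULL) {
  kind <- match.arg(kind, c("vcf", "matrix", "bedpe", "seg"))
  if (kind != "seg") {
    return(kind)
  }
  # Normalize the caller to upper case and check it against the known list.
  if (is.null(caller) || !(toupper(caller) %in% seg_callers)) {
    stop("For 'seg' input, caller must be one of: ",
         paste(seg_callers, collapse = ", "))
  }
  paste0("seg:", toupper(caller))
}

input_type <- make_input_type("seg", "battenberg")
print(input_type)
```

The resulting string can be passed as the first argument of sigprofilerextractor.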
sigprofilerextractor Example
```R
library(SigProfilerExtractorR)

# To get input from vcf files:
path_to_example_folder_containing_vcf_files = importdata("vcf")
data = path_to_example_folder_containing_vcf_files # you can put the path to your folder containing the vcf samples.
sigprofilerextractor("vcf", "example_output", data, minimum_signatures=1, maximum_signatures=10)
# Wait until the execution is finished. The process may take a couple of hours depending on the size of the data.
# Check the current working directory for the "example_output" folder.

# To get input from table format (mutation catalog matrix):
path_to_example_table = importdata("matrix")
data = path_to_example_table # you can put the path to your tab-delimited file containing the mutational catalog matrix/table
sigprofilerextractor("matrix", "example_output", data, opportunity_genome="GRCh38", minimum_signatures=1, maximum_signatures=10)
```
sigprofilerextractor Output
To learn about the output, please visit https://osf.io/t6j7u/wiki/home/
Estimation of the Optimum Solution (estimate_solution)
Estimates the optimum solution (rank) among solutions with different numbers of signatures (ranks).
```R
estimate_solution(base_csvfile,
                  All_solution,
                  genomes,
                  output,
                  title,
                  stability,
                  min_stability,
                  combined_stability)
```
| Parameter | Variable Type | Parameter Description |
| --------------------- | -------- |-------- |
| base_csvfile | String | Default is "All_solutions_stat.csv". Path to a csv file that contains the statistics of all solutions. |
| All_solution | String | Default is "All_Solutions". Path to a folder that contains the results of all solutions. |
| genomes | String | Default is "Samples.txt". Path to a tab-delimited file that contains the mutation counts for all genomes given to different mutation types. |
| output | String | Default is "results". Path to the output folder. |
| title | String | Default is "Selection_Plot". This sets the title of the selection_plot.pdf. |
| stability | Float | Default is 0.8. The cutoff threshold of the average stability. Solutions with average stabilities below this threshold will not be considered. |
| min_stability | Float | Default is 0.2. The cutoff threshold of the minimum stability. Solutions with minimum stabilities below this threshold will not be considered. |
| combined_stability | Float | Default is 1.0. The cutoff threshold of the combined stability (sum of average and minimum stability). Solutions with combined stabilities below this threshold will not be considered. |
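The three stability cutoffs act as filters on the candidate solutions. The sketch below shows that filtering logic on made-up numbers in base R. It is an illustration of the thresholds as described in the table, not the package's actual selection procedure, which may apply further criteria. `filter_by_stability` and the column names `avg_stability` and `min_stability` are hypothetical.

```R
# Toy per-solution statistics (made-up values, hypothetical column names).
solutions <- data.frame(
  signatures    = 1:5,
  avg_stability = c(1.00, 0.95, 0.85, 0.82, 0.90),
  min_stability = c(1.00, 0.60, 0.10, 0.30, 0.05)
)

# Keep only solutions that meet all three cutoffs described above.
filter_by_stability <- function(stats, stability = 0.8, min_stability = 0.2,
                                combined_stability = 1.0) {
  combined <- stats$avg_stability + stats$min_stability
  keep <- stats$avg_stability >= stability &
    stats$min_stability >= min_stability &
    combined >= combined_stability
  stats[keep, , drop = FALSE]
}

kept_default <- filter_by_stability(solutions)
kept_strict <- filter_by_stability(solutions, combined_stability = 1.25)
print(kept_default$signatures)
print(kept_strict$signatures)
```

With these toy values, raising combined_stability from 1.0 to 1.25 drops the 4-signature solution, whose average and minimum stabilities sum to 1.12.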
estimate_solution Example
```R
estimate_solution(base_csvfile="All_solutions_stat.csv",
                  All_solution="All_Solutions",
                  genomes="Samples.txt",
                  output="results",
                  title="Selection_Plot",
                  stability=0.8,
                  min_stability=0.2,
                  combined_stability=1.25)
```
estimate_solution Output
The files below will be generated in the output folder:

| File Name | Description |
| ----- | ----- |
| All_solutions_stat.csv | A csv file that contains the statistics of all solutions. |
| selection_plot.pdf | A plot that depicts the Stability and Mean Sample Cosine Distance for different solutions. |
GPU support
If CUDA out of memory exceptions occur, it will be necessary to reduce the number of CPU processes used (the cpu parameter).
For more information, help, and examples, please visit: https://osf.io/t6j7u/wiki/home/
Citation
Islam SMA, Díaz-Gay M, Wu Y, Barnes M, Vangara R, Bergstrom EN, He Y, Vella M, Wang J, Teague JW, Clapham P, Moody S, Senkin S, Li YR, Riva L, Zhang T, Gruber AJ, Steele CD, Otlu B, Khandekar A, Abbasi A, Humphreys L, Syulyukina N, Brady SW, Alexandrov BS, Pillay N, Zhang J, Adams DJ, Martincorena I, Wedge DC, Landi MT, Brennan P, Stratton MR, Rozen SG, and Alexandrov LB (2022) Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics. doi: 10.1016/j.xgen.2022.100179.
Copyright
This software and its documentation are copyright 2018 as a part of the sigProfiler project. The SigProfilerExtractor framework is free software and is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Contact Information
Please address any queries or bug reports to Mark Barnes at mdbarnes@ucsd.edu or Marcos Díaz-Gay at mdiazgay@ucsd.edu.