https://github.com/alexandrovlab/hrprofiler
Homologous Recombination Profiler (HRProfiler) is a classification tool that predicts the HRD probability for both whole-genome and exome-sequenced samples using HRD-specific mutation and copy number features.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
HRProfiler
Homologous Recombination Profiler (HRProfiler) is a classification tool that predicts homologous recombination deficiency (HRD) status for both whole-genome and exome-sequenced breast and ovarian samples using HRD-specific mutation and copy number features.
Installation
```bash
pip install HRProfiler
```
Run HRProfiler
From within a Python session (for example, after starting `python3`), you can run HRProfiler as follows:

```python
from HRProfiler.scripts import HRProfiler as HR

HR.HRProfiler(datamatrix, genome, exome, INDELSDIR, SNVDIR, CNVDIR, RESULTDIR, cnvfiletype,
              bootstrap, nreplicates, normalize, hrdprobthresh, plot_predictions, organ)
```

The parameters are as follows:

|Parameters |Info |
|:-------------|:-------------|
| datamatrix | An optional pandas DataFrame of precomputed HRD features containing the required feature columns listed below, in the same order. Set to None to extract the features from the input files instead. |
| genome | Genome build of the SNV and indel input files. Options include: GRCh37 and GRCh38 (default: GRCh38) |
| exome | Whether the input data are exome-sequenced (default: True) |
| SNVDIR | Directory path to SNV VCF/tab-delimited files (default: None) |
| INDELSDIR | Directory path to indel VCF/tab-delimited files. If SNVDIR contains mutation files with both indels and SNVs, set INDELSDIR=None (default: None) |
| CNVDIR | Directory path to allele-specific segmentation files (default: None) |
| RESULTDIR | Path to the directory where HRProfiler will save the output and log files (default: None) |
| cnvfiletype | File type of the provided CNV files. Options include: 'ASCAT', 'ASCAT-NGS', 'ABSOLUTE', 'BATTENBERG', 'SEQUENZA' (default: 'ASCAT') |
| bootstrap | Simulate features per sample based on the sample-weighted probability (default: False) |
| nreplicates | Number of replicates to simulate per sample (default: 20) |
| normalize | Normalize each feature column by a pre-defined mean and standard deviation (default: True) |
| hrdprobthresh | HRD probability threshold for classifying a sample as HRD (default: 0.5) |
| plot_predictions | Plot a histogram of the HRD probabilities for all samples (default: True) |
| organ | Organ type for prediction. Options include 'BREAST' and 'OVARIAN' (default: 'BREAST') |

Required feature columns for datamatrix:

- N[C>T]G: Proportion of C:G>T:A single base substitutions in a 5’-NpCpG-3’ context
- N[C>G]T: Proportion of C:G>G:C single base substitutions in a 5’-NpCpT-3’ context
- DEL:5:MH: Proportion (WGS) or counts (WES) of deletions of at least 5 bp at microhomologies
- LOH:1-40Mb: Proportion of genomic segments with loss of heterozygosity (LOH) with sizes between 1 and 40 megabases
- 2-4:HET:>40Mb: Proportion of heterozygous genomic segments with total copy number (TCN) between 2 and 4 and sizes above 40 megabases
- 3-9:HET:10-40Mb: Proportion of heterozygous genomic segments with TCN between 3 and 9 and sizes between 10 and 40 megabases
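The two SBS features are fractions of a sample's single base substitutions. The sketch below shows one way to compute them, assuming the substitutions have already been tallied into the standard 96-channel SBS format with pyrimidine-centred labels such as `A[C>T]G`, and assuming the denominator is the sample's total SBS count. `sbs_feature_proportions` is a hypothetical helper for illustration, not part of HRProfiler's API, and HRProfiler's own feature extraction may differ in detail.

```python
from typing import Dict

BASES = "ACGT"


def sbs_feature_proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Compute N[C>T]G and N[C>G]T proportions from 96-channel SBS counts.

    Hypothetical helper (assumption): `counts` maps 96-channel labels such as
    'A[C>T]G' to mutation counts; channels missing from the dict count as 0,
    and the denominator is the total of all counts supplied.
    """
    total = sum(counts.values())
    if total == 0:
        return {"NCTG": 0.0, "NCGT": 0.0}
    # C:G>T:A substitutions followed by a 3' G (5'-NpCpG-3'), summed over any 5' base N
    ctg = sum(counts.get(f"{n}[C>T]G", 0) for n in BASES)
    # C:G>G:C substitutions followed by a 3' T (5'-NpCpT-3'), summed over any 5' base N
    cgt = sum(counts.get(f"{n}[C>G]T", 0) for n in BASES)
    return {"NCTG": ctg / total, "NCGT": cgt / total}


if __name__ == "__main__":
    example = {"A[C>T]G": 6, "T[C>T]G": 4, "C[C>G]T": 5, "G[T>A]C": 35}
    print(sbs_feature_proportions(example))
```

Summing over the four possible 5' bases is what the "N" in N[C>T]G and N[C>G]T denotes.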
Examples
1. Extract HRD features and predict HRD status for samples with input vcf and copy number files:
To determine the HRD status of samples with SNV and indel VCF files and allele-specific copy number calls, run the following example, which uses the input files provided for five breast WGS samples.
```python
# Note: cd to the parent directory that contains the HRProfiler folder before executing the command
from HRProfiler.scripts import HRProfiler as HR

HR.HRProfiler(datamatrix=None, genome='GRCh37', exome=False,
              INDELSDIR='./HRProfiler/example/input/indels',
              SNVDIR='./HRProfiler/example/input/mutations/',
              CNVDIR='./HRProfiler/example/input/copynumber/',
              RESULTDIR='./HRProfiler/example/outputexample1/',
              cnvfiletype='ASCAT', bootstrap=False, nreplicates=20, normalize=True,
              hrdprobthresh=0.5, plot_predictions=True, organ='BREAST')
```
2. Predict HRD status for samples with preprocessed HRD features:
To determine the HRD status of samples with precomputed HRD features ['NCTG', 'NCGT', 'DEL5MH', 'LOH.1.40Mb', '3-9:HET.10.40Mb', '2-4:HET.40Mb'], run the following command:
```python
# Note: cd to the parent directory that contains the HRProfiler folder before executing the command.
import pandas as pd
from HRProfiler.scripts import HRProfiler as HR

datamatrix = pd.read_csv('./HRProfiler/example/input/exampledatamatrix.txt', sep="\t")
HR.HRProfiler(datamatrix=datamatrix, genome='GRCh37', exome=False,
              INDELSDIR=None, SNVDIR=None, CNVDIR=None,
              RESULTDIR='./HRProfiler/example/outputforexample2/',
              cnvfiletype='ASCAT', bootstrap=False, nreplicates=20, normalize=True,
              hrdprobthresh=0.5, plot_predictions=True, organ='BREAST')
```
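When building a datamatrix yourself, a quick check that all six feature columns are present and in the order listed for example 2 can catch mistakes before calling HRProfiler. `check_datamatrix` below is a hypothetical helper, not part of HRProfiler; it assumes those exact column names, and it keeps any other columns (for example a sample-name column) in front of the features, since how HRProfiler expects sample names to be supplied is not specified here.

```python
import pandas as pd

# Assumption: feature names and order as listed for example 2.
FEATURES = ["NCTG", "NCGT", "DEL5MH", "LOH.1.40Mb", "3-9:HET.10.40Mb", "2-4:HET.40Mb"]


def check_datamatrix(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with the feature columns in the expected order.

    Hypothetical helper: raises ValueError if any feature column is missing;
    non-feature columns are kept, placed before the features.
    """
    missing = [col for col in FEATURES if col not in df.columns]
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    others = [col for col in df.columns if col not in FEATURES]
    return df[others + FEATURES].copy()


if __name__ == "__main__":
    df = pd.DataFrame({
        "samples": ["sample1"],
        "2-4:HET.40Mb": [0.1], "NCTG": [0.2], "NCGT": [0.1],
        "DEL5MH": [0.3], "LOH.1.40Mb": [0.4], "3-9:HET.10.40Mb": [0.05],
    })
    print(list(check_datamatrix(df).columns))
```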
HRProfiler Output
HRProfiler generates a histogram of the per-sample HRD probabilities and a tab-delimited table with the following columns:
|Columns |Info |
|:------------- |:-------------|
| samples | sample names |
| NCTG | Proportion of C:G>T:A single base substitutions at 5’-NpCpG-3’ context |
| NCGT | Proportion of C:G>G:C single base substitutions at 5’-NpCpT-3’ context |
| DEL5MH | Proportion or total counts of deletions spanning at least 5bp at microhomologies |
| LOH.1.40Mb | Proportion of genomic segments with loss of heterozygosity (LOH) with sizes between 1 and 40 megabases |
| 3-9:HET.10.40Mb | Proportion of heterozygous genomic segments with TCN between 3 and 9 and sizes between 10 and 40 megabases |
| 2-4:HET.40Mb | Proportion of heterozygous genomic segments with TCN between 2 and 4 and sizes above 40 megabases |
|hrd.prob | HRD Probability |
|prediction | HRD status |
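The copy number columns are proportions of allele-specific segments that fall into a given size and copy number class. As a rough sketch, the three features could be computed as below, assuming an ASCAT-style segment table with `start`, `end`, `nMajor` and `nMinor` columns, taking all of the sample's segments as the denominator, treating the 1-40 Mb and 10-40 Mb bins as inclusive and ">40 Mb" as strictly greater, and defining LOH as `nMinor == 0` with `nMajor > 0`. These conventions are assumptions, and `cnv_feature_proportions` is a hypothetical helper; HRProfiler's exact segment filtering and bin boundaries may differ.

```python
import pandas as pd

MB = 1_000_000


def cnv_feature_proportions(segments: pd.DataFrame) -> dict:
    """Hypothetical sketch of the three copy number features for one sample.

    Assumes one row per allele-specific segment with integer columns
    'start', 'end', 'nMajor' and 'nMinor' (ASCAT-style).
    """
    n = len(segments)
    if n == 0:
        return {"LOH.1.40Mb": 0.0, "3-9:HET.10.40Mb": 0.0, "2-4:HET.40Mb": 0.0}
    size = segments["end"] - segments["start"]
    tcn = segments["nMajor"] + segments["nMinor"]  # total copy number
    # LOH (assumed definition): minor allele lost, at least one copy remaining
    loh = (segments["nMinor"] == 0) & (segments["nMajor"] > 0)
    # Heterozygous: both alleles present
    het = (segments["nMajor"] > 0) & (segments["nMinor"] > 0)
    return {
        "LOH.1.40Mb": float((loh & size.between(1 * MB, 40 * MB)).sum() / n),
        "3-9:HET.10.40Mb": float(
            (het & tcn.between(3, 9) & size.between(10 * MB, 40 * MB)).sum() / n
        ),
        "2-4:HET.40Mb": float((het & tcn.between(2, 4) & (size > 40 * MB)).sum() / n),
    }


if __name__ == "__main__":
    segs = pd.DataFrame({
        "chromosome": ["1", "2", "3", "4"],
        "start": [1, 1, 1, 1],
        "end": [5_000_001, 20_000_001, 60_000_001, 500_001],
        "nMajor": [2, 3, 1, 1],
        "nMinor": [0, 1, 1, 0],
    })
    print(cnv_feature_proportions(segs))
```

In this toy table each of the four segments lands in at most one class, so each feature comes out as one quarter; the 0.5 Mb LOH segment is too short to count.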
Additional columns provided if bootstrap=True:

|Columns |Info |
|:------------- |:-------------|
|mean.hrd.prob | Average HRD probability across all replicates |
|LCI.95.hrd.prob | Lower confidence interval (2.5%) HRD probability |
|UCI.95.hrd.prob | Upper confidence interval (97.5%) HRD probability |
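With bootstrap=True, each sample gets several simulated replicates, and the extra columns summarize their HRD probabilities. A minimal sketch of that summary, assuming the per-replicate probabilities are available as a list and that the 95% interval is taken from the 2.5th and 97.5th percentiles as the column descriptions state, is shown below. `summarize_replicates` is hypothetical, and comparing the mean with the threshold via `>=` is an assumed decision rule, not confirmed HRProfiler behaviour.

```python
import numpy as np


def summarize_replicates(replicate_probs, threshold: float = 0.5) -> dict:
    """Summarize bootstrap HRD probabilities for one sample (hypothetical helper).

    Returns the mean and the 2.5th/97.5th percentiles of the replicate
    probabilities, mirroring mean.hrd.prob, LCI.95.hrd.prob and
    UCI.95.hrd.prob, plus an HRD call from comparing the mean with
    `threshold` (assumed rule).
    """
    probs = np.asarray(replicate_probs, dtype=float)
    mean = float(probs.mean())
    lci, uci = np.percentile(probs, [2.5, 97.5])  # linear interpolation by default
    return {
        "mean.hrd.prob": mean,
        "LCI.95.hrd.prob": float(lci),
        "UCI.95.hrd.prob": float(uci),
        "is_hrd": mean >= threshold,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 20 made-up replicate probabilities, matching the default nreplicates=20
    print(summarize_replicates(rng.beta(8, 2, size=20)))
```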
License
Academic Software License: © 2024 University of California, San Diego (“Institution”). Academic or nonprofit researchers are permitted to use this Software (as defined below) subject to Paragraphs 1-4:
1. Institution hereby grants to you free of charge, so long as you are an academic or nonprofit researcher, a nonexclusive license under Institution’s copyright ownership interest in this software and any derivative works made by you thereof (collectively, the “Software”) to use, copy, and make derivative works of the Software solely for educational or academic research purposes, and to distribute such Software free of charge to other academic or nonprofit researchers for their educational or academic research purposes, in all cases subject to the terms of this Academic Software License. Except as granted herein, all rights are reserved by Institution, including the right to pursue patent protection of the Software.
2. Any distribution of copies of this Software -- including any derivative works made by you thereof -- must include a copy (including the copyright notice above), and be made subject to the terms, of this Academic Software License; failure by you to adhere to the requirements in Paragraphs 1 and 2 will result in immediate termination of the license granted to you pursuant to this Academic Software License effective as of the date you first used the Software.
3. IN NO EVENT WILL INSTITUTION BE LIABLE TO ANY ENTITY OR PERSON FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE, EVEN IF INSTITUTION HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. INSTITUTION SPECIFICALLY DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS AND IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE IS PROVIDED “AS IS.” INSTITUTION HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS OF THIS SOFTWARE.
4. Any academic or scholarly publication arising from the use of this Software or any derivative works thereof will include the following acknowledgment: The Software used in this research was created by Alexandrov Lab of University of California, San Diego. © 2022 University of California, San Diego.
Owner
- Name: Alexandrov Lab
- Login: AlexandrovLab
- Kind: organization
- Email: l-alexandrov-lab@UCSD.EDU
- Location: La Jolla, CA
- Website: http://alexandrov.ucsd.edu/
- Repositories: 12
- Profile: https://github.com/AlexandrovLab
GitHub Events
Total
- Issues event: 2
- Watch event: 3
- Delete event: 4
- Member event: 1
- Issue comment event: 2
- Push event: 3
- Pull request event: 6
Last Year
- Issues event: 2
- Watch event: 3
- Delete event: 4
- Member event: 1
- Issue comment event: 2
- Push event: 3
- Pull request event: 6
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 15 minutes
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 15 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- lordaaa999 (1)
Pull Request Authors
- ammalabbasi (3)
Dependencies
- SigProfilerMatrixGenerator >=1.2.23
- joblib >=0.16.0
- scikit-learn >=1.1.3
- scikit-plot ==0.3.7
- seaborn *
- sigProfilerPlotting >=1.3.16