https://github.com/cancerit/cgpcavemanpostprocessing
Flagging add on to CaVEMan
Repository
Basic Info
- Host: GitHub
- Owner: cancerit
- License: agpl-3.0
- Language: Perl
- Default Branch: dev
- Homepage: http://cancerit.github.io/cgpCaVEManPostProcessing/
- Size: 4.25 MB
Statistics
- Stars: 1
- Watchers: 14
- Forks: 4
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
cgpCaVEManPostProcessing
cgpCaVEManPostProcessing is used to apply filtering on raw VCF calls generated using CaVEMan.
For details of the underlying algorithm please see the CaVEMan site.
Docker, Singularity and Dockstore
cgpCaVEManPostProcessing is available as a separate Docker image on quay.io and as part of the pre-built full analysis images on quay.io:
- dockstore-cgpwxs
  - Contains tools specific to WXS analysis.
- dockstore-cgpwgs
  - Contains additional tools for WGS analysis.
These were primarily designed for use with dockstore.org but can be used as normal containers.
Usage
More detailed instructions can be found on the wiki
As of version 1.10.0 cgpCaVEManPostProcessing has new WXS flags available that are not used by default. These were developed in conjunction with the Dermatlas project and caution is advised when using them.
- cavemanMatchNormalProportionFlag
- withinGapRangeFlag
Full flag definitions can be found here
Flags can be tuned by modifying their parameters in the species ini file. Human example is here. The parameter names correspond to names in the flag descriptions. Further details of flags and parameters are available in the wiki.
Flagging CaVEMan Files
Flag/Post Process CaVEMan files.
```bash
cgpFlagCaVEMan.pl [-h] -f vcfToFlag.vcf -o flaggedVCF.vcf -c configFile.yaml -s human -t pulldown -v vcfFlagNames.ini -n norm.bam -m tum.bam [-u unmatchedStore.tmp]

  General Options:

    --help (-h)                 Brief documentation
    --version (-version)        Output the version number and exit
    --input (-i)                The VCF input file to flag.
    --outFile (-o)              The VCF output file to write.
    --species (-s)              Species associated with this vcf file to use.
    --species-assembly (-sa)    Species assembly for (output in VCF)
    --tumBam (-m)               Tumour bam file
    --normBam (-n)              Normal bam file
    --bedFileLoc (-b)           Path to a folder containing the centromeric, snp, hi sequence depth,
                                and simple repeat sorted (gzipped and tabixed) bed files (if required) i.e. the non annotation bed files.
                                Names of files will be taken from the config file.
    --indelBed (-g)             A bed file containing germline indels to filter on
    --unmatchedVCFLoc (-umv)    Path to a directory containing the unmatched VCF normal files listed in the
                                config file or unmatchedNormal.bed.gz (bed file is used in preference).
    --annoBedLoc (-ab)          Path to bed files containing annotatable regions and coding regions.
    --reference (-ref)          Reference index (fai) file corresponding to the mapping of the data being processed.
                                (must have corresponding fasta file co-located)
    --index (-idx)              Index of the job (to override LSB_JOBINDEX as used on LSF farms)
    --verbose

  OPTIONAL:

    --sampleToIgnoreInUnmatched (-sp)  Unmatched normal to ignore (to be used if the sample is one of those with a normal in the panel).
    --processid (-p)            Id analysis process to be added at a CGP specific header.
    --flagConfig (-c)           Config ini file to use for flag list and settings.
    --flagToVcfConfig (-v)      Config::Inifiles style config file containing VCF flag code to flag name conversions see
                                ../config/flag.to.vcf.convert.ini for example
    --studyType (-t)            Study type, used to decide parameters in file (genome|genomic|WGS|pulldown|exome|WXS|followup|AMPLICON|targeted|RNA_seq).

  Examples:

    cgpFlagCaVEMan.pl [-h] -f vcfToFlag.vcf -o flaggedVCF.vcf -c configFile.ini -s human -t pulldown
```
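Once a VCF has been flagged, it can be useful to see how many calls each flag caught. The helper below is a minimal sketch, not part of cgpCaVEManPostProcessing. It assumes the flagged file follows the VCF specification, with flag IDs written to the FILTER column (column 7) as `PASS` or a `;`-separated list. The function name `summarise_filters` is hypothetical.

```bash
# Sketch: tally FILTER values (column 7) of a VCF read from stdin.
# Assumption: flags appear in FILTER as PASS or ';'-separated IDs.
summarise_filters() {
  awk -F '\t' '
    /^#/ { next }                      # skip meta and header lines
    {
      n = split($7, ids, ";")          # a record may carry several flags
      for (i = 1; i <= n; i++) count[ids[i]]++
    }
    END { for (id in count) printf "%s\t%d\n", id, count[id] }
  ' | sort                             # stable, alphabetical output
}
```

With an output file named as in the synopsis above, this could be run as `summarise_filters < flaggedVCF.vcf`. Each output line is a flag ID and the number of records carrying it, so a record with two flags is counted once under each.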
Utility Scripts
cavemanPostProcessing_ini_to_yaml.pl
Convert old .ini file to .yaml format.
```bash
cavemanPostProcessing_ini_to_yaml.pl [-h] -f flag_config.ini -o flag_config.yml
General Options:
--help (-h) Brief documentation
--version (-version) Output the version number and exit
--input (-i) The .ini config file to convert.
--outfile (-o) The .yaml config file to write.
Examples:
cavemanPostProcessing_ini_to_yaml.pl [-h] -f flag_config.ini -o flag_config.yml
```
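When several old config files need converting, the output name can be derived from the input name. The following is a sketch only: `ini_to_yaml_cmd` is a hypothetical helper that prints the command rather than running it, and it copies the `-f`/`-o` flags from the Examples line above.

```bash
# Sketch: print (do not run) the conversion command for one .ini file.
# The output name swaps the .ini suffix for .yml, as in the example above.
ini_to_yaml_cmd() {
  ini=$1
  yml="${ini%.ini}.yml"              # flag_config.ini -> flag_config.yml
  printf 'cavemanPostProcessing_ini_to_yaml.pl -f %s -o %s\n' "$ini" "$yml"
}
```

Looping over a directory, for example `for f in config/*.ini; do ini_to_yaml_cmd "$f"; done`, lists the commands for review before any of them is executed. The `config/` path here is a placeholder.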
Dependencies/Install
Please ensure the following packages are available. Alternatively use the Docker image.
- cgpVcf
- Bio::DB::HTS
  - If you have an install of PCAP-core this is already available
Once complete please run:
```bash
./setup.sh /some/install/location
```
setup.sh will also install bedtools for you.
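Before running the installer, it may help to confirm that the Perl prerequisites listed above can be loaded. This is a minimal sketch and not something the repository provides: `check_perl_module` is a hypothetical helper that asks the current `perl` to load a module and reports the result.

```bash
# Sketch: report whether the current perl can load a given module.
# Prints "OK <module>" or "MISSING <module>"; the return status matches.
check_perl_module() {
  if perl -M"$1" -e 1 >/dev/null 2>&1; then
    printf 'OK %s\n' "$1"
  else
    printf 'MISSING %s\n' "$1"
    return 1
  fi
}
```

For example, `check_perl_module Bio::DB::HTS` checks the second prerequisite. For cgpVcf the module name to test is an assumption (probably `Sanger::CGP::Vcf`), so check it against that package's own documentation.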
Creating a release
Preparation
- Commit/push all relevant changes.
- Pull a clean version of the repo and use this for the following steps.
Cutting the release
- Update `lib/Sanger/CGP/CavemanPostProcessor.pm` to the correct version.
- Update `CHANGES.md`.
- Run `./prerelease.sh`.
- Check all tests and coverage reports are acceptable.
- Commit the updated docs tree and updated module/version.
- Push commits.
- Use the GitHub tools to draft a release.
LICENCE
```txt
Copyright (c) 2014-2018 Genome Research Ltd.

Author: CASM/Cancer IT cgphelp@sanger.ac.uk

This file is part of cgpCaVEManPostProcessing.

cgpCaVEManPostProcessing is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.

- The usage of a range of years within a copyright statement contained within this distribution should be interpreted as being equivalent to a list of years including the first and last year specified and all consecutive years between them. For example, a copyright statement that reads ‘Copyright (c) 2005, 2007- 2009, 2011-2012’ should be interpreted as being identical to a statement that reads ‘Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012’ and a copyright statement that reads ‘Copyright (c) 2005-2012’ should be interpreted as being identical to a statement that reads ‘Copyright (c) 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012’.
```
Owner
- Name: CASM IT
- Login: cancerit
- Kind: organization
- Email: cgpit@sanger.ac.uk
- Location: Hinxton, Cambridge, UK
- Website: http://www.sanger.ac.uk/science/programmes/cancer-genetics-and-genomics
- Repositories: 89
- Profile: https://github.com/cancerit
CASM IT provides bioinformatic support for the Cancer, Ageing and Somatic Mutation group at the Wellcome Sanger Institute.