https://github.com/cancerit/hairpin-wrapper

Ease-of-use wrapper around Mathijs Sanders' AdditionalBamStatistics cruciform-DNA detection algorithm

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: nature.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: cancerit
  • License: agpl-3.0
  • Language: Python
  • Default Branch: develop
  • Size: 3.59 MB
Statistics
  • Stars: 0
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Archived
Created over 3 years ago · Last pushed 11 months ago

https://github.com/cancerit/hairpin-wrapper/blob/develop/

**THIS PROJECT IS NOW ARCHIVED** - we strongly recommend using its successor, [hairpin2](https://github.com/cancerit/hairpin2) - a complete reimplementation, and extension, of the flagging described here.

# hairpin wrapper

Project for a centrally supported wrapper of Mathijs Sanders' hairpin flagging algorithms, known as "Mathijs' scripts"

### Background

CLI for statistical detection and flagging of variants caused by hairpin/cruciform artifacts in LCM sequencing.

Previously, this task was performed with scattered, divergent versions of a small pipeline built by Mathijs and extended by others over many years. This repository brings together the components of that pipeline, strips out the extraneous functionality, and packages them into a single command for ease of use.

The components are inherited and remain in their original form - these are `runScriptImitateANNOVAR.sh` and `additionalBamStatistics.jar`. The parent repositories can be found [here](https://github.com/MathijsSanders/SangerLCMFiltering) and [here](https://github.com/MathijsSanders/AdditionalBAMStatistics) respectively. There is an associated paper [here](https://www.nature.com/articles/s41596-020-00437-6#Sec31); information on the calculated statistics can be found in the 'SNV filtering' section.

The components that have been removed or pared down either produced statistics that are not used for hairpin detection, or provided functionality that would be better placed elsewhere, such as prefiltering of input VCFs according to CPLM.

### Requirements

- **Java** >= 8
- **Python** >= 3.10
- **samtools** == 1.14
- **pysam** == 0.19.1
- **vcfpy** == 0.13.4

### Installation

Clone the repository, cd into bin/, and run the following to install into a virtual environment:

```
python3.10 -m venv pyenv
source pyenv/bin/activate
pip install -r requirements.txt
deactivate
```

### Usage

```
Usage: hairpin \
[ Mandatory ]  -v    input VCF \
               -b    BAM file corresponding to VCF \
               -g    path to reference genome fasta \
[ Optional ]   -o    output directory (defaults to current working dir) \
               -m    set java heap memory (default 10G) \
               -h    display usage \
```

You must have the .bai, .bas, and .met.gz files associated with the .bam file in the same directory as the .bam file specified with `-b`.

The tool will output the input VCF file updated with annotations indicating potential hairpin artifacts.

### Issues

"Script has failed"

If the core component of the hairpin process fails, it will usually output this cryptic error. Unfortunately, the error is raised by external code and we cannot make it more informative at this time. It is most likely caused by running the script without enough memory available.

### LICENSE

```
Copyright (c) 2023 Genome Research Ltd.

Author: CASM/Cancer IT

This file is part of hairpin-wrapper.

hairpin-wrapper is free software: you can redistribute it and/or modify it under
the terms of the GNU Affero General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your option) any
later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see .
```
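
### Example: checking inputs before a run

The Usage section requires the .bai, .bas, and .met.gz files to sit beside the BAM file, and a missing file may only surface later as the uninformative "Script has failed" error. The following is a minimal sketch of a pre-flight check, not part of this repository. It assumes the companion files are named by appending the suffix to the full BAM file name (for example `sample.bam.bai`), which may not match your data. The helper names `missing_companions`, `build_hairpin_command`, and `run_hairpin` are hypothetical, and only the flags listed in the usage text are used.

```python
import subprocess
from pathlib import Path

# Assumption: companion files are named <bam file name> + suffix,
# e.g. sample.bam.bai. Adjust if your files follow another convention.
COMPANION_SUFFIXES = (".bai", ".bas", ".met.gz")


def missing_companions(bam: Path) -> list[Path]:
    """Return the expected companion files that are absent next to the BAM."""
    expected = [bam.with_name(bam.name + suffix) for suffix in COMPANION_SUFFIXES]
    return [path for path in expected if not path.exists()]


def build_hairpin_command(vcf, bam, fasta, outdir=None) -> list[str]:
    """Assemble the argument list from the flags shown in the usage text."""
    command = ["hairpin", "-v", str(vcf), "-b", str(bam), "-g", str(fasta)]
    if outdir is not None:
        command += ["-o", str(outdir)]
    return command


def run_hairpin(vcf, bam, fasta, outdir=None) -> None:
    """Check for companion files, then invoke hairpin (assumed to be on PATH)."""
    missing = missing_companions(Path(bam))
    if missing:
        names = ", ".join(path.name for path in missing)
        raise FileNotFoundError(f"missing companion files: {names}")
    # check=True raises CalledProcessError if hairpin exits non-zero.
    subprocess.run(build_hairpin_command(vcf, bam, fasta, outdir), check=True)
```

Calling `run_hairpin("calls.vcf", "sample.bam", "genome.fa", outdir="results")` would then either raise early with the names of the missing files or hand over to the `hairpin` command. The file names in that call are placeholders.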

Owner

  • Name: CASM IT
  • Login: cancerit
  • Kind: organization
  • Email: cgpit@sanger.ac.uk
  • Location: Hinxton, Cambridge, UK

CASM IT provides bioinformatics support for the Cancer, Ageing and Somatic Mutation (CASM) group at the Wellcome Sanger Institute.

Dependencies

requirements.txt pypi
  • pysam ==0.19.1
  • vcfpy ==0.13.4