sideretro

A pipeline for detecting Somatic Insertion of DE novo RETROcopies

https://github.com/galantelab/sideretro

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary

Keywords

cnv genotype mobile-elements next-generation-sequencing polimorphism pseudogenes retrocopy wes wgs
Last synced: 6 months ago

Repository

Statistics
  • Stars: 9
  • Watchers: 3
  • Forks: 5
  • Open Issues: 1
  • Releases: 0
Topics
cnv genotype mobile-elements next-generation-sequencing polimorphism pseudogenes retrocopy wes wgs
Created almost 7 years ago · Last pushed over 1 year ago
Metadata Files
Readme, Changelog, License, Citation, Security, Authors

README.md

sideRETRO

A pipeline for detecting Somatic Insertion of DE novo RETROcopies


sideRETRO is a bioinformatic tool devoted to the detection of somatic retrocopy insertions, also known as retroCNVs, in whole-genome and whole-exome sequencing data (WGS, WES). The program was written from scratch in C and uses the HTSlib and SQLite3 libraries to handle SAM/BAM/CRAM reading and data analysis.

For full documentation, please visit https://sideretro.readthedocs.io.

Features

When detecting retrocopies, sideRETRO can annotate several other features related to each event:

  • Parental gene

The gene that underwent the retrotransposition process.

  • Genomic position

The genomic coordinates where the retrocopy integration event occurred (chromosome:start-end), including the insertion point (the expected exact position of each retrocopy insertion).

  • Strandness

The orientation of the insertion (+/-), i.e., whether the retrocopy was inserted in the leading (+) or lagging (-) DNA strand.

  • Genomic context

The context of the retrocopy integration site: whether the retrotransposition event occurred in an intergenic or an intragenic region. Intragenic events are further split into exonic and intronic according to the host gene.

  • Genotype

When multiple individuals (genomes) are analyzed, sideRETRO reports which events are found in each one, making it possible to tell whether an event is exclusive to one individual or shared across the analyzed cohort.

  • Haplotype

Our tool provides information about the ploidy of the event, i.e., whether it occurs in one or both homologous chromosomes (heterozygous or homozygous, respectively).
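The genomic context categories above can be illustrated with a small shell function that classifies a single position against a GTF annotation. This is a simplified sketch, not sideRETRO's own implementation: the function name `classify_site` is hypothetical, and it only looks at `gene` and `exon` records, ignoring strand, transcripts and overlapping genes.

```sh
# Hypothetical helper (not part of sideRETRO): classify a 1-based position
# as exonic, intronic or intergenic from the 'gene' and 'exon' records of
# a GTF file. Strand, transcripts and overlapping genes are ignored.
# Usage: classify_site ANNOTATION.gtf CHROM POS
classify_site() {
  awk -F '\t' -v chrom="$2" -v pos="$3" '
    BEGIN { p = pos + 0 }                  # force a numeric position
    /^#/ || $1 != chrom { next }           # skip comments and other chromosomes
    # GTF start/end (columns 4 and 5) are 1-based and inclusive
    $3 == "exon" && p >= $4 && p <= $5 { in_exon = 1 }
    $3 == "gene" && p >= $4 && p <= $5 { in_gene = 1 }
    END {
      if (in_exon)      print "exonic"     # inside an exon of some gene
      else if (in_gene) print "intronic"   # inside a gene but outside any exon
      else              print "intergenic" # not covered by any gene record
    }
  ' "$1"
}
```

For example, `classify_site my-annotation.gtf chr1 150000` prints one of the three labels. An annotation without `gene` records would need `transcript` records instead to detect intronic positions.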

Getting Started

Installation

The project depends on the Meson build system and Ninja to manage the configuration and compilation process. Both can be obtained from a package manager or built from source. For example, on an Ubuntu distribution:

$ sudo apt-get install python3 \
    python3-pip \
    python3-setuptools \
    python3-wheel \
    ninja-build

and then:

$ pip3 install --user meson

(or: $ sudo apt install meson)
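The build steps that follow assume that `meson` and `ninja` are available on the PATH (the Ubuntu `ninja-build` package installs the `ninja` command). A minimal POSIX sh sketch to check for them before building; the function name `check_build_tools` is hypothetical, and it does not check for libraries such as HTSlib or SQLite3.

```sh
# Hypothetical helper: report every missing command on stderr and return
# non-zero if at least one of them is not found on the PATH.
# Usage: check_build_tools [COMMAND...]   (defaults to: meson ninja)
check_build_tools() {
  [ "$#" -gt 0 ] || set -- meson ninja     # default tool list
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_build_tools` with no arguments checks for `meson` and `ninja` only.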

Finally, clone this repository:

$ git clone https://github.com/galantelab/sideRETRO.git

Inside the sideRETRO directory, run:

$ meson setup build && ninja -C build

The sider executable is placed in build/src. Optionally, install it to the system directories with:

$ sudo ninja -C build install

Usage

sideRETRO compiles to an executable called sider, which has three subcommands: process-sample, merge-call and make-vcf. The process-sample subcommand processes a list of SAM/BAM/CRAM files and captures abnormal reads that may be related to a retrocopy event. All this data is saved to a SQLite3 database, which the second step, merge-call, processes to annotate all the retrocopies found. Finally, the make-vcf subcommand generates a file in VCF format with the retrocopies and further information about them.
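The input of process-sample is a plain-text list of alignment files. Assuming one path per line, as in the `my-bam-list.txt` example, such a list can be built with `find`. This is a sketch: the function name `make_bam_list` and the directory layout are assumptions.

```sh
# Hypothetical helper (not part of sideRETRO): write the absolute paths of
# all .sam, .bam and .cram files found under DIR to OUT, one per line,
# sorted so that the list is reproducible between runs.
# Usage: make_bam_list DIR OUT
make_bam_list() {
  bam_dir=$(cd "$1" && pwd) || return 1    # resolve DIR to an absolute path
  find "$bam_dir" -type f \
    \( -name '*.sam' -o -name '*.bam' -o -name '*.cram' \) \
    | LC_ALL=C sort > "$2"
}
```

For example, `make_bam_list /path/to my-bam-list.txt` writes the list used in the steps below. Absolute paths keep the list valid regardless of the directory from which sider is run.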

```sh
# List of BAM files
$ cat 'my-bam-list.txt'
/path/to/file1.bam
/path/to/file2.bam
/path/to/file3.bam

# Run process-sample step
$ sider process-sample \
    --annotation-file='my-annotation.gtf' \
    --input-file='my-bam-list.txt'

$ ls -1
my-genome.fa
my-annotation.gtf
my-bam-list.txt
out.db

# Run merge-call step
$ sider merge-call --in-place out.db

# Run make-vcf step
$ sider make-vcf \
    --reference-file='my-genome.fa' \
    out.db
```
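The Genotype and Haplotype features can be summarized from the VCF written by make-vcf with standard tools. A minimal awk sketch, assuming the file is uncompressed and carries one genotype column per sample in the standard VCF layout (sample columns from column 10 onward, with GT as the first FORMAT key, which the VCF specification requires whenever GT is present). The function name `summarize_vcf` is hypothetical; check the exact fields sideRETRO writes against its documentation.

```sh
# Hypothetical helper (not part of sideRETRO): print, for each VCF record,
# how many samples carry the event heterozygously (e.g. 0/1) and how many
# homozygously (e.g. 1/1). Assumes sample columns start at column 10 and
# GT is the first colon-separated FORMAT key.
# Only diploid, fully called genotypes are counted.
# Usage: summarize_vcf FILE.vcf
summarize_vcf() {
  awk -F '\t' '
    /^#/ { next }                          # skip meta-information and header lines
    {
      het = 0; hom = 0
      for (i = 10; i <= NF; i++) {
        split($i, field, ":")              # field[1] is the GT value
        if (split(field[1], al, "[/|]") != 2) continue   # not diploid
        if (al[1] == "." || al[2] == ".") continue       # missing call
        if (al[1] != al[2])    het++       # two different alleles
        else if (al[1] != "0") hom++       # same non-reference allele twice
      }
      printf "%s\t%s\t%s\thet=%d\thom=%d\n", $1, $2, $3, het, hom
    }
  ' "$1"
}
```

An event whose het and hom counts add up to 1 is exclusive to one individual, while a larger total indicates an event shared across the cohort.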

Citation

If sideRETRO was useful in your research, please cite it:

@article{10.1093/bioinformatics/btaa689,
  author = {Miller, Thiago L A and Orpinelli, Fernanda and Buzzo, José Leonel L and Galante, Pedro A F},
  title = "{sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies}",
  journal = {Bioinformatics},
  year = {2020},
  month = {07},
  issn = {1367-4803},
  doi = {10.1093/bioinformatics/btaa689},
  url = {https://doi.org/10.1093/bioinformatics/btaa689},
  note = {btaa689},
}

License

This is free software, licensed under:

The GNU General Public License, Version 3, June 2007

Owner

  • Name: galantelab
  • Login: galantelab
  • Kind: organization
  • Email: pgalante@mochsl.org.br
  • Location: São Paulo - SP - Brazil

Bioinformatics Group

Citation (CITATION.cff)

cff-version: 1.2.0
message: >-
  If sideRETRO was somehow useful in your research,
  please cite it using the metadata from this file.
title: sideRETRO
type: software
authors:
  - given-names: Thiago L. A.
    email: tmiller@mochsl.org.br
    family-names: Miller
  - given-names: Fernanda Orpinelli
    email: forpinelli@mochsl.org.br
    family-names: Rego
  - email: lbuzzo@mochsl.org.br
    given-names: José Leonel L.
    family-names: Buzzo
  - given-names: Pedro A. F.
    email: pgalante@mochsl.org.br
    family-names: Galante
version: 1.0.0
date-released: 2020-08-11
url: https://github.com/galantelab/sideRETRO
preferred-citation:
  type: article
  title: >-
    sideRETRO: a pipeline for identifying somatic and
    polymorphic insertions of processed pseudogenes or
    retrocopies
  authors:
    - given-names: Thiago L. A.
      email: tmiller@mochsl.org.br
      family-names: Miller
    - given-names: Fernanda Orpinelli
      email: forpinelli@mochsl.org.br
      family-names: Rego
    - email: lbuzzo@mochsl.org.br
      given-names: José Leonel L.
      family-names: Buzzo
    - given-names: Pedro A. F.
      email: pgalante@mochsl.org.br
      family-names: Galante
  doi: 10.1093/bioinformatics/btaa689
  url: https://doi.org/10.1093/bioinformatics/btaa689
  journal: Bioinformatics
  month: 07
  year: 2020
  start: 419
  end: 421
  issue: 3
  volume: 37

GitHub Events

Total
  • Watch event: 3
  • Delete event: 3
  • Push event: 22
  • Pull request event: 4
  • Fork event: 2
  • Create event: 6
Last Year
  • Watch event: 3
  • Delete event: 3
  • Push event: 22
  • Pull request event: 4
  • Fork event: 2
  • Create event: 6

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • docker/build-push-action v3 composite
  • docker/login-action v2 composite
  • docker/setup-buildx-action v2 composite
docker/Dockerfile docker
  • ubuntu 20.04 build