LTRpred

LTRpred: de novo annotation of intact retrotransposons - Published in JOSS (2020)

https://github.com/drostlab/LTRpred

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org, ncbi.nlm.nih.gov, nature.com, plos.org, mdpi.com, joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

diversification evolution genome ltr ltr-retrotransposons ltr-transposons pipeline
Last synced: 6 months ago

Repository

De novo annotation of young retrotransposons

Basic Info
Statistics
  • Stars: 48
  • Watchers: 4
  • Forks: 9
  • Open Issues: 5
  • Releases: 1
Created about 10 years ago · Last pushed almost 4 years ago

README.md

LTRpred(ict): de novo annotation of young and intact retrotransposons


Transposable elements (TEs) comprise vast parts of eukaryotic genomes. In the past, TEs were seen as selfish mobile elements capable of populating a host genome to increase their chances for survival. By doing so they leave traces of junk DNA in host genomes that are usually regarded as by-products when sequencing, assembling, and annotating new genomes.

However, this picture is slowly changing (Drost & Sanchez, 2019) and TEs have been shown to be involved in generating a diverse range of novel phenotypes.

Today, the de novo detection of transposable elements is performed by annotation tools which try to detect any type of repeated sequence, TE family, or remnant DNA locus that can be associated with a known transposable element within a genome assembly. The main goal of such efforts is to retrieve the maximum number of loci that can be associated with TEs. If successful, such an annotation can then be used to mask host genomes and to perform classic (phylo-)genomics studies focusing on host genes.

More than 600 repeat and TE annotation tools have been developed so far. Most of them are designed and optimized to annotate either the entire repeat space or specific superfamilies of TEs and their DNA remnants.

The LTRpred pipeline has a different goal from these annotation tools. It focuses specifically on LTR retrotransposons and aims to annotate only functional and potentially mobile elements. This type of annotation is crucial for studying retrotransposon activity in eukaryotic genomes and for understanding whether specific retrotransposon families can be activated artificially and harnessed to mutagenize genomes at much faster speed.

In detail, LTRpred will take any genome assembly file in fasta format as input and will generate a detailed annotation of functional and potentially mobile LTR retrotransposons.

Users can consult a comprehensive Introduction to the LTRpred pipeline to get familiar with the tool.

Install

The fastest way to install LTRpred is via a Docker container. Please make sure to read the detailed installation instructions to be able to pass data to the container.

```bash
# retrieve docker image from dockerhub
docker pull drostlab/ltrpred

# run ltrpred container
docker run --rm -ti drostlab/ltrpred

# start R prompt within ltrpred container
~:/app# R
```
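To pass data to the container, Docker's bind-mount option `-v` can be used. The following is only a minimal sketch and not the project's documented procedure: the host directory `genomes` and the mount point `/app/data` are assumptions chosen for illustration, so please check the detailed installation instructions for the recommended setup. The snippet only prints the command so that it can be reviewed before it is run.

```bash
# Hypothetical host directory containing your genome FASTA files
# (avoid spaces in the path for this simple sketch).
HOST_DATA_DIR="${HOST_DATA_DIR:-$PWD/genomes}"

# Assumed mount point inside the container (not taken from the LTRpred docs).
CONTAINER_DATA_DIR="/app/data"

# -v HOST:CONTAINER bind-mounts the host directory into the container.
LTRPRED_CMD="docker run --rm -ti -v ${HOST_DATA_DIR}:${CONTAINER_DATA_DIR} drostlab/ltrpred"

# Print the command for review; paste it into your shell to start the container.
echo "${LTRPRED_CMD}"
```

Under these assumptions, files placed in the host directory would be visible under `/app/data` inside the container and could be passed to `genome.file`.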

Users who wish to run the LTRpred Docker container in a conda environment can use the following approach based on UDocker (many thanks to Ilja Bezrukov).

Accessing LTRpred Container via RStudio

A more interactive way of performing analyses with LTRpred is via the RStudio version of LTRpred. With this Docker container, users can access LTRpred through RStudio running inside the container.

```bash
# retrieve docker image from dockerhub
docker pull drostlab/ltrpred_rstudio

# run ltrpred container
docker run -e PASSWORD=ltrpred --rm -p 8787:8787 -ti drostlab/ltrpred_rstudio
```

To open RStudio and interact with the container, open your web browser and enter the following URL and credentials:

```
http://localhost:8787

Username: rstudio
Password: ltrpred
```

Users can choose a custom password if they wish.
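As a minimal sketch of choosing a custom password: the password is whatever value is passed to the `PASSWORD` environment variable in the `docker run` command shown earlier. The variable name `RSTUDIO_PASSWORD` and its default value below are hypothetical and used only for illustration. The snippet prints the resulting command so that it can be reviewed before it is run.

```bash
# Hypothetical password; replace it with your own choice.
RSTUDIO_PASSWORD="${RSTUDIO_PASSWORD:-my-own-password}"

# Same docker run command as in the README, with PASSWORD set to the custom value.
RSTUDIO_CMD="docker run -e PASSWORD=${RSTUDIO_PASSWORD} --rm -p 8787:8787 -ti drostlab/ltrpred_rstudio"

# Print the command for review; paste it into your shell to start the container.
echo "${RSTUDIO_CMD}"
```

The login at http://localhost:8787 then uses the username `rstudio` together with the chosen password.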

Within RStudio you can now run the example:

```r
LTRpred::LTRpred(genome.file = system.file("Hsapiens_ChrY.fa", package = "LTRpred"))
```

Users can exit the container by pressing Ctrl + c multiple times.

Please find all details about how to use the RStudio version here.

Citation

Please cite the following paper when using LTRpred for your own research:

HG Drost. LTRpred: de novo annotation of intact retrotransposons. Journal of Open Source Software, 5(50), 2170 (2020).

Tutorials

Quick Start

The fastest way to generate an LTR retrotransposon prediction for a genome of interest (after installing all prerequisite command line tools) is to use the LTRpred() function with its default parameters. In the following example, an LTR transposon prediction is performed for parts of the human Y chromosome.

```r
# Perform de novo LTR transposon prediction for the Human Y chromosome
LTRpred::LTRpred(genome.file = system.file("Hsapiens_ChrY.fa", package = "LTRpred"))
```

When running LTRpred on your own genome, please specify genome.file = "path/to/your/genome.fasta" instead of system.file(..., package = "LTRpred"). The command system.file(..., package = "LTRpred") merely returns the path to the example file stored in the LTRpred package itself.

This tutorial introduces users to LTRpred:

Users can also read the tutorials within R (or RStudio):

```r
library(LTRpred)
browseVignettes("LTRpred")
```

You can also find a list of all available LTRpred functions here: https://hajkd.github.io/LTRpred/reference/index.html

Studies that successfully used LTRpred to annotate functional retrotransposons

Discussions and Bug Reports

I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.

Furthermore, in case you find some bugs or need additional (more flexible) functionality of parts of this package, please let me know:

https://github.com/HajkD/LTRpred/issues

Owner

  • Name: Drost Laboratory - Digital Biology Research Group
  • Login: drostlab
  • Kind: organization
  • Location: United Kingdom

School of Life Sciences, University of Dundee

GitHub Events

Total
  • Member event: 5
Last Year
  • Member event: 5

Committers


All Time
  • Total Commits: 1,092
  • Total Committers: 3
  • Avg Commits per committer: 364.0
  • Development Distribution Score (DDS): 0.003
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| HajkD | h****t@g****m | 1,089 |
| Hajk-Georg Drost | h****t@a****l | 2 |
| Arfon Smith | a****n | 1 |

Dependencies

DESCRIPTION cran
  • R >= 3.1.1 depends
  • BSDA * imports
  • Biostrings * imports
  • GenomeInfoDb * imports
  • GenomicRanges * imports
  • IRanges * imports
  • R.utils * imports
  • RColorBrewer * imports
  • ape * imports
  • biomartr >= 0.5.1 imports
  • downloader * imports
  • dplyr >= 0.3.0.2 imports
  • ggplot2 * imports
  • ggrepel * imports
  • gridExtra * imports
  • magrittr * imports
  • methods * imports
  • parallel >= 3.0.2 imports
  • readr * imports
  • reshape2 * imports
  • scales * imports
  • stringr >= 0.6.2 imports
  • tibble * imports
  • amap * suggests
  • devtools >= 1.6.1 suggests
  • ggbio * suggests
  • ggsci * suggests
  • knitr >= 1.6 suggests
  • rmarkdown >= 0.3.3 suggests
  • roxygen2 * suggests
  • testthat >= 0.9.1 suggests