te_feature_density
A script for computing density or number of TEs in protein coding features.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: GuillePeris
- License: gpl-3.0
- Language: R
- Default Branch: main
- Size: 36.1 KB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
TE_feature_density
A script for computing density or number of transposable elements (TEs) in protein coding features.
Dependencies
- dplyr
- stringr
- reshape2
- bedtoolsr
- AnnotationHub
- rtracklayer
- biomartr
- UCSCRepeatMasker
- data.table
- tools
bedtoolsr is an R package that calls bedtools internally, so bedtools must be installed beforehand.
TE_feature_density has only been tested on Unix/Linux.
How does it work?
TE_feature_density computes either the number of TEs or the TE density (in which overlapping TEs are merged so that shared nucleotides are counted only once) in different gene regions: gene, exons, introns, 5'UTR, 3'UTR and downstream region. You can use your own gene and RepeatMasker annotations or let TE_feature_density download them automatically (gene annotation from Ensembl, TE annotation from UCSC). Furthermore, a subset of the gene annotation, defined by a tag such as "basic", "Ensembl_canonical" or "MANE_select", can be selected.
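A minimal base-R sketch of the two analyses for a single feature follows. It is not the script's own code (which relies on bedtoolsr); it assumes 1-based closed coordinates on one chromosome and takes "density" to mean TE-covered bases divided by feature length, which is an assumption. The helpers merge_intervals(), te_number() and te_density() are hypothetical names.

```r
# Hypothetical helpers: "number" vs "density" of TEs in one feature.
# Intervals are 1-based and closed, all on the same chromosome.

# Merge overlapping or adjacent intervals, similar in spirit to bedtools merge.
merge_intervals <- function(starts, ends) {
  if (length(starts) == 0) return(data.frame(start = numeric(0), end = numeric(0)))
  o <- order(starts, ends)
  starts <- starts[o]
  ends <- ends[o]
  out_s <- starts[1]
  out_e <- ends[1]
  for (i in seq_along(starts)[-1]) {
    last <- length(out_e)
    if (starts[i] <= out_e[last] + 1) {
      out_e[last] <- max(out_e[last], ends[i])   # extend the current cluster
    } else {
      out_s <- c(out_s, starts[i])               # start a new cluster
      out_e <- c(out_e, ends[i])
    }
  }
  data.frame(start = out_s, end = out_e)
}

# "number": TEs that overlap the feature by at least 1 bp.
te_number <- function(feat_start, feat_end, te_start, te_end) {
  ov <- pmin(te_end, feat_end) - pmax(te_start, feat_start) + 1
  sum(ov > 0)
}

# "density": merge TEs first so shared bases count once, then divide by length.
te_density <- function(feat_start, feat_end, te_start, te_end) {
  m <- merge_intervals(te_start, te_end)
  ov <- pmin(m$end, feat_end) - pmax(m$start, feat_start) + 1
  sum(pmax(ov, 0)) / (feat_end - feat_start + 1)
}

te_start <- c(10, 25, 90, 200)
te_end   <- c(30, 40, 120, 250)
te_number(1, 100, te_start, te_end)    # 3
te_density(1, 100, te_start, te_end)   # 0.42 (0.48 without merging)
```

Merging before measuring keeps the 6 bp shared by the first two TEs (positions 25 to 30) from being counted twice.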
Automatic annotation download
Gene annotation is downloaded from the Ensembl database using the getGTF function from the biomartr package. You need to define the species (e.g. Canis lupus familiaris) and the Ensembl release (e.g. 113). You can check the available species and releases on the Ensembl website. The annotation can be further filtered by tag.
TE annotation is downloaded from UCSC through the AnnotationHub package, using metadata from the UCSCRepeatMasker package. For this, you must know the code corresponding to the UCSC genome version.
To find which tags are available in the gene annotation for a given species and release, and the code of the UCSC RepeatMasker annotation, run checkAnnotations.R (see Usage).
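checkAnnotations.R is the intended way to discover tags. Purely to illustrate what tag filtering means, the base-R sketch below assumes Ensembl-style GTF attribute strings in which each tag appears as tag "..." (a transcript may carry several). The functions gtf_tags(), available_tags() and filter_by_tag() and the toy annotation are hypothetical, not part of TE_feature_density.

```r
# Hypothetical sketch: list and filter by the 'tag' attribute of GTF records.

# Extract every tag "..." value from each attribute string (one vector per row).
gtf_tags <- function(attributes) {
  hits <- regmatches(attributes, gregexpr('\\btag "[^"]+"', attributes, perl = TRUE))
  lapply(hits, function(x) sub('^tag "([^"]+)"$', "\\1", x))
}

# Distinct tags, in order of first appearance.
available_tags <- function(attributes) unique(unlist(gtf_tags(attributes)))

# Keep rows carrying the requested tag; tag = NULL disables filtering.
filter_by_tag <- function(gtf, tag = NULL) {
  if (is.null(tag)) return(gtf)
  keep <- vapply(gtf_tags(gtf$attributes), function(tags) tag %in% tags, logical(1))
  gtf[keep, , drop = FALSE]
}

# Toy annotation, made up for illustration.
gtf <- data.frame(
  feature = c("gene", "transcript", "transcript"),
  attributes = c(
    'gene_id "G1"; gene_biotype "protein_coding";',
    'gene_id "G1"; transcript_id "T1"; tag "basic"; tag "Ensembl_canonical";',
    'gene_id "G1"; transcript_id "T2"; tag "basic";'
  )
)

available_tags(gtf$attributes)                 # "basic" "Ensembl_canonical"
nrow(filter_by_tag(gtf, "Ensembl_canonical"))  # 1
```

Passing tag = NULL mirrors the "no tag filtering" option described in Usage.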
Using your own annotations
Gene annotation can be downloaded from Ensembl.org.
Remember to set these variables in TE_feature_density.R:
fileGene <- TRUE
gene_annot_file <- "data/your_genome_annotation.gtf"
gene_annot_format <- "gff"
TE annotation can be downloaded from the UCSC Table Browser: choose the Clade, Genome and Assembly of interest, then set
- Group: Variations and Repeats
- Track: RepeatMasker
- Output format: All fields from selected table
- Output field separator: tsv
and click Get output, saving the file in the data folder.
Remember to set these variables in TE_feature_density.R:
fileTE <- TRUE
TE_annot_file <- "data/your_rmsk.txt"
TE_annot_format <- "rmsk"
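To show what the Table Browser export contains, here is a hedged base-R sketch that reads it into a table of TE coordinates. It assumes the file keeps the header line starting with #bin and the standard rmsk column names (genoName, genoStart, genoEnd, strand, repName, repClass, repFamily), and that genoStart is 0-based as in UCSC tables, so 1 is added to obtain 1-based starts. read_rmsk() and the sample record are made up for illustration; the script handles this file itself when TE_annot_format is "rmsk".

```r
# Hypothetical reader for a UCSC Table Browser RepeatMasker (rmsk) export.
read_rmsk <- function(file) {
  rmsk <- read.delim(file, header = TRUE, check.names = FALSE,
                     comment.char = "", stringsAsFactors = FALSE)
  names(rmsk) <- sub("^#", "", names(rmsk))   # header starts with "#bin"
  data.frame(
    chrom     = rmsk$genoName,
    start     = rmsk$genoStart + 1L,           # UCSC 0-based start -> 1-based
    end       = rmsk$genoEnd,                  # half-open end equals 1-based closed end
    strand    = rmsk$strand,
    repName   = rmsk$repName,
    repClass  = rmsk$repClass,
    repFamily = rmsk$repFamily,
    stringsAsFactors = FALSE
  )
}

# Made-up export (header + one record) written to a temporary file.
example <- paste(
  paste("#bin", "swScore", "milliDiv", "milliDel", "milliIns", "genoName",
        "genoStart", "genoEnd", "genoLeft", "strand", "repName", "repClass",
        "repFamily", "repStart", "repEnd", "repLeft", "id", sep = "\t"),
  paste(0, 2000, 100, 5, 3, "chr1", 999, 1300, -1000, "+", "AluY", "SINE",
        "Alu", 1, 301, -10, 1, sep = "\t"),
  sep = "\n"
)
tmp <- tempfile(fileext = ".txt")
writeLines(example, tmp)
te <- read_rmsk(tmp)   # one TE: chr1, start 1000, end 1300, family Alu
```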
Usage
Check genome and TE annotation
Use the checkAnnotations.R script to get the tags available for filtering the gene annotation
(or NULL for no tag filtering) and the UCSC RepeatMasker code for downloading the TE annotation
online. Set these variables in its Parameters section:
- organism <- "Canis lupus familiaris"
- release <- "113"
- fileTE <- TRUE: TRUE for using your own TE annotation, FALSE for automatic downloading.
- fileGene <- TRUE: TRUE for using your own gene annotation, FALSE for automatic downloading.
- gene_annot_file <- "data/Canis_lupus_familiaris.ROS_Cfam_1.0.113.gtf"
Change parameters
Before running the TE_feature_density.R script, set the R variables in its
Parameters section:
- organism: Species name, e.g. "Homo sapiens", "Mus musculus", "Danio rerio".
- UCSC_TE_annot: UCSC code for the TE annotation. You can get this code by running the checkAnnotations.R script first.
- release: Ensembl gene annotation release.
- interest_TEs: a vector of TE classes to analyze, e.g. c("LINE", "SINE"). If set to NULL, all TE classes are analyzed.
- interest_subF: a vector of TE families to analyze, e.g. c("Alu", "L1"). If set to NULL, all TE families are considered. Note that if this variable is not NULL, it overrides interest_TEs (you can filter either classes or families, not both).
- tag: Filter the gene annotation by tag ("Ensembl_canonical", "basic", "MANE_select"...). Check the available tags by running the checkAnnotations.R script first.
- minOverlap: only TEs overlapping a feature by at least minOverlap bp are considered. In density analysis, this applies to clusters of overlapping TEs, not to individual TEs.
- downstream: Length in bp of the downstream region.
- OUTPUT_DIR: Results folder.
- analysis: analyze either the number of TEs (number) or the TE density (density). In density analysis, overlapping TEs are merged so that shared nucleotides are not counted several times.
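The TE selection rules and minOverlap described above can be illustrated with a small base-R sketch. It is a hypothetical illustration, not the script's code: select_TEs() applies interest_subF when it is not NULL and ignores interest_TEs in that case, and count_TEs() keeps TEs overlapping a feature by at least minOverlap bp (in the script's density analysis the threshold applies to merged TE clusters instead). Column names follow the rmsk table (repClass, repFamily) and coordinates are assumed to be 1-based and closed.

```r
# Hypothetical sketch of TE selection and the minOverlap threshold.

# Family filter (interest_subF) takes precedence over class filter (interest_TEs).
select_TEs <- function(te, interest_TEs = NULL, interest_subF = NULL) {
  if (!is.null(interest_subF)) {
    te[te$repFamily %in% interest_subF, , drop = FALSE]
  } else if (!is.null(interest_TEs)) {
    te[te$repClass %in% interest_TEs, , drop = FALSE]
  } else {
    te                                   # both NULL: keep every TE
  }
}

# Overlap length (bp) of each TE with one feature; values <= 0 mean no overlap.
overlap_bp <- function(te, feat_start, feat_end) {
  pmin(te$end, feat_end) - pmax(te$start, feat_start) + 1
}

# Number of TEs overlapping the feature by at least minOverlap bp.
count_TEs <- function(te, feat_start, feat_end, minOverlap = 1) {
  sum(overlap_bp(te, feat_start, feat_end) >= minOverlap)
}

# Made-up TEs for illustration.
te <- data.frame(
  start     = c(10, 50, 95, 300),
  end       = c(40, 60, 130, 400),
  repClass  = c("SINE", "LINE", "SINE", "LTR"),
  repFamily = c("Alu", "L1", "Alu", "ERVL")
)

# interest_subF is set, so interest_TEs is ignored: only the two Alu rows remain.
alus <- select_TEs(te, interest_TEs = c("LINE", "SINE"), interest_subF = "Alu")
count_TEs(alus, feat_start = 1, feat_end = 100, minOverlap = 10)  # 1 (second Alu overlaps 6 bp)
count_TEs(alus, feat_start = 1, feat_end = 100)                   # 2
```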
You may also want to change some variables in the Advanced parameters section, particularly if you use your own downloaded annotations:
- fileGene: TRUE to read the gene annotation from gene_annot_file, FALSE for automatic downloading.
- gene_annot_file: Path to the gene annotation file.
- gene_annot_format: Format parameter passed to the file import function. Don't change it unless you are really sure!
- fileTE: TRUE to read the TE annotation from TE_annot_file, FALSE for automatic downloading.
- TE_annot_file: Path to the TE annotation file.
- TE_annot_format: Format parameter passed to the file import function. Don't change it unless you are really sure!
- feature_types: List of gene features to analyze. Choose from c("gene", "five_prime_utr", "three_prime_utr", "exon", "intron", "downstream").
Note that using your own annotation files may take longer than expected!
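The intron and downstream features listed in feature_types can be pictured with the base-R sketch below. It assumes a single transcript, 1-based closed coordinates, and that the downstream region starts right after the gene's 3' end (after gene_end on the + strand, before gene_start on the - strand). introns_from_exons() and downstream_region() are hypothetical helpers, and TE_feature_density.R may build these regions differently.

```r
# Hypothetical sketch of two derived features (1-based, closed coordinates).

# Introns of one transcript: the gaps between consecutive exons.
# Assumes the exons do not overlap and are not directly adjacent.
introns_from_exons <- function(exon_start, exon_end) {
  o <- order(exon_start)
  s <- exon_start[o]
  e <- exon_end[o]
  n <- length(s)
  if (n < 2) return(data.frame(start = numeric(0), end = numeric(0)))
  data.frame(start = e[-n] + 1, end = s[-1] - 1)
}

# Downstream region: 'downstream' bp beyond the gene's 3' end, so it depends
# on strand. Clipped at position 1; chromosome ends are not checked here.
downstream_region <- function(gene_start, gene_end, strand, downstream) {
  if (strand == "+") {
    c(start = gene_end + 1, end = gene_end + downstream)
  } else {
    c(start = max(1, gene_start - downstream), end = gene_start - 1)
  }
}

ints <- introns_from_exons(c(500, 100, 300), c(600, 200, 350))
ints                                     # introns 201-299 and 351-499
downstream_region(100, 600, "+", 1000)   # start 601, end 1600
downstream_region(100, 600, "-", 1000)   # start 1, end 99 (clipped)
```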
Session info
```
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Madrid
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets
[7] methods   base

other attached packages:
 [1] data.table_1.16.4       BRGenomics_1.14.1
 [3] rtracklayer_1.62.0      GenomicRanges_1.54.1
 [5] biomartr_1.0.7          UCSCRepeatMasker_3.15.2
 [7] GenomeInfoDb_1.38.8     IRanges_2.36.0
 [9] S4Vectors_0.40.2        AnnotationHub_3.10.1
[11] BiocFileCache_2.10.2    dbplyr_2.5.0
[13] BiocGenerics_0.48.1     dplyr_1.1.4
[15] stringr_1.5.1           reshape2_1.4.4
[17] bedr_1.0.7

loaded via a namespace (and not attached):
  [1] DBI_1.2.3                   bitops_1.0-9
  [3] formatR_1.14                testthat_3.2.1.1
  [5] biomaRt_2.58.2              rlang_1.1.4
  [7] magrittr_2.0.3              matrixStats_1.4.1
  [9] compiler_4.3.0              RSQLite_2.3.8
 [11] png_0.1-8                   vctrs_0.6.5
 [13] pkgconfig_2.0.3             crayon_1.5.3
 [15] fastmap_1.2.0               XVector_0.42.0
 [17] Rsamtools_2.18.0            promises_1.3.0
 [19] rmarkdown_2.29              tzdb_0.4.0
 [21] purrr_1.0.2                 bit_4.5.0.1
 [23] xfun_0.49                   zlibbioc_1.48.2
 [25] cachem_1.1.0                jsonlite_1.8.9
 [27] progress_1.2.3              blob_1.2.4
 [29] later_1.3.2                 DelayedArray_0.28.0
 [31] BiocParallel_1.36.0         interactiveDisplayBase_1.40.0
 [33] parallel_4.3.0              prettyunits_1.2.0
 [35] R6_2.5.1                    stringi_1.8.4
 [37] brio_1.1.5                  knitr_1.49
 [39] Rcpp_1.0.13-1               SummarizedExperiment_1.32.0
 [41] downloader_0.4              R.utils_2.12.3
 [43] readr_2.1.5                 VennDiagram_1.7.3
 [45] httpuv_1.6.15               Matrix_1.6-5
 [47] tidyselect_1.2.1            rstudioapi_0.17.1
 [49] abind_1.4-8                 yaml_2.3.10
 [51] codetools_0.2-20            curl_6.0.1
 [53] lattice_0.22-6              tibble_3.2.1
 [55] plyr_1.8.9                  withr_3.0.2
 [57] Biobase_2.62.0              shiny_1.9.1
 [59] KEGGREST_1.42.0             evaluate_1.0.1
 [61] lambda.r_1.2.4              futile.logger_1.4.3
 [63] xml2_1.3.6                  Biostrings_2.70.3
 [65] pillar_1.10.0               BiocManager_1.30.25
 [67] filelock_1.0.3              MatrixGenerics_1.14.0
 [69] renv_1.0.11                 generics_0.1.3
 [71] vroom_1.6.5                 RCurl_1.98-1.16
 [73] BiocVersion_3.18.1          hms_1.1.3
 [75] ggplot2_3.5.1               munsell_0.5.1
 [77] scales_1.3.0                xtable_1.8-4
 [79] glue_1.8.0                  tools_4.3.0
 [81] BiocIO_1.12.0               locfit_1.5-9.10
 [83] GenomicAlignments_1.38.2    XML_3.99-0.17
 [85] grid_4.3.0                  colorspace_2.1-1
 [87] AnnotationDbi_1.64.1        GenomeInfoDbData_1.2.11
 [89] restfulr_0.0.15             cli_3.6.3
 [91] rappdirs_0.3.3              futile.options_1.0.1
 [93] S4Arrays_1.2.1              gtable_0.3.6
 [95] R.methodsS3_1.8.2           DESeq2_1.42.1
 [97] digest_0.6.37               SparseArray_1.2.4
 [99] rjson_0.2.23                memoise_2.0.1
[101] htmltools_0.5.8.1           R.oo_1.27.0
[103] lifecycle_1.0.4             httr_1.4.7
[105] mime_0.12                   bit64_4.5.2
```
Owner
- Name: Guillermo Peris Ripollés
- Login: GuillePeris
- Kind: user
- Location: Castellón/Granada
- Company: Universitat Jaume I/Genyo
- Twitter: waltzing_piglet
- Repositories: 1
- Profile: https://github.com/GuillePeris
Full professor at Universitat Jaume I (Spain) and bioinformatician at Genyo (Granada). Interested in mobile genetic elements and miRNA.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Peris
given-names: Guillermo
orcid: https://orcid.org/0000-0003-2010-7844
title: "TE_feature_density"
version: 1.0.0
date-released: 2025-01-13
GitHub Events
Total
- Watch event: 1
- Push event: 4
- Create event: 2
Last Year
- Watch event: 1
- Push event: 4
- Create event: 2