hamrlnc

High-throughput pipeline for modified mRNA annotation and long non-coding RNA annotation

https://github.com/bdgregory/hamrlnc

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

bioinformatics docker epitranscriptomics pipelines rna-seq
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bdgregory
  • License: mit
  • Language: Shell
  • Default Branch: main
  • Homepage:
  • Size: 10.9 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Topics
bioinformatics docker epitranscriptomics pipelines rna-seq
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

HAMRLNC: High-throughput Annotation of Modified Ribonucleotides and Long Non-Coding RNAs

[Figure: HAMRLNC workflow]

Overview

  • HAMRLNC is a multipurpose toolbox that streamlines the analysis pipeline for HAMR (High-throughput Annotation of Modified Ribonucleotides), developed by Paul Ryvkin et al. HAMRLNC makes the original method more accessible by automating its tedious pre-processing steps, and it extends HAMR's functionality with built-in post-processing steps that let users visualize epitranscriptomic analyses in the context of their experimental conditions.
  • HAMRLNC is high-throughput and performs RNA-modification annotation and long non-coding RNA (lncRNA) annotation at the scale of a whole BioProject. It can trim acquired reads with Trim Galore and uses STAR as the default aligner; mapped reads are then pre-processed with selected methods from GATK, gffread, CPC2, Infernal, samtools, and other tools. Users can also choose to quantify transcripts alongside these steps.
  • HAMRLNC supports partial parallel processing and is modular. Where hardware permits, specifying a larger thread count (-n) can substantially speed up a single run. If only part of the functionality is needed (e.g. only analyzing modified ribonucleotides), users can pass flags (-k, -p, -u) to activate just the modules they need. See below for more details.
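As a minimal sketch of this modular design, the script below runs only the modification-annotation arm (-k) and sets -n to the number of available cores. The image tag and flags come from this README; the input and output names (my_filenames.csv, genome.fa, annotation.gff3, mods_only_run), the helper build_mods_only_cmd, and the THREADS/DRY_RUN variables are hypothetical placeholders. By default it only prints the command; set DRY_RUN=0 to launch the container.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run only the modification-annotation arm (-k).
# Input/output names below are placeholders, not files shipped with HAMRLNC.
set -euo pipefail

# Populate HAMRLNC_CMD with the docker invocation for a given thread count.
# An array keeps each argument intact even if the working path has spaces.
build_mods_only_cmd() {
  local threads="$1"
  HAMRLNC_CMD=(
    docker run --rm -v "$(pwd)":/working-dir -w /working-dir
    chosenobih/hamrlnc:v0.13
    -o mods_only_run
    -c my_filenames.csv
    -g genome.fa
    -i annotation.gff3
    -n "$threads"
    -k
  )
}

# Use every available core unless THREADS is set; fall back to the default of 4.
threads="${THREADS:-$(nproc 2>/dev/null || echo 4)}"
build_mods_only_cmd "$threads"

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  # Default: print the command for review instead of running it.
  printf '%s\n' "${HAMRLNC_CMD[*]}"
else
  "${HAMRLNC_CMD[@]}"
fi
```

Adding -p or -u to the array would switch on the lncRNA or featurecount arms in the same way.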

Command Line Arguments and Description

Read the Wiki for detailed descriptions of selected flags.

| Command | Description | Default |
| :---: | :--- | :---: |
| **Required** | | |
| -o | <pipeline output directory> name of the directory where the HAMRLNC run output will be written | |
| -c | <filenames for each fastq.csv> a CSV file that maps each SRR accession (or FASTQ file name) to the name you want for that read set | |
| -g | <reference genome.fa> a FASTA file of the reference genome of the organism | |
| -i | <reference genome annotation.gff3> a GFF3 annotation of the reference genome; GFF3 is required, GTF is not accepted | |
| **Optional** | | |
| -d | [raw fastq folder] | NA |
| -l | <minimum average read length> | auto-detect |
| -t | [trim raw fastq] | false |
| -s | [adapter sequence for trimming R1 or single-end reads] | |
| -a | [adapter sequence for trimming R2] | |
| -D | [raw bam folder] | NA |
| -b | [sort raw bam] | false |
| -r | [perform fastqc] | false |
| -I | [STAR genome index folder] | NA |
| -n | [number of threads] | 4 |
| -O | [Panther organism taxon ID] | "3702" |
| -A | [Panther annotation dataset] | "GO:0008150" |
| -Y | [Panther test type: FISHER or BINOMIAL] | "FISHER" |
| -R | [Panther correction type: FDR, BONFERRONI, or NONE] | "FDR" |
| -y | [keep intermediate bam files] | false |
| -q | [halt program upon completion of checkpoint 2] | false |
| -G | [attribute used for featurecount] | "geneid" |
| -k | [activate modification annotation workflow] | false |
| -p | [activate lncRNA annotation workflow] | false |
| -u | [activate featurecount workflow] | false |
| -H | [SERVER alt path for Panther] | |
| -U | [SERVER alt path for HAMRLNC] | |
| -W | [SERVER alt path for GATK] | |
| -S | [SERVER alt path for HAMR] | |
| -J | [SERVER alt path for CPC2] | |
| -M | [SERVER alt path for Rfam] | |
| -f | [HAMR filter] | filter SAMnumberhits.pl |
| -m | [HAMR model] | euktrnamods.Rdata |
| -Q | [HAMR minimum quality score: 0-40] | 30 |
| -E | [HAMR sequencing error: 0-1] | 0.01 |
| -P | [HAMR maximum p-value: 0-1] | 1 |
| -F | [HAMR maximum FDR: 0-1] | 0.05 |
| -C | [HAMR minimum coverage: 0-∞] | 10 |
| -B | [HAMR: keep intermediate files (debug)] | false |
| -T | [HAMR: specify target BED file] | NA |
| -z | [keep raw fastq files downloaded from SRA] | false |
| -x | [maximum intron length for the STAR mapping used only by lncRNA annotation] | NA |
| -h | [help message] | |
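The HAMR thresholds in the table have documented ranges, so a wrapper script can reject out-of-range values before a long run starts. The following is a hypothetical pre-launch check: in_range and check_hamr_thresholds are illustrative names, not part of HAMRLNC, and -C is assumed to take a whole-number coverage.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch check for the HAMR threshold flags in the table:
# -Q (0-40), -E/-P/-F (0-1), -C (>= 0, assumed to be a whole number).
set -uo pipefail

# Succeeds if $1 is a plain non-negative number with lo <= $1 <= hi.
# awk does the comparison because bash arithmetic is integer-only.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# Arguments in order: the -Q, -E, -P, -F and -C values.
check_hamr_thresholds() {
  local q="$1" e="$2" p="$3" f="$4" c="$5"
  in_range "$q" 0 40 || { printf 'HAMR -Q must be within 0-40 (got %s)\n' "$q" >&2; return 1; }
  in_range "$e" 0 1 || { printf 'HAMR -E must be within 0-1 (got %s)\n' "$e" >&2; return 1; }
  in_range "$p" 0 1 || { printf 'HAMR -P must be within 0-1 (got %s)\n' "$p" >&2; return 1; }
  in_range "$f" 0 1 || { printf 'HAMR -F must be within 0-1 (got %s)\n' "$f" >&2; return 1; }
  [[ "$c" =~ ^[0-9]+$ ]] || { printf 'HAMR -C must be a whole number >= 0 (got %s)\n' "$c" >&2; return 1; }
}

# The documented defaults (-Q 30 -E 0.01 -P 1 -F 0.05 -C 10) pass the check.
check_hamr_thresholds 30 0.01 1 0.05 10 && echo "HAMR thresholds OK"
```

Calling the function with the values you intend to pass to -Q, -E, -P, -F and -C surfaces typos (for example an FDR of 5 instead of 0.05) before any data is processed.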

Running HAMRLNC

Required dependencies

  1. Linux-based computer, server, or cluster.
  2. Docker
  3. At least 32 GB of memory and 120 GB of disk space; organisms with larger genomes, such as human, may require more.
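A quick way to confirm that a host meets these minimums before pulling the image is sketched below. The 32 GB and 120 GB thresholds come from the list above; the helper names are hypothetical, memory is read from /proc/meminfo, and sizes are rounded down to approximate gigabytes, so treat the result as a warning rather than a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check against the README's stated minimums.
# Warnings only: it does not stop you from running HAMRLNC.
set -uo pipefail

MIN_MEM_GB=32
MIN_DISK_GB=120

# Succeeds when the first whole-number argument is >= the second.
meets_minimum() { (( $1 >= $2 )); }

# Approximate total memory in GB from /proc/meminfo (values are in kB).
mem_gb() {
  [[ -r /proc/meminfo ]] || { echo 0; return; }
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1000000 }' /proc/meminfo
}

# Approximate free disk space in GB for the current directory (POSIX df, 1 kB blocks).
disk_gb() { df -Pk . | awk 'NR == 2 { printf "%d\n", $4 / 1000000 }'; }

preflight() {
  local status=0
  [[ "$(uname -s)" == Linux ]] || { echo "WARN: host is not Linux" >&2; status=1; }
  command -v docker >/dev/null 2>&1 || { echo "WARN: docker not found on PATH" >&2; status=1; }
  meets_minimum "$(mem_gb)" "$MIN_MEM_GB" ||
    { echo "WARN: less than ${MIN_MEM_GB} GB of memory" >&2; status=1; }
  meets_minimum "$(disk_gb)" "$MIN_DISK_GB" ||
    { echo "WARN: less than ${MIN_DISK_GB} GB of free disk space" >&2; status=1; }
  return "$status"
}

preflight && echo "Host meets the documented minimums"
```

Run it from the directory you plan to mount into the container, since the disk check measures free space on that filesystem.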

```
# pull the HAMRLNC Docker image
docker pull chosenobih/hamrlnc:v0.13

# clone the HAMRLNC repo
git clone https://github.com/harrlol/HAMRLNC.git
cd HAMRLNC

# download the genome file for Arabidopsis thaliana from Ensembl
wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
gunzip Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz

# download the annotation file for Arabidopsis thaliana from Ensembl
wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/gff3/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.59.gff3.gz
gunzip Arabidopsis_thaliana.TAIR10.59.gff3.gz

# make sure your .fa and .gff3 files are in your working directory, and enter that directory
cd /your/working/directory

# run HAMRLNC on SRA IDs with all three arms activated
docker run \
  --rm -v "$(pwd)":/working-dir \
  -w /working-dir chosenobih/hamrlnc:v0.13 \
  -o testrun \
  -c demo/demofilenames.csv \
  -g Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
  -i Arabidopsis_thaliana.TAIR10.59.gff3 \
  -l 50 -n 4 -k -p -u -r -t

# if your system uses an Apple Silicon chip
# (not yet supported as of 2025-03-13)
docker run \
  --platform linux/amd64 \
  --rm -v "$(pwd)":/working-dir \
  -w /working-dir chosenobih/hamrlnc:v0.13 \
  -o testrun \
  -c demo/demofilenames.csv \
  -g Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
  -i Arabidopsis_thaliana.TAIR10.59.gff3 \
  -l 50 -n 4 -k -p -u -r -t
```
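Because -i requires GFF3 rather than GTF, and the Ensembl downloads arrive gzipped, it can help to check both reference files before mounting them into the container. This hypothetical helper (check_inputs and its sub-functions are illustrative names, not part of HAMRLNC) assumes a GFF3 file begins with the ##gff-version 3 directive required by the GFF3 specification, and a FASTA file with a > header line.

```shell
#!/usr/bin/env bash
# Hypothetical input check run before `docker run`: confirms the genome looks
# like FASTA and the annotation like GFF3 (HAMRLNC's -i does not accept GTF).
set -uo pipefail

# Print the first line of a file; fail if the file cannot be read.
first_line() { head -n 1 -- "$1" 2>/dev/null; }

is_fasta() {
  local line
  line=$(first_line "$1") || return 1
  [[ "$line" == ">"* ]]
}

# GFF3 files start with a "##gff-version 3" directive (3.x suffixes allowed).
is_gff3() {
  local line
  line=$(first_line "$1") || return 1
  [[ "$line" == "##gff-version 3"* ]]
}

check_inputs() {
  local fa="$1" gff="$2"
  if [[ "$fa" == *.gz || "$gff" == *.gz ]]; then
    echo "decompress the .gz files with gunzip first" >&2
    return 1
  fi
  is_fasta "$fa" || { echo "$fa does not look like FASTA" >&2; return 1; }
  is_gff3 "$gff" || { echo "$gff lacks a ##gff-version 3 header (is it GTF?)" >&2; return 1; }
}

# Usage: bash check_inputs.sh genome.fa annotation.gff3
if [[ $# -eq 2 ]]; then
  check_inputs "$1" "$2" && echo "reference files look OK"
fi
```

For the Arabidopsis example, the call would be `bash check_inputs.sh Arabidopsis_thaliana.TAIR10.dna.toplevel.fa Arabidopsis_thaliana.TAIR10.59.gff3`, where the script name is a placeholder.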

Running HAMRLNC as an application on CyVerse's Discovery Environment

HAMRLNC has been integrated as an app in CyVerse's Discovery Environment (DE) and is available for researchers to use. Search for "HAMRLNC" and then select version v0.02. A short tutorial on running the app is available on the CyVerse wiki. CyVerse's DE provides an easy-to-use graphical user interface for running several life sciences computational pipelines.

Step-by-step walkthrough

For more detailed documentation and a step-by-step tutorial on running HAMRLNC, please visit the Wiki page.

Issues

If you encounter any issues while running HAMRLNC, please open an issue on this GitHub repo, and we'll attend to it as soon as possible.

Copyright

```
Copyright (c) 2024 HAMRLNC Team

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
```

Owner

  • Login: bdgregory
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Obih
    given-names: Chosen
  - family-names: Li
    given-names: Jiatong
title: "HAMRLNC: High-throughput Analysis of Modified Ribonucleotides and Long Non-Coding RNAs"
version: beta
identifiers:
date-released: 2024-03-28

GitHub Events

Total
  • Watch event: 2
  • Push event: 25
  • Gollum event: 2
  • Pull request event: 4
  • Create event: 1
Last Year
  • Watch event: 2
  • Push event: 25
  • Gollum event: 2
  • Pull request event: 4
  • Create event: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • chosenobih (2)

Dependencies

Dockerfile docker
  • ubuntu 18.04 build