https://github.com/animesh/rnaseq_nextflow

RNA-seq analysis with nextflow, bulk transcriptomics as simple as it can get without compromising quality

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

RNA-seq analysis with nextflow

Create a directory or simply clone this repo

git clone https://github.com/animesh/rnaseq_nextflow
cd rnaseq_nextflow/

Download raw data

wget http://genomedata.org/rnaseq-tutorial/HBR_UHR_ERCC_ds_5pc.tar
tar -xvf HBR_UHR_ERCC_ds_5pc.tar

Create sample-sheet from the downloaded Data

echo "sample,fastq_1,fastq_2,strandedness" > samples.csv
ls -1 *1.fastq.gz | awk -F "_" '{print $1 $2}' > c0
ls -1 $PWD/*1.fastq.gz > c1
ls -1 $PWD/*2.fastq.gz > c2
# one "auto" per read-1 file, so the column has the same number of rows as c0..c2
ls -1 *1.fastq.gz | sed 's/.*/auto/' > c3
paste -d "," c? >> samples.csv
cat samples.csv
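Before handing the sheet to the pipeline, it can help to confirm that the header is as expected and that every listed FASTQ file exists. The following is a minimal sketch; `check_samplesheet` is a hypothetical helper, not part of this repo or of nf-core/rnaseq.

```shell
# Hypothetical helper (an assumption, not part of the repo): sanity-check a sample sheet.
# Prints one line per problem and returns non-zero if any problem was found.
check_samplesheet() {
  local sheet="$1" status=0 line=0
  local sample fq1 fq2 strand
  while IFS=, read -r sample fq1 fq2 strand; do
    line=$((line + 1))
    if [ "$line" -eq 1 ]; then
      # The first row must be the header written by the echo command
      if [ "$sample,$fq1,$fq2,$strand" != "sample,fastq_1,fastq_2,strandedness" ]; then
        echo "line 1: unexpected header"
        status=1
      fi
      continue
    fi
    [ -n "$sample" ] || { echo "line $line: empty sample name"; status=1; }
    [ -f "$fq1" ] || { echo "line $line: missing file $fq1"; status=1; }
    [ -f "$fq2" ] || { echo "line $line: missing file $fq2"; status=1; }
  done < "$sheet"
  return "$status"
}
# Usage: check_samplesheet samples.csv && echo "sample sheet looks fine"
```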

Download Reference Genome

wget http://genomedata.org/rnaseq-tutorial/fasta/GRCh38/chr22_with_ERCC92.fa

And the annotation for the Reference Genome

wget http://genomedata.org/rnaseq-tutorial/annotations/GRCh38/chr22_with_ERCC92.gtf
# single quotes keep the inner double quotes, so the attribute is written as gene_biotype "protein_coding";
sed 's/exon_number "1";$/exon_number "1"; gene_biotype "protein_coding";/' chr22_with_ERCC92.gtf > chr22_with_ERCC92.biotype.gtf
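A quick way to confirm that the substitution actually matched is to count the lines that now carry a gene_biotype attribute. This is only a sketch; `count_biotype` is a hypothetical helper name.

```shell
# Hypothetical helper (an assumption): report how many GTF lines carry a gene_biotype attribute.
count_biotype() {
  # grep -c prints the number of matching lines; it exits 1 when the count is 0,
  # so the exit status is masked to keep the function usable under `set -e`.
  grep -c 'gene_biotype "' "$1" || true
}
# Usage: count_biotype chr22_with_ERCC92.biotype.gtf
```

A result of 0 would suggest the sed pattern did not match any line, for example because the lines do not end in `exon_number "1";`.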

Install nextflow

Pre-reqs: curl, Java and Docker or Singularity on Linux; Windows users can run via WSL

curl -s https://get.nextflow.io | bash

```
  N E X T F L O W
  version 24.04.4 build 5917
  created 01-08-2024 07:05 UTC (09:05 CEST)
  cite doi:10.1038/nbt.3820
  http://nextflow.io

Nextflow installation completed. Please note:
- the executable file nextflow has been created in the folder: /home/ash022/rnaseq_nextflow
- you may complete the installation by moving it to a directory in your $PATH
```
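To see which of the pre-reqs are missing before installing, something along these lines may help. `missing_tools` is a hypothetical helper, and it only checks that a command is on the PATH, not its version.

```shell
# Hypothetical helper (an assumption): print each command from the argument list
# that cannot be found on PATH; prints nothing when all are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
# Usage: missing_tools curl java docker
```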

test

./nextflow run hello

```
N E X T F L O W ~ version 24.04.4

Launching https://github.com/nextflow-io/hello [trusting_cori] DSL2 - revision: afff16a9b4 [master]

executor > local (4)
[4d/db3896] sayHello (1) [100%] 4 of 4
Ciao world!

Hello world!

Hola world!

Bonjour world!
```

Now the nf-core/rnaseq pipeline itself can be tested with its bundled test profile

./nextflow run nf-core/rnaseq -profile docker,test --outdir test

If it goes well, run it on the data downloaded above, using the sample sheet created earlier, the reference genome and the corresponding annotation

./nextflow run nf-core/rnaseq --max_memory '16.GB' --max_cpus 6 --input samples.csv --outdir results --gtf chr22_with_ERCC92.biotype.gtf --fasta chr22_with_ERCC92.fa -profile docker

If the run above fails, check the .nextflow.log files. If the failure was transient (let's say cosmic rays flipped some bits), one can resume from the last successful step/checkpoint just by adding the -resume switch!

./nextflow run nf-core/rnaseq --max_memory '16.GB' --max_cpus 6 --input samples.csv --outdir results --gtf chr22_with_ERCC92.biotype.gtf --fasta chr22_with_ERCC92.fa -profile docker -resume

But if nothing seems to be working, try https://github.com/nf-core/rnaseq/issues
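When checking the log before a resume, filtering for ERROR entries is often quicker than reading the whole file. A minimal sketch, assuming the log sits in the launch directory as `.nextflow.log`; `last_errors` is a hypothetical helper.

```shell
# Hypothetical helper (an assumption): show the last N lines mentioning ERROR in a Nextflow log.
# Arguments: log file (default .nextflow.log), number of lines (default 5).
last_errors() {
  local log="${1:-.nextflow.log}" n="${2:-5}"
  grep 'ERROR' "$log" | tail -n "$n"
}
# Usage: last_errors            # last 5 ERROR lines of ./.nextflow.log
#        last_errors .nextflow.log 20
```

`./nextflow log` lists previous runs with their status, which may also help to decide whether a resume is worthwhile.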

QC

The reference MultiQC plot from the rnabio.org tutorial (wget https://rnabio.org/assets/module_2/multiqc.png) looks similar to the nextflow multiqc_report

Result

Compare the original counts (wget http://genomedata.org/rnaseq-tutorial/results/cshl2022/rnaseq/gene_read_counts_table_all_final.tsv) with the resulting nextflow gene counts: the Spearman rank correlation is ~0.99 (calculated using Perseus, shown in blue in the scatter plot), and the samples cluster accordingly by Euclidean distance. More on this in the blog post https://fuzzylife.substack.com/p/rna-seq-analysis-with-nextflow
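To prepare such a comparison, the two count tables first need to be paired by gene ID. The sketch below does only that pairing; the correlation itself was calculated in Perseus. `pair_counts` is a hypothetical helper, and both the column numbers and the pipeline output path in the usage line are assumptions to check against the actual files.

```shell
# Hypothetical helper (an assumption): pair one column from each of two tab-separated
# count tables by the gene ID in column 1. Both tables are assumed to have a header row.
# Arguments: reference.tsv column_number pipeline.tsv column_number
pair_counts() {
  awk -F '\t' -v ca="$2" -v cb="$4" '
    NR == FNR { if (FNR > 1) ref[$1] = $ca; next }             # first file: remember counts, skip header
    FNR > 1 && ($1 in ref) { print $1 "\t" ref[$1] "\t" $cb }  # second file: emit genes present in both
  ' "$1" "$3"
}
# Usage (path and column numbers are assumptions):
#   pair_counts gene_read_counts_table_all_final.tsv 2 results/star_salmon/salmon.merged.gene_counts.tsv 3 > paired.tsv
```

Genes present in only one table are dropped, and the output follows the row order of the second file.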

To map against the latest whole human genome assembly, download the appropriate reference genome and its corresponding annotation, then run nextflow with them, for example

wget http://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
wget http://ftp.ensembl.org/pub/release-112/gtf/homo_sapiens/Homo_sapiens.GRCh38.112.gtf.gz
./nextflow run nf-core/rnaseq --max_memory '62.GB' --max_cpus 16 --input samples.csv --outdir results --gtf Homo_sapiens.GRCh38.112.gtf.gz --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz -profile docker

cleanup

rm -rf work .nextflow c? HBR_UHR_ERCC_ds_5pc.tar

issues faced (so far...)

docker daemon not running?

sudo touch nohup.out
sudo nohup dockerd &
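Whether the daemon is reachable can be checked before launching the pipeline, since `docker info` exits non-zero when it cannot talk to the daemon. A minimal sketch; `docker_ready` is a hypothetical helper, and its optional argument (an assumption for illustration) lets another Docker-compatible CLI be substituted.

```shell
# Hypothetical helper (an assumption): succeed only if the container CLI can reach its daemon.
# Argument: CLI name (default docker).
docker_ready() {
  "${1:-docker}" info >/dev/null 2>&1
}
# Usage: docker_ready || echo "docker daemon not running?"
```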

crash? maybe due to RAM, usually during STAR genome indexing; try increasing the allocated RAM

./nextflow run nf-core/rnaseq --max_memory '92.GB' --max_cpus 16 --input samples.csv --outdir results --gtf Homo_sapiens.GRCh38.112.gtf.gz --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz -profile docker
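Before raising --max_memory it is worth checking how much RAM the machine actually has. A minimal sketch for Linux, assuming the usual `/proc/meminfo` layout; `total_ram_gb` is a hypothetical helper.

```shell
# Hypothetical helper (an assumption): print total RAM in whole GB from a meminfo-style file.
# Argument: path to the file (default /proc/meminfo).
total_ram_gb() {
  # MemTotal is reported in kB; 1048576 kB make one GB (binary). Fractions are truncated.
  awk '/^MemTotal:/ { print int($2 / 1048576) }' "${1:-/proc/meminfo}"
}
# Usage: echo "total RAM: $(total_ram_gb) GB"
```

A value passed to --max_memory that exceeds this number is unlikely to help, so on a smaller machine the chr22 subset above may be the more realistic option.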

Owner

  • Name: Ani
  • Login: animesh
  • Kind: user
  • Location: Norway
  • Company: Norwegian University of Science and Technology

A medical graduate from Delhi University with post-graduation in bioinformatics from Jawaharlal Nehru University, India.

GitHub Events

Total
  • Push event: 2
  • Create event: 2
Last Year
  • Push event: 2
  • Create event: 2

Dependencies

.github/workflows/jekyll-gh-pages.yml actions
  • actions/checkout v4 composite
  • actions/configure-pages v5 composite
  • actions/deploy-pages v4 composite
  • actions/jekyll-build-pages v1 composite
  • actions/upload-pages-artifact v3 composite