https://github.com/animesh/rnaseq_nextflow
RNA-seq analysis with nextflow, bulk transcriptomics as simple as it can get without compromising quality
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to nature.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: animesh
- License: mit
- Language: HTML
- Default Branch: main
- Homepage: https://fuzzylife.substack.com/p/rna-seq-analysis-with-nextflow
- Size: 324 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
RNA-seq analysis with nextflow
Create a working directory, or simply clone this repo
git clone https://github.com/animesh/rnaseq_nextflow
cd rnaseq_nextflow/
Download raw data
wget http://genomedata.org/rnaseq-tutorial/HBR_UHR_ERCC_ds_5pc.tar
tar -xvf HBR_UHR_ERCC_ds_5pc.tar
Create a sample sheet from the downloaded data
echo "sample,fastq_1,fastq_2,strandedness" > samples.csv
ls -1 *1.fastq.gz | awk -F "_" '{print $1 $2}' > c0
ls -1 $PWD/*1.fastq.gz > c1
ls -1 $PWD/*2.fastq.gz > c2
printf 'auto\n%.0s' *1.fastq.gz > c3
paste -d "," c? >> samples.csv
cat samples.csv
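The steps above can also be sketched as a single loop that needs no temporary c0..c3 files and checks that every read-1 file has a mate. This is only a sketch: make_samplesheet is a hypothetical helper name, and it assumes the file names end in 1.fastq.gz and 2.fastq.gz as in the downloaded tarball.

```shell
# Hypothetical helper: print a sample sheet for all read pairs in a directory.
# Assumes mates are named *1.fastq.gz / *2.fastq.gz, as in the tutorial data.
make_samplesheet() {
  local dir="${1:-$PWD}" r1 r2 sample
  echo "sample,fastq_1,fastq_2,strandedness"
  for r1 in "$dir"/*1.fastq.gz; do
    [ -e "$r1" ] || continue              # glob matched nothing
    r2="${r1%1.fastq.gz}2.fastq.gz"       # expected mate of this file
    [ -e "$r2" ] || { echo "missing mate for $r1" >&2; return 1; }
    # Same sample naming as above: first two "_"-separated fields, joined.
    sample=$(basename "$r1" | awk -F "_" '{print $1 $2}')
    echo "$sample,$r1,$r2,auto"
  done
}
```

Usage would be make_samplesheet "$PWD" > samples.csv, followed by cat samples.csv to inspect the result.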
Download Reference Genome
wget http://genomedata.org/rnaseq-tutorial/fasta/GRCh38/chr22_with_ERCC92.fa
And the annotation for the Reference Genome
wget http://genomedata.org/rnaseq-tutorial/annotations/GRCh38/chr22_with_ERCC92.gtf
sed 's/exon_number "1";$/exon_number "1"; gene_biotype "protein_coding";/g' chr22_with_ERCC92.gtf > chr22_with_ERCC92.biotype.gtf
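The sed call above appends a gene_biotype attribute to every line ending in exon_number "1";. A minimal sketch of the same substitution on one made-up GTF record shows what the result should look like. The record and the add_biotype helper name are illustrative only, not part of the tutorial data. Single quotes around the sed program keep the inner double quotes, so the attribute value stays quoted in the output.

```shell
# Hypothetical helper: append gene_biotype to lines ending in exon_number "1";
add_biotype() {
  sed 's/exon_number "1";$/exon_number "1"; gene_biotype "protein_coding";/' "$1"
}

# Made-up one-line GTF record, for illustration only.
printf 'chr22\tdemo\texon\t1\t10\t.\t+\t.\tgene_id "g1"; exon_number "1";\n' > demo.gtf
add_biotype demo.gtf > demo.biotype.gtf

# Count the lines that now carry the quoted attribute.
grep -c 'gene_biotype "protein_coding";' demo.biotype.gtf
```

Running the same grep -c on chr22_with_ERCC92.biotype.gtf gives a quick check that the patch was applied.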
Install nextflow
Prerequisites: curl, Java, and either Docker or Singularity on Linux. Windows users can run everything via WSL.
curl -s https://get.nextflow.io | bash
```
N E X T F L O W
version 24.04.4 build 5917
created 01-08-2024 07:05 UTC (09:05 CEST)
cite doi:10.1038/nbt.3820
http://nextflow.io
Nextflow installation completed. Please note:
- the executable file nextflow has been created in the folder: /home/ash022/rnaseq_nextflow
- you may complete the installation by moving it to a directory in your $PATH
```
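If the installer or a later step fails, it can help to confirm that the prerequisites listed above are actually on the PATH. A minimal sketch, with missing_tools and has_container_engine as hypothetical helper names; it only checks that the commands exist, not their versions.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || echo "$t"
  done
}

# Either Docker or Singularity is enough as the container engine.
has_container_engine() {
  command -v docker >/dev/null 2>&1 || command -v singularity >/dev/null 2>&1
}
```

For example, missing_tools curl java prints nothing when both are installed, and has_container_engine || echo "install Docker or Singularity" flags a missing container engine.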
Test the installation
./nextflow run hello
```
N E X T F L O W ~ version 24.04.4
Launching https://github.com/nextflow-io/hello [trusting_cori] DSL2 - revision: afff16a9b4 [master]
executor > local (4)
[4d/db3896] sayHello (1) [100%] 4 of 4
Ciao world!
Hello world!
Hola world!
Bonjour world!
```
Now test the nf-core/rnaseq pipeline using its built-in test profile
./nextflow run nf-core/rnaseq -profile docker,test --outdir test
If that goes well, run the pipeline on the data downloaded above, using the sample sheet created earlier together with the reference genome and its annotation
./nextflow run nf-core/rnaseq --max_memory '16.GB' --max_cpus 6 --input samples.csv --outdir results --gtf chr22_with_ERCC92.biotype.gtf --fasta chr22_with_ERCC92.fa -profile docker
If the run above fails, check the .nextflow.log file first. If the failure was transient (let's say cosmic rays flipped some bits), the run can be resumed from the last successful step simply by adding the -resume switch
./nextflow run nf-core/rnaseq --max_memory '16.GB' --max_cpus 6 --input samples.csv --outdir results --gtf chr22_with_ERCC92.biotype.gtf --fasta chr22_with_ERCC92.fa -profile docker -resume
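The retry-with-resume idea can be wrapped in a small function. This is a sketch under assumptions: run_with_resume is a hypothetical helper, and blind retries only help with transient failures. A genuine configuration error will fail again, so the log is still worth reading between attempts.

```shell
# Hypothetical helper: run a command once; on failure, re-run it with
# -resume appended, up to a total of MAX attempts.
# Usage: run_with_resume MAX command [args...]
run_with_resume() {
  local max="$1"; shift
  local attempt=1
  "$@" && return 0                      # first attempt, without -resume
  while [ "$attempt" -lt "$max" ]; do
    attempt=$((attempt + 1))
    echo "attempt $attempt of $max: retrying with -resume" >&2
    "$@" -resume && return 0
  done
  return 1                              # every attempt failed
}
```

For example: run_with_resume 3 ./nextflow run nf-core/rnaseq --input samples.csv --outdir results --gtf chr22_with_ERCC92.biotype.gtf --fasta chr22_with_ERCC92.fa -profile docker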
If nothing seems to work, try https://github.com/nf-core/rnaseq/issues
QC
wget https://rnabio.org/assets/module_2/multiqc.png
This reference plot looks similar to the multiqc_report produced by the nextflow run
Result
Compare the original gene counts
wget http://genomedata.org/rnaseq-tutorial/results/cshl2022/rnaseq/gene_read_counts_table_all_final.tsv
with the gene counts produced by nextflow: the Spearman rank correlation is ~0.99 (calculated using Perseus, shown in blue in the scatter plot), and the samples cluster accordingly by Euclidean distance. More on this in the blog post https://fuzzylife.substack.com/p/rna-seq-analysis-with-nextflow
To map against the latest whole human genome assembly, download the appropriate reference genome and its corresponding annotation, then run nextflow with them, for example
wget http://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
wget http://ftp.ensembl.org/pub/release-112/gtf/homo_sapiens/Homo_sapiens.GRCh38.112.gtf.gz
./nextflow run nf-core/rnaseq --max_memory '62.GB' --max_cpus 16 --input samples.csv --outdir results --gtf Homo_sapiens.GRCh38.112.gtf.gz --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz -profile docker
Cleanup
rm -rf work .nextflow c? HBR_UHR_ERCC_ds_5pc.tar
Issues faced (so far...)
Docker daemon not running?
sudo touch nohup.out
sudo nohup dockerd &
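Before starting dockerd by hand, it may be worth checking whether the daemon already answers. A minimal sketch using docker info, which exits non-zero when the client cannot reach the daemon; docker_ready and ensure_docker are hypothetical helper names.

```shell
# Hypothetical helper: succeed only if the Docker daemon answers.
docker_ready() {
  docker info >/dev/null 2>&1
}

# Print a hint instead of launching the pipeline against a dead daemon.
ensure_docker() {
  docker_ready && return 0
  echo "Docker daemon is not responding; try: sudo nohup dockerd &" >&2
  return 1
}
```

Calling ensure_docker && ./nextflow run hello then only launches the run when Docker is reachable.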
Crash? Often caused by insufficient RAM, usually during STAR genome indexing; try increasing the allocated RAM
./nextflow run nf-core/rnaseq --max_memory '92.GB' --max_cpus 16 --input samples.csv --outdir results --gtf Homo_sapiens.GRCh38.112.gtf.gz --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz -profile docker
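To choose a --max_memory value that the machine can actually provide, the total RAM can be read from /proc/meminfo (Linux and WSL only). A minimal sketch; mem_total_gb is a hypothetical helper name, and the optional file argument exists only so the parsing can be tried on a sample file.

```shell
# Hypothetical helper: print total RAM in whole GB, rounded down.
# Reads /proc/meminfo by default; MemTotal is reported there in kB.
mem_total_gb() {
  awk '/^MemTotal:/ { print int($2 / 1048576) }' "${1:-/proc/meminfo}"
}

# Example use (commented out so nothing runs on load):
# echo "Total RAM: $(mem_total_gb) GB, CPUs: $(nproc)"
```

It is usually sensible to pass somewhat less than the reported total, leaving headroom for the operating system and Docker itself.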
Owner
- Name: Ani
- Login: animesh
- Kind: user
- Location: Norway
- Company: Norwegian University of Science and Technology
- Website: https://www.fuzzylife.org
- Twitter: animesh1977
- Repositories: 749
- Profile: https://github.com/animesh
A medical graduate from Delhi University with post-graduation in bioinformatics from Jawaharlal Nehru University, India.
GitHub Events
Total
- Push event: 2
- Create event: 2
Last Year
- Push event: 2
- Create event: 2
Dependencies
- actions/checkout v4 composite
- actions/configure-pages v5 composite
- actions/deploy-pages v4 composite
- actions/jekyll-build-pages v1 composite
- actions/upload-pages-artifact v3 composite