https://github.com/ax-ekk/proseq_alignment.sh

Pipeline shell script for aligning paired-end PRO-seq data with spike-in and UMIs


Repository


Basic Info

Host: GitHub
Owner: ax-ekk
License: mit
Default Branch: master
Size: 14.6 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Fork of JAJ256/PROseq_alignment.sh

Created over 3 years ago · Last pushed almost 6 years ago

https://github.com/ax-ekk/PROseq_alignment.sh/blob/master/

# PROseq_alignment.sh

[![DOI](https://zenodo.org/badge/254700530.svg)](https://zenodo.org/badge/latestdoi/254700530)

This pipeline script aligns paired-end PRO-seq data in which cells of a different species have been spiked in for normalization. It supports random UMI sequences on the ligation end of the 5' adapter, the 3' adapter, or both.

Run this script in a directory that contains a folder named "fastq" holding the data. The two FASTQ files of each pair must have identical names apart from the suffixes _R1.fastq and _R2.fastq.
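Before running the pipeline, it can help to confirm that every _R1.fastq file has a matching _R2.fastq file. The following is a minimal sketch of such a check; the function name `list_paired_samples` is hypothetical and not part of PROseq_alignment.sh.

```shell
# Sketch: print the sample names that have both mates in a fastq directory.
# list_paired_samples is an illustrative helper, not part of the pipeline script.
list_paired_samples() {
    local dir="${1:-fastq}" r1 base
    for r1 in "$dir"/*_R1.fastq; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        base="${r1%_R1.fastq}"              # strip the mate suffix
        if [ -e "${base}_R2.fastq" ]; then
            basename "$base"                # both mates present
        else
            echo "missing mate for $r1" >&2 # report unpaired files on stderr
        fi
    done
}
```

Calling `list_paired_samples fastq` from the working directory prints one sample name per complete pair and warns about any R1 file without a partner.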

# Parameters

These parameters are at the top of the file and can be changed depending on your data.

THREADS: (Integer) Number of threads to spawn for each step in the process.

UMI_LEN: (Integer) Length of the UMI in base pairs. If both 5' and 3' UMIs are used, this is the length of both, because fastp cannot handle multiple UMIs of different lengths.


FIVEP_UMI: (String) Set to "Y" if the UMI is on the 5' adapter

THREEP_UMI: (String) Set to "Y" if the UMI is on the 3' adapter

Note: both FIVEP_UMI and THREEP_UMI can be "Y" if there are UMIs on both sides of the insert
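As an illustration of how the two flags could translate into fastp's UMI options, here is a minimal sketch. It is not taken from the script itself. The helper name `umi_loc_for` is hypothetical, and the mapping of adapters to reads (5' UMI on read 1, 3' UMI on read 2) is an assumption that you should verify against your library design.

```shell
# Sketch: translate the two Y/N flags into a value for fastp's --umi_loc option.
# ASSUMPTION: the 5' adapter UMI sits at the start of read 1 and the 3' adapter
# UMI at the start of read 2. Swap read1/read2 if your library is the reverse.
umi_loc_for() {
    local fivep="$1" threep="$2"
    if [ "$fivep" = "Y" ] && [ "$threep" = "Y" ]; then
        echo "per_read"     # UMIs of equal length on both reads
    elif [ "$fivep" = "Y" ]; then
        echo "read1"
    elif [ "$threep" = "Y" ]; then
        echo "read2"
    else
        echo ""             # no UMI: leave out the --umi options entirely
    fi
}
```

The result would be passed to fastp together with `--umi` and `--umi_len "$UMI_LEN"`. Because `per_read` applies a single `--umi_len` to both reads, it is consistent with the requirement that both UMIs have the same length.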

ADAPTOR_1 and ADAPTOR_2: (String) Adapter sequences to trim. The defaults are the TruSeq Small RNA sequences. They serve only as a fallback in case fastp cannot automatically determine the adapter sequences by overlap analysis.


GENOME_EXP: (String) Path to the bowtie2 index for your experimental genome

GENOME_SPIKE: (String) Path to the bowtie2 index for your spike-in genome. To prepare a spike-in genome, combine your experimental genome with a repeat-masked version of your spike-in organism's genome. Before combining them, add a prefix to the chromosome labels of the spike-in genome (see SPIKE_PREFIX) so that alignments can be sorted by genome later.
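One way to relabel the spike-in chromosomes is to prepend a prefix to every FASTA header. The sketch below assumes plain, uncompressed FASTA input; the helper name `prefix_fasta_headers` and all file names are placeholders, not part of the pipeline.

```shell
# Sketch: prepend a prefix to every FASTA header of a spike-in genome.
# prefix_fasta_headers is a hypothetical helper, not part of the pipeline script.
prefix_fasta_headers() {
    local prefix="$1" fasta="$2"
    # Only header lines start with '>', so sequence lines pass through unchanged.
    # The prefix is used inside a sed expression, so keep it to letters, digits
    # and underscores.
    sed "s/^>/>${prefix}/" "$fasta"
}

# Example use (file and index names are placeholders):
#   prefix_fasta_headers spike spike_masked.fa > spike_prefixed.fa
#   cat experimental.fa spike_prefixed.fa > combined.fa
#   bowtie2-build combined.fa combined_index
```

The path given to GENOME_SPIKE would then be the basename of the index built from the combined FASTA.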

SPIKE_PREFIX: (String) The prefix you added to your spike-in chromosome labels, e.g. >spikechr1
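To show why the prefix matters, the sketch below tallies alignment records by genome using only the reference name column of SAM text. It illustrates the idea and is not the script's own method; `count_by_genome` is a hypothetical helper, and it counts alignment records rather than read pairs.

```shell
# Sketch: count aligned SAM records per genome, using the chromosome prefix.
# Reads SAM text on stdin; count_by_genome is an illustrative helper only.
count_by_genome() {
    local prefix="$1"
    awk -v p="$prefix" '
        /^@/               { next }            # skip SAM header lines
        $3 == "*"          { next }            # skip unaligned records
        index($3, p) == 1  { spike++; next }   # RNAME starts with the prefix
                           { exp_n++ }         # everything else is experimental
        END { printf "experimental\t%d\nspike\t%d\n", exp_n, spike }
    '
}

# Example use (the BAM name is a placeholder):
#   samtools view -h aligned.bam | count_by_genome spike
```

A prefix that could also match the start of an experimental chromosome name would miscount, so pick one that is unique to the spike-in labels.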

RDNA: (String) Path to the bowtie2 index for the rDNA repeat for your organism(s)

MAPQ: (Integer) MAPQ score cutoff for filtering multimappers
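The effect of the cutoff can be sketched on SAM text with awk: header lines are kept, and alignment records are kept only if their MAPQ (column 5) is at least the threshold. This mirrors what `samtools view -q` does, but the helper `filter_by_mapq` is hypothetical and is not claimed to be how the script performs the filtering.

```shell
# Sketch: keep SAM header lines and records whose MAPQ (column 5) >= cutoff.
# Reads SAM text on stdin; filter_by_mapq is an illustrative helper only.
filter_by_mapq() {
    local min_mapq="$1"
    awk -v q="$min_mapq" '
        BEGIN { FS = OFS = "\t" }
        /^@/ || ($5 + 0) >= (q + 0)     # pattern only: matching lines are printed
    '
}

# Example use (the BAM name is a placeholder):
#   samtools view -h aligned.bam | filter_by_mapq 10
```

Reads that align equally well to several locations receive low MAPQ values from bowtie2, which is why a cutoff removes most multimappers.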
