rnaseq-quantification-pipeline
Developmental private RNASeq pipeline for bacterial quantification.
https://github.com/pbradleylab/rnaseq-quantification-pipeline
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.3%) to scientific vocabulary
Repository
Developmental private RNASeq pipeline for bacterial quantification.
Basic Info
- Host: GitHub
- Owner: pbradleylab
- License: MIT
- Language: Python
- Default Branch: main
- Size: 23.2 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 6
- Releases: 2
Metadata Files
README.md
RNASeq Quantification Workflow
This workflow was originally meant for use with isolates, quantifying with STAR against a transcriptome; however, additions have now made it possible to use Kallisto. If running on a MAG sample, you may wish to take an alignment approach instead of using this method (especially if the annotation set is limited). In the case of a limited annotation set, you may wish to annotate using anvi'o, which provides a much more extensive annotation set than the other tools tested (as of April 2025), prior to using this workflow.
Installation and Environment Set Up
This workflow expects that you have conda installed before starting. Conda is generally easy to install and will let you easily install the other dependencies needed by this workflow. You will then need to download this repository via `git clone`.
- Activate your conda environment so that you are in the "base" environment. This can be achieved by running `source /path/to/conda/install/folder/bin/activate`.
- Make sure your channels are set up to allow conda-forge by running `conda config --add channels conda-forge`.
- Install snakemake into a new environment by running `conda create -n rnaquant -c bioconda snakemake=7.25.4`, then activate it by running `conda activate rnaquant`.
- Move into the directory with `cd rnaseq-quantification-pipeline` and, if desired, generate the test data by running `cd test && snakemake --use-conda --cores 4 -j`. The md5sum check may report that the files are not downloaded on the first run; however, if you run it again, the files should all be OK. Move back to the rnaseq-quantification-pipeline folder after this is done, then run `snakemake --use-conda --cores 2 -n` to test whether everything is installed correctly. If not, you will need to troubleshoot your environment.
That's it! Once you have installed snakemake, the rest of the dependencies will be downloaded automatically for you by the workflow.
Prepare the input files
In this pipeline we are using PEP, which allows easier portability between projects. You need to alter some of the configuration files and generate two sample tables.
- Edit the `raw_data:` string in `config/pepconfig.yml` to point to where your files are located. Make sure to use the full path to avoid any errors.
- Generate the samples and subsamples files. The subsamples file is where the paired-end files are input; the samples file is an overview file in which each row contains a different sample and its related information. First create your sample sheet and make sure its name ends with `.csv`, with the following columns:
  - `sample_name`: the name of the sample without the file ending. If it can't be matched, you'll get an error.
  - `alternate_id`: doesn't really matter, but if the sample is known under a different ID, add it here; otherwise just set this as 1, 2, 3, 4, etc.
  - `project`: the name of the project you are working on.
  - `organism`: the organism/strain.
  - `sample_type`: put `rna` here.
- Create a subsample sheet and fill out the following:
  - `sample_name`: the name of the sample as supplied in the sample sheet above.
  - `subsample`: the same name as `sample_name`. This is needed for paired-end data, since there are technically two files to process.
  - `protocol`: put `rna` here.
  - `seqmethod`: put `paired_end` here.
- Edit `config/pepconfig.yml` so that the `sample_table:` and `subsample_table:` lines contain the full paths to the sample.csv and the subsample.csv.
- You can choose to download your genome and gff3 via a URL, or to download them manually beforehand and set the path. Make sure it is the genome and not the transcriptome, or the gffread step will fail. Also make sure the gff3 is the annotation associated with the genome you have provided. You'll then need to edit the following lines in `config/config.json`:
  - `is_local`: True or False, depending on whether you wish for the genome to be downloaded from the URL provided in the config or read from the provided path.
  - `genome`: the path to where your genome.fa is.
  - `genome_name`: the exact name of your genome, including the fasta extension.
  - `gff3`: the path to where the gff3 file is located.
  - `run_gunc`: either "true" or "false", to choose whether to run GUNC on the reference genome being used.
  - `quantification_tool`: either "kallisto" or "star", to choose which quantification method you wish to use.
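As a concrete illustration of the sheets and config edits described above, a minimal setup might look like the following. All sample names, paths, and values here are made up, and the exact keys and value formats expected by the pipeline's own config files are authoritative; treat this as a sketch only.

`samples.csv`:

```csv
sample_name,alternate_id,project,organism,sample_type
sampleA,1,demo_project,E_coli_K12,rna
sampleB,2,demo_project,E_coli_K12,rna
```

`subsamples.csv`:

```csv
sample_name,subsample,protocol,seqmethod
sampleA,sampleA,rna,paired_end
sampleB,sampleB,rna,paired_end
```

A hypothetical excerpt of the relevant `config/config.json` fields:

```json
{
  "is_local": "True",
  "genome": "/full/path/to/genome.fa",
  "genome_name": "genome.fa",
  "gff3": "/full/path/to/annotation.gff3",
  "run_gunc": "false",
  "quantification_tool": "kallisto"
}
```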
Run The Workflow
You should be good to go. Resources for the tools that need them are automatically downloaded into the resources/ folder; this may take an hour or so after starting the run. At the end you will get two combined matrices from the quantification: counts and TPM.
1. To run the workflow, enter `snakemake --use-conda -j --cores 4` in the rnaseq-quantification-pipeline directory. Make sure you are not trying to run the quantification pipeline in the test or workflow directories. If the workflow is failing at a certain point, you can likely still generate some of the data by also running with the `-k` flag.
NOTE: You can also run the workflow on Slurm by using `snakemake --slurm --default-resources slurm_account=osc_account_number -j --use-conda`.
TIP: If you want to run multiple samples, move them all to the same directory (or symlink them to save space), and design your input sheet to have different project ids.
We are running the following programs, and welcome PRs and input on other programs we should be running.
- Trimming: Trim Galore
- Quantification: Kallisto
- Quality control: FastQ Screen, FastQC, MultiQC
This is an example DAG for an analysis run with 6 samples.
Solid lines indicate rules that have not yet been executed, whereas dashed lines depict jobs that were already completed at the time the DAG was generated.
Common FAQ
- `list index out of range` error: If you receive an error saying `list index out of range` and the traceback points to `get_r1_fastq` or `get_r2_fastq` as the reason, more often than not the problem is how the sample name is given in the sample and subsample sheets. A common error is that the R1 or R2 extension is not recognized or is being duplicated. When inputting the sample names, you do not need to include R1 or R2 in the name; only include the file name up until the last `.` or `_` before the paired-read suffix. Recognized file endings are `_R1`, `.R1`, `.r1`, `_r1`, `_1` with the ending `fastq.gz` or `fq.gz`. If the error persists, please feel free to submit an issue.
- When making the input csv files, there should not be any duplicates in the `sample` column of the sample.csv, or snakemake will throw an error.
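For intuition, the suffix matching described above can be sketched as follows. This is an illustrative re-implementation based only on the endings listed in this FAQ, not the pipeline's actual `get_r1_fastq` code; the function name `sample_name_from_r1` is hypothetical.

```python
import re

# Recognized R1 suffixes from the FAQ (_R1, .R1, .r1, _r1, _1)
# followed by a fastq.gz or fq.gz ending. Illustrative sketch only.
READ1_PATTERN = re.compile(r"^(?P<sample>.+?)([._][Rr]1|_1)\.(fastq|fq)\.gz$")

def sample_name_from_r1(filename):
    """Return the sample name implied by an R1 filename, or None if no
    recognized paired-read suffix is found."""
    m = READ1_PATTERN.match(filename)
    return m.group("sample") if m else None
```

The key point, as the FAQ says, is that the sample name in your sheets should stop just before the paired-read suffix: for `sampleA_R1.fastq.gz`, the sheet entry should be `sampleA`.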
Owner
- Name: Bradley Lab
- Login: pbradleylab
- Kind: organization
- Location: United States of America
- Website: https://bradleylab.science/
- Repositories: 1
- Profile: https://github.com/pbradleylab
Our long-term aim is to understand the gut microbiome as well as we currently understand model microbes.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Kananen"
    given-names: "Kathryn"
    orcid: "https://orcid.org/0000-0001-6080-0323"
  - family-names: "Bradley"
    given-names: "Patrick"
    orcid: "https://orcid.org/0000-0002-9231-8344"
title: "RNASeq quantification pipeline"
version: 1.0.0
date-released: 2023-05-30
url: "https://github.com/pbradleylab/rnaseq-quantification-pipeline.git"
GitHub Events
Total
- Issues event: 1
- Delete event: 2
- Member event: 1
- Push event: 3
- Pull request event: 4
- Fork event: 1
- Create event: 1
Last Year
- Issues event: 1
- Delete event: 2
- Member event: 1
- Push event: 3
- Pull request event: 4
- Fork event: 1
- Create event: 1