https://github.com/czbiohub-sf/bcell_pipeline
Immcantation B-cell Repertoire Sequencing Pipeline adapted for Reflow
Immcantation B-cell Repertoire Sequencing Pipeline adapted for Reflow.
Authors: Eric Waltari, Gerry Meixiong, Aaron McGeever, CZ Biohub
Date: 2.22.19
For more information, see the accompanying article:
Waltari, E., McGeever, A., Friedland, N., Kim, P.S. and McCutcheon, K.M., 2019. Functional enrichment and analysis of antigen-specific memory B cell antibody repertoires in PBMCs. Frontiers in Immunology, 10, p.1452.
Introduction
The Immcantation framework is a start-to-finish pipeline that goes from raw reads to repertoire analysis. It encompasses pre-processing of FASTQ files, pRESTO, Change-O IgBLAST, TIgGER, SHazaM, Change-O Clone, and Alakazam. All of these steps, up to the Alakazam summaries, are included in bcell.rf. The Reflow pipeline currently uses version 2.3 of kleinstein/immcantation.
Setup
To run locally, install Reflow from https://github.com/grailbio/reflow. Reflow is implemented in Go, so you will need to install Go as well.
Make sure your AWS credentials are available in your environment, either by running aws configure or by exporting them like this:
    export AWS_ACCESS_KEY_ID=[your key id]
    export AWS_SECRET_ACCESS_KEY=[your secret key]
Then run the following:
    AWS_SDK_LOAD_CONFIG=1 reflow setup-ec2
    AWS_SDK_LOAD_CONFIG=1 reflow setup-s3-repository czbiohub-reflow-quickstart-cache
    AWS_SDK_LOAD_CONFIG=1 reflow setup-dynamodb-assoc czbiohub-reflow-quickstart
Reflow should now be configured to use the Biohub's S3 bucket and be ready to run.
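Before running the setup commands, it can help to confirm that the credentials are actually visible to the shell. The following is a minimal sketch, assuming you exported the variables rather than using aws configure; missing_aws_vars is a hypothetical helper and is not part of this repo or of Reflow.

```shell
# Hypothetical helper (not part of this repo or of Reflow).
# Prints the name of each AWS credential variable that is unset or empty.
# Prints nothing when both are present.
missing_aws_vars() {
    [ -n "${AWS_ACCESS_KEY_ID:-}" ] || echo "AWS_ACCESS_KEY_ID"
    [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || echo "AWS_SECRET_ACCESS_KEY"
}

# Example use: warn early if anything is missing.
missing="$(missing_aws_vars)"
if [ -n "$missing" ]; then
    echo "Missing AWS credentials: $missing" >&2
fi
```

If you stored credentials with aws configure instead, these variables may legitimately be empty, so treat the warning as a hint rather than an error.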
To run on an EC2 instance, use aegea to launch the czbiohub-reflow Packer image:
    aegea launch --iam-role S3fromEC2 --ami-tags Name=czbiohub-reflow -t t2.micro [yourname]-reflow
    aegea start [yourinstance]
Copy your read FASTQ files and primer FASTA files to the reflow/ directory in this repo. Then copy the entire repo to the launched instance, as the reflow/ directory, using scp or aegea scp, and ssh into the instance. You should now be able to run the pipeline. Be sure to aegea stop your instance when you are done using Reflow.
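The local copy step can be sketched as a small shell function that refuses to copy anything if one of the inputs is missing. This is only an illustration: stage_inputs is a hypothetical helper, and the file names in the usage comment are placeholders rather than files shipped with this repo.

```shell
# Hypothetical helper (not part of this repo).
# Usage: stage_inputs DEST_DIR FILE...
# Copies every FILE into DEST_DIR, but only if all of them exist.
stage_inputs() {
    stage_dest="$1"
    shift
    # Require at least one input file and an existing destination directory.
    [ "$#" -ge 1 ] || { echo "no input files given" >&2; return 1; }
    [ -d "$stage_dest" ] || { echo "no such directory: $stage_dest" >&2; return 1; }
    # Check every input before copying, so a typo does not leave a partial copy.
    for stage_file in "$@"; do
        if [ ! -f "$stage_file" ]; then
            echo "missing input: $stage_file" >&2
            return 1
        fi
    done
    cp "$@" "$stage_dest"/
}

# Example with placeholder names:
# stage_inputs reflow/ my_read1.fastq my_read2.fastq my_read1_primers.fasta my_read2_primers.fasta
```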
Usage
For single runs (two read files with corresponding primers), cd to the reflow/ directory.
reflow [-cache off] run bcell.rf -read1file
Note that the read1file and read2file inputs do not necessarily correspond to the R1 and R2 FASTQ files from the sequencer. The read1file input must be the FASTQ file whose reads begin with the C-region or J-segment. The read2file input must be the matching FASTQ file whose reads begin with the leader V-segment. read1primers and read2primers list the primer sequences for read1file and read2file, respectively. Results will be saved to the S3 bucket under the runname directory (e.g. Alakazam results will be under the runname/alakazam/ directory). The -cache off option prevents caching and forces a full run.
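Because read1file and read2file must be matching files, one quick sanity check before launching a run is to confirm that both contain the same number of records. The sketch below assumes uncompressed FASTQ files with exactly four lines per record; count_fastq_records and reads_are_paired are hypothetical helper names. The check does not tell you which file begins with the C-region or the V-segment.

```shell
# Hypothetical helpers (not part of this repo).

# Print the number of records in an uncompressed, four-line-per-record FASTQ file.
count_fastq_records() {
    # wc -l counts newline characters; integer division drops any partial record.
    echo $(( $(wc -l < "$1") / 4 ))
}

# Succeed only if both FASTQ files hold the same number of records.
reads_are_paired() {
    [ "$(count_fastq_records "$1")" -eq "$(count_fastq_records "$2")" ]
}

# Example with placeholder names:
# reads_are_paired my_read1.fastq my_read2.fastq || echo "record counts differ" >&2
```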
Questions
For questions, please contact eric.waltari@czbiohub.org or aaron.mcgeever@czbiohub.org.
Owner
- Name: Chan Zuckerberg Biohub San Francisco
- Login: czbiohub-sf
- Kind: organization
- Location: San Francisco
- Website: https://www.czbiohub.org/sf/
- Repositories: 1
- Profile: https://github.com/czbiohub-sf