swabseq
Code for analyzing Swab-Seq data from the Cusanovich lab.
Repository
Basic Info
- Host: GitHub
- Owner: cusanovichlab
- Language: R
- Default Branch: main
- Size: 336 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 4 years ago
· Last pushed almost 2 years ago
README.md
The basic steps of the data processing workflow are as follows:
- Convert a 'plate map' .xls to a 'sample sheet' .csv using the `platemap2samp.py` script from https://github.com/octantbio/SwabSeq (we provide this script in our repo as well for convenience). A 'plate map' is a specially formatted Excel spreadsheet that establishes the barcodes used and the experimental conditions present in a Swab-Seq experiment. Please see the Octant repo for further details. We provide an example 'plate map' here as well (https://github.com/cusanovichlab/swabseq/blob/main/preprocess/ExamplePlateMap.xlsx).
- Convert the 'sample sheet' to a metadata file using the `parseSS.py` script available from the Pachter lab (https://caltech.box.com/shared/static/m6t4ok1bqwuhy3f6tut9sqantufvtiro.gz, also provided here https://github.com/cusanovichlab/swabseq/tree/main/preprocess/colab) as described in their notebook on processing Swab-Seq data (https://github.com/pachterlab/BLCSBGLKP_2020/blob/master/notebooks/swabseq.ipynb).
- Convert the metadata to a 'whitelist' using the following command (from the Pachter lab workflow): `cat [metadata.txt] | awk '{print $1}' | tail -n +2 > [whitelist.txt]`.
- Generate fastq files. To convert BCL files to fastq files, we used a Docker container that includes Illumina's bcl2fastq2 program (genomicpariscentre/bcl2fastq2, sha256:50e6d0382a72e19ce9d3cf9091430499d39a89b15aefde4570dedbafcef2934c). In our case, we also used the `fastq_barcode_correcter_reformatter_w_exclusion_list.py` script to demultiplex reads before further processing. An example of usage is: `python fastq_barcode_correcter_reformatter_w_exclusion_list.py [R1 fastq] [I1 fastq] [I2 fastq] [Sample sheet] [Minus list] [Out prefix]`. The `Sample sheet` is the same one used to create the metadata file above. The `Minus list` is a tab-separated list of barcode combinations ('i7\ti5') that were included on the sequencing run but should be excluded from demultiplexing (this guards against erroneous barcode assignments caused by the tolerance of mismatches). We generated this file by copying and pasting from the 'sample sheet'. The `Out prefix` should specify the directory and the name of the output files (suffixes are appended by the script).
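As a rough illustration of the whitelist command and the 'Minus list' format described above, here is a minimal Python sketch. It is not part of this repo or the Pachter lab workflow. The function names `whitelist_from_metadata` and `read_minus_list` are hypothetical, and the sketch assumes only what the text states: the metadata file has one header line with the barcode in its first whitespace-delimited column, and the minus list holds one tab-separated `i7`/`i5` pair per line.

```python
from typing import Iterable, List, Set, Tuple


def whitelist_from_metadata(lines: Iterable[str]) -> List[str]:
    """Mimic `awk '{print $1}' | tail -n +2`: first field of every line after the header."""
    barcodes = []
    for i, line in enumerate(lines):
        if i == 0:
            continue  # `tail -n +2` drops the header line
        fields = line.split()
        # Unlike awk, which would emit an empty line, blank lines are skipped here.
        if fields:
            barcodes.append(fields[0])
    return barcodes


def read_minus_list(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Parse tab-separated 'i7<TAB>i5' pairs into a set of (i7, i5) tuples."""
    pairs = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            raise ValueError(f"expected 'i7<TAB>i5', got: {line!r}")
        pairs.add((parts[0], parts[1]))
    return pairs


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python whitelist_sketch.py metadata.txt > whitelist.txt
    with open(sys.argv[1]) as handle:
        print("\n".join(whitelist_from_metadata(handle)))
```

Holding the excluded pairs in a set makes it cheap to check each read's barcode combination, though how the demultiplexing script itself uses the minus list is not shown here.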
NOTE: We ran steps 6-11 in a Docker container on a MacBook. For convenience, we have set up a Docker Hub repository with the image here: https://hub.docker.com/repository/docker/cusanovichlab/swabseq.
- Index the custom reference transcriptome available from the Pachter lab tarball downloaded in Step 2 (or provided here https://github.com/cusanovichlab/swabseq/tree/main/preprocess/colab): `kallisto index -i colab/index.idx -k 11 colab/trunc_transcriptome_11.fa`.
- Map reads to the Swab-Seq-specific reference. We again followed the Pachter lab workflow (https://github.com/pachterlab/BLCSBGLKP_2020/blob/master/notebooks/swabseq.ipynb), except that we used the SwabSeq10 configuration of kallisto (available on the `covid` branch, which we have forked for convenience: https://github.com/cusanovichlab/kallisto): `kallisto bus -x SwabSeq10 -o [Outdir] -t 2 -i colab/index.idx [I1 fastq] [I2 fastq] [R1 fastq]`.
- Sort the kallisto output with bustools: `colab/bustools sort -o sort.bus output.bus`.
- Correct barcodes with bustools: `colab/bustools correct -d dump.txt -w whitelist.txt -o sort.correct.bus sort.bus`.
- Sort the barcode-corrected bus file: `colab/bustools sort -o sort.correct.sort.bus sort.correct.bus`.
- Generate a text file of read counts mapping to target genes for each sample barcode: `colab/bustools text -p sort.correct.sort.bus > data.txt`.
- Read the `data.txt` and `metadata.txt` files into R and generate appropriate plots with the R scripts provided in https://github.com/cusanovichlab/swabseq/tree/main/analysis. The R analysis process was modified from the scripts provided in the Octant Bio repo.
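Before handing `data.txt` to the R scripts, it can help to sanity-check the per-barcode totals. The sketch below is one hedged way to do that in Python; it is not the analysis code from this repo. The function name `count_reads` is hypothetical. The sketch assumes `data.txt` has no header and four tab-separated columns (barcode, UMI, equivalence class, count), which should be verified against your own bustools output.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_reads(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Sum the count column per (barcode, equivalence class) pair.

    ASSUMPTION: each line is 'barcode<TAB>umi<TAB>ec<TAB>count' with no header.
    """
    totals: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        barcode, _umi, ec, count = fields[:4]
        totals[(barcode, ec)] += int(count)
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_sketch.py data.txt
    with open(sys.argv[1]) as handle:
        for (barcode, ec), total in sorted(count_reads(handle).items()):
            print(f"{barcode}\t{ec}\t{total}")
```

Keying on the (barcode, equivalence class) pair keeps counts for different targets separate within a sample; mapping equivalence classes to target names is left to the downstream analysis.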
Owner
- Name: Cusanovich Lab
- Login: cusanovichlab
- Kind: organization
- Email: darrenc@arizona.edu
- Location: University of Arizona
- Website: https://cusanovichlab.github.io/
- Repositories: 1
- Profile: https://github.com/cusanovichlab
Github organization for the Cusanovich Lab at the University of Arizona
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Code for analyzing Swab-Seq data from the Cusanovich lab.
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Darren
    name-particle: A
    family-names: Cusanovich
    email: darrenc@arizona.edu
    orcid: 'https://orcid.org/0000-0001-6889-0095'
repository-code: 'https://github.com/cusanovichlab/swabseq'