https://github.com/camillescott/mmetsp
re-assembly and analysis of the Marine Microbial Eukaryotic Transcriptome Sequencing Project
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to ncbi.nlm.nih.gov, plos.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.4%) to scientific vocabulary
Last synced: 10 months ago
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of johnsolk/MMETSP
Created over 9 years ago · Last pushed over 9 years ago
https://github.com/camillescott/MMETSP/blob/master/
# MMETSP

[Binder](http://mybinder.org/repo/dib-lab/MMETSP)

This is a work-in-progress repository that automates the khmer protocols over a large-scale RNA-seq data set: https://khmer-protocols.readthedocs.org/en/ctb/mrnaseq/

The Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP) data set contains 678 cultured samples of 306 pelagic and endosymbiotic marine eukaryotic species representing more than 40 phyla (Keeling et al. 2014).

Each script takes `SraRunInfo.csv` as input. This file can be downloaded from NCBI for any collection of SRA records.

Automated scripts running the protocols:

1. `getdata.py`: downloads data from NCBI and organizes it into an individual directory for each sample/accession ID.
2. `trim_qc.py`: trims reads for quality and interleaves paired reads.
3. `diginorm_mmetsp.py`: runs khmer's normalize-by-median and filter-abund, renames the outputs, and combines orphaned reads.
4. `assembly.py`: runs the Trinity de novo transcriptome assembler.

Additional scripts:

1. `dammit.py`: annotation with dammit, https://github.com/camillescott/dammit/tree/master/dammit
2. `salmon.py`: runs salmon for alignment-free transcript quantification, https://github.com/COMBINE-lab/salmon
3. `rapclust.py`: clusters contigs, https://github.com/COMBINE-lab/rapclust
4. `busco.py`: assesses assembly and annotation completeness with single-copy orthologs, http://busco.ezlab.org/
5. `clusterfunc.py`: cluster control module.
6. `sourmash.py`: computes MinHash signatures to cluster unassembled reads, https://github.com/dib-lab/sourmash/tree/v0.9.4
7. `transdecoder.py`: predicts coding regions in nucleotide contigs and translates them to amino acid sequences, http://transdecoder.github.io/
8. `transrate.py`: evaluates assemblies using the reads, http://hibberdlab.com/transrate/
9. `transrate_reference.py`: evaluates assemblies against a reference assembly, http://hibberdlab.com/transrate/

References:

- Keeling et al. 2014: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001889
- Supporting information with methods description: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001889#s6
- Preliminary assembly protocol run by NCGR: https://github.com/ncgr/rbpa
- NCBI BioProject accession PRJNA231566: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA231566/
- MMETSP website: http://marinemicroeukaryotes.org/
- iMicrobe project with data and combined assembly downloads: http://data.imicrobe.us/project/view/104
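Because every script starts from `SraRunInfo.csv`, a step common to all of them is reading that table and deciding where each sample's files should live. The following is a minimal standard-library sketch of that step. It is not taken from `getdata.py`. The `Run` and `SampleName` column names, the `basedir/sample/run` directory layout, and the helper names `read_runinfo` and `run_dirs` are all assumptions made for illustration.

```python
import csv
import io
import os
from collections import defaultdict


def read_runinfo(handle):
    """Group SRA run accessions by sample name.

    `handle` is any text stream over an SraRunInfo.csv file. This sketch
    assumes the 'Run' and 'SampleName' column names of NCBI RunInfo
    exports; it is not the logic used by getdata.py itself.
    """
    samples = defaultdict(list)
    for row in csv.DictReader(handle):
        run = (row.get("Run") or "").strip()
        # Skip rows without an accession and any repeated header rows.
        if not run or run == "Run":
            continue
        # Fall back to the run accession if SampleName is absent or empty.
        sample = (row.get("SampleName") or "").strip() or run
        samples[sample].append(run)
    return dict(samples)


def run_dirs(samples, basedir):
    """Map each run accession to a hypothetical basedir/sample/run path."""
    return {
        run: os.path.join(basedir, sample, run)
        for sample, runs in samples.items()
        for run in runs
    }


if __name__ == "__main__":
    # Made-up accessions and sample names, for illustration only.
    demo = io.StringIO(
        "Run,SampleName,LibraryLayout\n"
        "SRR0000001,sampleA,PAIRED\n"
        "SRR0000002,sampleA,PAIRED\n"
        "SRR0000003,sampleB,PAIRED\n"
    )
    for run, path in run_dirs(read_runinfo(demo), "mmetsp").items():
        print(run, path)
```

Run as a script, the demo prints each accession next to a path such as `mmetsp/sampleA/SRR0000001` (on POSIX systems). Grouping by sample first keeps multiple runs of the same culture together. The sketch deliberately stops before creating directories or downloading data, which the actual scripts handle.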
Owner
- Name: Camille Scott
- Login: camillescott
- Kind: user
- Location: Davis, CA
- Website: http://www.camillescott.org
- Repositories: 41
- Profile: https://github.com/camillescott
Sys Admin @ucdavis High Performance Compute Core Facility; formerly @dib-lab