random-seqs-ecoli
Pipeline used to analyse frequency changes of random sequences expressed in E. coli. (Fajardo Castro & Tautz, 2021)
Repository
Basic Info
- Host: GitHub
- Owner: johanafc
- Language: R
- Default Branch: main
- Size: 32.2 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Pipeline used for the analysis of amplicon sequencing data generated from a library of random sequences in E. coli
The scripts in this repository were used for the analysis of amplicon sequencing data as described in "Castro JF, Tautz D. The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells. Genes (Basel). 2021 Nov 28;12(12):1913. doi: 10.3390/genes12121913. PMID: 34946861; PMCID: PMC8702183."
The data for each experiment were saved in a separate folder labelled "exp-X", where "X" is an integer from 1 to 9.
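As a small illustration of the folder layout described above, the following R sketch builds the expected folder names and reports which ones exist. The variable `base_dir` is a hypothetical location for the data and is not taken from the repository.

```r
# Expected folder names exp-1 ... exp-9 (naming taken from the README).
exp_dirs <- paste0("exp-", 1:9)

# base_dir is a hypothetical data location; adjust it to your setup.
base_dir <- "."
exp_paths <- file.path(base_dir, exp_dirs)

# Logical vector, named by folder, showing which folders are present.
present <- dir.exists(exp_paths)
names(present) <- exp_dirs
print(present)
```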
The pipeline consists of the following steps:
- Extraction of random sequences between the sequencing primers.
- Generation of the database of unique sequences present in the sequencing data.
- Description of sequence features (length, GC content, ISD) for each sequence in the database.
- Mapping the sequencing data to the database to get count tables.
- Calculating frequency changes for each sequence in the database.
- Cross-comparison of the results obtained for each individual experiment.
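The steps above can be sketched on toy data in base R. This is a minimal illustration, not the repository's scripts: the read vectors, the sample names `t0` and `t1`, the exact-match counting (in place of mapping with an external tool), and the pseudocount-based log2 ratio are all assumptions made for the example.

```r
# Toy reads standing in for sequences already extracted between the primers.
reads_t0 <- c("ATGCGC", "ATGCGC", "TTTTAA", "GGGCCC")
reads_t1 <- c("ATGCGC", "TTTTAA", "TTTTAA", "TTTTAA")

# Database of unique sequences seen in any sample.
db <- sort(unique(c(reads_t0, reads_t1)))

# Simple per-sequence features: length and GC fraction.
gc_fraction <- function(s) {
  bases <- strsplit(toupper(s), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}
features <- data.frame(
  sequence = db,
  length = nchar(db),
  gc = vapply(db, gc_fraction, numeric(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Count table: exact matching of reads against the database (an assumption;
# a real pipeline would typically use a dedicated mapping tool).
count_reads <- function(reads, db) {
  as.integer(table(factor(reads, levels = db)))
}
counts <- data.frame(
  sequence = db,
  t0 = count_reads(reads_t0, db),
  t1 = count_reads(reads_t1, db),
  stringsAsFactors = FALSE
)

# Frequency change: log2 ratio of relative frequencies, with a pseudocount
# of 1 (an illustrative choice) so that zero counts stay finite.
pseudo <- 1
rel_freq <- function(x) (x + pseudo) / sum(x + pseudo)
counts$log2_change <- log2(rel_freq(counts$t1) / rel_freq(counts$t0))

print(merge(features, counts, by = "sequence"))
```

A positive `log2_change` means the sequence became relatively more frequent between the two samples, and a negative value means it became less frequent. Real data with replicates would usually call for a statistical model rather than a plain ratio.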
Dependencies: This pipeline uses several third-party tools, which must be installed and functional:
- Trimmomatic (v 0.36)
- USEARCH10
- getorf
- transeq
- faToTab
- seqtk
- IUpred2
Additionally, protein aggregation propensities in step 2 were calculated using an online tool: PASTA2.0 http://old.protein.bio.unipd.it/pasta2/ (Ian Walsh, Flavio Seno, Silvio C.E. Tosatto and Antonio Trovato. PASTA2: An improved server for protein aggregation prediction. Nucleic Acids Research, 2014 Jul;42(Web Server issue):W301-7.)
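One way to check whether the command-line dependencies are reachable is to look them up on the `PATH` from R. The executable names below are assumptions and may differ on your system. Trimmomatic and IUpred2 are left out because they are often distributed as a Java archive and a Python script rather than as a plain executable.

```r
# Assumed executable names; adjust them to match your installation.
tools <- c("usearch10", "getorf", "transeq", "faToTab", "seqtk")

# Sys.which() returns the full path of each tool, or "" if it is not found.
found <- Sys.which(tools)

missing_tools <- names(found)[found == ""]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All listed tools were found on PATH.")
}
```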
Citation (CITATION.cff)
cff-version: 1.2.0
message: ""
authors:
- family-names: Fajardo Castro
given-names: Johana
orcid: https://orcid.org/0000-0002-4472-6204
- family-names: Tautz
given-names: Diethard
orcid: https://orcid.org/0000-0002-0460-5344
title: "The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells"
doi: https://doi.org/10.1101/2021.11.22.469569
date-released: 2021-11-28