random-seqs-ecoli

Pipeline used to analyse frequency changes of random sequences expressed in E. coli. (Fajardo Castro & Tautz, 2021)

https://github.com/johanafc/random-seqs-ecoli

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (6.5%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: johanafc
Language: R
Default Branch: main
Size: 32.2 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 4 years ago · Last pushed almost 4 years ago

Metadata Files

Readme Citation

Pipeline used for the analysis of amplicon sequencing data generated from a library of random sequences in E. coli

The scripts in this repository were used for the analysis of amplicon sequencing data as described in "Castro JF, Tautz D. The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells. Genes (Basel). 2021 Nov 28;12(12):1913. doi: 10.3390/genes12121913. PMID: 34946861; PMCID: PMC8702183."

The available data for each experiment are stored in separate folders labelled "exp-X", with "X" being an integer from 1 to 9.
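As a small illustration of this folder layout, the following Python sketch lists the expected experiment folders and reports which ones are present. The function name `experiment_dirs` and the assumption that the folders sit directly under the working directory are hypothetical and not taken from the repository.

```python
from pathlib import Path


def experiment_dirs(root="."):
    """Return the expected experiment folders exp-1 ... exp-9 under root."""
    return [Path(root) / f"exp-{i}" for i in range(1, 10)]


# Report which of the expected folders exist in the current directory.
for folder in experiment_dirs():
    status = "found" if folder.is_dir() else "missing"
    print(f"{folder}: {status}")
```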

The pipeline consists of the following steps:

Extraction of random sequences between the sequencing primers.
Generation of the database of unique sequences present in the sequencing data.
Description of sequence features (length, GC content, ISD) for each sequence in the database.
Mapping the sequencing data to the database to get count tables.
Calculating frequency changes for each sequence in the database.
Cross-comparison of the results obtained for each individual experiment.
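The logic behind several of these steps (extraction between primers, building the set of unique sequences, simple sequence features, counting, and frequency changes) can be illustrated with a minimal Python sketch. This is not the repository's code: the actual pipeline relies on the external tools listed under Dependencies. The primer sequences, function names, the pseudocount, and the use of a log2 ratio of relative frequencies as the measure of frequency change are all assumptions made for illustration.

```python
import math
from collections import Counter


def extract_insert(read, left_primer, right_primer):
    """Return the sequence between the two primers, or None if either is missing."""
    start = read.find(left_primer)
    if start == -1:
        return None
    start += len(left_primer)
    end = read.find(right_primer, start)
    if end == -1:
        return None
    return read[start:end]


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def count_inserts(reads, left_primer, right_primer):
    """Count how often each extracted insert occurs in one sample."""
    counts = Counter()
    for read in reads:
        insert = extract_insert(read.upper(), left_primer, right_primer)
        if insert:  # skip reads without both primers or with an empty insert
            counts[insert] += 1
    return counts


def log2_frequency_change(before, after, pseudocount=1.0):
    """log2 ratio of relative frequencies (after / before) for every sequence.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for sequences that are absent from one of the two samples.
    """
    database = sorted(set(before) | set(after))  # unique sequences
    total_before = sum(before.values()) + pseudocount * len(database)
    total_after = sum(after.values()) + pseudocount * len(database)
    changes = {}
    for seq in database:
        f_before = (before.get(seq, 0) + pseudocount) / total_before
        f_after = (after.get(seq, 0) + pseudocount) / total_after
        changes[seq] = math.log2(f_after / f_before)
    return changes


# Toy example with hypothetical primers and two hypothetical inserts.
LEFT, RIGHT = "AAGCTT", "GGATCC"
read_a = LEFT + "ATGCGC" + RIGHT
read_b = LEFT + "ATATAT" + RIGHT

counts_start = count_inserts([read_a, read_a, read_b, read_b], LEFT, RIGHT)
counts_end = count_inserts([read_a, read_a, read_a, read_b], LEFT, RIGHT)

for seq, change in log2_frequency_change(counts_start, counts_end).items():
    print(seq, len(seq), round(gc_content(seq), 2), round(change, 3))
```

In this toy data set the first insert becomes more frequent between the two samples and the second becomes rarer, so their log2 changes have opposite signs.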

Dependencies: This pipeline relies on several third-party tools, which must be installed and functional:
- Trimmomatic (v 0.36)
- USEARCH10
- getorf
- transeq
- faToTab
- seqtk
- IUpred2

Additionally, protein aggregation propensities in step 2 were calculated using an online tool: PASTA2.0 http://old.protein.bio.unipd.it/pasta2/ (Ian Walsh, Flavio Seno, Silvio C.E. Tosatto and Antonio Trovato. PASTA2: An improved server for protein aggregation prediction. Nucleic Acids Research, 2014 Jul;42(Web Server issue):W301-7.)
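Before running the pipeline, it can help to confirm that the command-line dependencies are reachable on the `PATH`. The sketch below uses Python's standard `shutil.which` for this. The executable names in `TOOLS` are assumptions and may differ from the names used by a local installation (the USEARCH binary in particular often carries a version and platform suffix). Trimmomatic is typically distributed as a Java archive and IUpred2 as a script, so both would need a separate check.

```python
import shutil

# Hypothetical executable names; adjust them to match the local installation.
TOOLS = ["usearch10", "getorf", "transeq", "faToTab", "seqtk"]


def missing_tools(names, which=shutil.which):
    """Return the names for which no executable is found on PATH."""
    return [name for name in names if which(name) is None]


missing = missing_tools(TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

The `which` parameter exists only so that the lookup can be replaced in tests; in normal use the default is sufficient.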

Citation (CITATION.cff)

cff-version: 1.2.0
message: ""
authors:
  - family-names: Fajardo Castro
    given-names: Johana
    orcid: https://orcid.org/0000-0002-4472-6204
  - family-names: Tautz
    given-names: Diethard
    orcid: https://orcid.org/0000-0002-0460-5344
title: "The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells"
doi: https://doi.org/10.1101/2021.11.22.469569
date-released: 2021-11-28
