https://github.com/bricoletc/backtranslate


Repository

Basic Info
  • Host: GitHub
  • Owner: bricoletc
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 38.1 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed over 5 years ago
Metadata Files
Readme

README.rst

Protein back translation with pairwise distance constraints
============================================================

Usage
``````
Run using `python3 -m src.backTrans`

::

    usage: backTrans.py [-h] (-i INPUT_FILE | -s SEQUENCE) -m {dna,protein}
                        [-d MIN_DIST] [-n NUM_SAMPLES] [-o OUTPUT]
                        [--output_prefix OUT_PREFIX] [--stats_header]
                        [--forbidden FORBIDDEN]

    Backtranslate amino acid sequences with pairwise distance constraints

    optional arguments:
      -h, --help            show this help message and exit
      -i INPUT_FILE, --input_file INPUT_FILE
                            Path to fasta file containing amino acid or DNA
                            sequences
      -s SEQUENCE, --sequence SEQUENCE
                            amino acid sequence passed on command-line
      -m {dna,protein}, --mode {dna,protein}
                            mode to run the tool on: in protein mode, samples
                            backtranslations, in dna mode, uses the dna sequences
                            as samples directly.
      -d MIN_DIST, --min_distance MIN_DIST
                            Minimum distance (Hamming, as fraction of distinct
                            nucleotides) between all returned sequences
      -n NUM_SAMPLES, --num_samples NUM_SAMPLES
                            Maximum number of backtranslated DNA samples to
                            produce
      -o OUTPUT, --output-dir OUTPUT
                            An existing directory where the output will go. This
                            is a fasta of the sampled sequence(s) and a stats file
      --output_prefix OUT_PREFIX
                            prefix for output files
      --stats_header        Prints the header for the stats file
      --forbidden FORBIDDEN
                            File path to DNA sequences that cannot appear in the
                            sample; one sequence per line.


Protein(s)
-----------
Produces DNA sequences compatible with each protein, such that every pair of DNA sequences differs by at least `--min_distance` (Hamming distance, expressed as a fraction of differing nucleotides).

* Provide a single amino acid sequence on the command line: `-s`
* Or a fasta file with one or more protein sequences: `-i` and `-m protein`
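
As an illustration of what protein mode computes, here is a minimal sketch of sampling backtranslations under a pairwise distance constraint. It is not taken from this repository: the function names, the rejection-sampling strategy, and the `max_tries` and `seed` parameters are assumptions made for the example, and it uses the standard genetic code only.

```python
import random
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Reverse mapping: amino acid -> list of synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def translate(dna):
    """Translate a DNA sequence (length divisible by 3) back to protein."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def sample_backtranslations(protein, min_dist, num_samples, max_tries=10_000, seed=None):
    """Hypothetical helper: keep random backtranslations that are at least
    min_dist away from every sequence kept so far (rejection sampling)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_tries):
        if len(kept) >= num_samples:
            break
        candidate = "".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein)
        if all(hamming_fraction(candidate, s) >= min_dist for s in kept):
            kept.append(candidate)
    return kept
```

Calling `sample_backtranslations("MKVLAG", 0.1, 5)` returns at most five DNA sequences that all translate to `MKVLAG`. Fewer may come back when the constraint is hard to satisfy, which is consistent with `--num_samples` being described as a maximum.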

DNA(s)
-------
Finds a set of DNA sequences such that every pair differs by at least ``--min_distance`` Hamming distance.

Provide a fasta file: `-i` and `-m dna`
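
The selection problem in DNA mode can be pictured as a graph problem: connect any two sequences that are closer than the threshold, then pick sequences so that no two picked ones are connected. The sketch below shows one greedy way to do that. It is an assumption-laden illustration, not this tool's implementation: `select_distant` is a hypothetical name, the sequences are assumed to have equal length, and the greedy heuristic does not guarantee the largest possible set.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def select_distant(seqs, min_dist):
    """Hypothetical helper: greedily pick sequences that are pairwise at
    least min_dist apart, visiting the least-conflicting sequences first."""
    # Conflict graph: an edge joins two sequences that are too close.
    conflicts = {i: set() for i in range(len(seqs))}
    for i, j in combinations(range(len(seqs)), 2):
        if hamming_fraction(seqs[i], seqs[j]) < min_dist:
            conflicts[i].add(j)
            conflicts[j].add(i)

    chosen, excluded = [], set()
    for i in sorted(conflicts, key=lambda k: len(conflicts[k])):
        if i not in excluded:
            chosen.append(i)
            excluded |= conflicts[i]  # neighbours can no longer be picked
    return [seqs[i] for i in sorted(chosen)]
```

Visiting low-degree nodes first is a common heuristic for finding a large independent set; an exact solution would need a much more expensive search.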


TODOs
``````

* Try CD-Hit on DNA sequences instead of the graph approach. Notes:

	* CD-Hit produces one representative per cluster, and the pairwise distance threshold is only guaranteed to hold between those representatives
	* cd-hit-est seems to be limited to a minimum identity threshold of 75%

* Faster pairwise distance computation:

	* An approach similar to cd-hit: avoid alignment for sequence pairs whose count of shared short words exceeds a given threshold.
	* A sketching-based approach, e.g. Mash. Con: inexact
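
To make the short-word idea concrete, here is a sketch of one provable variant, restricted to substitution-only (Hamming) distance between equal-length sequences. With `m` mismatches, at most `k * m` of the `length - k + 1` k-mers can be destroyed, so a pair sharing fewer k-mers than that bound must have more than `m` mismatches. This is not CD-Hit's actual implementation, and the function names and parameters are assumptions for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Multiset of all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmers(counts_a, counts_b):
    """Number of k-mers common to both profiles, counted with multiplicity."""
    return sum((counts_a & counts_b).values())


def surely_distant(counts_a, counts_b, length, k, min_mismatches):
    """Hypothetical filter: True only if the two sequences are guaranteed to
    differ at min_mismatches or more positions. False means 'unknown', and
    the exact distance still has to be computed."""
    # A pair with at most (min_mismatches - 1) mismatches shares at least
    # this many k-mers, so sharing fewer proves the pair is far enough apart.
    bound = (length - k + 1) - k * (min_mismatches - 1)
    return shared_kmers(counts_a, counts_b) < bound
```

Each k-mer profile is computed once per sequence and reused for every comparison. The filter only pays off when the exact distance is costly, for example when it requires an alignment.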

Owner

  • Name: Brice Letcher
  • Login: bricoletc
  • Kind: user
  • Company: EMBL-EBI

Bioinformatician and early-career researcher - EMBL-EBI and CNRS ~~~~~~ Parsing my way through DNA sequence data
