hgtsim

A simulator for horizontal gene transfer (HGT) in microbial communities

https://github.com/songweizhi/hgtsim

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Keywords

bioinformatics hgt horizontal-gene-transfer lateral-gene-transfer lgt metagenomics simulator
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: songweizhi
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.45 GB
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created over 8 years ago · Last pushed over 5 years ago
Metadata Files
Readme License

README.md

Publication

  • Song W, Steensen K, Thomas T. (2017) HgtSIM: a simulator for horizontal gene transfer (HGT) in microbial communities. PeerJ 5:e4015 https://doi.org/10.7717/peerj.4015
  • Contact: Weizhi Song (songwz03@gmail.com), Torsten Thomas (t.thomas@unsw.edu.au)
  • Affiliation: The Centre for Marine Bio-Innovation (CMB), The University of New South Wales, Sydney, Australia

Workflow

[Figure: HgtSIM workflow]

Dependencies

  • BLAST+ (blastn and blastp)
  • Biopython

Change Log

  • 2019-01-06:

    • HgtSIM can be installed with "pip3 install HgtSIM" now.
  • 2018-04-06:

    • combined the '-mixed', '-mini' and '-maxi' options into one: '-mixed min-max'.
  • 2017-09-16:

    • add support for draft genomes.
    • add support for dynamic flanking sequences.
    • add support for the 'mixed' mode.
    • add support for the 'keep_cds' option.

To-do

  • run Prodigal if "-keep_cds" was specified
  • check Ns in provided gene sequences
  • check whether provided sequences to transfer are ORFs, exit if not

Installation

  • HgtSIM is implemented in Python 3 and can be installed with:

    pip3 install HgtSIM
    
  • HgtSIM requires BLAST+. You can either add it to your system path or specify the full paths to the "blastn" and "blastp" executables with the options "-blastn" and "-blastp".
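
Before running HgtSIM, it can help to confirm that the BLAST+ executables are reachable. The following is a minimal sketch, not part of HgtSIM: the helper names `find_blast_executables` and `missing_executables` are hypothetical, and the check relies only on Python's standard `shutil.which`.

```python
import shutil


def find_blast_executables(blastn="blastn", blastp="blastp"):
    """Resolve the two BLAST+ executables.

    Each argument may be a bare command name (looked up on the system path)
    or a full path, mirroring the "-blastn" / "-blastp" options.
    Returns a dict mapping tool name to resolved path, or None if not found.
    """
    return {"blastn": shutil.which(blastn), "blastp": shutil.which(blastp)}


def missing_executables(resolved):
    """Return the sorted names of tools that could not be resolved."""
    return sorted(name for name, path in resolved.items() if path is None)


if __name__ == "__main__":
    resolved = find_blast_executables()
    missing = missing_executables(resolved)
    if missing:
        print("Not found on the system path:", ", ".join(missing))
    else:
        print("BLAST+ executables found:", resolved)
```

If a tool is reported as missing, either extend the system path or pass its full path to HgtSIM with "-blastn" or "-blastp".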

Help information

    HgtSIM -h

      -t          sequences of genes to be transferred (multi-fasta format)
      -i          mutation level
      -d          distribution of transfers to the recipient genomes
      -f          folder holds recipient genomes
      -r          ratio of mutation types
      -x          file extension of recipient genomes
      -lf         left end flanking sequences
      -rf         right end flanking sequences
      -mixed      randomly assign mutation levels between specified values, parameter format: min-max
      -keep_cds   insert transfers only to non-coding regions, need the annotation files (in gbk format) of recipient genomes
      -a          folder holds the annotation files (in gbk format) of recipient genomes
      -l          minimum length of intergenic region to be considered for insertion
      -blastn     path to blastn executable, default: blastn
      -blastp     path to blastp executable, default: blastp
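
When HgtSIM is driven from a script, the options above can be assembled into an argument list. The sketch below is illustrative only: `build_hgtsim_command` is a hypothetical helper, it uses only flags shown in the help text, and it assumes that '-i' and '-mixed' are alternatives (as the usage examples in this README suggest) rather than options to be combined.

```python
import shlex


def build_hgtsim_command(transfers, distribution, genome_folder, extension,
                         ratio="1-0-1-1", mutation_level=None, mixed=None,
                         left_flank=None, right_flank=None):
    """Build an HgtSIM argument list from the documented options.

    Exactly one of `mutation_level` (fixed value, passed to -i) or
    `mixed` (a (min, max) tuple, passed to -mixed as "min-max") is required.
    This exclusivity is an assumption made for this sketch.
    """
    if (mutation_level is None) == (mixed is None):
        raise ValueError("specify exactly one of mutation_level or mixed")

    cmd = ["HgtSIM", "-t", transfers, "-d", distribution,
           "-f", genome_folder, "-r", ratio, "-x", extension]

    if mixed is not None:
        low, high = mixed
        if low > high:
            raise ValueError("mixed range must be given as (min, max)")
        cmd += ["-mixed", f"{low}-{high}"]
    else:
        cmd += ["-i", str(mutation_level)]

    # Flanking sequences are optional; each may be a literal sequence
    # or the path to a multi-fasta file.
    if left_flank:
        cmd += ["-lf", left_flank]
    if right_flank:
        cmd += ["-rf", right_flank]
    return cmd


if __name__ == "__main__":
    cmd = build_hgtsim_command("genes.fasta", "distribution.txt",
                               "input_genomes", "fna", mixed=(5, 25))
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once HgtSIM and BLAST+ are installed.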

Input files and arguments

  1. Sequences of genes to be transferred (in multi-fasta format).
  2. A folder holding all recipient genomes, one file per genome.
  3. The mutation level of the genes to be transferred. This can be specified either as a fixed value ('-i') or as a range (the 'mixed' mode, '-mixed min-max'). In 'mixed' mode, HgtSIM randomly selects a value between the user-specified minimum and maximum mutation levels for each gene transfer.

    # with fixed mutation level (e.g. 10%).
    HgtSIM -t genes.fasta -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna -i 10
    
    # with 'mixed' mode (e.g. 5-25%)
    HgtSIM -t genes.fasta -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna -mixed 5-25
    
  4. The ratio of mutation categories (separated by dashes). The default setting is '1-0-1-1'. Please refer to the publication (http://dx.doi.org/10.7717/peerj.4015) or the figure below for how to set it.

    [Figure: selection of the mutation category ratio]

  5. The distribution of transfers to the recipient genomes. The first column gives the recipient genome (file name without extension), followed by a comma-separated list of the genes to be transferred into it.

    BAD,AAM_03063,AKV_01007,AMAC_01196,AMAU_02632,AMS_01785
    BDS,AAM_00175,AKV_00943,AMAC_00215,AMAU_02085,AMS_01465
    BGC,AAM_00176,AKV_01272,AMAC_01576,AMAU_00617,AMS_02653
    BHS,AAM_00195,AKV_01273,AMAC_01674,AMAU_05963,AMS_03303
    BNM,AAM_00209,AKV_00282,AMAC_02914,AMAU_02414,AMS_03378
    BRT,AAM_00308,AKV_02353,AMAC_03303,AMAU_00830,AMS_01655
    
  6. The flanking sequences to be added to the ends of gene transfers. These can be specified with '-lf' and '-rf'; the default value is None.

    # introduce gene transfers without adding flanking sequences
    HgtSIM -t genes.fasta -i 10 -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna
    
    # or, add same pair of flanking sequences (e.g. 'TAGATGAGTGATTAGTTAGTTA') to all gene transfers
    HgtSIM -t genes.fasta -i 10 -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna -lf TAGATGAGTGATTAGTTAGTTA -rf TAGATGAGTGATTAGTTAGTTA
    
    # or, add flanking sequences dynamically to the two ends of each gene transfer
    HgtSIM -t genes.fasta -i 10 -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna -lf lf.fasta -rf rf.fasta
    

    To add flanking sequences dynamically, provide the left and right flanking sequences in two multi-fasta files. The ID of each flanking sequence must exactly match the ID of its corresponding gene transfer.

    As an illustration, suppose you have four transfers (transfer_A, transfer_B, transfer_C and transfer_D) and provide the following two files:

    lf.fasta

    >transfer_A
    AAAAAAAAAA
    >transfer_B
    TTT
    

    rf.fasta

    >transfer_A
    GGGGGGG
    >transfer_C
    CCCCC
    

    HgtSIM will then:

    1. add 'AAAAAAAAAA' to the left and 'GGGGGGG' to the right end of transfer_A;
    2. add 'TTT' to the left and nothing to the right end of transfer_B;
    3. add nothing to the left and 'CCCCC' to the right end of transfer_C;
    4. add nothing to either end of transfer_D.
  7. Transfers can be restricted to intergenic regions by specifying the 'keep_cds' option. The annotation files (in GenBank format) of the recipient genomes are needed to enable this option.
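
The distribution file format (item 5) and the ID matching rule for dynamic flanking sequences (item 6) can be sketched in a few lines of Python. This is only an illustration of the rules as described above, not HgtSIM's own code: the function names are hypothetical, and a tiny hand-written FASTA reader stands in for a proper parser so that the example has no dependencies.

```python
def parse_distribution(lines):
    """Map each recipient genome ID to the list of genes it should receive.

    Each non-empty line is: genome_id,gene_1,gene_2,...
    """
    distribution = {}
    for line in lines:
        fields = [field.strip() for field in line.strip().split(",")]
        if not fields[0]:
            continue  # skip blank lines
        distribution[fields[0]] = [f for f in fields[1:] if f]
    return distribution


def parse_fasta(text):
    """Minimal multi-fasta reader returning {sequence_id: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def resolve_flanks(transfer_ids, left, right):
    """Pair each transfer with its (left, right) flanks.

    A transfer whose ID is absent from a flank file gets an empty
    string on that side, i.e. nothing is added.
    """
    return {tid: (left.get(tid, ""), right.get(tid, "")) for tid in transfer_ids}


if __name__ == "__main__":
    lf = parse_fasta(">transfer_A\nAAAAAAAAAA\n>transfer_B\nTTT\n")
    rf = parse_fasta(">transfer_A\nGGGGGGG\n>transfer_C\nCCCCC\n")
    ids = ["transfer_A", "transfer_B", "transfer_C", "transfer_D"]
    for tid, (left, right) in resolve_flanks(ids, lf, rf).items():
        print(tid, repr(left), repr(right))
```

Run on the lf.fasta and rf.fasta contents from the illustration, `resolve_flanks` gives the same four pairings listed under item 6.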

Output files

  1. The recipient genomes with the transferred genes inserted, placed in the folder 'Genomes_with_transfers'.
  2. The amino acid sequences of input genes to be transferred.
  3. The nucleotide and amino acid sequences of mutated input genes.
  4. The mutation report file, which includes two parts:
    1. at the top, the nucleotide (nc) and amino acid (aa) identities between the input and mutated sequences for each transfer;
    2. below that, a summary of the changed nucleotide bases for each transfer.
  5. The insertion report file.
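
For intuition about the identity values in the mutation report, the sketch below computes a simple percent identity between an input sequence and its mutated counterpart. It is an assumption-laden illustration: it presumes the two sequences have equal length and differ only by substitutions, and it is not necessarily how HgtSIM derives the reported values (HgtSIM relies on BLAST+). The function name `percent_identity` is hypothetical.

```python
def percent_identity(original, mutated):
    """Percentage of positions at which two equal-length sequences match.

    Comparison is case-insensitive. Raises ValueError for empty or
    unequal-length inputs, since this sketch does not align sequences.
    """
    if not original or len(original) != len(mutated):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(original.upper(), mutated.upper()))
    return 100.0 * matches / len(original)


if __name__ == "__main__":
    # One substitution in ten bases.
    print(percent_identity("ATGCATGCAT", "ATGCATGCAA"))
```

The same function works on amino acid sequences, since it only compares characters position by position.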

Owner

  • Name: Weizhi Song
  • Login: songweizhi
  • Kind: user
  • Location: Hong Kong
  • Company: The Chinese University of Hong Kong

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 57
  • Total Committers: 1
  • Avg Commits per committer: 57.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
songweizhi s****3@g****m 57

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: 9 months
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 3.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mjw349 (1)
  • Zjianglin (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 13 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 1
  • Total maintainers: 1
pypi.org: hgtsim

a simulator for HGT in microbial communities

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 13 Last month
Rankings
Dependent packages count: 10.0%
Stargazers count: 18.5%
Dependent repos count: 21.7%
Forks count: 22.6%
Average: 23.0%
Downloads: 42.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

HgtSIM.egg-info/requires.txt pypi
  • biopython *