hgtsim
A simulator for horizontal gene transfer (HGT) in microbial communities

Publication
- Song W, Steensen K, Thomas T. (2017) HgtSIM: a simulator for horizontal gene transfer (HGT) in microbial communities. PeerJ 5:e4015 https://doi.org/10.7717/peerj.4015
- Contact: Weizhi Song (songwz03@gmail.com), Torsten Thomas (t.thomas@unsw.edu.au)
- Affiliation: The Centre for Marine Bio-Innovation (CMB), The University of New South Wales, Sydney, Australia
Workflow

Dependencies
Change Log
2019-01-06:
- HgtSIM can be installed with "pip3 install HgtSIM" now.
2018-04-06:
- combined the '-mixed', '-mini' and '-maxi' options into one: '-mixed min-max'.
2017-09-16:
- added support for draft genomes.
- added support for dynamic flanking sequences.
- added support for the 'mixed' mode.
- added support for the 'keep_cds' option.
To-do
- run Prodigal if "-keep_cds" was specified
- check Ns in provided gene sequences
- check whether provided sequences to transfer are ORFs, exit if not
Installation
HgtSIM is implemented in Python 3. You can install it with:

    pip3 install HgtSIM

HgtSIM requires BLAST+. You can either add it to your system path or specify the full paths to the "blastn" and "blastp" executables with the "-blastn" and "-blastp" options.
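Before a run, it can help to confirm that the BLAST+ executables are actually reachable. The snippet below is a minimal sketch using only the Python standard library; the helper name `find_blast_executables` is hypothetical and is not part of HgtSIM.

```python
import shutil


def find_blast_executables(blastn="blastn", blastp="blastp"):
    """Resolve both executables; a value of None means the tool was not found.

    Each argument may be a bare command name (looked up on the system path)
    or a full path to the executable.
    """
    return {"blastn": shutil.which(blastn), "blastp": shutil.which(blastp)}


if __name__ == "__main__":
    for tool, path in find_blast_executables().items():
        if path is None:
            print(f"{tool}: not found, pass its full path with -{tool}")
        else:
            print(f"{tool}: {path}")
```

If either tool is reported as missing, its full path can be passed to HgtSIM with "-blastn" or "-blastp" as described above.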
Help information
HgtSIM -h
-t sequences of genes to be transferred (multi-fasta format)
-i mutation level
-d distribution of transfers to the recipient genomes
-f folder holds recipient genomes
-r ratio of mutation types
-x file extension of recipient genomes
-lf left end flanking sequences
-rf right end flanking sequences
-mixed randomly assign mutation levels between specified values, parameter format: min-max
-keep_cds insert transfers only to non-coding regions, need the annotation files (in gbk format) of recipient genomes
-a folder holds the annotation files (in gbk format) of recipient genomes
-l minimum length of intergenic region to be considered for insertion
-blastn path to blastn executable, default: blastn
-blastp path to blastp executable, default: blastp
Input files and arguments
- Sequences of genes to be transferred (in multi-fasta format).
- A folder holding all recipient genomes, one file per genome.
The mutation level of genes to be transferred. This can be specified either as a fixed value or as a range (the 'mixed' mode). If the '-mixed' argument is provided, HgtSIM will randomly select a value between the user-specified minimum and maximum mutation levels for each gene transfer.
    # with a fixed mutation level (e.g. 10%)
    HgtSIM -t genes.fasta -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna -i 10
    # with 'mixed' mode (e.g. 5-25%)
    HgtSIM -t genes.fasta -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna -mixed 5-25

The ratio of mutation categories (separated with dashes). The default setting is '1-0-1-1'. Please refer to the publication (http://dx.doi.org/10.7717/peerj.4015) or the figure below for its setting.
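To make the two argument formats concrete, here is a rough sketch of how a 'min-max' range and a dash-separated ratio could be interpreted. It is an illustration only: the function names are hypothetical, and the assumptions that levels are drawn uniformly as floating-point values and that the ratio is normalised to fractions are not taken from the HgtSIM source.

```python
import random


def parse_mixed_range(value):
    """Split a 'min-max' string such as '5-25' into (minimum, maximum)."""
    low_text, high_text = value.split("-")
    low, high = float(low_text), float(high_text)
    if low > high:
        raise ValueError(f"minimum {low} exceeds maximum {high}")
    return low, high


def draw_mutation_levels(gene_ids, mixed, seed=None):
    """Assign each gene a level drawn uniformly from the range (assumption)."""
    low, high = parse_mixed_range(mixed)
    rng = random.Random(seed)
    return {gene_id: rng.uniform(low, high) for gene_id in gene_ids}


def parse_ratio(value):
    """Turn a ratio such as '1-0-1-1' into fractions that sum to 1."""
    weights = [int(part) for part in value.split("-")]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return [weight / total for weight in weights]


if __name__ == "__main__":
    levels = draw_mutation_levels(["AAM_03063", "AKV_01007"], "5-25", seed=1)
    for gene_id, level in levels.items():
        print(f"{gene_id}: {level:.1f}%")
    print(parse_ratio("1-0-1-1"))
```

Passing a seed keeps the drawn levels reproducible between runs of the sketch, which is convenient when comparing simulated datasets.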

The distribution of transfers to the recipient genomes. In each line, the first column is the recipient genome (without file extension), followed by a comma-separated list of the genes to be transferred into it.
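A file in this format is simple to read or sanity-check before a run. The following is a minimal sketch, not HgtSIM's own parser: the function name `parse_distribution` is hypothetical, and rejecting duplicate genome IDs is an assumption made here for safety.

```python
def parse_distribution(lines):
    """Map each recipient genome ID to the list of gene IDs assigned to it."""
    distribution = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        genome_id, gene_ids = fields[0], fields[1:]
        if genome_id in distribution:
            raise ValueError(f"recipient genome listed twice: {genome_id}")
        distribution[genome_id] = gene_ids
    return distribution


if __name__ == "__main__":
    example = [
        "BAD,AAM_03063,AKV_01007",
        "BDS,AAM_00175,AKV_00943",
    ]
    for genome_id, gene_ids in parse_distribution(example).items():
        print(genome_id, "receives", len(gene_ids), "genes")
```

To check a real file, the same function can be given an open file handle, for example `parse_distribution(open("distribution.txt"))`.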
    BAD,AAM_03063,AKV_01007,AMAC_01196,AMAU_02632,AMS_01785
    BDS,AAM_00175,AKV_00943,AMAC_00215,AMAU_02085,AMS_01465
    BGC,AAM_00176,AKV_01272,AMAC_01576,AMAU_00617,AMS_02653
    BHS,AAM_00195,AKV_01273,AMAC_01674,AMAU_05963,AMS_03303
    BNM,AAM_00209,AKV_00282,AMAC_02914,AMAU_02414,AMS_03378
    BRT,AAM_00308,AKV_02353,AMAC_03303,AMAU_00830,AMS_01655

The flanking sequences to be added to the ends of gene transfers. These can be specified with '-lf' and '-rf'; the default value is None.
    # introduce gene transfers without adding flanking sequences
    HgtSIM -t genes.fasta -i 10 -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna
    # or, add the same pair of flanking sequences (e.g. 'TAGATGAGTGATTAGTTAGTTA') to all gene transfers
    HgtSIM -t genes.fasta -i 10 -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna -lf TAGATGAGTGATTAGTTAGTTA -rf TAGATGAGTGATTAGTTAGTTA
    # or, add flanking sequences dynamically to the two ends of each gene transfer
    HgtSIM -t genes.fasta -i 10 -d distribution.txt -f input_genomes -r 1-0-1-1 -x fna -lf lf.fasta -rf rf.fasta

If you want to add flanking sequences dynamically to the gene transfers, you can specify the left and right side sequences in two multi-fasta files. The IDs of the flanking sequences must exactly match the IDs of their corresponding gene transfers.
As an illustration, suppose you have four transfers (transfer_A, transfer_B, transfer_C and transfer_D) and you have provided the following two files:
lf.fasta

    >transfer_A
    AAAAAAAAAA
    >transfer_B
    TTT

rf.fasta

    >transfer_A
    GGGGGGG
    >transfer_C
    CCCCC

HgtSIM will then:
- add 'AAAAAAAAAA' to the left and 'GGGGGGG' to the right end of transfer_A;
- add 'TTT' to the left and nothing to the right end of transfer_B;
- add nothing to the left and 'CCCCC' to the right end of transfer_C;
- add nothing to either end of transfer_D.
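The ID-matching rule illustrated by this example can be sketched in a few lines. This is an illustration rather than HgtSIM's implementation: `parse_fasta` and `add_flanks` are hypothetical helpers, and the gene sequences used here are short placeholders.

```python
def parse_fasta(text):
    """Parse multi-fasta text into a dict of {sequence ID: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]  # ID is the first word of the header
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records


def add_flanks(transfers, left_flanks, right_flanks):
    """Attach flanks by ID; a transfer without a matching ID gets nothing on that side."""
    return {
        transfer_id: left_flanks.get(transfer_id, "") + sequence + right_flanks.get(transfer_id, "")
        for transfer_id, sequence in transfers.items()
    }


if __name__ == "__main__":
    left = parse_fasta(">transfer_A\nAAAAAAAAAA\n>transfer_B\nTTT\n")
    right = parse_fasta(">transfer_A\nGGGGGGG\n>transfer_C\nCCCCC\n")
    # Placeholder gene sequences, for illustration only.
    genes = {name: "ATGTAA" for name in ("transfer_A", "transfer_B", "transfer_C", "transfer_D")}
    for transfer_id, sequence in add_flanks(genes, left, right).items():
        print(transfer_id, sequence)
```

Looking flanks up with a default of an empty string is what makes a missing ID behave as "add nothing", matching the four cases listed above.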
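Transfers can be restricted to intergenic regions by specifying the '-keep_cds' option. This option requires the annotation files (in GenBank format) of the recipient genomes, whose folder is given with '-a'; the minimum length of an intergenic region to be considered for insertion is set with '-l'.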
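As a rough picture of what selecting intergenic regions involves, the sketch below finds the gaps between annotated coding sequences that are at least a minimum length. It makes several assumptions that are not taken from HgtSIM: the function name is hypothetical, coordinates are 0-based and half-open, the sequence is treated as linear, and the CDS coordinates are typed in by hand rather than read from a GenBank file.

```python
def intergenic_regions(sequence_length, cds_intervals, min_length):
    """Return (start, end) gaps between CDS intervals that span at least min_length bases.

    Overlapping or nested CDS intervals are handled by tracking the furthest
    CDS end seen so far.
    """
    regions = []
    cursor = 0  # end of the last coding region seen so far
    for start, end in sorted(cds_intervals):
        if start > cursor and start - cursor >= min_length:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if sequence_length > cursor and sequence_length - cursor >= min_length:
        regions.append((cursor, sequence_length))
    return regions


if __name__ == "__main__":
    # Hypothetical CDS coordinates on a 100 bp sequence.
    cds = [(10, 30), (25, 50), (80, 95)]
    for start, end in intergenic_regions(100, cds, min_length=10):
        print(f"candidate insertion region: {start}-{end}")
```

In this toy input the final 5 bp after the last CDS are skipped because they fall below the minimum length, which is the role the '-l' option plays above.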
Output files
- Produced genomes with transferred genes, which are placed in the folder 'Genomes_with_transfers'.
- The amino acid sequences of input genes to be transferred.
- The nucleotide and amino acid sequences of mutated input genes.
- The mutation report file, which includes two parts:
  - at the top, the nucleotide (nc) and amino acid (aa) identities between the input and mutated sequences for each transfer;
  - followed by a summary of the changed nucleotide bases for each transfer.
- The insertion report file.
Packages
pypi.org: hgtsim
a simulator for HGT in microbial communities
- Homepage: https://github.com/songweizhi/HgtSIM
- Documentation: https://hgtsim.readthedocs.io/
- License: GPL3+
- Latest release: 1.1.0, published about 7 years ago
Dependencies
- biopython *