pangenome-gene-transfer-simulation

Simulating and estimating the effect of gene transfer on bacterial pangenomes.

https://github.com/not-a-feature/pangenome-gene-transfer-simulation

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

bioinformatics msprime thesis-project tskit

Repository

Basic Info
  • Host: GitHub
  • Owner: not-a-feature
  • License: gpl-3.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 23.1 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Topics
bioinformatics msprime thesis-project tskit
Created almost 2 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

Simulating and estimating the effect of gene transfer on bacterial pangenomes

> [!NOTE]
> - Master Thesis Bioinformatics at the University of Tübingen
> - Thesis period: 01.12.2023 - 01.06.2024
> - DOI: 10.13140/RG.2.2.13490.52165

Horizontal gene transfer (HGT) plays a significant role in shaping the genetic landscape of bacterial populations. In contrast to the more common vertical gene transfer, horizontal gene transfer allows the lateral exchange of genes. To study the impact of HGT on bacterial gene frequency spectra, we have extended existing mutation models within the open-source software msprime [^1] [^2] by incorporating a gene gain and loss model using the Infinitely Many Genes model [^3] approach. The ancestry and mutation simulation is then extended to support HGT events. Additionally, the model is adjusted to fix its otherwise random ancestry simulation to specified trees, which is essential for parameter estimation and fitting the simulation to real data. We then develop an innovative simulation-based testing framework to determine whether a gene frequency spectrum results from neutral evolution. Finally, this framework is validated, and real-world parameters are estimated using pangenome data.
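To make the term "gene frequency spectrum" (GFS) concrete, here is a minimal sketch, not taken from the repository: entry k of the spectrum counts the genes present in exactly k of the n sampled genomes. The function name `gene_frequency_spectrum` and the toy presence/absence matrix are hypothetical and serve illustration only.

```python
def gene_frequency_spectrum(matrix):
    """Compute the GFS of a binary presence/absence matrix.

    `matrix` is a list of genomes (rows); each row lists 0/1 flags,
    one per gene (columns). Returns a list `gfs` of length n, where
    gfs[k - 1] is the number of genes present in exactly k genomes.
    Genes absent from every genome are ignored.
    """
    n = len(matrix)
    gfs = [0] * n
    # zip(*matrix) iterates over columns, i.e. over genes.
    for gene_column in zip(*matrix):
        k = sum(gene_column)
        if k > 0:
            gfs[k - 1] += 1
    return gfs


# Hypothetical toy data: 3 genomes, 5 gene families.
matrix = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0],
]
gfs = gene_frequency_spectrum(matrix)
print(gfs)  # [1, 2, 1]: one singleton, two genes in 2 genomes, one core gene
```

In this toy example the last entry corresponds to genes shared by all genomes, and the first to genes found in a single genome.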

> [!TIP]
> A ready-to-use Jupyter Notebook with examples can be found here: example_usage.ipynb

Overview

The repository is structured as follows:

| Filename                               | Description                                                   |
| -------------------------------------- | ------------------------------------------------------------- |
| condaenv.yml                           | Conda environment with all required software packages.        |
| genemodel.py                           | Main code for the gene gain / loss simulation.                |
| gfs.py                                 | Utility functions for analysing / modifying GFS.              |
| hgtmutations.py                        | Extension of the msprime mutation simulation to support HGT.  |
| hgtsimargs.py                          | Default simulation parameters.                                |
| hgtsimulation.py                       | Extension of the msprime ancestry simulation to support HGT.  |
| neutralitytest.py                      | Neutrality test based on a $\chi^2$-like and direct approach. |
| optimisation.py                        | Algorithm to fit the simulation to real-world GFS.            |
| example_usage.ipynb                    | Jupyter Notebook with examples.                               |
| pangenome-gene-transfer-simulation.pdf | Thesis                                                        |
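The idea behind a simulation-based, $\chi^2$-like neutrality test can be sketched as follows. This is an assumption about the general technique, not the code in neutralitytest.py: the expected spectrum is estimated as the mean over spectra simulated under neutrality, and the observed statistic is ranked against the statistics of those same simulations to obtain a Monte Carlo p-value. All names (`chi2_like`, `simulated_p_value`) and the numbers below are hypothetical; in practice the simulated spectra would come from the simulation itself.

```python
def chi2_like(observed, expected):
    """Sum of squared deviations, each scaled by the expected count."""
    return sum(
        (o - e) ** 2 / e
        for o, e in zip(observed, expected)
        if e > 0  # skip classes with no expected genes
    )


def simulated_p_value(observed, simulated_spectra):
    """Monte Carlo p-value of `observed` against neutral simulations.

    `simulated_spectra` is a list of GFS vectors, assumed to be drawn
    under the neutral model and to have the same length as `observed`.
    """
    n_sims = len(simulated_spectra)
    n_classes = len(observed)
    # Estimate the expected GFS as the per-class mean of the simulations.
    expected = [
        sum(s[k] for s in simulated_spectra) / n_sims
        for k in range(n_classes)
    ]
    observed_stat = chi2_like(observed, expected)
    simulated_stats = [chi2_like(s, expected) for s in simulated_spectra]
    # The +1 correction keeps the p-value strictly positive.
    exceed = sum(stat >= observed_stat for stat in simulated_stats)
    return (exceed + 1) / (n_sims + 1)


# Hypothetical placeholder spectra standing in for neutral simulations.
simulated_spectra = [[10, 5, 3], [12, 4, 2], [8, 6, 4], [10, 5, 3]]

p_typical = simulated_p_value([10, 5, 3], simulated_spectra)
p_extreme = simulated_p_value([30, 0, 0], simulated_spectra)
print(p_typical, p_extreme)
```

A small p-value suggests that the observed spectrum is unlikely under the simulated neutral model; with only four placeholder simulations the smallest attainable value is 1/5, so a realistic analysis would need many more replicates.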

| Dirname            | Description                                     |
| ------------------ | ----------------------------------------------- |
| data               | Simulated data and measurements.                |
| gfsanalysis        | Impact of HGT and GC on the GFS of fixed trees. |
| minimalsite_count  | Impact of double gene gain events on the GFS.   |
| panX               | Files generated by panX.                        |
| tex                | LaTeX source files.                             |

License and Notes

Unless otherwise labelled, this piece of software is published under the GNU General Public License v3.0.

| Permissions      | Conditions                   | Limitations |
| ---------------- | ---------------------------- | ----------- |
| Commercial use   | Disclose source              | Liability   |
| Distribution     | License and copyright notice | Warranty    |
| Modification     | Same license                 |             |
| Patent use       | State changes                |             |
| Private use      |                              |             |

Go to LICENSE.md to see the full version.

Logo

The logo is partially based on the output of tskitargvisualizer.

[^2]: Franz Baumdicker et al. "Efficient ancestry and mutation simulation with msprime 1.0". In: Genetics 220.3 (Dec. 2021). Ed. by S Browning. issn: 1943-2631. doi: 10.1093/genetics/iyab229. url: http://dx.doi.org/10.1093/genetics/iyab229

[^3]: Franz Baumdicker, Wolfgang R. Hess and Peter Pfaffelhuber. "The Infinitely Many Genes Model for the Distributed Genome of Bacteria". In: Genome Biology and Evolution 4.4 (2012), pp. 443-456. doi: 10.1093/gbe/evs016. url: http://dx.doi.org/10.1093/gbe/evs016

Owner

  • Name: Jules Kreuer
  • Login: not-a-feature
  • Kind: user
  • Location: Germany

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Kreuer"
  given-names: "Jules"
  orcid: "https://orcid.org/0000-0002-7305-833X"
title: "Simulating and estimating the effect of gene transfer on bacterial pangenomes."
version: 1.0.0
doi: 10.13140/RG.2.2.13490.52165
date-released: 2024-06-01
url: "https://github.com/not-a-feature/pangenome-gene-transfer-simulation"

GitHub Events

Total
  • Push event: 2
  • Fork event: 1
Last Year
  • Push event: 2
  • Fork event: 1