network-tree-reconciliation

Code and data for "An efficient algorithm for the reconciliation of a gene network and species tree"

https://github.com/yao-ban/network-tree-reconciliation

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.5%) to scientific vocabulary
Last synced: 10 months ago · JSON representation ·

Repository

Code and data for "An efficient algorithm for the reconciliation of a gene network and species tree"

Basic Info
  • Host: GitHub
  • Owner: yao-ban
  • Language: C++
  • Default Branch: main
  • Size: 57.6 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Citation

README.md

Supplementary material for the paper "An efficient algorithm for the reconciliation of a gene network and species tree"

simulate_species.cpp

Usage:

./simulate_species <number of species>

This produces a species tree via a pure birth process with birth rate Srate (a macro defined in the code, defaulting to 1). The tree is output in the format:

<number of nodes> 1 <lines>

where each line represents one node in bottom-up order, in the format:

<node id> <type> <id of parent> <label if a leaf> <time>

Details:

  • node id starts at 0 and increases by 1 up to the number of nodes (no gaps) and is in bottom-up order, so node id of child < node id of parent
  • type is 0 = root, 1 = divergence, 2 = reticulation (does not happen for species trees), 3 = leaf
  • time starts at 0 for leaf nodes (current day) and increases going backwards in time
  • the root node always has no parents and 1 child (not 2 children)

simulate_gene.cpp

Usage:

./simulate_gene <species tree file> [--Drate <D rate>] [--Lrate <L rate>] [--Rrate <R rate>]

This produces a gene network via a birth-and-death process inside the species tree given by <species tree file>, which must be in the format output by simulate_species above.

The birth-and-death process simulates D events at a rate of <D rate> per lineage, L events at a rate of <L rate> per lineage, and R events at a rate of <R rate> per lineage past the first in a species. These rates have default values 0.5.

The network is output in the format:

<number of nodes> 0 <lines>

where each line represents one node in bottom-up order, in the format:

<node id> <type> <id of parent> <id of second parent if a reticulation> <label if a leaf> <breakpoint if a reticulation>

Then a reconciliation is output in the format:

<lines>

where each line represents the mapping of one gene node in id order, in the format:

<gene node id> <species node id> <event type>

Details:

  • event type is C = current, S = speciation, D = duplication, R = reticulation, T = rooT

perecon.cpp

Usage:

./perecon <gene network file> <species tree file>

This produces the most parsimonious reconciliation between the gene network given by <gene network file> and the species tree given by <species tree file>. These must be in the format output by simulate_gene and simulate_species.

The MPR is calculated with each D event costing Dcost (default 1), and each L event costing Lcost (default 1), where Dcost and Lcost are macros defined in the code.

The reconciliation is output in the format:

<lines>

where each line represents the mapping of one gene node in id order, in the format:

<gene node id> <species node id> <event type>

Details:

  • event type is C = current, S = speciation, D = duplication, R = reticulation, T = rooT
  • currently the program also outputs a lot of other information including species tree, gene network, LCA reconciliation, LCA-HCA reconciliation, BCC decomposition, BCC tree-child-ness, intermediate cost calculations, and final MPR cost

wrapper.py

Usage:

python3 wrapper.py [-s <number of species>] [-d <D rate>] [-l <L rate>] [-r <R rate>] [--replicates <number of replicates>]

This runs simulate_species with <number of species> species, then simulate_gene with the given rates, then perecon to calculate an MPR. It then outputs in one line, separated by ,:

  • number of D events with correct type and location;
  • number of D events with correct location only;
  • number of D events with correct type only;
  • number of D events with neither correct type nor location;
  • number of S events with correct type and location;
  • number of S events with correct location only;
  • number of S events with correct type only;
  • number of S events with neither correct type nor location;
  • number of R events with correct type and location;
  • number of R events with correct location only (always 0);
  • number of R events with correct type only;
  • number of R events with neither correct type nor location (always 0);
  • Proportion of sequence with correct paralogy.

This is then repeated a total of <number of replicates> times.

simulate.py

Usage:

python3 simulate.py

This runs wrapper.py for the range of parameters shown in the paper.

process.py

Usage:

python3 process.py

This opens all results/results-*.txt files and amalgamates them into the single file results/results.csv. Each line is copied as

<line>, <D rate>, <L rate>, <R rate>

analysis.R

This produces (from the results/results.csv file) all plots used in the paper.

Owner

  • Login: yao-ban
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: An efficient algorithm for the reconciliation of a gene network and species tree
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Yao-ban
    family-names: Chan
    email: yaoban@unimelb.edu.au
    affiliation: The University of Melbourne
    orcid: 'https://orcid.org/0000-0002-8425-8775'
repository-code: 'https://github.com/yao-ban/network-tree-reconciliation'
abstract: >-
  Supplementary material for the paper "An efficient
  algorithm for the reconciliation of a gene network and
  species tree"
keywords:
  - Reconciliation
  - Phylogenetic network
  - Paralog exchange
license: GPL-3.0

GitHub Events

Total
Last Year