treemaker

treemaker: A Python tool for constructing a Newick formatted tree from a set of classifications. - Published in JOSS (2018)

https://github.com/simongreenhill/treemaker

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 15 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

classification newick newick-format phylogenetics python taxonomy
Last synced: 4 months ago

Repository

A Python library for creating a Newick formatted tree from a set of classification strings

Basic Info
  • Host: GitHub
  • Owner: SimonGreenhill
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.04 MB
Statistics
  • Stars: 10
  • Watchers: 3
  • Forks: 2
  • Open Issues: 1
  • Releases: 2
Topics
classification newick newick-format phylogenetics python taxonomy
Created almost 10 years ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License Codemeta

README.md

treemaker

A Python library for creating a Newick formatted tree from a set of classification strings (e.g. a taxonomy)


treemaker is a Python library to convert a text-based classification schema into a Newick file for use in phylogenetic and bioinformatic programs.

Research in linguistics or cultural evolution often produces or uses tree taxonomies or classifications. However, these are usually not in a format readily available for use in programs that can understand and manipulate trees. For example, the global taxonomy of languages published by the Ethnologue classifies languages into families and subgroups using a taxonomy string e.g. the language Kalam is classified as "Trans-New Guinea, Madang, Kalam-Kobon", while Mauwake is classified as "Trans-New Guinea, Madang, Croisilles, Pihom", and Kare is "Trans-New Guinea, Madang, Croisilles, Kare". This classification indicates that while all these languages are part of the Madang subgroup of the Trans-New Guinea language family, Kare and Mauwake are more closely related (as they belong to the Croisilles subgroup).

Other publications use a tabular indented format to demarcate relationships, such as the example in Figure 1 from Stephen Wurm's classification of his proposed Yele-Solomons language phylum (Wurm 1975).

However, both the taxonomy string and the tabular format are hard to load into software packages that can analyse, compare, visualise and manipulate trees. treemaker aims to make this easy by converting taxonomic data into the Newick and Nexus (Maddison 1997) formats commonly used by phylogenetic manipulation programs.

Converting a Taxonomy to a Tree:

treemaker can convert a text file with a taxonomy to a tree. These taxonomies can easily be obtained from Ethnologue or manually entered, such as this example from Wurm's (outdated) classification of Yele-Solomons in Figure 1:

    Bilua        Yele-Solomons, Central Solomon
    Baniata      Yele-Solomons, Central Solomon
    Lavukaleve   Yele-Solomons, Central Solomon
    Savosavo     Yele-Solomons, Central Solomon
    Kazukuru     Yele-Solomons, Kazukuru
    Guliguli     Yele-Solomons, Kazukuru
    Dororo       Yele-Solomons, Kazukuru
    Yele         Yele-Solomons

treemaker can then generate a Newick representation:

    ((Baniata,Bilua,Lavukaleve,Savosavo),(Dororo,Guliguli,Kazukuru),Yele);

...which can then be loaded into phylogenetic programs to visualise or manipulate as in Figure 2.
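The conversion above can be illustrated with a minimal, standard-library sketch. This is not treemaker's implementation: the helper names `build_tree`, `render` and `to_newick` are hypothetical, and the choices to sort siblings alphabetically and to collapse groups with a single member are assumptions made so that the output matches the example shown above.

```python
def build_tree(rows):
    """Nest each taxon under its comma-separated classification path."""
    root = {}
    for taxon, classification in rows:
        node = root
        for group in classification.split(","):
            node = node.setdefault(group.strip(), {})
        # None marks a leaf (a taxon). Assumes a taxon never shares a name
        # with a subgroup at the same level of the same parent.
        node[taxon] = None
    return root


def render(children):
    """Serialise a nested dict to Newick, collapsing single-member groups."""
    parts = []
    for name in sorted(children):
        sub = children[name]
        parts.append(name if sub is None else render(sub))
    if len(parts) == 1:
        return parts[0]
    return "(" + ",".join(parts) + ")"


def to_newick(rows):
    """Return a Newick string terminated by a semicolon."""
    return render(build_tree(rows)) + ";"


rows = [
    ("Bilua", "Yele-Solomons, Central Solomon"),
    ("Baniata", "Yele-Solomons, Central Solomon"),
    ("Lavukaleve", "Yele-Solomons, Central Solomon"),
    ("Savosavo", "Yele-Solomons, Central Solomon"),
    ("Kazukuru", "Yele-Solomons, Kazukuru"),
    ("Guliguli", "Yele-Solomons, Kazukuru"),
    ("Dororo", "Yele-Solomons, Kazukuru"),
    ("Yele", "Yele-Solomons"),
]

print(to_newick(rows))
# ((Baniata,Bilua,Lavukaleve,Savosavo),(Dororo,Guliguli,Kazukuru),Yele);
```

The nested dictionary keeps the sketch short; a real tool would likely also validate taxon names and handle characters that need quoting in Newick.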

treemaker has been used to enable the analyses in (Bromham et al. 2018), and a number of forthcoming articles.

Figure 1: Example of a language taxonomy in indented format from Wurm (1975).

Figure 2: Tree visualisation of the relationships between the putative Yele-Solomons languages.

Installation:

Installation is only a pip install away:

    pip install treemaker

Or from git:

    git clone https://github.com/SimonGreenhill/treemaker/ treemaker
    cd treemaker
    python setup.py install

Usage: Command line:

Basic usage:

```shell
treemaker

usage: treemaker [-h] [-o OUTPUT] [-m {nexus,newick}] [--labels] input
```

e.g. Given a text file:

    LangA   Indo-European, Germanic
    LangB   Indo-European, Germanic
    LangC   Indo-European, Romance
    LangD   Indo-European, Anatolian

... then you can build a taxonomy/classification tree from that as follows:

```shell
treemaker classification.txt
(LangD,(LangA,LangB),LangC);

# with nodelabels:
treemaker --labels classification.txt
(LangD,(LangA,LangB)Germanic,LangC)Indo-European;

treemaker -m nexus classification.txt
#NEXUS

begin trees;
    tree root = (LangD,(LangA,LangB),LangC);
end;
```
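To make the difference between the three outputs concrete, here is a small standard-library sketch of how node labels and a Nexus wrapper relate to the plain Newick string. It is an illustration only, not treemaker's code: `render` and `to_nexus` are hypothetical helpers, and the nested dictionary stands in for the parsed classification file.

```python
def render(name, sub, labels=False):
    """Render one node as Newick; `sub` is None for a leaf, a dict for a group."""
    if sub is None:
        return name
    parts = [render(child, sub[child], labels) for child in sorted(sub)]
    if len(parts) == 1:
        # A group with a single member is collapsed and loses its label.
        return parts[0]
    # Assumption: labels contain no spaces or punctuation that Newick
    # would require to be quoted.
    return "(" + ",".join(parts) + ")" + (name if labels else "")


def to_nexus(newick, tree_name="root"):
    """Wrap a semicolon-terminated Newick string in a Nexus trees block."""
    return "#NEXUS\n\nbegin trees;\n    tree %s = %s\nend;\n" % (tree_name, newick)


classification = {
    "Indo-European": {
        "Germanic": {"LangA": None, "LangB": None},
        "Romance": {"LangC": None},
        "Anatolian": {"LangD": None},
    }
}

plain = render("", classification) + ";"
labelled = render("", classification, labels=True) + ";"

print(plain)     # (LangD,(LangA,LangB),LangC);
print(labelled)  # (LangD,(LangA,LangB)Germanic,LangC)Indo-European;
print(to_nexus(plain))
```

Only groups that survive as internal nodes receive a label, which is why Romance and Anatolian do not appear in the labelled output.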

To write to file:

```shell
treemaker classification.txt
(LangD,(LangA,LangB),LangC);

treemaker classification.txt -o classification.nex
```

Usage: Library:

    from treemaker import TreeMaker

Generate a tree manually:

```python
from treemaker import TreeMaker

# Each taxon is added with its comma-separated classification string.
t = TreeMaker()
t.add('A1', 'family a, subgroup 1')
t.add('A2', 'family a, subgroup 2')
t.add('B1a', 'family b, subgroup 1')
t.add('B1b', 'family b, subgroup 1')
t.add('B2', 'family b, subgroup 2')

print(t.write())
```

Add from a list:

```python
from treemaker import TreeMaker

taxa = [
    ('A1', 'family a, subgroup 1'),
    ('A2', 'family a, subgroup 2'),
    ('B1a', 'family b, subgroup 1'),
    ('B1b', 'family b, subgroup 1'),
    ('B2', 'family b, subgroup 2'),
]

t = TreeMaker()
t.add_from(taxa)

print(t.write())
```
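When the classification lives in a text file rather than in code, the list of tuples can be built with a few lines of standard-library Python. The sketch below is a hedged illustration: `read_classification` is a hypothetical helper (not part of treemaker), and it assumes each non-empty line holds a taxon name without spaces, then whitespace, then the comma-separated classification.

```python
import io


def read_classification(handle):
    """Return (taxon, classification) tuples from an open text file."""
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so classifications
        # that contain spaces stay intact. A line with no classification
        # raises ValueError here.
        taxon, classification = line.split(None, 1)
        rows.append((taxon, classification.strip()))
    return rows


# io.StringIO stands in for open("classification.txt") to keep this runnable.
sample = io.StringIO(
    "LangA   Indo-European, Germanic\n"
    "LangB   Indo-European, Germanic\n"
    "\n"
    "LangC   Indo-European, Romance\n"
    "LangD   Indo-European, Anatolian\n"
)
taxa = read_classification(sample)
print(taxa[0])  # ('LangA', 'Indo-European, Germanic')
```

The resulting `taxa` list has the same shape as the one passed to `add_from` in the example above.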

API Documentation:

The API is documented here.

Running treemaker's tests:

To run treemaker's tests simply run:

```shell
make test

# or

python setup.py test

# or

python treemaker/test_treemaker.py
```

Version History:

  • v1.4: fix bug with no terminating semicolon in nexus file output.
  • v1.3: add nodelabels support, add some rudimentary input checking.

Support:

For questions on how to use or update this, feel free to open an issue. I'll get to it as soon as I can.

Acknowledgements:

Thank you to Richard Littauer, Mitsuhiro Nakamura, and Dillon Niederhut.

References:

Owner

  • Name: Simon J Greenhill
  • Login: SimonGreenhill
  • Kind: user
  • Location: Jena, Canberra, Auckland
  • Company: @shh-dlce @eva-dlce

I study how languages and cultures evolve. Scientist at the University of Auckland, and the Max Planck Institute for Evolutionary Anthropology

JOSS Publication

treemaker: A Python tool for constructing a Newick formatted tree from a set of classifications.
Published
November 08, 2018
Volume 3, Issue 31, Page 1040
Authors
Simon J. Greenhill ORCID
Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany., ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, Australia.
Editor
Tania Allard ORCID
Tags
phylogenetics newick tree

CodeMeta (codemeta.json)

{
  "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
  "@type": "Code",
  "author": [
    {
      "@id": "http://orcid.org/0000-0001-7832-6156",
      "@type": "Person",
      "email": "simon@simon.net.nz",
      "name": "Simon J. Greenhill",
      "affiliation": "Max Planck Institute for the Science of Human History & ARC Centre of Excellence for the Dynamics of Language, Australian National University"
    }
  ],
  "identifier": "",
  "codeRepository": "https://github.com/SimonGreenhill/treemaker",
  "datePublished": "2018-09-05",
  "dateModified": "2018-11-08",
  "dateCreated": "2018-09-05",
  "description": "A Python library for creating a Newick formatted tree from a set of classifications.",
  "keywords": "phylogenetics,newick",
  "license": "BSD",
  "title": "treemaker",
  "version": "1.2"
}

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 71
  • Total Committers: 2
  • Avg Commits per committer: 35.5
  • Development Distribution Score (DDS): 0.141
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
SimonGreenhill s****n@s****z 61
pyup-bot g****t@p****o 10
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 10
  • Total pull requests: 93
  • Average time to close issues: about 3 hours
  • Average time to close pull requests: 22 days
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 2.3
  • Average comments per pull request: 1.61
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 10
  • Average time to close issues: N/A
  • Average time to close pull requests: 10 days
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.9
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • RichardLitt (7)
  • deniederhut (2)
  • pyup-bot (1)
Pull Request Authors
  • pyup-bot (104)
  • mnacamura (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 16 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 3
  • Total versions: 6
  • Total maintainers: 1
pypi.org: treemaker

A Python tool for generating a Newick formatted tree from a list of classifications

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 16 Last month
Rankings
Dependent packages count: 7.3%
Dependent repos count: 9.1%
Forks count: 19.2%
Stargazers count: 20.4%
Average: 21.7%
Downloads: 52.6%
Maintainers (1)
Last synced: 4 months ago

Dependencies

docs/requirements.txt pypi
  • sphinx ==3.3.1
  • sphinxcontrib-napoleon ==0.7