tcherry

tcherry: Learning the structure of tcherry trees - Published in JOSS (2019)

https://github.com/nvihrs14/tcherry

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    1 of 7 committers (14.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

chow-liu-tree graphical-models junction-tree markov-networks structure-learning t-cherry-tree undirected-graph variable-dependence

Scientific Fields

Materials Science Physical Sciences - 40% confidence
Last synced: 4 months ago · JSON representation

Repository

This R-package is for learning the structure of the type of graphical models called t-cherry trees from data. The structure is determined either directly from data or by increasing the order of a lower order t-cherry tree.

Basic Info
  • Host: GitHub
  • Owner: nvihrs14
  • License: gpl-3.0
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 301 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 4
  • Open Issues: 2
  • Releases: 3
Topics
chow-liu-tree graphical-models junction-tree markov-networks structure-learning t-cherry-tree undirected-graph variable-dependence
Created about 7 years ago · Last pushed over 5 years ago
Metadata Files
Readme Contributing License

README.md

tcherry

The package is meant for learning the structure of the type of graphical models called t-cherry trees from data. For more information about t-cherry trees see for instance the vignette or Kovács & Szántai (2012). The functions can only be used for categorical data. The purpose of the package is mainly to provide functions for learning the structure of the graph from given data with no missing values. The functions are attempting to find structures of maximal likelihood. If the corresponding graphical model is to be used in connection with probability propagation or prediction other packages are needed (see the vignette for examples).

Install

Without vignettes remotes::install_github("nvihrs14/tcherry")

With vignettes remotes::install_github("nvihrs14/tcherry", build_opts = c("--no-resave-data", "--no-manual"))

If there are problems with viewing documentation or vignettes, it is recommended to restart the R session.

Note that the package requires the following R-packages, which are automatically installed with the package: Rdpack, utils, gRbase, compare, rGrain, Rgraphviz and stats.

The package gRbase further requires the packages graph and RBGL which may have to be installed from Bioconductor for instance with

BiocManager::install(c("graph", "RBGL"))

Main functions (see vignette for more details)

-increase_order2: Determine a (k + 1)'th order t-cherry tree from a k'th order t-cherry tree by a greedy search.

-increase_order_complete_search: Determine the (k + 1)'th order t-cherry tree from a k'th order t-cherry tree by a complete search.

-k_tcherry_p_lookahead: Determine a k'th order t-cherry tree from data by adding p cliques at a time by a greedy search. Note that if p is the total number of cliques in a k'th order t-cherry tree with the desired number of vertices, this is a complete search.

-thin_edges: Thinning of edges in an undirected graphical model with a triangulated graph.

-compute_BIC_junction_tree: Calculates the BIC value for a graphical model from a junction tree for the graph.

Example usage

To demonstrate the main functions in this package consider the car evaluation data set from UCI Machine Learning Repository (Dau & Graff 2017). This data set contains 7 variables (all categorical) with 1728 observations for each and no missing values. The variables are describing different aspects of the car such as the estimated safety of the car, the number of doors etc. To find a graphical structure of a third order t-cherry tree for this data the function k_tcherry_p_lookahead is used. It is chosen to add just one clique at a time in the greedy search procedure.

``` r library(tcherry) car <- read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", header = FALSE, sep = ",", dec = ".") names(car) <- c("buying", "maint", "doors", "persons", "lugboot", "safety", "class") tch3 <- ktcherryplookahead(data = car, k = 3, p = 1, smooth = 0.001) tch3$adj_matrix

> buying maint doors persons lug_boot safety class

> buying 0 1 0 0 0 1 1

> maint 1 0 0 0 0 0 1

> doors 0 0 0 0 1 0 1

> persons 0 0 0 0 0 1 1

> lug_boot 0 0 1 0 0 1 1

> safety 1 0 0 1 1 0 1

> class 1 1 1 1 1 1 0

```

Note that the smooth argument is added to cell counts when estimating probabilities to avoid zero probabilities, which would make some calculations invalid.

The graphical structure of af fourth order t-cherry tree for this data can be found by using the same function as above with k = 4. However in this case, it is chosen to show how increase_order2 can be used to increase the order of the fitted third order t-cherry tree. The typical reason for this choice will be to save time, but often at the cost of a fitted structure of smaller likelihood.

``` r tch4 <- increaseorder2(tchcliq = tch3$cliques, data = car, smooth = 0.001) tch4$adj_matrix

> buying maint doors persons lug_boot safety class

> buying 0 1 0 1 1 1 1

> maint 1 0 0 0 0 1 1

> doors 0 0 0 0 1 1 1

> persons 1 0 0 0 0 1 1

> lug_boot 1 0 1 0 0 1 1

> safety 1 1 1 1 1 0 1

> class 1 1 1 1 1 1 0

```

Note that the smooth argument is added for the same reasons as above, and the given third order t-cherry tree is represented by its cliques.

It can now be attempted to simplify this model by thinning the edges.

``` r tchthinning <- thinedges(cliques = tch4$cliques, separators = tch4$separators, data = car, smooth = 0.001) tchthinning$adjmatrix

> buying class doors lug_boot maint persons safety

> buying 0 1 0 0 1 0 1

> class 1 0 0 1 1 1 1

> doors 0 0 0 0 0 0 0

> lug_boot 0 1 0 0 0 0 1

> maint 1 1 0 0 0 0 0

> persons 0 1 0 0 0 0 1

> safety 1 1 0 1 0 1 0

tchthinning$nedges_removed

> 6

``` Notice that in this function the structure is represented by the cliques and separators of its junction tree and that the resulting graph is no longer a t-cherry tree. However, the function will always return a triangulated graph. In this case, six edges has been deleted from the graph. The order in which the edges are deleted is random, so the resulting structure may vary.

The three fitted structures can be compared by calculating a BIC score.

``` r computeBICjunction_tree(cliques = tch3$cliques, separators = tch3$separators, data = car, smooth = 0.001)

> -20079.89

computeBICjunction_tree(cliques = tch4$cliques, separators = tch4$separators, data = car, smooth = 0.001)

> -21572.4

computeBICjunctiontree(cliques = tchthinning$cliques, separators = tch_thinning$separators, data = car, smooth = 0.001)

> -19923.95

``` Because of the way this score is calculated, it is desired to get a high BIC score. The structure with the highest score is the thinned fourth order t-cherry tree.

For more help

See documentation included in package (vignettes and man) at https://github.com/nvihrs14/tcherry

Feedback

If you experience any problems with this package, you are welcome to contact the authors by email. Notice however that this package was developed as part of our Master Thesis and because we have graduated and no longer work with this subject, it may take a while before issues are addressed.

References

Dua, D. & Graff, C. (2017). UCI machine learning repository.

Kovács, E. & Szántai, T. (2012). Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution. Ann Oper Res, 193, 71-90. https://doi.org/10.1007/s10479-010-0814-y

See also

JOSS paper DOI

Owner

  • Name: Ninna Vihrs
  • Login: nvihrs14
  • Kind: user

JOSS Publication

tcherry: Learning the structure of tcherry trees
Published
July 18, 2019
Volume 4, Issue 39, Page 1480
Authors
Katrine Kirkeby ORCID
Department of Mathematical Sciences, Aalborg University
Maria Knudsen ORCID
Department of Mathematical Sciences, Aalborg University
Ninna Vihrs ORCID
Department of Mathematical Sciences, Aalborg University
Editor
Jason Clark ORCID
Tags
t-cherry trees Undirected triangulated graphical models Dependencies between variables

GitHub Events

Total
  • Fork event: 1
Last Year
  • Fork event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 132
  • Total Committers: 7
  • Avg Commits per committer: 18.857
  • Development Distribution Score (DDS): 0.447
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
nvihrs14 n****4@s****k 73
katrinekirkeby 3****y 38
nvihrs14 3****4 16
Mikkel Meyer Andersen m****l@m****k 2
MariaKnudsen 4****n 1
David H Hagan d****d@d****m 1
Daniel S. Katz d****z@i****g 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 9
  • Total pull requests: 3
  • Average time to close issues: 20 days
  • Average time to close pull requests: 4 days
  • Total issue authors: 6
  • Total pull request authors: 3
  • Average comments per issue: 0.67
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kellieotto (3)
  • dhhagan (2)
  • zhangbs92 (1)
  • mikldk (1)
  • sngirl (1)
  • ClausDethlefsen (1)
Pull Request Authors
  • dhhagan (1)
  • mikldk (1)
  • danielskatz (1)
Top Labels
Issue Labels
Pull Request Labels