ddtlcm

ddtlcm: An R package for overcoming weak separation in Bayesian latent class analysis via tree-regularization - Published in JOSS (2024)

https://github.com/limengbinggz/ddtlcm

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: arxiv.org, joss.theoj.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Engineering Computer Science - 60% confidence
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: limengbinggz
  • License: other
  • Language: R
  • Default Branch: main
  • Size: 41.1 MB
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 2
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License Citation

README.md

ddtlcm: Dirichlet diffusion tree-latent class model (DDT-LCM)

An R package for tree-regularized latent class models with a DDT process prior on class profiles to overcome weak separation issues


Maintainer: Mengbing Li (mengbing@umich.edu)

Contributors: Briana Stephenson (bstephenson@hsph.harvard.edu); Zhenke Wu (zhenkewu@umich.edu)

|  | Citation | Paper Link |
| ------------- | ------------- | ------------- |
| Bayesian tree-regularized LCM | Li M, Stephenson B, Wu Z (2025). Tree-Regularized Bayesian Latent Class Analysis for Improving Weakly Separated Dietary Pattern Subtyping in Small-Sized Subpopulations. Annals of Applied Statistics. In press. | Link |
| Method | Li M, Stephenson B, Wu Z (2023). Tree-Regularized Bayesian Latent Class Analysis for Improving Weakly Separated Dietary Pattern Subtyping in Small-Sized Subpopulations. ArXiv:2306.04700. | Link |
| Software | Li M, Wu B, Stephenson B, Wu Z (2024). ddtlcm: An R package for overcoming weak separation in Bayesian latent class analysis via tree-regularization. Journal of Open Source Software, 9(99), 6220, https://doi.org/10.21105/joss.06220. | Link |

Installation

```r
# install Bioconductor package ggtree for visualizing results:
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("ggtree")

# install ddtlcm from GitHub:
install.packages("devtools", repos = "https://cloud.r-project.org")
devtools::install_github("limengbinggz/ddtlcm")
```

Overview

ddtlcm is designed for analyzing multivariate binary observations over grouped items in a tree-regularized Bayesian LCM framework. Between-class similarities are guided by an unknown tree: classes positioned closer on the tree are more similar a priori. This structure lets classes share information, so parameters can be estimated more accurately from less data. The model equips LCMs with a DDT process prior on the class profiles, with varying degrees of shrinkage across major item groups. It is particularly promising for addressing weak separation of latent classes when sample sizes are small. Posterior inference uses a hybrid Metropolis-Hastings-within-Gibbs algorithm and provides posterior uncertainty quantification.
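To make "classes closer on the tree are more similar a priori" concrete, here is a toy base-R sketch of the generative idea. It is not ddtlcm's code or API: the tree is fixed by hand rather than drawn from a DDT prior, and the item grouping, per-group diffusion variances, and class weights are all hypothetical. It relies on the standard property of Brownian diffusion along a tree that two leaves' locations have covariance proportional to the time their root-to-leaf paths share.

```r
# Toy sketch of the generative idea behind a tree-regularized LCM.
# NOT the ddtlcm implementation: the tree is fixed (not drawn from a DDT prior)
# and every quantity below is a hypothetical value chosen for illustration.
set.seed(2024)

K <- 3                              # latent classes = leaves of the toy tree
J <- 6                              # binary items
N <- 200                            # subjects
item_group <- c(1, 1, 1, 2, 2, 2)   # hypothetical grouping of items
sigma2_by_group <- c(1, 4)          # hypothetical diffusion variance per item group
class_prob <- c(0.5, 0.3, 0.2)      # hypothetical class weights

# Toy tree: root at time 0, leaves at time 1. Classes 1 and 2 split at t = 0.7;
# class 3 splits off at t = 0.2. Entry (k, l) is the time shared by the
# root-to-leaf paths of classes k and l, which under Brownian diffusion along
# the branches is their prior covariance per unit of diffusion variance.
shared_time <- matrix(c(1.0, 0.7, 0.2,
                        0.7, 1.0, 0.2,
                        0.2, 0.2, 1.0), nrow = K, byrow = TRUE)
chol_factor <- t(chol(shared_time))  # lower-triangular L with L %*% t(L) == shared_time

# Logit-scale class profiles: column j ~ N(0, sigma2[group of j] * shared_time),
# so classes that split later (1 and 2) get more similar profiles a priori.
eta <- matrix(NA_real_, nrow = K, ncol = J)
for (j in seq_len(J)) {
  eta[, j] <- sqrt(sigma2_by_group[item_group[j]]) * drop(chol_factor %*% rnorm(K))
}
theta <- plogis(eta)                 # K x J class-specific response probabilities

# Latent class model: draw each subject's class, then independent Bernoulli items.
z <- sample.int(K, N, replace = TRUE, prob = class_prob)
Y <- matrix(rbinom(N * J, size = 1, prob = theta[z, ]), nrow = N, ncol = J)

print(round(theta, 2))
```

In this toy setup, the larger variance for item group 2 means weaker shrinkage of those items' profiles toward the root location, which is one way to picture group-specific shrinkage.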

ddtlcm works with multivariate binary responses over a pre-specified grouping of items. The dependency relations among the functions in ddtlcm can be visualized as follows:

```r
library(ddtlcm)  # attach ddtlcm so that "package:ddtlcm" is on the search path
# if not installed, try: devtools::install_github("datastorm-open/DependenciesGraphs")
library(DependenciesGraphs)
# if not installed, try: devtools::install_github("emmamorgan-tufts/QualtricsTools")
library(QualtricsTools)
dep <- funDependencies("package:ddtlcm", "ddtlcm_fit")
plot(dep)
```

A Quickstart

```r
library(ddtlcm)

data(parameter_diet)
# unlist the elements into variables in the global environment
list2env(setNames(parameter_diet, names(parameter_diet)), envir = globalenv())

N <- 496
seed_parameter <- 1 # random seed to generate node parameters given the tree
seed_response <- 1  # random seed to generate multivariate binary observations from LCM

# simulate data given the parameters
sim_data <- simulate_lcm_given_tree(tree_phylo, N,
                                    class_probability, item_membership_list, Sigma_by_group,
                                    root_node_location = 0, seed_parameter = seed_parameter,
                                    seed_response = seed_response)

K <- 6 # number of latent classes, same as number of leaves on the tree
result_diet <- ddtlcm_fit(K = K, data = sim_data$response_matrix,
                          item_membership_list = item_membership_list, total_iters = 100)
print(result_diet)
```

Examples

A simple workflow using semi-synthetic data is provided.

Tests

Unit tests check the implementation of certain utility functions and verify that functions accept and return objects of the correct classes. Running them requires the R package testthat. After incorporating your changes to the package code, build the source tarball (for example, with R CMD build) and run the following line in a terminal:

R CMD check --as-cran ddtlcm_0.2.1.tar.gz

In addition, major functions include automated input checks that inform the user of invalid input.

Reference Manual

See the manual on CRAN.

Contributing And Getting Help

Please report bugs by opening an issue. If you wish to contribute, please make a pull request. If you have questions, you can open a discussion thread.

If you need support, please contact the maintainer at limengbinggz@gmail.com.

Note

  • When running some functions in the package, such as ddtlcm_fit, a warning that "Tree contains singleton nodes" may be displayed. This warning originates from the checkPhylo4 function in the phylobase package, which performs basic validity checks on S4 phylogenetic objects. Such warnings do not raise any concern about the statistical validity of the implemented algorithm, because every tree generated from a DDT process has a singleton node (a node with only one child) as its root. To avoid repeated appearances of this warning, we recommend either of the following:

    • Wrapping the code in suppressWarnings({ code_that_will_generate_singleton_warning }). Note that this also hides any other warnings raised inside the block.
    • Setting options(warn = -1) globally. This is risky because other meaningful warnings will also be ignored.
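If you would rather keep other warnings visible, one option is to muffle only this particular warning. The sketch below is a hypothetical helper, not part of ddtlcm; it uses base R's withCallingHandlers and assumes the warning message contains the text "singleton node", as in the message quoted above.

```r
# Hypothetical helper (not part of ddtlcm): evaluate `expr` and muffle only
# warnings whose message contains "singleton node"; other warnings still surface.
muffle_singleton_warning <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("singleton node", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Toy function standing in for a call such as ddtlcm_fit(...)
noisy <- function() {
  warning("Tree contains singleton nodes")
  warning("some other warning")
  42
}

out <- muffle_singleton_warning(noisy())  # only "some other warning" is reported
```

Because the handler returns without muffling for any other message, unrelated warnings propagate as usual, unlike suppressWarnings or options(warn = -1).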

Owner

  • Name: Mengbing Li
  • Login: limengbinggz
  • Kind: user

JOSS Publication

ddtlcm: An R package for overcoming weak separation in Bayesian latent class analysis via tree-regularization
Published
July 15, 2024
Volume 9, Issue 99, Page 6220
Authors
Mengbing Li ORCID
Department of Biostatistics, University of Michigan
Bolin Wu
Department of Computer Science, University of Michigan
Briana Stephenson
Department of Biostatistics, Harvard University
Zhenke Wu ORCID
Department of Biostatistics, University of Michigan
Editor
Nikoleta Glynatsi ORCID
Tags
Dirichlet diffusion tree latent class model ddtlcm nutrition epidemiology

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Li
  given-names: Mengbing
  orcid: "https://orcid.org/0000-0002-2264-8006"
- family-names: Wu
  given-names: Bolin
- family-names: Stephenson
  given-names: Briana
- family-names: Wu
  given-names: Zhenke
  orcid: "https://orcid.org/0000-0001-7582-669X"
contact:
- family-names: Li
  given-names: Mengbing
  orcid: "https://orcid.org/0000-0002-2264-8006"
- family-names: Wu
  given-names: Zhenke
  orcid: "https://orcid.org/0000-0001-7582-669X"
doi: 10.5281/zenodo.12711232
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Li
    given-names: Mengbing
    orcid: "https://orcid.org/0000-0002-2264-8006"
  - family-names: Wu
    given-names: Bolin
  - family-names: Stephenson
    given-names: Briana
  - family-names: Wu
    given-names: Zhenke
    orcid: "https://orcid.org/0000-0001-7582-669X"
  date-published: 2024-07-15
  doi: 10.21105/joss.06220
  issn: 2475-9066
  issue: 99
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6220
  title: "ddtlcm: An R package for overcoming weak separation in
    Bayesian latent class analysis via tree-regularization"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06220"
  volume: 9
title: "ddtlcm: An R package for overcoming weak separation in Bayesian
  latent class analysis via tree-regularization"

GitHub Events

Total
  • Watch event: 1
  • Push event: 3
Last Year
  • Watch event: 1
  • Push event: 3

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 81
  • Total Committers: 4
  • Avg Commits per committer: 20.25
  • Development Distribution Score (DDS): 0.16
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
limengbinggz@gmail.com l****z@g****m 68
zhenkewu z****u@g****m 9
James Uanhoro j****o@g****m 3
Larry Dong l****g@m****a 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 11
  • Average time to close issues: 19 days
  • Average time to close pull requests: 8 days
  • Total issue authors: 3
  • Total pull request authors: 3
  • Average comments per issue: 0.8
  • Average comments per pull request: 0.36
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jamesuanhoro (3)
  • zhenkewu (1)
Pull Request Authors
  • zhenkewu (7)
  • jamesuanhoro (6)
  • larryshamalama (2)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 147 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: ddtlcm

Latent Class Analysis with Dirichlet Diffusion Tree Process Prior

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 147 Last month
Rankings
Forks count: 21.3%
Stargazers count: 23.8%
Dependent packages count: 28.0%
Dependent repos count: 36.6%
Average: 39.1%
Downloads: 85.9%
Maintainers (1)
Last synced: 6 months ago