VersatileHDPMixtureModels

Code for our UAI '20 paper "Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes"

https://github.com/bgu-cs-vil/versatilehdpmixturemodels.jl

Science Score: 28.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 3 committers (66.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Keywords

dirichlet-process dirichlet-process-mixtures hdp hierarchical-dirichlet-processes inference julia paper
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: BGU-CS-VIL
  • License: MIT
  • Language: Julia
  • Default Branch: master
  • Homepage:
  • Size: 19.9 MB
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 4
  • Open Issues: 2
  • Releases: 0
Created over 5 years ago · Last pushed about 4 years ago
Metadata Files
Readme License Citation

README.md

VersatileHDPMixtureModels.jl

This package is the code for our UAI '20 paper titled "Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes".
Paper, Supplemental Material

What can it do?

This package performs inference in the vHDPMM setting described in the paper. Alternatively, it can perform inference in the standard HDPMM setting.

A note on scalability

With the recent release (0.1.1), threads are the default parallel backend instead of multiprocessing. To enable multiprocessing instead, add mp=true to the fit functions. The multithreaded version can handle many more groups. For example, we recently used it with 7K groups totaling 220 million data points, each data point a D=256 histogram; convergence took only 4 hours. In another scenario, topic modeling with 84K documents of 100 to 300 words each, convergence took about an hour.
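As a rough illustration of the two modes mentioned above, the sketch below only inspects the current Julia session using the standard Distributed library and Base threading. The `use_mp` rule is a hypothetical heuristic, and the commented call at the end assumes `mp` is passed as a keyword argument; neither is taken from the package documentation.

```julia
using Distributed

# The thread count is fixed when Julia starts (for example via the
# JULIA_NUM_THREADS environment variable); worker processes are added
# at runtime with addprocs.
n_threads = Threads.nthreads()
n_workers = nworkers()   # 1 when no worker processes were added
println("threads: ", n_threads, ", workers: ", n_workers)

# Hypothetical rule of thumb: fall back to multiprocessing only when Julia
# was started single-threaded but extra processes are available.
use_mp = n_threads == 1 && nprocs() > 1

# Assumed call shape, following the note above (not run here):
# model = hdp_fit(pts, 10, 1, gprior, 100; mp = use_mp)
```

With the default threaded backend, starting Julia with more threads is presumably the main knob; worker processes matter only when mp=true is requested.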

Quick Start

  1. Get Julia from https://julialang.org/ (any version above 1.1.0 should work), install it, and run it.
  2. Add the package: ]add VersatileHDPMixtureModels
  3. Add some processes and load the package on all of them: using Distributed; addprocs(2); @everywhere using VersatileHDPMixtureModels
  4. Now you can start using it!
  • For the HDP version:

```
# Sample some data from a CRF prior:
# We sample 3D data, 4 groups, with $\alpha=10,\gamma=1$, and a variance of 100 between the component means.
crf_prior = hdp_prior_crf_draws(100,3,10,1)
pts,labels = generate_grouped_gaussian_from_hdp_group_counts(crf_prior[2],3,100.0)

# Create the priors we opt to use:
# As we want HDP, we set the local prior dimension to 0 and the global prior dimension to 3.
gprior, lprior = create_default_priors(3,0,:niw)

# Run the model:
model = hdp_fit(pts,10,1,gprior,100)

# Get results:
model_results = get_model_global_pred(model[1]) # Get global component assignments
```
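The example above samples from a CRF (Chinese restaurant franchise) prior. For intuition, here is a minimal sketch of a draw from a single Chinese restaurant process with concentration `alpha`, the building block of that prior. It is not the package's `hdp_prior_crf_draws`; the function name `crp_draw` and the fixed seed are illustrative assumptions.

```julia
using Random

# Seat n customers one at a time. Customer i joins existing table k with
# probability counts[k] / (i - 1 + alpha) and opens a new table with
# probability alpha / (i - 1 + alpha).
function crp_draw(rng::AbstractRNG, n::Int, alpha::Float64)
    counts = Int[]                    # customers per table
    labels = Vector{Int}(undef, n)    # table index of each customer
    for i in 1:n
        u = rand(rng) * ((i - 1) + alpha)
        k = 0
        acc = 0.0
        for (t, c) in enumerate(counts)
            acc += c
            if u < acc
                k = t
                break
            end
        end
        if k == 0                     # u fell in the "new table" mass
            push!(counts, 1)
            k = length(counts)
        else
            counts[k] += 1
        end
        labels[i] = k
    end
    return labels, counts
end

labels, counts = crp_draw(MersenneTwister(0), 100, 10.0)
println("number of tables: ", length(counts))
```

Larger `alpha` values tend to open more tables; in the franchise setting, a second process with concentration `gamma` plays the same role for sharing components across groups.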

  • Running the full vHDP setting:

```
# Generate some data:
# We generate Gaussian data: 20K pts in each group, global dim = 2, local dim = 1, 3 global components, 5 local components in each group, 10 groups.
pts,labels = generate_grouped_gaussian_data(20000, 2, 1, 3, 5, 10, false, 25.0, false)

# Create priors:
gprior, lprior = create_default_priors(2,1,:niw)

# Run the model:
vhdpmm_results = vhdp_fit(pts,2,100.0,1000.0,100.0,gprior,lprior,50)

# Get global and local assignments for the points:
vhdpmm_global = Dict([i=> create_global_labels(vhdpmm_results[1].groups_dict[i]) for i=1:length(pts)])
vhdpmm_local = Dict([i=> vhdpmm_results[1].groups_dict[i].labels for i=1:length(pts)])
```
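The last step above yields dictionaries keyed by group index. A minimal sketch of summarizing such a dictionary into per-group cluster sizes follows; it assumes each value is a vector of integer labels, and `toy_labels` and `label_counts` are hypothetical stand-ins rather than package output or API.

```julia
# Toy stand-in for a Dict of group index => per-point labels (made-up values).
toy_labels = Dict(1 => [1, 1, 2, 3, 3, 3], 2 => [2, 2, 2, 1])

# Count how many points received each label within one group.
function label_counts(labels::AbstractVector{<:Integer})
    counts = Dict{Int,Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    return counts
end

group_summary = Dict(g => label_counts(ls) for (g, ls) in toy_labels)

# Print the groups and their labels in a stable order.
for g in sort(collect(keys(group_summary)))
    println("group ", g, ": ", sort(collect(group_summary[g]); by = first))
end
```

Applied to the global and local assignment dictionaries in turn, the same helper would show which global components each group uses and how its points split across local ones.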

Examples:

Coseg with super pixels
vHDP as HDP
Missing data experiment
Synthetic data experiment

License

This software is released under the MIT License (included with the software). Note, however, that if you use this code (and/or the results of running it) to support any form of publication (e.g., a book, a journal paper, a conference paper, a patent application, etc.), we ask that you cite our paper:

@inproceedings{dinari2020vhdp,
  title={Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical {D}irichlet Processes},
  author={Dinari, Or and Freifeld, Oren},
  booktitle={UAI},
  year={2020}
}

Misc

For any questions: dinari at post.bgu.ac.il

Contributions, feature requests, suggestions, etc. are welcome.

Owner

  • Name: BGU-CS-VIL
  • Login: BGU-CS-VIL
  • Kind: organization

The Vision, Inference, and Learning group

Citation (CITATION.bib)

@inproceedings{dinari2020vhdp,
  title={Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical {D}irichlet Processes},
  author={Dinari, Or and Freifeld, Oren},
  booktitle={UAI},
  year={2020}
}

GitHub Events

Total
  • Watch event: 1
  • Issue comment event: 1
  • Pull request event: 1
  • Fork event: 1
Last Year
  • Watch event: 1
  • Issue comment event: 1
  • Pull request event: 1
  • Fork event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 30
  • Total Committers: 3
  • Avg Commits per committer: 10.0
  • Development Distribution Score (DDS): 0.1
Top Committers
Name Email Commits
dinarior d****r@g****m 27
Oren Freifeld o****r@c****l 2
Steven G. Johnson s****j@m****u 1
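
The figures above can be checked by hand. The sketch below assumes the commonly used definition of the Development Distribution Score, 1 minus the top committer's share of all commits; the page itself does not state the formula.

```julia
# Commit counts per committer, taken from the table above.
commits = Dict("dinarior" => 27, "Oren Freifeld" => 2, "Steven G. Johnson" => 1)

total = sum(values(commits))
avg_per_committer = total / length(commits)

# Assumed definition: DDS = 1 - (commits by the top committer / total commits).
dds = 1 - maximum(values(commits)) / total

println("total = ", total,
        ", average = ", avg_per_committer,
        ", DDS = ", round(dds; digits = 2))
```

Under that assumption the result agrees with the listed totals: 30 commits, 10.0 per committer, and a DDS of 0.1.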

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 3
  • Total pull requests: 8
  • Average time to close issues: 9 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 3.0
  • Average comments per pull request: 0.38
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dinarior (3)
Pull Request Authors
  • dinarior (6)
  • mainrs (2)
  • stevengj (1)
Top Labels
Issue Labels
release (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
juliahub.com: VersatileHDPMixtureModels

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 9.9%
Forks count: 28.1%
Average: 29.2%
Dependent packages count: 38.9%
Stargazers count: 39.8%
Last synced: 7 months ago