mmsbm

Mixed Membership Stochastic Block Models

https://github.com/eudald-seeslab/mmsbm

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.8%) to scientific vocabulary

Keywords

python recommender-system stochastic-block-model
Last synced: 6 months ago

Repository

Mixed Membership Stochastic Block Models

Basic Info
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 2
Topics
python recommender-system stochastic-block-model
Created almost 5 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

Mixed Membership Stochastic Block Models


A Python implementation of Mixed Membership Stochastic Block Models for recommendation systems, based on the work by Godoy-Lorite et al. (2016). This library provides an efficient, vectorized implementation with multiple computational backends suitable for both research and production environments.

Features

  • Multiple Backends: Choose between numpy (default), numba (JIT-compiled CPU), and cupy (GPU-accelerated) for performance tuning.
  • Fast, vectorized implementation of MMSBM.
  • Support for both simple and cross-validated fitting.
  • Parallel processing for multiple sampling runs.
  • Comprehensive model statistics and evaluation metrics.
  • Compatible with Python 3.7+.

Installation

The base library can be installed with pip:

```bash
pip install mmsbm
```

For accelerated backends, you can install the optional dependencies.

Numba (JIT compilation on CPU):

```bash
pip install mmsbm[numba]
```

CuPy (NVIDIA GPU acceleration): make sure you have a compatible NVIDIA driver and CUDA toolkit installed, then install with:

```bash
pip install mmsbm[cupy]
```

You can also install all optional dependencies with:

```bash
pip install mmsbm[numba,cupy]
```

Performance & Backends

This library uses a backend system to perform the core computations of the Expectation-Maximization algorithm. You can specify the backend when you initialize the model, giving you control over the performance characteristics.

```python
from mmsbm import MMSBM

# Use the default, pure NumPy backend
model_numpy = MMSBM(user_groups=2, item_groups=4, backend='numpy')

# Use the Numba backend for JIT-compiled CPU acceleration
model_numba = MMSBM(user_groups=2, item_groups=4, backend='numba')

# Use the CuPy backend for GPU acceleration
model_cupy = MMSBM(user_groups=2, item_groups=4, backend='cupy')
```

  • numpy (Default): A highly optimized, pure NumPy implementation. It is universally compatible and requires no extra dependencies beyond NumPy itself.
  • numba: Uses the Numba library to just-in-time (JIT) compile the core computational loops. This can provide a significant speedup on the CPU, especially for large datasets, and is recommended for users who want better performance without a dedicated GPU. Note that an issue with the parallelization of samples can make numba slower on smaller datasets.
  • cupy: Offloads computations to a compatible NVIDIA GPU using the CuPy library. This provides the best performance, but requires a CUDA-enabled GPU and the appropriate drivers. Because of the overhead of transferring data to and from the GPU, it is most effective on larger models, where computation time outweighs transfer time.

Note: because of the numba parallelization issue and the cupy data-transfer overhead, neither accelerated backend is guaranteed to outperform NumPy. Try different backends to find the best one for your data and your system.
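Because neither accelerated backend is guaranteed to win, one pragmatic pattern is to fall back through the options at runtime based on which libraries are actually installed. The helper below is a hypothetical sketch, not part of the mmsbm API; the returned name could then be passed to `MMSBM(backend=...)`:

```python
import importlib.util

def pick_backend(preference=("cupy", "numba", "numpy")):
    # Return the first backend in `preference` whose library is
    # importable; 'numpy' acts as the universal fallback, since the
    # base install always depends on it.
    for name in preference:
        if importlib.util.find_spec(name) is not None:
            return name
    return "numpy"
```

Remember that availability is not the same as speed: even when cupy or numba is installed, it is worth benchmarking against the NumPy backend on your own data.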

Usage

Data Format

The input data should be a pandas DataFrame with exactly three columns, called users, items, and ratings. For example:

```python
import pandas as pd
from random import choice

train = pd.DataFrame(
    {
        "users": [f"user{choice(list(range(5)))}" for _ in range(100)],
        "items": [f"item{choice(list(range(10)))}" for _ in range(100)],
        "ratings": [choice(list(range(1, 6))) for _ in range(100)],
    }
)

test = pd.DataFrame(
    {
        "users": [f"user{choice(list(range(5)))}" for _ in range(50)],
        "items": [f"item{choice(list(range(10)))}" for _ in range(50)],
        "ratings": [choice(list(range(1, 6))) for _ in range(50)],
    }
)
```

Model Configuration

```python
from mmsbm import MMSBM

# Initialize the MMSBM class:
model = MMSBM(
    user_groups=2,    # Number of user groups
    item_groups=4,    # Number of item groups
    backend='numba',  # Computational backend: 'numpy', 'numba', or 'cupy'
    iterations=500,   # Number of EM iterations
    sampling=5,       # Number of parallel EM runs (different random initializations); the best run is kept
    seed=1,           # Random seed for reproducibility
    debug=False,      # Enable debug logging
)
```

Note on sampling
Setting sampling to a value greater than 1 makes the library launch that many independent EM optimizations in parallel, each starting from a different random initialization. Once all runs finish, the one with the highest log-likelihood is selected. This increases the chances of finding a better (global) solution at the cost of extra computation time.
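The best-of-N selection described above can be illustrated with a small standalone sketch; `run_once` and the dictionary it returns are hypothetical stand-ins for a single EM optimization, not the library's internals:

```python
import random

def best_of_runs(run_once, sampling=5, seed=1):
    # Launch `sampling` independent optimizations, each from a
    # different random seed, and keep the run with the highest
    # log-likelihood.
    rng = random.Random(seed)
    best = None
    for _ in range(sampling):
        result = run_once(seed=rng.randrange(2**31))
        if best is None or result["log_likelihood"] > best["log_likelihood"]:
            best = result
    return best

# Toy stand-in for a single EM run: the "log-likelihood" here is
# just a deterministic function of the seed.
def fake_run(seed):
    return {"seed": seed, "log_likelihood": -abs(seed % 100 - 50)}

best = best_of_runs(fake_run, sampling=5, seed=1)
```

Since the runs are independent, increasing `sampling` trades extra computation for a better chance of escaping poor local optima.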

Training Methods

The library offers two complementary ways to train a model:

  • Simple Fit – runs the EM algorithm once on the full training set. This is the fastest option and is appropriate when you already have a train-test split (or when you do not need a validation step).
  • Cross-Validation Fit – automatically splits the input data into k folds (default 5), trains a separate model on each set of k−1 folds, and evaluates it on the held-out fold. The routine returns the accuracy of every fold, so you can inspect the variability and pick hyper-parameters more reliably. It is slower, because it performs the fit k times, but provides a more reliable estimate of generalisation performance.

Simple Fit

```python
model.fit(train)
```

Cross-Validation Fit

```python
import numpy as np

accuracies = model.cv_fit(train, folds=5)
print(f"Mean accuracy: {np.mean(accuracies):.3f} ± {np.std(accuracies):.3f}")
```
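Under the hood, k-fold splitting partitions the data into disjoint folds. The helper below is a standalone illustration of that mechanic; the real `cv_fit` does this internally, and its exact splitting strategy may differ:

```python
def kfold_indices(n, k=5):
    # Partition indices 0..n-1 into k disjoint, near-equal folds;
    # each fold serves once as the held-out set while the other
    # k-1 folds form the training set.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

Every observation appears in exactly one held-out fold, which is why averaging the per-fold accuracies gives a useful estimate of out-of-sample performance.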

Making Predictions

```python
predictions = model.predict(test)
```

Model Evaluation

Note: you must call model.predict() before running model.score().

```python
results = model.score()

# Access various metrics
accuracy = results['stats']['accuracy']
mae = results['stats']['mae']

# Access model parameters
theta = results['objects']['theta']  # User group memberships
eta = results['objects']['eta']      # Item group memberships
pr = results['objects']['pr']        # Rating probabilities
```

Running Tests

To run the test suite:

```bash
pytest
```

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

TODO

  • Progress bars do not work in Jupyter notebooks.
  • A persistent (albeit harmless) warning appears when using the cupy backend.
  • The numba and cupy backends show unexpected behaviour with the "sampling" parallelization.
  • Improve the treatment of the prediction/score steps and the results object.
  • Add sampling as an extra axis in the EM objects for more efficiency.

References

[1]: Godoy-Lorite, Antonia, et al. "Accurate and scalable social recommendation using mixed-membership stochastic block models." Proceedings of the National Academy of Sciences 113.50 (2016): 14207-14212.

Owner

  • Name: eudald-seeslab
  • Login: eudald-seeslab
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.2
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Correig-Fraga"
    given-names: "Eudald"
    orcid: "https://orcid.org/0000-0001-8556-0469"
title: "mmsbm"
version: 1.0.1
doi: 10.5281/zenodo.15011623
date-released: 2021-08-06
url: "https://github.com/eudald-seeslab/mmsbm"

GitHub Events

Total
  • Create event: 8
  • Release event: 2
  • Issues event: 4
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 20
Last Year
  • Create event: 8
  • Release event: 2
  • Issues event: 4
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 20

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 118
  • Total Committers: 1
  • Avg Commits per committer: 118.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Eudald e****d@c****t 118
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 3
  • Total pull requests: 5
  • Average time to close issues: about 2 years
  • Average time to close pull requests: 2 minutes
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.67
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: about 8 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ecorreig (2)
  • sungsushi (1)
Pull Request Authors
  • ecorreig (5)
Top Labels
Issue Labels
enhancement (1) bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 68 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 17
  • Total maintainers: 1
pypi.org: mmsbm

Compute Mixed Membership Stochastic Block Models.

  • Versions: 17
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 68 Last month
Rankings
Dependent packages count: 10.0%
Dependent repos count: 21.7%
Forks count: 22.6%
Average: 25.3%
Downloads: 33.1%
Stargazers count: 38.8%
Maintainers (1)
Last synced: 7 months ago

Dependencies

setup.py pypi
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
  • tqdm *