https://github.com/alokamgnaneswarasai/mambarec

Mamba for sequential recommendation

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.4%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

Mamba for sequential recommendation

Basic Info

Host: GitHub
Owner: alokamgnaneswarasai
Language: Jupyter Notebook
Default Branch: master
Size: 22 MB

Statistics

Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme

State-Space Models for Large Sequence Modeling in Sequential Recommendation

Author: Alokam Gnaneswara Sai

Abstract

This project focuses on enhancing sequential recommendation systems using Mamba, a state-space model. The work aims to address challenges with handling long sequences in traditional methods, such as slow inference and higher training costs. The proposed solutions achieve faster inference, reduced training times, and improved accuracy compared to Transformer-based models like SASRec.

GitHub Repository

Introduction

Sequential recommendation systems are critical for applications such as e-commerce, video streaming, and music platforms. This work explores state-space models to efficiently capture long-range dependencies and proposes a hybrid approach combining Mamba and self-attention mechanisms.

Figure 1: Training process for a sequential recommendation system.

Problem Statement

The task is to predict the next item a user will interact with based on their historical interactions. Formally, the goal is to learn a function:

$$f: (s_u, u) \to P(i_{t+1} \mid s_u, u; \Theta),$$

where $s_u$ is the interaction sequence and $\Theta$ are model parameters.
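As a rough illustration of this formulation, the sketch below turns a sequence representation into a probability distribution over candidate items using a softmax over dot-product scores. The function names, the toy embeddings, and the dot-product scoring are assumptions made for illustration and are not taken from the repository; the user input $u$ is omitted for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_item_distribution(seq_repr, item_embeddings):
    """Return an estimate of P(i_{t+1} | s_u) for every candidate item.

    seq_repr: vector summarizing the history (hypothetical encoder output).
    item_embeddings: dict mapping item id -> embedding vector (toy values).
    """
    item_ids = list(item_embeddings)
    scores = [
        sum(a * b for a, b in zip(seq_repr, item_embeddings[i]))
        for i in item_ids
    ]
    return dict(zip(item_ids, softmax(scores)))

# Toy example with made-up numbers.
probs = next_item_distribution(
    [1.0, 0.0],
    {"item_a": [2.0, 0.0], "item_b": [0.0, 2.0], "item_c": [1.0, 1.0]},
)
best_item = max(probs, key=probs.get)
```

In this toy setup the item whose embedding aligns best with the sequence representation receives the highest probability.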

Methodology

1. Replacing Self-Attention with the Mamba Block

In this variant, the Mamba block replaces only the self-attention sublayer of a standard Transformer block.

Figure 2: Architecture for replacing self-attention.
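For readers unfamiliar with state-space models, the sketch below shows the kind of linear recurrence such a sequence mixer builds on: a fixed-size state is updated once per step, so the cost grows linearly with sequence length. This is a deliberately simplified scalar, time-invariant recurrence with hypothetical parameters `a`, `b`, and `c`. Mamba makes its parameters input-dependent and uses a parallel scan, neither of which is shown here, and this is not the repository's implementation.

```python
def ssm_scan(inputs, a, b, c):
    """Run a scalar linear state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs

# An impulse at the first step decays geometrically through the state.
ys = ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)
```

With `a` below 1 in magnitude, the influence of an early input fades gradually rather than being cut off at a fixed window.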

2. Replacing the Entire Transformer Block

The Mamba block replaces the entire Transformer block, including feedforward layers.

Figure 3: Architecture for replacing the Transformer block.

3. Hybrid Approach

A mixture of self-attention and Mamba blocks, regulated by a gating network, is proposed for handling both short and long sequences.

Figure 4: Hybrid model combining self-attention and Mamba.
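One way to picture a gated mixture of two branches is a soft blend: a gate produces one logit per branch, a softmax turns the logits into weights, and the branch outputs are combined elementwise. The sketch below assumes this soft-mixing scheme and uses hypothetical names (`gate_weights`, `mix_branches`); the repository's gating network may work differently, for example by routing to a single branch.

```python
import math

def gate_weights(logits):
    """Softmax over per-branch gate logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_branches(attn_out, mamba_out, logits):
    """Blend two branch outputs elementwise using the gate weights.

    attn_out, mamba_out: same-length vectors from the two branches.
    logits: two gate logits, ordered as (attention, Mamba).
    """
    w_attn, w_mamba = gate_weights(logits)
    return [w_attn * a + w_mamba * b for a, b in zip(attn_out, mamba_out)]

# Equal logits give an even blend of the two branches.
mixed = mix_branches([1.0, 3.0], [3.0, 1.0], logits=[0.0, 0.0])

# A strongly positive first logit makes the result follow the attention branch.
mostly_attn = mix_branches([1.0, 3.0], [3.0, 1.0], logits=[10.0, 0.0])
```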

Experiments

Datasets

Three datasets were used:
- LFM-1b: Last.fm listening histories.
- KuaiRand: Short-video interaction data.
- ML-1M: MovieLens 1M ratings dataset.

| Statistic | LFM-1b | KuaiRand | ML-1M |
|-------------------------|-------------------|-----------------|-------------------|
| Number of users | 120,322 | 27,285 | 6,040 |
| Number of items | 31,634,450 | 32,038,725 | 3,952 |
| Number of interactions | 1,088,161,692 | 322,278,385 | 1,000,209 |
| Average sequence length | 9,043 | 11,811 | 165 |

Table 1: Statistics of the LFM-1b, KuaiRand, and ML-1M datasets.

Evaluation Metrics

To evaluate the performance of the proposed recommendation system, two widely recognized Top-N metrics are utilized: Hit Rate@K (HR@K) and Normalized Discounted Cumulative Gain@K (NDCG@K).

Hit Rate@K (HR@K)

Hit Rate@K quantifies the proportion of instances where the ground-truth next item is present within the top $K$ recommended items. For a dataset consisting of $N$ users, HR@K is defined as:

$$ HR@K = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}(y_i \in \mathcal{R}_i^K) $$

where:
- $y_i$ represents the ground-truth next item for user $i$,
- $\mathcal{R}_i^K$ denotes the set of top $K$ recommended items for user $i$,
- $\mathbb{I}(\cdot)$ is the indicator function.
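The HR@K definition can be sketched in a few lines of Python. The function name, the list-based inputs, and the toy data are assumptions for illustration, not the repository's evaluation code.

```python
def hit_rate_at_k(ground_truth, ranked_lists, k):
    """Fraction of users whose ground-truth item appears in their top-k list.

    ground_truth: one true next item per user.
    ranked_lists: one ranked recommendation list per user, best item first.
    """
    hits = sum(
        1 for y, ranked in zip(ground_truth, ranked_lists) if y in ranked[:k]
    )
    return hits / len(ground_truth)

# Toy data: four users, three recommendations each.
truth = ["a", "b", "c", "d"]
ranked = [
    ["a", "x", "y"],  # hit at rank 1
    ["x", "b", "y"],  # hit at rank 2
    ["x", "y", "c"],  # hit at rank 3 (outside the top 2)
    ["x", "y", "z"],  # miss
]
hr_at_2 = hit_rate_at_k(truth, ranked, k=2)
hr_at_3 = hit_rate_at_k(truth, ranked, k=3)
```

Increasing K can only keep HR@K the same or raise it, since a larger cutoff never removes a hit.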

Normalized Discounted Cumulative Gain@K (NDCG@K)

NDCG@K evaluates the ranking quality of recommended items by assigning higher importance to items ranked closer to the top:

$$ NDCG@K = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathbb{I}(y_i \in \mathcal{R}_i^K)}{\log_2(\text{rank}_i(y_i) + 1)} $$

where:
- $\text{rank}_i(y_i)$ represents the position of the ground-truth item $y_i$ in the recommendation list for user $i$.
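A minimal sketch of this single-target NDCG@K follows, assuming ranks are counted from 1 so that a hit at the top position contributes exactly 1. The function name and toy data are hypothetical and not taken from the repository.

```python
import math

def ndcg_at_k(ground_truth, ranked_lists, k):
    """Average of 1 / log2(rank + 1) over users, counting only top-k hits.

    ground_truth: one true next item per user.
    ranked_lists: one ranked recommendation list per user, best item first.
    """
    total = 0.0
    for y, ranked in zip(ground_truth, ranked_lists):
        top_k = ranked[:k]
        if y in top_k:
            rank = top_k.index(y) + 1  # 1-based position of the true item
            total += 1.0 / math.log2(rank + 1)
    return total / len(ground_truth)

# Toy data: hits at ranks 1, 2 and 3, plus one miss.
truth = ["a", "b", "c", "d"]
ranked = [
    ["a", "x", "y"],
    ["x", "b", "y"],
    ["x", "y", "c"],
    ["x", "y", "z"],
]
ndcg_at_3 = ndcg_at_k(truth, ranked, k=3)
```

Unlike HR@K, a hit at rank 3 contributes only 0.5 here, so two models with the same hit rate can still differ in NDCG@K.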

These metrics provide an evaluation framework that captures both recommendation accuracy and ranking quality.

Table 2: Performance metrics across datasets.

Results

| Dataset | Metric | GRU4Rec | SASRec | RecMamba | MambaRec* | MoEMaTrRec |
|--------------------|-------------|-------------|------------|--------------|---------------|----------------|
| KuaiRand (2k) | Hit@10 | 0.4439 | 0.7857 | 0.7652 | 0.8502 | 0.8419 |
| | NDCG@10 | 0.3184 | 0.6126 | 0.5828 | 0.7360 | 0.7972 |
| | Hit@20 | 0.4617 | 0.8361 | 0.8172 | 0.8781 | 0.8777 |
| | NDCG@20 | 0.3230 | 0.6254 | 0.5960 | 0.7430 | 0.7264 |
| KuaiRand (5k) | Hit@10 | 0.2957 | 0.8642 | 0.7733 | 0.8854 | 0.8769 |
| | NDCG@10 | 0.3080 | 0.7242 | 0.5951 | 0.7782 | 0.7608 |
| | Hit@20 | 0.2318 | 0.8982 | 0.8253 | 0.9071 | 0.9049 |
| | NDCG@20 | 0.2350 | 0.7328 | 0.6082 | 0.7846 | 0.7679 |

Table: Evaluation of GRU4Rec, SASRec, RecMamba, and our models on the KuaiRand dataset with varying context lengths.

| Dataset | Metric | GRU4Rec | SASRec | RecMamba | MambaRec* | MoEMaTrRec |
|---------------------|-------------|-------------|------------|--------------|---------------|----------------|
| LFM-1b (0.25k) | Hit@10 | 0.3469 | 0.7533 | 0.7431 | 0.7548 | 0.7493 |
| | NDCG@10 | 0.2285 | 0.6187 | 0.5290 | 0.6120 | 0.6138 |
| | Hit@20 | 0.3512 | 0.7791 | 0.7612 | 0.7786 | 0.7792 |
| | NDCG@20 | 0.2345 | 0.6328 | 0.5386 | 0.6212 | 0.6214 |
| LFM-1b (0.5k) | Hit@10 | 0.3586 | 0.7822 | 0.7464 | 0.7897 | 0.7842 |
| | NDCG@10 | 0.2384 | 0.6321 | 0.5940 | 0.6522 | 0.6484 |
| | Hit@20 | 0.3623 | 0.8144 | 0.7658 | 0.8147 | 0.8169 |
| | NDCG@20 | 0.2394 | 0.6351 | 0.6002 | 0.6550 | 0.6560 |
| LFM-1b (1k) | Hit@10 | 0.3469 | 0.8059 | 0.8023 | 0.8130 | 0.8104 |
| | NDCG@10 | 0.2391 | 0.6767 | 0.6948 | 0.7187 | 0.6964 |
| | Hit@20 | 0.3638 | 0.8410 | 0.8389 | 0.8420 | 0.8409 |
| | NDCG@20 | 0.2398 | 0.6856 | 0.7148 | 0.7261 | 0.7015 |
| LFM-1b (2k) | Hit@10 | 0.3597 | 0.8388 | 0.8319 | 0.8429 | 0.8295 |
| | NDCG@10 | 0.2401 | 0.7537 | 0.7393 | 0.7582 | 0.7328 |
| | Hit@20 | 0.3864 | 0.8598 | 0.8570 | 0.8612 | 0.8573 |
| | NDCG@20 | 0.2469 | 0.7591 | 0.7456 | 0.7604 | 0.7407 |
| LFM-1b (5k) | Hit@10 | 0.3800 | 0.8794 | 0.8727 | 0.8548 | 0.8461 |
| | NDCG@10 | 0.2366 | 0.7905 | 0.7908 | 0.7713 | 0.7625 |
| | Hit@20 | 0.4055 | 0.8969 | 0.8904 | 0.8923 | 0.8700 |
| | NDCG@20 | 0.2432 | 0.7950 | 0.7953 | 0.7967 | 0.7712 |

Table: Evaluation of GRU4Rec, SASRec, RecMamba, and our model on the LFM-1b dataset with varying context lengths.

Key Findings

- RQ1: Longer sequences improved performance significantly, with the best results for sequences of length 5000.
- RQ2: State-space models like MambaRec outperformed RNN and Transformer-based models for long sequences.
- RQ3: MambaRec was more resource-efficient while achieving competitive accuracy.

Efficiency Comparison

Mamba models demonstrated 60–70% faster training and evaluation compared to SASRec.

| Dataset | Model | GPU Memory (GB) | Training Time (s / 5 epochs) | Evaluation Time (s) |
|-----------------|------------|---------------------|--------------------------------|-------------------------|
| LFM-1b (1k) | SASRec | 25.42 | 1319.18 | 119.2 |
| | RecMamba | 20.48 | 819.13 | 64.32 |
| | MambaRec* | 24.26 | 840.28 | 61.86 |

Table: Efficiency Comparison on LFM-1b.
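To make the efficiency numbers easier to interpret, the short script below derives two common readings of "faster" from the reported timings: the speedup ratio (baseline time divided by model time) and the fraction of wall-clock time saved. The timings are copied from the efficiency table; the helper names are hypothetical.

```python
# Timings copied from the efficiency table (LFM-1b, 1k context).
timings = {
    "SASRec":    {"train_s": 1319.18, "eval_s": 119.2},
    "RecMamba":  {"train_s": 819.13,  "eval_s": 64.32},
    "MambaRec*": {"train_s": 840.28,  "eval_s": 61.86},
}

def speedup(baseline, model, key):
    """How many times faster `model` is than `baseline` (ratio of times)."""
    return timings[baseline][key] / timings[model][key]

def time_saved(baseline, model, key):
    """Fraction of the baseline's wall-clock time that `model` saves."""
    return 1.0 - timings[model][key] / timings[baseline][key]

for name in ("RecMamba", "MambaRec*"):
    for key in ("train_s", "eval_s"):
        print(
            f"{name:10s} {key:8s} "
            f"speedup={speedup('SASRec', name, key):.2f}x "
            f"saved={time_saved('SASRec', name, key):.1%}"
        )
```

Under these numbers, the training speedups are roughly 1.6x and the evaluation speedups roughly 1.9x, while the reduction in wall-clock time is roughly 36–48%. The two readings differ noticeably, so it helps to state which one a "percent faster" figure refers to.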

Ablation Study

Insights:

- Positional Embeddings: Mamba models inherently capture positional information.
- Dropout and Layer Normalization: Improved robustness and stability.
- Mamba Blocks: Increased model complexity without proportional performance gains.

Related Work

Sequential recommendation systems have traditionally relied on neural networks like RNNs and Transformers. This work highlights the potential of state-space models for efficient and scalable sequential modeling.

Future Work

- Explore bidirectional modeling for richer contextual encoding.
- Incorporate hierarchical structures to capture both local and global patterns in user behavior.

Owner

Login: alokamgnaneswarasai
Kind: user

Repositories: 1
Profile: https://github.com/alokamgnaneswarasai

GitHub Events

Total

Watch event: 5
Member event: 1
Push event: 70
Create event: 2

Last Year

Watch event: 5
Member event: 1
Push event: 70
Create event: 2
