stockout-predictor-lightgbm
A lightweight and scalable machine learning pipeline using LightGBM to predict stock-out dates for retail inventory, built with structured time-series features.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: hungchenhsu
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 25.4 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
📦 Stock-Out Date Predictor for Retail Inventory (LightGBM)
A production-ready machine learning pipeline for forecasting the number of days until stock-out for retail inventory SKUs.
It is built on structured daily features, tabular time-series techniques, and scalable LightGBM models.
⚠️ Note: This project was developed as a public-facing portfolio artifact.
The original dataset is private and under NDA, but the entire pipeline is reusable on any retail inventory dataset with daily BOH-style features.
🧠 Why this project?
Stock-outs are a persistent challenge in retail. Most forecasting models focus on how much to reorder. This project asks a different question:
“Given today’s data, when is this SKU likely to go out of stock?”
This shift from quantity to timing allows operations teams to prioritize interventions, optimize replenishment schedules, and reduce lost sales.
📚 Table of Contents
- 🔍 Use Case
- 📁 Repository Structure
- ⚙️ How It Works
- 📊 Model Overview
- 🚀 Getting Started
- 📁 Project Files
- 🧪 How to Run Locally
- 📄 License
🔍 Use Case
- Domain: Retail inventory forecasting
- Prediction target: Days until stock-out
- Input: Daily, per-SKU features (e.g., balance on hand, rolling statistics, calendar flags)
- Output: Predicted days until next stock-out
- Applications: Replenishment prioritization, alerting systems, shelf availability dashboards
- Designed for: SKUs with short-term inventory visibility and relatively stable consumption patterns
📁 Repository Structure
```text
stockout-predictor/
├── model/
│   └── lgbm/
│       ├── lgbm_stockout_model.txt   ← Final trained model
│       ├── permanent_oos_list.json   ← List of always-OOS SKUs
│       └── params.yaml               ← Feature list + hyperparameters
│
├── src/
│   ├── preprocessing/
│   │   └── stockout_preprocess.py    ← Feature engineering pipeline
│   ├── training/
│   │   └── train_lgbm_stockout.py    ← CV training + evaluation
│   └── inference/
│       ├── predict_batch.py          ← Batch scoring script
│       └── predict_api.py            ← (Optional) FastAPI endpoint
│
├── environment.yml
├── requirements.txt
└── README.md
```
⚙️ How It Works
Step 1: Data Preparation
- Input: Daily SKU-level inventory dataset (simulated or anonymized)
- Feature engineering includes:
  - Rolling 3-day and 7-day averages and standard deviations of `DailyBOH`
  - Day-of-week and U.S. holiday flags
  - Native categorical features such as `itemsku`, `storeid`, etc.
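The feature engineering listed above can be sketched with pandas as follows. This is a minimal illustration, not the code in `stockout_preprocess.py`: the `date` column, the output column names (`boh_mean_3d`, `is_us_holiday`, and so on), the choice of `min_periods=1`, and the use of the U.S. federal holiday calendar that ships with pandas are all assumptions made for the example.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def add_daily_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add rolling DailyBOH statistics, calendar flags, and categorical dtypes."""
    df = df.sort_values(["storeid", "itemsku", "date"]).copy()
    boh = df.groupby(["storeid", "itemsku"])["DailyBOH"]

    for window in (3, 7):
        # transform() returns values aligned with the original rows.
        # min_periods=1 (an assumption) keeps the first days of each series.
        df[f"boh_mean_{window}d"] = boh.transform(
            lambda s, w=window: s.rolling(w, min_periods=1).mean()
        )
        df[f"boh_std_{window}d"] = boh.transform(
            lambda s, w=window: s.rolling(w, min_periods=1).std()
        )

    df["day_of_week"] = df["date"].dt.dayofweek  # Monday = 0
    # Assumption: "U.S. holiday" means the federal calendar bundled with pandas.
    holidays = USFederalHolidayCalendar().holidays(
        start=df["date"].min(), end=df["date"].max()
    )
    df["is_us_holiday"] = df["date"].isin(holidays)

    # Categorical dtype lets LightGBM treat these columns as native categoricals.
    for col in ("itemsku", "storeid"):
        df[col] = df[col].astype("category")
    return df


# Hypothetical one-SKU, one-week input.
demo = pd.DataFrame({
    "storeid": ["S1"] * 7,
    "itemsku": ["A"] * 7,
    "date": pd.date_range("2024-07-01", periods=7, freq="D"),
    "DailyBOH": [10, 8, 6, 4, 2, 0, 0],
})
features = add_daily_features(demo)
print(features[["date", "DailyBOH", "boh_mean_3d", "boh_std_7d", "is_us_holiday"]])
```

Grouping by store and SKU before rolling keeps one series from bleeding into the next. The first row of each series has an undefined standard deviation (a single observation), which LightGBM can consume as a missing value.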
Step 2: Labeling
- The model learns to predict `days_to_oos`: the number of days remaining until the SKU's `DailyBOH` hits zero (out-of-stock).
- If a SKU is always out-of-stock during the training window, it is handled separately (excluded from training, returned as `0` in inference).
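One way to derive this label is sketched below. It is an illustration under stated assumptions rather than the repository's implementation: the `date` column and both function names are hypothetical, series are keyed by `itemsku` only, and rows with no later stock-out inside the window are left as missing because the README does not say how such rows are treated.

```python
import pandas as pd


def split_permanent_oos(df: pd.DataFrame):
    """Separate SKUs whose DailyBOH never rises above zero."""
    max_boh = df.groupby("itemsku")["DailyBOH"].max()
    permanent = sorted(max_boh[max_boh <= 0].index)
    return df[~df["itemsku"].isin(permanent)], permanent


def label_days_to_oos(df: pd.DataFrame) -> pd.DataFrame:
    """Label each row with the number of days until the SKU's next zero-BOH day."""
    df = df.sort_values(["itemsku", "date"]).copy()
    # Keep the date on stock-out days, NaT everywhere else.
    oos_date = df["date"].where(df["DailyBOH"] <= 0)
    # Back-fill within each SKU so every row sees its next stock-out date.
    next_oos = oos_date.groupby(df["itemsku"]).bfill()
    # Rows with no later stock-out stay NaN (assumption: treated as unlabeled).
    df["days_to_oos"] = (next_oos - df["date"]).dt.days
    return df


# Hypothetical data: SKU "A" runs out on day 4, SKU "B" is always out of stock.
demo = pd.DataFrame({
    "itemsku": ["A"] * 5 + ["B"] * 3,
    "date": list(pd.date_range("2024-07-01", periods=5))
    + list(pd.date_range("2024-07-01", periods=3)),
    "DailyBOH": [3, 2, 1, 0, 5, 0, 0, 0],
})
trainable, permanent_oos = split_permanent_oos(demo)
labeled = label_days_to_oos(trainable)
print(permanent_oos)
print(labeled[["itemsku", "date", "DailyBOH", "days_to_oos"]])
```

In this sketch, the last row of SKU "A" has stock on hand and no later stock-out, so its label is missing and it would need to be dropped or handled explicitly before training.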
Step 3: Modeling
- Algorithm: LightGBM Regressor
- Cross-validation: 5-fold `GroupKFold` grouped by `itemsku` (all rows of a SKU stay in the same fold, which prevents leakage across SKUs)
- Target: `days_to_oos` as a regression task
- Categorical features: Handled natively by LightGBM (no manual encoding needed)
- Feature importance: Computed post-training to enhance model interpretability
Step 4: Deployment
- Trained model: Exported as a `.txt` file (via LightGBM's `booster.save_model()`)
- Batch prediction: Done via `predict_batch.py`, returning daily forecast CSVs
- Optional REST API: FastAPI endpoint (`predict_api.py`) enables SKU-level real-time scoring
📊 Model Overview
| Metric | Value |
|-----------------------|--------------------|
| CV MAE (5-fold) | 6.76 ± 0.29 days |
| Hold-out MAE | 3.55 days |
| Observation window | 30 daily records per SKU |
| Training samples | ~4,000 rows |
| Inference latency | ~50 ms for 50k rows |
| Model size | ~400 KB |
Why LightGBM?
Deep learning models such as LSTM and TimeGAN were evaluated during earlier experimentation. However:
- Most SKUs had only ~30 days of available data
- Sequence models required significantly longer time series to converge
- Deep models produced unstable or highly variable predictions
- TimeGAN failed to simulate realistic inventory dynamics under sparse conditions
LightGBM was chosen due to:
- Robust performance with short tabular sequences
- Efficient handling of thousands of unique SKUs via categorical splits
- Fast training (<1 min) and low inference overhead
- Easy interpretability via feature importance
- Simplified deployment (single `.txt` model file, no GPU or DL framework required)
🚀 Getting Started
This repository includes a fully functional training pipeline, from feature engineering to model training and evaluation, focused on stock-out prediction using tabular time-series data.
Note: Inference scripts (e.g., for real-time or batch scoring) are not included in this version, as the focus is on modeling and experimentation.
🛠 Prerequisites
- Python 3.11.8
- Recommended: virtual environment
```bash
python -m venv venv && source venv/bin/activate
pip install -r src/requirements.txt
```
📁 Project Files
| File | Description |
|----------------------------------|-------------|
| stockout_preprocess.py | Feature engineering pipeline that processes daily SKU-level data into model-ready format, including rolling statistics, OOS labeling, and holiday flagging. |
| train_lgbm_stockout.py | Trains the LightGBM model using 5-fold GroupKFold cross-validation and a 5-day hold-out set. Outputs CV metrics and saves the final model. |
| lgbm_stockout_model.txt | Trained LightGBM model exported in native .txt format. Can be reloaded for reuse with lightgbm.Booster(model_file="lgbm_stockout_model.txt"). |
| permanent_oos_list.json | Contains SKUs that were always out-of-stock during the training window. These are excluded from model training and treated as immediately OOS in evaluation. |
| params.yaml | Captures training hyperparameters, features used, evaluation results, and training metadata for full reproducibility. |
| training_walkthrough.ipynb | Jupyter notebook that demonstrates the full modeling process, including data loading, feature construction, model training, CV evaluation, and interpretability analysis. |
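The table mentions a 5-day hold-out set without saying how it is chosen. The sketch below shows one plausible reading, in which the most recent 5 calendar days of the observation window are held out. That interpretation, the `date` column, and the function name `split_last_days` are assumptions, not details confirmed by the repository.

```python
import pandas as pd


def split_last_days(df: pd.DataFrame, n_days: int = 5):
    """Hold out the most recent n_days of data (assumed hold-out strategy)."""
    cutoff = df["date"].max() - pd.Timedelta(days=n_days)
    train = df[df["date"] <= cutoff]
    holdout = df[df["date"] > cutoff]
    return train, holdout


# Hypothetical 30-day window for two SKUs.
dates = pd.date_range("2024-07-01", periods=30, freq="D")
demo = pd.DataFrame({
    "itemsku": ["A"] * 30 + ["B"] * 30,
    "date": list(dates) * 2,
    "DailyBOH": list(range(30, 0, -1)) * 2,
})
train_df, holdout_df = split_last_days(demo)
print(train_df["date"].max(), holdout_df["date"].min())
```

Splitting on time rather than at random keeps the hold-out strictly after the training period, which mirrors how the model would be used on future days.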
🧪 How to Run Locally
Once dependencies are installed, you can execute the full pipeline in two main steps:
1. Preprocess your dataset
```bash
python stockout_preprocess.py --input data/raw/YOUR_DATASET.csv
```
This will generate:
- `train_proc.parquet` and `test_proc.parquet`
- `permanent_oos_list.json`
2. Train the model
```bash
python train_lgbm_stockout.py
```
This will generate:
- `lgbm_stockout_model.txt`
- `params.yaml`
- Console output with CV metrics and hold-out performance
💡 Optional
Use the included notebook `01_training_walkthrough_lgbm.ipynb` to:
- Explore and visualize raw and engineered features
- Analyze feature importance
- Interpret hold-out performance and model behavior
- Document assumptions and modeling decisions for future deployment
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🤝 Citation
If you find this repository helpful in your research, teaching, or professional work, please consider citing or linking back to the repository:
Hung-Chen Hsu. Stock-Out Date Predictor: Forecasting Retail Inventory Depletion Using LightGBM. GitHub, 2025.
Repository: https://github.com/hungchenhsu/stockout-predictor-lightgbm
This helps acknowledge the original work and supports open sharing in the machine learning and retail analytics community 🙌
Created with 💻 and 🎯 by Hung-Chen Hsu
Owner
- Name: Hung-Chen Hsu
- Login: hungchenhsu
- Kind: user
- Repositories: 1
- Profile: https://github.com/hungchenhsu
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this work, please cite it as below."
title: "Stock-Out Date Predictor: Forecasting Retail Inventory Depletion Using LightGBM"
authors:
  - family-names: Hsu
    given-names: Hung-Chen
    orcid: https://orcid.org/0009-0007-9806-2443
date-released: 2025-05-22
version: "1.0"
url: https://github.com/hungchenhsu/stockout-predictor-lightgbm
repository-code: https://github.com/hungchenhsu/stockout-predictor-lightgbm
license: MIT
type: software
keywords:
  - stock-out prediction
  - inventory forecasting
  - LightGBM
  - retail analytics
  - time-series regression
abstract: >
  This project presents a modular and interpretable machine learning pipeline to forecast
  the number of days until stock-out for retail inventory SKUs. Built using LightGBM and structured
  daily inventory data, the pipeline includes BOH-based rolling features, holiday-aware flags, and
  per-SKU grouping to deliver fast and accurate short-horizon depletion predictions.
  The model is trained on a 30-day observation window and evaluated using grouped cross-validation.
  Source data is excluded due to NDA constraints, but the full codebase is reusable on any structured retail dataset.
GitHub Events
Total
- Push event: 30
- Create event: 2
Last Year
- Push event: 30
- Create event: 2
Dependencies
- PyYAML ==6.0.1
- joblib ==1.2.0
- lightgbm ==4.5.0
- numpy ==1.26.4
- pandas ==2.2.3
- scikit-learn ==1.5.2
- joblib 1.2.0.*
- lightgbm 4.5.0.*
- numpy 1.26.4.*
- pandas 2.2.3.*
- python 3.11.8.*
- pyyaml 6.0.1.*
- scikit-learn 1.5.2.*