stockout-predictor-lightgbm

A lightweight and scalable machine learning pipeline using LightGBM to predict stock-out dates for retail inventory, built with structured time-series features.

https://github.com/hungchenhsu/stockout-predictor-lightgbm


Repository

Basic Info
  • Host: GitHub
  • Owner: hungchenhsu
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 25.4 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 9 months ago · Last pushed 9 months ago

README.md

📦 Stock-Out Date Predictor for Retail Inventory (LightGBM)

A production-ready machine learning pipeline for forecasting the number of days until stock-out for retail inventory SKUs.
It is built on structured daily features, tabular time-series techniques, and scalable LightGBM models.

⚠️ Note: This project was developed as a public-facing portfolio artifact.
The original dataset is private and under NDA, but the entire pipeline can be reused on any retail inventory dataset that provides daily balance-on-hand (BOH) features.


🧠 Why this project?

Stock-outs are a persistent challenge in retail. Most forecasting models focus on how much to reorder. This project asks a different question:

“Given today’s data, when is this SKU likely to go out of stock?”

This shift from quantity to timing allows operations teams to prioritize interventions, optimize replenishment schedules, and reduce lost sales.


🔍 Use Case

  • Domain: Retail inventory forecasting
  • Prediction target: Days until stock-out
  • Input: Daily, per-SKU features (e.g., balance on hand, rolling statistics, calendar flags)
  • Output: Predicted days until next stock-out
  • Applications: Replenishment prioritization, alerting systems, shelf availability dashboards
  • Designed for: SKUs with short-term inventory visibility and relatively stable consumption patterns

📁 Repository Structure

```text
stockout-predictor/
├── model/
│   └── lgbm/
│       ├── lgbm_stockout_model.txt   ← Final trained model
│       ├── permanent_oos_list.json   ← List of always-OOS SKUs
│       └── params.yaml               ← Feature list + hyperparameters
│
├── src/
│   ├── preprocessing/
│   │   └── stockout_preprocess.py    ← Feature engineering pipeline
│   ├── training/
│   │   └── train_lgbm_stockout.py    ← CV training + evaluation
│   └── inference/
│       ├── predict_batch.py          ← Batch scoring script
│       └── predict_api.py            ← (Optional) FastAPI endpoint
│
├── environment.yml
├── requirements.txt
└── README.md
```


⚙️ How It Works

Step 1: Data Preparation

  • Input: Daily SKU-level inventory dataset (simulated or anonymized)
  • Feature engineering includes:
    • Rolling 3-day and 7-day averages and standard deviations of DailyBOH
    • Day-of-week and U.S. holiday flags
    • Native categorical features such as itemsku, storeid, etc.
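The feature list above can be sketched in pandas roughly as follows. This is a minimal illustration, not the contents of stockout_preprocess.py: the `date` column name, the output column names, and the choice to include the current day in each rolling window are assumptions, while `DailyBOH`, `itemsku`, and `storeid` come from the description above.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def add_daily_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add rolling BOH statistics and calendar flags to a daily per-SKU frame."""
    out = df.sort_values(["storeid", "itemsku", "date"]).copy()
    grouped = out.groupby(["storeid", "itemsku"], observed=True)["DailyBOH"]

    for window in (3, 7):
        # transform keeps the original row index, so assignment aligns safely
        out[f"boh_mean_{window}d"] = grouped.transform(
            lambda s, w=window: s.rolling(w, min_periods=1).mean()
        )
        out[f"boh_std_{window}d"] = grouped.transform(
            lambda s, w=window: s.rolling(w, min_periods=2).std()
        )

    # Calendar features: Monday is 0, and federal holidays stand in for
    # "U.S. holiday flags" (assumption: the original calendar may differ)
    out["day_of_week"] = out["date"].dt.dayofweek
    holidays = USFederalHolidayCalendar().holidays(
        start=out["date"].min(), end=out["date"].max()
    )
    out["is_us_holiday"] = out["date"].isin(holidays).astype("int8")

    # LightGBM picks up pandas "category" columns as categorical features
    for col in ("itemsku", "storeid"):
        out[col] = out[col].astype("category")
    return out
```

Grouping by store and SKU before rolling keeps one series' inventory from bleeding into another's statistics.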

Step 2: Labeling

  • The model learns to predict days_to_oos — the number of days remaining until the SKU hits zero DailyBOH (out-of-stock).
  • If a SKU is always out-of-stock during the training window, it is handled separately (excluded from training, returned as 0 in inference).
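One possible way to build this label is sketched below. It assumes a `date` column, treats `DailyBOH <= 0` as out-of-stock, and drops rows that have no later stock-out inside the window; the README does not say how such rows are handled, so that last choice is purely illustrative.

```python
import pandas as pd


def label_days_to_oos(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Return labelled rows plus the SKUs that were out of stock on every day."""
    df = df.sort_values(["itemsku", "date"]).copy()

    # SKUs whose DailyBOH never rises above zero are handled outside the model
    always_oos = df.groupby("itemsku")["DailyBOH"].max().le(0)
    permanent = always_oos[always_oos].index
    df = df[~df["itemsku"].isin(permanent)].copy()

    # Date of the next zero-BOH day at or after each row, filled per SKU
    oos_date = df["date"].where(df["DailyBOH"] <= 0)
    next_oos = oos_date.groupby(df["itemsku"]).bfill()
    df["days_to_oos"] = (next_oos - df["date"]).dt.days

    # Assumption: rows with no observed future stock-out are dropped
    labelled = df.dropna(subset=["days_to_oos"])
    return labelled, sorted(str(sku) for sku in permanent)
```

A row recorded on the stock-out day itself gets a label of 0, which matches the value returned for always-OOS SKUs at inference time.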

Step 3: Modeling

  • Algorithm: LightGBM Regressor
  • Cross-validation: 5-fold GroupKFold grouped by itemsku, so no SKU appears in both the training and validation folds of a split (prevents SKU-level leakage)
  • Target: days_to_oos as a regression task
  • Categorical features: Handled natively by LightGBM (no manual encoding needed)
  • Feature importance: Computed post-training to enhance model interpretability

Step 4: Deployment

  • Trained model: Exported as .txt file (via LightGBM's booster.save_model())
  • Batch prediction: Done via predict_batch.py, returning daily forecast CSVs
  • Optional REST API: FastAPI endpoint (predict_api.py) enables SKU-level real-time scoring

📊 Model Overview

| Metric             | Value                    |
|--------------------|--------------------------|
| CV MAE (5-fold)    | 6.76 ± 0.29 days         |
| Hold-out MAE       | 3.55 days                |
| Observation window | 30 daily records per SKU |
| Training samples   | ~4,000 rows              |
| Inference latency  | ~50 ms for 50k rows      |
| Model size         | ~400 KB                  |

Why LightGBM?

Deep learning models such as LSTM and TimeGAN were evaluated during earlier experimentation. However:

  • Most SKUs had only ~30 days of available data
  • Sequence models required significantly longer time series to converge
  • Deep models produced unstable or highly variable predictions
  • TimeGAN failed to simulate realistic inventory dynamics under sparse conditions

LightGBM was chosen due to:

  • Robust performance with short tabular sequences
  • Efficient handling of thousands of unique SKUs via categorical splits
  • Fast training (<1 min) and low inference overhead
  • Easy interpretability via feature importance
  • Simplified deployment (single .txt model file, no GPU or DL framework required)

🚀 Getting Started

This repository includes a fully functional training pipeline, from feature engineering to model training and evaluation, focused on stock-out prediction using tabular time-series data.
Note: The inference scripts described under Deployment (predict_batch.py and predict_api.py) are not included in this version; the focus is on modeling and experimentation.

🛠 Prerequisites

  • Python 3.11.8
  • Recommended: virtual environment

```bash
python -m venv venv && source venv/bin/activate
pip install -r src/requirements.txt
```


📁 Project Files

| File | Description |
|------|-------------|
| stockout_preprocess.py | Feature engineering pipeline that processes daily SKU-level data into model-ready format, including rolling statistics, OOS labeling, and holiday flagging. |
| train_lgbm_stockout.py | Trains the LightGBM model using 5-fold GroupKFold cross-validation and a 5-day hold-out set. Outputs CV metrics and saves the final model. |
| lgbm_stockout_model.txt | Trained LightGBM model exported in native .txt format. Can be reloaded for reuse with Booster().load_model(). |
| permanent_oos_list.json | Contains SKUs that were always out-of-stock during the training window. These are excluded from model training and treated as immediately OOS in evaluation. |
| params.yaml | Captures training hyperparameters, features used, evaluation results, and training metadata for full reproducibility. |
| training_walkthrough.ipynb | Jupyter notebook that demonstrates the full modeling process, including data loading, feature construction, model training, CV evaluation, and interpretability analysis. |


🧪 How to Run Locally

Once dependencies are installed, you can execute the full pipeline in two main steps:

1. Preprocess your dataset

```bash
python stockout_preprocess.py --input data/raw/YOUR_DATASET.csv
```

This will generate:

  • trainproc.parquet and testproc.parquet
  • permanent_oos_list.json

2. Train the model

```bash
python train_lgbm_stockout.py
```

This will generate:

  • lgbm_stockout_model.txt
  • params.yaml
  • Console output with CV metrics and hold-out performance
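The hold-out performance mentioned here comes from a 5-day hold-out set. A time-based split of that kind could be written as below, assuming the hold-out is the final five calendar days of the observation window and that the frame has a `date` column; the training script may define it differently.

```python
import pandas as pd


def split_last_days(
    df: pd.DataFrame, date_col: str = "date", holdout_days: int = 5
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split into (train, holdout), where holdout is the final `holdout_days` days."""
    # Everything strictly after the cutoff date goes to the hold-out set
    cutoff = df[date_col].max() - pd.Timedelta(days=holdout_days)
    train = df[df[date_col] <= cutoff]
    holdout = df[df[date_col] > cutoff]
    return train, holdout
```

Splitting by date rather than at random keeps the hold-out rows strictly later than the training rows, which mirrors how the model would be used in practice.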

💡 Optional

Use the included notebook 01trainingwalkthrough_lgbm.ipynb to:

  • Explore and visualize raw and engineered features
  • Analyze feature importance
  • Interpret hold-out performance and model behavior
  • Document assumptions and modeling decisions for future deployment


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Citation

If you find this repository helpful in your research, teaching, or professional work, please consider citing or linking back to the repository:

Hung-Chen Hsu. Stock-Out Date Predictor: Forecasting Retail Inventory Depletion Using LightGBM. GitHub, 2025.
Repository: https://github.com/hungchenhsu/stockout-predictor-lightgbm

This helps acknowledge the original work and supports open sharing in the machine learning and retail analytics community 🙌


Created with 💻 and 🎯 by Hung-Chen Hsu

Owner

  • Name: Hung-Chen Hsu
  • Login: hungchenhsu
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this work, please cite it as below."
title: "Stock-Out Date Predictor: Forecasting Retail Inventory Depletion Using LightGBM"
authors:
  - family-names: Hsu
    given-names: Hung-Chen
    orcid: https://orcid.org/0009-0007-9806-2443
date-released: 2025-05-22
version: "1.0"
url: https://github.com/hungchenhsu/stockout-predictor-lightgbm
repository-code: https://github.com/hungchenhsu/stockout-predictor-lightgbm
license: MIT
type: software
keywords:
  - stock-out prediction
  - inventory forecasting
  - LightGBM
  - retail analytics
  - time-series regression
abstract: >
  This project presents a modular and interpretable machine learning pipeline to forecast 
  the number of days until stock-out for retail inventory SKUs. Built using LightGBM and structured 
  daily inventory data, the pipeline includes BOH-based rolling features, holiday-aware flags, and 
  per-SKU grouping to deliver fast and accurate short-horizon depletion predictions. 
  The model is trained on a 30-day observation window and evaluated using grouped cross-validation. 
  Source data is excluded due to NDA constraints, but the full codebase is reusable on any structured retail dataset.

Dependencies

requirements.txt pypi
  • PyYAML ==6.0.1
  • joblib ==1.2.0
  • lightgbm ==4.5.0
  • numpy ==1.26.4
  • pandas ==2.2.3
  • scikit-learn ==1.5.2
environment.yml conda
  • joblib 1.2.0.*
  • lightgbm 4.5.0.*
  • numpy 1.26.4.*
  • pandas 2.2.3.*
  • python 3.11.8.*
  • pyyaml 6.0.1.*
  • scikit-learn 1.5.2.*