ccac-2025-ncaa-bracket-xgboost

NCAA bracket prediction model for CCAC 2025 using XGBoost and engineered geospatial features.

https://github.com/hungchenhsu/ccac-2025-ncaa-bracket-xgboost

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.4%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: hungchenhsu
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 22.9 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 11 months ago
Metadata Files
Readme · License · Citation

README.md

NCAA Bracket Prediction using XGBoost – CCAC 2025 (Kaggle Competition)

🏀 Project Overview

This repository contains our complete solution for the "NCAA Basketball Bracket Prediction" Kaggle competition. The goal was to predict the outcomes of NCAA Basketball Tournament games. Our final model ranked 3rd on the private leaderboard.

We used XGBoost, a gradient-boosted decision tree library, combined with feature engineering and hyperparameter tuning to improve predictions.

CCAC Official Website


📌 Motivation & Technology

Why XGBoost?

  • Efficiency: Trains quickly, even on large tabular datasets.
  • Accuracy: Delivers strong predictive performance on structured (tabular) data.
  • Robustness: Handles missing values natively and works well with encoded categorical features.
  • Tuning Flexibility: Exposes many hyperparameters, allowing fine-grained optimization of model performance.

Why Feature Engineering?

  • Captures interactions between teams, regional factors, and historical performance metrics.
  • Gives the model more predictive signal than the raw columns alone.


📚 Repository Contents

⚠️ Note: Final submission files and competition datasets are not included in this repository due to Kaggle’s competition data rules. Please download the required files from the official competition dataset page after accepting the competition rules.

  • bracket_training.csv: Training dataset provided by Kaggle.
  • bracket_test.csv: Testing dataset for generating predictions.
  • CCAC 2025 - Institutions.csv: Additional information about institutions.
  • submission_template.csv: Template for Kaggle submission.
  • Final Notebook (ccac2025_ncaa_bracket_prediction.ipynb): Complete Jupyter notebook containing:
    • Data Loading
    • Preprocessing & Cleaning
    • Feature Engineering
    • Model Training and Evaluation
    • Hyperparameter Tuning
    • Generating Kaggle Submission

🚀 Project Workflow

Step 1: Data Exploration & Cleaning

  • Handled missing data using median and constant imputation.
  • Extracted numerical postal code information from categorical data.
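
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`attendance`, `region`, `postal_code`), the toy values, the `"Unknown"` fill value, and the 5-digit pattern are all assumptions made for the example.

```python
import pandas as pd

# Toy frame; column names and values are hypothetical, not from the competition data.
df = pd.DataFrame({
    "attendance": [1200.0, None, 800.0, 1500.0],
    "region": ["East", None, "West", "South"],
    "postal_code": ["15213-3890", "PA 19104", None, "90095"],
})

# Median imputation for a numeric column.
df["attendance"] = df["attendance"].fillna(df["attendance"].median())

# Constant imputation for a categorical column (the fill value is an assumption).
df["region"] = df["region"].fillna("Unknown")

# Extract the first 5-digit run from the postal code string and make it numeric.
# Rows with no match (or a missing postal code) stay NaN.
zip5 = df["postal_code"].str.extract(r"(\d{5})", expand=False)
df["postal_code_num"] = pd.to_numeric(zip5, errors="coerce")
```

The median is less sensitive to outliers than the mean, which is one common reason to prefer it for skewed columns such as attendance.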

Step 2: Advanced Feature Engineering

  • Created regional interaction features (East_West_Diff, South_Midwest_Diff).
  • Calculated distances using the Haversine formula to capture geographic proximity.
  • Merged aggregated team performance statistics from historical contests (average wins, losses, tournament seed, attendance, and win percentage).
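
For reference, the Haversine distance mentioned above can be computed as in the sketch below. The function name `haversine_km`, the choice of kilometres, and the Earth radius constant are assumptions for illustration; the notebook may use different units or a vectorized version.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot before the sqrt/asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

A feature built this way would typically take one coordinate pair per team or venue, for example `haversine_km(team_lat, team_lon, venue_lat, venue_lon)`, where those four variable names are hypothetical.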

Step 3: Modeling

  • Implemented XGBClassifier with tuned hyperparameters:
    • Learning rate, max depth, min_child_weight, gamma, and subsample.
  • Employed cross-validation strategies to avoid overfitting.

Step 4: Ensembling & Final Predictions

  • Tested multiple hyperparameter configurations and selected the best-performing model based on validation accuracy.
  • Generated final predictions for the Kaggle submission, which placed 2nd on the public leaderboard and 3rd on the private leaderboard.

🏆 Kaggle Competition Results

  • Final Ranking: 3rd Place (Private Leaderboard)
  • Best Public Score: 0.63070
  • Best Private Score: 0.63324 🌟


🖼️ Project Presentation

For a full overview of this project in presentation format, please see:
📃 CCAC 2025 – Nimbus 2025 Team Presentation (PDF)

This presentation was created by Nimbus 2025,
a team led by Hung-Chen Hsu, with teammates Da Fang Lin and Yiran Liu.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Citation

If you find this repository helpful in your research, teaching, or other work,
please consider citing or linking back to the repository:

Hung-Chen Hsu. NCAA Bracket Prediction for CCAC 2025 using XGBoost. GitHub, 2025. Repository: https://github.com/hungchenhsu/ccac-2025-ncaa-bracket-xgboost

This helps acknowledge the original work and supports open sharing in the ML community 🙌


Created with 💻 and 🎯 by Hung-Chen Hsu

Owner

  • Name: Hung-Chen Hsu
  • Login: hungchenhsu
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this work, please cite it as below:"
authors:
  - family-names: Hsu
    given-names: Hung-Chen
title: "NCAA Basketball Bracket Prediction (CCAC 2025) using XGBoost"
version: "1.0"
date-released: 2025-03-22
url: "https://github.com/hungchenhsu/ccac-2025-ncaa-bracket-xgboost"
repository-code: "https://github.com/hungchenhsu/ccac-2025-ncaa-bracket-xgboost"
license: "MIT"
keywords:
  - NCAA
  - Bracket Prediction
  - XGBoost
  - Kaggle
  - Machine Learning

GitHub Events

Total
  • Push event: 35
  • Create event: 2
Last Year
  • Push event: 35
  • Create event: 2