ccac-2025-ncaa-bracket-xgboost
NCAA bracket prediction model for CCAC 2025 using XGBoost and engineered geospatial features.
https://github.com/hungchenhsu/ccac-2025-ncaa-bracket-xgboost
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: hungchenhsu
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 22.9 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
NCAA Bracket Prediction using XGBoost – CCAC 2025 (Kaggle Competition)
🏀 Project Overview
This repository contains our complete solution for the "NCAA Basketball Bracket Prediction" Kaggle competition. The primary goal was to predict outcomes of the NCAA Basketball Tournament matches accurately. Our final model ranked 3rd on the private leaderboard.
We used XGBoost, a powerful gradient-boosted decision tree model, combined with comprehensive feature engineering and hyperparameter tuning to optimize predictions.
📌 Motivation & Technology
Why XGBoost?
- Efficiency: Handles large datasets quickly.
- Accuracy: Delivers strong predictive performance on tabular data.
- Robustness: Handles missing values natively and supports categorical features.
- Tuning Flexibility: Allows detailed hyperparameter optimization for maximizing performance.
Why Feature Engineering?
- Captures nuanced interactions between teams, regional factors, and historical performance metrics.
- Gives the model more predictive signal than the raw inputs alone, improving accuracy.
📖 Table of Contents
- 🏀 Project Overview
- 📌 Motivation & Technology
- 📚 Repository Contents
- 🚀 Project Workflow
- 🏆 Kaggle Competition Results
- 📄 License
- 🤝 Citation
📚 Repository Contents
⚠️ Note: Final submission files and competition datasets are not included in this repository due to Kaggle’s competition data rules. Please refer to the official dataset to download required files after accepting the competition rules.
- bracket_training.csv: Training dataset provided by Kaggle.
- bracket_test.csv: Testing dataset for generating predictions.
- CCAC 2025 - Institutions.csv: Additional information about institutions.
- submission_template.csv: Template for Kaggle submission.
- Final Notebook (ccac2025_ncaa_bracket_prediction.ipynb): Complete Jupyter notebook covering:
  - Data Loading
  - Preprocessing & Cleaning
  - Feature Engineering
  - Model Training and Evaluation
  - Hyperparameter Tuning
  - Generating Kaggle Submission
🚀 Project Workflow
Step 1: Data Exploration & Cleaning
- Handled missing data using median and constant imputation.
- Extracted numerical postal code information from categorical data.
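A minimal sketch of the cleaning step above, assuming pandas. The column names (`attendance`, `region`, `zip`), the `"Unknown"` fill value, and the five-digit postal pattern are illustrative assumptions and are not taken from the notebook.

```python
import pandas as pd


def impute(df, median_cols, constant_cols, fill_value="Unknown"):
    """Fill numeric gaps with the column median and other gaps with a constant."""
    out = df.copy()
    for col in median_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in constant_cols:
        out[col] = out[col].fillna(fill_value)
    return out


def postal_to_number(series):
    """Pull the first run of five digits out of a text column and return it as a number.

    Rows without five consecutive digits become NaN. Leading zeros are lost
    in the numeric form (for example "02139" becomes 2139).
    """
    digits = series.astype(str).str.extract(r"(\d{5})", expand=False)
    return pd.to_numeric(digits, errors="coerce")


# Hypothetical example data, for illustration only.
example = pd.DataFrame(
    {
        "attendance": [100.0, None, 300.0],
        "region": ["East", None, "West"],
        "zip": ["15213-3890", "PA 02139", None],
    }
)
cleaned = impute(example, median_cols=["attendance"], constant_cols=["region"])
cleaned["zip_num"] = postal_to_number(example["zip"])
```

Whether a numeric postal code is a useful feature depends on the data; it is shown here only because the step above mentions it.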
Step 2: Advanced Feature Engineering
- Created regional interaction features (East_West_Diff, South_Midwest_Diff).
- Calculated distances using the Haversine formula to capture geographic proximity.
- Merged aggregated team performance statistics from historical contests (average wins, losses, tournament seed, attendance, and win percentage).
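The Haversine distance mentioned above can be sketched with the standard library alone. The function name, the choice of kilometres, and the mean Earth radius constant are assumptions made for illustration; the notebook may compute distances differently.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A distance feature could then be derived per game, for example as the distance between a team's campus and the game site, provided coordinates for both are available.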
Step 3: Modeling
- Implemented XGBClassifier with tuned hyperparameters:
  - Learning rate, max depth, min_child_weight, gamma, and subsample.
- Employed cross-validation strategies to avoid overfitting.
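As a rough sketch of the modeling step, the snippet below builds an `XGBClassifier` with the five hyperparameters listed above and scores it with stratified k-fold cross-validation from scikit-learn. All values in `ILLUSTRATIVE_PARAMS`, along with `n_estimators`, the fold count, and the helper names, are placeholders chosen for illustration; the tuned values from the notebook are not reproduced here.

```python
# Placeholder values only; not the tuned values from the notebook.
ILLUSTRATIVE_PARAMS = {
    "learning_rate": 0.05,
    "max_depth": 4,
    "min_child_weight": 3,
    "gamma": 0.1,
    "subsample": 0.8,
}


def build_model(params=None, random_state=42):
    """Create an XGBClassifier from a hyperparameter dict."""
    # Imported lazily so the parameter dict can be inspected without xgboost installed.
    from xgboost import XGBClassifier

    return XGBClassifier(
        **(params or ILLUSTRATIVE_PARAMS),
        n_estimators=300,
        eval_metric="logloss",
        random_state=random_state,
    )


def cv_accuracy(X, y, params=None, n_splits=5):
    """Mean accuracy over stratified k-fold cross-validation."""
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = cross_val_score(build_model(params), X, y, cv=cv, scoring="accuracy")
    return scores.mean()
```

Calling `cv_accuracy(X_train, y_train)` on a feature matrix and label vector (hypothetical names) returns a single validation estimate that is less sensitive to one lucky split than a single hold-out set.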
Step 4: Ensembling & Final Predictions
- Tested multiple hyperparameter configurations and selected the best-performing model based on validation accuracy.
- Generated final predictions for Kaggle submission, achieving our best leaderboard rankings: 2nd place (public) and 3rd place (private).
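Selecting the best configuration by validation accuracy can be sketched as a plain grid search. The helper name `select_best_config`, the `evaluate` callback, and the grid values are hypothetical; in practice `evaluate` would train a model with the given parameters and return its validation accuracy.

```python
from itertools import product


def select_best_config(grid, evaluate):
    """Try every combination in `grid` and return (best_params, best_score).

    `grid` maps a hyperparameter name to a list of candidate values.
    `evaluate` takes a params dict and returns a score where higher is better.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid, for illustration only.
example_grid = {
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
}
```

An exhaustive grid grows multiplicatively with each added hyperparameter, so a small grid or a random subset of combinations is usually the practical choice.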
🏆 Kaggle Competition Results
- Final Ranking: 3rd Place (Private Leaderboard)
- Best Public Score: 0.63070
- Best Private Score: 0.63324 🌟

🖼️ Project Presentation
For a full overview of this project in presentation format, please see:
📃 CCAC 2025 – Nimbus 2025 Team Presentation (PDF)
This presentation was created by Nimbus 2025, a team led by Hung-Chen Hsu, with teammates Da Fang Lin and Yiran Liu.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🤝 Citation
If you find this repository helpful in your research, teaching, or other work, please consider citing or linking back to the repository:
Hung-Chen Hsu. NCAA Bracket Prediction for CCAC 2025 using XGBoost. GitHub, 2025. Repository: https://github.com/hungchenhsu/ccac-2025-ncaa-bracket-xgboost
This helps acknowledge the original work and supports open sharing in the ML community 🙌
Created with 💻 and 🎯 by Hung-Chen Hsu
Owner
- Name: Hung-Chen Hsu
- Login: hungchenhsu
- Kind: user
- Repositories: 1
- Profile: https://github.com/hungchenhsu
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this work, please cite it as below:"
authors:
  - family-names: Hsu
    given-names: Hung-Chen
title: "NCAA Basketball Bracket Prediction (CCAC 2025) using XGBoost"
version: "1.0"
date-released: 2025-03-22
url: "https://github.com/hungchenhsu/ccac-2025-ncaa-bracket-xgboost"
repository-code: "https://github.com/hungchenhsu/ccac-2025-ncaa-bracket-xgboost"
license: "MIT"
keywords:
- NCAA
- Bracket Prediction
- XGBoost
- Kaggle
- Machine Learning
GitHub Events
Total
- Push event: 35
- Create event: 2
Last Year
- Push event: 35
- Create event: 2