credit-risk-prediction
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: PrashantJha183
- License: other
- Language: Jupyter Notebook
- Default Branch: main
- Size: 63.5 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Credit Risk Prediction using Machine Learning
This repository contains Python scripts, Jupyter notebooks, and data for a comprehensive machine learning study on credit risk prediction. The work evaluates various ML models on the UCI Credit Card Default dataset and the South German Credit dataset. It includes data preprocessing, feature analysis, class balancing with SMOTE, model training, evaluation metrics, and visualization of results.
🔗 Project Overview
- Analyze credit risk using modern ML models
- Compare performance of Logistic Regression, Random Forest, XGBoost, and SVM
- Address class imbalance via SMOTE
- Visualize:
- Feature importances
- Class distributions
- ROC curves
- Benchmark model training times
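As background for the class-balancing step: SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The block below is a minimal standard-library sketch of that idea only, not the repository's code (the project lists imblearn as a dependency for this purpose). The function name `smote_sketch`, its parameters, and the toy points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Assumes `minority` holds at least two points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two features per sample (illustrative values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points), "synthetic samples created")
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation data remains untouched.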
This work contributes to improving credit risk scoring and aligns with regulatory and ethical considerations for AI in finance.
📁 Project Structure
Project folders and key files:
- data/ → Contains raw datasets and preprocessed numpy files
- evaluation/ → Stores generated plots, performance metrics, and CSV result files
- models/ → Saved trained models and feature names
- notebooks/ → Jupyter notebooks for data exploration, preprocessing, and analysis
- src/ → Python scripts for data processing, training, evaluation, plotting, and benchmarking
📦 Installation
- Clone the repository
bash
git clone https://github.com/PrashantJha183/Credit-risk-prediction.git
cd Credit-risk-prediction
- Create a virtual environment (recommended)
bash
python -m venv venv
source venv/bin/activate # On Linux/macOS
venv\Scripts\activate # On Windows
- Install dependencies
bash
pip install -r requirements.txt
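After installing, a quick check that the main packages can be located may save a failed run later. This is a small sketch using only the standard library; the list of import names is an assumption derived from the dependency list (scikit-learn is imported as `sklearn`), and `check_modules` is a hypothetical helper, not part of the repository.

```python
from importlib.util import find_spec

# Import names assumed for the packages in requirements.txt
# (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = [
    "imblearn",
    "matplotlib",
    "numpy",
    "pandas",
    "sklearn",
    "seaborn",
    "xgboost",
]


def check_modules(names):
    """Return {module_name: bool} telling whether each module can be located."""
    return {name: find_spec(name) is not None for name in names}


status = check_modules(REQUIRED_MODULES)
for name, available in status.items():
    print(f"{name:<12} {'ok' if available else 'MISSING'}")
```

The check only locates each module without importing it, so it runs quickly and does not fail when a package is absent.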
🚀 How to Run
Train and evaluate a model
- Preprocess data
bash
python src/preprocess.py
- Train Models
bash
python src/train_model.py
- Evaluate All Models
bash
python src/evaluate_all_models_uci.py
python src/evaluate_all_models_german.py
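The README does not list which metrics the evaluation scripts report, so the following is only an illustrative sketch of metrics commonly used on imbalanced credit data, where accuracy alone can look good even if most defaults are missed. The helper `binary_metrics` and the toy labels are hypothetical; label 1 is assumed to mean "default".

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct non-defaults

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy predictions (illustrative values, not project results).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall on the default class is usually the figure to watch in credit scoring, since a missed default tends to cost more than a false alarm.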
📊 Generating Plots
- Class distributions before and after SMOTE
- Feature importance visualizations
- ROC curves for model comparisons
- Performance summary plots
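To make clear what a ROC curve plot contains, here is a standard-library sketch of how the curve's points and the area under it can be derived from predicted scores. It is not the repository's plotting code; `roc_points`, `auc_trapezoid`, and the toy scores are hypothetical, and in practice a library routine such as scikit-learn's `roc_curve` would normally be used.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs obtained by sweeping the decision threshold.

    Assumes 0/1 labels with at least one sample of each class.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit samples from the highest score down; each step lowers the threshold.
    ordered = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    for i, (score, label) in enumerate(ordered):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples tied at this score are counted.
        if i + 1 == len(ordered) or ordered[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy scores (illustrative values, not project results).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, scores)
auc = auc_trapezoid(curve)
print(curve)
print(f"AUC = {auc:.2f}")
```

Plotting the false positive rate against the true positive rate for each model on the same axes gives the comparison view described in this section.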
📄 Results
Tables and plots summarizing model performance are saved in the evaluation/ directory:
- ROC curves for UCI and German datasets
- Feature importance plots
- Class balance before and after SMOTE
- Training time comparisons
- CSV files with detailed results
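As an illustration of how a training-time comparison and its CSV output could be produced, the sketch below times a stand-in function and writes the results as CSV text. Everything here is an assumption for demonstration: `time_call`, `fake_fit`, the model names, and the column names are hypothetical and do not reflect the actual files in evaluation/.

```python
import csv
import io
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_fit(n):
    """Stand-in for a model's training routine (hypothetical)."""
    return sum(i * i for i in range(n))


rows = []
for name, size in [("small_model", 10_000), ("large_model", 100_000)]:
    _, seconds = time_call(fake_fit, size)
    rows.append({"model": name, "train_seconds": round(seconds, 6)})

# Write to an in-memory buffer; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["model", "train_seconds"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations. A single run is noisy, so repeating each measurement and reporting the median gives a fairer comparison.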
📝 Citation
If you use this code or results, please cite the repository:
Prashant Jha, Credit Risk Prediction using Machine Learning, GitHub Repository, https://github.com/PrashantJha183/Credit-risk-prediction
📊 Data Sources
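Data used in this project comes from two publicly available benchmark datasets, the UCI Credit Card Default dataset and the South German Credit dataset. Both are listed with their sources under the references in CITATION.cff.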
🔗 Related Work
See references in the paper for further reading on credit scoring and machine learning.
Owner
- Name: Prashant Jha
- Login: PrashantJha183
- Kind: user
- Website: www.github.PrashantJha183
- Twitter: PrashantJha183
- Repositories: 1
- Profile: https://github.com/PrashantJha183
My name is Prashant Jha. I am pursuing an Integrated Bachelor of Computer Application and Master of Computer Application (IMCA) at Parul University.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software or datasets, please cite as follows."

# Author of repository
authors:
  - family-names: "Jha"
    given-names: "Prashant"
    orcid: "https://orcid.org/0009-0008-6830-0647"

title: "Credit Risk Prediction using Machine Learning"
version: "1.0.0"
date-released: 2025-07-12
url: "https://github.com/PrashantJha183/Credit-risk-prediction"

# The preferred citation
preferred-citation:
  type: article
  authors:
    - family-names: "Jha"
      given-names: "Prashant"
      orcid: "https://orcid.org/0009-0008-6830-0647"
  title: "Credit Risk Prediction Using Machine Learning: A Comparative Study on Benchmark Datasets"
  journal: ""
  volume:
  issue:
  year:
  month:
  start:
  end:
  doi: ""

# Additional references (datasets used)
references:
  - type: dataset
    title: "Default of Credit Card Clients Dataset"
    authors:
      - family-names: Yeh
        given-names: I-Cheng
      - family-names: Lien
        given-names: Che-hui
    year: 2009
    publisher: UCI Machine Learning Repository
    url: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
  - type: dataset
    title: "South German Credit Dataset"
    authors:
      - family-names: Hofmann
        given-names: Peter
    year: 2022
    publisher: UCI Machine Learning Repository
    url: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
GitHub Events
Total
- Push event: 8
- Create event: 2
Last Year
- Push event: 8
- Create event: 2
Dependencies
- imblearn *
- jupyter *
- matplotlib *
- numpy *
- pandas *
- scikit-learn *
- seaborn *
- xgboost *