Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: PrashantJha183
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 63.5 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 8 months ago · Last pushed 8 months ago

README.md

Credit Risk Prediction using Machine Learning

This repository contains Python scripts, Jupyter notebooks, and data for a comprehensive machine learning study on credit risk prediction. The work evaluates various ML models on the UCI Credit Card Default dataset and the South German Credit dataset. It includes data preprocessing, feature analysis, class balancing with SMOTE, model training, evaluation metrics, and visualization of results.


🔗 Project Overview

  • Analyze credit risk using modern ML models
  • Compare performance of Logistic Regression, Random Forest, XGBoost, and SVM
  • Address class imbalance via SMOTE
  • Visualize:
    • Feature importances
    • Class distributions
    • ROC curves
  • Benchmark model training times

This work contributes to improving credit risk scoring and aligns with regulatory and ethical considerations for AI in finance.
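
To make the class-balancing step above concrete, here is a minimal pure-Python sketch of the idea behind SMOTE: each synthetic minority sample is placed at a random position on the line segment between a minority point and one of its nearest minority neighbours. The function name `smote_oversample` and the toy data are illustrative assumptions, not code from this repository, which lists the `imblearn` package in its requirements for this purpose.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (hypothetical): 4 minority samples to be balanced against 10 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_count = 10
new_points = smote_oversample(minority, majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority points, it always stays inside the range spanned by the original minority samples. Oversampling is normally applied to the training split only, so that evaluation data contains no synthetic samples.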


📁 Project Structure

Project folders and key files:

  • data/ → Raw datasets and preprocessed NumPy files
  • evaluation/ → Stores generated plots, performance metrics, and CSV result files
  • models/ → Saved trained models and feature names
  • notebooks/ → Jupyter notebooks for data exploration, preprocessing, and analysis
  • src/ → Python scripts for data processing, training, evaluation, plotting, and benchmarking

📦 Installation

  1. Clone the repository

    git clone https://github.com/PrashantJha183/Credit-risk-prediction.git
    cd Credit-risk-prediction

  2. Create a virtual environment (recommended)

    python -m venv venv
    source venv/bin/activate   # On Linux/macOS
    venv\Scripts\activate      # On Windows

  3. Install dependencies

    pip install -r requirements.txt
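
After installation, a quick check that the libraries can be imported may save a failed run later. The sketch below is an assumption-laden helper rather than part of the repository: `missing_modules` is a hypothetical name, and the list uses import names, so scikit-learn appears as `sklearn`. The `jupyter` entry from requirements.txt is left out because it is mainly a command-line tool.

```python
from importlib.util import find_spec

# Import names for the libraries in requirements.txt.
# Note: the scikit-learn package is imported as "sklearn".
REQUIRED_MODULES = [
    "imblearn",
    "matplotlib",
    "numpy",
    "pandas",
    "sklearn",
    "seaborn",
    "xgboost",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated virtual environment; anything reported as missing can be installed again with pip.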


🚀 How to Run

Train and evaluate a model

  1. Preprocess data

    python src/preprocess.py

  2. Train models

    python src/train_model.py

  3. Evaluate all models

    python src/evaluate_all_models_uci.py
    python src/evaluate_all_models_german.py
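
The steps above can also be chained from one small driver script. This is a sketch under the assumption that the scripts take no command-line arguments and are run from the repository root; `build_commands` and `run_pipeline` are hypothetical helpers, not files in `src/`.

```python
import subprocess
import sys

# Script paths as listed in the steps above, in execution order.
PIPELINE_SCRIPTS = [
    "src/preprocess.py",
    "src/train_model.py",
    "src/evaluate_all_models_uci.py",
    "src/evaluate_all_models_german.py",
]


def build_commands(python=sys.executable):
    """Return one command (argument list) per pipeline step."""
    return [[python, script] for script in PIPELINE_SCRIPTS]


def run_pipeline(dry_run=True):
    """Print each step in order; execute it unless dry_run is set."""
    for command in build_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failure
            subprocess.run(command, check=True)


run_pipeline(dry_run=True)  # use dry_run=False to actually execute the scripts
```

Using `sys.executable` keeps every step on the interpreter of the active virtual environment.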


📊 Generating Plots

  • Class distributions before and after SMOTE

  • Feature importance visualizations

  • ROC curves for model comparisons

  • Performance summary plots
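
For readers who want to see what an ROC curve plot is built from, the following pure-Python sketch computes the curve's points by sweeping the decision threshold over the predicted scores, then integrates them with the trapezoidal rule to get the AUC. The function names and the four-sample data are illustrative assumptions; the repository's own plotting code may well rely on scikit-learn and matplotlib instead.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    # Highest score first: lowering the threshold admits samples in this order.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy labels (1 = default) and predicted default probabilities (hypothetical).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

Plotting the first element of each pair on the x-axis against the second on the y-axis gives the ROC curve; one curve per model on shared axes gives a model comparison.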


📄 Results

Tables and plots summarizing model performance are saved in the evaluation/ directory:

  • ROC curves for UCI and German datasets
  • Feature importance plots
  • Class balance before and after SMOTE
  • Training time comparisons
  • CSV files with detailed results
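
A training-time comparison like the one listed above can be produced by timing each fit call and writing one row per model to CSV. The sketch below uses only the standard library; the column names, helper names, and placeholder workloads are assumptions for illustration and do not reflect the actual files in evaluation/.

```python
import csv
import io
import time


def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def benchmark(trainers):
    """Time each named training callable and return rows for a results table."""
    rows = []
    for name, train in trainers.items():
        _, seconds = time_call(train)
        rows.append({"model": name, "train_seconds": round(seconds, 4)})
    return rows


def rows_to_csv(rows):
    """Serialise benchmark rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["model", "train_seconds"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder workloads standing in for real model fits (hypothetical).
trainers = {
    "fast_model": lambda: sum(range(10_000)),
    "slow_model": lambda: sum(range(200_000)),
}
rows = benchmark(trainers)
print(rows_to_csv(rows))
```

In practice each lambda would be replaced by a call such as a model's `fit` on the training data, and timings are more stable when averaged over several runs.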

📝 Citation

If you use this code or results, please cite the repository:

Prashant Jha, Credit Risk Prediction using Machine Learning, GitHub Repository, https://github.com/PrashantJha183/Credit-risk-prediction


📊 Data Sources

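Data used in this project comes from publicly available benchmark datasets:

  • Default of Credit Card Clients dataset (UCI Machine Learning Repository)
  • South German Credit dataset (UCI Machine Learning Repository)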


🔗 Related Work

See references in the paper for further reading on credit scoring and machine learning.


Owner

  • Name: Prashant Jha
  • Login: PrashantJha183
  • Kind: user

My name is Prashant Jha. I am pursuing an Integrated Bachelor of Computer Applications + Master of Computer Applications (IMCA) at Parul University.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software or datasets, please cite as follows."

# Author of repository
authors:
  - family-names: "Jha"
    given-names: "Prashant"
    orcid: "https://orcid.org/0009-0008-6830-0647"


title: "Credit Risk Prediction using Machine Learning"
version: "1.0.0"
date-released: 2025-07-12
url: "https://github.com/PrashantJha183/Credit-risk-prediction"

# The preferred citation 
preferred-citation:
  type: article
  authors:
    - family-names: "Jha"
      given-names: "Prashant"
      orcid: "https://orcid.org/0009-0008-6830-0647"
  title: "Credit Risk Prediction Using Machine Learning: A Comparative Study on Benchmark Datasets"
  journal: ""   
  volume: 
  issue: 
  year: 
  month:
  start:
  end: 
  doi: "" 

# Additional references (datasets used)
references:
  - type: dataset
    title: "Default of Credit Card Clients Dataset"
    authors:
      - family-names: Yeh
        given-names: I-Cheng
      - family-names: Lien
        given-names: Che-hui
    year: 2009
    publisher: UCI Machine Learning Repository
    url: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

  - type: dataset
    title: "South German Credit Dataset"
    authors:
      - family-names: Hofmann
        given-names: Peter
    year: 2022
    publisher: UCI Machine Learning Repository
    url: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)

GitHub Events

Total
  • Push event: 8
  • Create event: 2
Last Year
  • Push event: 8
  • Create event: 2

Dependencies

requirements.txt pypi
  • imblearn *
  • jupyter *
  • matplotlib *
  • numpy *
  • pandas *
  • scikit-learn *
  • seaborn *
  • xgboost *