credit-card-prediction
This repository focuses on building a predictive model to assess the likelihood of credit card defaults. The project includes data analysis, feature engineering, and machine learning to provide accurate default predictions.
https://github.com/mindful-ai-assistants/credit-card-prediction
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file: found
- codemeta.json file: found
- .zenodo.json file: found
- DOI references
- Academic publication links
- Committers with academic emails
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: low similarity (10.8%) to scientific vocabulary
Keywords
Keywords from Contributors
Repository
This repository focuses on building a predictive model to assess the likelihood of credit card defaults. The project includes data analysis, feature engineering, and machine learning to provide accurate default predictions.
Basic Info
- Host: GitHub
- Owner: Mindful-AI-Assistants
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://github.com/Mindful-AI-Assistants/credit-card-prediction
- Size: 6.47 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 3
- Releases: 0
Topics
Metadata Files
README.md
<!--
https://github.com/Mindful-AI-Assistants/credit-card-prediction
Credit Card Defaults Prediction
-->
Credit Card Defaults Prediction
This project predicts credit card default risk using data analysis and machine learning.
This repository aims to develop a predictive model for assessing credit card default risks, encompassing data analysis, feature engineering, and machine learning for accurate predictions.
Credit card default prediction involves using analytical approaches, such as data analysis techniques and statistical methods, to forecast the likelihood of an individual failing to repay their outstanding debt. This process typically includes:
- Incorporating Alternative Variables: Adding geographical, behavioral, and consumption data to traditional factors like income, assets, and payment history enhances customer profiling.
- Individual Credit Scoring: Evaluating credit scores on an individual basis for improved risk assessment.
- Behavioral Profile Analysis: Assessing customer behavior to forecast potential defaults.
This strategy enables financial institutions to refine their credit granting processes and manage risk more efficiently.
Table of Contents
- Executive Summary
- Introduction
- Theoretical Framework
- Dataset Description
- Exploratory Data Analysis
- Methodology
- Results
- Generated Graphs
- Conclusion
- How to Run
- Data Analysis Report
- Contribute
- GitHub Repository
- Contact
- References
- License
Executive Summary
This project aims to predict credit card defaults using a Logistic Regression model. Our primary focus is identifying significant factors such as payment history, educational level, and customer age, which influence the likelihood of default. The results of this project will help financial institutions make better decisions regarding risk management.
Introduction
Predicting credit card defaults is crucial for financial institutions. It allows them to better manage risks and prevent financial losses by identifying customers who are likely to default on their payments. This study uses a dataset of credit card customers and applies Logistic Regression, a common technique for binary classification, to predict default risk.
Theoretical Framework
- Definition of Default
Default occurs when a customer fails to meet their financial obligations within the specified timeframe. For financial institutions, this represents a significant risk, as recovering the money owed can be difficult and costly.
Credit Analysis and Predictive Modeling
Credit analysis evaluates a customer's financial profile, while predictive modeling anticipates future behavior, such as the likelihood of default, based on historical data.
Logistic Regression
Logistic Regression is a statistical method used for binary classification problems (such as default or non-default). It calculates the probability of a customer defaulting based on specific features.
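To make this concrete, here is a minimal sketch (illustrative only; the feature names and coefficient values are hypothetical, not taken from this project's fitted model) of how a logistic model turns a linear score into a default probability:

```python
# Hypothetical illustration: logistic regression maps a linear score to a probability.
import numpy as np

def default_probability(x, coefficients, intercept):
    """Return P(default = 1) for a feature vector x."""
    z = intercept + np.dot(coefficients, x)  # linear combination of features
    return 1.0 / (1.0 + np.exp(-z))          # sigmoid squashes the score into (0, 1)

# Example with two made-up features (e.g., scaled payment delay, scaled credit limit)
print(default_probability(np.array([1.2, -0.5]), np.array([0.8, -0.6]), -1.0))
```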
Dataset Description
Click here to get the dataset
The dataset contains information on credit card customers, with variables such as:
- LIMIT_BAL: Total credit amount granted.
- EDUCATION: Education level of customers.
- MARRIAGE: Marital status (married, single, others).
- AGE: Age of the customer.
- PAY_0 to PAY_6: Payment status of the previous months.
- BILL_AMT1 to BILL_AMT6: Credit card bill amounts for the past six months.
- default payment next month: Indicator of default in the following month (1 = yes, 0 = no).
Exploratory Data Analysis
Several visualizations were created to identify patterns in the data. Key insights include:
- Education: Lower education levels are associated with higher default rates.
- Marital Status: Single customers tend to default more than married customers.
- Age: Younger customers have higher default rates.
- Credit Limit: Lower credit limits are linked to higher default rates.
- Payment History: Customers with a history of delayed payments are more likely to default.
The following Python code loads, processes, and analyzes the data.
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score
```
Load Dataset
```python
path = r'/path/to/dataset.xls'
defaults = pd.read_excel(path, engine="xlrd")
```
Preprocess Dataset
```python
# Use the first row of the sheet as column names, then drop it from the data
defaults.columns = [col for col in defaults.iloc[0, :]]
defaults.drop(columns=["ID", "SEX"], inplace=True)  # remove identifier and sensitive column
defaults.drop(index=0, inplace=True)
defaults.index = list(range(30000))  # reset the index to 0..29999
```
Adjust variables for consistency
```python
# Collapse the undocumented EDUCATION codes 0 and 6 into category 5 ("unknown")
defaults["EDUCATION"] = defaults["EDUCATION"].apply(lambda x: 5 if x == 6 or x == 0 else x)
# Map the undocumented MARRIAGE code 0 to category 3 ("others")
defaults["MARRIAGE"] = defaults["MARRIAGE"].apply(lambda x: 3 if x == 0 else x)
```
Methodology
Data Preparation
- Cleaning and Preparation: We removed irrelevant columns (e.g., ID) and sensitive variables (e.g., SEX).
- Transformations: Adjustments were made to ensure data consistency in EDUCATION and MARRIAGE.
Model Development
The data was split into training (80%) and testing (20%) sets. The model was trained using Logistic Regression, which is ideal for binary classification tasks such as predicting defaults.
```python
# Separate features (X) and target (y); the train/test split itself is done below
X = defaults.drop(columns=["MARRIAGE", "default payment next month"], axis=1)
y = defaults["default payment next month"]
```
Standardize features
```python
scaler = StandardScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
```
Train-test split
```python
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
```
Train logistic regression model
```python
model = LogisticRegression(random_state=42, max_iter=2000)
model.fit(X_train, y_train)
```
Model Evaluation
We evaluated the modelβs performance using several metrics, including accuracy, a confusion matrix, and a classification report.
```python
# Model evaluation
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
```
Accuracy
```python
train_accuracy = round(accuracy_score(y_train, y_train_pred) * 100, 2)
test_accuracy = round(accuracy_score(y_test, y_test_pred) * 100, 2)
```
Confusion Matrix
```python
matrix = confusion_matrix(y_test, y_test_pred)
sns.heatmap(matrix, annot=True, fmt='d', cmap='viridis')
```
Display evaluation metrics
```python
print(f"Training Accuracy: {train_accuracy}%")
print(f"Test Accuracy: {test_accuracy}%")
print(classification_report(y_test, y_test_pred))
```
Results
The Logistic Regression model achieved an accuracy of approximately 80%. The confusion matrix and classification report demonstrated that the model was able to differentiate between defaulters and non-defaulters with reasonable efficiency.
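Accuracy alone can overstate performance on an imbalanced target, so a complementary check is the ROC AUC; the import is already included in the first code block above. A minimal sketch, assuming the fitted `model` and the test split created earlier:

```python
# Sketch: evaluate the model with ROC AUC using predicted probabilities.
# Assumes `model`, `X_test`, and `y_test` exist from the training steps above.
y_test_proba = model.predict_proba(X_test)[:, 1]  # probability of the "default" class
print("Test ROC AUC:", round(roc_auc_score(y_test.astype(int), y_test_proba), 3))
```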
Generated Graphs
Here are the key visualizations, their corresponding code, and descriptions:
1. Default Distribution by Educational Level
```python
# Plotting default rate by education level
fig, ax = plt.subplots(figsize=(10, 6))
sns.countplot(data=defaults, x="EDUCATION", hue="default payment next month", palette="viridis", ax=ax)
ax.set_title("Default Distribution by Educational Level")
ax.set_xlabel("Education Level")
ax.set_ylabel("Count")
ax.set_xticklabels(["Graduate School", "University", "High School", "Others", "Unknown"])
plt.show()
```
Description: This graph shows the distribution of defaults across different education levels, indicating that individuals with lower education levels tend to have a higher likelihood of defaulting on their payments.
2. Proportion of Defaulters and Non-Defaulters by Education
```python
# Proportions of default vs non-default by education level using a heatmap
aux = defaults.copy()
aux_education = aux.groupby("EDUCATION")["default payment next month"].value_counts(normalize=True).unstack()
plt.figure(figsize=(8, 6))
sns.heatmap(aux_education, annot=True, cmap="viridis", fmt=".2f")
plt.title("Proportion of Defaulters and Non-Defaulters by Education")
plt.show()
```
Description: This heatmap illustrates the proportions of defaulters and non-defaulters based on education levels. It reveals that higher education correlates with a lower probability of default.
3. Default Distribution by Marital Status
```python
# Plotting default rate by marital status
plt.figure(figsize=(10, 6))
sns.countplot(data=defaults, x="MARRIAGE", hue="default payment next month", palette="viridis")
plt.title("Default Distribution by Marital Status")
plt.xticks(ticks=[0, 1, 2], labels=["Married", "Single", "Other"])
plt.show()
```
Description: This chart displays the distribution of defaults across different marital statuses, indicating that single individuals have a higher tendency to default compared to married individuals.
4. Proportion of Defaulters by Marital Status
```python
# Proportions of default vs non-default by marital status using a heatmap
aux_marriage = aux.groupby("MARRIAGE")["default payment next month"].value_counts(normalize=True).unstack()
plt.figure(figsize=(8, 6))
sns.heatmap(aux_marriage, annot=True, cmap="viridis", fmt=".2f")
plt.title("Proportion of Defaulters by Marital Status")
plt.show()
```
Description: The heatmap displays the proportions of defaulters and non-defaulters categorized by marital status, showing minimal variation among the different groups.
5. Default and Non-Default Rates by Age
```python
# Plotting default rate by age
plt.figure(figsize=(17, 9))
sns.countplot(data=defaults, x="AGE", hue="default payment next month", palette="viridis")
plt.title("Default and Non-Default Rates by Age")
plt.xlabel("Age")
plt.ylabel("Count")
plt.show()
```
Description: This graph indicates that the number of non-defaulters decreases more significantly with age, suggesting that older customers are less likely to default.
6. Default and Non-Default Rates by Credit Limit
```python
# Plotting default rate by credit limit quantiles
aux['LIMIT_BAL_quantile'] = pd.qcut(defaults['LIMIT_BAL'], q=4,
                                    labels=["Up to 50,000", "50,000 to 140,000", "140,000 to 240,000", "Above 240,000"])
plt.figure(figsize=(15, 8))
sns.countplot(data=aux, x="LIMIT_BAL_quantile", hue="default payment next month", palette="viridis")
plt.title("Default and Non-Default Rates by Credit Limit")
plt.show()
```
Description: This graph reveals a clear trend: as the credit limit increases, the probability of default decreases. Customers with lower credit limits are at a higher risk of defaulting.
7. Payment Status vs Default
```python
# Heatmap of payment status vs default
# Note: assumes a helper function `proportion` (defined elsewhere in the notebook)
# that returns the normalized cross-tabulation of payment status and default.
fig, axes = plt.subplots(2, 3, figsize=(20, 12))
months = ["April", "May", "June", "July", "August", "September"]
for i, ax in enumerate(axes.flat):
    sns.heatmap(data=proportion(defaults[[f"PAY_{i}", "default payment next month"]]),
                annot=True, cmap="viridis", fmt=".2f", ax=ax)
    ax.set_title(f"Payment Status in {months[i]}")
plt.show()
```
Description: This heatmap demonstrates that the default rate is consistently higher starting from the second month of payment delays, indicating a significant correlation between delayed payments and defaults.
8. Bill Amount Impact on Default
```python
# Plotting default rate by bill amount quantiles
fig, axis = plt.subplots(6, 1, figsize=(25, 45))
months = ["April", "May", "June", "July", "August", "September"]
for i, ax in enumerate(axis.flat):
    aux[f"BILL_AMT{i + 1}_quantiles"] = pd.qcut(defaults[f"BILL_AMT{i + 1}"], q=9)
    sns.countplot(data=aux, x=f"BILL_AMT{i + 1}_quantiles", hue="default payment next month", palette="viridis", ax=ax)
    ax.set_title(f"Bill Amount in {months[i]}")
plt.show()
```
Description: The charts illustrate that the differences in bill amounts between defaulters and non-defaulters are relatively subtle, suggesting these variables may not significantly affect the prediction of defaults on their own.
9. Previous Payments Impact on Default
```python
# Plotting default rate by previous payments quantiles
fig, axis = plt.subplots(6, 1, figsize=(15, 40))
for i, ax in enumerate(axis.flat):
    aux[f"PAY_AMT{i + 1}_quantiles"] = pd.qcut(defaults[f"PAY_AMT{i + 1}"], q=4)
    sns.countplot(data=aux, x=f"PAY_AMT{i + 1}_quantiles", hue="default payment next month", palette="viridis", ax=ax)
    ax.set_title(f"Previous Payments in {months[i]}")
plt.show()
```
Description: These graphs reveal a consistent trend: higher previous payment amounts are associated with lower default rates, indicating that prior payment behavior can be a strong predictor of future defaults.
10. Confusion Matrix
```python
# Plotting confusion matrix for model evaluation
from sklearn.metrics import confusion_matrix
plt.figure(figsize=(8, 6))
matrix = confusion_matrix(y_test, y_test_pred)
sns.heatmap(matrix, annot=True, fmt='d', cmap='viridis',
            xticklabels=['Non-Defaulter', 'Defaulter'],
            yticklabels=['Non-Defaulter', 'Defaulter'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
```
Description: The confusion matrix summarizes how many defaulters and non-defaulters the model classified correctly and incorrectly, making visible the trade-off between missed defaulters (false negatives) and false alarms (false positives).
Conclusion
In this project, we successfully built a Logistic Regression model to predict credit card defaults, achieving an accuracy of around 80%. The exploratory data analysis highlighted significant predictors of default, including education level, marital status, age, and payment history. The findings underscore the importance of these factors in assessing credit risk.
Future work could explore more complex modeling techniques, such as decision trees or ensemble methods, to enhance predictive accuracy. Additionally, incorporating more diverse datasets could improve the robustness of the model.
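As one possible starting point for that future work, here is a hedged sketch of how a tree-based ensemble could be compared against the logistic baseline on the same split (assuming `X_train`, `X_test`, `y_train`, and `y_test` from the steps above; the hyperparameters are illustrative):

```python
# Sketch: compare a random forest against the logistic regression baseline.
# Assumes the same train/test split created earlier in this README.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rf = RandomForestClassifier(n_estimators=200, random_state=42)  # illustrative settings
rf.fit(X_train, y_train)
rf_accuracy = round(accuracy_score(y_test, rf.predict(X_test)) * 100, 2)
print(f"Random Forest Test Accuracy: {rf_accuracy}%")
```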
How to Run
To run the project locally, follow these steps:
```bash
git clone https://github.com/your-username/credit-card-default-prediction.git
cd credit-card-default-prediction
```
It's recommended to use a virtual environment. You can create one and install the required packages as follows:
```bash
python -m venv venv
source venv/bin/activate  # On Windows use venv\Scripts\activate
pip install -r requirements.txt  # adjust the file name if the requirements file differs
```
Execute the analysis script to load the data and generate the graphs:
```bash
python analysis.py
```
Open the generated graphs and review the model evaluation metrics displayed in the console.
Data Analysis Report
You can find the detailed analysis of this project in the Data Analysis Report here. This report provides comprehensive insights into the features, methodology, and model evaluation.
References
Contribute
We welcome contributions to improve this project. If you'd like to contribute, follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes and commit them (git commit -m 'Add some feature').
- Push to the branch (git push origin feature-branch).
- Open a pull request, explaining the changes you made.
GitHub Repository
You can explore the project repository, access the code, and contribute on GitHub: GitHub Repository Link
Contact
For any questions or suggestions, please feel free to reach out:
Fabiana Campanari - Contacts Hub
Pedro Vyctor - LinkedIn
Contributors
Contact Me
For any questions or suggestions, please feel free to reach out:
My Contacts Hub
Back to Top
Copyright 2024 Mindful-AI-Assistants. Code released under the MIT license.
Owner
- Name: Mindful AI
- Login: Mindful-AI-Assistants
- Kind: organization
- Email: fabicampanari@proton.me
- Location: Brazil
- Website: https://github.com/Mindful-AI-Assistants
- Repositories: 4
- Profile: https://github.com/Mindful-AI-Assistants
Empowering businesses with AI-driven technologies like Copilots, Agents, Bots, and Predictions, alongside intelligent Decision-Making Support.
Citation (CITATION.cff)
cff-version: 1.2.0
title: MindfulAI - credit card prediction
message: If you really want to cite this repository, here's how you should cite it.
type: software
authors:
  - given-names: credit-card-prediction
repository-code: https://github.com/Mindful-AI-Assistants/credit-card-prediction
license: MIT License
GitHub Events
Total
- Issues event: 23
- Watch event: 1
- Delete event: 160
- Issue comment event: 5
- Push event: 171
- Pull request event: 331
- Fork event: 1
- Create event: 168
Last Year
- Issues event: 23
- Watch event: 1
- Delete event: 160
- Issue comment event: 5
- Push event: 171
- Pull request event: 331
- Fork event: 1
- Create event: 168
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Fabiana Campanari | f****i@g****m | 132 |
| Fabiana Campanari | 1****i | 44 |
| dependabot[bot] | 4****] | 4 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 14
- Total pull requests: 250
- Average time to close issues: 2 days
- Average time to close pull requests: 1 day
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.03
- Merged pull requests: 227
- Bot issues: 1
- Bot pull requests: 78
Past Year
- Issues: 14
- Pull requests: 250
- Average time to close issues: 2 days
- Average time to close pull requests: 1 day
- Issue authors: 2
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.03
- Merged pull requests: 227
- Bot issues: 1
- Bot pull requests: 78
Top Authors
Issue Authors
- FabianaCampanari (21)
Pull Request Authors
- FabianaCampanari (428)
- dependabot[bot] (81)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- @primer/css 21.1.1
- beautifulsoup4 ==4.12.3
- ipywidgets ==8.1.5
- joblib ==1.4.2
- jupyter ==1.1.1
- keras ==3.6.0
- matplotlib ==3.9.2
- notebook ==7.2.2
- numpy ==2.1.2
- pandas ==2.2.3
- psycopg2-binary ==2.9.9
- requests ==2.32.3
- scikit-learn ==1.5.2
- scipy ==1.14.1
- seaborn ==0.13.2
- sqlalchemy ==2.0.36
- streamlit ==1.39.0
- tensorflow ==2.17.0
- Lista *
- dependencia1 *
- dependencia2 *