https://github.com/5uperpalo/churnpred

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: 5uperpalo
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 15.8 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme

README.MD

Customer churn predictor built for the assignment described below.

Documentation: https://5uperpalo.github.io/churnpred/

```
In this assignment you're tasked with developing a machine learning solution for churn prediction to identify which customers are likely to leave a service (column "Exited" in the attached dataset). This assignment is meant to assess

  • analytical skills and reasoning
  • design and modelling choices, e.g. choices with respect to measuring model performance
  • coding skills, e.g. modularity, readability, reproducibility, any other best practices in software development

Please note that multiple solutions may exist and we do not expect a production ready solution, though any reflections on how you may wish to productionalise your solution are welcome. You are free to choose the medium (e.g., notebooks, python scripts).

Additional explanation of independent variables:

NumberOfProducts - the number of accounts and bank-affiliated products
HasCreditCard - whether a customer has a credit card
CustomerFeedback - latest customer feedback, if available
```

Solution

Please see the Notebooks section. The notebooks are numbered from 0 to 5. They start with gathering auxiliary data that I could extract from the provided dataset, e.g. 'country origin of the surname'. This is followed by Exploratory Data Analysis of the features and the target in notebooks 2 and 3. In notebook 4, I present a Trainer object that handles training and hyperparameter search of the model. In notebook 5, I make a quick analysis of the model and its predictions using SHAP values.
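To illustrate the kind of hyperparameter search a Trainer object might wrap, here is a minimal random-search sketch. It is not the repository's Trainer API: `random_search`, `fit_and_score`, and `toy_score` are hypothetical names, the search strategy used in notebook 4 may well differ, and the toy scoring function stands in for "train a LightGBM model and return its validation score".

```python
import random
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical names throughout: this is NOT the repository's Trainer API.
ParamSpace = Dict[str, List[Any]]


def random_search(
    fit_and_score: Callable[[Dict[str, Any]], float],
    space: ParamSpace,
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[Dict[str, Any], float]:
    """Sample n_trials parameter sets; return the best one (higher score wins)."""
    rng = random.Random(seed)  # seeded for reproducibility
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        # Draw one candidate value per hyperparameter.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = fit_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params: Dict[str, Any]) -> float:
    """Toy stand-in for training a model and returning a validation score."""
    return -abs(params["learning_rate"] - 0.1) - abs(params["num_leaves"] - 31) / 100


space = {"learning_rate": [0.01, 0.05, 0.1, 0.3], "num_leaves": [15, 31, 63]}
best_params, best_score = random_search(toy_score, space, n_trials=50, seed=0)
print(best_params, best_score)
```

In a real run, `fit_and_score` would fit the model on a training split with the sampled parameters and return a metric computed on a held-out split.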

The final solution uses LightGBM as the gradient boosting machine (GBM) implementation. I chose a GBM because 4 of the top 5 models in the H2O AutoML run were GBMs.

Additional work worth mentioning

In notebook 00_auxiliary_features_surname_origin_country_classification.ipynb I adapted (copy/paste and adjust) a BERT model for surname origin prediction. Due to lack of time I could not gather additional data that would help with model training, but I left some ideas in the notebook.

The solution was tested in a virtual machine spawned from the jupyter/datascience-notebook:python-3.10 image in a Zero-to-JupyterHub deployment. Because the bare-metal GPU server in the Kubernetes cluster was down, I had to do additional troubleshooting and fixing.

The code is easily extendable to multiclass, regression and quantile_regression tasks.
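One way such an extension could look is a small mapping from task name to LightGBM objective settings. The helper below is a hypothetical sketch, not code from this repository; only the parameter names (`objective`, `num_class`, `alpha`) and the objective values are standard LightGBM parameters.

```python
from typing import Any, Dict, Optional


def objective_params(
    task: str, num_class: Optional[int] = None, alpha: float = 0.5
) -> Dict[str, Any]:
    """Return LightGBM objective settings for a task name (hypothetical helper)."""
    if task == "binary":
        return {"objective": "binary"}
    if task == "multiclass":
        # LightGBM needs the number of classes for the multiclass objective.
        if num_class is None or num_class < 2:
            raise ValueError("multiclass needs num_class >= 2")
        return {"objective": "multiclass", "num_class": num_class}
    if task == "regression":
        return {"objective": "regression"}
    if task == "quantile_regression":
        # alpha is the target quantile, e.g. 0.9 for the 90th percentile.
        return {"objective": "quantile", "alpha": alpha}
    raise ValueError(f"unknown task: {task}")


print(objective_params("binary"))
print(objective_params("quantile_regression", alpha=0.9))
```

The returned dictionary would be merged into the rest of the model parameters before training, so switching tasks only changes the objective and the evaluation metric.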

Installation

The code was tested on

Install using pip directly from github:

```bash
pip install git+https://github.com/5uperpalo/ecovadis_assignment.git
```

Locally

```bash
git clone https://github.com/5uperpalo/ecovadis_assignment.git
cd ecovadis_assignment
pip install .
```

Documentation

```bash
# to build locally
cd docs
pip install -r requirements.txt
mkdocs build --clean

# to push to github pages
mkdocs gh-deploy

# if you want to run webserver locally
mkdocs serve
```

Code quality

Before pushing code or making a pull request, please run the code style checks and tests:

```bash
./code_style.sh
pytest --doctest-modules churn_pred --cov-report xml --cov-report term --disable-pytest-warnings --cov=churn_pred tests/
```

Owner

  • Name: Pavol Mulinka
  • Login: 5uperpalo
  • Kind: user
  • Location: Barcelona, ES
  • Company: CTTC

Data Scientist / Machine learning Enthusiast & former network engineer

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0