aiinsurance
R package, using machine learning for insurance claims classification
Statistics
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
aiinsurance 
Creator: Hamed Vaheb
Icon Source: aiinsurance.io
Introduction
The aiinsurance package was developed as a project for the workshop of the Master of Data Science program at the University of Luxembourg.
The package defines functions for the various stages of classifying the outcome column of the Car Insurance Data.
It also bundles the raw dataset together with the processed train and test datasets, which were produced with the aiinsurance functions and are ready to be fed into models.
The package's documentation, covering its functions and datasets, and the unit tests for the functions are described under Documentation.
Using the package's functions, a machine learning classification project was carried out on the Car Insurance Dataset. Links to the project's implementation and a detailed report are provided under Report.
Moreover, the package includes a pipeline covering the main stages of the classification (refer to targets Pipeline) and an interactive shiny app (refer to shiny App) that visualizes evaluation plots.
Note: All functions from this package are suffixed with hmd to avoid confusion with functions from other packages.
Install
To install this package in an R editor (e.g., RStudio), first install the devtools library and then install the package from GitHub with the following commands:
```r
install.packages("devtools")
library(devtools)
install_github("berserkhmdvhb/aiinsurance")
```
Usage
Please run the commands of this section in the console of an R editor (e.g., RStudio).
renv packages
Either use renv when creating a project or, if you haven't, install the renv library, load it, and then use the renv.lock file (by copying it to the project's directory) to install the required packages. Run the following commands in the console:
```r
install.packages("renv")
library(renv)
renv::restore()
```
To view the documentation of the package, learn how to use its functions, and read the report for a machine learning project that I did using this package, refer to Documentation.
targets Pipeline
To run the pipeline, follow the instructions below:
First clone the package's repository, using the following command in a command line:
```bash
git clone git@github.com:berserkhmdvhb/aiinsurance.git
```
Then navigate to the cloned folder and open aiinsurance.Rproj in an R editor to create a project.
Install the packages required by aiinsurance from the renv.lock file (refer to renv packages), then install the aiinsurance package itself (refer to Install), install and load the targets library, and then run tar_make() to run the pipeline.
Note: Make sure to install the aiinsurance package after restoring the renv environment. Restoring only installs the packages that aiinsurance requires, not aiinsurance itself, so if the environment is restored after installing aiinsurance, the package will no longer be available.
After restoring the renv environment and installing the aiinsurance package, run the following in the console:
```r
library(aiinsurance)
library(targets)
targets::tar_make()
```
After the pipeline has run successfully, there should be two plots called plot_glm and plot_rf (as can be seen in the figure in Visualize). Both plots display a ROC curve: the former corresponds to logistic regression (implemented with glm), and the latter to the random forest classifier. The two plots are very similar, as the models performed very similarly. To view the two plots, run the following in the console:
```r
targets::tar_read(plot_glm)
targets::tar_read(plot_rf)
```
Directory Tree
```bash
├── R
│   ├── functions.R
├── run.R
├── run.sh
├── _targets.R
```
Visualize
To visualize the components of the pipeline, run the following:
```r
targets::tar_visnetwork()
```
The following figure should be displayed:

Pipeline Steps
As the visualization shows, the two datasets used in the pipeline are insurance_train and insurance_test.
Both are processed from the raw car_insurance_data, and all three datasets are included in the package.
The steps of the pipeline are as follows:
- Logistic Regression Part
  - i. Access `insurance_train` with `get_data_train()`, and `insurance_test` with `get_data_test()`.
  - ii. Store the `outcome` column (labels) from `insurance_test` for later evaluation in step vi (and in step iv of the Random Forest Part).
  - iii. Pass `insurance_train` to the `glm_fit_hmd` function (from the package) to fit a logistic regression model, and store the fitted object in `model_glm`.
  - iv. Predict on `insurance_test` using the fitted object `model_glm` from step iii, by feeding both `insurance_test` and `model_glm` to `glm_predict_hmd`, and store the prediction results in `predictions_glm`.
  - v. Extract the prediction probabilities (required for the ROC curve) from `predictions_glm` and store them in `pred_proba_glm`.
  - vi. Compute ROC metrics by feeding the `actual` data (from step ii) and the prediction probabilities `pred_proba_glm` to the `roc_obj_cal` function, and store the result in `roc_obj_glm`.
  - vii. Plot the ROC curve by feeding `roc_obj_glm` to the `plot_roc_curve` function, and store the plot in `plot_glm`.
- Random Forest Part
  - i. Pass `insurance_train` to the `rf_fit_hmd` function (from the package) to fit a random forest model, and store the fitted object in `model_rf`.
  - ii. Predict on `insurance_test` using the fitted object `model_rf` from step i, by feeding both `insurance_test` and `model_rf` to `rf_predict_hmd`, and store the prediction results in `predictions_rf`.
  - iii. Extract the prediction probabilities (required for the ROC curve) from `predictions_rf` and store them in `pred_proba_rf`.
  - iv. Compute ROC metrics by feeding the `actual` data (from step ii of the Logistic Regression Part) and the prediction probabilities `pred_proba_rf` to the `roc_obj_cal` function, and store the result in `roc_obj_rf`.
  - v. Plot the ROC curve by feeding `roc_obj_rf` to the `plot_roc_curve` function, and store the plot in `plot_rf`.
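To make the logistic regression branch more concrete, here is a minimal base R sketch of the same sequence: keep the test labels, fit, predict probabilities, compute ROC points. It is an illustration under stated assumptions, not the package's implementation: the data are simulated stand-ins for insurance_train and insurance_test, plain stats::glm is used in place of glm_fit_hmd and glm_predict_hmd, and roc_points is a hypothetical helper standing in for roc_obj_cal.

```r
# Simulated stand-in data (NOT the package's insurance_train / insurance_test).
set.seed(42)
n <- 400
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$outcome <- rbinom(n, size = 1, prob = plogis(1.5 * sim$x1 - sim$x2))
train <- sim[1:300, ]
test <- sim[301:400, ]

# Keep the test labels for later evaluation.
actual <- test$outcome

# Fit a logistic regression (plain stats::glm, assumed stand-in for glm_fit_hmd).
model_glm <- glm(outcome ~ x1 + x2, data = train, family = binomial())

# Predicted probabilities of the positive class on the test set.
pred_proba_glm <- predict(model_glm, newdata = test, type = "response")

# Hypothetical helper: ROC points obtained by sweeping the decision
# threshold from high to low over the observed probabilities.
roc_points <- function(actual, proba) {
  thresholds <- c(Inf, sort(unique(proba), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) sum(proba >= t & actual == 1) / sum(actual == 1))
  fpr <- sapply(thresholds, function(t) sum(proba >= t & actual == 0) / sum(actual == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}
roc_glm <- roc_points(actual, pred_proba_glm)

# Area under the ROC curve by the trapezoidal rule.
auc_glm <- sum(diff(roc_glm$fpr) * (head(roc_glm$tpr, -1) + tail(roc_glm$tpr, -1)) / 2)
```

The curve itself can then be drawn with, for example, `plot(roc_glm$fpr, roc_glm$tpr, type = "l")`. The random forest branch follows the same shape, with only the fitting and prediction steps swapped.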
shiny App
Unlike the targets pipeline, the shiny app is part of the package's functions.
To view the shiny app, follow the instructions below:
Simply install the aiinsurance package (refer to Install) and execute the following commands to display the app.
```r
library(aiinsurance)
shiny_run_hmd()
```
Although the shiny app could be based on targets, I wanted it to work just by installing the package (without the need to clone anything), so I separated the shiny app from the targets pipeline.
Directory Tree
```bash
├── inst
│   ├── plot_app
│   │   ├── app-cache
│   │   ├── global.R
│   │   ├── server.R
│   │   └── ui.R
├── R
│   ├── shiny_run_hmd.R
```
Documentation
The documentation of the package can be accessed with the following commands.
```r
help(package = aiinsurance)
```
testthat Unit Tests
All functions include type checking of inputs.
Some functions were also supplied with unit tests. For the purpose of illustration, I will describe the tests for the eval_hmd function, which accepts actual and predicted objects and then computes evaluation metrics suitable for binary classification. The function returns a hash containing various evaluation metrics, as well as a confusion matrix plot.
Since the inputs actual and predicted must satisfy certain conditions, the following type checks and other tests were embedded inside the eval_hmd function:
- The actual and predicted inputs can be matrices; if not, they should be of class either `numeric` or `factor`. If they are of class `factor`, they are converted to class `numeric`, as this makes later tests and computations more convenient.
- The actual and predicted inputs should be binary, so if either of them contains more than 2 distinct values, an error is raised.
- The predicted input should not contain any class that is not present in the actual input. But since the user might not always name the arguments (actual and predicted), two cases are accepted: either the predicted values are a subset of the actual values, or the actual values are a subset of the predicted values.
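The checks listed above could be sketched in base R roughly as follows. This is only an illustration: check_binary_inputs is a hypothetical helper, not a function of the package, the error messages are made up, and the factor conversion assumes the factor levels are numeric labels such as "0" and "1".

```r
# Hypothetical helper (not part of aiinsurance) illustrating the input checks.
check_binary_inputs <- function(actual, predicted) {
  to_numeric <- function(x, name) {
    # Matrices are flattened to plain vectors.
    if (is.matrix(x)) x <- as.vector(x)
    # Assumption: factor levels are numeric labels such as "0" and "1".
    if (is.factor(x)) x <- as.numeric(as.character(x))
    if (!is.numeric(x)) {
      stop(name, " must be a matrix, or of class numeric or factor")
    }
    x
  }
  actual <- to_numeric(actual, "actual")
  predicted <- to_numeric(predicted, "predicted")

  # Both inputs must be binary: at most 2 distinct values each.
  if (length(unique(actual)) > 2 || length(unique(predicted)) > 2) {
    stop("actual and predicted must each contain at most 2 distinct values")
  }

  # One set of classes must be contained in the other (either direction,
  # since the caller may have passed the arguments unnamed and swapped).
  pred_in_actual <- all(unique(predicted) %in% unique(actual))
  actual_in_pred <- all(unique(actual) %in% unique(predicted))
  if (!(pred_in_actual || actual_in_pred)) {
    stop("classes of one input must be a subset of the classes of the other")
  }

  list(actual = actual, predicted = predicted)
}

# A factor input is accepted and converted to numeric.
checked <- check_binary_inputs(factor(c(0, 1, 1)), c(1, 1, 0))
```

Accepting the subset relation in both directions trades a little strictness for robustness against swapped positional arguments.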
Moreover, using the testthat library, the following unit tests have been added in the test-eval_hmd.R file (also visible in the tree structure below):
- `test_that("check value ranges",...` ensures that all of the output's evaluation metrics, i.e., accuracy, precision, recall, fbeta_score, and f1_score, have values in the range [0,1].
- `test_that("check output class",...` ensures that the output of the function is of class hash.
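The idea behind the value-range test can be illustrated with a small base R sketch. It does not reproduce the package's test file: binary_metrics is a hypothetical stand-in for eval_hmd (which returns a hash rather than a list), the metric definitions are the standard ones, and stopifnot is used here instead of testthat expectations so that the snippet runs without extra packages.

```r
# Hypothetical stand-in for eval_hmd: standard binary classification metrics.
binary_metrics <- function(actual, predicted, beta = 1) {
  tp <- sum(actual == 1 & predicted == 1)
  fp <- sum(actual == 0 & predicted == 1)
  fn <- sum(actual == 1 & predicted == 0)
  tn <- sum(actual == 0 & predicted == 0)

  # Return 0 instead of NaN when a denominator is zero.
  safe_div <- function(num, den) if (den == 0) 0 else num / den

  precision <- safe_div(tp, tp + fp)
  recall <- safe_div(tp, tp + fn)
  fbeta <- function(b) {
    safe_div((1 + b^2) * precision * recall, b^2 * precision + recall)
  }

  list(
    accuracy = (tp + tn) / length(actual),
    precision = precision,
    recall = recall,
    fbeta_score = fbeta(beta),
    f1_score = fbeta(1)
  )
}

actual <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0)
metrics <- binary_metrics(actual, predicted, beta = 2)

# Range check: every metric must lie in [0, 1].
in_unit_interval <- vapply(metrics, function(m) m >= 0 && m <= 1, logical(1))
stopifnot(all(in_unit_interval))
```

In a testthat file, the same range check would typically be wrapped in a `test_that()` block with one expectation per metric.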
Directory Tree
```bash
└── tests
    ├── testthat
    │   ├── test-categoricals_hmd.R
    │   ├── test-eval_hmd.R
    │   ├── test-glm_fit_hmd.R
    │   └── test-normalizer_hmd.R
    └── testthat.R
```
Report
- The Rmarkdown file implementing the classification of the insurance claims' outcomes is provided in the following links:
- The written report explaining both theory and implementation is provided in the following links:
rticles
The rticles library was used to produce the report, with the arXiv pre-print template based on George Kour's template. Each time the Rmarkdown file is knitted, a .tex file and then a .pdf file are generated. Since the .tex file is regenerated automatically on every knit, edits made to it directly are overwritten. To make lasting changes, edit the template itself. First find the location of the installed libraries with the following command in the R console:
```r
.libPaths()
```
Navigate to this directory in a command line and run the following commands:
```bash
cd ./rticles/rmarkdown/templates/arxiv/resources
nano template.tex
```
Edits made to template.tex will now persist in every generated .tex file.
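As a convenience, base R's system.file() can locate the template directly instead of navigating from .libPaths() by hand. The sketch below assumes the template sits at the same relative path inside the installed rticles package as in the commands above; system.file() returns an empty string when the package or the file is not found.

```r
# Look up the arXiv template inside the installed rticles package.
# Assumption: the relative path matches the one used in the shell commands above.
template_path <- system.file(
  "rmarkdown", "templates", "arxiv", "resources", "template.tex",
  package = "rticles"
)

if (nzchar(template_path)) {
  message("arXiv template found at: ", template_path)
} else {
  message("rticles (or its arXiv template) was not found in any library path.")
}
```

If the path is found, `file.edit(template_path)` opens it in the editor configured for the R session.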
Owner
- Name: Hamed
- Login: berserkhmdvhb
- Kind: user
- Repositories: 4
- Profile: https://github.com/berserkhmdvhb
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: AI Insurance Package
message: >-
If you use this software, please cite it using the
metadata from this file.
authors:
- family-names: "Vaheb"
given-names: "Hamed"
type: package
url: "https://github.com/berserkhmdvhb/aiinsurance"
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Dependencies
- R >= 4.1 depends
- Metrics * imports
- cachem * imports
- car * imports
- caret * imports
- dplyr * imports
- glmnet * imports
- hash * imports
- janitor * imports
- pROC * imports
- randomForest * imports
- rticles * imports
- shiny * imports
- shinycustomloader * imports
- stats * imports
- targets * imports
- ggplot2 * suggests
- imbalance * suggests
- knitr * suggests
- readr * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests
- tibble * suggests
- tidyverse * suggests
- visdat * suggests