fraud-detection-transaction-data

Pipeline for analyzing fraud in card transaction datasets with the addition of graph features, modeled using Random Forest

https://github.com/janandrosiuk/fraud-detection-transaction-data

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.4%) to scientific vocabulary

Keywords

fraud-detection graph-data random-forest transaction-data
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: JanAndrosiuk
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 8.27 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Topics
fraud-detection graph-data random-forest transaction-data
Created almost 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

About the project

Although the number of fraudulent transactions grows more slowly than the total number of transactions, fraud remains a problem for many institutions. Detecting fraudulent transactions is challenging for several reasons, including a general lack of labels, class imbalance, and hidden, evolving fraud patterns. Modeling public transaction datasets adds further difficulties: anonymized features, missing information, and aggregated data. Drawing on other researchers' experience, this work proposes a pipeline for modeling fraudulent transactions that addresses most of these concerns. Among modeling approaches, one can distinguish those based on transaction features and those using graph anomaly detection methods. This research combines both and reports cross-validation results on two datasets. The performance scores did not show any of the presented approaches to have superior predictive power. Nevertheless, adding graph features significantly improved validation scores on the second dataset, which points to a direction for further research.
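The README does not describe how the graph features are built. The following is a minimal sketch under stated assumptions. Transactions are plain dicts, and two transactions are linked whenever they share a value for an entity key. The keys `card` and `email` used below are hypothetical, not column names from the Vesta or Elliptic data. The function adds two simple graph features: each transaction's degree in the per-key graph, and the size of its connected component in the combined graph. The name `add_graph_features` is illustrative only.

```python
from collections import defaultdict


def add_graph_features(transactions, entity_keys):
    """Append simple graph features to transaction dicts (hypothetical schema).

    Nodes are transactions; two transactions are linked when they share a value
    for any key in `entity_keys` (e.g. a hypothetical card or e-mail identifier).
    """
    # Group transaction indices by (entity key, entity value).
    groups = defaultdict(list)
    for i, tx in enumerate(transactions):
        for key in entity_keys:
            value = tx.get(key)
            if value is not None:
                groups[(key, value)].append(i)

    # Union-find labels the connected components of the combined graph.
    parent = list(range(len(transactions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in groups.values():
        root = find(members[0])
        for j in members[1:]:
            parent[find(j)] = root

    component_size = defaultdict(int)
    for i in range(len(transactions)):
        component_size[find(i)] += 1

    enriched = []
    for i, tx in enumerate(transactions):
        row = dict(tx)  # copy so the input rows are not mutated
        for key in entity_keys:
            value = tx.get(key)
            # Degree in the single-key graph: other transactions sharing this value.
            row[f"{key}_degree"] = len(groups[(key, value)]) - 1 if value is not None else 0
        row["component_size"] = component_size[find(i)]
        enriched.append(row)
    return enriched
```

In a pipeline like the one described, such columns would be appended to the transaction features before fitting the Random Forest. The project's own graph features, such as the HITS, centrality, and community measures referenced in the links, may be computed differently.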

Links

[Vesta raw dataset]

[Elliptic raw dataset]


[project directory structure]

[miceforest imputation method]

[Explanation of HITS algorithm]

[Great YouTube channel explaining centrality and community algorithms]
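The HITS link refers to Kleinberg's hub and authority scores. A node's authority score is the sum of the hub scores of the nodes pointing to it. A node's hub score is the sum of the authority scores of the nodes it points to. Both vectors are renormalized after each step. The sketch below is a plain power-iteration version for a directed edge list, such as a payment-flow graph. The function name `hits` and the fixed iteration count are assumptions. Scores here are L2-normalized, so their scale can differ from library implementations such as `networkx.hits`, which by default normalizes the values to sum to 1.

```python
import math


def hits(edges, iterations=50):
    """Compute HITS hub and authority scores by power iteration.

    `edges` is an iterable of (source, target) pairs of a directed graph.
    Returns two dicts (hub, authority), each L2-normalized.
    """
    nodes = set()
    out_links = {}
    in_links = {}
    for src, dst in edges:
        nodes.update((src, dst))
        out_links.setdefault(src, []).append(dst)
        in_links.setdefault(dst, []).append(src)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of nodes pointing to n.
        auth = {n: sum(hub[m] for m in in_links.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum of authority scores of nodes n points to.
        hub = {n: sum(auth[m] for m in out_links.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

Per-node hub and authority scores computed this way could serve as graph features for each transaction node.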

Further research

  • [ ] Speed up hyperparameter tuning by training models with the cuML API
  • [ ] Apply the entity embedding method within the cross-validation function
  • [ ] Evaluate Graph Neural Network (GNN) methods
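The entity-embedding item is about where encoders are fitted. Any encoding learned from labels should be fitted on the training folds only; otherwise, information from the validation fold leaks into the features. Training an entity-embedding network is beyond a short example. The sketch below therefore swaps in smoothed target (mean) encoding, a different and much simpler technique, purely to show the fold-local fitting pattern. Folds are stratified so that each keeps roughly the same fraud rate, which matters under class imbalance. All function names and the `prior_weight` smoothing parameter are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each row a fold id so every fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's rows round-robin across the folds.
        for position, i in enumerate(indices):
            fold_of[i] = position % n_splits
    return fold_of


def fit_target_encoding(categories, labels, prior_weight=10.0):
    """Stand-in for an entity embedding: smoothed fraud rate per category."""
    global_rate = sum(labels) / len(labels)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, labels):
        sums[c] += y
        counts[c] += 1
    mapping = {
        c: (sums[c] + prior_weight * global_rate) / (counts[c] + prior_weight)
        for c in counts
    }
    return mapping, global_rate


def cross_validated_encodings(categories, labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx, train_codes, test_codes) for each fold.

    The encoder sees only training-fold labels; categories unseen in the
    training fold fall back to the training-fold fraud rate.
    """
    fold_of = stratified_folds(labels, n_splits, seed)
    for k in range(n_splits):
        train_idx = [i for i, f in enumerate(fold_of) if f != k]
        test_idx = [i for i, f in enumerate(fold_of) if f == k]
        mapping, fallback = fit_target_encoding(
            [categories[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        train_codes = [mapping[categories[i]] for i in train_idx]
        test_codes = [mapping.get(categories[i], fallback) for i in test_idx]
        yield train_idx, test_idx, train_codes, test_codes
```

Each fold's codes would then be joined to that fold's other features before fitting and scoring the model. A learned entity embedding would slot into the same place, trained afresh inside every fold.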

Owner

  • Name: Jan Androsiuk
  • Login: JanAndrosiuk
  • Kind: user
  • Location: Utrecht, Netherlands
  • Company: j.androsiuk99@gmail.com

Student at Utrecht University - Applied Data Science.
