ethereum-fraud-detection-models

Ethereum Fraud Detection Models

https://github.com/sepandhaghighi/ethereum-fraud-detection-models

Keywords

artificial-intelligence artificial-neural-networks datascience datascience-machinelearning ethereum fraud fraud-detection machine-learning machine-learning-algorithms machinelearning models neural-network python random-forest

Last synced: 10 months ago

Repository

Ethereum Fraud Detection Models

Basic Info

Host: GitHub
Owner: sepandhaghighi
License: mit
Language: Jupyter Notebook
Default Branch: master
Homepage:
Size: 7.66 MB

Statistics

Stars: 14
Watchers: 2
Forks: 2
Open Issues: 0
Releases: 1

Topics

artificial-intelligence artificial-neural-networks datascience datascience-machinelearning ethereum fraud fraud-detection machine-learning machine-learning-algorithms machinelearning models neural-network python random-forest

Created almost 4 years ago · Last pushed almost 4 years ago

Metadata Files

Readme License Citation

README.md

Ethereum Fraud Detection Models

Sepand Haghighi - Farzad Ramezani

September 2022

Overview

In fraud detection, the goal of data analytics is to detect potential fraud by spotting anomalies or deviations from “normal” behavior or patterns. To do that, an expert establishes a baseline of non-fraudulent activity against which the suspicious dataset is compared. Understanding fraud detection analytics first requires defining two terms: fraud and fraud detection. Fraud is a crime or deceptive action committed to obtain unlawful gain or unauthorized access to information and assets.
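The baseline-and-deviation idea can be sketched with a simple z-score rule: estimate the mean and standard deviation of a feature over known non-fraudulent activity, then flag new observations that lie far from that baseline. This is a minimal illustration under stated assumptions, not the method used in the notebooks; the helper name `flag_deviations` and the default threshold of 3 standard deviations are hypothetical choices.

```python
import numpy as np


def flag_deviations(baseline, suspicious, threshold=3.0):
    """Flag values in `suspicious` whose z-score relative to `baseline` exceeds `threshold`.

    Hypothetical helper for illustration: `baseline` holds one feature measured on
    known non-fraudulent activity, `suspicious` holds the same feature for the
    records under review.
    """
    baseline = np.asarray(baseline, dtype=float)
    suspicious = np.asarray(suspicious, dtype=float)
    mean = baseline.mean()
    std = baseline.std()
    if std == 0:
        # A constant baseline: any value that differs from it counts as a deviation.
        return suspicious != mean
    z_scores = np.abs(suspicious - mean) / std
    return z_scores > threshold


# Typical transaction values versus a new batch containing one outlier.
normal_values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05]
new_values = [1.0, 1.1, 25.0]
# Only the 25.0 value lies more than 3 standard deviations from the baseline mean.
print(flag_deviations(normal_values, new_values))
```

A single-feature z-score is the simplest possible baseline; real fraud analytics would usually model many features jointly.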

Fraud detection is the process of identifying this form of deceptive action. It can be done before fraud occurs, during the process of fraud, or after the fraud has taken place. Fraud detection analytics refers to a combination of techniques of fraud detection and data analytics that are employed to detect and prevent the occurrence of fraud. Some of the data analytics techniques that are used in fraud detection include data mining, clustering analysis, data pre-processing, and data matching.

It may also be possible to identify data known to be associated with fraud. For example, fraudulent activity may be more likely to occur at certain times of the day, in certain geographic locations, in certain types of accounts, or in certain amounts.
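Known risk factors like these can be turned into simple rule-based flags before any model is trained. The sketch below assumes a hypothetical transaction table with `timestamp`, `country`, and `amount` columns; the risky hours, country codes, and amount threshold are placeholders, not values derived from the datasets used in this report.

```python
import pandas as pd

# Hypothetical risk factors; real values would come from historical fraud data.
RISKY_HOURS = set(range(0, 5))   # 00:00 to 04:59
RISKY_COUNTRIES = {"XX", "YY"}   # placeholder country codes
AMOUNT_THRESHOLD = 10_000.0


def add_risk_flags(transactions: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `transactions` with one boolean column per risk factor."""
    df = transactions.copy()
    hours = pd.to_datetime(df["timestamp"]).dt.hour
    df["risky_hour"] = hours.isin(RISKY_HOURS)
    df["risky_country"] = df["country"].isin(RISKY_COUNTRIES)
    df["large_amount"] = df["amount"] > AMOUNT_THRESHOLD
    # Count how many risk factors each transaction matches.
    df["risk_count"] = df[["risky_hour", "risky_country", "large_amount"]].sum(axis=1)
    return df


transactions = pd.DataFrame({
    "timestamp": ["2022-09-01 03:15", "2022-09-01 14:00", "2022-09-02 02:30"],
    "country": ["XX", "DK", "DK"],
    "amount": [15_000.0, 200.0, 50.0],
})
flagged = add_risk_flags(transactions)
print(flagged[["risky_hour", "risky_country", "large_amount", "risk_count"]])
```

Flags like `risk_count` can be used directly as simple rules or passed to a classifier as extra features.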

Fraud detection analytics is important to organizations for the following reasons:

Reduced fraud exposure: Fraud detection analytics helps close the loopholes that make fraudulent activity possible. With fewer opportunities to commit fraud, the organization’s fraud exposure is reduced.
Reliable fraud detection: Fraud detection analytics offers a reliable way of detecting fraudulent activity, and early-detection systems can identify attempts at fraud before any damage has been caused. This enhances control and security in the organization.
Increased customer trust: Fraud detection analytics helps keep the organization’s systems safe. When customers experience few or no security issues, they develop trust, which contributes to customer loyalty and, in turn, to the organization’s growth.
Makes use of unstructured data: Fraudsters often exploit unstructured data, where suspicious activity is harder to spot. Fraud detection analytics can review unstructured data to identify and prevent fraudulent activities.
Supports data integration: Fraud detection analytics collects data from diverse sources and combines it, which improves data integration across the organization.
Detects hidden patterns: Some traditional techniques for identifying fraud may fail to detect hidden patterns. Fraud detection analytics is superior to these techniques because it can identify hidden trends, scenarios, and patterns.
Improved organizational performance: Fraud detection analytics minimizes fraudulent activities, which significantly reduces revenue lost to fraud. Organizations can achieve large financial gains as a result, and these systems improve the efficiency of financial transaction processes.

How can data analytics help detect and prevent fraud? Here are three real-world examples:

An insurance company used data analytics to uncover a fraudulent claim for flood damage to a car. By including social media data, its system was able to show that the car was out of town on the day the flood occurred.
PayPal uses data analytics to protect its customers against fraud. The company analyzes historical payment data to identify factors that are closely associated with potential fraud, such as the type of device used, country of origin and certain details from user profiles. The company uses this information to create machine-learning algorithms that evaluate each transaction for signs of fraud.
The SEC used data analytics to catch an investment advisor guilty of “cherry picking.” The data revealed that, rather than follow his firm’s pro rata allocation guidelines, the advisor allocated a disproportionate number of profitable trades to his account or those belonging to someone else with the same last name.

Traditional data analysis techniques were oriented toward extracting quantitative and statistical characteristics of data. These techniques support useful interpretations of the data and can provide better insight into the processes behind it.

Although these traditional techniques can lead indirectly to knowledge, that knowledge is still produced by human analysts.

To go further, a data analysis system must be equipped with a substantial amount of background knowledge and be able to perform reasoning tasks involving that knowledge and the data provided. To meet this goal, researchers have turned to ideas from the field of machine learning.

This is a natural source of ideas, since the machine learning task is often described as turning background knowledge and examples (input) into knowledge (output).

If data processing leads to discovering meaningful patterns, data turns into information. Information or patterns that are novel, valid and potentially useful aren’t merely information, but knowledge.

One speaks of discovering knowledge that was previously hidden within a huge amount of data but is now revealed.

Machine learning and AI solutions can be classified into two categories: ‘supervised’ and ‘unsupervised’ learning.

These methods look for accounts, customers, suppliers, etc. that behave ‘unusually’ and output suspicion scores, rules, or visual anomalies, depending on the method.

Whether supervised or unsupervised methods are used, note that the output gives only an indication of the likelihood of fraud. No stand-alone statistical analysis can guarantee that a specific object is fraudulent, but such methods can identify likely fraudulent objects with a very high degree of accuracy.
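The difference between the two families can be sketched with scikit-learn on synthetic data: a supervised classifier learns from labeled examples and returns a fraud probability through `predict_proba`, while an unsupervised detector such as `IsolationForest` needs no labels and returns an anomaly score. In both cases the output is a suspicion score, not proof of fraud. The synthetic data, the choice of these two estimators, and the parameter values are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier

# Synthetic, imbalanced data standing in for account features (label 1 = fraud).
X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0)

# Supervised: learn from labels, then output the probability of the fraud class.
# (Scored on the training data only to keep the sketch short.)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
supervised_scores = clf.predict_proba(X)[:, 1]

# Unsupervised: labels are ignored. score_samples is lower for more abnormal
# points, so it is negated to get a score where higher means more suspicious.
iso = IsolationForest(random_state=0).fit(X)
unsupervised_scores = -iso.score_samples(X)

# Rank accounts by suspicion; both rankings only indicate fraud likelihood.
top_supervised = np.argsort(supervised_scores)[::-1][:5]
top_unsupervised = np.argsort(unsupervised_scores)[::-1][:5]
print("Top supervised suspects:", top_supervised)
print("Top unsupervised suspects:", top_unsupervised)
```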

Here we compare several classification models to determine which gives the best results, and we also attempt to interpret them.

Classification models:

KNN
Random Forest
Neural Network (MLP)
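A comparison of these three model families can be sketched with scikit-learn's `KNeighborsClassifier`, `RandomForestClassifier`, and `MLPClassifier`. The synthetic data, train/test split, hyperparameters, and the use of accuracy and F1 as metrics are assumptions; the notebooks listed in Table2 contain the actual experiments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a fraud dataset (label 1 = fraud).
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.75, 0.25], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# KNN and MLP are sensitive to feature scale, so each gets a StandardScaler first.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=42),
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
    }

# Print models from best to worst F1 score on the held-out test set.
for name, scores in sorted(results.items(), key=lambda item: item[1]["f1"], reverse=True):
    print(f"{name:15s} accuracy={scores['accuracy']:.3f} f1={scores['f1']:.3f}")
```

F1 on the fraud class is shown alongside accuracy because, with imbalanced classes, a model can reach high accuracy while missing many fraud cases.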

Datasets

We use two datasets in this report and analyze them both individually and in combination.

Dataset	Number of Features	Total Cases	Fraud Cases	Non-Fraud Cases
Ethereum Fraud Detection Dataset	37	9816	2179	7637
Ethereum Fraud Dataset	31	12146	5150	6996
Merged Dataset	17	20302	5675	14627

Table1. Datasets Overview
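One plausible way to build a merged dataset with fewer features than either source, as in Table1, is to keep only the columns both sources share, stack the rows, and drop duplicates. This is an assumption about the procedure, not a description of the notebooks. The helper `merge_on_shared_columns`, the `label` column, and the `only_in_*` columns are hypothetical, and the sketch assumes equivalent features already carry the same column name in both datasets.

```python
import pandas as pd


def merge_on_shared_columns(first: pd.DataFrame, second: pd.DataFrame) -> pd.DataFrame:
    """Stack two datasets, keeping only the columns present in both.

    Hypothetical helper: assumes equivalent features already share the same
    column name in both DataFrames.
    """
    # join="inner" keeps only the column labels common to both frames.
    merged = pd.concat([first, second], join="inner", ignore_index=True)
    # Remove rows that appear in both sources (e.g. the same account listed twice).
    return merged.drop_duplicates().reset_index(drop=True)


first = pd.DataFrame({
    "Sent_tnx": [10, 3], "Received_Tnx": [5, 1], "label": [0, 1], "only_in_first": [1.0, 2.0],
})
second = pd.DataFrame({
    "Sent_tnx": [3, 7], "Received_Tnx": [1, 2], "label": [1, 0], "only_in_second": ["a", "b"],
})
merged = merge_on_shared_columns(first, second)
print(merged)
print(merged["label"].value_counts())
```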

Requirements

Python >= 3.5
pandas >= 0.24.2
matplotlib >= 3.0.3
seaborn >= 0.9.1
numpy >= 1.18.5
scikit-learn >= 0.22.2
pycm >= 2.2
notebook >= 5.7.4

To install the requirements, run pip install -r requirements.txt or pip3 install -r requirements.txt.

Notebooks

Dataset	GitHub Viewer	NB Viewer	Google Colab
Ethereum Fraud Detection Dataset	Link	Link	Link
Ethereum Fraud Dataset	Link	Link	Link
Merged Dataset	Link	Link	Link

Table2. Notebooks

Analytics Example

Only a limited number of examples are shown here. The full analysis and all code are available in the notebooks.

ERC20_min_val_rec                                      0.154421
Total_ERC20_tnxs                                       0.114916
ERC20_uniq_rec_addr                                    0.085711
Time_Diff_between_first_and_last_Mins                  0.080427
ERC20_uniq_rec_contract_addr                           0.069955
Avg_min_between_received_tnx                           0.058168
Average_of_Unique_Received_From_Addresses              0.046456
total_ether_received                                   0.037595
total_transactions_including_tnx_to_create_contract    0.036712
ERC20_avg_val_rec                                      0.030953
avg_val_received                                       0.030111
Sent_tnx                                               0.022398
Received_Tnx                                           0.021866
min_val_sent                                           0.021263
ERC20_total_ether_sent                                 0.020375
ERC20_total_Ether_received                             0.018861
ERC20_uniq_sent_addr_1                                 0.018135
max_value_received                                     0.017657
ERC20_total_Ether_sent_contract                        0.016291
total_ether_balance                                    0.016271
min_value_received                                     0.014136
ERC20_uniq_sent_addr                                   0.013666
total_Ether_sent                                       0.011947
Avg_min_between_sent_tnx                               0.010480
avg_val_sent                                           0.010065
max_val_sent                                           0.008923
Average_of_Unique_Sent_To_Addresses                    0.008289
Number_of_Created_Contracts                            0.003955

Fig1. Feature Importance
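The values in Fig1 are non-negative and sum to 1, which matches the impurity-based importances that a fitted scikit-learn `RandomForestClassifier` exposes through its `feature_importances_` attribute. A minimal sketch of producing a sorted listing in the same format, assuming synthetic data and placeholder feature names rather than the real Ethereum features:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with placeholder feature names (not the real dataset columns).
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=1)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

# Impurity-based importances: one non-negative value per feature, summing to 1.
importances = pd.Series(model.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```

Impurity-based importances can favor features with many distinct values; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.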

Best : Random Forest

Rank   Name              Class-Score       Overall-Score
1      Random Forest     0.76667           0.87778
2      KNN               0.725             0.78056
3      Neural Network    0.65              0.74722

Fig2. Confusion Matrices Comparison

Fig3. Confusion Matrix
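For a binary fraud classifier, a confusion matrix counts true negatives, false positives, false negatives, and true positives. A minimal sketch with scikit-learn's `confusion_matrix`, using made-up labels rather than results from this report:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration (1 = fraud, 0 = non-fraud).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

# Rows are actual classes and columns are predicted classes, ordered [0, 1].
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = matrix.ravel()
print(matrix)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=1 TP=3

# Recall on the fraud class: the share of actual fraud cases that were caught.
print(f"Fraud recall: {tp / (tp + fn):.2f}")  # Fraud recall: 0.75
```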

Fig4. Decision Tree Classifier Diagram
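A text rendering of a fitted decision tree, comparable in spirit to the diagram in Fig4, can be produced with scikit-learn's `export_text` (`sklearn.tree.plot_tree` draws a graphical version). The synthetic data, the depth limit, and the placeholder feature names below are assumptions, not the setup used for Fig4.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data; the feature names are placeholders, not real dataset columns.
X, y = make_classification(n_samples=300, n_features=4, n_informative=2, random_state=0)
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]

# A shallow tree keeps the diagram readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
report = export_text(tree, feature_names=feature_names)
print(report)
```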

Cite

If you use this repo in your work, please cite it using the following metadata:

Haghighi, S., & Ramezani, F. (2022). Ethereum Fraud Detection Models (Version 1.0) [Computer software]. https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models


@software{Haghighi_Ethereum_Fraud_Detection_2022,
author = {Haghighi, Sepand and Ramezani, Farzad},
license = {MIT},
month = {10},
title = {{Ethereum Fraud Detection Models}},
url = {https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models},
version = {1.0},
year = {2022}
}

Owner

Name: Sepand Haghighi
Login: sepandhaghighi
Kind: user
Location: Aalborg, Denmark
Company: Denu

Website: https://www.sepand.tech
Twitter: sepkjaer20
Repositories: 124
Profile: https://github.com/sepandhaghighi

Open Source Enthusiast

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this repo, please cite it as below."
title: "Ethereum Fraud Detection Models"
abstract: "Here we have compared various classification models to detect fraudulent transactions and check which model gives the best result and also an attempt has been made to interpret them."
authors:
  - family-names: "Haghighi"
    given-names: "Sepand"
  - family-names: "Ramezani"
    given-names: "Farzad"
version: 1.0
date-released: 2022-10-06
repository-code: "https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models"
license: MIT
keywords:
    - "Ethereum"
    - "python"
    - "Fraud"
    - "data-science"
    - "machine-learning"

GitHub Events

Total

Watch event: 4

Last Year

Watch event: 4

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 7
Average time to close issues: N/A
Average time to close pull requests: about 1 hour
Total issue authors: 0
Total pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.29
Merged pull requests: 7
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Science Score: 44.0%
