ethereum-fraud-detection-models
Ethereum Fraud Detection Models
https://github.com/sepandhaghighi/ethereum-fraud-detection-models
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (8.4%) to scientific vocabulary
Keywords
Repository
Ethereum Fraud Detection Models
Basic Info
Statistics
- Stars: 14
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 1
Topics
Metadata Files
README.md
Ethereum Fraud Detection Models
Sepand Haghighi - Farzad Ramezani
September 2022
Overview
The goal of data analytics is to detect potential fraud by spotting anomalies or deviations from “normal” behavior or patterns. To do that, an expert establishes a baseline of non-fraudulent activity to compare to the suspicious dataset. Understanding the concept of fraud detection analytics requires knowledge of the definition of the terms fraud and fraud detection. Fraud is a crime or deceptive action done by a criminal to get unlawful gain or unlawful access to information and assets.
Fraud detection is the process of identifying this form of deceptive action. It can be done before fraud occurs, during the process of fraud, or after the fraud has taken place. Fraud detection analytics refers to a combination of techniques of fraud detection and data analytics that are employed to detect and prevent the occurrence of fraud. Some of the data analytics techniques that are used in fraud detection include data mining, clustering analysis, data pre-processing, and data matching.
It may also be possible to identify data known to be associated with fraud. Perhaps fraudulent activity is more likely to occur at certain times of the day, in certain geographic locations, in certain types of accounts or in certain amounts.
The following are the reasons why fraud detection analytics is important in organizations:
Reduced fraud exposure: With fraud detection analytics, the system seals any loopholes for conducting fraudulent activities. Fraudsters have limited opportunities for conducting fraud, which reduces fraud exposure.
Reliable fraud detection: Fraud detection analytics offers a reliable way of detecting fraudulent activity even before damage has been caused. This is in the case of early-detection systems. These systems can identify any attempt to undertake a fraudulent activity. This enhances control and security in the organization.
Increased customer trust: Fraud detection analytics ensures that the system of the organization is safe. When customers experience little or no security issues, they develop trust. Increased customer trust contributes towards customer loyalty, which is important for an organization’s growth.
Makes use of unstructured data: Many fraudsters conduct fraudulent actions when the data is unstructured. Fraud detection analytics is capable of reviewing unstructured data to identify and prevent fraudulent activities.
Supports data integration: This system collects data from diverse sources and combines it, which enhances the integration of data in the organization.
Detects hidden patterns: Some of the traditional techniques of identifying fraud may fail to detect hidden patterns. Fraud detection analytics is superior to these techniques because it can identify hidden trends, scenarios, and patterns.
Improved organizational performance: Fraud detection analytics minimizes fraudulent activities, which significantly reduces the loss of revenue as a result of fraud. Organizations achieve huge financial gains as a result of fraud analytics. These systems enhance efficiency and improve processes in financial transactions in organizations.
How can data analytics help detect and prevent fraud? Here are three real-world examples:
An insurance company used data analytics to uncover a fraudulent claim for flood damage to a car. By including social media data, its system was able to show that the car was out of town on the day the flood occurred.
PayPal uses data analytics to protect its customers against fraud. The company analyzes historical payment data to identify factors that are closely associated with potential fraud, such as the type of device used, country of origin and certain details from user profiles. The company uses this information to create machine-learning algorithms that evaluate each transaction for signs of fraud.
The SEC used data analytics to catch an investment advisor guilty of “cherry picking.” The data revealed that, rather than follow his firm’s pro rata allocation guidelines, the advisor allocated a disproportionate number of profitable trades to his account or those belonging to someone else with the same last name.
Old data analysis techniques were oriented toward extracting quantitative and statistical data characteristics. These techniques facilitate useful data interpretations and may help to urge better insights into the processes behind the info.
Although the normal data analysis techniques can indirectly lead us to knowledge, it’s still created by human analysts.
To go beyond, a knowledge analysis system has got to be equipped with a considerable amount of background, and be ready to perform reasoning tasks involving that knowledge and therefore the data provided. In effort to satisfy this goal, researchers have turned to ideas from the machine learning field.
This is a natural source of ideas, since the machine learning task are often described as turning background and examples (input) into knowledge (output).
If data processing leads to discovering meaningful patterns, data turns into information. Information or patterns that are novel, valid and potentially useful aren’t merely information, but knowledge.
One speaks of discovering knowledge, before hidden within the huge amount of knowledge, but now revealed.
The machine learning and AI solutions could also be classified into two categories: ‘supervised’ and ‘unsupervised’ learning.
These methods seek for accounts, customers, suppliers, etc. that behave ‘unusually’ so as to output suspicion scores, rules or visual anomalies, counting on the tactic .
Whether supervised or unsupervised methods are used, note that the output gives us only a sign of fraud likelihood. No stand-alone statistical analysis can assure that a specific object may be a fraudulent one, but they will identify them with very high degrees of accuracy.
Here we have compared various classification models to check which model gives the best result and also an attempt has been made to interpret them.
Classification models:
- KNN
- Random Forest
- Neural Network (MLP)
Datasets
We will use two data set in this report.
We will analyze these two datasets both individually and in combination.
| Number of Features | Total Cases | Fraud Cases | Non-Fraud Cases | |
| Ethereum Fraud Detection Dataset | 37 | 9816 | 2179 | 7637 |
| Ethereum Fraud Dataset | 31 | 12146 | 5150 | 6996 |
| Merged Dataset | 17 | 20302 | 5675 | 14627 |
Table1. Datasets Overview
Requirements
- Python >= 3.5
- pandas >= 0.24.2
- matplotlib >= 3.0.3
- seaborn >= 0.9.1
- numpy >= 1.18.5
- scikit-learn >= 0.22.2
- pycm >= 2.2
- notebook >= 5.7.4
- Run
pip install -r requirements.txtorpip3 install -r requirements.txt
Notebooks
| GitHub Viewer | NB Viewer | Google Colab | |
| Ethereum Fraud Detection Dataset | Link | Link | Link |
| Ethereum Fraud Dataset | Link | Link | Link |
| Merged Dataset | Link | Link | Link |
Table2. Notebooks
Analytics Example
Here you can see a limited number of examples. The full version of this analytics and all codes can be seen in the notebooks!
ERC20_min_val_rec 0.154421 Total_ERC20_tnxs 0.114916 ERC20_uniq_rec_addr 0.085711 Time_Diff_between_first_and_last_Mins 0.080427 ERC20_uniq_rec_contract_addr 0.069955 Avg_min_between_received_tnx 0.058168 Average_of_Unique_Received_From_Addresses 0.046456 total_ether_received 0.037595 total_transactions_including_tnx_to_create_contract 0.036712 ERC20_avg_val_rec 0.030953 avg_val_received 0.030111 Sent_tnx 0.022398 Received_Tnx 0.021866 min_val_sent 0.021263 ERC20_total_ether_sent 0.020375 ERC20_total_Ether_received 0.018861 ERC20_uniq_sent_addr_1 0.018135 max_value_received 0.017657 ERC20_total_Ether_sent_contract 0.016291 total_ether_balance 0.016271 min_value_received 0.014136 ERC20_uniq_sent_addr 0.013666 total_Ether_sent 0.011947 Avg_min_between_sent_tnx 0.010480 avg_val_sent 0.010065 max_val_sent 0.008923 Average_of_Unique_Sent_To_Addresses 0.008289 Number_of_Created_Contracts 0.003955
Fig1. Features Importance
Best : Random Forest Rank Name Class-Score Overall-Score 1 Random Forest 0.76667 0.87778 2 KNN 0.725 0.78056 3 Neural Network 0.65 0.74722
Fig2. Confusion Matrices Compare
Fig3. Confusion Matrix
Fig4. Decision Tree Classifier Diagram
Cite
If you use this repo in your work, please cite it using the following metadata:
Haghighi, S., & Ramezani, F. (2022). Ethereum Fraud Detection Models (Version 1.0) [Computer software]. https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models
@software{Haghighi_Ethereum_Fraud_Detection_2022,
author = {Haghighi, Sepand and Ramezani, Farzad},
license = {MIT},
month = {10},
title = {{Ethereum Fraud Detection Models}},
url = {https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models},
version = {1.0},
year = {2022}
}
Owner
- Name: Sepand Haghighi
- Login: sepandhaghighi
- Kind: user
- Location: Aalborg, Denmark
- Company: Denu
- Website: https://www.sepand.tech
- Twitter: sepkjaer20
- Repositories: 124
- Profile: https://github.com/sepandhaghighi
Open Source Enthusiast
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this repo, please cite it as below."
title: "Ethereum Fraud Detection Models"
abstract: "Here we have compared various classification models to detect fraudulent transactions and check which model gives the best result and also an attempt has been made to interpret them."
authors:
- family-names: "Haghighi"
given-names: "Sepand"
- family-names: "Ramezani"
given-names: "Farzad"
version: 1.0
date-released: 2022-10-06
repository-code: "https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models"
license: MIT
keywords:
- "Ethereum"
- "python"
- "Fraud"
- "data-science"
- "machine-learning"
GitHub Events
Total
- Watch event: 4
Last Year
- Watch event: 4
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 0
- Total pull requests: 7
- Average time to close issues: N/A
- Average time to close pull requests: about 1 hour
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.29
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- sepandhaghighi (5)
- Farziiii (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- matplotlib >=3.0.3
- numpy >=1.18.5
- pandas >=0.24.2
- pycm >=2.2
- scikit-learn >=0.22.2
- seaborn >=0.9.1