ethereum-fraud-detection-visualization

Ethereum Fraud Detection Visualization

https://github.com/sepandhaghighi/ethereum-fraud-detection-visualization

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords

data-analysis data-science data-visualization ethereum exploratory-data-analysis fraud fraud-detection machine-learning matplotlib python visualization

Repository

Ethereum Fraud Detection Visualization

Basic Info
  • Host: GitHub
  • Owner: sepandhaghighi
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 10.9 MB
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 3 years ago · Last pushed about 3 years ago
Metadata Files
Readme License Citation

README.md

Ethereum Fraud Detection Visualization


Sepand Haghighi - Farzad Ramezani

September 2022


Overview

Fraud detection is the set of activities undertaken to detect and block attempts to obtain money or property through false means. Fraud is an expensive and complicated problem. To detect and investigate it effectively, you need to see the connections between people, accounts, transactions, and dates, and to understand complex sequences of events. That means analyzing a lot of data. Fraud detection is practiced across the banking, insurance, medical, government, and public sectors, as well as in law enforcement agencies.

Advantages of Visualizations in Fraud Detection:

The detection of fraud schemes requires an investigation of a vast amount of data that stems from many different anti-fraud systems with varying types of data. The auditors have to combine all the data and use statistical methods to uncover suspicious claims, which is time-consuming and inefficient in most cases.

Visualizations, on the other hand, make it easier to quickly identify relationships and significant structures and to detect suspicious patterns that may be hidden in large volumes of data. Beyond visual exploration, interacting with the data gives a deeper understanding of how the dependencies within it change over time.

One of the most challenging tasks when using visualization for fraud detection is the sheer amount of data that is usually obtained by auditing systems. First, the auditor has to retrieve the data from the auditing system. Visualizing such a large amount of data is the next challenge: the data needs a meaningful arrangement to create a human-readable representation. Providing suitable styling should enable users to identify different types of entities and relations.

Because there are many different types of fraud schemes, no single solution can detect all of them. A visualization meant to fight fraud therefore has to adapt to the needs of each auditor.

First, it must not be limited to a specific amount or type of data, since the volume of investigated data grows exponentially and comes from different sources. In some cases, it must also be able to support and visualize time-dependent data.

A sophisticated visualization should also provide the means for arranging the elements in multiple ways on the screen, i.e., using arrangements that reveal clusters or others that highlight hierarchical structures. Additionally, more sophisticated graph analysis algorithms should be supported for the detection of fraud schemes, e.g., cycle detection, or shortest paths.
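As a rough illustration of the two graph analyses named above, the following sketch runs cycle detection (depth-first search) and shortest-path search (breadth-first search) on a tiny transaction graph. The graph, the single-letter addresses, and the function names `find_cycle` and `shortest_path` are assumptions made for this example; they are not taken from the notebooks.

```python
from collections import deque

# Hypothetical transaction graph: each key is a sender address and the list
# holds the addresses it sent funds to. All addresses are made up.
TRANSFERS = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": ["E"],
    "E": [],
}


def find_cycle(graph):
    """Return one directed cycle as a list of nodes, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        color[node] = BLACK
        path.pop()
        return None

    for start in graph:
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


def shortest_path(graph, source, target):
    """Return the path with the fewest hops from source to target, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(find_cycle(TRANSFERS))
print(shortest_path(TRANSFERS, "A", "E"))
```

In this toy graph, funds that leave `A` return to it through `B` and `C`, which is the kind of circular flow an auditor might want highlighted. The recursive search is adequate for small graphs; a very large graph would likely need an iterative version.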

Regarding the representation of the elements of the visualization, an auditor should be able to customize the look and feel of the graph elements to suit their needs and to display additional properties of those elements. Finally, interaction is one of the essential operations when visualizing fraud data, since it allows the auditor to explore the dataset.

Fraud detection techniques fall into two broad groups: statistical data analysis and artificial intelligence.

Statistical data analysis techniques include:

  1. calculating statistical parameters
  2. regression analysis
  3. probability distributions and models
  4. data matching
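To make the first item in the list above concrete, here is a minimal sketch that computes a few statistical parameters and uses z-scores to flag unusual values. The transaction values and the threshold of 2.0 are assumptions chosen for illustration, not figures from the datasets.

```python
import statistics

# Made-up transaction values (in Ether) for a single hypothetical account.
values = [0.5, 0.7, 0.6, 0.4, 0.8, 0.6, 12.0]

mean = statistics.mean(values)
median = statistics.median(values)
stdev = statistics.stdev(values)  # sample standard deviation


def z_scores(data):
    """Return how many sample standard deviations each value is from the mean."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [(x - mu) / sigma for x in data]


# Illustrative threshold: flag values more than 2 standard deviations away.
THRESHOLD = 2.0
flagged = [x for x, z in zip(values, z_scores(values)) if abs(z) > THRESHOLD]

print(f"mean={mean:.3f} median={median:.3f} stdev={stdev:.3f}")
print("flagged:", flagged)
```

The gap between the mean and the median already hints that the distribution is skewed by a single large transfer, which is the value the z-score check flags.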

AI techniques used to detect fraud include:

  1. Data mining classifies, groups and segments data to search through millions of transactions to find patterns and detect fraud.
  2. Neural networks learn suspicious-looking patterns and use them to detect further instances.
  3. Machine learning automatically identifies characteristics found in fraud.
  4. Pattern recognition detects classes, clusters and patterns of suspicious behavior.

Cryptocurrency fraud analysts look at huge volumes of historical data spanning long time periods. Our main idea is to comprehensively examine and visualize the available data related to fraud detection in the Ethereum network.

Our suggested steps to visualize data:

  1. Downloading and collecting data
  2. Data cleaning
  3. Data statistics and distribution
  4. Comparing different features of data between fraud and non-fraud classes

Datasets

We use two datasets in this report:

  1. Ethereum Fraud Detection Dataset
  2. Ethereum Fraud Dataset

We will analyze these two datasets both individually and in combination.

| Dataset | Number of Features | Total Cases | Fraud Cases | Non-Fraud Cases |
|---|---|---|---|---|
| Ethereum Fraud Detection Dataset | 37 | 9816 | 2179 | 7637 |
| Ethereum Fraud Dataset | 31 | 12146 | 5150 | 6996 |
| Merged Dataset | 17 | 20302 | 5675 | 14627 |

Table1. Datasets Overview
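The class balance implied by Table 1 can be worked out directly from the counts. The snippet below is a small arithmetic check using only the numbers in the table; the helper name `fraud_share` is introduced here for illustration.

```python
# (total cases, fraud cases, non-fraud cases) copied from Table 1.
DATASETS = {
    "Ethereum Fraud Detection Dataset": (9816, 2179, 7637),
    "Ethereum Fraud Dataset": (12146, 5150, 6996),
    "Merged Dataset": (20302, 5675, 14627),
}


def fraud_share(total, fraud):
    """Return the percentage of cases labelled as fraud."""
    return 100.0 * fraud / total


for name, (total, fraud, non_fraud) in DATASETS.items():
    # Sanity check: the two classes should add up to the total.
    assert fraud + non_fraud == total
    print(f"{name}: {fraud_share(total, fraud):.1f}% fraud")
```

The fraud share comes out at roughly 22%, 42%, and 28% respectively, so all three datasets are imbalanced to different degrees, which is worth keeping in mind when comparing feature distributions between the classes.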

Requirements

  1. Python >= 3.5
  2. pandas >= 0.24.2
  3. matplotlib >= 3.0.3
  4. seaborn >= 0.9.1
  5. numpy >= 1.18.5
  6. notebook >= 5.7.4
  • Run pip install -r requirements.txt or pip3 install -r requirements.txt

Notebooks

| Notebook | GitHub Viewer | NB Viewer | Google Colab |
|---|---|---|---|
| Ethereum Fraud Detection Dataset | Link | Link | Link |
| Ethereum Fraud Dataset | Link | Link | Link |
| Merged Dataset | Link | Link | Link |

Table2. Notebooks

Visualization Example

Only a few examples are shown here. The full set of visualizations and all of the code are available in the notebooks.

Fig1. Data Distribution Pie Diagram


Fig2. Most Received Token Type Pie Diagram (Fraud Cases)


Fig3. Comparison of Received Transaction Statistics


Fig4. Features Correlation Diagram


Fig5. Features Distribution Diagram

Cite

If you use this repo in your work, please cite it using the following metadata:

Haghighi, S., & Ramezani, F. (2022). Ethereum Fraud Detection Models (Version 1.0) [Computer software]. https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models

@software{Haghighi_Ethereum_Fraud_Detection_2022,
author = {Haghighi, Sepand and Ramezani, Farzad},
license = {MIT},
month = {10},
title = {{Ethereum Fraud Detection Models}},
url = {https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models},
version = {1.0},
year = {2022}
}

Owner

  • Name: Sepand Haghighi
  • Login: sepandhaghighi
  • Kind: user
  • Location: Aalborg, Denmark
  • Company: Denu

Open Source Enthusiast

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this repo, please cite it as below."
title: "Ethereum Fraud Detection Models"
abstract: "Here we have compared various classification models to detect fraudulent transactions and check which model gives the best result and also an attempt has been made to interpret them."
authors:
  - family-names: "Haghighi"
    given-names: "Sepand"
  - family-names: "Ramezani"
    given-names: "Farzad"
version: 1.0
date-released: 2022-10-06
repository-code: "https://github.com/sepandhaghighi/Ethereum-Fraud-Detection-Models"
license: MIT
keywords:
    - "Ethereum"
    - "python"
    - "Fraud"
    - "data-science"
    - "machine-learning"

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 11
  • Average time to close issues: N/A
  • Average time to close pull requests: about 6 hours
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.18
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • Farziiii (6)
  • sepandhaghighi (5)

Dependencies

requirements.txt pypi
  • matplotlib >=3.0.3
  • numpy >=1.18.5
  • pandas >=0.24.2
  • seaborn >=0.9.1