https://github.com/azazh/advanced_fraud_detection

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: Azazh
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 170 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Advanced Fraud Detection

This repository contains the implementation of Advanced Fraud Detection, a project aimed at detecting fraudulent transactions using machine learning techniques. The project focuses on data preprocessing, exploratory data analysis (EDA), feature engineering, and model preparation.

Table of Contents

  1. Overview
  2. Features
  3. Dataset
  4. Folder Structure
  5. Installation
  6. Usage
  7. Key Findings
  8. Contributing
  9. Contact
  10. License

Overview

Fraud detection is critical for businesses to minimize financial losses and improve customer trust. This project implements a pipeline for analyzing transaction data, identifying patterns of fraudulent behavior, and preparing the dataset for machine learning models. Key tasks include:

  • Handling missing values and duplicates
  • Cleaning and normalizing data
  • Performing exploratory data analysis (EDA)
  • Engineering features such as transaction frequency and time-to-action
  • Preparing the dataset for downstream modeling
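The cleaning tasks listed above could look roughly like the following in pandas (which is pinned in requirements.txt). This is a minimal sketch, not the repository's actual code: the function name `clean_transactions` is hypothetical, and dropping incomplete rows rather than imputing them is an assumption.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates and incomplete rows, then correct column types."""
    out = df.drop_duplicates().copy()
    # Parse the timestamp columns; unparseable values become NaT.
    for col in ("signup_time", "purchase_time"):
        out[col] = pd.to_datetime(out[col], errors="coerce")
    # Coerce the amount to a number; unparseable values become NaN.
    out["purchase_value"] = pd.to_numeric(out["purchase_value"], errors="coerce")
    # Assumption: rows missing any required field are dropped, not imputed.
    required = ["signup_time", "purchase_time", "purchase_value", "class"]
    out = out.dropna(subset=required)
    out["class"] = out["class"].astype(int)
    return out.reset_index(drop=True)


# Tiny made-up sample: one exact duplicate and one unparseable timestamp.
raw = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "signup_time": ["2015-01-01 10:00:00", "2015-01-01 10:00:00",
                    "2015-02-01 08:30:00", "not a date"],
    "purchase_time": ["2015-01-03 12:00:00", "2015-01-03 12:00:00",
                      "2015-02-01 09:00:00", "2015-03-01 09:00:00"],
    "purchase_value": ["34", "34", "16", "44"],
    "class": [0, 0, 1, 0],
})
clean = clean_transactions(raw)
print(len(clean))  # 2
```

Coercing bad values to NaT/NaN first and dropping them in one place keeps the rule for "what counts as a bad row" in a single line that is easy to change.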

Features

  • Data Preprocessing: Handles missing values, removes duplicates, and corrects data types.
  • Feature Engineering: Creates meaningful features like time_to_action, transaction_frequency, and geolocation-based features.
  • Normalization: Scales numerical features for compatibility with machine learning algorithms.
  • Exploratory Data Analysis (EDA): Provides insights into class imbalance, fraud hotspots, and transaction patterns.
  • Modular Codebase: Organized structure for scalability and reproducibility.
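As an illustration of the feature engineering and normalization items above, here is a small sketch. It assumes that `time_to_action` is the gap between `signup_time` and `purchase_time` in hours, and that normalization means z-score scaling; neither detail is confirmed by this README, and the helper names are hypothetical.

```python
import pandas as pd


def add_time_to_action(df: pd.DataFrame) -> pd.DataFrame:
    """Add hours elapsed between signup and purchase (assumed definition)."""
    out = df.copy()
    delta = pd.to_datetime(out["purchase_time"]) - pd.to_datetime(out["signup_time"])
    out["time_to_action"] = delta.dt.total_seconds() / 3600.0
    return out


def standardize(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Z-score the given columns; constant columns are set to 0.0."""
    out = df.copy()
    for col in columns:
        mean = out[col].mean()
        std = out[col].std(ddof=0)  # population standard deviation
        out[col] = (out[col] - mean) / std if std > 0 else 0.0
    return out


# Made-up rows for demonstration only.
sample = pd.DataFrame({
    "signup_time": ["2015-01-01 00:00:00", "2015-01-01 00:00:00", "2015-03-01 00:00:00"],
    "purchase_time": ["2015-01-02 12:00:00", "2015-01-01 06:00:00", "2015-03-01 00:30:00"],
    "purchase_value": [10, 20, 30],
})
features = add_time_to_action(sample)
scaled = standardize(features, ["purchase_value"])
print(features["time_to_action"].tolist())  # [36.0, 6.0, 0.5]
```

In a real pipeline the mean and standard deviation would normally be computed on the training split only and reused for validation and test data, to avoid leaking information.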

Dataset

The dataset used in this project consists of transaction data with the following key attributes:

  • User Information: user_id, signup_time, purchase_time, device_id, age, sex
  • Transaction Details: purchase_value, ip_address, country
  • Labels: Binary target variable (class) indicating whether a transaction is fraudulent (1) or legitimate (0).

The dataset is split into:

  • Raw Data: Located in data/raw/
  • Processed Data: Located in data/processed/
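Before running any preprocessing, it can help to confirm that a raw file actually contains the attributes listed above. The sketch below uses only the standard library; the helper name `missing_columns` and the temporary sample file are hypothetical, and the column set is taken from the attribute list in this section.

```python
import csv
import tempfile
from pathlib import Path

# Column names as listed in the Dataset section.
EXPECTED_COLUMNS = {
    "user_id", "signup_time", "purchase_time", "device_id", "age", "sex",
    "purchase_value", "ip_address", "country", "class",
}


def missing_columns(csv_path) -> set:
    """Return the expected columns that are absent from the CSV header."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return EXPECTED_COLUMNS - set(header)


# Demonstration on a throwaway file with a deliberately incomplete header.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text(
        "user_id,signup_time,purchase_time,purchase_value,device_id\n"
        "1,2015-01-01 10:00:00,2015-01-03 12:00:00,34,ABC\n"
    )
    absent = missing_columns(sample)

print(sorted(absent))  # ['age', 'class', 'country', 'ip_address', 'sex']
```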

Folder Structure

advanced_fraud_detection/
├── README.md                      # Project overview and setup instructions
├── CONTRIBUTING.md                # Guidelines for contributing to the project
├── LICENSE                        # License file (MIT)
├── CHANGELOG.md                   # Tracks changes, updates, and version history
├── .gitignore                     # Specifies files and directories to ignore in version control
├── requirements.txt               # Lists Python dependencies for the project
├── requirements-dev.txt           # Lists development-specific dependencies (e.g., pytest, flake8)
├── environment.yml                # Conda environment configuration (optional, if using Conda)
├── pyproject.toml                 # Configuration for packaging and linting tools (e.g., Black, isort)
├── setup.py                       # Package setup file for distributing the project as a Python package
├── tests/                         # Unit tests and integration tests for the project
│   ├── unit/                      # Unit tests for individual components
│   └── integration/               # Integration tests for workflows and pipelines
├── src/                           # Source code for the project
│   ├── __init__.py                # Makes the src directory a Python package
│   ├── config.py                  # Configuration settings (e.g., file paths, hyperparameters)
│   ├── preprocessing.py           # Data cleaning, feature engineering, and transformation logic
│   ├── models.py                  # Model training, evaluation, and prediction logic
│   ├── utils.py                   # Helper/utility functions (e.g., logging, visualization)
│   └── pipeline.py                # End-to-end pipeline orchestration (data -> model -> deployment)
├── scripts/                       # Scripts for running workflows (e.g., data ingestion, model training)
│   ├── __init__.py                # Makes the scripts directory a Python package
│   ├── train_model.py             # Script to train and save the model
│   ├── evaluate_model.py          # Script to evaluate the model on test data
│   └── deploy_model.py            # Script for deploying the model (e.g., as an API)
├── notebooks/                     # Jupyter notebooks for exploratory data analysis (EDA) and experimentation
│   ├── EDA.ipynb                  # Exploratory data analysis notebook
│   ├── feature_engineering.ipynb  # Feature engineering experiments
│   └── model_experiments.ipynb    # Model training and evaluation experiments
├── data/                          # Raw, processed, and intermediate data
│   ├── raw/                       # Raw datasets (e.g., Fraud_Data.csv, creditcard.csv)
│   ├── processed/                 # Processed datasets after cleaning and feature engineering
│   └── interim/                   # Intermediate data files (optional, for debugging)
├── models/                        # Saved models and related artifacts
│   ├── trained_models/            # Final trained models (e.g., .pkl or .joblib files)
│   └── metrics/                   # Evaluation metrics (e.g., JSON or CSV files)
├── logs/                          # Logs for debugging and monitoring
│   ├── training_logs/             # Logs generated during model training
│   └── deployment_logs/           # Logs generated during model deployment
└── assets/                        # Static assets like images, diagrams, or visualizations

Installation

Prerequisites

  • Python 3.9+
  • Git

Steps

  1. Clone the repository:

       git clone https://github.com/Azazh/advanced_fraud_detection.git
       cd advanced_fraud_detection

  2. Install dependencies:

       pip install -r requirements.txt
       pip install -r requirements-dev.txt

  3. (Optional) Set up a Conda environment:

       conda env create -f environment.yml
       conda activate advanced_fraud_detection

Usage

Run the Preprocessing Pipeline

To preprocess the raw data and generate the processed dataset:

    python scripts/preprocess_data.py

Perform Exploratory Data Analysis (EDA)

Open the notebooks/EDA.ipynb notebook to analyze the dataset and visualize key insights.

Train a Model

To train a machine learning model:

    python scripts/train_model.py

Evaluate the Model

To evaluate the trained model:

    python scripts/evaluate_model.py
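This README does not say which metrics the evaluation script reports. Because fraud is the minority class, accuracy alone can look good while most fraud is missed, so precision, recall, and F1 on the fraud class are common choices. The dependency-free sketch below shows the arithmetic; the function name and the labels are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 8 legitimate and 2 fraudulent transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(accuracy, precision, recall, f1)  # 0.8 0.5 0.5 0.5
```

Here accuracy is 0.8 even though only one of the two fraudulent transactions was caught, which is why the fraud-class metrics are worth reporting alongside it.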

Key Findings

  1. Class Imbalance:

    • Fraudulent transactions account for 9.37% of the dataset.
    • Techniques like SMOTE or class weighting will be required during modeling.
  2. Geolocation Insights:

    • High fraud rates are observed in countries such as Nigeria, Russia, and Vietnam.
  3. Time-to-Action:

    • Fraudulent transactions have a much shorter time-to-action (673.29 hours) than legitimate transactions (1,370.01 hours), roughly half as long.
  4. Transaction Frequency Issue:

    • The transaction_frequency column currently shows 0.00 for all users, indicating a flaw in the calculation logic.
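Two of the findings above lend themselves to a short sketch: class weighting for the imbalance, and a count-based `transaction_frequency`. The weighting uses the common "balanced" heuristic, n_samples / (n_classes * class_count). For the frequency feature, counting rows per `device_id` is only an assumption about what the feature is meant to capture; the README does not say where the current calculation goes wrong. Both function names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def transaction_counts(rows, key="device_id"):
    """For each row, count how many rows share its key (assumed grouping)."""
    per_key = Counter(row[key] for row in rows)
    return [per_key[row[key]] for row in rows]


# Made-up labels: 9 legitimate transactions and 1 fraudulent one.
labels = [0] * 9 + [1]
weights = balanced_class_weights(labels)
print(weights[1])  # 5.0

# Made-up rows: two purchases share a device, one does not.
rows = [
    {"user_id": 1, "device_id": "A"},
    {"user_id": 2, "device_id": "A"},
    {"user_id": 3, "device_id": "B"},
]
frequency = transaction_counts(rows)
print(frequency)  # [2, 2, 1]
```

A quick sanity check for any replacement frequency feature is that it takes more than one distinct value across the dataset; a column that is constant for every row carries no signal for a model.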

Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature).
  3. Commit your changes (git commit -m "Add your feature").
  4. Push to the branch (git push origin feature/your-feature).
  5. Open a pull request.

For more details, refer to CONTRIBUTING.md.

Contact

For questions or feedback, feel free to reach out:

License

This project is licensed under the MIT License. See LICENSE for more details.

Owner

  • Login: Azazh
  • Kind: user

GitHub Events

Total
  • Push event: 3
  • Create event: 2
Last Year
  • Push event: 3
  • Create event: 2

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/unittests.yml actions
requirements.txt pypi
  • Markdown ==3.7
  • MarkupSafe ==3.0.2
  • Pygments ==2.19.1
  • Werkzeug ==3.1.3
  • absl-py ==2.1.0
  • asttokens ==3.0.0
  • astunparse ==1.6.3
  • certifi ==2025.1.31
  • charset-normalizer ==3.4.1
  • comm ==0.2.2
  • contourpy ==1.3.1
  • cycler ==0.12.1
  • debugpy ==1.8.13
  • decorator ==5.2.1
  • exceptiongroup ==1.2.2
  • executing ==2.2.0
  • flatbuffers ==25.2.10
  • fonttools ==4.56.0
  • gast ==0.6.0
  • google-pasta ==0.2.0
  • grpcio ==1.70.0
  • h5py ==3.13.0
  • idna ==3.10
  • ipykernel ==6.29.5
  • ipython ==8.33.0
  • jedi ==0.19.2
  • joblib ==1.4.2
  • jupyter_client ==8.6.3
  • jupyter_core ==5.7.2
  • keras ==3.9.0
  • kiwisolver ==1.4.8
  • libclang ==18.1.1
  • markdown-it-py ==3.0.0
  • matplotlib ==3.10.1
  • matplotlib-inline ==0.1.7
  • mdurl ==0.1.2
  • ml-dtypes ==0.4.1
  • namex ==0.0.8
  • nest-asyncio ==1.6.0
  • numpy ==2.0.2
  • opt_einsum ==3.4.0
  • optree ==0.14.1
  • packaging ==24.2
  • pandas ==2.2.3
  • parso ==0.8.4
  • patsy ==1.0.1
  • pexpect ==4.9.0
  • pillow ==11.1.0
  • platformdirs ==4.3.6
  • prompt_toolkit ==3.0.50
  • protobuf ==5.29.3
  • psutil ==7.0.0
  • ptyprocess ==0.7.0
  • pure_eval ==0.2.3
  • pyparsing ==3.2.1
  • python-dateutil ==2.9.0.post0
  • pytz ==2025.1
  • pyzmq ==26.2.1
  • requests ==2.32.3
  • rich ==13.9.4
  • scikit-learn ==1.6.1
  • scipy ==1.15.2
  • seaborn ==0.13.2
  • six ==1.17.0
  • stack-data ==0.6.3
  • statsmodels ==0.14.4
  • tensorboard ==2.18.0
  • tensorboard-data-server ==0.7.2
  • tensorflow ==2.18.0
  • tensorflow-io-gcs-filesystem ==0.37.1
  • termcolor ==2.5.0
  • threadpoolctl ==3.5.0
  • tornado ==6.4.2
  • traitlets ==5.14.3
  • typing_extensions ==4.12.2
  • tzdata ==2025.1
  • urllib3 ==2.3.0
  • wcwidth ==0.2.13
  • wrapt ==1.17.2