https://github.com/dadananjesha/credit-card-fraud-detection

Credit Card Fraud Detection is a state-of-the-art real-time streaming analytics solution designed to detect fraudulent credit card transactions instantly.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file: found
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity: low similarity (9.6%) to scientific vocabulary

Keywords

case-study credit-card fraud-detection fraud-prevention fraudulent-transactions iiit-bangalore kafka pyspark spark upgrad
Last synced: 5 months ago

Repository

Credit Card Fraud Detection is a state-of-the-art real-time streaming analytics solution designed to detect fraudulent credit card transactions instantly.

Basic Info
  • Host: GitHub
  • Owner: DadaNanjesha
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.6 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
case-study credit-card fraud-detection fraud-prevention fraudulent-transactions iiit-bangalore kafka pyspark spark upgrad
Created 12 months ago · Last pushed 12 months ago
Metadata Files
  • README
  • License

README.md

Credit Card Fraud Detection 🚀💳

Badges: Python Version · Spark Version · Kafka Version · License: MIT

Credit Card Fraud Detection is a state-of-the-art real-time streaming analytics solution designed to detect fraudulent credit card transactions instantly. By harnessing the power of Apache Spark, Kafka, and HBase, this project combines dynamic rule-based evaluation, geo-spatial analysis, and historical data enrichment to secure financial transactions.


📖 Table of Contents

  • Overview
  • Key Features
  • Architecture
  • Project Structure
  • Installation
  • Usage
  • Testing & Evaluation
  • Contributing
  • License
  • Acknowledgements
  • Support & Star

🔍 Overview

Fraudulent transactions are one of the biggest challenges facing financial institutions today. Our solution processes transaction data in real time, enriches it with geo-location and historical insights, and uses a smart rules engine to classify transactions as GENUINE or FRAUD. This project is built with scalability and modularity in mind, making it easy to extend and adapt for evolving fraud detection needs.


✨ Key Features

  • ⚡ Real-Time Streaming:
    Seamlessly consumes live transactions from Kafka using Spark Structured Streaming.

  • 📍 Geo-Spatial Analysis:
    Utilizes CSV-based mapping of ZIP codes to compute accurate distances and risk factors.

  • 🛡️ Dynamic Rule Engine:
    Evaluates transactions against thresholds (e.g., Upper Control Limit, credit score, travel speed) to flag anomalies; a minimal sketch follows this list.

  • 💾 HBase Integration:
    Efficiently retrieves and updates historical transaction data using a robust DAO module powered by HappyBase.

  • 🔧 Modular Design:
    Clean, organized code structure ensures ease of maintenance, scalability, and future enhancements.
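
To make the rule-based evaluation concrete, the sketch below illustrates the kind of threshold checks described above. It is an illustrative example only, not the project's rules.py: the function name, field names, and threshold values are hypothetical placeholders.

```python
# Illustrative rule-engine sketch (hypothetical field names and thresholds;
# the project's actual logic lives in rules/rules.py).
MIN_CREDIT_SCORE = 200   # assumed minimum acceptable credit score
MAX_SPEED_KMPH = 900     # assumed maximum physically plausible travel speed

def classify_transaction(amount, ucl, credit_score, distance_km, hours_since_last_txn):
    """Return 'FRAUD' if any rule is violated, otherwise 'GENUINE'."""
    # Rule 1: amount must not exceed the customer's Upper Control Limit (UCL).
    if amount > ucl:
        return "FRAUD"
    # Rule 2: customers with a very low credit score are flagged.
    if credit_score < MIN_CREDIT_SCORE:
        return "FRAUD"
    # Rule 3: the implied travel speed between consecutive transactions
    # must be physically plausible.
    if hours_since_last_txn > 0 and (distance_km / hours_since_last_txn) > MAX_SPEED_KMPH:
        return "FRAUD"
    return "GENUINE"
```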


🏗️ Architecture

```mermaid
flowchart TD
    A[📥 Kafka: Transaction Stream] --> B[⚡ Spark Streaming]
    B --> C[🔍 Data Parsing & Enrichment]
    C --> D[💾 HBase Lookup & Update]
    D --> E[🛡️ Rule Engine Evaluation]
    E --> F{💡 Transaction Status}
    F -- Genuine --> G[✅ Forward to downstream systems]
    F -- Fraudulent --> H[🚨 Alert & Monitor]
```

Workflow:

  1. Data Ingestion: Kafka streams live transaction data into Spark.
  2. Stream Processing: Spark parses, timestamps, and enriches the data with geo-location and historical HBase records.
  3. Rule Evaluation: The rule engine applies custom logic to determine if a transaction is genuine or fraudulent.
  4. Data Update & Monitoring: HBase is updated with the enriched transaction data, and the results are output in real time.
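
The skeleton below sketches what steps 1 and 2 typically look like with Spark Structured Streaming and Kafka. It is a hedged outline rather than the project's driver.py: the broker address, topic name, and JSON schema are assumptions, and the enrichment and rule logic are reduced to a placeholder.

```python
# Minimal Structured Streaming skeleton (assumed broker, topic, and schema;
# the real application is driver.py and needs the spark-sql-kafka package
# on the Spark classpath).
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("fraud-detection-sketch").getOrCreate()

# Assumed transaction schema; adjust to the actual Kafka message format.
schema = StructType([
    StructField("card_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("postcode", StringType()),
    StructField("transaction_dt", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
       .option("subscribe", "transactions")                   # assumed topic
       .load())

transactions = (raw
                .select(from_json(col("value").cast("string"), schema).alias("t"))
                .select("t.*"))

def process_batch(batch_df, batch_id):
    # Placeholder for HBase lookup, geo enrichment, and rule evaluation.
    batch_df.show(truncate=False)

query = transactions.writeStream.foreachBatch(process_batch).start()
query.awaitTermination()
```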

Tip: Replace the placeholder diagram above with your actual architecture image if available!


🗂️ Project Structure

```plaintext
Credit-card-fraud-detection/
├── data/
│   └── uszipsv.csv        # 📍 CSV mapping ZIP codes to geo-coordinates
├── db/
│   ├── dao.py             # 💾 HBase DAO for read/write operations
│   └── geo_map.py         # 🌍 Geo-spatial utilities for location calculations
├── rules/
│   └── rules.py           # 🛡️ Rule engine for fraud detection logic
├── driver.py              # 🚀 Main Spark streaming application
├── LogicFinal.pdf         # 📄 Detailed design explanation and architecture
├── requirements.txt       # 📦 List of Python dependencies
└── README.md              # 📖 Project documentation (this file)
```

Each module is organized to promote clean code, easy debugging, and straightforward enhancements.
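
As an example of what the geo-spatial utilities do, the ZIP-to-coordinates lookup in geo_map.py can be approximated as below. This is a hedged sketch, not the module's actual code: the column names in uszipsv.csv (zip, lat, lng) are assumptions and may differ from the real file.

```python
# Haversine-style distance lookup sketch (assumed CSV columns: zip, lat, lng).
import math
import pandas as pd

# Load the ZIP-to-coordinate mapping shipped in data/uszipsv.csv.
zips = pd.read_csv("data/uszipsv.csv", dtype={"zip": str}).set_index("zip")

def distance_km(zip_a, zip_b):
    """Great-circle distance in kilometres between two ZIP codes."""
    lat1, lon1 = map(math.radians, zips.loc[zip_a, ["lat", "lng"]])
    lat2, lon2 = map(math.radians, zips.loc[zip_b, ["lat", "lng"]])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))
```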


💻 Installation

Prerequisites

  • Python 3.8+
  • Apache Kafka (Ensure your Kafka broker is running)
  • Apache Spark (with Structured Streaming capabilities)
  • HBase (with HappyBase for Python)

Setup Steps

  1. Clone the Repository:

```bash
git clone https://github.com/yourusername/Credit-card-fraud-detection.git
cd Credit-card-fraud-detection
```

  2. Set Up a Virtual Environment:

```bash
python -m venv venv
source venv/bin/activate   # For Windows: venv\Scripts\activate
```

  3. Install Dependencies:

```bash
pip install -r requirements.txt
```

  4. Configure External Services:
    Ensure Kafka and HBase are up and running, and update connection settings in the code if needed.

🚀 Usage

Starting the Application

  1. Start Kafka & HBase:
    Make sure your Kafka broker and HBase server are active.

  2. Run the Application:

```bash
python driver.py
```

The application will:
  • Consume transactions from Kafka.
  • Enrich data with geo-location and historical insights from HBase.
  • Evaluate transactions using the rule engine.
  • Update HBase and display real-time transaction statuses on the console.

Monitoring

  • Console Output:
    Monitor real-time transaction statuses and alerts directly in your terminal.

  • HBase Shell:
    Use commands such as `list` and `scan 'look_up_table'` to inspect the updated data; a Python alternative is sketched below.
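
For programmatic inspection, the same table can be read through HappyBase, the library the DAO module is built on. This is a minimal sketch rather than the project's own DAO code; the host and Thrift port are assumptions and should be matched to your HBase setup (the table name comes from the README).

```python
import happybase

# Connect to HBase via the Thrift gateway (host/port are assumptions;
# HBase's Thrift server must be running, typically on port 9090).
connection = happybase.Connection(host="localhost", port=9090)
table = connection.table("look_up_table")  # table name taken from the README

# Print a handful of rows to verify that the streaming job is updating state.
for row_key, columns in table.scan(limit=5):
    print(row_key.decode(), {k.decode(): v.decode() for k, v in columns.items()})

connection.close()
```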


🔍 Testing & Evaluation

  • Simulated Transactions:
    Test the entire pipeline using simulated data or test streams from Kafka.

  • Performance Metrics:
    Extend the evaluation framework with metrics such as ROC-AUC, precision, recall, and F1-score for a more comprehensive analysis; a minimal scoring sketch follows this list.

  • Deep Dive Documentation:
    Refer to LogicFinal.pdf for an in-depth explanation of the design and processing flow.
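
As a starting point for such an evaluation, offline labels and predictions can be scored with scikit-learn. This is a hedged sketch under the assumption that you have collected ground-truth labels alongside the rule engine's FRAUD/GENUINE decisions; scikit-learn is not in requirements.txt and would need to be installed separately.

```python
# Minimal offline-evaluation sketch (assumes scikit-learn is installed:
# pip install scikit-learn). y_true and y_pred are placeholders for labels
# collected from a test stream, encoded as 1 = FRAUD, 0 = GENUINE.
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

y_true = [0, 0, 1, 0, 1, 0, 0, 1]   # ground-truth labels (example data)
y_pred = [0, 0, 1, 0, 0, 0, 1, 1]   # rule-engine decisions (example data)

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
# ROC-AUC is most meaningful with probability scores; with hard 0/1
# decisions it reduces to a single operating point.
print("ROC-AUC:  ", roc_auc_score(y_true, y_pred))
```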


🤝 Contributing

We welcome contributions to improve this project! Here’s how you can get involved:

  1. Fork the Repository
  2. Create a Feature Branch:

```bash
git checkout -b feature/your-feature-name
```

  3. Commit Your Changes:

```bash
git commit -m "Add feature or fix issue"
```

  4. Push the Branch and Open a Pull Request:
    Provide a detailed description of your changes for review.

📜 License

This project is licensed under the MIT License. See the LICENSE file for more details.


🙏 Acknowledgements

  • Inspiration & Guidance:
    A huge thank you to upGrad Education and the open-source community for their continuous support and inspiration.

  • Core Technologies:
    Special thanks to the teams behind Apache Kafka, Apache Spark, HBase, and HappyBase.

  • Community:
    We appreciate all the contributors who have helped improve this project.


⭐️ Support & Star

If you find this project useful, please consider starring it on GitHub, following the repository for updates, or forking it to contribute your improvements. Your support helps us continue to build and share valuable insights!

Happy coding and safe transactions! 🚀💳

Owner

  • Name: DADA NANJESHA
  • Login: DadaNanjesha
  • Kind: user
  • Location: BERLIN

GitHub Events

Total
  • Watch event: 1
  • Push event: 5
  • Pull request event: 6
  • Create event: 3
Last Year
  • Watch event: 1
  • Push event: 5
  • Pull request event: 6
  • Create event: 3

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
  • Issue authors: (none listed)
  • Pull request authors: DadaNanjesha (3)
Top Labels
  • Issue labels: (none listed)
  • Pull request labels: (none listed)

Dependencies

requirements.txt pypi
  • happybase >=1.2.0
  • kafka-python >=2.0.0
  • pandas >=1.0.0
  • pyspark >=3.0.0