https://github.com/dadananjesha/credit-card-fraud-detection
Credit Card Fraud Detection is a state-of-the-art real-time streaming analytics solution designed to detect fraudulent credit card transactions instantly.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.6%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Credit Card Fraud Detection 🚀💳
Credit Card Fraud Detection is a real-time streaming analytics solution that classifies credit card transactions as they arrive. Built on Apache Spark, Kafka, and HBase, it combines rule-based evaluation, geo-spatial analysis, and historical data enrichment to flag fraudulent transactions.
📖 Table of Contents
- Overview
- Key Features
- Architecture
- Project Structure
- Installation
- Usage
- Testing & Evaluation
- Contributing
- License
- Acknowledgements
🔍 Overview
Fraudulent transactions are one of the biggest challenges facing financial institutions today. Our solution processes transaction data in real time, enriches it with geo-location and historical insights, and uses a smart rules engine to classify transactions as GENUINE or FRAUD. This project is built with scalability and modularity in mind, making it easy to extend and adapt for evolving fraud detection needs.
✨ Key Features
- ⚡ Real-Time Streaming: Consumes live transactions from Kafka using Spark Structured Streaming.
- 📍 Geo-Spatial Analysis: Uses a CSV mapping of ZIP codes to geo-coordinates to compute distances between transaction locations and derive risk factors.
- 🛡️ Dynamic Rule Engine: Evaluates transactions against thresholds (e.g., Upper Control Limit, credit score, speed) to flag anomalies.
- 💾 HBase Integration: Retrieves and updates historical transaction data through a DAO module built on HappyBase.
- 🔧 Modular Design: A clean, organized code structure keeps the project easy to maintain, scale, and extend.
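To make the geo-spatial feature concrete, here is a minimal sketch of how the distance between two ZIP-code coordinates and the travel speed implied by two consecutive transactions could be computed. The haversine formula, the function names, and the sample coordinates are assumptions chosen for illustration; the repository's own logic lives in db/geo_map.py and may differ.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def implied_speed_kmh(prev_point, prev_time, cur_point, cur_time):
    """Speed a cardholder would need to travel between two consecutive transactions."""
    distance = haversine_km(*prev_point, *cur_point)
    if distance == 0:
        return 0.0  # same location: no travel required
    hours = (cur_time - prev_time).total_seconds() / 3600
    if hours <= 0:
        return float("inf")  # different places at the same instant: impossible travel
    return distance / hours


# Hypothetical example: two points one degree of longitude apart on the equator,
# with transactions one hour apart (roughly 111 km/h).
speed = implied_speed_kmh(
    (0.0, 0.0), datetime(2024, 1, 1, 12, 0),
    (0.0, 1.0), datetime(2024, 1, 1, 13, 0),
)
```

A speed far above what any vehicle could achieve suggests the card was used in two places by different people, which is the kind of signal a speed threshold is meant to catch.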
🏗️ Architecture
mermaid
flowchart TD
A[📥 Kafka: Transaction Stream] --> B[⚡ Spark Streaming]
B --> C[🔍 Data Parsing & Enrichment]
C --> D[💾 HBase Lookup & Update]
D --> E[🛡️ Rule Engine Evaluation]
E --> F{💡 Transaction Status}
F -- Genuine --> G[✅ Forward to downstream systems]
F -- Fraudulent --> H[🚨 Alert & Monitor]
Workflow:
Data Ingestion:
- Kafka streams live transaction data into Spark.
Stream Processing:
- Spark parses, timestamps, and enriches data with geo-location and historical HBase records.
Rule Evaluation:
- The rule engine applies custom logic to determine if a transaction is genuine or fraudulent.
Data Update & Monitoring:
- HBase is updated with enriched transaction data, and the results are output in real time.
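The rule evaluation step can be sketched as a small pure function. This is only an illustration under stated assumptions: the threshold values, the `CardProfile` type, and the `classify` function are hypothetical, and the actual rules are defined in rules/rules.py.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; the real values belong to rules/rules.py.
MIN_CREDIT_SCORE = 200
MAX_SPEED_KMH = 900.0


@dataclass
class CardProfile:
    ucl: float         # Upper Control Limit for this card's transaction amounts
    credit_score: int  # latest known credit score of the cardholder


def classify(amount, speed_kmh, profile):
    """Return "GENUINE" only if the transaction passes every rule, else "FRAUD"."""
    checks = (
        amount <= profile.ucl,                     # amount within the card's usual range
        profile.credit_score >= MIN_CREDIT_SCORE,  # cardholder is not high-risk
        speed_kmh <= MAX_SPEED_KMH,                # travel since last transaction is plausible
    )
    return "GENUINE" if all(checks) else "FRAUD"


profile = CardProfile(ucl=1000.0, credit_score=650)
status = classify(amount=120.0, speed_kmh=40.0, profile=profile)
```

Keeping the rules free of Spark and HBase dependencies makes them easy to unit test with plain values before wiring them into the stream.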
🗂️ Project Structure
plaintext
Credit-card-fraud-detection/
├── data/
│ └── uszipsv.csv # 📍 CSV mapping ZIP codes to geo-coordinates
├── db/
│ ├── dao.py # 💾 HBase DAO for read/write operations
│ └── geo_map.py # 🌍 Geo-spatial utilities for location calculations
├── rules/
│ └── rules.py # 🛡️ Rule engine for fraud detection logic
├── driver.py # 🚀 Main Spark streaming application
├── LogicFinal.pdf # 📄 Detailed design explanation and architecture
├── requirements.txt # 📦 List of Python dependencies
└── README.md # 📖 Project documentation (this file)
Each module is organized to promote clean code, easy debugging, and straightforward enhancements.
💻 Installation
Prerequisites
- Python 3.8+
- Apache Kafka (Ensure your Kafka broker is running)
- Apache Spark (with Structured Streaming capabilities)
- HBase (with HappyBase for Python)
Setup Steps
- Clone the Repository:
bash
git clone https://github.com/dadananjesha/credit-card-fraud-detection.git
cd credit-card-fraud-detection
- Set Up a Virtual Environment:
bash
python -m venv venv
source venv/bin/activate # For Windows: venv\Scripts\activate
- Install Dependencies:
bash
pip install -r requirements.txt
- Configure External Services:
Ensure Kafka and HBase are up and running, and update connection settings in the code if needed.
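One way to avoid editing connection settings directly in the code is to read them from environment variables. The sketch below assumes this approach; the variable names, the `ServiceConfig` type, and the topic name are hypothetical and not part of the repository. The defaults are the usual local ports for a Kafka broker (9092) and the HBase Thrift server used by HappyBase (9090).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    kafka_bootstrap: str
    kafka_topic: str
    hbase_host: str
    hbase_port: int


def load_config(env=None):
    """Build connection settings from environment variables, falling back to local defaults."""
    env = os.environ if env is None else env
    return ServiceConfig(
        kafka_bootstrap=env.get("KAFKA_BOOTSTRAP", "localhost:9092"),
        kafka_topic=env.get("KAFKA_TOPIC", "transactions"),  # hypothetical topic name
        hbase_host=env.get("HBASE_HOST", "localhost"),
        hbase_port=int(env.get("HBASE_PORT", "9090")),
    )


config = load_config()
```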
🚀 Usage
Starting the Application
- Start Kafka & HBase: Make sure your Kafka broker and HBase server are active.
- Run the Application:
bash
python driver.py
The application will:
- Consume transactions from Kafka.
- Enrich data with geo-location and historical insights from HBase.
- Evaluate transactions using the rule engine.
- Update HBase and display real-time transaction statuses on the console.
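For readers unfamiliar with Spark Structured Streaming, the consume-and-display part of this flow might look roughly like the sketch below. It is not the code in driver.py: the function names are hypothetical, and the caller is assumed to pass in an existing `SparkSession` along with a broker address and topic of their own.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",  # only read messages that arrive after startup
    }


def read_transactions(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame holding each Kafka message value as a JSON string."""
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka delivers the payload as bytes in the `value` column.
    return raw.selectExpr("CAST(value AS STRING) AS json")


def run_console_sink(stream_df):
    """Print each micro-batch to the console and block until the query stops."""
    query = stream_df.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
```

Note that Spark's Kafka source ships as a separate connector (the `spark-sql-kafka-0-10` package), so it has to be available on the Spark classpath for `format("kafka")` to resolve.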
Monitoring
- Console Output: Monitor real-time transaction statuses and alerts directly in your terminal.
- HBase Shell: Use commands like `list` and `scan 'look_up_table'` to inspect updated data.
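The same table can also be inspected from Python through HappyBase. The following is a minimal sketch, assuming the HBase Thrift server is running on its default port; the helper names are hypothetical and are not taken from db/dao.py.

```python
def fetch_row(table, row_key):
    """Read one HBase row and decode its byte keys and values to strings."""
    raw = table.row(row_key.encode("utf-8"))  # HappyBase returns a dict of bytes -> bytes
    return {col.decode("utf-8"): val.decode("utf-8") for col, val in raw.items()}


def open_lookup_table(host="localhost", port=9090):
    """Connect through the HBase Thrift server and return a handle to look_up_table."""
    import happybase  # imported lazily so fetch_row works without HappyBase installed

    return happybase.Connection(host=host, port=port).table("look_up_table")
```

Passing the table handle into `fetch_row` rather than creating the connection inside it lets the decoding logic be tested with a stand-in object.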
🔍 Testing & Evaluation
- Simulated Transactions: Test the entire pipeline using simulated data or test streams from Kafka.
- Performance Metrics: Extend the evaluation framework with metrics like ROC-AUC, precision, recall, and F1-score for a comprehensive analysis.
- Deep Dive Documentation: Refer to LogicFinal.pdf for an in-depth explanation of the design and processing flow.
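As a starting point for the performance metrics mentioned above, precision, recall, and F1-score can be computed from labelled test transactions with a few lines of plain Python. This is a sketch under the assumption that both the ground truth and the pipeline output use the GENUINE and FRAUD labels; the function name is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive="FRAUD"):
    """Compute precision, recall, and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # fraud correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # genuine flagged as fraud
    fn = sum(t == positive and p != positive for t, p in pairs)  # fraud that slipped through
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


actual = ["FRAUD", "FRAUD", "GENUINE", "GENUINE"]
predicted = ["FRAUD", "GENUINE", "FRAUD", "GENUINE"]
precision, recall, f1 = precision_recall_f1(actual, predicted)
```

Because fraud is usually rare, these class-specific metrics are more informative than overall accuracy, which a model can inflate by labelling everything GENUINE.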
🤝 Contributing
We welcome contributions to improve this project! Here’s how you can get involved:
- Fork the Repository
- Create a Feature Branch:
bash
git checkout -b feature/your-feature-name
- Commit Your Changes:
bash
git commit -m "Add feature or fix issue"
- Push the Branch and Open a Pull Request:
Provide a detailed description of your changes for review.
📜 License
This project is licensed under the MIT License. See the LICENSE file for more details.
🙏 Acknowledgements
- Inspiration & Guidance: A huge thank you to upGrad Education and the open-source community for their continuous support and inspiration.
- Core Technologies: Special thanks to the teams behind Apache Kafka, Apache Spark, HBase, and HappyBase.
- Community: We appreciate all the contributors who have helped improve this project.
⭐️ Support & Star
If you find this project useful, please consider starring it on GitHub, following the repository for updates, or forking it to contribute your improvements. Your support helps us continue to build and share valuable insights!
Happy coding and safe transactions! 🚀💳
Owner
- Name: DADA NANJESHA
- Login: DadaNanjesha
- Kind: user
- Location: BERLIN
- Repositories: 1
- Profile: https://github.com/DadaNanjesha
GitHub Events
Total
- Watch event: 1
- Push event: 5
- Pull request event: 6
- Create event: 3
Last Year
- Watch event: 1
- Push event: 5
- Pull request event: 6
- Create event: 3
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- DadaNanjesha (3)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- happybase >=1.2.0
- kafka-python >=2.0.0
- pandas >=1.0.0
- pyspark >=3.0.0