https://github.com/captaincodercool/real-time-fraud-detection-pipeline

Detect credit card fraud in real time using a big data pipeline built on Kafka, Spark Streaming, Cassandra, and ML models. The project simulates transactions and applies a classifier to flag suspicious activity. It is designed for scalability and low-latency fraud detection.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (7.1%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: CAPTAINCODERCOOL
License: apache-2.0
Language: Scala
Default Branch: main
Size: 11.8 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 11 months ago · Last pushed 11 months ago

Metadata Files

Readme License

README.md

Real-Time Credit Card Fraud Detection Pipeline

This project demonstrates a real-time big data pipeline that detects fraudulent credit card transactions. It integrates Apache Kafka, Apache Spark Streaming, Cassandra, and ML models to simulate, ingest, process, and classify transaction data in real time.

🚀 Architecture

Figure: pipeline architecture diagram.

Simulate 100 customer profiles and 10,000+ transaction records.
Load data into Apache Cassandra using Spark SQL.
Train a Random Forest classifier using Spark MLlib.
Use Apache Kafka to stream new transaction data.
Run Spark Streaming jobs to classify live transactions as fraud or not.
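
The simulation step at the top of the list above can be sketched in plain Python. This is a minimal illustration, not the repository's own generator: the field names (`customer_id`, `amount`, `is_fraud`), the 2% fraud rate, and the amount ranges are all assumptions chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int  # hypothetical field name


@dataclass
class Transaction:
    transaction_id: int
    customer_id: int
    amount: float
    is_fraud: int  # 1 = fraudulent, 0 = legitimate (assumed label encoding)


def simulate(num_customers=100, num_transactions=10_000, fraud_rate=0.02, seed=42):
    """Generate synthetic customers and labeled transactions.

    The fraud rate and amount ranges are illustrative assumptions only.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    customers = [Customer(customer_id=i) for i in range(num_customers)]
    transactions = []
    for txn_id in range(num_transactions):
        customer = rng.choice(customers)
        is_fraud = 1 if rng.random() < fraud_rate else 0
        # Assumed rule: fraudulent transactions draw from a higher amount range.
        amount = rng.uniform(500, 5000) if is_fraud else rng.uniform(1, 300)
        transactions.append(
            Transaction(txn_id, customer.customer_id, round(amount, 2), is_fraud)
        )
    return customers, transactions
```

Calling `simulate()` with its defaults yields 100 customers and 10,000 transactions, matching the counts in the first step. The records could then be written to CSV with the standard `csv` module before loading them into Cassandra.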

🧠 Technologies Used

Apache Spark (MLlib + SQL + Streaming)
Apache Kafka
Apache Cassandra
Python (pyspark, kafka-python)
Random Forest Classifier

📂 File Structure

Cassandra Keyspace.cql: Cassandra schema setup
customer.csv, transaction_training.csv, transaction_testing.csv: Simulated data
Cassandra Python/: Contains ingestion and model training scripts
src/: Contains Kafka producers and architecture diagram

⚙️ How to Run

Set up Kafka, Spark, and Cassandra locally or via Docker.
Load datasets into Cassandra.
Train the ML models.
Start Kafka producers for streaming transactions.
Run the Spark Streaming job to detect fraud.
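
As a rough illustration of the producer step above, the sketch below serializes transactions to JSON and hands them to a send function. It assumes the kafka-python client listed under Technologies Used; the repository's own producers may be implemented differently. The helper names, the JSON encoding, and the `localhost:9092` broker address are hypothetical.

```python
import json
from typing import Callable, Iterable, Mapping


def encode_transaction(txn: Mapping) -> bytes:
    """Serialize one transaction as UTF-8 JSON (assumed wire format)."""
    return json.dumps(txn, sort_keys=True).encode("utf-8")


def stream_transactions(txns: Iterable[Mapping], send: Callable[[bytes], None]) -> int:
    """Send every transaction through `send` and return how many were sent."""
    count = 0
    for txn in txns:
        send(encode_transaction(txn))
        count += 1
    return count


def make_kafka_sender(topic: str, bootstrap_servers: str = "localhost:9092"):
    """Build a (send, flush) pair backed by kafka-python.

    Imported lazily so the helpers above work without a broker or the library.
    The broker address is an assumption for a local setup.
    """
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)

    def send(payload: bytes) -> None:
        producer.send(topic, value=payload)

    return send, producer.flush
```

With a broker running, `send, flush = make_kafka_sender("transactions")` followed by `stream_transactions(rows, send)` and `flush()` would publish the rows; the topic name `transactions` is a placeholder. Passing `send` as a parameter keeps the serialization logic testable without Kafka.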

📈 Output

Fraud classification output printed/logged in real-time.
Model files and logs stored for reuse and analysis.
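
One way the printed or logged output described above could look is sketched here. Spark MLlib classifiers emit a numeric `prediction` column; this example assumes that 1.0 marks fraud. The log line format and the function names are hypothetical.

```python
import logging

logger = logging.getLogger("fraud_alerts")  # hypothetical logger name


def label_prediction(prediction: float) -> str:
    """Map a numeric prediction to a label, assuming 1.0 means fraud."""
    return "FRAUD" if prediction >= 0.5 else "OK"


def format_result(transaction_id: int, amount: float, prediction: float) -> str:
    """Render one classified transaction as a single log line."""
    label = label_prediction(prediction)
    return f"txn={transaction_id} amount={amount:.2f} label={label}"


def log_result(transaction_id: int, amount: float, prediction: float) -> str:
    """Log fraud at WARNING level and everything else at INFO; return the line."""
    line = format_result(transaction_id, amount, prediction)
    if label_prediction(prediction) == "FRAUD":
        logger.warning(line)
    else:
        logger.info(line)
    return line
```

Logging fraud at a higher level than legitimate traffic lets a handler or log filter surface alerts without discarding the full classification record.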

🧑‍💻 Author

Built with 💡 by CAPTAINCODERCOOL

📄 License

MIT License

Owner

Login: CAPTAINCODERCOOL
Kind: user

Repositories: 1
Profile: https://github.com/CAPTAINCODERCOOL

GitHub Events

Total

Push event: 1

Last Year

Push event: 1

Dependencies

Creditcard Producer/pom.xml maven

com.google.code.gson:gson 2.8.2
com.twitter:algebird-core_2.11 0.12.0
com.typesafe:config 1.3.3
io.confluent:kafka-avro-serializer 3.3.1
log4j:log4j 1.2.17
org.apache.commons:commons-csv 1.1
org.apache.kafka:kafka-clients 1.1.0
org.scala-lang:scala-library 2.11.8
org.scala-tools:maven-scala-plugin 2.15.2
org.scalatest:scalatest_2.11 2.2.5
junit:junit 4.4 test
org.scala-tools.testing:specs 1.6.2.2_1.5.0 test

Fraud Alert Dashboard/pom.xml maven

junit:junit
log4j:log4j 1.2.17
org.springframework.boot:spring-boot-starter-data-cassandra
org.springframework.boot:spring-boot-starter-websocket

Fraud Detection/pom.xml maven

com.databricks:spark-csv_2.11 1.5.0
com.datastax.cassandra:cassandra-driver-core 3.3.2
com.datastax.spark:spark-cassandra-connector_2.11 2.0.7
com.twitter:algebird-core_2.11 0.12.0
com.twitter:jsr166e 1.1.0
com.typesafe:config 1.3.3
log4j:log4j 1.2.17
org.apache.hadoop:hadoop-client 2.7.2
org.apache.kafka:kafka-clients 0.10.0.1
org.apache.spark:spark-core_2.11 2.2.1
org.apache.spark:spark-mllib_2.11 2.2.1
org.apache.spark:spark-sql-kafka-0-10_2.11 2.2.0
org.apache.spark:spark-sql_2.11 2.2.1
org.apache.spark:spark-streaming-kafka-0-10_2.11 2.2.1
org.scala-lang:scala-library 2.11.8
org.scalatest:scalatest_2.11 2.2.5
junit:junit 4.4 test
org.scala-tools.testing:specs 1.6.2.2_1.5.0 test
