Updated 9 months ago
https://github.com/dadananjesha/spark-streaming
Spark Streaming KPI Processing is a real-time data processing application built using Apache Spark Streaming.
Updated 9 months ago
https://github.com/dadananjesha/redshift-etl-project
The project covers the complete data pipeline: importing data from an RDS source into HDFS using Sqoop, processing the data with Spark, and executing analytical queries on an AWS Redshift cluster.
Updated 9 months ago
https://github.com/rumbledb/rumble
⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Updated 9 months ago
https://github.com/a-imantha/mahout-tutorial
Tutorial: Building a Recommender with Apache Mahout on Amazon Elastic MapReduce (EMR)