Data processing using Docker containers, Kafka, Spark, and Hadoop
Updated Aug 17, 2018
This project analyzes 10 years of U.S. domestic airline data (~3GB) using Hadoop (Cloudera) and Hive for data processing. Power BI dashboards visualize key metrics like delays, on-time rates, air time, and diversions. The solution includes Hive queries, DAX measures, HDFS ingestion scripts, and year-wise insights with recommendations.
Mini project for a final-year big data elective, using Cloudera's Hadoop framework with Pig and Hive, plus data visualization in Tableau connected to Hadoop through the Hive ODBC driver.
Java-MapReduce using Cloudera Hadoop. Big Data Ecosystem, HDFS, MapReduce
A simple Spark SQL tutorial
A simple Apache Spark tutorial