This project involves the development of a Python-based data processing application to analyze weather data from National Climatic Data Center (NCDC) records. In the first part, a Mapper and Reducer application calculates the average wind direction for each observation month of each year, discarding missing readings and keeping only good-quality values. The second part extends the analysis to PySpark, where a Python application determines the range of sky ceiling height for each USAF weather station ID, leveraging Spark's distributed computing capabilities. Together, these tasks contribute to a holistic understanding of weather patterns and conditions over time.
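To make the PySpark step concrete, a minimal sketch is shown below. It assumes the standard fixed-width NCDC record layout (USAF station ID in columns 5-10, sky ceiling height in columns 71-75 with 99999 as the missing value, followed by a one-character quality code); the paths, field offsets, and names are illustrative assumptions, not the project's actual code.

from pyspark import SparkContext

sc = SparkContext(appName="CeilingHeightRange")

def parse(line):
    # Guard against short or malformed records before slicing fixed-width fields.
    if len(line) < 76:
        return
    usaf_id = line[4:10]      # USAF weather station ID, columns 5-10
    ceiling = line[70:75]     # sky ceiling height, columns 71-75
    quality = line[75:76]     # quality code for the ceiling reading
    # Skip missing readings (99999) and keep only good-quality codes.
    if ceiling != "99999" and quality in "01459":
        yield (usaf_id, int(ceiling))

readings = sc.textFile("hdfs:///home/44student44/CourseProjectData").flatMap(parse)
# Track (max, min) per station in parallel, then take the difference as the range.
ranges = (readings
          .map(lambda kv: (kv[0], (kv[1], kv[1])))
          .reduceByKey(lambda a, b: (max(a[0], b[0]), min(a[1], b[1])))
          .mapValues(lambda mm: mm[0] - mm[1]))
ranges.saveAsTextFile("hdfs:///home/44student44/ceilingRangeOutput")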
Continuing with the project's scope, the third part introduces a Mapper and Reducer application that extracts the USAF weather station ID and visibility distance from NCDC records and writes this information to a text file. The fourth part shifts the processing to Pig, where the text file is loaded and the range of visibility distance for each weather station is calculated. Lastly, the fifth part uses Hive to load the text file and compute the average visibility distance for each USAF weather station ID. This multi-step approach showcases the versatility of different tools and frameworks in handling diverse aspects of weather data analysis, providing valuable insights for further research and decision-making.
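As an illustration of the extraction step in the third part, a minimal mapper sketch follows. The offsets again assume the standard fixed-width NCDC layout (visibility distance in columns 79-84, missing value 999999, followed by a quality code), and the script name is hypothetical; paired with a pass-through reducer, it produces the tab-separated text file consumed by the Pig and Hive steps.

#!/usr/bin/env python
# visibility_map.py (illustrative name): emits "USAF_ID<TAB>visibility" pairs.
import sys

for line in sys.stdin:
    record = line.strip()
    # Guard against short or malformed records before slicing fixed-width fields.
    if len(record) < 85:
        continue
    usaf_id = record[4:10]       # USAF weather station ID, columns 5-10
    visibility = record[78:84]   # visibility distance, columns 79-84
    quality = record[84:85]      # quality code for the visibility reading
    # Skip missing readings (999999) and keep only good-quality codes.
    if visibility != "999999" and quality in "01459":
        print("%s\t%d" % (usaf_id, int(visibility)))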
Yearly Temperature Analysis using Hadoop, Pig, and Hive

This repository contains the solution for the course project, which involves retrieving and analyzing temperature data from NCDC records using a combination of Hadoop, Pig, and Hive.

Project Overview:
The project is divided into three parts:
Python Application with Hadoop Streaming:
A Python mapper and reducer are developed to extract the year and temperature data from NCDC records. The data is processed in Hadoop, and the results are written to an output file.
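The scripts might look like the following minimal sketch, which assumes the standard fixed-width NCDC layout (year in columns 16-19, air temperature in columns 88-92 with +9999 as the missing value, followed by a quality code); the repository's actual temperature_map.py and temperature_reduce.py may differ.

#!/usr/bin/env python
# temperature_map.py (sketch): emits "year<TAB>temperature" for valid readings.
import sys

for line in sys.stdin:
    record = line.strip()
    # Guard against short or malformed records before slicing fixed-width fields.
    if len(record) < 93:
        continue
    year = record[15:19]      # observation year, columns 16-19
    temp = record[87:92]      # signed air temperature (tenths of a degree C), columns 88-92
    quality = record[92:93]   # quality code for the temperature reading
    # Skip missing readings (+9999) and keep only good-quality codes.
    if temp != "+9999" and quality in "01459":
        print("%s\t%d" % (year, int(temp)))

#!/usr/bin/env python
# temperature_reduce.py (sketch): since Pig and Hive perform the aggregation,
# the reducer can simply pass each year/temperature pair through to the output file.
import sys

for line in sys.stdin:
    print(line.strip())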
Data Processing with Pig:
The output file is loaded into Apache Pig to calculate the highest and lowest temperatures for each year.
Data Querying with Hive:
The output file is loaded into Hive to calculate the average temperature for each year.

Contents:
Python Mapper and Reducer Scripts: These scripts are used to extract and process temperature data in a Hadoop environment.
Execution Commands: Commands for running the Hadoop job, Pig script, and Hive queries are provided.
Output Files: Includes sample output of the Hadoop, Pig, and Hive queries.
Screenshots: Screenshots showing the creation of the output files and the final results from the Pig and Hive queries.

Execution Steps:

Copy Data to HDFS:
hdfs dfs -copyFromLocal /home/student44/CourseProjectData /home/44student44/

Run Hadoop Job:
hadoop jar hadoop-streaming-2.9.0.jar -input /home/44student44/CourseProjectData -output /home/44student44/projectOutput -file temperature_map.py -mapper temperature_map.py -file temperature_reduce.py -reducer temperature_reduce.py

Pig Script Execution:
Load the data, group by year, and compute the maximum and minimum temperatures.
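A minimal Pig Latin sketch of this step, assuming the Hadoop output is tab-separated year/temperature pairs; the aliases and path are illustrative, not the repository's actual script:

-- Load year/temperature pairs produced by the Hadoop Streaming job.
records = LOAD '/home/44student44/projectOutput' USING PigStorage('\t')
          AS (year:chararray, temp:int);
grouped = GROUP records BY year;
-- Highest and lowest temperature per year.
extremes = FOREACH grouped GENERATE group AS year,
                                    MAX(records.temp) AS highest,
                                    MIN(records.temp) AS lowest;
DUMP extremes;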
Hive Queries:
Load the data into Hive and compute the average temperature for each year.
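A minimal HiveQL sketch of this step; the table name and load path are illustrative, and the repository's actual queries may differ:

-- Table over the tab-separated year/temperature pairs.
CREATE TABLE yearly_temps (year STRING, temp INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA INPATH '/home/44student44/projectOutput/part-00000' INTO TABLE yearly_temps;

-- NCDC temperatures are stored in tenths of a degree Celsius,
-- so divide by 10 if degrees are wanted.
SELECT year, AVG(temp) AS avg_temp
FROM yearly_temps
GROUP BY year;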