This project takes the Greater Houston Area and uses data munging techniques to assess the quality of its data for validity, accuracy, completeness, consistency and uniformity. The project covers the following steps:
- Audited the dataset (800+ MB, XML format) for the Greater Houston Area
- Fixed street names and deleted problematic nodes and ways
- Found the most popular cuisine and religion in Houston using SQLite
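
As a rough illustration of the street-name fix listed above, here is a minimal sketch. The mapping and the function name `fix_street_name` are assumptions made for this example; the abbreviations actually corrected in the project may differ.

```python
import re

# Hypothetical mapping from abbreviated street types to full names.
# The real project may use a different (likely longer) mapping.
STREET_TYPE_MAPPING = {
    "St": "Street",
    "St.": "Street",
    "Rd": "Road",
    "Rd.": "Road",
    "Ave": "Avenue",
    "Blvd": "Boulevard",
    "Dr": "Drive",
}

# Matches the last whitespace-delimited token of a street name.
STREET_TYPE_RE = re.compile(r"\S+$")


def fix_street_name(name, mapping=STREET_TYPE_MAPPING):
    """Expand an abbreviated trailing street type, if it is in the mapping."""
    name = name.strip()
    match = STREET_TYPE_RE.search(name)
    if match is None:
        return name  # empty string: nothing to fix
    street_type = match.group()
    if street_type in mapping:
        # Keep everything before the last token and swap in the full name.
        return name[:match.start()] + mapping[street_type]
    return name
```

For example, `fix_street_name("Westheimer Rd")` returns `"Westheimer Road"`, while a name that already ends in a full street type is returned unchanged.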
In detail, the dataset was cleaned by fixing street names and deleting some problematic nodes and ways. SQL queries then produced some Houston-related insights, such as the most popular cuisine, the most popular religion and the busiest streets (those with the most merchants).
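
A query for an insight like the most popular cuisine might look like the sketch below. It assumes a table named `nodes_tags` with `key` and `value` columns; that schema, the helper name `top_tag_values`, and the database filename in the usage note are assumptions for illustration, not necessarily what this project uses.

```python
import sqlite3


def top_tag_values(conn, key, limit=5):
    """Return (value, count) pairs for the most common values of a tag key.

    Assumes a table nodes_tags(id, key, value, type); adjust to the real schema.
    """
    query = """
        SELECT value, COUNT(*) AS num
        FROM nodes_tags
        WHERE key = ?
        GROUP BY value
        ORDER BY num DESC, value ASC
        LIMIT ?
    """
    return conn.execute(query, (key, limit)).fetchall()
```

With a database file such as a hypothetical `houston.db`, calling `top_tag_values(sqlite3.connect("houston.db"), "cuisine")` would list the most frequent cuisine tags, and passing `"religion"` as the key would do the same for religion.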
The data is from OpenStreetMap. You can download the Greater Houston Area data that I used here.
This project is part of the Udacity Data Analyst Nanodegree.