Japan Visa Analysis: Azure End to End Data Engineering 🌐

This project provides end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The Spark master-worker cluster is set up in a Docker container on Azure.

📝 Table of Contents

System Architecture
Setup & Requirements
Usage
Features
Notes
Video

System Architecture

🛠 Setup & Requirements

Azure Account: Ensure you have an active Azure account.
Docker: The Spark master-worker architecture is set up in a Docker container on Azure.
Python Libraries: Install the required Python libraries:
- PySpark
- Plotly Express
- pycountry
- pycountry_convert
- fuzzywuzzy
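
Before running the script, it can help to confirm that these libraries are importable. The sketch below is a hypothetical helper, not part of the repository. It assumes the import names `pyspark`, `plotly`, `pycountry`, `pycountry_convert`, and `fuzzywuzzy`, and uses only the standard library.

```python
from importlib.util import find_spec

# Import names assumed for the libraries listed above
# (Plotly Express ships inside the plotly package).
REQUIRED_MODULES = ["pyspark", "plotly", "pycountry", "pycountry_convert", "fuzzywuzzy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are importable.")
```

Any name the helper reports as missing needs to be installed in the environment that runs the script.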

🚀 Usage

Data Input: Place your CSV file named visa_number_in_japan.csv in the input directory.
Run the Script: Execute the provided Python script.
Visualizations: After execution, you'll find the visualizations saved as HTML files in the output directory.
Cleaned Data: The cleaned data will also be saved as a CSV file in the output directory.

📈 Features

System Architecture: The Spark master-worker architecture is set up in a Docker container on Azure.
Data Ingestion: The script ingests the CSV file containing the visa numbers in Japan.
Data Cleaning: The script standardizes column names, drops null columns, and corrects country names using fuzzy matching.
Data Transformation: The data is further enriched by adding continent information for each country.
Data Visualization: The cleaned and transformed data is visualized using Plotly Express to provide insights into visa trends in Japan.
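
The cleaning step described above can be sketched on plain Python strings. This is an illustration under stated assumptions, not the code in main.py: the column name and country list are made up, and the standard library's `difflib` stands in for fuzzywuzzy so the example runs without extra dependencies.

```python
import difflib
import re


def standardize_column(name):
    """Lowercase a column name and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def correct_country(name, known_countries, cutoff=0.8):
    """Return the closest known country name, or the input unchanged if none is close enough."""
    # difflib is a stand-in here; the project itself lists fuzzywuzzy.
    matches = difflib.get_close_matches(name, known_countries, n=1, cutoff=cutoff)
    return matches[0] if matches else name


if __name__ == "__main__":
    # Hypothetical column name and reference list, for illustration only.
    print(standardize_column(" Number of Issued (Numerical) "))
    known = ["Philippines", "Viet Nam", "Brazil"]
    print(correct_country("Phillipines", known))
    print(correct_country("Atlantis", known))
```

The `cutoff` parameter controls how aggressive the correction is: a lower value fixes more misspellings but raises the risk of mapping a name to the wrong country.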

📝 Notes

Ensure that your Azure and Docker setups are correctly configured to allow the Spark master-worker architecture to function seamlessly.
Country name corrections and continent mapping rely on the pycountry and pycountry_convert libraries. Keep both libraries up to date to get accurate results.
You can adjust the manual mappings in the country_mapping dictionary in the main.py file to correct any country names that are not correctly matched.
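
One way manual mappings can take precedence over fuzzy matching is sketched below. The dictionary entries are hypothetical and the real `country_mapping` in main.py may look different; `resolve_country` is an illustrative helper, and `difflib` again stands in for fuzzywuzzy.

```python
import difflib

# Hypothetical entries; the real country_mapping in main.py may differ.
country_mapping = {
    "UK": "United Kingdom",
    "Korea": "Korea, Republic of",
}


def resolve_country(name, known_countries, overrides=country_mapping, cutoff=0.8):
    """Prefer a manual override, then fall back to the closest known name."""
    if name in overrides:
        return overrides[name]
    matches = difflib.get_close_matches(name, known_countries, n=1, cutoff=cutoff)
    return matches[0] if matches else name


if __name__ == "__main__":
    known = ["United Kingdom", "Ukraine"]
    print(resolve_country("UK", known))      # resolved by the manual mapping
    print(resolve_country("Ukrane", known))  # resolved by fuzzy matching
```

Short abbreviations such as "UK" share too few characters with the full name for similarity scoring to catch them, which is the kind of case a manual entry is meant to cover.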

airscholar/Japan-visa-data-engineering