This repository contains all the material I developed to earn the IBM Data Science Professional Certificate.
The Jupyter Notebooks provided here are part of my Data Science portfolio.
The goal of the project is to apply the skills I acquired, and should now master, to a Data Science project based on real data. The most important aspects are:
- Coding using Python in a Jupyter Notebook environment, using the many Data Science libraries, such as pandas, numpy, scikit-learn, seaborn, folium, plotly, dash, and many more.
- Computational thinking, i.e., solving real-world data issues by means of coding instructions.
- Understanding the patterns in the data I gathered.
- Presenting the data so that stakeholders can make informed decisions.
SpaceY is a newly established rocket launch company which wants to compete against the already established SpaceX. To do so, SpaceY should be able to:
- Reuse the 1st stage rocket booster.
- Be more cost competitive than its competitor.
SpaceX states that its launch services with 1st stage recovery cost 62 million USD, whereas building a 1st stage Falcon 9 booster requires 15 million USD, excluding R&D and profit margin.
Considering the parameters in our predictive models, a Decision Tree was able to predict the success of the 1st stage booster landing with an accuracy of 89%.
It follows that SpaceY can use the Decision Tree model as a proxy to predict the cost of a launch. Thus, SpaceY will be able to make more informed bids against SpaceX for a rocket launch.
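As a rough illustration of how a landing prediction could feed into a bid, the sketch below turns a predicted landing probability into an expected launch cost. The two cost figures are the ones quoted above. The function name `expected_launch_cost` and the rule that a failed landing adds the full build cost of a replacement booster are assumptions made for this example only; they are not taken from the project's notebooks.

```python
# Figures quoted in this README, in millions of USD.
LAUNCH_COST_WITH_RECOVERY = 62.0
BOOSTER_BUILD_COST = 15.0


def expected_launch_cost(p_landing: float) -> float:
    """Expected cost of one launch, in millions of USD.

    Assumption (not from the notebooks): if the 1st stage is not
    recovered, a replacement booster is built at full build cost.
    """
    if not 0.0 <= p_landing <= 1.0:
        raise ValueError("p_landing must be between 0 and 1")
    return LAUNCH_COST_WITH_RECOVERY + (1.0 - p_landing) * BOOSTER_BUILD_COST


if __name__ == "__main__":
    # Compare a certain failure, a coin flip, and a certain success.
    for p in (0.0, 0.5, 1.0):
        cost = expected_launch_cost(p)
        print(f"P(landing) = {p:.1f} -> {cost:.1f} million USD")
```

In practice, `p_landing` would come from the classifier's predicted probability for a given launch configuration, and the cost rule would need to be replaced by whatever pricing model SpaceY actually adopts.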
Here you can find a brief description of each of the Jupyter Notebook files used for the project. All the results in the final presentation come from the notebooks listed below.
- 01_jupyter-labs-spacex-data-collection-api.ipynb collects launch information using the open-source REST API for SpaceX.
- 02_jupyter-labs-webscraping.ipynb retrieves information by web scraping the Wikipedia page listing the Falcon 9 and Falcon Heavy launches.
- 03_labs-jupyter-spacex-Data wrangling.ipynb manipulates the previously retrieved information to produce appropriate labels for the subsequent classification models.
- 04_jupyter-labs-eda-sql-coursera_sqllite.ipynb queries a SQL database to retrieve further information about the SpaceX Falcon 9 history.
- 05_jupyter-labs-eda-dataviz.ipynb performs data visualization on the previously gathered data, so that visual insights can be readily retrieved.
- 06_lab_jupyter_launch_site_location.ipynb creates an interactive map to retrieve information about the launch sites used by SpaceX for the Falcon 9 missions.
- 07_SpaceX_Machine_Learning_Prediction_Part_5.jupyterlite.ipynb trains, tunes, and tests different machine learning models on the previously created dataset.
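The labeling step performed in the data wrangling notebook can be sketched roughly as follows. The outcome strings and the `landing_class` helper are hypothetical, chosen only to illustrate the idea of mapping each landing outcome to a binary class for the classifiers; the notebook itself may use different values and relies on pandas rather than plain lists.

```python
def landing_class(outcome: str) -> int:
    """Return 1 for a successful 1st stage landing, 0 otherwise.

    Assumption: outcomes look like "<landed?> <landing zone>", and
    anything that does not start with "True" counts as a failure.
    """
    return 1 if outcome.strip().startswith("True") else 0


# Hypothetical sample of landing outcomes, for illustration only.
outcomes = ["True ASDS", "False Ocean", "None None", "True RTLS"]

# One binary label per launch, usable as a classification target.
labels = [landing_class(o) for o in outcomes]
success_rate = sum(labels) / len(labels)

print(labels)
print(f"Success rate: {success_rate:.2f}")
```

A binary target of this kind is what the machine learning notebook would then try to predict from the launch features.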
Moreover, spacex_dash_app.py contains the interactive dashboard. It is intended for user-friendly data exploration and visualization by the stakeholders.
Here is the certificate I earned:
Guido Mascia, PhD.
Email: mascia.guido@gmail.com