This is my Data Science portfolio, covering the knowledge I acquired through the IBM course.

Maskul93/IBM-Applied-DataScience-Captsone

Applied Data Science Capstone

This repository contains all the material I developed for achieving the IBM Data Science Professional Certificate.

The Jupyter Notebooks provided here represent part of my portfolio in the Data Science field.

The idea of the project is to fully deploy the skills I acquired, and should now master, in order to take part in a Data Science project based on real data. The most important aspects are:

Hard skills

  • Coding using Python in a Jupyter Notebook environment, using the many Data Science libraries, such as pandas, numpy, scikit-learn, seaborn, folium, plotly, dash, and many more.

  • Computational thinking, i.e., solving real-world data problems by means of code.

Soft skills

  • Understanding the patterns in the data I gathered.
  • Presenting the data in a way that lets stakeholders make informed decisions.

The Data Science Project, in brief

SpaceY is a newly established rocket launch company which wants to compete against the already established SpaceX. To do so, SpaceY should be able to:

  • Reuse the 1st stage rocket booster.
  • Be more cost competitive than its competitor.

SpaceX states that its launch services with 1st stage recovery cost 62 million USD, whereas building a Falcon 9 1st stage booster requires 15 million USD, excluding R&D and profit margin.

With the parameters used in our predictive models, a Decision Tree was able to predict the success of the 1st stage booster landing with an accuracy of 89%.

It follows that SpaceY can use the Decision Tree model as a proxy to estimate the cost of a launch, and can therefore make more informed bids against SpaceX for a rocket launch.
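As a rough illustration of how a landing prediction could feed a cost estimate, the sketch below combines the booster build cost quoted above with a landing-success probability. It assumes, purely for illustration, that a failed landing means paying the full 15 million USD to build a new booster. The function name `expected_booster_cost` and the example probability are hypothetical and do not come from the notebooks.

```python
# Build cost of a Falcon 9 1st stage booster quoted in this README (million USD).
FIRST_STAGE_BUILD_COST = 15.0


def expected_booster_cost(p_landing: float) -> float:
    """Expected cost (million USD) of replacing the 1st stage after a launch.

    Assumption (not taken from the notebooks): a failed landing costs one
    full booster build, while a successful landing costs nothing extra.
    """
    if not 0.0 <= p_landing <= 1.0:
        raise ValueError("p_landing must be a probability in [0, 1]")
    return (1.0 - p_landing) * FIRST_STAGE_BUILD_COST


if __name__ == "__main__":
    p = 0.5  # hypothetical predicted probability of a successful landing
    cost = expected_booster_cost(p)
    print(f"Expected booster replacement cost: {cost:.1f} million USD")
```

In a real bid the probability would come from the trained classifier for the specific launch, and the cost model would likely include refurbishment and other terms that this sketch leaves out.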

The Notebooks

Below is a brief description of each Jupyter Notebook file used for the project. All the results in the final presentation come from these notebooks.

  • 01_jupyter-labs-spacex-data-collection-api.ipynb collects launch information using the open-source REST API for SpaceX.

  • 02_jupyter-labs-webscraping.ipynb retrieves information through web scraping, using the Wikipedia page that lists the Falcon 9 and Falcon Heavy launches.

  • 03_labs-jupyter-spacex-Data wrangling.ipynb manipulates the previously retrieved information to produce appropriate labels for the classification models built later.

  • 04_jupyter-labs-eda-sql-coursera_sqllite.ipynb queries a SQL database to retrieve further information about the SpaceX Falcon 9 history.

  • 05_jupyter-labs-eda-dataviz.ipynb performs data visualization on the previously gathered data, so that visual insights can be readily obtained.

  • 06_lab_jupyter_launch_site_location.ipynb creates an interactive map to explore the Launch Sites used by SpaceX for the Falcon 9 missions.

  • 07_SpaceX_Machine_Learning_Prediction_Part_5.jupyterlite.ipynb trains, tunes, and tests different machine learning models on the previously created dataset.

Moreover:

  • spacex_dash_app.py contains the interactive dashboard. It is intended for user-friendly data exploration and visualization, i.e., for the stakeholders.
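The labeling step performed by the data wrangling notebook listed above can be pictured with a minimal sketch. It assumes that each launch carries an outcome string whose first word says whether the landing succeeded (for example `True ASDS` or `False Ocean`). That format, and the helper names `label_outcome` and `success_rate`, are assumptions made for illustration rather than the notebook's actual code.

```python
from typing import Iterable


def label_outcome(outcome: str) -> int:
    """Return 1 if the outcome string marks a successful landing, else 0.

    Assumed format: "<True|False|None> <landing type>", e.g. "True ASDS".
    Anything that does not start with "True" is treated as a failure.
    """
    parts = outcome.split()
    return 1 if parts and parts[0] == "True" else 0


def success_rate(outcomes: Iterable[str]) -> float:
    """Fraction of outcomes labeled as successful landings (0.0 if empty)."""
    labels = [label_outcome(o) for o in outcomes]
    return sum(labels) / len(labels) if labels else 0.0


if __name__ == "__main__":
    # Hypothetical sample of outcome strings, not real launch records.
    sample = ["True ASDS", "False Ocean", "True RTLS", "None None"]
    print([label_outcome(o) for o in sample])
    print(success_rate(sample))
```

A binary label of this kind is what a classifier such as the Decision Tree mentioned earlier would be trained to predict.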

Credits

Here is the certificate I earned:

The Certificate

Guido Mascia, PhD.

Email: mascia.guido@gmail.com
