NBA university project
Dataset taken from: https://github.com/DomSamangy/NBA_Shots_04_24
The expected structure of the project report (common for Data Project and Probability & Statistics) is the following:
Describe the dataset(s) used in your project, including: the source of the dataset; the number of observations; the number of variables per observation; and the meaning and type of each variable.
Read the dataset in Python and take care of: possible formatting errors; potential inconsistencies in the data types; missing or repeated values; and the level of noise in the values.
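The checks listed above could be sketched roughly as follows. This is only an illustration on a tiny in-memory CSV: the column names (`PLAYER_NAME`, `SHOT_DISTANCE`, `SHOT_MADE`, `GAME_DATE`), the date format, and the rows themselves are assumptions, not the verified layout of the real file.

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV. Column names and values are assumptions.
raw_csv = io.StringIO(
    "PLAYER_NAME,SHOT_DISTANCE,SHOT_MADE,GAME_DATE\n"
    "Player A,24,TRUE,01-15-2024\n"
    "Player A,24,TRUE,01-15-2024\n"   # exact duplicate row
    "Player B,,FALSE,01-16-2024\n"    # missing distance
    "Player C,abc,TRUE,01-17-2024\n"  # formatting error in a numeric column
    "Player D,3,FALSE,01-18-2024\n"
)
shots = pd.read_csv(raw_csv)

# Repeated values: count exact duplicate rows, then drop them.
n_duplicates = int(shots.duplicated().sum())
shots = shots.drop_duplicates().reset_index(drop=True)

# Data types: force the numeric column to numbers (bad entries become NaN)
# and parse the date column with an explicit format.
shots["SHOT_DISTANCE"] = pd.to_numeric(shots["SHOT_DISTANCE"], errors="coerce")
shots["GAME_DATE"] = pd.to_datetime(shots["GAME_DATE"], format="%m-%d-%Y")

# Missing values: count them per column, then drop rows without a distance.
missing_per_column = shots.isna().sum()
clean = shots.dropna(subset=["SHOT_DISTANCE"])

# Noise: flag physically implausible distances (an NBA court is 94 ft long).
implausible = clean[(clean["SHOT_DISTANCE"] < 0) | (clean["SHOT_DISTANCE"] > 94)]

print(f"duplicates removed: {n_duplicates}")
print(missing_per_column)
print(f"rows kept: {len(clean)}, implausible distances: {len(implausible)}")
```

Whether rows with missing values are dropped or imputed is a choice the report should justify; dropping is used here only to keep the sketch short.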
Describe the data manipulation operations applied to the original data, including: merging of different datasets (join, concatenation); conversion of units of measurement; and definition of derived variables.
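A minimal sketch of these three kinds of manipulation, on made-up frames. All column names, the `players` lookup table, and the per-season split are hypothetical and only serve to show the pandas calls.

```python
import pandas as pd

# Illustrative frames; every column name and value here is an assumption.
season_a = pd.DataFrame({
    "PLAYER_ID": [1, 1],
    "SHOT_DISTANCE_FT": [24.0, 3.0],
    "SHOT_TYPE": ["3PT Field Goal", "2PT Field Goal"],
    "SHOT_MADE": [True, False],
})
season_b = pd.DataFrame({
    "PLAYER_ID": [2, 3],
    "SHOT_DISTANCE_FT": [15.0, 27.0],
    "SHOT_TYPE": ["2PT Field Goal", "3PT Field Goal"],
    "SHOT_MADE": [True, True],
})
players = pd.DataFrame({"PLAYER_ID": [1, 2], "POSITION": ["G", "C"]})

# Concatenation: stack the per-season tables into one.
shots = pd.concat([season_a, season_b], ignore_index=True)

# Join: attach player attributes; validate= guards against duplicated keys.
merged = shots.merge(players, on="PLAYER_ID", how="left", validate="many_to_one")

# Unit conversion: feet to metres (1 ft = 0.3048 m exactly).
merged["SHOT_DISTANCE_M"] = merged["SHOT_DISTANCE_FT"] * 0.3048

# Derived variables: three-point flag and points scored on each attempt.
merged["IS_THREE"] = merged["SHOT_TYPE"].str.startswith("3PT")
merged["POINTS"] = merged["SHOT_MADE"].astype(int) * merged["IS_THREE"].map({True: 3, False: 2})

print(merged)
```

A left join keeps every shot even when the lookup table has no matching player, so unmatched rows show up as missing values and can be reported instead of silently disappearing.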
Develop an intuitive understanding of the data through: descriptive statistics over all observations and over meaningful groups (pandas groupby); and visualizations (scatterplots, histograms, box plots, bar plots, etc.).
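As an illustration of overall versus grouped statistics, the sketch below uses a small invented table. The column names `ZONE`, `SHOT_DISTANCE` and `SHOT_MADE` and all values are assumptions chosen for the example.

```python
import pandas as pd

# Invented sample; column names and values are assumptions.
shots = pd.DataFrame({
    "ZONE": ["Paint", "Paint", "Mid-Range", "Mid-Range",
             "Three", "Three", "Three", "Paint"],
    "SHOT_DISTANCE": [2, 4, 15, 17, 24, 26, 25, 3],
    "SHOT_MADE": [1, 1, 0, 1, 0, 1, 0, 0],
})

# Descriptive statistics over all observations.
overall = shots["SHOT_DISTANCE"].describe()

# The same idea over meaningful groups, using named aggregation.
by_zone = shots.groupby("ZONE").agg(
    attempts=("SHOT_MADE", "size"),
    fg_pct=("SHOT_MADE", "mean"),
    mean_distance=("SHOT_DISTANCE", "mean"),
)

print(overall)
print(by_zone)
```

If matplotlib is installed, `by_zone["fg_pct"].plot.bar()` turns the grouped table into a bar plot, and `shots["SHOT_DISTANCE"].plot.hist()` gives a histogram of the distances.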
Use hypothesis tests to confirm or reject some hypotheses of interest about the parameters of the population involved in your study. Clearly state the hypothesis you wish to test and provide an explicit interpretation of the results of these analyses. Perform at least three hypothesis tests. Show all the steps of the calculations for at least one test. Include the power analysis for at least one test.
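One way to show every step of a test is to compute it by hand. The sketch below runs a two-sided two-proportion z-test and a normal-approximation power calculation using only the standard library. The choice of test, the home/away comparison, the counts, and the assumed true proportions in the power step are all hypothetical; `approx_power` is a helper written for this example, not a library function.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# Hypothetical counts, NOT taken from the dataset: made shots and attempts.
made_home, n_home = 460, 1000
made_away, n_away = 430, 1000
alpha = 0.05

# Step 1. Hypotheses: H0: p_home = p_away  versus  H1: p_home != p_away.
# Step 2. Sample proportions and pooled proportion under H0.
p_home = made_home / n_home
p_away = made_away / n_away
p_pool = (made_home + made_away) / (n_home + n_away)

# Step 3. Standard error under H0 and the z statistic.
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_home + 1 / n_away))
z = (p_home - p_away) / se_pool

# Step 4. Two-sided p-value and decision at level alpha.
p_value = 2 * (1 - std_normal.cdf(abs(z)))
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_crit


def approx_power(p1, p2, n1, n2, alpha=0.05):
    """Normal-approximation power of the two-sided test when the true
    proportions are p1 and p2 (values the analyst has to assume)."""
    z_c = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    delta = p1 - p2
    upper = 1 - std_normal.cdf((z_c * se_null - delta) / se_alt)
    lower = std_normal.cdf((-z_c * se_null - delta) / se_alt)
    return upper + lower


# Step 5. Power against an assumed true gap of 0.05 (0.48 versus 0.43).
power = approx_power(0.48, 0.43, n_home, n_away, alpha)

print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
print(f"approximate power for a true gap of 0.05: {power:.2f}")
```

The interpretation belongs next to the numbers: failing to reject H0 means the sample does not provide enough evidence of a difference at level alpha, not that the two proportions are equal, and the power figure says how likely the test was to detect the assumed gap in the first place.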
Use confidence intervals to quantify the parameters of interest in your analysis. Build at least three confidence intervals. Explicitly state the meaning of the confidence intervals calculated in the specific context of your analysis. Show all the steps of the calculations for at least one confidence interval.
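A confidence interval can also be built step by step. The sketch below computes a normal-approximation (Wald) interval for a shooting percentage. The counts are hypothetical, and the Wald form is just one common choice that is reasonable for large samples.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts, NOT taken from the dataset.
made, attempts = 356, 1000
confidence = 0.95

# Step 1. Point estimate of the proportion.
p_hat = made / attempts

# Step 2. Critical value z* with (1 - confidence) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Step 3. Standard error of the sample proportion.
se = sqrt(p_hat * (1 - p_hat) / attempts)

# Step 4. Margin of error and interval limits.
margin = z_star * se
ci_low, ci_high = p_hat - margin, p_hat + margin

print(f"{confidence:.0%} CI for the proportion: ({ci_low:.4f}, {ci_high:.4f})")
```

In context, the interval is read as follows: if samples of this size were drawn repeatedly and the interval rebuilt each time, about 95% of those intervals would contain the true shooting percentage. It is not a statement that the true value has a 95% probability of lying in this particular interval.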
Look for dependencies between the variables in your dataset by means of scatterplots, contingency tables, and correlation coefficients. Use linear regression analysis to analyze these dependencies and develop a predictive model. Compute the parameters of at least one multiple linear regression model, clearly stating which are the response and predictor variables. Analyze the parameters of the model. Select the relevant predictors using either a forward or a backward selection strategy. Check that the assumptions of the regression model are satisfied by analyzing its residuals. Provide at least one model prediction with the corresponding prediction interval.
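The regression steps can be sketched with NumPy and SciPy on synthetic data. Everything about the data is an assumption made for illustration: the response (`fg_pct`), the two predictors (`mean_distance`, `three_share`), the coefficients used to generate them, and the new observation `x_new`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200

# Synthetic, illustrative data (not from the dataset).
# Response: field-goal percentage. Predictors: mean shot distance in feet
# and share of attempts that are three-pointers.
mean_distance = rng.uniform(5, 20, n)
three_share = rng.uniform(0, 0.6, n)
fg_pct = 0.55 - 0.006 * mean_distance - 0.05 * three_share + rng.normal(0, 0.02, n)

# Correlation coefficients between the response and each predictor.
corr_distance = np.corrcoef(mean_distance, fg_pct)[0, 1]
corr_three = np.corrcoef(three_share, fg_pct)[0, 1]

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), mean_distance, three_share])
beta, *_ = np.linalg.lstsq(X, fg_pct, rcond=None)

# Residuals, error variance, standard errors, t statistics and p-values.
residuals = fg_pct - X @ beta
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
xtx_inv = np.linalg.inv(X.T @ X)
std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
t_stats = beta / std_err
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)

# One residual check: Shapiro-Wilk test of normality.
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# 95% prediction interval for a new observation (intercept, 12 ft, 30% threes).
x_new = np.array([1.0, 12.0, 0.3])
y_hat = x_new @ beta
se_pred = np.sqrt(sigma2 * (1 + x_new @ xtx_inv @ x_new))
t_crit = stats.t.ppf(0.975, dof)
pi_low, pi_high = y_hat - t_crit * se_pred, y_hat + t_crit * se_pred

print("coefficients:", beta)
print("p-values:", p_values)
print(f"prediction: {y_hat:.3f}, 95% PI: ({pi_low:.3f}, {pi_high:.3f})")
```

The `p_values` array is what a backward selection would inspect: drop the predictor with the largest p-value above the chosen threshold, refit, and repeat. The normality test covers only one assumption; plots of residuals against fitted values are still needed to judge linearity and constant variance.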
Summarize the most relevant results you found in your data exploration and analysis.