Data Description

This data set comes from Larry Winner's Teaching Data Sets at the University of Florida (http://users.stat.ufl.edu/~winner/datasets.html). The csv file was downloaded from that site and then stored on GitHub (https://raw.githubusercontent.com/JackRoss10089/STA-321/main/nbaodds201415.csv). The data were originally collected from covers.com and consist of back-tested outcomes from sports betting data for the 2014-2015 NBA season. They describe the point spreads, over/under lines, and game outcomes for all NBA games during the 2014-2015 regular season. If TeamSprd is negative, the home team is favored and is “giving points”. If TeamSprd is positive, the home team is the underdog and is “getting points”. The variables in the dataset are as follows:

I will conduct a simple linear regression on this dataset using a single response variable, TeamPts, and a single explanatory variable, TeamDiff. The goal is to determine whether the explanatory variable TeamDiff has a statistically significant linear association with the response variable TeamPts. The data set contains many categorical and numeric variables that could be used to model a response variable, so I believe it has enough information to answer this question.

Data

The pairs function shows a number of pairwise relationships among the selected variables from the newnbaodds data set. The relationship I am choosing to evaluate is the one between TeamPts and TeamDiff. I will use TeamPts as the response variable and TeamDiff as the explanatory variable; the practical implication is that the number of points a team scores on a given night can be predicted from the point difference in the final score of the game. In the pairs plot, the two variables show a positive linear relationship of moderate strength. Simple linear regression will be performed to evaluate this observation more precisely.

Simple Linear Regression Model

Inferential statistics for the parametric linear regression model: Team Points Scored and Team Difference in Points in the 2014-15 NBA Season

              Estimate     Std. Error   t value     Pr(>|t|)
(Intercept)   100.034408   0.2771006    361.00390   0
TeamDiff2       0.491617   0.0202572     24.26873   0

Confidence Interval for the parametric linear regression model: Team Points Scored and Team Difference in Points in the 2014-15 NBA Season

              2.5 %       97.5 %
TeamDiff2     0.4518744   0.5313596
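
As a quick check on how the columns of the tables above relate to one another, the following sketch recomputes the t value and an approximate 95% interval for the slope from the reported estimate and standard error. It is written in Python rather than the software used for the report, and it assumes the normal quantile 1.96 for simplicity; the reported interval presumably uses a t quantile, so it should come out very slightly wider than this approximation.

```python
# Values copied from the coefficient table in this report.
estimate = 0.491617
std_error = 0.0202572

# t statistic for the null hypothesis that the slope is zero.
t_value = estimate / std_error

# Approximate 95% interval using the normal quantile 1.96.
# This quantile is an assumption made for illustration; a t quantile
# with the model's residual degrees of freedom would be slightly larger.
z = 1.96
lower = estimate - z * std_error
upper = estimate + z * std_error

print(f"t value: {t_value:.3f}")
print(f"approximate 95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

The approximate bounds agree with the reported interval to about three decimal places, which is what one would expect with a sample of this size.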

After evaluating the simple linear regression model, we can first conclude from the model diagnostics that the model assumptions appear to be met. The residual plot generated for the model shows no discernible pattern in the residuals, and the pairs plot from the previous data analysis suggests a linear relationship between TeamPts and TeamDiff.

Bootstrap Algorithm with Previous SLR Model

              Estimate     Std. Error   t value     Pr(>|t|)   per.025      per.975
(Intercept)   100.034408   0.2771006    361.00390   0          99.5111520   100.5922820
TeamDiff2       0.491617   0.0202572     24.26873   0           0.4525197     0.5314747

Recommendation and Justification

When comparing the results of the original ordinary least squares regression with those of the same regression combined with the bootstrap sampling algorithm, there is only a very small difference between the confidence intervals: (0.4519, 0.5314) for the slope under the parametric model versus (0.4525, 0.5315) from the bootstrap. This close agreement suggests that the assumptions behind the parametric model are reasonable for this data set, so adding the bootstrap had essentially no impact on the results. If those assumptions were violated, for example by clearly non-normal residuals, the two approaches would likely give different intervals, and the bootstrap results would have the more meaningful interpretation.