Introduction
This data set contains betting odds and outcomes for NBA games in the 2014-2015 season. It has 1,230 observations (one per game) and 17 variables:
- Datenum (numerical) - The number of days since January 1, 1960
- Team (categorical) - The home team
- Dateslash (date) - The game date in MM/DD/YYYY format
- OppTeam (categorical) - The away team
- Home (binary) - Whether "Team" is the home team (it always is in this data set)
- TeamPts (numerical) - Points scored by the home team
- OppPts (numerical) - Points scored by the away team
- OT (binary) - Whether the game went to overtime (1 = overtime, 0 = no overtime)
- Wins (binary response) - Whether the home team won (1 = won, 0 = lost)
- TeamCov (categorical, three levels) - Whether the home team covered the spread (1 = covered, 0 = "push", -1 = did not cover)
- TeamSprd (numerical) - The Vegas point spread for the home team
- OvrUndr (numerical) - The Vegas over/under line for total points in the game
- OUCov (categorical, three levels) - Whether total points went over or under the Vegas line (1 = over, 0 = exactly the line, -1 = under)
- Team_id (numerical) - Numeric ID for the home team
- OppTeam_id (numerical) - Numeric ID for the away team
- TeamDiff (numerical) - Home points minus away points
- TotalPts (numerical) - Home points plus away points
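Several of these variables are derived from others. The sketch below shows those relationships for a single game. It assumes Datenum counts days from January 1, 1960 exactly as described, and it assumes (this is not stated in the data description) that TeamSprd is quoted from the home team's side, so a favored home team has a negative spread and covers when TeamDiff + TeamSprd > 0. The function name `derive` and the example game are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1960, 1, 1)  # Datenum counts days from this date


def sign(x):
    """Return 1, 0, or -1, matching the 1 / 0 / -1 coding of TeamCov and OUCov."""
    return (x > 0) - (x < 0)


def derive(datenum, team_pts, opp_pts, team_sprd, ovr_undr):
    """Hypothetical helper: rebuild the derived columns for one game."""
    team_diff = team_pts - opp_pts
    total_pts = team_pts + opp_pts
    return {
        # Zero-padded MM/DD/YYYY (padding is an assumption about Dateslash)
        "Dateslash": (EPOCH + timedelta(days=datenum)).strftime("%m/%d/%Y"),
        "TeamDiff": team_diff,
        "TotalPts": total_pts,
        # NBA games cannot end tied, so the home team wins iff it outscores the opponent
        "Wins": int(team_diff > 0),
        # Assumed spread convention: negative spread = home team favored
        "TeamCov": sign(team_diff + team_sprd),
        # Over (1), exactly on the line (0), or under (-1)
        "OUCov": sign(total_pts - ovr_undr),
    }


# A made-up game: home team wins 101-95, favored by 4.5, line of 200.5
game = derive((date(2014, 10, 28) - EPOCH).days, 101, 95, -4.5, 200.5)
print(game)
```

Note that Wins is a deterministic function of TeamPts and OppPts, which matters later when both are used as predictors.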
Research Question
The objective of this assignment is to identify the variables that
contribute to winning.
Exploratory Analysis
We will make scatter plots to check whether any of the predictor variables have problems such as skewness.

Looking at the scatter plots, none of the predictors look skewed, and all are unimodal except the binary response variable, Wins. Therefore, we do not need to transform any of the predictor variables.
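A numeric check can back up the visual one. The sketch below computes the moment-based sample skewness g1 = m3 / m2^1.5, which is 0 for symmetric data and positive for a long right tail. Which cutoff counts as "too skewed" is a judgment call and is not specified in this report; the function name is hypothetical.

```python
def sample_skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


print(sample_skewness([1, 2, 3]))       # symmetric, so 0
print(sample_skewness([1, 1, 1, 10]))   # long right tail, so positive
```

Applied to TeamPts, OppPts, TeamSprd, and OvrUndr, values near 0 would agree with the conclusion that no transformation is needed.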
Building the Multiple Logistic Regression Model
Now we build a full model that predicts Wins from TeamPts, OppPts, TeamSprd, and OvrUndr, and a reduced model that uses only TeamPts and OppPts.
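Written out, with p the probability that the home team wins (Wins = 1), the two models are:

```latex
\begin{aligned}
\text{Full:}\quad    \log\frac{p}{1-p} &= \beta_0 + \beta_1\,\text{TeamPts} + \beta_2\,\text{OppPts} + \beta_3\,\text{TeamSprd} + \beta_4\,\text{OvrUndr} \\
\text{Reduced:}\quad \log\frac{p}{1-p} &= \beta_0 + \beta_1\,\text{TeamPts} + \beta_2\,\text{OppPts}
\end{aligned}
```

The reduced model is the full model with the constraint \beta_3 = \beta_4 = 0.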
Summary of the Full Model
| Term | Estimate | Std. Error | z value | p-value |
|---|---:|---:|---:|---:|
| (Intercept) | 4.8375261 | 33488.0687 | 0.0001445 | 0.9998847 |
| TeamPts | 19.2652779 | 1401.6006 | 0.0137452 | 0.9890333 |
| OppPts | -19.2864565 | 1403.1298 | -0.0137453 | 0.9890332 |
| TeamSprd | -0.0296616 | 248.3366 | -0.0001194 | 0.9999047 |
| OvrUndr | -0.0135866 | 189.7476 | -0.0000716 | 0.9999429 |
Summary of the Reduced Model
| Term | Estimate | Std. Error | z value | p-value |
|---|---:|---:|---:|---:|
| (Intercept) | 2.565926 | 11547.884 | 0.0002222 | 0.9998227 |
| TeamPts | 19.281176 | 1399.149 | 0.0137806 | 0.9890050 |
| OppPts | -19.305376 | 1401.370 | -0.0137761 | 0.9890086 |
Now we will look at automatic variable selection.
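The report does not say which selection procedure was used. A common choice is greedy backward elimination by AIC, which we assume here purely for illustration: start from the full model, repeatedly drop the predictor whose removal lowers the AIC the most, and stop when no removal helps. The scoring function `toy_aic` below is hypothetical; its deviances are made up except that, as in the real fit, a model containing both TeamPts and OppPts has deviance near 0.

```python
def backward_eliminate(predictors, aic_of):
    """Greedy backward elimination; aic_of maps a frozenset of predictors to an AIC."""
    current = frozenset(predictors)
    current_aic = aic_of(current)
    while current:
        # Score every model with exactly one predictor removed (ties broken by name)
        candidates = [(aic_of(current - {p}), p) for p in sorted(current)]
        best_aic, best_p = min(candidates)
        if best_aic >= current_aic:
            break  # no single removal improves the AIC
        current, current_aic = current - {best_p}, best_aic
    return current, current_aic


def toy_aic(model):
    # Hypothetical scores: deviance ~0 when both point totals are present
    # (they determine Wins exactly), an arbitrary large deviance otherwise.
    deviance = 0.0 if {"TeamPts", "OppPts"} <= model else 1000.0
    return deviance + 2 * (len(model) + 1)  # +1 for the intercept


result = backward_eliminate(["TeamPts", "OppPts", "TeamSprd", "OvrUndr"], toy_aic)
print(result)
```

Under these assumed scores the procedure drops OvrUndr and TeamSprd and stops at TeamPts and OppPts, the same final model reported below.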
Summary of the Final Model
| Term | Estimate | Std. Error | z value | p-value |
|---|---:|---:|---:|---:|
| (Intercept) | 2.565926 | 11547.884 | 0.0002222 | 0.9998227 |
| TeamPts | 19.281176 | 1399.149 | 0.0137806 | 0.9890050 |
| OppPts | -19.305376 | 1401.370 | -0.0137761 | 0.9890086 |
Next, we compare global goodness-of-fit statistics for the three models.
Comparison of Global Goodness-of-Fit statistics
| Model | Residual deviance | Null deviance | AIC |
|---|---:|---:|---:|
| full.model | 4e-07 | 1677.513 | 10 |
| reduced.model | 4e-07 | 1677.513 | 6 |
| final.model | 4e-07 | 1677.513 | 6 |
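The statistics in this comparison are connected by simple formulas. For ungrouped binary data the saturated model has log-likelihood 0, so the residual deviance is -2 log L and AIC = deviance + 2k, where k counts the coefficients including the intercept. That reproduces the last column: 4e-07 + 2(5) is about 10 for the full model and 4e-07 + 2(3) is about 6 for the two-predictor models. The null deviance depends only on the share of home wins. The helper names below are hypothetical, and the actual win counts are not given in this report, so the example uses a toy input.

```python
import math


def null_deviance(n_wins, n_games):
    """Deviance of the intercept-only logistic model for binary outcomes."""
    if not 0 < n_wins < n_games:
        raise ValueError("need at least one win and one loss")
    p = n_wins / n_games
    n_losses = n_games - n_wins
    return -2 * (n_wins * math.log(p) + n_losses * math.log(1 - p))


def aic(residual_deviance, n_params):
    """AIC for ungrouped binary logistic regression: deviance + 2k."""
    return residual_deviance + 2 * n_params


print(aic(4e-07, 5))   # full model: about 10
print(aic(4e-07, 3))   # reduced and final models: about 6
print(null_deviance(1, 2))  # toy input: 4 ln 2
```

A residual deviance of essentially 0 means the fitted probabilities are essentially 0 or 1 for every game, which is worth keeping in mind when reading the final model.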
Final Model
While building the models, we compared the full and reduced models to decide which variables to remove. We removed TeamSprd and OvrUndr because they did not improve the model: dropping them leaves the residual deviance unchanged and lowers the AIC from 10 to 6.
We also compute the odds ratios for the final model.
Summary Stats of Final Model with Odds Ratios
| Term | Estimate | Std. Error | z value | p-value | Odds ratio |
|---|---:|---:|---:|---:|---:|
| (Intercept) | 2.565926 | 11547.884 | 0.0002222 | 0.9998227 | 1.301271e+01 |
| TeamPts | 19.281176 | 1399.149 | 0.0137806 | 0.9890050 | 2.364330e+08 |
| OppPts | -19.305376 | 1401.370 | -0.0137761 | 0.9890086 | 0.000000e+00 |
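The odds-ratio column can be reproduced from the estimates, since each odds ratio is exp(estimate). For the intercept, exp gives the baseline odds rather than a ratio. The estimates below are copied from the table; printing OppPts in scientific notation shows that its odds ratio is tiny but not exactly 0.

```python
import math

# Estimates copied from the final-model table
estimates = {"(Intercept)": 2.565926, "TeamPts": 19.281176, "OppPts": -19.305376}

# Odds ratio = exp(coefficient): multiplicative change in the odds of a home win
# per one-point increase in the predictor, holding the other predictor fixed
odds_ratios = {term: math.exp(b) for term, b in estimates.items()}

for term, ratio in odds_ratios.items():
    print(f"{term:12s} {ratio:.6e}")
```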
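Each odds ratio is exp(estimate) and applies to a one-point change, not to a change of 19 points. Holding OppPts fixed, each additional point scored by the home team multiplies the estimated odds of winning by e^19.28, about 2.36 × 10^8; holding TeamPts fixed, each additional point scored by the away team multiplies them by e^-19.31, about 4.1 × 10^-9, which displays as 0 in the table. These extreme values, the enormous standard errors, and the p-values near 1 do not mean that points scored are unrelated to winning. Because the home team wins exactly when TeamPts > OppPts, the two predictors separate the outcome perfectly (complete separation). In that situation the maximum-likelihood estimates do not exist, so the reported coefficients, standard errors, and p-values are not meaningful; the residual deviance of essentially 0 (4e-07) reflects the same perfect fit.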
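The sketch below illustrates why the fit behaves this way. It fits a one-predictor logistic regression (no intercept, to keep it one-dimensional) of a win indicator on a point margin like TeamDiff by Newton-Raphson, using six made-up games in which the sign of the margin decides the outcome, as it does in the real data. The nearly equal and opposite estimates for TeamPts (+19.28) and OppPts (-19.31) suggest the real model is effectively using their difference in the same way. The data, model, and iteration count are illustrative assumptions, not the analysis reported above. With separated data the log-likelihood increases for every larger slope, so Newton steps never converge: the slope keeps growing and the deviance keeps shrinking toward 0.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def deviance(b, xs, ys):
    # -2 log-likelihood; each term is log(1 + exp(-z)) if y = 1, else log(1 + exp(z))
    return 2.0 * sum(log1pexp(-b * x) if y else log1pexp(b * x) for x, y in zip(xs, ys))


def newton_path(xs, ys, iters):
    """Newton-Raphson for logit P(y=1) = b*x; returns (b, deviance) after each step."""
    b, path = 0.0, []
    for _ in range(iters):
        # y - p computed as sigmoid(-z) or -sigmoid(z) to avoid cancellation
        resid = [sigmoid(-b * x) if y else -sigmoid(b * x) for x, y in zip(xs, ys)]
        grad = sum(r * x for r, x in zip(resid, xs))
        info = sum(sigmoid(b * x) * sigmoid(-b * x) * x * x for x in xs)
        b += grad / info
        path.append((b, deviance(b, xs, ys)))
    return path


# Made-up point margins: the home team wins exactly when the margin is positive
margins = [-5, -3, -1, 2, 4, 6]
wins = [0, 0, 0, 1, 1, 1]
path = newton_path(margins, wins, 15)
for b, dev in path[::5]:
    print(f"slope {b:7.3f}   deviance {dev:.3e}")
```

In software that stops after a fixed number of iterations or at a convergence tolerance, the reported estimate is wherever the iterations stopped, which is why the standard errors are huge and the z values are close to 0.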
Conclusion
This study examined which variables are associated with the home team winning. The initial data set has 17 numerical and categorical variables. We used only 4 of them (TeamPts, OppPts, TeamSprd, and OvrUndr) as predictors, because the others were not relevant to the question, were categorical, were outcome variables we were not modeling, or were combinations of two other variables (TeamDiff and TotalPts).
After examining the full model, we removed TeamSprd and OvrUndr. Automatic variable selection led to the same final model, with two predictors: TeamPts and OppPts.
