1 Introduction

In this assignment we look at logistic regression models, odds ratios, and success probability curves. Specifically, we focus on home team wins and home team points. We want to see how much the number of points a team scores affects the outcome of the game.

1.1 Data Description

This data set covers NBA game betting odds and outcomes for the 2014-2015 season. There are 1230 observations and 17 variables. The variables in this data set are:

  • Datenum (numerical)- The number of days since January 1, 1960
  • Team (categorical)- Where the home team is from
  • Dateslash (date)- The game date in MM/DD/YYYY format
  • OppTeam (categorical)- Where the away team is from
  • Home (binary response)- Whether the “Team” is the home team (it always is)
  • TeamPts (numerical)- Home team points scored
  • OppPts (numerical)- Away team points scored
  • OT (binary response)- Whether the game went to overtime (1 means OT happened, 0 means OT didn’t happen)
  • Wins (binary response)- If the home team won (1 means they won, 0 means they lost)
  • TeamCov (categorical, three levels)- Whether the home team covered the spread (1 means they covered, 0 means a “push”, and -1 means they didn’t cover)
  • TeamSprd (numerical)- The Vegas point spread for the home team
  • OvrUndr (numerical)- The over/under Vegas line for the total points in the game
  • OUCov (categorical, three levels)- Whether the game went over or under the Vegas line (1 means it went over, 0 means it was exactly the line, and -1 means it went under)
  • Team_id (numerical)- Numeric ID for Home Team
  • OppTeam_id (numerical)- Numeric ID for Away Team
  • TeamDiff (numerical)- Home Points minus Away Points
  • TotalPts (numerical)- Home Points plus Away Points

1.2 Analytical Question

The objective of this study is to determine whether there is an association between the points a team scores and winning the game.

2 Building the Simple Logistic Regression

We need to build a model and check the predictor variable. We will then need to look at odds ratios, success probability curves, and goodness of fit measurements.

In order to build the simple logistic regression we need to check and make sure the predictor variable is not extremely skewed.

The distribution of TeamPts is approximately normal, so there is no sign of serious skewness or imbalance in the predictor. I will therefore not transform TeamPts and will fit a logistic regression directly to the data.


Call:
glm(formula = wins ~ TeamPts, family = binomial(link = "logit"), 
    data = bets)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -9.884391   0.714528  -13.83   <2e-16 ***
TeamPts      0.101593   0.007148   14.21   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1677.5  on 1229  degrees of freedom
Residual deviance: 1394.3  on 1228  degrees of freedom
AIC: 1398.3

Number of Fisher Scoring iterations: 4

The response variable is a binary factor variable. In this case, “loss” = 0 and “win” = 1. The output above shows that the coefficient for TeamPts is statistically significant in this model (p < 0.001).
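Written out, the fitted model reported above corresponds to the following equation. The coefficients are rounded to four decimals, and the symbol \hat{p} for the estimated probability of a home win is notation introduced here for illustration.

```latex
\[
\log\!\left(\frac{\hat{p}}{1-\hat{p}}\right) = -9.8844 + 0.1016 \times \text{TeamPts}
\]
```

The left-hand side is the log-odds of a win, so the slope describes the change in log-odds for each additional point scored.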

The summary of major statistics is below.

Summary of regression coefficients
              Estimate     Std. Error   z value     Pr(>|z|)   2.5 %        97.5 %
(Intercept)   -9.8843908   0.7145284    -13.83345   <2e-16     -11.317928   -8.5152896
TeamPts        0.1015934   0.0071481     14.21262   <2e-16       0.087907    0.1159441

The table above shows that the association between Team Points and winning is statistically significant, since the p-value is close to zero. In addition, the 95% confidence interval for the TeamPts coefficient does not contain 0, which leads to the same conclusion. So far, this indicates a significant connection between winning and how many points the team scores.
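As a rough check on the table, the z value and an approximate 95% interval can be reproduced by hand from the estimate and its standard error. The sketch below uses Python purely for the arithmetic (the model itself was fit with R's glm), and the variable names are illustrative. It computes a Wald interval, which comes out slightly different from the interval in the table, presumably because the table was produced by a different method.

```python
# Values copied from the coefficient table in this report.
estimate = 0.1015934    # TeamPts slope on the log-odds scale
std_error = 0.0071481   # standard error of the slope

# Wald z statistic: the estimate divided by its standard error.
z_value = estimate / std_error

# Approximate 95% Wald interval (1.96 is the standard normal critical value).
wald_lower = estimate - 1.96 * std_error
wald_upper = estimate + 1.96 * std_error

print(f"z = {z_value:.2f}")
print(f"95% Wald CI: ({wald_lower:.4f}, {wald_upper:.4f})")
```

Either way, the whole interval lies above 0, which matches the conclusion drawn from the p-value.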

Now we convert the estimated regression coefficients to the odds ratio.

Summary Stats with Odds Ratios
              Estimate     Std. Error   z value     Pr(>|z|)   odds_ratio
(Intercept)   -9.8843908   0.7145284    -13.83345   <2e-16     0.000051
TeamPts        0.1015934   0.0071481     14.21262   <2e-16     1.106933

The table above shows that the odds ratio for team points is about 1.107. This means that for each one-point increase in team points, the odds of winning increase by about 10.7%. This is a practically significant factor for winning.
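The odds ratio is simply the exponential of the regression coefficient. A minimal sketch of that conversion, using the estimates from the table and Python only for the arithmetic (variable names are illustrative); the ten-point odds ratio is an extra illustration that does not appear in the table:

```python
import math

# Estimates copied from the regression table in this report.
b0 = -9.8843908   # intercept (log-odds of a win at 0 points)
b1 = 0.1015934    # TeamPts slope (change in log-odds per point)

# Exponentiating a log-odds coefficient gives an odds ratio.
odds_ratio_1pt = math.exp(b1)
intercept_odds = math.exp(b0)

# Percent change in the odds of winning per additional point.
pct_increase = (odds_ratio_1pt - 1) * 100

# Illustration only: the odds ratio for a 10-point difference.
odds_ratio_10pt = math.exp(10 * b1)

print(f"odds ratio per point: {odds_ratio_1pt:.6f}")
print(f"percent increase in odds per point: {pct_increase:.1f}%")
print(f"odds ratio per 10 points: {odds_ratio_10pt:.2f}")
```

Because odds ratios multiply, a ten-point difference corresponds to the one-point odds ratio raised to the tenth power rather than ten times the one-point increase.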

The global goodness-of-fit measures are summarized below.

Residual deviance   Null deviance   AIC
1394                1678            1398

These measures cannot be interpreted on their own. The global goodness-of-fit measures are based on the likelihood function, so they are meaningful only relative to other candidate models, and we have none to compare against here.
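To make the link to the likelihood concrete, the reported AIC can be recovered from the residual deviance. The sketch below assumes an ungrouped binary response, in which case the residual deviance equals -2 times the log-likelihood, so AIC is the deviance plus twice the number of estimated parameters. Python is used only for the arithmetic and the names are illustrative.

```python
# Values copied from the glm output in this report.
residual_deviance = 1394.3
n_parameters = 2          # intercept and TeamPts slope

# For 0/1 responses the residual deviance is -2 * log-likelihood,
# so AIC = deviance + 2 * (number of estimated parameters).
aic = residual_deviance + 2 * n_parameters

print(f"AIC = {aic:.1f}")
```

A competing model fit to the same data would be preferred if it produced a smaller AIC.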

Now we will make the success probability curve below.

The curve on the left shows that as team points go up, the probability of winning goes up. The curve on the right shows that the rate of change in the probability of winning increases while Team Points is below about 100 and decreases once Team Points is above about 100. The turning point is just under 100 points (the fitted coefficients place it near 97), which is where the predicted probability of winning is 0.5.
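Both curves can be traced directly from the fitted coefficients. The following is a minimal sketch, in Python with the standard library only, and the function names are hypothetical. It evaluates the success probability, its rate of change (the slope times p times 1 - p), and the turning point where the rate of change peaks.

```python
import math

# Estimates copied from the regression table in this report.
B0 = -9.8843908   # intercept
B1 = 0.1015934    # TeamPts slope

def win_probability(team_pts):
    """Predicted probability of a home win for a given point total."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * team_pts)))

def rate_of_change(team_pts):
    """Derivative of the win probability with respect to TeamPts."""
    p = win_probability(team_pts)
    return B1 * p * (1.0 - p)

# The rate of change peaks where the log-odds are 0, i.e. p = 0.5.
turning_point = -B0 / B1

print(f"turning point: {turning_point:.1f} points")
for pts in (80, 90, 100, 110, 120):
    print(f"{pts} pts: p = {win_probability(pts):.3f}, "
          f"rate = {rate_of_change(pts):.4f}")
```

At the turning point the rate of change equals one quarter of the slope, meaning each extra point adds about 2.5 percentage points to the win probability there, and less everywhere else.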

3 Conclusion

This assignment covered the simple logistic regression model, the odds ratios of the estimated regression coefficients, the goodness-of-fit measures, the success probability curve, and the rate of change curve.

From this assignment we can see that team points is strongly associated with winning games. In my experience that makes sense, since strong offenses have become more prevalent in the NBA in recent years.

