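# Load the 2017 NCAA tournament games
g2017<-read.csv("../Data/2017games.csv", header=TRUE, stringsAsFactors = FALSE)

# Keep columns 1, 3, 6 and 10, then add the ratings and prediction results
NCAA2017 <- g2017[c(1,3,6,10)]
NCAA2017$W.Rating<-g2017$Wrating   # AdjEM of the winning team
NCAA2017$L.Rating<-g2017$Lrating   # AdjEM of the losing team
NCAA2017$Abs<-g2017$Abs            # absolute difference in AdjEM
NCAA2017$W.L<-g2017$Hit            # 1 if the higher-rated team won, 0 otherwise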

Overview

Ken Pomeroy is a professional college basketball statistician who calculates efficiency ratings for NCAA Division I teams.

AdjEM (adjusted efficiency margin) is the difference between a team's adjusted offensive and defensive efficiency. It represents the number of points by which the team would be expected to outscore the average Division I team over 100 possessions. It also has the advantage of being a linear measure: the difference between +31 and +28 is the same as the difference between +4 and +1, namely three points per 100 possessions, which makes it much easier to interpret.
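A minimal sketch of this arithmetic in R. The helper `adj_em` and all the efficiency numbers are hypothetical, chosen only to illustrate the calculation; they are not Pomeroy's actual ratings. It shows AdjEM as offense minus defense, the gap between two teams as an expected margin per 100 possessions, and the linearity described above.

```r
# Hypothetical helper: AdjEM = adjusted offensive efficiency minus
# adjusted defensive efficiency (both in points per 100 possessions)
adj_em <- function(adj_o, adj_d) adj_o - adj_d

# Illustrative (made-up) ratings for two teams
team_a <- adj_em(adj_o = 120.5, adj_d = 92.5)    # +28
team_b <- adj_em(adj_o = 110.0, adj_d = 106.0)   # +4

# Expected margin of team A over team B per 100 possessions
margin_per_100 <- team_a - team_b                 # 24

# Linearity: a 3-point gap means the same thing anywhere on the scale
gap_high <- 31 - 28
gap_low  <- 4 - 1

cat("A over B per 100 possessions:", margin_per_100, "\n")
cat("Gap at the top:", gap_high, " Gap at the bottom:", gap_low, "\n")
```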

There were 67 games in the 2017 NCAA basketball tournament. Here is the data set of those games, along with the AdjEM for each team and whether the difference in those ratings correctly predicted the winner.

library(DT)  # provides datatable()
datatable(NCAA2017, extensions = "Responsive",options=list(lengthMenu = c(10,25,68)))

Logistic Regression

Can we create a model using the AdjEM for two teams to predict the winner?

For each of the 67 games in the 2017 NCAA tournament, I took the difference between the two teams' AdjEM ratings and predicted that the team with the higher AdjEM would win. I then recorded whether the predicted winner actually won. Using logistic regression on these results, I built a model for the probability that the higher-rated team wins, given the absolute difference in AdjEM.
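A minimal sketch of how the two model inputs could be derived from the winner's and loser's ratings. It assumes `Wrating` and `Lrating` hold the AdjEM of the winning and losing team, as the column names used above suggest; the `Abs` and `Hit` columns in the original file may have been built differently, and the three rows below are made up.

```r
# Made-up rows; assumes Wrating/Lrating are the winner's and loser's AdjEM
games <- data.frame(
  Wrating = c(30.1, 12.4, 18.0),
  Lrating = c(5.2, 20.9, 17.5)
)

# Abs: size of the AdjEM gap, regardless of which team won
games$Abs <- abs(games$Wrating - games$Lrating)

# Hit: 1 if the higher-rated team (the predicted winner) actually won
games$Hit <- as.integer(games$Wrating > games$Lrating)

print(games)
```

The second row is an upset: the winner had the lower rating, so `Hit` is 0 while `Abs` is still positive.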

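library(pander)

# Prediction outcome (1 = higher-rated team won) against the AdjEM gap
plot(W.L~Abs,data=NCAA2017, pch=16, xlab="Absolute Difference in AdjEM", ylab = "Predicted Winner Results", main="Ken Pomeroy's AdjEM Differential Prediction Model")

# Logistic regression of the prediction outcome on the AdjEM gap
KP.glm <- glm(W.L~Abs,data=NCAA2017, family=binomial)
pander(summary(KP.glm))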
             Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)   -0.291     0.4714      -0.6172    0.5371
Abs            0.215     0.07865      2.734     0.00626

(Dispersion parameter for binomial family taken to be 1 )

Null deviance: 75.90 on 66 degrees of freedom
Residual deviance: 61.24 on 65 degrees of freedom
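b<-KP.glm$coefficients                  # intercept and slope
pvLR<-coef(summary(KP.glm))[2,4]        # p-value for the slope
# Overlay the fitted logistic curve on the scatterplot
curve(exp(b[1]+b[2]*x)/(1+exp(b[1]+b[2]*x)), add=TRUE)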
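The curve drawn above is the fitted logistic function, with b_0 = `b[1]` and b_1 = `b[2]`. Plugging in the rounded estimates from the summary table (an approximation; `b` holds the unrounded values), with x the absolute difference in AdjEM:

\[ \hat{p}(x) = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}} \approx \frac{e^{-0.291 + 0.215x}}{1 + e^{-0.291 + 0.215x}} \]

Equivalently, each additional point of AdjEM difference multiplies the odds that the higher-rated team wins by about \(e^{0.215} \approx 1.24\).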

# Predicted probability that the higher-rated team wins when the AdjEM gap is 10
pc<-predict(KP.glm, data.frame(Abs=10), type='response')
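As a sketch, the same kind of prediction can be reproduced by hand with base R's `plogis()`, which computes exp(z) / (1 + exp(z)). This version uses the rounded estimates from the summary table, so its results may differ slightly from `predict()` on the fitted model. The helper `win_prob` is hypothetical.

```r
# Rounded estimates from the summary table above
b0 <- -0.291   # intercept
b1 <-  0.215   # slope for Abs

# Hypothetical helper: probability that the higher-rated team wins
win_prob <- function(abs_diff) plogis(b0 + b1 * abs_diff)

diffs <- c(0, 5, 10, 20)
probs <- win_prob(diffs)
print(round(data.frame(Abs = diffs, P.win = probs), 3))
```

With these coefficients, a 10-point AdjEM gap gives a win probability of about 0.865 for the higher-rated team.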

Appropriateness

Is logistic regression an appropriate model for using the difference in AdjEM to predict the probability that the higher-rated team wins?

I want to test two things:

  1. Does the model show a significant relationship between the AdjEM difference and winning?
  2. Does the logistic regression model fit the data well?

Significant

The slope in the logistic regression has a p-value of 0.0062601, well below 0.05, so there is a significant relationship between the AdjEM difference and whether the higher-rated team wins.
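The slope's p-value is a Wald test: the estimate divided by its standard error, compared with a standard normal distribution. A short sketch that recomputes it from the rounded table values. As a cross-check, it also runs a likelihood ratio test built from the reported null and residual deviances. On the fitted model, `anova(KP.glm, test = "Chisq")` performs that same likelihood ratio test.

```r
# Wald test for the slope, from the summary table
est <- 0.215
se  <- 0.07865
z   <- est / se                    # about 2.73
p_wald <- 2 * pnorm(-abs(z))       # two-sided p-value, about 0.0063

# Likelihood ratio test from the deviances
# (null 75.90 on 66 df, residual 61.24 on 65 df)
lr_stat <- 75.90 - 61.24           # 14.66 on 1 df
p_lrt   <- pchisq(lr_stat, df = 66 - 65, lower.tail = FALSE)

cat(sprintf("Wald z = %.3f, p = %.5f\n", z, p_wald))
cat(sprintf("LR chi-square = %.2f, p = %.6f\n", lr_stat, p_lrt))
```

Both tests point the same way: the AdjEM gap carries real information about who wins.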

Goodness of Fit

To check the fit, I used the Hosmer and Lemeshow goodness of fit (GOF) test, with these hypotheses:

\[ \begin{aligned} H_0 &: \text{The logistic model is appropriate} \\ H_a &: \text{The logistic model is not appropriate} \end{aligned} \]

library(ResourceSelection)

pander(hoslem.test(KP.glm$y, KP.glm$fitted, g=10))
Hosmer and Lemeshow goodness of fit (GOF) test: KP.glm$y, KP.glm$fitted

Test statistic   df   P value
         4.144    8    0.8439
pvHL<-hoslem.test(KP.glm$y, KP.glm$fitted, g=10)$p.value
# Note: the test uses g - 2 degrees of freedom, so g must be greater than 2.
# The default, g = 10 (groups by deciles of fitted probability), is the usual choice.

With a p-value of 0.8438981, we fail to reject the null hypothesis: there is no evidence that this logistic regression model fits the data poorly.
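The idea behind the test is to sort the games into groups by fitted probability and compare the observed wins in each group with the wins the model expects. Below is a minimal sketch of that calculation. The function `hl_test` is hypothetical. It follows the same quantile-grouping approach as `ResourceSelection::hoslem.test`, but tie handling may differ. The usage example runs on simulated data, not the tournament games.

```r
# Hypothetical Hosmer-Lemeshow sketch: y = 0/1 outcomes, p = fitted probabilities
hl_test <- function(y, p, g = 10) {
  # Group observations by quantiles of the fitted probabilities
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp <- cut(p, breaks = breaks, include.lowest = TRUE)
  obs1 <- tapply(y, grp, sum)       # observed successes per group
  exp1 <- tapply(p, grp, sum)       # expected successes per group
  n    <- tapply(y, grp, length)    # group sizes
  obs0 <- n - obs1                  # observed failures
  exp0 <- n - exp1                  # expected failures
  stat <- sum((obs1 - exp1)^2 / exp1 + (obs0 - exp0)^2 / exp0)
  df   <- nlevels(grp) - 2          # g - 2 degrees of freedom
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Usage on simulated data (NOT the 2017 tournament games)
set.seed(1)
x <- runif(67, 0, 30)
y <- rbinom(67, 1, plogis(-0.3 + 0.2 * x))
fit <- glm(y ~ x, family = binomial)
res <- hl_test(fit$y, fitted(fit), g = 10)
print(res)
```

A large p-value means the observed and expected counts agree closely across the groups, which is what the 0.8439 above indicates for the tournament model.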

Interpretation

# Redraw the data with the fitted curve (KP.glm was fit above)
plot(W.L~Abs,data=NCAA2017, pch=16, xlab="Absolute Difference in AdjEM", ylab = "Predicted Winner Results", main="Ken Pomeroy's AdjEM Differential Prediction Model")
b<-KP.glm$coefficients
curve(exp(b[1]+b[2]*x)/(1+exp(b[1]+b[2]*x)), add=TRUE)

Notice that when the AdjEM difference between two teams is near zero, the predicted probability that the higher-rated team wins is near 50%, as expected for evenly matched teams.
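Using the rounded estimates from the summary table, the fitted probability at a zero gap and the gap at which the curve crosses 50% work out as follows. This matches the intercept's large p-value (0.5371): the data give no evidence that evenly matched teams differ from a coin flip.

\[ \hat{p}(0) = \frac{e^{-0.291}}{1 + e^{-0.291}} \approx 0.43, \qquad \hat{p}(x) = 0.5 \iff -0.291 + 0.215x = 0 \iff x = \frac{0.291}{0.215} \approx 1.35 \]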

Conclusion

Based on the 2017 tournament, the difference in Ken Pomeroy's AdjEM ratings appears to be an appropriate predictor of the winner: the slope is significant, and the Hosmer and Lemeshow test shows no evidence of poor fit.

Special Note

On 16 Mar 2018, the unthinkable happened. The University of Maryland, Baltimore County (UMBC), in only its second NCAA tournament appearance, defeated the tournament's overall #1 seed, the University of Virginia, by 20 points. It was the first time a #16 seed had beaten a #1 seed in the men's tournament. Ken Pomeroy had the difference in AdjEM at 34 points.
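Applying the 2017 model to this game shows how unlikely the upset looked. This is a sketch under stated assumptions: it uses the rounded 2017 coefficients as if they carried over to the 2018 tournament, and a 34-point gap may lie at or beyond the range of differences in the 2017 data, so the result is an extrapolation.

```r
# Rounded 2017 coefficients (assumed to apply to 2018 as well)
b0 <- -0.291
b1 <-  0.215

# Probability that the higher-rated team wins with a 34-point AdjEM gap
p_fav   <- plogis(b0 + b1 * 34)
p_upset <- 1 - p_fav

cat(sprintf("P(favorite wins) = %.4f, P(upset) = %.5f\n", p_fav, p_upset))
```

Under these assumptions the model puts the upset at roughly 0.09%, or about 1 in 1,100.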