Sir Jason Tibbetts has offered me a lifetime of free CAC jerseys to give him my best analysis of what the salaries should be for captains in Franchise B league. (By the way, how was the name Franchise D-League not already considered?)
# Load libraries
library(ggplot2); library(caret)
# Load Data
df <- read.csv("~/Dropbox/Franchise Cap Hits/Franchise B League Draft Cap Hits.csv", nrows = 100)
names(df) <- c("PR", "Dollars")
First, here’s a look at a linear model of the data.
# Creating Linear Model
LinMod1 <- lm(Dollars ~ PR, data = df)
b <- signif(coef(LinMod1)[1], digits = 2)
a <- signif(coef(LinMod1)[2], digits = 2)
r <- cor(df$PR, df$Dollars)
textlab <- paste("y = ",a,"x + ",b, sep="")
# Creating graph
ggplot(data = df, aes(PR, Dollars)) + geom_point() + geom_smooth(method = lm, se = FALSE) + annotate("text", x = 25, y = 38, label = textlab, size = 4, parse = FALSE)
It’s not a bad fit, with r^2 = 0.60. If I break the PRs down into the traditional bins of width 5, you get the predicted salaries in the table below. I’ve included a column with my recommended salaries in this scenario.
# Table of Linear Model Salaries
ranges <- c("< 14.9", "15-19.9", "20-24.9", "25-29.9", "30-34.9", "35-39.9", "40-44.9", "45-49.9", "50-54.9", "55-59.9", "60 +")
midpoints = c(12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5)
y = a*midpoints + b
table1 <- data.frame(ranges, midpoints, y)
names(table1) <- c("APR Ranges", "Midpoints", "Dollars")
table1$Recommendation <- seq(1, 26, by = 2.5)
print(table1)
##    APR Ranges Midpoints Dollars Recommendation
## 1      < 14.9      12.5    1.45            1.0
## 2     15-19.9      17.5    3.75            3.5
## 3     20-24.9      22.5    6.05            6.0
## 4     25-29.9      27.5    8.35            8.5
## 5     30-34.9      32.5   10.65           11.0
## 6     35-39.9      37.5   12.95           13.5
## 7     40-44.9      42.5   15.25           16.0
## 8     45-49.9      47.5   17.55           18.5
## 9     50-54.9      52.5   19.85           21.0
## 10    55-59.9      57.5   22.15           23.5
## 11       60 +      62.5   24.45           26.0
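For anyone who wants to check numbers like these, here is a rough sketch on made-up stand-in data (not the real cap hit file). It shows two things: for a one-predictor linear model, r^2 is just the squared correlation, and the bracket salaries can be taken straight from `predict()` on the unrounded fit instead of from coefficients rounded to 2 significant digits. The names `toy`, `fit`, and `bracket_table` are placeholders of mine, and the numbers used to simulate the data are arbitrary.

```r
# Hypothetical stand-in data: the real cap-hit CSV is not reproduced here.
set.seed(1)
toy <- data.frame(PR = runif(60, min = 10, max = 60))
toy$Dollars <- -4 + 0.45 * toy$PR + rnorm(60, sd = 4)

fit <- lm(Dollars ~ PR, data = toy)

# With a single predictor, R^2 equals the squared Pearson correlation.
r2_model <- summary(fit)$r.squared
r2_cor   <- cor(toy$PR, toy$Dollars)^2

# Predict at the bin midpoints using the full-precision coefficients.
midpoints <- seq(12.5, 62.5, by = 5)
pred <- predict(fit, newdata = data.frame(PR = midpoints))

bracket_table <- data.frame(Midpoint = midpoints, Dollars = round(pred, 2))
print(bracket_table)
```

Swapping `toy` for the real data frame should give bracket values close to the table above, with small differences where the rounded coefficients shift things by a few cents.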
One issue with the linear model is that it underestimates the values of the highest-paid players. The recommended values in this table increase by $2.50 for every jump in PR bracket, and my recommendations somewhat make up for that undervaluation. Or, in a further attempt to reconcile that, we can look at a degree 2 polynomial model:
# Creating Poly Model
PR2 <- df$PR^2
PolyMod2 <- lm(Dollars ~ PR + PR2, data = df)
c2 <- signif(coef(PolyMod2)[1], digits = 2)
b2 <- signif(coef(PolyMod2)[2], digits = 2)
a2 <- signif(coef(PolyMod2)[3], digits = 2)
#summary(PolyMod2) not shown
# Creating Graph
timevalues <- seq(0, 57, 0.1)
predictedcountsPoly <- predict(PolyMod2,list(PR=timevalues, PR2=timevalues^2))
plot(df$PR, df$Dollars, xlab = "PR", ylab = "Dollars", pch = 20)
lines(timevalues, predictedcountsPoly, col = "darkgreen", lwd = 3)
This model has an r^2 = 0.6477, which is a slightly better fit than the linear model. If I had to make recommended cap hit brackets for this, they would look like:
#Table of PolyMod salary brackets
ranges <- c("Below 14.9", "15-19.9", "20-24.9", "25-29.9", "30-34.9", "35-39.9", "40-44.9", "45-49.9", "50-54.9", "55-59.9", "60 +")
x = c(12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5)
y2 = a2*x^2 + b2*x + c2
table2 <- data.frame(ranges, x, y2)
names(table2) <- c("APR Ranges", "APR Range Midpoints", "Dollars")
table2$Recommendation <- round(table2$Dollars)
print(table2)
##    APR Ranges APR Range Midpoints  Dollars Recommendation
## 1  Below 14.9                12.5  3.11875              3
## 2     15-19.9                17.5  4.00375              4
## 3     20-24.9                22.5  5.29875              5
## 4     25-29.9                27.5  7.00375              7
## 5     30-34.9                32.5  9.11875              9
## 6     35-39.9                37.5 11.64375             12
## 7     40-44.9                42.5 14.57875             15
## 8     45-49.9                47.5 17.92375             18
## 9     50-54.9                52.5 21.67875             22
## 10    55-59.9                57.5 25.84375             26
## 11       60 +                62.5 30.41875             30
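One caveat on comparing the two r^2 values: adding a term to a linear model can never lower plain r^2, so a small bump is expected even if the curve is not really there. A fairer check is adjusted r^2 or an F-test between the nested models. Below is a minimal sketch of that comparison, again on simulated stand-in data rather than the real cap hits. The names `toy`, `lin_fit`, and `quad_fit` are mine, and the simulation coefficients are arbitrary.

```r
# Hypothetical stand-in data with a mild upward curve; not the real cap hits.
set.seed(2)
toy <- data.frame(PR = runif(80, min = 10, max = 60))
toy$Dollars <- 3 - 0.07 * toy$PR + 0.008 * toy$PR^2 + rnorm(80, sd = 3)

lin_fit  <- lm(Dollars ~ PR, data = toy)
# I(PR^2) builds the squared term inside the formula, so predict() only
# needs a PR column in newdata.
quad_fit <- lm(Dollars ~ PR + I(PR^2), data = toy)

# Plain R^2 cannot decrease when a term is added; adjusted R^2 penalises it.
r2 <- c(linear    = summary(lin_fit)$r.squared,
        quadratic = summary(quad_fit)$r.squared)
adj_r2 <- c(linear    = summary(lin_fit)$adj.r.squared,
            quadratic = summary(quad_fit)$adj.r.squared)

# F-test for whether the squared term earns its place.
comparison <- anova(lin_fit, quad_fit)

print(r2)
print(adj_r2)
print(comparison)
```

If the `Pr(>F)` value in the second row is small and adjusted r^2 also goes up, the quadratic term is probably doing real work rather than just soaking up noise.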
Sir Tibbetts, do as you please with my impartial analysis. I think either of the recommendations would be fine, and I’d lean slightly towards the polynomial model. At the end of the day, Franchise B players, I hope my thorough analysis enhances how much fun all of you have running up and down the court chasing a basketball around like a bunch of 10-year-olds in a soccer game.
BJAX OUT