Z-Score Approximation

The first chunk loads the data and then calculates last week’s salary. I did this because I couldn’t find historical salaries and wanted to show an example of how I envision creating a crosswalk of z-scores:

1. Assemble the non-tail data within a position. I used WRs from the last 2 weeks.
2. Scale those non-tail values. Without the tails the distribution still isn’t normal, but it’s much closer.
3. Fit a predictive model between actual salaries and z-scores.
4. Use that model to approximate all z-scores. I think a model is a good idea because it will extrapolate a value for the floor salaries in any given week and for salaries in between the ones we have seen before. Alternatively, you could trim all but a handful of floor values (maybe fewer than 3 per week of data per position) and then convert every floor value to whatever z-score the floor gets. That may honestly be the more appropriate way to do this, but I didn’t really feel like figuring it out.

require(xlsx)

## Loading required package: xlsx
## Loading required package: rJava
## Loading required package: xlsxjars

table <- read.table("Last2WeeksWR.txt", header = TRUE, sep = ";")

## Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
## na.strings, : EOF within quoted string

## Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
## na.strings, : number of items read is not a multiple of the number of
## columns
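
These two warnings usually mean that read.table hit an unmatched quote character, most often an apostrophe in a name, and read the rest of the file as a single field, so rows may be missing from `table`. That would be consistent with the trailing NA that shows up in the tail() output further down, though I have not confirmed it against the real file. A minimal sketch of the cause and the usual fix (`quote = ""`), using made-up inline rows rather than Last2WeeksWR.txt:

```r
# Made-up rows; "O'Example, Pat" is a hypothetical name with an apostrophe.
raw <- c("Name;Salary;Salary.Change",
         "Jones, Julio;9100;-200",
         "Bryant, Dez;8100;-600",
         "Brown, Antonio;7900;-1000",
         "Allen, Keenan;7700;600",
         "Evans, Mike;6400;-1300",
         "O'Example, Pat;5000;100",
         "Smith, Steve;6200;-200")

# Default quote = "\"'" treats the apostrophe as the start of a quoted string.
bad <- suppressWarnings(read.table(text = raw, header = TRUE, sep = ";"))

# Turning quote handling off reads every row as written.
good <- read.table(text = raw, header = TRUE, sep = ";", quote = "")

nrow(bad)   # fewer rows than the file really has
nrow(good)  # 7
```

If the real file does contain quoted fields that need to be honoured, `quote = "\""` (double quotes only) is the narrower alternative.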

table$LastWeekSalary <- table$Salary - table$Salary.Change
newData_origin <- data.frame(table$Salary, table$LastWeekSalary, table$Name)
names(newData_origin) <- c("salary", "lastWeek","Name")
head(newData_origin)

##   salary lastWeek               Name
## 1   9100     9300       Jones, Julio
## 2   8700     9200 Beckham Jr., Odell
## 3   8600     7400   Hopkins, DeAndre
## 4   8200     6600    Edelman, Julian
## 5   8100     8700        Bryant, Dez
## 6   7900     8900     Brown, Antonio

My data is ordered by salary, from highest to lowest.

tail(newData_origin$lastWeek)

## [1] 3000 3000 3000 3000 3000   NA

newData <- na.omit(newData_origin)

## [1] 3000
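
The chunk that creates `trimDF` does not appear in the rendered output, so its exact definition is unknown. Here is a minimal sketch of one way to do the trim described in the steps at the top, assuming a floor salary of 3000 and keeping only a few floor rows. The helper name `trim_floor`, its `keep` argument, and the toy data are hypothetical and not taken from the original analysis:

```r
# Hypothetical helper: keep every above-floor row plus at most `keep` floor rows.
trim_floor <- function(df, floor_salary = 3000, keep = 3) {
  is_floor   <- df$salary == floor_salary
  floor_rows <- head(which(is_floor), keep)  # first few floor-salary rows
  df[sort(c(which(!is_floor), floor_rows)), , drop = FALSE]
}

# Toy data standing in for newData (not the real salaries).
toy <- data.frame(
  salary = c(9100, 8700, 6400, 4500, 3000, 3000, 3000, 3000, 3000),
  Name   = paste("Player", 1:9)
)

toy_trim <- trim_floor(toy, keep = 2)
nrow(toy_trim)        # 6: four above-floor rows plus two floor rows
min(toy_trim$salary)  # 3000
```

Keeping a couple of floor rows means the floor still gets a z-score, but the pile of minimum-salary players no longer dominates the mean and standard deviation.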

Prediction approach:

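# z-score the trimmed salaries; as.numeric() drops the matrix attributes that scale() returns
trimDF$Scale <- as.numeric(scale(trimDF$salary))
# regress z-score on salary so the fit can be applied to salaries outside trimDF
mod <- lm(Scale ~ salary, data = trimDF)
summary(mod)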

## Warning in summary.lm(mod): essentially perfect fit: summary may be
## unreliable

## 
## Call:
## lm(formula = Scale ~ salary, data = trimDF)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -3.613e-16 -6.395e-17 -3.306e-17  4.884e-17  1.042e-15 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept) -3.011e+00  7.103e-17 -4.239e+16   <2e-16 ***
## salary       6.040e-04  1.353e-20  4.464e+16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.834e-16 on 66 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.992e+33 on 1 and 66 DF,  p-value: < 2.2e-16
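
The "essentially perfect fit" warning is expected here: scale() computes (salary - mean) / sd, which is already an exact linear function of salary, so lm() simply recovers a slope of 1/sd and an intercept of -mean/sd. A small check of that identity on made-up salaries (the vector below is illustrative, not the real data):

```r
# Illustrative salaries only.
salary <- c(9100, 8700, 6400, 5500, 4500, 3000, 3000)

mu    <- mean(salary)
sigma <- sd(salary)

z   <- as.numeric(scale(salary))  # same as (salary - mu) / sigma
fit <- lm(z ~ salary)

# The fitted coefficients match the closed form, up to floating-point error.
all.equal(unname(coef(fit)), c(-mu / sigma, 1 / sigma))  # TRUE

# So any salary, seen or unseen, can be converted without a model:
(7400 - mu) / sigma
```

Read this way, the coefficients reported above imply a trimmed-data mean of roughly 4985 and a standard deviation of roughly 1656.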

# predict a z-score for each player's salary from last week
predictions <- data.frame(
  salary          = newData$lastWeek,
  predictedZscore = predict(mod, newdata = data.frame(salary = newData$lastWeek)),
  Name            = newData$Name
)

These are my predicted z-scores, although the more I think about it, the model may be kind of silly. Given the trim, I could just convert the floor salaries to whatever minimum z-score I saw in my trimmed data.

##   salary predictedZscore               Name
## 1   9300       2.6061457       Jones, Julio
## 2   9200       2.5457443 Beckham Jr., Odell
## 3   7400       1.4585178   Hopkins, DeAndre
## 4   6600       0.9753061    Edelman, Julian
## 5   8700       2.2437369        Bryant, Dez
## 6   8900       2.3645399     Brown, Antonio

ALTERNATIVE PROBABLY BETTER:

Find minimum z-score in trimmed data

minzScore <- min(trimDF$Scale)

Crosswalk the values in our scaled set (which could be extended to have more rows): merge the data sets, then set the players who have floor salaries to the minimum z-score.

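# keep every player from the full data; players trimmed out get NA for salary.x and Scale
md <- merge(trimDF,
            newData_origin,
            by = "Name",
            all.y = TRUE)

# rows at the floor salary (which() skips NA salaries)
isFloor <- which(md$salary.y == 3000)
md$salary.x[isFloor] <- min(md$salary.y, na.rm = TRUE)
md$Scale[isFloor] <- min(md$Scale, na.rm = TRUE)

# sort by z-score, highest first
md <- md[order(-md$Scale), ]
head(md, 20)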

##                  Name salary.x     Scale salary.y lastWeek
## 70       Jones, Julio     9100 2.4853428     9100     9300
## 11 Beckham Jr., Odell     8700 2.2437369     8700     9200
## 57   Hopkins, DeAndre     8600 2.1833354     8600     7400
## 35    Edelman, Julian     8200 1.9417296     8200     6600
## 20        Bryant, Dez     8100 1.8813281     8100     8700
## 16     Brown, Antonio     7900 1.7605252     7900     8900
## 77  Marshall, Brandon     7800 1.7001237     7800     6600
## 3       Allen, Keenan     7700 1.6397222     7700     7100
## 66    Johnson, Calvin     7700 1.6397222     7700     8500
## 37  Fitzgerald, Larry     7400 1.4585178     7400     5500
## 25      Cooper, Amari     6500 0.9149046     6500     6700
## 54       Hilton, T.Y.     6500 0.9149046     6500     7600
## 36        Evans, Mike     6400 0.8545031     6400     7700
## 76     Maclin, Jeremy     6400 0.8545031     6400     6900
## 79   Matthews, Jordan     6400 0.8545031     6400     7200
## 93    Robinson, Allen     6400 0.8545031     6400     5400
## 73     Landry, Jarvis     6200 0.7337002     6200     5600
## 98       Smith, Steve     6200 0.7337002     6200     6400
## 62    Jackson, DeSean     5700 0.4316929     5700     6800
## 18        Brown, John     5500 0.3108899     5500     4500


Michael Schoenfield

October 22, 2015