Load the data, then calculate last week's salary. I did this because I couldn't find historical salaries and wanted to show an example of how I envision creating a crosswalk of z-scores:
require(xlsx)
## Loading required package: xlsx
## Loading required package: rJava
## Loading required package: xlsxjars
table <- read.table("Last2WeeksWR.txt", header = TRUE, sep = ";")
## Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
## na.strings, : EOF within quoted string
## Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
## na.strings, : number of items read is not a multiple of the number of
## columns
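One common cause of an "EOF within quoted string" warning is a quote character inside a field, for example an apostrophe in a player name, because `read.table` treats both `"` and `'` as quote characters by default. Whether that is what happened with Last2WeeksWR.txt is an assumption. The sketch below uses a made-up sample file (the rows and the `wr` name are hypothetical) to show how `quote = ""` reads such a field literally:

```r
# Hypothetical sample in the same ";"-separated layout as the real file.
# The names and numbers are made up for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Name;Salary;Salary.Change",
             "O'Example, Pat;5000;-200",
             "Sample, Sam;4000;100"), tmp)

# quote = "" disables quote handling, so the apostrophe is ordinary text.
wr <- read.table(tmp, header = TRUE, sep = ";", quote = "",
                 stringsAsFactors = FALSE)
nrow(wr)
wr$Name
```

If the warning goes away with this setting, it is worth checking whether the row count changes as well, since rows swallowed by an open quote would reappear.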
table$LastWeekSalary <- table$Salary - table$Salary.Change
newData_origin <- data.frame(table$Salary, table$LastWeekSalary, table$Name)
names(newData_origin) <- c("salary", "lastWeek","Name")
head(newData_origin)
## salary lastWeek Name
## 1 9100 9300 Jones, Julio
## 2 8700 9200 Beckham Jr., Odell
## 3 8600 7400 Hopkins, DeAndre
## 4 8200 6600 Edelman, Julian
## 5 8100 8700 Bryant, Dez
## 6 7900 8900 Brown, Antonio
My data is ordered by salary, from greatest to smallest.
tail(newData_origin$lastWeek)
## [1] 3000 3000 3000 3000 3000 NA
newData <- na.omit(newData_origin)
## [1] 3000
Prediction approach:
# Standardize salary to z-scores: (salary - mean) / sd
trimDF$Scale <- scale(trimDF$salary)
mod <- lm(Scale ~ salary , data = trimDF)
summary(mod)
## Warning in summary.lm(mod): essentially perfect fit: summary may be
## unreliable
##
## Call:
## lm(formula = Scale ~ salary, data = trimDF)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.613e-16 -6.395e-17 -3.306e-17 4.884e-17 1.042e-15
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.011e+00 7.103e-17 -4.239e+16 <2e-16 ***
## salary 6.040e-04 1.353e-20 4.464e+16 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.834e-16 on 66 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 1.992e+33 on 1 and 66 DF, p-value: < 2.2e-16
predictions <- data.frame(
  newData$lastWeek,
  predict.lm(mod, newdata = data.frame(salary = newData$lastWeek)),
  newData$Name
)
names(predictions) <- c("salary", "predictedZscore", "Name")
These are my predicted z-scores, although the more I think about it, the sillier this approach seems. Given the trim, it may be simpler to just convert the minimum salaries to whatever minimum z-score I saw in my trimmed data.
## salary predictedZscore Name
## 1 9300 2.6061457 Jones, Julio
## 2 9200 2.5457443 Beckham Jr., Odell
## 3 7400 1.4585178 Hopkins, DeAndre
## 4 6600 0.9753061 Edelman, Julian
## 5 8700 2.2437369 Bryant, Dez
## 6 8900 2.3645399 Brown, Antonio
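The "essentially perfect fit" warning is expected here: Scale is an exact linear function of salary, so the regression only recovers a slope of 1/sd and an intercept of -mean/sd. A minimal sketch with made-up salaries (not the real data; `salary`, `last_week`, and the other names are hypothetical) suggests the same z-scores can be computed directly, without a model:

```r
# Made-up salaries standing in for trimDF$salary.
salary <- c(9100, 8700, 8600, 6500, 5000, 4000, 3500, 3200)
z <- as.numeric(scale(salary))

# Regressing z on salary gives an exact line:
# slope = 1 / sd(salary), intercept = -mean(salary) / sd(salary).
fit <- lm(z ~ salary)
coef(fit)

# Convert last week's salaries directly with the same mean and sd.
last_week <- c(9300, 7400, 3000)
direct <- (last_week - mean(salary)) / sd(salary)
via_lm <- predict(fit, newdata = data.frame(salary = last_week))

# Both routes agree up to floating-point tolerance.
all.equal(direct, unname(via_lm))
```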
ALTERNATIVE PROBABLY BETTER:
Find the minimum z-score in the trimmed data:
minzScore <- min(trimDF$Scale)
Crosswalk the values in our scaled set (which could be extended to have more rows). Merge the data sets, then set the players who have floor salaries to the minimum z-score.
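# Keep every row of newData_origin; players missing from trimDF get NA.
md <- merge(trimDF, newData_origin, by = "Name", all.y = TRUE)
# Floor-salary players (3000) take the floor salary and the minimum z-score.
floor_rows <- which(md$salary.y == 3000)
md$salary.x[floor_rows] <- min(md$salary.y, na.rm = TRUE)
md$Scale[floor_rows] <- min(md$Scale, na.rm = TRUE)
md <- md[order(-md$Scale), ]
head(md, 20)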
## Name salary.x Scale salary.y lastWeek
## 70 Jones, Julio 9100 2.4853428 9100 9300
## 11 Beckham Jr., Odell 8700 2.2437369 8700 9200
## 57 Hopkins, DeAndre 8600 2.1833354 8600 7400
## 35 Edelman, Julian 8200 1.9417296 8200 6600
## 20 Bryant, Dez 8100 1.8813281 8100 8700
## 16 Brown, Antonio 7900 1.7605252 7900 8900
## 77 Marshall, Brandon 7800 1.7001237 7800 6600
## 3 Allen, Keenan 7700 1.6397222 7700 7100
## 66 Johnson, Calvin 7700 1.6397222 7700 8500
## 37 Fitzgerald, Larry 7400 1.4585178 7400 5500
## 25 Cooper, Amari 6500 0.9149046 6500 6700
## 54 Hilton, T.Y. 6500 0.9149046 6500 7600
## 36 Evans, Mike 6400 0.8545031 6400 7700
## 76 Maclin, Jeremy 6400 0.8545031 6400 6900
## 79 Matthews, Jordan 6400 0.8545031 6400 7200
## 93 Robinson, Allen 6400 0.8545031 6400 5400
## 73 Landry, Jarvis 6200 0.7337002 6200 5600
## 98 Smith, Steve 6200 0.7337002 6200 6400
## 62 Jackson, DeSean 5700 0.4316929 5700 6800
## 18 Brown, John 5500 0.3108899 5500 4500
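Since the construction of trimDF is not shown, here is a small self-contained sketch of the same crosswalk idea on made-up data. The data frames `trim_toy` and `all_toy`, the player names, and the salaries are all hypothetical stand-ins, and the floor value of 3000 is taken from the output above:

```r
# Made-up stand-in for the trimmed data (floor-salary players removed).
trim_toy <- data.frame(Name = c("A", "B", "C"),
                       salary = c(9000, 6000, 4000))
trim_toy$Scale <- as.numeric(scale(trim_toy$salary))

# Made-up stand-in for the full data, including floor-salary players.
all_toy <- data.frame(Name = c("A", "B", "C", "D", "E"),
                      salary = c(9000, 6000, 4000, 3000, 3000),
                      lastWeek = c(9300, 5500, 4200, 3000, 3100))

# all.y = TRUE keeps D and E even though they were trimmed out.
cw <- merge(trim_toy, all_toy, by = "Name", all.y = TRUE)

# Trimmed floor-salary players inherit the minimum observed z-score.
floor_rows <- which(cw$salary.y == 3000)
cw$Scale[floor_rows] <- min(cw$Scale, na.rm = TRUE)
cw[order(-cw$Scale), ]
```

Using `which()` skips rows where the comparison is NA, so only players with a known floor salary are overwritten.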