This data was retrieved from http://www.amstat.org/publications/jse/v6n2/datasets.watnik.html. It provides baseball statistics for players, including their salaries, from the 1991-1992 season.

Here is a summary of the data:

##     Pname                Sal             BA             OBP       
##  Length:337         Min.   : 109   Min.   :0.063   Min.   :0.063  
##  Class :character   1st Qu.: 230   1st Qu.:0.238   1st Qu.:0.297  
##  Mode  :character   Median : 740   Median :0.260   Median :0.323  
##                     Mean   :1249   Mean   :0.258   Mean   :0.324  
##                     3rd Qu.:2150   3rd Qu.:0.281   3rd Qu.:0.354  
##                     Max.   :6100   Max.   :0.457   Max.   :0.486

The tale of two box plots

My objective for this analysis was to see whether player salaries correlate with “on base percentage” (OBP) or “batting average” (BA).
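
One way to put a number on that question is a correlation coefficient. The sketch below is only an illustration: the `toy` data frame and its values are invented for the example and are not taken from the real data set, although the column names `Sal`, `OBP` and `BA` match the summary above. With the real data, `obp` would take the place of `toy`.

```r
# Toy data, invented for illustration only; NOT values from the real data set.
toy <- data.frame(
  Sal = c(100, 300, 700, 1200, 2500, 6000),
  OBP = c(0.290, 0.300, 0.320, 0.330, 0.350, 0.380),
  BA  = c(0.230, 0.240, 0.260, 0.255, 0.280, 0.300)
)

# Pearson correlation measures linear association between salary and each statistic.
r_obp <- cor(toy$Sal, toy$OBP)
r_ba  <- cor(toy$Sal, toy$BA)

# Spearman correlation uses ranks, so it is less sensitive to a few very large salaries.
rho_obp <- cor(toy$Sal, toy$OBP, method = "spearman")
rho_ba  <- cor(toy$Sal, toy$BA, method = "spearman")

print(c(pearson_obp = r_obp, pearson_ba = r_ba,
        spearman_obp = rho_obp, spearman_ba = rho_ba))
```

Because salaries are strongly right-skewed (the median is well below the mean in the summary), the rank-based Spearman version may be the more informative of the two.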

library(ggplot2)
ggplot(obp, aes(x = Sal, y = OBP)) + geom_boxplot()

ggplot(obp, aes(x = Sal, y = BA)) + geom_boxplot()
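
A box plot summarises a distribution per group, so with a continuous x variable such as `Sal` and no grouping, `geom_boxplot()` draws a single box. One possible variation, sketched here under assumptions, is to bin salary into groups first and draw one box per group. The `toy` data, the `SalGroup` column and the two-way split at the median are all invented for this example and are not part of the original analysis.

```r
# Toy data, invented for illustration only; NOT values from the real data set.
toy <- data.frame(
  Sal = c(100, 150, 300, 450, 700, 900, 1200, 2500, 4000, 6000),
  OBP = c(0.280, 0.295, 0.300, 0.310, 0.320, 0.318, 0.330, 0.350, 0.360, 0.380)
)

# Split salary at its median so each group holds half of the players.
breaks <- quantile(toy$Sal, probs = c(0, 0.5, 1))
toy$SalGroup <- cut(toy$Sal, breaks = breaks, include.lowest = TRUE,
                    labels = c("lower half", "upper half"))

# Median OBP per salary group, the same centre line a box plot would show.
group_medians <- tapply(toy$OBP, toy$SalGroup, median)
print(group_medians)

# Build the grouped box plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = SalGroup, y = OBP)) + geom_boxplot()
}
```

Calling `print(p)` draws the plot. With the real data, quartiles (`probs = seq(0, 1, 0.25)` and four labels) would probably give a more useful picture than a two-way split.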

The chicken or the egg dilemma

These two line charts also compare OBP and batting average against salary. Both charts trend upward, suggesting that higher salaries are positively associated with higher batting average and OBP.
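
An upward trend can also be checked with a simple linear fit, which reports the slope directly instead of leaving it to the eye. This is a minimal sketch, not the method used for the charts in this post: the `toy` values are invented for illustration, and a straight line is only an assumption about the shape of the relationship.

```r
# Toy data, invented for illustration only; NOT values from the real data set.
toy <- data.frame(
  Sal = c(100, 300, 700, 1200, 2500, 6000),
  OBP = c(0.290, 0.300, 0.320, 0.330, 0.350, 0.380)
)

# Fit OBP as a linear function of salary.
fit <- lm(OBP ~ Sal, data = toy)

# A positive slope means OBP tends to rise as salary rises.
slope <- unname(coef(fit)["Sal"])

# R-squared is the share of OBP variance the straight line accounts for.
r_squared <- summary(fit)$r.squared

print(c(slope = slope, r_squared = r_squared))

# Optional scatter plot with the fitted line, built only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = Sal, y = OBP)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE)
}
```

A positive slope shows association only; it says nothing about which variable drives the other.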

One objection to my analysis is that the direction of cause is unclear: teams may be paying players higher salaries because of their OBP and BA performance, or BA and OBP may have increased because of the higher salary.

My theory is that players are being paid more because of their performance. To investigate further, we would need each player's statistics and salary increases from year to year throughout their career.

ggplot(obp, aes(x = Sal, y = OBP)) + geom_line()

ggplot(obp, aes(x = Sal, y = BA)) + geom_line()
