In 2012 and 2013, there were 10 teams in the MLB playoffs: the six division winners (the team with the most wins in each of the six baseball divisions) and four “wild card” teams. The playoffs start with the wild card round, played among the four wild card teams; the two winners advance, leaving 8 teams. These eight teams are then paired off and each pair plays a series of games. The four winners are paired again and play to determine who will meet in the World Series. We can assign rankings to the teams as follows:
Rank 1: the team that won the World Series
Rank 2: the team that lost the World Series
Rank 3: the two teams that lost to the teams in the World Series
Rank 4: the four teams that made it past the wild card round, but lost to the above four teams
Rank 5: the two teams that lost the wild card round
In your R console, create a corresponding rank vector by typing teamRank = c(1,2,3,3,4,4,4,4,5,5)
# First we create the vector teamRank
teamRank = c(1,2,3,3,4,4,4,4,5,5)
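The same vector can also be built from the group sizes in the ranking scheme rather than typed out by hand. The following is only a sketch: the names groupSizes and teamRankAlt are made up for illustration and are not part of the exercise.

```r
# Number of teams holding each rank (ranks 1 to 5), read off the scheme above.
# groupSizes and teamRankAlt are hypothetical names used only in this sketch.
groupSizes = c(1, 1, 2, 4, 2)

# Repeat each rank as many times as there are teams holding it
teamRankAlt = rep(1:5, times = groupSizes)
print(teamRankAlt)

# Confirm that it matches the hand-typed vector element by element
print(all(teamRankAlt == c(1,2,3,3,4,4,4,4,5,5)))
```

Building the vector this way makes it easy to check that the group sizes add up to the 10 playoff teams.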
In this quick question, we’ll see how well these rankings correlate with the regular season wins of the teams. In 2012, the ranking of the teams and their regular season wins were as follows:
Rank 1: San Francisco Giants (Wins = 94)
Rank 2: Detroit Tigers (Wins = 88)
Rank 3: New York Yankees (Wins = 95), and St. Louis Cardinals (Wins = 88)
Rank 4: Baltimore Orioles (Wins = 93), Oakland A’s (Wins = 94), Washington Nationals (Wins = 98), Cincinnati Reds (Wins = 97)
Rank 5: Texas Rangers (Wins = 93), and Atlanta Braves (Wins = 94)
Create a vector in R called wins2012, that has the wins of each team in 2012, in order of rank (the vector should have 10 numbers).
# Here we create the vector; it is named Wins2012 (capital W) below, and R names are case-sensitive
Wins2012 = c(94,88,95,88,93,94,98,97,93,94)
# Let us find the correlation between Rank and the number of wins
cor(teamRank,Wins2012)
## [1] 0.3477129
cor.test(teamRank,Wins2012)
##
## Pearson's product-moment correlation
##
## data: teamRank and Wins2012
## t = 1.0489, df = 8, p-value = 0.3249
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3609319 0.8018015
## sample estimates:
## cor
## 0.3477129
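Since the p-value (0.3249) is greater than alpha (0.05, matching the 95 percent confidence interval above), we fail to reject the null hypothesis that the true correlation is 0. The 2012 correlation is not statistically significant.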
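To see where the t statistic and p-value in the cor.test output come from, they can be recomputed from the correlation itself using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. This is a minimal sketch of that calculation; the names r, n, tStat and pValue are chosen here for illustration.

```r
teamRank = c(1,2,3,3,4,4,4,4,5,5)
Wins2012 = c(94,88,95,88,93,94,98,97,93,94)

# Sample Pearson correlation and sample size
r = cor(teamRank, Wins2012)
n = length(teamRank)

# t statistic for the null hypothesis that the true correlation is 0
tStat = r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
pValue = 2 * pt(-abs(tStat), df = n - 2)

print(c(t = tStat, df = n - 2, p = pValue))
```

The printed values should agree with the t, df and p-value lines reported by cor.test above.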
In 2013, the ranking of the teams and their regular season wins were as follows:
Rank 1: Boston Red Sox (Wins = 97)
Rank 2: St. Louis Cardinals (Wins = 97)
Rank 3: Los Angeles Dodgers (Wins = 92), and Detroit Tigers (Wins = 93)
Rank 4: Tampa Bay Rays (Wins = 92), Oakland A’s (Wins = 96), Pittsburgh Pirates (Wins = 94), and Atlanta Braves (Wins = 96)
Rank 5: Cleveland Indians (Wins = 92), and Cincinnati Reds (Wins = 90)
Create another vector in R called wins2013, that has the wins of each team in 2013, in order of rank (the vector should have 10 numbers). What is the correlation between teamRank and wins2013?
teamrank2013 = c(1,2,3,3,4,4,4,4,5,5)
wins2013 = c(97,97,92,93,92,96,94,96,92,90)
cor(teamrank2013,wins2013)
## [1] -0.6556945
Because rank 1 is the best finish, this negative correlation suggests that, among teams that had already made the 2013 playoffs, those with more regular season wins tended to finish higher in the postseason.
cor.test(teamrank2013,wins2013)
##
## Pearson's product-moment correlation
##
## data: teamrank2013 and wins2013
## t = -2.4563, df = 8, p-value = 0.03955
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.90974104 -0.04439732
## sample estimates:
## cor
## -0.6556945
In this case the p-value (0.03955) is less than alpha (0.05), so we reject the null hypothesis: the correlation is statistically significant. Rank and wins are negatively correlated (about -0.66), meaning that teams with more wins tended to have better (lower-numbered) ranks.
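The two years can be compared side by side by pulling the estimate and p-value out of each cor.test result. The sketch below assumes alpha = 0.05 (consistent with the 95 percent confidence intervals reported above); the names alpha, test2012, test2013 and summaryTable are illustrative only.

```r
teamRank = c(1,2,3,3,4,4,4,4,5,5)
Wins2012 = c(94,88,95,88,93,94,98,97,93,94)
wins2013 = c(97,97,92,93,92,96,94,96,92,90)

# Assumed significance level; the write-up refers to alpha without stating it
alpha = 0.05

# cor.test returns a list; estimate and p.value are two of its components
test2012 = cor.test(teamRank, Wins2012)
test2013 = cor.test(teamRank, wins2013)

summaryTable = data.frame(
  year   = c(2012, 2013),
  cor    = unname(c(test2012$estimate, test2013$estimate)),
  pValue = c(test2012$p.value, test2013$p.value)
)

# Flag the years whose correlation is significant at the assumed alpha
summaryTable$significant = summaryTable$pValue < alpha
print(summaryTable)
```

With these inputs only the 2013 row is flagged as significant, which matches the two conclusions reached above.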