Today you will practice merging data frames with the inner and outer join functions in package dplyr. To get started, load packages tidyverse and Lahman.
Package Lahman has numerous data frames about Major League Baseball. Type help(package = "Lahman") in your Console to see everything available.
You will work with data frames in package Lahman. When needed, use R’s help to understand the variables in a given data frame. For example, ?Salaries gives a short description of each variable in data frame Salaries.
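As a starting point, the setup described above might look like the following. This is a minimal sketch that assumes both packages are already installed; glimpse() is an optional extra that is not required by the lab.

```r
# Load the packages used throughout the lab
# (run install.packages(c("tidyverse", "Lahman")) once if they are missing)
library(tidyverse)
library(Lahman)

# List everything the Lahman package provides
help(package = "Lahman")

# Read the documentation for one data frame, then look at its columns
?Salaries
glimpse(Salaries)
```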
Select three data frames from package Lahman. Identify what variables are in common between any pair of the three data frames, and identify what variables are in common between all three data frames. What are the primary keys for each data frame?
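One possible way to approach this is to compare column names with intersect() and to test a candidate key by counting how often each combination of its values occurs. The sketch below uses three small made-up tables (players, batting, awards) rather than real Lahman data, so the names and values are illustrative assumptions only.

```r
library(dplyr)

# Toy tables with made-up values (not taken from Lahman)
players <- tibble(playerID = c("a01", "b02", "c03"),
                  birthYear = c(1980, 1985, 1990))
batting <- tibble(playerID = c("a01", "a01", "b02"),
                  yearID   = c(2015, 2016, 2016),
                  HR       = c(10, 12, 7))
awards  <- tibble(playerID = c("a01", "b02"),
                  yearID   = c(2016, 2016),
                  awardID  = c("MVP", "Gold Glove"))

# Variables shared by one pair of tables
common_pair <- intersect(names(batting), names(awards))

# Variables shared by all three tables
common_all <- Reduce(intersect,
                     list(names(players), names(batting), names(awards)))

# A set of variables is a primary key only if no combination repeats.
# Any rows returned here are duplicated key values.
dup_single <- batting %>% count(playerID) %>% filter(n > 1)
dup_pair   <- batting %>% count(playerID, yearID) %>% filter(n > 1)
```

In this toy data, playerID alone repeats in batting, while the pair (playerID, yearID) does not, so only the pair could serve as a key. The same checks can be run on whichever Lahman data frames you select.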
Use data frames Managers and AwardsManagers to reproduce the data frame given below.
Use an outer join.
Filter for the year 2016.
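In dplyr, left_join(), right_join(), and full_join() are the outer joins. The sketch below shows the general pattern of an outer join followed by a filter, using two made-up tables (mgr and awd) that only loosely mimic the shape of the real data; it is not a solution to the exercise.

```r
library(dplyr)

# Toy tables with made-up values (hypothetical, not from Lahman)
mgr <- tibble(playerID = c("m1", "m2", "m3"),
              yearID   = c(2016, 2016, 2015),
              teamID   = c("AAA", "BBB", "AAA"))
awd <- tibble(playerID = c("m1", "m3"),
              yearID   = c(2016, 2015),
              awardID  = c("Manager of the Year", "Manager of the Year"))

# A left join keeps every row of mgr; managers without an award get NA
joined <- mgr %>%
  left_join(awd, by = c("playerID", "yearID")) %>%
  filter(yearID == 2016)
```

Here the row for m2 is retained with awardID set to NA, which is the behavior that distinguishes an outer join from inner_join().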
Use data frames Managers, AwardsManagers, and Master to reproduce the data frame given below.
Use a natural outer join.
Use another outer join with argument by.
Filter for only managers who have won an award since 2000.
Choose the appropriate variables and arrange by yearID.
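The difference between a natural join and a join with argument by can be sketched as follows. A natural join omits by, so dplyr matches on every variable the two tables share and prints a message naming them; supplying by restricts the match to the variables you list. The tables mgr, awd, and bio and all their values are made up for illustration.

```r
library(dplyr)

# Toy tables with made-up values (hypothetical, not from Lahman)
mgr <- tibble(playerID = c("m1", "m2"),
              yearID   = c(2001, 2001),
              lgID     = c("AL", "NL"))
awd <- tibble(playerID = "m1",
              yearID   = 2001,
              lgID     = "AL",
              awardID  = "Manager of the Year")
bio <- tibble(playerID  = c("m1", "m2"),
              nameFirst = c("Ann", "Bo"),
              nameLast  = c("Lee", "Ray"))

# Natural outer join: no `by`, so all shared columns are used
# (here playerID, yearID, and lgID)
natural <- left_join(mgr, awd)

# Outer join with an explicit key
with_bio <- left_join(natural, bio, by = "playerID")

# Keep award winners since 2000, pick columns, and sort by year
result <- with_bio %>%
  filter(!is.na(awardID), yearID >= 2000) %>%
  select(nameFirst, nameLast, yearID, awardID) %>%
  arrange(yearID)
```

If Master is not found in your installed version of Lahman, note that recent releases of the package name this table People.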
Use data frames Teams and Salaries to reproduce the plot below. Bar colors represent each team’s primary color.
Create a data frame of the World Series winners that has variables teamID and yearID.
Use a filter join with Salaries to obtain the salary data for each player who played for the World Series winner in the given year.
Some layers used in ggplot(): geom_bar(), geom_text(), scale_x_discrete(), scale_fill_manual().
MLB team hex colors: c("#0C2340", "#5F259F", "#003263", "#FF6600", "#BD3039", "#27251F", "#C41E3A", "#BD3039", "#E81828", "#0C2340", "#FD5A1E", "#C41E3A", "#FD5A1E", "#BD3039", "#FD5A1E", "#004687", "#0E3386")
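A filter join such as semi_join() keeps the rows of the first table that have a match in the second, without adding any columns. The sketch below illustrates the idea and one way the listed ggplot layers might fit together. The tables winners and pay, their values, and the choice to plot total payroll per year are all assumptions for illustration; the target plot may differ.

```r
library(dplyr)
library(ggplot2)

# Toy tables with made-up values (hypothetical, not from Lahman)
winners <- tibble(teamID = c("AAA", "BBB"),
                  yearID = c(2000, 2001))
pay <- tibble(teamID   = c("AAA", "AAA", "BBB", "BBB", "AAA"),
              yearID   = c(2000, 2000, 2000, 2001, 2001),
              playerID = c("p1", "p2", "p3", "p4", "p5"),
              salary   = c(1e6, 2e6, 3e6, 4e6, 5e6))

# Filter join: keep salary rows whose (teamID, yearID) appears in winners
winner_pay <- semi_join(pay, winners, by = c("teamID", "yearID"))

# Total payroll for each winning team
payroll <- winner_pay %>%
  group_by(yearID, teamID) %>%
  summarise(total = sum(salary), .groups = "drop")

# One bar per year, filled with a manually chosen color per bar
p <- ggplot(payroll, aes(x = factor(yearID), y = total / 1e6,
                         fill = factor(yearID))) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  geom_text(aes(label = teamID), vjust = -0.5) +
  scale_x_discrete(name = "Year") +
  scale_fill_manual(values = c("#0C2340", "#5F259F")) +
  labs(y = "Total payroll (millions of USD)")
```

Unnamed values passed to scale_fill_manual() are assigned in the order of the factor levels, so the order of the hex colors should match the order of the bars.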
Adjust your plot from Question 4 for inflation, using the year 2000 as the baseline. Comment on the differences between the plots.
| Year | USD Value | Inflation Rate |
|---|---|---|
| 2000 | $1.00 | 3.36% |
| 2001 | $1.03 | 2.85% |
| 2002 | $1.04 | 1.58% |
| 2003 | $1.07 | 2.28% |
| 2004 | $1.10 | 2.66% |
| 2005 | $1.13 | 3.39% |
| 2006 | $1.17 | 3.23% |
| 2007 | $1.20 | 2.85% |
| 2008 | $1.25 | 3.84% |
| 2009 | $1.25 | -0.36% |
| 2010 | $1.27 | 1.64% |
| 2011 | $1.31 | 3.16% |
| 2012 | $1.33 | 2.07% |
| 2013 | $1.35 | 1.46% |
| 2014 | $1.37 | 1.62% |
| 2015 | $1.38 | 0.12% |
| 2016 | $1.39 | 1.26% |
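One way to apply the table is to join it to the yearly totals and divide by the USD value. This is a sketch under the assumption that the USD Value column gives the number of dollars in that year equivalent to $1.00 in 2000. The inflation tibble copies three rows from the table above; the payroll figures and the column names usd_value and total_2000usd are made up for illustration.

```r
library(dplyr)

# Three rows copied from the table above
inflation <- tibble(yearID    = c(2000, 2008, 2016),
                    usd_value = c(1.00, 1.25, 1.39))

# Made-up nominal totals (hypothetical, not real payroll data)
payroll <- tibble(yearID = c(2000, 2008, 2016),
                  total  = c(100e6, 125e6, 139e6))

# Express each total in year-2000 dollars
adjusted <- payroll %>%
  left_join(inflation, by = "yearID") %>%
  mutate(total_2000usd = total / usd_value)
```

With these made-up numbers, all three totals come out to the same amount in 2000 dollars, which shows how nominal growth can disappear once inflation is removed.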
Lahman, S. (2017) Lahman’s Baseball Database, 1871-2016.
RStudio Cheatsheets