This Coding Problem Set 2 is to be completed after you have worked through Coding Lecture 2. Take some time to read through the notes from Coding Lecture 2. As in Coding Problem Set 1, you should go through each code block in the Coding Lecture and see if you can explain what all of the code does.

Payroll and Winning Percentage in the MLB

In the Stat Lecture: Tools for Comparisons, Prof. Wyner discussed the relationship between a team’s payroll and its winning percentage. We will create plots from his analysis in the following problems using the dataset mlb_relative_payrolls.csv, which you should download into the “data” folder of your working directory. You should save all of the code for this analysis in an R script called “ps2_mlb_payroll.R”.

  1. Read the data in from the csv file with read_csv() and save it as a tbl called “relative_payroll”.

Troubleshooting: Make sure your working directory is correctly set to the ‘data’ folder on your desktop. Ensure that the payroll file is in that folder and that the file name matches the one in your read_csv() call.
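One way to work through these checks is sketched below: print the working directory, build the path to the file, and read it only if R can actually find it. The sketch assumes the csv sits in a “data” folder inside the working directory; if your working directory is the “data” folder itself, the path is just the file name. The name payroll_path is only an illustrative helper, not part of the assignment.

```r
library(readr)

# Where is R currently looking for files?
getwd()

# Assumed location: a "data" folder inside the working directory
payroll_path <- file.path("data", "mlb_relative_payrolls.csv")

if (file.exists(payroll_path)) {
  relative_payroll <- read_csv(payroll_path)
} else {
  # Show what R can actually see, to spot a misplaced or misspelled file
  message("File not found: ", payroll_path)
  print(list.files("data"))
}
```

If the file is not found, compare the printed file list character by character with the name in your read_csv() call.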

  1. Make a histogram of team winning percentages with ggplot(). Play around with different binwidths.

Troubleshooting: Are your aesthetic mappings (aes) correct? Is x defined (a histogram needs only x)? Did you tell ggplot which tbl to use?
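To see what the binwidth argument changes, here is a minimal sketch on a tiny made-up tbl. The tbl toy, the column win_pct, and all of its values are placeholders invented for illustration; they are not taken from mlb_relative_payrolls.csv, so substitute your own tbl and column name.

```r
library(ggplot2)
library(tibble)

# Made-up placeholder values, only so the sketch runs on its own
toy <- tibble(win_pct = c(0.35, 0.42, 0.47, 0.50, 0.52, 0.55, 0.61, 0.66))

# A histogram maps only x; ggplot counts the rows in each bin for the y-axis
p_wide <- ggplot(toy, aes(x = win_pct)) +
  geom_histogram(binwidth = 0.10)

# Same data, narrower bins: more bars, each holding fewer observations
p_narrow <- ggplot(toy, aes(x = win_pct)) +
  geom_histogram(binwidth = 0.02)

p_wide
p_narrow
```

The binwidth is measured in the units of the x variable, so a sensible value depends on the range of the data you are plotting.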

  1. Make a histogram of the relative payrolls, using geom_histogram().

  2. Make a scatterplot with geom_point(). Put relative payroll on the horizontal axis (x-axis) and winning percentage on the vertical axis (y-axis).

  3. Without executing the code below, see if you can figure out what it does and what plot it will generate:

relative_payroll %>%
  ggplot(aes(x = Year, y = Team_Payroll)) +
  geom_point()

  1. Execute the code above. What can you say about how team payrolls have evolved over time?

  2. Now make a second, related plot that visualizes how relative payrolls have evolved over time.

  3. Using the labs() function, add an appropriate title to the plot above and relabel its y-axis.
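As a reminder of how labs() attaches to a plot, here is a minimal sketch on a made-up tbl. The column names Year and Team_Payroll mirror the ones in the code above, but the tbl toy, its values, and the label text are placeholders invented for illustration; choose your own wording for the real plot.

```r
library(ggplot2)
library(tibble)

# Made-up placeholder values, only so the sketch runs on its own
toy <- tibble(
  Year = c(2000, 2005, 2010, 2015),
  Team_Payroll = c(50, 70, 90, 120)
)

# labs() is added like any other layer; each argument names what it relabels
p <- ggplot(toy, aes(x = Year, y = Team_Payroll)) +
  geom_point() +
  labs(
    title = "Placeholder title",
    y = "Placeholder y-axis label"
  )

p
```

Any label you do not set in labs(), such as x here, keeps its default, which is the name of the mapped column.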