Due: Monday, October 17th

Please use R and dplyr to answer the questions below on a separate sheet of paper.

RStudio Server: http://rstudio.saintannsny.org:8787/

Useful Labs:

You may want to look over the following labs to see how you answered similar questions in the past.

Data Transformation: http://rpubs.com/jcross/data_transformation_lahman

More Data Transformation: http://rpubs.com/jcross/data_transformation2

Grouping and Summarizing: http://rpubs.com/jcross/nba_play_by_play

PITCHf/x Data

# Reading in data on pitches from MLB 2016 regular season
p <- read.csv('/home/rstudioshared/shared_files/data/pitchdata2016.csv')

# loading the dplyr library
library(dplyr)

# looking at the first six rows of data
head(p)

The code above loads pitch data from the 2016 MLB regular season. If you’re interested, a good primer on PITCHf/x is available at https://fastballs.wordpress.com/2010/04/18/a-pitchfx-primer/, but here’s an even briefer description of several of the columns:

Questions

Please use dplyr functions (filter, group_by, summarize, top_n, etc.) to answer the following questions.
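As a reminder of how these functions fit together, here is a minimal sketch of the group-then-summarize pattern. It deliberately uses R's built-in mtcars data rather than the pitch data, so the column names (cyl, mpg, am, gear) and the result name cyl_summary are illustrative assumptions and not part of the assignment.

```r
# Illustrative only: built-in mtcars data, not the pitch data
library(dplyr)

# Median miles per gallon for each cylinder count
cyl_summary <- mtcars %>%
  group_by(cyl) %>%                      # one group per cylinder count
  summarize(median_mpg = median(mpg),    # median within each group
            n_cars = n()) %>%            # number of rows in each group
  arrange(desc(median_mpg))              # highest median first

cyl_summary

# Restrict the rows with filter() before grouping and summarizing
mtcars %>%
  filter(am == 1) %>%                    # manual-transmission cars only
  group_by(gear) %>%
  summarize(median_mpg = median(mpg))
```

The same shape (optionally filter, then group_by, then summarize, then sort or pick the top rows) applies to each question below; only the data frame and column names change.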

  1. Find the median start speed for each pitch type (pitch_type). Write the top 3 and bottom 3 pitch types by median start speed.

  2. Find the median start speed for every ball/strike count. Which count has the highest median start speed? Which count has the lowest?

  3. Limiting the data to pitches in the fastball group (pitch_group == "fastball"), in what inning do pitchers throw the fastest?

BONUS: When you look for the innings in which pitchers throw the hardest and the softest in question #3, your results will include some extra innings that are rarely reached. Try answering question #3 again, this time limiting your results to innings in which at least 500 fastballs were thrown.