Quarterback data and Pitcher data

Let’s start by reading in the data and loading the packages you’ll need:

qb <- read.csv('/home/rstudioshared/shared_files/data/qbstats.csv')

dips <- read.csv('/home/rstudioshared/shared_files/data/DIPS.csv')

library(dplyr); library(ggplot2)

There are three things you’ll want to be able to do in order to investigate this data:

1. Make Scatterplots

Notice that the last part of the code below adds a best-fit line:

ggplot(qb, aes(Ym2_passes, Ym2_completions)) + geom_point() + geom_smooth(method="lm")
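The line drawn by geom_smooth(method="lm") is an ordinary least-squares fit, so its slope and intercept can also be read off with lm(). The sketch below uses a small invented data frame (the name toy and its values are made up for illustration, not taken from qbstats.csv); swap in qb to try it on the real data.

```r
# Toy stand-in for the qb data; these numbers are invented for illustration.
toy <- data.frame(
  Ym2_passes      = c(100, 200, 300, 400, 500),
  Ym2_completions = c(55, 125, 180, 250, 300)
)

# Fit the same straight line that geom_smooth(method = "lm") would draw.
fit <- lm(Ym2_completions ~ Ym2_passes, data = toy)

# The slope is the estimated number of extra completions per extra pass.
slope <- unname(coef(fit)["Ym2_passes"])
slope
```

On the real data, a slope like this would roughly estimate the completion rate.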

2. Filter Data

Here, we filter the data and then plot it:

qb %>% filter(Ym2_passes > 100) %>%
  ggplot(aes(Ym2_tds, Ym2_ints)) + geom_point() + geom_smooth(method="lm")

3. Find Correlations

Notice that below, we filter the data before we find a correlation. We could, of course, leave that step out. One interesting question is how correlations change when you exclude different segments of the data.

qb %>% filter(Ym2_passes > 100) %>%
  summarize(cor(Ym2_tds, Ym2_ints))

##   cor(Ym2_tds, Ym2_ints)
## 1              0.4602294
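One way to see how a correlation changes as segments of the data are excluded is to repeat the calculation at several cutoffs. The following is a minimal sketch on invented toy data; the helper cor_above and the cutoffs 0, 100, and 300 are hypothetical choices for illustration, and qb can be passed in place of toy.

```r
library(dplyr)

# Toy stand-in for the qb data; the values are invented for illustration only.
toy <- data.frame(
  Ym2_passes = c(20, 60, 120, 250, 400, 480, 520, 560),
  Ym2_tds    = c(0, 3, 5, 14, 22, 30, 26, 35),
  Ym2_ints   = c(2, 1, 6, 9, 12, 10, 15, 11)
)

# Hypothetical helper: correlation of TDs and INTs among rows above a cutoff.
cor_above <- function(data, min_passes) {
  data %>%
    filter(Ym2_passes > min_passes) %>%
    summarize(cutoff = min_passes,
              n = n(),
              r = cor(Ym2_tds, Ym2_ints))
}

# One row per cutoff, so the correlations can be compared side by side.
cutoffs <- c(0, 100, 300)
results <- bind_rows(lapply(cutoffs, function(m) cor_above(toy, m)))
results
```

Keeping the row count n next to each correlation makes it easier to tell when a change comes from having very few players left.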

Goal:

Pick one of the two data sets, explore the relationships between its columns, and describe what you find. You should do this in an R Markdown file, which you can later “Knit” into a Word document. Your report should include graphs of the relationships you describe, and you will likely want to include calculations or correlations. What does this data tell you about quarterbacks (or pitchers)? You don’t need to explore all or even most of the relationships. Just pick out what you find interesting.

Note:

Both data sets have data from players in consecutive seasons. “Ym1” and “Ym2” stand for “Y minus 1” and “Y minus 2” and denote that the data is from one or two years prior to year Y.
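Because each row pairs a player's consecutive seasons, a year-to-year correlation compares the same statistic across the two years. The sketch below uses invented toy data and assumes a column named Ym1_tds exists alongside Ym2_tds; that name is a guess, so check names(qb) or names(dips) for the actual columns.

```r
library(dplyr)

# Toy data, invented for illustration: one row per player.
# Ym1_tds is an assumed column name; check names(qb) for the real ones.
toy <- data.frame(
  Ym2_tds = c(12, 25, 8, 30, 18, 21),
  Ym1_tds = c(15, 22, 10, 27, 14, 24)
)

# Correlation between the same stat in consecutive seasons.
year_to_year <- toy %>%
  summarize(r = cor(Ym2_tds, Ym1_tds))
year_to_year
```

A value near 1 would suggest the statistic carries over from one season to the next, while a value near 0 would suggest it does not.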

Year-to-Year Correlations

Sports Data Science

12/5/2016
