library('ggplot2')
library('tidyverse')
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ tibble  2.1.3     ✔ purrr   0.3.2
## ✔ tidyr   1.0.0     ✔ dplyr   0.8.3
## ✔ readr   1.3.1     ✔ stringr 1.4.0
## ✔ tibble  2.1.3     ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library('GGally')
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa

Free throw analysis

df <- read.csv('free_throws.csv', header = TRUE, stringsAsFactors = FALSE)
head(df)
str(df)
## 'data.frame':    618019 obs. of  11 variables:
##  $ end_result: chr  "106 - 114" "106 - 114" "106 - 114" "106 - 114" ...
##  $ game      : chr  "PHX - LAL" "PHX - LAL" "PHX - LAL" "PHX - LAL" ...
##  $ game_id   : num  2.61e+08 2.61e+08 2.61e+08 2.61e+08 2.61e+08 ...
##  $ period    : num  1 1 1 1 1 1 1 2 2 2 ...
##  $ play      : chr  "Andrew Bynum makes free throw 1 of 2" "Andrew Bynum makes free throw 2 of 2" "Andrew Bynum makes free throw 1 of 2" "Andrew Bynum misses free throw 2 of 2" ...
##  $ player    : chr  "Andrew Bynum" "Andrew Bynum" "Andrew Bynum" "Andrew Bynum" ...
##  $ playoffs  : chr  "regular" "regular" "regular" "regular" ...
##  $ score     : chr  "0 - 1" "0 - 2" "18 - 12" "18 - 12" ...
##  $ season    : chr  "2006 - 2007" "2006 - 2007" "2006 - 2007" "2006 - 2007" ...
##  $ shot_made : int  1 1 1 0 1 1 1 0 1 1 ...
##  $ time      : chr  "11:45" "11:45" "7:26" "7:26" ...

Data cleaning

* end_result: chr; the two final scores should be extracted into separate numeric columns.
* game_id: num; should be a factor.
* period: num; should be a factor.
* play: long string; it says whether the free throw was made or missed and which attempt it is out of how many were awarded (e.g. "1 of 2").
* playoffs: chr; should be a factor.
* score: chr; should be a factor.
* season: chr; should be a factor.
* time: chr; should be converted to a time class.
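The cleaning steps listed above could look roughly like the base-R sketch below. It is a minimal sketch under assumptions, not the notebook's own code: the helper name clean_free_throws and the new columns end_score_1, end_score_2 and time_sec are hypothetical; the two end_result numbers are assumed to follow the team order in game (e.g. "PHX - LAL"); and instead of a dedicated time class, time is converted to seconds on the game clock, which is a swapped-in choice that is easy to sort and plot. The toy rows reuse values printed by str(df) and summary(df).

```r
# Minimal sketch of the cleaning plan (base R only).
# Assumption: "106 - 114" lists the scores in the same team order as game.
clean_free_throws <- function(df) {
  # Split "106 - 114" into two integer columns (hypothetical names).
  parts <- strsplit(df$end_result, " - ", fixed = TRUE)
  df$end_score_1 <- vapply(parts, function(p) as.integer(p[1]), integer(1))
  df$end_score_2 <- vapply(parts, function(p) as.integer(p[2]), integer(1))

  # Columns the notes above say should be categorical.
  for (col in c("game_id", "period", "playoffs", "score", "season")) {
    df[[col]] <- factor(df[[col]])
  }

  # "11:45" -> 705 seconds; a stand-in for a proper time class.
  clock <- strsplit(df$time, ":", fixed = TRUE)
  df$time_sec <- vapply(clock,
                        function(p) as.integer(p[1]) * 60L + as.integer(p[2]),
                        integer(1))
  df
}

# Toy rows built from values shown by str(df) and summary(df).
toy_raw <- data.frame(
  end_result = c("106 - 114", "106 - 114"),
  game       = c("PHX - LAL", "PHX - LAL"),
  game_id    = c(261031013, 261031013),
  period     = c(1, 1),
  playoffs   = c("regular", "regular"),
  score      = c("0 - 1", "18 - 12"),
  season     = c("2006 - 2007", "2006 - 2007"),
  time       = c("11:45", "7:26"),
  stringsAsFactors = FALSE
)
clean <- clean_free_throws(toy_raw)
str(clean)
```

On the full data the call would simply be clean_free_throws(df).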

I don’t know exactly what shot_made represents. Let’s have a look:

head(df[, c("shot_made", "play")])

So there is one row per free throw: even when a foul awards two free throws, there are two rows, one for each attempt. shot_made therefore tells us whether each individual free throw was made (1) or missed (0).
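This reading of shot_made can be checked against the play text, which also gives the attempt number. The sketch below is an illustration, not the notebook's code: the helper name parse_play is hypothetical, and it assumes play strings follow the "makes/misses free throw X of Y" wording seen in the first rows; strings worded differently get NA rather than a guessed value. The toy input is the four plays and shot_made values printed by str(df).

```r
# Hypothetical helper: pull made/missed and "X of Y" out of the play text.
parse_play <- function(play) {
  pattern <- ".*free throw ([0-9]+) of ([0-9]+).*"
  matched <- grepl(pattern, play)

  # Only convert strings that match, so non-matching ones stay NA silently.
  attempt  <- rep(NA_integer_, length(play))
  of_total <- rep(NA_integer_, length(play))
  attempt[matched]  <- as.integer(sub(pattern, "\\1", play[matched]))
  of_total[matched] <- as.integer(sub(pattern, "\\2", play[matched]))

  made_from_text <- ifelse(grepl(" makes ", play, fixed = TRUE), 1L,
                    ifelse(grepl(" misses ", play, fixed = TRUE), 0L, NA_integer_))

  data.frame(made_from_text, attempt, of_total)
}

plays <- c("Andrew Bynum makes free throw 1 of 2",
           "Andrew Bynum makes free throw 2 of 2",
           "Andrew Bynum makes free throw 1 of 2",
           "Andrew Bynum misses free throw 2 of 2")
shot_made <- c(1L, 1L, 1L, 0L)

parsed <- parse_play(plays)
all(parsed$made_from_text == shot_made)  # TRUE
```

On the full data, mean(parse_play(df$play)$made_from_text == df$shot_made, na.rm = TRUE) would show how often the text and the flag agree.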

Ideas to explore

* Ranking of players by number of free throws attempted
* Ranking of players by free throw success rate
* Quarter with the most free throws: expected to be the last quarter
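The three ideas could start from simple group summaries. Below is a minimal base-R sketch, assuming the player, shot_made and period columns shown by str(df). The helper rank_players and its min_attempts cut-off are hypothetical: the cut-off keeps players with only a handful of attempts from topping the success-rate ranking. The toy rows are made up purely to exercise the code. Since summary(df) shows period going up to 8, values above 4 are presumably overtime periods and may need to be kept apart from the fourth quarter when testing the last-quarter hypothesis.

```r
# Hypothetical helper: attempts, makes and success rate per player.
rank_players <- function(df, min_attempts = 1) {
  attempts <- tapply(df$shot_made, df$player, length)
  made     <- tapply(df$shot_made, df$player, sum)
  out <- data.frame(
    player   = names(attempts),
    attempts = as.integer(attempts),
    made     = as.integer(made),
    stringsAsFactors = FALSE
  )
  out$rate <- out$made / out$attempts
  out <- out[out$attempts >= min_attempts, ]
  out[order(-out$attempts), ]   # most attempts first
}

# Made-up toy rows, only to exercise the code.
toy_players <- data.frame(
  player    = c("Player A", "Player A", "Player A", "Player B", "Player B"),
  shot_made = c(1L, 1L, 0L, 1L, 1L),
  period    = c(1, 4, 4, 4, 2),
  stringsAsFactors = FALSE
)

by_attempts <- rank_players(toy_players)
by_rate     <- by_attempts[order(-by_attempts$rate), ]  # best shooters first
per_period  <- table(toy_players$period)                # free throws per period
```

On the full data, rank_players(df, min_attempts = 100) would rank players while ignoring those with fewer than 100 attempts; the threshold is an arbitrary illustration.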

summary(df)
##   end_result            game              game_id              period     
##  Length:618019      Length:618019      Min.   :261031013   Min.   :1.000  
##  Class :character   Class :character   1st Qu.:281226023   1st Qu.:2.000  
##  Mode  :character   Mode  :character   Median :310306001   Median :3.000  
##                                        Mean   :333936881   Mean   :2.696  
##                                        3rd Qu.:400489501   3rd Qu.:4.000  
##                                        Max.   :400878160   Max.   :8.000  
##      play              player            playoffs        
##  Length:618019      Length:618019      Length:618019     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##     score              season            shot_made          time          
##  Length:618019      Length:618019      Min.   :0.0000   Length:618019     
##  Class :character   Class :character   1st Qu.:1.0000   Class :character  
##  Mode  :character   Mode  :character   Median :1.0000   Mode  :character  
##                                        Mean   :0.7568                     
##                                        3rd Qu.:1.0000                     
##                                        Max.   :1.0000
# ggpairs(df)
ggplot(data = df) +
  geom_bar(mapping = aes(x = season))

Exploratory Data Analysis