library('ggplot2')
library('tidyverse')
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ tibble 2.1.3 ✔ purrr 0.3.2
## ✔ tidyr 1.0.0 ✔ dplyr 0.8.3
## ✔ readr 1.3.1 ✔ stringr 1.4.0
## ✔ tibble 2.1.3 ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library('GGally')
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
##
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
##
## nasa
df <- read.csv('free_throws.csv', header = TRUE, stringsAsFactors = FALSE)
head(df)
str(df)
## 'data.frame': 618019 obs. of 11 variables:
## $ end_result: chr "106 - 114" "106 - 114" "106 - 114" "106 - 114" ...
## $ game : chr "PHX - LAL" "PHX - LAL" "PHX - LAL" "PHX - LAL" ...
## $ game_id : num 2.61e+08 2.61e+08 2.61e+08 2.61e+08 2.61e+08 ...
## $ period : num 1 1 1 1 1 1 1 2 2 2 ...
## $ play : chr "Andrew Bynum makes free throw 1 of 2" "Andrew Bynum makes free throw 2 of 2" "Andrew Bynum makes free throw 1 of 2" "Andrew Bynum misses free throw 2 of 2" ...
## $ player : chr "Andrew Bynum" "Andrew Bynum" "Andrew Bynum" "Andrew Bynum" ...
## $ playoffs : chr "regular" "regular" "regular" "regular" ...
## $ score : chr "0 - 1" "0 - 2" "18 - 12" "18 - 12" ...
## $ season : chr "2006 - 2007" "2006 - 2007" "2006 - 2007" "2006 - 2007" ...
## $ shot_made : int 1 1 1 0 1 1 1 0 1 1 ...
## $ time : chr "11:45" "11:45" "7:26" "7:26" ...
Notes on the column types:
* end_result: chr, the individual scores should be extracted
* game_id: num, should be a factor
* period: num, should be a factor
* play: long string, records which free throw it was and whether it was made
* playoffs: chr, should be a factor
* score: chr, should be a factor
* season: chr, should be a factor
* time: chr, should be a time class
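A minimal sketch of the conversions noted above, run on a few toy rows shaped like the str() output rather than on the full df. The new column names (final_left, final_right, time_sec) are my own choices, not part of the data set. I am also assuming that time is the minutes:seconds shown on the game clock, and I have not checked which team each half of end_result belongs to.

```r
library(dplyr)
library(tidyr)

# Toy rows for illustration only; in the notebook, replace `toy` with `df`.
toy <- data.frame(
  end_result = c("106 - 114", "106 - 114"),
  game_id    = c(261031013, 261031013),
  period     = c(1, 2),
  playoffs   = c("regular", "regular"),
  season     = c("2006 - 2007", "2006 - 2007"),
  time       = c("11:45", "7:26"),
  stringsAsFactors = FALSE
)

clean <- toy %>%
  # Split "106 - 114" into two integer columns (hypothetical names).
  separate(end_result, into = c("final_left", "final_right"),
           sep = " - ", convert = TRUE) %>%
  mutate(
    # Identifiers and categories become factors.
    game_id  = factor(game_id),
    period   = factor(period),
    playoffs = factor(playoffs),
    season   = factor(season),
    # Clock time as whole seconds (assumes a "minutes:seconds" string).
    time_sec = as.integer(sub(":.*", "", time)) * 60L +
               as.integer(sub(".*:", "", time))
  )

str(clean)
```

Storing the clock as seconds keeps the example free of extra packages; a dedicated time class (for example from lubridate) would be an alternative.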
I don’t know what shot_made really is. Let’s have a look at it next to play:
df[, c('shot_made', 'play')]
So there is a row for each free throw. Even when a foul leads to two free throws, there are two rows, one for each attempt. So shot_made tells us whether each individual shot was made (1) or missed (0).
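One way to check this reading of shot_made is to derive the same flag from the play text and compare. The sketch below uses the four play strings visible in the str() output; the assumption that every made shot contains the word "makes" is mine and would need to be verified on the full data.

```r
# The four rows shown in the str() output above; use df in the notebook.
toy <- data.frame(
  play = c("Andrew Bynum makes free throw 1 of 2",
           "Andrew Bynum makes free throw 2 of 2",
           "Andrew Bynum makes free throw 1 of 2",
           "Andrew Bynum misses free throw 2 of 2"),
  shot_made = c(1L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# 1 if the description says " makes ", 0 otherwise (assumed wording).
made_from_text <- as.integer(grepl(" makes ", toy$play, fixed = TRUE))

# Cross-tabulate the derived flag against the recorded one.
print(table(from_text = made_from_text, shot_made = toy$shot_made))

# TRUE when the two agree on every row.
all_agree <- all(made_from_text == toy$shot_made)
all_agree
```

Any off-diagonal counts in the table would point to rows where the text and the flag disagree.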
Things to look at:
* ranking of players with the most free throws
* ranking of players by scoring rate
* quarter with the most free throws (expected: more in the last quarter)
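These questions could be approached with grouped summaries along the following lines. The players and values below are invented toy data, and the minimum of 2 attempts used for the rate ranking is an arbitrary placeholder; a real cutoff would have to be chosen for the full df.

```r
library(dplyr)

# Invented toy data for illustration only; use df in the notebook.
toy <- data.frame(
  player    = c("Player A", "Player A", "Player A", "Player B", "Player B"),
  period    = c(1, 4, 4, 2, 4),
  shot_made = c(1L, 0L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Attempts, makes and scoring rate per player, most attempts first.
by_player <- toy %>%
  group_by(player) %>%
  summarise(attempts = n(),
            made     = sum(shot_made),
            rate     = mean(shot_made)) %>%
  arrange(desc(attempts))

# Ranking by scoring rate, ignoring players with very few attempts.
by_rate <- by_player %>%
  filter(attempts >= 2) %>%
  arrange(desc(rate))

# Number of free throws per period, busiest period first.
by_period <- toy %>%
  group_by(period) %>%
  summarise(attempts = n()) %>%
  arrange(desc(attempts))

by_player
by_rate
by_period
```

Note that summary(df) reports period values up to 8, so the per-period table on the real data will presumably include overtime periods as well as the four quarters.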
summary(df)
## end_result game game_id period
## Length:618019 Length:618019 Min. :261031013 Min. :1.000
## Class :character Class :character 1st Qu.:281226023 1st Qu.:2.000
## Mode :character Mode :character Median :310306001 Median :3.000
## Mean :333936881 Mean :2.696
## 3rd Qu.:400489501 3rd Qu.:4.000
## Max. :400878160 Max. :8.000
## play player playoffs
## Length:618019 Length:618019 Length:618019
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## score season shot_made time
## Length:618019 Length:618019 Min. :0.0000 Length:618019
## Class :character Class :character 1st Qu.:1.0000 Class :character
## Mode :character Mode :character Median :1.0000 Mode :character
## Mean :0.7568
## 3rd Qu.:1.0000
## Max. :1.0000
# ggpairs(df)
ggplot(data = df) +
  geom_bar(mapping = aes(x = season))