M1 Midterm I

# Load package(s)
library(ggplot2)
library(tidyverse)

## -- Attaching packages ----------------------------------------------- tidyverse 1.2.1 --

## v tibble  2.1.1       v purrr   0.3.2  
## v tidyr   0.8.3       v dplyr   0.8.0.1
## v readr   1.3.1       v stringr 1.4.0  
## v tibble  2.1.1       v forcats 0.4.0

## -- Conflicts -------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(dplyr)
library(knitr)
library(grid)
library(jpeg)

# Read in the dataset(s)
stephen_curry_shotdata_2014_15<-
  read_delim(file = "data/stephen_curry_shotdata_2014_15.txt",
             delim = "|")

## Parsed with column specification:
## cols(
##   GAME_ID = col_character(),
##   PLAYER_ID = col_double(),
##   PLAYER_NAME = col_character(),
##   TEAM_ID = col_double(),
##   TEAM_NAME = col_character(),
##   PERIOD = col_double(),
##   MINUTES_REMAINING = col_double(),
##   SECONDS_REMAINING = col_double(),
##   EVENT_TYPE = col_character(),
##   SHOT_TYPE = col_character(),
##   SHOT_DISTANCE = col_double(),
##   LOC_X = col_double(),
##   LOC_Y = col_double()
## )

Exercise 1

ggplot(stephen_curry_shotdata_2014_15, 
       aes(x = PERIOD, y = SHOT_DISTANCE)) +
  ggtitle("Stephen Curry \n2014-2015") +
  geom_boxplot(varwidth = TRUE, aes(group = PERIOD)) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size =14),
        axis.title  = element_text(face = "bold", size = 12),
        strip.text = element_text(face = "bold", size = 12),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank()) +
  scale_x_discrete(limits = c("Q1","Q2","Q3","Q4","OT") ) +
  scale_y_continuous(labels = scales::unit_format(unit = "ft")) +
  facet_wrap(~EVENT_TYPE) +
  xlab("Quarter/Period") +
  ylab("")

label1 <- "Made Shots"
label2 <- "Missed Shots"

ggplot(stephen_curry_shotdata_2014_15, aes(x=SHOT_DISTANCE, fill = EVENT_TYPE)) +
  ggtitle("Stephen Curry \nShot Densities (2014-2015)") +
  theme_minimal() +
  geom_density(alpha = 0.3) +
  annotate("text", x = 6.8, y = 0.04, label = label1) +
  annotate("text", x = 30.8, y = 0.07, label = label2) +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        plot.title = element_text(size = 14),
        axis.text.y = element_blank()) +
  scale_fill_manual(values = c("green","red")) +
  scale_x_continuous(labels = scales::unit_format(unit = "ft")) +
  xlab("") +
  ylab("")

According to the plots, Stephen Curry tends to shoot from about 20 ft from the hoop on average. His made shots span a wider range of distances, roughly from 0 to 5 ft out to 20 to 25 ft, while his missed shots fall in a narrower range, roughly from 10 to 15 ft out to 20 to 25 ft. Most of the shots he took from short range, roughly 5 to 15 ft, went in, which suggests that the farther he was from the basket, the harder it was for him to score. Phenomenal athlete that he is, Curry also seems to have made a similar number of shots in each period of the game.
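The claim that scoring gets harder with distance could be checked numerically by computing the share of made shots per distance bin. The sketch below uses a small made-up data frame, not the real shot data, and assumes that made shots are labelled "Made Shot" in EVENT_TYPE; the bin edges are also arbitrary choices.

```r
library(dplyr)

# Made-up toy data for illustration only (NOT Curry's actual shots).
# Assumption: EVENT_TYPE takes the values "Made Shot" and "Missed Shot".
toy_shots <- data.frame(
  SHOT_DISTANCE = c(2, 4, 12, 14, 24, 26, 27, 30),
  EVENT_TYPE = c("Made Shot", "Made Shot", "Made Shot", "Missed Shot",
                 "Made Shot", "Missed Shot", "Missed Shot", "Missed Shot")
)

make_rate <- toy_shots %>%
  # Group distances into three arbitrary bins: 0-10 ft, 10-20 ft, 20-40 ft
  mutate(distance_bin = cut(SHOT_DISTANCE,
                            breaks = c(0, 10, 20, 40),
                            include.lowest = TRUE)) %>%
  group_by(distance_bin) %>%
  # Attempts per bin and the proportion of those attempts that went in
  summarise(attempts = n(),
            make_rate = mean(EVENT_TYPE == "Made Shot"))

make_rate
```

Running the same pipeline on stephen_curry_shotdata_2014_15 in place of toy_shots would give the actual make rate per bin, provided the EVENT_TYPE labels match the assumption above.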

Exercise 2

After examining the two graphics, what do you conclude about Stephen Curry’s shot selection (i.e. distance from hoop) for the 2014-2015 season? Out of the four graphics (two from Exercise 1 and two from Exercise 2), which graphic(s) do you find the most useful when trying to understand Stephen Curry’s shot selection? If you find them all useful, explain what information is better communicated in each.

#This code reads the nbahalfcourt.jpg image and stores it as a raster grob named court, which is used as the background of the plot.
court <- rasterGrob(readJPEG(source = "data/nbahalfcourt.jpg"),
                    width=unit(1,"npc"), height=unit(1,"npc"))

ggplot(stephen_curry_shotdata_2014_15, 
       aes(LOC_X, LOC_Y)) +
  ggtitle("Shot Chart \nStephen Curry") +
  theme_minimal() +
#annotation_custom() draws the court image as a fixed background spanning these x and y data coordinates; the axis limits themselves are set by xlim() and ylim() below.
  annotation_custom(grob = court,
                    xmin = -250,xmax =  250,
                    ymin = -52, ymax =  418) +
  geom_hex(bins = 20, aes(alpha = 0.7), colour = "grey") +
  theme(panel.grid = element_blank(),
        axis.text = element_blank(),
        plot.title = element_text(face = "bold", size = 15),
        legend.title = element_text(size = 14)) +
  scale_fill_continuous(na.value = "red",
                        low = "yellow", high = "red",
                        limits = c(0, 15),
                        breaks = c(0, 5, 10, 15)) +
#coord_fixed() fixes the aspect ratio so that one unit on the x axis is the same length as one unit on the y axis, which keeps the court from being stretched.
  coord_fixed() +
#guides() suppresses the legend for alpha. Because alpha = 0.7 sits inside aes(), ggplot2 treats it as a mapped aesthetic and would otherwise draw an unnecessary legend for it. size is not mapped in this plot, so turning off its guide has no visible effect.
  guides(alpha = "none", size = "none") +
  xlim(250, -250) +
  ylim(-52, 418) +
  labs(x = "", y = "", fill = "Shot \nAttempts")

#This code reads the nbahalfcourt.jpg image and stores it as a raster grob named court, which is used as the background of the plot.
court <- rasterGrob(readJPEG(source = "data/nbahalfcourt.jpg"),
                    width=unit(1,"npc"), height=unit(1,"npc"))

ggplot(stephen_curry_shotdata_2014_15, aes(LOC_X, LOC_Y, group = EVENT_TYPE)) + 
  ggtitle("Shot Chart \nStephen Curry") +
  theme_minimal() +
#annotation_custom() draws the court image as a fixed background spanning these x and y data coordinates; the axis limits themselves are set by xlim() and ylim() below.
  annotation_custom(grob = court,
                    xmin = -250,xmax =  250,
                    ymin = -52, ymax =  418) +
  geom_point(aes(shape = EVENT_TYPE, color = EVENT_TYPE), size = 6) +
  scale_shape_manual(values = c(1, 4)) +
  scale_color_manual(values = c("green", "red")) +
  scale_fill_hue(l = 20) +
  theme(panel.grid = element_blank(),
        axis.text = element_blank(),
        plot.title = element_text(face = "bold", size = 14),
        legend.title = element_blank(),
        legend.position = "bottom",
        legend.text = element_text(size = 12),
        legend.spacing.x  = unit(0.5, 'cm')) +
#coord_fixed() fixes the aspect ratio so that one unit on the x axis is the same length as one unit on the y axis, which keeps the court from being stretched.
  coord_fixed() +
#guides() would suppress the legends for alpha and size. Neither aesthetic is mapped in this plot, so the call has no visible effect here.
  guides(alpha = "none", size = "none") +
  xlim(250, -250) +
  ylim(-52, 418) +
  xlab("") +
  ylab("")

The two plots in Exercise 2 offer a fresh interpretation of the data because they map each shot directly onto a basketball court. During the 2014-2015 NBA season, Curry seems to have most favored shooting from just outside the 3-point line at the center of the court. He also seems to have favored shooting from right beneath the hoop. For understanding Curry’s shot selection, I believe the first plot of Exercise 2 is the best because its hexagons show both the spread and the concentration of his shots, and their shape and color are fairly easy to read.

Exercise 3

Part 1

In 3-5 sentences, describe the core concept/idea and structure of the ggplot2 package.

The core idea of the ggplot2 package is to build readable, interpretable graphics from a wide range of datasets by using a grammar of graphics. The grammar gives every plot the same structure, so a graphic is described by combining a consistent set of components. These components are: data, which is what the graphic plots; layers, which determine the type of graphic through a geom and a stat (such as a scatterplot for geom_point, a bar plot for geom_bar, or a box plot for geom_boxplot); scales, which map data values to aesthetics such as position, color, and size; coord, the coordinate system, which maps positions onto the plane of the plot and provides the axes and gridlines; faceting, which splits the data into subsets shown in separate panels; and theme, which controls non-data aspects of the plot, such as fonts and background color.
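As a rough illustration of how these components combine, the sketch below builds one plot piece by piece. It uses mpg, a dataset bundled with ggplot2, rather than the shot data, and the axis name is an illustrative choice.

```r
library(ggplot2)

# Illustrative only: mpg ships with ggplot2 and is not part of this midterm's data.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +  # data and aesthetic mappings
  geom_point() +                                          # layer: points, drawn with the default stat
  scale_x_continuous(name = "Engine displacement (l)") +  # scale for the x position
  coord_cartesian() +                                     # coordinate system
  facet_wrap(~drv) +                                      # faceting: one panel per drive type
  theme_minimal()                                         # theme: non-data appearance

p
```

Each component is added with +, so any one of them can be swapped out without touching the others.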

Part 2

Describe each of the following:

ggplot()

This function initializes the plot object and provides the foundation of the visualization. It takes the data and the default aesthetic mappings, which every layer added afterwards inherits unless it overrides them.

aes()

The aes() function defines the aesthetic mappings: it specifies which variables from the dataset are mapped to the x and y positions on the plot, and to other visual properties such as color and size.

geoms

The geoms are the family of geom_*() functions that determine what type of plot is drawn. This includes geom_boxplot (creates a boxplot), geom_histogram (creates a histogram), geom_bar (creates a bar plot), and many more.

stats

A stat function, such as stat_sum or stat_boxplot, statistically transforms the data before it is drawn, usually by summarising it. For example, stat_boxplot computes the median, quartiles, and whisker limits that geom_boxplot then draws.

scales

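Scales control how data values are mapped to aesthetics such as x and y position, color, and size. Every mapped aesthetic gets a default scale in the basic ‘ggplot() + geom()’ format, and scale functions such as scale_x_continuous or scale_fill_manual override it. Scales also produce the axes and legends that let the reader translate the plot back into data values.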

theme()

The theme elements control non-data aspects of the plot, such as the appearance of the x and y axis labels, plot title, and legend title, as well as the overall layout and design of the plot.
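The pieces described above can be seen working together in a small sketch. The data frame here is hypothetical toy data (one row per shot, recording only the period), and layer_data() is used to inspect what the stat computed before the geom drew it.

```r
library(ggplot2)

# Hypothetical toy data: one row per shot, recording the period it was taken in.
toy_periods <- data.frame(period = c(1, 1, 2, 3, 3, 3))

p <- ggplot(toy_periods, aes(x = period)) +   # ggplot() and aes(): data and mapping
  geom_bar() +                                # geom: bars, which use stat_count by default
  scale_x_continuous(breaks = 1:3,            # scale: relabel the x positions
                     labels = c("Q1", "Q2", "Q3")) +
  theme(panel.grid.minor = element_blank())   # theme: non-data appearance

# layer_data() returns the layer's data after the stat has been applied,
# so the count column holds the number of rows per period: 2, 1, 3
bar_data <- layer_data(p)
bar_data$count
```

The raw data has six rows, but the stat reduces it to three bars, which is the transformation the stats description refers to.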

Part 3

Explain the difference between using this code geom_point(aes(color = VARIABLE)) as opposed to using geom_point(color = VARIABLE).

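‘geom_point(aes(color = VARIABLE))’ maps the color of each point to its value of VARIABLE in the dataset: ggplot2 assigns the colors through a color scale and adds a legend. ‘geom_point(color = VARIABLE)’, on the other hand, sets the color to a fixed value: VARIABLE is not looked up in the dataset, so it must itself be a valid color (like "pink") or an object holding valid colors, and no legend is drawn.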
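A minimal sketch of the two forms, again using the mpg dataset bundled with ggplot2 as a stand-in; the variable class and the color "pink" are illustrative choices.

```r
library(ggplot2)

# Mapped: color depends on the value of class in each row; a legend is drawn.
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

# Set: every point gets the same fixed color; no scale and no legend are involved.
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "pink")

# Inspect the colors that each layer ends up using
mapped_colours <- unique(layer_data(p_mapped)$colour)  # one color per class
set_colours <- unique(layer_data(p_set)$colour)        # the single value "pink"
```

Writing geom_point(aes(color = "pink")) would instead treat "pink" as a one-level data value, which ggplot2 would color with its default palette and label in a legend.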