This homework is due by Tuesday, January 15th, 8:00pm.

Upload a zipped folder to Canvas called 1-visualization_homework.zip which contains three files:
• 1-visualization_homework.Rmd
• 1-visualization_homework.html
• 1-visualization_homework.pdf

Instructions

In this homework, you’ll write a short blog post about a data set. Your goal is to tell us something interesting using a well-crafted, thoughtfully-prepared data graphic. One data graphic should suffice, but you may include more if you choose (not more than 3 though). Feel free to make plots with multiple panels by using the patchwork package we’ve discussed (or one of the alternatives such as cowplot).
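As a rough illustration of a multi-panel figure, here is a minimal sketch that combines two plots with patchwork. It assumes the ggplot2 and patchwork packages are installed, and it uses the built-in mtcars data only as a stand-in; the object names are placeholders.

```r
library(ggplot2)
library(patchwork)

# mtcars ships with R's datasets package; it is used here purely as an illustration
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# patchwork lets `+` place two ggplot objects side by side;
# plot_annotation() labels the panels A and B
combined <- p_scatter + p_hist +
  plot_annotation(tag_levels = "A")

combined
```

Writing `p_scatter / p_hist` instead stacks the panels vertically.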

Your blog post should be short (between 100 and 300 words). We envision an introductory paragraph that explains your findings and provides some context to your data, the data graphic(s), and then a caption-like paragraph providing more detail about what to look for in the data graphic and how to interpret it. That is it. You will not earn more points by including more words or data graphics. What we are looking for is something that is insightful and well-crafted.

Here are some examples of articles that are similar in spirit to yours. Most of these are much longer than yours will be, but the idea is similar: use a good data graphic to tell us something we don’t already know.

Data

You are free to use whatever data you want. However, the purpose of this exercise is to learn how to make good plots – not to wrangle data (we’ll do that next). So we don’t want you to spend much time wrangling data. There are perfectly good data sets available through R packages that are already well-curated. Here is a list of packages with data sets.

  • fivethirtyeight: provides access to data sets that drive many articles on FiveThirtyEight
  • nycflights13: data about flights leaving from the three major NYC airports in 2013
  • NHANES: data from the US National Health and Nutrition Examination Survey
  • Lahman: comprehensive historical archive of major league baseball data
  • fueleconomy: fuel economy data from the EPA, 1985–2015
  • datasets: package that contains a large number of data sets

For example, to take a look at the datasets in the fivethirtyeight package, you can do the following:

# install the package 
install.packages("fivethirtyeight") 

# load the package 
library("fivethirtyeight")

# take a look at the data sets that come with the package
data(package = "fivethirtyeight")

# take a look at the help file to get more information about the different data sets (not 
# all packages have help files)
help("fivethirtyeight")

# the "fivethirtyeight" vignette provides a detailed overview of the different data sets;
# open it with this command
vignette("fivethirtyeight", package = "fivethirtyeight")

# to load a particular data set (e.g. US_births_2000_2014, replace with the name of the 
# data set you'd like to load) into your environment, run the following 
df.data = US_births_2000_2014

Note that I’ve set the code chunk option for the code block above to eval=FALSE. This way, the code is not evaluated. You can find out more about the different chunk options here.
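Before plotting, it can help to take a quick look at the structure of a data set. The following is a minimal sketch using mtcars from the datasets package listed above; the data set and the name df.example are chosen only for illustration.

```r
# mtcars comes with the built-in datasets package
df.example <- datasets::mtcars

dim(df.example)          # number of rows and columns
str(df.example)          # column names and types
head(df.example, 3)      # first three rows
summary(df.example$mpg)  # range and quartiles of one variable

# count missing values per column; ggplot2 drops rows with missing values
# and prints a warning when it does
colSums(is.na(df.example))
```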

Are taller players better or worse shooters? Well, they’re both…

Load packages

Add the package with the data set that you’d like to load below.

library("knitr")
library("tidyverse")

Load the data set

# load the data set here

df <- read.csv("~/Uni/Psych 252/Homework/Week 1/week1homework/players_stats.csv")

Description

[Write a short text describing the data, and motivating your question here.]

Taller basketball players are worse shooters, at least according to conventional wisdom among fans. Shorter players, so the story goes, are more likely to have developed their shooting skill to compensate for their lack of height. But while the claim is widely believed, statistics supporting it are rarely cited. What, then, should we make of it? To test this claim, we can analyze some data from the NBA 2014-2015 season. The data set features key statistics for 490 NBA players, including minutes played, rebounds made, and the like.

We might try to test the claim by examining the percentage of “free throws” that players make: that is, the percentage of unopposed shots made from the free-throw line, typically awarded when a foul is called during a game. This statistic would seem to be a valid test: unlike any other kind of shot on the court, one free throw occurs under virtually the same circumstances as any other. They are all taken from the same distance, with no opposition, and so forth.

Figure

# replace this figure with an interesting one

theme_set(
  theme_classic() + #set the theme 
    theme(text = element_text(size = 20)) #set the default text size
)


# Data preparation: remove players who attempted no free throws

df.filtered = df %>%
  filter(FTA != 0)

ggplot(data = df.filtered,
       mapping = aes(x = Height,
                     y = FT.
                     )) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Percentage of Successful Free-Throws by Height", x = "Height", y = "Free throws made (%)")
## Warning: Removed 66 rows containing non-finite values (stat_smooth).
## Warning: Removed 66 rows containing missing values (geom_point).

Caption: Taller players are less likely to be accurate shooters, at least from the free-throw line.
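The trend line in the figure can also be summarized numerically. Here is a minimal sketch using a small made-up data frame: the values are hypothetical and are not the NBA data, and only the column names Height and FT. follow the plot code. It also drops rows with missing values up front, which is one way to make explicit the omission that the warnings above report.

```r
# hypothetical toy data, NOT the real player statistics
df.toy <- data.frame(
  Height = c(180, 185, 190, 195, 200, 205, 210, NA),
  FT.    = c(85, 82, 80, 78, 74, 70, 65, 75)
)

# drop rows with a missing height or percentage before modeling
df.complete <- df.toy[complete.cases(df.toy[, c("Height", "FT.")]), ]

# least-squares line, the same kind of model geom_smooth(method = "lm") draws
fit <- lm(FT. ~ Height, data = df.complete)
coef(fit)["Height"]  # change in free-throw percentage per unit of height

# correlation as a scale-free summary of the same trend
cor(df.complete$Height, df.complete$FT.)
```

A negative slope and a negative correlation would both correspond to the downward trend described in the caption.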

However, a different story emerges when we consider the percentage of shots other than free throws that are made during a game. This is important, since most points are scored against opposition, not from the free-throw line.

# replace this figure with an interesting one

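# Data preparation: compute the percentage of in-game shots made
# (field goals plus three-pointers) and the total number of attempts

df$A. <- (100 * (df$FGM + df$X3PM)/(df$FGA + df$X3PA))

df$AA <- (df$FGA + df$X3PA)

# remove players who attempted no shots
df.filtered = df %>%
  filter(AA != 0)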

ggplot(data = df.filtered,
       mapping = aes(x = Height,
                     y = A.
                     )) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Percentage of Successful In-Game Shots by Height", x = "Height", y = "Shots made (%)")
## Warning: Removed 67 rows containing non-finite values (stat_smooth).
## Warning: Removed 67 rows containing missing values (geom_point).

The figure above shows that taller players make a higher percentage of their in-game shots, perhaps because they choose to take shots in circumstances where they have the advantage or because their opposition is generally less effective in defending them.
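One caveat is worth checking against the data set's documentation. In standard NBA box scores, field goals made and attempted (FGM, FGA) already include three-point shots. If that also holds for this file (an assumption, not something verified here), adding X3PM and X3PA counts three-pointers twice, and FGM / FGA alone would give the overall shooting percentage. A sketch of the difference with made-up numbers; the name FG.overall is introduced only for illustration:

```r
# hypothetical totals for two players, NOT taken from the data set
df.shots <- data.frame(
  FGM  = c(400, 250),  # field goals made (assumed to include three-pointers)
  FGA  = c(800, 600),  # field goals attempted (same assumption)
  X3PM = c(100, 0),    # three-pointers made
  X3PA = c(300, 0)     # three-pointers attempted
)

# percentage as computed in the post
df.shots$A. <- 100 * (df.shots$FGM + df.shots$X3PM) /
  (df.shots$FGA + df.shots$X3PA)

# overall field-goal percentage under the assumption above
df.shots$FG.overall <- 100 * df.shots$FGM / df.shots$FGA

df.shots[, c("A.", "FG.overall")]
```

The two measures agree for a player who takes no three-pointers and diverge for one who takes many, so the choice could matter when comparing players of different heights.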

In summary, the conventional wisdom that taller players are worse shooters is a half-truth: it is partly true insofar as taller players are slightly less skilled at shooting when other things are equal, but it is partly false insofar as taller players are more likely to make their shots when it matters most, that is, during live play.

Write a caption that succinctly explains the figure here. (Ideally, the figure and caption can be understood without having to read the rest of the text.)

Session info

Information about this R session including which version of R was used, and what packages were loaded.

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2.2  forcats_0.3.0   stringr_1.3.1   dplyr_0.7.7    
##  [5] purrr_0.2.5     readr_1.1.1     tidyr_0.8.2     tibble_1.4.2   
##  [9] ggplot2_3.1.0   tidyverse_1.2.1 knitr_1.20     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18     cellranger_1.1.0 pillar_1.3.0     compiler_3.5.2  
##  [5] plyr_1.8.4       bindr_0.1.1      tools_3.5.2      digest_0.6.17   
##  [9] lubridate_1.7.4  jsonlite_1.5     evaluate_0.11    nlme_3.1-137    
## [13] gtable_0.2.0     lattice_0.20-38  pkgconfig_2.0.2  rlang_0.2.2     
## [17] cli_1.0.1        rstudioapi_0.7   yaml_2.2.0       haven_1.1.2     
## [21] withr_2.1.2      xml2_1.2.0       httr_1.3.1       hms_0.4.2       
## [25] rprojroot_1.3-2  grid_3.5.2       tidyselect_0.2.5 glue_1.3.0      
## [29] R6_2.2.2         readxl_1.1.0     rmarkdown_1.10   modelr_0.1.2    
## [33] magrittr_1.5     backports_1.1.2  scales_1.0.0     htmltools_0.3.6 
## [37] rvest_0.3.2      assertthat_0.2.0 colorspace_1.3-2 labeling_0.3    
## [41] stringi_1.1.7    lazyeval_0.2.1   munsell_0.5.0    broom_0.5.0     
## [45] crayon_1.3.4

Grading Rubric

There are 15 possible points for this homework.

Baseline
  • +1 for an .Rmd file that compiles without errors
  • +1 for describing the dataset
  • +1 for having a plot
  • +1 for including the code that generated the plot
  • +1 for describing the visual mapping (i.e. a key)
Average
  • +1 unnecessary messages from R are hidden from being displayed in the HTML
  • +1 for including a catchy and/or engaging title
  • +1 for having at least 100 words and no more than 500 words
  • +1 for explaining in a single coherent sentence what we can learn from this graphic
  • +1 for explaining the choice of geometric mapping
Advanced
  • +1 blog post text provides context or background useful in interpreting the graphic
  • +0-4 WOW factor: awarded at the grader’s discretion for submissions that are exceptionally compelling
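
For the rubric item on hiding unnecessary messages, one common approach is to set knitr chunk options in a setup chunk near the top of the .Rmd file. A minimal sketch, assuming the knitr package is installed; which options are appropriate depends on your document:

```r
library(knitr)

# applies to every chunk that follows: keep the code visible,
# but hide messages (e.g. package start-up text) and warnings in the knitted output
opts_chunk$set(
  echo    = TRUE,
  message = FALSE,
  warning = FALSE
)
```

The same options can also be set for a single chunk in its header, for example message=FALSE.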