This homework is due by Tuesday, January 15th, 8:00pm.
Upload a zipped folder to Canvas called
1-visualization_homework.zip which contains three files:
• 1-visualization_homework.Rmd
• 1-visualization_homework.html
• 1-visualization_homework.pdf
In this homework, you’ll write a short blog post about a data set. Your goal is to tell us something interesting using a well-crafted, thoughtfully prepared data graphic. One data graphic should suffice, but you may include more if you choose (though no more than three). Feel free to make plots with multiple panels by using the patchwork package we’ve discussed (or one of the alternatives, such as cowplot).
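As a minimal sketch of a multi-panel figure, the block below places two ggplot2 scatter plots side by side with patchwork. The built-in mtcars data set is only a stand-in; any two ggplot objects combine the same way.

```r
# Minimal two-panel figure with patchwork; mtcars is only a stand-in data set
library(ggplot2)
library(patchwork)

p_weight <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p_power <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(x = "Horsepower", y = "Miles per gallon")

# `+` (or `|`) places panels side by side; `/` would stack them vertically
combined <- p_weight + p_power +
  plot_annotation(tag_levels = "A")  # label the panels "A" and "B"

combined  # display the figure
```

Panel tags such as "A" and "B" make it easy for the caption paragraph to refer to each panel by name.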
Your blog post should be short (between 100 and 300 words). We envision an introductory paragraph that explains your findings and provides some context for your data, followed by the data graphic(s), and then a caption-like paragraph giving more detail about what to look for in the data graphic and how to interpret it. That is it. You will not earn more points by including more words or data graphics. What we are looking for is something insightful and well-crafted.
Here are some examples of articles that are similar in spirit to yours. Most of these are much longer than yours will be, but the idea is similar: use a good data graphic to tell us something we don’t already know.
You are free to use whatever data you want. However, the purpose of this exercise is to learn how to make good plots, not to wrangle data (we’ll do that next). So we don’t want you to spend much time wrangling data. There are perfectly good data sets available through R packages that are already well-curated. Here is a list of packages with data sets.
• fivethirtyeight: provides access to data sets that drive many articles on FiveThirtyEight
• nycflights13: data about flights leaving from the three major NYC airports in 2013
• NHANES: data from the US National Health and Nutrition Examination Survey
• Lahman: comprehensive historical archive of Major League Baseball data
• fueleconomy: fuel economy data from the EPA, 1985–2015
• datasets: base R package that contains a large number of data sets
For example, to take a look at the data sets in the fivethirtyeight package, you can do the following:
# install the package
install.packages("fivethirtyeight")
# load the package
library("fivethirtyeight")
# take a look at the data sets that come with the package
data(package = "fivethirtyeight")
# take a look at the help file to get more information about the different data sets (not
# all packages have help files)
help("fivethirtyeight")
# the fivethirtyeight vignette gives a detailed overview of the different data sets
vignette("fivethirtyeight", package = "fivethirtyeight")
# to load a particular data set (e.g. US_births_2000_2014; replace it with the name of
# the data set you'd like to load) into your environment, run the following
df.data = US_births_2000_2014
Note that I’ve set the code chunk option for the code block above to eval=FALSE. This way, the code is not evaluated when the document is knitted. You can find out more about the different chunk options here.
library("knitr")
library("tidyverse")
# load the data set here
df <- read.csv("~/Uni/Psych 252/Homework/Week 1/week1homework/players_stats.csv")
In general, taller basketball players are worse shooters, at least according to many basketball fans. Shorter players, so the story goes, are more likely to have developed their shooting skill to compensate for what they lack in height. But while the claim is widely believed, statistics supporting it are rarely cited. What, then, should we make of it? To test it, we can analyze data from the 2014-2015 NBA season. The data set contains key statistics for 490 NBA players, including minutes played, rebounds, and the like.
We might test the claim by examining the percentage of free throws that players successfully make, that is, the percentage of unopposed shots taken from the free-throw line, typically awarded after a foul is called in-game. This statistic would seem to be a fair test: unlike any other kind of shot on the court, every free throw is taken under virtually the same circumstances, from the same distance, with no opposition, and so forth. Hence, we use a scatter plot to view players' free-throw percentages organised by height.
# Load a nicer theme
theme_set(
theme_classic() + #set the theme
theme(text = element_text(size = 20)))
#Data preparation: removing statistics for players who have attempted few or no free throw shots or whose height is not recorded
df.filtered <- df %>%
filter(FTA > 3, !is.na(Height))
ggplot(data = df.filtered,
       mapping = aes(x = Height,
                     y = FT.)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # linear trend line without confidence band
  labs(title = "Percentage of Successful Free Throws",
       x = "Height (centimetres)",
       y = "Free throws made (%)")
Caption: Taller players are slightly less accurate shooters, at least from the free-throw line.
However, a different story emerges when we consider the percentage of field goals (all in-game shots other than free throws) that players make. This matters because most points are scored from field goals taken against opposition on the court, not from the free-throw line.
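Three-pointers are field goals, so in standard NBA box scores FGM and FGA already include X3PM and X3PA; this assumes players_stats.csv follows that convention, which is worth confirming against the file's source. The sketch below uses made-up numbers for one hypothetical player to show how far adding the three-point columns a second time pulls the result away from plain field-goal percentage.

```r
# Made-up box-score numbers for one hypothetical player (not from players_stats.csv):
# 10 field-goal attempts in total, 5 of them three-pointers;
# 5 field goals made in total, 2 of them three-pointers.
FGM  <- 5
FGA  <- 10
X3PM <- 2
X3PA <- 5

# Field-goal percentage: FGM and FGA already count each three once
fg_pct <- 100 * FGM / FGA                            # 50

# Adding the three-point columns again counts each three twice
double_counted <- 100 * (FGM + X3PM) / (FGA + X3PA)  # 7 / 15, about 46.7

round(c(fg_pct = fg_pct, double_counted = double_counted), 1)
```

Because the double count weights three-point attempts more heavily, it shifts players who shoot many threes more than players who rarely do, so the distortion is not uniform across players.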
# Data preparation:
# FGM and FGA already include three-pointers (X3PM, X3PA), so those are not added again
df <- df %>%
  mutate(
    A. = 100 * FGM / FGA,  # percentage of in-game shots (field goals) made
    AA = FGA               # number of in-game shots (field goals) attempted
  )
# remove players who attempted few or no in-game shots or whose height is not recorded
df.filtered <- df %>%
  filter(AA > 3, !is.na(Height))
ggplot(data = df.filtered,
       mapping = aes(x = Height,
                     y = A.)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # linear trend line without confidence band
  labs(title = "Percentage of Successful In-Game Shots",
       x = "Height (centimetres)",
       y = "Field goals made (%)")
Caption: Taller players make a higher percentage of their in-game shots, perhaps because they tend to shoot in situations where they have the advantage, or because opponents are generally less effective at defending them.
In summary, the conventional wisdom that taller players are worse shooters is only a half-truth. It is true insofar as taller players are slightly less likely to make their shots when unopposed, but false insofar as taller players are more likely to make their shots when it matters most, that is, in-game.
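The summary rests on the direction of two trend lines. Reporting each line's slope (percentage points per centimetre) and Pearson's r would show how slight or strong each effect is. Here is a minimal sketch with a hypothetical helper, `height_trend()`, demonstrated on made-up toy numbers rather than the NBA data:

```r
# Hypothetical helper (not part of any package): fit a straight line of a
# shooting-percentage column on Height and return the slope
# (percentage points per centimetre) and Pearson's correlation r.
height_trend <- function(data, outcome) {
  x <- data$Height
  y <- data[[outcome]]
  ok <- complete.cases(x, y)  # drop rows missing either value
  x <- x[ok]
  y <- y[ok]
  fit <- lm(y ~ x)
  c(slope = coef(fit)[["x"]], r = cor(x, y))
}

# Made-up toy data in which the percentage drops by exactly 0.1 points per cm
toy <- data.frame(Height = c(185, 195, 205, 215),
                  FT.    = c(80, 79, 78, 77))

height_trend(toy, "FT.")  # slope about -0.1, r about -1
```

On the real data, the same call could be made once per figure, e.g. `height_trend(df.filtered, "FT.")` after the first filtering step and `height_trend(df.filtered, "A.")` after the second, since `df.filtered` is reassigned in between.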
Information about this R session including which version of R was used, and what packages were loaded.
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bindrcpp_0.2.2 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.7
## [5] purrr_0.2.5 readr_1.1.1 tidyr_0.8.2 tibble_1.4.2
## [9] ggplot2_3.1.0 tidyverse_1.2.1 knitr_1.20
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.18 cellranger_1.1.0 pillar_1.3.0 compiler_3.5.2
## [5] plyr_1.8.4 bindr_0.1.1 tools_3.5.2 digest_0.6.17
## [9] lubridate_1.7.4 jsonlite_1.5 evaluate_0.11 nlme_3.1-137
## [13] gtable_0.2.0 lattice_0.20-38 pkgconfig_2.0.2 rlang_0.2.2
## [17] cli_1.0.1 rstudioapi_0.7 yaml_2.2.0 haven_1.1.2
## [21] withr_2.1.2 xml2_1.2.0 httr_1.3.1 hms_0.4.2
## [25] grid_3.5.2 tidyselect_0.2.5 glue_1.3.0 R6_2.2.2
## [29] readxl_1.1.0 rmarkdown_1.11 modelr_0.1.2 magrittr_1.5
## [33] backports_1.1.3 scales_1.0.0 htmltools_0.3.6 rvest_0.3.2
## [37] assertthat_0.2.0 colorspace_1.3-2 labeling_0.3 stringi_1.1.7
## [41] lazyeval_0.2.1 munsell_0.5.0 broom_0.5.0 crayon_1.3.4
There are 15 possible points for this homework.