Directions

During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.

The objective of this assignment is to complete and explain basic plots before moving on to more complicated ways to graph data.

Each question is worth 5 points.

To submit this homework, create the document in RStudio using the knitr package (the Knit button is built into RStudio) and publish it to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.
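Because the document only knits if every package used in the answers is available, it can help to check for them first. The following is a minimal sketch, assuming the five packages named in the questions (plus ggplot2 for plotting) are the only ones needed; the variable names `pkgs` and `not_installed` are illustrative.

```r
# Packages referenced in the questions below (assumed list, adjust as needed).
pkgs <- c("ggplot2", "nasaweather", "mdsr", "mosaicData", "palmerpenguins")

# requireNamespace() returns TRUE/FALSE without attaching the package.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]

if (length(not_installed) > 0) {
  message("Install before knitting: ", paste(not_installed, collapse = ", "))
} else {
  message("All packages are available.")
}
```

Running this in the console before knitting reports any package that still needs `install.packages()`.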

Questions

  1. Using data from the nasaweather package, create a scatter plot between wind and pressure, with color being used to distinguish the type of storm.
library(nasaweather)
library(ggplot2)  # needed for ggplot() below
head(storms)
## # A tibble: 6 × 11
##   name     year month   day  hour   lat  long pressure  wind type        seasday
##   <chr>   <int> <int> <int> <int> <dbl> <dbl>    <int> <int> <chr>         <int>
## 1 Allison  1995     6     3     0  17.4 -84.3     1005    30 Tropical D…       3
## 2 Allison  1995     6     3     6  18.3 -84.9     1004    30 Tropical D…       3
## 3 Allison  1995     6     3    12  19.3 -85.7     1003    35 Tropical S…       3
## 4 Allison  1995     6     3    18  20.6 -85.8     1001    40 Tropical S…       3
## 5 Allison  1995     6     4     0  22   -86        997    50 Tropical S…       4
## 6 Allison  1995     6     4     6  23.3 -86.3      995    60 Tropical S…       4
summary(storms)
##      name                year          month             day       
##  Length:2747        Min.   :1995   Min.   : 6.000   Min.   : 1.00  
##  Class :character   1st Qu.:1995   1st Qu.: 8.000   1st Qu.: 9.00  
##  Mode  :character   Median :1997   Median : 9.000   Median :18.00  
##                     Mean   :1997   Mean   : 8.803   Mean   :16.98  
##                     3rd Qu.:1999   3rd Qu.:10.000   3rd Qu.:25.00  
##                     Max.   :2000   Max.   :12.000   Max.   :31.00  
##       hour             lat             long            pressure     
##  Min.   : 0.000   Min.   : 8.30   Min.   :-107.30   Min.   : 905.0  
##  1st Qu.: 3.500   1st Qu.:17.25   1st Qu.: -77.60   1st Qu.: 980.0  
##  Median :12.000   Median :25.00   Median : -60.90   Median : 995.0  
##  Mean   : 9.057   Mean   :26.67   Mean   : -60.87   Mean   : 989.8  
##  3rd Qu.:18.000   3rd Qu.:33.90   3rd Qu.: -45.80   3rd Qu.:1004.0  
##  Max.   :18.000   Max.   :70.70   Max.   :   1.00   Max.   :1019.0  
##       wind            type              seasday     
##  Min.   : 15.00   Length:2747        Min.   :  3.0  
##  1st Qu.: 35.00   Class :character   1st Qu.: 84.0  
##  Median : 50.00   Mode  :character   Median :103.0  
##  Mean   : 54.68                      Mean   :102.6  
##  3rd Qu.: 70.00                      3rd Qu.:125.0  
##  Max.   :155.00                      Max.   :185.0
ggplot(storms, aes(x=pressure, y=wind)) +
  geom_point(size=2, shape=20, aes(color = type))+ 
  theme_gray()

  2. Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.
library(mdsr)
team = MLB_teams
head(team)
## # A tibble: 6 × 11
##   yearID teamID lgID      W     L  WPct attendance normA…¹ payroll metro…² name 
##    <int> <chr>  <fct> <int> <int> <dbl>      <int>   <dbl>   <int>   <dbl> <chr>
## 1   2008 ARI    NL       82    80 0.506    2509924   0.584  6.62e7 4489109 Ariz…
## 2   2008 ATL    NL       72    90 0.444    2532834   0.589  1.02e8 5614323 Atla…
## 3   2008 BAL    AL       68    93 0.422    1950075   0.454  6.72e7 2785874 Balt…
## 4   2008 BOS    AL       95    67 0.586    3048250   0.709  1.33e8 4732161 Bost…
## 5   2008 CHA    AL       89    74 0.546    2500648   0.582  1.21e8 9554598 Chic…
## 6   2008 CHN    NL       97    64 0.602    3300200   0.768  1.18e8 9554598 Chic…
## # … with abbreviated variable names ¹​normAttend, ²​metroPop
ggplot(team, aes(x=WPct, y=payroll)) +
  geom_point()+ 
  theme_classic()+ 
  geom_smooth(method = 'lm', se = TRUE) + 
  labs(x = "Winning Percentage", y = "Payroll")
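One way to put the relationship "in context" is sketched below. It is only one possible reading of the question: it assumes payroll is the explanatory variable (so it goes on the x-axis), rescales it to millions of dollars, colors teams by league, marks a .500 record, and draws one panel per season. The object name `p_context` is illustrative.

```r
library(ggplot2)
library(mdsr)  # provides MLB_teams

# Assumption: payroll (in millions) explains winning percentage, so it is on x.
p_context <- ggplot(MLB_teams, aes(x = payroll / 1e6, y = WPct)) +
  geom_point(aes(color = lgID), alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "black") +
  # Dashed reference line at a .500 record
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  # One panel per season, since payroll levels differ from year to year
  facet_wrap(~yearID) +
  theme_classic() +
  labs(x = "Payroll (millions of US dollars)",
       y = "Winning percentage",
       color = "League")

p_context
```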

  3. The RailTrail data set from the mosaicData package describes the usage of a rail trail in Western Massachusetts. Use these data to answer the following questions.
     1. Create a scatterplot of the number of crossings per day (volume) against the high temperature that day.
     2. Separate your plot into facets by weekday (an indicator of weekend/holiday vs. weekday).
     3. Add regression lines to the two facets.
library(mosaicData)  # provides RailTrail
f = RailTrail
head(f)
##   hightemp lowtemp avgtemp spring summer fall cloudcover precip volume weekday
## 1       83      50    66.5      0      1    0        7.6   0.00    501    TRUE
## 2       73      49    61.0      0      1    0        6.3   0.29    419    TRUE
## 3       74      52    63.0      1      0    0        7.5   0.32    397    TRUE
## 4       95      61    78.0      0      1    0        2.6   0.00    385   FALSE
## 5       44      52    48.0      1      0    0       10.0   0.14    200    TRUE
## 6       69      54    61.5      1      0    0        6.6   0.02    375    TRUE
##   dayType
## 1 weekday
## 2 weekday
## 3 weekday
## 4 weekend
## 5 weekday
## 6 weekday
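# Volume (response) on the y-axis against high temperature on the x-axis,
# one panel per value of weekday, with a regression line in each panel
ggplot(f, aes(x = hightemp, y = volume)) +
  theme_classic() +
  geom_point(aes(color = weekday)) +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~weekday) +
  labs(x = "High Temperature of the Day", y = "Volume (crossings per day)")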
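The regression lines asked for in part 3 can also be summarized numerically. The sketch below fits a separate simple linear model of volume on high temperature for each value of `weekday` and extracts the slopes. The names `fits` and `slopes` are illustrative, and fitting with `lm()` is an assumption that matches `method = 'lm'` in `geom_smooth()`.

```r
library(mosaicData)  # provides RailTrail

# Split the data by the logical weekday indicator and fit one model per group.
fits <- lapply(
  split(RailTrail, RailTrail$weekday),
  function(d) coef(lm(volume ~ hightemp, data = d))
)

# Slope of volume on hightemp in each group (names are "FALSE" and "TRUE").
slopes <- sapply(fits, `[[`, "hightemp")
slopes
```

Each slope is the estimated change in daily crossings for a one-unit increase in the day's high temperature within that group.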

  4. Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
head(storms)
## # A tibble: 6 × 11
##   name     year month   day  hour   lat  long pressure  wind type        seasday
##   <chr>   <int> <int> <int> <int> <dbl> <dbl>    <int> <int> <chr>         <int>
## 1 Allison  1995     6     3     0  17.4 -84.3     1005    30 Tropical D…       3
## 2 Allison  1995     6     3     6  18.3 -84.9     1004    30 Tropical D…       3
## 3 Allison  1995     6     3    12  19.3 -85.7     1003    35 Tropical S…       3
## 4 Allison  1995     6     3    18  20.6 -85.8     1001    40 Tropical S…       3
## 5 Allison  1995     6     4     0  22   -86        997    50 Tropical S…       4
## 6 Allison  1995     6     4     6  23.3 -86.3      995    60 Tropical S…       4
# Longitude on x and latitude on y so the paths read like a map
ggplot(storms, aes(x = long, y = lat)) +
  geom_path(aes(color = name)) +
  theme_classic() +
  facet_wrap(~year) +
  labs(x = "Longitude", y = "Latitude")

  5. Use the penguins data set from the palmerpenguins package to answer the following questions.
     1. Create a scatterplot of bill_length_mm against bill_depth_mm where individual species are colored and a regression line is added to each species. Add regression lines to all of your facets. What do you observe about the association of bill depth and bill length?
     2. Repeat the same scatterplot but now separate your plot into facets by species. How would you summarize the association between bill depth and bill length?
library(palmerpenguins)  # provides penguins
master = penguins
head(master)
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex    year
##   <fct>   <fct>              <dbl>         <dbl>       <int>   <int> <fct> <int>
## 1 Adelie  Torgersen           39.1          18.7         181    3750 male   2007
## 2 Adelie  Torgersen           39.5          17.4         186    3800 fema…  2007
## 3 Adelie  Torgersen           40.3          18           195    3250 fema…  2007
## 4 Adelie  Torgersen           NA            NA            NA      NA <NA>   2007
## 5 Adelie  Torgersen           36.7          19.3         193    3450 fema…  2007
## 6 Adelie  Torgersen           39.3          20.6         190    3650 male   2007
## # … with abbreviated variable names ¹​flipper_length_mm, ²​body_mass_g
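# bill_length_mm (y) against bill_depth_mm (x), colored by species,
# with one regression line per species
ggplot(master, aes(x = bill_depth_mm, y = bill_length_mm))+ 
  theme_classic()+
  geom_point(aes(color = species))+ 
  geom_smooth(method = 'lm', aes(color = species))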
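Part 2 of this question asks for the same scatterplot split into facets by species. A minimal sketch follows, assuming rows with missing bill measurements are dropped first; the names `peng`, `p_facets`, `overall_cor`, and `by_species_cor` are illustrative. It also computes correlations to help answer the two "what do you observe" prompts.

```r
library(ggplot2)
library(palmerpenguins)

# Keep only the columns needed and drop rows with missing measurements.
peng <- na.omit(
  palmerpenguins::penguins[, c("species", "bill_length_mm", "bill_depth_mm")]
)

# Same scatterplot as part 1, now with one panel and one line per species.
p_facets <- ggplot(peng, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm", formula = y ~ x, color = "black") +
  facet_wrap(~species) +
  theme_classic() +
  labs(x = "Bill depth (mm)", y = "Bill length (mm)", color = "Species")

p_facets

# Correlation ignoring species versus correlation within each species.
overall_cor <- cor(peng$bill_depth_mm, peng$bill_length_mm)
by_species_cor <- sapply(
  split(peng, peng$species),
  function(d) cor(d$bill_depth_mm, d$bill_length_mm)
)

overall_cor
by_species_cor
```

Pooled across species the correlation is negative, while within each species it is positive. This is why the per-species lines slope upward even though a single line through all the points would slope downward, so the association should be summarized species by species.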