During ANLY 512 we will study the theory and practice of
data visualization. We will use R and its
packages to assemble data and construct many
different types of visualizations. We begin by studying some of the
theoretical aspects of visualization. To do that, we must appreciate the
basic steps in the process of making a visualization.
The objective of this assignment is to complete and explain basic plots before moving on to more complicated ways to graph data.
A few tips: remember that your graphics may require pre-processing, such as summaries or calculations, and that this work should be included in your submission.
To ensure accuracy, pay close attention to axes and labels; you will be evaluated on the accuracy and expository nature of your graphics. Make sure your axis labels are easy to understand and consist of full words, with units where necessary.
Each question is worth 5 points.
To submit this homework, create the document in RStudio using the knitr package (the Knit button in RStudio), then publish the document to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure the link is hyperlinked and that I can see each visualization and the code required to create it.
Using data from the nasaweather package, create a
scatter plot between wind and pressure, with color being used to
distinguish the type of storm.
head(storms)
## # A tibble: 6 × 11
## name year month day hour lat long pressure wind type seasday
## <chr> <int> <int> <int> <int> <dbl> <dbl> <int> <int> <chr> <int>
## 1 Allison 1995 6 3 0 17.4 -84.3 1005 30 Tropical D… 3
## 2 Allison 1995 6 3 6 18.3 -84.9 1004 30 Tropical D… 3
## 3 Allison 1995 6 3 12 19.3 -85.7 1003 35 Tropical S… 3
## 4 Allison 1995 6 3 18 20.6 -85.8 1001 40 Tropical S… 3
## 5 Allison 1995 6 4 0 22 -86 997 50 Tropical S… 4
## 6 Allison 1995 6 4 6 23.3 -86.3 995 60 Tropical S… 4
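library(dplyr)
library(ggplot2)
# nasaweather:: avoids the unrelated dplyr::storms table of the same name
nasaweather::storms %>%
  ggplot(aes(x = pressure, y = wind)) +
  geom_point(aes(color = type)) +
  labs(title = "Wind Speed vs. Pressure by Storm Type", x = "Pressure (millibars)",
       y = "Maximum Sustained Wind Speed (knots)", color = "Storm Type")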
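The assignment notes that some graphics need pre-processing. As one possible example, the sketch below tabulates observation counts and average wind and pressure for each storm type. This makes the clusters in the scatter plot easier to interpret. It assumes the nasaweather and dplyr packages are installed. The name `storm_summary` is hypothetical.

```r
library(dplyr)

# Use nasaweather's table explicitly: dplyr also ships a different `storms` data set
storm_summary <- nasaweather::storms %>%
  group_by(type) %>%
  summarise(
    observations  = n(),                          # one row per recorded observation
    mean_wind     = mean(wind, na.rm = TRUE),     # knots
    mean_pressure = mean(pressure, na.rm = TRUE)  # millibars
  ) %>%
  arrange(desc(mean_wind))

storm_summary
```

A table like this could accompany the plot to justify statements such as which storm type tends to have the highest winds.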
Use the MLB_teams data in the mdsr package
to create an informative data graphic that illustrates the relationship
between winning percentage and payroll in context.
library(mdsr)
library(scales)
head(MLB_teams)
## # A tibble: 6 × 11
## yearID teamID lgID W L WPct attendance normA…¹ payroll metro…² name
## <int> <chr> <fct> <int> <int> <dbl> <int> <dbl> <int> <dbl> <chr>
## 1 2008 ARI NL 82 80 0.506 2509924 0.584 6.62e7 4489109 Ariz…
## 2 2008 ATL NL 72 90 0.444 2532834 0.589 1.02e8 5614323 Atla…
## 3 2008 BAL AL 68 93 0.422 1950075 0.454 6.72e7 2785874 Balt…
## 4 2008 BOS AL 95 67 0.586 3048250 0.709 1.33e8 4732161 Bost…
## 5 2008 CHA AL 89 74 0.546 2500648 0.582 1.21e8 9554598 Chic…
## 6 2008 CHN NL 97 64 0.602 3300200 0.768 1.18e8 9554598 Chic…
## # … with abbreviated variable names ¹​normAttend, ²​metroPop
library(ggplot2)
library(scales)
ggplot(mdsr::MLB_teams, aes(x = payroll, y = WPct)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  ggtitle("Relationship Between Winning Percentage and Payroll") +
  xlab("Team Payroll (millions of US dollars)") +
  ylab("Winning Percentage") +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(labels = label_number(suffix = "M", scale = 1e-6))
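One way to add context is to compare payroll with winning percentage within each season, because payroll levels change from year to year. The sketch below computes the per-season correlation. It assumes the mdsr and dplyr packages are installed. The name `payroll_cor` is hypothetical.

```r
library(dplyr)

# Correlation between payroll and winning percentage within each season,
# so teams are only compared with others playing in the same year
payroll_cor <- mdsr::MLB_teams %>%
  group_by(yearID) %>%
  summarise(
    teams = n(),
    r     = cor(payroll, WPct, use = "complete.obs")
  )

payroll_cor
```

If the correlations differ noticeably across seasons, adding `facet_wrap(~yearID)` to the plot is one reasonable way to show this visually.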
The RailTrail data set from the mosaicData
package describes the usage of a rail trail in Western Massachusetts.
Use these data to answer the following questions.
a. Create a scatterplot of the number of crossings per day (volume) against the high temperature that day.
b. Separate your plot into facets by weekday (an indicator
of weekend/holiday vs. weekday).
head(RailTrail)
## hightemp lowtemp avgtemp spring summer fall cloudcover precip volume weekday
## 1 83 50 66.5 0 1 0 7.6 0.00 501 TRUE
## 2 73 49 61.0 0 1 0 6.3 0.29 419 TRUE
## 3 74 52 63.0 1 0 0 7.5 0.32 397 TRUE
## 4 95 61 78.0 0 1 0 2.6 0.00 385 FALSE
## 5 44 52 48.0 1 0 0 10.0 0.14 200 TRUE
## 6 69 54 61.5 1 0 0 6.6 0.02 375 TRUE
## dayType
## 1 weekday
## 2 weekday
## 3 weekday
## 4 weekend
## 5 weekday
## 6 weekday
library(dplyr)
library(ggplot2)
mosaicData::RailTrail %>%
  # turn the logical weekday column into readable facet titles
  mutate(weekday_symbol = ifelse(weekday, "Weekday", "Weekend/Holiday")) %>%
  ggplot(aes(x = hightemp, y = volume)) +
  geom_point() +
  ggtitle("Daily Rail Trail Crossings vs. High Temperature") +
  xlab("High Temperature (degrees Fahrenheit)") +
  ylab("Number of Crossings per Day") +
  facet_wrap(~weekday_symbol, ncol = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)
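The regression lines in each facet can also be reported numerically. The sketch below computes the ordinary least-squares slope and intercept for each day type using slope = cov(x, y) / var(x). These should match the lines that `geom_smooth(method = "lm")` draws. It assumes the mosaicData and dplyr packages are installed. The names `rail_fits` and `day_type` are hypothetical.

```r
library(dplyr)

# Ordinary least-squares fit of volume on hightemp, computed separately
# for weekdays and weekends/holidays
rail_fits <- mosaicData::RailTrail %>%
  mutate(day_type = ifelse(weekday, "weekday", "weekend/holiday")) %>%
  group_by(day_type) %>%
  summarise(
    days      = n(),
    slope     = cov(hightemp, volume) / var(hightemp),  # extra crossings per degree F
    intercept = mean(volume) - slope * mean(hightemp)
  )

rail_fits
```

Reporting the slopes next to the faceted plot lets a reader compare how strongly temperature is associated with trail use on each type of day.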
Using data from the nasaweather package, use the
geom_path function to plot the path of each tropical storm
in the storms data table. Use color to distinguish the
storms from one another, and use faceting to plot each year in its own
panel.
head(storms)
## # A tibble: 6 × 11
## name year month day hour lat long pressure wind type seasday
## <chr> <int> <int> <int> <int> <dbl> <dbl> <int> <int> <chr> <int>
## 1 Allison 1995 6 3 0 17.4 -84.3 1005 30 Tropical D… 3
## 2 Allison 1995 6 3 6 18.3 -84.9 1004 30 Tropical D… 3
## 3 Allison 1995 6 3 12 19.3 -85.7 1003 35 Tropical S… 3
## 4 Allison 1995 6 3 18 20.6 -85.8 1001 40 Tropical S… 3
## 5 Allison 1995 6 4 0 22 -86 997 50 Tropical S… 4
## 6 Allison 1995 6 4 6 23.3 -86.3 995 60 Tropical S… 4
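library(ggplot2)
# longitude on x and latitude on y so the tracks read like a map;
# nasaweather:: avoids the unrelated dplyr::storms table
ggplot(nasaweather::storms, aes(x = long, y = lat)) +
  geom_path(aes(color = name)) +
  facet_wrap(~year, ncol = 2) +
  ggtitle("Path of Each Tropical Storm by Year") +
  xlab("Longitude (degrees)") +
  ylab("Latitude (degrees)")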
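`geom_path()` connects points in the order the rows appear, unlike `geom_line()`, which sorts by the x value. A storm track is therefore only correct if each storm's records are in chronological order. The sketch below sorts the records by time before plotting. It also hides the legend, because the many storm names make it hard to read. That is a design choice, not a requirement of the question. It assumes the nasaweather, dplyr, and ggplot2 packages are installed. The names `storm_tracks` and `track_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Sort each storm's records chronologically (harmless if already in time order)
storm_tracks <- nasaweather::storms %>%
  arrange(year, name, month, day, hour)

# color = name also groups the rows, so each storm gets its own path
track_plot <- ggplot(storm_tracks, aes(x = long, y = lat, color = name)) +
  geom_path() +
  facet_wrap(~year, ncol = 2) +
  labs(title = "Storm Tracks by Year",
       x = "Longitude (degrees)", y = "Latitude (degrees)") +
  theme(legend.position = "none")  # many storm names make the legend unreadable

track_plot
```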
Using the penguins data set from the
palmerpenguins package:
library(palmerpenguins)
head(penguins)
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
## <fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
## 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
## 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
## 4 Adelie Torgersen NA NA NA NA <NA> 2007
## 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
## 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
## # … with abbreviated variable names ¹​flipper_length_mm, ²​body_mass_g
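# (a) bill depth vs. bill length, colored by species, with one regression line per species
library(ggplot2)
library(palmerpenguins)
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  ggtitle("Bill Depth vs. Bill Length by Species") +
  xlab("Bill Length (mm)") +
  ylab("Bill Depth (mm)") +
  labs(color = "Species")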
# (b) the same scatterplot, separated into one facet per species
library(ggplot2)
library(palmerpenguins)
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  ggtitle("Bill Depth vs. Bill Length, Faceted by Species") +
  xlab("Bill Length (mm)") +
  ylab("Bill Depth (mm)") +
  labs(color = "Species") +
  facet_wrap(~species, ncol = 3)
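Both plots ask how bill depth relates to bill length within each species. A trend that ignores species can point a different way from the trends within species. The sketch below computes the pooled correlation and the per-species correlations so the two can be compared directly. It assumes the palmerpenguins and dplyr packages are installed. The names `penguins_complete`, `pooled_r`, and `by_species` are hypothetical.

```r
library(dplyr)

# Drop rows where either bill measurement is missing
penguins_complete <- palmerpenguins::penguins %>%
  filter(!is.na(bill_length_mm), !is.na(bill_depth_mm))

# Correlation ignoring species
pooled_r <- cor(penguins_complete$bill_length_mm, penguins_complete$bill_depth_mm)

# Correlation within each species
by_species <- penguins_complete %>%
  group_by(species) %>%
  summarise(n = n(), r = cor(bill_length_mm, bill_depth_mm))

pooled_r
by_species
```

If the pooled and within-species signs disagree, that supports summarizing the association separately for each species rather than overall.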