Directions

During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.

The objective of this assignment is to complete and explain basic plots before moving on to more complicated ways to graph data.

Each question is worth 5 points.

To submit this homework, create the document in RStudio using the knitr package (the Knit button is built into RStudio), then publish it to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.

Questions

  1. Using data from the nasaweather package, create a scatterplot between wind and pressure, with color being used to distinguish the type of storm.

From the graph below, we can see an inverse relationship between wind and pressure: the higher the wind speed, the lower the pressure. By type, Tropical Depressions have the lowest wind speeds and the highest pressures, while Hurricanes have the highest wind speeds and the lowest pressures. Extratropical storms and Tropical Storms fall in between.

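library(ggplot2)
library(nasaweather)

# Take storms from nasaweather explicitly (dplyr also ships a data set named storms)
data <- nasaweather::storms

ggplot(data, aes(x = wind, y = pressure, color = type)) +
  theme_bw() +
  geom_point() +
  labs(title = 'Scatterplot of wind vs pressure', x = 'wind', y = 'pressure')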
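
The visual reading of the scatterplot can be cross-checked with a few numbers. The following is a minimal sketch, assuming the nasaweather package is installed; the names `type_summary` and `wind_pressure_cor` are illustrative and not part of the assignment.

```r
library(nasaweather)

# Mean wind speed and mean pressure for each storm type.
type_summary <- aggregate(cbind(wind, pressure) ~ type,
                          data = nasaweather::storms, FUN = mean)

# Sort from the calmest type to the windiest so the ordering is easy to read.
type_summary <- type_summary[order(type_summary$wind), ]
print(type_summary)

# Overall linear association between wind and pressure;
# a negative value is consistent with an inverse relationship.
wind_pressure_cor <- cor(nasaweather::storms$wind, nasaweather::storms$pressure)
print(wind_pressure_cor)
```

If the ordering of the types by mean wind matches the ordering seen in the plot, the description above is supported by the data rather than by eye alone.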

  2. Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.

From the graph below, we can see a moderately positive relationship between winning percentage and payroll: teams with higher winning percentages tend to have higher payrolls.

library(ggplot2)
library(mdsr)

data <- mdsr::MLB_teams

# Scatterplot with a fitted linear trend and its confidence band
ggplot(data, aes(x = WPct, y = payroll)) +
  theme_bw() +
  geom_point() +
  geom_smooth(method = 'lm', se = TRUE) +
  labs(title = 'Scatterplot of winning percentage vs payroll', x = 'WPct', y = 'payroll')

  3. The RailTrail data set from the mosaicData package describes the usage of a rail trail in Western Massachusetts. Use these data to answer the following questions.
    a. Create a scatterplot of the number of crossings per day (volume) against the high temperature that day
    b. Separate your plot into facets by weekday (an indicator of weekend/holiday vs. weekday)
    c. Add regression lines to the two facets

From the graph below, we can see a positive relationship between volume and high temperature. The relationship looks stronger on weekdays than on weekends and holidays: the weekday regression line has a steeper slope.

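library(ggplot2)
library(mosaicData)

rail_trail <- RailTrail

# volume (crossings per day) on the y-axis against high temperature on the x-axis
ggplot(rail_trail, aes(x = hightemp, y = volume, color = weekday)) +
  theme_bw() +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~weekday) +  # one panel per value of the weekday indicator
  labs(title = 'Scatterplot of volume vs high temperature', x = 'hightemp', y = 'volume')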
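
The comparison of slopes between weekdays and weekends can also be checked numerically instead of by eye. This is a minimal sketch, assuming the mosaicData package is installed and that `weekday` is the weekend/holiday vs. weekday indicator described in the question; `group_slopes` and `fit_interaction` are illustrative names.

```r
library(mosaicData)

# Slope of volume on high temperature, fitted separately for each
# value of the weekday indicator.
group_slopes <- sapply(split(RailTrail, RailTrail$weekday), function(d) {
  coef(lm(volume ~ hightemp, data = d))[["hightemp"]]
})
print(group_slopes)

# Equivalent single model: the interaction term measures how much the
# slope differs between the two groups.
fit_interaction <- lm(volume ~ hightemp * weekday, data = RailTrail)
print(coef(fit_interaction))
```

A larger slope for the weekday group, or equivalently a positive interaction coefficient, would agree with the steeper weekday line described above.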

  4. Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.

The graph below shows the latitude and longitude of each storm over time, with one panel per year. Most storms moved from southwest to northeast.

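library(ggplot2)
library(nasaweather)

# Longitude on x and latitude on y so that each panel reads like a map
ggplot(nasaweather::storms, aes(x = long, y = lat)) +
  geom_path(aes(color = name)) +
  facet_wrap(~year)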

  5. Using the penguins data set from the palmerpenguins package:
    a. Create a scatterplot of bill_length_mm against bill_depth_mm where individual species are colored and a regression line is added to each species. Add regression lines to all of your facets. What do you observe about the association of bill depth and bill length?

From the first graph below, there is a positive relationship between bill length and bill depth within each species. Chinstrap shows the steepest slope, which means that its bill depth grows the most as bill length increases.

    b. Repeat the same scatterplot but now separate your plot into facets by species. How would you summarize the association between bill depth and bill length?

From the second graph below, all three species show a positive relationship between bill length and bill depth. Chinstrap and Gentoo cover a similar bill length range, roughly 40 to 60 mm, while Adelie bills are shorter, reaching only about 46 mm. In terms of bill depth, Adelie and Chinstrap have deeper bills than Gentoo.

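library(ggplot2)
library(palmerpenguins)

# Take penguins from palmerpenguins explicitly to avoid picking up another data set of the same name
data <- palmerpenguins::penguins

# One regression line per species, distinguished by color
ggplot(data, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  theme_bw() +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  labs(title = 'Scatterplot of bill_length vs bill_depth', x = 'bill_length', y = 'bill_depth')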

ggplot(data, aes(x = bill_length_mm, y = bill_depth_mm)) +
  theme_bw() +
  geom_point(alpha = 0.5, color = 'red') + 
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~species) +
  labs(title = 'Scatterplot2 of bill_length vs bill_depth', x = 'bill_length', y = 'bill_depth')
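
The per-species slopes read off the plots can be computed directly. The sketch below assumes the palmerpenguins package is installed; `peng`, `species_slopes`, and `pooled_slope` are illustrative names. It also fits one line to all species pooled together, which is a useful contrast because the pooled trend need not point the same way as the within-species trends.

```r
library(palmerpenguins)

# Keep only the columns used in the plots and drop rows with missing values.
peng <- na.omit(palmerpenguins::penguins[, c("species", "bill_length_mm", "bill_depth_mm")])

# Slope of bill depth on bill length within each species.
species_slopes <- sapply(split(peng, peng$species), function(d) {
  coef(lm(bill_depth_mm ~ bill_length_mm, data = d))[["bill_length_mm"]]
})
print(species_slopes)

# Slope when all species are pooled, ignoring the species grouping.
pooled_slope <- coef(lm(bill_depth_mm ~ bill_length_mm, data = peng))[["bill_length_mm"]]
print(pooled_slope)
```

Comparing `species_slopes` with `pooled_slope` shows why coloring or faceting by species matters when summarizing the association between bill depth and bill length.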