ggplot2 basics

During ANLY 512 we will be studying the theory and practice of
data visualization. We will be using R and the
packages within R to assemble data and construct many
different types of visualizations. We begin by studying some of the
theoretical aspects of visualization. To do that we must appreciate the
basic steps in the process of making a visualization.
The objective of this assignment is to complete and explain basic plots before moving on to more complicated ways to graph data.
A couple of tips: there may be preprocessing involved in your graphics, so you may have to do summaries or calculations to prepare. Those should be included in your work.
To ensure accuracy, pay close attention to axes and labels. You will be evaluated on the accuracy and expository nature of your graphics. Make sure your axis labels are easy to understand and are composed of full words, with units if necessary.
Each question is worth 5 points.
To submit this homework, create the document in RStudio using the knitr package (button included in RStudio) and publish it to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.
Using data from the nasaweather package, create a
scatterplot between wind and pressure, with color being used to
distinguish the type of storm.

The graph below shows an inverse relationship between wind and pressure: the higher the wind speed, the lower the pressure. By type, Tropical Depressions have the lowest wind speeds and highest pressures, Hurricanes have the highest wind speeds and lowest pressures, and Extratropical storms and Tropical Storms fall in between.
library(ggplot2)
library(nasaweather)
data <- storms
ggplot(data, aes(x = wind, y = pressure, color = type)) +
theme_bw() +
geom_point() +
labs(title = 'Wind speed vs. pressure by storm type', x = 'Wind speed (knots)', y = 'Pressure (millibars)', color = 'Storm type')
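The ordering of storm types described above can also be checked numerically. The following is a minimal sketch, assuming the nasaweather package is installed; the names `type_means` and `wind_pressure_cor` are introduced here for illustration only and are not part of the assignment.

```r
library(nasaweather)

# Mean wind speed and central pressure for each storm type.
# `type_means` is a hypothetical name used only in this sketch.
type_means <- aggregate(cbind(wind, pressure) ~ type,
                        data = nasaweather::storms, FUN = mean)

# Sort from the slowest to the fastest mean wind speed
type_means[order(type_means$wind), ]

# Overall linear association between wind and pressure;
# a negative value is consistent with an inverse relationship.
wind_pressure_cor <- cor(nasaweather::storms$wind,
                         nasaweather::storms$pressure,
                         use = "complete.obs")
wind_pressure_cor
```

Summaries like this count as the preprocessing mentioned in the tips, so they belong in the submitted work next to the plot.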
Use the MLB_teams data in the mdsr package
to create an informative data graphic that illustrates the relationship
between winning percentage and payroll in context.

The graph below shows a positive relationship between winning percentage and payroll: teams with higher winning percentages tend to have higher payrolls.
library(mdsr)
data <- MLB_teams
ggplot(data, aes(x = WPct, y = payroll)) +
theme_bw() +
geom_point() +
geom_smooth(method = 'lm', se = TRUE) +
labs(title = 'Winning percentage vs. payroll', x = 'Winning percentage', y = 'Payroll (US dollars)')
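To put a number on the trend line, one option is to compute the correlation and fit the same straight line that `geom_smooth(method = 'lm')` draws. This is a minimal sketch, assuming the mdsr package is installed; `wpct_payroll_cor` and `payroll_fit` are hypothetical names used only for illustration.

```r
library(mdsr)

# Linear correlation between winning percentage and payroll
wpct_payroll_cor <- cor(MLB_teams$WPct, MLB_teams$payroll,
                        use = "complete.obs")
wpct_payroll_cor

# Same model as the plotted line: payroll (y) regressed on WPct (x)
payroll_fit <- lm(payroll ~ WPct, data = MLB_teams)
coef(payroll_fit)
```

A positive correlation and a positive `WPct` coefficient would support the reading of the scatterplot given above.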
The RailTrail data set from the mosaicData
package describes the usage of a rail trail in Western Massachusetts.
Use these data to answer the following questions.
a. Create a scatterplot of volume against the high temperature that day.
b. Separate your plot by weekday (an indicator of weekend/holiday vs. weekday).

The graph below shows a positive relationship between volume and high temperature. The relationship looks stronger on weekdays (the fitted line is steeper) than on weekends and holidays.
library(mosaicData)
rail_trail <- RailTrail
ggplot(rail_trail, aes(x = volume, y = hightemp, color = weekday)) +
theme_bw() +
geom_point() +
geom_smooth(method = 'lm', se = FALSE) +
labs(title = 'Trail volume vs. high temperature', x = 'Volume (estimated trail users per day)', y = 'High temperature (°F)', color = 'Weekday')
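The comparison of slopes can be made explicit by fitting the plotted line separately for each group. This is a minimal sketch, assuming the mosaicData package is installed; `slopes` is a hypothetical name. It uses the same axis assignment as the plot (hightemp on the y-axis, volume on the x-axis), so the numbers correspond to the lines drawn by `geom_smooth`.

```r
library(mosaicData)

# Fit hightemp ~ volume within each value of weekday and keep the slope.
# split() groups the rows by the weekday indicator.
slopes <- sapply(split(RailTrail, RailTrail$weekday), function(d) {
  coef(lm(hightemp ~ volume, data = d))[["volume"]]
})
slopes
```

The group with the larger value is the one with the steeper fitted line in the scatterplot.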
Using data from the nasaweather package, use the
geom_path function to plot the path of each tropical storm
in the storms data table. Use color to distinguish the
storms from one another, and use faceting to plot each year in its own
panel.

The graph below shows the latitude and longitude of each storm over time. Most storms moved from southwest to northeast.
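storms <- nasaweather::storms
# Longitude on x and latitude on y so each panel reads like a map
ggplot(storms, aes(x = long, y = lat)) +
geom_path(aes(color = name)) +
facet_wrap(~year) +
labs(x = 'Longitude (degrees)', y = 'Latitude (degrees)', color = 'Storm name')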
Use the penguins data set from the
palmerpenguins package.

The first graph below shows a positive relationship between bill length and bill depth within each species. Chinstrap has the steepest fitted line, meaning its bill depth increases the most as bill length increases.
b. Repeat the same scatterplot but now separate your plot into facets by species. How would you summarize the association between bill depth and bill length?

The second graph below shows a positive relationship between bill length and bill depth for each of the three species. Chinstrap and Gentoo bill lengths cover a similar range, roughly 40 to 60 mm, while Adelie bills are shorter, reaching only about 46 mm. Adelie and Chinstrap also have deeper bills than Gentoo.
library(palmerpenguins)
data <- penguins
ggplot(data, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
theme_bw() +
geom_point() +
geom_smooth(method = 'lm', se = FALSE) +
labs(title = 'Bill length vs. bill depth by species', x = 'Bill length (mm)', y = 'Bill depth (mm)', color = 'Species')
ggplot(data, aes(x = bill_length_mm, y = bill_depth_mm)) +
theme_bw() +
geom_point(alpha = 0.5, color = 'red') +
geom_smooth(method = 'lm', se = FALSE) +
facet_wrap(~species) +
labs(title = 'Bill length vs. bill depth, faceted by species', x = 'Bill length (mm)', y = 'Bill depth (mm)')
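The claim about which species has the steepest line can be checked by fitting the regression within each species. This is a minimal sketch, assuming the palmerpenguins package is installed; `species_slopes` and `pooled_slope` are hypothetical names introduced for illustration. Rows with missing measurements are dropped by `lm()` by default.

```r
library(palmerpenguins)

# Slope of bill depth on bill length within each species, matching the
# per-species lines drawn by geom_smooth(method = 'lm').
species_slopes <- sapply(split(penguins, penguins$species), function(d) {
  coef(lm(bill_depth_mm ~ bill_length_mm, data = d))[["bill_length_mm"]]
})
species_slopes

# Slope when all species are pooled into a single regression
pooled_slope <- coef(lm(bill_depth_mm ~ bill_length_mm,
                        data = penguins))[["bill_length_mm"]]
pooled_slope
```

Comparing `pooled_slope` with the per-species values is a useful check when summarizing the association, since a pooled fit ignores the grouping that the color and facets make visible.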