ggplot2 basics

During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that, we must appreciate the basic steps in the process of making a visualization.
The objective of this assignment is to complete and explain basic plots before moving on to more complicated ways to graph data.
A couple of tips: remember that your graphics may require pre-processing, so you may have to compute summaries or other calculations first. Include those steps in your work.
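As an illustration of this kind of pre-processing, the sketch below summarizes the `storms` data from `nasaweather` before plotting. It uses base R's `aggregate()` to compute the mean wind speed for each storm type and then plots the summary instead of the raw rows. The object names `wind_by_type` and `p_summary` are illustrative. The unit (knots) is assumed from the `?storms` help page in `nasaweather`, so check it there.

```r
library(nasaweather)
library(ggplot2)

# Load storms explicitly from nasaweather (dplyr also ships a data set named storms)
data("storms", package = "nasaweather")

# Pre-processing step: one row per storm type with the mean wind speed
wind_by_type <- aggregate(wind ~ type, data = storms, FUN = mean)

# Plot the summary table rather than the individual observations
# (wind assumed to be recorded in knots)
p_summary <- ggplot(wind_by_type, aes(x = type, y = wind)) +
  geom_col() +
  labs(x = "Storm type", y = "Mean maximum sustained wind (knots)")
p_summary
```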
To ensure accuracy, pay close attention to axes and labels; you will be evaluated on the accuracy and expository nature of your graphics. Make sure your axis labels are easy to understand and consist of full words, with units where necessary.
Each question is worth 5 points.
To submit this homework, create the document in RStudio using the knitr package (the Knit button in RStudio), then publish the document to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure the link is hyperlinked and that both the visualization and the code required to create it are visible.
Using the storms data from the nasaweather package, create a scatter plot between wind and pressure, with color being used to distinguish the type of storm.
library(nasaweather)
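# Load storms explicitly from nasaweather (dplyr also provides a data set named storms)
data("storms", package = "nasaweather")
# Convert type to a factor so each storm type maps to one palette color
storms$type <- factor(storms$type)
plot(storms$wind, storms$pressure, col = as.numeric(storms$type),
     xlab = "Wind speed (knots)", ylab = "Pressure (millibars)",
     main = "Wind vs. Pressure by Storm Type")
legend("topright", legend = levels(storms$type), col = seq_along(levels(storms$type)),
       pch = 1, title = "Storm Type")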
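Since this assignment focuses on ggplot2, here is a sketch of the same scatter plot written with `ggplot()`. Mapping `type` to the color aesthetic makes ggplot2 build the legend automatically, so the manual `legend()` call is no longer needed. The units (knots, millibars) are assumptions taken from the `nasaweather` help page for `storms`, and `p_storms` is an illustrative name.

```r
library(nasaweather)
library(ggplot2)

data("storms", package = "nasaweather")

# Wind vs. pressure, colored by storm type; the legend is generated from the mapping
# (units assumed: wind in knots, pressure in millibars)
p_storms <- ggplot(storms, aes(x = wind, y = pressure, color = type)) +
  geom_point(alpha = 0.5) +
  labs(x = "Maximum sustained wind speed (knots)",
       y = "Central air pressure (millibars)",
       color = "Storm type",
       title = "Wind vs. pressure by storm type") +
  theme_bw()
p_storms
```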
Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.
library(mdsr)
library(ggplot2)
data("MLB_teams", package = "mdsr")
# Keep only teams that played a full 162-game season
MLB_teams_fullseason <- subset(MLB_teams, W + L == 162)
# payroll is assumed to be recorded in US dollars, so divide by 1e6 to plot millions
ggplot(MLB_teams_fullseason, aes(x = payroll / 1e6, y = WPct, size = W, color = lgID)) +
  geom_point(alpha = 0.7) +
  scale_size(range = c(2, 10)) +
  scale_color_manual(values = c("#1b78c1", "#b71234")) +
  labs(x = "Payroll (millions of US dollars)", y = "Winning percentage",
       size = "Number of wins", color = "League") +
  theme_classic()
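To put the payroll relationship in numbers, the sketch below computes the correlation between payroll and winning percentage within each league and adds a least-squares line per league to the scatter plot. It assumes `payroll` is in US dollars (check `?MLB_teams`) and that `lgID` identifies the league. `cor_by_league` and `p_mlb` are illustrative names.

```r
library(mdsr)
library(ggplot2)

data("MLB_teams", package = "mdsr")
MLB_teams_fullseason <- subset(MLB_teams, W + L == 162)

# Pearson correlation between payroll and winning percentage within each league;
# drop = TRUE discards empty groups, complete.obs skips missing values
cor_by_league <- sapply(
  split(MLB_teams_fullseason, MLB_teams_fullseason$lgID, drop = TRUE),
  function(d) cor(d$payroll, d$WPct, use = "complete.obs")
)
cor_by_league

# Scatter plot with one least-squares line per league
# (payroll assumed to be in US dollars, shown in millions)
p_mlb <- ggplot(MLB_teams_fullseason,
                aes(x = payroll / 1e6, y = WPct, color = lgID)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Payroll (millions of US dollars)", y = "Winning percentage",
       color = "League") +
  theme_classic()
p_mlb
```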
The RailTrail data set from the mosaicData package describes the usage of a rail trail in Western Massachusetts. Use these data to answer the following questions. Plot volume against the high temperature that day, faceted by weekday (an indicator of weekend/holiday vs. weekday).
library(mosaicData)
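library(ggplot2)
data("RailTrail", package = "mosaicData")
# Volume vs. high temperature, one panel per value of weekday
# (weekday assumed to be logical: TRUE = weekday, FALSE = weekend/holiday)
ggplot(RailTrail, aes(x = hightemp, y = volume)) +
  geom_point() +
  facet_wrap(~ weekday,
             labeller = labeller(weekday = c("FALSE" = "Weekend/holiday", "TRUE" = "Weekday"))) +
  labs(x = "High temperature (degrees Fahrenheit)", y = "Number of crossings") +
  theme_bw()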
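# Same plot with a least-squares regression line in each facet
ggplot(RailTrail, aes(x = hightemp, y = volume)) +
  geom_point() +
  facet_wrap(~ weekday,
             labeller = labeller(weekday = c("FALSE" = "Weekend/holiday", "TRUE" = "Weekday"))) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "High temperature (degrees Fahrenheit)", y = "Number of crossings") +
  theme_bw()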
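The regression lines in the two facets can also be quantified. The sketch below fits `volume ~ hightemp` separately for each value of `weekday` with `lm()` and extracts the slopes. Each slope is the estimated change in daily crossings per one-degree increase in high temperature. It assumes `weekday` has exactly two values. `fits` and `slopes` are illustrative names.

```r
library(mosaicData)

data("RailTrail", package = "mosaicData")

# Fit one simple linear regression per weekday group
# (assumed: weekday is logical, FALSE = weekend/holiday, TRUE = weekday)
fits <- lapply(split(RailTrail, RailTrail$weekday),
               function(d) lm(volume ~ hightemp, data = d))

# Estimated change in daily crossings per 1 degree Fahrenheit, one value per group
slopes <- sapply(fits, function(m) unname(coef(m)["hightemp"]))
slopes
```

Comparing the two slopes shows whether temperature is associated with trail use more strongly on weekdays or on weekends and holidays. That comparison is hard to make by eye from two separate panels.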
Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
library(nasaweather)
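library(ggplot2)
data("storms", package = "nasaweather")
# One path per storm (grouped by the color aesthetic), one panel per year
ggplot(storms, aes(x = long, y = lat, color = name)) +
  geom_path() +
  facet_wrap(~ year, ncol = 2) +
  labs(x = "Longitude (degrees)", y = "Latitude (degrees)", color = "Storm name") +
  theme_bw()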
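`geom_path()` connects points in the order they appear in the data, so a storm's track is only drawn correctly if its rows are in time order. The sketch below sorts the table chronologically within each storm before plotting. It also hides the legend, which would otherwise list every storm name and crowd out the panels. It assumes `storms` has `month`, `day`, and `hour` columns. `storms_sorted` and `p_paths` are illustrative names.

```r
library(nasaweather)
library(ggplot2)

data("storms", package = "nasaweather")

# Sort observations chronologically within each storm
# (assumes month, day, and hour columns exist in nasaweather's storms)
storms_sorted <- storms[order(storms$year, storms$name, storms$month,
                              storms$day, storms$hour), ]

p_paths <- ggplot(storms_sorted, aes(x = long, y = lat, color = name)) +
  geom_path() +
  facet_wrap(~ year, ncol = 2) +
  labs(x = "Longitude (degrees)", y = "Latitude (degrees)") +
  theme_bw() +
  # Hide the legend: with one entry per storm it would dominate the figure
  theme(legend.position = "none")
p_paths
```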
Use the penguins data set from the palmerpenguins package to examine the relationship between bill depth and bill length.
library(palmerpenguins)
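library(ggplot2)
# Load explicitly from palmerpenguins (recent versions of R also include a penguins data set)
data("penguins", package = "palmerpenguins")
# Scatter plot colored by species, with a least-squares line for each species
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ species) +
  labs(x = "Bill depth (mm)", y = "Bill length (mm)", color = "Species") +
  theme_classic()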
Given the positive slope of the regression line in each species facet, we can infer from the figure that bill depth and bill length are positively correlated within each species. The correlation appears weaker for Adelie penguins and stronger for Chinstrap and Gentoo penguins. The ranges of bill depth and bill length overlap among the three species, so comparisons between species should be interpreted with caution.
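The "weaker" and "stronger" associations described above can be checked numerically. The sketch below computes the Pearson correlation between bill depth and bill length within each species, dropping penguins with missing bill measurements. It also computes the correlation with species ignored, which shows how much pooling the species changes the picture. `cor_by_species` and `cor_pooled` are illustrative names.

```r
library(palmerpenguins)

data("penguins", package = "palmerpenguins")

# Within-species Pearson correlation, skipping rows with missing bill measurements
cor_by_species <- sapply(
  split(penguins, penguins$species),
  function(d) cor(d$bill_depth_mm, d$bill_length_mm, use = "complete.obs")
)

# Correlation with all species pooled together
cor_pooled <- cor(penguins$bill_depth_mm, penguins$bill_length_mm,
                  use = "complete.obs")

round(cor_by_species, 2)
round(cor_pooled, 2)
```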
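# Same faceted plot with color mapped only in the point layer; the regression
# lines still differ by species because each facet is fitted separately
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ species) +
  labs(x = "Bill depth (mm)", y = "Bill length (mm)", color = "Species") +
  theme_classic()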
In summary, the scatterplot with facets by species shows that there is a
positive association between bill depth and bill length for all three
penguin species, but the strength of the association varies by
species.
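Facets make each species easy to read on its own, but they make the slopes harder to compare. As an alternative, the sketch below draws all three species in a single panel, with one least-squares line per species (the lines are grouped by the color mapping). Rows with missing bill measurements are removed first with `na.omit()`. `p_overlay` is an illustrative name.

```r
library(palmerpenguins)
library(ggplot2)

data("penguins", package = "palmerpenguins")

# Keep only the needed columns and drop rows with missing values
penguins_bills <- na.omit(penguins[, c("bill_depth_mm", "bill_length_mm", "species")])

# All species in one panel; color defines the groups, so geom_smooth fits one line each
p_overlay <- ggplot(penguins_bills,
                    aes(x = bill_depth_mm, y = bill_length_mm, color = species)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Bill depth (mm)", y = "Bill length (mm)", color = "Species") +
  theme_classic()
p_overlay
```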