ggplot2 basics
During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.
The objective of this assignment is to complete and explain basic plots before moving on to more complicated ways to graph data.
A couple of tips: remember that your graphics may require preprocessing, so you may have to compute summaries or other calculations to prepare the data. Include those steps in your work.
To ensure accuracy, pay close attention to axes and labels; you will be evaluated on the accuracy and expository nature of your graphics. Make sure your axis labels are easy to understand and are composed of full words, with units if necessary.
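As a minimal sketch of both tips, the example below summarizes a data set before plotting it and labels the axes with full words and units. It uses R's built-in mtcars data purely for illustration (it is not part of this assignment), and the names mpg_by_cyl and p_summary are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Preprocessing step: average fuel economy for each cylinder count.
# mtcars is a built-in data set, used here only as an illustration.
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n(), .groups = "drop")

# Plot the summary, with full-word axis labels that include units
p_summary <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(
    x = "Number of cylinders",
    y = "Mean fuel economy (miles per gallon)"
  )
p_summary
```

Keeping the summary step in the submitted code lets a reader see how each plotted value was computed.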
Each question is worth 5 points.
To submit this homework, create the document in RStudio using the knitr package (button included in RStudio), then publish the document to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.
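Using data from the nasaweather package, create a scatterplot between wind and pressure, with color being used to distinguish the type of storm.
library(dplyr)
library(ggplot2)

# Refer to nasaweather::storms explicitly, because dplyr also exports a storms data set
storm_sample <- nasaweather::storms %>%
  select(wind, pressure, type)

# Scatterplot of wind against pressure, colored by storm type
p <- storm_sample %>%
  ggplot(aes(x = pressure, y = wind))
p + geom_point(aes(color = type))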
Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.
data("MLB_teams", package = "mdsr")

# Bin metro-area population into four groups to provide context
MLB_teamsGr <- MLB_teams %>%
  mutate(MLB_teamsPop = cut(metroPop,
                            breaks = c(0, 2500000, 5000000, 10000000, 25000000),
                            labels = c("low", "medium", "high", "very high")))

# Winning percentage against payroll, with one linear fit per population group
MLB_teamsGr %>%
  ggplot(aes(x = payroll, y = WPct)) +
  geom_point(aes(color = MLB_teamsPop)) +
  geom_smooth(method = "lm", se = FALSE, aes(color = MLB_teamsPop)) +
  labs(x = "Payroll per team", y = "Winning Percentage", color = "Metro population")
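The population grouping above relies on cut(), which assigns each value to an interval defined by the breaks. The sketch below shows how the same breaks and labels classify a few values. The numbers in metro_pop are hypothetical, chosen only to show the behavior at and around a break point; they are not taken from MLB_teams.

```r
# Hypothetical metro-area populations (not real MLB_teams values)
metro_pop <- c(1500000, 2500000, 4000000, 12000000)

# By default cut() uses intervals that are open on the left and closed on the
# right, so a value exactly equal to a break falls in the lower group
pop_group <- cut(metro_pop,
                 breaks = c(0, 2500000, 5000000, 10000000, 25000000),
                 labels = c("low", "medium", "high", "very high"))

as.character(pop_group)
# "low" "low" "medium" "very high"
```

Any value outside the outermost breaks would become NA, so the breaks should span the full range of the data.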
The RailTrail data set from the mosaicData package describes the usage of a rail trail in Western Massachusetts. Use these data to answer the following questions.
a. Create a scatterplot of the number of crossings per day (volume) against the high temperature that day.
b. Separate your plot into facets by weekday (an indicator of weekend/holiday vs. weekday).
data("RailTrail", package = "mosaicData")

# a: crossings per day against the daily high temperature, with a linear fit
ggplot(RailTrail, aes(x = hightemp, y = volume)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Number of Crossings per Day against the High Temperature",
    x = "High Temperature (deg F)",
    y = "Number of Crossings per Day (volume)")

# b: the same plot, faceted by weekday vs. weekend/holiday
RailTrail %>%
  mutate(weekday_nice = ifelse(weekday, "Weekday", "Weekend/Holiday")) %>%
  ggplot(aes(x = hightemp, y = volume)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ weekday_nice, nrow = 1) +
  labs(
    title = "Number of Crossings per Day against the High Temperature (Weekday vs. Weekend/Holiday)",
    x = "High Temperature (deg F)",
    y = "Number of Crossings per Day (volume)")
Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
# Longitude goes on the x-axis and latitude on the y-axis, so the aesthetics match the axis labels
ggplot(nasaweather::storms, aes(x = long, y = lat)) +
  geom_path(aes(color = name)) +
  facet_wrap(~year, nrow = 3) +
  labs(
    title = "The Path of Tropical Storms",
    col = "Name",
    x = "Longitude",
    y = "Latitude"
  )
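geom_path() is suited to storm tracks because it connects observations in the order they appear in the data, whereas geom_line() connects them in order of the x variable. The sketch below illustrates this with a small, hypothetical track; the data frame track and its coordinates are made up for the example and do not come from the storms table.

```r
library(ggplot2)

# Hypothetical storm track: positions listed in the order they were observed.
# Longitude is deliberately not monotone, as with a storm that changes direction.
track <- data.frame(
  long = c(-60, -65, -63, -70),
  lat  = c(20, 22, 25, 27)
)

# geom_path() joins the points in row order, so the change of direction is preserved
p_path <- ggplot(track, aes(x = long, y = lat)) +
  geom_path() +
  geom_point() +
  labs(x = "Longitude (degrees)", y = "Latitude (degrees)")
p_path
```

Replacing geom_path() with geom_line() here would join the points from west to east instead, which no longer follows the order of observation.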
Use the penguins data set from the palmerpenguins package.
library(palmerpenguins)

# a: bill depth against bill length, colored by species, with a linear fit per species
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_color_brewer(palette = "Dark2") +
  labs(
    col = "Species",
    title = "Scatterplot Between Bill Length and Bill Depth By Species",
    x = "Bill Length (mm)",
    y = "Bill Depth (mm)"
  )

# b: the same scatterplot, with one facet per species and a blue linear fit in each
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", color = "blue") +
  scale_color_brewer(palette = "Dark2") +
  facet_wrap(~species) +
  labs(
    col = "Species",
    title = "Scatterplot Between Bill Length and Bill Depth By Species",
    x = "Bill Length (mm)",
    y = "Bill Depth (mm)"
  )