ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
ggplot(): constructs the initial plot object, and is almost always followed by + to add components to the plot.
geom_TYPE: the geometric object that a plot uses to represent data (for example geom_point() or geom_smooth()).
aes(x, y, ...): aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. Aesthetic mappings can be set in ggplot() and in individual layers.
facet_wrap() and facet_grid(): facet functions are another way to add variables, and they are particularly useful for categorical variables. They split your plot into facets, subplots that each display one subset of the data.
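The four components above fit together in one call. The sketch below uses mpg, and its particular choices (colouring by drv, faceting by year) are only illustrative. It puts a global mapping in ggplot(), a layer-level mapping in geom_point(), and adds a facet function.

```r
library(ggplot2)

# Template (placeholders in angle brackets):
#   ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +
#     <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) +
#     <FACET_FUNCTION>(...)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +  # data + global mapping
  geom_point(mapping = aes(colour = drv)) +                     # geom + layer-level mapping
  facet_wrap(~ year)                                            # one panel per model year

# Print p (or type p at the console) to draw the plot.
```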
This lab assignment is based on the mpg data from the ggplot2 package. This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. Each row of the data frame represents a different car model. There are 234 rows and 11 variables in the dataset. You can type ?mpg in the console to check the details of the dataset.
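A quick way to confirm the dimensions and variable names before plotting is to inspect the data frame directly. The sketch below uses only base R functions on the mpg object that ggplot2 provides.

```r
library(ggplot2)  # mpg is bundled with ggplot2

dim(mpg)      # number of rows and columns
names(mpg)    # the variable names (displ, hwy, class, drv, cyl, ...)
head(mpg, 3)  # the first three rows
```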
You will need to modify the code chunks so that the code in each chunk works (usually this means replacing anything in ALL CAPS). You will also need to fill in the text outside the code chunks, such as the ANSWER lines. When you get the desired result for a step, change eval=F to eval=T in that chunk's options and knit the document to PDF to make sure it works. After you complete the lab, submit the PDF of your completed work to Gradescope before the deadline. If you cannot knit to PDF, knit to HTML and then print the HTML to PDF from your web browser.
(a) Create a scatterplot of displ (engine displacement) and hwy (highway miles per gallon) from mpg, with displ on the x-axis and hwy on the y-axis.
ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy))
(b) Add a smoothing line to the scatterplot, using a linear model (lm) as the smoothing method.
ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  geom_smooth(aes(x = displ, y = hwy), method = "lm")
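With method = "lm", geom_smooth() fits an ordinary least-squares regression of y on x, which is the same model as lm(hwy ~ displ). The sketch below compares the two by using ggplot2's layer_data() to extract the fitted values the smoothing layer computed. The explicit formula = y ~ x only silences the informational message and does not change the fit.

```r
library(ggplot2)

# The linear model that geom_smooth(method = "lm") draws
fit <- lm(hwy ~ displ, data = mpg)
print(coef(fit))

# Rebuild plot (b) and extract the values ggplot2 computed for layer 2 (the smooth)
p <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
smooth_data <- layer_data(p, 2)

# Predict from lm() at the same x positions and compare with the plotted line
lm_pred <- predict(fit, newdata = data.frame(displ = smooth_data$x))
print(all.equal(smooth_data$y, unname(lm_pred)))
```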
(c) Specify the aesthetic mappings inside the ggplot() function instead. Is there any difference between plot (c) and plot (b)?
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm")
ANSWER: No difference. The scatterplot and the smoothing line use the same x and y variables, so the aesthetic mappings can be specified once inside ggplot(), and every layer inherits them. Mappings set in ggplot() are called global mappings.
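One way to check the "no difference" answer is to compare the data ggplot2 computes for each layer of the two versions. The sketch below relies on ggplot2's layer_data(). The variable names p_global and p_local are purely illustrative.

```r
library(ggplot2)

# Global mappings: declared once in ggplot(), inherited by both layers
p_global <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Local mappings: repeated inside each layer
p_local <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  geom_smooth(aes(x = displ, y = hwy), method = "lm", formula = y ~ x)

# Compare the computed data for the point layer (1) and the smooth layer (2)
points_same <- isTRUE(all.equal(layer_data(p_global, 1), layer_data(p_local, 1)))
smooth_same <- isTRUE(all.equal(layer_data(p_global, 2), layer_data(p_local, 2)))
c(points = points_same, smooth = smooth_same)
```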
(d) Colour the points in the scatterplot by class (type of car) in mpg.
ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, colour = class)) +
  geom_smooth(aes(x = displ, y = hwy), method = "lm")
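Where colour is mapped matters. When colour = class appears only in geom_point(), as in the chunk above, geom_smooth() still fits one line to all cars. When it is mapped globally, geom_smooth() inherits it and fits one line per class. The sketch below counts the groups in the smoothing layer with layer_data(). The counts it checks assume the seven classes in mpg.

```r
library(ggplot2)

# colour mapped only in geom_point(): the smooth layer has a single group
p_local_colour <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(method = "lm", formula = y ~ x)

# colour mapped globally: geom_smooth() inherits it and fits one line per class
p_global_colour <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

n_lines_local  <- length(unique(layer_data(p_local_colour, 2)$group))
n_lines_global <- length(unique(layer_data(p_global_colour, 2)$group))
c(local = n_lines_local, global = n_lines_global)
```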
(e) Use facet_wrap to visualize the relationship between displ and hwy based on class.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
(f) Use facet_grid to visualize the relationship between displ and hwy for each combination of drv (type of drive train) and cyl (number of cylinders).
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
(Note that drv and cyl each take only a few distinct values, so here they act as categorical variables. Crossing them forms a contingency table: drv defines the rows of panels and cyl defines the columns. The final plot shows the relationship between displ and hwy within each cell of that table.)
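The contingency table behind the facet grid can be printed with base R's table(). Each cell corresponds to one panel of facet_grid(drv ~ cyl), and cells with a count of 0 appear as empty panels.

```r
library(ggplot2)

# Rows = drv levels, columns = cyl values; each cell is one panel in the grid
tab <- table(drv = mpg$drv, cyl = mpg$cyl)
print(tab)

# Combinations with no cars, i.e. the empty panels
which(tab == 0, arr.ind = TRUE)
```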
(g) Recreate the scatterplot of displ and hwy using position = "jitter".
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
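ANSWER: The plot in (a) displays only 126 points, even though there are 234 observations in the dataset. Many cars share the same displ and hwy values, so their points are drawn on top of each other (overplotting).
Setting position = "jitter" adds a small amount of random noise to each point. This spreads the overlapping points apart so that all of them become visible, which avoids overplotting.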
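The overplotting can be quantified by counting distinct (displ, hwy) pairs. Because jitter is random, it can also help to fix the noise so that every knit produces the same picture. The sketch below passes position_jitter() with its seed argument. The width, height, and seed values (0.1, 0.1, 123) are arbitrary illustrative choices, not values from the assignment.

```r
library(ggplot2)

# Points sharing the same (displ, hwy) pair land on the same spot
n_obs <- nrow(mpg)
n_positions <- nrow(unique(mpg[, c("displ", "hwy")]))
cat(n_positions, "distinct positions for", n_obs, "observations\n")

# position_jitter() controls the amount of noise; a fixed seed makes it reproducible
p_jitter <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             position = position_jitter(width = 0.1, height = 0.1, seed = 123))
# Print p_jitter to draw the plot.
```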
(h) geom_jitter() is a convenient shortcut for geom_point(position = "jitter"). Generate the plot in (g) with geom_jitter() and make the points semi-transparent with alpha = 0.5.
ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy), alpha = 0.5)
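geom_jitter() also accepts width and height arguments that bound the noise in each direction, in data units. When they are omitted, the default is 40% of the resolution of the data. The sketch below uses width = 0.05 and height = 0.5 purely as illustrative values. Because hwy is recorded in whole miles per gallon, a height of 0.5 keeps each point within half a unit of its original value. alpha is set outside aes() because it is a fixed value, not a mapping.

```r
library(ggplot2)

p_jit <- ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy),
              width = 0.05,  # horizontal noise of at most +/- 0.05 litres
              height = 0.5,  # vertical noise of at most +/- 0.5 mpg
              alpha = 0.5)   # fixed transparency for every point
# Print p_jit to draw the plot.
```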
(i) Create a boxplot of hwy based on class.
ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = class, y = hwy))
(j) Flip the coordinates of the boxplot with coord_flip().
ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = class, y = hwy)) +
  coord_flip()
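As an alternative to coord_flip(), and assuming ggplot2 3.3.0 or newer (check with packageVersion("ggplot2")), geom_boxplot() infers its orientation from the mappings. Swapping the variables in aes() therefore draws horizontal boxes directly. The sketch below confirms the orientation through the flipped_aes column in the computed layer data.

```r
library(ggplot2)

# Categorical variable on y, continuous on x: horizontal boxplots, no coord_flip()
p_horizontal <- ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = hwy, y = class))

# One row per box; flipped_aes is TRUE when the geom detected horizontal orientation
box_data <- layer_data(p_horizontal)
box_data[, c("flipped_aes", "lower", "middle", "upper")]
```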