---
title: "Week 2: Code along 1"
author: "Alex St. Pierre"
date: "2025-09-01"
output:
  html_document:
    toc: yes
  pdf_document: default
  word_document: default
editor_options:
  chunk_output_type: console
---
1+2
## [1] 3
library(tidyverse)
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # ℹ 224 more rows
3.2.4 Exercises
ggplot(data = mpg)
ggplot(data = mpg) +
geom_point(mapping = aes(x = hwy, y = cyl))
4. A scatterplot of hwy vs. cyl.
ggplot(data = mpg) +
geom_point(mapping = aes(x = class, y = drv))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
3.3.1 Exercises
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
Categorical variables in mpg: manufacturer, model, trans, drv, fl, class.
Continuous variables in mpg: displ, year, cyl, cty, hwy.
Mapping a continuous variable to color places the points on a color gradient instead of giving them distinct hues. Mapping the same variable to size scales each point's size with its value. Mapping a continuous variable to shape throws an error and halts execution: shapes have no natural order, so ggplot2 does not allow a continuous variable to be mapped to the shape aesthetic. Mapping a categorical variable to these same aesthetics gives each group a distinct color, point size, or symbol.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = year))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = year))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = drv))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = drv))
## Warning: Using size for a discrete variable is not advised.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = drv))
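The shape error described above can be reproduced without halting a script by building the plot inside tryCatch(). This is a minimal sketch; the object names p_shape and shape_result are illustrative and not part of the original notes.

```r
library(ggplot2)

# Map the continuous variable `year` to shape. Creating the plot object
# succeeds; the error only surfaces when the plot is built.
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = year))

# ggplot_build() performs the work that printing would trigger,
# so the error can be caught here instead of stopping the script.
shape_result <- tryCatch(
  ggplot_build(p_shape),
  error = function(e) e
)

# Show the message ggplot2 reports.
if (inherits(shape_result, "error")) message(conditionMessage(shape_result))
```

Because year takes only two values in mpg, wrapping it in factor() makes it discrete, and the shape mapping then works.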
4. Mapping the same continuous variable to multiple aesthetics gives each
point both a position on the color gradient and a matching size. Mapping
the same categorical variable gives each group its own color, shape, and
an arbitrary size, so the aesthetics carry redundant information.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = year, size = year))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = drv , size = drv, shape = drv))
## Warning: Using size for a discrete variable is not advised.
The stroke aesthetic determines the border width for certain shapes with both a fill and a border.
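A small sketch of the stroke aesthetic in use may help here. It assumes shape 21 (a filled circle with a border); the size and stroke values and the name p_stroke are arbitrary choices for illustration.

```r
library(ggplot2)

# Shape 21 is a circle with a separate border (color) and interior (fill).
# stroke sets the width of that border.
p_stroke <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    shape = 21, color = "black", fill = "white", size = 3, stroke = 2
  )

# Print p_stroke to draw the plot; the built layer data records the stroke value.
unique(layer_data(p_stroke)$stroke)
```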
Mapping an aesthetic to an expression instead of a variable name, here color = displ < 5, makes ggplot2 evaluate the expression for every row and color each point by the resulting TRUE or FALSE value, that is, by whether the engine displacement is below 5 liters.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))
How to get help
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~class, nrow = 2)
3.5.1 Exercises
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ displ)
ggplot(data = mpg) +
geom_point(mapping = aes(x = drv, y = cyl)) +
facet_grid(drv ~ cyl)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ .)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(~ class)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ class)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(class ~ drv)
different visual object to represent data
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
3.6.1 Exercises
Line Chart -> geom_line()
Boxplot -> geom_boxplot()
Histogram -> geom_histogram()
Area Chart -> geom_area()
I predicted that the chart would plot vehicles by their engine displacement and highway fuel mileage while color-coding each point by its type of drivetrain.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
3. show.legend = FALSE removes the legend from the side of the chart. If
you remove it, ggplot will show the legend by default. I assume
show.legend = FALSE was used to remove a legend where one would have
been generated automatically due to mapping color = drv.
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))
ggplot(data = mpg) +
geom_smooth(
mapping = aes(x = displ, y = hwy, color = drv),
show.legend = FALSE
)
The se argument of geom_smooth() controls whether the shaded confidence ribbon appears around the smooth line: se = TRUE (the default) draws it, and se = FALSE hides it.
The two code chunks produce the same plot; they show that data and mappings can be set either globally in ggplot() or locally within each geom.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
ggplot() +
geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
not every aesthetic works with every geom
two geoms in the same graph!
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
local vs. global mappings
This makes it possible to display different aesthetics in different layers.
specify different data for each layer
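As a hedged illustration of this note, the sketch below assumes it refers to passing a data argument to a single geom so that the layer overrides the global data. The subcompact subset and the name p_layers are choices made for this example only.

```r
library(ggplot2)

# Global data and mappings apply to every layer unless a layer overrides them.
p_layers <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # Local mapping: color is used by the point layer only.
  geom_point(mapping = aes(color = class)) +
  # Local data: the smooth line is fitted to subcompact cars only.
  geom_smooth(data = subset(mpg, class == "subcompact"), se = FALSE)

# Print p_layers to draw the plot.
```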
3.7.1 Exercises
ggplot(data = diamonds) +
geom_pointrange(
mapping = aes(x = cut, y = depth),
stat = "summary",
fun.min = min,
fun.max = max,
fun = median
)
geom_col() requires both an x and a y aesthetic and uses the y values directly as bar heights (its stat is "identity"). geom_bar() needs only x and computes the bar heights itself by counting the rows in each group (its default stat is "count").
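The difference can be sketched by counting the rows by hand and passing the result to geom_col(). The helper table cut_counts and its column n are names introduced for this example only.

```r
library(ggplot2)

# geom_bar() counts the rows per cut on its own.
p_bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# For geom_col(), the counts are computed first and supplied as y.
cut_counts <- as.data.frame(table(cut = diamonds$cut), responseName = "n")
p_col <- ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = n))

# Both plots end up with the same bar heights.
layer_data(p_bar)$y
layer_data(p_col)$y
```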
Every geom has a default stat and every stat has a default geom, so a paired geom and stat (for example geom_bar() and stat_count()) can generally be used interchangeably to chart the same data.
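stat_smooth() computes y (the predicted value), ymin and ymax (the lower and upper bounds of the confidence interval), and se (the standard error). Its behavior is controlled by the method, formula, se, level, span, n, and fullrange parameters.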
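The computed variables can be inspected with layer_data(), which returns the data frame a layer draws from. This is a sketch; method = "loess" and formula = y ~ x are spelled out here as an assumption to avoid relying on automatic method selection.

```r
library(ggplot2)

p_smooth <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  stat_smooth(method = "loess", formula = y ~ x, level = 0.95)

# One row per evaluation point along x, with the fitted value and its interval.
smooth_data <- layer_data(p_smooth)
head(smooth_data[, c("x", "y", "ymin", "ymax", "se")])
```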
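Adding group = 1 calculates proportions across the whole dataset, whereas omitting it calculates proportions within each bar. Without it, each bar is its own group, so every proportion is 1 and all bars have the same height.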
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = after_stat(prop)))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = color, y = after_stat(prop)))
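For comparison, here is a sketch of the first plot with and without group = 1, using layer_data() to read the computed proportions. The names p_ungrouped and p_grouped are illustrative.

```r
library(ggplot2)

# Without group = 1, each cut is its own group.
p_ungrouped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

# With group = 1, all rows belong to one group.
p_grouped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

layer_data(p_ungrouped)$prop  # each bar is 100% of its own group
layer_data(p_grouped)$prop    # each bar is its share of the whole dataset
```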
adjustments for bar charts
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
adjustments for scatterplots
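No code accompanies this note, so the sketch below assumes it refers to the "jitter" position adjustment, which is commonly used to reduce overplotting in scatterplots. The name p_jitter is illustrative.

```r
library(ggplot2)

# position = "jitter" adds a small random offset to each point,
# so points that share the same rounded values no longer overlap exactly.
p_jitter <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")

# Print p_jitter to draw the plot.
```

geom_jitter() is shorthand for geom_point(position = "jitter").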
switch x and y
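Assuming this note refers to coord_flip(), a minimal sketch could look like the following. The boxplot of hwy by class is an example chosen here, not taken from the notes.

```r
library(ggplot2)

# coord_flip() swaps the x and y axes, which gives long category
# labels room along the vertical axis.
p_flip <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()

# Print p_flip to draw the plot.
```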
set the aspect ratio correctly for maps
Polar coordinates reveal an interesting connection between a bar chart and a Coxcomb chart.
3.9.1 Exercises
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut)) +
coord_polar()
labs() enables you to rename or add titles, subtitles, captions etc. in ggplot.
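A short sketch of labs() in use; all label text below is made up for illustration, and p_labeled is a name introduced here.

```r
library(ggplot2)

p_labeled <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Highway mileage by engine displacement",
    subtitle = "Points colored by drivetrain",
    caption = "Data: ggplot2::mpg",
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon",
    color = "Drivetrain"  # renames the legend for the color aesthetic
  )

# Print p_labeled to draw the plot.
```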
coord_map() -> slower, more accurate. Uses map projections to make curved surfaces appear correctly on a flat plot.
coord_quickmap() -> faster, less accurate. Adjusts the aspect ratio so that distances appear roughly accurate.
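The plot shows that highway mpg is always higher than city mpg: every point lies above the reference line. coord_fixed() matters because it fixes the aspect ratio so that one unit on the x-axis is as long as one unit on the y-axis; the reference line is then drawn at a true 45-degree angle, which avoids visual distortion. geom_abline() draws that reference line; with its default intercept of 0 and slope of 1, it marks where city and highway mpg are equal.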
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
geom_abline() +
coord_fixed()
The grammar of graphics is based on the insight that you can uniquely describe any plot as a combination of: