library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(palmerpenguins)
library(ggthemes)
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)", y = "Body mass (g)",
    color = "Species", shape = "Species"
  ) +
  scale_color_colorblind()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
How many rows are in penguins? How many columns?
dim(penguins)
## [1] 344 8
So penguins has 344 rows and 8 columns.
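If you only need one of the two numbers, `nrow()` and `ncol()` return them separately, and `glimpse()` (from dplyr, attached with the tidyverse) prints every column with its type and first few values. A small sketch:

```r
library(tidyverse)
library(palmerpenguins)

# Number of rows (observations) and columns (variables)
n_rows <- nrow(penguins)  # 344, matching dim() above
n_cols <- ncol(penguins)  # 8, matching dim() above

# One line per column: name, type, and the first few values
glimpse(penguins)
```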
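What does the bill_depth_mm variable in the
penguins data frame describe? Read the help for
?penguins to find out.
?penguins
According to the help page, bill_depth_mm is a number denoting bill depth, in millimeters.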
Make a scatterplot of bill_depth_mm
vs. bill_length_mm. That is, make a scatterplot with
bill_depth_mm on the y-axis and bill_length_mm
on the x-axis. Describe the relationship between these two
variables.
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) + geom_point()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
At a glance, there seems to be no clear relationship between these two variables.
But who knows? Maybe there’s a third variable that affects both of them.
ggplot(penguins, aes(x=bill_length_mm, y=bill_depth_mm, color=species)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
> Yup, as I thought! Within each species, longer bills tend to be deeper; lumping the three species together hid that.
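To put numbers on the two plots, here is a sketch that computes the Pearson correlation between bill length and bill depth, first for all penguins pooled together and then separately within each species. `use = "complete.obs"` drops the rows with missing measurements. If the plots are read correctly, the pooled correlation should come out weak and negative while each species shows a positive one; a trend that flips once groups are combined is often called Simpson’s paradox.

```r
library(tidyverse)
library(palmerpenguins)

# Correlation across all penguins, ignoring species
overall_r <- cor(penguins$bill_length_mm, penguins$bill_depth_mm,
                 use = "complete.obs")

# Correlation computed separately inside each species
by_species <- penguins |>
  group_by(species) |>
  summarise(r = cor(bill_length_mm, bill_depth_mm, use = "complete.obs"))

overall_r
by_species
```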
What happens if you make a scatterplot of species
vs. bill_depth_mm? What might be a better choice of geom?
> Since species is a categorical variable, a scatterplot just piles the points up in three strips, which isn’t very informative. Let’s try a bar chart instead:
ggplot(penguins, aes(x = species, y = bill_depth_mm)) + geom_bar()
Wait, what? That throws an error. I thought you could map both aesthetics. Let me ask an AI, haha.
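Why the error: `geom_bar()` uses `stat_count()` by default, which counts the rows in each category and uses that count as the bar height, so it expects only one of `x` or `y`. A sketch of what it is meant for, with dplyr’s `count()` giving the same numbers as a table:

```r
library(tidyverse)
library(palmerpenguins)

# geom_bar() needs only x: the bar height is the number of rows per species
ggplot(penguins, aes(x = species)) +
  geom_bar()

# The same counts as a data frame (columns: species, n)
species_counts <- count(penguins, species)
species_counts
```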
Ohhh, so there’s a dedicated geom for that: geom_col(). Let me try it:
ggplot(penguins, aes(x=species, y=bill_depth_mm)) + geom_col()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_col()`).
Okay, now that’s what I expected. However, Nemotron-4 said it’s better to visualize this with a boxplot or a violin plot. I wonder why. Well, let’s try it out:
ggplot(penguins, aes(x=species, y=bill_depth_mm)) + geom_violin()
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_ydensity()`).
Ahh, that looks much better. Now we can see exactly how the bill_depth value changes across species. Wait, that’s the wrong interpretation. I mean, we can see the distribution of bill_depth_mm within each of the 3 species, instead of just comparing bar heights, which with geom_col() are the sum of all bill depths in each species.
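To see why the bar chart was misleading: `geom_col()` stacks one segment per row, so each bar’s height is the sum of `bill_depth_mm` for that species and mostly reflects how many penguins of that species were measured. A quick sketch comparing that sum with the per-species mean and median, the kind of summary the violin and boxplot describe better:

```r
library(tidyverse)
library(palmerpenguins)

depth_summary <- penguins |>
  group_by(species) |>
  summarise(
    n = sum(!is.na(bill_depth_mm)),                  # penguins with a measurement
    total_depth = sum(bill_depth_mm, na.rm = TRUE),  # what the geom_col() bars show
    mean_depth = mean(bill_depth_mm, na.rm = TRUE),
    median_depth = median(bill_depth_mm, na.rm = TRUE)
  )
depth_summary
```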
Alright, let’s try the boxplot
ggplot(penguins, aes(x=species, y=bill_depth_mm)) + geom_boxplot()
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
Pretty cool!
Why does the following give an error and how would you fix it?
ggplot(data = penguins) +
  geom_point()
Yeah, it’s missing the aesthetic mapping: geom_point() needs x and y, which have to be supplied through aes().
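A minimal fix, assuming we want the same flipper length vs. body mass scatterplot as at the top of the page. The `aes()` call can go either in `ggplot()` or, as here, in the geom itself:

```r
library(tidyverse)
library(palmerpenguins)

# Supplying x and y through aes() gives geom_point() what it needs
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
p
```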
What does the na.rm argument do in
geom_point()? What is the default value of the argument?
Create a scatterplot where you successfully use this argument set to
TRUE.
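For reference: `na.rm` controls what `geom_point()` does with rows that have missing values in the mapped aesthetics. Its default is `FALSE`, which still drops those rows but emits the "Removed 2 rows containing missing values" warning seen throughout this page; `TRUE` drops them silently. A sketch reusing the bill length vs. bill depth scatterplot from earlier:

```r
library(tidyverse)
library(palmerpenguins)

# na.rm = TRUE removes the rows with missing bill measurements without a warning
p_na <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(na.rm = TRUE)
p_na
```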
Add the following caption to the plot you made in the previous
exercise: “Data come from the palmerpenguins package.” Hint: Take a look
at the documentation for labs().
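`labs()` has a `caption` argument, which by default places the text in the bottom-right corner of the plot. Applied to the `na.rm = TRUE` scatterplot (rebuilt here so the snippet runs on its own):

```r
library(tidyverse)
library(palmerpenguins)

p_caption <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(na.rm = TRUE) +
  # caption text shown below the plotting area
  labs(caption = "Data come from the palmerpenguins package.")
p_caption
```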
Recreate the following visualization. What aesthetic should
bill_depth_mm be mapped to? And should it be mapped at the
global level or at the geom level?
Okay, x and y are obvious: flipper_length_mm and body_mass_g. bill_depth_mm is also obvious: it’s mapped to color. Since it applies to all geoms, it should be set at the global level. Oh right, there’s also a smooth line that runs across all the points, so it should be global too.
Alright, let’s test this out
ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g, color=bill_depth_mm)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
> Yup, perfect!
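One thing worth a second look is the "aesthetics were dropped" warning above: because `color` is mapped globally, `geom_smooth()` also inherits the continuous `bill_depth_mm` colour and then discards it. Mapping `color` only inside `geom_point()` should give the same picture (colored points, one smooth line for all penguins) without that warning. A sketch:

```r
library(tidyverse)
library(palmerpenguins)

p_local <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  # color is mapped for the points layer only
  geom_point(aes(color = bill_depth_mm)) +
  # a single smoother fitted to all penguins
  geom_smooth()
p_local
```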