2026-06-22
Make the tidyverse available.
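The chunk for this step is not shown; a minimal version, assuming the tidyverse package is already installed, would be:

```r
# Attach the core tidyverse packages (ggplot2, dplyr, tidyr, and others).
library(tidyverse)
```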
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.1 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.3 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
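A basic graph has three elements: the data, a mapping of variables to aesthetics such as x and y, and a geom that draws the result.
We will use the built-in data frame iris for some basic examples. Run glimpse() to look at its contents.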
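A sketch of the call, loading only dplyr here (glimpse() is also available once the tidyverse is attached):

```r
library(dplyr)  # provides glimpse()

# Print one line per column: name, type, and the first few values.
glimpse(iris)
```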
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
Create a very simple ggplot object, p, and try to display it. In the call to ggplot(), just specify that the data is the iris data frame. Then display the object by typing its name.
Create a ggplot object p2. Specify that Sepal.Length is mapped to x.
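One way this step might look (the original chunk is not shown, so this is a sketch):

```r
library(ggplot2)

# Only the data is specified: no aesthetic mappings and no geoms yet.
p = ggplot(data = iris)

# Typing the name prints the plot, which at this stage is an empty panel.
p
```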
Display p2.
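A possible version of these two steps, written as a sketch since the original chunk is not shown:

```r
library(ggplot2)

# Map Sepal.Length to the horizontal axis; still no geom.
p2 = ggplot(data = iris, aes(x = Sepal.Length))

# Shows an axis for Sepal.Length but draws no data.
p2
```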
We can see that the variable Sepal.Length has been assigned to the horizontal axis, but we still don’t have a usable result. We need to add a geom to ask for some kind of graph. We’ll start with a histogram and build on p2, but we’ll keep p2 to try other geoms.
Create p2a by adding geom_histogram().
Display p2a.
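A sketch of this step, rebuilding p2 so the block runs on its own:

```r
library(ggplot2)

p2 = ggplot(data = iris, aes(x = Sepal.Length))

# Add a histogram layer; with no bins or binwidth given,
# ggplot2 uses 30 bins and prints a message saying so.
p2a = p2 + geom_histogram()
p2a
```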
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
A density plot is a good alternative to a histogram. Create p2b using geom_density().
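The density version might be sketched the same way, again rebuilding p2 so the block is self-contained:

```r
library(ggplot2)

p2 = ggplot(data = iris, aes(x = Sepal.Length))

# Replace the histogram with a smoothed kernel density estimate.
p2b = p2 + geom_density()
p2b
```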
We can add on a second geom, a rug, to either of these. geom_rug() places a “bristle” where each actual value occurs.
Do this to create and display p2a_rug and p2b_rug.
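A sketch of adding the rug layer to both plots; the earlier objects are rebuilt here so the block runs on its own:

```r
library(ggplot2)

p2  = ggplot(data = iris, aes(x = Sepal.Length))
p2a = p2 + geom_histogram()
p2b = p2 + geom_density()

# geom_rug() draws one short tick per observation along the x axis.
p2a_rug = p2a + geom_rug()
p2b_rug = p2b + geom_rug()

p2a_rug
p2b_rug
```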
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
Play with the bins argument for the histogram you just developed. Pick a value which seems good to you.
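As a starting point, something like the following could be tried. The value 15 and the name p2a_bins are illustrative assumptions, not choices made in the original.

```r
library(ggplot2)

p2 = ggplot(data = iris, aes(x = Sepal.Length))

# bins sets the number of bars; 15 is only an example value to experiment with.
p2a_bins = p2 + geom_histogram(bins = 15) + geom_rug()
p2a_bins
```

Setting bins explicitly also silences the `stat_bin()` message about the default of 30.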
Play with the adjust parameter of the density geom, using either p2b or p2b_rug, to get a value you like. The default value is 1. Smaller values give you more detail, while larger values give you a smoother plot.
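A sketch for experimenting with adjust. The values 0.5 and 2 and the object names are illustrative assumptions.

```r
library(ggplot2)

p2 = ggplot(data = iris, aes(x = Sepal.Length))

# adjust multiplies the default bandwidth:
# below 1 shows more detail, above 1 gives a smoother curve.
p2b_detail = p2 + geom_density(adjust = 0.5) + geom_rug()
p2b_smooth = p2 + geom_density(adjust = 2) + geom_rug()

p2b_detail
p2b_smooth
```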
We can look at the relationship between Sepal.Length and the categorical Species in a few different ways. One simple way is to include a mapping of color to Species.
Do this for the histogram. Use an aes() in the geom. Outside the aes(), set fill = "white".
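One way to write this, as a sketch (the name p2a_species is a hypothetical choice):

```r
library(ggplot2)

p2 = ggplot(data = iris, aes(x = Sepal.Length))

# color is mapped to Species inside aes(), so each species gets its own outline.
# fill is set outside aes(), so every bar has the same constant white fill.
p2a_species = p2 + geom_histogram(aes(color = Species), fill = "white")
p2a_species
```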
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
Do the same for the density.
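A sketch for the density, assuming "the same" refers to the color mapping; fill = "white" could be added in the same way as for the histogram. The name p2b_species is hypothetical.

```r
library(ggplot2)

p2 = ggplot(data = iris, aes(x = Sepal.Length))

# One density curve per species, distinguished by line color.
p2b_species = p2 + geom_density(aes(color = Species))
p2b_species
```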
Which of these graphs is easier to interpret?
An alternative is to create separate graphs using facet_wrap().
Do this for either the histogram or the density plot (p2a or p2b), setting ncol = 1.
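A sketch using the histogram, with the hypothetical name p2a_facet; the same facet_wrap() call can be added to p2b:

```r
library(ggplot2)

p2  = ggplot(data = iris, aes(x = Sepal.Length))
p2a = p2 + geom_histogram()

# One panel per species; ncol = 1 stacks the panels vertically,
# so all three share the same horizontal axis and are easy to compare.
p2a_facet = p2a + facet_wrap(~ Species, ncol = 1)
p2a_facet
```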
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
Create a scatterplot, p3, of Sepal.Width and Sepal.Length.
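A sketch of the scatterplot. Here p3 holds the data and mappings, matching how p3 is defined later in this document, and the point layer is added when displaying; the name p3_points is hypothetical.

```r
library(ggplot2)

# Sepal.Width on the horizontal axis, Sepal.Length on the vertical axis.
p3 = ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))

# geom_point() draws one point per row of iris.
p3_points = p3 + geom_point()
p3_points
```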
It looks like there may be overplotting. Try geom_jitter() instead.
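The jittered version might look like this (p3_jitter is a hypothetical name):

```r
library(ggplot2)

p3 = ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))

# geom_jitter() adds a small random offset to each point,
# so observations with identical values no longer hide each other.
p3_jitter = p3 + geom_jitter()
p3_jitter
```

Because the offsets are random, the plot looks slightly different on each run.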
Let’s throw Species into the mix. Add it with a mapping to color in the aes in geom_jitter().
p3 = ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))
p3a = p3 + geom_jitter(aes(color = Species))
p3a
Do that with ncol = 1.
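The instruction does not name the function. Assuming it refers to faceting by Species with facet_wrap(), as was done for the histogram earlier, a sketch might be (p3a_facet is a hypothetical name):

```r
library(ggplot2)

p3  = ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))
p3a = p3 + geom_jitter(aes(color = Species))

# Assumption: "that" means one panel per species, stacked in a single column.
p3a_facet = p3a + facet_wrap(~ Species, ncol = 1)
p3a_facet
```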