For this problem, we will be using the gapminder dataset from the gapminder package. You will need to install and load the gapminder package to start.
a) Begin by producing the graph in Figure 3.15, using the code Healy provides at the top of p. 66. Then do the following (note: these are some of the Where to Go Next problems from the end of Ch. 3).
b) What happens when you change the y-axis limits by adding + ylim(0, 100) to the last line of the scatterplot code?
c) What happens when you put geom_smooth() before geom_point() instead of after it? What does this tell you about how the plot is drawn? Think about how this might be useful when drawing plots. [Produce the graph and answer the questions in your markdown.]
d) What happens if you map color to year instead of continent? Generate two scatterplots (just use geom_point(); don’t worry about geom_smooth()). In one scatterplot, set color = year. In the second, set color = factor(year). Why do these two graphs differ, and how do they differ? Which result is what you expected? Think about what class of object year is. Remember, you can use one of our functions for inspecting the data to determine the class of this variable [glimpse(gapminder), skim(gapminder), head(gapminder), etc.]. What does the factor() function do to year?
library(gapminder)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.0
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                           color = continent, fill = continent))
p + geom_point() +
  geom_smooth(method = "loess") +
  scale_x_log10()
## `geom_smooth()` using formula = 'y ~ x'
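b) Adding ylim(0, 100) makes the graph look “zoomed out”: the y-axis now runs from 0 to 100 instead of covering only the range of the data, so the points are compressed vertically and it is a little harder to see distinctions among the continents. The smoothed lines also appear less steep.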
library(gapminder)
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                           color = continent, fill = continent))
p + geom_point() +
  geom_smooth(method = "loess") +
  scale_x_log10() +
  ylim(0, 100)
## `geom_smooth()` using formula = 'y ~ x'
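One way to see why ylim(0, 100) zooms the plot out is to compare the requested limits with the observed range of lifeExp. The following is a minimal sketch; the name life_range is introduced here for illustration only and is not part of Healy's code.

```r
library(gapminder)

# Observed range of life expectancy in the data
# (life_range is a hypothetical name used only in this sketch)
life_range <- range(gapminder$lifeExp)
life_range

# TRUE if every observation falls inside the requested limits,
# in which case ylim(0, 100) only adds empty space around the data
all(gapminder$lifeExp >= 0 & gapminder$lifeExp <= 100)
```

If the limits were narrower than the data, ylim() would drop the out-of-range observations before the smoother is fit, whereas coord_cartesian(ylim = ...) would only zoom the view without discarding data.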
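c) When geom_smooth() comes before geom_point(), the smoothed lines and their confidence bands are drawn first and the points are drawn on top of them, so the lines are partly hidden behind the points. This tells us that ggplot draws layers in the order they are added to the plot, each new layer on top of the previous ones. This is useful to know when plotting, because the order of the geoms determines which elements sit in the foreground: whatever should remain most visible needs to be added last.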
p +
  geom_smooth(method = "loess") +
  geom_point() +
  scale_x_log10()
## `geom_smooth()` using formula = 'y ~ x'
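The drawing order can also be checked without looking at the picture, by inspecting the layers stored in each plot object. This is a rough sketch: layer_order, points_first, and smooth_first are hypothetical names introduced here, and it assumes the plot object exposes its layers as plot$layers, as ggplot2 3.4.0 does.

```r
library(gapminder)
library(ggplot2)

base <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                              color = continent, fill = continent))

# Points added first, smoother second
points_first <- base + geom_point() + geom_smooth(method = "loess") + scale_x_log10()

# Smoother added first, points second
smooth_first <- base + geom_smooth(method = "loess") + geom_point() + scale_x_log10()

# Hypothetical helper: geom class of each layer, in the order the layers are stored
layer_order <- function(plot) {
  vapply(plot$layers, function(layer) class(layer$geom)[1],
         character(1), USE.NAMES = FALSE)
}

layer_order(points_first)
layer_order(smooth_first)
```

The layers are stored in the order they were added, and later layers are drawn over earlier ones.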
d) When mapping color to year, the plot uses a single continuous gradient of blue, with darker shades for earlier years and lighter shades for later ones. This is because year is stored as a numeric (integer) variable in the dataset, so ggplot treats it as continuous. Wrapping it in factor() converts it, for this plot only, into a factor in which each year is its own category. The plot then chooses a distinct color for each year, because it is treating year as a set of discrete categories.
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                           color = year))
p + geom_point() +
  scale_x_log10()
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                           color = factor(year)))
p + geom_point() +
  scale_x_log10()
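The difference between the two plots comes down to the class of the variable mapped to color, which can be checked directly. A minimal sketch, with year_factor as a hypothetical name used only here:

```r
library(gapminder)

# Class of year as stored in the dataset
class(gapminder$year)

# factor() returns a categorical copy; the column in gapminder is unchanged
year_factor <- factor(gapminder$year)
class(year_factor)

# One level per distinct year in the data
levels(year_factor)
nlevels(year_factor)
```

Because factor(year) is applied inside aes(), the conversion affects only that plot; gapminder$year itself stays numeric.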