Make a Plot

Build your plots layer by layer

Next, we add the point geom. This is a second layer on top of the base layer, which is why we join the two with the + symbol. (Note: ggplot2 combines layers with +, whereas the rest of the tidyverse chains steps with the %>% or |> pipe operator.)

ggplot(data = gapminder,
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

Alternatively, if we have saved the base plot as an object, we can add the geom layer to it:

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))

p + geom_point()
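
Because the base plot is stored as an object, we can inspect and reuse it. The sketch below assumes the ggplot2 and gapminder packages are installed; the name p_points is made up for illustration. It checks that p starts with no layers, and that adding a geom returns a new plot with one layer while leaving p itself unchanged.

```r
library(ggplot2)
library(gapminder)

# Base plot: data and aesthetic mappings only, no geoms yet
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))

# Adding a geom returns a new plot object; p itself is not modified
p_points <- p + geom_point()

length(p$layers)         # number of layers in the base plot
length(p_points$layers)  # number of layers after adding geom_point()
```

This is why the same p can be reused with different geoms in the examples that follow.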

One thing we often want to do is to add a smoother. This allows us to visually highlight the general trend in the data. Think of it as a visual analogue to a correlation (or regression) coefficient.

To add a linear smoother:

p + 
  geom_point() + 
  geom_smooth(formula = y ~ x, method="lm")

Notice that the visual presentation shows that a linear best fit summary of the data is likely to be very misleading. This would not be as apparent with just a regression coefficient.

What about a loess smoother? This is a non-parametric smoother that is more flexible than a linear one.

p + geom_smooth(formula = y ~ x, method = "loess")

When exploring the data, it’s useful to include the original data points on the plot. This can help you see how the smoother is fitting the data.

p + geom_point() + geom_smooth(formula = y ~ x, method="loess")
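
How closely a loess curve follows the data depends on its span. As a hedged sketch (the span values and the names p_wiggly and p_smooth are arbitrary choices for illustration, not recommendations), a smaller span should give a wigglier line and a larger span a smoother one:

```r
library(ggplot2)
library(gapminder)

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))

# span is passed through to the loess fit;
# smaller values follow local bumps in the data more closely
p_wiggly <- p + geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.2)

# larger values average over more of the data
p_smooth <- p + geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.9)

p_wiggly
p_smooth
```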

Let’s try transforming the x-axis scale of GDP to deal with the bunched-up data. One way to do this is with a log scale. Here we use log base 10 by adding scale_x_log10() to the plot.

p + geom_point() + geom_smooth(formula = y ~ x, method = "lm") + scale_x_log10()
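
Since the smoother is a visual analogue to a coefficient, we can make the same comparison numerically. This is a small sketch using base R's cor(); the names r_raw and r_log are made up for illustration.

```r
library(gapminder)

# Linear association on the raw GDP scale
r_raw <- cor(gapminder$gdpPercap, gapminder$lifeExp)

# Linear association after a log10 transform of GDP, which is the
# transformation scale_x_log10() applies to x before the smoother is fit
r_log <- cor(log10(gapminder$gdpPercap), gapminder$lifeExp)

c(raw = r_raw, log10 = r_log)
```

If the log scale straightens out the relationship, the second correlation should be the stronger of the two.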

Let’s change the x-axis labels from scientific notation to actual dollars, using the dollar function in the scales package.

p + geom_point() + 
  geom_smooth(method = "loess", formula = y ~ x) + 
  scale_x_log10(labels = scales::dollar) +
  labs(x="GDP per Capita", y="Life Expectancy")
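
To see what the labels argument is doing, it can help to call the formatter on its own. A minimal sketch, assuming the scales package is installed (the name labels_out is made up for illustration):

```r
library(scales)

# dollar() turns numbers into currency strings;
# scale_x_log10(labels = dollar) applies it to each axis break
labels_out <- dollar(c(1000, 10000, 100000))
labels_out
# → "$1,000"   "$10,000"  "$100,000"
```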

Setting aesthetic properties

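We can set properties like color, size, or transparency to fixed values. We do this with arguments to the geom itself (here geom_point and geom_smooth), outside aes(), because aes() is for mapping variables in the data to aesthetics.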

p + geom_point(alpha = 0.2) +
    geom_smooth(color = "orange", se = FALSE, linewidth = 2, method = "lm", formula = y ~ x) +
    scale_x_log10(labels = scales::dollar)
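
The difference between setting a property and mapping a variable is easiest to see side by side. In this sketch (p_set and p_map are made-up names), the first plot sets the color of every point, while the second wraps the same value in aes(), so ggplot2 treats "purple" as data:

```r
library(ggplot2)
library(gapminder)

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))

# Setting: every point is drawn in purple, and no legend is created
p_set <- p + geom_point(color = "purple")

# Mapping: "purple" is treated as a one-level variable, so ggplot2 picks
# a color from its default palette and adds a legend labelled "purple"
p_map <- p + geom_point(mapping = aes(color = "purple"))

p_set
p_map
```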

Here’s a polished version of the plot, with alpha transparency, a log-scaled x-axis labelled in dollars, a smoother, titles and labels, and a caption.

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y=lifeExp)) + 
    geom_point(alpha = 0.2) +
    scale_y_continuous(breaks=seq(20, 100, by = 10)) +
    geom_smooth(method = "lm", formula = y ~ x) +
    scale_x_log10(labels = scales::dollar) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: Gapminder.")

Introduction to Themes in ggplot2

Themes in ggplot2 allow you to control the overall look of your plots. They can change the background color, gridlines, fonts, and many other elements. Using themes can help make your visualizations more readable and aesthetically pleasing. By default, ggplot2 uses theme_grey(), but there are many other built-in themes you can use.

We’re going to work in the ggplot2 way: first, we set up a plot, then we add a different theme as a layer on top of the plot.

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y=lifeExp))
p + geom_point(alpha = 0.2) +
    scale_y_continuous(breaks=seq(20, 100, by = 10)) +
    geom_smooth(method = "lm",formula = y ~ x) +
    scale_x_log10(labels = scales::dollar) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: Gapminder.") +
    theme_bw()

Or let’s try the minimal theme…

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y=lifeExp))
p + geom_point(alpha = 0.2) +
    scale_y_continuous(breaks=seq(20, 100, by = 10)) +
    geom_smooth(method = "lm",formula = y ~ x) +
    scale_x_log10(labels = scales::dollar) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: Gapminder.") +
  theme_minimal()
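
A built-in theme can also be adjusted rather than used as-is. The sketch below is one possible set of tweaks, not a recommended style; the specific choices (a base font size of 14, no minor gridlines, a bold title) and the name p_themed are assumptions made for illustration.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))

p_themed <- p + geom_point(alpha = 0.2) +
  scale_x_log10(labels = scales::dollar) +
  labs(title = "Economic Growth and Life Expectancy") +
  theme_minimal(base_size = 14) +                   # larger base font size
  theme(panel.grid.minor = element_blank(),         # drop minor gridlines
        plot.title = element_text(face = "bold"))   # bold plot title

p_themed
```

Note that theme() comes after theme_minimal(), since a complete theme added later would overwrite the individual tweaks.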

Let’s try saving the plot at the width and height we want.

Notice that, by default, ggsave saves the last plot that was displayed (here, the minimal-theme version), not the bare base object p. You can save a different plot with the plot = object argument of ggsave. But I never do this…

ggsave(filename = "Plots/economic growth and health.png",width=8,height=5)
ggsave(filename = "Plots/economic growth and health.pdf",width=8,height=5)
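
For completeness, here is a hedged sketch of the explicit form, where the plot is stored under a name and passed to ggsave through its plot argument. The object name p_final and the temporary output path are made up for illustration, so that the example does not depend on a Plots folder existing.

```r
library(ggplot2)
library(gapminder)

# Store the finished plot under a name instead of relying on the last plot shown
p_final <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  scale_x_log10(labels = scales::dollar) +
  theme_minimal()

# Write to a temporary folder; width and height are in inches by default
out_file <- file.path(tempdir(), "economic_growth_and_health.png")
ggsave(filename = out_file, plot = p_final, width = 8, height = 5)

file.exists(out_file)
```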

Separating out the data

We can complicate the plot by separating out continents. We do this by mapping the color aesthetic to the continent variable. Notice that a legend now appears automatically, along with five smoothers, one for each continent.

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap,
                          y = lifeExp,
                          color = continent))
p + geom_point(alpha=0.2) +
    geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
    scale_x_log10(labels = scales::dollar) +
  labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: Gapminder.") +
  theme_minimal()
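
We get five smoothers because the color mapping sits in ggplot(), so every geom inherits it. As a sketch of the alternative (p_overall is a made-up name), mapping color only inside geom_point() keeps the colored points but should leave a single smoother fit to all of the data:

```r
library(ggplot2)
library(gapminder)

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))

# color is mapped only inside geom_point(), so geom_smooth() does not
# inherit it and fits one line to the whole dataset
p_overall <- p +
  geom_point(mapping = aes(color = continent), alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  scale_x_log10(labels = scales::dollar) +
  theme_minimal()

p_overall
```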

Basic Faceting with facet_wrap()

facet_wrap() creates a grid of plots based on a single categorical variable. Let’s use it to create a separate plot for each continent, using only the data for 2007.

p <- ggplot(data = gapminder %>% filter(year==2007),
            mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(alpha = 0.2) +
    geom_smooth(method = "lm", formula = y ~ x) +
    scale_x_log10(labels = scales::dollar) +
    facet_wrap(~ continent) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are countries in 2007",
         caption = "Source: Gapminder.") +
  theme_minimal()
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

In this example:

  • facet_wrap(~ continent) creates a separate plot for each continent.
  • Each plot shares the same scales, making it easy to compare across continents.
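
The warnings printed with this plot are most likely a consequence of the 2007 filter rather than of faceting itself: a linear fit needs more than two points before a confidence band can be computed. A quick way to check, sketched here with base R (the names gap_2007 and counts_2007 are made up for illustration), is to count the countries in each panel.

```r
library(gapminder)

# Restricting to 2007 leaves one row per country; count them per continent
gap_2007 <- gapminder[gapminder$year == 2007, ]
counts_2007 <- table(gap_2007$continent)

counts_2007
```

In the gapminder data, Oceania contains only Australia and New Zealand, so its panel has two points and no leftover degrees of freedom for a standard error. One option is to add se = FALSE to geom_smooth(), which skips the confidence band.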

Customizing Facets

You can customize the appearance of facets to improve readability and aesthetics.

Adjusting the Number of Rows and Columns

Control the layout of the facets using the nrow or ncol arguments.

p + geom_point(alpha = 0.2) +
    geom_smooth(method = "lm", formula = y ~ x) +
    scale_x_log10(labels = scales::dollar) +
    facet_wrap(~ continent, ncol = 2) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are countries in 2007",
         caption = "Source: Gapminder.") +
  theme_minimal()
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

Here, ncol = 2 arranges the facets into two columns.

Free Scales

Sometimes, it’s useful to allow each facet to have its own scale for better visualization of individual trends.

p + geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1)) +
  facet_wrap(~ continent, scales = "free") +
  labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are countries in 2007",
         caption = "Source: Gapminder.") +
  theme_minimal()
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

  • scales = "free" allows each facet to have its own x and y scales.
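
The scales argument also accepts "free_x" and "free_y" when only one axis should vary. The sketch below (gap_2007 and p_free_x are made-up names) frees the x axis but keeps life expectancy on a shared y axis, so the panels remain comparable vertically.

```r
library(ggplot2)
library(gapminder)

# Same 2007 subset as in the faceting examples, built with base R
gap_2007 <- gapminder[gapminder$year == 2007, ]

p_free_x <- ggplot(data = gap_2007,
                   mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1)) +
  # free_x: each panel gets its own x range; the y axis stays shared
  facet_wrap(~ continent, scales = "free_x") +
  theme_minimal()

p_free_x
```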

Text as Labels

state.x77
##                Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama              3615   3624        2.1    69.05   15.1    41.3    20
## Alaska                365   6315        1.5    69.31   11.3    66.7   152
## Arizona              2212   4530        1.8    70.55    7.8    58.1    15
## Arkansas             2110   3378        1.9    70.66   10.1    39.9    65
## California          21198   5114        1.1    71.71   10.3    62.6    20
## Colorado             2541   4884        0.7    72.06    6.8    63.9   166
## Connecticut          3100   5348        1.1    72.48    3.1    56.0   139
## Delaware              579   4809        0.9    70.06    6.2    54.6   103
## Florida              8277   4815        1.3    70.66   10.7    52.6    11
## Georgia              4931   4091        2.0    68.54   13.9    40.6    60
## Hawaii                868   4963        1.9    73.60    6.2    61.9     0
## Idaho                 813   4119        0.6    71.87    5.3    59.5   126
## Illinois            11197   5107        0.9    70.14   10.3    52.6   127
## Indiana              5313   4458        0.7    70.88    7.1    52.9   122
## Iowa                 2861   4628        0.5    72.56    2.3    59.0   140
## Kansas               2280   4669        0.6    72.58    4.5    59.9   114
## Kentucky             3387   3712        1.6    70.10   10.6    38.5    95
## Louisiana            3806   3545        2.8    68.76   13.2    42.2    12
## Maine                1058   3694        0.7    70.39    2.7    54.7   161
## Maryland             4122   5299        0.9    70.22    8.5    52.3   101
## Massachusetts        5814   4755        1.1    71.83    3.3    58.5   103
## Michigan             9111   4751        0.9    70.63   11.1    52.8   125
## Minnesota            3921   4675        0.6    72.96    2.3    57.6   160
## Mississippi          2341   3098        2.4    68.09   12.5    41.0    50
## Missouri             4767   4254        0.8    70.69    9.3    48.8   108
## Montana               746   4347        0.6    70.56    5.0    59.2   155
## Nebraska             1544   4508        0.6    72.60    2.9    59.3   139
## Nevada                590   5149        0.5    69.03   11.5    65.2   188
## New Hampshire         812   4281        0.7    71.23    3.3    57.6   174
## New Jersey           7333   5237        1.1    70.93    5.2    52.5   115
## New Mexico           1144   3601        2.2    70.32    9.7    55.2   120
## New York            18076   4903        1.4    70.55   10.9    52.7    82
## North Carolina       5441   3875        1.8    69.21   11.1    38.5    80
## North Dakota          637   5087        0.8    72.78    1.4    50.3   186
## Ohio                10735   4561        0.8    70.82    7.4    53.2   124
## Oklahoma             2715   3983        1.1    71.42    6.4    51.6    82
## Oregon               2284   4660        0.6    72.13    4.2    60.0    44
## Pennsylvania        11860   4449        1.0    70.43    6.1    50.2   126
## Rhode Island          931   4558        1.3    71.90    2.4    46.4   127
## South Carolina       2816   3635        2.3    67.96   11.6    37.8    65
## South Dakota          681   4167        0.5    72.08    1.7    53.3   172
## Tennessee            4173   3821        1.7    70.11   11.0    41.8    70
## Texas               12237   4188        2.2    70.90   12.2    47.4    35
## Utah                 1203   4022        0.6    72.90    4.5    67.3   137
## Vermont               472   3907        0.6    71.64    5.5    57.1   168
## Virginia             4981   4701        1.4    70.08    9.5    47.8    85
## Washington           3559   4864        0.6    71.72    4.3    63.5    32
## West Virginia        1799   3617        1.4    69.48    6.7    41.6   100
## Wisconsin            4589   4468        0.7    72.48    3.0    54.5   149
## Wyoming               376   4566        0.6    70.29    6.9    62.9   173
##                  Area
## Alabama         50708
## Alaska         566432
## Arizona        113417
## Arkansas        51945
## California     156361
## Colorado       103766
## Connecticut      4862
## Delaware         1982
## Florida         54090
## Georgia         58073
## Hawaii           6425
## Idaho           82677
## Illinois        55748
## Indiana         36097
## Iowa            55941
## Kansas          81787
## Kentucky        39650
## Louisiana       44930
## Maine           30920
## Maryland         9891
## Massachusetts    7826
## Michigan        56817
## Minnesota       79289
## Mississippi     47296
## Missouri        68995
## Montana        145587
## Nebraska        76483
## Nevada         109889
## New Hampshire    9027
## New Jersey       7521
## New Mexico     121412
## New York        47831
## North Carolina  48798
## North Dakota    69273
## Ohio            40975
## Oklahoma        68782
## Oregon          96184
## Pennsylvania    44966
## Rhode Island     1049
## South Carolina  30225
## South Dakota    75955
## Tennessee       41328
## Texas          262134
## Utah            82096
## Vermont          9267
## Virginia        39780
## Washington      66570
## West Virginia   24070
## Wisconsin       54464
## Wyoming         97203
states <- as_tibble(state.x77, rownames = "state") %>%
  janitor::clean_names()

states
## # A tibble: 50 × 9
##    state       population income illiteracy life_exp murder hs_grad frost   area
##    <chr>            <dbl>  <dbl>      <dbl>    <dbl>  <dbl>   <dbl> <dbl>  <dbl>
##  1 Alabama           3615   3624        2.1     69.0   15.1    41.3    20  50708
##  2 Alaska             365   6315        1.5     69.3   11.3    66.7   152 566432
##  3 Arizona           2212   4530        1.8     70.6    7.8    58.1    15 113417
##  4 Arkansas          2110   3378        1.9     70.7   10.1    39.9    65  51945
##  5 California       21198   5114        1.1     71.7   10.3    62.6    20 156361
##  6 Colorado          2541   4884        0.7     72.1    6.8    63.9   166 103766
##  7 Connecticut       3100   5348        1.1     72.5    3.1    56     139   4862
##  8 Delaware           579   4809        0.9     70.1    6.2    54.6   103   1982
##  9 Florida           8277   4815        1.3     70.7   10.7    52.6    11  54090
## 10 Georgia           4931   4091        2       68.5   13.9    40.6    60  58073
## # ℹ 40 more rows
ggplot(data = states, aes(x = income, y = life_exp)) +
  geom_text(aes(label = state), colour = "darkblue") +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title    = "Income vs. Life Expectancy (1977)",
    x        = "Per-capita income (USD)",
    y        = "Life expectancy (years)"
  ) +
  theme_minimal()

This is good, but the labels run together. We could make the text smaller, but we can also handle collisions automatically with the geom_text_repel function from the ggrepel package, which adjusts the label positions to avoid overlap.

ggplot(data = states, aes(income, life_exp))  +
  geom_smooth(method = "lm", se = FALSE, colour = "grey40") +
  geom_text_repel(aes(label = state), colour = "darkblue", size = 3) +
  labs(
    title = "Income vs. Life Expectancy (1977)",
    subtitle = "Labels repel to avoid overlap (full state names)",
    x = "Per-capita income (USD)",
    y = "Life expectancy (years)"
  ) +
  theme_minimal()
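
Another way to reduce clutter is to label only some of the points. This is a hedged sketch, assuming the ggrepel package is installed: the cutoffs (income above 5000 or life expectancy below 69) and the names notable and p_labels are made up for illustration, and the states data is rebuilt with base R so that the example stands on its own.

```r
library(ggplot2)
library(ggrepel)

# Rebuild a small version of the states data from the built-in state.x77 matrix
states <- data.frame(
  state    = rownames(state.x77),
  income   = state.x77[, "Income"],
  life_exp = state.x77[, "Life Exp"]
)

# Hypothetical cutoffs, chosen only to pick out states at the extremes
notable <- subset(states, income > 5000 | life_exp < 69)

p_labels <- ggplot(data = states, aes(x = income, y = life_exp)) +
  geom_point(colour = "grey60") +
  # Passing data = notable means only these states get a label
  geom_text_repel(data = notable, aes(label = state),
                  colour = "darkblue", size = 3) +
  theme_minimal()

p_labels
```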