Introduction to ggplot2

Hari Poorna Kumar Kalahasti

2023-06-13

Setup

Loading the required libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gapminder)
library(skimr)
library(ggthemes)

Loading the data

data("gapminder")

b. Using the glimpse() function to explore the gapminder data set

glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …

The categorical features present in the gapminder data set are:

  1. country - This describes the name of a country and it is a factor with 142 levels.

  2. continent - This describes the name of the continent and it is a factor with 5 levels.

The quantitative features in this data set are:

  1. lifeExp - This describes the life expectancy at birth in years.

  2. pop - This describes the population of a country.

  3. gdpPercap - This describes the GDP per capita, i.e. gross domestic product divided by population.

  4. year - This describes the year of the observation, ranging from 1952 to 2007.
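
The split into categorical and quantitative features can be checked directly from the column classes. This is a minimal sketch using only base R functions on the gapminder data; the expected level counts in the comments are the ones reported by skim() in this document.

```r
library(gapminder)

# Class of each column: factors are categorical, integer/numeric are quantitative
sapply(gapminder, class)

# Number of levels in each factor column
nlevels(gapminder$country)    # 142
nlevels(gapminder$continent)  # 5
```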

c. Using skim() to further explore the data set

skim(gapminder)
Data summary
Name gapminder
Number of rows 1704
Number of columns 6
_______________________
Column type frequency:
factor 2
numeric 4
________________________
Group variables None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
country        0          1              FALSE    142       Afg: 12, Alb: 12, Alg: 12, Ang: 12
continent      0          1              FALSE    5         Afr: 624, Asi: 396, Eur: 360, Ame: 300

Variable type: numeric

skim_variable  n_missing  complete_rate  mean         sd            p0        p25         p50         p75          p100          hist
year           0          1              1979.50      17.27         1952.00   1965.75     1979.50     1993.25      2007.0        ▇▅▅▅▇
lifeExp        0          1              59.47        12.92         23.60     48.20       60.71       70.85        82.6          ▁▆▇▇▇
pop            0          1              29601212.32  106157896.74  60011.00  2793664.00  7023595.50  19585221.75  1318683096.0  ▇▁▁▁▁
gdpPercap      0          1              7215.33      9857.45       241.17    1202.06     3531.85     9325.46      113523.1      ▇▁▁▁▁

In the table above, n_missing is 0 and complete_rate is 1 for every variable, so there are no missing values in this data set.
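
The same conclusion can be reached without skimr. The sketch below counts NA values per column with base R; the variable name missing_per_column is only illustrative.

```r
library(gapminder)

# Count NA values in each column; all zeros means nothing is missing
missing_per_column <- colSums(is.na(gapminder))
missing_per_column

# Single TRUE/FALSE answer for the whole data set
anyNA(gapminder)
```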

d. Creating a scatter plot of life expectancy vs. time

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year)) +
  geom_point() +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")

e. Recreating the scatter plot now adding geom_smooth()

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

f. Coloring the plot

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
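
The message changes from 'gam' in part e to 'loess' here because geom_smooth(), when no method is given, chooses one from the size of the largest group: loess for fewer than 1,000 observations and gam otherwise. Mapping color to continent splits the data into one group per continent. The sketch below only counts the observations involved; group_sizes is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Part e: all rows form one group of 1,704 observations, so 'gam' is chosen
nrow(gapminder)

# Part f: one group per continent, and the largest group decides the method
group_sizes <- count(gapminder, continent)
group_sizes
max(group_sizes$n)  # below 1,000, so 'loess' is chosen
```

To keep the same smoother in both plots, the method can be set explicitly, for example geom_smooth(method = "loess", se = FALSE).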

g. Faceting the plot by continent

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

h. Making the plot color-blind friendly with scale_colour_colorblind() from ggthemes and adding a theme of my choice.

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

i. Rotating the labels on the x-axis by 45 degrees.

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey() +
  theme(axis.text.x = element_text(angle = 45))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
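
Rotated labels are centered on their tick marks by default, which can make them run into the axis or each other. A possible refinement, not part of the original exercise, is to add hjust = 1 so that each label ends at its tick. The plot below is a reduced version of the one above, and p_rotated is an illustrative name.

```r
library(ggplot2)
library(gapminder)

p_rotated <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point() +
  facet_grid(. ~ continent) +
  # hjust = 1 right-aligns each rotated label so it ends at its tick mark
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p_rotated
```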

j. Removing the legend from the plot.

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey() +
  theme(axis.text.x = element_text(angle = 45),
        legend.position = "none")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

2

Creating a data set called gapminder2007 which contains the 20 most populated countries in 2007.

gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)
glimpse(gapminder2007)
## Rows: 20
## Columns: 6
## $ country   <fct> "China", "India", "United States", "Indonesia", "Brazil", "P…
## $ continent <fct> Asia, Asia, Americas, Asia, Americas, Asia, Asia, Africa, As…
## $ year      <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, …
## $ lifeExp   <dbl> 72.961, 64.698, 78.242, 70.650, 72.390, 65.483, 64.062, 46.8…
## $ pop       <int> 1318683096, 1110396331, 301139947, 223547000, 190010647, 169…
## $ gdpPercap <dbl> 4959.1149, 2452.2104, 42951.6531, 3540.6516, 9065.8008, 2605…
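
slice_max(pop, n = 20) keeps the rows with the 20 largest values of pop and returns them sorted from largest to smallest. The sketch below compares it with a longer arrange() and head() version as a sanity check; top20_slice and top20_sorted are illustrative names. Note that slice_max() keeps ties by default, so it could return more than 20 rows if several countries shared the 20th-largest population.

```r
library(dplyr)
library(gapminder)

top20_slice <- gapminder %>%
  filter(year == 2007) %>%
  slice_max(pop, n = 20)

# Longer equivalent: sort by population, largest first, and keep 20 rows
top20_sorted <- gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(pop)) %>%
  head(20)

# Both approaches should give the same countries in the same order
identical(as.character(top20_slice$country), as.character(top20_sorted$country))
```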

a. Creating a bar chart using gapminder2007 and geom_col()

gapminder2007 %>% ggplot(aes(x = country,
                             y = pop)) +
  geom_col() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")

b. Sorting the plot using fct_reorder()

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop)) +
  geom_col() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")
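
fct_reorder() does not change the data; it only changes the order of the factor levels, which is what ggplot2 uses to place the bars. The toy vectors below are hypothetical and only illustrate the reordering.

```r
library(forcats)

# Hypothetical toy data, for illustration only
toy_country <- factor(c("A", "B", "C"))
toy_pop <- c(30, 10, 20)

levels(toy_country)                        # alphabetical: "A" "B" "C"
levels(fct_reorder(toy_country, toy_pop))  # by increasing toy_pop: "B" "C" "A"

# .desc = TRUE reverses the order
levels(fct_reorder(toy_country, toy_pop, .desc = TRUE))  # "A" "C" "B"
```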

c. Changing the fill colors based on continent and making the bar outlines black.

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")

d. Flipping the coordinates using coord_flip()

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip()
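
As an alternative to coord_flip(), recent versions of ggplot2 (3.3.0 and later) can draw horizontal bars when the discrete variable is mapped to y and the continuous one to x. This is a sketch of that alternative, not the approach asked for in the exercise; p_horizontal is an illustrative name.

```r
library(tidyverse)
library(gapminder)

gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)

# Countries on the y axis, population on the x axis: no coord_flip() needed
p_horizontal <- gapminder2007 %>%
  ggplot(aes(x = pop,
             y = fct_reorder(country, pop),
             fill = continent)) +
  geom_col(color = "black")

p_horizontal
```

With this form the axis labels in labs() refer to the axes as drawn, so x is the population and y is the country.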

e. Moving the legend to the bottom

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme(legend.title = element_blank(),
        legend.position = "bottom")

f. Adding descriptive labels for the axes, title, and a caption below the plot.

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population",
       caption = "Data set from gapminder2007 which contains the top 20 most populated countries in 2007") +
  theme(legend.title = element_blank(),
        legend.position = "bottom")

g. Adding a theme from the ggthemes package

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom")

h. Using color-blind friendly colors by adding a scale_fill_manual() layer

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  scale_fill_manual(values = c("#D55E00", "#009E73", "#56B4E9", "#CC79A7"))
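
Only four colors are needed because the 20 countries come from four continents. An unnamed values vector is matched to the continents in the order they appear in the legend, so a named vector makes the pairing explicit. The pairing below is an assumption that keeps the same hex codes in alphabetical continent order; continent_colors and p_named are illustrative names.

```r
library(tidyverse)
library(gapminder)

gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)

# Continents that actually appear among the 20 countries
present <- sort(unique(as.character(gapminder2007$continent)))
present

# Assumed pairing of continent and color (same hex codes as above)
continent_colors <- c(Africa   = "#D55E00",
                      Americas = "#009E73",
                      Asia     = "#56B4E9",
                      Europe   = "#CC79A7")

p_named <- gapminder2007 %>%
  ggplot(aes(x = fct_reorder(country, pop), y = pop, fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  scale_fill_manual(values = continent_colors)

p_named
```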

i. Displaying commas in the population numbers rather than scientific notation by adding a scale_y_continuous() layer using the code below.

library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  scale_fill_manual(values = c("#D55E00", "#009E73", "#56B4E9", "#CC79A7")) +
  scale_y_continuous(labels = comma)
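
The labels argument accepts a function that turns the numeric breaks into text. The sketch below calls comma() directly to show what it does to a single number; the value is China's 2007 population from the glimpse() output above. label_comma() is the function-factory form provided by the same package.

```r
library(scales)

# comma() inserts thousands separators and returns a character string
comma(1318683096)  # "1,318,683,096"

# label_comma() returns a labelling function that can be passed to labels =
label_comma()(c(1e6, 2.5e8))  # "1,000,000" "250,000,000"
```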

j. Removing the excess space between the bars and the axis by specifying the expand argument inside the scale_y_continuous() layer using expand = expansion(mult = c(0, .1)).

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  scale_fill_manual(values = c("#D55E00", "#009E73", "#56B4E9", "#CC79A7")) +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, .1)))
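
In expansion(mult = c(0, .1)) the first number is the padding at the lower end of the axis and the second is the padding at the upper end, both as a fraction of the data range. So the bars start exactly at zero and the axis extends 10% past the longest bar. The hand calculation below is only a sketch of that arithmetic, assuming the bars run from 0 to China's 2007 population; pad, data_range, lower and upper are illustrative names.

```r
library(ggplot2)

# expansion() returns four numbers: lower mult, lower add, upper mult, upper add
pad <- expansion(mult = c(0, 0.1))
pad

# Bars run from 0 to the largest population in gapminder2007
data_range <- c(0, 1318683096)
width <- diff(data_range)

lower <- data_range[1] - pad[1] * width  # no padding below zero
upper <- data_range[2] + pad[3] * width  # 10% of the range added on top
c(lower, upper)
```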

k. Modifying the previous plot by specifying a theme from the ggthemes package: https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/ and adding the custom theme layer before the final theme() call so that the legend stays at the bottom.

library(ggthemes)
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
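  scale_fill_manual(values = c("#D55E00", "#009E73", "#56B4E9", "#CC79A7")) +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, .1))) +
  # theme_economist_white() is a complete theme, so an earlier theme_few() would be overridden and is omitted
  theme_economist_white() +
  # theme() comes last so the complete theme does not reset the legend position
  theme(legend.position = "bottom")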
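
The order matters because a complete theme such as theme_economist_white() replaces every theme setting made before it, while a later theme() call only adjusts the settings it names. The sketch below shows this by adding theme objects directly, using theme_grey() from ggplot2 as a stand-in for the ggthemes theme; tweak_after and tweak_before are illustrative names.

```r
library(ggplot2)

# theme() after the complete theme: the legend position is kept
tweak_after <- theme_grey() + theme(legend.position = "bottom")
tweak_after$legend.position   # "bottom"

# theme() before the complete theme: the complete theme replaces it
tweak_before <- theme(legend.position = "bottom") + theme_grey()
tweak_before$legend.position  # "right", the theme_grey() default
```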