Setup
Loading all the required libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gapminder)
library(skimr)
library(ggthemes)
Loading the data
data("gapminder")
b. Using the glimpse() function to explore the gapminder data set
glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
The categorical features present in the gapminder data set are:
country - The name of the country; a factor with 142 levels.
continent - The name of the continent; a factor with 5 levels.
The quantitative features in this data set are:
lifeExp - Life expectancy at birth, in years.
pop - The population of the country.
gdpPercap - GDP (gross domestic product) per capita, in inflation-adjusted US dollars.
year - The year of the observation, from 1952 to 2007 in steps of 5 years.
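The column classes and level counts listed above can be confirmed from the data itself. A minimal base R sketch (the name `col_classes` is only for illustration):

```r
library(gapminder)

# First class of every column: factors are the categorical features,
# integer and double columns are the quantitative ones
col_classes <- sapply(gapminder, function(col) class(col)[1])
col_classes

# Number of levels in each categorical feature
nlevels(gapminder$country)    # 142
nlevels(gapminder$continent)  # 5

# Distinct years in the data: 1952 to 2007 in steps of 5
sort(unique(gapminder$year))
```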
c. Using skim() to further explore the data set
skim(gapminder)
| | |
|---|---|
| Name | gapminder |
| Number of rows | 1704 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| factor | 2 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| country | 0 | 1 | FALSE | 142 | Afg: 12, Alb: 12, Alg: 12, Ang: 12 |
| continent | 0 | 1 | FALSE | 5 | Afr: 624, Asi: 396, Eur: 360, Ame: 300 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 1979.50 | 17.27 | 1952.00 | 1965.75 | 1979.50 | 1993.25 | 2007.0 | ▇▅▅▅▇ |
| lifeExp | 0 | 1 | 59.47 | 12.92 | 23.60 | 48.20 | 60.71 | 70.85 | 82.6 | ▁▆▇▇▇ |
| pop | 0 | 1 | 29601212.32 | 106157896.74 | 60011.00 | 2793664.00 | 7023595.50 | 19585221.75 | 1318683096.0 | ▇▁▁▁▁ |
| gdpPercap | 0 | 1 | 7215.33 | 9857.45 | 241.17 | 1202.06 | 3531.85 | 9325.46 | 113523.1 | ▇▁▁▁▁ |
The n_missing column is 0 (and complete_rate is 1) for every variable, so the data set has no missing values.
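The same check can be run without skimr by counting NA values per column. A minimal dplyr sketch (the name `na_counts` is only for illustration):

```r
library(dplyr)
library(gapminder)

# Count missing values in each column; this mirrors skim()'s n_missing column
na_counts <- gapminder %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts

# A single TRUE/FALSE answer: does any cell contain NA?
anyNA(gapminder)
```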
d. Creating a scatter plot of life expectancy vs. time
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year)) +
  geom_point() +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
e. Recreating the scatter plot now adding geom_smooth()
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
f. Coloring the points and trend lines by continent
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
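The two plots report different smoothing methods. With the default `method = NULL`, `geom_smooth()` chooses by the size of the largest group: `loess` below 1,000 observations, otherwise `mgcv::gam()` with `y ~ s(x, bs = "cs")`. Plot e has one group of 1,704 rows; mapping `color = continent` in plot f splits the data into five smaller groups. A short sketch of the group sizes (the name `group_sizes` is only for illustration):

```r
library(dplyr)
library(gapminder)

# Rows per continent: the groups geom_smooth() fits separately
# once color = continent is mapped
group_sizes <- gapminder %>% count(continent)
group_sizes

# Plot e: the whole data set is one group of 1704 rows, so gam is used
nrow(gapminder)

# Plot f: the largest group (Africa, 624 rows) is below 1000, so loess is used
max(group_sizes$n)
```

Passing `method = "loess", formula = y ~ x` to `geom_smooth()` explicitly would fit the same curves in plot f and silence the message.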
g. Faceting the plot by continent
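In `facet_grid(. ~ continent)`, the left side of the formula (`.`) means no row facets and the right side puts one column of panels per continent. A minimal sketch comparing it with the equivalent `vars()` form and inspecting the panel layout that `ggplot_build()` computes (the plot names are only for illustration):

```r
library(ggplot2)
library(gapminder)

# Formula interface: nothing on the rows, continent on the columns
p_formula <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point() +
  facet_grid(. ~ continent)

# The same layout written with the vars() interface
p_vars <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point() +
  facet_grid(cols = vars(continent))

# One row of five panels, one per continent
layout_formula <- ggplot_build(p_formula)$layout$layout
layout_vars <- ggplot_build(p_vars)$layout$layout
layout_formula[, c("PANEL", "ROW", "COL", "continent")]
```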
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
h. Making the plot color-blind friendly and adding a theme of my choice (theme_grey(), which is also ggplot2's default theme).
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
i. Rotating the x-axis labels by 45 degrees.
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey() +
  theme(axis.text.x = element_text(angle = 45))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
j. Removing the legend from the plot.
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey() +
  theme(axis.text.x = element_text(angle = 45),
        legend.position = "none")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
2
Create a data set called gapminder2007 that contains the 20 most populous countries in 2007.
gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)
glimpse(gapminder2007)
## Rows: 20
## Columns: 6
## $ country <fct> "China", "India", "United States", "Indonesia", "Brazil", "P…
## $ continent <fct> Asia, Asia, Americas, Asia, Americas, Asia, Asia, Africa, As…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, …
## $ lifeExp <dbl> 72.961, 64.698, 78.242, 70.650, 72.390, 65.483, 64.062, 46.8…
## $ pop <int> 1318683096, 1110396331, 301139947, 223547000, 190010647, 169…
## $ gdpPercap <dbl> 4959.1149, 2452.2104, 42951.6531, 3540.6516, 9065.8008, 2605…
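`slice_max(pop, n = 20)` keeps the 20 rows with the largest `pop` and returns them in descending order. A minimal sketch showing that it selects the same countries as an explicit sort followed by `head()` (the intermediate names are only for illustration):

```r
library(dplyr)
library(gapminder)

pop_2007 <- gapminder %>% filter(year == 2007)

# slice_max(): the n rows with the largest pop, largest first
top_slice <- pop_2007 %>% slice_max(pop, n = 20)

# The same selection spelled out as sort-then-take
top_arrange <- pop_2007 %>% arrange(desc(pop)) %>% head(20)

identical(top_slice$country, top_arrange$country)
```

One difference: `slice_max()` keeps ties by default (`with_ties = TRUE`), so it can return more than `n` rows if the 20th value is tied; here it returns exactly 20.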
a. Creating a bar chart using gapminder2007 and geom_col()
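`geom_col()` draws each bar with its height equal to the mapped `y` value, while `geom_bar()` counts rows by default and needs `stat = "identity"` to do the same. A minimal sketch confirming that both layers compute identical bar tops via `layer_data()` (plot names are only for illustration):

```r
library(dplyr)
library(ggplot2)
library(gapminder)

gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)

# geom_col(): bar height is the pop value itself
p_col <- ggplot(gapminder2007, aes(x = country, y = pop)) +
  geom_col()

# geom_bar() with stat = "identity" draws the same bars
p_bar <- ggplot(gapminder2007, aes(x = country, y = pop)) +
  geom_bar(stat = "identity")

# Compare the computed top of every bar
identical(layer_data(p_col)$ymax, layer_data(p_bar)$ymax)
```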
gapminder2007 %>% ggplot(aes(x = country,
                             y = pop)) +
  geom_col() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")
b. Sorting the plot using fct_reorder()
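ggplot2 draws a discrete axis in factor-level order, which for `country` is alphabetical. `fct_reorder()` re-levels a factor by a summary (the median, by default) of another variable; since each country has a single row in gapminder2007, that summary is simply its population. A toy sketch with a hypothetical three-level factor:

```r
library(forcats)

# A toy factor whose levels start in alphabetical order
grp <- factor(c("a", "b", "c"))
value <- c(3, 1, 2)

# Levels sorted by the median of `value`, smallest first
levels(fct_reorder(grp, value))                # "b" "c" "a"

# .desc = TRUE puts the largest first
levels(fct_reorder(grp, value, .desc = TRUE))  # "a" "c" "b"
```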
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop)) +
  geom_col() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")
c. Coloring the bars by continent and giving them a black outline.
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")
d. Flipping the coordinates using coord_flip()
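`coord_flip()` keeps the aesthetics as they are (`x = country`, `y = pop`) and only swaps the axes when drawing. Since ggplot2 3.3.0, horizontal bars can also be made by mapping the discrete variable to `y` directly, and the bar orientation is inferred. A minimal sketch of both (plot names are only for illustration):

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(gapminder)

gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)

# Horizontal bars via coord_flip(): country is still the x aesthetic
p_flip <- ggplot(gapminder2007, aes(x = fct_reorder(country, pop), y = pop)) +
  geom_col() +
  coord_flip()

# Horizontal bars by swapping the aesthetics: country on y, pop on x
p_swap <- ggplot(gapminder2007, aes(x = pop, y = fct_reorder(country, pop))) +
  geom_col()

# Both build to 20 bars
nrow(layer_data(p_flip))
nrow(layer_data(p_swap))
```

With the swapped version the labels swap too (`labs(x = "Population", y = "Country")`), whereas with `coord_flip()` the `x` label still belongs to the country axis.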
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip()
e. Moving the legend to the bottom and removing its title
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme(legend.title = element_blank(),
        legend.position = "bottom")
f. Adding descriptive labels for the axes, title, and a caption below the plot.
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population",
       caption = "Data set from gapminder2007 which contains the top 20 most populated countries in 2007") +
  theme(legend.title = element_blank(),
        legend.position = "bottom")
g. Adding a theme
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom")
h. Using color-blind-friendly colors by adding a scale_fill_manual() layer
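The four hex codes passed to `scale_fill_manual()` all appear in the Okabe-Ito color-blind-safe palette, which ggthemes exposes through `colorblind_pal()`. Unnamed values are assigned to the continents present in the data in level order; naming them makes that mapping explicit. A sketch under those assumptions (`continent_colors` is a hypothetical name):

```r
library(dplyr)
library(ggthemes)
library(gapminder)

gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)

# The manual colors are a subset of the 8-color palette from colorblind_pal()
okabe_ito <- colorblind_pal()(8)
manual_values <- c("#D55E00", "#009E73", "#56B4E9", "#CC79A7")
all(toupper(manual_values) %in% toupper(okabe_ito))

# Continents present in the top 20 (Oceania is absent), in level order
present <- sort(unique(as.character(gapminder2007$continent)))
present

# Pin each color to a continent instead of relying on level order
continent_colors <- setNames(manual_values, present)
continent_colors
```

Passing `values = continent_colors` to `scale_fill_manual()` would keep each continent's color fixed even if the set of continents in the data changed.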
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  scale_fill_manual(values = c("#D55E00", "#009E73", "#56B4E9", "#CC79A7"))
i. Displaying commas in the population numbers rather than scientific notation by adding a scale_y_continuous() layer.
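`labels = comma` hands the axis a formatting function from the scales package that inserts a thousands separator. A minimal sketch calling it directly on the smallest and largest populations in gapminder, together with `label_comma()`, the function-factory form used when options need tweaking:

```r
library(scales)

# comma() formats numbers with a thousands separator, no scientific notation
comma(c(60011, 1318683096))           # "60,011" "1,318,683,096"

# label_comma() returns a labelling function with the same default behavior
label_comma()(c(60011, 1318683096))
```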
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  scale_fill_manual(values = c("#D55E00", "#009E73", "#56B4E9", "#CC79A7")) +
  scale_y_continuous(labels = comma)
j. Removing the excess space between the bars and the axis by setting expand = expansion(mult = c(0, .1)) inside the scale_y_continuous() layer.
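`expansion(mult = c(0, .1))` widens the lower limit by 0% of the data range and the upper limit by 10%; the default for continuous scales is 5% on both sides, which is what leaves a gap between the bars and the axis. A sketch of that arithmetic, using a hypothetical helper `expand_limits_by()` that is not part of ggplot2:

```r
library(ggplot2)

# expansion() packs its settings as c(lower mult, lower add, upper mult, upper add)
e <- expansion(mult = c(0, .1))
as.numeric(e)

# Hypothetical helper: push each limit outwards by mult * range + add
expand_limits_by <- function(limits, e) {
  width <- diff(limits)
  c(limits[1] - e[1] * width - e[2],
    limits[2] + e[3] * width + e[4])
}

# Bars start at 0; with mult = c(0, .1) the axis starts exactly at 0
# and ends 10% past the longest bar (China, 1,318,683,096)
expand_limits_by(c(0, 1318683096), e)

# The default 5% expansion pushes the lower limit below 0
expand_limits_by(c(0, 1318683096), expansion(mult = 0.05))
```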
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  scale_fill_manual(values = c("#D55E00", "#009E73", "#56B4E9", "#CC79A7")) +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, .1)))
k. Modifying the previous plot with a theme from the ggthemes package (https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/), adding the theme layer before the final theme() call so that the legend stays at the bottom.
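Layer order matters because adding a complete theme replaces every theme setting made before it, while `theme()` only modifies the current theme. A minimal sketch using ggplot2's `theme_bw()` as a stand-in complete theme (plot names are only for illustration):

```r
library(ggplot2)
library(gapminder)

base <- ggplot(gapminder, aes(x = year, y = lifeExp, color = continent)) +
  geom_point()

# theme() first, complete theme second: the tweak is discarded
p_overwritten <- base + theme(legend.position = "bottom") + theme_bw()

# Complete theme first, theme() second: the tweak is applied on top
p_kept <- base + theme_bw() + theme(legend.position = "bottom")

p_overwritten$theme$legend.position   # "right", theme_bw()'s default
p_kept$theme$legend.position          # "bottom"
```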
library(ggthemes)
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  scale_fill_manual(values = c("#D55E00", "#009E73", "#56B4E9", "#CC79A7")) +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, .1))) +
  # A complete theme resets earlier theme settings (so theme_few() is dropped
  # here), which is why it goes before the final theme() call
  theme_economist_white() +
  theme(legend.title = element_blank(),
        legend.position = "bottom")