Setup
Loading all the required libraries
library(tidyverse)
library(gapminder)
library(skimr)
library(ggthemes)
Loading the data
data("gapminder")
b. Using the glimpse function to explore the data set gapminder
glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
The categorical features present in the gapminder data set are:
country - The name of the country; a factor with 142 levels.
continent - The name of the continent; a factor with 5 levels.
The quantitative features in this data set are:
lifeExp - Life expectancy at birth, in years.
pop - The population of the country.
gdpPercap - GDP (gross domestic product) per capita.
year - The year of the observation.
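The split between categorical and quantitative features can also be checked programmatically. The following is a minimal sketch, assuming the gapminder package is installed; the names col_classes, categorical, quantitative and n_levels are introduced here for illustration only.

```r
library(gapminder)

# Class of each column: factors are categorical, integer/numeric are quantitative
col_classes <- sapply(gapminder, class)

categorical  <- names(col_classes)[col_classes == "factor"]
quantitative <- names(col_classes)[col_classes %in% c("integer", "numeric")]

# Number of levels in each factor column
n_levels <- sapply(gapminder[categorical], nlevels)

print(categorical)
print(quantitative)
print(n_levels)
```

Note that year is stored as an integer, so it is grouped with the quantitative features here even though it could also be treated as a discrete time index.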
c. Using skim() to further explore the data set
skim(gapminder)
| Name | gapminder |
| Number of rows | 1704 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| factor | 2 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| country | 0 | 1 | FALSE | 142 | Afg: 12, Alb: 12, Alg: 12, Ang: 12 |
| continent | 0 | 1 | FALSE | 5 | Afr: 624, Asi: 396, Eur: 360, Ame: 300 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 1979.50 | 17.27 | 1952.00 | 1965.75 | 1979.50 | 1993.25 | 2007.0 | ▇▅▅▅▇ |
| lifeExp | 0 | 1 | 59.47 | 12.92 | 23.60 | 48.20 | 60.71 | 70.85 | 82.6 | ▁▆▇▇▇ |
| pop | 0 | 1 | 29601212.32 | 106157896.74 | 60011.00 | 2793664.00 | 7023595.50 | 19585221.75 | 1318683096.0 | ▇▁▁▁▁ |
| gdpPercap | 0 | 1 | 7215.33 | 9857.45 | 241.17 | 1202.06 | 3531.85 | 9325.46 | 113523.1 | ▇▁▁▁▁ |
From the above table we can observe that there are no missing values in this data set: n_missing is 0 and complete_rate is 1 for every variable.
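The absence of missing values can also be confirmed without reading the skim() table by counting NA values per column. This is a small sketch in base R, assuming the gapminder package is installed; the name missing_per_column is illustrative.

```r
library(gapminder)

# Count NA values in each column of gapminder
missing_per_column <- colSums(is.na(gapminder))
print(missing_per_column)

# TRUE when no column contains a missing value
print(all(missing_per_column == 0))
```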
d. Creating a scatter plot of life expectancy vs. time
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year)) +
  geom_point() +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
From the plot above we can see that as time increases, the maximum life expectancy also increases, and the trend is roughly linear.
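One way to probe the claim about the maximum life expectancy is to compute the maximum per year and fit a straight line to it. The sketch below is only an illustration, assuming dplyr and gapminder are installed; max_by_year and fit are hypothetical names, and a simple lm() fit is an assumption about what "linear" means here rather than part of the original analysis.

```r
library(dplyr)
library(gapminder)

# Highest life expectancy observed in each survey year
max_by_year <- gapminder %>%
  group_by(year) %>%
  summarise(max_lifeExp = max(lifeExp), .groups = "drop")

# Straight-line fit of the yearly maximum against time
fit <- lm(max_lifeExp ~ year, data = max_by_year)

# Slope (years of life expectancy gained per calendar year) and R squared
print(coef(fit)[["year"]])
print(summary(fit)$r.squared)
```

A positive slope together with an R squared close to 1 would support the reading of a mostly linear increase.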
e. Recreating the scatter plot, now adding geom_smooth()
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
f. Coloring the plot
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
From the plot we can observe that Oceania has the highest life expectancy on average.
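The observation about Oceania can be cross-checked numerically by averaging life expectancy within each continent. This is a minimal sketch, assuming dplyr and gapminder are installed; mean_by_continent is an illustrative name, and the average is taken over all countries and years without weighting by population.

```r
library(dplyr)
library(gapminder)

# Unweighted mean life expectancy per continent, highest first
mean_by_continent <- gapminder %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp), .groups = "drop") %>%
  arrange(desc(mean_lifeExp))

print(mean_by_continent)
```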
g. Faceting the plot by continent
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")
h. Making the plot color-blind friendly and adding a theme of my choice.
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey()
i. Rotating the labels on the x axis by 45 degrees.
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey() +
  theme(axis.text.x = element_text(angle = 45))
j. Removing the legend from the plot.
gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey() +
  theme(axis.text.x = element_text(angle = 45),
        legend.position = "none")
2
Creating the gapminder2007 data frame, which contains the 20 countries with the highest population in 2007.
gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)
glimpse(gapminder2007)
## Rows: 20
## Columns: 6
## $ country <fct> "China", "India", "United States", "Indonesia", "Brazil", "P…
## $ continent <fct> Asia, Asia, Americas, Asia, Americas, Asia, Asia, Africa, As…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, …
## $ lifeExp <dbl> 72.961, 64.698, 78.242, 70.650, 72.390, 65.483, 64.062, 46.8…
## $ pop <int> 1318683096, 1110396331, 301139947, 223547000, 190010647, 169…
## $ gdpPercap <dbl> 4959.1149, 2452.2104, 42951.6531, 3540.6516, 9065.8008, 2605…
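The slice_max() step above can be read as "sort by pop in descending order and keep the first 20 rows". The sketch below compares the two formulations, assuming dplyr and gapminder are installed; top20_slice and top20_sorted are illustrative names. The two approaches are only expected to agree when there are no ties at the cutoff, since slice_max() keeps tied rows by default.

```r
library(dplyr)
library(gapminder)

# Approach used in the text: slice_max() on the 2007 rows
top20_slice <- gapminder %>%
  filter(year == 2007) %>%
  slice_max(pop, n = 20)

# Equivalent formulation: sort descending and take the first 20 rows
top20_sorted <- gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(pop)) %>%
  head(20)

# TRUE when both approaches select the same countries in the same order
print(identical(as.character(top20_slice$country),
                as.character(top20_sorted$country)))
```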
a. Creating a bar chart using gapminder2007 and geom_col()
gapminder2007 %>% ggplot(aes(x = country,
                             y = pop)) +
  geom_col() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")
b. Sorting the bars by population using fct_reorder()
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop)) +
  geom_col() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")
c. Changing the colors based on continent and making the outline black.
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")
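To see what fct_reorder() does in the plots above, the reordered factor can be inspected directly. This is a minimal sketch, assuming dplyr, forcats and gapminder are installed; ordered_country is an illustrative name. Unused levels are dropped first with droplevels() so that only the 20 selected countries are listed, which is an extra step added here for readability and not part of the original plotting code.

```r
library(dplyr)
library(forcats)
library(gapminder)

gapminder2007 <- gapminder %>%
  filter(year == 2007) %>%
  slice_max(pop, n = 20)

# Keep only the 20 countries that are present, then order levels by population
ordered_country <- fct_reorder(droplevels(gapminder2007$country),
                               gapminder2007$pop)

# Levels run from the smallest to the largest population,
# which is the order in which the bars are drawn
print(levels(ordered_country))
```

Because each country appears once in gapminder2007, the default summary function of fct_reorder() (the median) simply returns that country's population.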
d. Flipping the coordinates using coord_flip()
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")
e. Moving the legend to the bottom
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme(legend.title = element_blank(),
        legend.position = "bottom")
f. Adding a theme
gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom")