Homework-3

Sujith Prakash Parsa

2022-10-11

Setup

Loading all the required libraries

library(tidyverse)
library(gapminder)
library(skimr)
library(ggthemes)

Loading the data

data("gapminder")

b. Using the glimpse function to explore the data set gapminder

glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …

The categorical features present in the gapminder data set are:

  1. country - This is the name of a country, stored as a factor with 142 levels.

  2. continent - This describes the name of the continent and it is a factor with 5 levels.

The quantitative features in this data set are:

  1. lifeExp - This describes the life expectancy at birth in years.

  2. pop - This describes the population of a country.

  3. gdpPercap - This describes the GDP (gross domestic product) per capita.

  4. year - This describes the year of the observation.
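
The classification above can also be checked programmatically. The following is a minimal sketch, assuming the gapminder package is installed; the names factor_cols and numeric_cols are introduced here for illustration only.

```r
library(gapminder)

# Split the column names by storage type (illustrative helper names)
factor_cols  <- names(gapminder)[sapply(gapminder, is.factor)]
numeric_cols <- names(gapminder)[sapply(gapminder, is.numeric)]

factor_cols
numeric_cols

# Number of levels in each factor column
sapply(gapminder[factor_cols], nlevels)
```

Integer columns such as year and pop count as numeric here, which matches how skim() groups them.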

c. Using skim() to further explore the data set

skim(gapminder)
Data summary
Name gapminder
Number of rows 1704
Number of columns 6
_______________________
Column type frequency:
factor 2
numeric 4
________________________
Group variables None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
country                0              1  FALSE         142  Afg: 12, Alb: 12, Alg: 12, Ang: 12
continent              0              1  FALSE           5  Afr: 624, Asi: 396, Eur: 360, Ame: 300

Variable type: numeric

skim_variable  n_missing  complete_rate         mean            sd        p0         p25         p50          p75          p100  hist
year                   0              1      1979.50         17.27   1952.00     1965.75     1979.50      1993.25        2007.0  ▇▅▅▅▇
lifeExp                0              1        59.47         12.92     23.60       48.20       60.71        70.85          82.6  ▁▆▇▇▇
pop                    0              1  29601212.32  106157896.74  60011.00  2793664.00  7023595.50  19585221.75  1318683096.0  ▇▁▁▁▁
gdpPercap              0              1      7215.33       9857.45    241.17     1202.06     3531.85      9325.46      113523.1  ▇▁▁▁▁

From the table above, we can see that there are no missing values in this data set: n_missing is 0 and complete_rate is 1 for every variable.
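
The same conclusion can be cross-checked without skimr. This is a small sketch using base R only; missing_per_column is an illustrative name, not part of the assignment.

```r
library(gapminder)

# Count missing values in each column; all zeros means the data are complete
missing_per_column <- colSums(is.na(gapminder))
missing_per_column

# Single TRUE/FALSE answer for the whole data set
sum(missing_per_column) == 0
```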

d. Creating a scatter plot of life expectancy vs. time

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year)) +
  geom_point() +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")

From the plot above, we can see that as time increases, the maximum life expectancy also increases. The relationship looks roughly linear.
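
To back the visual impression with numbers, one option is to summarise life expectancy per year and fit a straight line. This is a rough sketch, not part of the original assignment; life_by_year and trend are illustrative names, and a simple lm() fit ignores the fact that each country contributes repeated measurements.

```r
library(gapminder)
library(dplyr)

# Maximum and mean life expectancy for each survey year
life_by_year <- gapminder %>%
  group_by(year) %>%
  summarise(max_lifeExp  = max(lifeExp),
            mean_lifeExp = mean(lifeExp))
life_by_year

# Simple linear fit; the year coefficient is the average change
# in life expectancy per calendar year across all observations
trend <- lm(lifeExp ~ year, data = gapminder)
coef(trend)
```

A positive year coefficient is consistent with the upward trend seen in the scatter plot.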

e. Recreating the scatter plot now adding geom_smooth()

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")

f. Coloring the plot

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")

From the plot we can observe that Oceania has the highest life expectancy on average.
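
The ranking read off the plot can be confirmed with a grouped summary. A minimal sketch, assuming dplyr is available; continent_means is an illustrative name.

```r
library(gapminder)
library(dplyr)

# Mean life expectancy per continent over all years, highest first
continent_means <- gapminder %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp)) %>%
  arrange(desc(mean_lifeExp))
continent_means
```

Note that Oceania contains far fewer countries than the other continents in this data set, so its average rests on fewer observations.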

g. Faceting the plot by continent

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)")

h. Making the plot colorblind-friendly and adding a theme of my choice.

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey()

i. Rotating the labels on the x-axis by 45 degrees.

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey() +
  theme(axis.text.x = element_text(angle = 45))

j. Removing the legend from the plot.

gapminder %>% ggplot(aes(y = lifeExp,
                         x = year,
                         color = continent)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_grid(. ~ continent) +
  labs(title = "Life expectancy across time",
       x = "Year",
       y = "Life expectancy (years)") +
  scale_colour_colorblind() +
  theme_grey() +
  theme(axis.text.x = element_text(angle = 45),
        legend.position = "none")

2

Creating the gapminder2007 data set, which contains the 20 countries with the highest population in 2007.

gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)
glimpse(gapminder2007)
## Rows: 20
## Columns: 6
## $ country   <fct> "China", "India", "United States", "Indonesia", "Brazil", "P…
## $ continent <fct> Asia, Asia, Americas, Asia, Americas, Asia, Asia, Africa, As…
## $ year      <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, …
## $ lifeExp   <dbl> 72.961, 64.698, 78.242, 70.650, 72.390, 65.483, 64.062, 46.8…
## $ pop       <int> 1318683096, 1110396331, 301139947, 223547000, 190010647, 169…
## $ gdpPercap <dbl> 4959.1149, 2452.2104, 42951.6531, 3540.6516, 9065.8008, 2605…
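
As a sanity check, the same selection can be written with arrange() and slice_head(). The sketch below compares the two approaches; top20_slice and top20_sorted are illustrative names. One caveat: slice_max() keeps ties by default, so it can return more than n rows when populations are tied, which is not the case here.

```r
library(gapminder)
library(dplyr)

# Approach used above: keep the 20 largest populations in 2007
top20_slice <- gapminder %>%
  filter(year == 2007) %>%
  slice_max(pop, n = 20)

# Equivalent approach: sort descending, then take the first 20 rows
top20_sorted <- gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(pop)) %>%
  slice_head(n = 20)

# TRUE when both approaches select the same countries in the same order
identical(as.character(top20_slice$country),
          as.character(top20_sorted$country))
```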

a. Creating a Bar chart using gapminder2007 and geom_col()

gapminder2007 %>% ggplot(aes(x = country,
                             y = pop)) +
  geom_col() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")

b. Sorting the plot using fct_reorder()

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop)) +
  geom_col() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")
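
To see what fct_reorder() does to the factor itself, here is a toy sketch with made-up data (fruit and weight are hypothetical and unrelated to gapminder). The function leaves the values alone and only changes the order of the levels, which is what ggplot2 uses to order the bars.

```r
library(forcats)

# Hypothetical toy data: three categories and a number for each
fruit  <- factor(c("apple", "banana", "cherry"))
weight <- c(30, 10, 20)

# Default level order is alphabetical
levels(fruit)

# Levels reordered by increasing weight: banana, cherry, apple
reordered <- fct_reorder(fruit, weight)
levels(reordered)

# .desc = TRUE reverses the order: apple, cherry, banana
levels(fct_reorder(fruit, weight, .desc = TRUE))
```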

c. Changing the colors based on continent and making the outline black.

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")

d. Flipping the coordinates using coord_flip()

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population")

e. Moving the legend to the bottom

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme(legend.title = element_blank(),
        legend.position = "bottom")

f. Adding a theme

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                             fill = continent)) +
  geom_col(color = "black") +
  coord_flip() +
  labs(title = "World's most populated countries, 2007",
       x = "Country",
       y = "Population") +
  theme_few() +
  theme(legend.title = element_blank(),
        legend.position = "bottom")