library(tidyverse)
library(skimr)
library(ggthemes)
library(gapminder)

Question 1

a, b.)
data("gapminder")
glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …

country is a factor variable that gives the country's name. continent is a factor variable that gives the continent the country is on. year is an integer variable that ranges from 1952 to 2007 in increments of five. lifeExp is a double variable that gives life expectancy at birth, in years. pop is an integer variable that gives the country's population. gdpPercap is a double variable that gives GDP per capita in inflation-adjusted US dollars.
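As a quick check on the description above, here is a minimal sketch that looks up each column's class and the spacing between survey years. The names col_classes, years, and year_steps are made up for this check. Note that class() reports double columns as "numeric".

```r
library(gapminder)

# Storage class of each column (doubles show up as "numeric")
col_classes <- sapply(gapminder, class)
col_classes

# Distinct survey years, their range, and the gap between consecutive years
years <- sort(unique(gapminder$year))
range(years)
year_steps <- unique(diff(years))
year_steps
```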

c.)

skim(gapminder)
Data summary
Name gapminder
Number of rows 1704
Number of columns 6
_______________________
Column type frequency:
factor 2
numeric 4
________________________
Group variables None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
country                0              1  FALSE         142  Afg: 12, Alb: 12, Alg: 12, Ang: 12
continent              0              1  FALSE           5  Afr: 624, Asi: 396, Eur: 360, Ame: 300

Variable type: numeric

skim_variable  n_missing  complete_rate         mean            sd        p0         p25         p50          p75          p100  hist
year                   0              1      1979.50         17.27   1952.00     1965.75     1979.50      1993.25        2007.0  ▇▅▅▅▇
lifeExp                0              1        59.47         12.92     23.60       48.20       60.71        70.85          82.6  ▁▆▇▇▇
pop                    0              1  29601212.32  106157896.74  60011.00  2793664.00  7023595.50  19585221.75  1318683096.0  ▇▁▁▁▁
gdpPercap              0              1      7215.33       9857.45    241.17     1202.06     3531.85      9325.46      113523.1  ▇▁▁▁▁

There are no missing values: n_missing is 0 and complete_rate is 1 for all six variables.
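The same thing can be double-checked without skimr. This is a small base R sketch that counts the NA values in each column. na_counts is just a name chosen for this check.

```r
library(gapminder)

# Number of NA values in each column
na_counts <- colSums(is.na(gapminder))
na_counts

# Total across the whole data set
sum(na_counts)
```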

d.)

gapminder %>% ggplot(aes(x = year, 
                         y = lifeExp)) + 
  geom_point() + 
  labs(title = "Life expectancy across time", 
       x = "year",
       y = "Life expectancy (years)" )

Life expectancy trends upward over time, although countries vary widely within any given year.
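Because the points overlap so much in the scatterplot, a per-year summary can back up the visual impression. This is only a sketch, and the names life_by_year, mean_lifeExp, and median_lifeExp are made up here.

```r
library(dplyr)
library(gapminder)

# Mean and median life expectancy across all countries, one row per year
life_by_year <- gapminder %>%
  group_by(year) %>%
  summarise(mean_lifeExp = mean(lifeExp),
            median_lifeExp = median(lifeExp))
life_by_year
```

If the trend is really upward, mean_lifeExp should be larger in the later rows than in the earlier ones.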

e.)

gapminder %>% ggplot(aes(x = year, 
                         y = lifeExp)) + 
  geom_point() + 
  labs(title = "Life expectancy across time", 
       x = "year",
       y = "Life expectancy (years)" ) + 
  geom_smooth(se = FALSE)

f.)

gapminder %>% ggplot(aes(x = year, 
                         y = lifeExp,
                     color = continent)) + 
  geom_point() + 
  labs(title = "Life expectancy across time", 
       x = "year",
       y = "Life expectancy (years)" ) + 
  geom_smooth(se = FALSE, aes(color = continent))

Oceania has the highest life expectancy of the five continents.
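One way to check this reading of the plot is to average life expectancy by continent within each year and keep the top continent per year. This is a sketch that compares plain country averages (not weighted by population), and top_by_year and mean_lifeExp are names made up for it.

```r
library(dplyr)
library(gapminder)

# Average life expectancy per continent in each year,
# then keep the continent with the highest average for that year
top_by_year <- gapminder %>%
  group_by(year, continent) %>%
  summarise(mean_lifeExp = mean(lifeExp), .groups = "drop") %>%
  group_by(year) %>%
  slice_max(mean_lifeExp, n = 1) %>%
  ungroup()
top_by_year
```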

g.)

gapminder %>% ggplot(aes(x = year, 
                         y = lifeExp,
                     color = continent)) + 
  geom_point() + 
  labs(title = "Life expectancy across time", 
       x = "year",
       y = "Life expectancy (years)" ) + 
  geom_smooth(se = FALSE, aes(color = continent)) +
  facet_grid(.~continent)

h.)

gapminder %>% ggplot(aes(x = year, 
                         y = lifeExp,
                     color = continent)) + 
  geom_point() + 
  labs(title = "Life expectancy across time", 
       x = "year",
       y = "Life expectancy (years)" ) + 
  geom_smooth(se = FALSE, aes(color = continent)) +
  facet_grid(.~continent) +
  scale_color_colorblind() +
  theme_bw() 

i, j.)

gapminder %>% ggplot(aes(x = year, 
                         y = lifeExp,
                     color = continent)) + 
  geom_point() + 
  labs(title = "Life expectancy across time", 
       x = "year",
       y = "Life expectancy (years)" ) + 
  geom_smooth(se = FALSE, aes(color = continent)) +
  facet_grid(.~continent) +
  scale_color_colorblind() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45), legend.position = "none")

Question 2

Create a new data set that holds the 20 most populous countries in 2007.

gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)
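Before plotting, it may be worth confirming the filter did what was intended. The sketch below rebuilds gapminder2007 so it runs on its own, then checks the row count, the year, and the ordering. slice_max() returns rows sorted from largest to smallest pop.

```r
library(dplyr)
library(gapminder)

# Same data set as above, rebuilt so this check is self-contained
gapminder2007 <- gapminder %>% filter(year == 2007) %>% slice_max(pop, n = 20)

# Should be 20 rows, all from 2007
nrow(gapminder2007)
unique(gapminder2007$year)

# Countries and populations, largest first
gapminder2007 %>% select(country, continent, pop)
```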

a.)

gapminder2007 %>% ggplot(aes(x = pop,
                             y = country)) + 
  geom_col()

b.)

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop)) + 
  geom_col()

c, d.)

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                         fill = continent)) +
  coord_flip() +
  geom_col(color = "black")

e, f.)

gapminder2007 %>% ggplot(aes(x = fct_reorder(country, pop),
                             y = pop,
                         fill = continent)) +
  coord_flip() +
  geom_col(color = "black") +
  labs(x = "Country",
       y = "Population",
       title = "Top 20 countries by population, 2007") +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank())