Week 4

What’s the point?

This week we are looking at the unfiltered Gapminder dataset, which contains life expectancy, GDP per capita, and population by country. Parts of the analysis are supplemented with visualizations built with ggplot.

Source code:

The variables contained in gapminder denote the following:

country: factor with 142 levels

continent: factor with 5 levels

year: ranges from 1952 to 2007 in increments of 5 years

lifeExp: life expectancy at birth, in years

pop: population

gdpPercap: GDP per capita

Data description

The gapminder data frame contains 1704 rows and the 6 variables listed above. It covers 142 countries on 5 continents and spans the years 1952 to 2007 in 5-year increments.
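The numbers in this description can be checked directly against the data. Below is a minimal sketch using base R functions on the `gapminder` data frame; it assumes the gapminder package is installed, and the expected values in the comments are the ones stated in the description above.

```r
library(gapminder)

# Rows and columns of the data frame (described above as 1704 rows, 6 variables)
dim(gapminder)

# country and continent are factors, so nlevels() counts the distinct values
nlevels(gapminder$country)    # described above as 142
nlevels(gapminder$continent)  # described above as 5

# First and last survey year, and the gap between consecutive survey years
range(gapminder$year)
unique(diff(sort(unique(gapminder$year))))
```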

library(gapminder)
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
library(ggplot2)
gapminder %>%
  group_by(continent) %>%
  summarize(n_obs = n(), n_countries = n_distinct(country)) %>% 
  summarise(n_countries = sum(n_countries))
## # A tibble: 1 x 1
##   n_countries
##         <int>
## 1         142
gapminder %>%
  group_by(continent) %>%
  summarize(n_obs = n(), n_continent = n_distinct(continent))
## # A tibble: 5 x 3
##   continent n_obs n_continent
##      <fctr> <int>       <int>
## 1    Africa   624           1
## 2  Americas   300           1
## 3      Asia   396           1
## 4    Europe   360           1
## 5   Oceania    24           1

1.)

gdpall_07 <- gapminder_unfiltered %>% 
  filter(year == 2007) %>% 
  select(gdpPercap, country)
ggplot(gdpall_07, aes(x=gdpPercap)) +
  geom_histogram(colour="black", fill="blue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
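The `stat_bin()` message above is ggplot asking for an explicit bin size. Here is a minimal sketch of the same histogram with `binwidth` set by hand. The value 2500 is an arbitrary choice for illustration, not something from the original analysis, and the names `gdp_2007` and `p_hist` are made up for this example.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# GDP per capita for every country recorded in 2007
gdp_2007 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  select(country, gdpPercap)

# binwidth is given in x-axis units; 2500 is an illustrative assumption
p_hist <- ggplot(gdp_2007, aes(x = gdpPercap)) +
  geom_histogram(binwidth = 2500, colour = "black", fill = "blue") +
  labs(x = "GDP per capita", y = "Number of countries")

p_hist
```

Trying a few different widths is usually worthwhile, since the shape of a skewed distribution like this one can look quite different from one bin size to the next.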

2.)

gdpallcon_07 <- gapminder_unfiltered %>% 
  filter(year == 2007) %>% 
  select(gdpPercap, country, continent)
ggplot(data = gdpallcon_07, mapping = aes(x = continent, y = gdpPercap)) +
  geom_boxplot()

3.)

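# Ten countries with the highest GDP per capita in 2007, largest first
top10_gdp <- gapminder_unfiltered %>% 
  filter(year == 2007) %>% 
  select(country, gdpPercap) %>% 
  top_n(10, wt = gdpPercap) %>% 
  arrange(desc(gdpPercap))

# Bar heights come straight from gdpPercap, so use stat = "identity"
ggplot(top10_gdp, aes(country, gdpPercap)) +
  geom_bar(stat = "identity")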
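In the bar chart above, ggplot places the bars in the factor-level order of `country`, not by value, so sorting the data frame does not change the plot. A minimal sketch of one way to order the bars by GDP per capita is shown below, using base R's `reorder()` inside `aes()`. The names `top10_sorted` and `p_top10` are made up for this example, and flipping the axes is only a readability choice.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Ten highest GDP-per-capita rows for 2007, largest first
top10_sorted <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  select(country, gdpPercap) %>%
  arrange(desc(gdpPercap)) %>%
  head(10)

# reorder() sorts the country levels by gdpPercap so the bars appear in rank order
p_top10 <- ggplot(top10_sorted, aes(x = reorder(country, gdpPercap), y = gdpPercap)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "GDP per capita")

p_top10
```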

4.)

merca <- gapminder_unfiltered %>% 
  filter(country == 'United States') %>% 
  select(country, year, gdpPercap) %>% 
  arrange(year)
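The United States table above is built but never displayed. As a sketch, the same filtered data could be drawn as a line chart of GDP per capita over time. The names `us_gdp` and `p_us` are made up for this example, and the plot is an illustration, not part of the original analysis.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# United States rows only, in chronological order
us_gdp <- gapminder_unfiltered %>%
  filter(country == "United States") %>%
  select(country, year, gdpPercap) %>%
  arrange(year)

# One point per recorded year, joined by a line
p_us <- ggplot(us_gdp, aes(x = year, y = gdpPercap)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "GDP per capita", title = "United States")

p_us
```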

5.)

# Sum of GDP per capita across all countries recorded in 2007
pgrowth07 <- gapminder_unfiltered %>% 
  filter(year == 2007) %>% 
  select(gdpPercap) %>% 
  summarise(gdpPercap = sum(gdpPercap))

# Same sum for 2002
pgrowth02 <- gapminder_unfiltered %>% 
  filter(year == 2002) %>% 
  select(gdpPercap) %>% 
  summarise(gdpPercap = sum(gdpPercap))

# Percent change from 2002 to 2007: (new - old) / old * 100
((pgrowth07 - pgrowth02) / pgrowth02) * 100
##   gdpPercap
## 1   13.8853
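Percent change between two periods is conventionally the difference between the new and old values divided by the old value, times 100. A minimal sketch of that formula as a reusable function follows. `pct_change` is a hypothetical helper written for this example, not a function from any package used above.

```r
# Hypothetical helper: percent change going from `old` to `new`
pct_change <- function(old, new) {
  (new - old) / old * 100
}

# A value that goes from 200 to 250 has grown by 25 percent
pct_change(200, 250)

# The function is vectorised, so it also works on whole columns
pct_change(c(100, 50), c(110, 25))
```

Keeping the formula in one place makes it harder to slip in a sign error when the same comparison is repeated for other pairs of years.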