This week we look at the Gapminder data, including its unfiltered version, which records life expectancy, GDP per capita, and population by country. Parts of the analysis are supported by plots made with ggplot2. The variables are:
country: factor with 142 levels
continent: factor with 5 levels
year: ranges from 1952 to 2007 in increments of 5 years
lifeExp: life expectancy at birth, in years
pop: population
gdpPercap: GDP per capita
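The gapminder data frame contains 1704 rows and the 6 variables listed above. It covers 142 countries on 5 continents, observed every 5 years from 1952 to 2007. The country and continent counts below use gapminder; the GDP plots and calculations that follow use gapminder_unfiltered.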
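These figures can be checked directly from the data. The following is a minimal sketch, assuming the gapminder package is installed; it uses only base R accessors on the gapminder data frame, and the results should agree with the counts stated above.

```r
library(gapminder)

# Rows and columns of the filtered data frame
dim(gapminder)

# Number of distinct countries and continents (both columns are factors)
nlevels(gapminder$country)
nlevels(gapminder$continent)

# Years covered by the data, in ascending order
sort(unique(gapminder$year))
```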
library(gapminder)
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
library(ggplot2)
# Count the distinct countries on each continent, then add the counts up
gapminder %>%
  group_by(continent) %>%
  summarise(n_obs = n(), n_countries = n_distinct(country)) %>%
  summarise(n_countries = sum(n_countries))
## # A tibble: 1 x 1
## n_countries
## <int>
## 1 142
# Observations per continent (n_continent is 1 in every group by construction)
gapminder %>%
  group_by(continent) %>%
  summarise(n_obs = n(), n_continent = n_distinct(continent))
## # A tibble: 5 x 3
## continent n_obs n_continent
## <fctr> <int> <int>
## 1 Africa 624 1
## 2 Americas 300 1
## 3 Asia 396 1
## 4 Europe 360 1
## 5 Oceania 24 1
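# GDP per capita of every country in 2007 (year is numeric, so compare with a number)
gdpall_07 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  select(gdpPercap, country)
# Distribution of GDP per capita across countries
ggplot(gdpall_07, aes(x = gdpPercap)) +
  geom_histogram(colour = "black", fill = "blue")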
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
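The message from `stat_bin()` goes away once a bin width is chosen explicitly. The sketch below assumes a width of 2500, in the units of gdpPercap; that value is a guess to tune, not something taken from the analysis, and the names `gdp_2007` and `p` are illustrative.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# GDP per capita of every country in 2007
gdp_2007 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  select(country, gdpPercap)

# binwidth = 2500 is an assumed value; adjust it to the spread of the data
p <- ggplot(gdp_2007, aes(x = gdpPercap)) +
  geom_histogram(binwidth = 2500, colour = "black", fill = "blue")
print(p)
```

A smaller width shows more detail among the low-income countries, at the cost of a noisier right tail.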
# GDP per capita in 2007, keeping the continent for grouping
gdpallcon_07 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  select(gdpPercap, country, continent)
# One box per continent
ggplot(data = gdpallcon_07, mapping = aes(x = continent, y = gdpPercap)) +
  geom_boxplot()
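# Ten countries with the highest GDP per capita in 2007, largest first
top10_gdp <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  select(country, gdpPercap) %>%
  top_n(10, wt = gdpPercap) %>%
  arrange(desc(gdpPercap))
# reorder() sorts the bars by value; otherwise they follow the factor levels of country
ggplot(top10_gdp, aes(x = reorder(country, gdpPercap), y = gdpPercap)) +
  geom_bar(stat = "identity") +
  xlab("country")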
# GDP per capita of the United States, in chronological order
merca <- gapminder_unfiltered %>%
  filter(country == "United States") %>%
  select(country, year, gdpPercap) %>%
  arrange(year)
# Sum of GDP per capita over all countries in 2007
pgrowth07 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  select(gdpPercap) %>%
  summarise(gdpPercap = sum(gdpPercap))
# The same sum for 2002
pgrowth02 <- gapminder_unfiltered %>%
  filter(year == 2002) %>%
  select(gdpPercap) %>%
  summarise(gdpPercap = sum(gdpPercap))
# Percent change from 2002 to 2007: (new - old) / old * 100
((pgrowth07 - pgrowth02) / pgrowth02) * 100
## gdpPercap
## 1 13.8853
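Percent change between two periods is conventionally the difference divided by the starting value, times 100. Here is a small base R sketch of that formula; the helper name `pct_change` and the totals passed to it are made up for illustration and do not come from the Gapminder data.

```r
# Hypothetical helper: percent change from `old` to `new`
pct_change <- function(old, new) {
  (new - old) / old * 100
}

# Made-up totals, for illustration only: 200 grows to 250
pct_change(old = 200, new = 250)
# 25
```

Because the function is vectorised, it can also be applied to whole columns, for example to compare two years country by country.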