Synopsis

This R Markdown report explores the gapminder data, which records life expectancy, GDP per capita, and population for countries across continents.

Data Source

To create this report we will work with the unfiltered version of the gapminder data, which ships with the gapminder package for R as gapminder_unfiltered. More information is available at Gapminder.

Packages Required

library(knitr) # To allow the use of code chunks in the Rmd File
library(gapminder) # Base Data
library(ggplot2) # To create visualizations
library(dplyr) # Package for data manipulation
read_chunk("Week_4/Week_4.R") # Use the script for the Week 4 assignment

Details about the base data

The data contains 3313 rows and 6 variables. These variables are:

  1. country: factor with 187 levels
  2. continent: factor with 6 levels
  3. year: ranges from 1950 to 2007
  4. lifeExp: life expectancy at birth, in years
  5. pop: population
  6. gdpPercap: GDP per capita
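
The level counts and year range can be read off the data rather than typed by hand. The following is a minimal sketch, not part of the original script: `count_levels` is a hypothetical helper, and `toy` is a made-up data frame standing in for gapminder_unfiltered so the snippet runs without the package.

```r
# Hypothetical helper: number of levels in each factor column of a data frame.
count_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  vapply(df[is_fac], nlevels, integer(1))
}

# Toy stand-in for gapminder_unfiltered (values are invented).
toy <- data.frame(
  country   = factor(c("A", "A", "B")),
  continent = factor(c("X", "X", "Y")),
  year      = c(1950L, 1955L, 1950L)
)

count_levels(toy)   # one count per factor column
range(toy$year)     # earliest and latest year
dim(toy)            # rows and columns
```

With the gapminder package loaded, the same three calls on gapminder_unfiltered should reproduce the figures listed above.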

Structure of the Input data

#Initialize the gapminder dataset
data("gapminder_unfiltered")

#Data Description
str(gapminder_unfiltered)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3313 obs. of  6 variables:
##  $ country  : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
#Check for NAs
sapply(gapminder_unfiltered, function(x)sum(is.na(x))) #None of the columns contain any NA values
##   country continent      year   lifeExp       pop gdpPercap 
##         0         0         0         0         0         0
#Quick peek at the first 6 rows of the data
head(gapminder_unfiltered)
## # A tibble: 6 × 6
##       country continent  year lifeExp      pop gdpPercap
##        <fctr>    <fctr> <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia  1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia  1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia  1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia  1977  38.438 14880372  786.1134
#Summary Statistics
summary(gapminder_unfiltered)
##            country        continent         year         lifeExp     
##  Czech Republic:  58   Africa  : 637   Min.   :1950   Min.   :23.60  
##  Denmark       :  58   Americas: 470   1st Qu.:1967   1st Qu.:58.33  
##  Finland       :  58   Asia    : 578   Median :1982   Median :69.61  
##  Iceland       :  58   Europe  :1302   Mean   :1980   Mean   :65.24  
##  Japan         :  58   FSU     : 139   3rd Qu.:1996   3rd Qu.:73.66  
##  Netherlands   :  58   Oceania : 187   Max.   :2007   Max.   :82.67  
##  (Other)       :2965                                                 
##       pop              gdpPercap       
##  Min.   :5.941e+04   Min.   :   241.2  
##  1st Qu.:2.680e+06   1st Qu.:  2505.3  
##  Median :7.560e+06   Median :  7825.8  
##  Mean   :3.177e+07   Mean   : 11313.8  
##  3rd Qu.:1.961e+07   3rd Qu.: 17355.8  
##  Max.   :1.319e+09   Max.   :113523.1  
## 
The structure and summary of the dataset show 6 continent levels and 187 countries, with data spanning 1950 to 2007 (58 years).

Delving into the Data at various levels

Distribution of GDP by Country

The distribution of GDP per capita by country in 2007 is highly right-skewed, with ~75% of the countries having a GDP per capita below 19,000.

# One row per country for 2007, with GDP per capita rounded to 2 decimals
gdp_2007 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  group_by(continent, country) %>%
  summarise(Avg_GDP = round(mean(gdpPercap), 2))

p1 <- ggplot(data = gdp_2007, aes(Avg_GDP)) + geom_histogram(binwidth = 1000, fill = "#56B4E9", boundary = 0)
p1 + ylab("Count") + xlab("GDP per Capita") + ggtitle("Distribution of GDP by Country for 2007")
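
The share of countries below a given GDP per capita can be computed directly instead of being read off the histogram. This is a minimal sketch: `share_below` is a hypothetical helper and `toy_gdp` is an invented vector, not the real 2007 values.

```r
# Hypothetical helper: proportion of values strictly below a threshold.
share_below <- function(x, threshold) {
  mean(x < threshold)
}

# Invented GDP per capita values, used only to make the example runnable.
toy_gdp <- c(500, 1200, 3000, 8000, 15000, 18000, 25000, 60000)

share_below(toy_gdp, 19000)   # 6 of the 8 toy values are below 19000
quantile(toy_gdp, 0.75)       # third quartile of the toy values
```

Calling `share_below(gdp_2007$Avg_GDP, 19000)` on the report's own data would give the proportion behind the ~75% statement.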

GDP by Continent

In 2007, Europe has the highest median GDP per capita while Africa has the lowest. Asia is home to Qatar, the richest country by GDP per capita.

gdp_continent <- gapminder_unfiltered %>% filter(year==2007) %>% select(continent, year, gdpPercap)

p2 <- ggplot(data = gdp_continent , aes(continent, gdpPercap)) + 
      geom_boxplot(fill = "#56B4E9", outlier.colour = "#009E73", outlier.size = 3) + 
      stat_boxplot(geom = "errorbar")
p2 + ggtitle("GDP by Continent") + xlab("Continent") + ylab("GDP per Capita")

ggplot(data = gdp_continent, aes(gdpPercap)) +
  geom_histogram(binwidth = 1000, fill = "#56B4E9") +
  facet_wrap(~ continent, nrow = 2) +
  ggtitle("GDP for Continents in small multiples") +
  xlab("GDP per Capita") +
  ylab("Count")
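
The boxplot comparison can be backed by the medians themselves. The sketch below uses base R `tapply`; `median_by_group` is a hypothetical helper, and `toy` holds invented values chosen only to illustrate the pattern, not the real 2007 data.

```r
# Hypothetical helper: median of `value` within each group, largest first.
median_by_group <- function(value, group) {
  sort(tapply(value, group, median), decreasing = TRUE)
}

# Invented values standing in for gdp_continent.
toy <- data.frame(
  continent = c("Europe", "Europe", "Africa", "Africa", "Asia", "Asia", "Asia"),
  gdpPercap = c(30000, 20000, 1000, 3000, 90000, 2000, 5000),
  stringsAsFactors = FALSE
)

med <- median_by_group(toy$gdpPercap, toy$continent)
med                                        # medians by continent, descending
toy$continent[which.max(toy$gdpPercap)]    # continent of the single richest row
```

Running `median_by_group(gdp_continent$gdpPercap, gdp_continent$continent)` would list the actual medians in descending order.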

Top 10 countries by GDP

The top 10 countries by GDP per capita in 2007 are dominated by Asia, which contributes 6 of them.

top_gdp_2007 <- gdp_2007 %>% ungroup() %>% top_n(n=10)
## Selecting by Avg_GDP
p3 <- ggplot(data = top_gdp_2007, aes(x=reorder(country, Avg_GDP), y = Avg_GDP ,fill=continent)) + 
      geom_bar(stat = "identity", width = 0.5)
p3 + ggtitle("Top 10 countries by GDP") + xlab("Country") +
     ylab("GDP per Capita") + guides(fill = guide_legend("Continent")) + coord_flip()
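
The per-continent tally behind the bar chart can be produced explicitly. This is a sketch in base R under stated assumptions: `top_by` is a hypothetical helper that mimics the top-n selection, and `toy` is an invented table rather than the report's data.

```r
# Hypothetical helper: keep the n rows with the largest values in column `col`.
top_by <- function(df, col, n = 10) {
  keep <- order(df[[col]], decreasing = TRUE)[seq_len(min(n, nrow(df)))]
  df[keep, , drop = FALSE]
}

# Invented rows standing in for gdp_2007.
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  continent = c("Asia", "Europe", "Asia", "Africa"),
  Avg_GDP   = c(50, 40, 30, 10),
  stringsAsFactors = FALSE
)

top3 <- top_by(toy, "Avg_GDP", n = 3)
table(top3$continent)   # how many of the top rows fall in each continent
```

On the report's data, `table(top_gdp_2007$continent)` gives the same kind of tally directly.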

GDP of India

Let’s look at India’s GDP per capita growth over the years.

gdp_india <- gapminder_unfiltered %>% filter(country=="India")

p4 <- ggplot(data = gdp_india, aes(year,gdpPercap)) + geom_line()
p4 + ggtitle("GDP growth for India") + xlab("Year") + ylab("GDP per Capita")

gdp_growth <- gdp_india %>% mutate(GDP_Change = ((gdpPercap-lag(gdpPercap))/lag(gdpPercap))*100) %>% 
  select(year,gdpPercap,GDP_Change)
colnames(gdp_growth) <- c("Year", "Avg_GDP", "GDP_Growth")
print(gdp_growth)
## # A tibble: 12 × 3
##     Year   Avg_GDP GDP_Growth
##    <int>     <dbl>      <dbl>
## 1   1952  546.5657         NA
## 2   1957  590.0620   7.958100
## 3   1962  658.3472  11.572539
## 4   1967  700.7706   6.443935
## 5   1972  724.0325   3.319477
## 6   1977  813.3373  12.334362
## 7   1982  855.7235   5.211394
## 8   1987  976.5127  14.115439
## 9   1992 1164.4068  19.241341
## 10  1997 1458.8174  25.284173
## 11  2002 1746.7695  19.738728
## 12  2007 2452.2104  40.385464
India’s GDP per capita has grown in every five-year interval, with an average growth rate of 15.05% per interval.
Between 2002 and 2007, India’s GDP per capita grew by 40.39%.
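
The growth column is a percentage change between consecutive observations, which are five years apart in the table above. The sketch below recomputes it in base R from the printed (rounded) GDP values, so results may differ slightly in the last decimals. `pct_change` and `annualised_rate` are hypothetical helpers, and the annualised figure is an added illustration that assumes constant compound growth within the interval.

```r
# GDP per capita for India, copied from the printed table (rounded values).
gdp <- c(546.5657, 590.0620, 658.3472, 700.7706, 724.0325, 813.3373,
         855.7235, 976.5127, 1164.4068, 1458.8174, 1746.7695, 2452.2104)

# Hypothetical helper: percentage change from the previous observation.
pct_change <- function(x) {
  c(NA, diff(x) / head(x, -1) * 100)
}

# Hypothetical helper: compound annual rate (in %) implied by growth over `years`.
annualised_rate <- function(start, end, years) {
  ((end / start)^(1 / years) - 1) * 100
}

growth <- pct_change(gdp)
mean(growth, na.rm = TRUE)               # average growth per five-year interval
growth[length(growth)]                   # growth from 2002 to 2007
annualised_rate(gdp[11], gdp[12], 5)     # same interval expressed per year
```

Separating the per-interval change from the annualised rate avoids reading the 2002 to 2007 figure as a single-year growth rate.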