Synopsis

This R Markdown file contains the solutions to assignment 4 of the Data Wrangling with R class.

I explored the gapminder dataset, focusing on the GDP per capita of each country. I started by summarizing GDP per capita across all countries and continents in 2007 and plotting its distribution. I then found the top 10 countries by GDP per capita and finally calculated GDP per capita trends for India, my country of origin.


Packages Required

I used the following packages for this assignment:

  library(gapminder)   # gapminder datasets
  library(ggplot2)     # static plots
  library(dplyr)       # data manipulation
  library(gridExtra)   # draw a table as a graphic
  library(formattable) # format table columns
  library(plotly)      # interactive plots
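If any of these packages is not installed, the script stops at the first failing library() call. A small sketch that lists the missing ones first; the helper name missing_packages is hypothetical, not part of any package:

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

required <- c("gapminder", "ggplot2", "dplyr", "gridExtra", "formattable", "plotly")
to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
}
```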

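Data Source

The analysis uses the gapminder_unfiltered dataset from the gapminder package, which has six variables: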

  1. country: Country name, stored as a factor with 187 levels.
  2. continent: Stored as a factor with 6 levels: five continents (Africa, Americas, Asia, Europe, Oceania) plus FSU, “Former Soviet Union”.
  3. year: Year of the observation, from 1950 to 2007. Many records are five years apart (1952, 1957, ...), but the years covered vary by country.
  4. lifeExp: Life expectancy at birth, in years.
  5. pop: Population of the country in that year.
  6. gdpPercap: GDP per capita, in inflation-adjusted US dollars.

Data Description

We run some code to get a basic understanding of the data. We check the numbers of rows and columns, then count the rows with no missing values; this count equals the total number of rows, so the data has no missing values. Next we look at the structure of the data to find the number of levels of country and continent, check the summary statistics of the year variable, and count the entries for each continent.

library(gapminder)
dim(gapminder_unfiltered)
## [1] 3313    6
str(gapminder_unfiltered)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3313 obs. of  6 variables:
##  $ country  : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
sum(complete.cases(gapminder_unfiltered))
## [1] 3313
summary(gapminder_unfiltered$year)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1950    1967    1982    1980    1996    2007
table(gapminder_unfiltered$continent)
## 
##   Africa Americas     Asia   Europe      FSU  Oceania 
##      637      470      578     1302      139      187

Exploratory Data Analysis

Answers to specific questions:

Question 1

GDP across countries - 2007

Summary statistics of GDP per capita for all countries in 2007:

##    gdpPercap      
##  Min.   :  277.6  
##  1st Qu.: 2146.8  
##  Median : 6873.3  
##  Mean   :12403.1  
##  3rd Qu.:19003.5  
##  Max.   :82011.0
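A minimal base-R sketch of this step, assuming the statistics come from the 2007 rows of gapminder_unfiltered; the function name gdp_summary is hypothetical:

```r
# Hypothetical helper: summary statistics of gdpPercap for a single year.
gdp_summary <- function(df, yr = 2007) {
  summary(df$gdpPercap[df$year == yr])
}

if (requireNamespace("gapminder", quietly = TRUE)) {
  print(gdp_summary(gapminder::gapminder_unfiltered, 2007))
}
```

This summarizes a plain numeric vector. Calling summary() on a one-column tibble instead, e.g. summary(df[df$year == 2007, "gdpPercap"]), prints the column-labelled layout shown above.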

We also plot GDP per capita for all countries in 2007 to inspect its distribution visually.
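One way to draw this plot, assuming a ggplot2 histogram is what is wanted; the chart type, the bin count, and the function name plot_gdp_distribution are all assumptions:

```r
# Hypothetical helper: histogram of gdpPercap across countries for one year.
# Assumes ggplot2 is installed; bins = 30 is an arbitrary choice.
plot_gdp_distribution <- function(df, yr = 2007) {
  d <- df[df$year == yr, ]
  ggplot2::ggplot(d, ggplot2::aes(x = gdpPercap)) +
    ggplot2::geom_histogram(bins = 30) +
    ggplot2::labs(
      title = paste("GDP per capita across countries,", yr),
      x = "GDP per capita (US$)",
      y = "Number of countries"
    )
}

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("gapminder", quietly = TRUE)) {
  print(plot_gdp_distribution(gapminder::gapminder_unfiltered))
}
```

The summary statistics show a right-skewed distribution (mean of about 12,403 against a median of about 6,873), so adding ggplot2::scale_x_log10() may make the bulk of countries easier to see.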

Question 2

GDP across Continents - 2007

A boxplot shows the distribution of GDP per capita within each continent in 2007.
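A sketch of such a boxplot with ggplot2, assuming one box per level of continent for the 2007 rows; the function name plot_gdp_by_continent is hypothetical:

```r
# Hypothetical helper: one box of gdpPercap per continent for one year.
# Assumes ggplot2 is installed. If FSU has rows for that year, it gets its own box.
plot_gdp_by_continent <- function(df, yr = 2007) {
  d <- df[df$year == yr, ]
  ggplot2::ggplot(d, ggplot2::aes(x = continent, y = gdpPercap)) +
    ggplot2::geom_boxplot() +
    ggplot2::labs(
      title = paste("GDP per capita by continent,", yr),
      x = "Continent",
      y = "GDP per capita (US$)"
    )
}

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("gapminder", quietly = TRUE)) {
  print(plot_gdp_by_continent(gapminder::gapminder_unfiltered))
}
```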

Question 3

Top 10 countries by GDP per capita - 2007

Find the 10 countries with the highest GDP per capita in 2007 and output them as a table in decreasing order of GDP per capita:

## Classes 'tbl_df', 'tbl' and 'data.frame':    10 obs. of  2 variables:
##  $ Country           : Factor w/ 187 levels "Afghanistan",..: 138 100 127 24 92 149 178 82 75 161
##  $ GDP per Capita ($): num  82011 54590 49357 48015 47307 ...
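A base-R sketch that builds a table with this structure, assuming the column names shown in the str() output; the function name top_gdp is hypothetical:

```r
# Hypothetical helper: the n highest gdpPercap values for one year,
# sorted in decreasing order, with the column names used in the report.
top_gdp <- function(df, yr = 2007, n = 10) {
  d <- df[df$year == yr, c("country", "gdpPercap")]
  d <- d[order(d$gdpPercap, decreasing = TRUE), ]
  d <- head(d, n)
  names(d) <- c("Country", "GDP per Capita ($)")
  d
}

if (requireNamespace("gapminder", quietly = TRUE)) {
  top10 <- top_gdp(gapminder::gapminder_unfiltered)
  print(top10)
  # gridExtra::grid.table(top10) would draw the same table as a graphic.
}
```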

Question 4

Plot GDP per capita for India for all years (using the plotly package)
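A sketch with plotly, assuming a line-and-marker chart of gdpPercap against year; the function name plot_country_gdp is hypothetical:

```r
# Hypothetical helper: interactive line chart of gdpPercap over time for one country.
# Assumes plotly is installed.
plot_country_gdp <- function(df, country_name = "India") {
  d <- df[df$country == country_name, ]
  d <- d[order(d$year), ]
  p <- plotly::plot_ly(d, x = ~year, y = ~gdpPercap,
                       type = "scatter", mode = "lines+markers")
  plotly::layout(p,
                 title = paste("GDP per capita,", country_name),
                 xaxis = list(title = "Year"),
                 yaxis = list(title = "GDP per capita (US$)"))
}

if (requireNamespace("plotly", quietly = TRUE) &&
    requireNamespace("gapminder", quietly = TRUE)) {
  india_plot <- plot_country_gdp(gapminder::gapminder_unfiltered, "India")
  # print(india_plot) opens the chart in the RStudio viewer or a browser.
}
```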

Question 5

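India’s GDP per capita growth in 2007

Growth is measured against India's previous observation, 2002, so it covers a five-year period rather than a single year.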

## [1] "GDP per Capita increase for India in 2007 was 40.39 %"
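A minimal sketch of this calculation, assuming growth is measured against the previous available observation for the country; the helper names period_growth_pct and country_growth are hypothetical:

```r
# Hypothetical helper: percentage change from each value to the next.
# The first element is NA because it has no earlier value to compare with.
period_growth_pct <- function(x) {
  if (length(x) == 0) return(numeric(0))
  c(NA, 100 * (x[-1] / x[-length(x)] - 1))
}

# Hypothetical helper: one country's rows sorted by year, with a growth column.
country_growth <- function(df, country_name = "India") {
  d <- df[df$country == country_name, c("year", "gdpPercap")]
  d <- d[order(d$year), ]
  d$growth_pct <- period_growth_pct(d$gdpPercap)
  d
}

if (requireNamespace("gapminder", quietly = TRUE)) {
  india <- country_growth(gapminder::gapminder_unfiltered, "India")
  g2007 <- india$growth_pct[india$year == 2007]
  print(paste("GDP per Capita increase for India in 2007 was",
              round(g2007, 2), "%"))
}
```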

Question 6

Historical GDP per capita growth for India

(The first year on record has no growth value because there is no earlier observation to compare it with, so that observation is excluded.)

## Warning: Ignoring 1 observations
## [1] "Average historical GDP per Capita Growth for India is 15.05 %"
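The reported average is consistent with the arithmetic mean of the period-over-period growth rates, with the first year dropped. A sketch under that assumption; the helper names average_growth_pct and compound_growth_pct are hypothetical:

```r
# Hypothetical helper: arithmetic mean of period-over-period % growth.
# `gdp` must be sorted by year; n values give n - 1 growth rates,
# so the first year drops out automatically.
average_growth_pct <- function(gdp) {
  gdp <- as.numeric(gdp)
  growth <- 100 * (gdp[-1] / gdp[-length(gdp)] - 1)
  mean(growth)
}

# Hypothetical helper: compound (geometric) average growth per period, for comparison.
compound_growth_pct <- function(gdp) {
  gdp <- as.numeric(gdp)
  periods <- length(gdp) - 1
  100 * ((gdp[length(gdp)] / gdp[1])^(1 / periods) - 1)
}

if (requireNamespace("gapminder", quietly = TRUE)) {
  d <- gapminder::gapminder_unfiltered
  india <- d[d$country == "India", ]
  india <- india[order(india$year), ]
  print(paste("Average historical GDP per Capita Growth for India is",
              round(average_growth_pct(india$gdpPercap), 2), "%"))
}
```

The arithmetic mean of period rates is never smaller than the compound rate (the constant per-period rate that takes the first value to the last), so which one to report depends on the question being asked.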