This R Markdown file contains the solutions to Assignment 4 of the Data Wrangling with R class.
I explored the gapminder dataset, focusing on the GDP per capita of the countries. I started by summarizing the GDP per capita of all countries and continents in 2007 and plotting the distributions. I then found the top 10 countries by GDP per capita and finally calculated the GDP per capita trends for India, my country of origin.
I used the following packages for this assignment:
library(gapminder) #to use the gapminder dataset
library(ggplot2) #for plotting graphs
library(dplyr) #for data manipulation
library(gridExtra) #to plot a table
library(formattable) #to change the format of a column
library(plotly) #for plotting graphs
We run a few commands to get a basic understanding of the data. We check the number of rows and columns, then inspect the structure of the data to find the number of levels in country and continent. We also count the rows with no missing values; this equals the total number of rows, so no row has a missing value. Finally, we check the summary statistics of the year variable and count the number of entries for each continent.
library(gapminder)
dim(gapminder_unfiltered)
## [1] 3313 6
str(gapminder_unfiltered)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3313 obs. of 6 variables:
## $ country : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
sum(complete.cases(gapminder_unfiltered))
## [1] 3313
summary(gapminder_unfiltered$year)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1950 1967 1982 1980 1996 2007
table(gapminder_unfiltered$continent)
##
## Africa Americas Asia Europe FSU Oceania
## 637 470 578 1302 139 187
Answers to specific questions:
GDP across countries - 2007
Get the summary statistics of GDP per capita for all countries in 2007.
## gdpPercap
## Min. : 277.6
## 1st Qu.: 2146.8
## Median : 6873.3
## Mean :12403.1
## 3rd Qu.:19003.5
## Max. :82011.0
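The code for this step is not shown in the rendered output. A minimal sketch of how such a summary could be produced, assuming the `gapminder_unfiltered` data explored above and a `dplyr` pipeline (the object name `gdp_2007` is hypothetical):

```r
library(gapminder)
library(dplyr)

# Keep only the 2007 records and the GDP per capita column.
gdp_2007 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  select(gdpPercap)

# Minimum, quartiles, mean and maximum of GDP per capita in 2007.
summary(gdp_2007)
```

Keeping `gdpPercap` in a one-column data frame, rather than pulling it out as a vector, makes `summary()` print the column-wise layout seen in the output above.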
We also plot the GDP per capita for all countries for 2007 to gauge the distribution visually.
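The plot itself does not survive in this text version. One way the distribution could be drawn is a histogram with `ggplot2`; the plot type, bin count, and labels below are assumptions, not necessarily what was used for the original figure:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# 2007 records only.
gdp_2007 <- gapminder_unfiltered %>%
  filter(year == 2007)

# Histogram of GDP per capita; 30 bins is an arbitrary choice.
p_hist <- ggplot(gdp_2007, aes(x = gdpPercap)) +
  geom_histogram(bins = 30, fill = "steelblue", colour = "white") +
  labs(
    title = "GDP per capita across countries, 2007",
    x = "GDP per capita ($)",
    y = "Number of countries"
  )

p_hist
```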
GDP across Continents - 2007
Plot a boxplot to compare the distribution of GDP per capita across the continents.
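A minimal sketch of such a boxplot, assuming `ggplot2` and the 2007 subset of `gapminder_unfiltered` (titles and axis labels are illustrative):

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# 2007 records only.
gdp_2007 <- gapminder_unfiltered %>%
  filter(year == 2007)

# One box per continent, showing the spread of GDP per capita.
p_box <- ggplot(gdp_2007, aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  labs(
    title = "GDP per capita by continent, 2007",
    x = "Continent",
    y = "GDP per capita ($)"
  )

p_box
```

Because the values are strongly right-skewed, adding `scale_y_log10()` may make the boxes easier to compare; that is an optional variation.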
Top 10 countries by GDP - 2007
Find the top 10 countries by 2007 GDP per capita and output the table in decreasing order of GDP per capita.
## Warning: package 'gridExtra' was built under R version 3.3.2
## Warning: package 'formattable' was built under R version 3.3.2
## Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 2 variables:
## $ Country : Factor w/ 187 levels "Afghanistan",..: 138 100 127 24 92 149 178 82 75 161
## $ GDP per Capita ($): num 82011 54590 49357 48015 47307 ...
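The chunk that built this table is hidden in the rendered output. A sketch of one way to get a table with the same two columns, assuming `dplyr` for the ranking and `gridExtra::grid.table()` for drawing it; the rounding step stands in for whatever column formatting was originally applied with `formattable`, and `top10` is a hypothetical name:

```r
library(gapminder)
library(dplyr)
library(gridExtra)

# Rank the 2007 records by GDP per capita and keep the ten largest.
top10 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  arrange(desc(gdpPercap)) %>%
  head(10) %>%
  mutate(gdpPercap = round(gdpPercap)) %>%  # assumed formatting: whole dollars
  select(Country = country, `GDP per Capita ($)` = gdpPercap)

str(top10)

# Draw the table as a graphic, without row names.
grid.table(top10, rows = NULL)
```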
Plot GDP per capita for India for all years (Using Plotly package)
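The interactive plot is not reproduced in this text version. A minimal sketch with `plotly`, assuming a line-and-marker chart of India's records in `gapminder_unfiltered` (the trace mode and titles are assumptions):

```r
library(gapminder)
library(dplyr)
library(plotly)

# All records for India, in chronological order.
india <- gapminder_unfiltered %>%
  filter(country == "India") %>%
  arrange(year)

# Interactive line chart of GDP per capita over time.
p_india <- plot_ly(
  india,
  x = ~year,
  y = ~gdpPercap,
  type = "scatter",
  mode = "lines+markers"
) %>%
  layout(
    title = "GDP per capita for India",
    xaxis = list(title = "Year"),
    yaxis = list(title = "GDP per capita ($)")
  )

p_india
```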
India’s GDP per capita growth for 2007
## [1] "GDP per Capita increase for India in 2007 was 40.39 %"
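The calculation behind this figure is not shown. A sketch under the assumption that growth is measured against the previous record for India in the data, which need not be the previous calendar year, using `dplyr::lag()`; the column name `growth` is hypothetical:

```r
library(gapminder)
library(dplyr)

# Percentage change in GDP per capita relative to the previous record.
india <- gapminder_unfiltered %>%
  filter(country == "India") %>%
  arrange(year) %>%
  mutate(growth = (gdpPercap / lag(gdpPercap) - 1) * 100)

# Growth value attached to the 2007 record.
growth_2007 <- india$growth[india$year == 2007]

print(sprintf("GDP per Capita increase for India in 2007 was %.2f %%", growth_2007))
```

The first record has no predecessor, so its `growth` value is `NA`.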
Historical GDP growth for India
(One observation is ignored because the first year on record has no growth value.)
## Warning: Ignoring 1 observations
## [1] "Average historical GDP per Capita Growth for India is 15.05 %"
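A sketch of how the average could be computed, assuming it is the plain mean of the record-to-record growth rates with the missing first value dropped; the `plotly` chart at the end is an illustrative guess at the kind of plot that produced the warning above:

```r
library(gapminder)
library(dplyr)
library(plotly)

# Growth relative to the previous record; the first record gets NA.
india <- gapminder_unfiltered %>%
  filter(country == "India") %>%
  arrange(year) %>%
  mutate(growth = (gdpPercap / lag(gdpPercap) - 1) * 100)

# Mean growth across all records that have a predecessor.
avg_growth <- mean(india$growth, na.rm = TRUE)

print(sprintf("Average historical GDP per Capita Growth for India is %.2f %%", avg_growth))

# Growth over time; the NA for the first record cannot be drawn.
p_growth <- plot_ly(india, x = ~year, y = ~growth,
                    type = "scatter", mode = "lines+markers")
p_growth
```

This is an arithmetic mean of period growth rates, not a compound annual rate, so it should be read as a rough summary of the trend.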