Synopsis

This is the week-4 homework for the BANA 8090 Data Wrangling With R course. This week focuses on data manipulations, transformations and visualizations in R.

Packages required

The following packages are required for this homework

library(gapminder)
library(dplyr)
library(Hmisc)
library(rworldmap)
library(ggplot2)
library(data.table)
require(mapdata)

Source Code

The data set contains the following variables from left to right

Data Description

This data set contains on life expectancy, GDP per capita, and population by country from 1950 to 2007.

The describe function shows the descriptive statistics on all the variables including NA’s, # of unique values, etc.
Hmisc::describe(gapminder_unfiltered)
gapminder_unfiltered 

 6  Variables      3313  Observations
---------------------------------------------------------------------------
country 
      n missing  unique 
   3313       0     187 

lowest : Afghanistan        Albania            Algeria            Angola             Argentina         
highest: Vietnam            West Bank and Gaza Yemen, Rep.        Zambia             Zimbabwe           
---------------------------------------------------------------------------
continent 
      n missing  unique 
   3313       0       6 

          Africa Americas Asia Europe FSU Oceania
Frequency    637      470  578   1302 139     187
%             19       14   17     39   4       6
---------------------------------------------------------------------------
year 
      n missing  unique    Info    Mean     .05     .10     .25     .50 
   3313       0      58       1    1980    1952    1957    1967    1982 
    .75     .90     .95 
   1996    2002    2007 

lowest : 1950 1951 1952 1953 1954, highest: 2003 2004 2005 2006 2007 
---------------------------------------------------------------------------
lifeExp 
      n missing  unique    Info    Mean     .05     .10     .25     .50 
   3313       0    2571       1   65.24   41.22   45.37   58.33   69.61 
    .75     .90     .95 
  73.66   77.12   78.68 

lowest : 23.60 28.80 30.00 30.02 30.33
highest: 82.21 82.27 82.36 82.60 82.67 
---------------------------------------------------------------------------
pop 
        n   missing    unique      Info      Mean       .05       .10 
     3313         0      3312         1  31773251    235605    436150 
      .25       .50       .75       .90       .95 
  2680018   7559776  19610538  56737055 121365965 

lowest :      59412      59461      60011      60427      61325
highest: 1110396331 1164970000 1230075000 1280400000 1318683096 
---------------------------------------------------------------------------
gdpPercap 
      n missing  unique    Info    Mean     .05     .10     .25     .50 
   3313       0    3313       1   11314   665.7   887.9  2505.3  7825.8 
    .75     .90     .95 
17355.7 26592.7 31534.9 

lowest :    241.2    277.6    298.8    299.9    312.2
highest:  82011.0  95458.1 108382.4 109347.9 113523.1 
---------------------------------------------------------------------------

Exploratory Data Analysis

Q1.

For the year 2007, what is the distribution of GDP per capita across all countries?

GDP <- gapminder_unfiltered

#Filter 2007 data
GDP_2007 <- subset(GDP,year==2007)
head(GDP_2007)

#create a map-shaped window
mapDevice('x11')
#join to a coarse resolution map
spdf <- joinCountryData2Map(GDP_2007, joinCode="NAME", nameJoinColumn="country")

mapCountryData(spdf, nameColumnToPlot="gdpPercap", catMethod="fixedWidth", colourPalette=c('yellow','orange','red','brown'),mapTitle="GDP per capita across countries for 2007")

Results: The map shows that countries like Qatar, Macau, united States and Canada have a high GDP per capita, the highest being ~82,000.

Q2.

For the year 2007, how do the distributions differ across the different continents?

GDP_2007_rollup <- with(GDP_2007, tapply(gdpPercap, continent, FUN = sum))
GDP_2007_rollup <- as.data.frame.table(GDP_2007_rollup)
names(GDP_2007_rollup) <- c("continent","gdpContinent")

GDP_2007_combined <- merge(GDP_2007_rollup,GDP_2007,by='continent')

mapDevice('x11')
#join to a coarse resolution map
spdf <- joinCountryData2Map(GDP_2007_combined, joinCode="NAME", nameJoinColumn="country")

mapCountryData(spdf, nameColumnToPlot="gdpContinent", catMethod="fixedWidth", colourPalette=c('yellow','orange','red','brown'),mapTitle="GDP per capita across continents for 2007")

Results: The map shows that Europe has the highest GDP per capita, followed by Asia.

Q3.

For the year 2007, what are the top 10 countries with the largest GDP per capita?

head(arrange(GDP_2007,desc(gdpPercap )),10)

Q4.

Plot the GDP per capita for your country of origin for all years available.

# filter India data

India<-filter(gapminder_unfiltered,country=='India')

# Plot the data

ggplot(data = India)+
geom_smooth(mapping = aes(x = year, y = gdpPercap),color="Brown",size=2 ,se = FALSE) +
ggtitle("GDP per Capita of India") +
ylab("GDP in USD")+
theme_light()

Results: As we can see that the GDP per Capita for India has been countinously rising since 1950s to 2007.

Q5.

What was the percent growth (or decline) in GDP per capita in 2007?

GDP %>%
  group_by(country) %>%
  mutate(percent_growth = {{gdpPercap - lag(gdpPercap)}/{lag(gdpPercap)}}*100)%>%
  filter(year==2007) %>%
  select(country , percent_growth)
#If we want the percent growth for just one country in this case the country I belong to i.e. India then :#
GDP %>%
  group_by(country) %>%
  mutate(percent_growth = {{gdpPercap - lag(gdpPercap)}/{lag(gdpPercap)}}*100)%>%
  filter(year==2007, country=="India") %>%
  select(country , percent_growth)

Q6.

What has been the historical growth (or decline) in GDP per capita for your country?

India<-arrange(India,year)

# add a column to India dataset 
mutate(India,growth=0)
nrow(India)

for (i in 1 : nrow(India)){
    if(i< nrow(India)){
    India[i+1,"growth"]<-(India[i+1,"gdpPercap"]- India[i,"gdpPercap"])/India[i,"gdpPercap"]*100
    }
}

# replace the NA with 0
India$growth<-replace(India$growth, is.na(India$growth), 0)

# plot the data

ggplot(data = India,aes(x = year, y = growth))+
    geom_line(colour = "Brown",size=2,arrow=arrow()) + #scale_colour_gradient(low="red") +
    ggtitle("GDP growth of India") +
    ylab("GDP growth")+
    scale_x_continuous(name = India$year, breaks =pretty(India$year,n=10),limits = c(1957,2007))+
    scale_y_continuous(name=India$growth, breaks = pretty(India$growth,n=10))+
    theme_light()

We can see here that the GDP/Capita was countinuously rising. This graph actually shows the amount by which the GDP/Capita was rising and we can observe that rise much rapid after 1980.