STAT 545A Homework 2

In this article, we perform some basic analysis of the Gapminder data. The data originate from Gapminder: http://www.gapminder.org/. For this project, we use the copy stored in this repository: http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt

First, we bring the Gapminder data into R.

gDat <- read.delim(file = "~/Revolution/gapminderDataFiveYear.txt", sep = "\t")

We often start by inspecting the structure of the gDat object with the str() command.

str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

The dimensions and column names are obtained as follows:

dim(gDat)
## [1] 1704    6
names(gDat)
## [1] "country"   "year"      "pop"       "continent" "lifeExp"   "gdpPercap"

A statistical overview of every column can be obtained with summary():

summary(gDat)
##         country          year           pop              continent  
##  Afghanistan:  12   Min.   :1952   Min.   :6.00e+04   Africa  :624  
##  Albania    :  12   1st Qu.:1966   1st Qu.:2.79e+06   Americas:300  
##  Algeria    :  12   Median :1980   Median :7.02e+06   Asia    :396  
##  Angola     :  12   Mean   :1980   Mean   :2.96e+07   Europe  :360  
##  Argentina  :  12   3rd Qu.:1993   3rd Qu.:1.96e+07   Oceania : 24  
##  Australia  :  12   Max.   :2007   Max.   :1.32e+09                 
##  (Other)    :1632                                                   
##     lifeExp       gdpPercap     
##  Min.   :23.6   Min.   :   241  
##  1st Qu.:48.2   1st Qu.:  1202  
##  Median :60.7   Median :  3532  
##  Mean   :59.5   Mean   :  7215  
##  3rd Qu.:70.8   3rd Qu.:  9325  
##  Max.   :82.6   Max.   :113523  
## 

Load the lattice library to produce some visualizations and extract useful information from the data object:

library("lattice")

We will compare life expectancy and GDP per capita for Algeria and Egypt. First, let us look at life expectancy over time in the two countries:

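# %in% tests membership; == would recycle the two names along the rows and can silently drop observations
xyplot(lifeExp ~ year | country, subset = country %in% c("Algeria", "Egypt"), 
    data = gDat, type = c("p", "l"))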

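When a subset selects several countries at once, the choice of comparison operator matters. The following is a minimal sketch on a small made-up data frame (the names toy, wanted, keep_eq and keep_in are illustrative and not part of the Gapminder data). It contrasts ==, which recycles the vector of names along the rows, with %in%, which tests each row for membership.

```r
# Toy data frame with made-up rows, used only to illustrate the comparison.
toy <- data.frame(
  country = rep(c("Algeria", "Egypt", "Angola"), each = 4),
  year    = rep(c(1952, 1957, 1962, 1967), times = 3)
)

wanted <- c("Algeria", "Egypt")

# `==` recycles `wanted` along the column: odd rows are compared with
# wanted[1] and even rows with wanted[2].
keep_eq <- toy$country == wanted

# `%in%` checks every row against the whole set of wanted names.
keep_in <- toy$country %in% wanted

sum(keep_eq)  # 4: only half of the Algeria and Egypt rows are kept
sum(keep_in)  # 8: all rows for both countries are kept
```

Because the recycling in == produces no warning when the number of rows is a multiple of the number of names, the loss of rows is easy to miss in a plot.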

Clearly, life expectancy increased substantially over time in both countries. If we examine GDP per capita for both countries over the same period, we observe a similar upward trend, as indicated below:

xyplot(gdpPercap ~ year | country, subset = country %in% c("Algeria", "Egypt"), 
    data = gDat, type = c("p", "l"))


We might expect the increase in life expectancy to be a result of the improvement in GDP per capita in both countries, as illustrated below, although this might not hold for other countries:

xyplot(lifeExp ~ gdpPercap | country, subset = country %in% c("Algeria", "Egypt"), 
    data = gDat, type = c("p", "l"))

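One way to put a number on the relationship between life expectancy and GDP per capita is a per-country correlation. The following is only a sketch: cor_by_country is a hypothetical helper written for this illustration, and the toy data frame contains made-up values rather than Gapminder observations. A high correlation would not by itself show that GDP per capita drives life expectancy, since both quantities rise over time.

```r
# Hypothetical helper: Pearson correlation between two columns,
# computed separately for each selected country.
cor_by_country <- function(dat, countries, x = "gdpPercap", y = "lifeExp") {
  sub <- dat[dat$country %in% countries, ]
  # as.character() avoids empty groups for unused factor levels.
  sapply(split(sub, as.character(sub$country)),
         function(d) cor(d[[x]], d[[y]]))
}

# Made-up values, chosen so that the result is easy to verify by eye.
toy <- data.frame(
  country   = rep(c("A", "B"), each = 4),
  gdpPercap = c(1000, 2000, 3000, 4000, 500, 600, 700, 800),
  lifeExp   = c(40, 50, 60, 70, 45, 44, 43, 42)
)

res <- cor_by_country(toy, c("A", "B"))
res  # A is perfectly positively correlated, B perfectly negatively
```

With the data loaded as gDat, the same helper could be called as cor_by_country(gDat, c("Algeria", "Egypt")).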