STAT 545A Homework #2

Prepared by: Amanda Yuen

This is an R Markdown document. For homework #2, we will carry out a brief analysis of the Gapminder data located here.

Let's import the data into R:

gDat <- read.delim("gapminderDataFiveYear.txt")

Now, let's get an overview of what the data looks like:

str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

This tells us that there are 1704 observations and 6 variables in the data set. What are these 6 variables?

names(gDat)
## [1] "country"   "year"      "pop"       "continent" "lifeExp"   "gdpPercap"
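
The same overview can also be pulled out piece by piece. The sketch below is only an illustration: `toy` is a hypothetical stand-in for `gDat`, built from the first five Afghanistan rows as printed (and rounded) by `str()` above, so it runs without the data file. The continent label "Asia" is read off the factor code 3 and the level order shown by `summary()`.

```r
# Hypothetical stand-in for gDat: the first five rows as printed
# (rounded) in the str() output above.
toy <- data.frame(
  country   = factor(rep("Afghanistan", 5)),
  year      = c(1952L, 1957L, 1962L, 1967L, 1972L),
  pop       = c(8425333, 9240934, 10267083, 11537966, 13079460),
  continent = factor(rep("Asia", 5)),
  lifeExp   = c(28.8, 30.3, 32, 34, 36.1),
  gdpPercap = c(779, 821, 853, 836, 740)
)

dims <- dim(toy)                  # number of rows, then number of columns
col_classes <- sapply(toy, class) # one class name per column

dims
col_classes
```

Run on the full `gDat`, `dim()` should report the 1704 rows and 6 columns that `str()` shows.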

Ok, great. Let's get some basic descriptive statistics:

summary(gDat)
##         country          year           pop              continent  
##  Afghanistan:  12   Min.   :1952   Min.   :6.00e+04   Africa  :624  
##  Albania    :  12   1st Qu.:1966   1st Qu.:2.79e+06   Americas:300  
##  Algeria    :  12   Median :1980   Median :7.02e+06   Asia    :396  
##  Angola     :  12   Mean   :1980   Mean   :2.96e+07   Europe  :360  
##  Argentina  :  12   3rd Qu.:1993   3rd Qu.:1.96e+07   Oceania : 24  
##  Australia  :  12   Max.   :2007   Max.   :1.32e+09                 
##  (Other)    :1632                                                   
##     lifeExp       gdpPercap     
##  Min.   :23.6   Min.   :   241  
##  1st Qu.:48.2   1st Qu.:  1202  
##  Median :60.7   Median :  3532  
##  Mean   :59.5   Mean   :  7215  
##  3rd Qu.:70.8   3rd Qu.:  9325  
##  Max.   :82.6   Max.   :113523  
## 
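
As a quick sanity check, the continent counts in the summary can be tied back to the 1704 observations and 142 countries reported by `str()`. The sketch below copies the counts from the output above and assumes that every country has 12 rows (one per five-year step from 1952 to 2007), as the six countries listed in the summary do.

```r
# Row counts per continent, copied from the summary() output above.
rows_per_continent <- c(Africa = 624, Americas = 300, Asia = 396,
                        Europe = 360, Oceania = 24)

# The counts should add up to the number of observations reported by str().
total_rows <- sum(rows_per_continent)

# Assumption: every country contributes one row per five-year step.
years <- seq(1952, 2007, by = 5)
countries_per_continent <- rows_per_continent / length(years)

total_rows
countries_per_continent
sum(countries_per_continent)
```

If the assumption holds, the last value matches the 142 levels of `country`.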

Interesting. Let's look at the relationship between GDP per capita and life expectancy for the countries in the Americas. We should start by loading the lattice package so that we can visualize the data with some pretty graphs:

library(lattice)
xyplot(lifeExp ~ gdpPercap | country, gDat, subset = continent == "Americas")

Figure (chunk unnamed-chunk-5): scatterplots of lifeExp against gdpPercap, one panel per country in the Americas.

Very interesting. In Canada, the US, and Puerto Rico, life expectancy appears to start from a higher level and to increase more gradually as GDP per capita increases than it does in the other countries. Let's take a closer look at Canada:

xyplot(lifeExp ~ gdpPercap, gDat, subset = country == "Canada", type = c("p", 
    "r"))

Figure (chunk unnamed-chunk-6): scatterplot of lifeExp against gdpPercap for Canada, with a fitted regression line.
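
The `"r"` in `type = c("p", "r")` asks lattice to overlay a simple linear regression line on the points. To see the numbers behind such a line, the same kind of fit can be run directly with `lm()`. This is a minimal sketch and not part of the original analysis: `toy` is a hypothetical stand-in built from the first five Afghanistan rows as printed (and rounded) by `str()`, so the block runs without the data file, and its coefficients say nothing about Canada.

```r
# Hypothetical stand-in data: the first five Afghanistan rows as printed
# (rounded) by str(). With the full file loaded, use
# subset(gDat, country == "Canada") instead.
toy <- data.frame(
  gdpPercap = c(779, 821, 853, 836, 740),
  lifeExp   = c(28.8, 30.3, 32, 34, 36.1)
)

# Ordinary least-squares fit of life expectancy on GDP per capita.
fit <- lm(lifeExp ~ gdpPercap, data = toy)

coef(fit)                # intercept and slope of the fitted line
summary(fit)$r.squared   # share of variance explained by the line
```

Swapping in the Canada subset would give the intercept and slope of the line drawn in the plot above.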