stat545a-2013-hw02_woollard-geo

Let's explore the Gapminder data. It is read from the URL in the code below.

library(lattice)
## Warning: package 'lattice' was built under R version 2.15.3
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat  <- read.table(gdURL, header = TRUE, sep = '\t', quote = "\"")

What is in gDat?

str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Some elementary statistics

summary(gDat)
##         country          year           pop              continent  
##  Afghanistan:  12   Min.   :1952   Min.   :6.00e+04   Africa  :624  
##  Albania    :  12   1st Qu.:1966   1st Qu.:2.79e+06   Americas:300  
##  Algeria    :  12   Median :1980   Median :7.02e+06   Asia    :396  
##  Angola     :  12   Mean   :1980   Mean   :2.96e+07   Europe  :360  
##  Argentina  :  12   3rd Qu.:1993   3rd Qu.:1.96e+07   Oceania : 24  
##  Australia  :  12   Max.   :2007   Max.   :1.32e+09                 
##  (Other)    :1632                                                   
##     lifeExp       gdpPercap     
##  Min.   :23.6   Min.   :   241  
##  1st Qu.:48.2   1st Qu.:  1202  
##  Median :60.7   Median :  3532  
##  Mean   :59.5   Mean   :  7215  
##  3rd Qu.:70.8   3rd Qu.:  9325  
##  Max.   :82.6   Max.   :113523  
## 

The year range

min(gDat$year)
## [1] 1952
max(gDat$year)
## [1] 2007
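Both ends of the range can also be obtained in one call with `range()`. A minimal sketch, assuming a small stand-in vector `years` (hypothetical, built here to mimic the five-year steps seen in `gDat$year`) rather than the downloaded data:

```r
# Hypothetical stand-in for gDat$year: five-year steps from 1952 to 2007
years <- seq(1952, 2007, by = 5)

# range() returns the minimum and maximum together
yr <- range(years)
yr

# Count the distinct survey years in the stand-in vector
n_years <- length(unique(years))
n_years
```

With the real data, `range(gDat$year)` would replace the separate `min()` and `max()` calls.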

Do the data seem reasonable? Look at the rows with the largest population and the longest life expectancy.

gDat[which(gDat$pop == max(gDat$pop)), ]
##     country year       pop continent lifeExp gdpPercap
## 300   China 2007 1.319e+09      Asia   72.96      4959
gDat[which(gDat$pop == max(gDat$pop)), ]$pop
## [1] 1.319e+09
gDat[which(gDat$lifeExp == max(gDat$lifeExp)), ]
##     country year       pop continent lifeExp gdpPercap
## 804   Japan 2007 127467972      Asia    82.6     31656
gDat[which(gDat$lifeExp == max(gDat$lifeExp)), ]$lifeExp
## [1] 82.6
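The same kind of lookup can be written more compactly with `which.max()`. A minimal sketch on a toy data frame (the data frame `toy` and its values are made up for illustration, not taken from Gapminder):

```r
# Toy data frame with made-up values, standing in for gDat
toy <- data.frame(
  country = c("A", "B", "C"),
  pop     = c(10, 30, 20),
  lifeExp = c(70, 60, 80),
  stringsAsFactors = FALSE
)

# which.max() returns the index of the first maximum only
biggest <- toy[which.max(toy$pop), ]
oldest  <- toy[which.max(toy$lifeExp), ]

biggest$country
oldest$country
```

One difference worth noting: `which(x == max(x))` returns every tied row, while `which.max(x)` returns only the first, so the two agree only when the maximum is unique.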

Let's make some pictures.

xyplot(pop ~ year | continent, gDat)

[Figure: lattice scatterplots of pop against year, one panel per continent]

Very interesting! Which countries account for Asia's population?

# Keep the Asian rows and shorten country names to 8 characters for the panel strips
asiaDat <- within(subset(gDat, continent == "Asia"), country <- strtrim(country, 8))
xyplot(pop ~ year | country, asiaDat)

[Figure: lattice scatterplots of pop against year, one panel per Asian country]
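Truncating country names to 8 characters keeps the panel strips readable, but it would silently merge panels if two names shared their first 8 characters. A minimal sketch of how one might check for that, using a few example name strings (the vector `example_names` is an assumption for illustration, not pulled from `gDat`):

```r
# Example country-name strings (illustrative; not read from gDat)
example_names <- c("Afghanistan", "Korea, Dem. Rep.", "Korea, Rep.", "China")

# strtrim() cuts each string to at most 8 characters
short <- strtrim(example_names, 8)
short

# anyDuplicated() returns 0 when no truncated name collides with another
no_collisions <- anyDuplicated(short) == 0
no_collisions
```

Running the same check on `unique(as.character(gDat$country))` before plotting would confirm that each panel still corresponds to one country.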