Stat545A Homework 4

Summary:

Load libraries and data:

library(plyr)
library(lattice)
library(xtable)
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

HTML print function:

htmlPrint <- function(x, ..., digits = 0, include.rownames = FALSE) {
    print(xtable(x, digits = digits, ...), type = "html", include.rownames = include.rownames, 
        ...)
}

Life expectancy for different years across continents

Let's look at mean life expectancy on different continents. This is adapted from Jenny's Homework 3 example (Compute a trimmed mean of life expectancy for different years), here using the plain mean and with continent added to make it a bit more interesting.

meanLifePerYearCont <- ddply(gDat, ~year + continent, summarize, meanLifeExp = mean(lifeExp))
htmlPrint(meanLifePerYearCont)
year continent meanLifeExp
1952 Africa 39
1952 Americas 53
1952 Asia 46
1952 Europe 64
1952 Oceania 69
1957 Africa 41
1957 Americas 56
1957 Asia 49
1957 Europe 67
1957 Oceania 70
1962 Africa 43
1962 Americas 58
1962 Asia 52
1962 Europe 69
1962 Oceania 71
1967 Africa 45
1967 Americas 60
1967 Asia 55
1967 Europe 70
1967 Oceania 71
1972 Africa 47
1972 Americas 62
1972 Asia 57
1972 Europe 71
1972 Oceania 72
1977 Africa 50
1977 Americas 64
1977 Asia 60
1977 Europe 72
1977 Oceania 73
1982 Africa 52
1982 Americas 66
1982 Asia 63
1982 Europe 73
1982 Oceania 74
1987 Africa 53
1987 Americas 68
1987 Asia 65
1987 Europe 74
1987 Oceania 75
1992 Africa 54
1992 Americas 70
1992 Asia 67
1992 Europe 74
1992 Oceania 77
1997 Africa 54
1997 Americas 71
1997 Asia 68
1997 Europe 76
1997 Oceania 78
2002 Africa 53
2002 Americas 72
2002 Asia 69
2002 Europe 77
2002 Oceania 80
2007 Africa 55
2007 Americas 74
2007 Asia 71
2007 Europe 78
2007 Oceania 81
xyplot(meanLifeExp ~ year | continent, meanLifePerYearCont, type = c("p", "r"))

[Figure: mean life expectancy versus year, one panel per continent, points with a fitted regression line]

Aside from Africa, every continent shows a clear, steady increase in mean life expectancy. Africa's mean rises until about 1992, then stalls and dips in 2002 (53), but the last data point (55 in 2007) suggests it may be back on an upward track.
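To put a number on the trend, here is a minimal sketch that fits a straight line per continent and compares the slopes. It uses only the rounded Africa and Europe means copied from the table above; the names `means`, `slopeOf`, `slopes` and `africaLate` are mine, and the slopes are only as precise as the rounded inputs.

```r
# Rounded mean life expectancy copied from the table above (Africa and Europe only)
years <- seq(1952, 2007, by = 5)
means <- data.frame(
  year = rep(years, times = 2),
  continent = rep(c("Africa", "Europe"), each = length(years)),
  meanLifeExp = c(39, 41, 43, 45, 47, 50, 52, 53, 54, 54, 53, 55,
                  64, 67, 69, 70, 71, 72, 73, 74, 74, 76, 77, 78)
)

# Least-squares slope: years of life expectancy gained per calendar year
slopeOf <- function(d) coef(lm(meanLifeExp ~ year, data = d))[["year"]]

# Slope over the whole period, one value per continent
slopes <- sapply(split(means, means$continent), slopeOf)

# Slope for Africa over the last four time points only (1992 to 2007)
africaLate <- slopeOf(subset(means, continent == "Africa" & year >= 1992))

print(round(slopes, 2))
print(round(africaLate, 2))
```

Comparing `africaLate` with the full-period Africa slope shows how much the increase flattened after 1992.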

Look at the spread of GDP per capita within the continents

I'm using Jenny's example from homework 3 (Get the maximum and minimum of GDP per capita for all continents in a “tall” format).

For this next part, let's remove Oceania from the dataset:

gDat <- droplevels(subset(gDat, continent != "Oceania"))
summary(gDat$continent)
##   Africa Americas     Asia   Europe 
##      624      300      396      360
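The `droplevels()` call matters because `subset()` on its own keeps "Oceania" as an empty factor level, which would still show up in summaries and plot panels. A small sketch on a made-up data frame (`toy` is hypothetical, not part of the Gapminder data):

```r
# Hypothetical toy data with a three-level factor
toy <- data.frame(
  continent = factor(c("Africa", "Asia", "Oceania", "Asia")),
  lifeExp = c(50, 60, 80, 65)
)

# Filtering rows does not remove the now-unused level
kept <- subset(toy, continent != "Oceania")
print(levels(kept$continent))

# droplevels() discards factor levels that no longer occur
trimmed <- droplevels(kept)
print(levels(trimmed$continent))
```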

Next, let's apply Jenny's code with year added so we have more to talk about:

gdpMaxMin <- ddply(gDat, ~continent, function(x) {
    # Locate the rows holding the smallest and largest gdpPercap, in that order,
    # so each year lines up with its own statistic
    i <- c(which.min(x$gdpPercap), which.max(x$gdpPercap))
    data.frame(year = x$year[i], gdpPercap = x$gdpPercap[i], stat = c("min", "max"))
})
htmlPrint(gdpMaxMin)
continent year gdpPercap stat
Africa 2002 241 min
Africa 1977 21951 max
Americas 2007 1202 min
Americas 2007 42952 max
Asia 1952 331 min
Asia 1957 113523 max
Europe 1952 974 min
Europe 2007 49357 max
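When pairing each extreme with its year, the order in which the matching rows come back matters. The sketch below uses a made-up data frame (`toy`, `byOrder` and `byIndex` are hypothetical names). It shows that logical indexing returns rows in data order, while `which.min()` and `which.max()` return them in min-then-max order, which is the order `range()` uses.

```r
# Hypothetical toy data: the maximum (row 2) appears before the minimum (row 3)
toy <- data.frame(
  year = c(1952, 1957, 1962, 1967),
  gdpPercap = c(900, 5000, 100, 700)
)

rng <- range(toy$gdpPercap)  # c(min, max)

# Logical indexing keeps data order, so the years come back as (max year, min year)
byOrder <- toy$year[toy$gdpPercap %in% rng]

# Looking up each extreme explicitly keeps the years aligned with c(min, max)
byIndex <- toy$year[c(which.min(toy$gdpPercap), which.max(toy$gdpPercap))]

print(data.frame(stat = c("min", "max"), gdpPercap = rng, byOrder, byIndex))
```

Only the `byIndex` column pairs 1962 with the minimum and 1957 with the maximum.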

Now to apply the actual visualization:

stripplot(gdpPercap ~ continent, gdpMaxMin, groups = year, auto.key = TRUE)

[Figure: strip plot of minimum and maximum GDP per capita by continent, points grouped by year]

Each point is labelled by year as well as by continent. It is striking to see how rich Asia's richest country was in the 1950s, even by the standards of today's (2007) richest countries. Also striking is that the poorest observations for Africa (2002) and the Americas (2007) are recent: the lowest GDP per capita recorded on those continents since 1952 falls near the end of the series.
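One simple way to summarize the spread within each continent is the ratio of the maximum to the minimum GDP per capita. A rough sketch using the rounded values from the table above (the column names `gdpMin`, `gdpMax` and `ratio` are mine, and the ratios inherit the rounding of the inputs):

```r
# Rounded min and max GDP per capita copied from the table above
spread <- data.frame(
  continent = c("Africa", "Americas", "Asia", "Europe"),
  gdpMin = c(241, 1202, 331, 974),
  gdpMax = c(21951, 42952, 113523, 49357)
)

# How many times richer the richest observation is than the poorest
spread$ratio <- spread$gdpMax / spread$gdpMin

# Continents ordered from widest to narrowest spread
print(spread[order(spread$ratio, decreasing = TRUE), ])
```

This is a crude measure, since it compares single extreme country-years rather than whole distributions.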