STAT545 Homework 4

First, let's import the Gapminder data.

gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
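# stringsAsFactors = TRUE keeps continent and country as factors on R >= 4.0
gDat <- read.delim(file = gdURL, stringsAsFactors = TRUE)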

And load these packages before writing functions.

library(plyr)
library(lattice)
library(xtable)

Since Oceania contains only two countries, we remove its observations from the dataset.

iDat <- droplevels(subset(gDat, continent != "Oceania"))
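
Filtering rows does not by itself remove a factor level that no longer occurs, which is why the subset is wrapped in droplevels(). A minimal sketch on a made-up factor (the vector below is an illustration, not the Gapminder data):

```r
# Hypothetical continent labels, for illustration only
f <- factor(c("Asia", "Oceania", "Europe", "Asia"))

# Filtering keeps "Oceania" as an (empty) level
kept <- f[f != "Oceania"]
n_before <- nlevels(kept)

# droplevels() discards levels that no longer appear in the data
n_after <- nlevels(droplevels(kept))

print(c(before = n_before, after = n_after))
```

Without this step, an empty Oceania panel or legend entry could still show up in tables and lattice plots.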

Then, let's explore the following data aggregation tasks.

Depict the maximum and minimum of GDP per capita over time by continent

We want to get the maximum and minimum GDP per capita for each continent.

minmaxgdpcont <- ddply(iDat, ~continent, summarize, gdpPercap = range(gdpPercap), 
    stat = c("min", "max"))

The resulting table has three columns: the continent, the GDP per capita, and a label indicating whether the value in the second column is the continent's minimum or maximum.

minmaxgdpcont <- xtable(minmaxgdpcont)
print(minmaxgdpcont, type = "html", include.rownames = FALSE)
continent  gdpPercap  stat
Africa        241.17  min
Africa      21951.21  max
Americas     1201.64  min
Americas    42951.65  max
Asia          331.00  min
Asia       113523.13  max
Europe        973.53  min
Europe      49357.19  max
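
The long layout above (two rows per continent, labelled min and max) can also be sketched in base R, which may help show what the ddply() call is doing. The toy data frame and the name minmax_toy are made up for illustration; they are not part of the original analysis.

```r
# Toy data standing in for iDat (values are invented)
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Asia"),
  gdpPercap = c(500, 1200, 800, 300, 4000)
)

# Split gdpPercap by continent and take range() (min, max) of each piece
ranges <- tapply(toy$gdpPercap, toy$continent, range)

# Combine the pieces into two rows per continent, labelled min and max
minmax_toy <- data.frame(
  continent = rep(names(ranges), each = 2),
  gdpPercap = unlist(ranges, use.names = FALSE),
  stat = rep(c("min", "max"), times = length(ranges))
)
print(minmax_toy)
```

The stat labels line up only because range() always returns the minimum first and the maximum second.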

The following graph shows how the distribution of GDP per capita, including its minimum and maximum, changes over time for each continent.

bwplot(gdpPercap ~ as.factor(year) | continent, iDat)

[Figure: box plots of gdpPercap by year, one panel per continent]

Compare life expectancy over time on different continents

We want to know how the median life expectancy of each continent changes over time.

lifeyearcont <- daply(iDat, ~continent + year, summarize, medLifeExp = median(lifeExp))
lifeyearcont <- as.data.frame(lifeyearcont)

The resulting table has one row per continent and one column per year.

lifeyearcont <- xtable(lifeyearcont)
print(lifeyearcont, type = "html", include.rownames = TRUE)
continent   1952   1957   1962   1967   1972   1977   1982   1987   1992   1997   2002   2007
Africa     38.83  40.59  42.63  44.70  47.03  49.27  50.76  51.64  52.43  52.76  51.24  52.93
Americas   54.74  56.07  58.30  60.52  63.44  66.35  67.41  69.50  69.86  72.15  72.05  72.90
Asia       44.87  48.28  49.33  53.66  56.95  60.77  63.74  66.30  68.69  70.27  71.03  72.40
Europe     65.90  67.65  69.53  70.61  70.89  72.34  73.49  74.81  75.45  76.12  77.54  78.61
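
As a cross-check that does not rely on plyr, the same continent-by-year layout can be sketched in base R with tapply(). The toy data frame and its values are invented for illustration.

```r
# Toy data standing in for iDat (values are invented)
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Asia", "Asia", "Asia"),
  year      = c(1952, 1952, 1957, 1952, 1957, 1957),
  lifeExp   = c(38, 40, 42, 45, 48, 50)
)

# Two grouping variables give a matrix: rows are continents, columns are years
med_toy <- tapply(toy$lifeExp, list(toy$continent, toy$year), median)
print(med_toy)
```

Each cell is the median of the life expectancies that share that continent and year, which is the same quantity the table above reports.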

The graph shows how life expectancy changes over time for each continent, with a line tracing the median.

stripplot(lifeExp ~ as.factor(year) | continent, iDat, grid = "h", type = c("p", 
    "a"), fun = median)

[Figure: strip plots of lifeExp by year, one panel per continent, with a line joining the yearly medians]

Depict the number of countries with low life expectancy over time by continent

We define low life expectancy as 40 years or less. We want to get the number of countries with low life expectancy in each continent over time.

countyearcont <- daply(iDat, ~continent + year, summarize, lowLifeExp = sum(lifeExp <= 
    40))
countyearcont <- as.data.frame(countyearcont)

The resulting table has one row per continent and one column per year.

countyearcont <- xtable(countyearcont)
print(countyearcont, type = "html", include.rownames = TRUE)
continent   1952   1957   1962   1967   1972   1977   1982   1987   1992   1997   2002   2007
Africa     30.00  23.00  15.00  10.00   6.00   3.00   3.00   1.00   3.00   2.00   2.00   1.00
Americas    1.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
Asia       10.00   5.00   3.00   2.00   2.00   2.00   1.00   0.00   0.00   0.00   0.00   0.00
Europe      0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
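
The counts come from summing a logical vector: each TRUE counts as 1. A minimal sketch with made-up values shows how the choice between <= and < decides whether a country at exactly 40 is counted:

```r
# Made-up life expectancies, for illustration only
life <- c(38.5, 40, 40.1, 52.3)

# The comparison yields a logical vector; sum() counts the TRUE values
n_at_or_below <- sum(life <= 40)    # counts 38.5 and 40
n_strictly_below <- sum(life < 40)  # counts only 38.5

print(c(at_or_below = n_at_or_below, strictly_below = n_strictly_below))
```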

The first graph shows the distribution of life expectancy over time for each continent. Values at or below 40 correspond to countries with low life expectancy. The second graph shows the density of life expectancy for each continent, pooled over all years. The area under each curve at or below 40 indicates how common low life expectancy is in that continent.

bwplot(lifeExp ~ as.factor(year) | continent, iDat)

[Figure: box plots of lifeExp by year, one panel per continent]

densityplot(~lifeExp, iDat, plot.points = FALSE, ref = TRUE, group = continent, 
    auto.key = list(columns = nlevels(iDat$continent)))

[Figure: density curves of lifeExp, one curve per continent]