STAT545 Homework 5

First, let's import the Gapminder data.

gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)

And load these packages before writing functions.

library(plyr)
library(lattice)
library(xtable)
library(ggplot2)

Since there are only two countries in Oceania, we will remove its observations from the dataset.

iDat <- droplevels(subset(gDat, continent != "Oceania"))
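
To see why droplevels() is wrapped around subset() here, consider a minimal sketch on a made-up three-row data frame (the object toy and its values are hypothetical, not Gapminder data). subset() on its own keeps the unused factor level, which could later appear as an empty group in tables and plot legends.

```r
# Hypothetical toy data, used only to illustrate factor levels
toy <- data.frame(
  continent = factor(c("Asia", "Europe", "Oceania")),
  lifeExp   = c(60, 70, 75)
)

# Filtering the rows does not remove the level itself
kept <- subset(toy, continent != "Oceania")
levels(kept$continent)      # "Oceania" is still listed as a level

# droplevels() discards levels that no longer occur in the data
cleaned <- droplevels(kept)
levels(cleaned$continent)   # only "Asia" and "Europe" remain
```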

Then, let's look at one of the previous data aggregation tasks.

We want to know how the life expectancy of each continent changes over time, especially the median.

lifeyearcont <- daply(iDat, ~continent + year, summarize, medLifeExp = median(lifeExp))
lifeyearcont <- as.data.frame(lifeyearcont)

The resulting table looks like this.

lifeyearcont <- xtable(lifeyearcont)
print(lifeyearcont, type = "html", include.rownames = TRUE)
continent   1952   1957   1962   1967   1972   1977   1982   1987   1992   1997   2002   2007
Africa     38.83  40.59  42.63  44.70  47.03  49.27  50.76  51.64  52.43  52.76  51.24  52.93
Americas   54.74  56.07  58.30  60.52  63.44  66.35  67.41  69.50  69.86  72.15  72.05  72.90
Asia       44.87  48.28  49.33  53.66  56.95  60.77  63.74  66.30  68.69  70.27  71.03  72.40
Europe     65.90  67.65  69.53  70.61  70.89  72.34  73.49  74.81  75.45  76.12  77.54  78.61
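
As a rough check on the trend, the first and last columns of the table above can be compared directly. The sketch below retypes the 1952 and 2007 medians from the table into a small matrix (the object names medLife and gain are illustrative) and computes the gain for each continent.

```r
# 1952 and 2007 median life expectancy, copied from the table above
medLife <- matrix(
  c(38.83, 52.93,
    54.74, 72.90,
    44.87, 72.40,
    65.90, 78.61),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Africa", "Americas", "Asia", "Europe"),
                  c("1952", "2007"))
)

# Gain in median life expectancy between 1952 and 2007
gain <- medLife[, "2007"] - medLife[, "1952"]
round(gain, 2)

# Continent with the largest gain
names(which.max(gain))
```

On these figures Asia gains the most (about 27.5 years) and Europe the least (about 12.7 years).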

The plots below show how life expectancy changes over time for each continent. The ggplot2 version adds a smoothed trend for each continent, and the lattice strip plot joins the yearly means with a line.

ggplot(gDat, aes(x = year, y = lifeExp, colour = continent, size = sqrt(pop/pi))) + 
    geom_point() + scale_x_log10() + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method.
stripplot(lifeExp ~ factor(year), gDat, jitter.data = TRUE, group = reorder(continent, 
    lifeExp), type = c("p", "a"), fun = mean, alpha = 0.6, grid = "h", main = paste("Life expectancy, mean "), 
    scales = list(x = list(rot = c(45, 0))), auto.key = list(reverse.rows = TRUE, 
        x = 0.07, y = 0.95, corner = c(0, 1)))

[Figure: life expectancy by year, drawn with ggplot2 (left) and lattice (right)]

The left graph was produced with ggplot2 and the right one with lattice. The ggplot2 graph looks nicer, and it can also show population through the size of each point: larger circles mark more populous countries.

Next, let's look at scatterplots of life expectancy against GDP per capita.

ggplot(gDat, aes(x = gdpPercap, y = lifeExp, color = continent, size = sqrt(pop/pi))) + 
    geom_point() + scale_x_log10()
# keep gdpPercap numeric and use a log x axis, as in the ggplot call
xyplot(lifeExp ~ gdpPercap, gDat, scales = list(x = list(log = 10)))

[Figure: life expectancy against GDP per capita, drawn with ggplot2 (left) and lattice (right)]

The left graph was produced with ggplot2 and the right one with lattice. In the ggplot2 graph the points are colored by continent, which makes it easier to read.
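
lattice can also color points by continent and put the x axis on a log scale, through the groups and scales arguments of xyplot(). A minimal sketch, assuming a small made-up data frame (toy and its values are hypothetical) with the same column names as gDat:

```r
library(lattice)

# Hypothetical toy data with the same column names as gDat
toy <- data.frame(
  continent = factor(c("Africa", "Africa", "Asia", "Asia", "Europe", "Europe")),
  gdpPercap = c(500, 1500, 800, 6000, 9000, 30000),
  lifeExp   = c(42, 52, 50, 68, 70, 79)
)

# groups colors the points by continent;
# scales puts the x axis on a log10 scale
p <- xyplot(lifeExp ~ gdpPercap, toy,
            groups = continent,
            scales = list(x = list(log = 10)),
            auto.key = list(columns = 3),
            xlab = "GDP per capita (log scale)",
            ylab = "Life expectancy")
print(p)
```

Swapping toy for gDat should give a lattice plot that is closer to the ggplot2 one, although population is still not mapped to point size.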