First, let's import the Gapminder data.
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL, stringsAsFactors = TRUE)
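Later steps such as `droplevels()` assume `continent` is a factor. Since R 4.0.0, `read.delim()` defaults to `stringsAsFactors = FALSE`, so text columns arrive as character vectors unless you ask otherwise. The minimal sketch below reads a tiny, made-up tab-delimited sample (hypothetical values in a Gapminder-like layout, not the real file) from a string to show the difference.

```r
# Tiny hypothetical sample in a Gapminder-like, tab-delimited layout (not real data)
txt <- paste(
  "country\tyear\tcontinent\tlifeExp",
  "A\t1952\tAsia\t30.1",
  "B\t1952\tEurope\t65.3",
  sep = "\n"
)

# read.delim() expects a header row and tab separators; text = reads from a string
asDefault <- read.delim(text = txt)
asFactor  <- read.delim(text = txt, stringsAsFactors = TRUE)

class(asDefault$continent)  # "character" in R >= 4.0.0
class(asFactor$continent)   # "factor"
levels(asFactor$continent)  # "Asia" "Europe"
```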
Then load the packages used below.
library(plyr)
library(lattice)
library(xtable)
library(ggplot2)
Since Oceania contains only two countries, we remove its observations from the dataset.
iDat <- droplevels(subset(gDat, continent != "Oceania"))
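Why `droplevels()`? `subset()` removes the Oceania rows but leaves "Oceania" behind as an empty factor level, which would still appear in tables and plot legends. A small sketch with hypothetical toy data (generic country names and made-up values):

```r
# Hypothetical toy data standing in for gDat (made-up values, illustration only)
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  continent = factor(c("Africa", "Americas", "Oceania", "Oceania")),
  lifeExp   = c(54.1, 71.4, 81.2, 80.2)
)

# subset() drops the rows but keeps "Oceania" as an (empty) factor level
kept <- subset(toy, continent != "Oceania")
levels(kept$continent)     # "Africa" "Americas" "Oceania"

# droplevels() removes levels that no longer occur in the data
trimmed <- droplevels(kept)
levels(trimmed$continent)  # "Africa" "Americas"
```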
Next, let's revisit one of the previous data aggregation tasks: how the median life expectancy of each continent changes over time.
lifeyearcont <- daply(iDat, ~continent + year, function(x) median(x$lifeExp))
lifeyearcont <- as.data.frame(lifeyearcont)
The resulting table looks like this:
lifeyearcont <- xtable(lifeyearcont)
print(lifeyearcont, type = "html", include.rownames = TRUE)
| | 1952 | 1957 | 1962 | 1967 | 1972 | 1977 | 1982 | 1987 | 1992 | 1997 | 2002 | 2007 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Africa | 38.83 | 40.59 | 42.63 | 44.70 | 47.03 | 49.27 | 50.76 | 51.64 | 52.43 | 52.76 | 51.24 | 52.93 |
| Americas | 54.74 | 56.07 | 58.30 | 60.52 | 63.44 | 66.35 | 67.41 | 69.50 | 69.86 | 72.15 | 72.05 | 72.90 |
| Asia | 44.87 | 48.28 | 49.33 | 53.66 | 56.95 | 60.77 | 63.74 | 66.30 | 68.69 | 70.27 | 71.03 | 72.40 |
| Europe | 65.90 | 67.65 | 69.53 | 70.61 | 70.89 | 72.34 | 73.49 | 74.81 | 75.45 | 76.12 | 77.54 | 78.61 |
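For comparison, the same continent-by-year medians can be computed in base R with `tapply()` instead of plyr's `daply()`. This is a swapped-in technique, sketched here on hypothetical toy data rather than the Gapminder file:

```r
# Hypothetical toy data: two continents, two years, two countries per cell
toy <- data.frame(
  continent = factor(rep(c("Africa", "Europe"), each = 4)),
  year      = rep(c(1952, 1952, 2007, 2007), times = 2),
  lifeExp   = c(38, 40, 52, 54, 65, 67, 78, 80)
)

# tapply() splits lifeExp by every continent/year combination and applies median(),
# giving a matrix with continents as rows and years as columns
medTable <- with(toy, tapply(lifeExp, list(continent, year), median))

# Convert to a data frame, as done with the daply() result
medDf <- as.data.frame(medTable)
medDf
```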
The plots below show how life expectancy changes over time for each continent; the lattice stripplot also connects the mean life expectancy of each continent in each year.
ggplot(gDat, aes(x = year, y = lifeExp, colour = continent)) +
geom_point(aes(size = sqrt(pop/pi))) + geom_smooth(method = "loess")
stripplot(lifeExp ~ factor(year), gDat, jitter.data = TRUE, groups = reorder(continent,
lifeExp), type = c("p", "a"), fun = mean, alpha = 0.6, grid = "h", main = "Life expectancy, mean",
scales = list(x = list(rot = c(45, 0))), auto.key = list(reverse.rows = TRUE,
x = 0.07, y = 0.95, corner = c(0, 1)))
The left graph was made with ggplot2 and the right one with lattice. The ggplot2 version looks nicer, and it also shows each country's population through the size of its circle, with more populous countries drawn larger.
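The stripplot call groups by `reorder(continent, lifeExp)`, which reorders the continent levels by a summary of life expectancy (the mean, by default) instead of alphabetically, presumably so that the legend order follows the data. A toy illustration with hypothetical values:

```r
# Hypothetical toy data; the means are Africa 50, Americas 79, Europe 71
toy <- data.frame(
  continent = factor(c("Africa", "Africa", "Americas", "Americas", "Europe", "Europe")),
  lifeExp   = c(45, 55, 80, 78, 70, 72)
)

# Default level order is alphabetical
levels(toy$continent)        # "Africa" "Americas" "Europe"

# reorder() sorts levels by mean(lifeExp) within each level, lowest first
ordered_cont <- reorder(toy$continent, toy$lifeExp)
levels(ordered_cont)         # "Africa" "Europe" "Americas"

# The per-level means used for the ordering are kept as an attribute
attr(ordered_cont, "scores")
```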
Next, let's compare scatterplots of life expectancy against GDP per capita.
ggplot(gDat, aes(x = gdpPercap, y = lifeExp, color = continent, size = sqrt(pop/pi))) +
geom_point() + scale_x_log10()
xyplot(lifeExp ~ gdpPercap, gDat, scales = list(x = list(log = 10)))
The left graph was made with ggplot2 and the right one with lattice. In the ggplot2 version the points are colored by continent and sized by population, which gives a nicer and more informative view.
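Both ggplot2 calls map point size to `sqrt(pop/pi)`, which is the radius of a circle whose area equals the population. A quick numerical check with hypothetical populations:

```r
# Hypothetical populations (not Gapminder values)
pop <- c(1e6, 4e6, 9e6)

# sqrt(pop / pi) is the radius r of a circle whose area pi * r^2 equals pop
r <- sqrt(pop / pi)
all.equal(pi * r^2, pop)   # TRUE

# Radii grow with the square root of population: 4x the people gives 2x the radius
r / r[1]
```

One design caveat: ggplot2's default `scale_size()` already scales point area rather than radius, so feeding it a radius makes circle area grow roughly with the square root of population. Mapping `size = pop` directly, or using `scale_size_area()`, would make area track population more closely.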