First, let's import the Gapminder data.
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
And load these packages before writing functions.
library(plyr)
library(lattice)
library(xtable)
Since Oceania contains only two countries, we remove its observations from the dataset and drop the now-unused factor level.
iDat <- droplevels(subset(gDat, continent != "Oceania"))
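To see why `droplevels()` is wrapped around `subset()`, here is a minimal sketch on a made-up data frame. The `toy` object and its values are hypothetical stand-ins for `gDat`, not real Gapminder figures.

```r
# Toy stand-in for gDat (hypothetical values, not the real Gapminder data)
toy <- data.frame(
  country   = c("Australia", "New Zealand", "Canada", "Japan"),
  continent = factor(c("Oceania", "Oceania", "Americas", "Asia")),
  lifeExp   = c(81.2, 80.2, 80.7, 82.6)
)

# subset() removes the rows but keeps "Oceania" as an empty factor level
kept <- subset(toy, continent != "Oceania")
levels(kept$continent)

# droplevels() discards factor levels that no longer occur in the data
trimmed <- droplevels(kept)
levels(trimmed$continent)
```

Without `droplevels()`, the empty level would likely still show up as a blank group in per-continent summaries and as an empty panel in lattice plots.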
Then, let's explore the following data aggregation tasks.
We want to get the maximum and minimum GDP per capita for each continent.
minmaxgdpcont <- ddply(iDat, ~continent, summarize, gdpPercap = range(gdpPercap),
stat = c("min", "max"))
The table looks like this. It has three columns: the first gives the continent, the second gives a GDP per capita value, and the third indicates whether that value is the continent's minimum or maximum.
minmaxgdpcont <- xtable(minmaxgdpcont)
print(minmaxgdpcont, type = "html", include.rownames = FALSE)
| continent | gdpPercap | stat |
|---|---|---|
| Africa | 241.17 | min |
| Africa | 21951.21 | max |
| Americas | 1201.64 | min |
| Americas | 42951.65 | max |
| Asia | 331.00 | min |
| Asia | 113523.13 | max |
| Europe | 973.53 | min |
| Europe | 49357.19 | max |
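As a cross-check on the long min/max layout, the same shape can be built in base R without plyr. This is only a sketch on a made-up data frame; `toy`, `rng`, and `long` are hypothetical names and the numbers are invented.

```r
# Toy data (hypothetical values) with the two columns the summary needs
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Asia"),
  gdpPercap = c(500, 1200, 300, 4000, 900)
)

# range() returns c(min, max); tapply() applies it within each continent
rng <- tapply(toy$gdpPercap, toy$continent, range)

# Stack the per-continent pairs into two rows per continent,
# labelled in the same order that range() returns them
long <- data.frame(
  continent = rep(names(rng), each = 2),
  gdpPercap = unlist(rng, use.names = FALSE),
  stat      = rep(c("min", "max"), times = length(rng))
)
long
```

The `stat` labels are only correct because `range()` always returns the minimum first; the `ddply()` call relies on the same ordering.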
The following graph shows how the distribution of GDP per capita (including its min and max) changes over time for each continent.
bwplot(gdpPercap ~ as.factor(year) | continent, iDat)
We want to know how the median life expectancy of each continent changes over time.
lifeyearcont <- daply(iDat, ~continent + year, summarize, medLifeExp = median(lifeExp))
lifeyearcont <- as.data.frame(lifeyearcont)
And the table is like this.
lifeyearcont <- xtable(lifeyearcont)
print(lifeyearcont, type = "html", include.rownames = TRUE)
| continent | 1952 | 1957 | 1962 | 1967 | 1972 | 1977 | 1982 | 1987 | 1992 | 1997 | 2002 | 2007 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Africa | 38.83 | 40.59 | 42.63 | 44.70 | 47.03 | 49.27 | 50.76 | 51.64 | 52.43 | 52.76 | 51.24 | 52.93 |
| Americas | 54.74 | 56.07 | 58.30 | 60.52 | 63.44 | 66.35 | 67.41 | 69.50 | 69.86 | 72.15 | 72.05 | 72.90 |
| Asia | 44.87 | 48.28 | 49.33 | 53.66 | 56.95 | 60.77 | 63.74 | 66.30 | 68.69 | 70.27 | 71.03 | 72.40 |
| Europe | 65.90 | 67.65 | 69.53 | 70.61 | 70.89 | 72.34 | 73.49 | 74.81 | 75.45 | 76.12 | 77.54 | 78.61 |
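The wide continent-by-year layout can also be sketched in base R with `tapply()`, which returns a matrix when given two grouping variables. The data below are made up for illustration; `toy` and `med` are hypothetical names.

```r
# Toy data (hypothetical values): a few countries per continent and year
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Asia", "Asia", "Asia"),
  year      = c(1952, 1952, 1957, 1952, 1957, 1957),
  lifeExp   = c(38, 42, 44, 45, 50, 54)
)

# Two grouping variables give a matrix: continents in rows, years in columns
med <- tapply(toy$lifeExp, list(toy$continent, toy$year), median)
med
```

Each cell holds the median over the countries in that continent and year, which is the same quantity the `daply()` call computes.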
The graph shows how life expectancy changes over time for each continent. The connecting line traces the median in each year.
stripplot(lifeExp ~ as.factor(year) | continent, iDat, grid = "h", type = c("p",
"a"), fun = median)
We define low life expectancy as 40 years or less. We want to get the number of countries with low life expectancy in each continent over time.
countyearcont <- daply(iDat, ~continent + year, summarize, lowLifeExp = sum(lifeExp <=
40))
countyearcont <- as.data.frame(countyearcont)
The table is like this.
countyearcont <- xtable(countyearcont)
print(countyearcont, type = "html", include.rownames = TRUE)
| continent | 1952 | 1957 | 1962 | 1967 | 1972 | 1977 | 1982 | 1987 | 1992 | 1997 | 2002 | 2007 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Africa | 30.00 | 23.00 | 15.00 | 10.00 | 6.00 | 3.00 | 3.00 | 1.00 | 3.00 | 2.00 | 2.00 | 1.00 |
| Americas | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Asia | 10.00 | 5.00 | 3.00 | 2.00 | 2.00 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Europe | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
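The counting step works because summing a logical vector counts its `TRUE` values. A minimal base R sketch on made-up data follows; `toy`, `threshold`, `low`, and `low_strict` are hypothetical names, and the sketch assumes each country appears once per year, as in Gapminder.

```r
# Toy data (hypothetical values); one row per country and year
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Asia", "Asia", "Europe"),
  year      = c(1952, 1952, 1957, 1952, 1957, 1952),
  lifeExp   = c(38, 40, 44, 39.5, 50, 66)
)

threshold <- 40

# The comparison yields TRUE/FALSE; sum() counts the TRUEs within each cell
low <- tapply(toy$lifeExp <= threshold, list(toy$continent, toy$year), sum)

# Continent-year cells with no observations come back as NA; treat them as 0
low[is.na(low)] <- 0
low

# A strict comparison excludes countries sitting exactly on the threshold
low_strict <- tapply(toy$lifeExp < threshold, list(toy$continent, toy$year), sum)
low_strict["Africa", "1952"]
```

The choice between `<=` and `<` only matters for values exactly equal to the threshold, but it is worth keeping the code and the stated definition in agreement.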
The first graph shows the distribution of life expectancy over time for each continent. Observations at or below 40 correspond to countries with low life expectancy. The second graph shows the density of life expectancy for each continent, which makes it easy to compare how much of each continent's distribution falls in the low range.
bwplot(lifeExp ~ as.factor(year) | continent, iDat)
densityplot(~lifeExp, iDat, plot.points = FALSE, ref = TRUE, groups = continent,
auto.key = list(columns = nlevels(iDat$continent)))