library(plyr)
library(lattice)
library(xtable)
html_print <- function(x, ..., digits = 0, include.rownames = FALSE,
                       header = names(x)){
  # print html table with customized header,
  # without permanently changing column names of the dataframe
  origHeader <- names(x)
  names(x) <- header
  print(xtable(x, digits = digits, ...),
        type = 'html', include.rownames = include.rownames)
  names(x) <- origHeader
}
gDat <- read.table('data/gapminderDataFiveYear.txt',
                   sep='\t', quote='"', header=TRUE)
gDat <- droplevels(subset(gDat, continent!='Oceania'))
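
The comment in `html_print` about not permanently changing column names rests on R's copy-on-modify behaviour: renaming columns inside a function only affects the function's local copy. A minimal base-R sketch of that behaviour, using a hypothetical helper `rename_locally` and toy data that are not part of the original analysis:

```r
# Hypothetical helper (not from the analysis): renames columns on its
# local copy of x and returns the names that would be displayed.
rename_locally <- function(x, header = names(x)) {
  names(x) <- header   # modifies only the local copy of x
  names(x)
}

toy <- data.frame(a = 1:2, b = c("u", "v"))   # toy data, for illustration only
shown <- rename_locally(toy, header = c("Alpha", "Beta"))

shown        # names used for display
names(toy)   # the caller's data frame keeps its original names
```

Because the caller's data frame is never touched, restoring the original names at the end of such a function is a safeguard, not a requirement.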
The average life expectancy over all country-year observations in the data is 59.2623. This value is used as the cutoff for low life expectancy countries.
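# cutoff for low life expectancy
# (assumed definition: `benchmark` is not defined in the visible code;
#  the text gives the mean life expectancy, 59.2623, as the cutoff)
benchmark <- mean(gDat$lifeExp)
# split a vector by a threshold value, and count the lower and upper tier
count_splits <- function(values, threshold){
  tierSizes <- c(lower = sum(values < threshold),
                 upper = sum(values >= threshold))
  return(tierSizes)
}
# per continent and year: count and ratio of countries in each tier
lifeRatio <- ddply(gDat, ~continent + year, function(x){
  count <- count_splits(x$lifeExp, benchmark)
  ratio <- count/length(x$lifeExp)
  tier <- c('low', 'high')
  return(data.frame(tier, count, ratio))
})
html_print(head(lifeRatio),
           digits=c(0,0,0,0,0,2),
           header=c("continent", "year", "tier", "count", "ratio"))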
| continent | year | tier | count | ratio |
|---|---|---|---|---|
| Africa | 1952 | low | 52 | 1.00 |
| Africa | 1952 | high | 0 | 0.00 |
| Africa | 1957 | low | 52 | 1.00 |
| Africa | 1957 | high | 0 | 0.00 |
| Africa | 1962 | low | 51 | 0.98 |
| Africa | 1962 | high | 1 | 0.02 |
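
To make the tier counts and ratios in this table concrete, here is a small worked example on made-up values. It repeats the logic of `count_splits` so that it runs on its own; the vector `toy_life_exp` is hypothetical and only the cutoff is taken from the text.

```r
# Same logic as count_splits, repeated so the sketch is self-contained.
count_splits <- function(values, threshold) {
  c(lower = sum(values < threshold), upper = sum(values >= threshold))
}

toy_life_exp <- c(45, 52, 59.5, 61, 70)   # made-up values, not Gapminder data
toy_cutoff   <- 59.2623                    # cutoff quoted in the text

counts <- count_splits(toy_life_exp, toy_cutoff)
ratios <- counts / length(toy_life_exp)

counts        # number of values below / at or above the cutoff
ratios        # the same counts as shares of the group
sum(ratios)   # the two shares always add up to 1
```

Within any continent and year, the `low` and `high` ratios are complementary, which is why plotting only the `low` tier loses no information.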
The proportion of low life expectancy countries (LLEC) dropped dramatically on every continent over the last half century. Having all trend lines in the same panel helps with comparison.
Europe started with the smallest proportion of LLEC; every other continent started with a much higher proportion. Asia and the Americas show the largest declines, and their proportions of LLEC approach 0 by 2007, converging with Europe's. Africa, despite a decline in its proportion of LLEC, still has a large share of countries under our definition of low life expectancy.
However, it is noteworthy that the plot does not capture the full extent of the increase in life expectancy, because countries that underwent a significant increase in life expectancy may still fall within the low life expectancy tier.
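
The caveat about the threshold can be illustrated with a toy case. In the sketch below, the three countries and their values are invented for illustration; only the cutoff comes from the text. Every country improves, yet the LLEC ratio does not move.

```r
# Made-up life expectancies for three hypothetical countries in two years.
cutoff <- 59.2623
early  <- c(A = 35, B = 40, C = 62)
late   <- c(A = 50, B = 58, C = 70)

# Share of values that fall below the threshold.
low_ratio <- function(values, threshold) mean(values < threshold)

ratio_early <- low_ratio(early, cutoff)
ratio_late  <- low_ratio(late, cutoff)
mean_gain   <- mean(late - early)

ratio_early   # A and B are below the cutoff
ratio_late    # A and B are still below the cutoff, so the ratio is unchanged
mean_gain     # yet the average life expectancy rose substantially
```

A tier-based ratio only registers a country when it crosses the cutoff, so gains made entirely below (or above) the cutoff are invisible in the plot.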
xyplot(ratio ~ year, lifeRatio, groups=continent,
       subset=tier=='low',
       auto.key=list(columns=4),
       type=c('o'),
       xlab="Year", ylab="Ratio of Low Life Expectancy",
       main="Trends in Low Life Expectancy Ratio in Different Continents")
This plot captures roughly the same information as the one above. But the bar plot also shows the absolute number of countries in each tier, not only the ratio.
The downside is that it takes more space and ink.
barchart(as.character(year)~count|continent, lifeRatio, groups=tier,
         stack=TRUE,
         auto.key=list(columns=2, title='Life Expectancy Tier'),
         xlab="Life Expectancy Tier Count", ylab="Year",
         scales=list(x = list(relation="free")),
         main="Trends in Low Life Expectancy Ratio in Different Continents")
# per continent and year: lowest and highest GDP per capita
extremGdp <- ddply(gDat, ~ continent+year, function(x) {
  gdpPercap <- range(x$gdpPercap)   # range() returns c(min, max)
  return(data.frame(gdpPercap, stat = c("min", "max")))
})
html_print(extremGdp)
| continent | year | gdpPercap | stat |
|---|---|---|---|
| Africa | 1952 | 299 | min |
| Africa | 1952 | 4725 | max |
| Africa | 1957 | 336 | min |
| Africa | 1957 | 5487 | max |
| Africa | 1962 | 355 | min |
| Africa | 1962 | 6757 | max |
| Africa | 1967 | 413 | min |
| Africa | 1967 | 18773 | max |
| Africa | 1972 | 464 | min |
| Africa | 1972 | 21011 | max |
| Africa | 1977 | 502 | min |
| Africa | 1977 | 21951 | max |
| Africa | 1982 | 462 | min |
| Africa | 1982 | 17364 | max |
| Africa | 1987 | 390 | min |
| Africa | 1987 | 11864 | max |
| Africa | 1992 | 411 | min |
| Africa | 1992 | 13522 | max |
| Africa | 1997 | 312 | min |
| Africa | 1997 | 14723 | max |
| Africa | 2002 | 241 | min |
| Africa | 2002 | 12522 | max |
| Africa | 2007 | 278 | min |
| Africa | 2007 | 13206 | max |
| Americas | 1952 | 1398 | min |
| Americas | 1952 | 13990 | max |
| Americas | 1957 | 1544 | min |
| Americas | 1957 | 14847 | max |
| Americas | 1962 | 1662 | min |
| Americas | 1962 | 16173 | max |
| Americas | 1967 | 1452 | min |
| Americas | 1967 | 19530 | max |
| Americas | 1972 | 1654 | min |
| Americas | 1972 | 21806 | max |
| Americas | 1977 | 1874 | min |
| Americas | 1977 | 24073 | max |
| Americas | 1982 | 2011 | min |
| Americas | 1982 | 25010 | max |
| Americas | 1987 | 1823 | min |
| Americas | 1987 | 29884 | max |
| Americas | 1992 | 1456 | min |
| Americas | 1992 | 32004 | max |
| Americas | 1997 | 1342 | min |
| Americas | 1997 | 35767 | max |
| Americas | 2002 | 1270 | min |
| Americas | 2002 | 39097 | max |
| Americas | 2007 | 1202 | min |
| Americas | 2007 | 42952 | max |
| Asia | 1952 | 331 | min |
| Asia | 1952 | 108382 | max |
| Asia | 1957 | 350 | min |
| Asia | 1957 | 113523 | max |
| Asia | 1962 | 388 | min |
| Asia | 1962 | 95458 | max |
| Asia | 1967 | 349 | min |
| Asia | 1967 | 80895 | max |
| Asia | 1972 | 357 | min |
| Asia | 1972 | 109348 | max |
| Asia | 1977 | 371 | min |
| Asia | 1977 | 59265 | max |
| Asia | 1982 | 424 | min |
| Asia | 1982 | 33693 | max |
| Asia | 1987 | 385 | min |
| Asia | 1987 | 28118 | max |
| Asia | 1992 | 347 | min |
| Asia | 1992 | 34933 | max |
| Asia | 1997 | 415 | min |
| Asia | 1997 | 40301 | max |
| Asia | 2002 | 611 | min |
| Asia | 2002 | 36023 | max |
| Asia | 2007 | 944 | min |
| Asia | 2007 | 47307 | max |
| Europe | 1952 | 974 | min |
| Europe | 1952 | 14734 | max |
| Europe | 1957 | 1354 | min |
| Europe | 1957 | 17909 | max |
| Europe | 1962 | 1710 | min |
| Europe | 1962 | 20431 | max |
| Europe | 1967 | 2172 | min |
| Europe | 1967 | 22966 | max |
| Europe | 1972 | 2860 | min |
| Europe | 1972 | 27195 | max |
| Europe | 1977 | 3528 | min |
| Europe | 1977 | 26982 | max |
| Europe | 1982 | 3631 | min |
| Europe | 1982 | 28398 | max |
| Europe | 1987 | 3739 | min |
| Europe | 1987 | 31541 | max |
| Europe | 1992 | 2497 | min |
| Europe | 1992 | 33966 | max |
| Europe | 1997 | 3193 | min |
| Europe | 1997 | 41283 | max |
| Europe | 2002 | 4604 | min |
| Europe | 2002 | 44684 | max |
| Europe | 2007 | 5937 | min |
| Europe | 2007 | 49357 | max |
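
The aggregation behind this table depends on `range()` returning the minimum first and the maximum second, which is what lets the `stat` labels line up with the values. A quick check on made-up numbers (the vector `toy_gdp` is hypothetical):

```r
toy_gdp <- c(4725, 299, 1200)   # made-up values in arbitrary order

# range() yields c(min, max), so the labels can be attached positionally.
extremes <- data.frame(gdpPercap = range(toy_gdp), stat = c("min", "max"))
extremes
```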
xyplot(gdpPercap ~ year|stat, extremGdp, groups=continent,
       type=c('o','g'),
       scales=list(y = list(relation="free")),
       layout=c(1,2),
       xlab='Year', ylab='Extreme GDP per Capita',
       auto.key=list(columns=4))
xyplot(gdpPercap ~ year|continent, extremGdp, groups=stat,
       type=c('o','g'),
       layout=c(4,1),
       xlab='Year', ylab='Extreme GDP per Capita',
       auto.key=list(columns=2))
I am having a hard time figuring out how to present this aggregation result. The two plots succeed at capturing different aspects of the result, but both have severe drawbacks.
The former draws attention to the converging GDP per capita in the top countries of all continents and to the unusually large increase in GDP per capita in the bottom European country (against the lack of change in the bottom countries of all other continents). But it does not draw attention to the large gap that still exists between the top and bottom countries on every continent. The other plot is exactly the opposite in what it captures and what it misses.
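
One possible way to bring the gap itself into view, offered only as a sketch and not as part of the original analysis, is to reshape the long min/max table into one row per continent and year and compute the ratio of the two extremes. The rows below are copied from the rounded values printed in the table above; the names `toy_extremes` and `gap` are hypothetical.

```r
# A few rows taken from the printed table (long format: one row per extreme).
toy_extremes <- data.frame(
  continent = c("Africa", "Africa", "Europe", "Europe"),
  year      = c(1952, 1952, 1952, 1952),
  gdpPercap = c(299, 4725, 974, 14734),
  stat      = c("min", "max", "min", "max")
)

# Separate the two extremes and join them side by side.
mins <- subset(toy_extremes, stat == "min", select = c(continent, year, gdpPercap))
maxs <- subset(toy_extremes, stat == "max", select = c(continent, year, gdpPercap))
gap  <- merge(mins, maxs, by = c("continent", "year"),
              suffixes = c(".min", ".max"))

# Ratio of the richest to the poorest country within each continent and year.
gap$ratio <- gap$gdpPercap.max / gap$gdpPercap.min
gap
```

A single derived quantity per continent and year could then be drawn as one trend line per continent, at the cost of hiding the absolute levels that the two existing plots show.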