Assignment 4 - Visualization

library(plyr)
library(lattice)
library(xtable)
html_print <- function(x, ..., digits = 0, include.rownames = FALSE, 
                       header=names(x)){
  # print html table with customized header, 
  # without permanently changing column names of the dataframe
  origHeader <- names(x)
  names(x) <- header
  print(xtable(x, digits = digits, ...), 
        type = 'html', include.rownames = include.rownames)
  names(x) <- origHeader
}
gDat <- read.table('data/gapminderDataFiveYear.txt',
                   sep='\t', quote='"', header=TRUE)
gDat <- droplevels(subset(gDat, continent!='Oceania'))

Proportion of countries with low life expectancy over time by continent

Code in this section is adapted from https://gist.github.com/ArephB/6667983#file-stat545a-2013-hw03_bolandnazar-moh-rmd

The average life expectancy over all countries and years in the data is 59.2623. This value is used as the cutoff for low life expectancy countries.

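# cutoff for the low tier: overall mean life expectancy (59.2623, see above)
# NOTE: this definition is assumed; the original chunk does not show it
benchmark <- mean(gDat$lifeExp)
# split a vector by a threshold value, and count the lower and upper tier
count_splits <- function(values, threshold){
  tierSizes <- c(lower=sum(values < threshold),
                 upper=sum(values >= threshold))
  return(tierSizes)
}
# for each continent and year: count and ratio of countries in each tier
lifeRatio <- ddply(gDat, ~continent + year, function(x){
  count <- count_splits(x$lifeExp, benchmark)
  ratio <- count/length(x$lifeExp)
  tier <- c('low', 'high')
  return(data.frame(tier, count, ratio))
})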
html_print(head(lifeRatio), 
           digits=c(0,0,0,0,0,2),
           header=c("continent", "year", "tier", "count", "ratio"))
continent year tier count ratio
Africa 1952 low 52 1.00
Africa 1952 high 0 0.00
Africa 1957 low 52 1.00
Africa 1957 high 0 0.00
Africa 1962 low 51 0.98
Africa 1962 high 1 0.02
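
To see what the tier counting does in isolation, here is a minimal sketch on made-up values. The vector `toy` and the threshold of 60 are illustrative assumptions, not taken from the data; the helper mirrors `count_splits` so the snippet runs on its own.

```r
# Mirrors count_splits above: count values below / at-or-above a threshold
count_splits <- function(values, threshold){
  c(lower = sum(values < threshold),
    upper = sum(values >= threshold))
}

# Hypothetical life expectancies for five countries in one continent-year
toy <- c(45.2, 58.9, 60.0, 71.3, 39.8)

tiers <- count_splits(toy, threshold = 60)  # illustrative cutoff
ratio <- tiers / length(toy)

print(tiers)  # lower = 3, upper = 2
print(ratio)  # lower = 0.6, upper = 0.4
```

Note that a value exactly equal to the threshold lands in the upper tier, because the comparison is `>=`.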

The proportion of low life expectancy countries (LLEC) dropped dramatically on every continent over the last half century. Having all trend lines in the same panel helps with comparison.

Europe started with the smallest proportion of LLEC, and every other continent started with a much higher one. Asia and the Americas show the most significant declines; their proportions of LLEC approach 0 by 2007, converging with Europe's. Africa, despite a decline, still has a large proportion of countries under our definition of low life expectancy.

However, it is noteworthy that the plot does not capture the full extent of the increase in life expectancy, because countries that underwent a significant increase may still fall within the low life expectancy tier.
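
A small made-up example illustrates this limitation. The country names, values, and the cutoff of 60 are all hypothetical: a country can gain many years of life expectancy and still be counted in the low tier.

```r
# Hypothetical data: two countries observed in 1952 and 2007
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1952, 2007, 1952, 2007),
  lifeExp = c(35, 55, 50, 72)
)
cutoff <- 60  # illustrative threshold, not the 59.2623 used in the report

# Gain per country between its first and last observation
# (rows are already ordered by year within each country)
gain <- tapply(toy$lifeExp, toy$country, function(v) v[length(v)] - v[1])

# Is each country still below the cutoff in the final year?
final <- toy[toy$year == 2007, ]
still_low <- setNames(final$lifeExp < cutoff, as.character(final$country))

print(gain)       # A gains 20 years, B gains 22
print(still_low)  # A is still in the low tier despite its gain
```

Both countries improve by a similar amount, but a tier-based plot would only register the change for country B.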

xyplot(ratio ~ year, lifeRatio, groups=continent, 
       subset=tier=='low',
       auto.key=list(columns=4), 
       type='o', 
       xlab="Year", ylab="Ratio of Low Life Expectancy",
       main="Trends in Low Life Expectancy Ratio in Different Continents")

plot of chunk unnamed-chunk-7

This plot captures roughly the same information as the one above, but the bar plot also shows the number of countries in each tier.

The downside is that it costs more space and ink.

barchart(as.character(year)~count|continent, lifeRatio, groups=tier,
         stack=TRUE,
         auto.key=list(columns=2, title='Life Expectancy Tier'), 
         xlab="Life Expectancy Tier Count", ylab="Year",
         scales=list(x = list(relation="free")),
         main="Trends in Low Life Expectancy Ratio in Different Continents")

plot of chunk unnamed-chunk-8

Historical Trends in GDP per Capita by Most Extreme Case of a Continent

Code is modified from Jenny's example.

extremGdp <- ddply(gDat, ~ continent+year, function(x) {
  gdpPercap <- range(x$gdpPercap)
  return(data.frame(gdpPercap, stat = c("min", "max")))
})
html_print(extremGdp)
continent year gdpPercap stat
Africa 1952 299 min
Africa 1952 4725 max
Africa 1957 336 min
Africa 1957 5487 max
Africa 1962 355 min
Africa 1962 6757 max
Africa 1967 413 min
Africa 1967 18773 max
Africa 1972 464 min
Africa 1972 21011 max
Africa 1977 502 min
Africa 1977 21951 max
Africa 1982 462 min
Africa 1982 17364 max
Africa 1987 390 min
Africa 1987 11864 max
Africa 1992 411 min
Africa 1992 13522 max
Africa 1997 312 min
Africa 1997 14723 max
Africa 2002 241 min
Africa 2002 12522 max
Africa 2007 278 min
Africa 2007 13206 max
Americas 1952 1398 min
Americas 1952 13990 max
Americas 1957 1544 min
Americas 1957 14847 max
Americas 1962 1662 min
Americas 1962 16173 max
Americas 1967 1452 min
Americas 1967 19530 max
Americas 1972 1654 min
Americas 1972 21806 max
Americas 1977 1874 min
Americas 1977 24073 max
Americas 1982 2011 min
Americas 1982 25010 max
Americas 1987 1823 min
Americas 1987 29884 max
Americas 1992 1456 min
Americas 1992 32004 max
Americas 1997 1342 min
Americas 1997 35767 max
Americas 2002 1270 min
Americas 2002 39097 max
Americas 2007 1202 min
Americas 2007 42952 max
Asia 1952 331 min
Asia 1952 108382 max
Asia 1957 350 min
Asia 1957 113523 max
Asia 1962 388 min
Asia 1962 95458 max
Asia 1967 349 min
Asia 1967 80895 max
Asia 1972 357 min
Asia 1972 109348 max
Asia 1977 371 min
Asia 1977 59265 max
Asia 1982 424 min
Asia 1982 33693 max
Asia 1987 385 min
Asia 1987 28118 max
Asia 1992 347 min
Asia 1992 34933 max
Asia 1997 415 min
Asia 1997 40301 max
Asia 2002 611 min
Asia 2002 36023 max
Asia 2007 944 min
Asia 2007 47307 max
Europe 1952 974 min
Europe 1952 14734 max
Europe 1957 1354 min
Europe 1957 17909 max
Europe 1962 1710 min
Europe 1962 20431 max
Europe 1967 2172 min
Europe 1967 22966 max
Europe 1972 2860 min
Europe 1972 27195 max
Europe 1977 3528 min
Europe 1977 26982 max
Europe 1982 3631 min
Europe 1982 28398 max
Europe 1987 3739 min
Europe 1987 31541 max
Europe 1992 2497 min
Europe 1992 33966 max
Europe 1997 3193 min
Europe 1997 41283 max
Europe 2002 4604 min
Europe 2002 44684 max
Europe 2007 5937 min
Europe 2007 49357 max
xyplot(gdpPercap ~ year|stat, extremGdp, groups=continent,
       type=c('o','g'),
       scales=list(y = list(relation="free")),
       layout=c(1,2),
       xlab='Year', ylab='Extreme GDP per Capita',
       auto.key=list(columns=4))

plot of chunk unnamed-chunk-10

xyplot(gdpPercap ~ year|continent, extremGdp, groups=stat,
       type=c('o','g'),
       layout=c(4,1),
       xlab='Year', ylab='Extreme GDP per Capita',
       auto.key=list(columns=2))

plot of chunk unnamed-chunk-11

I am having a hard time figuring out how to present this aggregation result. The two plots succeed at capturing different aspects of the result, but both have severe drawbacks.

The former draws attention to the converging GDP per capita of the top countries of all continents, and to the unusually large increase in GDP per capita of the bottom European country (against the lack of change in the bottom countries of all other continents). But it does not draw attention to the large gap that still exists between the top and bottom countries on every continent. The other plot is the exact opposite in what it captures and what it does not.
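
One possible way to put the gap itself on display, offered only as a sketch and not as part of the original analysis, is to reduce each continent-year pair to a single max-to-min ratio. The values below are the Africa and Europe rows for 1952 and 2007 copied from the table above; the names `extremes` and `gapRatio` are made up for illustration.

```r
# Rows copied from the extremGdp table above (rounded values)
extremes <- data.frame(
  continent = rep(c("Africa", "Europe"), each = 4),
  year      = rep(c(1952, 1952, 2007, 2007), times = 2),
  gdpPercap = c(299, 4725, 278, 13206, 974, 14734, 5937, 49357),
  stat      = rep(c("min", "max"), times = 4)
)

# Min and max rows come in the same continent/year order, so they line up
mins <- extremes[extremes$stat == "min", ]
maxs <- extremes[extremes$stat == "max", ]

# One row per continent and year: how many times larger the max is than the min
gapRatio <- data.frame(
  continent = mins$continent,
  year      = mins$year,
  ratio     = maxs$gdpPercap / mins$gdpPercap
)
print(gapRatio)
```

A single series per continent like this could then be drawn in one panel, for example with `xyplot(ratio ~ year, gapRatio, groups = continent, type = 'o')`, at the cost of hiding the absolute GDP levels.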