Assignment 5 - Plotting with ggplot2

library(plyr)
library(ggplot2)
library(lattice)
library(xtable)
html_print <- function(x, ..., digits = 0, include.rownames = FALSE, 
                       header=names(x)){
  # print html table with customized header, 
  # without permanently changing column names of the dataframe
  origHeader <- names(x)
  names(x) <- header
  print(xtable(x, digits = digits, ...), 
        type = 'html', include.rownames = include.rownames)
  names(x) <- origHeader
}
gDat <- read.table('data/gapminderDataFiveYear.txt',
                   sep='\t', quote='"', header=TRUE)
gDat <- droplevels(subset(gDat, continent!='Oceania'))

Historical Life Expectancy by Continents

pExp <- ggplot(gDat, aes(x=year, y=lifeExp, color=continent))
pExp + geom_point(alpha=0.3) + geom_smooth(method='loess')



xyplot(lifeExp ~ year, gDat, groups=continent, 
       type=c('p','smooth'),
       auto.key=TRUE)


The life expectancy trend lines over time are very distinct across continents.
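One rough way to put a number on how distinct the trends are is to fit a straight line per continent and compare the slopes. The sketch below is only an illustration: the data frame `toy` and its values are made up (they are not Gapminder data), and a linear fit is a simplification of the loess curves plotted above.

```r
# Toy data (made-up values): two hypothetical continents, four years each
toy <- data.frame(
  continent = rep(c("A", "B"), each = 4),
  year      = rep(c(1952, 1962, 1972, 1982), times = 2),
  lifeExp   = c(40, 44, 48, 52,  65, 67, 69, 71)
)

# Fit lifeExp ~ year separately for each continent and keep the slope,
# i.e. the average gain in life expectancy per calendar year
slopes <- sapply(split(toy, toy$continent), function(d) {
  coef(lm(lifeExp ~ year, data = d))[["year"]]
})
print(round(slopes, 2))  # A: 0.4, B: 0.2
```

With the real data, the same idea would apply to `gDat` in place of `toy`.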

Similarities

Differences

Proportion of countries with low life expectancy over time by continent

Code in this section is adapted from https://gist.github.com/ArephB/6667983#file-stat545a-2013-hw03_bolandnazar-moh-rmd

The average life expectancy over all years is 59.2623. This value is used as the cutoff for low life expectancy.

# cutoff for low life expectancy: the overall average life expectancy
# (assumed definition; benchmark is not defined elsewhere in this document)
benchmark <- mean(gDat$lifeExp)
# split a vector by a threshold value, and count the lower and upper tier
count_splits <- function(x, threshold){
  tierSizes <- c(lower=sum(x < threshold),
                 upper=sum(x >= threshold))
  return(tierSizes)
}
lifeRatio <- ddply(gDat, ~continent + year, function(x){
  count <- count_splits(x$lifeExp, benchmark)
  ratio <- count/length(x$lifeExp)
  tier <- c('below', 'above')
  return(data.frame(tier, count, ratio))
})
pLowExpR <- ggplot(subset(lifeRatio, tier=='below'), aes(x=year, y=ratio, color=continent))
pLowExpR + geom_point() + geom_line()
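
As a quick check of the tier-splitting logic, here is a minimal sketch on a toy vector. The values and the cutoff of 60 are made up for illustration (the analysis above uses the overall mean instead), and the helper is repeated only so that the snippet runs on its own.

```r
# Toy life expectancies (made-up values, not Gapminder data)
toyLifeExp <- c(45, 52, 61, 70, 78)
toyCutoff  <- 60  # hypothetical cutoff, for illustration only

# Same logic as count_splits above: count the values below the cutoff
# and the values at or above it
count_splits <- function(x, threshold) {
  c(lower = sum(x < threshold), upper = sum(x >= threshold))
}

toyCounts <- count_splits(toyLifeExp, toyCutoff)
toyRatio  <- toyCounts / length(toyLifeExp)
print(toyCounts)  # lower = 2, upper = 3
print(toyRatio)   # lower = 0.4, upper = 0.6
```

The 'below' ratio plotted here corresponds to the `lower` entry of `toyRatio`.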


xyplot(ratio ~ year, lifeRatio, groups=continent, 
       subset=tier=='below',
       auto.key=list(columns=4), 
       type=c('o'), 
       xlab="Year", ylab="Ratio of Low Life Expectancy",
       main="Trends in Low Life Expectancy Ratio in Different Continents")


The bar chart below captures roughly the same information as the one above. But the bar plot provides

The downside is that it takes up more space and ink.

barchart(as.character(year)~count|continent, lifeRatio, groups=tier,
         stack=TRUE,
         auto.key=list(columns=2, title='Life Expectancy Tier'), 
         xlab="Life Expectancy Tier Count", ylab="Year",
         scales=list(x = list(relation="free")),
         main="Trends in Low Life Expectancy Ratio in Different Continents")


The same plot done with ggplot2:

pLowExpR2 <- ggplot(lifeRatio, aes(x=year, y=count, fill=tier))
pLowExpR2 + scale_fill_manual(values = c("below" = "orange","above" = "skyblue")) +
  geom_bar(stat="identity") +
  facet_wrap(~continent, scales='free') 


Similarity

Difference

Historical Trends in GDP per Capita by Most Extreme Case of a Continent

The data aggregation code is modified from Jenny's example.

extremGdp <- ddply(gDat, ~ continent+year, function(x) {
  gdpPercap <- range(x$gdpPercap)
  return(data.frame(gdpPercap, stat = c("min", "max")))
})
#html_print(extremGdp)
xyplot(gdpPercap ~ year|stat, extremGdp, groups=continent,
       type=c('o','g'),
       scales=list(y = list(relation="free")),
       layout=c(1,2),
       xlab='Year', ylab='Extreme GDP per Capita',
       auto.key=list(columns=4))



xyplot(gdpPercap ~ year|continent, extremGdp, groups=stat,
       type=c('o','g'),
       layout=c(4,1),
       xlab='Year', ylab='Extreme GDP per Capita',
       auto.key=list(columns=2))


minGdpPlt <- ggplot(extremGdp, 
                    aes(x=year, y=gdpPercap, color=continent, linetype=stat))
minGdpPlt + geom_point() + geom_line()


I want to show the trends in GDP per capita for the top and bottom countries of each continent, so that I can see both the magnitude of the difference and the steadily widening gap between the two sets of trend lines.

I have not found a simple solution in lattice. But ggplot2 certainly has a neat solution for it.
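
One possible lattice approach, offered only as a sketch, is to group by the combination of continent and statistic using `interaction()`, so that all trend lines share a single panel. The data frame `toyGdp` and its numbers are made up to stand in for `extremGdp`, and the column name `series` is introduced here for illustration. Unlike the ggplot2 version, this does not map colour and line type to separate variables.

```r
library(lattice)

# Toy stand-in for extremGdp (made-up numbers, two hypothetical continents)
toyGdp <- data.frame(
  continent = rep(c("A", "B"), each = 4),
  year      = rep(c(1952, 2007), times = 4),
  stat      = rep(rep(c("min", "max"), each = 2), times = 2),
  gdpPercap = c(300, 500, 5000, 20000, 400, 900, 10000, 40000)
)

# One group per continent/stat pair, so every trend line lands in one panel
toyGdp$series <- interaction(toyGdp$continent, toyGdp$stat)

p <- xyplot(gdpPercap ~ year, toyGdp, groups = series,
            type = 'o', auto.key = list(columns = 2),
            scales = list(y = list(log = 10)))
print(p)
```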

Difference

Other plots

Here the distribution of GDP per capita by continent is plotted. GDP per capita in Africa, Europe and the Americas have distinct means. GDP per capita of Asian countries seems the most variable.

ggplot(gDat, aes(x=gdpPercap, fill=continent)) + 
  geom_density(alpha=0.3) + 
  scale_x_log10()
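
The visual impression of centre and spread can be backed up with a per-group summary on the same log10 scale used by the plot. The sketch below uses a made-up data frame `toy` (not Gapminder data) with two hypothetical continents; with the real data, `gDat` would take its place.

```r
# Toy data (made-up values): continent A is spread widely, B is tight
toy <- data.frame(
  continent = rep(c("A", "B"), each = 3),
  gdpPercap = c(100, 1000, 10000, 900, 1000, 1100)
)

# Work on the log10 scale, matching scale_x_log10() in the density plot
toy$logGdp <- log10(toy$gdpPercap)

# Mean and standard deviation of log10 GDP per capita by continent
means <- tapply(toy$logGdp, toy$continent, mean)
sds   <- tapply(toy$logGdp, toy$continent, sd)
print(round(means, 2))
print(round(sds, 2))
```

A larger standard deviation on the log scale corresponds to a wider density curve in the plot.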
