STAT 545A Homework 5

In this report, we continue exploring how to make awesome plots, this time using the ggplot2 package instead of lattice. We will use ggplot2 to reproduce graphics that we drew with lattice in our previous report. As in Homework 4, we download the Gapminder data from the repository and load the plyr, xtable, lattice and ggplot2 packages:

gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
require(plyr)
## Loading required package: plyr
require(xtable)
## Loading required package: xtable
require(lattice)
## Loading required package: lattice
require(ggplot2)
## Loading required package: ggplot2
str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

For the rest of this report, we drop Oceania, which contains only two countries, from the dataset.

noODat <- droplevels(subset(gDat, continent != "Oceania"))
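
As a small aside on why droplevels is wrapped around subset: subsetting removes the rows but keeps the unused factor level, which would otherwise show up as an empty group or panel later on. The sketch below uses a made-up four-element factor (not the Gapminder data) to show the difference.

```r
# Toy factor standing in for the continent column (values are illustrative only)
continent <- factor(c("Africa", "Asia", "Oceania", "Asia"))
kept <- subset(data.frame(continent), continent != "Oceania")

nlevels(kept$continent)              # 3: "Oceania" is still a level, with zero rows
nlevels(droplevels(kept)$continent)  # 2: the unused level has been dropped
```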

Let us look at the number and proportion of countries with low life expectancy over the years, by continent, and visualize the proportion using lattice. Note that the benchmark for low life expectancy is our own arbitrary choice:

bMark <- 45
tmp <- ddply(noODat, ~continent + year, function(x) {
    jCount = sum(x$lifeExp <= bMark)
    c(count = jCount, prop = jCount/nrow(x))
})
xyplot(x = prop ~ year | continent, data = tmp, pch = c(8), type = c("p", "r"))

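
To make the summary step concrete, here is a minimal sketch of the same ddply call applied to a tiny made-up data frame. The data frame `toy` and its values are hypothetical and are not taken from Gapminder; the continent labels "A" and "B" are placeholders.

```r
library(plyr)

# Hypothetical toy data: two "continents", two countries each, one year
toy <- data.frame(continent = c("A", "A", "B", "B"),
                  year      = 1952,
                  lifeExp   = c(40, 50, 44, 45))
bMark <- 45

res <- ddply(toy, ~continent + year, function(x) {
    jCount <- sum(x$lifeExp <= bMark)           # countries at or below the benchmark
    c(count = jCount, prop = jCount / nrow(x))  # count and share within the group
})
res
```

Because the comparison is `<=`, a country sitting exactly at the benchmark counts as low, so group "B" ends up with a proportion of 1 while group "A" has 0.5.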

Now that we have mastered the grammar, we reproduce the above figure using ggplot2. First, we create the ggplot object with ggplot(). It takes two main arguments: the data and the aesthetic mapping of variables. These can be omitted if we specify the data and aesthetics when adding each plotting layer. Next, we add geom objects to represent the data visually. Finally, we draw one panel per continent on a single page using facet_wrap (facet_grid is the alternative), as follows:

plot <- ggplot(tmp)
plot <- plot + aes(year, prop, colour = continent)
plot <- plot + geom_line() + geom_point() + facet_wrap(~continent)
plot

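
The remark above about omitting the data and aesthetics from ggplot() can be sketched as follows. This variant also swaps geom_line for geom_smooth(method = "lm"), which is one way to approximate the regression line that type = c("p", "r") draws in the lattice version. The data frame `toy` is a hypothetical stand-in for tmp with made-up values, used only so the snippet runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in for tmp; the values are made up, not Gapminder results
toy <- data.frame(
    continent = rep(c("Africa", "Asia"), each = 4),
    year      = rep(c(1952, 1962, 1972, 1982), times = 2),
    prop      = c(0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.1)
)

# Data and aesthetics are supplied to each layer rather than to ggplot()
p <- ggplot() +
    geom_point(data = toy, aes(x = year, y = prop, colour = continent)) +
    geom_smooth(data = toy, aes(x = year, y = prop, colour = continent),
                method = "lm", formula = y ~ x, se = FALSE) +  # least-squares line per panel
    facet_wrap(~continent)
p
```

Repeating the data and mapping in every layer is more verbose, so it is mostly worthwhile when different layers draw from different data frames.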

Here is another example of displaying multiple plots on one page. Let us depict the maximum and minimum GDP per capita for each continent over the years using lattice:

tmp <- ddply(noODat, ~continent + year, function(x) {
    Levels <- c("min", "max")
    data.frame(gdpPercap = range(x$gdpPercap), stat = factor(Levels, levels = Levels))
})
xyplot(gdpPercap ~ year | continent, tmp, group = stat, type = "b", grid = "h", 
    as.table = TRUE, auto.key = list(columns = 2))


Now let us try to use ggplot2 to reproduce the figure above:

plot <- ggplot(tmp)
plot <- plot + aes(year, gdpPercap, colour = stat)
plot <- plot + geom_line() + facet_wrap(~continent)
plot

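
The ggplot2 call above draws lines only, whereas the lattice version used type = "b" (points and lines) and placed the key above the panels. A possible way to get closer, sketched on a hypothetical stand-in for tmp with made-up values, is to add geom_point() and move the legend with theme():

```r
library(ggplot2)

# Hypothetical stand-in for tmp (made-up values, two continents only)
Levels <- c("min", "max")
toy <- data.frame(
    continent = rep(c("Africa", "Asia"), each = 6),
    year      = rep(rep(c(1952, 1977, 2002), each = 2), times = 2),
    stat      = factor(rep(Levels, times = 6), levels = Levels),
    gdpPercap = c(300, 4700, 500, 21000, 240, 12500,
                  330, 108000, 370, 59000, 610, 36000)
)

p <- ggplot(toy, aes(year, gdpPercap, colour = stat)) +
    geom_line() +
    geom_point() +                    # points on top of the lines, like type = "b"
    facet_wrap(~continent) +
    theme(legend.position = "top")    # legend above the panels, like auto.key
p
```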

There are a lot of options in ggplot2 for producing glamorous graphs. In my next report, I will describe some of these options. Until then, I hope you enjoyed reading this report.