STAT 545A Homework #5

Yiming Zhang

First, load the Gapminder data and the required packages.

gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
library(lattice)
library(plyr)
library(xtable)
library(ggplot2)

Also drop Oceania, along with its now-unused factor level.

gDat <- droplevels(subset(gDat, continent != "Oceania"))
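The `droplevels()` wrapper matters because `subset()` on its own removes the Oceania rows but keeps `Oceania` as an empty factor level, which can still appear as an empty legend entry or facet. A minimal sketch on a made-up data frame (`toy` and its values are invented for illustration, not taken from Gapminder):

```r
# Toy data frame; the values are invented purely for illustration.
toy <- data.frame(
  continent = factor(c("Africa", "Asia", "Oceania", "Asia")),
  lifeExp   = c(50, 60, 80, 65)
)

# subset() removes the rows but keeps the unused level.
kept <- subset(toy, continent != "Oceania")
levels(kept$continent)

# droplevels() discards factor levels that no longer occur.
clean <- droplevels(kept)
levels(clean$continent)
```

Under R 4.0.0 or later, `read.delim()` reads `continent` as a character column unless `stringsAsFactors = TRUE` is passed, in which case there are no levels to drop and the call is harmless.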

Stripplot in ggplot2

Here I use lifeExp as the quantitative variable and year as the categorical variable. For all continents together, we have

ggplot(gDat, aes(x = factor(year), y = lifeExp, colour = year)) + geom_boxplot(aes(fill = factor(year))) + 
    geom_point()


Then we can show the same plot with one panel per continent.

ggplot(gDat, aes(x = factor(year), y = lifeExp, colour = year)) + geom_boxplot(aes(fill = factor(year))) + 
    geom_point() + facet_grid(~continent)


The fill colour makes the plot cluttered, so I drop it and leave the boxes plain.

ggplot(gDat, aes(x = factor(year), y = lifeExp, colour = year)) + geom_boxplot() + 
    geom_point() + facet_grid(~continent)


That's better.
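The plots above overlay the raw points on boxplots. A closer ggplot2 analogue of lattice's `stripplot()` would jitter the points horizontally so that overlapping observations separate. This is only a sketch, assuming a small simulated data frame `toy` in place of `gDat`:

```r
library(ggplot2)

# Simulated data, invented for illustration; swap in gDat for the real plot.
set.seed(1)
toy <- data.frame(
  year    = rep(c(1952, 1977, 2002), each = 20),
  lifeExp = c(rnorm(20, 50, 8), rnorm(20, 60, 8), rnorm(20, 67, 8))
)

# geom_jitter() adds random horizontal noise; height = 0 leaves lifeExp untouched.
p <- ggplot(toy, aes(x = factor(year), y = lifeExp)) +
  geom_jitter(width = 0.15, height = 0, alpha = 0.6) +
  labs(x = "year", y = "lifeExp")
print(p)
```

Adding `facet_grid(~continent)` to this, as in the plots above, would give one strip plot per continent when the data contains a `continent` column.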

Scatterplot in ggplot2

Use two quantitative variables, lifeExp and gdpPercap, for the year 2002. Also colour the points by continent.

ggplot(subset(gDat, year == 2002), aes(x = gdpPercap, y = lifeExp, colour = continent)) + 
    geom_point()


The relationship looks exponential, so let's put the x-axis on a log10 scale. I also map the size of each point to the square root of the population.

ggplot(subset(gDat, year == 2002), aes(x = gdpPercap, y = lifeExp, colour = continent, 
    size = sqrt(pop))) + geom_point() + scale_x_log10()

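`scale_x_log10()` positions the points by the base-10 logarithm of x while labelling the axis in the original units, so equal ratios of GDP per capita take up equal horizontal distance. A small worked example, using hypothetical GDP values chosen only to make the arithmetic obvious:

```r
# Hypothetical GDP per capita values, each ten times the previous one.
gdp <- c(500, 5000, 50000)

# On the raw scale the gaps grow tenfold.
raw_gaps <- diff(gdp)         # 4500 45000

# On the log10 scale every tenfold step has the same width.
log_gaps <- diff(log10(gdp))  # 1 1

raw_gaps
log_gaps
```

This is why the low-income countries, which are squeezed against the left edge on the raw scale, spread out once the log scale is applied.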

Compare ggplot2 and lattice

First, plot the maximum GDP per capita for each continent in each year with lattice.

GDPbyYear_tall <- ddply(gDat, ~year + continent, summarize, Max = max(gdpPercap))
xyplot(Max ~ year, GDPbyYear_tall, groups = continent, auto.key = TRUE, type = c("p", 
    "a"))

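The `ddply()` call above splits the data by every year and continent combination and returns one row per group, which is the "tall" shape both plotting calls expect. For readers without plyr, roughly the same table can be sketched in base R with `aggregate()`. The data frame `toy` and the name `max_tall` below are invented for illustration:

```r
# Toy data; the values are invented for illustration only.
toy <- data.frame(
  year      = c(1952, 1952, 1952, 2002, 2002, 2002),
  continent = c("Africa", "Africa", "Asia", "Africa", "Asia", "Asia"),
  gdpPercap = c(1000, 1500, 2000, 3000, 9000, 7000)
)

# One row per year/continent pair, holding the group maximum.
max_tall <- aggregate(gdpPercap ~ year + continent, data = toy, FUN = max)

# Rename the summary column to match the Max column used above.
names(max_tall)[names(max_tall) == "gdpPercap"] <- "Max"
max_tall
```

The row order may differ from what `ddply()` returns, which does not matter for plotting.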

Then do the same with ggplot2.

ggplot(GDPbyYear_tall, aes(x = year, y = Max, colour = continent)) + geom_point() + 
    geom_line()


I find the ggplot2 version better: the legend is drawn by default, whereas the lattice call needs auto.key = TRUE to get one.