Stat545A: Homework 05

gDat <- read.delim("gapminderDataFiveYear.txt")
# drop Oceania
gDat <- droplevels(subset(gDat, continent != "Oceania"))

This analysis looks at the rate of population increase per 5-year period for the countries in each continent, from 1952 to 2007.

First, we need to calculate the rate of increase per 5-year period.

library(plyr)
library(ggplot2)

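# For one country's rows (assumed sorted by year), return the relative change
# in population from the previous period. The first year has no predecessor,
# so its rate is NA; the extra trailing element is also NA and is dropped later.
fct <- function(dat) {
    rate <- (c(dat$pop, NA) - c(NA, dat$pop))/c(NA, dat$pop)
    names(rate) <- c(dat$year, NA)
    rate
}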

# wide table: one row per country, one column per year
rate.popIncre <- ddply(gDat, ~country + continent, fct)
# remove the last column, which holds only the trailing NA from fct()
rate.popIncre <- rate.popIncre[, -ncol(rate.popIncre)]
years <- names(rate.popIncre)[-c(1, 2)]
rate.popIncre <- reshape(rate.popIncre, varying = list(rate = years), v.names = c("rate"), 
    timevar = "year", times = years, direction = "long")
head(rate.popIncre)
##            country continent year rate id
## 1.1952 Afghanistan      Asia 1952   NA  1
## 2.1952     Albania    Europe 1952   NA  2
## 3.1952     Algeria    Africa 1952   NA  3
## 4.1952      Angola    Africa 1952   NA  4
## 5.1952   Argentina  Americas 1952   NA  5
## 6.1952     Austria    Europe 1952   NA  6

This is probably not the most efficient way to do it.
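One possibly simpler route is to compute the rate directly in long format, which avoids the wide table and the `reshape()` step. The following is only a sketch under stated assumptions: `toy` is a made-up stand-in for `gDat`, and `pct_change` is a hypothetical helper name, not part of the original analysis.

```r
# Toy data standing in for gDat (values are invented for illustration)
toy <- data.frame(
    country = rep(c("A", "B"), each = 3),
    year    = rep(c(1952, 1957, 1962), times = 2),
    pop     = c(100, 110, 121, 200, 190, 209)
)

# Make sure each country's rows are in chronological order
toy <- toy[order(toy$country, toy$year), ]

# Relative change from the previous period (hypothetical helper).
# The first year of each country has no predecessor, so its rate is NA.
pct_change <- function(pop) c(NA, diff(pop) / head(pop, -1))

# ave() applies the helper within each country and keeps the row order
toy$rate <- ave(toy$pop, toy$country, FUN = pct_change)
toy
```

The result already has one row per country and year, so it could be plotted or merged without further reshaping.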

Next comes the plotting. I lowered the alpha level to reduce overplotting. A line joining the median rates across years helps visualize how the rate changes over time.

A horizontal line at y=0 is overlaid on the plot to aid the assessment of movement towards negative rates.

# strip plots of all countries through the years, with a line through the yearly medians
p <- ggplot(data = rate.popIncre, aes(x = as.factor(year), y = rate, colour = continent)) + 
    geom_point(alpha = 1/4) + facet_wrap(~continent) + stat_summary(fun.y = median, 
    geom = "line", aes(group = 1))

# dashed horizontal reference line at y = 0
p + geom_hline(yintercept = 0, colour = "#BB0000", linetype = "dashed") + 
    theme(axis.text.x = element_text(angle = 90))


What about the relationship between the rate of population increase and life expectancy?

# attach the rates to the original data (matched on country, continent and year)
D <- merge(gDat, rate.popIncre)
# per-continent, per-year medians of life expectancy and rate
Dsum <- ddply(D, ~continent + year, summarize, medianlife = median(lifeExp), medianrate = median(rate))
p <- ggplot(data = D, aes(x = lifeExp, y = rate, colour = continent)) + geom_point(alpha = 2/3)
p + facet_wrap(~year) + geom_point(data = Dsum, aes(x = medianlife, y = medianrate, 
    shape = continent), size = 4, colour = "black") + scale_shape(solid = FALSE)
## Warning: Removed 140 rows containing missing values (geom_point). Warning:
## Removed 4 rows containing missing values (geom_point).
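These warnings are most likely due to the missing rates for 1952: that year has no earlier period to compare against, so every country's rate is NA, and `median()` returns NA whenever its input contains a missing value. A minimal sketch with made-up numbers (the `rates` vector is hypothetical, not taken from the data) shows this behaviour and what `na.rm = TRUE` would change:

```r
# Made-up growth rates for three countries in one year; the first is missing
rates <- c(NA, 0.10, 0.20)

# A single missing value makes the median missing
med_with_na <- median(rates)

# With na.rm = TRUE the missing value is dropped before the median is taken
med_without_na <- median(rates, na.rm = TRUE)

med_with_na
med_without_na
```

Here the 1952 panel is expected to be empty either way, since no rate can be computed for the first year.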


Here I show how the rate of population increase relates to life expectancy over the years. The median for each continent is also plotted to show how each continent changes over time. The graph shows several interesting patterns. For example: