gDat <- read.delim("gapminderDataFiveYear.txt")
# drop Oceania
gDat <- droplevels(subset(gDat, continent != "Oceania"))
First we need to calculate the rate of population increase over each 5-year period, that is, the change in population relative to the population five years earlier.
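As a quick illustration of that definition, here is a minimal sketch on a made-up population vector. The names `pop`, `prev` and `rate` and the numbers are hypothetical and not taken from the Gapminder data.

```r
# Hypothetical populations at three consecutive 5-year time points
pop <- c(100, 110, 121)

# Population at the previous time point; the first one has no predecessor
prev <- c(NA, head(pop, -1))

# Relative change per period: (current - previous) / previous
rate <- (pop - prev) / prev
```

The first element of `rate` is `NA` because there is nothing to compare the first time point against.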
library(plyr)
library(ggplot2)
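fct <- function(dat) {
    # Pad pop with an NA at either end so that element i is the change from
    # year i-1 to year i, relative to year i-1 (assumes rows sorted by year).
    # The first element and the extra trailing element are NA.
    rate <- (c(dat$pop, NA) - c(NA, dat$pop))/c(NA, dat$pop)
    names(rate) <- c(dat$year, NA)
    rate
}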
# one row per country, one column per year, plus a trailing all-NA column
rate.popIncre <- ddply(gDat, ~country + continent, fct)
# drop that trailing column, which comes from the NA padding in fct
rate.popIncre <- rate.popIncre[, -ncol(rate.popIncre)]
years <- names(rate.popIncre)[-c(1, 2)]
rate.popIncre <- reshape(rate.popIncre, varying = list(rate = years), v.names = c("rate"),
timevar = "year", times = years, direction = "long")
head(rate.popIncre)
## country continent year rate id
## 1.1952 Afghanistan Asia 1952 NA 1
## 2.1952 Albania Europe 1952 NA 2
## 3.1952 Algeria Africa 1952 NA 3
## 4.1952 Angola Africa 1952 NA 4
## 5.1952 Argentina Americas 1952 NA 5
## 6.1952 Austria Europe 1952 NA 6
This is probably not the most efficient way to compute the rates.
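One possibly simpler route, sketched here on made-up data rather than on gDat, keeps the data in long format and computes the rate within each country with base R's `ave()` and `diff()`, so no reshape step is needed. The data frame `toy` and its values are hypothetical, and the sketch assumes the rows can be sorted by year within country.

```r
# Toy stand-in for gDat (hypothetical values, not from Gapminder)
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(c(1952, 1957, 1962), times = 2),
  pop     = c(100, 110, 121, 200, 210, 231)
)

# Make sure years are in order within each country
toy <- toy[order(toy$country, toy$year), ]

# Relative change from the previous year within each country;
# the first year of each country has no predecessor, so it is NA
toy$rate <- ave(toy$pop, toy$country,
                FUN = function(p) c(NA, diff(p) / head(p, -1)))
```

The result already has one row per country and year, which is the shape the plots need.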
Next comes the graphing. I set a low alpha level to reduce overplotting, and added a line joining the median rate in each year to show how the rate changes over time.
A dashed horizontal line at y = 0 is overlaid on the plot to make movement towards negative rates easier to see.
# strip plots of all countries through the years
p <- ggplot(data = rate.popIncre, aes(x = as.factor(year), y = rate, colour = continent)) +
    geom_point(alpha = 1/4) + facet_wrap(~continent) + stat_summary(fun.y = median,
    geom = "line", aes(group = 1))
# a horizontal reference line needs yintercept, not xintercept
p + geom_hline(yintercept = 0, colour = "#BB0000", linetype = "dashed") +
    theme(axis.text.x = element_text(angle = 90))
# join the rates back onto the original data (shared columns: country, continent, year)
D <- merge(gDat, rate.popIncre)
# per-continent, per-year medians of life expectancy and rate
Dsum <- ddply(D, ~continent + year, summarize, medlife = median(lifeExp), medrate = median(rate))
p <- ggplot(data = D, aes(x = lifeExp, y = rate, colour = continent)) + geom_point(alpha = 2/3)
p + facet_wrap(~year) + geom_point(data = Dsum, aes(x = medlife, y = medrate,
    shape = continent), size = 4, colour = "black") + scale_shape(solid = FALSE)
## Warning: Removed 140 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_point).
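These warnings are expected rather than a sign of a problem: the rate for the first year (1952) is NA for every country, so those points, and presumably the 1952 continent medians built from them, cannot be drawn. A minimal sketch of dropping the missing rates before summarising, using made-up numbers and hypothetical names (`toy`, `toy_complete`, `med`):

```r
# Toy stand-in for D: the first year of each group has no rate
toy <- data.frame(
  continent = rep(c("X", "Y"), each = 3),
  year      = rep(c(1952, 1957, 1962), times = 2),
  rate      = c(NA, 0.10, 0.12, NA, 0.05, 0.07)
)

# Count the rows that a plot of rate would have to drop
n_missing <- sum(is.na(toy$rate))

# Keep only rows with a defined rate, then take medians per continent and year
toy_complete <- toy[!is.na(toy$rate), ]
med <- aggregate(rate ~ continent + year, data = toy_complete, FUN = median)
```

Plotting `toy_complete` and `med` instead of the unfiltered data should avoid the "Removed ... rows" warnings, at the cost of losing the empty first-year panel.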
This plot shows the rate of population increase against life expectancy, one panel per year. The median of each continent is overlaid so that each continent's shift over the years can be followed. The graph shows several interesting patterns. For example: