STAT 545A Homework 4

In this report, we will further demonstrate the results from the previous articles, which were about the data aggregation topic, but in more fancy visualization way. Here, we will focus more on the visualization of the results. But before that, we realized that the codes in previous articles were not in generalized form; therefore, we will address this issue starting with the print command to create an HTML table and print it:

htmlPrint <- function(x, ..., digits = 0, include.rownames = FALSE) {
    print(xtable(x, digits = digits, ...), type = "html", include.rownames = include.rownames, 
        ...)
}

We download the Gapminder data from the repository, unlike in our previous articles, and load plyr, xtable, and lattice packages:

gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
library(plyr)
library(xtable)
library(lattice)
str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

For the rest of this article, we will drop Ocania, which contains only two countries, from the whole dataset.

noODat <- droplevels(subset(gDat, continent != "Oceania"))

Let us explore the life expectancy of the continents over years using stripplot:

stripplot(x = lifeExp ~ continent, data = noODat, jitter.data = TRUE, alpha = 0.6, 
    grid = "h")

plot of chunk unnamed-chunk-4

In the figure, we have jittered the points and making them partially transparent. Now, let us examine the maximum and minimum life expectancy of the continents over years:

lifeExpectperYear <- function(x) {
    lifetable <- ddply(.data = x, .variables = .(continent), function(x) {
        lifeExp <- range(x$lifeExp)
        y <- data.frame(lifeExp, Stat = c("Min", "Max"))
        return(y)
    })
    return(lifetable)
}
tmp <- ddply(.data = noODat, .variables = .(year), .fun = lifeExpectperYear)
stripplot(x = lifeExp ~ year, groups = continent, data = tmp, jitter.data = TRUE, 
    alpha = 0.6, grid = "h", auto.key = TRUE, ylim = c(min(tmp$lifeExp), max(tmp$lifeExp)))

plot of chunk unnamed-chunk-5

The above figure can be illustrated in a more glamoring way by using densityplot:

densityplot(x = ~lifeExp, data = tmp, plot.points = FALSE, ref = TRUE, group = continent, 
    auto.key = list(columns = nlevels(tmp$continent)))

plot of chunk unnamed-chunk-6

Let us go deep another inch to distill the visual information, which is provided in the above figures, in order to search for the proportion of countries with low life expectancy over time by continent. We note that our benchmark for the life expectancy is based on our sole choice:

bMark <- 45
tmp <- ddply(noODat, ~continent + year, function(x) {
    jCount = sum(x$lifeExp <= bMark)
    c(count = jCount, prop = jCount/nrow(x))
})
xyplot(x = prop ~ year | continent, data = tmp, pch = c(8), type = c("p", "r"))

plot of chunk unnamed-chunk-7

We observe that africa exhibits sudden change in its life expectancy. For this reason, we segregate Africa continent from other continents and measure the proportion among its nations:

tmp <- subset(noODat, subset = continent == "Africa")
tmp <- ddply(tmp, ~year, function(x) {
    jCount = sum(x$lifeExp)
    c(count = jCount, prop = jCount/nrow(x))
})
xyplot(x = prop ~ year, data = tmp, pch = c(8), type = c("p", "r"))

plot of chunk unnamed-chunk-8

We might wonder if the improvement in the life expectancy was due to growth of GDP per capita in Africa:

tmp <- subset(noODat, subset = continent == "Africa")
tmp <- ddply(tmp, ~year, function(x) {
    jCount = sum(x$gdpPercap)
    c(count = jCount, prop = jCount/nrow(x))
})
xyplot(x = prop ~ year, data = tmp, pch = c(8), type = c("p", "r"))

plot of chunk unnamed-chunk-9

As we assumed the improvement was not sudden at all, infact, we can demonstrate an interesting result for african countries by simple measuring the intercept throughout the years:

tmp <- subset(noODat, subset = continent == "Africa")
yearMin <- min(gDat$year)
tmp <- ddply(tmp, ~country, function(x) {
    estCoefs <- coef(lm(lifeExp ~ I(year - yearMin), x))
    names(estCoefs) <- c("intercept", "slope")
    return(estCoefs)
})
tmp <- arrange(tmp, tmp$intercept)

htmlPrint(tmp)
country intercept slope
Gambia 28 1
Sierra Leone 31 0
Guinea 32 0
Guinea-Bissau 32 0
Angola 32 0
Mali 33 0
Mozambique 34 0
Equatorial Guinea 34 0
Somalia 35 0
Burkina Faso 35 0
Niger 35 0
Eritrea 36 0
Ethiopia 36 0
Djibouti 36 0
Madagascar 37 0
Senegal 37 1
Malawi 37 0
Nigeria 38 0
Sudan 38 0
Central African Republic 39 0
Gabon 39 0
Benin 40 0
Chad 40 0
Liberia 40 0
Comoros 40 0
Mauritania 40 0
Burundi 41 0
Egypt 41 1
Togo 41 0
Cameroon 41 0
Congo, Dem. Rep. 42 0
Libya 42 1
Morocco 43 1
Rwanda 43 -0
Tanzania 43 0
Algeria 43 1
Ghana 43 0
Uganda 44 0
Tunisia 45 1
Cote d'Ivoire 45 0
Swaziland 46 0
Kenya 47 0
Namibia 47 0
Congo, Rep. 47 0
Lesotho 47 0
Zambia 48 -0
Sao Tome and Principe 49 0
South Africa 49 0
Botswana 53 0
Reunion 54 0
Zimbabwe 55 -0
Mauritius 55 0

For further illustration, we pick Gambia to see its GDP growth and its life expectancy as well over years:

gambia <- subset(noODat, subset = country == "Gambia")
lgambia <- xyplot(x = lifeExp ~ year, data = gambia, pch = c(8), type = c("p", 
    "r"))
ggambia <- xyplot(x = gdpPercap ~ year, data = gambia, pch = c(8), type = c("p", 
    "r"))
print(lgambia, position = c(0, 0, 0.55, 1), more = TRUE)
print(ggambia, position = c(0.45, 0, 1, 1))

plot of chunk unnamed-chunk-11

The above figures proves our assumption that GDP and Life expectancy for Gambia exhibit strong correlation.

Thank you and see you with my next post.