In this report, we will further demonstrate the results from the previous articles, which were about the data aggregation topic, but in more fancy visualization way. Here, we will focus more on the visualization of the results. But before that, we realized that the codes in previous articles were not in generalized form; therefore, we will address this issue starting with the print command to create an HTML table and print it:
htmlPrint <- function(x, ..., digits = 0, include.rownames = FALSE) {
print(xtable(x, digits = digits, ...), type = "html", include.rownames = include.rownames,
...)
}
We download the Gapminder data from the repository, unlike in our previous articles, and load plyr, xtable, and lattice packages:
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
library(plyr)
library(xtable)
library(lattice)
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
For the rest of this article, we will drop Ocania, which contains only two countries, from the whole dataset.
noODat <- droplevels(subset(gDat, continent != "Oceania"))
Let us explore the life expectancy of the continents over years using stripplot:
stripplot(x = lifeExp ~ continent, data = noODat, jitter.data = TRUE, alpha = 0.6,
grid = "h")
In the figure, we have jittered the points and making them partially transparent. Now, let us examine the maximum and minimum life expectancy of the continents over years:
lifeExpectperYear <- function(x) {
lifetable <- ddply(.data = x, .variables = .(continent), function(x) {
lifeExp <- range(x$lifeExp)
y <- data.frame(lifeExp, Stat = c("Min", "Max"))
return(y)
})
return(lifetable)
}
tmp <- ddply(.data = noODat, .variables = .(year), .fun = lifeExpectperYear)
stripplot(x = lifeExp ~ year, groups = continent, data = tmp, jitter.data = TRUE,
alpha = 0.6, grid = "h", auto.key = TRUE, ylim = c(min(tmp$lifeExp), max(tmp$lifeExp)))
The above figure can be illustrated in a more glamoring way by using densityplot:
densityplot(x = ~lifeExp, data = tmp, plot.points = FALSE, ref = TRUE, group = continent,
auto.key = list(columns = nlevels(tmp$continent)))
Let us go deep another inch to distill the visual information, which is provided in the above figures, in order to search for the proportion of countries with low life expectancy over time by continent. We note that our benchmark for the life expectancy is based on our sole choice:
bMark <- 45
tmp <- ddply(noODat, ~continent + year, function(x) {
jCount = sum(x$lifeExp <= bMark)
c(count = jCount, prop = jCount/nrow(x))
})
xyplot(x = prop ~ year | continent, data = tmp, pch = c(8), type = c("p", "r"))
We observe that africa exhibits sudden change in its life expectancy. For this reason, we segregate Africa continent from other continents and measure the proportion among its nations:
tmp <- subset(noODat, subset = continent == "Africa")
tmp <- ddply(tmp, ~year, function(x) {
jCount = sum(x$lifeExp)
c(count = jCount, prop = jCount/nrow(x))
})
xyplot(x = prop ~ year, data = tmp, pch = c(8), type = c("p", "r"))
We might wonder if the improvement in the life expectancy was due to growth of GDP per capita in Africa:
tmp <- subset(noODat, subset = continent == "Africa")
tmp <- ddply(tmp, ~year, function(x) {
jCount = sum(x$gdpPercap)
c(count = jCount, prop = jCount/nrow(x))
})
xyplot(x = prop ~ year, data = tmp, pch = c(8), type = c("p", "r"))
As we assumed the improvement was not sudden at all, infact, we can demonstrate an interesting result for african countries by simple measuring the intercept throughout the years:
tmp <- subset(noODat, subset = continent == "Africa")
yearMin <- min(gDat$year)
tmp <- ddply(tmp, ~country, function(x) {
estCoefs <- coef(lm(lifeExp ~ I(year - yearMin), x))
names(estCoefs) <- c("intercept", "slope")
return(estCoefs)
})
tmp <- arrange(tmp, tmp$intercept)
htmlPrint(tmp)
| country | intercept | slope |
|---|---|---|
| Gambia | 28 | 1 |
| Sierra Leone | 31 | 0 |
| Guinea | 32 | 0 |
| Guinea-Bissau | 32 | 0 |
| Angola | 32 | 0 |
| Mali | 33 | 0 |
| Mozambique | 34 | 0 |
| Equatorial Guinea | 34 | 0 |
| Somalia | 35 | 0 |
| Burkina Faso | 35 | 0 |
| Niger | 35 | 0 |
| Eritrea | 36 | 0 |
| Ethiopia | 36 | 0 |
| Djibouti | 36 | 0 |
| Madagascar | 37 | 0 |
| Senegal | 37 | 1 |
| Malawi | 37 | 0 |
| Nigeria | 38 | 0 |
| Sudan | 38 | 0 |
| Central African Republic | 39 | 0 |
| Gabon | 39 | 0 |
| Benin | 40 | 0 |
| Chad | 40 | 0 |
| Liberia | 40 | 0 |
| Comoros | 40 | 0 |
| Mauritania | 40 | 0 |
| Burundi | 41 | 0 |
| Egypt | 41 | 1 |
| Togo | 41 | 0 |
| Cameroon | 41 | 0 |
| Congo, Dem. Rep. | 42 | 0 |
| Libya | 42 | 1 |
| Morocco | 43 | 1 |
| Rwanda | 43 | -0 |
| Tanzania | 43 | 0 |
| Algeria | 43 | 1 |
| Ghana | 43 | 0 |
| Uganda | 44 | 0 |
| Tunisia | 45 | 1 |
| Cote d'Ivoire | 45 | 0 |
| Swaziland | 46 | 0 |
| Kenya | 47 | 0 |
| Namibia | 47 | 0 |
| Congo, Rep. | 47 | 0 |
| Lesotho | 47 | 0 |
| Zambia | 48 | -0 |
| Sao Tome and Principe | 49 | 0 |
| South Africa | 49 | 0 |
| Botswana | 53 | 0 |
| Reunion | 54 | 0 |
| Zimbabwe | 55 | -0 |
| Mauritius | 55 | 0 |
For further illustration, we pick Gambia to see its GDP growth and its life expectancy as well over years:
gambia <- subset(noODat, subset = country == "Gambia")
lgambia <- xyplot(x = lifeExp ~ year, data = gambia, pch = c(8), type = c("p",
"r"))
ggambia <- xyplot(x = gdpPercap ~ year, data = gambia, pch = c(8), type = c("p",
"r"))
print(lgambia, position = c(0, 0, 0.55, 1), more = TRUE)
print(ggambia, position = c(0.45, 0, 1, 1))
The above figures proves our assumption that GDP and Life expectancy for Gambia exhibit strong correlation.
Thank you and see you with my next post.