library(plyr)
library(xtable)
gDat <- read.table("data/gapminderDataFiveYear.txt", sep = "\t", quote = "\"",
    header = TRUE)
print_table <- function(df) {
    print(xtable(df), type = "html", include.rownames = FALSE)
}
For some of the analyses below, I limited the data to year 2007, because it makes little sense to compare maximum GDP per capita across continents when each maximum comes from a different year.
Display maximum and minimum GDP per capita for all continents in year 2007, in a “wide” format, sorted by increasing maximum GDP per capita:
gdpByContinent <- ddply(subset(gDat, year == 2007), ~continent, summarize, maxGDPperCap = max(gdpPercap),
minGDPperCap = min(gdpPercap))
maxGdpByContinent <- arrange(gdpByContinent, maxGDPperCap)
print_table(maxGdpByContinent)
| continent | maxGDPperCap | minGDPperCap |
|---|---|---|
| Africa | 13206.48 | 277.55 |
| Oceania | 34435.37 | 25185.01 |
| Americas | 42951.65 | 1201.64 |
| Asia | 47306.99 | 944.00 |
| Europe | 49357.19 | 5937.03 |
The ranking of continents by GDP per capita depends on whether the maximum or the minimum GDP per capita is used in the comparison. Oceania has no poor countries, even though its richest country falls behind the richest countries in several other continents. Asia, on the other hand, has both very rich and very poor countries.
The same as above, but in a long format:
gdp_stat <- function(gDatCntnt) {
    minGDP <- min(gDatCntnt$gdpPercap)
    maxGDP <- max(gDatCntnt$gdpPercap)
    return(data.frame(stat.type = c("min", "max"), GDP.per.cap = c(minGDP, maxGDP)))
}
gdpByContinentTall <- ddply(subset(gDat, year == 2007), ~continent, gdp_stat)
gdpByContinentTall <- arrange(gdpByContinentTall, GDP.per.cap)
print_table(gdpByContinentTall)
| continent | stat.type | GDP.per.cap |
|---|---|---|
| Africa | min | 277.55 |
| Asia | min | 944.00 |
| Americas | min | 1201.64 |
| Europe | min | 5937.03 |
| Africa | max | 13206.48 |
| Oceania | min | 25185.01 |
| Oceania | max | 34435.37 |
| Americas | max | 42951.65 |
| Asia | max | 47306.99 |
| Europe | max | 49357.19 |
Interesting that Oceania's “poorest” country is richer than the richest country in Africa.
Tabulate the variation of GDP per capita within each continent in year 2007, using several measures of spread (standard deviation, median absolute deviation, and interquartile range):
vGdpByContinent <- ddply(subset(gDat, year == 2007), ~continent, summarize,
STD = sd(gdpPercap), MAD = mad(gdpPercap), IQR = IQR(gdpPercap))
vGdpByContinent <- arrange(vGdpByContinent, STD)
print_table(vGdpByContinent)
| continent | STD | MAD | IQR |
|---|---|---|---|
| Africa | 3618.16 | 1032.21 | 3130.55 |
| Oceania | 6540.99 | 6857.29 | 4625.18 |
| Americas | 9713.21 | 4773.60 | 6249.22 |
| Europe | 11800.34 | 12506.17 | 19006.06 |
| Asia | 14154.94 | 4566.12 | 19863.98 |
It looks like different measures of spread of GDP per capita can produce different rankings of the continents. For the 2007 data, STD and IQR rank the continents the same way, but MAD ranks them differently. I guess the shape of the distribution and the presence of outliers affect these statistics differently, but it's hard to imagine how without making a density plot…
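One way to build intuition for why the three measures can disagree is to see how each reacts to a single extreme value. The sketch below uses a made-up vector, not the Gapminder data; the names `x`, `y` and `spread` are hypothetical and introduced only for illustration.

```r
# Toy data (hypothetical, not from gDat): nine evenly spaced values.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
# Same values, but the largest one is replaced by an extreme outlier.
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

# Compare the three spread measures used in the table above.
spread <- rbind(
    no_outlier   = c(STD = sd(x), MAD = mad(x), IQR = IQR(x)),
    with_outlier = c(STD = sd(y), MAD = mad(y), IQR = IQR(y))
)
print(round(spread, 2))
```

In this toy case the standard deviation grows a lot while MAD and IQR do not move, since both depend only on the middle of the sorted values. A continent with a few very rich countries could behave similarly, though a density plot of the real data would still be needed to confirm that.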
The table below shows how life expectancy changed over the last ~50 years. For each year, a trimmed mean is used: the top and bottom 15% of countries by life expectancy are dropped before averaging:
trimFrac <- 0.15
print_table(ddply(gDat, ~year, summarize, trimmedMean = mean(lifeExp, trim = trimFrac)))
| year | trimmedMean |
|---|---|
| 1952 | 48.20 |
| 1957 | 51.01 |
| 1962 | 53.41 |
| 1967 | 55.80 |
| 1972 | 58.08 |
| 1977 | 60.26 |
| 1982 | 62.32 |
| 1987 | 64.24 |
| 1992 | 65.55 |
| 1997 | 66.45 |
| 2002 | 67.26 |
| 2007 | 68.64 |
From 1952 to 2007, the trimmed mean of life expectancy increased steadily, from 48.20 to 68.64 years.
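As a reminder of what the `trim` argument does, here is a minimal sketch on made-up numbers (the vector `x` and the helper names `k` and `byHand` are hypothetical). It compares `mean(x, trim = trimFrac)` with the same calculation done by hand.

```r
# Toy data (hypothetical): ten values, one of them an extreme outlier.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
trimFrac <- 0.15

# mean(x, trim = f) drops floor(n * f) observations from EACH end of the
# sorted vector before averaging; here floor(10 * 0.15) = 1.
trimmed <- mean(x, trim = trimFrac)

# The same computation done by hand, for comparison.
k <- floor(length(x) * trimFrac)
byHand <- mean(sort(x)[(k + 1):(length(x) - k)])

print(c(plain = mean(x), trimmed = trimmed, byHand = byHand))
```

The plain mean is pulled up by the outlier, while the trimmed mean ignores it, which is why trimming is a reasonable choice when a few countries have unusually low or high life expectancy in a given year.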
How life expectancy changes over time on different continents (only the first 15 rows of a long table are displayed here):
cyLifeExp <- ddply(gDat, ~continent + year, summarize, Mean.Life.Exp = mean(lifeExp))
print_table(head(cyLifeExp, n = 15))
| continent | year | Mean.Life.Exp |
|---|---|---|
| Africa | 1952 | 39.14 |
| Africa | 1957 | 41.27 |
| Africa | 1962 | 43.32 |
| Africa | 1967 | 45.33 |
| Africa | 1972 | 47.45 |
| Africa | 1977 | 49.58 |
| Africa | 1982 | 51.59 |
| Africa | 1987 | 53.34 |
| Africa | 1992 | 53.63 |
| Africa | 1997 | 53.60 |
| Africa | 2002 | 53.33 |
| Africa | 2007 | 54.81 |
| Americas | 1952 | 53.28 |
| Americas | 1957 | 55.96 |
| Americas | 1962 | 58.40 |
Data displayed this way are very hard to read.
How life expectancy changes over time on different continents (using the wide format):
cyLifeExpWide <- daply(gDat, ~year + continent, summarize, Mean.Life.Exp = mean(lifeExp))
print(xtable(cyLifeExpWide), type = "html")
| year | Africa | Americas | Asia | Europe | Oceania |
|---|---|---|---|---|---|
| 1952 | 39.14 | 53.28 | 46.31 | 64.41 | 69.25 |
| 1957 | 41.27 | 55.96 | 49.32 | 66.70 | 70.30 |
| 1962 | 43.32 | 58.40 | 51.56 | 68.54 | 71.09 |
| 1967 | 45.33 | 60.41 | 54.66 | 69.74 | 71.31 |
| 1972 | 47.45 | 62.39 | 57.32 | 70.78 | 71.91 |
| 1977 | 49.58 | 64.39 | 59.61 | 71.94 | 72.85 |
| 1982 | 51.59 | 66.23 | 62.62 | 72.81 | 74.29 |
| 1987 | 53.34 | 68.09 | 64.85 | 73.64 | 75.32 |
| 1992 | 53.63 | 69.57 | 66.54 | 74.44 | 76.94 |
| 1997 | 53.60 | 71.15 | 68.02 | 75.51 | 78.19 |
| 2002 | 53.33 | 72.42 | 69.23 | 76.70 | 79.74 |
| 2007 | 54.81 | 73.61 | 70.73 | 77.65 | 80.72 |
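A year-by-continent table like this one can also be produced without plyr. The following is only a sketch of an alternative using base R's `tapply`, run on a small made-up data frame (`toy` and its values are hypothetical, not taken from the Gapminder file).

```r
# Toy data (hypothetical): two years, two continents, a few countries each.
toy <- data.frame(
    year      = c(1952, 1952, 1952, 2007, 2007, 2007),
    continent = c("Africa", "Africa", "Asia", "Africa", "Africa", "Asia"),
    lifeExp   = c(38, 42, 46, 54, 56, 70)
)

# tapply() with two grouping vectors returns a matrix:
# one row per year, one column per continent, each cell a group mean.
wide <- tapply(toy$lifeExp,
               list(year = toy$year, continent = toy$continent),
               mean)
print(wide)
```

The result is a plain numeric matrix with years as row names and continents as column names, so it can be passed to `xtable` in the same way as the `daply` output.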
Show how the proportion of countries in each continent with life expectancy below 40 years changes over time:
ageCut <- 40
lowLECountriesByCntnt <- daply(gDat, ~year + continent, summarize, sum(lifeExp <
ageCut)/length(unique(country)))
print(xtable(lowLECountriesByCntnt), type = "html")
| year | Africa | Americas | Asia | Europe | Oceania |
|---|---|---|---|---|---|
| 1952 | 0.56 | 0.04 | 0.30 | 0.00 | 0.00 |
| 1957 | 0.44 | 0.00 | 0.15 | 0.00 | 0.00 |
| 1962 | 0.29 | 0.00 | 0.09 | 0.00 | 0.00 |
| 1967 | 0.19 | 0.00 | 0.06 | 0.00 | 0.00 |
| 1972 | 0.12 | 0.00 | 0.06 | 0.00 | 0.00 |
| 1977 | 0.06 | 0.00 | 0.06 | 0.00 | 0.00 |
| 1982 | 0.06 | 0.00 | 0.03 | 0.00 | 0.00 |
| 1987 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1992 | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1997 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2002 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2007 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 |
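The proportion of countries with low life expectancy decreased dramatically in Africa and Asia, the two continents where it was substantial in 1952; Europe and Oceania show 0.00 in every year. As of 2007, the proportion is 0.02 in Africa and 0.00 on every other continent, so nearly all countries have a life expectancy above 40 years.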
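The proportion in each cell is a count of countries below the cutoff divided by the number of countries in the group. A minimal sketch on made-up values (the vector below is hypothetical) shows that, assuming each country contributes exactly one row per year, this is the same as taking the mean of the logical comparison.

```r
# Toy data (hypothetical): life expectancy of four countries in one year.
lifeExp <- c(35, 42, 39, 60)
ageCut <- 40

# Counting the TRUE values and dividing by the number of countries ...
propBySum <- sum(lifeExp < ageCut) / length(lifeExp)
# ... is the same as averaging the logical vector directly.
propByMean <- mean(lifeExp < ageCut)

print(c(propBySum = propBySum, propByMean = propByMean))
```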
The table below shows the countries with the largest residuals from a linear fit of life expectancy against time, and the year in which each country's largest residual occurs:
max_residual <- function(cDat) {
    clm <- lm(lifeExp ~ I(year - min(cDat$year)), data = cDat)
    largestResidual <- abs(clm$residuals)
    maxRsdl <- max(largestResidual)
    return(c(Max.Residual = maxRsdl, Year = cDat[which.max(largestResidual),
        "year"]))
}
interesting <- ddply(gDat, ~country, max_residual)
interesting <- arrange(interesting, abs(Max.Residual), decreasing = TRUE)
print_table(head(interesting))
| country | Max.Residual | Year |
|---|---|---|
| Rwanda | 17.31 | 1992.00 |
| Cambodia | 15.69 | 1977.00 |
| Swaziland | 12.00 | 2007.00 |
| Zimbabwe | 10.58 | 2002.00 |
| Lesotho | 10.04 | 2007.00 |
| Botswana | 9.33 | 2002.00 |
I sort of expected a deviation from the predicted life expectancy in Rwanda after the genocide (1994), but the largest residual actually appears in 1992, before the genocide. Maybe it's due to the ongoing civil war?
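To check that this approach really picks out the year of a sudden drop, here is a small sketch on synthetic data. The data frame `toy`, the 15-year dip and the names `fit`, `absRes` and `worstYear` are all made up for illustration; only the model form follows `max_residual` above.

```r
# Toy data (hypothetical): a linear trend sampled every five years,
# with an artificial 15-year drop in 1992.
toy <- data.frame(year = seq(1952, 2007, by = 5))
toy$lifeExp <- 40 + 0.3 * (toy$year - 1952)
toy$lifeExp[toy$year == 1992] <- toy$lifeExp[toy$year == 1992] - 15

# Same model form as max_residual(): the intercept is the fitted value
# in the first year of the series.
fit <- lm(lifeExp ~ I(year - min(year)), data = toy)
absRes <- abs(resid(fit))

# Year with the largest absolute residual.
worstYear <- toy$year[which.max(absRes)]
print(worstYear)
```

Because the fitted line is itself pulled toward the dip, the residual it reports is somewhat smaller than the 15 years that were removed, and the other years pick up small residuals of the opposite sign. That is worth keeping in mind when reading the sizes in the table.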