2013-09-22
Import Gapminder data as provided in gapminderDataFiveYear.txt
gDat <- read.delim("gapminderDataFiveYear.txt")
Perform a superficial check that data import went OK.
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
head(gDat)
## country year pop continent lifeExp gdpPercap
## 1 Afghanistan 1952 8425333 Asia 28.80 779.4
## 2 Afghanistan 1957 9240934 Asia 30.33 820.9
## 3 Afghanistan 1962 10267083 Asia 32.00 853.1
## 4 Afghanistan 1967 11537966 Asia 34.02 836.2
## 5 Afghanistan 1972 13079460 Asia 36.09 740.0
## 6 Afghanistan 1977 14880372 Asia 38.44 786.1
Produce a data.frame with 3 variables: a continent factor, the minimum GDP per capita, the maximum GDP per capita and sort on maximum GDP per capita.
library(plyr)
GdpByCont <- ddply(gDat, ~continent, summarize, maxGdpPercap = max(gdpPercap),
minGdpPercap = min(gdpPercap))
GdpByCont <- arrange(GdpByCont, maxGdpPercap)
library(xtable)
print(xtable(GdpByCont), type = "html", include.rownames = FALSE)
| continent | maxGdpPercap | minGdpPercap |
|---|---|---|
| Africa | 21951.21 | 241.17 |
| Oceania | 34435.37 | 10039.60 |
| Americas | 42951.65 | 1201.64 |
| Europe | 49357.19 | 973.53 |
| Asia | 113523.13 | 331.00 |
Produce a data.frame with 3 variables: a continent factor, a GDP per capita summary statistic, a factor that describes the statistic.
SumGdpByCont <- ddply(gDat, ~continent, summarize, gdpSummary = c(max(gdpPercap),
min(gdpPercap)), SummaryDesp = factor(c("max", "min"), levels = c("max",
"min")))
print(xtable(SumGdpByCont), type = "html", include.rownames = FALSE)
| continent | gdpSummary | SummaryDesp |
|---|---|---|
| Africa | 21951.21 | max |
| Africa | 241.17 | min |
| Americas | 42951.65 | max |
| Americas | 1201.64 | min |
| Asia | 113523.13 | max |
| Asia | 331.00 | min |
| Europe | 49357.19 | max |
| Europe | 973.53 | min |
| Oceania | 34435.37 | max |
| Oceania | 10039.60 | min |
Produce a data.frame with one row per continent and variables containing the different measures of spread for GDP/capita and sort on IQR.
SpreadGdp <- arrange(ddply(gDat, ~continent, summarize, varGdpPercap = var(gdpPercap),
madGdpPercap = mad(gdpPercap), IQRGdpPercap = IQR(gdpPercap)), IQRGdpPercap)
print(xtable(SpreadGdp), type = "html", include.rownames = FALSE)
| continent | varGdpPercap | madGdpPercap | IQRGdpPercap |
|---|---|---|---|
| Africa | 7997187.31 | 775.32 | 1616.17 |
| Americas | 40918591.10 | 3269.33 | 4402.43 |
| Asia | 197272505.85 | 2820.83 | 7492.26 |
| Oceania | 40436668.87 | 6459.10 | 8072.26 |
| Europe | 87520019.60 | 8846.05 | 13248.30 |
meanLifeExp <- ddply(gDat, ~year, summarize, avgLifeExp = mean(lifeExp))
print(xtable(meanLifeExp), type = "html", include.rownames = FALSE)
| year | avgLifeExp |
|---|---|
| 1952 | 49.06 |
| 1957 | 51.51 |
| 1962 | 53.61 |
| 1967 | 55.68 |
| 1972 | 57.65 |
| 1977 | 59.57 |
| 1982 | 61.53 |
| 1987 | 63.21 |
| 1992 | 64.16 |
| 1997 | 65.01 |
| 2002 | 65.69 |
| 2007 | 67.01 |
A trimmed mean is a statistical measure of central tendency. It involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end. There are 1704/12=142 records of life expectancy each year, so I will discard 12.5% (18 records) data both at high and low end.
trimLifeExp <- ddply(gDat, ~year, summarize, trmLifeExp = mean(lifeExp, trim = 0.125))
print(xtable(trimLifeExp), type = "html", include.rownames = FALSE)
| year | trmLifeExp |
|---|---|
| 1952 | 48.42 |
| 1957 | 51.16 |
| 1962 | 53.51 |
| 1967 | 55.85 |
| 1972 | 58.05 |
| 1977 | 60.17 |
| 1982 | 62.22 |
| 1987 | 64.07 |
| 1992 | 65.34 |
| 1997 | 66.21 |
| 2002 | 66.95 |
| 2007 | 68.34 |
Get a data.frame with 3 variables: continent, year, and mean or median life expectancy.
ddply(gDat, ~continent + year, summarize, avgLifeExp = mean(lifeExp))
## continent year avgLifeExp
## 1 Africa 1952 39.14
## 2 Africa 1957 41.27
## 3 Africa 1962 43.32
## 4 Africa 1967 45.33
## 5 Africa 1972 47.45
## 6 Africa 1977 49.58
## 7 Africa 1982 51.59
## 8 Africa 1987 53.34
## 9 Africa 1992 53.63
## 10 Africa 1997 53.60
## 11 Africa 2002 53.33
## 12 Africa 2007 54.81
## 13 Americas 1952 53.28
## 14 Americas 1957 55.96
## 15 Americas 1962 58.40
## 16 Americas 1967 60.41
## 17 Americas 1972 62.39
## 18 Americas 1977 64.39
## 19 Americas 1982 66.23
## 20 Americas 1987 68.09
## 21 Americas 1992 69.57
## 22 Americas 1997 71.15
## 23 Americas 2002 72.42
## 24 Americas 2007 73.61
## 25 Asia 1952 46.31
## 26 Asia 1957 49.32
## 27 Asia 1962 51.56
## 28 Asia 1967 54.66
## 29 Asia 1972 57.32
## 30 Asia 1977 59.61
## 31 Asia 1982 62.62
## 32 Asia 1987 64.85
## 33 Asia 1992 66.54
## 34 Asia 1997 68.02
## 35 Asia 2002 69.23
## 36 Asia 2007 70.73
## 37 Europe 1952 64.41
## 38 Europe 1957 66.70
## 39 Europe 1962 68.54
## 40 Europe 1967 69.74
## 41 Europe 1972 70.78
## 42 Europe 1977 71.94
## 43 Europe 1982 72.81
## 44 Europe 1987 73.64
## 45 Europe 1992 74.44
## 46 Europe 1997 75.51
## 47 Europe 2002 76.70
## 48 Europe 2007 77.65
## 49 Oceania 1952 69.25
## 50 Oceania 1957 70.30
## 51 Oceania 1962 71.09
## 52 Oceania 1967 71.31
## 53 Oceania 1972 71.91
## 54 Oceania 1977 72.85
## 55 Oceania 1982 74.29
## 56 Oceania 1987 75.32
## 57 Oceania 1992 76.94
## 58 Oceania 1997 78.19
## 59 Oceania 2002 79.74
## 60 Oceania 2007 80.72
Get a data.frame with 1 + 5 = 6 variables: year and then one variable per continent, giving mean or median life expectancy.
daply()meanLifeExp <- as.data.frame(daply(gDat, ~year + continent, summarize, mean(lifeExp)))
print(xtable(meanLifeExp), type = "html", include.rownames = T)
| Africa | Americas | Asia | Europe | Oceania | |
|---|---|---|---|---|---|
| 1952 | 39.14 | 53.28 | 46.31 | 64.41 | 69.25 |
| 1957 | 41.27 | 55.96 | 49.32 | 66.70 | 70.30 |
| 1962 | 43.32 | 58.40 | 51.56 | 68.54 | 71.09 |
| 1967 | 45.33 | 60.41 | 54.66 | 69.74 | 71.31 |
| 1972 | 47.45 | 62.39 | 57.32 | 70.78 | 71.91 |
| 1977 | 49.58 | 64.39 | 59.61 | 71.94 | 72.85 |
| 1982 | 51.59 | 66.23 | 62.62 | 72.81 | 74.29 |
| 1987 | 53.34 | 68.09 | 64.85 | 73.64 | 75.32 |
| 1992 | 53.63 | 69.57 | 66.54 | 74.44 | 76.94 |
| 1997 | 53.60 | 71.15 | 68.02 | 75.51 | 78.19 |
| 2002 | 53.33 | 72.42 | 69.23 | 76.70 | 79.74 |
| 2007 | 54.81 | 73.61 | 70.73 | 77.65 | 80.72 |
tapply()meanLifeExp <- with(gDat, tapply(lifeExp, list(year, continent), mean))
print(xtable(meanLifeExp), type = "html", include.rownames = T)
| Africa | Americas | Asia | Europe | Oceania | |
|---|---|---|---|---|---|
| 1952 | 39.14 | 53.28 | 46.31 | 64.41 | 69.25 |
| 1957 | 41.27 | 55.96 | 49.32 | 66.70 | 70.30 |
| 1962 | 43.32 | 58.40 | 51.56 | 68.54 | 71.09 |
| 1967 | 45.33 | 60.41 | 54.66 | 69.74 | 71.31 |
| 1972 | 47.45 | 62.39 | 57.32 | 70.78 | 71.91 |
| 1977 | 49.58 | 64.39 | 59.61 | 71.94 | 72.85 |
| 1982 | 51.59 | 66.23 | 62.62 | 72.81 | 74.29 |
| 1987 | 53.34 | 68.09 | 64.85 | 73.64 | 75.32 |
| 1992 | 53.63 | 69.57 | 66.54 | 74.44 | 76.94 |
| 1997 | 53.60 | 71.15 | 68.02 | 75.51 | 78.19 |
| 2002 | 53.33 | 72.42 | 69.23 | 76.70 | 79.74 |
| 2007 | 54.81 | 73.61 | 70.73 | 77.65 | 80.72 |
Produce the worldwide meanlife expectancy.
(avgwldLe <- mean(gDat$lifeExp))
## [1] 59.47
Produce a data.frame with 3 variables: continent, year, and a country count whose life expectancy is lower than worldwide meanlife expectancy.
ddply(gDat, ~continent + year, summarize, countrycout = sum(lifeExp < avgwldLe,
na.rm = T))
## continent year countrycout
## 1 Africa 1952 52
## 2 Africa 1957 52
## 3 Africa 1962 51
## 4 Africa 1967 50
## 5 Africa 1972 50
## 6 Africa 1977 49
## 7 Africa 1982 43
## 8 Africa 1987 39
## 9 Africa 1992 38
## 10 Africa 1997 39
## 11 Africa 2002 41
## 12 Africa 2007 40
## 13 Americas 1952 19
## 14 Americas 1957 15
## 15 Americas 1962 13
## 16 Americas 1967 10
## 17 Americas 1972 8
## 18 Americas 1977 7
## 19 Americas 1982 5
## 20 Americas 1987 2
## 21 Americas 1992 1
## 22 Americas 1997 1
## 23 Americas 2002 1
## 24 Americas 2007 0
## 25 Asia 1952 29
## 26 Asia 1957 26
## 27 Asia 1962 25
## 28 Asia 1967 23
## 29 Asia 1972 19
## 30 Asia 1977 14
## 31 Asia 1982 11
## 32 Asia 1987 8
## 33 Asia 1992 7
## 34 Asia 1997 6
## 35 Asia 2002 3
## 36 Asia 2007 1
## 37 Europe 1952 5
## 38 Europe 1957 3
## 39 Europe 1962 1
## 40 Europe 1967 1
## 41 Europe 1972 1
## 42 Europe 1977 0
## 43 Europe 1982 0
## 44 Europe 1987 0
## 45 Europe 1992 0
## 46 Europe 1997 0
## 47 Europe 2002 0
## 48 Europe 2007 0
## 49 Oceania 1952 0
## 50 Oceania 1957 0
## 51 Oceania 1962 0
## 52 Oceania 1967 0
## 53 Oceania 1972 0
## 54 Oceania 1977 0
## 55 Oceania 1982 0
## 56 Oceania 1987 0
## 57 Oceania 1992 0
## 58 Oceania 1997 0
## 59 Oceania 2002 0
## 60 Oceania 2007 0
Produce a data.frame with 4 variables: continent, year, a country count with “low life expectancy” and the proportion of “low life expectancy”.
ddply(gDat, ~continent + year, summarize, countrycout = sum(lifeExp < avgwldLe,
na.rm = T), countryprop = round(sum(lifeExp < avgwldLe, na.rm = T)/length(lifeExp),
2))
## continent year countrycout countryprop
## 1 Africa 1952 52 1.00
## 2 Africa 1957 52 1.00
## 3 Africa 1962 51 0.98
## 4 Africa 1967 50 0.96
## 5 Africa 1972 50 0.96
## 6 Africa 1977 49 0.94
## 7 Africa 1982 43 0.83
## 8 Africa 1987 39 0.75
## 9 Africa 1992 38 0.73
## 10 Africa 1997 39 0.75
## 11 Africa 2002 41 0.79
## 12 Africa 2007 40 0.77
## 13 Americas 1952 19 0.76
## 14 Americas 1957 15 0.60
## 15 Americas 1962 13 0.52
## 16 Americas 1967 10 0.40
## 17 Americas 1972 8 0.32
## 18 Americas 1977 7 0.28
## 19 Americas 1982 5 0.20
## 20 Americas 1987 2 0.08
## 21 Americas 1992 1 0.04
## 22 Americas 1997 1 0.04
## 23 Americas 2002 1 0.04
## 24 Americas 2007 0 0.00
## 25 Asia 1952 29 0.88
## 26 Asia 1957 26 0.79
## 27 Asia 1962 25 0.76
## 28 Asia 1967 23 0.70
## 29 Asia 1972 19 0.58
## 30 Asia 1977 14 0.42
## 31 Asia 1982 11 0.33
## 32 Asia 1987 8 0.24
## 33 Asia 1992 7 0.21
## 34 Asia 1997 6 0.18
## 35 Asia 2002 3 0.09
## 36 Asia 2007 1 0.03
## 37 Europe 1952 5 0.17
## 38 Europe 1957 3 0.10
## 39 Europe 1962 1 0.03
## 40 Europe 1967 1 0.03
## 41 Europe 1972 1 0.03
## 42 Europe 1977 0 0.00
## 43 Europe 1982 0 0.00
## 44 Europe 1987 0 0.00
## 45 Europe 1992 0 0.00
## 46 Europe 1997 0 0.00
## 47 Europe 2002 0 0.00
## 48 Europe 2007 0 0.00
## 49 Oceania 1952 0 0.00
## 50 Oceania 1957 0 0.00
## 51 Oceania 1962 0 0.00
## 52 Oceania 1967 0 0.00
## 53 Oceania 1972 0 0.00
## 54 Oceania 1977 0 0.00
## 55 Oceania 1982 0 0.00
## 56 Oceania 1987 0 0.00
## 57 Oceania 1992 0 0.00
## 58 Oceania 1997 0 0.00
## 59 Oceania 2002 0 0.00
## 60 Oceania 2007 0 0.00
Create a table with one row per year and one column per continent and report the relative abundance of “low life expectancy” countries in that “column”.
daply()abunLifeExp <- as.data.frame(daply(gDat, ~year + continent, summarize, round(sum(lifeExp <
avgwldLe, na.rm = T)/length(lifeExp), 2)))
print(xtable(abunLifeExp), type = "html", include.rownames = T)
| Africa | Americas | Asia | Europe | Oceania | |
|---|---|---|---|---|---|
| 1952 | 1.00 | 0.76 | 0.88 | 0.17 | 0.00 |
| 1957 | 1.00 | 0.60 | 0.79 | 0.10 | 0.00 |
| 1962 | 0.98 | 0.52 | 0.76 | 0.03 | 0.00 |
| 1967 | 0.96 | 0.40 | 0.70 | 0.03 | 0.00 |
| 1972 | 0.96 | 0.32 | 0.58 | 0.03 | 0.00 |
| 1977 | 0.94 | 0.28 | 0.42 | 0.00 | 0.00 |
| 1982 | 0.83 | 0.20 | 0.33 | 0.00 | 0.00 |
| 1987 | 0.75 | 0.08 | 0.24 | 0.00 | 0.00 |
| 1992 | 0.73 | 0.04 | 0.21 | 0.00 | 0.00 |
| 1997 | 0.75 | 0.04 | 0.18 | 0.00 | 0.00 |
| 2002 | 0.79 | 0.04 | 0.09 | 0.00 | 0.00 |
| 2007 | 0.77 | 0.00 | 0.03 | 0.00 | 0.00 |
tapply()abunLifeExp <- with(gDat, tapply(lifeExp, list(year, continent), function(x) {
round(sum(x < avgwldLe, na.rm = T)/length(x), 2)
}))
print(xtable(abunLifeExp), type = "html", include.rownames = T)
| Africa | Americas | Asia | Europe | Oceania | |
|---|---|---|---|---|---|
| 1952 | 1.00 | 0.76 | 0.88 | 0.17 | 0.00 |
| 1957 | 1.00 | 0.60 | 0.79 | 0.10 | 0.00 |
| 1962 | 0.98 | 0.52 | 0.76 | 0.03 | 0.00 |
| 1967 | 0.96 | 0.40 | 0.70 | 0.03 | 0.00 |
| 1972 | 0.96 | 0.32 | 0.58 | 0.03 | 0.00 |
| 1977 | 0.94 | 0.28 | 0.42 | 0.00 | 0.00 |
| 1982 | 0.83 | 0.20 | 0.33 | 0.00 | 0.00 |
| 1987 | 0.75 | 0.08 | 0.24 | 0.00 | 0.00 |
| 1992 | 0.73 | 0.04 | 0.21 | 0.00 | 0.00 |
| 1997 | 0.75 | 0.04 | 0.18 | 0.00 | 0.00 |
| 2002 | 0.79 | 0.04 | 0.09 | 0.00 | 0.00 |
| 2007 | 0.77 | 0.00 | 0.03 | 0.00 | 0.00 |
ddply(gDat, ~year + continent, summarize, minLifeExp = min(lifeExp), country = country[which.min(lifeExp)])
## year continent minLifeExp country
## 1 1952 Africa 30.00 Gambia
## 2 1952 Americas 37.58 Haiti
## 3 1952 Asia 28.80 Afghanistan
## 4 1952 Europe 43.59 Turkey
## 5 1952 Oceania 69.12 Australia
## 6 1957 Africa 31.57 Sierra Leone
## 7 1957 Americas 40.70 Haiti
## 8 1957 Asia 30.33 Afghanistan
## 9 1957 Europe 48.08 Turkey
## 10 1957 Oceania 70.26 New Zealand
## 11 1962 Africa 32.77 Sierra Leone
## 12 1962 Americas 43.43 Bolivia
## 13 1962 Asia 32.00 Afghanistan
## 14 1962 Europe 52.10 Turkey
## 15 1962 Oceania 70.93 Australia
## 16 1967 Africa 34.11 Sierra Leone
## 17 1967 Americas 45.03 Bolivia
## 18 1967 Asia 34.02 Afghanistan
## 19 1967 Europe 54.34 Turkey
## 20 1967 Oceania 71.10 Australia
## 21 1972 Africa 35.40 Sierra Leone
## 22 1972 Americas 46.71 Bolivia
## 23 1972 Asia 36.09 Afghanistan
## 24 1972 Europe 57.01 Turkey
## 25 1972 Oceania 71.89 New Zealand
## 26 1977 Africa 36.79 Sierra Leone
## 27 1977 Americas 49.92 Haiti
## 28 1977 Asia 31.22 Cambodia
## 29 1977 Europe 59.51 Turkey
## 30 1977 Oceania 72.22 New Zealand
## 31 1982 Africa 38.45 Sierra Leone
## 32 1982 Americas 51.46 Haiti
## 33 1982 Asia 39.85 Afghanistan
## 34 1982 Europe 61.04 Turkey
## 35 1982 Oceania 73.84 New Zealand
## 36 1987 Africa 39.91 Angola
## 37 1987 Americas 53.64 Haiti
## 38 1987 Asia 40.82 Afghanistan
## 39 1987 Europe 63.11 Turkey
## 40 1987 Oceania 74.32 New Zealand
## 41 1992 Africa 23.60 Rwanda
## 42 1992 Americas 55.09 Haiti
## 43 1992 Asia 41.67 Afghanistan
## 44 1992 Europe 66.15 Turkey
## 45 1992 Oceania 76.33 New Zealand
## 46 1997 Africa 36.09 Rwanda
## 47 1997 Americas 56.67 Haiti
## 48 1997 Asia 41.76 Afghanistan
## 49 1997 Europe 68.83 Turkey
## 50 1997 Oceania 77.55 New Zealand
## 51 2002 Africa 39.19 Zambia
## 52 2002 Americas 58.14 Haiti
## 53 2002 Asia 42.13 Afghanistan
## 54 2002 Europe 70.84 Turkey
## 55 2002 Oceania 79.11 New Zealand
## 56 2007 Africa 39.61 Swaziland
## 57 2007 Americas 60.92 Haiti
## 58 2007 Asia 43.83 Afghanistan
## 59 2007 Europe 71.78 Turkey
## 60 2007 Oceania 80.20 New Zealand
Fit a linear regression for each country, modelling life expectancy as a function of the year and then retain the estimated intercepts and slopes.
(yearMin <- min(gDat$year))
## [1] 1952
jFun <- function(x) {
estCoefs <- coef(lm(lifeExp ~ I(year - yearMin), x))
names(estCoefs) <- c("intercept", "slope")
return(estCoefs)
}
jCoefs <- ddply(gDat, ~continent + country, jFun)
Report the country with min slope and the country with min intercept within each continent from the above jCoefs dataset.
CoefsLe <- ddply(jCoefs, ~continent, summarize, minslope = min(slope), countryslope = country[which.min(slope)],
minintercept = min(intercept), countryintercept = country[which.min(intercept)])
print(xtable(CoefsLe), type = "html", include.rownames = FALSE)
| continent | minslope | countryslope | minintercept | countryintercept |
|---|---|---|---|---|
| Africa | -0.09 | Zimbabwe | 28.40 | Gambia |
| Americas | 0.16 | Paraguay | 38.76 | Bolivia |
| Asia | 0.24 | Iraq | 29.91 | Afghanistan |
| Europe | 0.12 | Denmark | 46.02 | Turkey |
| Oceania | 0.19 | New Zealand | 68.40 | Australia |
Fit a linear regression for each country, modelling life expectancy as a function of the year and then retain the maximun residuals.
kFun <- function(x) {
maxRes <- max(residuals(lm(lifeExp ~ I(year - yearMin), x)))
names(maxRes) <- "maxResLe"
return(maxRes)
}
kRes <- ddply(gDat, ~country + continent, kFun)
Report the country whose max residul of lifeExp is larger than 5, sorted by maximun residuals, and along with summary statistics on life expectancy.
kRes <- arrange(subset(kRes, maxResLe > 5), country)
hitcountry <- kRes$country
summaryLe <- arrange(ddply(subset(gDat, country %in% hitcountry), ~country,
summarize, minLifeExp = min(lifeExp), maxLifeExp = max(lifeExp), avgLifeExp = round(mean(lifeExp),
2)), country)
interestCy <- arrange(merge(kRes, summaryLe, by = "country"), maxResLe, decreasing = T)
print(xtable(interestCy), type = "html", include.rownames = FALSE)
| country | continent | maxResLe | minLifeExp | maxLifeExp | avgLifeExp |
|---|---|---|---|---|---|
| Zimbabwe | Africa | 10.39 | 39.99 | 62.35 | 52.66 |
| Botswana | Africa | 8.57 | 46.63 | 63.62 | 54.60 |
| Lesotho | Africa | 8.48 | 42.14 | 59.69 | 50.01 |
| Swaziland | Africa | 8.28 | 39.61 | 58.47 | 49.00 |
| Iraq | Asia | 6.70 | 45.32 | 65.04 | 56.58 |
| Rwanda | Africa | 6.02 | 23.60 | 46.24 | 41.48 |
| Zambia | Africa | 5.98 | 39.19 | 51.82 | 46.00 |
| South Africa | Africa | 5.78 | 45.01 | 61.89 | 53.99 |
| Gabon | Africa | 5.62 | 37.00 | 61.37 | 51.22 |
| Namibia | Africa | 5.62 | 41.73 | 62.00 | 53.49 |
| Kenya | Africa | 5.57 | 42.27 | 59.34 | 52.68 |
| China | Asia | 5.31 | 44.00 | 72.96 | 61.79 |
| Cote d'Ivoire | Africa | 5.24 | 40.48 | 54.66 | 48.44 |
| Central African Republic | Africa | 5.24 | 35.46 | 50.48 | 43.87 |
Fit two linear regressions, the ordinary least squares and a robust technique(M-estimator) for each country, modelling life expectancy as a function of the year and then retain ratios of estimated intercepts and slopes obtained from the two models. The MASS package is required to perform the robust regression using an M-estimator.
library(MASS)
rFun <- function(x) {
lmCoefs <- coef(lm(lifeExp ~ I(year - yearMin), x))
rlmCoefs <- coef(rlm(lifeExp ~ I(year - yearMin), x))
ratio <- c(rlmCoefs[1]/lmCoefs[1], rlmCoefs[2]/lmCoefs[2])
names(ratio) <- c("ratio.inter", "ratio.slope")
return(ratio)
}
rCoefs <- ddply(gDat, ~country + continent, rFun)
## Warning: 'rlm' failed to converge in 20 steps Warning: 'rlm' failed to
## converge in 20 steps Warning: 'rlm' failed to converge in 20 steps
Look at the range of intercept ratio and slope ratio.
list(ratio.inter = range(rCoefs$ratio.inter), ratio.slope = range(rCoefs$ratio.slope))
## $ratio.inter
## [1] 0.9604 1.0682
##
## $ratio.slope
## [1] -0.7065 2.2105
The range intercept ratio is much narrower than slope ratio, so I focus on slope ratio here and report “interesting” countries whose slope ratios under the two approaches is greater than 1.1 or smaller than 0.9.
rCoefs <- subset(rCoefs, subset = (ratio.slope > 1.1 | ratio.slope < 0.9))
hitcountry <- rCoefs$country
summaryLe <- arrange(ddply(subset(gDat, country %in% hitcountry), ~country,
summarize, minLifeExp = min(lifeExp), maxLifeExp = max(lifeExp), avgLifeExp = round(mean(lifeExp),
2)), country)
interestCy <- arrange(merge(rCoefs, summaryLe, by = "country"), ratio.slope,
decreasing = T)
print(xtable(interestCy), type = "html", include.rownames = FALSE)
| country | continent | ratio.inter | ratio.slope | minLifeExp | maxLifeExp | avgLifeExp |
|---|---|---|---|---|---|---|
| Swaziland | Africa | 0.96 | 2.21 | 39.61 | 58.47 | 49.00 |
| Lesotho | Africa | 0.97 | 2.06 | 42.14 | 59.69 | 50.01 |
| South Africa | Africa | 0.96 | 1.66 | 45.01 | 61.89 | 53.99 |
| Poland | Europe | 1.01 | 0.90 | 61.31 | 75.56 | 70.18 |
| Mauritius | Africa | 1.03 | 0.90 | 50.99 | 72.80 | 64.95 |
| Bulgaria | Europe | 1.03 | 0.67 | 59.60 | 73.00 | 69.74 |
| Rwanda | Africa | 0.99 | -0.71 | 23.60 | 46.24 | 41.48 |