Here is a link to Sean's code and report that I will be using in this homework.
Before getting on with the task at hand, I loaded a few packages and read in the data.
library(plyr)
library(xtable)
library(lattice)
gDat <- read.delim("gapminderDataFiveYear.txt")
iDat <- droplevels(subset(gDat, continent != "Oceania")) #drops Oceania from rest of work
Note that Oceania will be left out of the output, so I have slightly altered Sean's code to now refer to iDat rather than gDat.
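The reason for wrapping the subset in droplevels() can be seen on a small example. The sketch below uses a made-up toy data frame (the names toy, kept and cleaned, and all the values, are hypothetical and not taken from the Gapminder file): subset() removes the rows but the factor still remembers the "Oceania" level, which would otherwise show up as an empty group in tables and plot legends.

```r
# Toy data frame; the values are made up purely for illustration.
toy <- data.frame(
  continent = factor(c("Africa", "Asia", "Oceania", "Europe")),
  lifeExp   = c(50, 60, 80, 75)
)

# subset() drops the Oceania row but keeps the factor level.
kept <- subset(toy, continent != "Oceania")
levels(kept$continent)     # "Oceania" is still listed as a level
table(kept$continent)      # ...with a count of zero

# droplevels() removes levels that no longer occur in the data.
cleaned <- droplevels(kept)
levels(cleaned$continent)  # the unused level is gone
```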
Here is Sean's table on life expectancy:
lifeExpCont <- ddply(iDat, .(continent, year), summarize, meanLifeExp = mean(lifeExp),
medianLifeExp = median(lifeExp))
print(lifeExpCont) # plain data frame print; the xtable-style arguments had no effect here
## continent year meanLifeExp medianLifeExp
## 1 Africa 1952 39.14 38.83
## 2 Africa 1957 41.27 40.59
## 3 Africa 1962 43.32 42.63
## 4 Africa 1967 45.33 44.70
## 5 Africa 1972 47.45 47.03
## 6 Africa 1977 49.58 49.27
## 7 Africa 1982 51.59 50.76
## 8 Africa 1987 53.34 51.64
## 9 Africa 1992 53.63 52.43
## 10 Africa 1997 53.60 52.76
## 11 Africa 2002 53.33 51.24
## 12 Africa 2007 54.81 52.93
## 13 Americas 1952 53.28 54.74
## 14 Americas 1957 55.96 56.07
## 15 Americas 1962 58.40 58.30
## 16 Americas 1967 60.41 60.52
## 17 Americas 1972 62.39 63.44
## 18 Americas 1977 64.39 66.35
## 19 Americas 1982 66.23 67.41
## 20 Americas 1987 68.09 69.50
## 21 Americas 1992 69.57 69.86
## 22 Americas 1997 71.15 72.15
## 23 Americas 2002 72.42 72.05
## 24 Americas 2007 73.61 72.90
## 25 Asia 1952 46.31 44.87
## 26 Asia 1957 49.32 48.28
## 27 Asia 1962 51.56 49.33
## 28 Asia 1967 54.66 53.66
## 29 Asia 1972 57.32 56.95
## 30 Asia 1977 59.61 60.77
## 31 Asia 1982 62.62 63.74
## 32 Asia 1987 64.85 66.30
## 33 Asia 1992 66.54 68.69
## 34 Asia 1997 68.02 70.27
## 35 Asia 2002 69.23 71.03
## 36 Asia 2007 70.73 72.40
## 37 Europe 1952 64.41 65.90
## 38 Europe 1957 66.70 67.65
## 39 Europe 1962 68.54 69.53
## 40 Europe 1967 69.74 70.61
## 41 Europe 1972 70.78 70.89
## 42 Europe 1977 71.94 72.34
## 43 Europe 1982 72.81 73.49
## 44 Europe 1987 73.64 74.81
## 45 Europe 1992 74.44 75.45
## 46 Europe 1997 75.51 76.12
## 47 Europe 2002 76.70 77.54
## 48 Europe 2007 77.65 78.61
To make this large data frame easier to understand, I plotted it in a few different ways.
xyplot(meanLifeExp ~ year, lifeExpCont, groups = continent, auto.key = TRUE,
xlab = "Year", ylab = "Mean Life Expectancy", main = "Mean Life Exp. per Continent",
type = "o")
xyplot(medianLifeExp ~ year, lifeExpCont, groups = continent, auto.key = TRUE,
xlab = "Year", ylab = "Median Life Expectancy", main = "Median Life Exp. per Continent",
type = "o")
densityplot(~meanLifeExp, lifeExpCont, groups = continent, auto.key = TRUE,
    xlab = "Mean Life Expectancy", main = "Density Plot of Mean Life Expectancy")
These three graphs make it easier to extract information from the large data table. The mean and median graphs show clearly how life expectancy has changed over time in Africa, the Americas, Asia and Europe. We can immediately see the increasing trend on every continent, as well as the anomaly in Africa, where mean life expectancy stalls after 1987 and dips slightly between 1992 and 2002. The final graph shows the density of the continent-level mean life expectancy and again shows at a glance that Europe generally has a high life expectancy, whilst Africa has a relatively low one. However, this graph loses the year variable, which in my view makes the first two graphs more useful for analysing the table.
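To pin down where Africa's curve flattens, one option is to difference the African means from the table above. This is only a quick check on the tabulated values, not part of Sean's code; the variable names year, africaMean and change are my own.

```r
# Mean life expectancy for Africa, copied from the table above.
year       <- seq(1952, 2007, by = 5)
africaMean <- c(39.14, 41.27, 43.32, 45.33, 47.45, 49.58,
                51.59, 53.34, 53.63, 53.60, 53.33, 54.81)

# Change over each five-year step, labelled by the year the step ends in.
change <- setNames(diff(africaMean), year[-1])
round(change, 2)

# Years in which the mean fell relative to five years earlier.
names(change)[change < 0]  # "1997" "2002"
```

The gains of roughly two years per step shrink to 0.29 by 1992 and turn negative in 1997 and 2002 before recovering in 2007.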
One further graph gives more detail on the trend in African life expectancy. Here I drew box plots, overlaid on violin plots, of the country-level life expectancy in Africa for each year.
bwplot(lifeExp ~ as.factor(year) | continent, subset(iDat, continent == "Africa"),
    panel = function(..., box.ratio) {
        # violin outline shows the shape of each year's distribution
        panel.violin(..., col = "transparent", border = "grey60", varwidth = FALSE,
            box.ratio = box.ratio)
        # narrow box plot drawn on top of the violin
        panel.bwplot(..., fill = NULL, box.ratio = 0.1)
    }, xlab = "Year", ylab = "Life Expectancy")
This graph shows that no single large outlier accounts for the reduced average life expectancy in 2002. In fact, the minimum life expectancy in 2002 is higher than in 1997. This suggests that a broad shift across countries, rather than one extremely low result, caused the average life expectancy to dip.
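The reasoning behind that reading can be illustrated with made-up numbers. In the sketch below (all values and the helper describe() are hypothetical), two scenarios lower the mean by the same amount: a single collapsing country, and a drop shared by every country. Only the second also lowers the median, and only the first pushes the minimum far down.

```r
# Hypothetical life expectancies for five countries (made-up numbers).
baseline   <- c(48, 50, 52, 54, 56)
oneOutlier <- c(28, 50, 52, 54, 56)  # a single country collapses
broadShift <- baseline - 4           # every country drops by 4 years

# Small helper returning the three summaries of interest.
describe <- function(x) c(mean = mean(x), median = median(x), min = min(x))

rbind(baseline   = describe(baseline),
      oneOutlier = describe(oneOutlier),
      broadShift = describe(broadShift))
```

In the African data the median also falls between 1997 and 2002 (52.76 to 51.24 in the first table), which looks more like the broad-shift pattern than the single-outlier one.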
Again using Sean's code, this time to compare the standard and trimmed means:
lifeExpTrim <- ddply(iDat, ~year, summarize, StandardMean = mean(lifeExp), TrimmedMean = mean(lifeExp,
trim = 0.1), `StandardMean - TrimmedMean` = StandardMean - TrimmedMean)
lifeExpTrim2 <- ddply(iDat, ~year + continent, summarize, StandardMean = mean(lifeExp),
TrimmedMean = mean(lifeExp, trim = 0.1), `StandardMean - TrimmedMean` = StandardMean -
TrimmedMean)
print(lifeExpTrim) # plain data frame print; the xtable-style arguments had no effect here
## year StandardMean TrimmedMean StandardMean - TrimmedMean
## 1 1952 48.77 48.25 0.51942
## 2 1957 51.24 50.95 0.28524
## 3 1962 53.36 53.28 0.07615
## 4 1967 55.45 55.59 -0.13889
## 5 1972 57.44 57.77 -0.32267
## 6 1977 59.38 59.87 -0.49441
## 7 1982 61.35 61.90 -0.54959
## 8 1987 63.04 63.73 -0.68652
## 9 1992 63.98 64.99 -1.00732
## 10 1997 64.83 65.81 -0.98514
## 11 2002 65.49 66.50 -1.00778
## 12 2007 66.81 67.91 -1.09477
print(lifeExpTrim2) # plain data frame print; the xtable-style arguments had no effect here
## year continent StandardMean TrimmedMean StandardMean - TrimmedMean
## 1 1952 Africa 39.14 38.93 0.20252
## 2 1952 Americas 53.28 53.18 0.09827
## 3 1952 Asia 46.31 45.98 0.33332
## 4 1952 Europe 64.41 65.10 -0.68858
## 5 1957 Africa 41.27 40.97 0.29642
## 6 1957 Americas 55.96 56.05 -0.08596
## 7 1957 Asia 49.32 49.16 0.15432
## 8 1957 Europe 66.70 67.31 -0.60456
## 9 1962 Africa 43.32 43.05 0.27125
## 10 1962 Americas 58.40 58.64 -0.24129
## 11 1962 Asia 51.56 51.45 0.10965
## 12 1962 Europe 68.54 69.08 -0.53997
## 13 1967 Africa 45.33 45.12 0.21704
## 14 1967 Americas 60.41 60.75 -0.33994
## 15 1967 Asia 54.66 54.79 -0.12318
## 16 1967 Europe 69.74 70.20 -0.45915
## 17 1972 Africa 47.45 47.26 0.18599
## 18 1972 Americas 62.39 62.86 -0.46589
## 19 1972 Asia 57.32 57.71 -0.39306
## 20 1972 Europe 70.78 71.15 -0.37438
## 21 1977 Africa 49.58 49.35 0.23059
## 22 1977 Americas 64.39 64.87 -0.47477
## 23 1977 Asia 59.61 60.42 -0.80705
## 24 1977 Europe 71.94 72.22 -0.28290
## 25 1982 Africa 51.59 51.32 0.27289
## 26 1982 Americas 66.23 66.67 -0.43740
## 27 1982 Asia 62.62 62.99 -0.37543
## 28 1982 Europe 72.81 73.10 -0.29343
## 29 1987 Africa 53.34 53.01 0.33738
## 30 1987 Americas 68.09 68.55 -0.45695
## 31 1987 Asia 64.85 65.31 -0.46145
## 32 1987 Europe 73.64 73.97 -0.32521
## 33 1992 Africa 53.63 53.71 -0.08088
## 34 1992 Americas 69.57 70.01 -0.43750
## 35 1992 Asia 66.54 66.99 -0.45675
## 36 1992 Europe 74.44 74.73 -0.29186
## 37 1997 Africa 53.60 53.08 0.51455
## 38 1997 Americas 71.15 71.63 -0.47671
## 39 1997 Asia 68.02 68.50 -0.47582
## 40 1997 Europe 75.51 75.77 -0.26858
## 41 2002 Africa 53.33 52.47 0.85056
## 42 2002 Americas 72.42 72.89 -0.46548
## 43 2002 Asia 69.23 69.84 -0.60316
## 44 2002 Europe 76.70 76.89 -0.18903
## 45 2007 Africa 54.81 54.08 0.72851
## 46 2007 Americas 73.61 74.01 -0.40607
## 47 2007 Asia 70.73 71.31 -0.58218
## 48 2007 Europe 77.65 77.83 -0.17644
Here I ran Sean's code on iDat, then made a second version that also groups the results by continent. To illustrate these tables I used the following graphs:
# auto.key adds a legend so the two lines can be told apart
xyplot(TrimmedMean + StandardMean ~ year, lifeExpTrim, type = "l", auto.key = TRUE,
    main = "Trimmed and Standard Mean")
xyplot(TrimmedMean + StandardMean ~ year | continent, lifeExpTrim2, type = "l",
    auto.key = TRUE, main = "Trimmed and Standard Mean per Continent")
These two basic graphs map the trimmed and standard means over time. The first shows the gap between them widening steadily after the mid-1960s, with the standard mean about 1.1 years below the trimmed mean by 2007. I therefore created a second graph that separates the results by continent to see which continents drive this change. By 2007, Africa and Asia have the largest gaps (0.73 and -0.58 years respectively), whilst those for the Americas and Europe are smaller.
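To make the sign of the difference easier to interpret, here is a small worked example of mean(x, trim = 0.1) on made-up values (the vector x is hypothetical). With ten observations, a trim of 0.1 removes the single lowest and single highest value before averaging.

```r
# Made-up values with one very low observation (a long lower tail).
x <- c(30, 60, 62, 64, 66, 68, 70, 72, 74, 76)

standard <- mean(x)              # uses all ten values: 64.2
trimmed  <- mean(x, trim = 0.1)  # drops 30 and 76, averages the rest: 67

c(standard = standard, trimmed = trimmed, difference = standard - trimmed)
```

A negative difference, as in this example and in the later years of the tables above, means that unusually low values are pulling the standard mean down. A positive difference, as for Africa, means that unusually high values are pulling it up.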