STAT545-Assignment 4. Daniel Dinsdale

Here is a link to Sean's code and report that I will be using in this homework.

Preliminaries:

Before getting on with the task at hand, I loaded a few packages and prepared the data.

library(plyr)
library(xtable)
library(lattice)
gDat <- read.delim("gapminderDataFiveYear.txt")
iDat <- droplevels(subset(gDat, continent != "Oceania"))  #drops Oceania from rest of work

Note that Oceania will be left out of the output, so I have slightly altered Sean's code to now refer to iDat rather than gDat.
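The call to droplevels() matters because subset() removes rows but keeps every factor level, including ones that no longer have any data, and those empty levels can reappear as empty groups or legend entries later on. Below is a minimal sketch on a made-up data frame; the name toy and its values are hypothetical and are not taken from the Gapminder file.

```r
# Toy data frame (hypothetical values, not the Gapminder data)
toy <- data.frame(
  continent = factor(c("Africa", "Asia", "Oceania", "Europe")),
  lifeExp = c(50, 60, 70, 75)
)

# subset() alone removes the Oceania row but keeps "Oceania" as an empty level
kept <- subset(toy, continent != "Oceania")
levels(kept$continent)

# droplevels() also removes the unused level
dropped <- droplevels(kept)
levels(dropped$continent)
```

Comparing the two levels() calls shows the difference between filtering rows and cleaning up the factor itself.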

How Life Expectancy Changes Over Time

Here is Sean's table on life expectancy:

lifeExpCont <- ddply(iDat, .(continent, year), summarize, meanLifeExp = mean(lifeExp), 
    medianLifeExp = median(lifeExp))
print(lifeExpCont)  # plain data frame print; type/include.rownames only apply to xtable objects
##    continent year meanLifeExp medianLifeExp
## 1     Africa 1952       39.14         38.83
## 2     Africa 1957       41.27         40.59
## 3     Africa 1962       43.32         42.63
## 4     Africa 1967       45.33         44.70
## 5     Africa 1972       47.45         47.03
## 6     Africa 1977       49.58         49.27
## 7     Africa 1982       51.59         50.76
## 8     Africa 1987       53.34         51.64
## 9     Africa 1992       53.63         52.43
## 10    Africa 1997       53.60         52.76
## 11    Africa 2002       53.33         51.24
## 12    Africa 2007       54.81         52.93
## 13  Americas 1952       53.28         54.74
## 14  Americas 1957       55.96         56.07
## 15  Americas 1962       58.40         58.30
## 16  Americas 1967       60.41         60.52
## 17  Americas 1972       62.39         63.44
## 18  Americas 1977       64.39         66.35
## 19  Americas 1982       66.23         67.41
## 20  Americas 1987       68.09         69.50
## 21  Americas 1992       69.57         69.86
## 22  Americas 1997       71.15         72.15
## 23  Americas 2002       72.42         72.05
## 24  Americas 2007       73.61         72.90
## 25      Asia 1952       46.31         44.87
## 26      Asia 1957       49.32         48.28
## 27      Asia 1962       51.56         49.33
## 28      Asia 1967       54.66         53.66
## 29      Asia 1972       57.32         56.95
## 30      Asia 1977       59.61         60.77
## 31      Asia 1982       62.62         63.74
## 32      Asia 1987       64.85         66.30
## 33      Asia 1992       66.54         68.69
## 34      Asia 1997       68.02         70.27
## 35      Asia 2002       69.23         71.03
## 36      Asia 2007       70.73         72.40
## 37    Europe 1952       64.41         65.90
## 38    Europe 1957       66.70         67.65
## 39    Europe 1962       68.54         69.53
## 40    Europe 1967       69.74         70.61
## 41    Europe 1972       70.78         70.89
## 42    Europe 1977       71.94         72.34
## 43    Europe 1982       72.81         73.49
## 44    Europe 1987       73.64         74.81
## 45    Europe 1992       74.44         75.45
## 46    Europe 1997       75.51         76.12
## 47    Europe 2002       76.70         77.54
## 48    Europe 2007       77.65         78.61

To make this large data frame easier to interpret, I included some graphs.

xyplot(meanLifeExp ~ year, lifeExpCont, groups = continent, auto.key = TRUE, 
    xlab = "Year", ylab = "Mean Life Expectancy", main = "Mean Life Exp. per Continent", 
    type = "o")

[Plot: Mean Life Exp. per Continent]

xyplot(medianLifeExp ~ year, lifeExpCont, groups = continent, auto.key = TRUE, 
    xlab = "Year", ylab = "Median Life Expectancy", main = "Median Life Exp. per Continent", 
    type = "o")

[Plot: Median Life Exp. per Continent]

densityplot(~meanLifeExp, data = lifeExpCont, groups = continent, auto.key = TRUE, 
    xlab = "Mean Life Expectancy", main = "Density Plot of Mean Life Expectancy")

[Plot: Density Plot of Mean Life Expectancy]

These three graphs give us an easy way to extract information from the large data table. The mean and median graphs clearly show how life expectancy has changed over time in Africa, the Americas, Asia and Europe. We can immediately see the increasing trend in all continents, as well as the anomaly in Africa, where life expectancy stalls after the late 1980s and dips slightly by 2002. The final graph shows the density of the mean life expectancy for each continent and again shows at a glance that Europe generally has a high life expectancy, whilst Africa has a relatively low one. This graph does, however, lose the year variable, which in my view makes the first two graphs more useful for analysing the table.

One further graph gives more detail on the trend in African life expectancy: for each year, I drew a box plot of African life expectancy overlaid on a violin plot.

# Violin plot with a narrow box plot overlaid, one per year, Africa only
bwplot(lifeExp ~ as.factor(year) | continent, subset(iDat, continent == "Africa"), 
    panel = function(..., box.ratio) {
        panel.violin(..., col = "transparent", border = "grey60", varwidth = FALSE, 
            box.ratio = box.ratio)
        panel.bwplot(..., fill = NULL, box.ratio = 0.1)
    }, xlab = "Year", ylab = "Life Expectancy")  # y-axis is per-country lifeExp, not a mean

[Plot: box and violin plots of African life expectancy by year]

This graph suggests that the reduced average life expectancy in 2002 was not caused by a single large anomaly. In fact, the minimum life expectancy in 2002 is higher than in 1997. This suggests that a broader trend, not a single extremely low result, caused the average life expectancy to dip.
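The reasoning above, that a higher minimum alongside a lower average points to a shift in the bulk of the distribution, can be illustrated with a small sketch. The vectors before and after are invented for illustration and are not African life expectancy values.

```r
# Hypothetical values: 'before' has one extreme low value,
# 'after' has a higher minimum but most values have shifted down
before <- c(36, 50, 52, 55, 58)
after  <- c(39, 47, 49, 52, 55)

# The minimum rises ...
c(before = min(before), after = min(after))

# ... while both the mean and the median fall
c(before = mean(before), after = mean(after))
c(before = median(before), after = median(after))
```

Because the median falls as well as the mean, the dip in this toy example cannot be blamed on one outlier, which is the same pattern the box plots point to.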

Trimmed and Standard Means

Again using Sean's code:

lifeExpTrim <- ddply(iDat, ~year, summarize, StandardMean = mean(lifeExp), TrimmedMean = mean(lifeExp, 
    trim = 0.1), `StandardMean - TrimmedMean` = StandardMean - TrimmedMean)
lifeExpTrim2 <- ddply(iDat, ~year + continent, summarize, StandardMean = mean(lifeExp), 
    TrimmedMean = mean(lifeExp, trim = 0.1), `StandardMean - TrimmedMean` = StandardMean - 
        TrimmedMean)
print(lifeExpTrim)  # plain data frame print; type/include.rownames only apply to xtable objects
##    year StandardMean TrimmedMean StandardMean - TrimmedMean
## 1  1952        48.77       48.25                    0.51942
## 2  1957        51.24       50.95                    0.28524
## 3  1962        53.36       53.28                    0.07615
## 4  1967        55.45       55.59                   -0.13889
## 5  1972        57.44       57.77                   -0.32267
## 6  1977        59.38       59.87                   -0.49441
## 7  1982        61.35       61.90                   -0.54959
## 8  1987        63.04       63.73                   -0.68652
## 9  1992        63.98       64.99                   -1.00732
## 10 1997        64.83       65.81                   -0.98514
## 11 2002        65.49       66.50                   -1.00778
## 12 2007        66.81       67.91                   -1.09477
print(lifeExpTrim2)  # plain data frame print; type/include.rownames only apply to xtable objects
##    year continent StandardMean TrimmedMean StandardMean - TrimmedMean
## 1  1952    Africa        39.14       38.93                    0.20252
## 2  1952  Americas        53.28       53.18                    0.09827
## 3  1952      Asia        46.31       45.98                    0.33332
## 4  1952    Europe        64.41       65.10                   -0.68858
## 5  1957    Africa        41.27       40.97                    0.29642
## 6  1957  Americas        55.96       56.05                   -0.08596
## 7  1957      Asia        49.32       49.16                    0.15432
## 8  1957    Europe        66.70       67.31                   -0.60456
## 9  1962    Africa        43.32       43.05                    0.27125
## 10 1962  Americas        58.40       58.64                   -0.24129
## 11 1962      Asia        51.56       51.45                    0.10965
## 12 1962    Europe        68.54       69.08                   -0.53997
## 13 1967    Africa        45.33       45.12                    0.21704
## 14 1967  Americas        60.41       60.75                   -0.33994
## 15 1967      Asia        54.66       54.79                   -0.12318
## 16 1967    Europe        69.74       70.20                   -0.45915
## 17 1972    Africa        47.45       47.26                    0.18599
## 18 1972  Americas        62.39       62.86                   -0.46589
## 19 1972      Asia        57.32       57.71                   -0.39306
## 20 1972    Europe        70.78       71.15                   -0.37438
## 21 1977    Africa        49.58       49.35                    0.23059
## 22 1977  Americas        64.39       64.87                   -0.47477
## 23 1977      Asia        59.61       60.42                   -0.80705
## 24 1977    Europe        71.94       72.22                   -0.28290
## 25 1982    Africa        51.59       51.32                    0.27289
## 26 1982  Americas        66.23       66.67                   -0.43740
## 27 1982      Asia        62.62       62.99                   -0.37543
## 28 1982    Europe        72.81       73.10                   -0.29343
## 29 1987    Africa        53.34       53.01                    0.33738
## 30 1987  Americas        68.09       68.55                   -0.45695
## 31 1987      Asia        64.85       65.31                   -0.46145
## 32 1987    Europe        73.64       73.97                   -0.32521
## 33 1992    Africa        53.63       53.71                   -0.08088
## 34 1992  Americas        69.57       70.01                   -0.43750
## 35 1992      Asia        66.54       66.99                   -0.45675
## 36 1992    Europe        74.44       74.73                   -0.29186
## 37 1997    Africa        53.60       53.08                    0.51455
## 38 1997  Americas        71.15       71.63                   -0.47671
## 39 1997      Asia        68.02       68.50                   -0.47582
## 40 1997    Europe        75.51       75.77                   -0.26858
## 41 2002    Africa        53.33       52.47                    0.85056
## 42 2002  Americas        72.42       72.89                   -0.46548
## 43 2002      Asia        69.23       69.84                   -0.60316
## 44 2002    Europe        76.70       76.89                   -0.18903
## 45 2007    Africa        54.81       54.08                    0.72851
## 46 2007  Americas        73.61       74.01                   -0.40607
## 47 2007      Asia        70.73       71.31                   -0.58218
## 48 2007    Europe        77.65       77.83                   -0.17644

Here I ran Sean's code on iDat and then added a second version that also groups by continent. To illustrate these tables I used the following graphs:

xyplot(TrimmedMean + StandardMean ~ year, lifeExpTrim, type = "l", auto.key = TRUE, 
    xlab = "Year", ylab = "Mean Life Expectancy", main = "Trimmed and Standard Mean")

[Plot: Trimmed and Standard Mean]

xyplot(TrimmedMean + StandardMean ~ year | continent, lifeExpTrim2, type = "l", 
    auto.key = TRUE, xlab = "Year", ylab = "Mean Life Expectancy", 
    main = "Trimmed and Standard Mean per Continent")

[Plot: Trimmed and Standard Mean per Continent]

These first two basic graphs map out the change in trimmed and standard mean over time. The first graph shows the trimmed mean overtaking the standard mean in the mid 1960s, with the gap then widening to about one year by 2007. I therefore created another graph that separates the results by continent to see which continents drove this change. By 2007, Africa and Asia showed the largest disparities between standard and trimmed mean (about 0.73 and -0.58 years respectively), whilst those for Europe and the Americas were smaller (about -0.18 and -0.41 years).
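The sign of StandardMean - TrimmedMean in the tables above can be read as a rough indicator of skew. mean(x, trim = 0.1) discards the lowest and highest 10% of observations before averaging, so a long low tail makes the difference negative and a long high tail makes it positive. The sketch below uses two invented vectors and a hypothetical helper, meanGap, neither of which comes from the original analysis.

```r
# Hypothetical helper: standard mean minus 10% trimmed mean
meanGap <- function(x, trim = 0.1) {
  mean(x) - mean(x, trim = trim)
}

# Invented data with one unusually low value (long low tail)
lowTail <- c(30, 60, 62, 64, 66, 68, 70, 72, 74, 76)

# Invented data with one unusually high value (long high tail)
highTail <- c(40, 42, 44, 46, 48, 50, 52, 54, 56, 80)

# With 10 values and trim = 0.1, one value is dropped from each end
meanGap(lowTail)   # negative: the low value pulls the standard mean down
meanGap(highTail)  # positive: the high value pulls the standard mean up
```

On this reading, the mostly negative differences for the Americas, Asia and Europe would be consistent with a few countries lagging well below the rest, and the mostly positive differences for Africa with a few countries well above the rest, although the trimmed mean alone cannot confirm this.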