STAT545A Homework 4

Jack Ni

Importing the Gapminder dataset from Jenny's website. Doing a quick check to see if the import went fine.

gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Loading the “plyr”, “xtable”, and “lattice” packages.

library(plyr)
library(xtable)
library(lattice)

I am using the data aggregation from Jinyuan's report: mean life expectancy by continent and year, in the “tall” format. The “tall” format is much harder for a person to read and interpret in a table, but it is far easier to make figures and plots with.

ChangeLife <- ddply(gDat, .(continent, year), summarize, MeanLifeExp = mean(lifeExp))
ChangeLife <- within(ChangeLife, continent <- reorder(continent, MeanLifeExp))
ChangeLifeTable <- xtable(ChangeLife)
print(ChangeLifeTable, type = "html", include.rownames = FALSE)
continent year MeanLifeExp
Africa 1952 39.14
Africa 1957 41.27
Africa 1962 43.32
Africa 1967 45.33
Africa 1972 47.45
Africa 1977 49.58
Africa 1982 51.59
Africa 1987 53.34
Africa 1992 53.63
Africa 1997 53.60
Africa 2002 53.33
Africa 2007 54.81
Americas 1952 53.28
Americas 1957 55.96
Americas 1962 58.40
Americas 1967 60.41
Americas 1972 62.39
Americas 1977 64.39
Americas 1982 66.23
Americas 1987 68.09
Americas 1992 69.57
Americas 1997 71.15
Americas 2002 72.42
Americas 2007 73.61
Asia 1952 46.31
Asia 1957 49.32
Asia 1962 51.56
Asia 1967 54.66
Asia 1972 57.32
Asia 1977 59.61
Asia 1982 62.62
Asia 1987 64.85
Asia 1992 66.54
Asia 1997 68.02
Asia 2002 69.23
Asia 2007 70.73
Europe 1952 64.41
Europe 1957 66.70
Europe 1962 68.54
Europe 1967 69.74
Europe 1972 70.78
Europe 1977 71.94
Europe 1982 72.81
Europe 1987 73.64
Europe 1992 74.44
Europe 1997 75.51
Europe 2002 76.70
Europe 2007 77.65
Oceania 1952 69.25
Oceania 1957 70.30
Oceania 1962 71.09
Oceania 1967 71.31
Oceania 1972 71.91
Oceania 1977 72.85
Oceania 1982 74.29
Oceania 1987 75.32
Oceania 1992 76.94
Oceania 1997 78.19
Oceania 2002 79.74
Oceania 2007 80.72
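
For comparison, the same numbers can be spread into a “wide” layout with one column per year. The sketch below uses base R's reshape() on four values copied from the table above. The data frame names `tall` and `wide` are made up for illustration and are not part of the original analysis.

```r
# Four rows copied from the table above, in the "tall" format:
# one row per continent-year combination.
tall <- data.frame(
  continent   = c("Africa", "Africa", "Europe", "Europe"),
  year        = c(1952, 1957, 1952, 1957),
  MeanLifeExp = c(39.14, 41.27, 64.41, 66.70)
)

# Spread to the "wide" format: one row per continent, one column per year.
wide <- reshape(tall, idvar = "continent", timevar = "year", direction = "wide")
print(wide)
```

The wide version is easier to scan by eye, but lattice formulas such as MeanLifeExp ~ continent expect one observation per row, which is what the tall version provides.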

I started off by plotting a stripplot, with continents sorted by increasing mean life expectancy. It's easy to see that Africa has an overall lower mean life expectancy than the other continents. From this, we can also see that Asia and the Americas have a larger variation in mean life expectancy over the years than the other continents.

stripplot(MeanLifeExp ~ continent, ChangeLife, grid = "h", type = c("p", "a"))

plot of chunk unnamed-chunk-4
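
The claim about variation can also be checked numerically from the table above. As a rough sketch, the code below takes the 1952 and 2007 means for each continent (which happen to be the minimum and maximum of each continent's series in the table) and sorts the continents by the difference. The vector names `first`, `last`, and `gain` are illustrative only.

```r
# 1952 and 2007 mean life expectancy per continent, copied from the table above.
first <- c(Africa = 39.14, Americas = 53.28, Asia = 46.31,
           Europe = 64.41, Oceania = 69.25)
last  <- c(Africa = 54.81, Americas = 73.61, Asia = 70.73,
           Europe = 77.65, Oceania = 80.72)

# Change over the whole period, largest first.
gain <- sort(last - first, decreasing = TRUE)
print(round(gain, 2))
```

With the full ChangeLife data frame, a similar summary could presumably be obtained with ddply(ChangeLife, ~continent, summarize, spread = diff(range(MeanLifeExp))).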

Here I made two density plots to give a better picture. The first pools all continents, and we can see that it is slightly bimodal. The second plot, which draws one density per continent, shows the reason why: the lower mean life expectancy for Africa creates the smaller first bump in the first plot. The rest of the continents have more similar life expectancies.

densityplot(~MeanLifeExp, ChangeLife)

plot of chunk unnamed-chunk-5

densityplot(~MeanLifeExp, ChangeLife, plot.points = FALSE, ref = TRUE, groups = continent, 
    auto.key = list(columns = nlevels(ChangeLife$continent)))

plot of chunk unnamed-chunk-5

Here I use a data aggregation result on mean life expectancy and mean population by continent and year (with Oceania dropped). I added a column for the ratio of life expectancy to population, with each mean first divided by the variance of its variable within that continent and year. I wanted to see how life expectancy changed as population changed.

gDat <- droplevels(subset(gDat, continent != "Oceania"))
leAndPopByYearAndCont <- ddply(gDat, ~year + continent, summarize, meanPop = mean(pop), 
    meanLe = mean(lifeExp), ratio = (meanLe/var(lifeExp)/(meanPop/var(pop))))
leAndPopByYearAndContTable <- xtable(leAndPopByYearAndCont)
print(leAndPopByYearAndContTable, type = "html", include.rownames = FALSE)
year continent meanPop meanLe ratio
1952 Africa 4570009.63 39.14 12878230.97
1952 Americas 13806097.84 53.28 46410619.93
1952 Asia 42283556.12 46.31 162647486.72
1952 Europe 13937361.53 64.41 33974226.07
1957 Africa 5093033.42 41.27 12844247.39
1957 Americas 15478156.64 55.96 55955284.34
1957 Asia 47356987.85 49.32 184058507.66
1957 Europe 14596345.03 66.70 51814938.60
1962 Africa 5702247.40 43.32 13935592.34
1962 Americas 17330810.16 58.40 70431446.72
1962 Asia 51404763.09 51.56 192659865.21
1962 Europe 15345171.83 68.54 83981250.16
1967 Africa 6447874.79 45.33 15342943.71
1967 Americas 19229864.92 60.41 88274581.86
1967 Asia 57747360.61 54.66 238971408.90
1967 Europe 16039298.60 69.74 113849032.74
1972 Africa 7305375.79 47.45 16193058.24
1972 Americas 21175368.40 62.39 110947270.12
1972 Asia 65180977.21 57.32 281953855.52
1972 Europe 16687835.30 70.78 164472197.31
1977 Africa 8328096.56 49.58 17238761.53
1977 Americas 23122707.96 64.39 127866860.79
1977 Asia 72257986.55 59.61 301848472.86
1977 Europe 17238817.70 71.94 181101819.19
1982 Africa 9602857.44 51.59 17881422.99
1982 Americas 25211636.80 66.23 153016889.63
1982 Asia 79095017.64 62.62 463590776.05
1982 Europe 17708896.70 72.81 174576635.55
1987 Africa 11054502.12 53.34 18212083.43
1987 Americas 27310158.84 68.09 219669241.33
1987 Asia 87006689.76 64.85 564321955.25
1987 Europe 18103138.67 73.64 184900915.27
1992 Africa 12674644.56 53.63 14580469.18
1992 Americas 29570964.16 69.57 297537724.10
1992 Asia 94948248.21 66.54 644800365.88
1992 Europe 18604759.90 74.44 190137058.54
1997 Africa 14304480.46 53.60 17856657.18
1997 Americas 31876016.40 71.15 359557566.65
1997 Asia 102523803.03 68.02 697515222.10
1997 Europe 18964804.93 75.51 213740856.10
2002 Africa 16033152.23 53.33 18001989.27
2002 Americas 33990910.48 72.42 398021870.44
2002 Asia 109145521.30 69.23 692482261.82
2002 Europe 19274128.97 76.70 251346406.73
2007 Africa 17875763.31 54.81 20523794.07
2007 Americas 35954847.36 73.61 491835901.18
2007 Asia 115513752.33 70.73 810112703.30
2007 Europe 19536617.63 77.65 249827671.88
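
To make the ratio column easier to follow, here is a minimal sketch of the same arithmetic on made-up numbers, in base R only. The data frame `toy` and its values are invented for illustration. Because `/` is left-associative in R, the expression used above is (meanLe / var(lifeExp)) divided by (meanPop / var(pop)).

```r
# Hypothetical three-country group; the values are NOT from Gapminder.
toy <- data.frame(lifeExp = c(40, 50, 60),
                  pop     = c(1e6, 2e6, 6e6))

meanLe  <- mean(toy$lifeExp)  # 50
meanPop <- mean(toy$pop)      # 3e6

# Same expression as in the ddply() call above.
ratio <- meanLe / var(toy$lifeExp) / (meanPop / var(toy$pop))

# Equivalent rearrangement that makes the precedence explicit.
same <- (meanLe * var(toy$pop)) / (var(toy$lifeExp) * meanPop)

print(c(ratio = ratio, same = same))
```

Note that this quantity is not unit-free: it scales with var(pop) / meanPop, which is one reason the values in the table are so large and differ so much between continents.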

There are 4 figures. The first two are box-and-whisker plots of mean life expectancy and mean population by continent. Asia has by far the most variation in mean population, but only slightly more variation in mean life expectancy than the other continents.

bwplot(meanLe ~ continent, leAndPopByYearAndCont)

plot of chunk unnamed-chunk-7

bwplot(meanPop ~ continent, leAndPopByYearAndCont)

plot of chunk unnamed-chunk-7

The 3rd plot is an xyplot of the ratio against year, with one panel per continent. Asia, Europe, and the Americas all have an increasing ratio, which I take to mean that their standards of living (a big factor in life expectancy, in my opinion) are increasing faster than their populations. I made a 4th figure for Africa alone to see whether its slope really is flat or whether the scale of the 3rd figure is too big to show it. The 4th figure shows that Africa follows the same trend.

xyplot(ratio ~ year | factor(continent), leAndPopByYearAndCont, scales = list(y = list(draw = FALSE)))

plot of chunk unnamed-chunk-8

xyplot(ratio ~ year, leAndPopByYearAndCont, subset = continent == "Africa", 
    scales = list(y = list(draw = FALSE)))

plot of chunk unnamed-chunk-8
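
An alternative to drawing a separate figure for Africa is to give every panel its own y-axis via lattice's scales argument. The sketch below is one possible way to do that, using only four rows copied from the ratio table above. The data frame `d` and the object `p` are illustrative names.

```r
library(lattice)

# Four rows copied from the ratio table above (1952 and 2007 only).
d <- data.frame(
  year      = c(1952, 2007, 1952, 2007),
  continent = factor(c("Africa", "Africa", "Asia", "Asia")),
  ratio     = c(12878230.97, 20523794.07, 162647486.72, 810112703.30)
)

# relation = "free" lets each panel choose its own y-limits,
# so a small but real trend is not flattened by a large shared scale.
p <- xyplot(ratio ~ year | continent, d, type = "b",
            scales = list(y = list(relation = "free")))
print(p)
```

The trade-off is that panels can no longer be compared by eye on a common scale, so the shared-scale figure is still worth keeping alongside it.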