STAT 545A Homework 3

Jack Ni

I import the Gapminder dataset from Jenny's website and run str() as a quick check that the import went fine.

gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
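
A slightly stricter check than reading the str() output could assert the properties the rest of the analysis relies on. The following is a minimal sketch and not part of the original homework: checkImport is a hypothetical helper name, and toyDat is a toy data frame with made-up values that stands in for gDat so the block runs without a network connection.

```r
# Hypothetical helper: fail loudly if a data frame does not look like the
# Gapminder import described by the str() output above.
checkImport <- function(df) {
  expected <- c("country", "year", "pop", "continent", "lifeExp", "gdpPercap")
  stopifnot(
    identical(names(df), expected),  # same six columns, in the same order
    !any(is.na(df)),                 # no missing values anywhere
    all(df$lifeExp > 0),             # life expectancy must be positive
    all(df$gdpPercap > 0)            # GDP per capita must be positive
  )
  invisible(TRUE)
}

# Toy stand-in for gDat (values are made up for illustration only)
toyDat <- data.frame(
  country = c("A", "B"),
  year = c(1952L, 1952L),
  pop = c(1e6, 2e6),
  continent = c("X", "Y"),
  lifeExp = c(50.1, 60.2),
  gdpPercap = c(800, 5000)
)
checkImport(toyDat)
```

On the real data the call would presumably be checkImport(gDat); any failed condition stops with an error naming the check that failed.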

Loading the “plyr”, “xtable”, and “lattice” packages.

library(plyr)
library(xtable)
library(lattice)

Here, I find the maximum and minimum GDP per capita for each continent. The data frame is sorted by maximum GDP per capita in ascending order. Looking at the data, there seems to be a general pattern in which a continent with a higher maximum GDP per capita also has a lower minimum. Africa is the exception: it has both the lowest maximum and the lowest minimum. Since the GDP per capita records are taken by country and by year, this means that the gap between countries, between years, or both is larger in Asia and Europe than in Oceania and the Americas.

maxMinGdpByCont <- ddply(gDat, ~continent, summarize, maxGdpPerCap = max(gdpPercap), 
    minGdpPerCap = min(gdpPercap))
maxMinGdpByCont <- arrange(maxMinGdpByCont, maxGdpPerCap)
colnames(maxMinGdpByCont) <- c("continent", "max GDP per capita", "min GDP per capita")
maxMinGdpByCont <- xtable(maxMinGdpByCont)
print(maxMinGdpByCont, type = "html", include.rownames = FALSE)
continent max GDP per capita min GDP per capita
Africa 21951.21 241.17
Oceania 34435.37 10039.60
Americas 42951.65 1201.64
Europe 49357.19 973.53
Asia 113523.13 331.00
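
To tell whether a continent's gap comes from differences between countries or between years, one could look up which country and year produce each extreme. The following is a minimal base R sketch on a toy data frame, not the original analysis: the values are made up, and the names toyGdp, extremes and gapRatio are hypothetical. Applying the same lines to one continent's subset of gDat would show where its minimum and maximum come from.

```r
# Toy data with made-up values; the real call would use a subset of gDat
toyGdp <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(1952, 2007, 1952, 2007),
  gdpPercap = c(500, 900, 3000, 12000)
)

# Rows holding the smallest and the largest GDP per capita
extremes <- toyGdp[c(which.min(toyGdp$gdpPercap), which.max(toyGdp$gdpPercap)), ]
extremes$statistic <- c("min", "max")
print(extremes)

# Ratio of largest to smallest value, a scale-free measure of the gap
gapRatio <- max(toyGdp$gdpPercap) / min(toyGdp$gdpPercap)
print(gapRatio)
```

In this toy case the minimum and the maximum come from different countries and different years, so both sources contribute to the gap.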

This expands upon the above example. Each min and max value has a separate row and is labelled as such.

minMaxGdp <- function(x) {
    # Return a data frame rather than a character matrix so that the values
    # stay numeric instead of being coerced to strings next to "Min"/"Max"
    data.frame(statistic = c("Min", "Max"), value = c(min(x$gdpPercap), max(x$gdpPercap)))
}

minMaxGdpByContSep <- ddply(gDat, ~continent, minMaxGdp)
minMaxGdpByContSep <- xtable(minMaxGdpByContSep)
print(minMaxGdpByContSep, type = "html", include.rownames = FALSE)
continent statistic value
Africa Min 241.17
Africa Max 21951.21
Americas Min 1201.64
Americas Max 42951.65
Asia Min 331.00
Asia Max 113523.13
Europe Min 973.53
Europe Max 49357.19
Oceania Min 10039.60
Oceania Max 34435.37

The mean life expectancy across countries (unweighted by population) increases in every survey year from 1952 to 2007. Removing the lowest and highest 10% of life expectancy records gives a slightly lower mean through 1962; from 1967 onward the trimmed mean is higher.

meanLeByYear <- ddply(gDat, ~year, summarize, meanLe = mean(lifeExp), trimMeanLe = mean(lifeExp, 
    trim = 0.1))
colnames(meanLeByYear) <- c("year", "mean", "trimmed_mean")
meanLeByYear <- xtable(meanLeByYear)
print(meanLeByYear, type = "html", include.rownames = FALSE)
year mean trimmed_mean
1952 49.06 48.58
1957 51.51 51.27
1962 53.61 53.58
1967 55.68 55.87
1972 57.65 58.01
1977 59.57 60.10
1982 61.53 62.12
1987 63.21 63.92
1992 64.16 65.19
1997 65.01 66.02
2002 65.69 66.72
2007 67.01 68.11
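
As a small illustration of what trim = 0.1 does, the sketch below applies mean() to ten made-up numbers (not Gapminder data). With ten values, a trim of 0.1 drops one observation from each end before averaging. Because the plain mean is a weighted average of the trimmed mean and the mean of the discarded values, a trimmed mean above the plain mean indicates that the discarded values average below it, and vice versa.

```r
# Ten made-up values with one large outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

plainMean <- mean(x)             # uses all ten values
trimMean <- mean(x, trim = 0.1)  # drops floor(10 * 0.1) = 1 value from each end

# The trimmed mean equals the plain mean of the sorted values without the ends
manualTrim <- mean(sort(x)[2:9])

print(c(plain = plainMean, trimmed = trimMean, manual = manualTrim))
```

Here the high outlier pulls the plain mean (14.5) well above the trimmed mean (5.5).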

This shows the average life expectancy by year for each continent. The average increases from one survey year to the next in every continent except Africa, where it stalls and dips slightly between 1992 and 2002 before rising again in 2007. However, the data is hard to assess at a glance in this long layout.

leByContYear <- ddply(gDat, .(continent, year), summarize, meanLe = mean(lifeExp))
colnames(leByContYear) <- c("continent", "year", "mean life expectancy")
leByContYear <- xtable(leByContYear)
print(leByContYear, type = "html", include.rownames = FALSE)
continent year mean life expectancy
Africa 1952 39.14
Africa 1957 41.27
Africa 1962 43.32
Africa 1967 45.33
Africa 1972 47.45
Africa 1977 49.58
Africa 1982 51.59
Africa 1987 53.34
Africa 1992 53.63
Africa 1997 53.60
Africa 2002 53.33
Africa 2007 54.81
Americas 1952 53.28
Americas 1957 55.96
Americas 1962 58.40
Americas 1967 60.41
Americas 1972 62.39
Americas 1977 64.39
Americas 1982 66.23
Americas 1987 68.09
Americas 1992 69.57
Americas 1997 71.15
Americas 2002 72.42
Americas 2007 73.61
Asia 1952 46.31
Asia 1957 49.32
Asia 1962 51.56
Asia 1967 54.66
Asia 1972 57.32
Asia 1977 59.61
Asia 1982 62.62
Asia 1987 64.85
Asia 1992 66.54
Asia 1997 68.02
Asia 2002 69.23
Asia 2007 70.73
Europe 1952 64.41
Europe 1957 66.70
Europe 1962 68.54
Europe 1967 69.74
Europe 1972 70.78
Europe 1977 71.94
Europe 1982 72.81
Europe 1987 73.64
Europe 1992 74.44
Europe 1997 75.51
Europe 2002 76.70
Europe 2007 77.65
Oceania 1952 69.25
Oceania 1957 70.30
Oceania 1962 71.09
Oceania 1967 71.31
Oceania 1972 71.91
Oceania 1977 72.85
Oceania 1982 74.29
Oceania 1987 75.32
Oceania 1992 76.94
Oceania 1997 78.19
Oceania 2002 79.74
Oceania 2007 80.72
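
One way to make such a table easier to scan is to put the years in rows and the continents in columns. The following is a minimal base R sketch of that reshaping, using tapply() rather than plyr; it uses only six values copied from the table above, and the names longLe, wideLe and changeLe are hypothetical.

```r
# A few rows copied from the table above, in the same long layout
longLe <- data.frame(
  continent = rep(c("Africa", "Europe"), each = 3),
  year = rep(c(1992, 1997, 2002), times = 2),
  meanLe = c(53.63, 53.60, 53.33, 74.44, 75.51, 76.70)
)

# One row per year, one column per continent
wideLe <- with(longLe, tapply(meanLe, list(year = year, continent = continent), mean))
print(wideLe)

# Change between consecutive survey years within each continent
changeLe <- apply(wideLe, 2, diff)
print(changeLe)
```

In the wide layout the differences per column make it easy to see that the Africa values fall over these years while the Europe values rise.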

This shows the number of countries in a given continent in a specific year whose life expectancy is lower than 65, our retirement age. Africa and Asia stand out as having the most such countries.

leCounByContYear <- ddply(gDat, .(continent, year), summarize, countryCountOfLe = sum(lifeExp < 
    65))
# The third column is a count of countries, not a life expectancy, so label it accordingly
colnames(leCounByContYear) <- c("continent", "year", "countries below 65")
leCounByContYear <- xtable(leCounByContYear)
print(leCounByContYear, type = "html", include.rownames = FALSE)
continent year countries below 65
Africa 1952 52
Africa 1957 52
Africa 1962 52
Africa 1967 52
Africa 1972 52
Africa 1977 51
Africa 1982 50
Africa 1987 47
Africa 1992 46
Africa 1997 45
Africa 2002 45
Africa 2007 43
Americas 1952 22
Americas 1957 21
Americas 1962 18
Americas 1967 16
Americas 1972 13
Americas 1977 11
Americas 1982 10
Americas 1987 7
Americas 1992 3
Americas 1997 2
Americas 2002 2
Americas 2007 1
Asia 1952 32
Asia 1957 31
Asia 1962 28
Asia 1967 28
Asia 1972 25
Asia 1977 22
Asia 1982 20
Asia 1987 13
Asia 1992 11
Asia 1997 10
Asia 2002 9
Asia 2007 8
Europe 1952 13
Europe 1957 8
Europe 1962 6
Europe 1967 2
Europe 1972 1
Europe 1977 1
Europe 1982 1
Europe 1987 1
Europe 1992 0
Europe 1997 0
Europe 2002 0
Europe 2007 0
Oceania 1952 0
Oceania 1957 0
Oceania 1962 0
Oceania 1967 0
Oceania 1972 0
Oceania 1977 0
Oceania 1982 0
Oceania 1987 0
Oceania 1992 0
Oceania 1997 0
Oceania 2002 0
Oceania 2007 0
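
The count works because sum() treats TRUE as 1 and FALSE as 0. The sketch below shows this on four made-up life expectancies (not Gapminder data) and also computes the share of countries below the threshold with mean(). Since continents contain different numbers of countries, a share may be easier to compare across continents than a raw count. The variable names are hypothetical.

```r
# Made-up life expectancies for four countries in one continent and year
le <- c(60, 70, 64.9, 65)

below <- le < 65           # TRUE FALSE TRUE FALSE; 65 itself is not below 65
nBelow <- sum(below)       # TRUE counts as 1, so this is the number of countries
shareBelow <- mean(below)  # proportion of countries below the threshold

print(c(count = nBelow, share = shareBelow))
```

Inside the ddply call, the same idea would presumably be written as mean(lifeExp < 65) next to the existing sum(lifeExp < 65).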

Here, the data gives the maximum life expectancy in each continent in each year, along with the country that attains it.

leByContYearCountry <- ddply(gDat, .(continent, year), summarize, maxLe = max(lifeExp), 
    country = country[which.max(lifeExp)])
colnames(leByContYearCountry) <- c("continent", "year", "max life expectancy", 
    "country")
leByContYearCountry <- xtable(leByContYearCountry)
print(leByContYearCountry, type = "html", include.rownames = FALSE)
continent year max life expectancy country
Africa 1952 52.72 Reunion
Africa 1957 58.09 Mauritius
Africa 1962 60.25 Mauritius
Africa 1967 61.56 Mauritius
Africa 1972 64.27 Reunion
Africa 1977 67.06 Reunion
Africa 1982 69.89 Reunion
Africa 1987 71.91 Reunion
Africa 1992 73.61 Reunion
Africa 1997 74.77 Reunion
Africa 2002 75.74 Reunion
Africa 2007 76.44 Reunion
Americas 1952 68.75 Canada
Americas 1957 69.96 Canada
Americas 1962 71.30 Canada
Americas 1967 72.13 Canada
Americas 1972 72.88 Canada
Americas 1977 74.21 Canada
Americas 1982 75.76 Canada
Americas 1987 76.86 Canada
Americas 1992 77.95 Canada
Americas 1997 78.61 Canada
Americas 2002 79.77 Canada
Americas 2007 80.65 Canada
Asia 1952 65.39 Israel
Asia 1957 67.84 Israel
Asia 1962 69.39 Israel
Asia 1967 71.43 Japan
Asia 1972 73.42 Japan
Asia 1977 75.38 Japan
Asia 1982 77.11 Japan
Asia 1987 78.67 Japan
Asia 1992 79.36 Japan
Asia 1997 80.69 Japan
Asia 2002 82.00 Japan
Asia 2007 82.60 Japan
Europe 1952 72.67 Norway
Europe 1957 73.47 Iceland
Europe 1962 73.68 Iceland
Europe 1967 74.16 Sweden
Europe 1972 74.72 Sweden
Europe 1977 76.11 Iceland
Europe 1982 76.99 Iceland
Europe 1987 77.41 Switzerland
Europe 1992 78.77 Iceland
Europe 1997 79.39 Sweden
Europe 2002 80.62 Switzerland
Europe 2007 81.76 Iceland
Oceania 1952 69.39 New Zealand
Oceania 1957 70.33 Australia
Oceania 1962 71.24 New Zealand
Oceania 1967 71.52 New Zealand
Oceania 1972 71.93 Australia
Oceania 1977 73.49 Australia
Oceania 1982 74.74 Australia
Oceania 1987 76.32 Australia
Oceania 1992 77.56 Australia
Oceania 1997 78.83 Australia
Oceania 2002 80.37 Australia
Oceania 2007 81.23 Australia
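
Note that which.max() returns only the position of the first maximum, so a tie between two countries would silently report just one of them. The sketch below shows the difference on a toy data frame with made-up values; the names toyLe, firstMax and allMax are hypothetical.

```r
# Made-up values with a tie for the maximum
toyLe <- data.frame(
  country = c("A", "B", "C"),
  lifeExp = c(70, 82, 82)
)

# which.max() stops at the first maximum
firstMax <- toyLe$country[which.max(toyLe$lifeExp)]

# Comparing against max() keeps every tied country
allMax <- toyLe$country[toyLe$lifeExp == max(toyLe$lifeExp)]

print(firstMax)
print(allMax)
```

With life expectancies recorded to several decimal places, exact ties are unlikely, so the which.max() approach used above is a reasonable choice here.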

Worked in collaboration with Jonathan Zhang