STAT 545A Homework 2, Xinxin Xue

For this demographic data set with life expectancy across countries in various continents, I am going to explore (1) the minimum and maximum life expectancy for a selected continent, and (2) the correlation between life expectancy and GDP per capita for each of the 5 continents.

First, import the data and see how many countries, continents, and years were surveyed.

library(lattice)
library(plyr)
library(xtable)
gap <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gap)
names(gDat)
## [1] "country"   "year"      "pop"       "continent" "lifeExp"   "gdpPercap"
country = data.frame(unique(gDat$country))
nrow(country)
## [1] 142
continent = data.frame(unique(gDat$continent))
nrow(continent)
## [1] 5
yr = data.frame(unique(gDat$year))
nrow(yr)
## [1] 12
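
The same counts can be obtained without building an intermediate data frame for each column. A minimal sketch on a small made-up data frame (`toy` and its values are invented for illustration and are not taken from the Gapminder file):

```r
# Toy stand-in for gDat; the rows here are invented for illustration only
toy <- data.frame(
  country   = c("A", "A", "B", "B", "C", "C"),
  continent = c("X", "X", "X", "X", "Y", "Y"),
  year      = c(1952, 1957, 1952, 1957, 1952, 1957)
)

# Number of distinct values in each column of interest
n_distinct <- sapply(toy[c("country", "continent", "year")],
                     function(col) length(unique(col)))
n_distinct
```

Applied to `gDat`, the same call should reproduce the three counts above in one step.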

What is the situation in Asia: does the same country consistently yield the minimum (or maximum) life expectancy? As shown below, it does not. To find out why, extract the relevant rows from the original data and examine them.

Asia <- subset(gDat, continent == "Asia")
a <- data.frame(ddply(Asia, ~year, summarize, max = max(lifeExp), min = min(lifeExp), 
    ave = mean(lifeExp)))
a
##    year   max   min   ave
## 1  1952 65.39 28.80 46.31
## 2  1957 67.84 30.33 49.32
## 3  1962 69.39 32.00 51.56
## 4  1967 71.43 34.02 54.66
## 5  1972 73.42 36.09 57.32
## 6  1977 75.38 31.22 59.61
## 7  1982 77.11 39.85 62.62
## 8  1987 78.67 40.82 64.85
## 9  1992 79.36 41.67 66.54
## 10 1997 80.69 41.76 68.02
## 11 2002 82.00 42.13 69.23
## 12 2007 82.60 43.83 70.73
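# Compare within each year; `lifeExp == a$min` relied on silent vector recycling
low <- subset(Asia, lifeExp == ave(lifeExp, year, FUN = min))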
low
##         country year      pop continent lifeExp gdpPercap
## 1   Afghanistan 1952  8425333      Asia   28.80     779.4
## 2   Afghanistan 1957  9240934      Asia   30.33     820.9
## 3   Afghanistan 1962 10267083      Asia   32.00     853.1
## 4   Afghanistan 1967 11537966      Asia   34.02     836.2
## 5   Afghanistan 1972 13079460      Asia   36.09     740.0
## 7   Afghanistan 1982 12881816      Asia   39.85     978.0
## 8   Afghanistan 1987 13867957      Asia   40.82     852.4
## 9   Afghanistan 1992 16317921      Asia   41.67     649.3
## 10  Afghanistan 1997 22227415      Asia   41.76     635.3
## 11  Afghanistan 2002 25268405      Asia   42.13     726.7
## 12  Afghanistan 2007 31889923      Asia   43.83     974.6
## 222    Cambodia 1977  6978607      Asia   31.22     525.0
af <- subset(gDat, year == 1977 & country == "Afghanistan")
af
##       country year      pop continent lifeExp gdpPercap
## 6 Afghanistan 1977 14880372      Asia   38.44     786.1
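# Compare within each year; `lifeExp == a$max` relied on silent vector recycling
high <- subset(Asia, lifeExp == ave(lifeExp, year, FUN = max))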
high
##     country year       pop continent lifeExp gdpPercap
## 757  Israel 1952   1620914      Asia   65.39      4087
## 758  Israel 1957   1944401      Asia   67.84      5385
## 759  Israel 1962   2310904      Asia   69.39      7106
## 796   Japan 1967 100825279      Asia   71.43      9848
## 797   Japan 1972 107188273      Asia   73.42     14779
## 798   Japan 1977 113872473      Asia   75.38     16610
## 799   Japan 1982 118454974      Asia   77.11     19384
## 800   Japan 1987 122091325      Asia   78.67     22376
## 801   Japan 1992 124329269      Asia   79.36     26825
## 802   Japan 1997 125956499      Asia   80.69     28817
## 803   Japan 2002 127065841      Asia   82.00     28605
## 804   Japan 2007 127467972      Asia   82.60     31656
IJ <- subset(gDat, country %in% c("Israel", "Japan"))
IJ
##     country year       pop continent lifeExp gdpPercap
## 757  Israel 1952   1620914      Asia   65.39      4087
## 758  Israel 1957   1944401      Asia   67.84      5385
## 759  Israel 1962   2310904      Asia   69.39      7106
## 760  Israel 1967   2693585      Asia   70.75      8394
## 761  Israel 1972   3095893      Asia   71.63     12787
## 762  Israel 1977   3495918      Asia   73.06     13307
## 763  Israel 1982   3858421      Asia   74.45     15367
## 764  Israel 1987   4203148      Asia   75.60     17122
## 765  Israel 1992   4936550      Asia   76.93     18052
## 766  Israel 1997   5531387      Asia   78.27     20897
## 767  Israel 2002   6029529      Asia   79.70     21906
## 768  Israel 2007   6426679      Asia   80.75     25523
## 793   Japan 1952  86459025      Asia   63.03      3217
## 794   Japan 1957  91563009      Asia   65.50      4318
## 795   Japan 1962  95831757      Asia   68.73      6577
## 796   Japan 1967 100825279      Asia   71.43      9848
## 797   Japan 1972 107188273      Asia   73.42     14779
## 798   Japan 1977 113872473      Asia   75.38     16610
## 799   Japan 1982 118454974      Asia   77.11     19384
## 800   Japan 1987 122091325      Asia   78.67     22376
## 801   Japan 1992 124329269      Asia   79.36     26825
## 802   Japan 1997 125956499      Asia   80.69     28817
## 803   Japan 2002 127065841      Asia   82.00     28605
## 804   Japan 2007 127467972      Asia   82.60     31656

We see that Israel and Japan are simply the two longevity champions, with no particular indication of an unusual historical event. This is unlike the minimum case, where the 1977 value shows evidence of the genocide in Cambodia (Khmer Rouge).
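
Finding the rows behind each yearly extreme is most robust when the comparison is made within each year rather than against a separate summary vector. A sketch using base R's `ave()` on invented numbers (`toy`, `yearMin`, and `lowest` are hypothetical names, not part of the analysis above):

```r
# Invented data: two countries observed in three years
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(1952, 1957, 1962, 1952, 1957, 1962),
  lifeExp = c(30, 35, 33, 40, 31, 45)
)

# ave() returns, for every row, the minimum lifeExp within that row's year
toy$yearMin <- ave(toy$lifeExp, toy$year, FUN = min)

# Keep the rows that attain their year's minimum
lowest <- toy[toy$lifeExp == toy$yearMin, c("country", "year", "lifeExp")]
lowest
```

Here country B holds the minimum only in 1957, the same kind of one-year switch seen with Cambodia in 1977.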

Next, I want to see whether the correlation between life expectancy and GDP per capita is strong in all continents, and put the results in an HTML table. (I seem to have a problem getting the right slope; I will consult in class.)

gFun <- function(x) {
    est <- coef(lm(lifeExp ~ gdpPercap, x))
    names(est) <- c("intercept", "slope")
    return(est)
}
foo <- ddply(gDat, ~continent, gFun)
foo
##   continent intercept     slope
## 1    Africa     45.84 0.0013771
## 2  Americas     58.84 0.0008157
## 3      Asia     57.51 0.0003227
## 4    Europe     65.34 0.0004535
## 5   Oceania     63.69 0.0005709
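
The slopes above are changes in life expectancy per one unit of `gdpPercap`, so they are small by construction, and a slope describes the fitted line rather than the strength of the association. A hedged sketch of computing a correlation per group with base R, and of rescaling the predictor, on invented data (`toy`, `gdpThousands`, `corByContinent`, and `coefX` are illustrative names):

```r
# Invented data: group X is perfectly linear, group Y is noisy
toy <- data.frame(
  continent = rep(c("X", "Y"), each = 4),
  gdpPercap = c(1000, 2000, 3000, 4000, 1000, 2000, 3000, 4000),
  lifeExp   = c(50, 52, 54, 56, 60, 59, 61, 60)
)

# Pearson correlation between lifeExp and gdpPercap within each continent
corByContinent <- sapply(split(toy, toy$continent),
                         function(d) cor(d$lifeExp, d$gdpPercap))
corByContinent

# Rescaling the predictor changes the slope's units, not the fit:
# the slope per 1000 units of gdpPercap is 1000 times the slope per unit
toy$gdpThousands <- toy$gdpPercap / 1000
coefX <- coef(lm(lifeExp ~ gdpThousands, data = toy[toy$continent == "X", ]))
coefX
```

With `gDat`, the same `split()` and `cor()` pattern would give one correlation per continent alongside the regression coefficients.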

Then print the result as an HTML table.

options(rstudio.markdownToHTML = function(inputFile, outputFile) {
    require(markdown)

    htmlOptions <- markdownHTMLOptions(defaults = TRUE)
    pathToCSS <- "C:\\Users\\xue\\Documents\\STATS540\\markdown7.css"

    markdownToHTML(inputFile, outputFile, options = htmlOptions, stylesheet = pathToCSS)
})
tb <- xtable(foo)
print(tb, type = "html", include.rownames = TRUE)
   continent  intercept  slope
1  Africa     45.84      0.00
2  Americas   58.84      0.00
3  Asia       57.51      0.00
4  Europe     65.34      0.00
5  Oceania    63.69      0.00
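
One possible reason the slopes appear as 0.00 here is that `xtable()` rounds numeric columns to two decimal places by default, while the console output of `foo` shows more digits. A sketch of widening the slope column with the `digits` argument, assuming the `xtable` package is installed; `fooDemo`, `tbDemo`, and `htmlLines` are illustrative names, and the values are copied from the console output rather than refitted:

```r
library(xtable)

# Values copied from the console output of `foo` shown earlier
fooDemo <- data.frame(
  continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
  intercept = c(45.84, 58.84, 57.51, 65.34, 63.69),
  slope     = c(0.0013771, 0.0008157, 0.0003227, 0.0004535, 0.0005709)
)

# digits needs one entry per column plus a leading one for the row names
tbDemo <- xtable(fooDemo, digits = c(0, 0, 2, 7))

# Capture the HTML so it can be inspected or written out
htmlLines <- capture.output(print(tbDemo, type = "html", include.rownames = TRUE))
cat(htmlLines, sep = "\n")
```

Refitting with GDP per capita expressed in thousands would be another way to get slopes that survive two-decimal rounding.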