STAT 545A Homework 2, Xinxin Xue
For this demographic data set of life expectancy across countries on various continents, I am going to explore: (1) the minimum and maximum life expectancy for a selected continent; (2) the correlation between life expectancy and GDP per capita, for each of the 5 continents.
First, import the data and see how many countries, continents and years were surveyed.
library(lattice)
library(plyr)
library(xtable)
gap <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gap)
names(gDat)
## [1] "country" "year" "pop" "continent" "lifeExp" "gdpPercap"
length(unique(gDat$country))
## [1] 142
length(unique(gDat$continent))
## [1] 5
length(unique(gDat$year))
## [1] 12
What is the situation like in Asia: is the same country consistently yielding the minimum (or maximum) life expectancy? From the output below, we see it is not. But what happened, and why not? Extract the relevant rows from the original data and examine them.
Asia <- subset(gDat, continent == "Asia")
a <- data.frame(ddply(Asia, ~year, summarize, max = max(lifeExp), min = min(lifeExp),
ave = mean(lifeExp)))
a
## year max min ave
## 1 1952 65.39 28.80 46.31
## 2 1957 67.84 30.33 49.32
## 3 1962 69.39 32.00 51.56
## 4 1967 71.43 34.02 54.66
## 5 1972 73.42 36.09 57.32
## 6 1977 75.38 31.22 59.61
## 7 1982 77.11 39.85 62.62
## 8 1987 78.67 40.82 64.85
## 9 1992 79.36 41.67 66.54
## 10 1997 80.69 41.76 68.02
## 11 2002 82.00 42.13 69.23
## 12 2007 82.60 43.83 70.73
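# Compare each Asian row with the minimum for its own year, rather than relying on vector recycling.
low <- subset(Asia, lifeExp == a$min[match(year, a$year)])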
low
## country year pop continent lifeExp gdpPercap
## 1 Afghanistan 1952 8425333 Asia 28.80 779.4
## 2 Afghanistan 1957 9240934 Asia 30.33 820.9
## 3 Afghanistan 1962 10267083 Asia 32.00 853.1
## 4 Afghanistan 1967 11537966 Asia 34.02 836.2
## 5 Afghanistan 1972 13079460 Asia 36.09 740.0
## 7 Afghanistan 1982 12881816 Asia 39.85 978.0
## 8 Afghanistan 1987 13867957 Asia 40.82 852.4
## 9 Afghanistan 1992 16317921 Asia 41.67 649.3
## 10 Afghanistan 1997 22227415 Asia 41.76 635.3
## 11 Afghanistan 2002 25268405 Asia 42.13 726.7
## 12 Afghanistan 2007 31889923 Asia 43.83 974.6
## 222 Cambodia 1977 6978607 Asia 31.22 525.0
af <- subset(gDat, year == 1977 & country == "Afghanistan")
af
## country year pop continent lifeExp gdpPercap
## 6 Afghanistan 1977 14880372 Asia 38.44 786.1
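# Compare each Asian row with the maximum for its own year, rather than relying on vector recycling.
high <- subset(Asia, lifeExp == a$max[match(year, a$year)])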
high
## country year pop continent lifeExp gdpPercap
## 757 Israel 1952 1620914 Asia 65.39 4087
## 758 Israel 1957 1944401 Asia 67.84 5385
## 759 Israel 1962 2310904 Asia 69.39 7106
## 796 Japan 1967 100825279 Asia 71.43 9848
## 797 Japan 1972 107188273 Asia 73.42 14779
## 798 Japan 1977 113872473 Asia 75.38 16610
## 799 Japan 1982 118454974 Asia 77.11 19384
## 800 Japan 1987 122091325 Asia 78.67 22376
## 801 Japan 1992 124329269 Asia 79.36 26825
## 802 Japan 1997 125956499 Asia 80.69 28817
## 803 Japan 2002 127065841 Asia 82.00 28605
## 804 Japan 2007 127467972 Asia 82.60 31656
IJ <- subset(gDat, country %in% c("Israel", "Japan"))
IJ
## country year pop continent lifeExp gdpPercap
## 757 Israel 1952 1620914 Asia 65.39 4087
## 758 Israel 1957 1944401 Asia 67.84 5385
## 759 Israel 1962 2310904 Asia 69.39 7106
## 760 Israel 1967 2693585 Asia 70.75 8394
## 761 Israel 1972 3095893 Asia 71.63 12787
## 762 Israel 1977 3495918 Asia 73.06 13307
## 763 Israel 1982 3858421 Asia 74.45 15367
## 764 Israel 1987 4203148 Asia 75.60 17122
## 765 Israel 1992 4936550 Asia 76.93 18052
## 766 Israel 1997 5531387 Asia 78.27 20897
## 767 Israel 2002 6029529 Asia 79.70 21906
## 768 Israel 2007 6426679 Asia 80.75 25523
## 793 Japan 1952 86459025 Asia 63.03 3217
## 794 Japan 1957 91563009 Asia 65.50 4318
## 795 Japan 1962 95831757 Asia 68.73 6577
## 796 Japan 1967 100825279 Asia 71.43 9848
## 797 Japan 1972 107188273 Asia 73.42 14779
## 798 Japan 1977 113872473 Asia 75.38 16610
## 799 Japan 1982 118454974 Asia 77.11 19384
## 800 Japan 1987 122091325 Asia 78.67 22376
## 801 Japan 1992 124329269 Asia 79.36 26825
## 802 Japan 1997 125956499 Asia 80.69 28817
## 803 Japan 2002 127065841 Asia 82.00 28605
## 804 Japan 2007 127467972 Asia 82.60 31656
It looks like Japan and Israel are simply the two longevity champions, with no particular indication of an unusual historical event. This is unlike the minimum case, where the 1977 switch from Afghanistan to Cambodia is evidence of the genocide in Cambodia (Khmer Rouge).
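The year-by-year comparison above can also be made explicit by asking which country holds the minimum and the maximum in each year. The following is a minimal sketch in base R on an excerpt of the rows printed above (three years, four countries), not on the full Asia subset; the names `ex` and `extremes` are introduced here for illustration only.

```r
# Excerpt of Asia rows, with values copied from the output printed in this report.
ex <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Afghanistan", "Cambodia",
              "Israel", "Israel", "Israel", "Japan", "Japan", "Japan"),
  year = c(1972, 1977, 1982, 1977, 1972, 1977, 1982, 1972, 1977, 1982),
  lifeExp = c(36.09, 38.44, 39.85, 31.22, 71.63, 73.06, 74.45,
              73.42, 75.38, 77.11),
  stringsAsFactors = FALSE
)

# For each year, record which country has the lowest and highest lifeExp.
extremes <- do.call(rbind, lapply(split(ex, ex$year), function(d) {
  data.frame(year = d$year[1],
             minCountry = d$country[which.min(d$lifeExp)],
             maxCountry = d$country[which.max(d$lifeExp)],
             stringsAsFactors = FALSE)
}))
extremes
```

On this excerpt the minimum holder changes to Cambodia only in 1977, while Japan holds the maximum in all three years. On the full data, the same idea could presumably be expressed with `ddply(Asia, ~year, summarize, ...)` using `which.min` and `which.max` on `lifeExp`.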
Next, I want to see whether life expectancy and GDP per capita are strongly related on every continent, by fitting a linear regression of lifeExp on gdpPercap within each continent, and then put the results in an HTML table. (I seem to have a problem getting the right slope; I will consult in class.)
gFun <- function(x) {
est <- coef(lm(lifeExp ~ gdpPercap, x))
names(est) <- c("intercept", "slope")
return(est)
}
foo <- ddply(gDat, ~continent, gFun)
foo
## continent intercept slope
## 1 Africa 45.84 0.0013771
## 2 Americas 58.84 0.0008157
## 3 Asia 57.51 0.0003227
## 4 Europe 65.34 0.0004535
## 5 Oceania 63.69 0.0005709
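The slopes above are small partly because they are expressed per one dollar of GDP per capita, and a regression slope is not itself a correlation. As a hedged sketch of how the two quantities could be reported side by side, the block below computes the Pearson correlation with `cor` and rescales the slope to "per 1000 dollars". It uses only the Israel and Japan rows printed earlier in this report and groups by country rather than continent, because those are the only complete series shown here; `d`, `relFun` and `res` are hypothetical names.

```r
# Israel and Japan rows, with values copied from the output printed in this report.
d <- data.frame(
  country = rep(c("Israel", "Japan"), each = 12),
  lifeExp = c(65.39, 67.84, 69.39, 70.75, 71.63, 73.06,
              74.45, 75.60, 76.93, 78.27, 79.70, 80.75,
              63.03, 65.50, 68.73, 71.43, 73.42, 75.38,
              77.11, 78.67, 79.36, 80.69, 82.00, 82.60),
  gdpPercap = c(4087, 5385, 7106, 8394, 12787, 13307,
                15367, 17122, 18052, 20897, 21906, 25523,
                3217, 4318, 6577, 9848, 14779, 16610,
                19384, 22376, 26825, 28817, 28605, 31656)
)

# Hypothetical helper: correlation plus the fitted slope in two units.
relFun <- function(x) {
  fit <- lm(lifeExp ~ gdpPercap, data = x)
  slope <- unname(coef(fit)[2])
  c(correlation = cor(x$lifeExp, x$gdpPercap),
    slopePerDollar = slope,
    slopePer1000 = slope * 1000)
}

# One row per group, one column per statistic.
res <- t(sapply(split(d, d$country), relFun))
res
```

Rescaling the predictor only changes the unit of the slope, not the fit, so the correlation is the same whichever unit is used.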
And print the results out in an HTML table.
options(rstudio.markdownToHTML = function(inputFile, outputFile) {
require(markdown)
htmlOptions <- markdownHTMLOptions(defaults = TRUE)
pathToCSS <- "C:\\Users\\xue\\Documents\\STATS540\\markdown7.css"
markdownToHTML(inputFile, outputFile, options = htmlOptions, stylesheet = pathToCSS)
})
tb <- xtable(foo)
print(tb, type = "html", include.rownames = TRUE)
| | continent | intercept | slope |
|---|---|---|---|
| 1 | Africa | 45.84 | 0.00 |
| 2 | Americas | 58.84 | 0.00 |
| 3 | Asia | 57.51 | 0.00 |
| 4 | Europe | 65.34 | 0.00 |
| 5 | Oceania | 63.69 | 0.00 |
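The slope column prints as 0.00 because `xtable` rounds numeric columns to two decimal places by default. A possible fix, sketched below, is to pass a `digits` vector with one entry per printed column, including the leading row-name column. The coefficients are copied from the `foo` output above so that the block runs on its own; the choice of 7 decimal places is an assumption made here to match that output.

```r
library(xtable)

# Coefficients copied from the foo output printed above.
foo <- data.frame(
  continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
  intercept = c(45.84, 58.84, 57.51, 65.34, 63.69),
  slope = c(0.0013771, 0.0008157, 0.0003227, 0.0004535, 0.0005709)
)

# digits entries, in order: row names, continent, intercept, slope.
tb <- xtable(foo, digits = c(0, 0, 2, 7))

# Capture the HTML so it can be inspected as well as printed.
html <- capture.output(print(tb, type = "html", include.rownames = TRUE))
cat(html, sep = "\n")
```

With this setting the intercept keeps two decimals while the slope keeps enough digits to be distinguishable across continents.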