STAT 545A Homework 4

Yumian Hu

2013-09-29

Load the Gapminder data.
Examine trimmed mean of life expectancy for different years.
How is life expectancy changing over time on different continents?
Depict the maximum and minimum of GDP per capita for all continents
Look at the spread of GDP per capita within the continents in year 2007.
Depict the proportion of countries with low life expectancy over time by continent.
Find countries exhibit extremely rapid and slow life expectancy gains.

Load the Gapminder data.

Gapminder data can be found here.

> gDat <- read.delim("gapminderDataFiveYear.txt")

Perform a superficial check that data import went OK.

> str(gDat)

'data.frame':   1704 obs. of  6 variables:
 $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
 $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
 $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
 $ gdpPercap: num  779 821 853 836 740 ...

Load the lattice, plyr and xtable packages.

> library(xtable)
> library(plyr)
> library(lattice)

Drop Oceania right from the start.

> gDat <- droplevels(subset(gDat, continent != "Oceania"))
> ## Double-check it's really gone
> table(gDat$continent)


  Africa Americas     Asia   Europe 
     624      300      396      360

Examine trimmed mean of life expectancy for different years.

The code below is borrowed from Jenny's menu of data aggregation tasks.

> jTrim <- 0.2 ## the trim level for the trimmed means is 0.2
> foo <- ddply(gDat, ~ year, summarize, tMean = mean(lifeExp, trim = jTrim))
> xyplot(tMean ~ year, foo, grid=T, type=c("p", "l"))

plot of chunk unnamed-chunk-6

The trimmed mean of life expectancy is increasing over year.

How is life expectancy changing over time on different continents?

Plot the gdpPercap over year for each continent, and connect the dots for the year specific median

> bwplot(gdpPercap ~ as.factor(year) | continent, gDat)

plot of chunk unnamed-chunk-7

There is an obvious increase trend of median gdpPercap over year in Europe.

Note there are extreme large values of gdpPercap in Asia so the sacle of the figure is affected by these outliers.

Depict the maximum and minimum of GDP per capita for all continents

The task is inspired by Jonathan Baik's Homework #3

> foo <- ddply(gDat, ~ year + continent, function(x) {
+   gdpPercap <- range(x$gdpPercap)
+   return(data.frame(gdpPercap, stat = c("min", "max")))
+   })
> xyplot(gdpPercap ~ year | continent, foo, group=stat, auto.key = T, type=c("p", "l"))

plot of chunk unnamed-chunk-8

Unlike other continents, the maximum gerPercap of Asia shows a decreasing trend over years and has some extreme large values from year 1952 to 1972.

Look at the spread of GDP per capita within the continents in year 2007.

Focus on the year 2007.

> hDat <- subset(gDat, year==2007)

Boxplot

> bwplot(gdpPercap ~ continent, hDat, 
+        panel = function(..., box.ratio) {
+          panel.violin(..., col = "transparent", border = "grey60", varwidth = FALSE, 
+                       box.ratio = box.ratio)
+          panel.bwplot(..., fill = NULL, box.ratio = 0.1)
+          })

plot of chunk unnamed-chunk-10

Stripplot

> stripplot(gdpPercap ~ continent, hDat, jitter.data = TRUE, grid = "h", type = c("p", "a"), fun=median)

plot of chunk unnamed-chunk-11

Densityplot

> densityplot( ~ gdpPercap, hDat, plot.points = FALSE, ref = TRUE, group = continent, 
+             auto.key = list(columns = nlevels(hDat$continent)))

plot of chunk unnamed-chunk-12

All the three figures agree that Europe has the highest gpdPercap while Africa has the lowest gdpPercap.

Depict the proportion of countries with low life expectancy over time by continent.

The task is inspired by Jenny's menu of data aggregation tasks.

Any life expectancy which is less than the 20th percentile of the whole lifeExp is defined as low life expectancy.

> (bMark <- quantile(gDat$lifeExp, 0.2))

  20% 
45.68

> foo <- ddply(gDat, ~ continent + year, function(x) {
+   c(prop = sum(x$lifeExp <= bMark) / nrow(x))
+ })
> xyplot(prop ~ year, foo, group=continent, auto.key = T, type=c("p", "l"))

plot of chunk unnamed-chunk-13

Find countries exhibit extremely rapid and slow life expectancy gains.

The task is inspired by Sean Jewell's Homework #3.

Fit a linear regression for each country, modelling life expectancy as a function of the year and then retain the estimated intercepts and slopes.

> yearMin <- min(gDat$year)
> jFun <- function(x) {
+   estCoefs <- coef(lm(lifeExp ~ I(year - yearMin), x))
+   names(estCoefs) <- c("intercept", "slope")
+   return(estCoefs)
+ }
> jCoefs <- ddply(gDat, ~ country + continent, jFun)

Report the country with min and max intercept within each continent from the above jCoefs dataset.

> foo <- ddply(jCoefs, ~ continent, function(x) {
+   lerange <- c(which.min(x$slope), which.max(x$slope))
+   cbind(x[lerange, c("continent", "country", "intercept", "slope")], stat=c("min.slope", "max.slope"))
+ })
> print(xtable(foo), type = "html", include.rownames = FALSE, digits = c(0, 0, 2, 2, 0))

continent	country	intercept	slope	stat
Africa	Zimbabwe	55.22	-0.09	min.slope
Africa	Libya	42.10	0.63	max.slope
Americas	Paraguay	62.48	0.16	min.slope
Americas	Nicaragua	43.05	0.56	max.slope
Asia	Iraq	50.11	0.24	min.slope
Asia	Oman	37.21	0.77	max.slope
Europe	Denmark	71.03	0.12	min.slope
Europe	Turkey	46.02	0.50	max.slope

Note the results displayed in the above table is consistent with those in the course website

Subset the whole dataset, only include information about these “interesting” countries and add the description of these fitted slopes (max.slope vs min.slope)

> hitcountry <- foo$country
> iDat <- subset(gDat, country %in% hitcountry)
> iDat <- merge(iDat, foo[, c("country", "stat")], by = "country")

Plot lifeExp over year with fitted linear regression line for the above interesting countries, seperated by different continents.

> xyplot(lifeExp ~ year | continent, iDat, group=stat, auto.key = T, layout = c(2 ,2),
+        panel = panel.superpose,
+        panel.groups=function(x, y, ...){
+            panel.xyplot(x, y, ...)
+            panel.lmline(x, y, ...)
+          }
+        )

plot of chunk unnamed-chunk-17

Look at the country of Irap or Zimbabwe, which has the mininal fitted slope in Asia or Afica. Most of their points don't fall on the fitted regression line, so a polynomial curve may be better fitted.