Homework #4 Visualize a Quantitative Variable with lattice

Huiting Ma

This homework has two parts: importing the data and visualizing a quantitative variable.

Data Import

First, import the Gapminder data from the URL in the code below. Also make sure that you are working in the correct directory on your computer.

# setwd("C:/Users/user/Desktop/UBC/STAT545")
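gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
# stringsAsFactors = TRUE keeps country and continent as factors on R >= 4.0.0 too
gDat <- read.delim(gdURL, stringsAsFactors = TRUE)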

Now, let us check whether the dataset was imported correctly.

str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
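
A quick way to double-check an import like this is to assert the expected structure instead of only reading the str() output. The sketch below is an illustration only: it parses three inline tab-delimited rows (values rounded as in the str() output above) rather than the real file, and the names txt, toy, and expected are hypothetical. It passes stringsAsFactors = TRUE explicitly because, from R 4.0.0 on, read.delim() no longer converts strings to factors by default, so the Factor columns shown above would otherwise come back as character.

```r
# Toy stand-in for the Gapminder file: three tab-delimited rows for Afghanistan.
# The numbers are the rounded values printed by str() above, not the exact data.
txt <- paste(
  "country\tyear\tpop\tcontinent\tlifeExp\tgdpPercap",
  "Afghanistan\t1952\t8425333\tAsia\t28.8\t779",
  "Afghanistan\t1957\t9240934\tAsia\t30.3\t821",
  "Afghanistan\t1962\t10267083\tAsia\t32.0\t853",
  sep = "\n"
)

# read.delim() accepts inline text through the 'text' argument of read.table()
toy <- read.delim(text = txt, stringsAsFactors = TRUE)

# Fail loudly if the columns are not the six we expect
expected <- c("country", "year", "pop", "continent", "lifeExp", "gdpPercap")
stopifnot(identical(names(toy), expected))
stopifnot(is.factor(toy$country), is.factor(toy$continent))
stopifnot(is.numeric(toy$lifeExp), is.numeric(toy$gdpPercap))
```

The same stopifnot() calls could be run against gDat right after the import.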

OK, the dataset looks like what we want! Next, we can load the lattice, plyr, and xtable packages.

library(lattice)
## Warning: package 'lattice' was built under R version 3.0.2
library(plyr)
## Warning: package 'plyr' was built under R version 3.0.2
library(xtable)

Visualize a Quantitative Variable

Now, let us start our journey. While analyzing the dataset in the last homework, I noticed that there are only two measly countries in Oceania. I would like to drop these observations and focus my analysis on the other continents.

iDat <- droplevels(subset(gDat, continent != "Oceania"))

Let us check whether we did it!

table(iDat$continent)
## 
##   Africa Americas     Asia   Europe 
##      624      300      396      360

Yes! Oceania has been dropped successfully.
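
The droplevels() call matters because subset() removes the rows but keeps the unused factor level, which would otherwise show up as an empty category in tables and plots. Here is a minimal sketch on made-up data (the data frame toy and its value column are hypothetical):

```r
# Made-up data: five rows, two of them from Oceania
toy <- data.frame(
  continent = factor(c("Africa", "Asia", "Oceania", "Europe", "Oceania")),
  value = 1:5
)

# subset() drops the Oceania rows but keeps "Oceania" as a factor level
kept <- subset(toy, continent != "Oceania")
levels(kept$continent)   # still lists "Oceania"
table(kept$continent)    # Oceania appears with a count of 0

# droplevels() removes levels that no longer occur in the data
cleaned <- droplevels(kept)
levels(cleaned$continent)
```

Without droplevels(), lattice would still reserve an empty slot for Oceania on the continent axis.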

Depict the Maximum and Minimum of GDP/Capita for all Continents

The first plot I will introduce shows the maximum and minimum GDP/capita for each continent. The code I use is adapted from Sean and JB. However, instead of gDat, I use the iDat data frame defined above. Let us first recreate the table from the last assignment.

printTable <- function(df) {
  print(xtable(df), type = "html", include.rownames = FALSE)
}

To simplify the code, Sean wrote a helper function. Smart! JB does the same.

minMaxGDPwide <- ddply(iDat, ~ continent, summarize,
                       minGDP = min(gdpPercap), maxGDP = max(gdpPercap))
minMaxGDPwide <- arrange(minMaxGDPwide, minGDP)  # sort on minGDP
## However, this is a "wide" format. Let us use JB's code to build the table in "tall" format.

minMaxGDPwideT <- ddply(iDat, ~ continent, function(x) {
  gdpPercap <- range(x$gdpPercap)  # c(min, max) for this continent
  data.frame(gdpPercap, stat = c("min", "max"))
})
printTable(minMaxGDPwideT)
continent  gdpPercap  stat
Africa        241.17  min
Africa      21951.21  max
Americas     1201.64  min
Americas    42951.65  max
Asia          331.00  min
Asia       113523.13  max
Europe        973.53  min
Europe      49357.19  max

xyplot(gdpPercap ~ continent, data = arrange(minMaxGDPwideT, stat, gdpPercap),
       groups = stat, auto.key = list(space = "right"),
       main = "Depict the Maximum and Minimum of GDP/Capita for all Continents")

[Figure: minimum and maximum GDP/capita by continent]
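
The tall format is what lets groups = stat work: each row carries one value plus a label saying whether it is a minimum or a maximum. As a rough base R sketch of the same wide-to-tall step (not the plyr code used above), the snippet below rebuilds the tall table from the rounded values printed in the table; the names wide and tall are hypothetical.

```r
# Wide format: one row per continent, one column per statistic
# (rounded values copied from the printed table above)
wide <- data.frame(
  continent = c("Africa", "Americas", "Asia", "Europe"),
  minGDP = c(241.17, 1201.64, 331.00, 973.53),
  maxGDP = c(21951.21, 42951.65, 113523.13, 49357.19)
)

# Tall format: stack the two statistic columns and label each row
tall <- rbind(
  data.frame(continent = wide$continent, gdpPercap = wide$minGDP, stat = "min"),
  data.frame(continent = wide$continent, gdpPercap = wide$maxGDP, stat = "max")
)

# Order rows by continent, then by value, to match the printed table
tall <- tall[order(tall$continent, tall$gdpPercap), ]
tall
```

With one value column and one label column, the same data frame can feed any lattice call that takes a groups argument.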

However, the graph above alone does not give us much information, because it shows only two points per continent. Let us try to plot all observations from each continent.

stripplot(gdpPercap ~ continent, iDat, jitter.data = TRUE, grid = "h",
          type = c("p", "a"), fun = max, col = "black")

[Figure: strip plot of GDP/capita by continent, with a line through each continent's maximum]

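Based on the table, I noticed that the minimum GDP/capita is similar across continents: the values range from about 241 to about 1,202, which is close to 0 on the scale of the plot. Therefore, I drew a line (type "a" with fun = max) to show how the maximum GDP/capita changes from continent to continent.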
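
To make explicit what that line connects, here is a small base R sketch that computes one maximum (and one minimum) per continent with tapply(). The data frame toy is hypothetical and holds only rounded extremes from the table above for three continents, not the full dataset.

```r
# Hypothetical toy data: two rounded observations per continent
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Europe", "Europe"),
  gdpPercap = c(241, 21951, 331, 113523, 974, 49357)
)

# One summary value per continent; type = "a" with fun = max joins these maxima
maxByContinent <- tapply(toy$gdpPercap, toy$continent, max)
minByContinent <- tapply(toy$gdpPercap, toy$continent, min)

# The minima are tiny relative to the largest maximum, hence "close to 0"
round(minByContinent / max(maxByContinent), 4)
```

Swapping max for mean or median in the tapply() call gives the values that the line would follow with a different fun argument.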

Describe the Spread of GDP/Capita within the Continents

spreadGDP <- ddply(gDat, ~continent, summarize, SD = sd(gdpPercap), MAD = mad(gdpPercap), 
    IQR = IQR(gdpPercap))
spreadGDP <- arrange(spreadGDP, SD)
printTable(spreadGDP)
continent        SD      MAD       IQR
Africa      2827.93   775.32   1616.17
Oceania     6358.98  6459.10   8072.26
Americas    6396.76  3269.33   4402.43
Europe      9355.21  8846.05  13248.30
Asia       14045.37  2820.83   7492.26

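To describe the spread of GDP/capita within the continents, the code above measures the standard deviation (SD), the median absolute deviation (MAD), and the interquartile range (IQR). Note that this table is computed from gDat, so Oceania still appears in it; the plots below use iDat. However, for this question, a table alone cannot give a vivid description.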
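
The three measures can disagree sharply, as they do for Asia (largest SD, but a modest MAD). One plausible reason is that the SD reacts strongly to a few extreme values, while the MAD and IQR do not. The sketch below illustrates this on made-up numbers; the vectors x and withOutlier are hypothetical and unrelated to the Gapminder data.

```r
# Made-up sample without and with a single extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
withOutlier <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

# Compare the three spread measures side by side
spread <- rbind(
  x = c(SD = sd(x), MAD = mad(x), IQR = IQR(x)),
  withOutlier = c(SD = sd(withOutlier), MAD = mad(withOutlier), IQR = IQR(withOutlier))
)
round(spread, 2)

# The SD grows more than tenfold; MAD and IQR do not move at all
sd(withOutlier) / sd(x)
```

This is why it helps to report a robust measure next to the SD when a continent contains a few very rich countries.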

First, let us try the stripplot.

stripplot(gdpPercap ~ continent, iDat, jitter.data = TRUE, grid = "h",
          type = c("p", "a"), fun = mean)

[Figure: strip plot of GDP/capita by continent, with a line through each continent's mean]

From this graph, we still cannot learn much about the spread of GDP/capita within the continents. Now, let us try a boxplot, which describes the spread of GDP/capita within the continents better.

bwplot(gdpPercap ~ continent, iDat)

[Figure: box plot of GDP/capita by continent]

The boxplot gives us a graphical description of the spread of GDP/capita. Looking at the graph above, we can see that Asia has huge variation in GDP/capita. Africa has the lowest standard deviation in GDP/capita, but its overall level of GDP/capita is also very low.
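
For readers who want the numbers behind a box, base R's boxplot.stats() returns the five values a box-and-whisker plot draws, plus the points flagged as outliers. As far as I know, lattice's panel.bwplot() uses this function by default, but treat that as an assumption. The vector x below is made up for illustration.

```r
# Made-up sample with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

b <- boxplot.stats(x)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats

# Points beyond 1.5 times the box length from the hinges are drawn individually
b$out
```

Here the upper whisker stops at 8, the largest value within 1.5 box lengths of the upper hinge, and 100 is reported separately.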

A violin plot shows the same spread as a density shape for each continent.

bwplot(gdpPercap ~ continent, iDat, panel = panel.violin)

[Figure: violin plot of GDP/capita by continent]

Display How Life Expectancy Is Changing over Time on Different Continents

lifeExpChange <- ddply(iDat, ~ continent + year, summarize, avgLifeExp = mean(lifeExp))
printTable(lifeExpChange)
continent  year  avgLifeExp
Africa     1952       39.14
Africa     1957       41.27
Africa     1962       43.32
Africa     1967       45.33
Africa     1972       47.45
Africa     1977       49.58
Africa     1982       51.59
Africa     1987       53.34
Africa     1992       53.63
Africa     1997       53.60
Africa     2002       53.33
Africa     2007       54.81
Americas   1952       53.28
Americas   1957       55.96
Americas   1962       58.40
Americas   1967       60.41
Americas   1972       62.39
Americas   1977       64.39
Americas   1982       66.23
Americas   1987       68.09
Americas   1992       69.57
Americas   1997       71.15
Americas   2002       72.42
Americas   2007       73.61
Asia       1952       46.31
Asia       1957       49.32
Asia       1962       51.56
Asia       1967       54.66
Asia       1972       57.32
Asia       1977       59.61
Asia       1982       62.62
Asia       1987       64.85
Asia       1992       66.54
Asia       1997       68.02
Asia       2002       69.23
Asia       2007       70.73
Europe     1952       64.41
Europe     1957       66.70
Europe     1962       68.54
Europe     1967       69.74
Europe     1972       70.78
Europe     1977       71.94
Europe     1982       72.81
Europe     1987       73.64
Europe     1992       74.44
Europe     1997       75.51
Europe     2002       76.70
Europe     2007       77.65

stripplot(avgLifeExp ~ as.factor(year) | continent, lifeExpChange,
          jitter.data = TRUE, grid = "h", type = "p",
          main = "How is Life Expectancy Changing over Time on Different Continents")

[Figure: strip plot of average life expectancy by year, one panel per continent]

Here, we can only see how the average life expectancy changes over time on each continent.
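
For comparison, the continent-by-year averages can also be computed without plyr. The sketch below uses base R's aggregate() on a tiny made-up data frame; toy and its life expectancy values are hypothetical, chosen only so the means are easy to check by hand.

```r
# Made-up data: two continents, two years, one or two countries per cell
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Africa", "Europe", "Europe"),
  year = c(1952, 1952, 1957, 1957, 1952, 1957),
  lifeExp = c(30, 40, 35, 45, 60, 62)
)

# Mean life expectancy for every continent/year combination
avg <- aggregate(lifeExp ~ continent + year, data = toy, FUN = mean)
names(avg)[names(avg) == "lifeExp"] <- "avgLifeExp"
avg
```

The result has one row per continent and year, the same shape as lifeExpChange, although the row order may differ from what ddply() returns.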

Now, let us plot all life expectancy observations together with the average life expectancy over time on each continent.

stripplot(lifeExp ~ as.factor(year) | continent, iDat,
          jitter.data = TRUE, grid = "h", type = c("p", "a"),
          main = "How is Life Expectancy Changing over Time on Different Continents")

[Figure: strip plot of life expectancy by year with a line through the yearly mean, one panel per continent]

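Based on the plot above, we can see more detail: the spread of individual countries around the average line in each year. The average rises on every continent, although Africa's average plateaus near 53 years between 1987 and 2002.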

Let us also try a boxplot, which summarizes the distribution of life expectancy for each year on each continent.

bwplot(lifeExp ~ as.factor(year) | continent, iDat,
       main = "How is Life Expectancy Changing over Time on Different Continents")

[Figure: box plot of life expectancy by year, one panel per continent]

To be specific, the black point inside each box marks the median life expectancy for that year, so the points trace how the median changes over time on each continent.

Now, let us think of a way to put the average life expectancy for all continents into one single plot.

xyplot(avgLifeExp ~ year, lifeExpChange, groups = continent,
       grid = "h", type = c("p", "a"), auto.key = list(space = "right"),
       main = "How is Life Expectancy Changing over Time on Different Continents")

[Figure: average life expectancy over time, one line per continent]


Depict the Number of Countries with Low Life Expectancy over Time by Continent

For this question, I will use code from Sean again!

# Threshold: the 10th percentile of life expectancy over all observations in gDat
lowLifeExp <- as.numeric(quantile(gDat$lifeExp, probs = 0.1))
continentLifeExp <- ddply(iDat, .(continent, year), summarize,
                          lowLifeInstances = sum(lifeExp <= lowLifeExp))  # this is tall
# Reshape to wide: one row per year, one column per continent
continentLifeExp <- ddply(continentLifeExp, ~ year, function(x)
  setNames(x$lowLifeInstances, unique(x$continent)))
printTable(continentLifeExp)
year  Africa  Americas  Asia  Europe
1952      35         2    11       0
1957      29         1     8       0
1962      22         0     4       0
1967      14         0     3       0
1972      10         0     3       0
1977       5         0     2       0
1982       3         0     1       0
1987       3         0     1       0
1992       5         0     0       0
1997       4         0     0       0
2002       4         0     0       0
2007       1         0     0       0

Based on the table above, no European country falls at or below the low life expectancy threshold (the 10th percentile of all lifeExp values in gDat) in any year. That is not fun. Let us drop Europe and do it again, this time keeping the tall format!
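
The counting logic behind that table has two steps: flag every observation at or below the threshold, then count the flags per year and continent. Here is a base R sketch of those steps on made-up data, using tapply() instead of ddply(); the data frame toy, its values, and the names threshold, low, and wide are all hypothetical.

```r
# Made-up data: 3 continents x 2 years x 2 countries
toy <- data.frame(
  continent = rep(c("Africa", "Asia", "Europe"), each = 4),
  year = rep(c(1952, 1952, 2007, 2007), times = 3),
  lifeExp = c(30, 40, 50, 55,   # Africa
              35, 45, 68, 70,   # Asia
              64, 66, 77, 78)   # Europe
)

# Threshold: 10th percentile of all life expectancy values
threshold <- as.numeric(quantile(toy$lifeExp, probs = 0.1))

# Flag the low observations, then count them per year (rows) and continent (columns)
low <- toy$lifeExp <= threshold
wide <- tapply(low, list(toy$year, toy$continent), sum)
wide
```

Because every year and continent combination occurs in the toy data, the resulting matrix has no missing cells; with real data, empty combinations would come back as NA.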

lowLifeExp <- as.numeric(quantile(gDat$lifeExp, probs = 0.1))
continentLifeExp <- ddply(subset(iDat, continent != "Europe"), .(continent, year), summarize,
                          lowLifeInstances = sum(lifeExp <= lowLifeExp))  # this is tall
printTable(continentLifeExp)
continent  year  lowLifeInstances
Africa     1952                35
Africa     1957                29
Africa     1962                22
Africa     1967                14
Africa     1972                10
Africa     1977                 5
Africa     1982                 3
Africa     1987                 3
Africa     1992                 5
Africa     1997                 4
Africa     2002                 4
Africa     2007                 1
Americas   1952                 2
Americas   1957                 1
Americas   1962                 0
Americas   1967                 0
Americas   1972                 0
Americas   1977                 0
Americas   1982                 0
Americas   1987                 0
Americas   1992                 0
Americas   1997                 0
Americas   2002                 0
Americas   2007                 0
Asia       1952                11
Asia       1957                 8
Asia       1962                 4
Asia       1967                 3
Asia       1972                 3
Asia       1977                 2
Asia       1982                 1
Asia       1987                 1
Asia       1992                 0
Asia       1997                 0
Asia       2002                 0
Asia       2007                 0

OK, let us plot these counts with stripplot.

stripplot(lowLifeInstances ~ as.factor(year) | continent, continentLifeExp,
          jitter.data = TRUE, grid = "h", type = "p")

[Figure: strip plot of the number of low life expectancy countries by year, one panel per continent]

Now, we can plot all of them in one graph.

stripplot(lowLifeInstances ~ year, continentLifeExp, groups = continent,
          grid = "h", type = "p", auto.key = list(space = "right"),
          main = "Number of Countries with Low Life Expectancy over Time by Continent")

[Figure: number of low life expectancy countries over time, grouped by continent]

Here we go! We have the results!