lattice
Huiting Ma
This homework will focus on the following parts:
First, the Gapminder data can be imported from here. Then, make sure that you are working in the correct directory on your computer.
# setwd("C:/Users/user/Desktop/UBC/STAT545")
gdURL<-"http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat<-read.delim(gdURL)
Now, let us check whether the dataset was imported correctly.
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
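The `Factor` columns in the output above reflect the defaults of R 3.x. Since R 4.0.0, `read.delim` leaves text columns as character unless `stringsAsFactors = TRUE` is passed. Below is a minimal sketch of a programmatic import check, run on a tiny made-up tab-separated snippet (the rows and the name `toy` are illustrative, not the real Gapminder file):

```r
# Made-up three-row sample in the same tab-separated layout (illustrative only)
txt <- "country\tyear\tpop\tcontinent\tlifeExp\tgdpPercap
A\t1952\t100\tAsia\t40.1\t500
B\t1952\t200\tEurope\t65.2\t6000
C\t1952\t300\tAfrica\t38.3\t400"

# stringsAsFactors = TRUE reproduces the factor columns shown by str() above
toy <- read.delim(text = txt, stringsAsFactors = TRUE)

# Fail loudly if the shape or column types are not what we expect
stopifnot(
  nrow(toy) == 3,
  ncol(toy) == 6,
  is.factor(toy$country),
  is.factor(toy$continent),
  is.numeric(toy$lifeExp)
)
```

The same `stopifnot` pattern could be applied to `gDat` with the real dimensions (1704 rows, 6 columns).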
OK, the dataset looks like what we want! Next, we can load the lattice, plyr, and xtable packages.
library(lattice)
## Warning: package 'lattice' was built under R version 3.0.2
library(plyr)
## Warning: package 'plyr' was built under R version 3.0.2
library(xtable)
Now, let us start our journey. While analyzing the dataset in the last homework, I noticed that there are only two countries in Oceania. I would like to drop these observations and focus my analysis on the other continents.
iDat <- droplevels(subset(gDat, continent != "Oceania"))
Let us check whether we did it!
table(iDat$continent)
##
## Africa Americas Asia Europe
## 624 300 396 360
Yes! Oceania has been dropped successfully.
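The `droplevels` call matters because `subset` on its own removes the rows but keeps the empty factor level. A small sketch on a made-up factor (the data frame `toy` is illustrative):

```r
# Toy factor standing in for the continent column (values are made up)
toy <- data.frame(
  continent = factor(c("Africa", "Asia", "Oceania", "Asia")),
  x = 1:4
)

# subset() alone removes the rows but keeps "Oceania" as an empty level
kept <- subset(toy, continent != "Oceania")
levels(kept$continent)

# droplevels() discards the unused level, so table() and lattice panels
# no longer show an empty Oceania entry
clean <- droplevels(kept)
levels(clean$continent)
table(clean$continent)
```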
The first plot I will introduce shows the maximum and minimum GDP/Capita for all continents. The code that I am going to use is from Sean here and JB here. However, instead of using gDat, I will use the iDat that I defined above. Let us first reproduce the table from the last assignment.
printTable <- function(df)
{
print(xtable(df), type = 'html', include.rownames = F)
}
To simplify the code, Sean chose to write a helper function. Smart! JB does the same.
minMaxGDPwide <- ddply(iDat, ~continent, summarize, minGDP = min(gdpPercap), maxGDP = max(gdpPercap))
minMaxGDPwide <- arrange(minMaxGDPwide, minGDP) # sort on minGDP
## However, this is the "wide" format. Let's use JB's code to reshape the table into the "tall" format.
minMaxGDPwideT<-ddply(iDat, ~ continent, function(x){
gdpPercap <-range(x$gdpPercap)
return(data.frame(gdpPercap, stat = c("min", "max")))
})
printTable(minMaxGDPwideT)
| continent | gdpPercap | stat |
|---|---|---|
| Africa | 241.17 | min |
| Africa | 21951.21 | max |
| Americas | 1201.64 | min |
| Americas | 42951.65 | max |
| Asia | 331.00 | min |
| Asia | 113523.13 | max |
| Europe | 973.53 | min |
| Europe | 49357.19 | max |
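The wide and tall tables hold the same numbers in different layouts; lattice's `groups =` argument needs the tall one, with a single value column and a label column. Here is a small sketch with made-up values, using plain base R in place of `ddply` (the names `wide` and `tall` are illustrative):

```r
# Wide layout: one row per continent, one column per statistic (made-up values)
wide <- data.frame(
  continent = c("A", "B"),
  minGDP    = c(10, 20),
  maxGDP    = c(100, 300)
)

# Tall layout: stack the two value columns and record which statistic
# each row holds
tall <- data.frame(
  continent = rep(wide$continent, times = 2),
  gdpPercap = c(wide$minGDP, wide$maxGDP),
  stat      = rep(c("min", "max"), each = nrow(wide))
)

# Sort so that each continent's min and max sit next to each other
tall <- tall[order(tall$continent), ]
tall
```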
xyplot(gdpPercap~continent,data = arrange (minMaxGDPwideT,stat, gdpPercap), auto.key = list (TRUE, space = "right"), groups = stat, main = "Depict the Maximum and Minimum of GDP/Capita for all Continents")
However, the graph above shows only two points per continent, so it does not tell us much. Let us plot all observations from each continent instead.
stripplot(gdpPercap~continent, iDat, jitter.data = TRUE,grid = "h", type = c("p", "a"),fun=max, col = "black")
Based on the table, the minimum GDP/Capita values are similar across continents, and all of them sit close to 0 on the scale of the plot. I therefore drew a line through the maximum GDP/Capita of each continent to show how the maxima differ.
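In lattice, `type = "a"` joins one summary value per group, and the `fun` argument chooses the summary (the mean by default), so `fun = max` traces the per-continent maxima. A minimal sketch on made-up data (the names `toy`, `g`, and `y` are illustrative):

```r
library(lattice)

# Made-up data: three groups with different maxima
toy <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 4)),
  y = c(1, 2, 3, 10,  1, 2, 3, 40,  1, 2, 3, 25)
)

# The values the "a" line should pass through when fun = max
group_max <- tapply(toy$y, toy$g, max)
group_max

# Points plus a line joining the per-group maxima
p <- stripplot(y ~ g, data = toy, jitter.data = TRUE,
               type = c("p", "a"), fun = max)
print(p)
```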
spreadGDP <- ddply(gDat, ~continent, summarize, SD = sd(gdpPercap), MAD = mad(gdpPercap),
IQR = IQR(gdpPercap))
spreadGDP <- arrange(spreadGDP, SD)
printTable(spreadGDP)
| continent | SD | MAD | IQR |
|---|---|---|---|
| Africa | 2827.93 | 775.32 | 1616.17 |
| Oceania | 6358.98 | 6459.10 | 8072.26 |
| Americas | 6396.76 | 3269.33 | 4402.43 |
| Europe | 9355.21 | 8846.05 | 13248.30 |
| Asia | 14045.37 | 2820.83 | 7492.26 |
To describe the spread of GDP/Capita within the continents, he measured the standard deviation, the median absolute deviation, and the interquartile range. Note that this table is computed from gDat, so Oceania still appears in it. However, for this question a table does not give a vivid description.
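The three measures react differently to extreme values, which is one possible reading of why Asia's SD is so much larger than its MAD in the table. A small worked example on a made-up vector (not Gapminder data):

```r
# Four ordinary values and one extreme value (made-up numbers)
x <- c(1, 2, 3, 4, 100)

# The same three measures used in the table
spread <- c(SD = sd(x), MAD = mad(x), IQR = IQR(x))
round(spread, 2)

# Dropping the extreme value changes SD far more than MAD or IQR
spread_trimmed <- c(SD = sd(x[-5]), MAD = mad(x[-5]), IQR = IQR(x[-5]))
round(spread_trimmed, 2)
```

Here the single large value dominates the standard deviation, while the MAD and IQR, which are built from medians and quartiles, barely move.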
First, let us try the stripplot.
stripplot(gdpPercap~continent, iDat, jitter.data = TRUE,grid = "h", type = c("p", "a"),fun=mean)
This graph still does not tell us much about the spread of GDP/Capita within the continents. Now, let us try a boxplot, which describes the spread better.
bwplot(gdpPercap ~ continent, iDat)
The boxplot gives us a graphical description of the spread of GDP/Capita. Looking at the graph above, we can see that Asia shows huge variation in GDP/Capita. Africa has a low standard deviation in GDP/Capita, but its overall level of GDP/Capita is also very low.
bwplot(gdpPercap ~ continent, iDat, panel = panel.violin)
lifeExpChange<-ddply(iDat, ~continent + year, summarize, avgLifeExp = mean(lifeExp))
printTable(lifeExpChange)
| continent | year | avgLifeExp |
|---|---|---|
| Africa | 1952 | 39.14 |
| Africa | 1957 | 41.27 |
| Africa | 1962 | 43.32 |
| Africa | 1967 | 45.33 |
| Africa | 1972 | 47.45 |
| Africa | 1977 | 49.58 |
| Africa | 1982 | 51.59 |
| Africa | 1987 | 53.34 |
| Africa | 1992 | 53.63 |
| Africa | 1997 | 53.60 |
| Africa | 2002 | 53.33 |
| Africa | 2007 | 54.81 |
| Americas | 1952 | 53.28 |
| Americas | 1957 | 55.96 |
| Americas | 1962 | 58.40 |
| Americas | 1967 | 60.41 |
| Americas | 1972 | 62.39 |
| Americas | 1977 | 64.39 |
| Americas | 1982 | 66.23 |
| Americas | 1987 | 68.09 |
| Americas | 1992 | 69.57 |
| Americas | 1997 | 71.15 |
| Americas | 2002 | 72.42 |
| Americas | 2007 | 73.61 |
| Asia | 1952 | 46.31 |
| Asia | 1957 | 49.32 |
| Asia | 1962 | 51.56 |
| Asia | 1967 | 54.66 |
| Asia | 1972 | 57.32 |
| Asia | 1977 | 59.61 |
| Asia | 1982 | 62.62 |
| Asia | 1987 | 64.85 |
| Asia | 1992 | 66.54 |
| Asia | 1997 | 68.02 |
| Asia | 2002 | 69.23 |
| Asia | 2007 | 70.73 |
| Europe | 1952 | 64.41 |
| Europe | 1957 | 66.70 |
| Europe | 1962 | 68.54 |
| Europe | 1967 | 69.74 |
| Europe | 1972 | 70.78 |
| Europe | 1977 | 71.94 |
| Europe | 1982 | 72.81 |
| Europe | 1987 | 73.64 |
| Europe | 1992 | 74.44 |
| Europe | 1997 | 75.51 |
| Europe | 2002 | 76.70 |
| Europe | 2007 | 77.65 |
stripplot(avgLifeExp ~ as.factor(year)|continent, lifeExpChange, jitter.data = TRUE, grid = "h", main ="How is Life Expectancy Changing over Time on Different Continents", type ="p")
Here, we can only see how the average life expectancy changes over time on each continent.
Now, let us plot every country's life expectancy together with the continent average over time.
stripplot(lifeExp ~ as.factor(year)|continent, iDat, jitter.data = TRUE, grid = "h", main ="How is Life Expectancy Changing over Time on Different Continents", type =c("p","a"))
The plot above shows more detail about how life expectancy changes over time on the different continents.
Let us also try a boxplot, which summarizes the distribution of life expectancy in each year on each continent.
bwplot(lifeExp ~ as.factor(year) | continent, iDat, main ="How is Life Expectancy Changing over Time on Different Continents")
To be specific, the black points inside the boxes show how the median life expectancy changes over time on the different continents.
Now, let us think of a way to put the average life expectancy of all continents into one single plot.
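The values marked by those median points can also be computed directly. A sketch on made-up data using base R's `aggregate` (the data frame `toy` is illustrative):

```r
# Made-up life expectancies: 2 continents x 2 years x 3 countries
toy <- data.frame(
  continent = rep(c("A", "B"), each = 6),
  year      = rep(rep(c(1952, 1957), each = 3), times = 2),
  lifeExp   = c(30, 35, 50,  40, 42, 60,  60, 65, 66,  70, 71, 80)
)

# One median per continent-year cell, i.e. the dot inside each box
med <- aggregate(lifeExp ~ continent + year, data = toy, FUN = median)
med
```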
xyplot(avgLifeExp ~ year, lifeExpChange, groups = continent, grid = "h",type =c("p","a"), main ="How is Life Expectancy Changing over Time on Different Continents", auto.key = list(TRUE, space = "right"))
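The difference from the earlier plots lies in how continent enters the call: `| continent` in the formula draws one panel per continent, while `groups = continent` overlays the continents in a single panel so the lines can be compared directly. A minimal sketch with made-up averages (the data frame `toy` is illustrative):

```r
library(lattice)

# Made-up average life expectancies for two continents
toy <- data.frame(
  continent = factor(rep(c("A", "B"), each = 3)),
  year      = rep(c(1952, 1957, 1962), times = 2),
  avg       = c(40, 42, 44,  60, 61, 63)
)

# Conditioning: one panel per continent
p_panels <- xyplot(avg ~ year | continent, data = toy, type = c("p", "l"))

# Grouping: one shared panel, one line per continent, with a legend
p_groups <- xyplot(avg ~ year, data = toy, groups = continent,
                   type = c("p", "l"), auto.key = list(space = "right"))

print(p_panels)
print(p_groups)
```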
For this question, I will use code from Sean here again!
lowLifeExp <- as.numeric(quantile(gDat$lifeExp, probs = 0.1))
continentLifeExp <- ddply(iDat, .(continent, year), summarize, lowLifeInstances = sum(lifeExp <=
lowLifeExp)) # this is tall
continentLifeExp <- ddply(continentLifeExp, ~year, function(t) setNames(t$lowLifeInstances,
unique(t$continent)))
printTable(continentLifeExp)
| year | Africa | Americas | Asia | Europe |
|---|---|---|---|---|
| 1952 | 35 | 2 | 11 | 0 |
| 1957 | 29 | 1 | 8 | 0 |
| 1962 | 22 | 0 | 4 | 0 |
| 1967 | 14 | 0 | 3 | 0 |
| 1972 | 10 | 0 | 3 | 0 |
| 1977 | 5 | 0 | 2 | 0 |
| 1982 | 3 | 0 | 1 | 0 |
| 1987 | 3 | 0 | 1 | 0 |
| 1992 | 5 | 0 | 0 | 0 |
| 1997 | 4 | 0 | 0 | 0 |
| 2002 | 4 | 0 | 0 | 0 |
| 2007 | 1 | 0 | 0 | 0 |
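The counting step relies on `sum()` over a logical vector, which counts the `TRUE` values. Here is a sketch on made-up data that builds the same year-by-continent layout with base R's `xtabs` instead of `ddply` (the threshold of 45 and the data frame `toy` are illustrative):

```r
# Made-up data: 2 continents x 2 years x 2 countries
toy <- data.frame(
  continent = rep(c("A", "B"), each = 4),
  year      = rep(rep(c(1952, 1957), each = 2), times = 2),
  lifeExp   = c(30, 50, 44, 60,  70, 72, 40, 75)
)
threshold <- 45  # stands in for the 10% quantile used above

# Flag low-life-expectancy rows as 1/0 so they can be summed per cell
toy$low <- as.integer(toy$lifeExp <= threshold)

# Rows are years, columns are continents, cells are counts of flagged rows
counts <- xtabs(low ~ year + continent, data = toy)
counts
```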
Based on the table above, no European country ever falls below the low life expectancy threshold. That is not very interesting to plot, so let us remove Europe and do it again!
lowLifeExp <- as.numeric(quantile(gDat$lifeExp, probs = 0.1))
continentLifeExp <- ddply(subset(iDat, continent != "Europe"), .(continent, year), summarize, lowLifeInstances = sum(lifeExp <=
lowLifeExp)) # this is tall
printTable(continentLifeExp)
| continent | year | lowLifeInstances |
|---|---|---|
| Africa | 1952 | 35 |
| Africa | 1957 | 29 |
| Africa | 1962 | 22 |
| Africa | 1967 | 14 |
| Africa | 1972 | 10 |
| Africa | 1977 | 5 |
| Africa | 1982 | 3 |
| Africa | 1987 | 3 |
| Africa | 1992 | 5 |
| Africa | 1997 | 4 |
| Africa | 2002 | 4 |
| Africa | 2007 | 1 |
| Americas | 1952 | 2 |
| Americas | 1957 | 1 |
| Americas | 1962 | 0 |
| Americas | 1967 | 0 |
| Americas | 1972 | 0 |
| Americas | 1977 | 0 |
| Americas | 1982 | 0 |
| Americas | 1987 | 0 |
| Americas | 1992 | 0 |
| Americas | 1997 | 0 |
| Americas | 2002 | 0 |
| Americas | 2007 | 0 |
| Asia | 1952 | 11 |
| Asia | 1957 | 8 |
| Asia | 1962 | 4 |
| Asia | 1967 | 3 |
| Asia | 1972 | 3 |
| Asia | 1977 | 2 |
| Asia | 1982 | 1 |
| Asia | 1987 | 1 |
| Asia | 1992 | 0 |
| Asia | 1997 | 0 |
| Asia | 2002 | 0 |
| Asia | 2007 | 0 |
OK, let us plot this with a stripplot.
stripplot(lowLifeInstances ~ as.factor(year)|continent, continentLifeExp, jitter.data = TRUE, grid = "h", type ="p")
Now, we can plot all of them into one graph.
stripplot(lowLifeInstances ~ year, continentLifeExp, groups = continent, grid = "h",type ="p", auto.key = list(TRUE, space = "right"), main ="Number of Countries with Low Life Expectancy over Time by Continent")
Here we go! We have the results!