Huiting Ma
This homework will focus on the following parts:
First, the Gapminder data can be imported from here. Then, make sure that you are working in the correct directory on your computer.
setwd("C:/Users/user/Desktop/545A")
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(gdURL)
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Now, let us get started. In order to do the following steps, make sure you have already installed “plyr” and “xytable” packages.
library(xtable)
library(lattice)
library(plyr)
The first example that I want to discuss is how to get the maximum and minimum of GDP per capica for all continents.I will introduce two ways to achieve our results.
Way # 1
The following function helps to determine the maximum and minimum GDP per capita for each continent (for example “Africa”). Also, this function can give a good names for the resutls.
hcontinent <- "Africa"
hDat <- subset(gDat, continent == hcontinent)
hFun <- function(x) {
maxgdpPercap <- max(x$gdpPercap)
mingdpPercap <- min(x$gdpPercap)
names(maxgdpPercap) <- "maxgdpPercap"
names(mingdpPercap) <- "mingdpPercap"
return(c(maxgdpPercap, mingdpPercap))
}
hFun(hDat)
## maxgdpPercap mingdpPercap
## 21951.2 241.2
Now, Let us place this function inside a ddply().
GetmaxminPercapway1 <- ddply(gDat, ~continent, hFun)
Yes! We got our results. However, we want to sort the table on minimum GDP per capital. Then, we need apply arrange().
GetmaxminPercapway1 <- arrange(GetmaxminPercapway1, mingdpPercap)
Last step, we can display the table through apply xtable().
GetmaxminPercapway1 <- xtable(GetmaxminPercapway1)
print(GetmaxminPercapway1, type = "html", include.rownames = FALSE)
| continent | maxgdpPercap | mingdpPercap |
|---|---|---|
| Africa | 21951.21 | 241.17 |
| Asia | 113523.13 | 331.00 |
| Europe | 49357.19 | 973.53 |
| Americas | 42951.65 | 1201.64 |
| Oceania | 34435.37 | 10039.60 |
Based on the table above, we noticed that the minimum GDP per capita in African is much less than Oceania.
Way # 2
In order to simplify the code, we can try this:
GetmaxminPercapway2 <- ddply(gDat, ~continent, summarize, maxgdpPercap = max(gdpPercap),
mingdpPercap = min(gdpPercap))
GetmaxminPercapway2 <- arrange(GetmaxminPercapway2, mingdpPercap)
Now, we can print out our results through applying xytable.
GetmaxminPercapway2 <- xtable(GetmaxminPercapway2)
print(GetmaxminPercapway2, type = "html", include.rownames = FALSE)
| continent | maxgdpPercap | mingdpPercap |
|---|---|---|
| Africa | 21951.21 | 241.17 |
| Asia | 113523.13 | 331.00 |
| Europe | 49357.19 | 973.53 |
| Americas | 42951.65 | 1201.64 |
| Oceania | 34435.37 | 10039.60 |
Here we go, the results are the same as way # 1.
Next table we are going to talk about is how to change our “wide” format to “long” format. We could do that through reshape(), but today we will focus on using ddply().
GetmaxminPercapLong <- ddply(gDat, ~continent, summarize, gdpPercapSummary = c(max(gdpPercap),
min(gdpPercap)), SummaryDescribe = factor(c("maximum", "minimum")))
Now, we can print out our results through applying xytable.
GetmaxminPercapLong <- xtable(GetmaxminPercapLong)
print(GetmaxminPercapLong, type = "html", include.rownames = FALSE)
| continent | gdpPercapSummary | SummaryDescribe |
|---|---|---|
| Africa | 21951.21 | maximum |
| Africa | 241.17 | minimum |
| Americas | 42951.65 | maximum |
| Americas | 1201.64 | minimum |
| Asia | 113523.13 | maximum |
| Asia | 331.00 | minimum |
| Europe | 49357.19 | maximum |
| Europe | 973.53 | minimum |
| Oceania | 34435.37 | maximum |
| Oceania | 10039.60 | minimum |
The above table gives us the same information as “wide” format.
Now, we can look at the spread of GDP per capita within the continents.
GetSpread <- ddply(gDat, ~continent, summarize, sdGdpPercap = sd(gdpPercap),
varGdpPercap = var(gdpPercap), medGdpPercap = median(gdpPercap), IQRgdpPercap = IQR(gdpPercap))
GetSpread <- arrange(GetSpread, sdGdpPercap)
Then, we can print out our answers.
GetSpread <- xtable(GetSpread)
print(GetSpread, type = "html", include.rownames = FALSE)
| continent | sdGdpPercap | varGdpPercap | medGdpPercap | IQRgdpPercap |
|---|---|---|---|---|
| Africa | 2827.93 | 7997187.31 | 1192.14 | 1616.17 |
| Oceania | 6358.98 | 40436668.87 | 17983.30 | 8072.26 |
| Americas | 6396.76 | 40918591.10 | 5465.51 | 4402.43 |
| Europe | 9355.21 | 87520019.60 | 12081.75 | 13248.30 |
| Asia | 14045.37 | 197272505.85 | 2646.79 | 7492.26 |
Based on the table above, Asia is experiencing the hugest varability amoung all contingents.
Trimmed mean is also known as truncated mean. It is a statistical measure of central tendency. The following steps are introducing a way to compute a trimmed mean of life expectancy for diffferent years.
trimmedMeanByYear <- ddply(gDat, ~year, summarize, trimmedmean = mean(lifeExp,
trim = 1))
Now, we can print out our results through applying xytable.
trimmedMeanByYear <- xtable(trimmedMeanByYear)
print(trimmedMeanByYear, type = "html", include.rownames = FALSE)
| year | trimmedmean |
|---|---|
| 1952 | 45.14 |
| 1957 | 48.36 |
| 1962 | 50.88 |
| 1967 | 53.83 |
| 1972 | 56.53 |
| 1977 | 59.67 |
| 1982 | 62.44 |
| 1987 | 65.83 |
| 1992 | 67.70 |
| 1997 | 69.39 |
| 2002 | 70.83 |
| 2007 | 71.94 |
According to the above table, the trimmed mean is increasing from 1952 to 2007.
Another table is to compare life expectancy over time on different continents.
LifeExpByYearContinent <- ddply(gDat, ~continent + year, summarize, LifeExpMean = mean(lifeExp))
Now, we can print out our results through applying xytable.
LifeExpByYearContinent <- xtable(LifeExpByYearContinent)
print(LifeExpByYearContinent, type = "html", include.rownames = FALSE)
| continent | year | LifeExpMean |
|---|---|---|
| Africa | 1952 | 39.14 |
| Africa | 1957 | 41.27 |
| Africa | 1962 | 43.32 |
| Africa | 1967 | 45.33 |
| Africa | 1972 | 47.45 |
| Africa | 1977 | 49.58 |
| Africa | 1982 | 51.59 |
| Africa | 1987 | 53.34 |
| Africa | 1992 | 53.63 |
| Africa | 1997 | 53.60 |
| Africa | 2002 | 53.33 |
| Africa | 2007 | 54.81 |
| Americas | 1952 | 53.28 |
| Americas | 1957 | 55.96 |
| Americas | 1962 | 58.40 |
| Americas | 1967 | 60.41 |
| Americas | 1972 | 62.39 |
| Americas | 1977 | 64.39 |
| Americas | 1982 | 66.23 |
| Americas | 1987 | 68.09 |
| Americas | 1992 | 69.57 |
| Americas | 1997 | 71.15 |
| Americas | 2002 | 72.42 |
| Americas | 2007 | 73.61 |
| Asia | 1952 | 46.31 |
| Asia | 1957 | 49.32 |
| Asia | 1962 | 51.56 |
| Asia | 1967 | 54.66 |
| Asia | 1972 | 57.32 |
| Asia | 1977 | 59.61 |
| Asia | 1982 | 62.62 |
| Asia | 1987 | 64.85 |
| Asia | 1992 | 66.54 |
| Asia | 1997 | 68.02 |
| Asia | 2002 | 69.23 |
| Asia | 2007 | 70.73 |
| Europe | 1952 | 64.41 |
| Europe | 1957 | 66.70 |
| Europe | 1962 | 68.54 |
| Europe | 1967 | 69.74 |
| Europe | 1972 | 70.78 |
| Europe | 1977 | 71.94 |
| Europe | 1982 | 72.81 |
| Europe | 1987 | 73.64 |
| Europe | 1992 | 74.44 |
| Europe | 1997 | 75.51 |
| Europe | 2002 | 76.70 |
| Europe | 2007 | 77.65 |
| Oceania | 1952 | 69.25 |
| Oceania | 1957 | 70.30 |
| Oceania | 1962 | 71.09 |
| Oceania | 1967 | 71.31 |
| Oceania | 1972 | 71.91 |
| Oceania | 1977 | 72.85 |
| Oceania | 1982 | 74.29 |
| Oceania | 1987 | 75.32 |
| Oceania | 1992 | 76.94 |
| Oceania | 1997 | 78.19 |
| Oceania | 2002 | 79.74 |
| Oceania | 2007 | 80.72 |
The results show that Oceania has longer life expectancy than Africa.
If the life expectancy for a country is less than 60, I defined this country to be low life expectancy.
benchmark <- 60
NumlowLifeEXbyContinent <- ddply(gDat, ~continent + year, summarize, CountLowLifeEX = sum(lifeExp <
benchmark))
Now, we can print out our results.
NumlowLifeEXbyContinent <- xtable(NumlowLifeEXbyContinent)
print(NumlowLifeEXbyContinent, type = "html", include.rownames = FALSE)
| continent | year | CountLowLifeEX |
|---|---|---|
| Africa | 1952 | 52 |
| Africa | 1957 | 52 |
| Africa | 1962 | 51 |
| Africa | 1967 | 50 |
| Africa | 1972 | 50 |
| Africa | 1977 | 50 |
| Africa | 1982 | 44 |
| Africa | 1987 | 40 |
| Africa | 1992 | 39 |
| Africa | 1997 | 39 |
| Africa | 2002 | 41 |
| Africa | 2007 | 40 |
| Americas | 1952 | 19 |
| Americas | 1957 | 15 |
| Americas | 1962 | 13 |
| Americas | 1967 | 11 |
| Americas | 1972 | 10 |
| Americas | 1977 | 7 |
| Americas | 1982 | 5 |
| Americas | 1987 | 2 |
| Americas | 1992 | 2 |
| Americas | 1997 | 1 |
| Americas | 2002 | 1 |
| Americas | 2007 | 0 |
| Asia | 1952 | 29 |
| Asia | 1957 | 27 |
| Asia | 1962 | 25 |
| Asia | 1967 | 25 |
| Asia | 1972 | 19 |
| Asia | 1977 | 14 |
| Asia | 1982 | 12 |
| Asia | 1987 | 8 |
| Asia | 1992 | 7 |
| Asia | 1997 | 6 |
| Asia | 2002 | 4 |
| Asia | 2007 | 3 |
| Europe | 1952 | 7 |
| Europe | 1957 | 3 |
| Europe | 1962 | 1 |
| Europe | 1967 | 1 |
| Europe | 1972 | 1 |
| Europe | 1977 | 1 |
| Europe | 1982 | 0 |
| Europe | 1987 | 0 |
| Europe | 1992 | 0 |
| Europe | 1997 | 0 |
| Europe | 2002 | 0 |
| Europe | 2007 | 0 |
| Oceania | 1952 | 0 |
| Oceania | 1957 | 0 |
| Oceania | 1962 | 0 |
| Oceania | 1967 | 0 |
| Oceania | 1972 | 0 |
| Oceania | 1977 | 0 |
| Oceania | 1982 | 0 |
| Oceania | 1987 | 0 |
| Oceania | 1992 | 0 |
| Oceania | 1997 | 0 |
| Oceania | 2002 | 0 |
| Oceania | 2007 | 0 |
By looking at the table above, we can find that there are a lot of countries in Africa which have low life expectancy. However, the overall life expectancy in Oceania is much higher!! (No country has low life expectancy).
The last table we will look at is a extension of above question. We will try to get the proportion of countries with low life expectancy instead the number of countries.
benchmark <- 60
ProlowLifeEXbyContinent <- ddply(gDat, ~continent + year, summarize, PropLowLifeEXbyContinent = sum(lifeExp <
benchmark)/length(lifeExp))
Now, let us see the results!
ProlowLifeEXbyContinent <- xtable(ProlowLifeEXbyContinent)
print(ProlowLifeEXbyContinent, type = "html", include.rownames = FALSE)
| continent | year | PropLowLifeEXbyContinent |
|---|---|---|
| Africa | 1952 | 1.00 |
| Africa | 1957 | 1.00 |
| Africa | 1962 | 0.98 |
| Africa | 1967 | 0.96 |
| Africa | 1972 | 0.96 |
| Africa | 1977 | 0.96 |
| Africa | 1982 | 0.85 |
| Africa | 1987 | 0.77 |
| Africa | 1992 | 0.75 |
| Africa | 1997 | 0.75 |
| Africa | 2002 | 0.79 |
| Africa | 2007 | 0.77 |
| Americas | 1952 | 0.76 |
| Americas | 1957 | 0.60 |
| Americas | 1962 | 0.52 |
| Americas | 1967 | 0.44 |
| Americas | 1972 | 0.40 |
| Americas | 1977 | 0.28 |
| Americas | 1982 | 0.20 |
| Americas | 1987 | 0.08 |
| Americas | 1992 | 0.08 |
| Americas | 1997 | 0.04 |
| Americas | 2002 | 0.04 |
| Americas | 2007 | 0.00 |
| Asia | 1952 | 0.88 |
| Asia | 1957 | 0.82 |
| Asia | 1962 | 0.76 |
| Asia | 1967 | 0.76 |
| Asia | 1972 | 0.58 |
| Asia | 1977 | 0.42 |
| Asia | 1982 | 0.36 |
| Asia | 1987 | 0.24 |
| Asia | 1992 | 0.21 |
| Asia | 1997 | 0.18 |
| Asia | 2002 | 0.12 |
| Asia | 2007 | 0.09 |
| Europe | 1952 | 0.23 |
| Europe | 1957 | 0.10 |
| Europe | 1962 | 0.03 |
| Europe | 1967 | 0.03 |
| Europe | 1972 | 0.03 |
| Europe | 1977 | 0.03 |
| Europe | 1982 | 0.00 |
| Europe | 1987 | 0.00 |
| Europe | 1992 | 0.00 |
| Europe | 1997 | 0.00 |
| Europe | 2002 | 0.00 |
| Europe | 2007 | 0.00 |
| Oceania | 1952 | 0.00 |
| Oceania | 1957 | 0.00 |
| Oceania | 1962 | 0.00 |
| Oceania | 1967 | 0.00 |
| Oceania | 1972 | 0.00 |
| Oceania | 1977 | 0.00 |
| Oceania | 1982 | 0.00 |
| Oceania | 1987 | 0.00 |
| Oceania | 1992 | 0.00 |
| Oceania | 1997 | 0.00 |
| Oceania | 2002 | 0.00 |
| Oceania | 2007 | 0.00 |
We can find that more than 50% countries in Africa have low life expectancy.