Jack Ni
I import the Gapminder dataset from Jenny's website, then run str() as a quick check that the import went fine.
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
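The quick check above could also be written as explicit assertions. The following is only a sketch of one way to do that, not part of the original analysis: `check_import()` and `expected_cols` are hypothetical names, and the two-row sample is hand-typed from the rounded values that str() prints above so that the block runs without a network connection.

```r
# Hypothetical two-row sample in the same tab-delimited layout as the file;
# the values are the rounded ones displayed by str() above.
sample_txt <- paste(
  "country\tyear\tpop\tcontinent\tlifeExp\tgdpPercap",
  "Afghanistan\t1952\t8425333\tAsia\t28.8\t779",
  "Afghanistan\t1957\t9240934\tAsia\t30.3\t821",
  sep = "\n"
)
sampleDat <- read.delim(text = sample_txt)

expected_cols <- c("country", "year", "pop", "continent", "lifeExp", "gdpPercap")

# check_import() is a hypothetical helper: it stops with an error if the
# column names, row count, or basic column types are not what is expected.
check_import <- function(dat, n_rows) {
  stopifnot(
    identical(names(dat), expected_cols),
    nrow(dat) == n_rows,
    is.numeric(dat$year), is.numeric(dat$pop),
    is.numeric(dat$lifeExp), is.numeric(dat$gdpPercap),
    !anyNA(dat)
  )
  invisible(TRUE)
}

check_import(sampleDat, n_rows = 2)
```

On the full data the equivalent call would be `check_import(gDat, n_rows = 1704)`, using the row count reported by str().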
Loading the “plyr”, “xtable”, and “lattice” packages.
library(plyr)
library(xtable)
library(lattice)
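I am using the data aggregation from Jinyuan's report on mean life expectancy by continent and year in the “tall” format. The “tall” format, while much harder for a person to read and interpret in a table, is far easier to make figures and plots with, because each row is a single continent-year observation. I also reorder the continent factor by mean life expectancy so that plots sort the continents from lowest to highest.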
ChangeLife <- ddply(gDat, .(continent, year), summarize, MeanLifeExp = mean(lifeExp))
ChangeLife <- within(ChangeLife, continent <- reorder(continent, MeanLifeExp))
ChangeLifeTable <- xtable(ChangeLife)
print(ChangeLifeTable, type = "html", include.rownames = FALSE)
| continent | year | MeanLifeExp |
|---|---|---|
| Africa | 1952 | 39.14 |
| Africa | 1957 | 41.27 |
| Africa | 1962 | 43.32 |
| Africa | 1967 | 45.33 |
| Africa | 1972 | 47.45 |
| Africa | 1977 | 49.58 |
| Africa | 1982 | 51.59 |
| Africa | 1987 | 53.34 |
| Africa | 1992 | 53.63 |
| Africa | 1997 | 53.60 |
| Africa | 2002 | 53.33 |
| Africa | 2007 | 54.81 |
| Americas | 1952 | 53.28 |
| Americas | 1957 | 55.96 |
| Americas | 1962 | 58.40 |
| Americas | 1967 | 60.41 |
| Americas | 1972 | 62.39 |
| Americas | 1977 | 64.39 |
| Americas | 1982 | 66.23 |
| Americas | 1987 | 68.09 |
| Americas | 1992 | 69.57 |
| Americas | 1997 | 71.15 |
| Americas | 2002 | 72.42 |
| Americas | 2007 | 73.61 |
| Asia | 1952 | 46.31 |
| Asia | 1957 | 49.32 |
| Asia | 1962 | 51.56 |
| Asia | 1967 | 54.66 |
| Asia | 1972 | 57.32 |
| Asia | 1977 | 59.61 |
| Asia | 1982 | 62.62 |
| Asia | 1987 | 64.85 |
| Asia | 1992 | 66.54 |
| Asia | 1997 | 68.02 |
| Asia | 2002 | 69.23 |
| Asia | 2007 | 70.73 |
| Europe | 1952 | 64.41 |
| Europe | 1957 | 66.70 |
| Europe | 1962 | 68.54 |
| Europe | 1967 | 69.74 |
| Europe | 1972 | 70.78 |
| Europe | 1977 | 71.94 |
| Europe | 1982 | 72.81 |
| Europe | 1987 | 73.64 |
| Europe | 1992 | 74.44 |
| Europe | 1997 | 75.51 |
| Europe | 2002 | 76.70 |
| Europe | 2007 | 77.65 |
| Oceania | 1952 | 69.25 |
| Oceania | 1957 | 70.30 |
| Oceania | 1962 | 71.09 |
| Oceania | 1967 | 71.31 |
| Oceania | 1972 | 71.91 |
| Oceania | 1977 | 72.85 |
| Oceania | 1982 | 74.29 |
| Oceania | 1987 | 75.32 |
| Oceania | 1992 | 76.94 |
| Oceania | 1997 | 78.19 |
| Oceania | 2002 | 79.74 |
| Oceania | 2007 | 80.72 |
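For comparison, the tall table above can be pivoted into a “wide” layout, which is the form that is easier to read but harder to plot from. This is a minimal base R sketch that uses only a handful of rows copied from the table; the object names `tall` and `wide` are my own.

```r
# A few rows copied from the tall table above (continent, year, MeanLifeExp).
tall <- data.frame(
  continent   = rep(c("Africa", "Asia", "Europe"), each = 2),
  year        = rep(c(1952, 2007), times = 3),
  MeanLifeExp = c(39.14, 54.81, 46.31, 70.73, 64.41, 77.65)
)

# Pivot to wide: one row per year, one column per continent.
# Each (year, continent) cell holds a single value, so sum() just returns it.
wide <- tapply(tall$MeanLifeExp, list(tall$year, tall$continent), sum)
wide
```

The wide result is a matrix with years as row names and continents as column names, so a single cell can be looked up with `wide["2007", "Asia"]`.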
I started off with a stripplot, with the continents sorted by increasing mean life expectancy. Each point is one continent-year mean, and the line joins the per-continent averages. Africa clearly has a lower overall life expectancy than the other continents. Asia and the Americas also show a larger variation in mean life expectancy over the years than the other continents.
stripplot(MeanLifeExp ~ continent, ChangeLife, grid = "h", type = c("p", "a"))
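The spread that the stripplot shows can be put into numbers as a rough check. The sketch below uses only the 1952 and 2007 means copied from the table above; since every continent's series is lowest in 1952 and highest in 2007, the gain between those two years equals the range of the series. The names `ends` and `gain` are my own.

```r
# 1952 and 2007 means copied from the table above.
ends <- data.frame(
  continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
  le1952    = c(39.14, 53.28, 46.31, 64.41, 69.25),
  le2007    = c(54.81, 73.61, 70.73, 77.65, 80.72)
)

# Gain in mean life expectancy (years) between 1952 and 2007.
ends$gain <- ends$le2007 - ends$le1952

# Continents ordered from largest to smallest gain.
ends[order(ends$gain, decreasing = TRUE), c("continent", "gain")]
```

Asia and the Americas come out on top, with gains of roughly 24 and 20 years, which agrees with the visual impression from the plot.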
Here I made two density plots to give a better picture. The first shows the overall distribution, which is slightly bimodal. The second, grouped by continent, shows why: Africa's lower mean life expectancy creates the smaller first bump in the first plot, while the rest of the continents have more similar life expectancies.
densityplot(~MeanLifeExp, ChangeLife)
densityplot(~MeanLifeExp, ChangeLife, plot.points = FALSE, ref = TRUE, groups = continent,
    auto.key = list(columns = nlevels(ChangeLife$continent)))
Here I use a data aggregation result on mean life expectancy and mean population by continent and year (with Oceania dropped). I created another column for the standardized ratio of life expectancy to population, where each mean is divided by the corresponding variance before the ratio is taken. I wanted to see how life expectancy changed as population changed.
gDat <- droplevels(subset(gDat, continent != "Oceania"))
leAndPopByYearAndCont <- ddply(gDat, ~year + continent, summarize, meanPop = mean(pop),
    meanLe = mean(lifeExp), ratio = (meanLe/var(lifeExp))/(meanPop/var(pop)))
leAndPopByYearAndContTable <- xtable(leAndPopByYearAndCont)
print(leAndPopByYearAndContTable, type = "html", include.rownames = FALSE)
| year | continent | meanPop | meanLe | ratio |
|---|---|---|---|---|
| 1952 | Africa | 4570009.63 | 39.14 | 12878230.97 |
| 1952 | Americas | 13806097.84 | 53.28 | 46410619.93 |
| 1952 | Asia | 42283556.12 | 46.31 | 162647486.72 |
| 1952 | Europe | 13937361.53 | 64.41 | 33974226.07 |
| 1957 | Africa | 5093033.42 | 41.27 | 12844247.39 |
| 1957 | Americas | 15478156.64 | 55.96 | 55955284.34 |
| 1957 | Asia | 47356987.85 | 49.32 | 184058507.66 |
| 1957 | Europe | 14596345.03 | 66.70 | 51814938.60 |
| 1962 | Africa | 5702247.40 | 43.32 | 13935592.34 |
| 1962 | Americas | 17330810.16 | 58.40 | 70431446.72 |
| 1962 | Asia | 51404763.09 | 51.56 | 192659865.21 |
| 1962 | Europe | 15345171.83 | 68.54 | 83981250.16 |
| 1967 | Africa | 6447874.79 | 45.33 | 15342943.71 |
| 1967 | Americas | 19229864.92 | 60.41 | 88274581.86 |
| 1967 | Asia | 57747360.61 | 54.66 | 238971408.90 |
| 1967 | Europe | 16039298.60 | 69.74 | 113849032.74 |
| 1972 | Africa | 7305375.79 | 47.45 | 16193058.24 |
| 1972 | Americas | 21175368.40 | 62.39 | 110947270.12 |
| 1972 | Asia | 65180977.21 | 57.32 | 281953855.52 |
| 1972 | Europe | 16687835.30 | 70.78 | 164472197.31 |
| 1977 | Africa | 8328096.56 | 49.58 | 17238761.53 |
| 1977 | Americas | 23122707.96 | 64.39 | 127866860.79 |
| 1977 | Asia | 72257986.55 | 59.61 | 301848472.86 |
| 1977 | Europe | 17238817.70 | 71.94 | 181101819.19 |
| 1982 | Africa | 9602857.44 | 51.59 | 17881422.99 |
| 1982 | Americas | 25211636.80 | 66.23 | 153016889.63 |
| 1982 | Asia | 79095017.64 | 62.62 | 463590776.05 |
| 1982 | Europe | 17708896.70 | 72.81 | 174576635.55 |
| 1987 | Africa | 11054502.12 | 53.34 | 18212083.43 |
| 1987 | Americas | 27310158.84 | 68.09 | 219669241.33 |
| 1987 | Asia | 87006689.76 | 64.85 | 564321955.25 |
| 1987 | Europe | 18103138.67 | 73.64 | 184900915.27 |
| 1992 | Africa | 12674644.56 | 53.63 | 14580469.18 |
| 1992 | Americas | 29570964.16 | 69.57 | 297537724.10 |
| 1992 | Asia | 94948248.21 | 66.54 | 644800365.88 |
| 1992 | Europe | 18604759.90 | 74.44 | 190137058.54 |
| 1997 | Africa | 14304480.46 | 53.60 | 17856657.18 |
| 1997 | Americas | 31876016.40 | 71.15 | 359557566.65 |
| 1997 | Asia | 102523803.03 | 68.02 | 697515222.10 |
| 1997 | Europe | 18964804.93 | 75.51 | 213740856.10 |
| 2002 | Africa | 16033152.23 | 53.33 | 18001989.27 |
| 2002 | Americas | 33990910.48 | 72.42 | 398021870.44 |
| 2002 | Asia | 109145521.30 | 69.23 | 692482261.82 |
| 2002 | Europe | 19274128.97 | 76.70 | 251346406.73 |
| 2007 | Africa | 17875763.31 | 54.81 | 20523794.07 |
| 2007 | Americas | 35954847.36 | 73.61 | 491835901.18 |
| 2007 | Asia | 115513752.33 | 70.73 | 810112703.30 |
| 2007 | Europe | 19536617.63 | 77.65 | 249827671.88 |
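To make the ratio column easier to follow, here is a small worked example of the same formula on toy numbers. It is only a sketch: `std_ratio()` is a hypothetical helper that mirrors the expression used in the ddply call, and the toy vectors are made up, not taken from Gapminder.

```r
# std_ratio() is a hypothetical helper that mirrors the ratio column:
# (mean(lifeExp) / var(lifeExp)) / (mean(pop) / var(pop))
std_ratio <- function(lifeExp, pop) {
  (mean(lifeExp) / var(lifeExp)) / (mean(pop) / var(pop))
}

# Toy values, not taken from Gapminder.
toy_le  <- c(40, 50, 60)       # mean 50, variance 100
toy_pop <- c(1e6, 2e6, 3e6)    # mean 2e6, variance 1e12

std_ratio(toy_le, toy_pop)     # (50 / 100) / (2e6 / 1e12) = 250000

# Expressing population in millions divides the ratio by the same factor,
# so the ratio depends on the units of pop.
std_ratio(toy_le, toy_pop / 1e6)  # (50 / 100) / (2 / 1) = 0.25
```

Because the variance of population ends up in the numerator, the ratio scales with the size of the population figures, which would explain why the values in the table run into the tens and hundreds of millions.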
There are 4 figures. The first two are box-and-whisker plots of mean life expectancy and mean population by continent. Asia has by far the most variation in mean population, but only slightly more variation in mean life expectancy than the other continents.
bwplot(meanLe ~ continent, leAndPopByYearAndCont)
bwplot(meanPop ~ continent, leAndPopByYearAndCont)
The 3rd plot is an xyplot of the ratio over time, with one panel per continent. Asia, Europe, and the Americas all show a generally increasing ratio, which I take to mean that their standard of living (a big factor in life expectancy, in my opinion) is rising faster than their population. Africa looks flat in the 3rd figure, so I made a 4th figure for Africa alone to see whether its slope really is flat or whether the shared scale of the 3rd figure is too big. The 4th figure shows that Africa follows the same upward trend.
xyplot(ratio ~ year | factor(continent), leAndPopByYearAndCont, scales = list(y = list(draw = FALSE)))
xyplot(ratio ~ year, leAndPopByYearAndCont, subset = continent == "Africa",
scales = list(y = list(draw = FALSE)))
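The visual reading of the Africa panel can be backed up with a quick numerical check. This is a sketch under my own assumptions rather than part of the original analysis: it refits nothing from the raw data, it only takes Africa's twelve ratio values from the table above and estimates a straight-line trend.

```r
# Africa's ratio values copied from the table above, in year order.
africa <- data.frame(
  year  = seq(1952, 2007, by = 5),
  ratio = c(12878230.97, 12844247.39, 13935592.34, 15342943.71,
            16193058.24, 17238761.53, 17881422.99, 18212083.43,
            14580469.18, 17856657.18, 18001989.27, 20523794.07)
)

# Least-squares slope of ratio on year; a positive value means an upward trend.
fit <- lm(ratio ~ year, data = africa)
coef(fit)[["year"]]

# Overall change from 1952 to 2007 relative to the 1952 value.
africa$ratio[12] / africa$ratio[1] - 1
```

The slope is positive and the ratio ends up well above its 1952 value, although the table also shows a clear dip in 1992. An alternative to a separate Africa plot would be to pass `scales = list(y = list(relation = "free"))` to the panelled xyplot, which gives each continent its own y-axis.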