Yiming Zhang
In this exercise, I will mainly use the plyr package to analyze the Gapminder data.
First, load the Gapminder data and the needed packages.
# Tab-delimited Gapminder data (five-year intervals, 1952 to 2007)
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
library(lattice)
library(plyr)  # ddply(), daply(), arrange()
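For readers new to plyr: `ddply()` takes a data frame, splits it by the grouping formula, applies a function to each piece, and row-binds the results into a new data frame. The base-R sketch below imitates that split-apply-combine pattern for the min/max summary used in part (1). It runs on a small made-up data frame `toy` (hypothetical countries and values, not Gapminder data), so it needs neither the download nor plyr.

```r
# Hypothetical toy data standing in for gDat (values are made up)
toy <- data.frame(
  country   = c("A", "B", "C", "D", "E"),
  continent = c("X", "X", "Y", "Y", "Y"),
  gdpPercap = c(100, 300, 50, 500, 250)
)

# Split: one data frame per continent
pieces <- split(toy, toy$continent)

# Apply: summarise each piece into a one-row data frame
summaries <- lapply(pieces, function(d) {
  data.frame(continent    = d$continent[1],
             minGdpPercap = min(d$gdpPercap),
             maxGdpPercap = max(d$gdpPercap))
})

# Combine: stack the one-row results back into a single data frame
result <- do.call(rbind, summaries)
rownames(result) <- NULL
result
```

With plyr loaded, `ddply(toy, ~continent, summarize, minGdpPercap = min(gdpPercap), maxGdpPercap = max(gdpPercap))` should give the same numbers in one call.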
Then take a quick look at the imported data.
names(gDat)
## [1] "country" "year" "pop" "continent" "lifeExp" "gdpPercap"
(1) Look at the spread of GDP per capita within the continents.
# Spread of GDP per capita within each continent (all years pooled)
GDPByCont <- ddply(gDat, ~continent, summarize,
    minGdpPercap = min(gdpPercap), maxGdpPercap = max(gdpPercap),
    RANGE = max(gdpPercap) - min(gdpPercap),
    VAR = var(gdpPercap), MEAN = mean(gdpPercap))
# Sort continents by range, smallest first
(GDPByCont <- arrange(GDPByCont, RANGE))
## continent minGdpPercap maxGdpPercap RANGE VAR MEAN
## 1 Africa 241.2 21951 21710 7997187 2194
## 2 Oceania 10039.6 34435 24396 40436669 18622
## 3 Americas 1201.6 42952 41750 40918591 7136
## 4 Europe 973.5 49357 48384 87520020 14469
## 5 Asia 331.0 113523 113192 197272506 7902
Sorted by the range of GDP per capita, the four continents other than Asia have ranges of a similar magnitude (about 21,700 to 48,400). Asia's range (113,192) is more than twice Europe's, because Asia has both the highest maximum GDP per capita and the second-lowest minimum; Asia therefore shows the largest gap between rich and poor of any continent. The variance tells the same story. Note that these statistics pool every country and every year from 1952 to 2007, so the spread reflects growth over time as well as differences between countries. Also notice that Africa has both the lowest range and the lowest variance: most African countries are similarly poor, which is confirmed by Africa having the lowest mean (2,194).
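The argument above rests on which observations sit at the extremes of each continent. Because the data pool all years, each extreme is a single country-year rather than a country's typical level. A minimal base-R sketch for finding those rows, on a hypothetical toy data frame (made-up countries, years, and values):

```r
# Hypothetical toy data; in the report this would be gDat
toy <- data.frame(
  country   = c("A", "A", "B", "C", "C", "D"),
  continent = c("X", "X", "X", "Y", "Y", "Y"),
  year      = c(1952, 2007, 2007, 1952, 2007, 2007),
  gdpPercap = c(800, 2500, 400, 9000, 30000, 1500)
)

# For each continent, keep the rows holding the minimum and maximum gdpPercap
extremes <- do.call(rbind, lapply(split(toy, toy$continent), function(d) {
  d[c(which.min(d$gdpPercap), which.max(d$gdpPercap)),
    c("continent", "country", "year", "gdpPercap")]
}))
extremes
```

With plyr loaded, `ddply(gDat, ~continent, function(d) d[c(which.min(d$gdpPercap), which.max(d$gdpPercap)), ])` should produce the same kind of table for the real data.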
(2) Look at how life expectancy changes over time on different continents.
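The summaries in this part use `mean(lifeExp, trim = 0.25)`, a trimmed mean: R sorts the values, drops 25% of the observations from each end, and averages what remains, which makes the summary less sensitive to a few unusually high or low countries. A small worked example with made-up numbers:

```r
# Made-up life expectancies, one of them an outlier
x <- c(40, 50, 55, 60, 90)

mean(x)               # ordinary mean: (40 + 50 + 55 + 60 + 90) / 5 = 59
mean(x, trim = 0.25)  # floor(5 * 0.25) = 1 value dropped from each end,
                      # so this is mean(c(50, 55, 60)) = 55
```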
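In tall format, we get one row per continent and year, summarized by a 25% trimmed mean of life expectancy:
# Trimmed mean (25% cut from each end) of lifeExp per continent-year
(LifeExpByyear.tall <- ddply(gDat, ~continent + year, summarize,
    meanLifeExp = mean(lifeExp, trim = 0.25)))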
## continent year meanLifeExp
## 1 Africa 1952 39.14
## 2 Africa 1957 41.04
## 3 Africa 1962 42.96
## 4 Africa 1967 44.97
## 5 Africa 1972 47.07
## 6 Africa 1977 49.15
## 7 Africa 1982 51.09
## 8 Africa 1987 52.63
## 9 Africa 1992 53.44
## 10 Africa 1997 52.65
## 11 Africa 2002 51.60
## 12 Africa 2007 53.41
## 13 Americas 1952 53.17
## 14 Americas 1957 56.41
## 15 Americas 1962 59.23
## 16 Americas 1967 61.27
## 17 Americas 1972 63.24
## 18 Americas 1977 65.21
## 19 Americas 1982 67.15
## 20 Americas 1987 68.72
## 21 Americas 1992 69.99
## 22 Americas 1997 71.51
## 23 Americas 2002 72.62
## 24 Americas 2007 73.75
## 25 Asia 1952 45.13
## 26 Asia 1957 48.35
## 27 Asia 1962 50.56
## 28 Asia 1967 54.24
## 29 Asia 1972 57.75
## 30 Asia 1977 60.75
## 31 Asia 1982 63.39
## 32 Asia 1987 65.82
## 33 Asia 1992 67.53
## 34 Asia 1997 68.97
## 35 Asia 2002 70.27
## 36 Asia 2007 71.64
## 37 Europe 1952 65.27
## 38 Europe 1957 67.58
## 39 Europe 1962 69.36
## 40 Europe 1967 70.39
## 41 Europe 1972 71.05
## 42 Europe 1977 72.07
## 43 Europe 1982 73.07
## 44 Europe 1987 74.00
## 45 Europe 1992 74.91
## 46 Europe 1997 75.96
## 47 Europe 2002 77.08
## 48 Europe 2007 78.06
## 49 Oceania 1952 69.25
## 50 Oceania 1957 70.30
## 51 Oceania 1962 71.09
## 52 Oceania 1967 71.31
## 53 Oceania 1972 71.91
## 54 Oceania 1977 72.85
## 55 Oceania 1982 74.29
## 56 Oceania 1987 75.32
## 57 Oceania 1992 76.94
## 58 Oceania 1997 78.19
## 59 Oceania 2002 79.74
## 60 Oceania 2007 80.72
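This is hard to read, so let's show it in a wide format instead. Here daply() gets a function that returns a single number, so the result is a numeric year-by-continent matrix.
# Wide format: one row per year, one column per continent
(LifeExpByyear.wide <- daply(gDat, ~year + continent,
    function(x) mean(x$lifeExp, trim = 0.25)))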
## continent
## year Africa Americas Asia Europe Oceania
## 1952 39.14 53.17 45.13 65.27 69.25
## 1957 41.04 56.41 48.35 67.58 70.3
## 1962 42.96 59.23 50.56 69.36 71.09
## 1967 44.97 61.27 54.24 70.39 71.31
## 1972 47.07 63.24 57.75 71.05 71.91
## 1977 49.15 65.21 60.75 72.07 72.85
## 1982 51.09 67.15 63.39 73.07 74.29
## 1987 52.63 68.72 65.82 74 75.32
## 1992 53.44 69.99 67.53 74.91 76.94
## 1997 52.65 71.51 68.97 75.96 78.19
## 2002 51.6 72.62 70.27 77.08 79.74
## 2007 53.41 73.75 71.64 78.06 80.72
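This is better. We can clearly see that life expectancy generally increased over time on every continent. The exception is Africa, where the trimmed mean fell from 53.44 in 1992 to 51.60 in 2002 before recovering to 53.41 in 2007. Africa's life expectancy is also substantially lower than that of the other continents throughout: in 2007 it was about 53 years, compared with roughly 72 to 81 years elsewhere.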
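`daply()` returns an array with one dimension per grouping variable, which is why year and continent become the rows and columns of a matrix. Base R's `tapply()` produces the same kind of year-by-continent table directly; a sketch on a hypothetical toy data frame (made-up values). Extra arguments such as `trim = 0.25` can be passed through `tapply()`'s `...` argument; they are omitted here for simplicity.

```r
# Hypothetical toy data in tall format: one row per country and year
toy <- data.frame(
  year      = c(1952, 1952, 1952, 2007, 2007, 2007),
  continent = c("X", "X", "Y", "X", "X", "Y"),
  lifeExp   = c(40, 50, 65, 55, 63, 77)
)

# Rows = year, columns = continent, cells = mean life expectancy per group
wide <- tapply(toy$lifeExp, list(year = toy$year, continent = toy$continent),
               mean)
wide

# Change between the first and last year, per continent
change <- wide["2007", ] - wide["1952", ]
change
```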
(3) Get the number of countries with low life expectancy over time, by continent.
For simplicity, I set the benchmark at 60 years: a country whose life expectancy is below 60 is classified as a low-life-expectancy country. I then count such countries for each year and continent.
# Number of countries with lifeExp below 60, per year and continent
(nLowLifeExpCountries <- daply(gDat, ~year + continent,
    function(x) sum(x$lifeExp < 60)))
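In R, comparing a vector with a number gives a logical vector, and `sum()` counts the `TRUE` values because `TRUE` is treated as 1 and `FALSE` as 0. That is all the classification step needs (when there are no missing values it matches `length(which(...))`). A small example with made-up life expectancies and the 60-year benchmark:

```r
benchmark <- 60                                 # cutoff chosen in the text
lifeExp   <- c(45.2, 61.0, 59.9, 72.3, 38.4)    # made-up values for one continent-year

lifeExp < benchmark                             # TRUE FALSE TRUE FALSE TRUE
sum(lifeExp < benchmark)                        # low-life-expectancy countries: 3
```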
## continent
## year Africa Americas Asia Europe Oceania
## 1952 52 19 29 7 0
## 1957 52 15 27 3 0
## 1962 51 13 25 1 0
## 1967 50 11 25 1 0
## 1972 50 10 19 1 0
## 1977 50 7 14 1 0
## 1982 44 5 12 0 0
## 1987 40 2 8 0 0
## 1992 39 2 7 0 0
## 1997 39 1 6 0 0
## 2002 41 1 4 0 0
## 2007 40 0 3 0 0
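Next, get the proportion of low-life-expectancy countries (a fraction between 0 and 1 rather than a percentage).
# Fraction of countries with lifeExp below 60, per year and continent
(PercentLowLifeExpCountries <- daply(gDat, ~year + continent,
    function(x) sum(x$lifeExp < 60) / length(unique(x$country))))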
## continent
## year Africa Americas Asia Europe Oceania
## 1952 1 0.76 0.8788 0.2333 0
## 1957 1 0.6 0.8182 0.1 0
## 1962 0.9808 0.52 0.7576 0.03333 0
## 1967 0.9615 0.44 0.7576 0.03333 0
## 1972 0.9615 0.4 0.5758 0.03333 0
## 1977 0.9615 0.28 0.4242 0.03333 0
## 1982 0.8462 0.2 0.3636 0 0
## 1987 0.7692 0.08 0.2424 0 0
## 1992 0.75 0.08 0.2121 0 0
## 1997 0.75 0.04 0.1818 0 0
## 2002 0.7885 0.04 0.1212 0 0
## 2007 0.7692 0 0.09091 0 0
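From this table, the share of low-life-expectancy countries fell over the period in Africa, the Americas, Asia, and Europe, and Oceania had none in any year. The decline was not steady everywhere, though: Africa's share rose again from 0.75 in 1997 to 0.79 in 2002, and in 2007 about 77% of African countries were still below 60 years.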
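Because each country appears exactly once per year in the Gapminder data, dividing the count by `length(unique(country))` is the same as taking the mean of the logical vector `lifeExp < 60`. A short check on made-up values (hypothetical country labels):

```r
lifeExp <- c(45.2, 61.0, 59.9, 72.3, 38.4)   # made-up values, one per country
country <- c("A", "B", "C", "D", "E")        # hypothetical country labels

countThenDivide <- sum(lifeExp < 60) / length(unique(country))
meanOfLogical   <- mean(lifeExp < 60)

c(countThenDivide, meanOfLogical)            # both 0.6
```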
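(4) Check out an interesting ratio
Let's compare countries by the ratio of GDP per capita to life expectancy, where each is first averaged over all years.
# Ratio of mean GDP per capita to mean life expectancy, per country
gdplifeRatio <- ddply(gDat, ~country, summarize, Ratio = mean(gdpPercap)/mean(lifeExp))
# arrange() sorts ascending, so tail() shows the six largest ratios
tail(arrange(gdplifeRatio, Ratio))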
## country Ratio
## 137 Canada 299.2
## 138 Saudi Arabia 345.3
## 139 Norway 352.7
## 140 United States 357.4
## 141 Switzerland 358.3
## 142 Kuwait 947.9
Most of the countries with the largest ratios are rich, developed economies, along with the oil exporters Kuwait and Saudi Arabia. This may suggest that once GDP per capita reaches a high level, further economic growth adds comparatively little to life expectancy, so the ratio keeps climbing.
In summary, the plyr package is powerful and convenient. In this assignment I haven't used the xtable package to format the output, so the report is not very polished; I will try to use it in the next assignment.
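The ratio in part (4) is a ratio of means: each country's GDP per capita and life expectancy are first averaged over all years, and only then divided. A base-R sketch of the same computation and ranking, on hypothetical toy data (made-up countries and values):

```r
# Hypothetical toy data: two years for each of three made-up countries
toy <- data.frame(
  country   = c("P", "P", "Q", "Q", "R", "R"),
  gdpPercap = c(1000, 3000, 20000, 40000, 500, 700),
  lifeExp   = c(40, 60, 70, 80, 45, 55)
)

# Ratio of means per country (same definition as Ratio in the ddply call)
ratio <- sapply(split(toy, toy$country), function(d) {
  mean(d$gdpPercap) / mean(d$lifeExp)
})

# Sort ascending, like arrange(); the largest ratios end up at the tail
sort(ratio)
tail(sort(ratio), 2)
```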