STAT 545A Homework #3

Yiming Zhang

In this exercise, I will mainly use the plyr package to analyze the Gapminder data.

First, load the Gapminder data and the required packages.


gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)
library(lattice)
library(plyr)

Then take a quick look at the imported data.

names(gDat)
## [1] "country"   "year"      "pop"       "continent" "lifeExp"   "gdpPercap"

(1) Look at the spread of GDP per capita within the continents.

# Summarize the spread of GDP per capita within each continent
gdpByCont <- ddply(gDat, ~continent, summarize, minGdpPercap = min(gdpPercap), 
    maxGdpPercap = max(gdpPercap), RANGE = max(gdpPercap) - min(gdpPercap), 
    VAR = var(gdpPercap), MEAN = mean(gdpPercap))
# Sort by range, smallest first, and print the result
(gdpByCont <- arrange(gdpByCont, RANGE))
##   continent minGdpPercap maxGdpPercap  RANGE       VAR  MEAN
## 1    Africa        241.2        21951  21710   7997187  2194
## 2   Oceania      10039.6        34435  24396  40436669 18622
## 3  Americas       1201.6        42952  41750  40918591  7136
## 4    Europe        973.5        49357  48384  87520020 14469
## 5      Asia        331.0       113523 113192 197272506  7902

Sorted by the range of GDP per capita, Africa, Oceania, the Americas, and Europe have ranges of a similar order (roughly 22,000 to 48,000), while Asia's range (113,192) is far larger. Asia has the highest maximum GDP per capita and the second-lowest minimum, so it has the widest gap between its richest and poorest observations (each observation being one country in one year). The variance tells the same story. Africa has the smallest range and the smallest variance because most African countries are similarly poor, as its low mean (2,194) shows.
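
As a cross-check on what the summary columns mean, here is a minimal base-R sketch that computes the same kind of statistics per group without plyr. The data frame `toy` and its values are made up for illustration and are not Gapminder data.

```r
# Toy data (hypothetical values, not from Gapminder)
toy <- data.frame(
  continent = c("A", "A", "A", "B", "B", "B"),
  gdpPercap = c(100, 200, 300, 50, 500, 5000)
)

# split() gives one vector per continent; sapply() returns a matrix with one
# column per continent, so t() turns it into one row per continent
spread <- t(sapply(split(toy$gdpPercap, toy$continent), function(x) {
  c(min = min(x), max = max(x), range = max(x) - min(x),
    var = var(x), mean = mean(x))
}))

# Sort by range, smallest first
spread <- spread[order(spread[, "range"]), , drop = FALSE]
print(spread)
```

Group B has a single very large value, which inflates both its range and its variance; this is the same pattern the Asia row shows in the real table.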

(2) Look at how life expectancy changes over time on different continents.

In tall format, using a trimmed mean (trim = 0.25) of life expectancy across countries, we get:

(LifeExpByyear.tall <- ddply(gDat, ~continent + year, summarize, meanLifeExp = mean(lifeExp, 
    trim = 0.25)))
##    continent year meanLifeExp
## 1     Africa 1952       39.14
## 2     Africa 1957       41.04
## 3     Africa 1962       42.96
## 4     Africa 1967       44.97
## 5     Africa 1972       47.07
## 6     Africa 1977       49.15
## 7     Africa 1982       51.09
## 8     Africa 1987       52.63
## 9     Africa 1992       53.44
## 10    Africa 1997       52.65
## 11    Africa 2002       51.60
## 12    Africa 2007       53.41
## 13  Americas 1952       53.17
## 14  Americas 1957       56.41
## 15  Americas 1962       59.23
## 16  Americas 1967       61.27
## 17  Americas 1972       63.24
## 18  Americas 1977       65.21
## 19  Americas 1982       67.15
## 20  Americas 1987       68.72
## 21  Americas 1992       69.99
## 22  Americas 1997       71.51
## 23  Americas 2002       72.62
## 24  Americas 2007       73.75
## 25      Asia 1952       45.13
## 26      Asia 1957       48.35
## 27      Asia 1962       50.56
## 28      Asia 1967       54.24
## 29      Asia 1972       57.75
## 30      Asia 1977       60.75
## 31      Asia 1982       63.39
## 32      Asia 1987       65.82
## 33      Asia 1992       67.53
## 34      Asia 1997       68.97
## 35      Asia 2002       70.27
## 36      Asia 2007       71.64
## 37    Europe 1952       65.27
## 38    Europe 1957       67.58
## 39    Europe 1962       69.36
## 40    Europe 1967       70.39
## 41    Europe 1972       71.05
## 42    Europe 1977       72.07
## 43    Europe 1982       73.07
## 44    Europe 1987       74.00
## 45    Europe 1992       74.91
## 46    Europe 1997       75.96
## 47    Europe 2002       77.08
## 48    Europe 2007       78.06
## 49   Oceania 1952       69.25
## 50   Oceania 1957       70.30
## 51   Oceania 1962       71.09
## 52   Oceania 1967       71.31
## 53   Oceania 1972       71.91
## 54   Oceania 1977       72.85
## 55   Oceania 1982       74.29
## 56   Oceania 1987       75.32
## 57   Oceania 1992       76.94
## 58   Oceania 1997       78.19
## 59   Oceania 2002       79.74
## 60   Oceania 2007       80.72

This is pretty hard to read, so let's show it in a wide format instead.

(LifeExpByyear.wide <- daply(gDat, ~year + continent, summarize, meanLifeExp = mean(lifeExp, 
    trim = 0.25)))
##       continent
## year   Africa Americas Asia  Europe Oceania
##   1952 39.14  53.17    45.13 65.27  69.25  
##   1957 41.04  56.41    48.35 67.58  70.3   
##   1962 42.96  59.23    50.56 69.36  71.09  
##   1967 44.97  61.27    54.24 70.39  71.31  
##   1972 47.07  63.24    57.75 71.05  71.91  
##   1977 49.15  65.21    60.75 72.07  72.85  
##   1982 51.09  67.15    63.39 73.07  74.29  
##   1987 52.63  68.72    65.82 74     75.32  
##   1992 53.44  69.99    67.53 74.91  76.94  
##   1997 52.65  71.51    68.97 75.96  78.19  
##   2002 51.6   72.62    70.27 77.08  79.74  
##   2007 53.41  73.75    71.64 78.06  80.72

This is better. We can see clearly that life expectancy rose steadily on every continent except Africa, where it fell from 53.44 in 1992 to 51.60 in 2002 before recovering to 53.41 in 2007. Africa's life expectancy is also much lower than that of the other continents in every year.
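
Two details in the code above may be worth spelling out: `mean(x, trim = 0.25)` drops the lowest and highest quarter of the observations before averaging, and a wide year-by-continent layout can also be built in base R with `tapply`. The sketch below illustrates both on made-up toy values (not Gapminder data); `trimmedMean` is a hypothetical helper name.

```r
# Trimmed mean: with 5 values and trim = 0.25, floor(5 * 0.25) = 1 value
# is dropped from each end, so 1 and 100 are ignored here
x <- c(1, 2, 3, 4, 100)
plainMean <- mean(x)               # pulled up by the outlier
trimMean  <- mean(x, trim = 0.25)  # mean of 2, 3, 4

# Toy data (hypothetical): two years, two continents, two countries each
toy <- data.frame(
  year      = rep(c(1952, 1957), each = 4),
  continent = rep(c("A", "A", "B", "B"), times = 2),
  lifeExp   = c(40, 42, 60, 62, 44, 46, 64, 66)
)

trimmedMean <- function(v) mean(v, trim = 0.25)

# Rows are years, columns are continents
wide <- tapply(toy$lifeExp,
               list(year = toy$year, continent = toy$continent),
               trimmedMean)
print(wide)
```

Unlike the `daply(..., summarize, ...)` call, `tapply` returns a plain numeric matrix, which is easier to pass on to plotting or arithmetic.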

(3) Get the number of countries with low life expectancy over time by continent.

For simplicity, I set the benchmark at 60 years. A country whose life expectancy in a given year is below the benchmark is classified as a low life expectancy country for that year. The table below counts these countries for each year and continent.

(nLowLifeExpCountries <- daply(gDat, ~year + continent, summarize, nCountries = length(which(lifeExp < 
    60))))
##       continent
## year   Africa Americas Asia Europe Oceania
##   1952 52     19       29   7      0      
##   1957 52     15       27   3      0      
##   1962 51     13       25   1      0      
##   1967 50     11       25   1      0      
##   1972 50     10       19   1      0      
##   1977 50     7        14   1      0      
##   1982 44     5        12   0      0      
##   1987 40     2        8    0      0      
##   1992 39     2        7    0      0      
##   1997 39     1        6    0      0      
##   2002 41     1        4    0      0      
##   2007 40     0        3    0      0

Next, get the share of low life expectancy countries on each continent, expressed as a proportion between 0 and 1.

(PercentLowLifeExpCountries <- daply(gDat, ~year + continent, summarize, Percentage = length(which(lifeExp < 
    60))/length(unique(country))))
##       continent
## year   Africa Americas Asia    Europe  Oceania
##   1952 1      0.76     0.8788  0.2333  0      
##   1957 1      0.6      0.8182  0.1     0      
##   1962 0.9808 0.52     0.7576  0.03333 0      
##   1967 0.9615 0.44     0.7576  0.03333 0      
##   1972 0.9615 0.4      0.5758  0.03333 0      
##   1977 0.9615 0.28     0.4242  0.03333 0      
##   1982 0.8462 0.2      0.3636  0       0      
##   1987 0.7692 0.08     0.2424  0       0      
##   1992 0.75   0.08     0.2121  0       0      
##   1997 0.75   0.04     0.1818  0       0      
##   2002 0.7885 0.04     0.1212  0       0      
##   2007 0.7692 0        0.09091 0       0

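From this table, we can see that the share of low life expectancy countries fell over time in the Americas, Asia, and Europe, and that Oceania had none in any year. Africa's share also fell overall, but it rose from 0.75 in 1997 to 0.79 in 2002 before dropping back to 0.77 in 2007.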
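
Because each country appears once per year in this data set, the number of unique countries in a year-and-continent group equals the number of rows in that group. Under that assumption, the count and the proportion can also be written as `sum()` and `mean()` of a logical vector. A minimal sketch on made-up values (the benchmark of 60 is the one used above; the life expectancies are hypothetical):

```r
benchmark <- 60
lifeExp <- c(45, 58, 61, 70)  # one hypothetical group of four countries

nLow    <- sum(lifeExp < benchmark)   # TRUE counts as 1, FALSE as 0
propLow <- mean(lifeExp < benchmark)  # share of TRUE values

# Same count as the length(which(...)) form used in the report
sameCount <- nLow == length(which(lifeExp < benchmark))

print(c(count = nLow, proportion = propLow))
```

One difference to keep in mind: `which()` silently skips `NA` values, while `sum()` and `mean()` return `NA` unless `na.rm = TRUE` is set.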

(4) Check out an interesting ratio

Let's look at the ratio of GDP per capita to life expectancy for each country.

gdplifeRatio <- ddply(gDat, ~country, summarize, Ratio = mean(gdpPercap)/mean(lifeExp))
tail(arrange(gdplifeRatio, Ratio))
##           country Ratio
## 137        Canada 299.2
## 138  Saudi Arabia 345.3
## 139        Norway 352.7
## 140 United States 357.4
## 141   Switzerland 358.3
## 142        Kuwait 947.9

We can see that most of the countries with the highest ratios are developed countries. This could mean that once GDP per capita reaches a high level, further economic growth brings only a small increase in life expectancy.
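
Note that `mean(gdpPercap)/mean(lifeExp)` is a ratio of means, which is generally not the same number as the mean of the yearly ratios. A small sketch with made-up values for one hypothetical country shows the difference:

```r
# Two hypothetical observations for a single country
gdpPercap <- c(1000, 4000)
lifeExp   <- c(50, 80)

ratioOfMeans <- mean(gdpPercap) / mean(lifeExp)  # 2500 / 65
meanOfRatios <- mean(gdpPercap / lifeExp)        # mean of 20 and 50

print(c(ratioOfMeans = ratioOfMeans, meanOfRatios = meanOfRatios))
```

Either definition is defensible. The choice can change the ranking of countries, so it is worth stating which one is used.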

In summary, the plyr package is really powerful and convenient. In this assignment I did not use the xtable package to display the output, so the report is not very pretty. I will try to use it in the next assignment.