STAT 545A Homework #3

Prepared by: Amanda Yuen

This is an R Markdown document. For homework #3, we will perform some data aggregation tasks on the Gapminder data.

First, let's import the data into R.

gDat <- read.delim("gapminderDataFiveYear.txt")

It's always wise to do a quick check that the data has been imported properly.

str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
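The visual check with str() can be backed up by a few assertions that stop the script if the import went wrong. The sketch below is only an illustration: sampleDat is a hypothetical three-row stand-in built from the rounded values that str() prints above, so that the example runs without the data file. The same checks could be applied to gDat.

```r
# Stand-in for the first rows of the file, using the rounded values shown by str()
# (sampleDat and txt are illustrative names, not part of the original analysis)
txt <- paste(
  "country\tyear\tpop\tcontinent\tlifeExp\tgdpPercap",
  "Afghanistan\t1952\t8425333\tAsia\t28.8\t779",
  "Afghanistan\t1957\t9240934\tAsia\t30.3\t821",
  "Afghanistan\t1962\t10267083\tAsia\t32\t853",
  sep = "\n"
)
sampleDat <- read.delim(text = txt)

# Column names we expect, taken from the str() output
expected <- c("country", "year", "pop", "continent", "lifeExp", "gdpPercap")

# stopifnot() raises an error as soon as one condition is not TRUE
stopifnot(
  identical(names(sampleDat), expected),  # all six columns present, in order
  is.numeric(sampleDat$year),             # numeric columns parsed as numbers
  is.numeric(sampleDat$gdpPercap),
  !anyNA(sampleDat)                       # no missing values introduced on import
)
```

If every condition holds, stopifnot() returns silently, so nothing extra is printed in the report.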

Looks good. Before we start our data aggregating fun, we should load the packages that we will likely need.

library(plyr)
## Warning: package 'plyr' was built under R version 2.15.3
library(xtable)
## Warning: package 'xtable' was built under R version 2.15.2

Alright, let's start by obtaining the minimum and maximum GDP per capita for each continent, shall we? We can sort the continents by the minimum values and then see whether the maximum values follow the same ordering.

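# Minimum and maximum GDP per capita within each continent
minmaxGDPpercap <- ddply(gDat, ~continent, summarize, minGDPpercap = min(gdpPercap),
    maxGDPpercap = max(gdpPercap))
# Sort the continents from the lowest to the highest minimum
minmaxGDPpercap <- arrange(minmaxGDPpercap, minGDPpercap)
# Convert to an xtable object and print it as an HTML table
minmaxGDPpercap <- xtable(minmaxGDPpercap)
print(minmaxGDPpercap, type = "html", include.rownames = TRUE)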
  continent minGDPpercap maxGDPpercap
1 Africa          241.17     21951.21
2 Asia            331.00    113523.13
3 Europe          973.53     49357.19
4 Americas       1201.64     42951.65
5 Oceania       10039.60     34435.37

Interesting. From the table above, we see that Africa has the lowest minimum GDP per capita as well as the lowest maximum GDP per capita. Asia has the second lowest minimum GDP per capita but also the highest maximum GDP per capita. Oceania has the highest minimum GDP per capita but only the fourth highest maximum GDP per capita.
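The comparison above can also be made with ranks instead of by eye. This is a minimal sketch that re-enters the printed table values by hand. The minmax data frame and its rank columns are illustrative names and are not part of the original analysis.

```r
# Values copied from the table above (already sorted by minimum)
minmax <- data.frame(
  continent    = c("Africa", "Asia", "Europe", "Americas", "Oceania"),
  minGDPpercap = c(241.17, 331.00, 973.53, 1201.64, 10039.60),
  maxGDPpercap = c(21951.21, 113523.13, 49357.19, 42951.65, 34435.37)
)

# rank() assigns 1 to the smallest value
minmax$minRank <- rank(minmax$minGDPpercap)
minmax$maxRank <- rank(minmax$maxGDPpercap)

# Spearman rank correlation between the two columns
rho <- cor(minmax$minGDPpercap, minmax$maxGDPpercap, method = "spearman")

minmax
rho
```

For these five pairs the rank correlation is 0, which matches the impression that the maximums do not follow the ordering of the minimums. With only five continents this is a rough check rather than a formal test.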

Ok, we've seen the minimum and maximum GDP per capita, but perhaps it would be more helpful to take a look at the mean and spread of the values for each continent.

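# Mean and four measures of spread for GDP per capita within each continent
# (mad() is the median absolute deviation)
GDPpercapspread <- ddply(gDat, ~continent, summarize, meanGDPpercap = mean(gdpPercap),
    sdGDPpercap = sd(gdpPercap), varGDPpercap = var(gdpPercap), madGDPpercap = mad(gdpPercap),
    IQRGDPpercap = IQR(gdpPercap))
# Sort the continents from the smallest to the largest standard deviation
GDPpercapspread <- arrange(GDPpercapspread, sdGDPpercap)
# Convert to an xtable object and print it as an HTML table
GDPpercapspread <- xtable(GDPpercapspread)
print(GDPpercapspread, type = "html", include.rownames = TRUE)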
  continent meanGDPpercap sdGDPpercap varGDPpercap madGDPpercap IQRGDPpercap
1 Africa          2193.75     2827.93   7997187.31       775.32      1616.17
2 Oceania        18621.61     6358.98  40436668.87      6459.10      8072.26
3 Americas        7136.11     6396.76  40918591.10      3269.33      4402.43
4 Europe         14469.48     9355.21  87520019.60      8846.05     13248.30
5 Asia            7902.15    14045.37 197272505.85      2820.83      7492.26

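The continents are sorted from the smallest to the largest standard deviation of GDP per capita. Not surprisingly, the variance follows the same ordering, since the standard deviation is simply the square root of the variance, but this provides a good sanity check that the results are what they should be. The median absolute deviation (which is what mad() computes) and the interquartile range, however, order the continents quite differently. Both are robust to extreme values, whereas the standard deviation is not: Asia has the largest standard deviation but the second smallest median absolute deviation, which is consistent with a few very large values, such as the maximum of 113523.13 in the first table, inflating its standard deviation. This table provides a better overview of the data. We see that Africa has the lowest mean GDP per capita as well as the smallest spread by every measure, which indicates that most African countries are poor. We also see that Oceania and Europe have the two highest mean GDP per capita as well as fairly large spreads by the median absolute deviation and the interquartile range, so these two continents appear to have a greater mix of wealthy and poor countries. Keep in mind that each summary pools every country and year within a continent, so the spread reflects change over time as well as differences between countries.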
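The points about the spread measures can be checked directly from the printed values. The sketch below re-enters the table by hand. The spread data frame and the toy vector are illustrative assumptions, and the toy numbers are not Gapminder data.

```r
# Values copied from the table above (sorted by standard deviation)
spread <- data.frame(
  continent = c("Africa", "Oceania", "Americas", "Europe", "Asia"),
  sd  = c(2827.93, 6358.98, 6396.76, 9355.21, 14045.37),
  var = c(7997187.31, 40436668.87, 40918591.10, 87520019.60, 197272505.85),
  mad = c(775.32, 6459.10, 3269.33, 8846.05, 2820.83),
  iqr = c(1616.17, 8072.26, 4402.43, 13248.30, 7492.26)
)

# Sanity check: squared sd should match the variance, up to rounding in the table
all.equal(spread$sd^2, spread$var, tolerance = 1e-5)

# Ordering of the continents under each measure, smallest spread first
spread$continent[order(spread$sd)]
spread$continent[order(spread$mad)]
spread$continent[order(spread$iqr)]

# Hypothetical toy values: four similar observations and one extreme one
toy <- c(1, 2, 3, 4, 100)
sd(toy)                        # strongly affected by the extreme value
mad(toy)                       # median absolute deviation times 1.4826 (the default constant)
median(abs(toy - median(toy))) # the unscaled median absolute deviation
IQR(toy)                       # also barely affected by the extreme value
```

The toy vector shows why the orderings can disagree: a single extreme value moves the standard deviation a long way while leaving the median absolute deviation and the interquartile range almost unchanged.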

Now let's take a look at how GDP per capita changes over time for each continent.

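# Mean GDP per capita for every combination of continent and year
GDPpercaptime <- ddply(gDat, ~continent + year, summarize, meanGDPpercap = mean(gdpPercap))
# Sort the rows by continent
GDPpercaptime <- arrange(GDPpercaptime, continent)
# Convert to an xtable object and print it as an HTML table
GDPpercaptime <- xtable(GDPpercaptime)
print(GDPpercaptime, type = "html", include.rownames = TRUE)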
   continent year meanGDPpercap
 1 Africa    1952       1252.57
 2 Africa    1957       1385.24
 3 Africa    1962       1598.08
 4 Africa    1967       2050.36
 5 Africa    1972       2339.62
 6 Africa    1977       2585.94
 7 Africa    1982       2481.59
 8 Africa    1987       2282.67
 9 Africa    1992       2281.81
10 Africa    1997       2378.76
11 Africa    2002       2599.39
12 Africa    2007       3089.03
13 Americas  1952       4079.06
14 Americas  1957       4616.04
15 Americas  1962       4901.54
16 Americas  1967       5668.25
17 Americas  1972       6491.33
18 Americas  1977       7352.01
19 Americas  1982       7506.74
20 Americas  1987       7793.40
21 Americas  1992       8044.93
22 Americas  1997       8889.30
23 Americas  2002       9287.68
24 Americas  2007      11003.03
25 Asia      1952       5195.48
26 Asia      1957       5787.73
27 Asia      1962       5729.37
28 Asia      1967       5971.17
29 Asia      1972       8187.47
30 Asia      1977       7791.31
31 Asia      1982       7434.14
32 Asia      1987       7608.23
33 Asia      1992       8639.69
34 Asia      1997       9834.09
35 Asia      2002      10174.09
36 Asia      2007      12473.03
37 Europe    1952       5661.06
38 Europe    1957       6963.01
39 Europe    1962       8365.49
40 Europe    1967      10143.82
41 Europe    1972      12479.58
42 Europe    1977      14283.98
43 Europe    1982      15617.90
44 Europe    1987      17214.31
45 Europe    1992      17061.57
46 Europe    1997      19076.78
47 Europe    2002      21711.73
48 Europe    2007      25054.48
49 Oceania   1952      10298.09
50 Oceania   1957      11598.52
51 Oceania   1962      12696.45
52 Oceania   1967      14495.02
53 Oceania   1972      16417.33
54 Oceania   1977      17283.96
55 Oceania   1982      18554.71
56 Oceania   1987      20448.04
57 Oceania   1992      20894.05
58 Oceania   1997      24024.18
59 Oceania   2002      26938.78
60 Oceania   2007      29810.19

The above table shows the mean GDP per capita for each continent every 5 years within the time range of the data set. Right off the bat we can see that the mean GDP per capita of every continent is higher in 2007 than in 1952. Upon closer inspection, we can see certain trends within each continent. For example, after 1977 Africa experienced a dip in GDP per capita and it took about 25 years, until 2002, to return to the level seen before the dip. Asia also experienced a dip after 1972 and it took about 2 decades, until 1992, to recover. Europe and Oceania experienced a short period of stagnation from 1987 to 1992, with Europe's mean falling slightly, while the Americas enjoyed a fairly steady increase over time.
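The dips and recovery times described above can be read off the table programmatically. This is a minimal sketch that re-enters the printed means by hand. The names meanGDP, declineYears and recoveryYear are hypothetical helpers introduced for illustration and are not part of plyr or the original analysis.

```r
# Mean GDP per capita by continent, copied from the table above
years <- seq(1952, 2007, by = 5)
meanGDP <- list(
  Africa   = c(1252.57, 1385.24, 1598.08, 2050.36, 2339.62, 2585.94,
               2481.59, 2282.67, 2281.81, 2378.76, 2599.39, 3089.03),
  Americas = c(4079.06, 4616.04, 4901.54, 5668.25, 6491.33, 7352.01,
               7506.74, 7793.40, 8044.93, 8889.30, 9287.68, 11003.03),
  Asia     = c(5195.48, 5787.73, 5729.37, 5971.17, 8187.47, 7791.31,
               7434.14, 7608.23, 8639.69, 9834.09, 10174.09, 12473.03),
  Europe   = c(5661.06, 6963.01, 8365.49, 10143.82, 12479.58, 14283.98,
               15617.90, 17214.31, 17061.57, 19076.78, 21711.73, 25054.48),
  Oceania  = c(10298.09, 11598.52, 12696.45, 14495.02, 16417.33, 17283.96,
               18554.71, 20448.04, 20894.05, 24024.18, 26938.78, 29810.19)
)

# Years in which the mean fell relative to the previous observation
declineYears <- lapply(meanGDP, function(v) years[-1][diff(v) < 0])

# First year after peakYear in which the mean exceeds its peakYear level
recoveryYear <- function(v, peakYear) {
  peak <- v[years == peakYear]
  min(years[years > peakYear & v > peak])
}

declineYears
recoveryYear(meanGDP$Africa, 1977)
recoveryYear(meanGDP$Asia, 1972)
```

Besides the dips discussed above, declineYears also flags a small decline for Asia in 1962, and it returns an empty vector for the Americas and Oceania, whose means never fall between consecutive observations.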