Prepared by: Amanda Yuen
This is an R Markdown document. For homework #3, we will perform some data aggregation tasks on the Gapminder data.
First, let's import the data into R.
gDat <- read.delim("gapminderDataFiveYear.txt")
It's always wise to do a quick check that the data has been imported properly.
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Looks good. Before we start our data aggregating fun, we should load the packages that we will likely need.
library(plyr)
## Warning: package 'plyr' was built under R version 2.15.3
library(xtable)
## Warning: package 'xtable' was built under R version 2.15.2
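A side note on the import check: the `str()` output above shows `country` and `continent` as factors, which was the default when this report was written. Since R 4.0.0, `read.delim()` keeps strings as character vectors unless `stringsAsFactors = TRUE` is passed. Below is a minimal sketch of a scripted version of the check. The names `toy` and `toyDat` are hypothetical, and the two rows are a stand-in built from the rounded values in the `str()` output, not the real file.

```r
# Toy stand-in for gapminderDataFiveYear.txt (assumption: tab-separated with
# a header row, as read.delim() expects). Values are rounded and illustrative.
toy <- paste(
  "country\tyear\tpop\tcontinent\tlifeExp\tgdpPercap",
  "Afghanistan\t1952\t8425333\tAsia\t28.8\t779",
  "Afghanistan\t1957\t9240934\tAsia\t30.3\t821",
  sep = "\n"
)

# stringsAsFactors = TRUE reproduces the factor columns on R >= 4.0.0
toyDat <- read.delim(text = toy, stringsAsFactors = TRUE)
str(toyDat)

# Fail loudly if the import does not have the expected shape
stopifnot(
  ncol(toyDat) == 6,
  is.factor(toyDat$country),
  is.factor(toyDat$continent),
  is.numeric(toyDat$gdpPercap),
  !any(is.na(toyDat))
)
```

With the real file, the same `stopifnot()` call could also check `nrow(gDat) == 1704`, the row count reported above.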
Alright, let's start by obtaining the minimum and maximum GDP per capita for each continent, shall we? We can sort the continents by the minimum values and then see whether the maximum values follow the same order.
minmaxGDPpercap <- ddply(gDat, ~continent, summarize, minGDPpercap = min(gdpPercap),
maxGDPpercap = max(gdpPercap))
minmaxGDPpercap <- arrange(minmaxGDPpercap, minGDPpercap)
minmaxGDPpercap <- xtable(minmaxGDPpercap)
print(minmaxGDPpercap, type = "html", include.rownames = TRUE)
| | continent | minGDPpercap | maxGDPpercap |
|---|---|---|---|
| 1 | Africa | 241.17 | 21951.21 |
| 2 | Asia | 331.00 | 113523.13 |
| 3 | Europe | 973.53 | 49357.19 |
| 4 | Americas | 1201.64 | 42951.65 |
| 5 | Oceania | 10039.60 | 34435.37 |
Interesting. From the table above, we see that Africa has the lowest minimum GDP per capita as well as the lowest maximum GDP per capita. Asia has the second-lowest minimum but also, by a wide margin, the highest maximum. Oceania has the highest minimum GDP per capita but only the fourth-highest maximum.
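To double-check these rankings without reading them off by eye, here is a small sketch that ranks the continents on both columns. The numbers are copied from the table above; the data frame name `minmax` and the columns `minRank` and `maxRank` are hypothetical names introduced for illustration.

```r
# Values copied from the min/max table above
minmax <- data.frame(
  continent    = c("Africa", "Asia", "Europe", "Americas", "Oceania"),
  minGDPpercap = c(241.17, 331.00, 973.53, 1201.64, 10039.60),
  maxGDPpercap = c(21951.21, 113523.13, 49357.19, 42951.65, 34435.37)
)

# minRank: 1 = lowest minimum; maxRank: 1 = highest maximum
minmax$minRank <- rank(minmax$minGDPpercap)
minmax$maxRank <- rank(-minmax$maxGDPpercap)

print(minmax[, c("continent", "minRank", "maxRank")])
```

If the two columns followed the same order, `minRank` and `maxRank` would mirror each other; the output shows they do not.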
Ok, we've seen the minimum and maximum GDP per capita, but perhaps it would be more helpful to take a look at the mean and spread of the values for each continent.
GDPpercapspread <- ddply(gDat, ~continent, summarize, meanGDPpercap = mean(gdpPercap),
sdGDPpercap = sd(gdpPercap), varGDPpercap = var(gdpPercap), madGDPpercap = mad(gdpPercap),
IQRGDPpercap = IQR(gdpPercap))
GDPpercapspread <- arrange(GDPpercapspread, sdGDPpercap)
GDPpercapspread <- xtable(GDPpercapspread)
print(GDPpercapspread, type = "html", include.rownames = TRUE)
| | continent | meanGDPpercap | sdGDPpercap | varGDPpercap | madGDPpercap | IQRGDPpercap |
|---|---|---|---|---|---|---|
| 1 | Africa | 2193.75 | 2827.93 | 7997187.31 | 775.32 | 1616.17 |
| 2 | Oceania | 18621.61 | 6358.98 | 40436668.87 | 6459.10 | 8072.26 |
| 3 | Americas | 7136.11 | 6396.76 | 40918591.10 | 3269.33 | 4402.43 |
| 4 | Europe | 14469.48 | 9355.21 | 87520019.60 | 8846.05 | 13248.30 |
| 5 | Asia | 7902.15 | 14045.37 | 197272505.85 | 2820.83 | 7492.26 |
The continents are sorted from the smallest to the largest standard deviation of GDP per capita. Not surprisingly, the variance follows the same order, since the standard deviation is simply the square root of the variance, but this is a good sanity check that the results are what they should be. The median absolute deviation (which is what `mad()` computes) and the interquartile range, however, order the continents quite differently from the standard deviation and variance. This table provides a better overview of the data. We see that Africa has the lowest mean GDP per capita as well as the smallest spread on every measure, which indicates that most African countries are poor. We also see that Oceania and Europe have the two highest mean GDP per capita as well as the largest median absolute deviations and interquartile ranges, so these two continents appear to have a greater mix of wealthier and poorer observations. Keep in mind that each statistic pools all years for a continent, so part of the spread reflects growth over time rather than differences between countries.
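Why can the standard deviation and `mad()` disagree so much, as they do for Asia? A quick sketch on made-up numbers may help. The vector `x` is hypothetical (four small values plus one outlier) and is not taken from the Gapminder data. Note that R's `mad()` is the median absolute deviation, scaled by a constant of 1.4826 by default, not the mean absolute deviation.

```r
# Hypothetical values with one large outlier (not Gapminder data)
x <- c(1, 2, 3, 4, 100)

# sd is the square root of var, so the two always rank groups identically
stopifnot(isTRUE(all.equal(sd(x), sqrt(var(x)))))

# mad() with default arguments: 1.4826 * median(|x - median(x)|)
manualMad <- 1.4826 * median(abs(x - median(x)))
stopifnot(isTRUE(all.equal(mad(x), manualMad)))

# The outlier inflates sd, while mad and IQR barely notice it
spread <- c(sd = sd(x), mad = mad(x), IQR = IQR(x))
print(spread)
```

A single extreme value pulls the standard deviation far above the robust measures. That is consistent with Asia having the largest standard deviation but only a moderate `mad()` in the table above, given its very high maximum GDP per capita.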
Now let's take a look at how GDP per capita changes over time for each continent.
GDPpercaptime <- ddply(gDat, ~continent + year, summarize, meanGDPpercap = mean(gdpPercap))
GDPpercaptime <- arrange(GDPpercaptime, continent)
GDPpercaptime <- xtable(GDPpercaptime)
print(GDPpercaptime, type = "html", include.rownames = TRUE)
| | continent | year | meanGDPpercap |
|---|---|---|---|
| 1 | Africa | 1952 | 1252.57 |
| 2 | Africa | 1957 | 1385.24 |
| 3 | Africa | 1962 | 1598.08 |
| 4 | Africa | 1967 | 2050.36 |
| 5 | Africa | 1972 | 2339.62 |
| 6 | Africa | 1977 | 2585.94 |
| 7 | Africa | 1982 | 2481.59 |
| 8 | Africa | 1987 | 2282.67 |
| 9 | Africa | 1992 | 2281.81 |
| 10 | Africa | 1997 | 2378.76 |
| 11 | Africa | 2002 | 2599.39 |
| 12 | Africa | 2007 | 3089.03 |
| 13 | Americas | 1952 | 4079.06 |
| 14 | Americas | 1957 | 4616.04 |
| 15 | Americas | 1962 | 4901.54 |
| 16 | Americas | 1967 | 5668.25 |
| 17 | Americas | 1972 | 6491.33 |
| 18 | Americas | 1977 | 7352.01 |
| 19 | Americas | 1982 | 7506.74 |
| 20 | Americas | 1987 | 7793.40 |
| 21 | Americas | 1992 | 8044.93 |
| 22 | Americas | 1997 | 8889.30 |
| 23 | Americas | 2002 | 9287.68 |
| 24 | Americas | 2007 | 11003.03 |
| 25 | Asia | 1952 | 5195.48 |
| 26 | Asia | 1957 | 5787.73 |
| 27 | Asia | 1962 | 5729.37 |
| 28 | Asia | 1967 | 5971.17 |
| 29 | Asia | 1972 | 8187.47 |
| 30 | Asia | 1977 | 7791.31 |
| 31 | Asia | 1982 | 7434.14 |
| 32 | Asia | 1987 | 7608.23 |
| 33 | Asia | 1992 | 8639.69 |
| 34 | Asia | 1997 | 9834.09 |
| 35 | Asia | 2002 | 10174.09 |
| 36 | Asia | 2007 | 12473.03 |
| 37 | Europe | 1952 | 5661.06 |
| 38 | Europe | 1957 | 6963.01 |
| 39 | Europe | 1962 | 8365.49 |
| 40 | Europe | 1967 | 10143.82 |
| 41 | Europe | 1972 | 12479.58 |
| 42 | Europe | 1977 | 14283.98 |
| 43 | Europe | 1982 | 15617.90 |
| 44 | Europe | 1987 | 17214.31 |
| 45 | Europe | 1992 | 17061.57 |
| 46 | Europe | 1997 | 19076.78 |
| 47 | Europe | 2002 | 21711.73 |
| 48 | Europe | 2007 | 25054.48 |
| 49 | Oceania | 1952 | 10298.09 |
| 50 | Oceania | 1957 | 11598.52 |
| 51 | Oceania | 1962 | 12696.45 |
| 52 | Oceania | 1967 | 14495.02 |
| 53 | Oceania | 1972 | 16417.33 |
| 54 | Oceania | 1977 | 17283.96 |
| 55 | Oceania | 1982 | 18554.71 |
| 56 | Oceania | 1987 | 20448.04 |
| 57 | Oceania | 1992 | 20894.05 |
| 58 | Oceania | 1997 | 24024.18 |
| 59 | Oceania | 2002 | 26938.78 |
| 60 | Oceania | 2007 | 29810.19 |
The above table shows the mean GDP per capita for each continent every 5 years within the time range of the data set. Right off the bat, we can see that the mean GDP per capita is higher in 2007 than in 1952 on every continent. Upon closer inspection, we can see certain trends within each continent. For example, after 1977 Africa experienced a dip in GDP per capita, and it took 25 years (until 2002) to return to the 1977 level. Asia also experienced a dip after 1972, and it took about 2 decades (until 1992) to recover. Between 1987 and 1992, Europe's mean dipped slightly and Oceania's growth slowed, while the Americas enjoyed a steady increase throughout.
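The dips can also be found programmatically rather than by scanning the table. Here is a minimal sketch for Africa using `diff()`. The means are copied from the table above, and the variable names (`africa`, `change`, `dipYears`, and so on) are hypothetical.

```r
# Africa's mean GDP per capita, copied from the table above
years  <- seq(1952, 2007, by = 5)
africa <- c(1252.57, 1385.24, 1598.08, 2050.36, 2339.62, 2585.94,
            2481.59, 2282.67, 2281.81, 2378.76, 2599.39, 3089.03)

# Change from one 5-year snapshot to the next; negative means a decline
change <- diff(africa)

# Years whose mean is lower than the previous snapshot
dipYears <- years[-1][change < 0]

# Last snapshot before the first decline, and the first later year above it
peakIdx      <- which(change < 0)[1]
peakYear     <- years[peakIdx]
recovered    <- years > peakYear & africa > africa[peakIdx]
recoveryYear <- min(years[recovered])

cat("Declines recorded in:", dipYears, "\n")
cat("Peak before dip:", peakYear, " Recovered by:", recoveryYear, "\n")
```

Swapping in another continent's column of means gives the same kind of summary for that continent. Note that this simple version only tracks the first decline in a series.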