Homework #3: Data Aggregation

Set up the data and libraries

library(plyr)
library(xtable)
# Import the Gapminder dataset
gDat <- read.delim("gapminderDataFiveYear.txt")

Part 1: Presenting minimum and maximum GDP per capita in “wide” format

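# One row per continent, with minimum and maximum GDP per capita as separate columns
table1 <- ddply(gDat, ~continent, summarize, minimum.GDP.per.capita = min(gdpPercap), 
    `maximum GDP per capita` = max(gdpPercap))
# Sort continents by minimum GDP per capita, lowest first
table1 <- arrange(table1, minimum.GDP.per.capita)
print(xtable(table1), type = "html", include.rownames = FALSE)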
continent  minimum.GDP.per.capita  maximum GDP per capita
Africa                     241.17                21951.21
Asia                       331.00               113523.13
Europe                     973.53                49357.19
Americas                  1201.64                42951.65
Oceania                  10039.60                34435.37

Results:

Africa has both the lowest minimum and the lowest maximum GDP per capita of any continent.
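The "wide" layout used in this part (one row per group, one column per summary) can also be sketched in base R. This is only an illustration on a made-up data frame: `toy`, `groups`, `wide`, and the column names `min.gdp` and `max.gdp` are hypothetical and are not part of the homework code or the Gapminder data.

```r
# Hypothetical toy data standing in for gDat; the values are made up
toy <- data.frame(
  continent = c("A", "A", "B", "B", "B"),
  gdpPercap = c(100, 300, 50, 900, 400)
)

# Split the GDP values by continent, then summarize each group
groups <- split(toy$gdpPercap, toy$continent)
wide <- data.frame(
  continent = names(groups),
  min.gdp = unname(vapply(groups, min, numeric(1))),
  max.gdp = unname(vapply(groups, max, numeric(1)))
)

# Sort by the minimum, lowest first, as in the table above
wide <- wide[order(wide$min.gdp), ]
print(wide)
```

Each summary is its own column, so the minimum and maximum for a continent can be read off a single row.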


Part 2: Spread of GDP per capita in different continents

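# Three measures of spread per continent: sd is sensitive to outliers, mad and IQR are robust
table2 <- ddply(gDat, ~continent, summarize, GDP.standard.deviation = sd(gdpPercap), 
    GDP.median.absolute.deviation = mad(gdpPercap), GDP.interquartile.range = IQR(gdpPercap))
# Sort continents by standard deviation, lowest first
table2 <- arrange(table2, GDP.standard.deviation)
print(xtable(table2), type = "html", include.rownames = FALSE)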
continent  GDP.standard.deviation  GDP.median.absolute.deviation  GDP.interquartile.range
Africa                    2827.93                         775.32                  1616.17
Oceania                   6358.98                        6459.10                  8072.26
Americas                  6396.76                        3269.33                  4402.43
Europe                    9355.21                        8846.05                 13248.30
Asia                     14045.37                        2820.83                  7492.26

Results:

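Asia has the highest standard deviation of GDP per capita, while Europe has the highest median absolute deviation and interquartile range. The standard deviation is sensitive to extreme values, whereas the other two measures are robust to them, so Asia's large standard deviation is likely driven by a few outliers. This is consistent with Asia's maximum of 113523.13 in Part 1.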
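A small sketch of why the three measures can disagree, using made-up numbers rather than the Gapminder data. The vectors `x` and `x_out` and the helper `spread` are hypothetical names introduced only for this illustration.

```r
# Hypothetical toy values; not taken from the Gapminder data
x <- c(10, 12, 11, 13, 12, 11, 10, 13, 12, 11)
x_out <- c(x, 1000)  # the same values plus one extreme observation

# Collect the three measures of spread used in table2
spread <- function(v) c(sd = sd(v), mad = mad(v), iqr = IQR(v))

result <- rbind(without.outlier = spread(x), with.outlier = spread(x_out))
print(result)
```

Adding the single extreme value inflates the standard deviation by orders of magnitude, while the median absolute deviation and interquartile range change only slightly. A group with a few very large values can therefore rank first on standard deviation without ranking first on the robust measures.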


Part 3: How life expectancy changes over years

# trim is the fraction dropped from each end, so trim = 0.1 removes 20% of values in total
table3 <- ddply(gDat, ~year, summarize, mean.life.expectancy = mean(lifeExp), 
    mean.life.expectancy.trimmed.20.percent = mean(lifeExp, trim = 0.1), 
    mean.life.expectancy.trimmed.40.percent = mean(lifeExp, trim = 0.2))
print(xtable(table3), type = "html", include.rownames = FALSE)
year  mean.life.expectancy  mean.life.expectancy.trimmed.20.percent  mean.life.expectancy.trimmed.40.percent
1952                 49.06                                    48.58                                    47.75
1957                 51.51                                    51.27                                    50.64
1962                 53.61                                    53.58                                    53.13
1967                 55.68                                    55.87                                    55.64
1972                 57.65                                    58.01                                    58.12
1977                 59.57                                    60.10                                    60.39
1982                 61.53                                    62.12                                    62.47
1987                 63.21                                    63.92                                    64.48
1992                 64.16                                    65.19                                    65.89
1997                 65.01                                    66.02                                    66.84
2002                 65.69                                    66.72                                    67.77
2007                 67.01                                    68.11                                    69.17

Results:

Mean life expectancy increases steadily over time. Through 1962, trimming lowers the mean, which suggests that the untrimmed mean was pulled upward by unusually high values. From 1972 onward, trimming raises the mean, which suggests that unusually low values were pulling it down. 1967 is the crossover point, where the three estimates are nearly equal.
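The link between the direction of the change and the side of the outliers can be illustrated with made-up numbers. The vectors `right_skewed` and `left_skewed` and the helper `compare_means` are hypothetical and are not taken from the Gapminder data.

```r
# Hypothetical toy values; not taken from the Gapminder data
right_skewed <- c(40, 41, 42, 43, 44, 45, 46, 47, 48, 80)  # one unusually high value
left_skewed  <- c(20, 60, 61, 62, 63, 64, 65, 66, 67, 68)  # one unusually low value

# mean(v, trim = p) drops the fraction p of observations from each end before averaging
compare_means <- function(v) {
  c(mean = mean(v), trim.0.1 = mean(v, trim = 0.1), trim.0.2 = mean(v, trim = 0.2))
}

result <- rbind(right_skewed = compare_means(right_skewed),
                left_skewed  = compare_means(left_skewed))
print(result)
```

With a high outlier the trimmed means fall below the plain mean, and with a low outlier they rise above it. The table above shows the same pattern switching direction between the early and later years.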