Set up the data and libraries
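```r
library(plyr)    # ddply(), summarize(), arrange()
library(xtable)  # HTML table output

# Import the Gapminder dataset (read.delim() expects tab-separated values)
gDat <- read.delim("gapminderDataFiveYear.txt")

# Minimum and maximum GDP per capita for each continent
table1 <- ddply(gDat, ~continent, summarize,
                minimum.GDP.per.capita = min(gdpPercap),
                `maximum GDP per capita` = max(gdpPercap))
# Sort continents from lowest to highest minimum
table1 <- arrange(table1, minimum.GDP.per.capita)
print(xtable(table1), type = "html", include.rownames = FALSE)
```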
| continent | minimum.GDP.per.capita | maximum GDP per capita |
|---|---|---|
| Africa | 241.17 | 21951.21 |
| Asia | 331.00 | 113523.13 |
| Europe | 973.53 | 49357.19 |
| Americas | 1201.64 | 42951.65 |
| Oceania | 10039.60 | 34435.37 |
Africa has both the lowest minimum (241.17) and the lowest maximum (21951.21) GDP per capita of any continent.
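If plyr is unavailable, the same grouped minimum/maximum table can be sketched in base R with `tapply()` and `order()`. This is only an illustration, not the method used above: the `toy` data frame is hypothetical and its values are not Gapminder data. With the real data you would pass `gDat$gdpPercap` and `gDat$continent` instead.

```r
# Hypothetical toy data standing in for gDat; these are NOT Gapminder values
toy <- data.frame(
  continent = c("A", "A", "B", "B", "C", "C"),
  gdpPercap = c(300, 2000, 150, 900, 5000, 40000)
)

# tapply() applies a function within each group; results are named by group
mins <- tapply(toy$gdpPercap, toy$continent, min)
maxs <- tapply(toy$gdpPercap, toy$continent, max)

summary_tbl <- data.frame(
  continent = names(mins),
  minimum.GDP.per.capita = as.vector(mins),
  maximum.GDP.per.capita = as.vector(maxs[names(mins)]),
  stringsAsFactors = FALSE
)

# Sort ascending by the minimum, mirroring arrange() in the plyr chunk
summary_tbl <- summary_tbl[order(summary_tbl$minimum.GDP.per.capita), ]
rownames(summary_tbl) <- NULL
print(summary_tbl)

# Group holding the smallest minimum, and group holding the smallest maximum
lowest_min <- summary_tbl$continent[which.min(summary_tbl$minimum.GDP.per.capita)]
lowest_max <- summary_tbl$continent[which.min(summary_tbl$maximum.GDP.per.capita)]
print(c(lowest_min = lowest_min, lowest_max = lowest_max))
```

The two `which.min()` lookups check the kind of claim made above (one group holding both the lowest minimum and the lowest maximum) directly, rather than by reading the sorted table.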
```r
# Three measures of spread in GDP per capita, per continent
table2 <- ddply(gDat, ~continent, summarize,
                GDP.standard.deviation = sd(gdpPercap),
                GDP.median.absolute.deviation = mad(gdpPercap),
                GDP.interquartile.range = IQR(gdpPercap))
table2 <- arrange(table2, GDP.standard.deviation)
print(xtable(table2), type = "html", include.rownames = FALSE)
```
| continent | GDP.standard.deviation | GDP.median.absolute.deviation | GDP.interquartile.range |
|---|---|---|---|
| Africa | 2827.93 | 775.32 | 1616.17 |
| Oceania | 6358.98 | 6459.10 | 8072.26 |
| Americas | 6396.76 | 3269.33 | 4402.43 |
| Europe | 9355.21 | 8846.05 | 13248.30 |
| Asia | 14045.37 | 2820.83 | 7492.26 |
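Asia has the highest standard deviation of GDP per capita, while Europe has the highest median absolute deviation and interquartile range. The standard deviation is sensitive to extreme values, whereas the median absolute deviation and the interquartile range are not, so this pattern suggests that a few very high outliers in Asia inflate its standard deviation. Asia's maximum in the first table (113523.13) is more than twice that of any other continent.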
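A small sketch of why the three measures can disagree: adding a single extreme value to an otherwise tight sample. The numbers below are hypothetical and chosen only to illustrate robustness; they are not Gapminder data. Note that R's `mad()` scales by the constant 1.4826 by default, and `IQR()` uses the default type 7 quantiles.

```r
# Hypothetical incomes: nine evenly spaced values
x <- c(1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600)
# The same values plus one extreme high outlier
x_out <- c(x, 100000)

# The three spread measures used in table2
spread <- function(v) c(sd = sd(v), mad = mad(v), iqr = IQR(v))

s_without <- spread(x)
s_with <- spread(x_out)
print(round(rbind(without_outlier = s_without, with_outlier = s_with), 1))
```

The single outlier multiplies the standard deviation many times over, while the MAD and IQR change only modestly. The same mechanism would let a handful of high-income observations give a continent the largest standard deviation without giving it the largest MAD or IQR.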
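```r
# Mean life expectancy per year, untrimmed and trimmed.
# trim = 0.1 drops 10% from each tail (20% in total); trim = 0.2 drops 40% in total.
table3 <- ddply(gDat, ~year, summarize,
                mean.life.expectancy = mean(lifeExp),
                mean.life.expectancy.trimmed.20.percent = mean(lifeExp, trim = 0.1),
                mean.life.expectancy.trimmed.40.percent = mean(lifeExp, trim = 0.2))
print(xtable(table3), type = "html", include.rownames = FALSE)
```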
| year | mean.life.expectancy | mean.life.expectancy.trimmed.20.percent | mean.life.expectancy.trimmed.40.percent |
|---|---|---|---|
| 1952 | 49.06 | 48.58 | 47.75 |
| 1957 | 51.51 | 51.27 | 50.64 |
| 1962 | 53.61 | 53.58 | 53.13 |
| 1967 | 55.68 | 55.87 | 55.64 |
| 1972 | 57.65 | 58.01 | 58.12 |
| 1977 | 59.57 | 60.10 | 60.39 |
| 1982 | 61.53 | 62.12 | 62.47 |
| 1987 | 63.21 | 63.92 | 64.48 |
| 1992 | 64.16 | 65.19 | 65.89 |
| 1997 | 65.01 | 66.02 | 66.84 |
| 2002 | 65.69 | 66.72 | 67.77 |
| 2007 | 67.01 | 68.11 | 69.17 |
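Mean life expectancy increases over time. From 1952 through 1962, trimming lowers the mean, and heavier trimming lowers it further. In 1967 the two trimmed means fall on either side of the untrimmed mean, and from 1972 onward trimming raises the mean, with heavier trimming raising it further. In the earlier years, then, a tail of unusually high life expectancies pulled the untrimmed mean upward; from the 1970s on, a tail of unusually low life expectancies pulled it downward.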
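The direction of the shift follows from how `mean(x, trim = ...)` works: it sorts the data, drops `floor(n * trim)` observations from each end, and averages the rest (for `trim >= 0.5` it returns the median instead). The sketch below reimplements that rule as a hypothetical helper, `manual_trimmed_mean()`, assuming `0 <= trim < 0.5`, and applies it to two made-up samples (not Gapminder data): one with a long high tail and one with a long low tail.

```r
# Hypothetical helper mirroring mean(x, trim = ...) for 0 <= trim < 0.5:
# drop floor(n * trim) sorted values from EACH end, then average the rest
manual_trimmed_mean <- function(v, trim) {
  n <- length(v)
  k <- floor(n * trim)  # number of values dropped from each tail
  v <- sort(v)
  mean(v[(k + 1):(n - k)])
}

# Hypothetical right-skewed sample: a tail of HIGH values
right_skew <- c(40, 42, 44, 45, 46, 47, 48, 50, 70, 80)
# Hypothetical left-skewed sample: a tail of LOW values
left_skew <- c(20, 30, 60, 62, 63, 64, 65, 66, 68, 70)

# Trimming removes the high tail, so the mean falls
print(c(mean = mean(right_skew), trimmed = mean(right_skew, trim = 0.1)))
# Trimming removes the low tail, so the mean rises
print(c(mean = mean(left_skew), trimmed = mean(left_skew, trim = 0.1)))

# The helper agrees with base R's built-in trimming
print(all.equal(manual_trimmed_mean(right_skew, 0.1), mean(right_skew, trim = 0.1)))
```

Because `trim` applies to each tail separately, `trim = 0.1` removes 20% of the observations in total, which is why the column produced with `trim = 0.1` is labelled "trimmed.20.percent".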