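First, let's load the packages we need and import the data.
library(plyr)    # provides ddply() and arrange()
library(xtable)  # provides xtable()
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL)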
Let's study two variables: continent and GDP per capita.
The first thing we want to know is the minimum and maximum GDP per capita of each continent.
minandmaxgdpByCont <- ddply(gDat, ~continent, summarize, mingdpPercap = min(gdpPercap),
maxgdpPercap = max(gdpPercap))
Let's sort by the minimum GDP per capita.
minandmaxgdpByCont <- arrange(minandmaxgdpByCont, mingdpPercap)
The resulting table looks like this:
minandmaxgdpByCont <- xtable(minandmaxgdpByCont)
print(minandmaxgdpByCont, type = "html", include.rownames = TRUE)
| | continent | mingdpPercap | maxgdpPercap |
|---|---|---|---|
| 1 | Africa | 241.17 | 21951.21 |
| 2 | Asia | 331.00 | 113523.13 |
| 3 | Europe | 973.53 | 49357.19 |
| 4 | Americas | 1201.64 | 42951.65 |
| 5 | Oceania | 10039.60 | 34435.37 |
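The same split-apply-combine step can also be sketched in base R, which may help readers who do not have plyr installed. The data frame `toy` and the name `toyRange` below are invented for illustration and are not Gapminder values; this is a minimal sketch, not a replacement for the `ddply()` call above.

```r
# Toy data, invented for illustration only (not Gapminder values).
toy <- data.frame(
  continent = c("A", "A", "B", "B", "B"),
  gdpPercap = c(10, 30, 5, 50, 20)
)

# Split gdpPercap by continent and apply min() / max() to each group.
mins <- tapply(toy$gdpPercap, toy$continent, min)
maxs <- tapply(toy$gdpPercap, toy$continent, max)

# Combine the per-group results into one data frame.
toyRange <- data.frame(
  continent = names(mins),
  mingdpPercap = as.vector(mins),
  maxgdpPercap = as.vector(maxs)
)

# Sort by the minimum, as arrange() does in the plyr version.
toyRange <- toyRange[order(toyRange$mingdpPercap), ]
print(toyRange)
```

Here group B sorts first because its minimum is smaller, even though its maximum is larger, which is the same kind of reordering seen in the real tables.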
Then, let's sort by the maximum GDP per capita.
minandmaxgdpByCont <- arrange(minandmaxgdpByCont, maxgdpPercap)
And the table is:
print(minandmaxgdpByCont, type = "html", include.rownames = TRUE)
| | continent | mingdpPercap | maxgdpPercap |
|---|---|---|---|
| 1 | Africa | 241.17 | 21951.21 |
| 2 | Oceania | 10039.60 | 34435.37 |
| 3 | Americas | 1201.64 | 42951.65 |
| 4 | Europe | 973.53 | 49357.19 |
| 5 | Asia | 331.00 | 113523.13 |
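From these two tables, we can see that the continents rank quite differently depending on the measure, especially Asia. Asia has the second-lowest minimum GDP per capita but by far the highest maximum GDP per capita. One possible reason is that some countries in Asia are still developing while others are highly developed. Note that each minimum and maximum is taken over all countries and all years in the data set, so the gap may also reflect change over time.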
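To make the change in ranking concrete, the sketch below re-enters the rounded values from the two tables above and compares each continent's rank by minimum with its rank by maximum. The names `rangeTab`, `rankMin`, `rankMax`, `rankShift`, and `maxToMin` are made up for this illustration.

```r
# Values copied from the tables above (rounded to two decimals).
rangeTab <- data.frame(
  continent = c("Africa", "Asia", "Europe", "Americas", "Oceania"),
  mingdpPercap = c(241.17, 331.00, 973.53, 1201.64, 10039.60),
  maxgdpPercap = c(21951.21, 113523.13, 49357.19, 42951.65, 34435.37)
)

# Rank 1 = smallest value in each column.
rangeTab$rankMin <- rank(rangeTab$mingdpPercap)
rangeTab$rankMax <- rank(rangeTab$maxgdpPercap)

# Positive shift: the continent ranks higher by maximum than by minimum.
rangeTab$rankShift <- rangeTab$rankMax - rangeTab$rankMin

# Ratio of maximum to minimum as a rough measure of the gap.
rangeTab$maxToMin <- rangeTab$maxgdpPercap / rangeTab$mingdpPercap

# Show the continents ordered from largest to smallest gap.
print(rangeTab[order(-rangeTab$maxToMin), ])
```

Asia moves up three places between the two rankings and Oceania moves down three, while Africa stays at the bottom of both.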
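The next thing we want to know is the spread of GDP per capita within each continent. We measure it with the standard deviation, the variance, the median absolute deviation (computed with mad() and stored as MEgdpPercap), and the interquartile range.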
spreadgdpByCont <- ddply(gDat, ~continent, summarize, SDgdpPercap = sd(gdpPercap),
VARgdpPercap = var(gdpPercap), MEgdpPercap = mad(gdpPercap), IQRgdpPercap = IQR(gdpPercap))
Let's sort by the variance of GDP per capita.
spreadgdpByCont <- arrange(spreadgdpByCont, VARgdpPercap)
And the table is:
spreadgdpByCont <- xtable(spreadgdpByCont)
print(spreadgdpByCont, type = "html", include.rownames = TRUE)
| | continent | SDgdpPercap | VARgdpPercap | MEgdpPercap | IQRgdpPercap |
|---|---|---|---|---|---|
| 1 | Africa | 2827.93 | 7997187.31 | 775.32 | 1616.17 |
| 2 | Oceania | 6358.98 | 40436668.87 | 6459.10 | 8072.26 |
| 3 | Americas | 6396.76 | 40918591.10 | 3269.33 | 4402.43 |
| 4 | Europe | 9355.21 | 87520019.60 | 8846.05 | 13248.30 |
| 5 | Asia | 14045.37 | 197272505.85 | 2820.83 | 7492.26 |
Now let's sort by the median absolute deviation of GDP per capita (the MEgdpPercap column, computed with mad()).
spreadgdpByCont <- arrange(spreadgdpByCont, MEgdpPercap)
And the table is:
print(spreadgdpByCont, type = "html", include.rownames = TRUE)
| | continent | SDgdpPercap | VARgdpPercap | MEgdpPercap | IQRgdpPercap |
|---|---|---|---|---|---|
| 1 | Africa | 2827.93 | 7997187.31 | 775.32 | 1616.17 |
| 2 | Asia | 14045.37 | 197272505.85 | 2820.83 | 7492.26 |
| 3 | Americas | 6396.76 | 40918591.10 | 3269.33 | 4402.43 |
| 4 | Oceania | 6358.98 | 40436668.87 | 6459.10 | 8072.26 |
| 5 | Europe | 9355.21 | 87520019.60 | 8846.05 | 13248.30 |
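The continents also rank very differently in these two tables. We are not surprised to see that Asia has by far the highest variance of GDP per capita but the second-lowest median absolute deviation: the variance is sensitive to a few extreme values, such as Asia's very high maximum, whereas the median absolute deviation is robust to them.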
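As a rough illustration of why the variance and the output of mad() can rank groups so differently, the sketch below applies the same four spread measures to two made-up vectors that differ in a single value. The vectors `x` and `xOutlier` and the helper `spread()` are hypothetical and are not derived from the Gapminder data.

```r
# Made-up numbers for illustration only (not Gapminder values).
x <- c(1, 2, 3, 4, 5)
xOutlier <- c(1, 2, 3, 4, 100)  # same as x, but the largest value is an outlier

# The same four spread measures used in the ddply() call.
spread <- function(v) {
  c(sd = sd(v), var = var(v), mad = mad(v), iqr = IQR(v))
}

# One row per vector, one column per measure.
print(rbind(x = spread(x), xOutlier = spread(xOutlier)))
```

The single outlier inflates the standard deviation and variance, while the median absolute deviation and interquartile range stay the same for both vectors.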