Mina Park
In this exercise, we are producing accompanying figures for the data aggregation exercises we did in homework assignment #3. We are using the lattice package.
The data aggregation exercises in this assignment are variations on my submission from the previous week, with adjustments that improve on the original code based on what other students and the instructor presented.
1.) Clean workspace, load libraries, load data, and perform a superficial check that the data import occurred properly
# clean workspace
rm(list = ls())
# load libraries
library(plyr)
library(lattice)
# load data
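# stringsAsFactors = TRUE keeps country and continent as factors on R >= 4.0.0
Dat <- read.delim("gapminderDataFiveYear.txt", stringsAsFactors = TRUE)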
str(Dat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Things to notice about the data are the six variables (country, year, pop, continent, lifeExp, gdpPercap), the 1704 observations, and that the data are stored as a data frame.
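The visual check with str() can also be written as explicit assertions. The following is a minimal sketch on a made-up three-row table with the same six column names; the country names and values are invented for illustration and are not taken from the Gapminder file.

```r
# Toy stand-in for the real file: three made-up rows with the same six columns.
# The values are illustrative only and are not taken from the Gapminder data.
toyText <- paste(
  "country\tyear\tpop\tcontinent\tlifeExp\tgdpPercap",
  "Alphaland\t1952\t1000\tAsia\t40.1\t800.5",
  "Alphaland\t1957\t1100\tAsia\t42.3\t850.0",
  "Betaland\t1952\t2000\tEurope\t65.0\t5000.0",
  sep = "\n"
)

# stringsAsFactors = TRUE makes the text columns factors regardless of R version
toyDat <- read.delim(text = toyText, stringsAsFactors = TRUE)

expectedCols <- c("country", "year", "pop", "continent", "lifeExp", "gdpPercap")

# stopifnot() is silent when every condition holds and raises an error otherwise
stopifnot(
  is.data.frame(toyDat),
  identical(names(toyDat), expectedCols),
  is.factor(toyDat$country),
  is.factor(toyDat$continent),
  is.numeric(toyDat$lifeExp),
  !anyNA(toyDat)
)
```

The same stopifnot() call could be pointed at Dat after the real import; the particular conditions chosen here are only one reasonable set.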
2.) Create a new dataset to work from
Because Oceania has only two countries, it is not very informative, so we will remove it from our data.
aDat <- subset(Dat, continent != "Oceania")
str(aDat) # the Oceania rows are gone (1680 obs.), but the continent factor still has 5 levels
## 'data.frame': 1680 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
aDat <- droplevels(subset(Dat, continent != "Oceania"))
unique(aDat$continent)
## [1] Asia Europe Africa Americas
## Levels: Africa Americas Asia Europe
After applying droplevels(), Oceania no longer appears among the levels of the continent factor.
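The difference between removing rows and removing a factor level can be seen on a small made-up factor. This sketch uses base R only, and the vector is invented for illustration.

```r
# Made-up factor standing in for the continent column
continentToy <- factor(c("Asia", "Europe", "Oceania", "Asia"))

# Filtering removes the Oceania element but keeps "Oceania" as a level
kept <- continentToy[continentToy != "Oceania"]
nlevels(kept)        # still 3: the unused level survives subsetting
table(kept)          # Oceania shows up with a count of 0

# droplevels() discards levels that no longer occur in the data
cleaned <- droplevels(kept)
nlevels(cleaned)     # 2
levels(cleaned)      # "Asia" "Europe"
```

Unused levels matter for plotting because lattice draws a panel or axis category for every level of a factor, whether or not any rows use it.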
We also want to look at only the years 1952 and 2007.
bDat <- subset(aDat, year %in% c(1952, 2007))
unique(bDat$year)
## [1] 1952 2007
Now we have a dataset for the years 1952 and 2007, for all continents except Oceania.
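The year filter relies on %in%, which is tolerant about types. A small sketch on a made-up integer vector shows that a character lookup table and a numeric one select the same elements, although the numeric form states the intent more directly for an integer column.

```r
# Made-up integer years, in the same storage type as the year column
yearsToy <- c(1952L, 1957L, 2002L, 2007L)

# %in% is built on match(), which converts both sides to a common type,
# so character and numeric lookup tables select the same elements here
asCharacter <- yearsToy %in% c("1952", "2007")
asNumeric   <- yearsToy %in% c(1952, 2007)

identical(asCharacter, asNumeric)  # TRUE
yearsToy[asNumeric]                # 1952 2007
```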
3.) Explore GDP per capita and life expectancy
First, we will look at GDP per capita.
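# despite the name, nothing is averaged here: this lists each country's gdpPercap, ordered by continent and year
avgGdp <- ddply(bDat, ~continent + year, summarize, country = country, gdpPercap = gdpPercap)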
avgGdp
## continent year country gdpPercap
## 1 Africa 1952 Algeria 2449.0
## 2 Africa 1952 Angola 3520.6
## 3 Africa 1952 Benin 1062.8
## 4 Africa 1952 Botswana 851.2
## 5 Africa 1952 Burkina Faso 543.3
## 6 Africa 1952 Burundi 339.3
## 7 Africa 1952 Cameroon 1172.7
## 8 Africa 1952 Central African Republic 1071.3
## 9 Africa 1952 Chad 1178.7
## 10 Africa 1952 Comoros 1103.0
## 11 Africa 1952 Congo, Dem. Rep. 780.5
## 12 Africa 1952 Congo, Rep. 2125.6
## 13 Africa 1952 Cote d'Ivoire 1388.6
## 14 Africa 1952 Djibouti 2669.5
## 15 Africa 1952 Egypt 1418.8
## 16 Africa 1952 Equatorial Guinea 375.6
## 17 Africa 1952 Eritrea 328.9
## 18 Africa 1952 Ethiopia 362.1
## 19 Africa 1952 Gabon 4293.5
## 20 Africa 1952 Gambia 485.2
## 21 Africa 1952 Ghana 911.3
## 22 Africa 1952 Guinea 510.2
## 23 Africa 1952 Guinea-Bissau 299.9
## 24 Africa 1952 Kenya 853.5
## 25 Africa 1952 Lesotho 298.8
## 26 Africa 1952 Liberia 575.6
## 27 Africa 1952 Libya 2387.5
## 28 Africa 1952 Madagascar 1443.0
## 29 Africa 1952 Malawi 369.2
## 30 Africa 1952 Mali 452.3
## 31 Africa 1952 Mauritania 743.1
## 32 Africa 1952 Mauritius 1968.0
## 33 Africa 1952 Morocco 1688.2
## 34 Africa 1952 Mozambique 468.5
## 35 Africa 1952 Namibia 2423.8
## 36 Africa 1952 Niger 761.9
## 37 Africa 1952 Nigeria 1077.3
## 38 Africa 1952 Reunion 2718.9
## 39 Africa 1952 Rwanda 493.3
## 40 Africa 1952 Sao Tome and Principe 879.6
## 41 Africa 1952 Senegal 1450.4
## 42 Africa 1952 Sierra Leone 879.8
## 43 Africa 1952 Somalia 1135.7
## 44 Africa 1952 South Africa 4725.3
## 45 Africa 1952 Sudan 1616.0
## 46 Africa 1952 Swaziland 1148.4
## 47 Africa 1952 Tanzania 716.7
## 48 Africa 1952 Togo 859.8
## 49 Africa 1952 Tunisia 1468.5
## 50 Africa 1952 Uganda 734.8
## 51 Africa 1952 Zambia 1147.4
## 52 Africa 1952 Zimbabwe 406.9
## 53 Africa 2007 Algeria 6223.4
## 54 Africa 2007 Angola 4797.2
## 55 Africa 2007 Benin 1441.3
## 56 Africa 2007 Botswana 12569.9
## 57 Africa 2007 Burkina Faso 1217.0
## 58 Africa 2007 Burundi 430.1
## 59 Africa 2007 Cameroon 2042.1
## 60 Africa 2007 Central African Republic 706.0
## 61 Africa 2007 Chad 1704.1
## 62 Africa 2007 Comoros 986.1
## 63 Africa 2007 Congo, Dem. Rep. 277.6
## 64 Africa 2007 Congo, Rep. 3632.6
## 65 Africa 2007 Cote d'Ivoire 1544.8
## 66 Africa 2007 Djibouti 2082.5
## 67 Africa 2007 Egypt 5581.2
## 68 Africa 2007 Equatorial Guinea 12154.1
## 69 Africa 2007 Eritrea 641.4
## 70 Africa 2007 Ethiopia 690.8
## 71 Africa 2007 Gabon 13206.5
## 72 Africa 2007 Gambia 752.7
## 73 Africa 2007 Ghana 1327.6
## 74 Africa 2007 Guinea 942.7
## 75 Africa 2007 Guinea-Bissau 579.2
## 76 Africa 2007 Kenya 1463.2
## 77 Africa 2007 Lesotho 1569.3
## 78 Africa 2007 Liberia 414.5
## 79 Africa 2007 Libya 12057.5
## 80 Africa 2007 Madagascar 1044.8
## 81 Africa 2007 Malawi 759.3
## 82 Africa 2007 Mali 1042.6
## 83 Africa 2007 Mauritania 1803.2
## 84 Africa 2007 Mauritius 10957.0
## 85 Africa 2007 Morocco 3820.2
## 86 Africa 2007 Mozambique 823.7
## 87 Africa 2007 Namibia 4811.1
## 88 Africa 2007 Niger 619.7
## 89 Africa 2007 Nigeria 2014.0
## 90 Africa 2007 Reunion 7670.1
## 91 Africa 2007 Rwanda 863.1
## 92 Africa 2007 Sao Tome and Principe 1598.4
## 93 Africa 2007 Senegal 1712.5
## 94 Africa 2007 Sierra Leone 862.5
## 95 Africa 2007 Somalia 926.1
## 96 Africa 2007 South Africa 9269.7
## 97 Africa 2007 Sudan 2602.4
## 98 Africa 2007 Swaziland 4513.5
## 99 Africa 2007 Tanzania 1107.5
## 100 Africa 2007 Togo 883.0
## 101 Africa 2007 Tunisia 7092.9
## 102 Africa 2007 Uganda 1056.4
## 103 Africa 2007 Zambia 1271.2
## 104 Africa 2007 Zimbabwe 469.7
## 105 Americas 1952 Argentina 5911.3
## 106 Americas 1952 Bolivia 2677.3
## 107 Americas 1952 Brazil 2108.9
## 108 Americas 1952 Canada 11367.2
## 109 Americas 1952 Chile 3940.0
## 110 Americas 1952 Colombia 2144.1
## 111 Americas 1952 Costa Rica 2627.0
## 112 Americas 1952 Cuba 5586.5
## 113 Americas 1952 Dominican Republic 1397.7
## 114 Americas 1952 Ecuador 3522.1
## 115 Americas 1952 El Salvador 3048.3
## 116 Americas 1952 Guatemala 2428.2
## 117 Americas 1952 Haiti 1840.4
## 118 Americas 1952 Honduras 2194.9
## 119 Americas 1952 Jamaica 2898.5
## 120 Americas 1952 Mexico 3478.1
## 121 Americas 1952 Nicaragua 3112.4
## 122 Americas 1952 Panama 2480.4
## 123 Americas 1952 Paraguay 1952.3
## 124 Americas 1952 Peru 3758.5
## 125 Americas 1952 Puerto Rico 3082.0
## 126 Americas 1952 Trinidad and Tobago 3023.3
## 127 Americas 1952 United States 13990.5
## 128 Americas 1952 Uruguay 5716.8
## 129 Americas 1952 Venezuela 7689.8
## 130 Americas 2007 Argentina 12779.4
## 131 Americas 2007 Bolivia 3822.1
## 132 Americas 2007 Brazil 9065.8
## 133 Americas 2007 Canada 36319.2
## 134 Americas 2007 Chile 13171.6
## 135 Americas 2007 Colombia 7006.6
## 136 Americas 2007 Costa Rica 9645.1
## 137 Americas 2007 Cuba 8948.1
## 138 Americas 2007 Dominican Republic 6025.4
## 139 Americas 2007 Ecuador 6873.3
## 140 Americas 2007 El Salvador 5728.4
## 141 Americas 2007 Guatemala 5186.1
## 142 Americas 2007 Haiti 1201.6
## 143 Americas 2007 Honduras 3548.3
## 144 Americas 2007 Jamaica 7320.9
## 145 Americas 2007 Mexico 11977.6
## 146 Americas 2007 Nicaragua 2749.3
## 147 Americas 2007 Panama 9809.2
## 148 Americas 2007 Paraguay 4172.8
## 149 Americas 2007 Peru 7408.9
## 150 Americas 2007 Puerto Rico 19328.7
## 151 Americas 2007 Trinidad and Tobago 18008.5
## 152 Americas 2007 United States 42951.7
## 153 Americas 2007 Uruguay 10611.5
## 154 Americas 2007 Venezuela 11415.8
## 155 Asia 1952 Afghanistan 779.4
## 156 Asia 1952 Bahrain 9867.1
## 157 Asia 1952 Bangladesh 684.2
## 158 Asia 1952 Cambodia 368.5
## 159 Asia 1952 China 400.4
## 160 Asia 1952 Hong Kong, China 3054.4
## 161 Asia 1952 India 546.6
## 162 Asia 1952 Indonesia 749.7
## 163 Asia 1952 Iran 3035.3
## 164 Asia 1952 Iraq 4129.8
## 165 Asia 1952 Israel 4086.5
## 166 Asia 1952 Japan 3217.0
## 167 Asia 1952 Jordan 1546.9
## 168 Asia 1952 Korea, Dem. Rep. 1088.3
## 169 Asia 1952 Korea, Rep. 1030.6
## 170 Asia 1952 Kuwait 108382.4
## 171 Asia 1952 Lebanon 4834.8
## 172 Asia 1952 Malaysia 1831.1
## 173 Asia 1952 Mongolia 786.6
## 174 Asia 1952 Myanmar 331.0
## 175 Asia 1952 Nepal 545.9
## 176 Asia 1952 Oman 1828.2
## 177 Asia 1952 Pakistan 684.6
## 178 Asia 1952 Philippines 1272.9
## 179 Asia 1952 Saudi Arabia 6459.6
## 180 Asia 1952 Singapore 2315.1
## 181 Asia 1952 Sri Lanka 1083.5
## 182 Asia 1952 Syria 1643.5
## 183 Asia 1952 Taiwan 1206.9
## 184 Asia 1952 Thailand 757.8
## 185 Asia 1952 Vietnam 605.1
## 186 Asia 1952 West Bank and Gaza 1515.6
## 187 Asia 1952 Yemen, Rep. 781.7
## 188 Asia 2007 Afghanistan 974.6
## 189 Asia 2007 Bahrain 29796.0
## 190 Asia 2007 Bangladesh 1391.3
## 191 Asia 2007 Cambodia 1713.8
## 192 Asia 2007 China 4959.1
## 193 Asia 2007 Hong Kong, China 39725.0
## 194 Asia 2007 India 2452.2
## 195 Asia 2007 Indonesia 3540.7
## 196 Asia 2007 Iran 11605.7
## 197 Asia 2007 Iraq 4471.1
## 198 Asia 2007 Israel 25523.3
## 199 Asia 2007 Japan 31656.1
## 200 Asia 2007 Jordan 4519.5
## 201 Asia 2007 Korea, Dem. Rep. 1593.1
## 202 Asia 2007 Korea, Rep. 23348.1
## 203 Asia 2007 Kuwait 47307.0
## 204 Asia 2007 Lebanon 10461.1
## 205 Asia 2007 Malaysia 12451.7
## 206 Asia 2007 Mongolia 3095.8
## 207 Asia 2007 Myanmar 944.0
## 208 Asia 2007 Nepal 1091.4
## 209 Asia 2007 Oman 22316.2
## 210 Asia 2007 Pakistan 2605.9
## 211 Asia 2007 Philippines 3190.5
## 212 Asia 2007 Saudi Arabia 21654.8
## 213 Asia 2007 Singapore 47143.2
## 214 Asia 2007 Sri Lanka 3970.1
## 215 Asia 2007 Syria 4184.5
## 216 Asia 2007 Taiwan 28718.3
## 217 Asia 2007 Thailand 7458.4
## 218 Asia 2007 Vietnam 2441.6
## 219 Asia 2007 West Bank and Gaza 3025.3
## 220 Asia 2007 Yemen, Rep. 2280.8
## 221 Europe 1952 Albania 1601.1
## 222 Europe 1952 Austria 6137.1
## 223 Europe 1952 Belgium 8343.1
## 224 Europe 1952 Bosnia and Herzegovina 973.5
## 225 Europe 1952 Bulgaria 2444.3
## 226 Europe 1952 Croatia 3119.2
## 227 Europe 1952 Czech Republic 6876.1
## 228 Europe 1952 Denmark 9692.4
## 229 Europe 1952 Finland 6424.5
## 230 Europe 1952 France 7029.8
## 231 Europe 1952 Germany 7144.1
## 232 Europe 1952 Greece 3530.7
## 233 Europe 1952 Hungary 5263.7
## 234 Europe 1952 Iceland 7267.7
## 235 Europe 1952 Ireland 5210.3
## 236 Europe 1952 Italy 4931.4
## 237 Europe 1952 Montenegro 2647.6
## 238 Europe 1952 Netherlands 8941.6
## 239 Europe 1952 Norway 10095.4
## 240 Europe 1952 Poland 4029.3
## 241 Europe 1952 Portugal 3068.3
## 242 Europe 1952 Romania 3144.6
## 243 Europe 1952 Serbia 3581.5
## 244 Europe 1952 Slovak Republic 5074.7
## 245 Europe 1952 Slovenia 4215.0
## 246 Europe 1952 Spain 3834.0
## 247 Europe 1952 Sweden 8527.8
## 248 Europe 1952 Switzerland 14734.2
## 249 Europe 1952 Turkey 1969.1
## 250 Europe 1952 United Kingdom 9979.5
## 251 Europe 2007 Albania 5937.0
## 252 Europe 2007 Austria 36126.5
## 253 Europe 2007 Belgium 33692.6
## 254 Europe 2007 Bosnia and Herzegovina 7446.3
## 255 Europe 2007 Bulgaria 10680.8
## 256 Europe 2007 Croatia 14619.2
## 257 Europe 2007 Czech Republic 22833.3
## 258 Europe 2007 Denmark 35278.4
## 259 Europe 2007 Finland 33207.1
## 260 Europe 2007 France 30470.0
## 261 Europe 2007 Germany 32170.4
## 262 Europe 2007 Greece 27538.4
## 263 Europe 2007 Hungary 18008.9
## 264 Europe 2007 Iceland 36180.8
## 265 Europe 2007 Ireland 40676.0
## 266 Europe 2007 Italy 28569.7
## 267 Europe 2007 Montenegro 9253.9
## 268 Europe 2007 Netherlands 36797.9
## 269 Europe 2007 Norway 49357.2
## 270 Europe 2007 Poland 15389.9
## 271 Europe 2007 Portugal 20509.6
## 272 Europe 2007 Romania 10808.5
## 273 Europe 2007 Serbia 9786.5
## 274 Europe 2007 Slovak Republic 18678.3
## 275 Europe 2007 Slovenia 25768.3
## 276 Europe 2007 Spain 28821.1
## 277 Europe 2007 Sweden 33859.7
## 278 Europe 2007 Switzerland 37506.4
## 279 Europe 2007 Turkey 8458.3
## 280 Europe 2007 United Kingdom 33203.3
stripplot(continent ~ gdpPercap | as.factor(year), bDat, main = "GDP by continent in 1952 and 2007")
We can see that there is a very clear outlier in Asia in 1952. I'm interested in knowing what this is.
findMax <- function(x) {
theMax <- which.max(x$gdpPercap)
x[theMax, c("continent", "gdpPercap", "year", "country")]
}
findMax(bDat) #Kuwait
## continent gdpPercap year country
## 853 Asia 108382 1952 Kuwait
We discover that the outlier is Kuwait, with a whopping GDP per capita of $108,382.40 in 1952.
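The same lookup can be repeated within each continent instead of over the whole data set. The following is a sketch in base R on made-up rows; findMaxToy and the country names are hypothetical, and split() with lapply() is swapped in here as one alternative to a plyr call.

```r
# Made-up rows with the same column names as the homework data
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  country   = c("Alphaland", "Betaland", "Gammaland", "Deltaland"),
  year      = c(1952L, 1952L, 1952L, 1952L),
  gdpPercap = c(500, 9000, 7000, 3000)
)

# Same idea as findMax(): which.max() gives the position of the largest value
# (the first one, if there are ties)
findMaxToy <- function(x) {
  x[which.max(x$gdpPercap), c("continent", "gdpPercap", "year", "country")]
}

# split() cuts the data frame into one piece per continent,
# lapply() applies the helper to each piece, rbind() stacks the results
maxByContinent <- do.call(rbind, lapply(split(toy, toy$continent), findMaxToy))
maxByContinent
```

Because which.max() returns a single position, a tie for the maximum would report only the first tied row.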
Now, we will look at life expectancy.
cDat <- within(bDat, continent <- reorder(continent, lifeExp)) #reorder in order of life expectancy
avgLifeExp <- daply(cDat, ~year + continent, summarize, avgLifeExp = mean(lifeExp))
avgLifeExp
## continent
## year Africa Asia Americas Europe
## 1952 39.14 46.31 53.28 64.41
## 2007 54.81 70.73 73.61 77.65
stripplot(lifeExp ~ factor(year) | continent, cDat, grid = "h", type = c("p",
"a"), main = "Life expectancy by continent in 1952 and 2007")
We can see that average life expectancy has increased in all continents from 1952 to 2007.
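The order of the continents in the table and the plot comes from reorder(). A minimal sketch on made-up groups shows what it does to the factor levels; the group labels and values are invented for illustration.

```r
library(lattice)

# Made-up life expectancies for three made-up groups
toy <- data.frame(
  group   = factor(c("B", "B", "A", "A", "C", "C")),
  lifeExp = c(40, 44, 70, 74, 55, 59)
)
levels(toy$group)                 # alphabetical by default: "A" "B" "C"

# reorder() sorts the levels by a summary of the second argument
# (the mean, unless FUN says otherwise), in increasing order
toy$group <- reorder(toy$group, toy$lifeExp)
levels(toy$group)                 # "B" "C" "A"

# Axis categories (or panels) then follow the new level order;
# the trellis object is only drawn when it is printed, e.g. print(p)
p <- stripplot(lifeExp ~ group, toy, type = c("p", "a"))
```

A different summary can be requested through the FUN argument, for example reorder(group, lifeExp, FUN = median).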
4.) Count the countries with low life expectancy and the countries with low GDP per capita
This was done to redeem myself from the train-wreck that was my last attempt at figuring out how to find the number of countries with low life expectancy. No accompanying figures are included. If you must look at what I did, here's the link. (There goes another piece of my ego).
lowLifeExp <- 65
lowGdp <- 8000
nLowLifeAndGdp <- ddply(bDat, ~continent + year, function(x) {
  # within each continent-year, count the countries at or below each cutoff
  c(lowLifeExp = sum(x$lifeExp <= lowLifeExp), lowGdp = sum(x$gdpPercap <= lowGdp))
})
nLowLifeAndGdp
## continent year lowLifeExp lowGdp
## 1 Africa 1952 52 52
## 2 Africa 2007 43 46
## 3 Americas 1952 22 23
## 4 Americas 2007 1 12
## 5 Asia 1952 32 31
## 6 Asia 2007 8 20
## 7 Europe 1952 13 23
## 8 Europe 2007 0 2
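The two columns above are separate counts, so they do not say how many countries are low on both measures at once. A sketch on made-up values for a single continent-year shows the counting idiom and how a joint count would differ; lifeCut and gdpCut are hypothetical names for the same cutoffs.

```r
# Made-up values for five made-up countries in one continent-year
toy <- data.frame(
  lifeExp   = c(50, 60, 70, 72, 64),
  gdpPercap = c(900, 9500, 3000, 12000, 7000)
)
lifeCut <- 65    # same cutoffs as in the homework code
gdpCut  <- 8000

isLowLife <- toy$lifeExp <= lifeCut
isLowGdp  <- toy$gdpPercap <= gdpCut

# sum() on a logical vector counts the TRUE entries
sum(isLowLife)              # 3 countries with low life expectancy
sum(isLowGdp)               # 3 countries with low GDP per capita
sum(isLowLife & isLowGdp)   # 2 countries that are low on both
```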
5.) Compare average life expectancy and average GDP per capita
I'm interested in whether the relationship between GDP and life expectancy has become stronger or weaker over time.
avgLifeExpAndGdp <- ddply(bDat, ~year + continent, summarize, avgGdp = mean(gdpPercap),
avgLifeExp = mean(lifeExp))
avgLifeExpAndGdp
## year continent avgGdp avgLifeExp
## 1 1952 Africa 1253 39.14
## 2 1952 Americas 4079 53.28
## 3 1952 Asia 5195 46.31
## 4 1952 Europe 5661 64.41
## 5 2007 Africa 3089 54.81
## 6 2007 Americas 11003 73.61
## 7 2007 Asia 12473 70.73
## 8 2007 Europe 25054 77.65
xyplot(avgLifeExp ~ avgGdp, avgLifeExpAndGdp, groups = year, type = c("p", "r"),
auto.key = TRUE, main = "Relationship between GDP per capita and life expectancy for 1952 and 2007")
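We can see that the fitted line is much shallower in 2007 than in 1952: each additional dollar of average GDP per capita is associated with a smaller gain in average life expectancy. Note that a shallower slope is not the same as a weaker relationship, which would be judged by how tightly the points follow the line, and that each line is fitted to only four continent averages.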
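Slope and strength of association can be computed side by side. The sketch below re-enters the eight continent averages from the avgLifeExpAndGdp table (rounded as printed there) and reports, for each year, the least-squares slope and the Pearson correlation; avgToy and perYear are hypothetical names, and with four points per year neither number is very stable.

```r
# Continent averages copied (rounded) from the avgLifeExpAndGdp table above
avgToy <- data.frame(
  year       = rep(c(1952L, 2007L), each = 4),
  continent  = rep(c("Africa", "Americas", "Asia", "Europe"), times = 2),
  avgGdp     = c(1253, 4079, 5195, 5661, 3089, 11003, 12473, 25054),
  avgLifeExp = c(39.14, 53.28, 46.31, 64.41, 54.81, 73.61, 70.73, 77.65)
)

# For each year: slope of the least-squares line (what type = "r" draws)
# and the Pearson correlation (one measure of strength of association)
perYear <- do.call(rbind, lapply(split(avgToy, avgToy$year), function(d) {
  fit <- lm(avgLifeExp ~ avgGdp, data = d)
  data.frame(
    year  = d$year[1],
    slope = unname(coef(fit)["avgGdp"]),
    cor   = cor(d$avgGdp, d$avgLifeExp)
  )
}))
perYear
```

Comparing the slope column with the cor column shows whether the two measures move in the same direction between 1952 and 2007.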
To sum up, lattice is a very useful package for creating graphics.