Mina Park
In this exercise, we reproduce the figures from homework #4 using the R package ggplot2. The figures are based on my submission from last week, which contains the lattice versions.
1.) Clean workspace, load libraries, load data, and perform superficial check that data import occurred properly
# clean workspace
rm(list = ls())
# load libraries
library(plyr)
library(ggplot2)
# load data
Dat <- read.delim("gapminderDataFiveYear.txt")
str(Dat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Note the six variables (country, year, pop, continent, lifeExp, gdpPercap) and that the data are stored as a data frame with 1704 observations.
2.) Create new datasets to work from
Because Oceania has only two countries, it is not very informative, so we remove it from the data.
aDat <- subset(Dat, continent != "Oceania")
str(aDat)  # the Oceania rows are gone, but Oceania is still one of the 5 factor levels
## 'data.frame': 1680 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
aDat <- droplevels(subset(Dat, continent != "Oceania"))
unique(aDat$continent)
## [1] Asia Europe Africa Americas
## Levels: Africa Americas Asia Europe
After applying droplevels(), Oceania no longer appears among the levels of the continent factor.
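To see why droplevels() is needed, here is a minimal sketch on a toy data frame. The object `toy` and its values are made up for illustration and are not part of the Gapminder data. subset() removes the rows, but the unused factor level survives until it is dropped explicitly.

```r
# Toy data frame (hypothetical values, for illustration only)
toy <- data.frame(continent = factor(c("Asia", "Europe", "Oceania")),
                  value = c(1, 2, 3))

# subset() drops the Oceania row but keeps "Oceania" as a factor level
kept <- subset(toy, continent != "Oceania")
levels(kept$continent)

# droplevels() removes factor levels that no longer occur in the data
dropped <- droplevels(kept)
levels(dropped$continent)
```

Unused levels matter for plotting because they can show up as empty panels or empty axis categories.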
We also want to look at only the years 1952 and 2007 for some questions, so I will generate another data frame for this purpose specifically.
bDat <- subset(aDat, year %in% c(1952, 2007))  # year is an integer column, so compare against numbers
unique(bDat$year)
## [1] 1952 2007
3.) Explore GDP per capita and life expectancy. We will generate stripplots.
First, we will look at GDP per capita.
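# list GDP per capita for every country, grouped by continent and year
# (despite the name avgGdp, no averaging is done here)
avgGdp <- ddply(bDat, ~continent + year, summarize, country = country, gdpPercap = gdpPercap)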
avgGdp
## continent year country gdpPercap
## 1 Africa 1952 Algeria 2449.0
## 2 Africa 1952 Angola 3520.6
## 3 Africa 1952 Benin 1062.8
## 4 Africa 1952 Botswana 851.2
## 5 Africa 1952 Burkina Faso 543.3
## 6 Africa 1952 Burundi 339.3
## 7 Africa 1952 Cameroon 1172.7
## 8 Africa 1952 Central African Republic 1071.3
## 9 Africa 1952 Chad 1178.7
## 10 Africa 1952 Comoros 1103.0
## 11 Africa 1952 Congo, Dem. Rep. 780.5
## 12 Africa 1952 Congo, Rep. 2125.6
## 13 Africa 1952 Cote d'Ivoire 1388.6
## 14 Africa 1952 Djibouti 2669.5
## 15 Africa 1952 Egypt 1418.8
## 16 Africa 1952 Equatorial Guinea 375.6
## 17 Africa 1952 Eritrea 328.9
## 18 Africa 1952 Ethiopia 362.1
## 19 Africa 1952 Gabon 4293.5
## 20 Africa 1952 Gambia 485.2
## 21 Africa 1952 Ghana 911.3
## 22 Africa 1952 Guinea 510.2
## 23 Africa 1952 Guinea-Bissau 299.9
## 24 Africa 1952 Kenya 853.5
## 25 Africa 1952 Lesotho 298.8
## 26 Africa 1952 Liberia 575.6
## 27 Africa 1952 Libya 2387.5
## 28 Africa 1952 Madagascar 1443.0
## 29 Africa 1952 Malawi 369.2
## 30 Africa 1952 Mali 452.3
## 31 Africa 1952 Mauritania 743.1
## 32 Africa 1952 Mauritius 1968.0
## 33 Africa 1952 Morocco 1688.2
## 34 Africa 1952 Mozambique 468.5
## 35 Africa 1952 Namibia 2423.8
## 36 Africa 1952 Niger 761.9
## 37 Africa 1952 Nigeria 1077.3
## 38 Africa 1952 Reunion 2718.9
## 39 Africa 1952 Rwanda 493.3
## 40 Africa 1952 Sao Tome and Principe 879.6
## 41 Africa 1952 Senegal 1450.4
## 42 Africa 1952 Sierra Leone 879.8
## 43 Africa 1952 Somalia 1135.7
## 44 Africa 1952 South Africa 4725.3
## 45 Africa 1952 Sudan 1616.0
## 46 Africa 1952 Swaziland 1148.4
## 47 Africa 1952 Tanzania 716.7
## 48 Africa 1952 Togo 859.8
## 49 Africa 1952 Tunisia 1468.5
## 50 Africa 1952 Uganda 734.8
## 51 Africa 1952 Zambia 1147.4
## 52 Africa 1952 Zimbabwe 406.9
## 53 Africa 2007 Algeria 6223.4
## 54 Africa 2007 Angola 4797.2
## 55 Africa 2007 Benin 1441.3
## 56 Africa 2007 Botswana 12569.9
## 57 Africa 2007 Burkina Faso 1217.0
## 58 Africa 2007 Burundi 430.1
## 59 Africa 2007 Cameroon 2042.1
## 60 Africa 2007 Central African Republic 706.0
## 61 Africa 2007 Chad 1704.1
## 62 Africa 2007 Comoros 986.1
## 63 Africa 2007 Congo, Dem. Rep. 277.6
## 64 Africa 2007 Congo, Rep. 3632.6
## 65 Africa 2007 Cote d'Ivoire 1544.8
## 66 Africa 2007 Djibouti 2082.5
## 67 Africa 2007 Egypt 5581.2
## 68 Africa 2007 Equatorial Guinea 12154.1
## 69 Africa 2007 Eritrea 641.4
## 70 Africa 2007 Ethiopia 690.8
## 71 Africa 2007 Gabon 13206.5
## 72 Africa 2007 Gambia 752.7
## 73 Africa 2007 Ghana 1327.6
## 74 Africa 2007 Guinea 942.7
## 75 Africa 2007 Guinea-Bissau 579.2
## 76 Africa 2007 Kenya 1463.2
## 77 Africa 2007 Lesotho 1569.3
## 78 Africa 2007 Liberia 414.5
## 79 Africa 2007 Libya 12057.5
## 80 Africa 2007 Madagascar 1044.8
## 81 Africa 2007 Malawi 759.3
## 82 Africa 2007 Mali 1042.6
## 83 Africa 2007 Mauritania 1803.2
## 84 Africa 2007 Mauritius 10957.0
## 85 Africa 2007 Morocco 3820.2
## 86 Africa 2007 Mozambique 823.7
## 87 Africa 2007 Namibia 4811.1
## 88 Africa 2007 Niger 619.7
## 89 Africa 2007 Nigeria 2014.0
## 90 Africa 2007 Reunion 7670.1
## 91 Africa 2007 Rwanda 863.1
## 92 Africa 2007 Sao Tome and Principe 1598.4
## 93 Africa 2007 Senegal 1712.5
## 94 Africa 2007 Sierra Leone 862.5
## 95 Africa 2007 Somalia 926.1
## 96 Africa 2007 South Africa 9269.7
## 97 Africa 2007 Sudan 2602.4
## 98 Africa 2007 Swaziland 4513.5
## 99 Africa 2007 Tanzania 1107.5
## 100 Africa 2007 Togo 883.0
## 101 Africa 2007 Tunisia 7092.9
## 102 Africa 2007 Uganda 1056.4
## 103 Africa 2007 Zambia 1271.2
## 104 Africa 2007 Zimbabwe 469.7
## 105 Americas 1952 Argentina 5911.3
## 106 Americas 1952 Bolivia 2677.3
## 107 Americas 1952 Brazil 2108.9
## 108 Americas 1952 Canada 11367.2
## 109 Americas 1952 Chile 3940.0
## 110 Americas 1952 Colombia 2144.1
## 111 Americas 1952 Costa Rica 2627.0
## 112 Americas 1952 Cuba 5586.5
## 113 Americas 1952 Dominican Republic 1397.7
## 114 Americas 1952 Ecuador 3522.1
## 115 Americas 1952 El Salvador 3048.3
## 116 Americas 1952 Guatemala 2428.2
## 117 Americas 1952 Haiti 1840.4
## 118 Americas 1952 Honduras 2194.9
## 119 Americas 1952 Jamaica 2898.5
## 120 Americas 1952 Mexico 3478.1
## 121 Americas 1952 Nicaragua 3112.4
## 122 Americas 1952 Panama 2480.4
## 123 Americas 1952 Paraguay 1952.3
## 124 Americas 1952 Peru 3758.5
## 125 Americas 1952 Puerto Rico 3082.0
## 126 Americas 1952 Trinidad and Tobago 3023.3
## 127 Americas 1952 United States 13990.5
## 128 Americas 1952 Uruguay 5716.8
## 129 Americas 1952 Venezuela 7689.8
## 130 Americas 2007 Argentina 12779.4
## 131 Americas 2007 Bolivia 3822.1
## 132 Americas 2007 Brazil 9065.8
## 133 Americas 2007 Canada 36319.2
## 134 Americas 2007 Chile 13171.6
## 135 Americas 2007 Colombia 7006.6
## 136 Americas 2007 Costa Rica 9645.1
## 137 Americas 2007 Cuba 8948.1
## 138 Americas 2007 Dominican Republic 6025.4
## 139 Americas 2007 Ecuador 6873.3
## 140 Americas 2007 El Salvador 5728.4
## 141 Americas 2007 Guatemala 5186.1
## 142 Americas 2007 Haiti 1201.6
## 143 Americas 2007 Honduras 3548.3
## 144 Americas 2007 Jamaica 7320.9
## 145 Americas 2007 Mexico 11977.6
## 146 Americas 2007 Nicaragua 2749.3
## 147 Americas 2007 Panama 9809.2
## 148 Americas 2007 Paraguay 4172.8
## 149 Americas 2007 Peru 7408.9
## 150 Americas 2007 Puerto Rico 19328.7
## 151 Americas 2007 Trinidad and Tobago 18008.5
## 152 Americas 2007 United States 42951.7
## 153 Americas 2007 Uruguay 10611.5
## 154 Americas 2007 Venezuela 11415.8
## 155 Asia 1952 Afghanistan 779.4
## 156 Asia 1952 Bahrain 9867.1
## 157 Asia 1952 Bangladesh 684.2
## 158 Asia 1952 Cambodia 368.5
## 159 Asia 1952 China 400.4
## 160 Asia 1952 Hong Kong, China 3054.4
## 161 Asia 1952 India 546.6
## 162 Asia 1952 Indonesia 749.7
## 163 Asia 1952 Iran 3035.3
## 164 Asia 1952 Iraq 4129.8
## 165 Asia 1952 Israel 4086.5
## 166 Asia 1952 Japan 3217.0
## 167 Asia 1952 Jordan 1546.9
## 168 Asia 1952 Korea, Dem. Rep. 1088.3
## 169 Asia 1952 Korea, Rep. 1030.6
## 170 Asia 1952 Kuwait 108382.4
## 171 Asia 1952 Lebanon 4834.8
## 172 Asia 1952 Malaysia 1831.1
## 173 Asia 1952 Mongolia 786.6
## 174 Asia 1952 Myanmar 331.0
## 175 Asia 1952 Nepal 545.9
## 176 Asia 1952 Oman 1828.2
## 177 Asia 1952 Pakistan 684.6
## 178 Asia 1952 Philippines 1272.9
## 179 Asia 1952 Saudi Arabia 6459.6
## 180 Asia 1952 Singapore 2315.1
## 181 Asia 1952 Sri Lanka 1083.5
## 182 Asia 1952 Syria 1643.5
## 183 Asia 1952 Taiwan 1206.9
## 184 Asia 1952 Thailand 757.8
## 185 Asia 1952 Vietnam 605.1
## 186 Asia 1952 West Bank and Gaza 1515.6
## 187 Asia 1952 Yemen, Rep. 781.7
## 188 Asia 2007 Afghanistan 974.6
## 189 Asia 2007 Bahrain 29796.0
## 190 Asia 2007 Bangladesh 1391.3
## 191 Asia 2007 Cambodia 1713.8
## 192 Asia 2007 China 4959.1
## 193 Asia 2007 Hong Kong, China 39725.0
## 194 Asia 2007 India 2452.2
## 195 Asia 2007 Indonesia 3540.7
## 196 Asia 2007 Iran 11605.7
## 197 Asia 2007 Iraq 4471.1
## 198 Asia 2007 Israel 25523.3
## 199 Asia 2007 Japan 31656.1
## 200 Asia 2007 Jordan 4519.5
## 201 Asia 2007 Korea, Dem. Rep. 1593.1
## 202 Asia 2007 Korea, Rep. 23348.1
## 203 Asia 2007 Kuwait 47307.0
## 204 Asia 2007 Lebanon 10461.1
## 205 Asia 2007 Malaysia 12451.7
## 206 Asia 2007 Mongolia 3095.8
## 207 Asia 2007 Myanmar 944.0
## 208 Asia 2007 Nepal 1091.4
## 209 Asia 2007 Oman 22316.2
## 210 Asia 2007 Pakistan 2605.9
## 211 Asia 2007 Philippines 3190.5
## 212 Asia 2007 Saudi Arabia 21654.8
## 213 Asia 2007 Singapore 47143.2
## 214 Asia 2007 Sri Lanka 3970.1
## 215 Asia 2007 Syria 4184.5
## 216 Asia 2007 Taiwan 28718.3
## 217 Asia 2007 Thailand 7458.4
## 218 Asia 2007 Vietnam 2441.6
## 219 Asia 2007 West Bank and Gaza 3025.3
## 220 Asia 2007 Yemen, Rep. 2280.8
## 221 Europe 1952 Albania 1601.1
## 222 Europe 1952 Austria 6137.1
## 223 Europe 1952 Belgium 8343.1
## 224 Europe 1952 Bosnia and Herzegovina 973.5
## 225 Europe 1952 Bulgaria 2444.3
## 226 Europe 1952 Croatia 3119.2
## 227 Europe 1952 Czech Republic 6876.1
## 228 Europe 1952 Denmark 9692.4
## 229 Europe 1952 Finland 6424.5
## 230 Europe 1952 France 7029.8
## 231 Europe 1952 Germany 7144.1
## 232 Europe 1952 Greece 3530.7
## 233 Europe 1952 Hungary 5263.7
## 234 Europe 1952 Iceland 7267.7
## 235 Europe 1952 Ireland 5210.3
## 236 Europe 1952 Italy 4931.4
## 237 Europe 1952 Montenegro 2647.6
## 238 Europe 1952 Netherlands 8941.6
## 239 Europe 1952 Norway 10095.4
## 240 Europe 1952 Poland 4029.3
## 241 Europe 1952 Portugal 3068.3
## 242 Europe 1952 Romania 3144.6
## 243 Europe 1952 Serbia 3581.5
## 244 Europe 1952 Slovak Republic 5074.7
## 245 Europe 1952 Slovenia 4215.0
## 246 Europe 1952 Spain 3834.0
## 247 Europe 1952 Sweden 8527.8
## 248 Europe 1952 Switzerland 14734.2
## 249 Europe 1952 Turkey 1969.1
## 250 Europe 1952 United Kingdom 9979.5
## 251 Europe 2007 Albania 5937.0
## 252 Europe 2007 Austria 36126.5
## 253 Europe 2007 Belgium 33692.6
## 254 Europe 2007 Bosnia and Herzegovina 7446.3
## 255 Europe 2007 Bulgaria 10680.8
## 256 Europe 2007 Croatia 14619.2
## 257 Europe 2007 Czech Republic 22833.3
## 258 Europe 2007 Denmark 35278.4
## 259 Europe 2007 Finland 33207.1
## 260 Europe 2007 France 30470.0
## 261 Europe 2007 Germany 32170.4
## 262 Europe 2007 Greece 27538.4
## 263 Europe 2007 Hungary 18008.9
## 264 Europe 2007 Iceland 36180.8
## 265 Europe 2007 Ireland 40676.0
## 266 Europe 2007 Italy 28569.7
## 267 Europe 2007 Montenegro 9253.9
## 268 Europe 2007 Netherlands 36797.9
## 269 Europe 2007 Norway 49357.2
## 270 Europe 2007 Poland 15389.9
## 271 Europe 2007 Portugal 20509.6
## 272 Europe 2007 Romania 10808.5
## 273 Europe 2007 Serbia 9786.5
## 274 Europe 2007 Slovak Republic 18678.3
## 275 Europe 2007 Slovenia 25768.3
## 276 Europe 2007 Spain 28821.1
## 277 Europe 2007 Sweden 33859.7
## 278 Europe 2007 Switzerland 37506.4
## 279 Europe 2007 Turkey 8458.3
## 280 Europe 2007 United Kingdom 33203.3
# generate figure
# geom_jitter() already draws the points; adding geom_point() as well would plot each one twice
pGdp <- ggplot(bDat, aes(x = gdpPercap, y = continent)) + geom_jitter() +
    facet_grid(~year)
pGdp + ggtitle("GDP per capita by continent in 1952 and 2007")
In the last homework assignment, we discovered that the outlier in 1952 for Asia was Kuwait, with a GDP per capita of $108,382.40.
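As a reminder of how such an outlier can be located, here is a small sketch. It uses four of the 1952 Asia rows copied from the table above; the data frame name `asia1952` is mine, and on the full data the same idea would apply to the Asia 1952 subset of bDat.

```r
# Four 1952 Asia rows copied from the printed table above
asia1952 <- data.frame(
  country = c("Bahrain", "Japan", "Kuwait", "Saudi Arabia"),
  gdpPercap = c(9867.1, 3217.0, 108382.4, 6459.6)
)

# which.max() returns the row index of the largest value
outlier <- asia1952[which.max(asia1952$gdpPercap), ]
outlier
```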
Now, we will look at life expectancy.
cDat <- within(bDat, continent <- reorder(continent, lifeExp)) #reorder in order of life expectancy
avgLifeExp <- daply(cDat, ~year + continent, summarize, avgLifeExp = mean(lifeExp))
avgLifeExp
## continent
## year Africa Asia Americas Europe
## 1952 39.14 46.31 53.28 64.41
## 2007 54.81 70.73 73.61 77.65
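The column order above (Africa, Asia, Americas, Europe) comes from the reorder() call. A minimal sketch of what reorder() does, using the 1952 averages from the output above as stand-in values (the data frame `le` is made up for illustration):

```r
# One value per continent, taken from the 1952 row of the summary above
le <- data.frame(
  continent = factor(c("Africa", "Americas", "Asia", "Europe")),
  lifeExp = c(39.14, 53.28, 46.31, 64.41)
)

# factor levels are alphabetical by default
levels(le$continent)

# reorder() sorts the levels by a summary of lifeExp (the mean, by default)
le$continent <- reorder(le$continent, le$lifeExp)
levels(le$continent)
```

Since ggplot2 lays out facets and discrete axes in level order, reordering the factor is what changes the order of the panels.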
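# generate figure
# use cDat so the panels follow the life-expectancy ordering created above;
# geom_jitter() draws the points, so no separate geom_point() layer is needed
pLifeExp <- ggplot(cDat, aes(x = factor(year), y = lifeExp)) + geom_jitter() +
    facet_grid(~continent) + geom_smooth(aes(group = 1), method = "lm")
pLifeExp + ggtitle("Life expectancy by continent in 1952 and 2007") + xlab("year")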
As expected, we can see that life expectancy has increased in all continents from 1952 to 2007.
4.) Now, we want to look at average life expectancy and average GDP per capita using scatterplots.
I'm interested in whether the relationship between GDP and life expectancy has become stronger or weaker over time.
avgLifeExpAndGdp <- ddply(aDat, ~year + continent, summarize, avgGdp = mean(gdpPercap),
avgLifeExp = mean(lifeExp))
avgLifeExpAndGdp
## year continent avgGdp avgLifeExp
## 1 1952 Africa 1253 39.14
## 2 1952 Americas 4079 53.28
## 3 1952 Asia 5195 46.31
## 4 1952 Europe 5661 64.41
## 5 1957 Africa 1385 41.27
## 6 1957 Americas 4616 55.96
## 7 1957 Asia 5788 49.32
## 8 1957 Europe 6963 66.70
## 9 1962 Africa 1598 43.32
## 10 1962 Americas 4902 58.40
## 11 1962 Asia 5729 51.56
## 12 1962 Europe 8365 68.54
## 13 1967 Africa 2050 45.33
## 14 1967 Americas 5668 60.41
## 15 1967 Asia 5971 54.66
## 16 1967 Europe 10144 69.74
## 17 1972 Africa 2340 47.45
## 18 1972 Americas 6491 62.39
## 19 1972 Asia 8187 57.32
## 20 1972 Europe 12480 70.78
## 21 1977 Africa 2586 49.58
## 22 1977 Americas 7352 64.39
## 23 1977 Asia 7791 59.61
## 24 1977 Europe 14284 71.94
## 25 1982 Africa 2482 51.59
## 26 1982 Americas 7507 66.23
## 27 1982 Asia 7434 62.62
## 28 1982 Europe 15618 72.81
## 29 1987 Africa 2283 53.34
## 30 1987 Americas 7793 68.09
## 31 1987 Asia 7608 64.85
## 32 1987 Europe 17214 73.64
## 33 1992 Africa 2282 53.63
## 34 1992 Americas 8045 69.57
## 35 1992 Asia 8640 66.54
## 36 1992 Europe 17062 74.44
## 37 1997 Africa 2379 53.60
## 38 1997 Americas 8889 71.15
## 39 1997 Asia 9834 68.02
## 40 1997 Europe 19077 75.51
## 41 2002 Africa 2599 53.33
## 42 2002 Americas 9288 72.42
## 43 2002 Asia 10174 69.23
## 44 2002 Europe 21712 76.70
## 45 2007 Africa 3089 54.81
## 46 2007 Americas 11003 73.61
## 47 2007 Asia 12473 70.73
## 48 2007 Europe 25054 77.65
# generate figure
pLEandGdpTime <- ggplot(avgLifeExpAndGdp, aes(x = avgGdp, y = avgLifeExp, color = year)) +
geom_point() + geom_smooth(aes(group = year), method = "lm", se = FALSE)
pLEandGdpTime + ggtitle("Relationship between GDP and life expectancy over time")
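The slope of the fitted line decreases over time, so in later years each additional dollar of average GDP per capita is associated with a smaller gain in average life expectancy. Note that a flatter slope does not by itself mean a weaker relationship; judging the strength of the association would require a measure such as the correlation coefficient.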
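One way to separate slope from strength of association is to compute both for each year. The sketch below does this for 1952 and 2007 only, using the rounded continent averages printed in the table above. The names `avg` and `stats` are mine, and with only four points per year the numbers are rough at best.

```r
# 1952 and 2007 rows copied from the avgLifeExpAndGdp output above
avg <- data.frame(
  year = rep(c(1952, 2007), each = 4),
  avgGdp = c(1253, 4079, 5195, 5661, 3089, 11003, 12473, 25054),
  avgLifeExp = c(39.14, 53.28, 46.31, 64.41, 54.81, 73.61, 70.73, 77.65)
)

# For each year: slope of the least-squares line and the Pearson correlation
stats <- sapply(split(avg, avg$year), function(d) {
  c(slope = unname(coef(lm(avgLifeExp ~ avgGdp, data = d))[2]),
    cor = cor(d$avgGdp, d$avgLifeExp))
})
stats
```

On these rounded values the 2007 slope is smaller than the 1952 slope, while the correlation is not lower in 2007, which is why slope alone is a poor guide to the strength of the relationship.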
I'm also interested in looking at the relationship between GDP and life expectancy over time, per continent.
pLEandGdpCont <- ggplot(avgLifeExpAndGdp, aes(x = avgGdp, y = avgLifeExp, color = year,
shape = continent)) + geom_point() + geom_smooth(aes(group = continent),
method = "lm", se = FALSE)
pLEandGdpCont + ggtitle("Relationship between GDP and life expectancy over time by continent")
For every continent, average life expectancy increases with average GDP per capita. Life expectancy also rises more steeply over time for continents with lower GDP per capita. Note: this was very straightforward to do using ggplot2.
To sum: Comparison/contrast of ggplot2 with lattice
ggplot2, like lattice, is a very useful package for generating figures. I've noticed that ggplot2 follows a solid logic that is built on layering, which allows for greater control and flexibility in figure generation. Personal note: I feel like I'm slowly but surely starting to understand how to do this… :)