STAT545a - HW#5

Mina Park

In this exercise, we reproduce the figures from homework #4 using the R package ggplot2. The figures are based on my submission from last week, where the lattice versions can be found.

1.) Clean workspace, load libraries, load data, and perform superficial check that data import occurred properly

# clean workspace
rm(list = ls())

# load libraries
library(plyr)
library(ggplot2)

# load data
Dat <- read.delim("gapminderDataFiveYear.txt")
str(Dat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

The import looks correct: the data is a data frame with 1704 observations of six variables (country, year, pop, continent, lifeExp, gdpPercap).

2.) Create new datasets to work from
Because Oceania has only two countries, it is not very informative, so we will remove it from our data.

aDat <- subset(Dat, continent != "Oceania")
str(aDat)  # the Oceania rows are gone, but the continent factor still has 5 levels
## 'data.frame':    1680 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
aDat <- droplevels(subset(Dat, continent != "Oceania"))
unique(aDat$continent)
## [1] Asia     Europe   Africa   Americas
## Levels: Africa Americas Asia Europe

subset() alone removes the Oceania rows but keeps the unused factor level. After wrapping the call in droplevels(), we can see that Oceania has been dropped from the continent factor as well.
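The difference between filtering rows and dropping a factor level can be seen on a tiny example. This is only a sketch: the data frame `toy` and its three rows are made up for illustration and are not read from the Gapminder file.

```r
# Toy data (hypothetical rows, not the Gapminder file)
toy <- data.frame(
  country   = c("Japan", "France", "Australia"),
  continent = factor(c("Asia", "Europe", "Oceania"))
)

# subset() removes the Oceania row but keeps the now-unused level
kept <- subset(toy, continent != "Oceania")
nlevels(kept$continent)

# droplevels() discards levels that no longer occur in the data
dropped <- droplevels(kept)
levels(dropped$continent)
```

Unused levels matter for plotting because a factor level with no rows can still show up as an empty panel or axis category.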

We also want to look at only the years 1952 and 2007 for some questions, so I will generate another data frame for this purpose specifically.

bDat <- subset(aDat, year %in% c(1952, 2007))  # year is an integer column, so compare against numbers
unique(bDat$year)
## [1] 1952 2007

3.) Explore GDP per capita and life expectancy. We will be generating stripplots.
First, we will look at GDP per capita.

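# note: despite its name, avgGdp is not averaged; it lists every country, grouped by continent and year
avgGdp <- ddply(bDat, ~continent + year, summarize, country = country, gdpPercap = gdpPercap)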
avgGdp
##     continent year                  country gdpPercap
## 1      Africa 1952                  Algeria    2449.0
## 2      Africa 1952                   Angola    3520.6
## 3      Africa 1952                    Benin    1062.8
## 4      Africa 1952                 Botswana     851.2
## 5      Africa 1952             Burkina Faso     543.3
## 6      Africa 1952                  Burundi     339.3
## 7      Africa 1952                 Cameroon    1172.7
## 8      Africa 1952 Central African Republic    1071.3
## 9      Africa 1952                     Chad    1178.7
## 10     Africa 1952                  Comoros    1103.0
## 11     Africa 1952         Congo, Dem. Rep.     780.5
## 12     Africa 1952              Congo, Rep.    2125.6
## 13     Africa 1952            Cote d'Ivoire    1388.6
## 14     Africa 1952                 Djibouti    2669.5
## 15     Africa 1952                    Egypt    1418.8
## 16     Africa 1952        Equatorial Guinea     375.6
## 17     Africa 1952                  Eritrea     328.9
## 18     Africa 1952                 Ethiopia     362.1
## 19     Africa 1952                    Gabon    4293.5
## 20     Africa 1952                   Gambia     485.2
## 21     Africa 1952                    Ghana     911.3
## 22     Africa 1952                   Guinea     510.2
## 23     Africa 1952            Guinea-Bissau     299.9
## 24     Africa 1952                    Kenya     853.5
## 25     Africa 1952                  Lesotho     298.8
## 26     Africa 1952                  Liberia     575.6
## 27     Africa 1952                    Libya    2387.5
## 28     Africa 1952               Madagascar    1443.0
## 29     Africa 1952                   Malawi     369.2
## 30     Africa 1952                     Mali     452.3
## 31     Africa 1952               Mauritania     743.1
## 32     Africa 1952                Mauritius    1968.0
## 33     Africa 1952                  Morocco    1688.2
## 34     Africa 1952               Mozambique     468.5
## 35     Africa 1952                  Namibia    2423.8
## 36     Africa 1952                    Niger     761.9
## 37     Africa 1952                  Nigeria    1077.3
## 38     Africa 1952                  Reunion    2718.9
## 39     Africa 1952                   Rwanda     493.3
## 40     Africa 1952    Sao Tome and Principe     879.6
## 41     Africa 1952                  Senegal    1450.4
## 42     Africa 1952             Sierra Leone     879.8
## 43     Africa 1952                  Somalia    1135.7
## 44     Africa 1952             South Africa    4725.3
## 45     Africa 1952                    Sudan    1616.0
## 46     Africa 1952                Swaziland    1148.4
## 47     Africa 1952                 Tanzania     716.7
## 48     Africa 1952                     Togo     859.8
## 49     Africa 1952                  Tunisia    1468.5
## 50     Africa 1952                   Uganda     734.8
## 51     Africa 1952                   Zambia    1147.4
## 52     Africa 1952                 Zimbabwe     406.9
## 53     Africa 2007                  Algeria    6223.4
## 54     Africa 2007                   Angola    4797.2
## 55     Africa 2007                    Benin    1441.3
## 56     Africa 2007                 Botswana   12569.9
## 57     Africa 2007             Burkina Faso    1217.0
## 58     Africa 2007                  Burundi     430.1
## 59     Africa 2007                 Cameroon    2042.1
## 60     Africa 2007 Central African Republic     706.0
## 61     Africa 2007                     Chad    1704.1
## 62     Africa 2007                  Comoros     986.1
## 63     Africa 2007         Congo, Dem. Rep.     277.6
## 64     Africa 2007              Congo, Rep.    3632.6
## 65     Africa 2007            Cote d'Ivoire    1544.8
## 66     Africa 2007                 Djibouti    2082.5
## 67     Africa 2007                    Egypt    5581.2
## 68     Africa 2007        Equatorial Guinea   12154.1
## 69     Africa 2007                  Eritrea     641.4
## 70     Africa 2007                 Ethiopia     690.8
## 71     Africa 2007                    Gabon   13206.5
## 72     Africa 2007                   Gambia     752.7
## 73     Africa 2007                    Ghana    1327.6
## 74     Africa 2007                   Guinea     942.7
## 75     Africa 2007            Guinea-Bissau     579.2
## 76     Africa 2007                    Kenya    1463.2
## 77     Africa 2007                  Lesotho    1569.3
## 78     Africa 2007                  Liberia     414.5
## 79     Africa 2007                    Libya   12057.5
## 80     Africa 2007               Madagascar    1044.8
## 81     Africa 2007                   Malawi     759.3
## 82     Africa 2007                     Mali    1042.6
## 83     Africa 2007               Mauritania    1803.2
## 84     Africa 2007                Mauritius   10957.0
## 85     Africa 2007                  Morocco    3820.2
## 86     Africa 2007               Mozambique     823.7
## 87     Africa 2007                  Namibia    4811.1
## 88     Africa 2007                    Niger     619.7
## 89     Africa 2007                  Nigeria    2014.0
## 90     Africa 2007                  Reunion    7670.1
## 91     Africa 2007                   Rwanda     863.1
## 92     Africa 2007    Sao Tome and Principe    1598.4
## 93     Africa 2007                  Senegal    1712.5
## 94     Africa 2007             Sierra Leone     862.5
## 95     Africa 2007                  Somalia     926.1
## 96     Africa 2007             South Africa    9269.7
## 97     Africa 2007                    Sudan    2602.4
## 98     Africa 2007                Swaziland    4513.5
## 99     Africa 2007                 Tanzania    1107.5
## 100    Africa 2007                     Togo     883.0
## 101    Africa 2007                  Tunisia    7092.9
## 102    Africa 2007                   Uganda    1056.4
## 103    Africa 2007                   Zambia    1271.2
## 104    Africa 2007                 Zimbabwe     469.7
## 105  Americas 1952                Argentina    5911.3
## 106  Americas 1952                  Bolivia    2677.3
## 107  Americas 1952                   Brazil    2108.9
## 108  Americas 1952                   Canada   11367.2
## 109  Americas 1952                    Chile    3940.0
## 110  Americas 1952                 Colombia    2144.1
## 111  Americas 1952               Costa Rica    2627.0
## 112  Americas 1952                     Cuba    5586.5
## 113  Americas 1952       Dominican Republic    1397.7
## 114  Americas 1952                  Ecuador    3522.1
## 115  Americas 1952              El Salvador    3048.3
## 116  Americas 1952                Guatemala    2428.2
## 117  Americas 1952                    Haiti    1840.4
## 118  Americas 1952                 Honduras    2194.9
## 119  Americas 1952                  Jamaica    2898.5
## 120  Americas 1952                   Mexico    3478.1
## 121  Americas 1952                Nicaragua    3112.4
## 122  Americas 1952                   Panama    2480.4
## 123  Americas 1952                 Paraguay    1952.3
## 124  Americas 1952                     Peru    3758.5
## 125  Americas 1952              Puerto Rico    3082.0
## 126  Americas 1952      Trinidad and Tobago    3023.3
## 127  Americas 1952            United States   13990.5
## 128  Americas 1952                  Uruguay    5716.8
## 129  Americas 1952                Venezuela    7689.8
## 130  Americas 2007                Argentina   12779.4
## 131  Americas 2007                  Bolivia    3822.1
## 132  Americas 2007                   Brazil    9065.8
## 133  Americas 2007                   Canada   36319.2
## 134  Americas 2007                    Chile   13171.6
## 135  Americas 2007                 Colombia    7006.6
## 136  Americas 2007               Costa Rica    9645.1
## 137  Americas 2007                     Cuba    8948.1
## 138  Americas 2007       Dominican Republic    6025.4
## 139  Americas 2007                  Ecuador    6873.3
## 140  Americas 2007              El Salvador    5728.4
## 141  Americas 2007                Guatemala    5186.1
## 142  Americas 2007                    Haiti    1201.6
## 143  Americas 2007                 Honduras    3548.3
## 144  Americas 2007                  Jamaica    7320.9
## 145  Americas 2007                   Mexico   11977.6
## 146  Americas 2007                Nicaragua    2749.3
## 147  Americas 2007                   Panama    9809.2
## 148  Americas 2007                 Paraguay    4172.8
## 149  Americas 2007                     Peru    7408.9
## 150  Americas 2007              Puerto Rico   19328.7
## 151  Americas 2007      Trinidad and Tobago   18008.5
## 152  Americas 2007            United States   42951.7
## 153  Americas 2007                  Uruguay   10611.5
## 154  Americas 2007                Venezuela   11415.8
## 155      Asia 1952              Afghanistan     779.4
## 156      Asia 1952                  Bahrain    9867.1
## 157      Asia 1952               Bangladesh     684.2
## 158      Asia 1952                 Cambodia     368.5
## 159      Asia 1952                    China     400.4
## 160      Asia 1952         Hong Kong, China    3054.4
## 161      Asia 1952                    India     546.6
## 162      Asia 1952                Indonesia     749.7
## 163      Asia 1952                     Iran    3035.3
## 164      Asia 1952                     Iraq    4129.8
## 165      Asia 1952                   Israel    4086.5
## 166      Asia 1952                    Japan    3217.0
## 167      Asia 1952                   Jordan    1546.9
## 168      Asia 1952         Korea, Dem. Rep.    1088.3
## 169      Asia 1952              Korea, Rep.    1030.6
## 170      Asia 1952                   Kuwait  108382.4
## 171      Asia 1952                  Lebanon    4834.8
## 172      Asia 1952                 Malaysia    1831.1
## 173      Asia 1952                 Mongolia     786.6
## 174      Asia 1952                  Myanmar     331.0
## 175      Asia 1952                    Nepal     545.9
## 176      Asia 1952                     Oman    1828.2
## 177      Asia 1952                 Pakistan     684.6
## 178      Asia 1952              Philippines    1272.9
## 179      Asia 1952             Saudi Arabia    6459.6
## 180      Asia 1952                Singapore    2315.1
## 181      Asia 1952                Sri Lanka    1083.5
## 182      Asia 1952                    Syria    1643.5
## 183      Asia 1952                   Taiwan    1206.9
## 184      Asia 1952                 Thailand     757.8
## 185      Asia 1952                  Vietnam     605.1
## 186      Asia 1952       West Bank and Gaza    1515.6
## 187      Asia 1952              Yemen, Rep.     781.7
## 188      Asia 2007              Afghanistan     974.6
## 189      Asia 2007                  Bahrain   29796.0
## 190      Asia 2007               Bangladesh    1391.3
## 191      Asia 2007                 Cambodia    1713.8
## 192      Asia 2007                    China    4959.1
## 193      Asia 2007         Hong Kong, China   39725.0
## 194      Asia 2007                    India    2452.2
## 195      Asia 2007                Indonesia    3540.7
## 196      Asia 2007                     Iran   11605.7
## 197      Asia 2007                     Iraq    4471.1
## 198      Asia 2007                   Israel   25523.3
## 199      Asia 2007                    Japan   31656.1
## 200      Asia 2007                   Jordan    4519.5
## 201      Asia 2007         Korea, Dem. Rep.    1593.1
## 202      Asia 2007              Korea, Rep.   23348.1
## 203      Asia 2007                   Kuwait   47307.0
## 204      Asia 2007                  Lebanon   10461.1
## 205      Asia 2007                 Malaysia   12451.7
## 206      Asia 2007                 Mongolia    3095.8
## 207      Asia 2007                  Myanmar     944.0
## 208      Asia 2007                    Nepal    1091.4
## 209      Asia 2007                     Oman   22316.2
## 210      Asia 2007                 Pakistan    2605.9
## 211      Asia 2007              Philippines    3190.5
## 212      Asia 2007             Saudi Arabia   21654.8
## 213      Asia 2007                Singapore   47143.2
## 214      Asia 2007                Sri Lanka    3970.1
## 215      Asia 2007                    Syria    4184.5
## 216      Asia 2007                   Taiwan   28718.3
## 217      Asia 2007                 Thailand    7458.4
## 218      Asia 2007                  Vietnam    2441.6
## 219      Asia 2007       West Bank and Gaza    3025.3
## 220      Asia 2007              Yemen, Rep.    2280.8
## 221    Europe 1952                  Albania    1601.1
## 222    Europe 1952                  Austria    6137.1
## 223    Europe 1952                  Belgium    8343.1
## 224    Europe 1952   Bosnia and Herzegovina     973.5
## 225    Europe 1952                 Bulgaria    2444.3
## 226    Europe 1952                  Croatia    3119.2
## 227    Europe 1952           Czech Republic    6876.1
## 228    Europe 1952                  Denmark    9692.4
## 229    Europe 1952                  Finland    6424.5
## 230    Europe 1952                   France    7029.8
## 231    Europe 1952                  Germany    7144.1
## 232    Europe 1952                   Greece    3530.7
## 233    Europe 1952                  Hungary    5263.7
## 234    Europe 1952                  Iceland    7267.7
## 235    Europe 1952                  Ireland    5210.3
## 236    Europe 1952                    Italy    4931.4
## 237    Europe 1952               Montenegro    2647.6
## 238    Europe 1952              Netherlands    8941.6
## 239    Europe 1952                   Norway   10095.4
## 240    Europe 1952                   Poland    4029.3
## 241    Europe 1952                 Portugal    3068.3
## 242    Europe 1952                  Romania    3144.6
## 243    Europe 1952                   Serbia    3581.5
## 244    Europe 1952          Slovak Republic    5074.7
## 245    Europe 1952                 Slovenia    4215.0
## 246    Europe 1952                    Spain    3834.0
## 247    Europe 1952                   Sweden    8527.8
## 248    Europe 1952              Switzerland   14734.2
## 249    Europe 1952                   Turkey    1969.1
## 250    Europe 1952           United Kingdom    9979.5
## 251    Europe 2007                  Albania    5937.0
## 252    Europe 2007                  Austria   36126.5
## 253    Europe 2007                  Belgium   33692.6
## 254    Europe 2007   Bosnia and Herzegovina    7446.3
## 255    Europe 2007                 Bulgaria   10680.8
## 256    Europe 2007                  Croatia   14619.2
## 257    Europe 2007           Czech Republic   22833.3
## 258    Europe 2007                  Denmark   35278.4
## 259    Europe 2007                  Finland   33207.1
## 260    Europe 2007                   France   30470.0
## 261    Europe 2007                  Germany   32170.4
## 262    Europe 2007                   Greece   27538.4
## 263    Europe 2007                  Hungary   18008.9
## 264    Europe 2007                  Iceland   36180.8
## 265    Europe 2007                  Ireland   40676.0
## 266    Europe 2007                    Italy   28569.7
## 267    Europe 2007               Montenegro    9253.9
## 268    Europe 2007              Netherlands   36797.9
## 269    Europe 2007                   Norway   49357.2
## 270    Europe 2007                   Poland   15389.9
## 271    Europe 2007                 Portugal   20509.6
## 272    Europe 2007                  Romania   10808.5
## 273    Europe 2007                   Serbia    9786.5
## 274    Europe 2007          Slovak Republic   18678.3
## 275    Europe 2007                 Slovenia   25768.3
## 276    Europe 2007                    Spain   28821.1
## 277    Europe 2007                   Sweden   33859.7
## 278    Europe 2007              Switzerland   37506.4
## 279    Europe 2007                   Turkey    8458.3
## 280    Europe 2007           United Kingdom   33203.3
# generate figure
# geom_jitter() draws the points itself; adding geom_point() as well would plot every country twice
pGdp <- ggplot(bDat, aes(x = gdpPercap, y = continent)) + geom_jitter() + facet_grid(~year)
pGdp + ggtitle("GDP per capita by continent in 1952 and 2007")

Figure: stripplot of GDP per capita by continent, faceted by year (1952 and 2007)

In the last homework assignment, we discovered that the outlier in 1952 for Asia was Kuwait, with a GDP per capita of $108,382.40.
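An outlier like this can be located directly with which.max(). The sketch below uses four 1952 Asia rows copied from the listing above rather than the full bDat; the name `asia1952` is made up for this example.

```r
# A few 1952 Asia rows copied from the listing above (the real bDat has many more)
asia1952 <- data.frame(
  country   = c("China", "Japan", "Kuwait", "Saudi Arabia"),
  gdpPercap = c(400.4, 3217.0, 108382.4, 6459.6)
)

# which.max() returns the row index of the largest GDP per capita
outlier <- asia1952[which.max(asia1952$gdpPercap), ]
outlier
```

The same indexing should work on the full data, for example on the 1952 rows of bDat.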

Now, we will look at life expectancy.

cDat <- within(bDat, continent <- reorder(continent, lifeExp))  #reorder in order of life expectancy
avgLifeExp <- daply(cDat, ~year + continent, summarize, avgLifeExp = mean(lifeExp))
avgLifeExp
##       continent
## year   Africa Asia  Americas Europe
##   1952 39.14  46.31 53.28    64.41 
##   2007 54.81  70.73 73.61    77.65
# generate figure
# plot cDat so the panels follow the reordered continent levels; geom_jitter() alone draws the points
pLifeExp <- ggplot(cDat, aes(x = factor(year), y = lifeExp)) + geom_jitter() + 
    facet_grid(~continent) + geom_smooth(aes(group = 1), method = "lm")
pLifeExp + ggtitle("Life expectancy by continent in 1952 and 2007") + xlab("year")

Figure: life expectancy in 1952 and 2007, faceted by continent, with a linear fit in each panel

As expected, we can see that life expectancy has increased in all continents from 1952 to 2007.
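The reorder() call used above sorts factor levels by a summary of a second variable, which is the mean by default. Here is a small sketch of that behaviour. It uses the 1952 and 2007 continent averages from the table above as stand-in rows instead of the per-country data, and the name `toy` is made up for illustration.

```r
# Stand-in rows: the 1952 and 2007 continent averages from the table above
toy <- data.frame(
  continent = factor(rep(c("Africa", "Americas", "Asia", "Europe"), times = 2)),
  lifeExp   = c(39.14, 53.28, 46.31, 64.41, 54.81, 73.61, 70.73, 77.65)
)

levels(toy$continent)  # default order is alphabetical

# reorder() sorts the levels by the mean of lifeExp within each level
reordered <- reorder(toy$continent, toy$lifeExp)
levels(reordered)
```

Because facets and axes follow the level order, reordering the factor is what changes the order of the panels.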

4.) Now, we want to look at average life expectancy and average GDP using scatterplots.

I'm interested in whether the relationship between GDP and life expectancy has become stronger or weaker over time.

avgLifeExpAndGdp <- ddply(aDat, ~year + continent, summarize, avgGdp = mean(gdpPercap), 
    avgLifeExp = mean(lifeExp))
avgLifeExpAndGdp
##    year continent avgGdp avgLifeExp
## 1  1952    Africa   1253      39.14
## 2  1952  Americas   4079      53.28
## 3  1952      Asia   5195      46.31
## 4  1952    Europe   5661      64.41
## 5  1957    Africa   1385      41.27
## 6  1957  Americas   4616      55.96
## 7  1957      Asia   5788      49.32
## 8  1957    Europe   6963      66.70
## 9  1962    Africa   1598      43.32
## 10 1962  Americas   4902      58.40
## 11 1962      Asia   5729      51.56
## 12 1962    Europe   8365      68.54
## 13 1967    Africa   2050      45.33
## 14 1967  Americas   5668      60.41
## 15 1967      Asia   5971      54.66
## 16 1967    Europe  10144      69.74
## 17 1972    Africa   2340      47.45
## 18 1972  Americas   6491      62.39
## 19 1972      Asia   8187      57.32
## 20 1972    Europe  12480      70.78
## 21 1977    Africa   2586      49.58
## 22 1977  Americas   7352      64.39
## 23 1977      Asia   7791      59.61
## 24 1977    Europe  14284      71.94
## 25 1982    Africa   2482      51.59
## 26 1982  Americas   7507      66.23
## 27 1982      Asia   7434      62.62
## 28 1982    Europe  15618      72.81
## 29 1987    Africa   2283      53.34
## 30 1987  Americas   7793      68.09
## 31 1987      Asia   7608      64.85
## 32 1987    Europe  17214      73.64
## 33 1992    Africa   2282      53.63
## 34 1992  Americas   8045      69.57
## 35 1992      Asia   8640      66.54
## 36 1992    Europe  17062      74.44
## 37 1997    Africa   2379      53.60
## 38 1997  Americas   8889      71.15
## 39 1997      Asia   9834      68.02
## 40 1997    Europe  19077      75.51
## 41 2002    Africa   2599      53.33
## 42 2002  Americas   9288      72.42
## 43 2002      Asia  10174      69.23
## 44 2002    Europe  21712      76.70
## 45 2007    Africa   3089      54.81
## 46 2007  Americas  11003      73.61
## 47 2007      Asia  12473      70.73
## 48 2007    Europe  25054      77.65
# generate figure
pLEandGdpTime <- ggplot(avgLifeExpAndGdp, aes(x = avgGdp, y = avgLifeExp, color = year)) + 
    geom_point() + geom_smooth(aes(group = year), method = "lm", se = FALSE)
pLEandGdpTime + ggtitle("Relationship between GDP and life expectancy over time")

Figure: average life expectancy versus average GDP per capita, colored by year, with one linear fit per year

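We can see that the fitted lines become flatter over time: the slope decreases, so a given increase in average GDP per capita is associated with a smaller gain in average life expectancy in later years. Two caveats apply. Each line is fitted to only four continent averages, and a flatter slope does not by itself mean that the correlation is weaker.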
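The slopes read off the figure can also be computed directly. The sketch below refits the lines for 1952 and 2007 from the rounded averages printed in the table above and reports the correlation next to each slope, since slope and strength of association are separate quantities. The names `avg` and `fit_by_year` are made up for this example.

```r
# 1952 and 2007 rows copied (rounded) from the avgLifeExpAndGdp table above
avg <- data.frame(
  year       = rep(c(1952, 2007), each = 4),
  continent  = rep(c("Africa", "Americas", "Asia", "Europe"), times = 2),
  avgGdp     = c(1253, 4079, 5195, 5661, 3089, 11003, 12473, 25054),
  avgLifeExp = c(39.14, 53.28, 46.31, 64.41, 54.81, 73.61, 70.73, 77.65)
)

# For each year: slope of the linear fit and the Pearson correlation
fit_by_year <- sapply(split(avg, avg$year), function(d) {
  c(slope = unname(coef(lm(avgLifeExp ~ avgGdp, data = d))[2]),
    cor   = cor(d$avgGdp, d$avgLifeExp))
})
fit_by_year
```

With only four points per year, both numbers are rough indications rather than reliable estimates.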

I'm also interested in looking at the relationship between GDP and life expectancy over time, per continent.

pLEandGdpCont <- ggplot(avgLifeExpAndGdp, aes(x = avgGdp, y = avgLifeExp, color = year, 
    shape = continent)) + geom_point() + geom_smooth(aes(group = continent), 
    method = "lm", se = FALSE)
pLEandGdpCont + ggtitle("Relationship between GDP and life expectancy over time by continent")

Figure: average life expectancy versus average GDP per capita, colored by year, with one linear fit per continent

We can see that, for all continents, average life expectancy increases with average GDP per capita. We also note that the fitted lines are steeper for continents with lower GDP per capita, i.e. life expectancy rises more sharply there. Note: this was very straightforward to do using ggplot2.

To sum: Comparison/contrast of ggplot2 with lattice
ggplot2, like lattice, is a very useful package for generating figures. I've noticed that ggplot2 follows a solid logic that is built on layering, which allows for greater control and flexibility in figure generation. Personal note: I feel like I'm slowly but surely starting to understand how to do this… :)
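The layering idea can be illustrated with a minimal sketch. It assumes ggplot2 is installed, and the data frame `toy` is made up for illustration: a base plot holds the data and aesthetics, and each geom added with + becomes one more layer.

```r
library(ggplot2)

# Hypothetical data, only to have something to draw
toy <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Base plot: data and aesthetic mapping, no layers yet
base <- ggplot(toy, aes(x = x, y = y))

# Each geom added with + is a separate layer drawn on top of the previous one
p <- base + geom_point() + geom_smooth(method = "lm", se = FALSE)
length(p$layers)  # number of layers on the plot

p + ggtitle("Two layers on one base plot")
```

Since the base object is unchanged by the additions, the same base can be reused with different layers, which is the flexibility described above.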