Prepared by: Amanda Yuen
This is an R Markdown document. For homework #2, we will carry out a brief analysis of the Gapminder data located here.
Let's import the data into R:
gDat <- read.delim("gapminderDataFiveYear.txt")
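As a small aside, here is a sketch of what `read.delim` does with tab-separated text. The two rows are typed in by hand from the rounded Afghanistan values printed in the `str()` output of this document, and the names `txt` and `toy` are made up for the illustration. One thing to keep in mind: in R 4.0.0 and later, text columns are read as character unless `stringsAsFactors = TRUE` is passed, so that argument is assumed here to get factor columns like the ones shown in this report.

```r
# Two hand-typed rows in the same tab-separated layout as the data file.
# The values are the rounded ones printed by str() in this report, not the exact file contents.
txt <- paste(
  "country\tyear\tpop\tcontinent\tlifeExp\tgdpPercap",
  "Afghanistan\t1952\t8425333\tAsia\t28.8\t779",
  "Afghanistan\t1957\t9240934\tAsia\t30.3\t821",
  sep = "\n"
)

# read.delim expects a header row and tab separators by default.
# stringsAsFactors = TRUE turns the text columns into factors (not the default in R >= 4.0.0).
toy <- read.delim(text = txt, stringsAsFactors = TRUE)
str(toy)
```

With the real file, the same argument can be added to the `read.delim` call above if the factor columns are wanted.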
Now, let's get an overview of what the data looks like:
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
This tells us that there are 1704 observations and 6 variables in the data set. What are these 6 variables?
names(gDat)
## [1] "country" "year" "pop" "continent" "lifeExp" "gdpPercap"
Ok, great. Let's get some basic descriptive statistics:
summary(gDat)
##         country          year           pop              continent
##  Afghanistan:  12   Min.   :1952   Min.   :6.00e+04   Africa  :624
##  Albania    :  12   1st Qu.:1966   1st Qu.:2.79e+06   Americas:300
##  Algeria    :  12   Median :1980   Median :7.02e+06   Asia    :396
##  Angola     :  12   Mean   :1980   Mean   :2.96e+07   Europe  :360
##  Argentina  :  12   3rd Qu.:1993   3rd Qu.:1.96e+07   Oceania : 24
##  Australia  :  12   Max.   :2007   Max.   :1.32e+09
##  (Other)    :1632
##     lifeExp       gdpPercap
##  Min.   :23.6   Min.   :   241
##  1st Qu.:48.2   1st Qu.:  1202
##  Median :60.7   Median :  3532
##  Mean   :59.5   Mean   :  7215
##  3rd Qu.:70.8   3rd Qu.:  9325
##  Max.   :82.6   Max.   :113523
##
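A quick sanity check on these numbers, sketched with the continent counts copied by hand from the summary. It assumes every country contributes 12 rows (one per survey year), which fits the 12s shown in the country column and the 1704 rows over 142 countries reported by `str()`. The names `continent_rows` and `countries_per_continent` are made up for this illustration.

```r
# Row counts per continent, copied from the summary() output.
continent_rows <- c(Africa = 624, Americas = 300, Asia = 396,
                    Europe = 360, Oceania = 24)

# The counts should add up to the number of observations reported by str().
total_rows <- sum(continent_rows)
print(total_rows)  # 1704

# Assuming 12 rows per country, dividing by 12 gives countries per continent.
countries_per_continent <- continent_rows / 12
print(countries_per_continent)       # 52, 25, 33, 30, 2
print(sum(countries_per_continent))  # 142, the number of country levels
```

On the full data set, `table(gDat$continent)` should return the same row counts directly.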
Interesting. Let's look at the relationship between GDP per capita and life expectancy for the countries in the Americas. We start by loading the lattice package so that we can visualize the data with pretty graphs:
library(lattice)
xyplot(lifeExp ~ gdpPercap | country, gDat, subset = continent == "Americas")
Very interesting. Life expectancy in Canada, the United States, and Puerto Rico appears to start at a higher value and to rise more gradually as GDP per capita increases than in the other countries. Let's take a closer look at Canada:
xyplot(lifeExp ~ gdpPercap, gDat, subset = country == "Canada", type = c("p",
"r"))
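The `"r"` in `type` asks lattice to draw a least-squares regression line through the points. To see the numbers behind such a line, a minimal sketch with `lm` could look like the following. The helper `fit_country` and the data frame `toy` are made up for this illustration, and `toy` holds only the rounded Afghanistan values printed in the `str()` output, since the Canada rows are not shown in this report.

```r
# Hand-typed rows from the str() output (rounded values, Afghanistan only).
toy <- data.frame(
  country   = rep("Afghanistan", 5),
  year      = c(1952L, 1957L, 1962L, 1967L, 1972L),
  lifeExp   = c(28.8, 30.3, 32, 34, 36.1),
  gdpPercap = c(779, 821, 853, 836, 740)
)

# Hypothetical helper: fit lifeExp on gdpPercap for a single country.
fit_country <- function(dat, name) {
  one <- dat[dat$country == name, ]     # keep only that country's rows
  lm(lifeExp ~ gdpPercap, data = one)   # ordinary least-squares fit
}

fit <- fit_country(toy, "Afghanistan")
print(coef(fit))  # intercept and slope of the fitted line
```

With the full data loaded, `fit_country(gDat, "Canada")` should give the intercept and slope of the line drawn in the Canada plot.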