Let's explore the Gapminder data. The file is read directly from the URL below.
library(lattice)
## Warning: package 'lattice' was built under R version 2.15.3
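gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
# Tab-delimited with a header row; stringsAsFactors = TRUE keeps the Factor columns in R >= 4.0.0
gDat <- read.table(gdURL, header = TRUE, sep = '\t', quote = "\"", stringsAsFactors = TRUE)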
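To see what the `read.table()` arguments do without downloading anything, the sketch below parses two rows copied from the `str()` output that follows (Afghanistan, 1952 and 1957, with `lifeExp` and `gdpPercap` rounded as printed) from an inline string through the `text` argument. `header = TRUE` takes the column names from the first line, `sep = "\t"` splits on tabs, and `quote = "\""` treats only double quotes as quoting characters. Since R 4.0.0, `read.table()` defaults to `stringsAsFactors = FALSE`, so it is passed explicitly here to get `Factor` columns like the ones shown in this document.

```r
# Two tab-separated rows copied (rounded) from the str(gDat) output in this document
txt <- paste(
  "country\tyear\tpop\tcontinent\tlifeExp\tgdpPercap",
  "Afghanistan\t1952\t8425333\tAsia\t28.8\t779",
  "Afghanistan\t1957\t9240934\tAsia\t30.3\t821",
  sep = "\n"
)
# Same parsing options as the gapminder read, with factors requested explicitly
sample_df <- read.table(text = txt, header = TRUE, sep = "\t", quote = "\"",
                        stringsAsFactors = TRUE)
str(sample_df)
```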
What is in gDat?
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Some elementary statistics
summary(gDat)
##         country          year           pop              continent
##  Afghanistan:  12   Min.   :1952   Min.   :6.00e+04   Africa  :624
##  Albania    :  12   1st Qu.:1966   1st Qu.:2.79e+06   Americas:300
##  Algeria    :  12   Median :1980   Median :7.02e+06   Asia    :396
##  Angola     :  12   Mean   :1980   Mean   :2.96e+07   Europe  :360
##  Argentina  :  12   3rd Qu.:1993   3rd Qu.:1.96e+07   Oceania : 24
##  Australia  :  12   Max.   :2007   Max.   :1.32e+09
##  (Other)    :1632
##     lifeExp        gdpPercap
##  Min.   :23.6   Min.   :   241
##  1st Qu.:48.2   1st Qu.:  1202
##  Median :60.7   Median :  3532
##  Mean   :59.5   Mean   :  7215
##  3rd Qu.:70.8   3rd Qu.:  9325
##  Max.   :82.6   Max.   :113523
##
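The 12 rows listed for each named country, plus `(Other) :1632`, suggest a balanced panel: `str()` reports 142 countries and 1704 rows, and 142 × 12 = 1704. A minimal sketch of a check for that, written as a small helper (`rows_per_country` is a hypothetical name, not part of any package) and tried on a placeholder data frame rather than the real data:

```r
# Hypothetical helper: count the rows each country contributes
rows_per_country <- function(df) {
  counts <- table(df$country)
  # A balanced panel has the same number of rows for every country
  list(counts = counts, balanced = length(unique(as.vector(counts))) == 1)
}

# Placeholder values for illustration only, not gapminder data
toy <- data.frame(country = c("A", "A", "B", "B"),
                  year    = c(1952, 1957, 1952, 1957))
res <- rows_per_country(toy)
res$balanced
```

On the full data, `rows_per_country(gDat)$balanced` would be `TRUE` if every country has the same number of rows, as the summary suggests.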
The year range
min(gDat$year)
## [1] 1952
max(gDat$year)
## [1] 2007
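`range()` returns the minimum and maximum in a single call, and differencing the sorted unique years shows the spacing between survey years. A sketch using the first ten `year` values printed by `str()` above; on the full data the same calls would be applied to `gDat$year`:

```r
# First ten year values, as printed by str(gDat) above
years <- c(1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997)
range(years)                       # 1952 1997
unique(diff(sort(unique(years))))  # 5
```

If the same 5-year spacing holds up to the maximum of 2007 found above, it accounts for exactly 12 years per country: (2007 - 1952) / 5 + 1 = 12.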
Do the data seem reasonable? Check which rows hold the largest population and the highest life expectancy.
gDat[which(gDat$pop == max(gDat$pop)), ]
## country year pop continent lifeExp gdpPercap
## 300 China 2007 1.319e+09 Asia 72.96 4959
gDat[which(gDat$pop == max(gDat$pop)), ]$pop
## [1] 1.319e+09
gDat[which(gDat$lifeExp == max(gDat$lifeExp)), ]
## country year pop continent lifeExp gdpPercap
## 804 Japan 2007 127467972 Asia 82.6 31656
gDat[which(gDat$lifeExp == max(gDat$lifeExp)), ]$lifeExp
## [1] 82.6
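`which(x == max(x))` returns every row that ties for the maximum, while `which.max()` returns only the first one. The sketch below wraps `which.max()` in a hypothetical helper, `row_with_max()`, and tries it on three rows copied from the output shown in this document (values rounded as printed):

```r
# Hypothetical helper: the (first) row holding the largest value of `col`
row_with_max <- function(df, col) {
  df[which.max(df[[col]]), ]
}

# Three rows copied (rounded) from output shown in this document
mini <- data.frame(
  country   = c("Afghanistan", "China", "Japan"),
  year      = c(1952, 2007, 2007),
  pop       = c(8425333, 1.319e9, 127467972),
  lifeExp   = c(28.8, 72.96, 82.6),
  gdpPercap = c(779, 4959, 31656),
  stringsAsFactors = FALSE
)
row_with_max(mini, "pop")$country      # "China"
row_with_max(mini, "lifeExp")$country  # "Japan"
```

On the full data, `row_with_max(gDat, "pop")` should give the same China 2007 row as above; if several rows tied for the maximum, only the first would be returned.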
Let's make some pictures.
xyplot(pop ~ year | continent, gDat)
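The summary above shows a median population of about 7 million but a maximum of about 1.3 billion, so a few very large countries can dominate a linear axis. A log scale may make the smaller countries easier to compare. A sketch using lattice's `scales` argument; the rows are copied from output shown in this document, so this only demonstrates the call, and on the real data `gDat` would take `mini`'s place:

```r
library(lattice)

# Rows copied (rounded) from output shown in this document
mini <- data.frame(
  year      = c(1952, 1957, 2007, 2007),
  pop       = c(8425333, 9240934, 1.319e9, 127467972),
  continent = c("Asia", "Asia", "Asia", "Asia"),
  stringsAsFactors = FALSE
)
# log = 10 puts the y axis on a base-10 log scale
p <- xyplot(pop ~ year | continent, mini,
            scales = list(y = list(log = 10)))
print(p)  # draws the plot on the current graphics device
```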
Very interesting! Where are all those Asians coming from? Break the Asian data down by country.
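# Keep only Asia, then shorten country names to 8 characters so the panel strips stay legible
gAsia <- subset(gDat, continent == "Asia")
gAsia$country <- strtrim(as.character(gAsia$country), 8)
xyplot(pop ~ year | country, gAsia)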
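A table can answer the same question more directly than a grid of panels. The sketch below keeps the Asian rows from the most recent year and sorts them by population, largest first. It runs on a few rows copied from output shown in this document (Afghanistan, China, Japan); on the real data, `gDat` would take `mini`'s place:

```r
# Rows copied (rounded) from output shown in this document
mini <- data.frame(
  country   = c("Afghanistan", "Afghanistan", "China", "Japan"),
  year      = c(1952, 1957, 2007, 2007),
  pop       = c(8425333, 9240934, 1.319e9, 127467972),
  continent = c("Asia", "Asia", "Asia", "Asia"),
  stringsAsFactors = FALSE
)
# Asian rows from the most recent year in the data
asia_latest <- subset(mini, continent == "Asia" & year == max(year))
# Sort by population, largest first, keeping only country and pop
asia_latest <- asia_latest[order(asia_latest$pop, decreasing = TRUE),
                           c("country", "pop")]
asia_latest
```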