2013-09-13
This homework uses the dataset gapminderDataFiveYear.txt to explore basic data import, descriptive statistics and plotting.
Make sure your working directory is set to the folder where this file is stored, or pass the absolute path of the file as the argument.
gDat <- read.delim("gapminderDataFiveYear.txt")
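If the import fails, the path is the usual suspect. Here is a minimal sketch of the absolute-path route, assuming a small made-up file written to a temporary directory. The names toyDat, toyPath and toyIn, and the values in the toy table, are illustrative only and are not part of the homework data.

```r
# Made-up toy table standing in for the real file (values are illustrative)
toyDat <- data.frame(country = c("A", "A", "B"),
                     year = c(1952L, 1957L, 1952L),
                     lifeExp = c(50.5, 55.5, 68.5))

# Build an absolute path inside R's temporary directory
toyPath <- file.path(tempdir(), "toyGapminder.txt")
write.table(toyDat, toyPath, sep = "\t", quote = FALSE, row.names = FALSE)

# file.exists() is a cheap check to run before importing
stopifnot(file.exists(toyPath))

# read.delim() expects a tab-delimited file with a header row
toyIn <- read.delim(toyPath)
str(toyIn)
```

The same file.exists() check works on "gapminderDataFiveYear.txt" itself; getwd() shows which directory a relative path is resolved against.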
Let's use the R functions str() and summary() to explore the dataset.
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
summary(gDat)
##         country          year           pop              continent
##  Afghanistan:  12   Min.   :1952   Min.   :6.00e+04   Africa  :624
##  Albania    :  12   1st Qu.:1966   1st Qu.:2.79e+06   Americas:300
##  Algeria    :  12   Median :1980   Median :7.02e+06   Asia    :396
##  Angola     :  12   Mean   :1980   Mean   :2.96e+07   Europe  :360
##  Argentina  :  12   3rd Qu.:1993   3rd Qu.:1.96e+07   Oceania : 24
##  Australia  :  12   Max.   :2007   Max.   :1.32e+09
##  (Other)    :1632
##     lifeExp       gdpPercap
##  Min.   :23.6   Min.   :   241
##  1st Qu.:48.2   1st Qu.:  1202
##  Median :60.7   Median :  3532
##  Mean   :59.5   Mean   :  7215
##  3rd Qu.:70.8   3rd Qu.:  9325
##  Max.   :82.6   Max.   :113523
##
Note:
summary() returns the minimum, quartiles, mean and maximum for numeric variables, and the number of observations per level for factors.
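The difference between the two kinds of summary can be seen on a pair of toy vectors. This is only a sketch: x, f, numSum and facSum are made-up names and the values are invented for illustration.

```r
# Toy vectors (invented values) to show how summary() treats each type
x <- c(1, 2, 3, 4, 10)
f <- factor(c("a", "b", "a", "a", "c"))

numSum <- summary(x)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
facSum <- summary(f)  # number of observations in each level

print(numSum)
print(facSum)
```

This is why the country and continent columns above show counts while year, pop, lifeExp and gdpPercap show a six-number summary.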
“A picture is worth a thousand words”. We will make a few figures from a subset of the whole dataset. Make sure you have already installed the lattice package on your computer.
Let's plot lifeExp against gdpPercap for Colombia across the 12 survey years from 1952 to 2007. A smooth (loess) curve is also fitted to show the trend.
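The xyplot() call that follows filters rows with its subset argument rather than by subsetting the data frame first. The sketch below, on made-up data, suggests that the two routes give the panel the same points. toyDat, p1 and p2 are hypothetical names, and inspecting panel.args relies on how lattice stores a trellis object, so treat it as an illustration rather than a documented interface.

```r
library(lattice)

# Toy data with invented values; toyDat stands in for gDat
toyDat <- data.frame(
  country = rep(c("A", "B"), each = 6),
  gdpPercap = c(1, 2, 3, 4, 5, 6, 2, 4, 6, 8, 10, 12),
  lifeExp = c(50, 52, 55, 57, 60, 61, 60, 62, 63, 66, 68, 70))

# Option 1: let xyplot() filter the rows through subset =
p1 <- xyplot(lifeExp ~ gdpPercap, toyDat, subset = country == "A",
             type = c("p", "smooth"))

# Option 2: filter with subset() first, then plot
p2 <- xyplot(lifeExp ~ gdpPercap, subset(toyDat, country == "A"),
             type = c("p", "smooth"))

# Compare the x values handed to the single panel of each plot
all.equal(p1$panel.args[[1]]$x, p2$panel.args[[1]]$x)
```

Neither plot is drawn until the object is printed, for example with print(p1).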
library(lattice)
## Warning: package 'lattice' was built under R version 3.0.1
xyplot(lifeExp ~ gdpPercap, gDat, subset = country == "Colombia", type = c("p",
"smooth"))
We can also compare lifeExp across the five continents in selected years, with one panel per year. The line in each panel joins the continent averages.
stripplot(lifeExp ~ continent | as.factor(year), gDat, subset = year %in% c(1957,
1967, 1977, 1987, 1997, 2007), layout = c(2, 3), auto.key = TRUE, grid = TRUE,
type = c("p", "a"))
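To see why this call produces six panels, here is a small sketch of the year filter on its own. It assumes only the five-year spacing visible in the str() output; yr and keep are made-up names.

```r
# The survey years run from 1952 to 2007 in steps of five
yr <- seq(1952, 2007, by = 5)

# %in% returns TRUE for each element of yr found in the chosen set
keep <- yr %in% c(1957, 1967, 1977, 1987, 1997, 2007)
yr[keep]

# as.factor() turns the kept years into levels, one panel per level
nlevels(as.factor(yr[keep]))
```

Six levels fit the layout = c(2, 3) grid exactly, since lattice reads layout as columns first and rows second.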