copied from http://www.stat.ubc.ca/~jenny/STAT545A/hw02_rmarkdownGapminder.html
Determine and report basic facts like the number of observations and which variables are there. Make at least one figure. Report some very basic descriptive statistics, such as results from summary().
These are the things I am trying to do with the Gapminder dataset:
Look at the data:
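# stringsAsFactors = TRUE keeps country and continent as factors (the default before R 4.0.0)
gDat <- read.table("gapminderDataFiveYear.txt", sep = "\t", quote = "\"", header = TRUE, stringsAsFactors = TRUE)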
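peek <- function(data, size = 6) {
  # Draw distinct integer row indices; runif() returns non-integers that get
  # truncated when used as an index, can repeat, and almost never reach the last row
  randRows <- sample(nrow(data), size = min(size, nrow(data)))
  # Sort so the sampled rows appear in their original order
  sampleDat <- data[sort(randRows), ]
  return(sampleDat)
}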
peek(gDat)
## country year pop continent lifeExp gdpPercap
## 90 Bahrain 1977 297410 Asia 65.59 19340.1
## 100 Bangladesh 1967 62821884 Asia 43.45 721.2
## 1182 Panama 1977 1839782 Americas 68.68 5351.9
## 1191 Paraguay 1962 2009813 Americas 64.36 2148.0
## 1269 Reunion 1992 622191 Africa 73.61 6101.3
## 1695 Zimbabwe 1962 4277736 Africa 52.36 527.3
tail(gDat)
## country year pop continent lifeExp gdpPercap
## 1699 Zimbabwe 1982 7636524 Africa 60.36 788.9
## 1700 Zimbabwe 1987 9216418 Africa 62.35 706.2
## 1701 Zimbabwe 1992 10704340 Africa 60.38 693.4
## 1702 Zimbabwe 1997 11404948 Africa 46.81 792.4
## 1703 Zimbabwe 2002 11926563 Africa 39.99 672.0
## 1704 Zimbabwe 2007 12311143 Africa 43.49 469.7
Basic facts about the data:
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
summary(gDat)
## country year pop continent
## Afghanistan: 12 Min. :1952 Min. :6.00e+04 Africa :624
## Albania : 12 1st Qu.:1966 1st Qu.:2.79e+06 Americas:300
## Algeria : 12 Median :1980 Median :7.02e+06 Asia :396
## Angola : 12 Mean :1980 Mean :2.96e+07 Europe :360
## Argentina : 12 3rd Qu.:1993 3rd Qu.:1.96e+07 Oceania : 24
## Australia : 12 Max. :2007 Max. :1.32e+09
## (Other) :1632
## lifeExp gdpPercap
## Min. :23.6 Min. : 241
## 1st Qu.:48.2 1st Qu.: 1202
## Median :60.7 Median : 3532
## Mean :59.5 Mean : 7215
## 3rd Qu.:70.8 3rd Qu.: 9325
## Max. :82.6 Max. :113523
##
colnames(gDat)
## [1] "country" "year" "pop" "continent" "lifeExp" "gdpPercap"
dim(gDat)
## [1] 1704 6
nrow(gDat)
## [1] 1704
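The output above suggests a balanced panel: 142 countries, 12 rows for each country that summary() lists, and 1704 = 142 x 12 rows in total. A minimal sketch of how that could be checked follows. The `toy` data frame and its values are made up for illustration and only mimic the column layout of gDat.

```r
# Toy stand-in for gDat: made-up countries and continents, not real data
toy <- expand.grid(year = c(1952, 1957, 1962),
                   country = c("A", "B", "C", "D"),
                   stringsAsFactors = FALSE)
toy$continent <- ifelse(toy$country %in% c("A", "B"), "X", "Y")

nCountry <- length(unique(toy$country))  # number of distinct countries
nYear <- length(unique(toy$year))        # number of distinct years

# In a balanced panel every country appears exactly once per year
isBalanced <- nrow(toy) == nCountry * nYear

# Rows per country; all counts are equal when the panel is balanced
rowsPerCountry <- table(toy$country)
print(rowsPerCountry)
```

Replacing `toy` with gDat should run the same check on the real data.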
Differences in life expectancy across continents in 2007:
library(lattice)
bwplot(~lifeExp | continent, data = gDat, subset = year == 2007, layout = c(1,
5))
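A numeric companion to the box plot can make the comparison easier to report. The sketch below computes the median life expectancy per continent for one year with tapply(). The `toy` data frame is hypothetical and its values are made up; the same calls should work with gDat in its place.

```r
# Toy stand-in for gDat: made-up values, same column names
toy <- data.frame(
  continent = c("X", "X", "X", "Y", "Y", "Y"),
  year      = c(2007, 2007, 2002, 2007, 2007, 2007),
  lifeExp   = c(50, 54, 40, 70, 80, 75)
)

# Keep only the 2007 rows, mirroring subset = year == 2007 in the plot call
d07 <- subset(toy, year == 2007)

# Median life expectancy within each continent
medByCont <- tapply(d07$lifeExp, d07$continent, median)
print(medByCont)
```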
The trend in life expectancy versus GDP per capita, and how countries on different continents fared, in 2007:
xyplot(lifeExp ~ gdpPercap, data = gDat, subset = year == 2007, groups = continent,
    auto.key = TRUE)
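The summary() output shows gdpPercap ranging from 241 to 113523 with a median of only 3532, so most points are likely to crowd the left edge of a linear axis. One possible variation, not part of the original analysis, is to put the x axis on a log scale through lattice's `scales` argument. The `toy` data frame below is made up for illustration.

```r
library(lattice)

# Toy stand-in for the 2007 rows of gDat: made-up values
toy <- data.frame(
  gdpPercap = c(500, 800, 1000, 5000, 20000, 40000),
  lifeExp   = c(45, 50, 55, 68, 78, 81),
  continent = factor(c("X", "X", "X", "Y", "Y", "Y"))
)

# Same plot as above, but with a base-10 log scale on the x axis
p <- xyplot(lifeExp ~ gdpPercap, data = toy, groups = continent,
            scales = list(x = list(log = 10)), auto.key = TRUE)
print(p)
```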