STAT 545A Homework 2

Yumian Hu


2013-09-13

Data import

This homework uses the dataset gapminderDataFiveYear.txt to explore basic data import, descriptive statistics and plotting.

Make sure your working directory is set to the folder where this file is stored, or pass the absolute path of the file as the argument.

gDat <- read.delim("gapminderDataFiveYear.txt")
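A minimal sketch of the same import step on a small stand-in file, for readers who want to try it without the real data. The file name toyGapminder.txt and all of its values are made up for illustration. Note that since R 4.0.0 read.delim() no longer converts strings to factors by default, so stringsAsFactors = TRUE is needed to reproduce the factor columns shown in this report.

```r
# Hypothetical toy file standing in for gapminderDataFiveYear.txt;
# the values below are invented, not taken from the Gapminder data.
toy_path <- file.path(tempdir(), "toyGapminder.txt")
toy <- data.frame(
  country = c("A", "A", "B"),
  year    = c(2002L, 2007L, 2007L),
  lifeExp = c(70, 71, 65)
)
write.table(toy, toy_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Confirm the file is where we expect before trying to read it.
stopifnot(file.exists(toy_path))

# stringsAsFactors = TRUE makes 'country' a factor, as in the str() output
# of this report; with the R >= 4.0.0 default it would be character.
toyDat <- read.delim(toy_path, stringsAsFactors = TRUE)
str(toyDat)
```

Calling getwd() shows the current working directory if file.exists() returns FALSE for a relative path.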

Basic statistics

Let's use the R functions str() and summary() to explore the dataset further.

str(gDat)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
summary(gDat)
##         country          year           pop              continent  
##  Afghanistan:  12   Min.   :1952   Min.   :6.00e+04   Africa  :624  
##  Albania    :  12   1st Qu.:1966   1st Qu.:2.79e+06   Americas:300  
##  Algeria    :  12   Median :1980   Median :7.02e+06   Asia    :396  
##  Angola     :  12   Mean   :1980   Mean   :2.96e+07   Europe  :360  
##  Argentina  :  12   3rd Qu.:1993   3rd Qu.:1.96e+07   Oceania : 24  
##  Australia  :  12   Max.   :2007   Max.   :1.32e+09                 
##  (Other)    :1632                                                   
##     lifeExp       gdpPercap     
##  Min.   :23.6   Min.   :   241  
##  1st Qu.:48.2   1st Qu.:  1202  
##  Median :60.7   Median :  3532  
##  Mean   :59.5   Mean   :  7215  
##  3rd Qu.:70.8   3rd Qu.:  9325  
##  Max.   :82.6   Max.   :113523  
## 

Note: summary() returns the minimum, quartiles, mean and maximum for numeric variables, and the number of observations per level for factors.
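The difference between the two kinds of summary can be seen on a tiny example. This is only a sketch: the data frame toy and its values are invented for illustration.

```r
# Invented toy data: one factor column and one numeric column.
toy <- data.frame(
  continent = factor(c("Asia", "Asia", "Europe")),
  lifeExp   = c(50, 60, 70)
)

# For a factor, summary() counts the observations in each level.
cont_counts <- summary(toy$continent)
print(cont_counts)

# For a numeric vector, summary() gives min, quartiles, mean and max.
life_summary <- summary(toy$lifeExp)
print(life_summary)
```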

Figures

“A picture is worth a thousand words.” We will make a few figures from subsets of the dataset. Make sure the lattice package is installed on your computer.
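One way to check for the package before loading it is sketched below. lattice ships with standard R distributions as a recommended package, so the check normally succeeds. The variable name has_lattice is just an illustrative choice.

```r
# requireNamespace() returns TRUE if the package can be loaded,
# without attaching it to the search path.
has_lattice <- requireNamespace("lattice", quietly = TRUE)

if (has_lattice) {
  library(lattice)
} else {
  message("lattice is not installed; try install.packages(\"lattice\")")
}
```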

Scatterplot

Let's plot lifeExp against gdpPercap for Colombia across all years in the data (1952 to 2007). A smooth curve is also fitted to show the trend.

library(lattice)
## Warning: package 'lattice' was built under R version 3.0.1
xyplot(lifeExp ~ gdpPercap, gDat, subset = country == "Colombia", type = c("p", 
    "smooth"))

[Figure: scatterplot of lifeExp against gdpPercap for Colombia, with a smooth curve]

Stripplot

We can also compare lifeExp across the five continents in six selected years.

stripplot(lifeExp ~ continent | as.factor(year), gDat, subset = year %in% c(1957, 
    1967, 1977, 1987, 1997, 2007), layout = c(2, 3), auto.key = TRUE, grid = TRUE, 
    type = c("p", "a"))

[Figure: strip plots of lifeExp by continent, one panel per selected year]
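In the stripplot() call, type "a" asks lattice to join the group averages within each panel. As far as I understand, the default average is the mean. The sketch below computes the equivalent per-continent, per-year means directly with aggregate(). The data frame toy and its values are invented for illustration and are not Gapminder figures.

```r
# Invented toy data with two continents in a single year.
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  year      = c(2007, 2007, 2007, 2007),
  lifeExp   = c(60, 70, 75, 79)
)

# Mean lifeExp for each continent-year combination; these are the values
# the averaging line would pass through in the corresponding panel.
avgLifeExp <- aggregate(lifeExp ~ continent + year, data = toy, FUN = mean)
print(avgLifeExp)
```

The same call with data = gDat would give the averages for the real dataset.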