STAT545A Homework 2

This report documents some preliminary explorations of the Gapminder data as provided in gapminderDataFiveYear.txt.

Data Structure

The Gapminder dataset being explored contains information on the population, life expectancy, and GDP per capita of 142 countries across 5 continents, with one record per country every 5 years from 1952 to 2007. The dataset has 1704 entries in total (142 countries × 12 years).
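The counts quoted above can be checked directly in R. The following is a minimal sketch, not the code used to produce this report: the helper name `structure_summary` is hypothetical, and the loading step assumes that gapminderDataFiveYear.txt is tab-delimited and sits in the working directory.

```r
# Hypothetical helper (not part of the original report): collects the
# structural facts quoted in the text from a Gapminder-style data frame.
structure_summary <- function(dat) {
  list(
    n_rows              = nrow(dat),                      # total entries
    n_countries         = length(unique(dat$country)),
    n_continents        = length(unique(dat$continent)),
    years               = sort(unique(dat$year)),         # distinct years
    records_per_country = range(table(dat$country))       # min and max
  )
}

# Assumption: the file is tab-delimited and in the working directory.
if (file.exists("gapminderDataFiveYear.txt")) {
  gDat <- read.delim("gapminderDataFiveYear.txt")
  str(structure_summary(gDat))
}
```

If the data match the description, `n_rows` should be 1704, `n_countries` 142, `n_continents` 5, and both values of `records_per_country` should be 12.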

Each data entry records 6 types of information (i.e. variables), identified by the following code names, with a description where the name is less than self-explanatory: country, continent, year, pop (population), lifeExp (life expectancy in years), and gdpPercap (GDP per capita).

Basic summary statistics

A first look at the distribution of each variable can be obtained with the summary() function in R:

summary(gDat)
##         country          year           pop              continent  
##  Afghanistan:  12   Min.   :1952   Min.   :6.00e+04   Africa  :624  
##  Albania    :  12   1st Qu.:1966   1st Qu.:2.79e+06   Americas:300  
##  Algeria    :  12   Median :1980   Median :7.02e+06   Asia    :396  
##  Angola     :  12   Mean   :1980   Mean   :2.96e+07   Europe  :360  
##  Argentina  :  12   3rd Qu.:1993   3rd Qu.:1.96e+07   Oceania : 24  
##  Australia  :  12   Max.   :2007   Max.   :1.32e+09                 
##  (Other)    :1632                                                   
##     lifeExp       gdpPercap     
##  Min.   :23.6   Min.   :   241  
##  1st Qu.:48.2   1st Qu.:  1202  
##  Median :60.7   Median :  3532  
##  Mean   :59.5   Mean   :  7215  
##  3rd Qu.:70.8   3rd Qu.:  9325  
##  Max.   :82.6   Max.   :113523  
## 

The following figure provides a closer look at the GDP per capita by continent over the years, along with linear regression lines to help visually assess the rate of increase.
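A figure of this kind, and a numeric counterpart to the visual assessment, could be produced along the following lines. This is a sketch under assumptions rather than the code behind the original figure: the plotting library (lattice) is a guess, the helper name `slope_by_continent` is hypothetical, and the data file is assumed to be tab-delimited and in the working directory.

```r
# Hypothetical helper (not from the original report): fits
# gdpPercap ~ year separately for each continent and returns the slope,
# i.e. the estimated change in GDP per capita per year.
slope_by_continent <- function(dat) {
  sapply(split(dat, dat$continent), function(d) {
    coef(lm(gdpPercap ~ year, data = d))[["year"]]
  })
}

# Assumption: tab-delimited file in the working directory.
if (file.exists("gapminderDataFiveYear.txt")) {
  gDat <- read.delim("gapminderDataFiveYear.txt")
  print(slope_by_continent(gDat))

  # Assumption: lattice is used for plotting; type "p" draws the points
  # and type "r" adds a least-squares line in each continent panel.
  if (requireNamespace("lattice", quietly = TRUE)) {
    print(lattice::xyplot(gdpPercap ~ year | continent, data = gDat,
                          type = c("p", "r")))
  }
}
```

Each slope is in units of GDP per capita per year, so comparing them across continents puts a number on the differences in rate of increase that the regression lines show.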

[Figure: GDP per capita by year for each continent, with linear regression lines]