In this article, we perform some basic analysis of the Gapminder data. The data originates from Gapminder: http://www.gapminder.org/. For this project, we use the copy from this repository: http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt
First, we bring the Gapminder data into R.
gDat <- read.delim(file = "~/Revolution/gapminderDataFiveYear.txt", sep = "\t")
We usually start by getting an idea of the structure of the gDat object using the str() command.
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
The dimensions and column names are obtained with the following:
dim(gDat)
## [1] 1704 6
names(gDat)
## [1] "country" "year" "pop" "continent" "lifeExp" "gdpPercap"
A statistical overview of each column can be obtained with summary():
summary(gDat)
## country year pop continent
## Afghanistan: 12 Min. :1952 Min. :6.00e+04 Africa :624
## Albania : 12 1st Qu.:1966 1st Qu.:2.79e+06 Americas:300
## Algeria : 12 Median :1980 Median :7.02e+06 Asia :396
## Angola : 12 Mean :1980 Mean :2.96e+07 Europe :360
## Argentina : 12 3rd Qu.:1993 3rd Qu.:1.96e+07 Oceania : 24
## Australia : 12 Max. :2007 Max. :1.32e+09
## (Other) :1632
## lifeExp gdpPercap
## Min. :23.6 Min. : 241
## 1st Qu.:48.2 1st Qu.: 1202
## Median :60.7 Median : 3532
## Mean :59.5 Mean : 7215
## 3rd Qu.:70.8 3rd Qu.: 9325
## Max. :82.6 Max. :113523
##
Load the lattice library to produce some visualizations and extract useful information from the data object:
library("lattice")
We will analyze life expectancy and GDP per capita for Algeria and Egypt. First, let us have a look at the life expectancy in the two countries:
# %in% tests membership; == would recycle the two names along the rows and silently drop observations
xyplot(lifeExp ~ year | country, subset = country %in% c("Algeria", "Egypt"),
       data = gDat, type = c("p", "l"))
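When selecting rows for several countries at once, it matters whether the comparison uses `==` or `%in%`. The following is a minimal sketch on a toy data frame (the names `toy` and `wanted` and the "Chad" rows are made up for illustration and are not Gapminder values). It shows that `==` recycles the vector of names along the rows, while `%in%` tests each row for membership:

```r
# Hypothetical toy data: four rows per country, no real Gapminder values
toy <- data.frame(
  country = rep(c("Algeria", "Egypt", "Chad"), each = 4),
  year    = rep(c(1952, 1957, 1962, 1967), times = 3)
)
wanted <- c("Algeria", "Egypt")

# `==` recycles `wanted`: row 1 vs "Algeria", row 2 vs "Egypt", row 3 vs "Algeria", ...
n_eq <- sum(toy$country == wanted)

# `%in%` asks, for every row, whether its country is one of `wanted`
n_in <- sum(toy$country %in% wanted)

print(c(equals = n_eq, in_set = n_in))
```

In this toy example `==` keeps only every other row of the two countries, so `%in%` is the safer choice inside the `subset` argument of xyplot().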
Clearly, life expectancy increases substantially over time. If we look at GDP per capita for both countries over the same period, we observe a broadly similar upward trend, as shown below:
xyplot(gdpPercap ~ year | country, subset = country %in% c("Algeria", "Egypt"),
       data = gDat, type = c("p", "l"))
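To compare the two countries more directly, one option is to draw them in a single panel instead of side by side. The sketch below assumes a data frame with the same column names as gDat; the helper name `plot_compare` is hypothetical and not part of lattice. It relies on the `groups` and `auto.key` arguments of xyplot():

```r
library(lattice)

# Hypothetical helper: overlay GDP per capita curves for the chosen countries
plot_compare <- function(dat, countries) {
  # Keep only the requested countries and rebuild the factor so that
  # the legend lists just those countries
  sub <- dat[dat$country %in% countries, ]
  sub$country <- factor(sub$country)
  xyplot(gdpPercap ~ year, data = sub, groups = country,
         type = c("p", "l"),
         auto.key = list(columns = length(countries)))
}

# Usage, assuming gDat has been loaded as above:
# print(plot_compare(gDat, c("Algeria", "Egypt")))
```

A shared panel puts both series on the same axes, which makes differences in level easier to see than in two separate panels.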
We might suspect that the increase in life expectancy is related to the improvement in GDP per capita in both countries, as illustrated below. The plot shows an association rather than a proven cause, and the same pattern might not hold for other countries:
xyplot(lifeExp ~ gdpPercap | country, subset = country %in% c("Algeria", "Egypt"),
       data = gDat, type = c("p", "l"))
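The visual impression can be complemented with a number. As a rough sketch, the Pearson correlation between GDP per capita and life expectancy can be computed separately for each country. The helper `cor_by_country` is a hypothetical name introduced here, and it assumes a data frame with the same column names as gDat. A high correlation indicates a linear association only, not causation:

```r
# Hypothetical helper: Pearson correlation of gdpPercap and lifeExp per country
cor_by_country <- function(dat, countries) {
  sub <- dat[dat$country %in% countries, ]
  # as.character() avoids empty groups for unused factor levels
  groups <- split(sub, as.character(sub$country))
  sapply(groups, function(d) cor(d$gdpPercap, d$lifeExp))
}

# Usage, assuming gDat has been loaded as above:
# cor_by_country(gDat, c("Algeria", "Egypt"))
```

The result is a named numeric vector with one entry per country, which makes it easy to extend the comparison to more countries.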