Homework #4: Data visualization
Set up the data and libraries
library(plyr)
library(xtable)
library(lattice)
library(directlabels)
# Import the Gapminder dataset
gDat <- read.delim("gapminderDataFiveYear.txt")
Part 1: Examining how the range of GDP values changes over time
- First, I create a data frame that splits the data by continent and year and, for each continent-year combination, calculates the size of the range of GDP per capita values. I calculate the size of the range by subtracting the minimum GDP per capita from the maximum.
GDP.table <- ddply(gDat, ~ continent + year, summarize,
                   Range.of.GDP = max(gdpPercap) - min(gdpPercap))
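For readers without plyr, the same continent-by-year range can be sketched in base R with aggregate(). This is an alternative to the ddply call, not the method used here, and the tiny data frame below is made up purely for illustration (it is not Gapminder data).

```r
# Hypothetical toy data (not Gapminder values), just to show the computation
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe", "Asia", "Asia"),
  year      = c(1952, 1952, 1952, 1952, 1957, 1957),
  gdpPercap = c(400, 1200, 3000, 9000, 500, 1500)
)

# Size of the range = maximum minus minimum, computed within each continent/year cell
range.size <- function(v) max(v) - min(v)
toy.range <- aggregate(gdpPercap ~ continent + year, data = toy, FUN = range.size)
names(toy.range)[3] <- "Range.of.GDP"
print(toy.range)
```

Continent-year combinations with no rows (Europe in 1957 here) simply do not appear in the result.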
- I now plot this as a lattice strip plot. Points are colored by continent, and type = c("p", "a") draws the points and also connects each continent's values across years to show the progression over time.
stripplot(Range.of.GDP ~ as.factor(year), GDP.table, groups = continent, type = c("p", "a"), auto.key = TRUE)
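- As we can see, the range has decreased substantially in Asia but has generally increased over time in the other continents. Because the Gapminder GDP per capita figures are already adjusted for inflation, inflation is unlikely to explain this; the widening reflects a real increase in the gap between the richest and poorest countries in those continents.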
Part 2: Examining change in GDP and life expectancy
- Here I'm looking at how GDP per capita and life expectancy have changed over time in each country, measured by the slope of a linear regression of each variable on year. This will help identify countries that have gained GDP per capita but lost life expectancy, or vice versa.
- First, I create a function (adapted from one provided in lecture) that fits a linear regression of each variable on year and extracts the slope.
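# Reference year: the regressions use years elapsed since the first survey year
yearMin <- min(gDat$year)
# For one country's rows, fit life expectancy and GDP per capita against year
# and return the two slopes as a named vector
jFun <- function(x) {
  LE.Fit <- lm(lifeExp ~ I(year - yearMin), x)
  GDP.Fit <- lm(gdpPercap ~ I(year - yearMin), x)
  # coef() returns c(intercept, slope); drop its names so the result
  # is labeled only by the names given below
  LE.Coef <- coef(LE.Fit)
  GDP.Coef <- coef(GDP.Fit)
  names(LE.Coef) <- NULL
  names(GDP.Coef) <- NULL
  return(c(Life.Expectancy.slope = LE.Coef[2],
           GDP.slope = GDP.Coef[2]))
}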
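A quick way to sanity-check the slope extraction is to run it on data whose slopes are known in advance. The sketch below uses made-up numbers and a hypothetical slopeFun that mirrors jFun but takes the reference year as an argument. It also shows that subtracting a reference year such as yearMin only shifts the intercept, so the slopes are unchanged.

```r
# Hypothetical one-country data (not Gapminder values): life expectancy rises
# exactly 0.3 years per year and GDP per capita falls exactly 20 per year
yrs <- c(1952, 1957, 1962, 1967)
toy.country <- data.frame(
  year      = yrs,
  lifeExp   = 40 + 0.3 * (yrs - 1952),
  gdpPercap = 2000 - 20 * (yrs - 1952)
)

# Hypothetical helper: same idea as jFun, with the reference year as an argument
slopeFun <- function(x, year0 = min(x$year)) {
  le  <- coef(lm(lifeExp ~ I(year - year0), data = x))
  gdp <- coef(lm(gdpPercap ~ I(year - year0), data = x))
  # Element 2 of coef() is the slope; unname() keeps only our labels
  c(Life.Expectancy.slope = unname(le[2]), GDP.slope = unname(gdp[2]))
}

res.centered <- slopeFun(toy.country)             # years measured from 1952
res.raw      <- slopeFun(toy.country, year0 = 0)  # raw calendar years
print(res.centered)
print(res.raw)
```

Both calls recover the slopes 0.3 and -20; only the intercepts (not returned) differ.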
- I then use ddply to make a table of slopes for each country, retaining the country name and continent.
slopes <- ddply(gDat, ~country + continent, jFun)
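The ddply call follows a split-apply-combine pattern: split the data by country (and continent), apply the slope function to each piece, and row-bind the results next to the grouping columns. A rough base-R equivalent is sketched below on two made-up countries. It assumes each country belongs to a single continent, so splitting by country alone is enough, and for brevity it computes each slope with the simple-regression identity slope = cov(year, y) / var(year), which gives the same value as the lm() slope.

```r
# Hypothetical two-country toy data (not Gapminder values)
toy <- data.frame(
  country   = rep(c("A", "B"), each = 3),
  continent = rep(c("Africa", "Asia"), each = 3),
  year      = rep(c(1952, 1957, 1962), times = 2),
  lifeExp   = c(40, 41, 42, 50, 49, 48),
  gdpPercap = c(500, 600, 700, 3000, 2500, 2000)
)

# Closed-form least-squares slope for a simple regression of y on x
ols.slope <- function(x, y) cov(x, y) / var(x)

# Split by country, compute both slopes per piece, then bind the rows together
pieces <- split(toy, toy$country)
toy.slopes <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(country               = d$country[1],
             continent             = d$continent[1],
             Life.Expectancy.slope = ols.slope(d$year, d$lifeExp),
             GDP.slope             = ols.slope(d$year, d$gdpPercap))
}))
rownames(toy.slopes) <- NULL
print(toy.slopes)
```

Country A gains 0.2 years of life expectancy and 20 units of GDP per capita per year; country B loses 0.2 and 100 respectively.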
- Next, I plot the slopes with lattice's xyplot function. Points are colored by continent, and a key is added. A custom panel function overlays vertical and horizontal lines at zero, dividing the plot into quadrants that separate countries gaining from those losing GDP per capita or life expectancy.
- Note: most GDP slope values are clustered around zero. I couldn't find a way to rescale the axis that retains the sign of the values.
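xyplot(Life.Expectancy.slope ~ GDP.slope, slopes,
       groups = continent, auto.key = TRUE,
       panel = function(x, y, subscripts, groups, ...) {
         # Draw the points, colored by continent
         panel.xyplot(x, y, subscripts = subscripts,
                      groups = groups, ...)
         # Zero reference lines split the plot into gain/loss quadrants
         panel.abline(v = 0, h = 0)
       }
)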
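The quadrants drawn by the panel function can also be counted directly, and the most extreme GDP loser picked out by name. A minimal sketch on made-up slope values (not results from the Gapminder data); on the real data the same lines would be applied to the slopes table. Treating a slope of exactly zero as a gain is an arbitrary choice made here.

```r
# Hypothetical slopes for five made-up countries (not Gapminder results)
toy.slopes <- data.frame(
  country               = c("A", "B", "C", "D", "E"),
  GDP.slope             = c(120, 35, -15, 60, -400),
  Life.Expectancy.slope = c(0.30, 0.45, 0.20, -0.10, 0.25)
)

# Label each country by quadrant; a slope of exactly zero counts as a gain here
gdp.dir <- ifelse(toy.slopes$GDP.slope >= 0, "gain GDP", "lose GDP")
le.dir  <- ifelse(toy.slopes$Life.Expectancy.slope >= 0, "gain LE", "lose LE")
toy.slopes$quadrant <- paste(gdp.dir, le.dir, sep = ", ")
print(table(toy.slopes$quadrant))

# The country with the most negative GDP slope, i.e. the far-left outlier in the plot
outlier <- toy.slopes$country[which.min(toy.slopes$GDP.slope)]
print(outlier)
```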
- As we can see, most countries are gaining both GDP per capita and life expectancy. There is one extreme outlier with a typical gain in life expectancy but a large loss of GDP per capita: Kuwait, whose economy is closely tied to oil prices.
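On the note above about GDP slopes bunching near zero: one common option is a sign-preserving transform such as a signed logarithm, sign(x) * log10(1 + |x|), or base R's asinh(). Both map 0 to 0 and keep negative values negative while compressing large magnitudes. This is a suggestion rather than something used in the plot above; a minimal sketch with made-up values:

```r
# Signed log transform: keeps the sign of x, compresses large magnitudes,
# and maps 0 to 0 (log10(1) = 0), so a zero reference line stays at zero
signed.log10 <- function(x) sign(x) * log10(1 + abs(x))

gdp.slopes <- c(-1500, -10, 0, 10, 100, 1000)  # hypothetical values
print(signed.log10(gdp.slopes))
print(asinh(gdp.slopes))  # a similar sign-preserving alternative from base R
```

With the real data, the transform could be applied inside the formula, for example xyplot(Life.Expectancy.slope ~ signed.log10(GDP.slope), slopes, groups = continent, auto.key = TRUE). The vertical zero line stays in place, but the axis ticks are then in transformed units and should be labeled accordingly.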