I know it is not particularly interesting, but in the interest of learning ggplot2 I simply recreated my plots from assignment 4. However, I also visualized some aspects of the data that I couldn't easily show with lattice.
Load libraries and data:
library(plyr)
library(ggplot2)
library(lattice)
library(scales)
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.delim(file = gdURL, stringsAsFactors = TRUE)  # keep country/continent as factors (not the default in R >= 4.0)
str(gDat)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Let's remove Oceania from the dataset:
gDat <- droplevels(subset(gDat, continent != "Oceania"))
summary(gDat$continent)
## Africa Americas Asia Europe
## 624 300 396 360
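As a small illustration of why droplevels() is wrapped around subset(), here is a sketch on a made-up factor (the toy values are mine, not taken from the Gapminder data). Subsetting alone keeps the unused level, which would otherwise show up as a zero count or an empty panel.

```r
# Toy factor standing in for gDat$continent (values invented for illustration)
continent <- factor(c("Africa", "Asia", "Oceania", "Asia", "Europe"))

# Subsetting removes the elements but keeps "Oceania" as an empty level
kept <- continent[continent != "Oceania"]
levels(kept)
table(kept)

# droplevels() discards levels that no longer occur
cleaned <- droplevels(kept)
levels(cleaned)
```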
Since most of my plots are stripplot-like, I thought I'd add this one as an example of a scatterplot.
gdpLifeExpSPlot <- ggplot(gDat, aes(gdpPercap, lifeExp)) +
  scale_x_log10() +
  geom_point(aes(colour = year, size = pop), alpha = 0.75)
print(gdpLifeExpSPlot)
The correlation between GDP per capita and life expectancy is clearly visible. It is interesting to see how the more recent years are all heading towards greater life expectancy, even when GDP is not always following suit.
Let's look at mean life expectancy on different continents. This is adapted from Jenny's Homework 3 example (Compute a trimmed mean of life expectancy for different years) with continent added to make it a bit more interesting.
This is my original plot (with panels in table order):
meanLifePerYearCont <- ddply(gDat, ~year + continent, summarize, meanLifeExp = mean(lifeExp))
xyplot(meanLifeExp ~ year | continent, meanLifePerYearCont, type = c("p", "r"),
       as.table = TRUE)
Aside from the African continent, there is a clear trend of increasing life expectancy throughout. Even Africa seems to be back on track, judging by the last data point, which shows higher life expectancy.
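The trend that the regression lines show can also be read off as a number. Below is a rough sketch on invented yearly means for one hypothetical continent (not real Gapminder values); it fits the same kind of straight line that type = "r" draws in a lattice panel and extracts the slope.

```r
# Invented yearly means for one hypothetical continent (not real data)
toyMeans <- data.frame(
  year        = seq(1952, 2007, by = 5),
  meanLifeExp = c(39, 41, 43, 45, 47, 50, 52, 53, 54, 54, 53, 55)
)

# Ordinary least-squares line of mean life expectancy against year
fit <- lm(meanLifeExp ~ year, data = toyMeans)

# Slope: estimated change in mean life expectancy per calendar year
slope <- unname(coef(fit)["year"])
slope
```

With the real summary, the same call could presumably be run per continent, for example on subset(meanLifePerYearCont, continent == "Africa").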
Compared to ggplot2:
lifeExpPlot <- ggplot(meanLifePerYearCont, aes(year, meanLifeExp)) +
  geom_point() +
  stat_smooth(method = "lm") +
  facet_wrap(~continent)
print(lifeExpPlot)
The code is a bit longer and I have to specify that I want a facet, but otherwise it does not seem much more difficult than xyplot.
Now that we have that, I wished I could show the spread across individual countries rather than just the mean. The layering aspect of ggplot2 makes this easy.
lifeExpPlot <- ggplot(gDat, aes(year, lifeExp)) +
  geom_point(colour = "green", alpha = 0.25) +
  facet_wrap(~continent)
# rename the mean column so it matches the y aesthetic (lifeExp) of the base plot
names(meanLifePerYearCont)[3] <- "lifeExp"
lifeExpPlot <- lifeExpPlot +
  geom_point(data = meanLifePerYearCont) +
  stat_smooth(data = meanLifePerYearCont, method = "lm")
print(lifeExpPlot)
This way we can see whether any dips in life expectancy reflect an overall trend across all the countries or are just due to a few outliers. It also calls into question whether it makes sense to group countries by continent when you see how far apart some countries on the same continent are from each other (see Africa and Asia).
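An alternative to renaming the summary column, sketched here on invented toy data: each ggplot2 layer can take its own data and its own aes(), so the y mapping can be overridden just for the layers that draw the means. The data frames toyCountries and toyMeans are hypothetical stand-ins for gDat and meanLifePerYearCont.

```r
library(ggplot2)

# Invented stand-ins for gDat and meanLifePerYearCont
toyCountries <- data.frame(
  year      = rep(c(1952, 1977, 2007), times = 4),
  continent = rep(c("Africa", "Asia"), each = 6),
  lifeExp   = c(38, 45, 50, 42, 50, 58, 45, 60, 72, 50, 65, 78)
)
toyMeans <- aggregate(lifeExp ~ year + continent, data = toyCountries, FUN = mean)
names(toyMeans)[3] <- "meanLifeExp"

p <- ggplot(toyCountries, aes(year, lifeExp)) +
  geom_point(colour = "green", alpha = 0.25) +
  # these two layers use the summary data and remap y to its own column
  geom_point(data = toyMeans, aes(y = meanLifeExp)) +
  stat_smooth(data = toyMeans, aes(y = meanLifeExp),
              method = "lm", formula = y ~ x) +
  facet_wrap(~continent)
print(p)
```

The trade-off is a slightly longer plotting call in exchange for leaving the summary data frame untouched.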
I'm using Jenny's example from homework 3 (Get the maximum and minimum of GDP per capita for all continents in a “tall” format).
Next, let's apply Jenny's code with year added so we have more to talk about:
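gdpMaxMin <- ddply(gDat, ~continent, function(x) {
  # pick the rows holding the minimum and maximum directly, so each year
  # stays paired with its own statistic
  idx <- c(which.min(x$gdpPercap), which.max(x$gdpPercap))
  data.frame(year = x$year[idx], gdpPercap = x$gdpPercap[idx], stat = c("min", "max"))
})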
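A minimal sketch of one way to pair each extreme with the year from its own row, using which.min() and which.max() on invented toy values. Comparing a column against the two-element result of range() with == relies on vector recycling and returns matches in row order rather than min/max order, so indexing by position is the safer pattern.

```r
# Invented toy data for one hypothetical continent
toy <- data.frame(
  year      = c(1952, 1957, 1962, 1967),
  gdpPercap = c(900, 5000, 300, 1200)
)

# Row positions of the smallest and largest GDP per capita
idx <- c(min = which.min(toy$gdpPercap), max = which.max(toy$gdpPercap))

# Each statistic now carries the year from its own row
extremes <- data.frame(
  stat      = names(idx),
  year      = toy$year[idx],
  gdpPercap = toy$gdpPercap[idx]
)
extremes

# For comparison: the recycled == lookup returns years in row order,
# which here does not line up with the (min, max) order of range()
recycled <- toy$year[toy$gdpPercap == range(toy$gdpPercap)]
recycled
```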
This is my original plot:
stripplot(gdpPercap ~ continent, gdpMaxMin, groups = year, auto.key = TRUE)
Each point represents a year as well as a location. It is striking to see how rich Asia's richest country was in the 1950s, even by the standards of today's (2007) richest countries. Also striking is that for Africa and the Americas, some of the most recent years (2007, 2002) are also the poorest since 1952.
Again, here is a reproduction of the plot in ggplot2:
gdpMaxMinPlot <- ggplot(gdpMaxMin, aes(continent, gdpPercap)) + geom_point(aes(colour = year))
print(gdpMaxMinPlot)
I couldn't get it exactly the same: the years are represented on a gradient. However, it is quite similar and not too different from the lattice plot.
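One way to get closer to the lattice legend, sketched here on invented toy data: mapping colour to factor(year) makes ggplot2 treat the years as discrete categories, so each year gets its own colour and legend entry instead of a position on a gradient. The data frame toyMaxMin is a hypothetical stand-in for gdpMaxMin.

```r
library(ggplot2)

# Invented stand-in for gdpMaxMin (values are illustrative only)
toyMaxMin <- data.frame(
  continent = rep(c("Africa", "Asia"), each = 2),
  year      = c(2002, 1977, 1952, 1957),
  gdpPercap = c(250, 22000, 330, 110000),
  stat      = rep(c("min", "max"), times = 2)
)

# factor(year) switches the colour scale from continuous to discrete
p <- ggplot(toyMaxMin, aes(continent, gdpPercap)) +
  geom_point(aes(colour = factor(year))) +
  labs(colour = "year")
print(p)
```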
To be honest, I wasn't very happy with the original lattice plot. I wanted to highlight both the min and max whilst showing the overall spread of all the points in a violin plot. However, I didn't know how to add additional layers in lattice (something that should be easily done in ggplot2).
gdpMaxMinPlot <- ggplot(gDat, aes(continent, gdpPercap)) + geom_violin()
# colour by year only in the min/max layer; the violins summarise all years
gdpMaxMinPlot <- gdpMaxMinPlot +
  geom_point(data = gdpMaxMin, aes(colour = year), size = 4)
print(gdpMaxMinPlot)
Now you can get a better sense of the spread in GDP within continents. It is clear that Asia has a country with a maximum GDP far greater than any other, but the plot also shows that this country is an extreme outlier. It is also interesting to note that for all of the continents (even Europe, the most even continent in terms of GDP) the base is always wider, indicating that most countries are closer to the minimum GDP.
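The "wider base" reading can be checked numerically. A rough sketch on invented, right-skewed toy values: if the median sits below the midpoint between the minimum and the maximum, at least half of the observations lie in the lower half of the range.

```r
# Invented right-skewed toy values for two hypothetical groups
toy <- data.frame(
  continent = rep(c("A", "B"), each = 5),
  gdpPercap = c(300, 500, 800, 1500, 20000, 1000, 3000, 5000, 9000, 50000)
)

# Per group: is the median below the midpoint of the range?
belowMid <- tapply(toy$gdpPercap, toy$continent, function(v) {
  median(v) < mean(range(v))
})
belowMid
```

With the real data, the same function could be applied via tapply(gDat$gdpPercap, gDat$continent, ...).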