Homework 5. Stat 545-Daniel Dinsdale

Here I present some plots of the Gapminder data made with the ggplot2 package, including a comparison between similar scatterplots made with ggplot2 and with the lattice package.

Preliminaries:

I used the Gapminder data mentioned above and the following three packages. I also created a new data frame, jDat, which is gDat with Oceania removed.

gDat <- read.delim("gapminderDataFiveYear.txt")
library(lattice)
library(plyr)
library(ggplot2)
jDat <- droplevels(subset(gDat, continent != "Oceania"))
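
The droplevels() call matters because subset() only filters rows: if continent is a factor, the empty Oceania level stays attached and can still show up in legends and facets. Here is a minimal sketch on an invented three-row data frame (toy is hypothetical and not part of the Gapminder file). Note that from R 4.0.0 read.delim() no longer converts strings to factors by default, so this only applies when continent really is a factor.

```r
# Toy stand-in for gDat; the values are invented for illustration only.
toy <- data.frame(
  continent = factor(c("Asia", "Europe", "Oceania")),
  lifeExp   = c(60, 70, 75)
)

kept    <- subset(toy, continent != "Oceania")  # rows removed, level kept
dropped <- droplevels(kept)                     # unused level removed too

levels(kept$continent)     # "Oceania" is still listed as a level
levels(dropped$continent)  # only the levels that actually occur remain
```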

Stripplot:

First I created two stripplots with ggplot2, overlaying the points on box plots of life expectancy for each year. The two plots show how easy it is to adjust colours in ggplot2: the first maps year to the colour of the outlines and points, the second maps it to the fill of the boxes. Note that Oceania is removed from the data here.

p1 <- ggplot(jDat, aes(x = factor(year), y = lifeExp, colour = year))
p1 + geom_boxplot() + geom_point()

p2 <- ggplot(jDat, aes(x = factor(year), y = lifeExp))
p2 + geom_boxplot(aes(fill = factor(year))) + geom_point()
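
Because geom_point() puts every observation for a year on the same vertical line, the points overlap heavily. One possible variation, not used in the plots here, is to swap in geom_jitter() so the points spread sideways. The sketch below runs on invented data (toy is a hypothetical stand-in for jDat), and the jitter width is an arbitrary choice.

```r
library(ggplot2)

# Invented data standing in for jDat: two years, ten observations each.
set.seed(1)
toy <- data.frame(
  year    = rep(c(1952, 2007), each = 10),
  lifeExp = c(rnorm(10, mean = 50, sd = 8), rnorm(10, mean = 68, sd = 8))
)

p <- ggplot(toy, aes(x = factor(year), y = lifeExp)) +
  # Hide the boxplot's own outlier marks, since the jitter layer draws every point.
  geom_boxplot(aes(fill = factor(year)), outlier.shape = NA) +
  # Jitter sideways only, so the life expectancy values stay exact.
  geom_jitter(width = 0.15, height = 0)
p
```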

Scatterplot:

Here I created two scatterplots of life expectancy against GDP per capita (on a log scale) for the years 1952 and 2007. I also coloured the points by continent and added a dotplot along the x-axis to show how densely the observations are spread across GDP per capita values.

jYear <- c(1952, 2007)
yDat <- subset(gDat, year %in% jYear)
p3 <- ggplot(yDat, aes(x = gdpPercap, y = lifeExp, colour = continent))
p3 + geom_point() + scale_x_log10() + geom_dotplot() + facet_grid(~year)
## stat_bindot: binwidth defaulted to range/30. Use 'binwidth = x' to adjust
## this. stat_bindot: binwidth defaulted to range/30. Use 'binwidth = x' to
## adjust this.
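
The dotplot layer bins along gdpPercap, so it shows where observations concentrate on the x-axis. A different technique with a similar purpose is geom_rug(), which draws one tick per observation along the axis. This is a substitute for illustration, not what the plot here does, and the toy data frame is invented.

```r
library(ggplot2)

# Invented values standing in for the 1952 and 2007 rows of gDat.
toy <- data.frame(
  year      = rep(c(1952, 2007), each = 4),
  continent = rep(c("Africa", "Asia", "Europe", "Americas"), times = 2),
  gdpPercap = c(500, 800, 6000, 4000, 1500, 5000, 30000, 12000),
  lifeExp   = c(39, 46, 64, 53, 55, 70, 78, 73)
)

p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() +
  geom_rug(sides = "b") +  # ticks along the bottom axis only
  scale_x_log10() +
  facet_grid(~ year)
p
```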

Lattice and ggplot2 comparison:

Here is a direct comparison between scatterplots of life expectancy against GDP per capita, first with ggplot2 and then with lattice. It is immediately obvious that ggplot2 needs less code to create a more attractive graph with a less obtrusive key. The only downside is that the x-axis labels on the ggplot2 graph use powers to represent the numbers.

p4g <- ggplot(jDat, aes(x = gdpPercap, y = lifeExp, colour = continent))
p4g + geom_point() + scale_x_log10()

p4l <- xyplot(lifeExp ~ gdpPercap, jDat, grid = TRUE, scales = list(x = list(log = 10, 
    equispaced.log = FALSE)), groups = continent, auto.key = TRUE)
p4l
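
If the power-style axis labels are unwelcome, one option is to give scale_x_log10() explicit breaks and a labelling function. This is only a sketch: plain_labels is a hypothetical helper name, the breaks are arbitrary, and toy is invented data rather than jDat.

```r
library(ggplot2)

# Hypothetical helper: write axis breaks as plain numbers with thousands separators.
plain_labels <- function(x) {
  format(x, big.mark = ",", scientific = FALSE, trim = TRUE)
}

# Invented data standing in for jDat.
toy <- data.frame(
  gdpPercap = c(500, 2000, 9000, 40000),
  lifeExp   = c(40, 55, 68, 80),
  continent = c("Africa", "Asia", "Americas", "Europe")
)

p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() +
  scale_x_log10(breaks = c(1000, 10000), labels = plain_labels)
p
```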

Some new graph types:

Finally I experimented with ggplot2. Here I present some graphs that I created with the package.

First up is a histogram of life expectancy for all continents except Oceania. Here I added a colour palette that changes with the count, that is, with the height of each bar.

p5 <- ggplot(jDat, aes(x = lifeExp))
p5 + geom_histogram(aes(fill = ..count..)) + scale_fill_gradient("Count", low = "green", 
    high = "red")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust
## this.
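
The stat_bin message is a reminder that the bin width was chosen automatically. A sketch of setting it by hand follows; the width of 5 years is an arbitrary choice and toy is invented data. It also uses after_stat(count), which recent ggplot2 releases (3.3.0 onwards) provide as the replacement for the older ..count.. notation.

```r
library(ggplot2)

# Invented life expectancy values standing in for jDat$lifeExp.
set.seed(42)
toy <- data.frame(lifeExp = runif(200, min = 30, max = 80))

p <- ggplot(toy, aes(x = lifeExp)) +
  # Fixed 5-year bins; fill follows the computed count of each bin.
  geom_histogram(aes(fill = after_stat(count)), binwidth = 5) +
  scale_fill_gradient("Count", low = "green", high = "red")
p
```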

Next a similar histogram is used, except this time the colours show what proportion of each bar is due to each continent. Following this, another plot shows the same results in a different manner. In my eyes this second plot is rather confusing, however!

p6 <- ggplot(jDat, aes(x = lifeExp, fill = continent))
p6 + geom_bar()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust
## this.

p6 + geom_bar(position = "fill")  #apparently this is a thing.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust
## this.
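
With position = "fill" the continents are stacked within each bin and the stack is rescaled to height 1, so the y-axis reads as a proportion and the number of observations per bin is no longer visible, which may be part of why the plot is harder to read. In current ggplot2 releases geom_bar() counts distinct x values instead of binning, so this sketch uses geom_histogram() to get binned bars. The toy data is invented and the bin width is arbitrary.

```r
library(ggplot2)

# Invented data standing in for jDat: two continents, sixty observations each.
set.seed(7)
toy <- data.frame(
  lifeExp   = c(runif(60, 35, 60), runif(60, 55, 80)),
  continent = rep(c("Africa", "Europe"), each = 60)
)

p <- ggplot(toy, aes(x = lifeExp, fill = continent)) +
  # Each bin is stacked by continent, then rescaled so the stack sums to 1.
  geom_histogram(binwidth = 5, position = "fill")
p
```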

Finally I produced a scatterplot of life expectancy against GDP per capita for each continent except Oceania. Each panel includes a smoothed conditional mean with confidence bounds.

p7 <- ggplot(jDat, aes(x = gdpPercap, y = lifeExp, colour = continent))
p7 + geom_point() + scale_x_log10() + geom_smooth(colour = "black") + facet_grid(~continent)
## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method. geom_smooth:
## method="auto" and size of largest group is <1000, so using loess. Use
## 'method = x' to change the smoothing method. geom_smooth: method="auto"
## and size of largest group is <1000, so using loess. Use 'method = x' to
## change the smoothing method. geom_smooth: method="auto" and size of
## largest group is <1000, so using loess. Use 'method = x' to change the
## smoothing method.
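
The geom_smooth message is printed once per panel because the smoother was picked automatically for each group. Naming the method makes that choice explicit. In this sketch the data is invented (toy is hypothetical); because the scale transformation is applied before the smoother is computed, the loess curve is fitted against log10 of gdpPercap.

```r
library(ggplot2)

# Invented data standing in for jDat: two continents, forty observations each.
set.seed(3)
toy <- data.frame(
  continent = rep(c("Africa", "Asia"), each = 40),
  gdpPercap = 10^runif(80, min = 2.5, max = 4.5)
)
toy$lifeExp <- 20 + 12 * log10(toy$gdpPercap) + rnorm(80, sd = 3)

p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() +
  scale_x_log10() +
  # Explicit smoother and formula; the ribbon is the default confidence band.
  geom_smooth(method = "loess", formula = y ~ x, colour = "black") +
  facet_grid(~ continent)
p
```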
