library(plyr)
library(ggplot2)
library(lattice)
library(xtable)
html_print <- function(x, ..., digits = 0, include.rownames = FALSE,
                       header = names(x)) {
  # Print an HTML table with a customized header.
  # x is a local copy inside the function, so renaming its columns here
  # leaves the caller's data frame untouched.
  names(x) <- header
  print(xtable(x, digits = digits, ...),
        type = 'html', include.rownames = include.rownames)
}
gDat <- read.table('data/gapminderDataFiveYear.txt',
sep='\t', quote='"', header=TRUE)
gDat <- droplevels(subset(gDat, continent!='Oceania'))
pExp <- ggplot(gDat, aes(x=year, y=lifeExp, color=continent))
pExp + geom_point(alpha=0.3) + geom_smooth(method='loess')
xyplot(lifeExp ~ year, gDat, groups=continent,
       type=c('p','smooth'),
       auto.key=TRUE)
Across continents, the life expectancy trend lines over time are quite distinct.
Similarities
Differences
The average life expectancy across all countries and years is 59.2623. This value is used as the cutoff for identifying low life expectancy countries.
# split a numeric vector at a threshold value, and count the lower and upper tier
count_splits <- function(x, threshold){
  # values strictly below the threshold fall in the lower tier
  tierSizes <- c(lower=sum(x < threshold),
                 upper=sum(x >= threshold))
  return(tierSizes)
}
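# cutoff for low life expectancy: the overall mean quoted above
# (assumed to be computed on gDat after Oceania is dropped)
benchmark <- mean(gDat$lifeExp)
lifeRatio <- ddply(gDat, ~continent + year, function(x){
  # number of countries below / at or above the cutoff in this group
  count <- count_splits(x$lifeExp, benchmark)
  ratio <- count/length(x$lifeExp)
  tier <- c('below', 'above')
  return(data.frame(tier, count, ratio))
})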
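To make the tier split concrete, here is a minimal sketch on a made-up vector of five life expectancies (illustrative values, not taken from the data), using the 59.2623 cutoff quoted above. The names lifeExpToy, cutoff, countToy and ratioToy are hypothetical. It mirrors what the function passed to ddply computes for each continent and year group.

```r
# Made-up life expectancies for one hypothetical continent-year group
lifeExpToy <- c(45.1, 52.3, 59.9, 63.0, 71.4)
cutoff <- 59.2623  # the overall mean quoted in the text

# Count values strictly below the cutoff and values at or above it
countToy <- c(below = sum(lifeExpToy < cutoff),
              above = sum(lifeExpToy >= cutoff))

# Share of the group that falls in each tier
ratioToy <- countToy / length(lifeExpToy)

print(data.frame(tier = names(countToy), count = countToy, ratio = ratioToy))
```

Two of the five toy values fall below the cutoff, so the 'below' ratio for this group is 0.4.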
pLowExpR <- ggplot(subset(lifeRatio, tier=='below'), aes(x=year, y=ratio, color=continent))
pLowExpR + geom_point() + geom_line()
xyplot(ratio ~ year, lifeRatio, groups=continent,
       subset=tier=='below',
       auto.key=list(columns=4),
       type='o',
       xlab="Year", ylab="Ratio of Low Life Expectancy",
       main="Trends in Low Life Expectancy Ratio in Different Continents")
The bar chart below captures roughly the same information as the line plot above, but it also shows the number of countries in each tier.
The downside is that it takes more space and ink.
barchart(as.character(year)~count|continent, lifeRatio, groups=tier,
         stack=TRUE,
         auto.key=list(columns=2, title='Life Expectancy Tier'),
         xlab="Life Expectancy Tier Count", ylab="Year",
         scales=list(x = list(relation="free")),
         main="Trends in Low Life Expectancy Ratio in Different Continents")
Same plot done with ggplot2:
pLowExpR2 <- ggplot(lifeRatio, aes(x=year, y=count, fill=tier))
pLowExpR2 + scale_fill_manual(values = c("below" = "orange","above" = "skyblue")) +
geom_bar(stat="identity") +
facet_wrap(~continent, scales='free')
Similarity
Difference
extremGdp <- ddply(gDat, ~ continent+year, function(x) {
gdpPercap <- range(x$gdpPercap)
return(data.frame(gdpPercap, stat = c("min", "max")))
})
#html_print(extremGdp)
xyplot(gdpPercap ~ year|stat, extremGdp, groups=continent,
       type=c('o','g'),
       scales=list(y = list(relation="free")),
       layout=c(1,2),
       xlab='Year', ylab='Extreme GDP per Capita',
       auto.key=list(columns=4))
xyplot(gdpPercap ~ year|continent, extremGdp, groups=stat,
       type=c('o','g'),
       layout=c(4,1),
       xlab='Year', ylab='Extreme GDP per Capita',
       auto.key=list(columns=2))
minGdpPlt <- ggplot(extremGdp,
aes(x=year, y=gdpPercap, color=continent, linetype=stat))
minGdpPlt + geom_point() + geom_line()
I intend to show the GDP per capita trends of the richest and poorest countries in each continent, so that both the magnitude of the difference and the steadily widening gap between the two sets of trend lines are visible.
I have not found a simple way to do this in lattice, but ggplot2 handles it neatly.
Difference
Here the distribution of GDP per capita by continent is plotted. The distributions for Africa, Europe and the Americas have clearly distinct means, while GDP per capita of Asian countries seems to be the most variable.
ggplot(gDat, aes(x=gdpPercap, fill=continent)) +
geom_density(alpha=0.3) +
scale_x_log10()
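The visual impression from the density plot can be cross-checked numerically. The following is a minimal sketch, not part of the original analysis: log_gdp_summary is a hypothetical helper, and it assumes a data frame with the columns continent and gdpPercap, as in gDat. It works on the log10 scale to match the x axis of the plot.

```r
# Hypothetical helper: mean and standard deviation of log10 GDP per capita
# for each continent. `dat` is assumed to have the columns `continent`
# and `gdpPercap`.
log_gdp_summary <- function(dat) {
  logGdp <- log10(dat$gdpPercap)
  # tapply returns one value per continent, in the same group order each time
  groupMean <- tapply(logGdp, dat$continent, mean)
  groupSd   <- tapply(logGdp, dat$continent, sd)
  data.frame(continent = names(groupMean),
             meanLog10 = as.vector(groupMean),
             sdLog10   = as.vector(groupSd))
}
```

With the data loaded above, log_gdp_summary(gDat) returns one row per continent. A larger sdLog10 indicates a wider spread on the log scale, which is one way to check the claim about Asia.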