Let's load some R packages.
library(ggplot2)
library(reshape2)
In the beginning, it is important to download your data and record where it came from.
* If you are entering data by hand in Excel, document thoroughly how you are doing it.
* Downloaded files should be cited with the link they were downloaded from.
setwd("~/kai_r_markdown/")
gd_url <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
#download.file(url = gd_url, destfile = "gapminder_data.txt", method="curl")
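The record-keeping advice above can be partly automated. The following is a minimal sketch, assuming you want every downloaded file logged with its source URL, the time it was logged, and an MD5 checksum. `fetch_once()` and the `data_provenance.csv` log file are hypothetical names and are not part of any package. The usage lines write a local stand-in file first, so that no network access is needed.

```r
# Hypothetical helper (not from any package): download `url` only if `destfile`
# is missing, then append a provenance record (URL, time logged, MD5) to a CSV log.
fetch_once <- function(url, destfile, log_file = "data_provenance.csv") {
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile, mode = "wb")
  }
  entry <- data.frame(
    file   = destfile,
    url    = url,
    logged = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    md5    = unname(tools::md5sum(destfile)),
    stringsAsFactors = FALSE
  )
  # Decide on the header *before* write.table() creates the log file
  new_log <- !file.exists(log_file)
  write.table(entry, log_file, sep = ",", row.names = FALSE,
              col.names = new_log, append = !new_log)
  entry
}

# Usage with a local stand-in file, so nothing is actually downloaded here
gd_url <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
local_copy <- tempfile(fileext = ".txt")
writeLines("country\tyear", local_copy)
log_path <- tempfile(fileext = ".csv")
rec <- fetch_once(gd_url, local_copy, log_file = log_path)
rec
```

Computing the header flag up front matters. `write.table()` evaluates its arguments lazily, so a `file.exists()` check written inline could run after the file has already been created.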
Load the data [1].
# Read the tab-separated file; its first line holds the column names
gapminder_df <- read.table("gapminder_data.txt", sep = "\t", header = TRUE)
head(gapminder_df)
## country year pop continent lifeExp gdpPercap
## 1 Afghanistan 1952 8425333 Asia 28.801 779.4453
## 2 Afghanistan 1957 9240934 Asia 30.332 820.8530
## 3 Afghanistan 1962 10267083 Asia 31.997 853.1007
## 4 Afghanistan 1967 11537966 Asia 34.020 836.1971
## 5 Afghanistan 1972 13079460 Asia 36.088 739.9811
## 6 Afghanistan 1977 14880372 Asia 38.438 786.1134
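`read.delim()` is `read.table()` with tab-oriented defaults (`sep = "\t"`, `header = TRUE`, `quote = "\""`). `read.table()` treats both `"` and `'` as quote characters by default. The narrower `quote` setting in `read.delim()` keeps an apostrophe, as in a country name like "Cote d'Ivoire", as an ordinary character. Below is a small sketch that uses three rows copied from the `head()` output above, plus a hypothetical two-row table that shows the apostrophe case. It also checks for the columns that later code needs.

```r
# Three rows copied from the head() output above, as tab-separated text
tsv <- paste(
  "country\tyear\tpop\tcontinent\tlifeExp\tgdpPercap",
  "Afghanistan\t1952\t8425333\tAsia\t28.801\t779.4453",
  "Afghanistan\t1957\t9240934\tAsia\t30.332\t820.8530",
  "Afghanistan\t1962\t10267083\tAsia\t31.997\t853.1007",
  sep = "\n"
)
gm <- read.delim(text = tsv)
str(gm)

# Fail early if any column used later in the analysis is missing
stopifnot(all(c("country", "year", "pop", "continent", "lifeExp", "gdpPercap") %in% names(gm)))

# Hypothetical table: with quote = "\"" the apostrophe stays part of the name
names_only <- read.delim(text = "country\nCote d'Ivoire\nChad")
names_only$country
```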
Let's find the country (and year) with the lowest life expectancy in the data set.
subset(gapminder_df, lifeExp == min(lifeExp))
## country year pop continent lifeExp gdpPercap
## 1287 Rwanda 1992 7290203 Africa 23.599 737.0686
And the highest:
subset(gapminder_df, lifeExp == max(lifeExp))
## country year pop continent lifeExp gdpPercap
## 798 Japan 2007 127467972 Asia 82.603 31656.07
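`subset(..., lifeExp == min(lifeExp))` returns every row that ties for the extreme value. Because `min()` returns `NA` when any value is missing, that `subset()` call would return no rows if `lifeExp` contained an `NA`. `which.min()` and `which.max()` return the position of the first extreme value and skip missing values. The following sketch uses the three rows printed in this document, plus one hypothetical row with a missing value:

```r
# Rows copied from the output shown in this document
gm <- data.frame(
  country = c("Afghanistan", "Rwanda", "Japan"),
  year    = c(1952L, 1992L, 2007L),
  lifeExp = c(28.801, 23.599, 82.603),
  stringsAsFactors = FALSE
)

lowest  <- gm[which.min(gm$lifeExp), ]   # first row with the smallest lifeExp
highest <- gm[which.max(gm$lifeExp), ]   # first row with the largest lifeExp
lowest
highest

# Hypothetical extra row with a missing value
gm_na <- rbind(gm, data.frame(country = "Unknown", year = NA_integer_, lifeExp = NA,
                              stringsAsFactors = FALSE))
min(gm_na$lifeExp)                  # NA, because na.rm = FALSE by default
gm_na[which.min(gm_na$lifeExp), ]   # which.min() skips the NA
```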
Clearly…
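Okay, let's just plot life expectancy over time, with one box per continent for each year.
# factor(year) makes each year a discrete group, so a separate box is drawn per year and continent
p1 <- ggplot(gapminder_df, aes(x = factor(year), y = lifeExp)) + stat_boxplot(aes(color = continent)) + labs(x = "year")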
p1
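A box plot summarises each group by its quartiles, so one box per continent per year requires year to act as a grouping variable. The figures behind such boxes can be tabulated with `aggregate()` and `quantile()`. This is a sketch on rows copied from this document, where each group holds a single country. With the full data there would be many countries per group, and ggplot2's exact hinge definition may differ slightly from `quantile()`'s default.

```r
# Rows copied from the output shown in this document
gm <- data.frame(
  country   = c(rep("Afghanistan", 6), "Rwanda", "Japan"),
  continent = c(rep("Asia", 6), "Africa", "Asia"),
  year      = c(1952L, 1957L, 1962L, 1967L, 1972L, 1977L, 1992L, 2007L),
  lifeExp   = c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 23.599, 82.603)
)

# Min, quartiles and max of lifeExp for every continent-year combination
box_stats <- aggregate(lifeExp ~ continent + year, data = gm, FUN = quantile)
box_stats
```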
p2 <- ggplot(gapminder_df, aes(x=lifeExp, y=gdpPercap)) + geom_point() + scale_y_log10() + stat_smooth(method="lm")
p2
p3 <- ggplot(gapminder_df, aes(x=year, y=gdpPercap)) + geom_point() + scale_y_log10() + stat_smooth(method="lm") + facet_wrap(~continent)
p3
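In ggplot2, scale transformations such as `scale_y_log10()` are applied before statistics are computed. As a result, `stat_smooth(method = "lm")` in `p2` fits a straight line to `log10(gdpPercap)` rather than to raw GDP, and `facet_wrap(~continent)` in `p3` gives each continent its own fit. The sketch below shows roughly equivalent models in base R, using rows copied from this document. Skipping groups with fewer than two rows is an assumption added here, since a line cannot be fitted through a single point.

```r
# Rows copied from the output shown in this document
gm <- data.frame(
  continent = c(rep("Asia", 6), "Africa", "Asia"),
  year      = c(1952L, 1957L, 1962L, 1967L, 1972L, 1977L, 1992L, 2007L),
  lifeExp   = c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 23.599, 82.603),
  gdpPercap = c(779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134,
                737.0686, 31656.07)
)

# Like p2: one straight line for all points, fitted on the log10 scale
fit_all <- lm(log10(gdpPercap) ~ lifeExp, data = gm)
coef(fit_all)

# Like p3: one line per continent (x = year); groups with < 2 rows are skipped
fits_by_continent <- lapply(split(gm, gm$continent), function(d) {
  if (nrow(d) < 2) return(NULL)
  coef(lm(log10(gdpPercap) ~ year, data = d))
})
fits_by_continent <- Filter(Negate(is.null), fits_by_continent)
fits_by_continent
```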
Does population relate to life expectancy?
p4 <- ggplot(gapminder_df, aes(x=pop, y=lifeExp)) + geom_point(aes(color=continent)) + geom_text(aes(label=country))
p4
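Labelling every row with `geom_text()` causes heavy overplotting, and population spans several orders of magnitude. One option is to label only the most populous rows and put population on a log scale. The sketch below does this with a hypothetical helper, `largest_pop_rows()`, applied to rows copied from this document. The plotting part runs only if ggplot2 is installed.

```r
# Hypothetical helper: the n rows with the largest population
largest_pop_rows <- function(df, n = 10) {
  df[order(df$pop, decreasing = TRUE)[seq_len(min(n, nrow(df)))], ]
}

# Rows copied from the output shown in this document
gm <- data.frame(
  country   = c("Afghanistan", "Rwanda", "Japan"),
  continent = c("Asia", "Africa", "Asia"),
  pop       = c(14880372, 7290203, 127467972),
  lifeExp   = c(38.438, 23.599, 82.603),
  stringsAsFactors = FALSE
)
to_label <- largest_pop_rows(gm, n = 1)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Points for all rows, text labels only for the selected rows
  p4_labelled <- ggplot(gm, aes(x = pop, y = lifeExp)) +
    geom_point(aes(color = continent)) +
    geom_text(data = to_label, aes(label = country), vjust = -0.8) +
    scale_x_log10()
}
```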
Let's take a closer look at China and India.
china_india <- subset(gapminder_df, country %in% c("China", "India"))
p5 <- ggplot(china_india, aes(x=pop, y=lifeExp)) + geom_point(aes(color=country, size=sqrt(pop))) + facet_wrap(~country) + stat_smooth(formula=y~poly(x,3), method="lm")
p5
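`stat_smooth(formula = y ~ poly(x, 3), method = "lm")` fits a cubic polynomial of life expectancy on population separately in each facet. `poly()` builds orthogonal polynomial terms by default, which helps avoid the poor conditioning of raw powers of numbers as large as population counts. This document does not print the China and India rows, so the sketch below fits the same kind of model to the six Afghanistan rows from the `head()` output instead.

```r
# Afghanistan rows copied from the head() output above
afg <- data.frame(
  pop     = c(8425333, 9240934, 10267083, 11537966, 13079460, 14880372),
  lifeExp = c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438)
)

# Cubic fit of life expectancy on population (orthogonal polynomial terms)
cubic_fit <- lm(lifeExp ~ poly(pop, 3), data = afg)
summary(cubic_fit)$r.squared

# Evaluate the fitted curve on a grid of population values, like a smoothed line
grid <- data.frame(pop = seq(min(afg$pop), max(afg$pop), length.out = 50))
grid$lifeExp_hat <- predict(cubic_fit, newdata = grid)
head(grid)
```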
Speculation: why do China and India appear to be the only two countries where life expectancy correlates with population growth?
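One way to probe this speculation is to compute, for every country, the rank (Spearman) correlation between population and life expectancy. The sketch below uses a hypothetical helper, `country_cor()`, and runs it on the six Afghanistan rows printed above. In those rows, both variables rise at every five-year step, so the correlation comes out as 1. This suggests the pattern may not be unique to China and India, and that a shared upward trend over time could account for much of it.

```r
# Hypothetical helper: Spearman correlation of pop and lifeExp within each country
country_cor <- function(df) {
  sapply(split(df, df$country), function(d) {
    if (nrow(d) < 3) return(NA_real_)  # too few points for a meaningful value
    cor(d$pop, d$lifeExp, method = "spearman")
  })
}

# Afghanistan rows copied from the head() output above
afg <- data.frame(
  country = "Afghanistan",
  pop     = c(8425333, 9240934, 10267083, 11537966, 13079460, 14880372),
  lifeExp = c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438)
)
country_cor(afg)
```

On the full data, `sort(country_cor(gapminder_df))` would list countries from the weakest to the strongest correlation.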
[1] D. H. Huson, A. F. Auch, J. Qi, and S. C. Schuster, “MEGAN analysis of metagenomic data.” Genome Res, vol. 17, no. 3, pp. 377–386, Mar. 2007.