GDP per capita, life expectancy, and the relationship between them are studied here as an example of using R.
The data were obtained from Gapminder World.
Load the needed packages:
require(openxlsx, quietly = TRUE)
require(dplyr, quietly = TRUE)
require(tidyr, quietly = TRUE)
require(ggplot2, quietly = TRUE)
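Read the GDP per capita and life expectancy data from Gapminder World (the download calls are commented out, so the two xlsx files are expected to be in the working directory already):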
#download.file(url='http://spreadsheets.google.com/pub?key=phAwcNAVuyj1jiMAkmq1iMg&output=xls',
# destfile = 'GDPpc.xlsx', mode = 'wb')
#download.file(url='http://spreadsheets.google.com/pub?key=phAwcNAVuyj2tPLxKvvnNPA&output=xls',
# destfile = 'Life.xlsx', mode = 'wb')
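The download calls above are commented out, presumably so that the files are not fetched again on every run. A minimal sketch of one way to get the same effect automatically is shown here; `download_if_missing` is a hypothetical helper name, not part of the original analysis, and it assumes that an existing local file is good enough to reuse.

```r
# Hypothetical helper (not in the original script): fetch a file only if
# it is not already present locally.
# Returns TRUE if a download was attempted, FALSE if the file already existed.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(FALSE)
  }
  # mode = 'wb' keeps binary files such as .xlsx intact on Windows
  download.file(url = url, destfile = destfile, mode = 'wb')
  TRUE
}

# Usage, with the same URL and file name as the commented-out call above:
# download_if_missing(
#   'http://spreadsheets.google.com/pub?key=phAwcNAVuyj1jiMAkmq1iMg&output=xls',
#   'GDPpc.xlsx')
```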
Load the xlsx files into data frames and rename the first column to 'Country':
Life.df<-read.xlsx('Life.xlsx', sheet = 1, colNames = TRUE, check.names = FALSE)
names(Life.df)[1]<-'Country'
GDP.df<-read.xlsx('GDPpc.xlsx', sheet = 1, colNames = TRUE, check.names = FALSE)
names(GDP.df)[1]<-'Country'
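Gather the per-year columns (one column per year in the spreadsheets) into a single ‘year’ column, so that each row holds one country and one year, then join both data frames into one: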
Life1.df<-gather(Life.df, year, Life, 2:217, convert = TRUE)
GDP1.df<-gather(GDP.df, year, GDPpc, 2:217, convert = TRUE)
All.df<-inner_join(GDP1.df, Life1.df)
## Joining by: c("Country", "year")
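To make the reshape and join above easier to follow, here is a small sketch on made-up toy data (the countries 'A' and 'B' and all values are invented for illustration). It also shows two optional variations that are assumptions, not the original code: selecting the year columns with `-Country` instead of a hard-coded range, and naming the join keys explicitly.

```r
library(dplyr)
library(tidyr)

# Made-up toy values, only to show the shape of the data:
# one row per country, one column per year
gdp_wide <- data.frame(Country = c('A', 'B'),
                       `1800` = c(100, 200),
                       `1801` = c(110, 210),
                       check.names = FALSE)
life_wide <- data.frame(Country = c('A', 'B'),
                        `1800` = c(30, 35),
                        `1801` = c(31, 36),
                        check.names = FALSE)

# -Country gathers every column except Country, so the column range
# (2:217 in the real data) does not have to be hard-coded.
# convert = TRUE turns the year labels into integers.
gdp_long  <- gather(gdp_wide,  year, GDPpc, -Country, convert = TRUE)
life_long <- gather(life_wide, year, Life,  -Country, convert = TRUE)

# Naming the keys explicitly avoids the "Joining by" message
toy_all <- inner_join(gdp_long, life_long, by = c('Country', 'year'))
print(toy_all)
```

The result has one row per country and year, with the columns Country, year, GDPpc and Life, which is the same layout as `All.df`.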
Six countries are selected for the analysis, and their GDP per capita is plotted over time:
Countries<-c('Spain', 'France', 'United Kingdom', 'United States', 'Argentina', 'Japan')
df<-filter(All.df, Country %in% Countries)
g<-ggplot(df, aes(x=year, y=GDPpc, color = Country))
g<-g + geom_line(size=1)
g<-g + scale_y_log10(breaks=c(seq(1000, 9000, 1000), seq(10000, 50000, 5000)))
g<-g + ylab('GDP per capita (2011 $)')
print(g)
The general increase during the 20th century is large enough that a log scale is useful. The effect of the world wars is clear, as is the ‘sorpasso’ (overtaking) of the UK by the USA after WWII.
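One reason a log scale suits this kind of series is that growth at a constant rate becomes a straight line. The sketch below illustrates this with a hypothetical series (a starting value of 1000 growing 2% per year, both numbers chosen only for illustration, not taken from the Gapminder data).

```r
# Hypothetical series: starts at 1000 and grows 2% per year for 100 years
years  <- 0:100
income <- 1000 * 1.02^years

# On a linear scale the yearly steps keep getting larger...
range(diff(income))

# ...but on a log10 scale every step has the same size, log10(1.02),
# so constant-rate growth plots as a straight line
log_steps <- diff(log10(income))
all.equal(log_steps, rep(log10(1.02), 100))
```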
g<-ggplot(df, aes(x=year, y=Life, color = Country, shape = Country))
g<-g + geom_line(size=1)
g<-g + geom_point()
g<-g + scale_y_continuous()
g<-g + ylab('Life expectancy (years)')
print(g)
g<-g + scale_x_continuous(limits=c(1900,1950))
print(g)
The sudden drops caused by the world wars and by the Spanish flu in 1918 stand out.
g<-ggplot(df, aes(x=GDPpc, y=Life))
g<-g + geom_point(aes(color = Country, shape = Country))
g<-g + scale_x_log10(breaks=c(seq(1000, 9000, 1000), seq(10000, 50000, 5000)))
print(g)
Life expectancy generally improves with GDP per capita. Since it may also depend on other developments during the 20th century, the data set is divided into 50-year chunks.
df<-mutate(df, quart=as.integer((year-1800)/50))
df$quart<-factor(df$quart, levels=c(0,1,2,3,4), labels = c('1800-1850', '1850-1900', '1900-1950', '1950-2000', '2000-2050'))
g<-ggplot(df, aes(x=GDPpc, y=Life))
g<-g + geom_point(aes(color = Country, shape = Country))
g<-g + scale_x_continuous()
g<-g + facet_wrap(~ quart, scales = 'free')
print(g)
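The 50-year binning used for the facets can be checked on a handful of years. This is a small sketch with sample years picked by hand to include the bin boundaries; it reuses the same arithmetic and labels as the `mutate()` and `factor()` calls above.

```r
# A few sample years, chosen to include the bin boundaries
years <- c(1800, 1849, 1850, 1899, 1900, 1950, 2000, 2015)

# Same arithmetic as in the mutate() call:
# integer part of (year - 1800) / 50
quart <- as.integer((years - 1800) / 50)

labels <- c('1800-1850', '1850-1900', '1900-1950', '1950-2000', '2000-2050')
bins <- data.frame(year  = years,
                   quart = quart,
                   label = factor(quart, levels = 0:4, labels = labels))
print(bins)
```

A boundary year such as 1850 lands in the later bin ('1850-1900'), so each label is best read as including its start year and excluding its end year.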