Gapminder Data Visualization using GoogleVis and R

From Gapminder into R and GoogleVis[ualization]

David Comfort

Project #1 - Visualization
October 4, 2015
NYC Data Science Academy

The purpose of my project was to visualize long-term economic, social and health statistics. Specifically, I wanted to extract data sets from Gapminder using the googlesheets R package, munge them, combine them into one data frame, and then visualize the result as a Google Motion Chart using the googleVis R package.

What is Gapminder and who is Hans Rosling?
Data sets used for Data Visualization
Reading in the Gapminder datasets using Google Sheets R Package
Read in the Countries Data Set
Reshaping the Datasets
Combining the Datasets
What is GoogleVis?
Implementing GoogleVis
Parameters for GoogleVis
Data Visualization
What are the Key Lessons from the Gapminder Data Visualization Project

1) What is Gapminder and who is Hans Rosling?

Hans Rosling is Professor of International Health at the Karolinska Institutet in Stockholm, Sweden and founded the Gapminder Foundation.
Hans Rosling gave a famous TED talk, “The Best Stats You’ve Ever Seen”. To visualise his talk, he and his team at Gapminder developed animated bubble charts, aka motion charts.
The Gapminder Foundation is a Swedish NGO which promotes sustainable global development by increased use and understanding of statistics about social, economic and environmental development.
Gapminder developed the Trendalyzer data visualization software, which was acquired by Google in 2007.

2) Data sets used for Data Visualization

Data sets from Gapminder

Child mortality (0-5 year-olds dying per 1,000 born)
Democracy score (based on Polity IV)
Income per person (GDP/capita, PPP$ inflation-adjusted)
Life expectancy at birth
Population, Total

Country Data set

Country and Dependent Territories Lists with UN Regional Codes

3) Reading in the Gapminder datasets using Google Sheets R Package

The googlesheets R package allows one to access and manage Google spreadsheets from within R.
googlesheets Basic Usage (vignette)
Reference manual
The registration functions gs_title(), gs_key(), and gs_url() return a registered sheet as a googlesheets object, which is the first argument to practically every function in this package.

First, install and load googlesheets and dplyr.

# install.packages("googlesheets")
suppressMessages(library(dplyr))  # load dplyr without its startup messages
library(googlesheets)

Access a Gapminder Google Sheet by the URL and get some information about the Google Sheet:

gs_url("https://docs.google.com/spreadsheets/d/1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8/pub?gid=0",lookup = FALSE)

## Sheet-identifying info appears to be a browser URL.
## googlesheets will attempt to extract sheet key from the URL.
## Putative key: 1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8
## Authentication will not be used.
## Worksheets feed constructed with public visibility

##                   Spreadsheet title: indicator gapminder population
##                  Spreadsheet author: gapdata
##   Date of googlesheets registration: 2015-10-06 19:32:28 GMT
##     Date of last spreadsheet update: 2012-09-07 13:50:39 GMT
##                          visibility: public
##                         permissions: rw
##                             version: new
## 
## Contains 5 worksheets:
## (Title): (Nominal worksheet extent as rows x columns)
## Data: 261 x 233
## About: 43 x 6
## Footnotes: 20 x 6
## Settings: 20 x 6
## v: 20 x 6
## 
## Key: 1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8
## Browser URL: https://docs.google.com/spreadsheets/d/1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8/

Note: Setting the parameter lookup=FALSE will block authenticated API requests.

A utility function, extract_key_from_url(), helps you get and store the key from a browser URL:

extract_key_from_url("https://docs.google.com/spreadsheets/d/1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8/pub?gid=0")

## [1] "1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8"
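
For intuition, the key is simply the path segment that follows /d/ in the browser URL. Here is a minimal base-R sketch of that idea, assuming the URL layout shown above. The helper name key_from_url is hypothetical, and this is not how the package itself implements extract_key_from_url():

```r
# Hypothetical helper: pull the sheet key out of a Google Sheets browser URL.
# Assumes the ".../spreadsheets/d/<key>/..." layout seen above.
key_from_url <- function(url) {
  sub("^.*/spreadsheets/d/([^/?#]+).*$", "\\1", url)
}

url <- "https://docs.google.com/spreadsheets/d/1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8/pub?gid=0"
key <- key_from_url(url)
print(key)
```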

You can access the Google Sheet by key:

gs_key("1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8", verbose=FALSE, lookup = FALSE)

##                   Spreadsheet title: indicator gapminder population
##                  Spreadsheet author: gapdata
##   Date of googlesheets registration: 2015-10-06 19:32:29 GMT
##     Date of last spreadsheet update: 2012-09-07 13:50:39 GMT
##                          visibility: public
##                         permissions: rw
##                             version: new
## 
## Contains 5 worksheets:
## (Title): (Nominal worksheet extent as rows x columns)
## Data: 261 x 233
## About: 43 x 6
## Footnotes: 20 x 6
## Settings: 20 x 6
## v: 20 x 6
## 
## Key: 1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8
## Browser URL: https://docs.google.com/spreadsheets/d/1IbDM8z5XicMIXgr93FPwjgwoTTKMuyLfzU6cQrGZzH8/

Once a sheet is registered, you can read the data from a specific worksheet (“Data” in our case) using the gs_read() function. Here the two statements are combined with the dplyr pipe, and check.names=FALSE is used so that the integer (year) column names are kept as they are and no “X” is prepended to each of them:

gdp_per_capita <- gs_key("phAwcNAVuyj1jiMAkmq1iMg",lookup = FALSE,  verbose=FALSE) %>% gs_read(ws = "Data", check.names=FALSE)

## Accessing worksheet titled "Data"

You can also target specific cells via the range = argument. The simplest usage is to specify an Excel-like cell range, such as range = "D12:F15" or range = "R1C12:R6C15".

df2 <- gs_key("phAwcNAVuyj1jiMAkmq1iMg",lookup = FALSE, verbose=FALSE) %>% gs_read(ws = "Data", range = "A1:D8")

## Accessing worksheet titled "Data"

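However, check.names=FALSE does not seem to take effect when a range is given (possibly a problem with the package), so the year columns come back prefixed with “X”. A workaround is to pipe the data frame through the dplyr rename_() function:

# strip the leading "X" from the column names ("^" anchors the match at the start of the name)
df2 %>% rename_(.dots=setNames(names(.), (gsub("^X", "", names(.)))))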

## Source: local data frame [7 x 4]
## 
##          GDP.per.capita        1800        1801        1802
##                   (chr)       (chr)       (chr)       (chr)
## 1              Abkhazia          NA          NA          NA
## 2           Afghanistan 634.4000136 634.4000136 634.4000136
## 3 Akrotiri and Dhekelia          NA          NA          NA
## 4               Albania 860.5879664 861.4817538 862.3764694
## 5               Algeria        1360 1361.635788 1363.271576
## 6        American Samoa          NA          NA          NA
## 7               Andorra        1260     1262.15      1264.3
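
The “X” prefix comes from R's name checking: with check.names = TRUE, column names are passed through make.names(), which prepends “X” to names that start with a digit. Here is a small base-R sketch of the effect and of a stricter clean-up that only strips a leading “X” followed by a digit. The data frame is made up for illustration:

```r
# make.names() is what check.names = TRUE applies to column names
checked <- make.names(c("GDP per capita", "1800", "1801"))
print(checked)

# Made-up data frame with names as they look after name checking
df <- data.frame(GDP.per.capita = c("Mexico", "Luxembourg"),
                 X1800 = c(1, 2),
                 X1801 = c(3, 4),
                 stringsAsFactors = FALSE)

# Strip only a leading "X" that is followed by a digit
names(df) <- sub("^X(?=[0-9])", "", names(df), perl = TRUE)
print(names(df))
```

Anchoring the pattern at the start of the name avoids touching an “X” that appears elsewhere in a column name.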

However, for our purposes, we will ingest the entire worksheet rather than target specific cells.

Let’s look at the data frame.

head(gdp_per_capita)[1:5]

## Source: local data frame [6 x 5]
## 
##          GDP per capita     1800      1801      1802      1803
##                   (chr)    (dbl)     (dbl)     (dbl)     (dbl)
## 1              Abkhazia       NA        NA        NA        NA
## 2           Afghanistan  634.400  634.4000  634.4000  634.4000
## 3 Akrotiri and Dhekelia       NA        NA        NA        NA
## 4               Albania  860.588  861.4818  862.3765  863.2721
## 5               Algeria 1360.000 1361.6358 1363.2716 1364.9074
## 6        American Samoa       NA        NA        NA        NA

str(gdp_per_capita, list.len=5)

## Classes 'tbl_df', 'tbl' and 'data.frame':    260 obs. of  215 variables:
##  $ GDP per capita: chr  "Abkhazia" "Afghanistan" "Akrotiri and Dhekelia" "Albania" ...
##  $ 1800          : num  NA 634 NA 861 1360 ...
##  $ 1801          : num  NA 634 NA 861 1362 ...
##  $ 1802          : num  NA 634 NA 862 1363 ...
##  $ 1803          : num  NA 634 NA 863 1365 ...
##   [list output truncated]

We need to change the name of the first column from “GDP per capita” to “Country”.

colnames(gdp_per_capita)[1]

## [1] "GDP per capita"

colnames(gdp_per_capita)[1] <- c("Country")

Let’s download the rest of the data sets:

child_mortality <- gs_key("0ArfEDsV3bBwCcGhBd2NOQVZ1eWowNVpSNjl1c3lRSWc",lookup = FALSE,  verbose=FALSE) %>% gs_read(ws = "Data", check.names=FALSE)

## Accessing worksheet titled "Data"

democracy_score <- gs_key("0ArfEDsV3bBwCdGQ2YlhDSWVIdXdpMmhLY2ZZRHdNNnc",lookup = FALSE,  verbose=FALSE) %>% gs_read(ws = "Data", check.names=FALSE)

## Accessing worksheet titled "Data"

life_expectancy  <- gs_key("tiAiXcrneZrUnnJ9dBU-PAw",lookup = FALSE,  verbose=FALSE) %>% gs_read(ws = "Data", check.names=FALSE)

## Accessing worksheet titled "Data"

population <- gs_key("phAwcNAVuyj0XOoBL_n5tAQ",lookup = FALSE,  verbose=FALSE) %>% gs_read(ws = "Data", check.names=FALSE)

## Accessing worksheet titled "Data"

colnames(child_mortality)[1]

## [1] "Under five mortality"

# I tried to find a way to change the column names on multiple data frames at once,
# but could not get the syntax right, so each data frame is renamed separately.
colnames(democracy_score)[1] <- c("Country")
colnames(gdp_per_capita)[1] <- c("Country")
colnames(child_mortality)[1] <- c("Country")
colnames(life_expectancy)[1] <- c("Country")
colnames(population)[1] <- c("Country")
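
One way to rename the first column of several data frames at once is to keep them in a named list and use lapply(). Here is a minimal sketch with made-up stand-ins for the downloaded data frames (the names gdp, pop and datasets are only for illustration):

```r
# Made-up stand-ins for the downloaded data frames
gdp <- data.frame(`GDP per capita` = c("A", "B"), `1800` = c(1, 2),
                  check.names = FALSE, stringsAsFactors = FALSE)
pop <- data.frame(`Total population` = c("A", "B"), `1800` = c(10, 20),
                  check.names = FALSE, stringsAsFactors = FALSE)

datasets <- list(gdp = gdp, pop = pop)

# Rename the first column of every data frame in the list
datasets <- lapply(datasets, function(df) {
  colnames(df)[1] <- "Country"
  df
})

# Check the first column name of each data frame
first_names <- sapply(datasets, function(df) colnames(df)[1])
print(first_names)
```

The individual data frames can then be used as datasets$gdp and so on, or copied back into the workspace with list2env(datasets, envir = .GlobalEnv).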

4) Read in the Countries Data Set

We want to segment the countries in the data sets by region and sub-region. However, the Gapminder data sets do not include these variables. Therefore, one can download the ISO-3166-Countries-with-Regional-Codes data set from GitHub, which includes the ISO country code, country name, region, and sub-region.

Use RCurl to read the file in directly from GitHub, and make sure you read in the “raw” file rather than GitHub’s display version.

Note: The Gapminder data sets do not include ISO country codes, so I had to edit the country names in the countries data set to match the names used in the Gapminder data sets.

library(RCurl)

## Loading required package: bitops

# Read the raw CSV directly from GitHub
countries <- getURL("https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv")
countries <- read.csv(text=countries, header = TRUE, stringsAsFactors=FALSE)
# Read a local copy of the file (see the note above about cleaning country names);
# this overwrites the data frame read from GitHub
countries <- read.csv("data/countries.csv",
                      header = TRUE,
                      stringsAsFactors= FALSE, check.names=FALSE)
countries <- tbl_df(countries)
# str() prints its summary and returns NULL, so its result cannot be subset with [1:5]
str(countries)

## Classes 'tbl_df', 'tbl' and 'data.frame':    241 obs. of  9 variables:
##  $ name           : chr  "Afghanistan" "Albania" "Algeria" "American Samoa" ...
##  $ alpha-2        : chr  "AF" "AL" "DZ" "AS" ...
##  $ alpha-3        : chr  "AFG" "ALB" "DZA" "ASM" ...
##  $ country-code   : int  4 8 12 16 20 24 660 28 32 51 ...
##  $ iso_3166-2     : chr  "ISO 3166-2:AF" "ISO 3166-2:AL" "ISO 3166-2:DZ" "ISO 3166-2:AS" ...
##  $ region         : chr  "Asia" "Europe" "Africa" "Oceania" ...
##  $ sub-region     : chr  "Southern Asia" "Southern Europe" "Northern Africa" "Polynesia" ...
##  $ region-code    : int  142 150 2 9 150 2 19 19 19 142 ...
##  $ sub-region-code: int  34 39 15 61 39 17 29 29 5 145 ...

head(countries)[1:5]

## Source: local data frame [6 x 5]
## 
##             name alpha-2 alpha-3 country-code    iso_3166-2
##            (chr)   (chr)   (chr)        (int)         (chr)
## 1    Afghanistan      AF     AFG            4 ISO 3166-2:AF
## 2        Albania      AL     ALB            8 ISO 3166-2:AL
## 3        Algeria      DZ     DZA           12 ISO 3166-2:DZ
## 4 American Samoa      AS     ASM           16 ISO 3166-2:AS
## 5        Andorra      AD     AND           20 ISO 3166-2:AD
## 6         Angola      AO     AGO           24 ISO 3166-2:AO

colnames(countries)[1] <- c("Country")
colnames(countries)[4] <- c("Code")
colnames(countries)[6] <- c("Region")
colnames(countries)[7] <- c("Sub.Region")
countries <- transmute(countries, Country, Code, Region, Sub.Region)
head(countries)

## Source: local data frame [6 x 4]
## 
##          Country  Code  Region      Sub.Region
##            (chr) (int)   (chr)           (chr)
## 1    Afghanistan     4    Asia   Southern Asia
## 2        Albania     8  Europe Southern Europe
## 3        Algeria    12  Africa Northern Africa
## 4 American Samoa    16 Oceania       Polynesia
## 5        Andorra    20  Europe Southern Europe
## 6         Angola    24  Africa   Middle Africa

str(countries)

## Classes 'tbl_df', 'tbl' and 'data.frame':    241 obs. of  4 variables:
##  $ Country   : chr  "Afghanistan" "Albania" "Algeria" "American Samoa" ...
##  $ Code      : int  4 8 12 16 20 24 660 28 32 51 ...
##  $ Region    : chr  "Asia" "Europe" "Africa" "Oceania" ...
##  $ Sub.Region: chr  "Southern Asia" "Southern Europe" "Northern Africa" "Polynesia" ...

5) Reshaping the Datasets

We need to reshape the data frames. For the purposes of reshaping, we can divide the variables into two groups: identifier (id) variables and measured variables. In our case, the id variables are Country and Years, whereas the measured variables are GDP per capita, life expectancy, etc.

We can further abstract and “say there are only id variables and a value, where the id variables also identify what measured variable the value represents.”

For example, a data set with two id variables, subject and time, and one column per measured variable can be rewritten in long form, where each row represents one observation of one variable. This operation is called melting (and can be achieved by using the melt function of the reshape2 package, the successor to the reshape package).

Compared to the wide table, the long table has a new id variable “variable” and a new column “value”, which holds the value of that observation. See the paper, Reshaping data with the reshape package, by Hadley Wickham, for more clarification.
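
To make the idea concrete, here is a small sketch with a made-up data set that has the two id variables subject and time. The values are invented for illustration and are not taken from the Gapminder data:

```r
library(reshape2)

# Made-up wide data: one row per subject and time, one column per measurement
wide <- data.frame(subject = c("John", "Mary"),
                   time    = c(1, 1),
                   age     = c(33, NA),
                   height  = c(1.87, 1.54),
                   stringsAsFactors = FALSE)

# Melt: keep the id variables and stack every other column into variable/value;
# na.rm = TRUE drops the rows whose value is missing
long <- melt(wide, id.vars = c("subject", "time"), na.rm = TRUE)
print(long)
```

Each row of long is one observation of one variable. Mary's missing age is dropped, so the four measured cells of the wide table become three rows.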

# load reshape2 package
library(reshape2)
# reshape gdp_per_capita data frame
gdp_per_capita <- melt(gdp_per_capita, id.vars="Country", variable.name = "Years", value.name="GDP per capita", na.rm = TRUE)
str(gdp_per_capita)

## 'data.frame':    43252 obs. of  3 variables:
##  $ Country       : chr  "Afghanistan" "Albania" "Algeria" "Andorra" ...
##  $ Years         : Factor w/ 214 levels "1800","1801",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ GDP per capita: num  634 861 1360 1260 650 ...

# change factor to numeric
gdp_per_capita$Years <- as.numeric(as.character(gdp_per_capita$Years))
str(gdp_per_capita)

## 'data.frame':    43252 obs. of  3 variables:
##  $ Country       : chr  "Afghanistan" "Albania" "Algeria" "Andorra" ...
##  $ Years         : num  1800 1800 1800 1800 1800 1800 1800 1800 1800 1800 ...
##  $ GDP per capita: num  634 861 1360 1260 650 ...

head(gdp_per_capita)

##                Country Years GDP per capita
## 2          Afghanistan  1800       634.4000
## 4              Albania  1800       860.5880
## 5              Algeria  1800      1360.0000
## 7              Andorra  1800      1260.0000
## 8               Angola  1800       650.0000
## 10 Antigua and Barbuda  1800       796.5934

We now have the data frame in a form in which there are only id variables and a value.
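
The as.character() step in the conversion above matters. Calling as.numeric() directly on a factor returns the internal level codes rather than the labels, as this small base-R sketch shows (the years vector is made up):

```r
years <- factor(c("1800", "1801", "1950"))

codes  <- as.numeric(years)                # level codes, not the years
values <- as.numeric(as.character(years))  # the labels converted to numbers

print(codes)
print(values)
```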

Let’s reshape the remaining data frames (child_mortality, democracy_score, life_expectancy, population):

# reshape child_mortality data frame
child_mortality <- melt(child_mortality, id.vars="Country", variable.name = "Years", value.name="Child mortality", na.rm = TRUE)
str(child_mortality)

## 'data.frame':    33490 obs. of  3 variables:
##  $ Country        : chr  "Sweden" "Sweden" "Sweden" "Sweden" ...
##  $ Years          : Factor w/ 253 levels "1761","1762",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Child mortality: num  316 377 395 316 316 ...

# change factor to numeric
child_mortality$Years <- as.numeric(as.character(child_mortality$Years))
str(child_mortality)

## 'data.frame':    33490 obs. of  3 variables:
##  $ Country        : chr  "Sweden" "Sweden" "Sweden" "Sweden" ...
##  $ Years          : num  1761 1762 1763 1764 1765 ...
##  $ Child mortality: num  316 377 395 316 316 ...

head(child_mortality)

##      Country Years Child mortality
## 221   Sweden  1761          315.61
## 482   Sweden  1762          376.98
## 743   Sweden  1763          395.31
## 1004  Sweden  1764          316.08
## 1265  Sweden  1765          316.03
## 1526  Sweden  1766          311.90

# reshape democracy_score data frame
democracy_score <- melt(democracy_score, id.vars="Country", variable.name = "Years", value.name="Democracy Score", na.rm = TRUE)
str(democracy_score)

## 'data.frame':    17798 obs. of  3 variables:
##  $ Country        : chr  "Afghanistan" "Austria" "China" "Denmark" ...
##  $ Years          : Factor w/ 212 levels "1800","1801",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Democracy Score: num  -6 -10 -6 -10 -8 -10 -10 1 -6 -6 ...

# change factor to numeric
democracy_score$Years <- as.numeric(as.character(democracy_score$Years))
str(democracy_score)

## 'data.frame':    17798 obs. of  3 variables:
##  $ Country        : chr  "Afghanistan" "Austria" "China" "Denmark" ...
##  $ Years          : num  1800 1800 1800 1800 1800 1800 1800 1800 1800 1800 ...
##  $ Democracy Score: num  -6 -10 -6 -10 -8 -10 -10 1 -6 -6 ...

head(democracy_score)

##         Country Years Democracy Score
## 2   Afghanistan  1800              -6
## 15      Austria  1800             -10
## 45        China  1800              -6
## 60      Denmark  1800             -10
## 77       France  1800              -8
## 104        Iran  1800             -10

# reshape life_expectancy data frame
life_expectancy <- melt(life_expectancy, id.vars="Country", variable.name = "Years", value.name="Life Expectancy", na.rm = TRUE)
str(life_expectancy)

## 'data.frame':    44359 obs. of  3 variables:
##  $ Country        : chr  "Denmark" "Finland" "France" "Iceland" ...
##  $ Years          : Factor w/ 254 levels "1765","1766",..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Life Expectancy: num  33.3 35.3 27 40.1 35 ...

# change factor to numeric
life_expectancy$Years <- as.numeric(as.character(life_expectancy$Years))
str(life_expectancy)

## 'data.frame':    44359 obs. of  3 variables:
##  $ Country        : chr  "Denmark" "Finland" "France" "Iceland" ...
##  $ Years          : num  1765 1765 1765 1765 1765 ...
##  $ Life Expectancy: num  33.3 35.3 27 40.1 35 ...

head(life_expectancy)

##     Country Years Life Expectancy
## 57  Denmark  1765        33.32457
## 71  Finland  1765        35.32585
## 72   France  1765        26.95198
## 96  Iceland  1765        40.07231
## 163  Norway  1765        34.98737
## 210  Sweden  1765        35.95000

# reshape population data frame
population <- melt(population, id.vars="Country", variable.name = "Years", value.name="Population", na.rm = TRUE)
str(population)

## 'data.frame':    21591 obs. of  3 variables:
##  $ Country   : chr  "Albania" "Algeria" "American Samoa" "Australia" ...
##  $ Years     : Factor w/ 232 levels "1700","1730",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Population: chr  "300,000" "1,750,000" "7,427" "450,000" ...

# change factor to numeric
population$Years <- as.numeric(as.character(population$Years))
# population value has comma separators which need to be removed
population$Population <- as.numeric(gsub(',', '', population$Population))
str(population)

## 'data.frame':    21591 obs. of  3 variables:
##  $ Country   : chr  "Albania" "Algeria" "American Samoa" "Australia" ...
##  $ Years     : num  1700 1700 1700 1700 1700 1700 1700 1700 1700 1700 ...
##  $ Population: num  300000 1750000 7427 450000 2500000 ...

head(population)

##           Country Years Population
## 4         Albania  1700     300000
## 5         Algeria  1700    1750000
## 6  American Samoa  1700       7427
## 14      Australia  1700     450000
## 15        Austria  1700    2500000
## 19     Bangladesh  1700   15789473
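
Since the same three steps (melt, convert Years, clean the values) are repeated for every data set, they could be wrapped in a small function. Here is a sketch of that idea. The name melt_indicator is hypothetical, and the toy data frame only mimics the shape of the Gapminder sheets:

```r
library(reshape2)

# Hypothetical helper: melt a wide sheet (one column per year) into long form
melt_indicator <- function(df, value_name) {
  long <- melt(df, id.vars = "Country", variable.name = "Years",
               value.name = value_name, na.rm = TRUE)
  # factor labels -> numeric years
  long$Years <- as.numeric(as.character(long$Years))
  # if the values arrived as text, strip thousands separators and convert
  if (is.character(long[[value_name]])) {
    long[[value_name]] <- as.numeric(gsub(",", "", long[[value_name]]))
  }
  long
}

# Toy wide data frame with comma-formatted values, like the population sheet
wide <- data.frame(Country = c("A", "B"),
                   `1800` = c("1,000", NA),
                   `1801` = c("1,100", "2,500"),
                   check.names = FALSE, stringsAsFactors = FALSE)

long <- melt_indicator(wide, "Population")
print(long)
```

With such a helper, each reshaping step above would shrink to a single call, for example melt_indicator(population, "Population").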

6) Combining the Datasets

Whew, now we can finally combine all the data sets using a left_join:

# Preview the join to check that it looks correct
head(left_join(gdp_per_capita, child_mortality, by=c("Country", "Years")))

##               Country Years GDP per capita Child mortality
## 1         Afghanistan  1800       634.4000          468.58
## 2             Albania  1800       860.5880          375.20
## 3             Algeria  1800      1360.0000          460.21
## 4             Andorra  1800      1260.0000              NA
## 5              Angola  1800       650.0000          485.68
## 6 Antigua and Barbuda  1800       796.5934          473.60

# perform the left join
gapdata <- left_join(gdp_per_capita, child_mortality, by=c("Country","Years"))
head(gapdata)

##               Country Years GDP per capita Child mortality
## 1         Afghanistan  1800       634.4000          468.58
## 2             Albania  1800       860.5880          375.20
## 3             Algeria  1800      1360.0000          460.21
## 4             Andorra  1800      1260.0000              NA
## 5              Angola  1800       650.0000          485.68
## 6 Antigua and Barbuda  1800       796.5934          473.60

# Join the democracy_score data frame
gapdata <- left_join(gapdata, democracy_score, by=c("Country", "Years"))
# Join the life_expectancy data frame
gapdata <- left_join(gapdata, life_expectancy, by=c("Country", "Years"))
# Join the population data frame
gapdata <- left_join(gapdata, population, by=c("Country", "Years"))
str(gapdata)

## 'data.frame':    43252 obs. of  7 variables:
##  $ Country        : chr  "Afghanistan" "Albania" "Algeria" "Andorra" ...
##  $ Years          : num  1800 1800 1800 1800 1800 1800 1800 1800 1800 1800 ...
##  $ GDP per capita : num  634 861 1360 1260 650 ...
##  $ Child mortality: num  469 375 460 NA 486 ...
##  $ Democracy Score: num  -6 NA NA NA NA NA NA NA NA NA ...
##  $ Life Expectancy: num  28.2 35.4 28.8 NA 27 ...
##  $ Population     : num  3280000 410445 2503218 2654 1567028 ...
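
The chain of joins above can also be written as a single fold over a list of data frames. Here is a minimal sketch using Reduce() with dplyr's left_join() on made-up long-form data frames (the names gdp, life, pop and combined are only for illustration):

```r
suppressMessages(library(dplyr))

# Made-up long-form data frames sharing the Country and Years keys
gdp  <- data.frame(Country = c("A", "B"), Years = c(1800, 1800),
                   GDP = c(600, 900), stringsAsFactors = FALSE)
life <- data.frame(Country = "A", Years = 1800,
                   LifeExp = 30, stringsAsFactors = FALSE)
pop  <- data.frame(Country = c("A", "B"), Years = c(1800, 1800),
                   Pop = c(1000, 2000), stringsAsFactors = FALSE)

# Fold left_join over the list; the first element decides which rows are kept
combined <- Reduce(function(x, y) left_join(x, y, by = c("Country", "Years")),
                   list(gdp, life, pop))
print(combined)
```

As with the step-by-step version, rows without a match in a later data frame are kept and filled with NA.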

# join the gapdata and countries data frame
# see if we are doing the right thing
head(left_join(gapdata, countries, by="Country"))

##               Country Years GDP per capita Child mortality Democracy Score
## 1         Afghanistan  1800       634.4000          468.58              -6
## 2             Albania  1800       860.5880          375.20              NA
## 3             Algeria  1800      1360.0000          460.21              NA
## 4             Andorra  1800      1260.0000              NA              NA
## 5              Angola  1800       650.0000          485.68              NA
## 6 Antigua and Barbuda  1800       796.5934          473.60              NA
##   Life Expectancy Population Code   Region      Sub.Region
## 1         28.2110    3280000    4     Asia   Southern Asia
## 2         35.4000     410445    8   Europe Southern Europe
## 3         28.8224    2503218   12   Africa Northern Africa
## 4              NA       2654   20   Europe Southern Europe
## 5         26.9800    1567028   24   Africa   Middle Africa
## 6         33.5360      37000   28 Americas       Caribbean

gapdata <- left_join(gapdata, countries, by=c("Country"))
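
Because the join is on country names rather than ISO codes, it is worth checking which names fail to match. Here is a small sketch of one way to do that with dplyr's anti_join(), using made-up data frames (toy_gap and toy_countries are hypothetical names):

```r
suppressMessages(library(dplyr))

# Made-up stand-ins for the combined Gapminder data and the countries table
toy_gap <- data.frame(Country = c("Albania", "Abkhazia"),
                      Years = c(1800, 1800),
                      stringsAsFactors = FALSE)
toy_countries <- data.frame(Country = c("Albania", "Algeria"),
                            Region = c("Europe", "Africa"),
                            stringsAsFactors = FALSE)

# Rows of toy_gap that have no matching Country in toy_countries
unmatched <- anti_join(toy_gap, toy_countries, by = "Country")
print(unique(unmatched$Country))
```

Any country listed here would end up with NA for Region and Sub.Region after the left join.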

7) What is GoogleVis?

The googleVis package provides an interface between R and the Google Chart Tools.
Reference manual
Gesmann, Markus, and Diego de Castillo. Using the Google visualisation API with R. The R Journal 3.2 (2011): 40-44.
Using the Google Chart Tools with R: googleVis-0.5.10 Package Vignette
googleVis examples
Markdown example with knitr and googleVis
google-motion-charts-with-r

An overview of a GoogleVis Motion Chart

8) Implementing GoogleVis

Implementing GoogleVis is fairly easy. The design of the visualisation functions is fairly generic.

The name of the visualisation function is gvis + ChartType. So for the Motion Chart we have:

gvisMotionChart(data, idvar='Country', timevar='Years', colorvar='Region', xvar='GDP per capita', yvar='Life Expectancy', sizevar='Population', options=list(), chartid)

9) Parameters for GoogleVis

data: a data.frame
idvar: the id variable, “Country” in our case.
timevar: the time variable for the plot, “Years” in our case.
xvar: column name of a numerical vector in data to be plotted on the x-axis.
yvar: column name of a numerical vector in data to be plotted on the y-axis.
colorvar: column name of data that identifies bubbles in the same series. We will use “Region” in our case.
sizevar: values in this column are mapped to actual pixel values using the sizeAxis option. We will use “Population” in our case.

# install.packages('googleVis')
library(googleVis)

## 
## Welcome to googleVis version 0.5.10
## 
## Please read the Google API Terms of Use
## before you start using the package:
## https://developers.google.com/terms/
## 
## Note, the plot method of googleVis will by default use
## the standard browser to display its output.
## 
## See the googleVis package vignettes for more details,
## or visit http://github.com/mages/googleVis.
## 
## To suppress this message use:
## suppressPackageStartupMessages(library(googleVis))

# str(gapdata)
# head(gapdata)

chart<- gvisMotionChart(gapdata, idvar="Country", 
                        timevar="Years", 
                        colorvar ="Region", 
                        xvar = "GDP per capita",
                        yvar = "Life Expectancy", 
                        sizevar="Population",
                        chartid="GapminderData1")

The output of a googleVis function (gvisMotionChart in our case) is a list of lists (a nested list) containing information about the chart type, chart id, and the html code in a sub-list split into header, chart, caption and footer.

str(chart)

## List of 3
##  $ type   : chr "MotionChart"
##  $ chartid: chr "GapminderData1"
##  $ html   :List of 4
##   ..$ header : chr "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html xml"| __truncated__
##   ..$ chart  : Named chr [1:7] "<!-- MotionChart generated in R 3.2.2 by googleVis 0.5.10 package -->\n<!-- Tue Oct  6 15:32:38 2015 -->\n\n\n<!-- jsHeader -->"| __truncated__ "\n// jsData \nfunction gvisDataGapminderData1 () {\nvar data = new google.visualization.DataTable();\nvar datajson =\n[\n [\n \"| __truncated__ "\n// jsDrawChart\nfunction drawChartGapminderData1() {\nvar data = gvisDataGapminderData1();\nvar options = {};\noptions[\"widt"| __truncated__ "\n// jsDisplayChart\n(function() {\nvar pkgs = window.__gvisPackages = window.__gvisPackages || [];\nvar callbacks = window.__g"| __truncated__ ...
##   .. ..- attr(*, "names")= chr [1:7] "jsHeader" "jsData" "jsDrawChart" "jsDisplayChart" ...
##   ..$ caption: chr "<div><span>Data: gapdata &#8226; Chart ID: <a href=\"Chart_GapminderData1.html\">GapminderData1</a> &#8226; <a href=\"https://g"| __truncated__
##   ..$ footer : chr "\n<!-- htmlFooter -->\n<span> \n  R version 3.2.2 (2015-08-14) \n  &#8226; <a href=\"https://developers.google.com/terms/\">Goo"| __truncated__
##  - attr(*, "class")= chr [1:2] "gvis" "list"
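
The same structure can be explored on a small scale. The sketch below uses Fruits, a demo data set that ships with googleVis, sets the chart size through the options list, and pulls out the individual parts of the returned object. The chart id and the sizes are arbitrary choices for illustration:

```r
suppressPackageStartupMessages(library(googleVis))

# Fruits is a small demo data set included in googleVis
demo_chart <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year",
                              options = list(width = 700, height = 500),
                              chartid = "FruitDemo")

demo_chart$type         # chart type
demo_chart$chartid      # the chart id passed in above
names(demo_chart$html)  # "header" "chart" "caption" "footer"

# Only the chart part (JavaScript plus the div), e.g. for embedding in a page
chart_html <- paste(demo_chart$html$chart, collapse = "\n")
```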

10) Data Visualization

Let’s plot the GoogleVis Motion Chart.

I had trouble getting the GoogleVis motion chart embedded in the output HTML, so I wrote the chart out to a separate HTML file.

plot(chart)

## starting httpd help server ... done

print(chart, "chart", file="Gapminder_Data_Visualization.html")

11) What are the Key Lessons from the Gapminder Data Visualization Project

Hans Rosling and Gapminder have made a big impact on data visualization and have shown how it can inform the public about widespread misperceptions.
The googlesheets R package allows for easy extraction of data sets which are stored in Google Sheets.
The different steps involved in reshaping and joining multiple data sets can be a little cumbersome. We could have used dplyr pipes more.
It would be a good practice for Gapminder to include the ISO country code in each of their data sets.
There is a need for a data set which lists country names, their ISO codes, as well as other categorical information such as Geographic Regions, Income groups, Landlocked, G77, OECD, etc.
It is relatively easy to implement a GoogleVis motion chart using R. However, it is difficult to change the configuration options. For instance, I was unable to make a simple change to the chart by adding a chart title.
Google Motion Charts provide a great way to visualize several variables at once and can be a great teaching tool for all sorts of data.
For instance, one can visualize the Great Divergence between the Western world and China, India or Japan, whereby the West had much faster economic growth, with attendant increases in life expectancy and other health indicators.

Screenshots of the Gapminder data sets visualized using GoogleVis: