title: “Viz3” output: html_document: df_print: paged —

Introduction

Install the packages used in this assignment:

library(ggplot2)
library(dplyr)
library(ggvis)
library(WDI)
library(plotly)
library(tidyr)
library(scales)

Call package WDI to retrieve most updated figures available.

In this assignment, we need to update 8 data series from the WDI:

Tableau Name WDI Series
Birth Rate SP.DYN.CBRT.IN
Health Exp % GDP SH.XPD.TOTL.ZS
Health Exp/Capita SH.XPD.PCAP
Infant Mortality Rate SP.DYN.IMRT.IN
Internet Usage IT.NET.USER.ZS
Life Expectancy (Total) SP.DYN.LE00.IN
Mobile Phone Usage IT.CEL.SETS.P2
Population Total SP.POP.TOTL

The next code chunk will call the WDI API and fetch the years 2000 through 2016, as available. It will then remove the country regional and other aggregates.

birth <- "SP.DYN.CBRT.IN"
hxpgdp <- "SH.XPD.TOTL.ZS"
hxpcap <- "SH.XPD.PCAP"
infmort <- "SP.DYN.IMRT.IN"
net <-"IT.NET.USER.ZS"
lifeexp <- "SP.DYN.LE00.IN"
mobile <- "IT.CEL.SETS.P2"
pop <- "SP.POP.TOTL"

# create a vector of the desired indicator series
indicators <- c(birth, hxpgdp, hxpcap, infmort, net, lifeexp, mobile, pop)

newdata <- WDI(country="all", indicator = indicators, 
     start = 2000, end = 2017, extra = TRUE)

# remove country groupings
newdata$longitude[newdata$longitude==""] <- NA
countries <- filter(newdata, !is.na(longitude))  # drop aggregate groups
## rename columns for each of reference
countries <- rename(countries, birth = SP.DYN.CBRT.IN, 
       hxpgdp = SH.XPD.TOTL.ZS, hxpcap = SH.XPD.PCAP, 
       infmort = SP.DYN.IMRT.IN, net  = IT.NET.USER.ZS,
       lifeexp = SP.DYN.LE00.IN, mobile = IT.CEL.SETS.P2, 
       pop = SP.POP.TOTL)

glimpse(countries) ## data frame column names appear here
## Observations: 3,793
## Variables: 18
## $ iso2c     <chr> "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD"...
## $ country   <chr> "Andorra", "Andorra", "Andorra", "Andorra", "Andorra...
## $ year      <dbl> 2004, 2014, 2015, 2016, 2017, 2002, 2005, 2006, 2007...
## $ birth     <dbl> 10.900, NA, NA, 8.800, NA, 11.200, 10.700, 10.600, 1...
## $ hxpgdp    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ hxpcap    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ infmort   <dbl> 4.2, 3.5, 3.4, 3.3, 3.2, 4.5, 4.1, 4.1, 4.0, 3.9, 3....
## $ net       <dbl> 26.83795, 95.90000, 96.91000, 97.93064, NA, 11.26047...
## $ lifeexp   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ mobile    <dbl> 76.55160, 83.61334, 91.44000, 98.51322, 104.38121, 4...
## $ pop       <dbl> 76244, 79223, 78014, 77281, 76965, 70049, 78867, 809...
## $ iso3c     <fct> AND, AND, AND, AND, AND, AND, AND, AND, AND, AND, AN...
## $ region    <fct> Europe & Central Asia, Europe & Central Asia, Europe...
## $ capital   <fct> Andorra la Vella, Andorra la Vella, Andorra la Vella...
## $ longitude <fct> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.52...
## $ latitude  <fct> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075...
## $ income    <fct> High income, High income, High income, High income, ...
## $ lending   <fct> Not classified, Not classified, Not classified, Not ...

1. Health Indicators

Explanation of plot

This graph averaging the measures over all available years is created by ggplot package. Each line in the health indicator plot in tableau has country as y variable and 4 different indicators’ values as X variables. To get the values, I created a new dataset to calculate the values grouped by country. Dealing with the missing value, the mean values is calculated with N/A being removed. In this way, calculated value is not biased downward.

To create rectangles, I used geom_bar() to create bar plot and then did coord_flip() to prepare for putting all the bar plot together later. To make the color of rectangles change with its value, I applied scale_fill_gradient() and define the 2 colors for lowest value and highest value respectively for each plot. After creating 4 plots for 4 health indicators, I used subplot() to put 4 plots together and used layout() to add titles for 4 plots (in a naive way, which is adding blank in the same line).

#calculate mean and then modify the N/A
data1 = countries %>% group_by(country) %>% summarise(birth = mean(birth/1000,na.rm = TRUE),infmort= mean(infmort/1000,na.rm = TRUE),hxpgdp = mean(hxpgdp/100,na.rm = TRUE),lifeexp = mean(lifeexp,na.rm = TRUE))
data1$birth[is.na(data1$birth)] = 0
data1$infmort[is.na(data1$infmort)] = 0
data1$hxpgdp[is.na(data1$hxpgdp)] = 0 
data1$lifeexp[is.na(data1$lifeexp)] = 0

p1 = data1 %>% ggplot(aes(x = country, y = birth, fill = birth)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_flip() +
  geom_text(aes(label = percent(birth), y = 0.8), color = "black", size = 3, vjust = 0.5) +
  labs(x = "") +
  scale_fill_gradient(low = "light grey", high = "dodgerblue4") +
  theme(axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none")
p1 = ggplotly(p1, height = 5000, tooltip = c("x", "fill"))

p2 = data1 %>% ggplot(aes(x = country, y = infmort, fill =infmort)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_flip() +
  geom_text(aes(label = round(infmort,2), y = 0.8), color = "black", size = 3, vjust = 0.5) +
  labs(x = "") +
  scale_fill_gradient(low = "light grey", high = "orange") +
  theme(axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none")
p2 = ggplotly(p2, height = 5000, tooltip = c("x", "fill"))

p3 = data1 %>% ggplot(aes(x = country, y = hxpgdp, fill =hxpgdp)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_flip() +
  geom_text(aes(label = percent(hxpgdp), y = 0.8), color = "black", size = 3, vjust = 0.5) +
  labs(x = "") +
  scale_fill_gradient(low = "light grey", high = "lightgoldenrod4") +
  theme(axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none")
p3 = ggplotly(p3, height = 5000, tooltip = c("x", "fill"))

p4 = data1 %>% ggplot(aes(x = country, y = lifeexp, fill = lifeexp)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_flip() +
  geom_text(aes(label = round(lifeexp,0), y = 0.8), color = "black", size = 3, vjust = 0.5) +
  labs(x = "") +
  scale_fill_gradient(low = "light grey", high = "mediumpurple4") +
  theme(axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none")
p4 = ggplotly(p4, height = 5000, tooltip = c("x", "fill"))

subplot(p1,p2,p3,p4,margin = 0,shareY = TRUE)%>% layout(title='                   Country                      Birth Rate                 Infant Mortality      Health Exp % GDP       Life Expectancy',titlefont=list(size=12))

Comparison

Comparing to the original Tableau version, this graph didn’t contain “Health Indicator” as the main title. Those N/As are showed as 0 rather then nothing in the rectangle. Furthermore, it lacks the interactive part that user can’t choose to highlight a column or a certain rectangle. Also, I didn’t manage to add legend of color at the side of the plot.

Has the world changed much?

This plot only shows the average value of all years for each country. Therefore it cannot tell the trend since 2012.

2. Care Spending

Explanation of plot

Using plotly package, this plot is a combination of 2 bar plots with country being the y variable and hxpgdp and hxpcap being x variables respectively. In order to sort the data from highest to lowest, reorder() is applied on the y variable. The main function plot_ly offers many options. To add a slider for choosing the year, set frame=~year; to mark the country of a certain year that spent more than 14% of GDP on healthcare, set color = ~hxpgdp>14. In the end, using subplot to put 2 bar plots together.

## Your plotting code goes here
data2 <- countries
data2$hxpgdp[is.na(data2$hxpgdp)] = 0 
data2$hxpcap[is.na(data2$hxpcap)] = 0

c1 <- data2 %>% plot_ly(x=~hxpgdp,y=~reorder(country,hxpgdp), type="bar",frame=~year,height = 4000, width = 1000, name = 'Health Expenditure (% of GDP)', color = ~hxpgdp >14, colors = c('grey','indianred')) 


c2<- data2 %>% plot_ly(x=~hxpcap,y=~country,type="bar",frame=~year,height = 4000, width = 1000, name = 'Health Expenditure (per capita)',color = ~hxpgdp >14, colors = c('grey','indianred'))


subplot(c1,c2, shareY = TRUE, shareX = FALSE,margin = 0.05) %>%  layout(title = "Which Countries Spend More Than 14% of GDP on Healthcare?",
         xaxis = list(title = "",range=c(0,20)),
         yaxis = list(title = "",width=100))

Comparison

This plot lacks some interactive part that it cannot choose the threshold value to be mark as red color or choose to highlight a column or a certain rectangle. Also, compared to tableau, this plot lacks legends to indicate that the red color means the bar value is larger than 14%.

Has the world changed much since 2012?

For Health Expenditure (% of GDP), this plot lacks data after 2012. For ‘Health Expenditure (per capita)’, data show the value almost stay the same since 2012 which may due to the data wasn’t updated or they were collected once each 5 years.

3. Technology graph

Explanation

This plot is created by ggvis package. To add the select buttons for choosing region and country, the distinct values of these 2 variables were stored in 2 vectors to be assigned to choice parameter in input_select(). Then in ggvis(), I combined 3 layers. layer_bars is for using bars to represent “mobile usage”“, layer_lines and layer_points are for creating a line with points to show the”net usage“.

Technology <- countries
region <- as.vector(unique(Technology$region))
country <- as.vector(unique(Technology$country))

Technology  %>% ggvis(~factor(year),y=~mobile) %>%
  filter( region == eval(input_select(choices = region, label = "Region"))&country == eval(input_select(choices = country, label = "Country"))) %>% 
  layer_bars(fill := "darkcyan")%>%
  layer_lines(~factor(year), ~net, stroke := 'gold') %>%
  layer_points(~factor(year), ~net, stroke := 'gold', fill := 'gold')%>%
  add_axis("x", title = "Year") %>%
  add_axis("y", title = "Use as Percentage of Population")
## Warning: Can't output dynamic/interactive ggvis plots in a knitr document.
## Generating a static (non-dynamic, non-interactive) version of the plot.

Comparison

Comparing to the original Tableau version, this graph didn’t contain the main title. Furthermore, it lacks the interactive part that user can’t choose to highlight a column. Also, I didn’t manage to add legend to explain the variables of bars and line at the side of the plot.

Has the world changed much?

Since 2012, use of internet and mobile phone as percentage of population has been increasing. In Africa, Asia and The Middle East, the developing speed is faster than the rest of the world.

Conclusion

As for health indicators, the countries with higher birth rate tend to have higher infant mortality, which are located more in Africa. Highly developed countries tend to have higher life expectancy. As for care spending, countries has been increasing the expenditure of healthcare compared to GDP. As for the usage of internet and mobile phone, it is clear that the percentage of population has been increasing worldwidely. Specially, the developing speed is faster since 2012 in Africa, Asia and The Middle East.