---
title: "Viz3"
output:
  html_document:
    df_print: paged
---
You will use this code as a template for your Visualization 3 assignment. The first step is to call a set of packages that you might use in this assignment. The final choices belong to you.
Note that each code chunk is set off with special tags.
library(tidyverse)
library(ggvis)
library(WDI)
library(plotly)
library(choroplethr)
In this assignment, we need to update 8 series:
| Tableau Name | WDI Series |
|---|---|
| Birth Rate | SP.DYN.CBRT.IN |
| Health Exp % GDP | SH.XPD.TOTL.ZS |
| Health Exp/Capita | SH.XPD.PCAP |
| Infant Mortality Rate | SP.DYN.IMRT.IN |
| Internet Usage | IT.NET.USER.ZS |
| Life Expectancy (Total) | SP.DYN.LE00.IN |
| Mobile Phone Usage | IT.CEL.SETS.P2 |
| Population Total | SP.POP.TOTL |
The next code chunk will call the WDI API and fetch the years 2000 through 2016, as available. It will then remove the regional and other aggregate groupings, leaving only individual countries.
birth <- "SP.DYN.CBRT.IN"
hxpgdp <- "SH.XPD.TOTL.ZS"
hxpcap <- "SH.XPD.PCAP"
infmort <- "SP.DYN.IMRT.IN"
net <- "IT.NET.USER.ZS"
lifeexp <- "SP.DYN.LE00.IN"
mobile <- "IT.CEL.SETS.P2"
pop <- "SP.POP.TOTL"
# create a vector of the desired indicator series
indicators <- c(birth, hxpgdp, hxpcap, infmort, net, lifeexp, mobile, pop)
newdata <- WDI(country="all", indicator = indicators,
start = 2000, end = 2016, extra = TRUE)
# remove country groupings
newdata$longitude[newdata$longitude==""] <- NA
countries <- filter(newdata, !is.na(longitude)) # drop aggregate groups
## rename columns for ease of reference
countries <- rename(countries, birth = SP.DYN.CBRT.IN,
hxpgdp = SH.XPD.TOTL.ZS, hxpcap = SH.XPD.PCAP,
infmort = SP.DYN.IMRT.IN, net = IT.NET.USER.ZS,
lifeexp = SP.DYN.LE00.IN, mobile = IT.CEL.SETS.P2,
pop = SP.POP.TOTL)
countries$latitude <- as.numeric(as.character(countries$latitude))
countries$longitude <- as.numeric(as.character(countries$longitude))
glimpse(countries) ## data frame column names appear here
## Observations: 3,485
## Variables: 18
## $ iso2c <chr> "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD"...
## $ country <chr> "Andorra", "Andorra", "Andorra", "Andorra", "Andorra...
## $ year <dbl> 2013, 2006, 2015, 2004, 2005, 2007, 2003, 2008, 2009...
## $ birth <dbl> NA, 10.600, NA, 10.900, 10.700, 10.100, 10.300, 10.4...
## $ hxpgdp <dbl> 11.478046, 5.313893, NA, 5.703882, 5.223353, 6.33545...
## $ hxpcap <dbl> 4914.3912, 2256.1030, NA, 2127.7367, 2089.6678, 2997...
## $ infmort <dbl> 2.7, 3.3, 2.5, 3.5, 3.3, 3.2, 3.6, 3.1, 3.0, 2.9, 2....
## $ net <dbl> 94.00000, 48.93685, 96.91000, 26.83795, 37.60577, 70...
## $ lifeexp <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ mobile <dbl> 80.70262, 84.27764, 88.12353, 73.82494, 79.48487, 78...
## $ pop <dbl> 80788, 80991, 78014, 76244, 78867, 82683, 73182, 838...
## $ iso3c <fctr> AND, AND, AND, AND, AND, AND, AND, AND, AND, AND, A...
## $ region <fctr> Europe & Central Asia (all income levels), Europe &...
## $ capital <fctr> Andorra la Vella, Andorra la Vella, Andorra la Vell...
## $ longitude <dbl> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.52...
## $ latitude <dbl> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075...
## $ income <fctr> High income: nonOECD, High income: nonOECD, High in...
## $ lending <fctr> Not classified, Not classified, Not classified, Not...
For the second graph, I used plotly and tried to layer bubbles on top of the map with the same size and location as in the Tableau map. My first step, “q”, was meant to describe the aesthetics for the map as a whole and pass them to plot_geo for p; however, that didn’t work out as I had hoped.
p <- plot_geo(data = countries, lat = ~latitude, lon = ~longitude,
text = ~paste(country,
'<br>Population:', pop,
'<br>Birth Rate:', birth))
p
## No scattergeo mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
This graph shows a decrease in birth rate over time…
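The bubble layer described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `bubbles` data frame, its "Country A/B/C" rows, and all of its values are made up for the example, and sizing the markers by `pop` is one plausible reading of "the same size as in the Tableau map", not a reproduction of it.

```r
# Hypothetical stand-in for one year of the `countries` data frame.
# All names and values below are illustrative, not real WDI figures.
bubbles <- data.frame(
  country   = c("Country A", "Country B", "Country C"),
  latitude  = c(42.5, -15.8, -1.3),
  longitude = c(1.5, -47.9, 36.8),
  pop       = c(80788, 5234000, 31250000),
  birth     = c(10.1, 15.4, 33.2),
  stringsAsFactors = FALSE
)

# Build the hover label once so the plotting call stays short
bubbles$label <- paste0(
  bubbles$country,
  "<br>Population: ",
  format(bubbles$pop, big.mark = ",", trim = TRUE, scientific = FALSE),
  "<br>Birth Rate: ", bubbles$birth
)

# Only draw the map if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  # add_markers() sets the trace mode explicitly; size = ~pop scales each bubble
  p_map <- plot_geo(bubbles, lat = ~latitude, lon = ~longitude) %>%
    add_markers(size = ~pop, text = ~label, hoverinfo = "text")
  # print p_map to display the map
}
```

Filtering to a single year before plotting keeps each country to one bubble; with all years present, the markers for a country would be stacked at the same coordinates.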
The heatmap function worked well in this regard and allowed me to easily visualize the required variables. I was disappointed, as was discussed in class, that it’s not that easy to sort or crop values as needed. Perhaps a prerequisite for the class should be a module on data cleaning before going into visualization.
heat <- countries[,c("birth", "infmort", "hxpgdp","lifeexp","year","country")]
heat_matrix <- data.matrix(heat)
## Warning in data.matrix(heat): NAs introduced by coercion
heat_graph <- heatmap(heat_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c(5,10))
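The coercion warning comes from passing the character `country` column to `data.matrix()`. One possible way around it, sketched here on a made-up `toy` data frame (its values are illustrative only), is to keep just the numeric indicator columns in the matrix and move country and year into the row names:

```r
# Hypothetical stand-in for `countries`; values are illustrative only
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2011, 2012, 2011, 2012),
  birth   = c(10.2, 10.0, 30.5, 29.8),
  infmort = c(3.1, 3.0, 45.0, 43.2),
  hxpgdp  = c(6.5, 6.8, 4.1, 4.3),
  lifeexp = c(81.0, 81.2, 60.3, 60.9),
  stringsAsFactors = FALSE
)

# Drop rows with missing values so every cell can be coloured
toy <- toy[complete.cases(toy), ]

# Only numeric indicator columns go into the matrix
num_cols <- c("birth", "infmort", "hxpgdp", "lifeexp")
heat_matrix <- as.matrix(toy[, num_cols])

# Country and year become row labels instead of matrix columns
rownames(heat_matrix) <- paste(toy$country, toy$year)

heatmap(heat_matrix, Rowv = NA, Colv = NA,
        col = heat.colors(256), scale = "column", margins = c(5, 10))
```

Sorting or cropping can then be done on the data frame before the matrix is built, for example by subsetting to a single year.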
I wanted to use plotly because it’s awesome and allows for some great interactive graphs. I had intended to add some faceting but, unlike in ggplot, that wasn’t easy to do.
countries2012 <- countries %>% filter(year==2012)
hxsort <- countries2012[order(countries2012$hxpgdp),]
p <- plot_ly(x = hxsort$hxpgdp, y = hxsort$country, type = 'bar', orientation = 'h')
p
## Warning: Ignoring 17 observations
Again, not ideal, but a good start toward the graph in Tableau. This graph shows health expenditure as a percentage of GDP for each country in 2012. As you can see, I attempted to sort the data as was done in Tableau, but the sort is not reflected in the chart: plotly orders a character axis alphabetically rather than by row order, so sorting the data frame alone is not enough.
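One way to get a sorted bar chart, sketched here on a made-up `toy` data frame, is to turn the country names into a factor whose levels follow the sorted values; plotly then draws the categories in level order. The data and the `p_sorted` name are assumptions for illustration.

```r
# Hypothetical stand-in for `countries2012`; values are illustrative only
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  hxpgdp  = c(6.8, NA, 4.3, 9.1),
  stringsAsFactors = FALSE
)

# Drop missing values explicitly instead of letting the plot ignore them
toy <- toy[!is.na(toy$hxpgdp), ]

# Order the factor levels by expenditure so the axis follows the data
toy$country <- factor(toy$country, levels = toy$country[order(toy$hxpgdp)])
levels(toy$country)

# Only draw the chart if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  p_sorted <- plotly::plot_ly(toy, x = ~hxpgdp, y = ~country,
                              type = "bar", orientation = "h")
  # print p_sorted to display the chart
}
```

Removing the `NA` rows first also avoids the "Ignoring … observations" warning seen above.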
I originally wanted to use ggplot2 to graph these figures, since it would allow me to easily layer a bar graph and geom_smooth together on the same graph. It actually worked well when I did it, but the problem was that I hadn’t averaged the values by year…
ggvis made it much easier to convert the values to integers and graph them correctly by year, but I struggled to add a layer_smooth of internet and phone usage.
countries$net3 <- as.integer(countries$net)
countries$mobile3 <- as.integer(countries$mobile)
countries2 <- countries %>% mutate(year = factor(year))
countries2 %>%
ggvis(~year, ~mobile3) %>%
layer_bars(x=~year, y=~mobile3, fill := "green", width = .6)
Well, the bar graph looks fine. I’m fairly happy with it, actually, but a bit frustrated that it wasn’t easy to layer the line graph on top after turning the years into factors.
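A possible route around both problems, averaging by year and layering a line over the bars, is sketched below. It is an assumption-laden illustration rather than what was done for this assignment: the `toy` data are made up, it uses ggplot2 with `geom_line` in place of ggvis and `layer_smooth`, and it keeps `year` numeric so the line can connect the points.

```r
# Hypothetical stand-in for `countries`; values are illustrative only
toy <- data.frame(
  year   = c(2010, 2010, 2011, 2011, 2012, 2012),
  mobile = c(60, 80, 70, NA, 90, 100),
  net    = c(20, 40, 30, 50, NA, 60)
)

# Average both series by year; na.pass keeps rows with an NA in one column,
# and na.rm = TRUE drops those NAs inside each mean
yearly <- aggregate(cbind(mobile, net) ~ year, data = toy,
                    FUN = mean, na.rm = TRUE, na.action = na.pass)
yearly

# Only build the plot if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(yearly, aes(x = year)) +
    geom_col(aes(y = mobile), fill = "green", width = 0.6) +  # bars: mobile usage
    geom_line(aes(y = net)) +                                 # line: internet usage
    labs(y = "Mean value per year")
  # print g to display the plot
}
```

Because each year now has a single averaged row, the bars no longer stack every country’s value for that year.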
Overall, this is a tough assignment, and I anticipate I’m going to lose points on students’ reviews because I didn’t analyze changes over time, but instead commented on the graphing process. With that said, it appears that technology usage is increasing over time and that health expenditure per capita is increasing as well.