Information Visualization Assignment 2: World Development Indicators. In Phase 1 of the assignment, we use ggplot() to create plots for three variables (Internet Usage, CO2 Emissions and Health Expenditure) to show how they change over time. In Phase 2, we use leaflet() to create interactive maps of Mobile Phone Usage in 1998 and 2017, to compare the two years and learn about development across the globe.
```r
library(tidyverse)
library(leaflet)
library(WDI)
```
We use the WDI package to retrieve the most up-to-date figures available. In this assignment, we fetch ten data series from the WDI:
| Tableau Name | WDI Series |
|---|---|
| Birth Rate | SP.DYN.CBRT.IN |
| Infant Mortality Rate | SP.DYN.IMRT.IN |
| Internet Usage | IT.NET.USER.ZS |
| Life Expectancy (Total) | SP.DYN.LE00.IN |
| Forest Area (% of land) | AG.LND.FRST.ZS |
| Mobile Phone Usage | IT.CEL.SETS.P2 |
| Population Total | SP.POP.TOTL |
| International Tourism receipts (current US$) | ST.INT.RCPT.CD |
| Import value index (2000=100) | TM.VAL.MRCH.XD.WD |
| Export value index (2000=100) | TX.VAL.MRCH.XD.WD |
The next code chunk will call the WDI API and fetch the years 1998 through 2018, as available. You will find that only a few variables have data for 2018. The dataframe will also contain the longitude and latitude of the capital city in each country.
The World Bank uses a complex, non-intuitive scheme for naming variables. For example, the Birth Rate series is called SP.DYN.CBRT.IN. The code assigns variable names that are more intuitive than the World Bank codes, and converts the geocodes from factors to numbers.
In our code, we will use the data frame called countries.
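The renaming step could also be driven by a single lookup vector that mirrors the table above, so each series code is written only once. The sketch below is a base-R illustration on a tiny hypothetical data frame; `lookup` and `demo` are illustrative names, not part of the assignment code.

```r
# Hypothetical lookup: WDI series code -> short column name (mirrors the table above)
lookup <- c(SP.DYN.CBRT.IN = "birth", SP.DYN.IMRT.IN = "infmort",
            IT.NET.USER.ZS = "net", SP.DYN.LE00.IN = "lifeexp",
            AG.LND.FRST.ZS = "forest", IT.CEL.SETS.P2 = "mobile",
            SP.POP.TOTL = "pop", ST.INT.RCPT.CD = "tour",
            TM.VAL.MRCH.XD.WD = "import", TX.VAL.MRCH.XD.WD = "export")

# Tiny hypothetical data frame with two WDI-style columns
demo <- data.frame(iso2c = "AD", year = 2007,
                   SP.DYN.CBRT.IN = 10.1, SP.DYN.IMRT.IN = 4.5)

# Rename only the columns that appear in the lookup; leave the others untouched
hit <- names(demo) %in% names(lookup)
names(demo)[hit] <- unname(lookup[names(demo)[hit]])
names(demo)  # "iso2c" "year" "birth" "infmort"
```

With dplyr 1.0 or later, the same lookup can be inverted and passed to `rename()`, e.g. `rename(countries, any_of(setNames(names(lookup), lookup)))`, since `rename()` expects `new = old` pairs.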
```r
birth <- "SP.DYN.CBRT.IN"
infmort <- "SP.DYN.IMRT.IN"
net <- "IT.NET.USER.ZS"
lifeexp <- "SP.DYN.LE00.IN"
forest <- "AG.LND.FRST.ZS"
mobile <- "IT.CEL.SETS.P2"
pop <- "SP.POP.TOTL"
tour <- "ST.INT.RCPT.CD"
import <- "TM.VAL.MRCH.XD.WD"
export <- "TX.VAL.MRCH.XD.WD"
# create a vector of the desired indicator series
indicators <- c(birth, infmort, net, lifeexp, forest,
                mobile, pop, tour, import, export)
countries <- WDI(country = "all", indicator = indicators,
                 start = 1998, end = 2018, extra = TRUE)
# rename columns for ease of reference
countries <- rename(countries, birth = SP.DYN.CBRT.IN,
                    infmort = SP.DYN.IMRT.IN, net = IT.NET.USER.ZS,
                    lifeexp = SP.DYN.LE00.IN, forest = AG.LND.FRST.ZS,
                    mobile = IT.CEL.SETS.P2, pop = SP.POP.TOTL,
                    tour = ST.INT.RCPT.CD, import = TM.VAL.MRCH.XD.WD,
                    export = TX.VAL.MRCH.XD.WD)
# convert geocodes from factors into numerics
countries$lng <- as.numeric(as.character(countries$longitude))
countries$lat <- as.numeric(as.character(countries$latitude))
# remove regional and income-group aggregates (e.g. "World"), which have no geocodes
countries <- countries %>%
  filter(!is.na(lng))
glimpse(countries)
```
```
## Observations: 4,410
## Variables: 22
## $ iso2c     <chr> "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", …
## $ country   <chr> "Andorra", "Andorra", "Andorra", "Andorra", "Andorra",…
## $ year      <int> 2018, 2007, 2004, 2005, 2017, 1998, 1999, 2000, 2006, …
## $ birth     <dbl> NA, 10.100, 10.900, 10.700, NA, 11.900, 12.600, 11.300…
## $ infmort   <dbl> 2.7, 4.5, 5.1, 4.9, 2.8, 6.4, 6.2, 5.9, 4.7, 5.5, 5.3,…
## $ net       <dbl> NA, 70.870000, 26.837954, 37.605766, 91.567467, 6.8862…
## $ lifeexp   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ forest    <dbl> NA, 34.042553, 34.042553, 34.042553, NA, 34.042553, 34…
## $ mobile    <dbl> 107.28255, 76.80204, 76.55160, 81.85933, 104.33241, 22…
## $ pop       <dbl> 77006, 82684, 76244, 78867, 77001, 64142, 64370, 65390…
## $ tour      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ import    <dbl> 136.50668, 190.30053, 174.09246, 178.06349, 146.27331,…
## $ export    <dbl> 268.35043, 332.78037, 271.81148, 314.89205, 264.92993,…
## $ iso3c     <fct> AND, AND, AND, AND, AND, AND, AND, AND, AND, AND, AND,…
## $ region    <fct> Europe & Central Asia, Europe & Central Asia, Europe &…
## $ capital   <fct> Andorra la Vella, Andorra la Vella, Andorra la Vella, …
## $ longitude <fct> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218…
## $ latitude  <fct> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, …
## $ income    <fct> High income, High income, High income, High income, Hi…
## $ lending   <fct> Not classified, Not classified, Not classified, Not cl…
## $ lng       <dbl> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218…
## $ lat       <dbl> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, …
```
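The `as.character()` step in the geocode conversion matters: calling `as.numeric()` directly on a factor returns its internal level codes rather than the numbers shown in its labels. A minimal base-R illustration with made-up coordinates:

```r
# Hypothetical longitudes stored as a factor, like `longitude` in the glimpse() output
lng_fct <- factor(c("1.5218", "-77.0369", "1.5218"))

as.numeric(lng_fct)                   # integer level codes (1 or 2), not longitudes
as.numeric(as.character(lng_fct))     # the intended values: 1.5218 -77.0369 1.5218
as.numeric(levels(lng_fct))[lng_fct]  # same result, converting each level only once
```

The last form is the idiom recommended in the R FAQ; it is equivalent to the `as.character()` route and only matters for speed on large factors.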
We decided that the clearest way to present the data was three consecutive plots showing the trends in Internet Usage, CO2 Emissions, and health expenditure as a share of GDP for Brazil, China, India, Russia, and the United States. We deliberately faceted the countries in the same order in every plot to encourage comparison. We also chose a simple, minimal, and consistent theme, free of extra visual noise, so that the trends are easy to read. We considered three ways of showing the trends: points, a line, or a smoothed line. Comparing the results iteratively, we found that the trends were less apparent with points, and that although smoothing produced “pretty” trends, it lost nuanced year-to-year information. We therefore chose a simple line plot. We also coloured the lines by region, so that countries in similar geographic locations can be compared easily.
The visualizations below enable comparisons between Brazil, China, India, Russia, and the United States in terms of Internet Usage, CO2 Emissions, and health expenditure as a share of GDP. Brazil, Russia, India, and China are known as the “BRIC” countries. Although they are still classed as developing economies, all four are undergoing similarly rapid economic development. Every variable we plot is an indicator of economic development, so the visualizations show how the BRIC countries are developing relative to the United States.
```r
# load data.table for fread()
library(data.table)
# import data and create a subset containing the required variables
world <- fread("~/Desktop/IV HW2/World Indicators.csv")
world_sub <- world[world$Country %in%
                     c('United States', 'Brazil', 'Russian Federation', 'India', 'China'),
                   c('Year', 'Country', 'Region', 'Internet Usage', 'CO2 Emissions', 'Health Exp % GDP')]
# data cleaning: keep only the year from dates of the form "12/1/YYYY",
# drop "%" signs so the percentages parse as numbers, and shorten Russia's name
world_sub$Year = as.factor(sub("12/1/", "", world_sub$Year))
world_sub$`Internet Usage` = as.numeric(sub("%", "", world_sub$`Internet Usage`))
world_sub$`Health Exp % GDP` = as.numeric(sub("%", "", world_sub$`Health Exp % GDP`))
world_sub$Country = sub("Russian Federation", "Russia", world_sub$Country)
```
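The cleaning calls can be checked on a few made-up strings in the shapes the code expects ("12/1/YYYY" dates and "NN%" percentages). The same sketch counts missing values per country: rows that are still `NA` after cleaning are the ones `geom_line()` later drops with a "Removed ... rows containing missing values" warning. All values and the `raw` data frame are hypothetical.

```r
# Hypothetical raw values in the shapes the cleaning code expects
raw <- data.frame(
  Country = c("Brazil", "Brazil", "India"),
  Year    = c("12/1/2000", "12/1/2001", "12/1/2000"),
  Net     = c("3%", NA, "1%"),
  stringsAsFactors = FALSE
)

# Same transformations as in the cleaning code: keep the year, drop the "%" sign
raw$Year <- as.factor(sub("12/1/", "", raw$Year))
raw$Net  <- as.numeric(sub("%", "", raw$Net))

# Count missing values per country (these rows are skipped when plotting)
na_counts <- tapply(is.na(raw$Net), raw$Country, sum)
na_counts
```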
In terms of Internet Usage, the BRIC countries have seen rapid growth since the start of the 21st century, while usage in the United States, on average, is increasing at a decreasing rate. This pattern is likely due to the “catch-up” effect, whereby developing countries can adopt existing technologies quickly and therefore grow rapidly.
```r
# plotting: Internet Usage map
internet_map <- ggplot(world_sub, aes(x = Year, y = `Internet Usage`, group = Country)) +
  geom_line(aes(col = Region)) +
  facet_wrap(~Country, ncol = 5) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, size = 7, vjust = 0.5)) +
  ylab("Internet Usage")
internet_map
```
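The three plotting chunks differ only in the column mapped to `y`, so they could share one helper. The sketch below is an assumption, not the code used for the figures: `plot_indicator()` and the demo data are hypothetical, and the column is selected with ggplot2's `.data[[var]]` pronoun. It keeps `group = Country`, which is essential here: because `Year` is a factor, ggplot2 would otherwise group the observations by the discrete variables (year and region), leaving one point per group and nothing for `geom_line()` to connect.

```r
library(ggplot2)

# Hypothetical helper: one faceted line plot for the indicator column named `var`
plot_indicator <- function(df, var) {
  ggplot(df, aes(x = Year, y = .data[[var]], group = Country)) +
    geom_line(aes(col = Region)) +
    facet_wrap(~Country, ncol = 5) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, size = 7, vjust = 0.5)) +
    ylab(var)
}

# Tiny hypothetical data set in the shape of world_sub
demo <- data.frame(
  Year    = factor(rep(c("2000", "2001", "2002"), times = 2)),
  Country = rep(c("Brazil", "India"), each = 3),
  Region  = rep(c("The Americas", "Asia"), each = 3),
  `Internet Usage` = c(3, 5, 9, 1, 2, 3),
  check.names = FALSE
)

p <- plot_indicator(demo, "Internet Usage")
```

With the real data, a call such as `plot_indicator(world_sub, "CO2 Emissions")` should draw the same chart as `co2_map`.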
The plot of CO2 Emissions tells an interesting story. Emissions in Brazil, India, and Russia show generally mild growth, while China's emissions grow rapidly as a result of rapid industrialization. The United States began the period at a high level, but its emissions have declined since 2006, likely owing to increased awareness of the environmental impact of CO2 emissions. For every country we can also see downward pressure on emissions in the wake of the 2008 Global Financial Crisis, which tempered the BRIC countries' growth and accelerated the decline in the United States.

```r
# plotting: CO2 Emissions map
co2_map <- ggplot(world_sub, aes(x = Year, y = `CO2 Emissions`, group = Country)) +
  geom_line(aes(col = Region)) +
  facet_wrap(~Country, ncol = 5) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, size = 7, vjust = 0.5)) +
  ylab("CO2 Emissions")
co2_map
```

```
## Warning: Removed 10 rows containing missing values (geom_path).
```

Health expenditure as a share of GDP rises over time for most of these countries. China and Russia both saw moderate growth over this period, while the United States and Brazil both saw more substantial growth (an increase of approximately 50%). India, on the other hand, saw a slight decline.
```r
# plotting: Health Exp % GDP map
health_map <- ggplot(world_sub, aes(x = Year, y = `Health Exp % GDP`, group = Country)) +
  geom_line(aes(col = Region)) +
  facet_wrap(~Country, ncol = 5) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, size = 7, vjust = 0.5)) +
  ylab("Health Exp % GDP")
health_map
```
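Statements such as the “approximately 50% increase” in health expenditure can be checked numerically as the percent change between each country's first and last observed years, `100 * (last - first) / first`. A base-R sketch, assuming a data frame shaped like `world_sub` with `Year` stored as a factor; the helper name and the demo values are hypothetical:

```r
# Hypothetical helper: percent change between each country's first and last non-NA year
pct_change_by_country <- function(df, var) {
  yrs <- as.numeric(as.character(df$Year))  # Year is a factor, so convert via character
  sapply(split(seq_len(nrow(df)), df$Country), function(idx) {
    idx <- idx[!is.na(df[[var]][idx])]      # ignore missing observations
    idx <- idx[order(yrs[idx])]             # sort this country's rows by year
    first <- df[[var]][idx[1]]
    last  <- df[[var]][idx[length(idx)]]
    100 * (last - first) / first
  })
}

# Made-up values, deliberately not in year order
demo <- data.frame(
  Country = c("A", "A", "B", "B", "B"),
  Year    = factor(c("2012", "2000", "2000", "2006", "2012")),
  Health  = c(9, 6, 4, NA, 3)
)
pct_change_by_country(demo, "Health")  # A: 50, B: -25
```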
Overall, these plots tell a clear story: the United States, a developed country, starts with high values on each indicator, while the developing BRIC countries, in general, begin to catch up with it.
We decided to explore the change in mobile phone usage between 1998 and 2017. We created two interactive world maps that show country names and pop-up bubbles containing the mobile phone usage indicator (mobile cellular subscriptions per 100 people).
We created a new variable, mobilecap, which combines the name of the capital with the mobile usage indicator. Showing mobilecap in the pop-up bubbles provides detailed information and avoids possible confusion about locations, especially when the map is zoomed out. We chose the ‘Esri’ base map, marked each capital city with a circle marker, and used different shades of blue to indicate each country's level of mobile usage relative to the other countries in the same year (colorQuantile() assigns the shades by quantile). Markers for missing data are grey instead of blue. Users can compare mobile usage between or within regions by zooming the map, and can click on a circle marker to see the details for that country. Users can also compare the two maps to see how the world developed over nearly two decades.
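Because `mobilecap` is built with `paste()`, the pop-ups show the raw value with many decimals, and the literal text "NA" for missing data. A possible refinement, sketched below with a hypothetical helper `make_popup()`, rounds the value and labels missing data explicitly; the unit text assumes the IT.CEL.SETS.P2 definition, mobile cellular subscriptions per 100 people.

```r
# Hypothetical helper: "Capital: value" rounded to one decimal, or "no data" when missing
make_popup <- function(capital, mobile) {
  capital <- as.character(capital)  # capital is a factor in the WDI data
  ifelse(is.na(mobile),
         paste0(capital, ": no data"),
         sprintf("%s: %.1f subscriptions per 100 people", capital, mobile))
}

# "Capital B" and its missing value are made up for illustration
popups <- make_popup(c("Andorra la Vella", "Capital B"), c(107.28255, NA))
popups
```

In the map code, `countries$mobilecap <- make_popup(countries$capital, countries$mobile)` would then replace the `paste()` call.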
As shown in the map below, mobile usage in North America and Europe in 1998 was the highest of any region. Africa, covered largely by near-white markers, shows extremely low mobile usage. Usage in the more developed parts of South East Asia was almost as high as in developed countries, but in other parts of Asia usage was still below average.
```r
# create a new variable for the popup: "capital: mobile usage"
countries$mobilecap <- paste(as.character(countries$capital), as.character(countries$mobile), sep = ': ')
# create a subset that contains only the 1998 observations
sub1998 <- countries[countries$year %in% 1998, ]
# create a palette of varying shades of blue (quantile-based; missing values are grey)
pal1998 <- colorQuantile("Blues", sub1998$mobile)
# plotting
mobilemap_1998 <- leaflet() %>%
  addProviderTiles("Esri") %>%
  addCircleMarkers(lng = sub1998$lng,
                   lat = sub1998$lat,
                   color = pal1998(sub1998$mobile),
                   popup = sub1998$mobilecap)
mobilemap_1998
```
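colorQuantile() assigns shades by quantile: with its defaults it splits the values into four equal-count bins (`n = 4`) and colours missing values with its `na.color`, a grey (`#808080`). The base-R sketch below approximates only the binning step, using `cut()` on quantile breaks closed on the left; the values and shade labels are hypothetical, and leaflet's actual colours are interpolated from the "Blues" palette rather than the four names used here.

```r
# Approximate quantile binning: n equal-count bins, intervals closed on the left
quantile_bin <- function(x, n = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE, names = FALSE)
  as.integer(cut(x, breaks, right = FALSE, include.lowest = TRUE))
}

mobile_demo <- c(5, 10, 20, 40, NA, 80)  # hypothetical subscriptions per 100 people
bins <- quantile_bin(mobile_demo)        # 1 2 3 4 NA 4

# Map bins to placeholder shade names; missing values get grey
shade <- c("lightest", "light", "dark", "darkest")[bins]
shade[is.na(bins)] <- "grey"
shade
```

Because the bins hold equal counts, a shade says where a country ranks among that year's countries, not how many subscriptions it has.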
Comparing the 1998 map with the 2017 version, we can see significant growth, particularly in South-East Asia and Oceania. Southern and south-western Africa also saw a significant increase. Populous areas around the world appear to have shared in this growth. Another region worth noting is Latin America, where usage levels are spread more evenly, particularly in Central America.
```r
# create a subset that contains only the 2017 observations
sub2017 <- countries[countries$year %in% 2017, ]
# create a palette of varying shades of blue (quantile-based; missing values are grey)
pal2017 <- colorQuantile("Blues", sub2017$mobile)
# plotting
mobilemap_2017 <- leaflet() %>%
  addProviderTiles("Esri") %>%
  addCircleMarkers(lng = sub2017$lng,
                   lat = sub2017$lat,
                   color = pal2017(sub2017$mobile),
                   popup = sub2017$mobilecap)
mobilemap_2017
```
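Because pal1998 and pal2017 are each built from their own year's quantiles, a shade on one map means "this quartile of that year", so the same shade on the two maps can correspond to very different usage levels. The sketch below shows the effect with hypothetical values: a usage level of 30 falls in the top bin of the 1998-style breaks but in the bottom bin of the 2017-style breaks, whereas breaks computed from both years pooled place it in the same bin on either map. In leaflet, one option along these lines would be a single palette built from both years, for example `colorQuantile("Blues", c(sub1998$mobile, sub2017$mobile))`, used for both maps; this is a suggestion, not what the maps in this report use.

```r
# Hypothetical mobile subscription rates (per 100 people) for two years
y1998 <- c(0.5, 2, 10, 30, 60)
y2017 <- c(30, 80, 110, 130, 150)

# Quartile breaks, and left-closed binning of a value against given breaks
qbreaks <- function(x) quantile(x, probs = seq(0, 1, 0.25), na.rm = TRUE, names = FALSE)
bin_of  <- function(x, breaks) as.integer(cut(x, breaks, right = FALSE, include.lowest = TRUE))

# Separate palettes per year: the same value lands in different bins
bin_of(30, qbreaks(y1998))  # 4 (top bin)
bin_of(30, qbreaks(y2017))  # 1 (bottom bin)

# One set of breaks from both years pooled: the bin no longer depends on the year
shared <- qbreaks(c(y1998, y2017))
bin_of(30, shared)          # 2 on either map
```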