Introduction

This report analyzes global changes in Internet Usage, CO2 Emissions, Health Expenditure Proportion of GDP, and Birth Rate using WDI country data from 1998 to 2018. It presents visualizations of these changes together with the insights drawn from them.

Step 1: library calls to load packages

library(tidyverse)
library(leaflet)
library(WDI)
library(ggplot2)
library(plotly)
library(data.table)
library(tidyr)

Step 2: Call package WDI to retrieve most updated figures available.

In this assignment, we will fetch ten data series from the WDI:

Tableau Name                                    WDI Series
Birth Rate                                      SP.DYN.CBRT.IN
Infant Mortality Rate                           SP.DYN.IMRT.IN
Internet Usage                                  IT.NET.USER.ZS
Life Expectancy (Total)                         SP.DYN.LE00.IN
Forest Area (% of land)                         AG.LND.FRST.ZS
Mobile Phone Usage                              IT.CEL.SETS.P2
Population Total                                SP.POP.TOTL
International Tourism receipts (current US$)    ST.INT.RCPT.CD
Import value index (2000=100)                   TM.VAL.MRCH.XD.WD
Export value index (2000=100)                   TX.VAL.MRCH.XD.WD

The next code chunk will call the WDI API and fetch the years 1998 through 2018, as available. You will find that only a few variables have data for 2018. The dataframe will also contain the longitude and latitude of the capital city in each country.

Note: This notebook takes approximately 2 minutes to run. Both the WDI call and the knitting of the file are time-consuming, so be patient.

The World Bank uses a complex, non-intuitive scheme for naming variables. For example, the Birth Rate series is called SP.DYN.CBRT.IN. The code assigns variable names that are more intuitive than the codes assigned by the World Bank, and converts the geocodes from factors to numbers.

In your code, you will use the data frame called countries.

birth <- "SP.DYN.CBRT.IN"
infmort <- "SP.DYN.IMRT.IN"
net <-"IT.NET.USER.ZS"
lifeexp <- "SP.DYN.LE00.IN"
forest <- "AG.LND.FRST.ZS"
mobile <- "IT.CEL.SETS.P2"
pop <- "SP.POP.TOTL"
tour <- "ST.INT.RCPT.CD"
import <- "TM.VAL.MRCH.XD.WD"
export <- "TX.VAL.MRCH.XD.WD"

# create a vector of the desired indicator series
indicators <- c(birth, infmort, net, lifeexp, forest,
                mobile, pop, tour, import, export)

countries <- WDI(country="all", indicator = indicators, 
     start = 1998, end = 2018, extra = TRUE)

## rename columns for ease of reference
countries <- rename(countries, birth = SP.DYN.CBRT.IN, 
       infmort = SP.DYN.IMRT.IN, net  = IT.NET.USER.ZS,
       lifeexp = SP.DYN.LE00.IN, forest = AG.LND.FRST.ZS,
       mobile = IT.CEL.SETS.P2, pop = SP.POP.TOTL, 
       tour = ST.INT.RCPT.CD, import = TM.VAL.MRCH.XD.WD,
       export = TX.VAL.MRCH.XD.WD)

# convert geocodes from factors into numerics

countries$lng <- as.numeric(as.character(countries$longitude))
countries$lat <- as.numeric(as.character(countries$latitude))

# Remove groupings, which have no geocodes
countries <- countries %>%
   filter(!is.na(lng))
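
The conversion above goes through as.character() before as.numeric(). A minimal sketch of why this matters, assuming the geocode columns arrive as factors; the coordinate values here are toy data, not WDI output:

```r
# Toy factor of longitudes (hypothetical values for illustration only)
lon_factor <- factor(c("1.5218", "-77.032", "1.5218"))

# Converting a factor directly returns its internal integer level codes
codes <- as.numeric(lon_factor)

# Converting via the character labels recovers the actual coordinates
values <- as.numeric(as.character(lon_factor))

print(codes)
print(values)
```

Calling as.numeric() directly on a factor would silently place every marker at a small integer coordinate, so the intermediate as.character() step is the safer habit.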

A Glimpse of the new dataframe

The overall structure of data:

glimpse(countries)
## Observations: 4,410
## Variables: 22
## $ iso2c     <chr> "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD",…
## $ country   <chr> "Andorra", "Andorra", "Andorra", "Andorra", "Andorra"…
## $ year      <int> 2018, 2007, 2004, 2005, 2017, 1998, 1999, 2000, 2006,…
## $ birth     <dbl> NA, 10.100, 10.900, 10.700, NA, 11.900, 12.600, 11.30…
## $ infmort   <dbl> 2.7, 4.5, 5.1, 4.9, 2.8, 6.4, 6.2, 5.9, 4.7, 5.5, 5.3…
## $ net       <dbl> NA, 70.870000, 26.837954, 37.605766, 91.567467, 6.886…
## $ lifeexp   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ forest    <dbl> NA, 34.042553, 34.042553, 34.042553, NA, 34.042553, 3…
## $ mobile    <dbl> 107.28255, 76.80204, 76.55160, 81.85933, 104.33241, 2…
## $ pop       <dbl> 77006, 82684, 76244, 78867, 77001, 64142, 64370, 6539…
## $ tour      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ import    <dbl> 136.50668, 190.30053, 174.09246, 178.06349, 146.27331…
## $ export    <dbl> 268.35043, 332.78037, 271.81148, 314.89205, 264.92993…
## $ iso3c     <fct> AND, AND, AND, AND, AND, AND, AND, AND, AND, AND, AND…
## $ region    <fct> Europe & Central Asia, Europe & Central Asia, Europe …
## $ capital   <fct> Andorra la Vella, Andorra la Vella, Andorra la Vella,…
## $ longitude <fct> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.521…
## $ latitude  <fct> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075,…
## $ income    <fct> High income, High income, High income, High income, H…
## $ lending   <fct> Not classified, Not classified, Not classified, Not c…
## $ lng       <dbl> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.521…
## $ lat       <dbl> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075,…
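
The glimpse shows many NA values, which matches the earlier remark that only a few variables have data for 2018. One way to check this is to count the non-missing values per indicator for a given year. The sketch below uses base R and a toy data frame; count_available is a hypothetical helper name, not part of the WDI package:

```r
# Hypothetical helper: number of non-missing values per column for one year
count_available <- function(df, yr, cols) {
  rows <- df[df$year == yr, cols, drop = FALSE]
  colSums(!is.na(rows))
}

# Toy data standing in for the countries data frame
toy <- data.frame(
  country = c("A", "B", "A", "B"),
  year    = c(2017, 2017, 2018, 2018),
  birth   = c(10.1, 35.2, NA, NA),
  pop     = c(100, 200, 110, 210)
)

avail_2018 <- count_available(toy, 2018, c("birth", "pop"))
print(avail_2018)
```

With the real data, the same helper could be called as count_available(countries, 2018, c("birth", "infmort", "net")) to see which series are populated.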

The maximum and minimum of birth rate:

max(countries$birth,na.rm=TRUE)
## [1] 54.076
min(countries$birth,na.rm=TRUE)
## [1] 6.9
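
max() and min() return only the values. To see which country and year they belong to, which.max() and which.min() can index the rows; both skip NA values. A minimal sketch on toy data (the countries and figures are made up):

```r
# Toy data standing in for the countries data frame
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1998, 2017, 1998, 2017),
  birth   = c(12.5, NA, 48.3, 41.0)
)

# Rows holding the highest and lowest birth rate (NA values are ignored)
top    <- toy[which.max(toy$birth), c("country", "year", "birth")]
bottom <- toy[which.min(toy$birth), c("country", "year", "birth")]

print(top)
print(bottom)
```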

Graphing and Comments

Plot from Phase 1

Visualization Explanation

  • Functional: The graph shows the trends of three variables (Internet Usage, CO2 Emissions, and Health Expenditure Proportion of GDP) for the United States, Brazil, the Russian Federation, India, and China from 2000 to 2012.

  • Insightful: The plot is faceted by the five countries and the three variables, which makes horizontal comparison easier. We chose a line chart because it effectively shows trends over the years. The facet strips on the y-axis are labeled with the three measurements and their units. Moreover, we apply plotly to make the graph interactive: when you hover over a point, you get its year, value, and measurement.

  • Beautiful: The variables and countries we want to analyze are captured through aesthetics and facets and presented clearly as line charts. We also use color to distinguish the three variables, which makes the visualization more intuitive and attractive.

  • Truthful: We add an annotation stating the data source to support the truthfulness of the visualization.

  • Enlightening: The graph’s topics are critical to people’s well-being and can raise concerns such as global warming. The results invite reflection on environmental issues, healthcare levels, and the development of the Internet.

Insights and Conclusions

Within each individual chart, we can see how one country changed on one measure over the 13 years. For example, all countries showed significant upward trends in Internet Usage, and stable or slightly increasing trends in Health Expenditure Proportion of GDP. The trends of CO2 Emissions, however, varied between countries.

Comparing horizontally, we find that the United States had the highest Internet Usage and Health Expenditure Proportion of GDP while India had the lowest. The Russian Federation had the largest growth rate in Internet Usage. As for CO2 Emissions, China had the highest level with rapid growth, while Brazil stayed at a steady low level. It is worth noting that the United States showed a slight downward trend in CO2 Emissions.

Internet Usage, CO2 Emissions, and Health Expenditure Proportion of GDP show the development status of the five countries.

  • The United States was the most developed of the five countries, with the highest levels of Internet Usage and Health Expenditure Proportion of GDP. With increasing attention to global warming and other environmental problems caused by CO2 emissions, the United States shouldered responsibility for controlling its emissions. Although its CO2 emissions were the second highest, they showed a slight downward trend, suggesting that the US was trying to control them.

  • India, by contrast, was the least developed of the five countries, which is consistent with its having the lowest levels of Internet Usage and Health Expenditure Proportion of GDP. Because its levels of development and industrialization were comparatively low, India’s CO2 emissions were generally low.

  • The Russian Federation had stable and relatively low levels of CO2 emissions and Health Expenditure Proportion of GDP. Interestingly, however, it had the largest growth rate in Internet Usage, and its level of Internet Usage was close to that of the United States in 2012, indicating a rapid transformation in the Internet era.

  • China was a fast-growing country in these years, with rapid growth in Internet Usage. However, its CO2 emissions were the highest and increased quickly. Possible reasons are the growth of industrialization and the rise in the number of private cars. It should be noted that China’s Health Expenditure Proportion of GDP was comparatively low, about the same as India’s, indicating that China’s healthcare level did not keep pace with its rapid economic development from 2000 to 2012.

  • Brazil is a special case. It had medium levels of Internet Usage and Health Expenditure Proportion of GDP among the five countries, but the lowest CO2 emissions, suggesting that its development was not based on the destruction of the environment, but combined environmental protection, economic development, and social progress in healthcare.

# Read only the needed columns from the Phase 1 CSV and give them syntactic names
data_1=fread(file="~/Downloads/World Indicators.csv",
           col.names=c('Country','Internet_Usage','CO2_Emissions','Health','Year'),
           select = c('Country','Internet Usage','CO2 Emissions','Health Exp % GDP','Year'))
# Reduce the date string to a four-digit year
data_1$Year=format(as.Date(data_1$Year,format='%m/%d/%Y'),format='%Y')
# Keep the five countries of interest
data_1=data_1[Country %in% c('United States','Brazil','Russian Federation','India','China')]
# Strip percent signs and convert to numbers; express CO2 emissions in thousands
data_1$Health= as.numeric(sub("%","",data_1$Health))
data_1$Internet_Usage=as.numeric(sub("%","",data_1$Internet_Usage))
data_1$CO2_Emissions=as.numeric(data_1$CO2_Emissions)/1000
data_1$Year=as.numeric(data_1$Year)
# Use display names that carry the units, since they become the facet labels
setnames(data_1, old=c("Internet_Usage","CO2_Emissions","Health"), new=c("Internet Usage(%)", "CO2 Emissions(k)","Health Exp of GDP(%)"))
# Reshape to long format: one row per country, year, and measurement
data=data_1%>%gather(Measurement,value,-Country,-Year)
# One line per panel; the facets separate countries and measurements
static=ggplot(data,aes(x=Year,y=value,col=Measurement,group=1))+
  geom_line()+
  facet_grid(Measurement~Country,scales='free')+
  scale_x_continuous(breaks=seq(2000, 2012, 6))+
  theme(axis.text = element_text(size = 6),
        strip.text.x = element_text(size = 6, colour="black",face="bold"),
        strip.text.y = element_text(size = 7, colour = "black",face="bold"),
        strip.background = element_rect(colour="white",fill="white"),
        panel.spacing = unit(1,"lines"),
        legend.text = element_text(colour="black",size=7),
        legend.title = element_blank(),
        plot.caption = element_text(hjust = 1))

static=static+ labs(title = "Internet Usage, CO2 Emissions, Health Exp % for 5 countries during 13 years")+theme(plot.title = element_text(size = 9,face = "bold")) 

# Convert to an interactive plotly figure and position the legend
finalplot=ggplotly(static)%>%layout(legend = list(x = 100, y = 0.5))

# Add the data-source annotation below the plot
finalplot=finalplot%>%layout(annotations = list(x = 1, y = -0.1, text = "Data source: World Development Indicators", 
      showarrow = FALSE, xref='paper', yref='paper', 
      xanchor='right', yanchor='auto', xshift=0, yshift=0,
      font=list(size=6, color="gray")))

finalplot
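
The claim about which country had the largest growth in Internet Usage can be checked numerically rather than read off the chart. The sketch below is one possible way to do it, computing both the absolute change and the relative change between the first and last year for each country. It uses base R and toy numbers; growth_by_country is a hypothetical helper, and the report does not state which definition of growth rate was used:

```r
# Toy data in the same shape as data_1 before renaming (values are made up)
toy <- data.frame(
  Country        = rep(c("X", "Y"), each = 3),
  Year           = rep(c(2000, 2006, 2012), times = 2),
  Internet_Usage = c(40, 60, 80, 2, 20, 60)
)

# Hypothetical helper: change between the earliest and latest year per country
growth_by_country <- function(df, value_col) {
  parts <- split(df, df$Country)
  out <- lapply(parts, function(d) {
    d <- d[order(d$Year), ]
    first <- d[[value_col]][1]
    last  <- d[[value_col]][nrow(d)]
    data.frame(Country    = d$Country[1],
               abs_change = last - first,            # percentage points
               rel_change = (last - first) / first)  # relative growth
  })
  do.call(rbind, out)
}

growth <- growth_by_country(toy, "Internet_Usage")
print(growth)
```

The two measures can rank countries differently: a country starting from a very low base shows a large relative change even when its absolute gain is modest.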

World map showing a variable in 1998

Visualization Explanation

We use leaflet to map the birth rate of nearly 200 countries in 1998. The birth rate scale runs from 0 to 60 (‰). Color shade represents the size of the birth rate: the darker the color, the higher the rate. Each circle marker is placed at the coordinates of the country's capital.

Insights and Conclusions

On this map, we can see that birth rates are unevenly distributed across the world. Europe and North America have low birth rates, with most countries in these regions below 15‰. Asia, Oceania, and South America are similar to one another, at roughly 20‰ to 35‰. African countries have the highest birth rates, most of them above 30‰, and the highest is Niger at 54‰. 81 countries have a birth rate lower than 20‰, while 33 countries have a birth rate above 40‰.

library(data.table)
library(htmlwidgets)
countries=as.data.table(countries)
countries1=countries[,c('country','year','birth','lng','lat')]
countries_1998=countries1[year==1998]
countries_2017=countries1[year==2017]

# world map--1998
pal=colorNumeric(palette='Reds',domain=c(0,60))
map_1998=leaflet(options = leafletOptions(zoomSnap = 0.01, zoomDelta = 0.01))%>%
  addProviderTiles('CartoDB')%>%
  setView(lng=45,lat=7,zoom=1.3)%>%
  addCircleMarkers(data=countries_1998,label=~paste0(country,': ',birth),radius=2,color= ~pal(birth))%>%
  addLegend(title='Birthrate of 1998 (‰)',
            position='bottomright',pal=pal,values=c(0,60),opacity=0.6)
## Assuming "lng" and "lat" are longitude and latitude, respectively
map_1998
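
Counts such as the number of countries below 20‰ or above 40‰ can be obtained by summing logical comparisons. A minimal sketch on toy data, assuming a data frame shaped like countries_1998 (the values here are made up):

```r
# Toy data standing in for countries_1998
toy_1998 <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  birth   = c(9.8, 19.9, 25.0, 41.2, NA)
)

# Count countries under and over the thresholds, ignoring missing values
n_low  <- sum(toy_1998$birth < 20, na.rm = TRUE)
n_high <- sum(toy_1998$birth > 40, na.rm = TRUE)

print(c(low = n_low, high = n_high))
```

Setting na.rm = TRUE matters here because countries without a reported birth rate would otherwise turn the whole sum into NA.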

World map showing the same variable recently

Visualization Explanation

Again, we use leaflet to map the birth rate of nearly 200 countries, this time in 2017. To keep the two maps comparable, the birth rate scale still runs from 0 to 60 (‰). Color shade represents the size of the birth rate: the darker the color, the higher the rate. Each circle marker is placed at the coordinates of the country's capital.

Insights and Conclusions

From 1998 to 2017, the intercontinental pattern of birth rates didn’t change much: Europe and North America were still the lowest and Africa the highest. However, birth rates decreased globally. No country has a birth rate above 50‰, 114 countries have a birth rate lower than 20‰, and only 6 countries are above 40‰.

# world map--2017
map_2017=leaflet(options = leafletOptions(zoomSnap = 0.01, zoomDelta = 0.01))%>%
  addProviderTiles('CartoDB')%>%
  setView(lng=45,lat=7,zoom=1.3)%>%
  addCircleMarkers(data=countries_2017,label=~paste0(country,': ',birth),radius=2,color= ~pal(birth))%>%
  addLegend(title='Birthrate of 2017 (‰)',
            position='bottomright',pal=pal,values=c(0,60),opacity=0.6)
## Assuming "lng" and "lat" are longitude and latitude, respectively
map_2017
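
The comparison between the two maps can also be made per country by joining the 1998 and 2017 rows and taking the difference. The sketch below uses base R merge() on toy data, assuming frames shaped like countries_1998 and countries_2017; the values are made up:

```r
# Toy data standing in for countries_1998 and countries_2017
b1998 <- data.frame(country = c("A", "B", "C"), birth = c(45, 12, 30))
b2017 <- data.frame(country = c("A", "B", "D"), birth = c(38, 10, 20))

# Inner join on country; only countries present in both years are kept
both <- merge(b1998, b2017, by = "country", suffixes = c("_1998", "_2017"))

# Negative values mean the birth rate fell between the two years
both$change <- both$birth_2017 - both$birth_1998

print(both)
```

Because merge() performs an inner join by default, countries missing from either year drop out, so the result describes only the countries that can actually be compared.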