This assignment is to 1) pull the data from WDI, 2) plot the relationship between tourism and three health indicators among seven regions over 1998-2018, and 3) plot the relationship between forest coverage rate and tourism on two maps to reflect the changes of them from 1998 to 2016.
library(tidyverse)
library(leaflet)
library(WDI)
library(plotly)
library(leaflet.opacity)
WDI to retrieve most updated figures available.In this assignment, fetch ten data series from the WDI:
| Tableau Name | WDI Series |
|---|---|
| Birth Rate | SP.DYN.CBRT.IN |
| Infant Mortality Rate | SP.DYN.IMRT.IN |
| Internet Usage | IT.NET.USER.ZS |
| Life Expectancy (Total) | SP.DYN.LE00.IN |
| Forest Area (% of land) | AG.LND.FRST.ZS |
| Mobile Phone Usage | IT.CEL.SETS.P2 |
| Population Total | SP.POP.TOTL |
| International Tourism receipts (current US$) | ST.INT.RCPT.CD |
| Import value index (2000=100) | TM.VAL.MRCH.XD.WD |
| Export value index (2000=100) | TX.VAL.MRCH.XD.WD |
birth <- "SP.DYN.CBRT.IN"
infmort <- "SP.DYN.IMRT.IN"
net <-"IT.NET.USER.ZS"
lifeexp <- "SP.DYN.LE00.IN"
forest <- "AG.LND.FRST.ZS"
mobile <- "IT.CEL.SETS.P2"
pop <- "SP.POP.TOTL"
tour <- "ST.INT.RCPT.CD"
import <- "TM.VAL.MRCH.XD.WD"
export <- "TX.VAL.MRCH.XD.WD"
# create a vector of the desired indicator series
indicators <- c(birth, infmort, net, lifeexp, forest,
mobile, pop, tour, import, export)
countries <- WDI(country="all", indicator = indicators,
start = 1998, end = 2018, extra = TRUE)
## rename columns for each of reference
countries <- rename(countries, birth = SP.DYN.CBRT.IN,
infmort = SP.DYN.IMRT.IN, net = IT.NET.USER.ZS,
lifeexp = SP.DYN.LE00.IN, forest = AG.LND.FRST.ZS,
mobile = IT.CEL.SETS.P2, pop = SP.POP.TOTL,
tour = ST.INT.RCPT.CD, import = TM.VAL.MRCH.XD.WD,
export = TX.VAL.MRCH.XD.WD)
# convert geocodes from factors into numerics
countries$lng <- as.numeric(as.character(countries$longitude))
countries$lat <- as.numeric(as.character(countries$latitude))
# Remove groupings, which have no geocodes
countries <- countries %>%
filter(!is.na(lng))
glimpse(countries)
## Observations: 4,410
## Variables: 22
## $ iso2c <chr> "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD"...
## $ country <chr> "Andorra", "Andorra", "Andorra", "Andorra", "Andorra...
## $ year <int> 2018, 2007, 2004, 2005, 2017, 1998, 1999, 2000, 2006...
## $ birth <dbl> NA, 10.100, 10.900, 10.700, NA, 11.900, 12.600, 11.3...
## $ infmort <dbl> NA, 4.0, 4.2, 4.1, 3.2, 5.1, 4.9, 4.7, 4.1, 4.5, 4.3...
## $ net <dbl> NA, 70.870000, 26.837954, 37.605766, NA, 6.886209, 7...
## $ lifeexp <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ forest <dbl> NA, 34.042553, 34.042553, 34.042553, NA, 34.042553, ...
## $ mobile <dbl> NA, 76.80297, 76.55160, 81.85933, 104.38121, 22.0089...
## $ pop <dbl> NA, 82683, 76244, 78867, 76965, 64142, 64370, 65390,...
## $ tour <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ import <dbl> NA, 190.30053, 174.09246, 178.06349, 146.27331, 105....
## $ export <dbl> NA, 332.78037, 271.81148, 314.89205, 264.92993, 127....
## $ iso3c <fct> AND, AND, AND, AND, AND, AND, AND, AND, AND, AND, AN...
## $ region <fct> Europe & Central Asia, Europe & Central Asia, Europe...
## $ capital <fct> Andorra la Vella, Andorra la Vella, Andorra la Vella...
## $ longitude <fct> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.52...
## $ latitude <fct> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075...
## $ income <fct> High income, High income, High income, High income, ...
## $ lending <fct> Not classified, Not classified, Not classified, Not ...
## $ lng <dbl> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.52...
## $ lat <dbl> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075...
The data for Life Expectancy, Birth Rate, Infant Mortality Rate and Tourism is collected only till 2016, so here I plot the data until year 2016. Besides, since the geographic taxonomy of WDI data is different from the one of the last assignment, the plot now changes from 6 to 7 facets.
In the last assignment, we also used GDP per Capita and Healthcare Expenditure which are not include in this dataset. Hence, I take International Tourism receipts as a proxy for economy level for the analysis.
The chuck below aggregate the original countries dataframe by region and year.
Note:
the new wi3 dataframe shows the data for each region per year, including 147 observations and 6 variables.
Also, notice that in the countries data set, the birth rate and infant mortality rate is at per 1000 people scale, which is diffrent from the last assignment. Hence, for consistency, I convert them into per 100 people by dividing 10 in this assignment.
wi3############### Now summarize average rates by region
wi3 <- countries %>%
group_by(region, year) %>%
summarize(birthrate = mean(birth, na.rm = T)/10,
life_exp = mean(lifeexp, na.rm = T),
infant = mean(infmort, na.rm = T)/10,
tour = mean(tour, na.rm = T)/10^9) %>%
filter(year < 2017)
head(wi3,5)
The assignment is to look at the Birth Rate, Life Expectancy, and Infant Mortality for 7 regions over 1998-2016.
For Life Expectancy, each bin of the bar chart represents the average life expectancy for that particular year in that region. In addition, a horizontal line(the average level of all regions) is added for comparison. For Birth Rate, a line graph is created for each region which indicates the trend in Birth Rate over a certain period of time. For Infant Mortality , similar to the Birth Rate graph, we created a line graph to demonstrate the trend for a certain period of time.
p1 = wi3 %>% ggplot(aes(x = year, y = life_exp, fill=tour)) +
geom_col() +
facet_grid(.~region) +
scale_fill_gradient(low="#f2bcb6", high="#b34362") +
geom_abline(slope = 0, intercept = mean(wi3$life_exp, na.rm = T), col = 'darkgrey', size = 1.1) +
#scale_x_discrete(limits=c(2004,2005,2006,2007,2008,2009,2010)) +
theme(plot.title = element_text(size=18, face="bold"),
axis.text.x = element_text(angle=90, hjust=1),
axis.title.x = element_text(size=12, face="bold"),
axis.title.y = element_text(size=12, face="bold"),
legend.position="right" ) +
labs(x = NULL, y = 'Avg. Life Expanctancy', fill = 'Tourism B$',
title = "Health Indicators Among Seven Regions over 1998-2018 \n(by Export Index)")
p2 = wi3 %>% ggplot(aes(x = year, y = birthrate, col=tour)) +
geom_line(size = 2) +
facet_grid(.~region) +
scale_color_gradient(low="#f4a460", high="#d94633") +
theme(plot.title = element_text(size=14, face="bold"),
axis.text.x = element_text(angle=90, hjust=1),
axis.title.x = element_text(size=12, face="bold"),
axis.title.y = element_text(size=12, face="bold"),
legend.position="right") +
labs(x = NULL, y = 'Avg. Birth Rate %', col = 'Tourism B$')
#p2
p3 = wi3 %>% ggplot(aes(x = year, y = infant, col=tour)) +
geom_line(size = 2) +
facet_grid(.~region) +
scale_color_gradient(low="#e5b79e", high="#8b5058") +
# scale_x_discrete(limits=c(2004,2005,2006,2007,2008,2009,2010)) +
theme(plot.title = element_text(size=14, face="bold"),
axis.text.x = element_text(angle=90, hjust=1),
axis.title.x = element_text(size=12, face="bold"),
axis.title.y = element_text(size=12, face="bold"),
legend.position="right") +
labs(x = NULL, y = 'Avg. Infant Mortality Rate %', col = 'Tourism B$')
#p3
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
library(grid)
plots <- c(list(...), plotlist)
numPlots = length(plots)
if (is.null(layout)) {
layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
ncol = cols, nrow = ceiling(numPlots/cols))
}
if (numPlots==1) {
print(plots[[1]])
} else {
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
for (i in 1:numPlots) {
matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
layout.pos.col = matchidx$col))
}
}
}
multiplot(p1,p2,p3, cols = 1)
Here assume International Tourism receipts is a proxy for economy. Since the relationships between economy(tourism) and three indicators are interesting, I also filled bins by color where darkness represents a higher tourism receipts and vice verse.
From the above plot, I found that:
Among seven region, in North America, people live longer(nearly 80 years old), less likely to have more infant(around 1.25%) and the infant mortality rate(0.625%) is the lowest. This may due to that North America’s economy is better, which we can tell from its tourism.
While Sub-Sahara Africa and South Asia have high birth rate, high infant mortality rate and low life expectancy. This may due to the fact that the economy level for these area is relatively low, the healthcare condition is limited. People do not have proper birth control methods and they could not afford to raise more babies. But for these regions, the decrease rate(slope) for birth rate and infant mortality rate is the highest.
But overall, with the economy (tourism) developing globally, the birth rate and infant mortality rate is decreasing and life expectancy is increasing.
Note:
Since the tourism level for South Asia and Sub-Saharan Africa is significantly lower than other country, the color scale can not reflect the increasing trend in tourism.
Forest(green) and Tourism(size) per country in 1998Since I am interested in the relationship between tourism and forest coverage rate, this map is made.
The green scale reflects the forest coverage of a country: brownish circle indicates low forest coverage rate, more green circle indicates higher forest coverage.
The size of the circle reflects the tourism of a country(larger better). To avoid over-plotting, I set the circle to be transparent. Note that tourism here is transformed into hundred thousand dollars.
For the convenience of comparing and exploration, popups are added to show the details of particular countries when hovering above.
# manipulate dataset
pal <- colorNumeric(
palette = colorRampPalette(c('#87673a','#247e00'))(length(countries$forest)),
domain = countries$forest)
df_1998 = countries %>% filter(year == 1998 & !is.na(tour) & !is.na(forest)) %>% select(country, year, forest, tour, lng, lat)
leaflet(df_1998) %>%
addProviderTiles("CartoDB") %>%
addCircles(lat = ~lat, lng = ~lng, color = ~pal(forest), radius = ~tour/10^5, opacity = 1, fillOpacity = 0.7, weight=1.5) %>%
setView(lng = 20., lat = 30., zoom=2) %>%
addLegend(pal = pal, value= ~forest, group = "circles", position = "topright", title ="Forest %", opacity = 1) %>%
addMarkers(
lng = -77.032000, lat = 38.889500,
label = "United States, Forest: 33%, Tourism: $105B",
labelOptions = labelOptions(noHide = F))
From the map, we can found that most of the countries coverage is around 20-40% which is low in 1998. Interestingly, contrary to our common sense, countries with high forest coverage do not attract more tourists. And most of the traffic is concentrated on the US and European counties, where the forest coverage is not high.
Besides, we can also found that circles near have relatively same color(same forest coverage rate), which may due to the whether and atmosphere condition.
Forest(green) and Tourism(size) per country in 2016For this plot, I set the view by default with the same scale. That means, without changing the map size/scale, the area of the circle is comparable with the last map.
df_2016 = countries %>% filter(year == 2016 & !is.na(tour) & !is.na(forest)) %>% select(country, year, forest, tour, lng, lat)
leaflet(df_2016) %>%
addProviderTiles("CartoDB") %>%
addCircles(lat = ~lat, lng = ~lng, color = ~pal(forest), radius = ~tour/100000, opacity = 1, fillOpacity = 0.7, weight=1.5) %>%
setView(lng = 30.,
lat = 25.,
zoom=2) %>%
addLegend(pal = pal, value= ~forest, group = "circles", position = "topright", title ="Forest %", opacity = 1) %>%
addMarkers(
lng = -77.032000, lat = 38.889500,
label = "United States, Forest: 34%, Tourism: $246B (2x)",
labelOptions = labelOptions(noHide = F)) %>%
addMarkers(
lng = 116.286000, lat = 40.049500,
label = "China, Forest: 22%, Tourism: $44B (1.3x)",
labelOptions = labelOptions(noHide = F)) %>%
addMarkers(
lng = 100.521000, lat = 13.73080,
label = "Thailand, Forest: 32%, Tourism: $52B (1.2x)",
labelOptions = labelOptions(noHide = F)) %>%
addMarkers(
lng = 139.77, lat = 35.670,
label = "Japan, Forest: 68%, Tourism: $33B (1.2x)",
labelOptions = labelOptions(noHide = F))
Compared with 1998, in 2016, most of the countries’ tourism increase a lot, spherically in the US(increases about 100%).
Besides, countries in the east Asia region such as China, Japan and Thailand increase significantly compared to the their past.
However, we can also found that the forest coverage doesn’t change a lot across 20 years. This might due to the whether and atmosphere condition, which countries could not intervene much.