You should use this code as part of your Visualization 3 assignment. You will re-save this Notebook under a different name and modify it to complete the assignment. For example, you should delete all of the text in this section and replace it with your own introduction.
The first chunk calls the packages that you will need in this assignment.
The second code chunk automatically retrieves the latest data from the World Development Indicators database, for use in the assignment.
#install.packages(c("gglot2", "tidyverse", "WDI", "leaflet", "grid", "gridExtra", "RColorBrewer"))
library(tidyverse)
library(leaflet)
library(WDI)
library(ggplot2)
library(grid)
library(gridExtra)
library(RColorBrewer)
WDI to retrieve most updated figures available.In this assignment, we will fetch ten data series from the WDI:
| Tableau Name | WDI Series |
|---|---|
| Birth Rate | SP.DYN.CBRT.IN |
| Infant Mortality Rate | SP.DYN.IMRT.IN |
| Internet Usage | IT.NET.USER.ZS |
| Life Expectancy (Total) | SP.DYN.LE00.IN |
| Forest Area (% of land) | AG.LND.FRST.ZS |
| Mobile Phone Usage | IT.CEL.SETS.P2 |
| Population Total | SP.POP.TOTL |
| International Tourism receipts (current US$) | ST.INT.RCPT.CD |
| Import value index (2000=100) | TM.VAL.MRCH.XD.WD |
| Export value index (2000=100) | TX.VAL.MRCH.XD.WD |
The next code chunk will call the WDI API and fetch the years 1998 through 2018, as available. You will find that only a few variables have data for 2018. The dataframe will also contain the longitude and latitude of the capital city in each country.
Note This notebook will take approximately 2 minutes to run. The WDI call is time-consuming as is the process of knitting the file. Be patient.
The World Bank uses a complex, non-intuitive scheme for naming variables. For example, the Birth Rate series is called SP.DYN.CBRT,IN. The code assigns variables names that are more intuitive than the codes assigned by the World Bank, and converts the geocodes from factors to numbers.
In your code, you will use the data frame called countries.
birth <- "SP.DYN.CBRT.IN"
infmort <- "SP.DYN.IMRT.IN"
net <-"IT.NET.USER.ZS"
lifeexp <- "SP.DYN.LE00.IN"
forest <- "AG.LND.FRST.ZS"
mobile <- "IT.CEL.SETS.P2"
pop <- "SP.POP.TOTL"
tour <- "ST.INT.RCPT.CD"
import <- "TM.VAL.MRCH.XD.WD"
export <- "TX.VAL.MRCH.XD.WD"
# create a vector of the desired indicator series
indicators <- c(birth, infmort, net, lifeexp, forest,
mobile, pop, tour, import, export)
countries <- WDI(country="all", indicator = indicators,
start = 1998, end = 2018, extra = TRUE)
## rename columns for each of reference
countries <- rename(countries, birth = SP.DYN.CBRT.IN,
infmort = SP.DYN.IMRT.IN, net = IT.NET.USER.ZS,
lifeexp = SP.DYN.LE00.IN, forest = AG.LND.FRST.ZS,
mobile = IT.CEL.SETS.P2, pop = SP.POP.TOTL,
tour = ST.INT.RCPT.CD, import = TM.VAL.MRCH.XD.WD,
export = TX.VAL.MRCH.XD.WD)
# convert geocodes from factors into numerics
countries$lng <- as.numeric(as.character(countries$longitude))
countries$lat <- as.numeric(as.character(countries$latitude))
# Remove groupings, which have no geocodes
countries <- countries %>%
filter(!is.na(lng))
glimpse(countries)
## Observations: 4,389
## Variables: 22
## $ iso2c <chr> "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD"...
## $ country <chr> "Andorra", "Andorra", "Andorra", "Andorra", "Andorra...
## $ year <int> 2004, 2005, 2018, 2003, 2008, 2006, 2011, 2007, 1999...
## $ birth <dbl> 10.900, 10.700, NA, 10.300, 10.400, 10.600, NA, 10.1...
## $ infmort <dbl> 4.2, 4.1, NA, 4.3, 3.9, 4.1, 3.7, 4.0, 4.9, 4.7, 4.6...
## $ net <dbl> 26.837954, 37.605766, NA, 13.546413, 70.040000, 48.9...
## $ lifeexp <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ forest <dbl> 34.042553, 34.042553, NA, 34.042553, 34.042553, 34.0...
## $ mobile <dbl> 76.55160, 81.85933, NA, 70.90951, 76.55764, 85.19959...
## $ pop <dbl> 76244, 78867, NA, 73182, 83861, 80991, 83751, 82683,...
## $ tour <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ import <dbl> 174.09246, 178.06349, NA, 149.52329, 191.70031, 176....
## $ export <dbl> 271.81148, 314.89205, NA, 198.89232, 306.86583, 364....
## $ iso3c <fct> AND, AND, AND, AND, AND, AND, AND, AND, AND, AND, AN...
## $ region <fct> Europe & Central Asia, Europe & Central Asia, Europe...
## $ capital <fct> Andorra la Vella, Andorra la Vella, Andorra la Vella...
## $ longitude <fct> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.52...
## $ latitude <fct> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075...
## $ income <fct> High income, High income, High income, High income, ...
## $ lending <fct> Not classified, Not classified, Not classified, Not ...
## $ lng <dbl> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.52...
## $ lat <dbl> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075...
Beyond this line, you will insert your original code, following the instructions in the assignment.
In this part, I made use of the countries data provided by the prof., in order to present the analysis from 1998 to 2018. For the data wrangling, I created a subset of the 3 variables that we had used in our previous analysis and then I converted those variables into numeric data type. As the year was already provided in the format required by us, so I didn’t have to convert the type. Once I was satisfied with the wrangling, I proceeded to the faceted plots similar to last assignment.
# Data Wrangling and Cleaning
countries2 <- select(countries, c("birth", "lifeexp", "infmort", "region", "year"))
# Need to strip out '%' from 2 of the columns
countries2$birth <- str_sub(countries2$`birth`, 1, str_length(countries2$`birth`)-1)
countries2$birth <- as.numeric(countries2$birth)
countries2$im <- str_sub(countries2$`infmort`, 1, str_length(countries2$`infmort`)-1)
countries2$im <- as.numeric(countries2$im)
countries2$life <- as.numeric(countries2$`lifeexp`)
# Convert Year to numeric year
#countries2$yr <- as.Date.character(countries2$year, "%m/%d/%Y")
countries2$region <- as.factor(countries2$region)
# Now summarize average rates by region
countries3 <- countries2 %>%
group_by(region, year) %>%
summarize(birth = mean(birth, na.rm = T),
lifeexp = mean(life, na.rm = T),
infant = mean(im, na.rm = T))
glimpse(countries3)
## Observations: 147
## Variables: 5
## Groups: region [7]
## $ region <fct> East Asia & Pacific, East Asia & Pacific, East Asia & ...
## $ year <int> 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, ...
## $ birth <dbl> 23.13742, 22.74375, 21.88290, 21.66226, 20.81032, 20.9...
## $ lifeexp <dbl> 69.53342, 69.74694, 70.08353, 70.56045, 70.87942, 71.2...
## $ infant <dbl> 28.51724, 29.31034, 29.21429, 28.39286, 31.07407, 27.3...
# countries3$yr1 <- as.numeric(format(as.Date(countries3$yr, format("%m%d%Y%")), "%Y"))
# countries3
# str(countries3)
# Birth Rate
p1 <- ggplot(countries3, aes(x = year, y = birth, color = region)) +
geom_line(size = 1.5) +
facet_grid(~region) +
theme(strip.text.x = element_text(size = 5),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
legend.position = "none")+
labs(y = "Birth Rate")
# Infant Mortality Rate
p2 <- ggplot(countries3, aes(x = year, y = infant, color = region)) +
geom_line(size = 1.5) +
facet_grid(~ region) +
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
strip.text.x = element_blank(),
legend.position = "none") +
labs(y = "Mortality Rate")
# Life Expectancy Rate
p3 <- ggplot(countries3,
aes(x = year, y = lifeexp, color = region)) +
geom_line(size = 1.5)+
facet_grid(~region) +
theme(strip.text.x = element_blank(),
legend.position = "none",
axis.text.x=element_text(angle=90,hjust=1),
axis.title.x = element_blank()) +
scale_x_continuous(limits = c(1997, 2018),
breaks = c(1998, 2003, 2008, 2013, 2018)) +
labs(y = "Life Expectancy")
# Plotting all ggplots in one graph
grid.arrange(p1, p2, p3, top = "Avg. Birth Rate, Avg. Infant Mortality Rate & Avg. Life Expectancy", padding = unit(0.2, "line"))
## Warning: Removed 14 rows containing missing values (geom_path).
## Warning: Removed 7 rows containing missing values (geom_path).
## Warning: Removed 14 rows containing missing values (geom_path).
this graph shows (i) Average Birth Rate, (ii) Average Mortality Rate and (iii) Average Life Expectancy of seven regions of the world across two decades (1998-2018).
In the graph, it’s pretty evident that the patterns have chanced drastically compared to the previous assignment.At first, in the previous assignment, we had only 6 region categories which has now increased to 7 so the patterns change to a larger extent. We see a significant difference in the birth rate for North America and Europe specifically.For example, the Africa region had an almost constant birth rate in the previous graph, but here we see a drop starting from 1998 ending at 2014. Similar patter can be seen for the Asia region as well. The infant mortality rate for Asia region in the previous assignment saw a steady decline over the period but here, we see that the decline is not much low.Additionally, for the America region, the infant mortality in the previous assignment was almost constant with a minor drop over the period but here we see a dip of almost 20 points i.e. 50% which is a very significantly huge dip. The average life expectency for africa region was constant around 60 in the previous assignment, whereas in this graph, the life expectancy is increased drastically from 1998-2015.FOr Asia, the average life expectancy varies as it was constant around 70 previously but here we see a variability from 72-80. For the Africa region, the value seems to be lower in this graph compared to the previous one.
Here, I filtered the countries dataset for the year 1998, removed NA and added them into a new data frame. Then I recoded the mobile variable of this new data frame into a new variable with smaller values reflecting the ranges of mobile consumptions and divided the records into 30 ranges. Post that, I used the colorfactor feature of leaflet to differenciate the colors on the map. Finally, I added tiles and circle mrkers feature to the filtered dataset with color as the new variable, an radius as the numeric value of that variable. I also added popup feature to reflect the mobile consumption range for each circle marked on the map. For extra credit,in the 2nd chunk of code, I am displaying the internet usage as the radius of the circles and popup as the numeric value of that variable. Similarly, for the legend, I am using the net variable as values and title “Internet Usage”
# your code goes here
historical <- countries %>%
filter(mobile != "NA") %>%
filter(year == "1998") %>%
arrange(mobile)
historical$mobile_new = round(as.numeric(historical$mobile), 0) # rounding off the decimals
historical$mobile_new <- cut_interval(historical$mobile, n = 30) # Making an interval of mobile usage
pal <- colorFactor(palette = "Set1", domain = historical$mobile_new)
map <- leaflet(historical) %>%
addTiles() %>%
addCircleMarkers(lng = ~lng, lat = ~lat, color = ~pal(mobile_new), radius = ~mobile, popup = ~mobile_new)
## Warning in RColorBrewer::brewer.pal(max(3, n), palette): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(max(3, n), palette): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
map %>%
addLegend(position = "bottomleft",
pal = pal,
values = ~mobile_new,
title = "Mobile Usage")
## Warning in RColorBrewer::brewer.pal(max(3, n), palette): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
# For extra Credit, mobile usage with internet usage
historical <- countries %>%
filter(mobile != "NA") %>%
filter(year == "1998") %>%
arrange(mobile)
historical$mobile_new = round(as.numeric(historical$mobile), 0) # rounding off the decimals
historical$mobile_new <- cut_interval(historical$mobile, n = 7) # Making an interval of mobile usage
pal <- colorFactor(palette = "Set1", domain = historical$mobile_new)
map <- leaflet(historical) %>%
addTiles() %>%
addCircleMarkers(lng = ~lng, lat = ~lat, color = ~pal(mobile_new), radius = ~net, popup = ~net)
map %>%
addLegend(position = "bottomleft",
pal = pal,
values = ~net,
title = "Internet Usage")
## Warning in pal(v): Some values were outside the color scale and will be
## treated as NA
Here, I filtered the countries dataset for the year 2016 (2017 and 2018 had a lot of missing records), removed NA and added them into a new data frame and arranged the mobile variable values in ascending order. Then I recoded the mobile variable of this new data frame into a new variable with smaller values reflecting the ranges of mobile consumptions and divided the records into 10 ranges. Post that, I used the colorfactor feature of leaflet to differenciate the colors on the map. Finally, I added tiles and circle mrkers feature to the filtered dataset with color as the new variable, an radius as the numeric value of that variable. I also added popup feature to reflect the mobile consumption range for each circle marked on the map. For the legend, I used the bottom left position and color as defined earlier, values as new variable and title s Mobile Usage.
For extra credit, in the 2nd chunk of code, I am displaying the internet usage as the radius of the circles and popup as the numeric value of that variable. Similarly, for the legend, I am using the net variable as values and title “Internet Usage”. AS the circle size is too large for the increased internet usage compared to 1998, I took the squre root of the values and presented.
To see the difference between the 2 decades, let’s take an example of FInlnd country. In 1998, the mobile usage was 55/100 which rose to 133/100. The difference is clearly visible on the maps with the circle sizes different for Finland.Similarly, for Afghanistan, it’s very hard to see the circle mark as there was no mobile usage in 1998 but if we see the current data at 2016, the mobile usage is 62 which is very well refplected by the graph. So, the graphs do reflect a trend change in the mobile usage for different countries. COuntries like Syria and Sudan have seen a drop in the mobile usage, probably because of the civil issues prevailing in the region.On the other hand, countries like India, China, Sierre Leone, have seen a steep rise in the mobie consumption from almost 0 to around 85 within 20 years.
# your code goes here
latest <- countries %>%
filter(mobile != "NA") %>%
filter(year == "2016") %>%
arrange(mobile)
latest$mobile_updated = round(as.numeric(latest$mobile), 0) # rounding off the decimals
latest$mobile_updated <- cut_interval(latest$mobile, n = 10) # bining the moble usage into 7 catagories
pal <- colorFactor(palette = "Set1", domain = latest$mobile_updated)
map1 <- leaflet(latest) %>%
addTiles() %>%
addCircleMarkers(lng = ~lng, lat = ~lat, color = ~pal(mobile_updated), radius = ~sqrt(mobile), popup = ~mobile_updated)
## Warning in RColorBrewer::brewer.pal(max(3, n), palette): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(max(3, n), palette): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
map1 %>%
addLegend(position = "bottomleft",
pal = pal,
values = ~mobile_updated,
title = "Mobile Usage")
## Warning in RColorBrewer::brewer.pal(max(3, n), palette): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
# For extra credit of mobile usage with internet usage
latest <- countries %>%
filter(mobile != "NA") %>%
filter(year == "2016") %>%
arrange(mobile)
latest$mobile_updated = round(as.numeric(latest$mobile), 0) # rounding off the decimals
latest$mobile_updated <- cut_interval(latest$mobile, n = 10) # bining the moble usage into 7 catagories
pal <- colorFactor(palette = "Set1", domain = latest$mobile_updated)
map1 <- leaflet(latest) %>%
addTiles() %>%
addCircleMarkers(lng = ~lng, lat = ~lat, color = ~pal(mobile_updated), radius = ~sqrt(net), popup = ~net)
## Warning in RColorBrewer::brewer.pal(max(3, n), palette): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(max(3, n), palette): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
map1 %>%
addLegend(position = "bottomleft",
pal = pal,
values = ~net,
title = "Internet Usage")
## Warning in RColorBrewer::brewer.pal(max(3, n), palette): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning in pal(v): Some values were outside the color scale and will be
## treated as NA
```