We will be using these packages:
# Importing and wrangling data
library(readr)
library(tidyr)
library(dplyr)
# For plotting
library(ggplot2)
# For mapping
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(mapview)
We will also be using my usual custom ggplot theme.
This week’s data is about historical phone usage.
The data comes from https://ourworldindata.org.
Two datasets are available: mobile and landline.
Let’s have a first glimpse at these datasets.
## Rows: 6,277
## Columns: 7
## $ entity <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ code <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ year <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 199…
## $ total_pop <dbl> 13032161, 14069854, 15472076, 17053213, 18553819, 1978988…
## $ gdp_per_cap <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1063.636,…
## $ mobile_subs <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.…
## $ continent <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "…
## Rows: 6,974
## Columns: 7
## $ entity <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist…
## $ code <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",…
## $ year <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1…
## $ total_pop <dbl> 12412000, 13299000, 14486000, 15817000, 17076000, 18111…
## $ gdp_per_cap <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1063.63…
## $ landline_subs <dbl> 0.29553158, 0.28475432, 0.20742093, 0.19211533, 0.17931…
## $ continent <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia",…
It looks like these two datasets have a lot in common. In fact, they only differ by one column: mobile_subs for one and landline_subs for the other. Consequently, we could try to join them and create a single subs column, associated with a type column containing either mobile or landline.
Note: The subs variables are expressed as the number of subscriptions per 100 people. Because the same person can have several subscriptions (private, work, etc.), this number can exceed 100. However, for convenience, I will often use the “%” terminology in this document.
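As a quick illustration of why this value can exceed 100, here is a minimal sketch with made-up numbers. The population and subscription counts below are hypothetical and are not taken from the datasets.

```r
# Hypothetical figures: 1,000 people holding 1,250 subscriptions in total
population    <- 1000
subscriptions <- 1250

# Number of subscriptions per 100 people
subs_per_100 <- subscriptions / population * 100
subs_per_100
```

Any population where people hold more than one subscription on average will give a value above 100.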
However, before joining, we need to find out on which keys we should join.
mobile %>% filter(year == 1990 & code == "AFG") %>% pull(total_pop)
## [1] 13032161
landline %>% filter(year == 1990 & code == "AFG") %>% pull(total_pop)
## [1] 12412000
For the same country, in the same year, the total_pop values are not the same in the two datasets… In this case, we do not know which dataset contains the “truth”, so we have two options here:
1. Consider that one dataset contains the “truth”, and only join the phone data from the other dataset
2. Keep the two datasets apart
I opted for the first option and joined the phone data from “landline” to “mobile”.
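The join itself is not shown here, so what follows is only a minimal sketch of how it could be done, not necessarily the exact code used for this post. The toy tibbles are hypothetical stand-ins for mobile and landline with made-up values and only a few columns. The names subs_data, subs and type follow those used in the rest of this post, and joining on code and year is an assumption.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for the real datasets (values are made up)
mobile <- tibble(
  code        = c("AAA", "AAA", "BBB"),
  year        = c(1990, 1991, 1990),
  total_pop   = c(1000, 1100, 2000),
  mobile_subs = c(0, 0.5, 1.2)
)
landline <- tibble(
  code          = c("AAA", "AAA", "BBB"),
  year          = c(1990, 1991, 1990),
  total_pop     = c(990, 1090, 2010),
  landline_subs = c(10, 11, 20)
)

subs_data <- mobile %>%
  # Keep "mobile" as the reference and bring in only the landline column,
  # matching rows on country code and year (assumed join keys)
  left_join(select(landline, code, year, landline_subs), by = c("code", "year")) %>%
  # Reshape to one row per country, year and subscription type
  pivot_longer(
    cols      = c(mobile_subs, landline_subs),
    names_to  = "type",
    values_to = "subs"
  )

subs_data
```

Selecting only code, year and landline_subs before joining keeps the conflicting total_pop column from “landline” out of the result, so the population figures all come from “mobile”.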
We will first look at the time series of both landline and mobile subscriptions:
subs_data %>%
  mutate(type = recode(type, "landline_subs" = "Landline", "mobile_subs" = "Mobile")) %>%
  group_by(year, type) %>%
  # Worldwide mean per year and subscription type
  summarise(mean_subs = mean(subs, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = year, y = mean_subs, color = type)) +
  geom_smooth() +
  geom_point() +
  scale_x_continuous(breaks = seq(1990, 2018, 2)) +
  scale_y_continuous(breaks = seq(0, 100, 10), limits = c(0, 100)) +
  labs(color = "Type", x = "Years", y = "Mean # of subscriptions per 100 people") +
  # annotate() draws the arrow once instead of once per data row
  annotate("curve", x = 1998, y = 30, xend = 2001, yend = 22, curvature = 0.13, arrow = arrow(length = unit(2, "mm")), color = "black") +
  annotate("text", x = 1998, y = 33, label = "Inversion point in 2001", size = 5)
As you can see, landline subscriptions were dominant in the 90s, with a worldwide average between 15 and 20 subscriptions per 100 people. However, from 2001 onwards, mobile subscriptions increased sharply, reaching an average of 95% in 2017, whereas landline subscriptions stagnated around 20%.
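The inversion year can also be checked numerically rather than by eye, by comparing the two yearly means directly. This is only a sketch: yearly_means is a hypothetical toy table with made-up values, standing in for the grouped summary of the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical yearly means (made-up values, not the real data)
yearly_means <- tibble(
  year      = rep(1999:2002, each = 2),
  type      = rep(c("Landline", "Mobile"), times = 4),
  mean_subs = c(18, 8, 19, 14, 19, 21, 20, 27)
)

inversion_year <- yearly_means %>%
  # One column per subscription type
  pivot_wider(names_from = type, values_from = mean_subs) %>%
  # Keep the years where mobile exceeds landline, then take the earliest one
  filter(Mobile > Landline) %>%
  summarise(first_year = min(year)) %>%
  pull(first_year)

inversion_year
```

With the real data, the same steps could be applied to the output of the summarise() call.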
After visualising the worldwide time series, we can now break them down by continent:
subs_data %>%
  mutate(type = recode(type, "landline_subs" = "Landline", "mobile_subs" = "Mobile")) %>%
  drop_na(continent) %>%
  group_by(continent, year, type) %>%
  # Mean per continent, year and subscription type
  summarise(mean_subs = mean(subs, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = year, y = mean_subs, color = continent)) +
  geom_smooth() +
  facet_wrap(~type) +
  geom_point() +
  scale_x_continuous(breaks = seq(1990, 2018, 2)) +
  scale_y_continuous(breaks = seq(0, 100, 10), limits = c(0, 100)) +
  # The colour legend shows continents here, not subscription types
  labs(color = "Continent", x = "Years", y = "Mean # of subscriptions per 100 people") +
  theme(axis.text.x = element_text(angle = 90))
As we can see, not all continents show the same overall number of subscriptions. It seems that European countries have always had more landline subscriptions (~40%), but also more mobile subscriptions, than other continents.
Finally, one last time series graph, for my own country:
subs_data %>%
  mutate(type = recode(type, "landline_subs" = "Landline", "mobile_subs" = "Mobile")) %>%
  filter(code == "FRA") %>%
  # No averaging here: one value per year and type for France
  ggplot(aes(x = year, y = subs, color = type)) +
  geom_smooth() +
  geom_point() +
  scale_x_continuous(breaks = seq(1990, 2018, 2)) +
  scale_y_continuous(breaks = seq(0, 100, 10)) +
  labs(color = "Type", x = "Years", y = "# of subscriptions per 100 people", subtitle = "France")
As we could expect, France’s subscription data is quite similar to the European average, with an inversion point in 2001 and nearly 100% mobile subscriptions in 2017. Landline subscriptions are also very high here, at about 60%.
We will now produce simple maps based only on the mobile dataset.
# World map
world <- ne_countries(scale = "medium", returnclass = "sf") %>%
  full_join(mobile, by = c("brk_a3" = "code")) %>%
  # Keep 2017, plus countries with no match in the mobile data
  filter(year == 2017 | is.na(year)) %>%
  # Drop values of 200 or more (NA values are kept)
  filter(mobile_subs < 200 | is.na(mobile_subs))
# Countries without mobile data
na_world <- world %>% filter(is.na(mobile_subs))
ggplot() +
  geom_sf(data = world, aes(fill = mobile_subs)) +
  scale_fill_viridis_c() +
  theme(legend.text = element_text(size = 13), legend.key.width = unit(2.5, "cm"), plot.title = element_text(hjust = 0.5)) +
  labs(title = "# of mobile subscriptions per 100 people", fill = "")
We can even try a very simple interactive version using {mapview}, a simplified interface to {leaflet}.
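The code for the interactive map is not shown here. A minimal sketch of what such a call could look like is given below, assuming an sf object with a mobile_subs column like world. The three points in toy_world are a hypothetical stand-in with made-up values, so that the example runs on its own.

```r
library(sf)
library(mapview)

# Hypothetical stand-in for the `world` sf object (made-up values)
toy_world <- st_sf(
  name        = c("A", "B", "C"),
  mobile_subs = c(45, 98, 120),
  geometry    = st_sfc(
    st_point(c(2.35, 48.85)),
    st_point(c(-0.13, 51.51)),
    st_point(c(13.40, 52.52)),
    crs = 4326
  )
)

# zcol selects the column used to colour the features
interactive_map <- mapview(toy_world, zcol = "mobile_subs")

# In an interactive session, printing the object displays the map:
# interactive_map
```

With the real data, toy_world would be replaced by world, so that each country polygon is coloured by its mobile_subs value.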
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.1 LTS
##
## Locale:
## LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
## LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
## LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
## LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## LC_ADDRESS=C LC_TELEPHONE=C
## LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## Package version:
## assertthat_0.2.1 backports_1.2.0 base64enc_0.1-3
## BH_1.72.0.3 brew_1.0-6 brio_1.1.0
## callr_3.5.1 class_7.3-17 classInt_0.4-3
## cli_2.1.0 clipr_0.7.1 codetools_0.2-16
## colorspace_1.4-1 compiler_4.0.2 cpp11_0.2.4
## crayon_1.3.4 crosstalk_1.1.0.1 curl_4.3
## DBI_1.1.0 desc_1.2.0 diffobj_0.3.2
## digest_0.6.27 dplyr_1.0.2 e1071_1.7-4
## ellipsis_0.3.1 evaluate_0.14 fansi_0.4.1
## farver_2.0.3 gdtools_0.2.2 generics_0.0.2
## ggplot2_3.3.2 glue_1.4.2 graphics_4.0.2
## grDevices_4.0.2 grid_4.0.2 gridExtra_2.3
## gtable_0.3.0 highr_0.8 hms_0.5.3
## htmltools_0.5.0 htmlwidgets_1.5.2 httpuv_1.5.4
## isoband_0.2.2 jsonlite_1.7.1 KernSmooth_2.23-17
## knitr_1.30 labeling_0.4.2 later_1.1.0.1
## lattice_0.20-41 lazyeval_0.2.2 leafem_0.1.3
## leaflet_2.0.3 leaflet.providers_1.9.0 leafpop_0.0.6
## lifecycle_0.2.0 magrittr_1.5 mapview_2.9.4
## markdown_1.1 MASS_7.3.53 Matrix_1.2-18
## methods_4.0.2 mgcv_1.8-33 mime_0.9
## munsell_0.5.0 nlme_3.1-149 pillar_1.4.6
## pkgbuild_1.1.0 pkgconfig_2.0.3 pkgload_1.1.0
## plyr_1.8.6 png_0.1-7 praise_1.0.0
## prettyunits_1.1.1 processx_3.4.4 promises_1.1.1
## ps_1.4.0 purrr_0.3.4 R6_2.5.0
## raster_3.3-13 RColorBrewer_1.1.2 Rcpp_1.0.5
## readr_1.4.0 rematch2_2.1.2 rgeos_0.5-5
## rlang_0.4.8 rmarkdown_2.5 rnaturalearth_0.1.0
## rnaturalearthdata_0.1.0 rprojroot_1.3.2 rstudioapi_0.11
## satellite_1.0.2 scales_1.1.1 servr_0.20
## sf_0.9-6 sp_1.4-4 splines_4.0.2
## stats_4.0.2 stats4_4.0.2 stringi_1.5.3
## stringr_1.4.0 svglite_1.2.3.2 systemfonts_0.3.2
## testthat_3.0.0 tibble_3.0.4 tidyr_1.1.2
## tidyselect_1.1.0 tinytex_0.26 tools_4.0.2
## units_0.6-7 utf8_1.1.4 utils_4.0.2
## uuid_0.1-4 vctrs_0.3.4 viridis_0.5.1
## viridisLite_0.3.0 waldo_0.2.3 webshot_0.5.2
## withr_2.3.0 xfun_0.19 yaml_2.2.1