Compiled on 2021-05-30 by Maroun Bou Sleiman. Contact me at maroun@openmaplebanon.org

1 Introduction

Under the watch of the “ruling elite”, Lebanon has been reduced from the “Switzerland of the East” to a sort of refugee camp. One of the contributing factors is the huge lack of transparency at every level. The truth, or multiple truths, are served - after being conveniently manipulated - through media outlets representing different factions of the “ruling elite”. Lebanese society, and especially the corrupt government, is therefore in dire need of a paradigm shift in the way it perceives, values, and uses data. In addition to data being scarce, fragmented, and poorly managed, analyses based on data are rarely transparent, and sources are rarely cited.

I am now a member of an excellent group of volunteers called Open Map Lebanon, or OML, which was formed right after the Beirut blast of August 4, 2020. Members of the group are heavily involved in mapping the damage, coordinating and charting the humanitarian needs on the ground, and assessing the blast’s damage and repercussions. The group’s interests extend beyond the explosion and into the structural issues that led to the explosion on one side, and to an inadequate crisis response on the other. For example, the operational zones for humanitarian aid were not defined by the government, but by OCHA. Damage assessment was hampered by a lack of very basic data: how many buildings are actually there, how many stories each building has, and how many people live there. This is not surprising in a country where the government does not know exactly how many employees it has.

One important type of data is Open Data, that is, data that is made publicly available without restrictions; every country should have much more of it. In Lebanon, there is no central repository serving official and non-official open data, but there are some fragmented sources. One valiant attempt to at least catalogue available data is Open Data Lebanon, an unofficial citizen initiative. Open Data Lebanon does not serve any data itself, but points to relevant data sources, and is therefore not strictly a repository. Another important source of open data is the Humanitarian Data Exchange, or HDX, which I will be using heavily here.

This R Markdown notebook demonstrates how to download and explore data from the Humanitarian Data Exchange repository, or HDX, using the rhdx package. It will also attempt to integrate data from different sources using code. I’ll be looking into the Facebook Movement Range data (as available from HDX), which I will overlay on the Lebanese map. I will focus a bit on Beirut before and after the explosion, and finally I will overlay the datasets together. I will also have a look at, and compare with, the Google Mobility data (available directly from Google).

In this project, I want to:

  • show that there is some underused open data out there with potential for decision-making. This is the case for the Facebook and Google mobility data, which are relevant to the Covid-19 outbreak and possibly to the assessment of the Beirut explosion’s damage and recovery.
  • show one way to analyze data transparently. For example, this notebook contains all the code needed to reproduce the analyses. Click on the Code button on the right to see the code that generates the following output.
  • stress that if one wants to faithfully report any finding, the chain between the raw, or input, data and the final output should not be broken. We almost never see this in Lebanon.
  • stress that different datasets should be integrated (again transparently) to reach better conclusions. I will also show how some of the shortcomings of the Lebanese government affect how easily we can integrate data. The clearest example is the lack of standardisation of governorate, caza, municipality, and other place names. For example, Beirut is also Beyrouth, and Jubail is Jbail, Jbeil, or Byblos! This makes data science a nightmare when one wants to integrate two datasets. This is a real problem, not a detail.
  • call on all governmental and non-governmental agencies to make their data (and metadata) available in machine-readable formats, not just in static PDFs, pictures, or dashboards. The underlying data should be made available in formats that are readily usable by data scientists.
  • inspire others to carry out investigations in similar ways. This doesn’t have to be the same programming language or platform, but it should involve the same general transparency.

1.1 The data

The specific HDX datasets we will be looking into are:

  • Facebook movement range data: link
  • The map of lebanon by OCHA: link
  • Johns Hopkins Covid-19 data: link
  • OXFORD COVID-19 Government Response Stringency index: link

The non-HDX datasets are:

  • Google Mobility Data: link
  • Lebanese Government (joint effort) data portal for municipal-level covid risk assessment: link
  • Ahmad Barclay’s dashboard of quarantined municipalities: link

1.2 Setup

This notebook was run on Renku. Renku is a cloud-based platform that is developed by the Swiss Data Science Center, or SDSC. Renkulab is a free public instance of Renku, which I encourage everyone to try.

Why Renku? Well Renku allows you to:

  • keep track of the versions of code AND data and how they changed with time (using Gitlab)
  • keep track and define the environment you are using, in other words, the operating system, the packages, the settings (using Docker)
  • launch a defined interactive environment to actually do the analyses and develop (using Kubernetes). Examples of environments are Jupyter Lab and RStudio, but one can define others.
  • keep track of analyses and pipelines using the knowledge graph. This is a Renku-specific feature, where you can keep track of which version of the code, and which specific run of that code, produced which version of the results.
  • collaborate on the same projects (basically thanks to Gitlab).
  • other cool stuff…

Here is a link to the project. This specific project uses the “R basic template 4.0” with these differences: extra Debian packages installed (libgdal-dev, locales, libudunits2-dev…) and extra R packages (rhdx, plotly, cowplot); check the Dockerfile and install.R for details. What this means is that anyone can see the code of my analysis, reproduce the exact same results, and use it as a starting point for their own analysis with a few clicks. There are really no more excuses for not sharing the data, the methods, and the results in such a transparent way.

2 The Analyses

2.1 Facebook Movement Data

First, load the rhdx R package. This package is not strictly required to obtain data from HDX: HDX is built on top of CKAN, an open-source data portal platform with a built-in API, and rhdx simply streamlines the process of searching for and obtaining data (see the short aside after the search results below for what a direct API call could look like). Let’s perform a search for “Movement Range Maps Facebook”, which will yield the following datasets:

suppressPackageStartupMessages(library(rhdx))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(cowplot)) # for arranging and combining plots
suppressPackageStartupMessages(library(sf))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(ggrepel))
queryDatasets <- rhdx::search_datasets(query = "Movement Range Maps Facebook", rows = 1000, configuration = NULL)
queryDatasets
## [[1]]
## <HDX Dataset> c3429f0e-651b-4788-bb2f-4adbf222c90e 
##   Title: Movement Range Maps
##   Name: movement-range-maps
##   Date range: 2020-03-01
##   Tags (up to 5): 
##   Locations (up to 5): alb, dza, asm, ago, arg
##   Resources (up to 5): How To Understand This Data.txt, movement-range-data-2021-05-28.zip
## 
## attr(,"class")
## [1] "hdx_datasets_list"
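As an aside, here is a minimal sketch of what the same search could look like against the CKAN API directly, without rhdx. This is only an illustration: it assumes the standard CKAN package_search action exposed at data.humdata.org, and it uses the httr and jsonlite packages, which are not otherwise loaded in this notebook.

# Aside (sketch): querying the HDX CKAN API directly, assuming the standard
# CKAN "package_search" action is available at data.humdata.org
library(httr)
library(jsonlite)
res <- httr::GET("https://data.humdata.org/api/3/action/package_search",
                 query = list(q = "Movement Range Maps Facebook", rows = 5))
parsed <- jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
parsed$result$results$title # titles of the matching datasets ("packages" in CKAN terms)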

You can see above that only one (the first) dataset matches the query. Note that the dataset is not downloaded yet. Each dataset can comprise several resources, which are the individual files. Here are the available resources in the first dataset matching our query:

queryDatasets %>% nth(1) %>% get_resources() 
## [[1]]
## <HDX Resource> 435ed157-6f7a-4e8f-a63a-2aa177b9bd05 
##   Name: How To Understand This Data.txt
##   Description: 
##   Size: 961
##   Format: TXT
## 
## [[2]]
## <HDX Resource> 55a51014-0d27-49ae-bf92-c82a570c2c6c 
##   Name: movement-range-data-2021-05-28.zip
##   Description: 
##   Size: 79455036
##   Format: zip
## 
## attr(,"class")
## [1] "hdx_resources_list"

Let’s get the data now and unzip.

fbfile <- list.files(path = "../data/", pattern = "movement-range-", full.names = T) # delete any existing files first
file.remove(fbfile)
## logical(0)
file.remove("../data/facebook_movement.zip")
## [1] FALSE
queryDatasets %>% nth(1) %>% get_resource(2) %>% rhdx::download_resource(folder = "../data/", filename = "facebook_movement.zip")
unzip("../data/facebook_movement.zip", exdir = "../data/")

Now read the data (using fread, which is faster than base read.table):

fbfile <- list.files(path = "../data/", pattern = "movement-range-", full.names = T)
hdr <- read.table(fbfile, nrow = 1)[1,] %>% as.character() # header
system(paste0("awk '($2 == \"LBN\")' ", fbfile, " > ", fbfile,".lbn")) # pre-filter with awk: keep only rows whose 2nd column (country) is LBN
fb <- data.table::fread(file = paste0(fbfile,".lbn"))
colnames(fb) <- hdr
head(fb)
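The awk call above assumes a Unix-like environment (as on Renku). A portable pure-R alternative, sketched below under the assumptions that the whole global file fits in memory and that the second column is indeed named country (as used later in this notebook), would be:

# Sketch of a pure-R alternative to the awk pre-filter (slower and more memory-hungry)
fb_all <- data.table::fread(fbfile) # read the full global file, header included
fb <- fb_all[country == "LBN"]      # keep only the Lebanon rows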

There are 11630 rows and 9 columns. Now let’s do some formatting for the dates. It is important that the dates are understood as Date objects and not as plain text. I also add a color column to differentiate Beirut from the rest.

fb$ds <- lubridate::as_date(fb$ds)
fb$color <- ifelse(fb$polygon_name == "Beirut", "black", "lightgrey")

The first available date is 2020-03-01, and the last date is 2021-05-28. Now let’s plot some data for “LBN” - the red line marks the day of the Beirut port explosion (August 4 2020). The data is available at the caza level (n = 26).

Beirut is in black; other regions are in light grey.

a <- fb %>% filter(country == "LBN") %>% ggplot(aes(x = ds, y = all_day_bing_tiles_visited_relative_change, group = polygon_name)) + 
  geom_line(aes(color = color)) +
  scale_color_identity() +
  ylab("Positive or negative change\nin movement relative to baseline") +
  geom_vline(xintercept = lubridate::as_date(c("2020-08-04")), color = "red") +
  theme_light()+
  xlab("Time")

b <- fb %>% filter(country == "LBN") %>% ggplot(aes(x = ds, y = all_day_ratio_single_tile_users, group = polygon_name)) + 
  geom_line(aes(color = color)) +
  scale_color_identity() +
  ylab("Positive proportion of users staying put\nwithin a single location") +
  geom_vline(xintercept = lubridate::as_date(c("2020-08-04")), color = "red") +
  theme_light() +
  xlab("Time")
plot_grid(a,b, ncol = 1)

As you can see from the two plots, when one parameter increases, the other decreases. We can see reduced activity during the month of March and a gradual return to normal activity afterwards. July looks normal.
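As a quick sanity check of this inverse relationship (a sketch, not part of the original figures), one can correlate the two metrics directly; I would expect a strongly negative coefficient:

# Quick check: the two Facebook metrics should be negatively correlated
fb %>% filter(country == "LBN") %>%
  summarize(pearson_r = cor(all_day_bing_tiles_visited_relative_change,
                            all_day_ratio_single_tile_users,
                            use = "pairwise.complete.obs"))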

Same as before, but focusing on the period around the explosion:

a <- fb %>% filter(country == "LBN", ds > lubridate::as_date("2020-07-25"), ds < lubridate::as_date("2020-08-10")) %>% 
  ggplot(aes(x = ds, y = all_day_bing_tiles_visited_relative_change, group = polygon_name)) + 
  geom_line(aes(color = color)) +
  scale_color_identity() +
  ylab("Positive or negative change\nin movement relative to baseline") +
  geom_vline(xintercept = lubridate::as_date(c("2020-08-04")), color = "red") +
  theme_light()

b <- fb %>% filter(country == "LBN", ds > lubridate::as_date("2020-07-25"), ds < lubridate::as_date("2020-08-10")) %>% 
  ggplot(aes(x = ds, y = all_day_ratio_single_tile_users, group = polygon_name)) + 
    geom_line(aes(color = color)) +
  scale_color_identity() +
  ylab("Positive proportion of users staying put\nwithin a single location") +
  geom_vline(xintercept = lubridate::as_date(c("2020-08-04")), color = "red") +
  theme_light()
plot_grid(a,b, ncol = 1)

Notice how the two measures show the same thing using different approaches:

  • one looks at the relative change in movement versus a certain baseline. This decreases with increased confinement.
  • the other looks at the proportion of people staying put within a single location. This increases with increased confinement.

Let’s rank the different regions based on these metrics. I’m doing the following (there may be better ways): for each day, rank the regions, then take the average of those ranks across all days.

ranks <- fb %>% filter(country == "LBN") %>% group_by(ds) %>% 
  mutate(all_day_ratio_single_tile_users_rank = rank(all_day_ratio_single_tile_users),
         all_day_bing_tiles_visited_relative_change_rank = rank(all_day_bing_tiles_visited_relative_change)) %>% 
  group_by(polygon_name) %>% 
  summarize(all_day_ratio_single_tile_users_meanrank = mean(all_day_ratio_single_tile_users_rank),
            all_day_ratio_single_tile_users_sdrank = sd(all_day_ratio_single_tile_users_rank),
         all_day_bing_tiles_visited_relative_change_meanrank = mean(all_day_bing_tiles_visited_relative_change_rank),
          all_day_bing_tiles_visited_relative_change_sdrank = sd(all_day_bing_tiles_visited_relative_change_rank))


a <- ranks %>% ggplot(aes(x = reorder(polygon_name,all_day_bing_tiles_visited_relative_change_meanrank) , y = all_day_bing_tiles_visited_relative_change_meanrank)) + 
  geom_col() +
  geom_errorbar(aes(ymin = all_day_bing_tiles_visited_relative_change_meanrank-all_day_bing_tiles_visited_relative_change_sdrank,
                    ymax = all_day_bing_tiles_visited_relative_change_meanrank+all_day_bing_tiles_visited_relative_change_sdrank), width = 0.5) +
  coord_flip() +
  ylab("Average ranking of\nmovement change\nrelative to baseline") +
  xlab("Caza") +
  theme_light()

b <- ranks %>% ggplot(aes(x = reorder(polygon_name,-all_day_ratio_single_tile_users_meanrank) , y = all_day_ratio_single_tile_users_meanrank)) + 
  geom_col() +
  geom_errorbar(aes(ymin = all_day_ratio_single_tile_users_meanrank-all_day_ratio_single_tile_users_sdrank,
                    ymax = all_day_ratio_single_tile_users_meanrank+all_day_ratio_single_tile_users_sdrank), width = 0.5) +
  coord_flip() +
  ylab("Average ranking of\nproportion of\nusers staying put") +
  xlab("Caza") +
  theme_light()
plot_grid(a,b, ncol = 2)

We can see that there are likely significant differences between the different cazas. I did not perform a formal statistical test here because I am uncertain of the best approach at the moment. There are differences between the two metrics that may be attributable to the level of density/urbanization of each caza: someone in Beirut would cover less ground when moving within Beirut than someone in West Bekaa.
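For completeness, here is a sketch of one candidate test: a Friedman test with cazas as groups and days as blocks. This is only an assumption-laden illustration (it ignores temporal autocorrelation and requires a complete caza-by-day design), not necessarily the right approach.

# Sketch: Friedman test of caza differences, using days as blocks
# (assumption: restrict to days on which all cazas are reported, since the
# test needs an unreplicated complete block design)
leb_only <- fb %>% filter(country == "LBN")
complete_days <- leb_only %>%
  count(ds) %>%
  filter(n == n_distinct(leb_only$polygon_name)) %>%
  pull(ds)
friedman.test(all_day_ratio_single_tile_users ~ polygon_name | ds,
              data = leb_only %>% filter(ds %in% complete_days))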

2.2 Overlay with Lebanon map

How does this look on the Lebanese map? We first need to get the Lebanon map from HDX. To my knowledge, no government agency has released a map of Lebanon in a usable format, so once again I will resort to HDX, using the query string “Lebanon - Subnational Administrative Boundaries”.

lebanonMap <- rhdx::search_datasets(query = "Lebanon - Subnational Administrative Boundaries", rows = 1000, configuration = NULL)
lebanonMap %>% nth(1) %>% get_resources()
## [[1]]
## <HDX Resource> f69edee9-c8b8-4394-8b9a-02e2dd3e006f 
##   Name: LBN_AdminBoundaries_TabularData.xlsx
##   Description: Lebanon Administrative Boundaries (levels 0-3) tabular data
##   Size: 233689
##   Format: XLSX
## 
## [[2]]
## <HDX Resource> e0829b20-ab6d-4af2-923a-8680afd20533 
##   Name: lbn_adm_cdr_20200810.zip
##   Description: Lebanon administrative level 0-3 boundary shapefiles
##   Size: 7090097
##   Format: SHP
## 
## [[3]]
## <HDX Resource> 272e009d-8ab9-4014-a873-1f2192d49850 
##   Name: lbn_beirut_adm4_mapaction_pcoded.zip
##   Description: Lebanon administrative level 4 ('Neighborhood') boundaries for Beirut, P-coded to conform to administrative levels 0-3 in this dataset. SEE CAVEATS.
##   Size: 51871
##   Format: SHP
## 
## attr(,"class")
## [1] "hdx_resources_list"

We’ll need the Lebanon administrative level 0-3 boundary shapefiles (lbn_adm_cdr_20200810.zip), specifically the admin level 2 layer, which corresponds to the cazas. I also have to solve the issue of differing district names (different Latin transliterations of the original Arabic names). To me, this is due to short-sightedness and a lack of standardization by the Lebanese government. Here I do that using a pseudo-manual mapping of values with the plyr::mapvalues function. See the Google-Facebook integration section below for a more automated approach.

lebanonMap %>% nth(1) %>% get_resource(2) %>% download_resource(folder = "../data/")
unzip("../data/lbn_adm_cdr_20200810.zip", exdir = "../data/lbn_adm_cdr_20200810")
shp <- sf::read_sf("../data/lbn_adm_cdr_20200810/lbn_admbnda_adm2_cdr_20200810.shp")
district_centroids <-  st_coordinates(st_centroid(shp))
shp <- cbind(shp, district_centroids)

## some modifications so that the two lists match - quick and dirty
shp$polygon_name <- gsub("El ", "", shp$admin2Name)

shp$polygon_name <- plyr::mapvalues(shp$polygon_name, from = c("Minieh-Dennie","Baalbek", "Jbeil", "Kesrwane","Meten", "Zahle", "Rachaya", "Nabatieh", "Bent Jbeil"), to = c("Minieh-Danieh", "Baalbeck", "Jubail", "Kasrouane", "El Metn", "Zahleh", "Rachiaya", "Nabatiyeh", "Bint Jbayl"))

general_map <- ggplot(shp) + geom_sf(alpha = 0.5, fill = viridis::inferno(5)[1]) + geom_label_repel(aes(x = X, y = Y, label = polygon_name), 
                                                                                                    alpha = 1, size = 3, 
                                                                                                    fill = viridis::inferno(5)[5], 
                                                                                                    color = viridis::inferno(5)[3],
                                                                                                    segment.alpha = 0) + theme_void() + xlab("") + ylab("")

leb_fb <- fb %>% filter(country == "LBN") 

# helper function to get data for one date

get_by_date <- function(date) {
  subs <- leb_fb %>% filter(ds == date)
  
  out <- cbind(shp, subs[match(shp$polygon_name, subs$polygon_name),])
  return(out)
}

rn <- range(leb_fb$all_day_ratio_single_tile_users)

p1 <- ggplot(get_by_date("2020-03-05")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("March 5 2020") + xlab("") + ylab("") +
  theme(legend.position = "bottom")
p2 <- ggplot(get_by_date("2020-03-15")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("March 15 2020") + xlab("") + ylab("")+
  theme(legend.position = "bottom")
p3 <- ggplot(get_by_date("2020-04-01")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("April 1 2020") + xlab("") + ylab("")+
  theme(legend.position = "bottom")
p4 <- ggplot(get_by_date("2020-08-01")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("August 1 2020") + xlab("") + ylab("") +
  theme(legend.position = "bottom")

p5 <- ggplot(get_by_date("2020-09-28")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("September 28 2020") + xlab("") + ylab("") +
  theme(legend.position = "bottom")

p6 <- ggplot(get_by_date("2020-10-05")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("October 5 2020") + xlab("") + ylab("") +
  theme(legend.position = "bottom")
p6 <- ggplot(get_by_date("2020-11-16")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("November 16 2020") + xlab("") + ylab("") +
  theme(legend.position = "bottom")

rmlgnd <- theme(legend.position = "none")
plot_grid(plot_grid(general_map,get_legend(p1),ncol=1,rel_heights = c(10,1)),plot_grid(p1+rmlgnd,p2+rmlgnd,p3+rmlgnd,p4+rmlgnd, p5+rmlgnd,p6+rmlgnd, nrow = 3), ncol = 2, rel_widths = c(1,1))

So what do we learn? On March 5th, the “ratio of people staying put” is low throughout, which indicates that people are moving around. By the 15th of March, the first official general lockdown date, it seems that Beirut, Metn, and Kasrouane were the first to start quarantine. By April 1st, we can see a more generalized restriction of movement. By the first of August, almost no one cares anymore… If we compare September 28th (the last Monday before the selective lockdowns) with October 5th (the first Monday after the selective lockdown), we see very little difference. Very concerning.

2.3 Focus on the Beirut Port Explosion

The explosion occurred on the 4th of August. Note that the color scale was adapted to the period following August 3rd (the day before the explosion).

# make a new range based on the limits of the data
rn_rescale <- leb_fb %>% filter((ds > "2020-08-03")) %>% pull(all_day_ratio_single_tile_users) %>% range()

before <- ggplot(get_by_date("2020-08-03")) + geom_sf(aes(fill = all_day_ratio_single_tile_users)) + 
  # geom_text(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn_rescale, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  coord_sf(xlim = c(35.4,35.8), ylim = c(33.6, 34), expand = FALSE) +
  ggtitle("August 3 2020", subtitle = "Day before")  + xlab("") + ylab("") +
  theme(legend.position = "bottom")

dday <- ggplot(get_by_date("2020-08-04")) + geom_sf(aes(fill = all_day_ratio_single_tile_users)) + 
  # geom_text(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn_rescale, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  coord_sf(xlim = c(35.4,35.8), ylim = c(33.6, 34), expand = FALSE) +
  ggtitle("August 4 2020", subtitle = "Explosion day")  + xlab("") + ylab("") +
  theme(legend.position = "bottom")
dafter <- ggplot(get_by_date("2020-08-05")) + geom_sf(aes(fill = all_day_ratio_single_tile_users)) + 
  # geom_text(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn_rescale, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  coord_sf(xlim = c(35.4,35.8), ylim = c(33.6, 34), expand = FALSE) +
  ggtitle("August 5 2020", subtitle = "1 Day after") + xlab("") + ylab("") +
  theme(legend.position = "bottom")
wafter <- ggplot(get_by_date("2020-08-11")) + geom_sf(aes(fill = all_day_ratio_single_tile_users)) + 
  # geom_text(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn_rescale, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  coord_sf(xlim = c(35.4,35.8), ylim = c(33.6, 34), expand = FALSE) +
  ggtitle("August 11 2020", subtitle = "1 Week after") + xlab("") + ylab("") +
  theme(legend.position = "bottom")
mafter <- ggplot(get_by_date("2020-09-04")) + geom_sf(aes(fill = all_day_ratio_single_tile_users)) + 
  # geom_text(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn_rescale, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  coord_sf(xlim = c(35.4,35.8), ylim = c(33.6, 34), expand = FALSE) +
  ggtitle("September 04 2020", subtitle = "1 Month after") + xlab("") + ylab("") +
  theme(legend.position = "bottom")
lt <- max(leb_fb$ds)
ltm <- lubridate::month(lt, label=T)
ltd <- lubridate::day(lt)
lty <- lubridate::year(lt)
lastTimepoint <- ggplot(get_by_date(lt)) + geom_sf(aes(fill = all_day_ratio_single_tile_users)) + 
  # geom_text(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn_rescale, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  coord_sf(xlim = c(35.4,35.8), ylim = c(33.6, 34), expand = FALSE) +
  ggtitle(paste0(ltm, " ", ltd, " ", lty), subtitle = "Last available timepoint")  + xlab("") + ylab("") +
  theme(legend.position = "bottom")
lgn <- get_legend(before)
rmthm <- theme(legend.position = "none")
pg <- plot_grid(before + rmthm, dday + rmthm, dafter + rmthm, mafter + rmthm, wafter + rmthm, lastTimepoint + rmthm, ncol = 3)
pbl <- plot_grid(pg, lgn, nrow = 2, rel_heights = c(10,2))
pbl

It is not clear that we can say much from this data; we may need finer-grained data to get more insight.
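One option for finer spatial resolution is already in the HDX resource listing of the previous section: the Beirut neighborhood-level (admin 4) boundaries (lbn_beirut_adm4_mapaction_pcoded.zip). The Facebook metrics themselves are only released at the caza level, so this would mainly help when overlaying other, finer-grained datasets. Here is a minimal loading sketch, assuming the adm4 file is resource 3 of that dataset and that the zip contains a single shapefile (the exact file name inside the zip may differ):

# Sketch: load the Beirut admin-4 (neighborhood) boundaries from the same HDX dataset
lebanonMap %>% nth(1) %>% get_resource(3) %>% download_resource(folder = "../data/")
unzip("../data/lbn_beirut_adm4_mapaction_pcoded.zip", exdir = "../data/lbn_beirut_adm4")
adm4_shp <- list.files("../data/lbn_beirut_adm4", pattern = "\\.shp$", full.names = TRUE)[1]
beirut_adm4 <- sf::read_sf(adm4_shp)
ggplot(beirut_adm4) + geom_sf() + theme_void()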

2.4 Integrating with the Covid data

Covid data is downloaded from HDX. The original source is Johns Hopkins. Again I need to stress: there is no governmental source that makes this data available or directly usable (as a CSV file, for example). The ministry of health has a dashboard, but they, like other government agencies, seem allergic to a “download the data” button.

queryDatasets_covid <- rhdx::search_datasets(query = "Novel Coronavirus (COVID-19) Cases Data", rows = 1000, configuration = NULL)
resources_covid <- queryDatasets_covid %>% nth(1) %>% get_resources()

# select the interesting ones
toget <- which(sapply(resources_covid, function(x){
  grepl("(narrow)",x$as_list()$description)
}))

covid_files <- sapply(resources_covid[toget],download_resource, folder = "../data/")
covid_files <- gsub("(.)+\\/data/", "../data/", covid_files)
leb_covid <- lapply(covid_files, function(x){
  casetype <- gsub("(.)+covid19_", "", gsub("_global(.)+", "", x))
  out <- read.csv(x) %>% filter(Country.Region == "Lebanon") %>% mutate(casetype = casetype)
  out
})

leb_covid <- do.call(rbind, leb_covid)
leb_covid$Date <- lubridate::ymd(leb_covid$Date)
leb_covid$Value <- as.numeric(leb_covid$Value)
g <- ggplot(leb_covid, aes(x = Date, y = Value)) +
  geom_col(fill = "red", alpha = 0.5) +
  facet_wrap(~casetype, scales = "free_y", ncol = 1) +
  theme_cowplot() +
  ylab("Number (cumulative)") +
  theme(legend.position = "bottom")
g

Make a non-cumulative (daily) version:

leb_covid <- leb_covid %>% group_by(casetype) %>% mutate(daily = Value - lag(Value, order_by = Date)) %>% ungroup() # daily new counts from the cumulative series
g <- ggplot(leb_covid) +
  geom_col( aes(x = Date, y = daily), position = "identity", fill = "red", alpha = 0.5) +
  facet_wrap(~casetype, scales = "free_y", ncol = 1) +
  theme_cowplot() +
  ylab("Number (daily)") +
  theme(legend.position = "bottom") +
  labs(fill = "")
g

2.5 Facebook movement data overlaid with cases

How do the case data relate to Facebook movement data?

# just to add a secondary axis, calculate a transformation ratio
pltlist <- lapply(unique(leb_covid$casetype), function(ct){
  leb_covid_ct <-leb_covid %>% filter(casetype == ct) 
  g <- ggplot(leb_covid_ct) +
  geom_col( aes(x = Date, y = daily), position = "identity", fill = "red", alpha = 0.5) +
  facet_wrap(~casetype, scales = "free_y", ncol = 1) +
  theme_cowplot() +
  ylab("Number (daily)") +
  theme(legend.position = "bottom") +
  labs(fill = "")
transform_ratio <- max(leb_covid_ct$daily, na.rm = T)/max(fb %>% filter(country == "LBN") %>%  pull(all_day_ratio_single_tile_users))

g <- g + geom_line(data = fb %>% filter(country == "LBN"), aes(x = ds, y = all_day_ratio_single_tile_users * transform_ratio, group = polygon_name, color = color), alpha = 0.7) +
  scale_color_identity() +
  scale_y_continuous(
    "Number of cases", 
    sec.axis = sec_axis(~ . / transform_ratio, name = "Facebook proportion\nof people staying put")
  )
g +
  geom_vline(xintercept = lubridate::ymd("2020-08-04"), color = "red")
})

plot_grid(plotlist = pltlist, ncol = 1)

2.6 Integration with lockdown measures and the Government Stringency Index

How does the observed movement and disease prevalence data relate to lockdown measures? Unsurprisingly, there is no such dataset that I could find, so I will have to build one (with code). Here’s an overlay of everything, including the Beirut explosion as a vertical red line. The dashed vertical lines show the moments the selective lockdowns came into force at the village/town level (Oct 4, 12, and 19, 2020).

I will also download the government stringency index prepared by the Oxford COVID-19 Government Response Tracker, or OxCGRT, through HDX. I plot it in green, but note that the actual unit of the index is not shown on the axes.

# %>% nth(1) %>% get_resource(2) %>% download_resource(folder = "../data/")
rhdx::search_datasets(query = "OXFORD COVID-19 Government Response Stringency index") %>% nth(1) %>% get_resource(1) %>% download_resource(folder = "../data/")
stringency <- read.csv("../data/data.csv", comment.char = "#")
stringency <- stringency %>% filter(CountryCode == "LBN")
stringency$Date <- lubridate::ymd(stringency$Date)
#declare a simple data frame
event_df <- data.frame(event_id = numeric(), event_type = character(), start = Date(), end = Date(), caza = NULL)

#define a useful function to add events
addEvent <- function(event_types = character(), starts = Date(), ends = Date(), caza = NULL){
  lasteventid <- ifelse(nrow(event_df) == 0,0,max(max(event_df$event_id)))
  if(is.null(caza)){
    caza <- fb %>% filter(country == "LBN") %>% pull(polygon_name) %>% unique()
  }
  for(c in caza){
      out <- data.frame(event_id = (lasteventid+1):(lasteventid + length(event_types)),
                    event_type = event_types,
                    start = lubridate::ymd(starts),
                    end = lubridate::ymd(ends),
                    caza = c)
        event_df <<- rbind(event_df, out)
  }


}

# airport closure
addEvent(event_types = "Travel Closure", start = "2020-03-18", end = "2020-07-01")

# lockdowns
addEvent(event_types = "Lockdown", start = "2020-03-15", end = "2020-05-18")

# lockdowns
addEvent(event_types = "Lockdown", start = "2020-07-27", end = "2020-08-10")

# partial lockdown
addEvent(event_types = "Partial Lockdown", start = "2020-10-04", end = "2020-10-11")
# partial lockdown
addEvent(event_types = "Partial Lockdown", start = "2020-10-12", end = "2020-10-18")
# partial lockdown
addEvent(event_types = "Partial Lockdown", start = "2020-10-19", end = "2020-10-25")

# full lockdown of November 14 - November 30
addEvent(event_types = "Lockdown", start = "2020-11-14", end = "2020-11-30")

# full lockdown of January 7 - 13
addEvent(event_types = "Lockdown", start = "2021-01-07", end = "2021-01-13")

# state of emergency of January 14 - February 28
addEvent(event_types = "State of Emergency", start = "2021-01-14", end = "2021-02-28")

# State of Emergency - easing  of March 21 - ??
addEvent(event_types = "State of Emergency - easing 1", start = "2021-03-01", end = "2021-03-21")
# State of Emergency - easing  of March 22 - ??
addEvent(event_types = "State of Emergency - easing 2", start = "2021-03-22", end = "2021-04-14")

# add a status column to the fbdata
leb_fb$status <- "None"

for(i in 1:nrow(event_df)){
  curevent <- event_df[i,]
  wc <- which( (leb_fb$ds <= curevent$end) & (leb_fb$ds >= curevent$start) & (leb_fb$polygon_name == curevent$caza))
  for(j in wc){
    if(leb_fb$status[j] == "None"){
      leb_fb$status[j] <- curevent$event_type
    }else{
      leb_fb$status[j] <- paste(sort(unique(c(strsplit(leb_fb$status[j], split = "&", fixed = T)[[1]],curevent$event_type))), collapse = "&")
    }
    
  }
  
}

statusTypes <- c("None", setdiff(unique(leb_fb$status), "None"))
leb_fb$status <- factor(leb_fb$status, levels = statusTypes, ordered = T)


pltlist <- lapply(unique(leb_covid$casetype), function(ct){
  leb_covid_ct <-leb_covid %>% filter(casetype == ct)
  ge <- ggplot(leb_covid_ct) +
  geom_col( aes(x = Date, y = daily), position = "identity", fill = "red", alpha = 0.5) +
  facet_wrap(~casetype, ncol = 1, scales = "free_y") +
  theme_cowplot() +
  xlab("Number (daily)") +
  theme(legend.position = "bottom") +
  labs(fill = "")
  transform_ratio <- max(leb_covid_ct$daily, na.rm = T)/max(fb %>% filter(country == "LBN") %>%  pull(all_day_ratio_single_tile_users))
  Stringency_mult <- max(leb_covid_ct$daily, na.rm = T)/max(stringency$StringencyIndex)
ge + geom_line(data = leb_fb, aes(x = ds, y = all_day_ratio_single_tile_users * transform_ratio, group = polygon_name, color = status), alpha = 0.7) +
  # scale_color_identity() +
  scale_y_continuous(
    "Number of cases", 
    sec.axis = sec_axis(~ . / transform_ratio, name = "Facebook proportion of\npeople staying put")
  ) +
  geom_vline(xintercept = lubridate::ymd("2020-08-04"), color = "red") +
    geom_vline(xintercept = lubridate::ymd("2020-10-04"), lty = 2, color = "black") +
  geom_vline(xintercept = lubridate::ymd("2020-10-12"), lty = 2, color = "black") +
  geom_vline(xintercept = lubridate::ymd("2020-10-19"), lty = 2, color = "black") +
  geom_line(data = stringency, aes(x= Date, y = StringencyIndex * Stringency_mult), color = "green")
  
  })

plot_grid(get_legend(pltlist[[1]] + guides(col = guide_legend(nrow = 2)) ), plot_grid(plotlist = lapply(pltlist, function(x){x+theme(legend.position = "none")}), ncol = 1), nrow = 2, rel_heights = c(1,7))

2.7 Adding Google Mobility Data

Get the data from the Google Mobility website, specifically the regional CSV files. Read them and make a general plot. I will remove the data relating to parks and transit stations because that data is of low quality (there are almost no parks and transit stations in Lebanon). Beirut is in black, the other districts in grey. I’ve added the stringency index in green; its scale is set manually (no axis is shown for it).

download.file(url = "https://www.gstatic.com/covid19/mobility/Region_Mobility_Report_CSVs.zip", destfile = "../data/Region_Mobility_Report_CSVs.zip")
unzip("../data/Region_Mobility_Report_CSVs.zip", exdir = "../data/Region_Mobility_Report_CSVs")

leb_google <- rbind(read.csv("../data/Region_Mobility_Report_CSVs/2020_LB_Region_Mobility_Report.csv"),read.csv("../data/Region_Mobility_Report_CSVs/2021_LB_Region_Mobility_Report.csv"))
leb_google$date <- lubridate::ymd(leb_google$date)
leb_google$sub_region_2[leb_google$sub_region_1 == "Beirut Governorate"] <- "Beirut" #in order to keep beirut
leb_google <- leb_google[leb_google$sub_region_2 != "",] #keep sub-region 2 data
leb_google <- leb_google %>% select(-which(grepl("parks|transit", colnames(leb_google))))
cols_for_pivot <- which(grepl("from_baseline", colnames(leb_google))&!grepl("parks|transit", colnames(leb_google)))

leb_google_long <- leb_google %>% pivot_longer(cols = all_of(cols_for_pivot), names_to = "place_category", values_to = "percent_change_from_baseline") %>%
  mutate(place_category = gsub("_percent_change_from_baseline", "", place_category)) %>%
  mutate(place_category = gsub("_", " ", place_category))

leb_google_long$color = ifelse(leb_google_long$sub_region_2=="Beirut", "black", "lightgrey")
leb_google_long$place_category <- gsub(" and ", " and\n",leb_google_long$place_category)
g <- ggplot(leb_google_long) +
  # geom_smooth() +
  geom_line(aes(x = date, y = percent_change_from_baseline, group = sub_region_2, color = color)) +
  scale_color_identity() +
  theme_cowplot() +
  theme(legend.position = "bottom") +
  geom_vline(xintercept = lubridate::ymd("2020-08-04"), color = "red") +
  # geom_rect(data = event_df, aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = event_type), x= 0 ,y=0, alpha = 0.1, color = "grey") +
  geom_hline(yintercept = 0) +
  facet_grid(place_category~., scales = "free_y") +
  ylab("Percent change from baseline") +
    geom_vline(xintercept = lubridate::ymd("2020-10-04"), lty = 2, color = "black") +
  geom_line(data = stringency, aes(x= Date, y = StringencyIndex/2), color = "green")
g

How do these parameters correlate? Perform pairwise correlation.

forpairs <- leb_google[,cols_for_pivot]
colnames(forpairs) <- gsub("_percent_change_from_baseline", "", colnames(forpairs))
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- cor(x, y, use = "pairwise.complete.obs")
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste0(prefix, txt)
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = 1)
}
pairs(forpairs, pch = 19, cex=0.5, upper.panel = panel.cor)

How do these relate to the Facebook data? First, we need to harmonize the names of the districts, because they are written differently in the two datasets.

2.7.1 Harmonizing district names in Google and Facebook datasets

I’ll try to do that by minimizing the string edit distance (generalized Levenshtein distance) between the two sets of names. First, let’s have a look at the problem:

google_districts <- unique(leb_google$sub_region_2)
facebook_districts <- unique(leb_fb$polygon_name)

cat("Google names: ", google_districts)
## Google names:  Aakkar Baalbek Beirut Rachaiya West Bekaa Zahlé Aalay Baabda Chouf Jbeil Kesrouane Matn Bint Jbeil Hasbaiyya Marjaayoun Nabatiyeh Batroun Koura Minieh - Danniyeh Tripoli Zgharta Jezzine Saida Tyre
cat("Facebook names: ", facebook_districts)
## Facebook names:  Akkar Baalbeck Hermel Beirut Rachiaya West Bekaa Zahleh Aley Baabda Chouf El Metn Jubail Kasrouane Bint Jbayl Hasbaya Marjaayoun Nabatiyeh Batroun Bcharre Koura Minieh-Danieh Tripoli Zgharta Jezzine Saida Sour

Now here’s a proposed solution that works well for all names except Tyre (in Google) and El Metn (in Facebook). I would guess the same issue arises for Byblos = Jbeil = Jubeyl = Jbayl and similar cases. The solution is to calculate, in a systematic fashion for each pair of names, the number of edits needed, and then to pick the pairs with the lowest numbers. If the two names are identical, that number is 0; if one change is needed (e.g. Jbeil -> Jbail), it is 1. Here is the resulting conversion table after selecting the minima.
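As a tiny illustration of the edit distance itself (not part of the pipeline), base R’s adist() returns exactly this count of edits:

# adist() counts the insertions, deletions and substitutions between two strings
adist("Beirut", "Beirut") # identical names: 0 edits
adist("Jbeil", "Jbail")   # one substitution: 1 edit
adist("Tyre", "Sour")     # transliterations can be far apart, hence the manual exception below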

approximate_distance <- sapply(google_districts, function(x){
  out <- sapply(facebook_districts, function(y){
    adist(ifelse(x=="Tyre", "Sour",x),
          ifelse(y=="El Metn", "Matn", y)) # make an exception for Tyre (in google) and El Metn (in facebook)
  })
  
  out[out!=min(out)] <- NA
  out
})

approximate_distance <- apply(approximate_distance, MAR = 2, function(x){
  x[x!=min(x)] <- NA
  x
})

conversion_table <- reshape2::melt(approximate_distance) %>% filter(value >= 0)
colnames(conversion_table) <- c("facebook_name", "google_name", "Distance")
conversion_table <- conversion_table %>% filter(google_name!="", facebook_name!="")

DT::datatable(conversion_table)
google_to_fc <- setNames(conversion_table$facebook_name, conversion_table$google_name)

leb_google$polygon_name <- google_to_fc[leb_google$sub_region_2]
leb_google$ds <- lubridate::ymd(leb_google$date)
facebook_google_joined <- full_join(leb_google, leb_fb, by = c("polygon_name", "ds"))
forpairs_fb_google <- facebook_google_joined[,c(colnames(leb_google)[cols_for_pivot], "all_day_bing_tiles_visited_relative_change", "all_day_ratio_single_tile_users")]

colnames(forpairs_fb_google) <- gsub("_percent_change_from_baseline", "\n(google)", colnames(forpairs_fb_google))
colnames(forpairs_fb_google) <- gsub("all_day", "facebook\nall_day", colnames(forpairs_fb_google))
pairs(forpairs_fb_google, pch = 19, cex=0.5, upper.panel = panel.cor)

The Facebook and Google mobility data seem to largely agree, especially when looking at the correlation between Google’s “residential” category and the two Facebook metrics (absolute Pearson correlation of about 0.9). Here’s a zoom in on one pair of parameters, this time split by district:

facebook_google_joined %>% ggplot() +
  geom_point(aes(x=all_day_bing_tiles_visited_relative_change, y = workplaces_percent_change_from_baseline)) +
  facet_wrap(~polygon_name) +
  xlab("Facebook relative change from baseline") +
  ylab("Google relative change from baseline (percent)") +
  theme_cowplot()

Finally, let’s overlay the Google data with the lockdown measures. The dashed vertical line shows the moment the selective lockdowns came into force at the village/town level (Oct 4 2020).

# add a status column to the fbdata
leb_google_long$status <- "None"
leb_google_long$date <- lubridate::ymd(leb_google_long$date)
leb_google_long$ds <- lubridate::ymd(leb_google_long$date)
leb_google_long$polygon_name <- conversion_table$facebook_name[match(leb_google_long$sub_region_2, conversion_table$google_name)]
for(i in 1:nrow(event_df)){
  curevent <- event_df[i,]
  wc <- which( (leb_google_long$ds <= curevent$end) & (leb_google_long$ds >= curevent$start) & (leb_google_long$polygon_name == curevent$caza))
  for(j in wc){
    if(leb_google_long$status[j] == "None"){
      leb_google_long$status[j] <- curevent$event_type
    }else{
      leb_google_long$status[j] <- paste(sort(unique(c(strsplit(leb_google_long$status[j], split = "&", fixed = T)[[1]],curevent$event_type))), collapse = "&")
    }
    
  }
  
}

statusTypes <- c("None", setdiff(unique(leb_google_long$status), "None"))
leb_google_long$status <- factor(leb_google_long$status, levels = statusTypes, ordered = T)


# just to add a secondary axis, calculate a transformation ratio on a per-place_category basis
place_cats <- unique(leb_google_long$place_category)
pl <- lapply(place_cats, function(pc){
  
leb_google_long_subs <- leb_google_long %>% filter(place_category == pc)
transform_ratio <- max(leb_covid$daily, na.rm = T)/max(abs(leb_google_long_subs$percent_change_from_baseline), na.rm = T)

ge <- ggplot(leb_google_long_subs) +
  geom_col(data = leb_covid %>% filter(casetype == "confirmed") %>% mutate(date = Date), aes(x = date, y = daily), position = "identity", fill = "red", alpha = 0.5) +
  # geom_smooth() +
  geom_line(aes(x = date, y = percent_change_from_baseline*transform_ratio, group = sub_region_2, color = status)) +
  # scale_color_identity() +
  theme_cowplot() +
  theme(legend.position = "bottom") +
  geom_vline(xintercept = lubridate::ymd("2020-08-04"), color = "red") +
  # geom_rect(data = event_df, aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = event_type), x= 0 ,y=0, alpha = 0.1, color = "grey") +
  geom_hline(yintercept = 0) +
  facet_wrap(~place_category, scales = "free_y", ncol = 1) +
  scale_y_continuous(
    "New Covid-19 cases", 
    sec.axis = sec_axis(~ . / transform_ratio, name = "Google % change\nfrom baseline")
  )

ge +
    geom_vline(xintercept = lubridate::ymd("2020-10-04"), lty = 2, color = "black") +
  geom_line(data = stringency, aes(x= Date, y = StringencyIndex*10), color = "green")
})

# plot_grid(plotlist = pl, ncol = 1)


plot_grid(get_legend(pl[[1]] + guides(col = guide_legend(nrow = 2)) ), plot_grid(plotlist = lapply(pl, function(x){x+theme(legend.position = "none")}), ncol = 1), nrow = 2, rel_heights = c(1,7))

Let’s look at the breakdown by caza now: do we see the same rankings as in the Facebook data?

ranks <- leb_google_long %>% group_by(place_category,ds) %>% 
  mutate(percent_change_from_baseline_rank = rank(percent_change_from_baseline)) %>% 
  group_by(polygon_name, place_category) %>% 
  summarize(percent_change_from_baseline_meanrank = mean(percent_change_from_baseline_rank),
            percent_change_from_baseline_sdrank = sd(percent_change_from_baseline_rank))

plctypes <- unique(ranks$place_category)
pllist <- lapply(plctypes, function(pt){
  ranks %>% filter(place_category == pt) %>% ggplot(aes(x = reorder(polygon_name,percent_change_from_baseline_meanrank) , y = percent_change_from_baseline_meanrank)) + 
  geom_col() +
  geom_errorbar(aes(ymin = percent_change_from_baseline_meanrank-percent_change_from_baseline_sdrank,
                    ymax = percent_change_from_baseline_meanrank+percent_change_from_baseline_sdrank)) +
  coord_flip() +
  facet_wrap(~place_category, scales = "free")+
  ylab("Average rank of\npercent movement change\nrelative to baseline") +
  xlab("Caza") +
  theme_light()
})
plot_grid(plotlist = pllist, ncol = 2)

It’s a bit complicated to make sense of all this, especially since this average rank may not be the best way to rank the cazas. I should look into this data more closely to interpret it correctly.

2.8 Lebanese municipal-level risk zones

Once again, getting data from an official source is a hassle. This time I am referring to the municipal-level dashboard put together by the ministry of public health, the Ministry of Interior and Municipalities, the CNRS, and the Covid-19 Preventive Measures Committee. There is no way to obtain the data directly, but with some reverse-engineering of their API, I could get the current data (not the time series). I also downloaded the list of villages that were locked down on October 4 2020. Credit goes to Ahmad Barclay (see this link), who put in the effort of organizing the names of the municipalities. Guess why: the Lebanese government still releases information as photos/scans/PDFs. Bad habits… Again, we have the same problem of town/place names: everything needs to be matched every time. I have also included the October 12 villages (and the October 13 modification).

suppressPackageStartupMessages(library(geojsonio))
suppressPackageStartupMessages(library(sf))
apicall <- "https://gisleb.cnrs.edu.lb/server/rest/services/ColorCoded/Municipalities/FeatureServer/0/query?where=no_cases+%3E%3D+0&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Meter&relationParam=&outFields=*&returnGeometry=true&maxAllowableOffset=&geometryPrecision=&outSR=&having=&gdbVersion=&historicMoment=&returnDistinctValues=false&returnIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&multipatchOption=xyFootprint&resultOffset=&resultRecordCount=&returnTrueCurves=false&returnExceededLimitFeatures=false&quantizationParameters=&returnCentroid=false&sqlFormat=none&resultType=&featureEncoding=esriDefault&f=geojson"


# to get all - but this is limited to 1000 fields
apicall_municipalities_all <- "https://gisleb.cnrs.edu.lb/server/rest/services/ColorCoded/Municipalities/FeatureServer/1/query?where=daily_deat++%3E%3D+0&objectIds=&time=&geometry=&geometryType=esriGeometryPolygon&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Meter&relationParam=&outFields=*&returnGeometry=true&maxAllowableOffset=&geometryPrecision=&outSR=&having=&gdbVersion=&historicMoment=&returnDistinctValues=false&returnIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&multipatchOption=xyFootprint&resultOffset=&resultRecordCount=&returnTrueCurves=false&returnExceededLimitFeatures=false&quantizationParameters=&returnCentroid=false&sqlFormat=none&resultType=&featureEncoding=esriDefault&f=geojson"

library(httr)

# download.file(apicall, "../data/cnrs_gis/cnrs_gis_covid_caza.geojson")
# download.file(apicall_municipalities, "data/cnrs_gis_covid_mun.geojson")
# Works

# result <- httr::GET(
#   url    = apicall,
#   config = httr::config(ssl_verifypeer = FALSE),
#   write_disk("../data/cnrs_gis/cnrs_gis_covid_caza.geojson", overwrite=TRUE)
# )
# 
# caza <- sf::st_read("../data/cnrs_gis/cnrs_gis_covid_caza.geojson", quiet = T)
# 
# mohafaza <- unique(caza$moh_new)

# conditions_batches=c("batch1" = "objectid+<+1000", "batch2" = "objectid+>+999")
# 
# mun_files <- sapply(1:length(conditions_batches), function(x){
#   apicall_municipalities_by_moh <- paste0("https://gisleb.cnrs.edu.lb/server/rest/services/ColorCoded/Municipalities/FeatureServer/1/query?where=",
#                                           conditions_batches[x],
#                                           "&objectIds=&time=&geometry=&geometryType=esriGeometryPolygon&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Meter&relationParam=&outFields=*&returnGeometry=true&maxAllowableOffset=&geometryPrecision=&outSR=&having=&gdbVersion=&historicMoment=&returnDistinctValues=false&returnIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&multipatchOption=xyFootprint&resultOffset=&resultRecordCount=&returnTrueCurves=false&returnExceededLimitFeatures=false&quantizationParameters=&returnCentroid=false&sqlFormat=none&resultType=&featureEncoding=esriDefault&f=geojson")
#   outfile <- paste0("../data/cnrs_gis/cnrs_gis_covid_mun_",names(conditions_batches)[x],"_", lubridate::today(),".geojson")
#   result <- httr::GET(
#   url    = apicall_municipalities_by_moh,
#   config = httr::config(ssl_verifypeer = FALSE),
#   write_disk(outfile, overwrite=TRUE)
# )
#   outfile
# })
mun_files<- c("../data/cnrs_gis/cnrs_gis_covid_mun_batch1_2021-01-04.geojson","../data/cnrs_gis/cnrs_gis_covid_mun_batch2_2021-01-04.geojson")
mun <- lapply(mun_files, sf::st_read, quiet = T)

mun <- do.call(rbind, mun)

write.table(mun %>% as_tibble() %>% select(-geometry), sep = "\t", file = "../data/cnrs_gis/municipalities_table.tsv", row.names = F, col.names = T, quote = FALSE)


# add the closed municipalities list - October 4
mun_lockdown <- read.table("../data/lockdown_mun_20201004.tsv", sep = "\t", header = T)
# first convert the districts
mun_lockdown$DISTRICT_conversion <- sapply(mun_lockdown$DISTRICT, function(tv){
  mun$kada_na[which.min(adist(tv, mun$kada_na))]
})
# then match within districts
mun_lockdown <- lapply(split(mun_lockdown, mun_lockdown$DISTRICT_conversion), function(ds){
  out <- ds
  out$TOWN.VILLAGE_conversion <- sapply(out$TOWN.VILLAGE, function(tv){
    mun_sub <- mun %>% filter(kada_na == ds$DISTRICT_conversion[1])
    mun_sub$adm3_name[which.min(adist(tv, mun_sub$adm3_name))[1]]
})
  out
})
mun_lockdown <- do.call(rbind, mun_lockdown)

mun$lockdown_october4 <- paste0(mun$adm3_name,mun$kada_na) %in% paste0(mun_lockdown$TOWN.VILLAGE_conversion, mun_lockdown$DISTRICT_conversion)

# now for the second round of lockdowns (october 12-13): I manually prepared a file based on the ministry of interior's official scanned document
mun_lockdown_october13 <- read.table("../data/lockdown_mun_20201013.tsv", sep = "\t", header = T)
mun_lockdown_october13 <- mun_lockdown_october13 %>% filter(!is.na(lockdown_13_oct_index)) %>% filter(lockdown_13_oct_index != "")# the index column is the index as in the official document.
mun$lockdown_october13 <- paste0(mun$adm3_name,mun$kada_na) %in% paste0(mun_lockdown_october13$adm3_name, mun_lockdown_october13$kada_na)

mun_lockdown_october19 <- read.table("../data/lockdown_mun_20201019.tsv", sep = "\t", header = T)
mun_lockdown_october19 <- mun_lockdown_october19 %>% filter(!is.na(lockdown_19_oct_index)) %>% filter(lockdown_19_oct_index != "")# the index column is the index as in the official document.

mun$lockdown_october19 <- paste0(mun$adm3_name,mun$kada_na) %in% paste0(mun_lockdown_october19$adm3_name, mun_lockdown_october19$kada_na)



# ggplot(mun) + geom_sf(aes(color = lockdown_october4, fill = rate) ) +
#   theme_void() +
#   scale_fill_gradient2(low = 'green', mid = 'yellow', high = 'red',midpoint = 8, limits = c(0, 100))
rl <- mun %>% as_tibble %>% select(risk_lev, risk_level) %>% unique() %>% arrange(risk_lev)

rl <- setNames(c(0,10,30,70,100),  c("Low/No data", "Very low", "Low", "Moderate", "High"))
mun$risk_level_clean <- names(rl)[match(mun$risk_lev, rl)]
mun$risk_level_clean <- factor(mun$risk_level_clean, levels = names(rl), ordered = T)
ggplot(mun) + geom_sf(aes(color = lockdown_october4, fill = risk_level_clean, size = lockdown_october4) ) +
  theme_void() +
  scale_color_manual(values = c("FALSE" = "#AAAAAA", "TRUE" = "#FF0000"), guide = FALSE)+
  scale_size_manual(values = c("FALSE" = 0.25, "TRUE" = 0.5), guide = FALSE) +
  scale_fill_viridis_d(name = "Risk Level", option = "cividis") +
  ggtitle(paste0("Risk levels in all Lebanese Municipalities on ", lubridate::today(),"\nMunicipalities in quarantine from Oct 4-12 are highlighted in red")) +
  labs(caption = paste("Risk data retrieved on ",lubridate::today(), " from gisleb.cnrs.edu.lb")) +
  theme(plot.title = element_text(size = 15))

It is not clear when the data was last updated, since there are no timestamps in the file, but it seems that, at least in the data I could get, the lockdown measures do not reflect the severity of the situation.

Now let’s examine the second round of lockdowns (12 October) affecting 169 villages, of which 68 were already quarantined.

plt <- ggplot(mun) + geom_sf(aes(color = lockdown_october13, fill = risk_level_clean, size = lockdown_october13, adm3_name = adm3_name ) ) +
  theme_void() +
  scale_color_manual(values = c("FALSE" = "#AAAAAA", "TRUE" = "#FF0000"), guide = FALSE)+
  scale_size_manual(values = c("FALSE" = 0.25, "TRUE" = 0.5), guide = FALSE) +
  scale_fill_viridis_d(name = "Risk Level", option = "cividis") +
  ggtitle(paste0("Risk levels in all Lebanese Municipalities on ", lubridate::today(),"\nMunicipalities in quarantine from Oct 12-19 are highlighted in red (with Oct 13 adjustment)"), ) +
  labs(caption = paste("Risk data retrieved on ",lubridate::today(), " from gisleb.cnrs.edu.lb")) +
  theme(plot.title = element_text(size = 15))
plt

Now let’s examine the third round of lockdowns (19 October) affecting 79 villages, of which 50 were already quarantined.

plt <- ggplot(mun) + geom_sf(aes(color = lockdown_october19, fill = risk_level_clean, size = lockdown_october19, adm3_name = adm3_name ) ) +
  theme_void() +
  scale_color_manual(values = c("FALSE" = "#AAAAAA", "TRUE" = "#FF0000"), guide = FALSE)+
  scale_size_manual(values = c("FALSE" = 0.25, "TRUE" = 0.5), guide = FALSE) +
  scale_fill_viridis_d(name = "Risk Level", option = "cividis") +
  ggtitle(paste0("Risk levels in all Lebanese Municipalities on ", lubridate::today(),"\nMunicipalities in quarantine from Oct 19-26 are highlighted in red"), ) +
  labs(caption = paste("Risk data retrieved on ",lubridate::today(), " from gisleb.cnrs.edu.lb")) +
  theme(plot.title = element_text(size = 15))
plt

2.8.1 Analysis of the October 4 lockdown

I’m going to check whether the lockdown in these villages had an effect on overall mobility as measured in the Google and Facebook data. To do so, I will estimate the proportion of residents affected by the measure in each caza, and see whether there is a proportionate change in activity after the lockdown.

nmresi_by_kada <- mun %>% as_tibble %>% group_by(kada_na) %>% summarise(nm_resi = sum(nm_resi, na.rm = T)) %>% arrange(-nm_resi)
mun_stats_kada <- mun %>% as_tibble %>% group_by(kada_na, lockdown_october4) %>% summarise(nm_resi = sum(nm_resi, na.rm = T), prop_resi = sum(nm_resi, na.rm = T)/nmresi_by_kada$nm_resi[nmresi_by_kada$kada_na == kada_na[1]])

mun_stats_kada$kada_na <- factor(mun_stats_kada$kada_na, levels = as.character(nmresi_by_kada$kada_na), ordered = T)
mun_stats_kada_summaryl2 <- mun_stats_kada %>% filter(lockdown_october4)
mun_stats_kada %>% ggplot() + 
  geom_col(position = 'stack', aes(x = kada_na, y = nm_resi, fill = lockdown_october4)) +
 coord_flip() +
  theme_cowplot() +
  ylab("Number of residents") +
  xlab("Caza")

Based on these statistics, which smell terrible to be honest (Hermel has 540 citizens, really?), 13.9308854% of the population is effectively under lockdown. Now let’s see if there is a decrease in activity proportional to the lockdown rate. I will compare the working week (Monday to Friday) before the 4th of October with the one after. Once again, I need to convert names!

before <- lubridate::ymd("2020-09-28"):lubridate::ymd("2020-10-02") # weekdays before the Oct 4 lockdown
after <- lubridate::ymd("2020-10-05"):lubridate::ymd("2020-10-09")  # weekdays after the Oct 4 lockdown

# Fuzzy-match each Caza name to the closest Facebook polygon name (edit distance)
mun_stats_kada_summaryl2$polygon_name <- sapply(mun_stats_kada_summaryl2$kada_na, function(x){
  conversion_table$facebook_name[which.min(adist(x,conversion_table$facebook_name))]
})

leb_fb_comparison <- leb_fb %>% filter(ds %in% c(before,after)) %>% left_join(mun_stats_kada_summaryl2) %>% mutate(before = ds %in% before) %>% 
  group_by(polygon_name, prop_resi, kada_na) %>%
  summarize(all_day_ratio_single_tile_users_diff = mean(all_day_ratio_single_tile_users[!before])-mean(all_day_ratio_single_tile_users[before]),
            all_day_bing_tiles_visited_relative_change_diff = mean(all_day_bing_tiles_visited_relative_change[!before])-mean(all_day_bing_tiles_visited_relative_change[before]))
leb_fb_comparison$prop_resi[is.na(leb_fb_comparison$prop_resi)] <- 0

nmresi_by_kada$polygon_name <- sapply(nmresi_by_kada$kada_na, function(x){
  conversion_table$facebook_name[which.min(adist(x,conversion_table$facebook_name))]
})


leb_fb_comparison$nm_resi <- nmresi_by_kada$nm_resi[match(leb_fb_comparison$polygon_name, nmresi_by_kada$polygon_name)]

ggplot(leb_fb_comparison,aes(x = prop_resi, y = all_day_ratio_single_tile_users_diff)) +
  geom_hline(yintercept = 0, color = "red") +
  geom_smooth(method = "lm") +
  geom_label(aes(label = polygon_name)) +
  theme_cowplot() +
  xlab("Proportion of Caza residents quarantined") +
  ylab("Difference in proportion\nof people staying put (Facebook)")

Here, the red line indicates no change. There seems to be a positive relationship between the two, suggesting that the lockdown measures reduce mobility roughly in proportion to the share of residents affected. That’s a good sign. The next figure is the same as the one above, but with the population mapped to the size of the points.

ggplot(leb_fb_comparison,aes(x = prop_resi, y = all_day_ratio_single_tile_users_diff)) +
  geom_hline(yintercept = 0, color = "red") +
  geom_smooth(method = "lm") +
  geom_point(aes(size = nm_resi), pch = 21) +
  theme_cowplot() +
  xlab("Proportion of Caza residents quarantined") +
  ylab("Difference in proportion\nof people staying put (Facebook)") +
  scale_size_continuous(range = c(1,10), name = "Number of residents") +
  theme(legend.position = "bottom")

Let’s now validate with the other Facebook metric, which is measured relative to baseline activity in February. As expected, the relationship goes in the opposite direction (it is negative).

ggplot(leb_fb_comparison,aes(x = prop_resi, y = all_day_bing_tiles_visited_relative_change_diff)) +
  geom_hline(yintercept = 0, color = "red") +
  geom_smooth(method = "lm") +
  geom_point() +
  geom_label(aes(label = polygon_name)) +
  theme_cowplot() +
  xlab("Proportion of Caza residents quarantined") +
  ylab("Difference in relative\nchange wrt baseline (Facebook)")

It seems that the Facebook data, which is at the Caza level, is sensitive enough to pick up differences in mobility due to the differential quarantine (municipality-level) policy.

Let’s fit a simple linear regression to see whether the relationship is statistically significant.

model <- lm(data = leb_fb_comparison, all_day_ratio_single_tile_users_diff ~  prop_resi)
print(summary(model))
## 
## Call:
## lm(formula = all_day_ratio_single_tile_users_diff ~ prop_resi, 
##     data = leb_fb_comparison)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0146170 -0.0049857 -0.0000085  0.0047612  0.0137337 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.008701   0.001852   4.698 8.96e-05 ***
## prop_resi   0.024445   0.009315   2.624   0.0149 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007047 on 24 degrees of freedom
## Multiple R-squared:  0.223,  Adjusted R-squared:  0.1906 
## F-statistic: 6.886 on 1 and 24 DF,  p-value: 0.01487

The p-value of the association between the proportion of quarantined residents and the proportion of people staying put is about 0.0149, and each one-unit change in the proportion of quarantined residents corresponds to a 0.0244-unit change in the ratio of people staying put. If we take 0.05 as the threshold, the relationship is statistically significant, but is the effect large enough from an epidemiological perspective?
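For transparency, the two numbers quoted above can be pulled directly out of the fitted model object; a quick sketch (not the original inline code) using base R:

coefs <- coef(summary(model))
coefs["prop_resi", "Estimate"]   # slope of prop_resi (~0.0244)
coefs["prop_resi", "Pr(>|t|)"]   # p-value of the association (~0.0149)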

To put things into context, let’s compare the magnitude of the changes due to the partial lockdown with those of the March full lockdown. For March, I compare the weekdays of the first week of the month with 26 March, a weekday well into the general lockdown (see the dates in the code below).

before_march <- lubridate::ymd("2020-03-02"):lubridate::ymd("2020-03-06") # first week of March (weekdays)
after_march <- lubridate::ymd("2020-03-26"):lubridate::ymd("2020-03-26") # 26 March only (a weekday during the full lockdown)

leb_fb_comparison_march <- leb_fb %>% filter(ds %in% c(before_march,after_march)) %>% left_join(mun_stats_kada_summaryl2) %>% mutate(before = ds %in% before_march) %>% 
  group_by(polygon_name, prop_resi, kada_na) %>%
  summarize(all_day_ratio_single_tile_users_diff_march = mean(all_day_ratio_single_tile_users[!before])-mean(all_day_ratio_single_tile_users[before]),
            all_day_bing_tiles_visited_relative_change_diff_march = mean(all_day_bing_tiles_visited_relative_change[!before])-mean(all_day_bing_tiles_visited_relative_change[before]))


leb_fb_comparison_march <- left_join(leb_fb_comparison_march, leb_fb_comparison)


leb_fb_comparison_march %>%
  ggplot(aes(x = all_day_ratio_single_tile_users_diff_march, y=all_day_ratio_single_tile_users_diff, label = polygon_name)) +
  geom_segment(aes(xend = all_day_ratio_single_tile_users_diff_march, yend = all_day_ratio_single_tile_users_diff_march), color = "lightgrey", lty = 2) +
  geom_text_repel(angle = 90, aes(y= (all_day_ratio_single_tile_users_diff_march-all_day_ratio_single_tile_users_diff)/2,
                                  label = paste0("", round(all_day_ratio_single_tile_users_diff_march-all_day_ratio_single_tile_users_diff, 3))),
                  color = "grey") +
  geom_point(color = "red") +
  geom_text_repel() +
  coord_fixed(ratio = 1) +
  xlim(c(0,max(leb_fb_comparison_march$all_day_ratio_single_tile_users_diff_march, leb_fb_comparison_march$all_day_ratio_single_tile_users_diff, na.rm = T))) +
  ylim(c(0,max(leb_fb_comparison_march$all_day_ratio_single_tile_users_diff_march, leb_fb_comparison_march$all_day_ratio_single_tile_users_diff, na.rm = T))) +
  geom_abline(slope = 1, intercept = 0) +
  theme_cowplot() +
  xlab("Response to March general lockdown") +
  ylab("Response to October selective lockdown") +
  labs(caption = "Values are the differences in Facebook 'ratio of users staying put'\nbetween two weeks of March and two weeks of October 2020.")

So it is clear from the mobility data that the October quarantine measures have a much smaller footprint on mobility. For example, in El Metn, the share of people staying put increased by about 25.2 percentage points between the first week of March and 26 March, compared with only about 2.0 percentage points after the October measure. In both cases, I am comparing weekdays before the measure with weekdays after it was taken (complete or selective lockdown).
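The El Metn figures quoted above can be checked directly against the merged table; here is a small sketch (assuming the polygon is spelled "El Metn" in the Facebook data, as in the plot labels):

leb_fb_comparison_march %>%
  ungroup() %>%
  filter(polygon_name == "El Metn") %>%
  select(polygon_name,
         all_day_ratio_single_tile_users_diff_march,
         all_day_ratio_single_tile_users_diff)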

2.8.2 Analysis of the October 12 lockdown

First, a plot per Caza:

p0 <-  ggplot(get_by_date("2020-04-06")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("April 6 2020 - Generalized lockdown") + xlab("") + ylab("") +
  theme(legend.position = "bottom")
p1 <- ggplot(get_by_date("2020-09-28")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("September 28 2020 - No lockdown") + xlab("") + ylab("") +
  theme(legend.position = "bottom")

p2 <- ggplot(get_by_date("2020-10-05")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("October 5 2020 - First localized lockdown") + xlab("") + ylab("") +
  theme(legend.position = "bottom")

p3 <- ggplot(get_by_date("2020-10-12")) + geom_sf(aes(fill = all_day_ratio_single_tile_users), color = "white", size = 0.2) + 
  # geom_text_repel(aes(x = X, y = Y, label = polygon_name), size = 2.5) +
  scale_fill_viridis_c(limits = rn, name = "Ratio of people staying put\nwithin a single location", option = "inferno") +
  theme_void() +
  ggtitle("October 12 2020 - Second localized lockdown") + xlab("") + ylab("") +
  theme(legend.position = "bottom")

plot_grid(p0,p1,p2,p3, nrow = 2)

Let’s compare the first week of October (actually starting September 28) to the week of October 12, when the second lockdown was announced, and look at the relationship between the proportion of the locked-down population and the change in movement.

before <- lubridate::ymd("2020-09-28"):lubridate::ymd("2020-10-02") # weekdays before the lockdowns
after <- lubridate::ymd("2020-10-12"):lubridate::ymd("2020-10-16")  # weekdays after the Oct 12 lockdown

mun_stats_kada_summaryl2$polygon_name <- sapply(mun_stats_kada_summaryl2$kada_na, function(x){
  conversion_table$facebook_name[which.min(adist(x,conversion_table$facebook_name))]
})

leb_fb_comparison <- leb_fb %>% filter(ds %in% c(before,after)) %>% left_join(mun_stats_kada_summaryl2) %>% mutate(before = ds %in% before) %>% 
  group_by(polygon_name, prop_resi, kada_na) %>%
  summarize(all_day_ratio_single_tile_users_diff = mean(all_day_ratio_single_tile_users[!before])-mean(all_day_ratio_single_tile_users[before]),
            all_day_bing_tiles_visited_relative_change_diff = mean(all_day_bing_tiles_visited_relative_change[!before])-mean(all_day_bing_tiles_visited_relative_change[before]))
leb_fb_comparison$prop_resi[is.na(leb_fb_comparison$prop_resi)] <- 0

nmresi_by_kada$polygon_name <- sapply(nmresi_by_kada$kada_na, function(x){
  conversion_table$facebook_name[which.min(adist(x,conversion_table$facebook_name))]
})

leb_fb_comparison$nm_resi <- nmresi_by_kada$nm_resi[match(leb_fb_comparison$polygon_name, nmresi_by_kada$polygon_name)]

ggplot(leb_fb_comparison,aes(x = prop_resi, y = all_day_ratio_single_tile_users_diff)) +
  geom_hline(yintercept = 0, color = "red") +
  geom_smooth(method = "lm") +
  geom_label(aes(label = polygon_name)) +
  theme_cowplot() +
  xlab("Proportion of Caza residents quarantined") +
  ylab("Difference in proportion\nof people staying put (Facebook)")

3 Conclusions, future work, data wishlist

The first naive observations I can make are the following (pending more formal analyses):

  • The earliest lockdown triggered the strongest response as measured by the Facebook movement data, and this response started wearing off gradually.
  • By the time the airport reopened, movement was almost back to baseline. The combination of the two likely contributed to the recent increase in cases.
  • At the time of the Beirut Explosion, cases were already increasing (at least visually).
  • Facebook and Google mobility data agree to a large extent. It is not clear whether one is more useful than the other or whether a combination of the two is more informative than using either source.
  • There are clear differences in how different cazas or regions react to the Covid situation.
  • It seems that reduction in mobility is proportional to the proportion of people quarantined within a Caza. It is unclear if this reduction is significant enough to affect Covid-19 spread yet.
  • The reduction in mobility after the selective lockdown measures in October is roughly ten times smaller than after the general lockdown in March. This raises concerns about both the effectiveness of the measures AND how seriously people respect them.
  • There are three versions of the district/Caza names in three different sources (Google, Facebook, the Lebanese OCHA map), and likely many more variants circulating at all administrative levels. If not already available, we should look into standardizing these names and/or building a lookup database and utilities that facilitate conversions (see the sketch after this list).
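As a rough illustration of such a conversion utility, here is a sketch of a helper built on the same adist() fuzzy matching used throughout this notebook (the function name and the commented example are illustrative, not part of an existing package):

# Map each input name to its closest match in a reference vector (by edit distance)
standardize_name <- function(x, reference_names) {
  reference_names[apply(adist(x, reference_names, ignore.case = TRUE), 1, which.min)]
}
# Example: map the OCHA-style Caza names onto the Facebook polygon spellings
# standardize_name(as.character(nmresi_by_kada$kada_na), conversion_table$facebook_name)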

Future work:

  • Application of established epidemiological models.
  • Some more formal statistics based on well-defined questions.

Data wishlist:

  • Finer-grained Facebook/Google data for a more in-depth assessment of the explosion.
  • I couldn’t (easily) find subnational Covid-19 data for Lebanon - it would be nice to have and add.
  • The government (which, whether intentionally or through incompetence, has not done so) needs to make data more FAIR: Findable, Accessible, Interoperable, and Reusable (see the FAIR Guiding Principles).
  • We need to define events with better granularity.

4 Export data for the interactive app

The data prepared in this notebook will be used in the Shiny app. Click here to go to the interactive app.

# Lebanon map (admin2)
saveRDS(rmapshaper::ms_simplify(shp, keep_shapes = TRUE, keep = 0.1), "../leb_covid_movement/data/admin2_map.RDS")

# municipality map
saveRDS(rmapshaper::ms_simplify(mun,keep_shapes = TRUE, keep = 0.01), file = "../leb_covid_movement/data/mun_map.RDS")

# events table
saveRDS(event_df, file = "../leb_covid_movement/data/event_df_caza.RDS")

# Facebook data:
saveRDS(leb_fb, file = "../leb_covid_movement/data/leb_fb.RDS")

# Google data:
saveRDS(leb_google_long, file = "../leb_covid_movement/data/leb_google_long.RDS")

# stringency data:
saveRDS(stringency, file = "../leb_covid_movement/data/stringency.RDS")

# covid data
saveRDS(leb_covid, file = "../leb_covid_movement/data/leb_covid.RDS")

# conversion table
saveRDS(conversion_table, file = "../leb_covid_movement/data/conversion_table.RDS")
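On the app side, these objects would simply be read back with readRDS(); a minimal sketch (object names are illustrative, and the paths assume the app reads from its own data/ directory, mirroring the saveRDS() calls above):

# e.g. somewhere in the app's global.R
mun_map          <- readRDS("data/mun_map.RDS")
leb_fb           <- readRDS("data/leb_fb.RDS")
conversion_table <- readRDS("data/conversion_table.RDS")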

5 R Session Information

sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] httr_1.4.2      geojsonio_0.9.4 ggrepel_0.9.1   lubridate_1.7.8
##  [5] sf_0.9-8        cowplot_1.1.1   forcats_0.5.0   stringr_1.4.0  
##  [9] dplyr_0.8.5     purrr_0.3.4     readr_1.4.0     tidyr_1.0.3    
## [13] tibble_3.1.1    ggplot2_3.3.3   tidyverse_1.3.0 rhdx_0.1.0.9000
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-147       fs_1.5.0           tools_4.0.0        backports_1.1.7   
##  [5] utf8_1.2.1         R6_2.5.0           DT_0.13            KernSmooth_2.23-16
##  [9] mgcv_1.8-31        lazyeval_0.2.2     rgeos_0.5-5        DBI_1.1.1         
## [13] colorspace_2.0-1   sp_1.4-5           withr_2.4.2        tidyselect_1.1.0  
## [17] gridExtra_2.3      curl_4.3.1         compiler_4.0.0     cli_2.5.0         
## [21] rvest_0.3.5        geojsonsf_2.0.1    xml2_1.3.2         labeling_0.4.2    
## [25] bookdown_0.19      triebeard_0.3.0    rmapshaper_0.4.4   scales_1.1.1      
## [29] classInt_0.4-3     proxy_0.4-25       rappdirs_0.3.3     digest_0.6.27     
## [33] foreign_0.8-78     rmarkdown_2.1      base64enc_0.1-3    pkgconfig_2.0.3   
## [37] htmltools_0.5.1.1  dbplyr_1.4.3       fastmap_1.1.0      jsonvalidate_1.1.0
## [41] htmlwidgets_1.5.3  rlang_0.4.11       readxl_1.3.1       rstudioapi_0.13   
## [45] httpcode_0.3.0     generics_0.1.0     farver_2.1.0       jsonlite_1.7.2    
## [49] crosstalk_1.1.1    magrittr_2.0.1     Matrix_1.2-18      Rcpp_1.0.6        
## [53] munsell_0.5.0      fansi_0.4.2        abind_1.4-5        viridis_0.6.1     
## [57] lifecycle_1.0.0    stringi_1.6.1      yaml_2.2.1         jqr_1.2.1         
## [61] plyr_1.8.6         maptools_1.1-1     grid_4.0.0         parallel_4.0.0    
## [65] crayon_1.4.1       lattice_0.20-41    splines_4.0.0      stars_0.5-2       
## [69] haven_2.2.0        geojson_0.3.4      hms_1.0.0          knitr_1.28        
## [73] pillar_1.6.0       geojsonlint_0.4.0  reshape2_1.4.4     crul_1.1.0        
## [77] reprex_0.3.0       glue_1.4.2         evaluate_0.14      V8_3.4.2          
## [81] hoardr_0.5.2       data.table_1.12.8  modelr_0.1.7       vctrs_0.3.8       
## [85] urltools_1.7.3     cellranger_1.1.0   gtable_0.3.0       assertthat_0.2.1  
## [89] cachem_1.0.4       xfun_0.22          lwgeom_0.2-6       broom_0.5.6       
## [93] e1071_1.7-6        class_7.3-16       viridisLite_0.4.0  memoise_2.0.0     
## [97] units_0.7-1        ellipsis_0.3.2