In the absence of vaccines and antiviral medication, non-pharmaceutical interventions (NPIs) implemented in response to (emerging) epidemic respiratory viruses are the only option available to delay and moderate the spread of the virus in a population.
Most governments have implemented bundles of highly restrictive NPIs. Decisions had to be taken under rapidly changing epidemiological situations and, at least at the beginning of the epidemic, without evidence on the individual and combined effectiveness of these measures, the degree of compliance of the population, or their societal impact. These NPIs may cause substantial economic and social costs.
A comprehensive study of NPIs, conducted before vaccines became widely available, identified the NPIs with the largest impacts on the spread of Covid-19 [1].
With vaccines available, a wide vaccination roll-out and the control of mobility should be the priority strategies for reducing the spread of the virus. Their effect can be seen in the trend in new cases, which we use here instead of R0 or Rt to keep things simple.
The easy and timely availability of granular local mobility data (by state and district in the case of Malaysia) is indispensable in any study of the effectiveness of a movement control order (MCO). Malaysia has a myriad and often confusing set of MCOs, and without granular data it is difficult to assess their impact.
Unfortunately, the various applications in Malaysia that track mobility, like MySejahtera and Selangkah, do not share data with the public, even in summary form. Fortunately, Apple [2], Google [3], and Facebook [4] do provide such data daily.
This post shares some examples of how to analyze these mobility data together with Covid-19 data. Each of the 3 sources has its strengths and limitations. Facebook is the best in terms of granularity, down to state and district data. Google categorizes movement by its general purpose.
The tidycovid19 R package facilitates the direct download of various Covid-19-related data (including data on governmental measures) from authoritative sources. It also provides a flexible function to visualize the spread of the virus [5].
The package has a function download_merged_data() that downloads all data sources and creates a merged country-day panel sample. It includes the Apple and Google mobility data. We will use this to show some examples of mobility. We will look into the Facebook mobility data in later sections.
We download and save the data.
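The download-and-save step can be sketched as a small caching helper, so the data is fetched once and reused afterwards. This is a minimal sketch, not the post's original code: `load_or_download` and the file name `merged.rds` are hypothetical names, and the commented call assumes the tidycovid19 package is installed and a network connection is available.

```r
# Hypothetical helper: download once, then reuse the saved copy.
# `downloader` is any zero-argument function that returns a data frame.
load_or_download <- function(path, downloader, refresh = FALSE) {
  if (!refresh && file.exists(path)) {
    return(readRDS(path))   # reuse the cached copy
  }
  data <- downloader()      # fetch fresh data
  saveRDS(data, path)       # save it for the next session
  data
}

# Intended use (assumes tidycovid19 is installed; file name is an assumption):
# merged <- load_or_download(
#   "merged.rds",
#   function() tidycovid19::download_merged_data(cached = TRUE, silent = TRUE)
# )
```

Passing `refresh = TRUE` forces a new download, which is useful because the sources are updated daily.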
The data comes with two metadata sets that describe the data. The data.frame tidycovid19_data_sources provides short descriptions and links for each data source used by the tidycovid19 package. The data.frame tidycovid19_variable_definitions provides variable definitions for each variable included in the merged country-day data.frame provided by download_merged_data():
| var_name | var_def |
|---|---|
| iso3c | ISO3c country code as defined by ISO 3166-1 alpha-3 |
| country | Country name |
| date | Calendar date |
| confirmed | Confirmed Covid-19 cases as reported by JHU CSSE (accumulated) |
| deaths | Covid-19-related deaths as reported by JHU CSSE (accumulated) |
| recovered | Covid-19 recoveries as reported by JHU CSSE (accumulated) |
| ecdc_cases | Covid-19 cases as reported by ECDC (accumulated, weekly post 2020-12-14) |
| ecdc_deaths | Covid-19-related deaths as reported by ECDC (accumulated, weekly post 2020-12-14) |
| total_tests | Accumulated test counts as reported by Our World in Data |
| tests_units | Definition of what constitutes a ‘test’ |
| positive_rate | The share of COVID-19 tests that are positive, given as a rolling 7-day average |
| hosp_patients | Number of COVID-19 patients in hospital on a given day |
| icu_patients | Number of COVID-19 patients in intensive care units (ICUs) on a given day |
| total_vaccinations | Total number of COVID-19 vaccination doses administered |
| soc_dist | Number of social distancing measures reported up to date by ACAPS, net of lifted restrictions |
| mov_rest | Number of movement restrictions reported up to date by ACAPS, net of lifted restrictions |
| pub_health | Number of public health measures reported up to date by ACAPS, net of lifted restrictions |
| gov_soc_econ | Number of social and economic measures reported up to date by ACAPS, net of lifted restrictions |
| lockdown | Number of lockdown measures reported up to date by ACAPS, net of lifted restrictions |
| apple_mtr_driving | Apple Maps usage for driving directions, as percentage*100 relative to the baseline of Jan 13, 2020 |
| apple_mtr_walking | Apple Maps usage for walking directions, as percentage*100 relative to the baseline of Jan 13, 2020 |
| apple_mtr_transit | Apple Maps usage for public transit directions, as percentage*100 relative to the baseline of Jan 13, 2020 |
| gcmr_retail_recreation | Google Community Mobility Reports data for the frequency that people visit retail and recreation places expressed as a percentage*100 change relative to the baseline period Jan 3 - Feb 6, 2020 |
| gcmr_grocery_pharmacy | Google Community Mobility Reports data for the frequency that people visit grocery stores and pharmacies expressed as a percentage*100 change relative to the baseline period Jan 3 - Feb 6, 2020 |
| gcmr_parks | Google Community Mobility Reports data for the frequency that people visit parks expressed as a percentage*100 change relative to the baseline period Jan 3 - Feb 6, 2020 |
| gcmr_transit_stations | Google Community Mobility Reports data for the frequency that people visit transit stations expressed as a percentage*100 change relative to the baseline period Jan 3 - Feb 6, 2020 |
| gcmr_workplaces | Google Community Mobility Reports data for the frequency that people visit workplaces expressed as a percentage*100 change relative to the baseline period Jan 3 - Feb 6, 2020 |
| gcmr_residential | Google Community Mobility Reports data for the frequency that people visit residential places expressed as a percentage*100 change relative to the baseline period Jan 3 - Feb 6, 2020 |
| gtrends_score | Google search volume for the term ‘coronavirus’, relative across time with the country maximum scaled to 100 |
| gtrends_country_score | Country-level Google search volume for the term ‘coronavirus’ over a period starting Jan 1, 2020, relative across countries with the country having the highest search volume scaled to 100 (time-stable) |
| region | Country region as classified by the World Bank (time-stable) |
| income | Country income group as classified by the World Bank (time-stable) |
| population | Country population as reported by the World Bank (original identifier ‘SP.POP.TOTL’, time-stable) |
| land_area_skm | Country land mass in square kilometers as reported by the World Bank (original identifier ‘AG.LND.TOTL.K2’, time-stable) |
| pop_density | Country population density as reported by the World Bank (original identifier ‘EN.POP.DNST’, time-stable) |
| pop_largest_city | Population in the largest metropolitan area of the country as reported by the World Bank (original identifier ‘EN.URB.LCTY’, time-stable) |
| life_expectancy | Average life expectancy at birth of country citizens in years as reported by the World Bank (original identifier ‘SP.DYN.LE00.IN’, time-stable) |
| gdp_capita | Country gross domestic product per capita, measured in 2010 US-$ as reported by the World Bank (original identifier ‘NY.GDP.PCAP.KD’, time-stable) |
| timestamp | Date and time where data has been collected from authoritative sources |
Apart from Covid related data, the merged data.frame also has some data related to government NPIs (non-pharmaceutical interventions) and mobility data from Apple and Google.
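The Google mobility dataset is intended to help remediate the impact of COVID-19. It shows how visits to places, such as grocery stores and parks, are changing in each geographic region. The place categories, as listed in the variable table above, are retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential.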
These datasets show how visits and length of stay at different places change compared to a baseline. Changes for each day are compared to a baseline value for that day of the week. The baseline is the median value, for the corresponding day of the week, during the 5 weeks Jan 3–Feb 6, 2020.
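The baseline definition above can be made concrete with a small worked example. This is only a sketch of the arithmetic on synthetic numbers, not Google's actual pipeline; the function name `pct_change_vs_baseline` and the visit counts are made up for illustration.

```r
# Percentage change relative to a day-of-week median baseline.
# `dates` is a Date vector, `visits` a numeric vector of the same length.
pct_change_vs_baseline <- function(dates, visits,
                                   base_start = as.Date("2020-01-03"),
                                   base_end   = as.Date("2020-02-06")) {
  wday <- format(dates, "%u")  # ISO weekday: "1" (Monday) to "7" (Sunday)
  in_base <- dates >= base_start & dates <= base_end
  # one median per weekday, computed over the baseline period only
  baseline <- tapply(visits[in_base], wday[in_base], median)
  100 * (visits / as.numeric(baseline[wday]) - 1)
}

# Synthetic data: five baseline weeks followed by one week with halved visits
dates   <- seq(as.Date("2020-01-03"), by = "day", length.out = 42)
weekend <- format(dates, "%u") %in% c("6", "7")
visits  <- ifelse(weekend, 200, 100)
visits[36:42] <- visits[36:42] / 2
change <- pct_change_vs_baseline(dates, visits)
```

Because each day is compared with the median for the same weekday, a busy weekend is not mistaken for an increase in mobility.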
The Apple mobility datasets show a relative volume of directions requests per country/region, sub-region, or city compared to a baseline volume on January 13th, 2020. A day is defined as midnight-to-midnight, Pacific time. Cities are defined as the greater metropolitan area and their geographic boundaries remain constant across the data set. In many countries/regions, sub-regions, and cities, the relative volume has increased since January 13th, consistent with normal, seasonal usage of Apple Maps. Day-of-week effects are important to normalize when using this data. Data that is sent from users’ devices to the Maps service is associated with random, rotating identifiers, so Apple doesn’t have a profile of individual movements and searches.
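One simple way to dampen day-of-week effects is a trailing 7-day mean, since every window then contains each weekday exactly once. This is an assumption about how to normalize, not a method prescribed by Apple; `trailing_mean` is a hypothetical helper built on base R's `stats::filter()`, and the index values are synthetic.

```r
# Trailing k-day mean; the first k - 1 values are NA because the window is incomplete.
trailing_mean <- function(x, k = 7) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Synthetic index with a weekly pattern that averages to 100
index    <- rep(c(100, 100, 100, 100, 100, 130, 70), times = 3)
smoothed <- trailing_mean(index)
```

Applied to a column such as apple_mtr_driving, the smoothed series shows the underlying trend without the weekly zigzag.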
With the merged data.frame, plotting becomes easy. We will show some simple examples.
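# Needs dplyr, ggplot2, and zoo
merged %>%
  filter(iso3c == "MYS") %>%
  mutate(
    # daily new cases from the cumulative count (first row is NA)
    new_cases = confirmed - lag(confirmed),
    # trailing 7-day average; rollapply() is used because rollmean() is
    # documented as not handling NA inputs, and fill = NA replaces the
    # deprecated na.pad = TRUE
    ave_new_cases = rollapply(new_cases, 7, mean, fill = NA, align = "right")
  ) %>%
  filter(!is.na(new_cases), !is.na(ave_new_cases)) %>%
  ggplot(aes(x = date)) +
  geom_bar(aes(y = new_cases), stat = "identity", fill = "lightblue") +
  geom_line(aes(y = ave_new_cases), color = "darkred") +
  theme_minimal() +
  labs(title = "New Confirmed Covid-19 Cases in Malaysia",
       caption = "Blue bars show the cases, red line shows the rolling average")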
my_countries <- c("MYS", "IDN", "MMR", "PHL", "SGP", "THA", "VNM", "SAU")
plot_covid19_spread(
  merged, type = "confirmed", min_cases = 10, edate_cutoff = 450,
  cumulative = FALSE, change_ave = 7,
  highlight = my_countries)
Another option to visualize the spread of Covid-19, in particular if we want to compare many countries, is to produce a stripes-based visualization.
plot_covid19_stripes(merged,
                     countries = my_countries,
                     type = "confirmed",
                     min_cases = 100,
                     cumulative = FALSE,
                     sort_countries = "countries")
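With per_capita = TRUE, the counts are scaled by population, which makes countries of different sizes comparable. The merged data also includes gdp_capita, which would allow comparing how countries are coping with the pandemic relative to their GDP. The richer countries should have more flexibility in their NPIs. They may also have an advantage in the early procurement of vaccines. This post will not include such an analysis.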
merged %>% plot_covid19_stripes(
  countries = my_countries,
  type = "active",
  min_cases = 100,
  cumulative = FALSE,
  per_capita = TRUE,
  population_cutoff = TRUE,
  sort_countries = "magnitude"
)
Is there an association between testing during the first 30 days of the spread and the amount of confirmed cases that a country observes?
Is there an association between testing during the first 30 days of the spread and the amount of deaths that a country observes?
We look at the driving and walking patterns from the Apple data for Malaysia.
# Keep the Malaysian rows for the mobility plots
merged %>% filter(iso3c == "MYS") -> covidMYS
# Reshape the two Apple series to long format (gather() is from tidyr)
covidMYS %>%
  select(date, apple_mtr_driving, apple_mtr_walking) %>%
  gather(key = "variable", value = "value", -date) -> df
ggplot(df, aes(x = date, y = value)) +
  geom_line(aes(color = variable, linetype = variable)) +
  geom_hline(yintercept = 100, color = "darkred") +
  geom_vline(xintercept = as.Date("2020-05-24"), color = "darkblue") +
  geom_vline(xintercept = as.Date("2021-05-13"), color = "darkblue") +
  geom_vline(xintercept = as.Date("2021-06-01"), color = "black") +
  labs(title = "Apple Driving and Walking Trends for Malaysia",
       subtitle = "Relative to the Jan 13, 2020 baseline (= 100)",
       caption = "Vertical lines indicate the 2 Hari Rayas and start of MCO3")
It is clear that people have been “driving less” since Hari Raya on 13 May 2021.
Now we combine the above plot with data for new cases.
# Scale daily new cases to hundreds so they share an axis with the mobility index
covidMYS %>%
  mutate(newcases_per100 = (confirmed - lag(confirmed)) / 100) %>%
  select(date, apple_mtr_driving, apple_mtr_walking, newcases_per100) %>%
  gather(key = "variable", value = "value", -date) -> df
ggplot(df, aes(x = date, y = value)) +
  geom_line(aes(color = variable, linetype = variable)) +
  # annotate() draws the label once; geom_text() with constant aesthetics
  # would redraw it for every row, and it has no text = element_text() argument
  annotate("text", x = as.Date("2020-09-26"), y = 30, label = "PRN Sabah",
           colour = "blue", angle = 90, vjust = 0, size = 3) +
  geom_hline(yintercept = 100, color = "darkred") +
  geom_vline(xintercept = as.Date("2020-05-24"), color = "darkblue") +
  geom_vline(xintercept = as.Date("2021-05-13"), color = "darkblue") +
  geom_vline(xintercept = as.Date("2021-06-01"), color = "black") +
  labs(title = "Apple Driving and Walking Trends and New Cases for Malaysia",
       subtitle = "Mobility relative to the Jan 13, 2020 baseline (= 100); new cases in hundreds",
       caption = "Vertical lines indicate the 2 Hari Rayas and start of MCO3")
There is a lag between changes in mobility and changes in the spread of Covid-19, so not much can be concluded from the above plot. It does give an initial visual impression that the number of cases kept increasing even while people complied in terms of mobility. The number of cases clearly shot up after PRN Sabah (the Sabah state election) in September 2020. Other scenarios, such as the spread within clusters or states, still need to be analyzed, and for these the granular mobility and case data by state and district become crucial.
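The lag can be explored by correlating mobility on day t with new cases on day t + k for a range of k. The sketch below uses synthetic data in which cases follow mobility with a 10-day delay; `lagged_cor` is a hypothetical helper, and a high correlation at some lag is only a descriptive hint, not evidence of a causal effect.

```r
# Correlation between mobility on day t and cases on day t + k, for k = 0..max_lag.
lagged_cor <- function(mobility, cases, max_lag = 21) {
  n <- length(mobility)
  stopifnot(length(cases) == n, max_lag < n - 2)
  lags <- 0:max_lag
  r <- sapply(lags, function(k) {
    # shift cases back by k days before pairing with mobility
    cor(mobility[1:(n - k)], cases[(1 + k):n], use = "complete.obs")
  })
  data.frame(lag = lags, r = r)
}

# Synthetic example: cases mirror mobility from 10 days earlier
set.seed(42)
day      <- 1:120
mobility <- 100 + 20 * sin(day / 6) + rnorm(120)
cases    <- c(rep(NA, 10), 5 * mobility[1:110])
res      <- lagged_cor(mobility, cases)
best_lag <- res$lag[which.max(res$r)]
```

With real data, the inputs would be columns such as apple_mtr_driving and the daily new cases from covidMYS, ideally smoothed first.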
If people are moving, where are they going? Google Mobility data gives some indication.
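Google data shows an increase for the residential category, which should raise an alarm if it reflects time at residential locations other than people's own. Note that Google's documentation describes the residential category as a change in duration, while the other categories measure a change in visitors. Grocery and pharmacy destinations do not show much change. As expected, retail and recreation, parks, transit stations, and workplaces show a reduction.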
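To plot the six Google categories together, the gcmr_ columns first need to be reshaped into long format. The following is a minimal sketch in base R (the earlier examples use gather() for the same purpose); `gcmr_long` is a hypothetical helper and the output column names are assumptions.

```r
# Reshape all gcmr_* columns into one row per date and category.
gcmr_long <- function(df) {
  cols <- grep("^gcmr_", names(df), value = TRUE)
  long <- stack(df[cols])  # stacks the columns into `values` and `ind`
  data.frame(
    date     = rep(df$date, times = length(cols)),
    category = sub("^gcmr_", "", as.character(long$ind)),
    change   = long$values
  )
}

# Intended use with the Malaysian subset from the merged data:
# google_df <- gcmr_long(covidMYS)
# ggplot(google_df, aes(x = date, y = change, color = category)) + geom_line()
```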
These are the national trends. What are the trends at the state and/or district levels? Unfortunately, the Apple and Google mobility data made available to the public do not have that level of granularity. This is where the Facebook mobility data shines. However, it is not part of the merged dataset. In the following sections, we will show how to use the Facebook mobility data and combine it with the merged dataset for some analysis.
Facebook Data for Good has several tools and initiatives that can help organizations respond to the COVID-19 pandemic [6]. The mobility data is covered under Movement Range Maps [7], which define a movement range index.
The data is based on GADM level 2 (district) polygons and covers two kinds of movement: movement into the tiles of a polygon (one-directional movement) and movement within a polygon (stay home). It is indexed against a baseline.
These data sets are intended to inform how populations are responding to physical distancing measures. In particular, there are two metrics, Change in Movement and Stay Put, that provide a slightly different perspective on movement trends. Change in Movement looks at how much people are moving around and compares it with a baseline period that predates most social distancing measures, while Stay Put looks at the fraction of the population that appears to stay within a small area during an entire day.
Data is provided in one global tab-delimited text file with the following columns.
The data from https://data.humdata.org/dataset/movement-range-maps comes as a zip file. After unzipping, the read_tsv function from readr reads the .txt file directly.
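Reading and filtering the file can be sketched as follows, here with base R's `read.delim()` instead of read_tsv. The column names (`ds`, `country`, `polygon_name`, `all_day_bing_tiles_visited_relative_change`, `all_day_ratio_single_tile_users`) are assumptions about the file layout and should be checked against the downloaded file; `read_movement_range` and the short output names `visit` and `stay` are hypothetical.

```r
# Read the tab-delimited Movement Range file and keep one country.
# All column names below are assumptions about the file layout.
read_movement_range <- function(path, iso3 = "MYS") {
  mr <- read.delim(path, quote = "", stringsAsFactors = FALSE)
  mr <- mr[mr$country == iso3, ]
  data.frame(
    date     = as.Date(mr$ds),
    District = mr$polygon_name,
    visit    = mr$all_day_bing_tiles_visited_relative_change,  # Change in Movement
    stay     = mr$all_day_ratio_single_tile_users              # Stay Put
  )
}

# Intended use (file names are placeholders):
# unzip("movement-range.zip", exdir = "data")
# MYS_mov <- read_movement_range("data/movement-range.txt")
```

Filtering by country right after reading keeps the working data small, since the global file covers every available region and day.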
We do a point plot of the “visits” by people within each district.
The data shows people moved (visited) less during the 2 Hari Rayas. Also the visits seem to be going down after MCO3.
We do a point plot of the “stay” by people within each district.
People “stayed” home more during the 2 Hari Rayas, but the level was below that seen after the first lockdown. People are also staying home more after MCO3.
Next we repeat the two plots but using lines.
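What the plots suggest visually can also be checked numerically by comparing the average Stay Put value per district before and after the start of MCO3. This is a sketch under assumptions: the input is a data frame with columns `date`, `District`, and `stay`, the 14-day window is arbitrary, and `compare_periods` is a hypothetical helper.

```r
# Mean stay-put ratio per district in the `days` before and after a cutoff date.
compare_periods <- function(df, cutoff = as.Date("2021-06-01"), days = 14) {
  window <- df[df$date >= cutoff - days & df$date < cutoff + days, ]
  window$period <- ifelse(window$date < cutoff, "before", "after")
  means <- tapply(window$stay, list(window$District, window$period),
                  mean, na.rm = TRUE)
  data.frame(
    District  = rownames(means),
    before    = means[, "before"],
    after     = means[, "after"],
    change    = means[, "after"] - means[, "before"],
    row.names = NULL
  )
}
```

A positive value in `change` means that a larger share of people stayed put after the cutoff than before it.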
We scraped the data from public sources since the Malaysian granular data is not easily available in .csv or .txt formats.
The Sel_case tibble starts on 2021-04-26, so we filter SEL_mov to the same start date and then merge the two by the District and date columns.
# Align the mobility data with the case data, then left-join on District and date
SEL_mov %>% filter(as.Date(date) >= as.Date("2021-04-26")) -> Sel_mov
Sel_merge <- left_join(Sel_case, Sel_mov, by = c("District", "date"))
library(plotly)
color1 <- "red";    symbol1 <- "x"
color2 <- "orange"; symbol2 <- "triangle-up"
color3 <- "plum";   symbol3 <- "triangle-down"  # "plum2" is an R colour name, not a CSS one
color4 <- "violet"; symbol4 <- "triangle-se"
Sel_merge %>% filter(District == "Kuala Selangor") %>%
  plot_ly(x = ~date, y = ~New, name = "New Cases",
          type = "scatter", marker = list(color = color1, symbol = symbol1)) %>%
  add_trace(x = ~date, y = ~stay * 100, name = "Stay Home Index",
            marker = list(color = color2, symbol = symbol2)) %>%
  # the symbol must be passed by name, otherwise it is ignored
  add_trace(x = ~date, y = ~visit * 100, name = "Visit Mobility Index",
            marker = list(color = color3, symbol = symbol3)) %>%
  layout(title = "Kuala Selangor Mobility and New Cases",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Index/Count"))
The data sample is small. In the selected district of Kuala Selangor, people are moving less and staying put more, yet new cases are increasing, which suggests that a lag is involved. A larger sample, together with additional variables, would help.
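Because the case and mobility tables come from different sources, the join on District and date only works if the key columns have matching types and the mobility table has one row per key. The sketch below shows one defensive way to do the merge in base R; `join_cases_mobility` is a hypothetical helper, and `merge(..., all.x = TRUE)` plays the role of left_join here.

```r
# Left-join cases with mobility on District and date after harmonising the keys.
join_cases_mobility <- function(cases, mobility) {
  keys <- c("District", "date")
  cases$date    <- as.Date(cases$date)     # accept character or Date input
  mobility$date <- as.Date(mobility$date)
  # duplicated keys in the mobility table would silently multiply case rows
  stopifnot(!anyDuplicated(mobility[keys]))
  merged <- merge(cases, mobility, by = keys, all.x = TRUE)
  merged[order(merged$District, merged$date), ]
}
```

Rows of the case table without a mobility match are kept with NA in the mobility columns, so gaps in the mobility data stay visible instead of dropping case records.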
We repeat for Petaling district.
It seems that people in Petaling district did not change their mobility behavior much as compared to Kuala Selangor. The new cases show an upward trend.
This post shows how the public mobility data from Apple, Google, and Facebook can help in studying Covid-19 new case trends alongside mobility for Malaysia. The data is easy to understand and use with R.
We have found that the Facebook data is more useful because of its granularity, allowing us to analyze at the state and district levels. It also has parameters like polygon_source and polygon_id (alphanumeric string for GADM regions) that can be used for geospatial analysis. We plan to explore this option for our next post.
[1] Ranking the effectiveness of worldwide COVID-19 government interventions, https://doi.org/10.1038/s41562-020-01009-0
[3] https://www.google.com/covid19/mobility/data_documentation.html?hl=en
[5] Meet tidycovid19: Yet another Covid-19 related R Package, https://www.r-bloggers.com/2020/03/meet-tidycovid19-yet-another-covid-19-related-r-package/
[6] Facebook Data for Good, https://dataforgood.fb.com/docs/covid19/