Like all infectious diseases, Covid-19 transmission happens through close contact. It has to pass from person to person, and these people will have to travel to spread the virus there. Thus networks have often been used to help model the spread of diseases, whether it’s by using a social network to model the spread of the disease between people1, or geographic networks to help us understand the spread based on what’s going on in neighboring places.2 The network approach has also been used for cases in China3 and India4.
We will focus on geographic networks, where the nodes are districts, and the edges link two places if they are direct neighbors (first-degree) to each other.
This post extends from an earlier post that was limited to only one state in Malaysia, Selangor.5 Here we look at all the districts in Malaysia or which we could get the data with permission.6 The main challenge is to fit the districts defined in the map shape file we are using with the Covid cases dataset. We will explain these adjustments and compromises as we go along.
These are all the packages needed.
library(dplyr)
library(readr) # Loading the data
library(tidyr)
library(sf) # For the maps
library(sp) # Transform coordinates
library(ggplot2)
library(viridis)
library(igraph) # build network
library(spdep) # builds network
library(tidygraph)
library(ggraph) # for plotting networks
library(cowplot)
## # A tibble: 7,512 x 7
## daerah Date New `14 Days` Active Total negeri
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Alor Gajah 2021-04-26 22 77 NA NA Melaka
## 2 Alor Gajah 2021-04-27 5 85 NA NA Melaka
## 3 Alor Gajah 2021-05-03 12 NA NA NA Melaka
## 4 Alor Gajah 2021-05-04 16 184 NA NA Melaka
## 5 Alor Gajah 2021-05-05 19 166 NA NA Melaka
## 6 Alor Gajah 2021-05-06 30 188 NA NA Melaka
## 7 Alor Gajah 2021-05-07 20 203 NA NA Melaka
## 8 Alor Gajah 2021-05-08 39 235 NA NA Melaka
## 9 Alor Gajah 2021-05-09 30 243 NA NA Melaka
## 10 Alor Gajah 2021-05-10 39 274 NA NA Melaka
## # ... with 7,502 more rows
The columns are:
Figure 1.1: Point plot of Malaysia case data
Selangor is consistently the state with the highest number of new cases. The simple point or line plot will not give us the same insight as if we plot it on a map. Before that we will take some summary statistics and focus our analysis on these new measures.
# Convert daerah to upper case since we want to do a join later
mys_cases$daerah <- toupper(mys_cases$daerah)
mys_cases$negeri <- toupper(mys_cases$negeri)
mys_cases %>%
group_by(daerah, negeri) %>%
summarize(days_of_data = n(),
avg_new = mean(New),
med_new = median(New),
lo_new = min(New),
hi_new = max(New)) -> mys_cases
We do a bubble plot of the summary data.
Figure 1.2: New case summary by disrict and state
Only 8 out of 154 districts are outside the main cluster. Petaling is almost an outlier. The simple bubble plot says that the new cases seem worrying only for a small number of districts and one state. Why did the government impose a nationwide full lockdown starting 1 Jun 2021?7 The visual evidence is more apparent when we plot the data using the map of districts in Malaysia.
To plot a map, we have to download the shape file of districts in Malaysia.8 Save and unzip the whole folder. Then we can load it.
district_sf <- sf::st_read("F:/RProjects/Covid19/zd362bc5680.shp")
## Reading layer `zd362bc5680' from data source `F:\RProjects\Covid19\zd362bc5680.shp' using driver `ESRI Shapefile'
## Simple feature collection with 236 features and 9 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 99.64061 ymin: 0.8537959 xmax: 119.267 ymax: 7.362766
## Geodetic CRS: WGS 84
# Rename columns to follow mys_cases
district_sf %>% rename(daerah = laa, negeri = nam) -> district_sf
Figure 1.3: Map of all districts in Malaysia
Figure 1.3 will be too large to visualize for all cases. We will focus on Semenanjung Malaysia or West Malaysia first. We then repeat for East Malaysia (SABAH,SARAWAK, and WILAYAH PERSEKUTUAN LABUAN). This will make the plotting easier. we can combine plots later if needed.
The summary mys_cases dataset has granular data for KUALA LUMPUR but the district_sf dataset does not have this breakdown. So we merge the data for all the daerah in KUALA LUMPUR.
mys_cases %>% filter(negeri == "KUALA LUMPUR") %>%
group_by(negeri) %>%
summarise(days_of_data = mean(days_of_data),
avg_new = mean(avg_new),
med_new = median(med_new),
lo_new = min(lo_new),
hi_new = max(hi_new)) %>%
mutate(negeri = "WILAYAH PERSEKUTUAN",
daerah = "KUALA LUMPUR") %>%
select(daerah,negeri,days_of_data,avg_new,med_new,lo_new,hi_new) -> tmpkl
mys_cases %>% filter(negeri != "KUALA LUMPUR") %>% rbind(tmpkl) -> mys_cases
We redo the bubble plot in Figure 1.2.
Figure 1.4: New case summary by disrict and state
The combined districts (now designated as Kuala Lumpur) did not make the list (for districts with average new cases >= 100).
Figure 1.5: Map of all districts in Semenanjung
To plot the case data we group by daerah and created a tibble with the summary statistics of the data between 03/05/2021 and 14/06/2021 as in mys_cases. We filter out some of the negeri.
mys_cases %>% filter(!negeri %in% c("SABAH","SARAWAK",
"WILAYAH PERSEKUTUAN LABUAN",
"PUTRAJAYA")) %>%
distinct() -> west_cases
west_cases
## # A tibble: 87 x 7
## # Groups: daerah [87]
## daerah negeri days_of_data avg_new med_new lo_new hi_new
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALOR GAJAH MELAKA 49 51.9 41 5 197
## 2 BACHOK KELANTAN 49 31.4 24 6 101
## 3 BALING KEDAH 49 9.86 10 0 23
## 4 BANDAR BAHARU KEDAH 49 7.24 5 0 33
## 5 BARAT DAYA PULAU PINANG 49 30.1 28 1 62
## 6 BATANG PADANG PERAK 49 11.4 6 0 58
## 7 BATU PAHAT JOHOR 49 21.4 18 7 61
## 8 BENTONG PAHANG 49 14.8 13 1 43
## 9 BERA PAHANG 49 9.43 4 0 74
## 10 BESUT TERENGGANU 49 39.1 40 0 104
## # ... with 77 more rows
We do a join on the case data. We can now fill the districts based on the new case statistics. But we limit to only similar daerah in both data sets.
# Convert daerah to upper case since we want to do a join
west_cases$daerah <- toupper(west_cases$daerah)
west_cases$negeri <- toupper(west_cases$negeri)
district_sf$daerah[district_sf$daerah == "ULU LANGAT"] <- "HULU LANGAT"
# limit to only similar daerah in both data sets
intersect(west_cases$daerah, district_sf$daerah) -> same_daerah
west_cases %>% filter (daerah %in% same_daerah) -> west_cases
Next we join west_cases with district_sf and just plot the highest and lowest incidence of new cases. We can easily repeat for the other statistics.
Note that %in% returns a logical vector of TRUE and FALSE. To negate it, you can use ! in front of the logical statement
district_sf %>%
filter(!negeri %in% c("SABAH","SARAWAK","WILAYAH PERSEKUTUAN LABUAN")) %>%
left_join(west_cases, by = "daerah") -> west_sf
west_sf %>%
ggplot() +
geom_sf(aes(geometry = geometry, fill = hi_new)) +
# Everything past this point is only formatting:
scale_fill_gradient2(low = "green",
mid = "yellow",
high = "red",
midpoint = 0,
space = "Lab",
na.value = "grey80",
guide = "colourbar",
aesthetics = "fill") +
theme_void() +
labs(title = "Highest New Cases Semenanjung") -> p1
west_sf %>%
ggplot() +
geom_sf(aes(geometry = geometry, fill = lo_new)) +
# Everything past this point is only formatting:
scale_fill_gradient2(low = "green",
mid = "yellow",
high = "red",
midpoint = 0,
space = "Lab",
na.value = "grey80",
guide = "colourbar",
aesthetics = "fill") +
theme_void() +
labs(title = "Lowest New Cases Semenanjung") -> p2
cowplot::plot_grid(
p1,
p2,
labels = c("A", "B"))
Figure 1.6: Map of new cases for districts in Malaysia
This is where we get the first indication that networks will be useful. Figure 1.6 shows that neighboring districts have cases similar to neighboring areas. This makes sense. If we have an outbreak in one place we expect it to spread to neighboring places.
Figure 1.6 also confirms the bubble plots in Figures 1.2 and 1.4. Only a few districts can be deemed as dangerous districts. So why the nationwide lockdown?
High case numbers spread from one area to neighboring areas. If we can define a network graph of the districts in Semenanjung we can further analyze the data based on the properties or features of a network graph.9
To define a network, it is enough to know what makes up the nodes, and how they join together (i.e. the edges). In our case, the nodes are just the districts. We know we want the edges to be such that two nodes are joined if the two districts they represent are immediate (first-degree) neighbors on our map. The poly2nb() function (read “polygon to neighbors”) from the {spdep} package helps us do just this. It can take a shape file and tell us which areas are neighbors.
west_sf %>% distinct(daerah, .keep_all = TRUE) -> west_sf
# Use the poly2nb to get the neighborhood of each district
west_district_neighbours <- spdep::poly2nb(west_sf)
We use the nb2mat() function (“neighbors to matrix”) to turn this into an adjacency matrix. An adjacency matrix is such that cell i, j is equal to 1 if there is an edge connecting node i to node j, and a 0 otherwise. Thus an adjacency matrix defines a network. We can use the graph_from_adjacency_matrix function from the {igraph} package to create a network graph. We have to declare the parameter zero.policy = TRUE to cater for empty neighbor sets.10
# Use nb2mat to turn this into an adjacency matrix
adj_mat <- spdep::nb2mat(west_district_neighbours,
style = "B",
zero.policy = TRUE)
rownames(adj_mat) <- west_sf$daerah
colnames(adj_mat) <- west_sf$daerah
west_g <- igraph::graph_from_adjacency_matrix(adj_mat, mode = "undirected")
The {tidygraph} package handles network graphs using standard {dplyr}. We convert west_g into a tidy graph object using as_tbl_graph().
west_g <- tidygraph::as_tbl_graph(west_g)
west_g
## # A tbl_graph: 90 nodes and 196 edges
## #
## # An undirected simple graph with 8 components
## #
## # Node Data: 90 x 1 (active)
## name
## <chr>
## 1 GUA MUSANG
## 2 KOTA TINGGI
## 3 SEGAMAT
## 4 BERA
## 5 ROMPIN
## 6 LIPIS
## # ... with 84 more rows
## #
## # Edge Data: 196 x 2
## from to
## <int> <int>
## 1 1 6
## 2 1 7
## 3 1 8
## # ... with 193 more rows
Our network has 90 nodes and 196 edges. We also see that it is defined by two tibbles, one for the node data, and one for the edge data.
The node data just contains the names of the districts, and the edge data tells us which nodes are connected based on their position in the node data. For example, the first row of the edge data tells us “GUA MUSANG” and “LIPIS” are neighbors.
We can visualize the network using {ggraph}, a network-specific extension to {ggplot2}.
Figure 1.7: Semenanjung district network
We must overlay this network on top of our map from Figure 1.5. This makes it easier to translate between the network and reality.
We first need to get central coordinates for each district. The shape file we downloaded does not come with a “clean” pair of latitude and longitude coordinates for each district. Unfortunately, the polygons themselves are under a different spatial projection.
We get some information about our specific shape file from west_sf.
str(west_sf)
## Classes 'sf' and 'data.frame': 90 obs. of 16 variables:
## $ f_code : chr "FA001" "FA001" "FA001" "FA001" ...
## $ coc : chr "MYS" "MYS" "MYS" "MYS" ...
## $ negeri.x : chr "KELANTAN" "JOHOR" "JOHOR" "PAHANG" ...
## $ daerah : chr "GUA MUSANG" "KOTA TINGGI" "SEGAMAT" "BERA" ...
## $ pop : int -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 ...
## $ ypc : int 0 0 0 0 0 0 0 0 0 0 ...
## $ adm_code : chr "UNK" "UNK" "UNK" "UNK" ...
## $ salb : chr "UNK" "UNK" "UNK" "UNK" ...
## $ soc : chr "MYS" "MYS" "MYS" "MYS" ...
## $ negeri.y : chr "KELANTAN" "JOHOR" "JOHOR" "PAHANG" ...
## $ days_of_data: num 49 49 49 49 49 49 49 49 49 49 ...
## $ avg_new : num 6.04 24.92 21.27 9.43 1.98 ...
## $ med_new : num 2 21 9 4 1 3 7 5 6 6 ...
## $ lo_new : num 0 2 0 0 0 0 0 0 0 0 ...
## $ hi_new : num 36 84 291 74 18 55 64 92 72 80 ...
## $ geometry :sfc_POLYGON of length 90; first list element: List of 1
## ..$ : num [1:7319, 1:2] 102 102 102 102 102 ...
## ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
## ..- attr(*, "names")= chr [1:15] "f_code" "coc" "negeri.x" "daerah" ...
st_coordinates(st_centroid(west_sf$geometry)) %>% head(10)
## X Y
## 1 101.9594 4.947106
## 2 103.9484 1.763144
## 3 102.9634 2.484834
## 4 102.5586 3.168661
## 5 103.1134 2.872403
## 6 101.9135 4.348188
## 7 101.0947 4.833658
## 8 101.2978 5.435118
## 9 102.5519 4.258134
## 10 103.2244 4.278661
We follow the same steps as per the earlier work to extract the latitude and longitude and add it to west_sf using mutate.
west_sf <- west_sf %>%
dplyr::mutate(lon = st_coordinates(st_centroid(west_sf$geometry))[,1],
lat = st_coordinates(st_centroid(west_sf$geometry))[,2])
We extract the coordinates and transform the projection.
coords <- west_sf %>%
as.data.frame() %>%
select(lon, lat, daerah)
coords %>% head(10)
## lon lat daerah
## 1 101.9594 4.947106 GUA MUSANG
## 2 103.9484 1.763144 KOTA TINGGI
## 3 102.9634 2.484834 SEGAMAT
## 4 102.5586 3.168661 BERA
## 5 103.1134 2.872403 ROMPIN
## 6 101.9135 4.348188 LIPIS
## 7 101.0947 4.833658 KUALA KANGSAR
## 8 101.2978 5.435118 HULU PERAK
## 9 102.5519 4.258134 JERANTUT
## 10 103.2244 4.278661 KEMAMAN
This makes it easy to join the coordinates onto our network object. We essentially want to left_join onto our nodes tibble.
# Join on coordinates and sf geometry
west_g <- west_g %>%
activate("nodes") %>%
left_join(coords, by = c("name" = "daerah"))
We can use as_tibble() to extract the activated tibble (i.e. either nodes or edges). We see the coordinates have been successfully joined.
west_g %>%
as_tibble() %>%
head(10)
## # A tibble: 10 x 3
## name lon lat
## <chr> <dbl> <dbl>
## 1 GUA MUSANG 102. 4.95
## 2 KOTA TINGGI 104. 1.76
## 3 SEGAMAT 103. 2.48
## 4 BERA 103. 3.17
## 5 ROMPIN 103. 2.87
## 6 LIPIS 102. 4.35
## 7 KUALA KANGSAR 101. 4.83
## 8 HULU PERAK 101. 5.44
## 9 JERANTUT 103. 4.26
## 10 KEMAMAN 103. 4.28
Now we just create a ggraph plot, where we use the district shape file for a geom_sf layer, and then overlay our network.
ggraph(west_g, layout = "manual", x = lon, y = lat) +
geom_sf(data = west_sf, fill = "white") +
theme_void() +
geom_edge_link(colour = 'black', width = 1) +
geom_node_point(size = 2, colour = 'darkred') +
geom_node_text(aes(label = name), size = 1.5, repel = TRUE) +
theme_void() +
labs(title = "Network plot of districts in Semenanjung",
subtitle = "Each node is a district, and an edge joins two
nodes if they are neighbors")
Figure 1.8: Semenanjung district network on map
The network is exactly what we want it to be; every node is a district, and an edge connects two nodes if the districts are neighbors.
Next, we will see how the network graph allows us to create powerful parameters quickly, and then use these new parameters to have better insights into our Covid data.
Kindly refer to the earlier work for the definitions of the network parameters and the code used to generate the parameters and the plots.
We start with our network object west_g. We activate the nodes, so it knows the following mutate is to be applied to the underlying node data. We use mutate to define our new parameters to be the corresponding {tidygraph} function applied to west_g. These functions calculate some measures that the define the characteristics of nodes in a network graph.
sel_features <- west_g %>%
activate(nodes) %>%
mutate(
degree = centrality_degree(),
betweenness = centrality_betweenness(),
closeness = centrality_closeness(normalized = T),
pg_rank = centrality_pagerank(),
eigen = centrality_eigen(),
coreness = node_coreness(),
center = node_is_center(),
dist_to_center = node_distance_to(node_is_center())
)
node_measures <- sel_features %>% as_tibble()
We ask some questions about which is the “most important” node but what does “most important” mean? It of course depends on the definition and this is where network centrality measures come into play. We will look at four of these.
node_measures %>% arrange(desc(degree)) %>% head(10)
## # A tibble: 10 x 11
## name lon lat degree betweenness closeness pg_rank eigen coreness center
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 GUA ~ 102. 4.95 9 1168. 0.0893 0.0193 0.982 3 FALSE
## 2 SEGA~ 103. 2.48 9 588. 0.0828 0.0205 0.543 3 FALSE
## 3 JERA~ 103. 4.26 9 1235. 0.0901 0.0197 1 3 FALSE
## 4 HULU~ 103. 5.05 8 477. 0.0874 0.0190 0.665 3 FALSE
## 5 BATA~ 101. 4.04 8 262. 0.0854 0.0172 0.823 3 FALSE
## 6 KULIM 101. 5.40 8 355. 0.0815 0.0191 0.307 3 FALSE
## 7 BERA 103. 3.17 7 630. 0.0866 0.0151 0.757 3 FALSE
## 8 HULU~ 101. 5.44 7 985. 0.0860 0.0157 0.555 3 FALSE
## 9 BENT~ 102. 3.38 7 466. 0.0865 0.0148 0.866 3 FALSE
## 10 TAMP~ 102. 2.55 7 202. 0.0788 0.0162 0.408 3 FALSE
## # ... with 1 more variable: dist_to_center <dbl>
Please refer to the earlier work for the codes to produce the following plots and a brief explanation of the centrality measures. We will only show the example for the degree centrality measure. Also
sel_features %>%
activate("nodes") %>%
ggraph(layout = "manual", x = lon, y = lat) +
geom_sf(data = west_sf, fill = "white", color = "grey") +
theme_void() +
geom_edge_link(colour = 'black', width = 1) +
geom_node_point(aes(color = degree), size = 2) +
geom_node_text(aes(label = name), size = 1.5, repel = TRUE) +
scale_color_viridis_c() +
ggtitle("District network for Semenanjung") +
guides(color=guide_legend(title="Degree", reverse = TRUE)) -> p3
p3
Figure 1.9: Degree centrality of Semenanjung district network
Figure 1.10: Betweenness centrality of Semenanjung district network
Figure 1.11: Closeness centrality of Semenanjung district network
Figure 1.12: Eigenvector centrality of Semenanjung district network
Eigenvector Centrality can denote Connections. Is a district connected to other “well-connected” districts? Petaling is “well-connected”. What other districts are “well-connected” to Petaling.
As we have seen, there is more than one definition of “most important”. It will depend on the context (and the available information) which one to choose.
When we compare Figures 1.9, 1.10, 1.11, and 1.12 with Figures 1.4 and 1.6 on the Covid cases per district, the districts with the higher centrality measures have lower new cases and are quite far from Petaling, the district with the highest cases. This strengthens our doubt for the need of a nation wide lockdown. In a later section we will show how many network hops does it take to reach out from Petaling to the other districts.
One network parameter we calculated is center = node_is_center(). Which districts can take the role of a “CENTER”?
node_measures %>% filter(center == "TRUE") %>%
arrange(-eigen) %>%
select(-lon, -lat)
## # A tibble: 6 x 9
## name degree betweenness closeness pg_rank eigen coreness center
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 LANG~ 0 0 0.0111 0.00177 0 0 TRUE
## 2 PULA~ 0 0 0.0111 0.00177 0 0 TRUE
## 3 PULA~ 0 0 0.0111 0.00177 0 0 TRUE
## 4 PULA~ 0 0 0.0111 0.00177 0 0 TRUE
## 5 ALOR~ 0 0 0.0111 0.00177 0 0 TRUE
## 6 PULA~ 0 0 0.0111 0.00177 0 0 TRUE
## # ... with 1 more variable: dist_to_center <dbl>
These are the central districts for the whole of Semananjung Malaysia and these are also the districts with the highest cases as per Figure 1.4. So to contain the spread of Covid in Semenanjung, the network graph analysis has identified the districts that need to be controlled using more stringent controls compared to the remaining districts.
We should learn a lot more when we populate our sel_features network with Covid data.
We now want to generate new parameters based on what is happening in neighboring districts, for example, by combining the number of new cases with a particular network parameter like degree.
We left_join from the cases data to our network graph.
west_cases_network <- sel_features %>%
activate("nodes") %>%
left_join(west_cases,
by = c("name" = "daerah"))
What we want here is quite simple, and the most efficient way of calculating it is to use an adjacency matrix. We have presented the details earlier.
adj_matrix <- igraph::get.adjacency(west_cases_network)
west_cases_network <- west_cases_network %>%
mutate(cases_in_neighbourhood =
as.vector(adj_matrix %*% .N()$hi_new),
cases_average_degree =
cases_in_neighbourhood/degree)
We apply mutate to our tidygraph object. We get our vector of cases by using .N() which accesses the node data underlying the network and then using $hi_new to take the vector of new cases. We multiply our adjacency matrix to this using matrix multiplication, which in R is denoted by %*%. Finally, we wrap the result in as.vector() so mutate knows to treat this as a vector. We calculate the average number of cases by dividing by the degree of the node.
Figure 1.13: Cases of district network of neighbors
Figure 1.14: Cases of district network of neighbors
Starting from any node (district or daerah in our case), what is its distance to the other nodes in the network? The answer is 1 for the immediate neighbors. igraph has a shortest.path function to calculate the shortest distance (number of hops) between any 2 nodes in a network graph. We will combine west_g as shown in Figure 1.7 and west_sf as shown in Figure 1.8.
sp_mat <- shortest.paths(west_g)
sp_mat2 <- as.data.frame(sp_mat)
names(sp_mat2) %>% head(10)
## [1] "GUA MUSANG" "KOTA TINGGI" "SEGAMAT" "BERA"
## [5] "ROMPIN" "LIPIS" "KUALA KANGSAR" "HULU PERAK"
## [9] "JERANTUT" "KEMAMAN"
cbind(west_sf, sp_mat2) -> west_sf
The sample names from the shortest path matrix sp_mat correctly reflect the daerah. We use cbind to combine the 2 datasets.
ggplot2 now directly supports spatial data through geom_sf format. We have customized it a bit by removing elements which are not required and added a background.11 The final map is shown below in Figure 1.15. We show the case for Petaling, the hottest district.
ggplot(west_sf) +
geom_sf(aes(fill = PETALING)) +
geom_sf(fill="blue", alpha =ifelse(west_sf$daerah == "PETALING", 1, 0)) +
scale_fill_gradient2(low="darkred", mid="darksalmon",
high="cornsilk", midpoint = 5) +
labs(title="Shortest path network",
fill="Distance from \nPetaling") +
theme(
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
plot.background = element_rect(fill = "#f5f5f2", color = NA),
panel.background = element_rect(fill = "#f5f5f2", color = NA),
legend.background = element_rect(fill = "#f5f5f2", color = NA),
panel.border = element_blank()) -> p6
p6
Figure 1.15: Distance from selected district
Langkawi and Penang islands in gray are physically isolated from Petaling. They also have relatively low cases. The national lockdown will hurt them for no good reason. Only districts a “short distance” away from Petaling (perhaps only two hops) need more stringent physical access restrictions.
Figure 1.16: Combined plots
Figure 1.16 combines some of the earlier figures. It shows that when we combine our sample Semenanjung Covid case data with the graph network parameters, we get a richer analysis. Firstly, we just have the highest recorded cases within the period that we collected the data for each district. We focus on the simplest network centrality parameter, the degree. By just incorporating the concept of node neighbors and degree we can get more insights.
Obviously, Petaling is the most serious district for Covid cases in Semenanjung. It is a CENTER district in the network. So are GOMBAK, KUALA SELANGOR, and KLANG. These three districts also have high Covid cases. The plot clearly tells us that the priority strategy is to “isolate and treat” these districts. Thus, “limiting access to and from” these districts should be an effective NPI (non-pharmaceutical intervention). Vaccination must be prioritized for them too.
Sabak Bernam is a district close to the CENTER (just 2 hops away from Petaling) but offers a different scenario. It is low on all measures and also not “influenced” by its neighbors. It is a good candidate for a Randomized Control Trial (RCT) where vaccination is accelerated, access is controlled but the restrictions on social distancing within the district are being relaxed. This can allow a gradual opening of businesses, secondary schools, and mosques. There are many other districts like Sabak Bernam as can be seen from Figure 1.16.
Kuala Selangor and Kuala Langat highlight the value of networks. They appear “safe” on their own but not so when the neighbors are considered. Kuala Selangor has a relatively high betweenness, closeness, and eigenvalue measure.
There are many other network graph parameters12 like Eigenvectors which measures influence and Closeness which measures the ease of reaching other nodes. Incorporating these in the analysis will certainly give other insights.
The analysis for Semenanjung Malaysia can be repeated for Sabah, Sarawak, and Labuan. But no district from these 3 states has anomalous high cases as shown in Figure 1.2 and Figure 1.4. Furthermore, they are physically not connected to any of the “hot” districts in Semanjung.
Networks are real in our world. Physical networks like the Internet, electricity grids, mobile networks, social networks, and virus networks are all real examples that have been extensively studied using network graphs.13 Any analysis of Covid data that does not incorporate network graphs will certainly miss out on important and rich insights.
This posting has illustrated how to combine Covid case data with a physical geographical network. The various plots exhibited show the difference and added value when the network parameters are incorporated in the analysis.
Figure 1.16 really challenges the wisdom of the third total lockdown for Malaysia. Was the decision guided by data? The “hottest” districts are only 8 out of 90 in Semenanjung. They can be “isolated and treated” with more stringent controls without involving the other districts as shown by the shortest path network.
https://towardsdatascience.com/predicting-the-spread-of-covid-19-using-networks-in-r-2fee0013db91↩︎
https://www.thestar.com.my/news/nation/2021/05/28/covid-19-nsc-decides-nationwide-lockdown-from-june-1-to-14↩︎
https://earthworks.stanford.edu/catalog/stanford-zd362bc5680↩︎
https://community.rstudio.com/t/error-in-nb2listw-w-nb-style-w-empty-neighbour-sets-found/96922↩︎
Visualising Neighbouring Polygons in R, https://mikeyharper.uk/calculating-neighbouring-polygons-in-r/↩︎
http://web.stanford.edu/~jacksonm/Jackson-IntroConcepts.pdf↩︎