1

1.1 Introduction

Like all infectious diseases, Covid-19 transmission happens through close contact. It has to pass from person to person, and these people will have to travel to spread the virus there. Thus networks have often been used to help model the spread of diseases, whether it’s by using a social network to model the spread of the disease between people¹, or geographic networks to help us understand the spread based on what’s going on in neighboring places.² The network approach has also been used for cases in China³ and India⁴.

We will focus on geographic networks, where the nodes are districts, and the edges link two places if they are direct neighbors (first-degree) to each other.

This post extends from an earlier post that was limited to only one state in Malaysia, Selangor.⁵ Here we look at all the districts in Malaysia or which we could get the data with permission.⁶ The main challenge is to fit the districts defined in the map shape file we are using with the Covid cases dataset. We will explain these adjustments and compromises as we go along.

These are all the packages needed.

1.2 Set up and load packages

library(dplyr)
library(readr) # Loading the data
library(tidyr)
library(sf) # For the maps 
library(sp) # Transform coordinates 
library(ggplot2)
library(viridis)
library(igraph) # build network
library(spdep) # builds network
library(tidygraph)
library(ggraph) # for plotting networks
library(cowplot)

1.3 Loading the data

## # A tibble: 7,512 x 7
##    daerah     Date         New `14 Days` Active Total negeri
##    <chr>      <date>     <dbl>     <dbl>  <dbl> <dbl> <chr> 
##  1 Alor Gajah 2021-04-26    22        77     NA    NA Melaka
##  2 Alor Gajah 2021-04-27     5        85     NA    NA Melaka
##  3 Alor Gajah 2021-05-03    12        NA     NA    NA Melaka
##  4 Alor Gajah 2021-05-04    16       184     NA    NA Melaka
##  5 Alor Gajah 2021-05-05    19       166     NA    NA Melaka
##  6 Alor Gajah 2021-05-06    30       188     NA    NA Melaka
##  7 Alor Gajah 2021-05-07    20       203     NA    NA Melaka
##  8 Alor Gajah 2021-05-08    39       235     NA    NA Melaka
##  9 Alor Gajah 2021-05-09    30       243     NA    NA Melaka
## 10 Alor Gajah 2021-05-10    39       274     NA    NA Melaka
## # ... with 7,502 more rows

The columns are:

daerah : the Local Authority District (LAD)
N : number of new cases
14 Days : number of new cases for the last 14 days
Active : active cases
Total : total cases
negeri : state

1.4 Initial point plot

The input data for the number of new cases per day is shown in Figure 1.1.

Figure 1.1: Point plot of Malaysia case data

Selangor is consistently the state with the highest number of new cases. The simple point or line plot will not give us the same insight as if we plot it on a map. Before that we will take some summary statistics and focus our analysis on these new measures.

# Convert daerah to upper case since we want to do a join later
mys_cases$daerah <- toupper(mys_cases$daerah)
mys_cases$negeri <- toupper(mys_cases$negeri)
mys_cases %>% 
  group_by(daerah, negeri) %>% 
  summarize(days_of_data = n(),
            avg_new = mean(New),
            med_new = median(New),
            lo_new = min(New),
            hi_new = max(New)) -> mys_cases

We do a bubble plot of the summary data.

Figure 1.2: New case summary by disrict and state

Only 8 out of 154 districts are outside the main cluster. Petaling is almost an outlier. The simple bubble plot says that the new cases seem worrying only for a small number of districts and one state. Why did the government impose a nationwide full lockdown starting 1 Jun 2021?⁷ The visual evidence is more apparent when we plot the data using the map of districts in Malaysia.

1.5 Create and plot maps of districts

To plot a map, we have to download the shape file of districts in Malaysia.⁸ Save and unzip the whole folder. Then we can load it.

district_sf <- sf::st_read("F:/RProjects/Covid19/zd362bc5680.shp")

## Reading layer `zd362bc5680' from data source `F:\RProjects\Covid19\zd362bc5680.shp' using driver `ESRI Shapefile'
## Simple feature collection with 236 features and 9 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 99.64061 ymin: 0.8537959 xmax: 119.267 ymax: 7.362766
## Geodetic CRS:  WGS 84

# Rename columns to follow mys_cases
district_sf %>% rename(daerah = laa, negeri = nam) -> district_sf

Figure 1.3: Map of all districts in Malaysia

Figure 1.3 will be too large to visualize for all cases. We will focus on Semenanjung Malaysia or West Malaysia first. We then repeat for East Malaysia (SABAH,SARAWAK, and WILAYAH PERSEKUTUAN LABUAN). This will make the plotting easier. we can combine plots later if needed.

1.6 Semenanjung districts

The summary mys_cases dataset has granular data for KUALA LUMPUR but the district_sf dataset does not have this breakdown. So we merge the data for all the daerah in KUALA LUMPUR.

mys_cases %>% filter(negeri == "KUALA LUMPUR") %>% 
  group_by(negeri) %>%
  summarise(days_of_data = mean(days_of_data),
            avg_new = mean(avg_new),
            med_new = median(med_new),
            lo_new = min(lo_new),
            hi_new = max(hi_new)) %>%
  mutate(negeri = "WILAYAH PERSEKUTUAN",
         daerah = "KUALA LUMPUR") %>%
  select(daerah,negeri,days_of_data,avg_new,med_new,lo_new,hi_new) -> tmpkl

mys_cases %>% filter(negeri != "KUALA LUMPUR") %>% rbind(tmpkl) -> mys_cases

We redo the bubble plot in Figure 1.2.

Figure 1.4: New case summary by disrict and state

The combined districts (now designated as Kuala Lumpur) did not make the list (for districts with average new cases >= 100).

Figure 1.5: Map of all districts in Semenanjung

1.7 Combine with Covid case data

To plot the case data we group by daerah and created a tibble with the summary statistics of the data between 03/05/2021 and 14/06/2021 as in mys_cases. We filter out some of the negeri.

mys_cases %>% filter(!negeri %in% c("SABAH","SARAWAK",
                                    "WILAYAH PERSEKUTUAN LABUAN",
                                    "PUTRAJAYA")) %>%
  distinct() -> west_cases
west_cases

## # A tibble: 87 x 7
## # Groups:   daerah [87]
##    daerah        negeri       days_of_data avg_new med_new lo_new hi_new
##    <chr>         <chr>               <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
##  1 ALOR GAJAH    MELAKA                 49   51.9       41      5    197
##  2 BACHOK        KELANTAN               49   31.4       24      6    101
##  3 BALING        KEDAH                  49    9.86      10      0     23
##  4 BANDAR BAHARU KEDAH                  49    7.24       5      0     33
##  5 BARAT DAYA    PULAU PINANG           49   30.1       28      1     62
##  6 BATANG PADANG PERAK                  49   11.4        6      0     58
##  7 BATU PAHAT    JOHOR                  49   21.4       18      7     61
##  8 BENTONG       PAHANG                 49   14.8       13      1     43
##  9 BERA          PAHANG                 49    9.43       4      0     74
## 10 BESUT         TERENGGANU             49   39.1       40      0    104
## # ... with 77 more rows

We do a join on the case data. We can now fill the districts based on the new case statistics. But we limit to only similar daerah in both data sets.

# Convert daerah to upper case since we want to do a join
west_cases$daerah <- toupper(west_cases$daerah)
west_cases$negeri <- toupper(west_cases$negeri)
district_sf$daerah[district_sf$daerah == "ULU LANGAT"] <- "HULU LANGAT"
# limit to only similar daerah in both data sets
intersect(west_cases$daerah, district_sf$daerah) -> same_daerah
west_cases %>% filter (daerah %in% same_daerah) -> west_cases

Next we join west_cases with district_sf and just plot the highest and lowest incidence of new cases. We can easily repeat for the other statistics. Note that %in% returns a logical vector of TRUE and FALSE. To negate it, you can use ! in front of the logical statement

district_sf %>%
    filter(!negeri %in% c("SABAH","SARAWAK","WILAYAH PERSEKUTUAN LABUAN")) %>%
    left_join(west_cases, by = "daerah") -> west_sf

west_sf %>% 
    ggplot() +
    geom_sf(aes(geometry = geometry, fill = hi_new)) +
# Everything past this point is only formatting:
    scale_fill_gradient2(low = "green",
                         mid = "yellow",
                         high = "red",
                         midpoint = 0,
                         space = "Lab",
                         na.value = "grey80",
                         guide = "colourbar",
                         aesthetics = "fill") +
    theme_void() +
    labs(title = "Highest New Cases Semenanjung") -> p1

west_sf %>% 
    ggplot() +
    geom_sf(aes(geometry = geometry, fill = lo_new)) +
# Everything past this point is only formatting:
    scale_fill_gradient2(low = "green",
                         mid = "yellow",
                         high = "red",
                         midpoint = 0,
                         space = "Lab",
                         na.value = "grey80",
                         guide = "colourbar",
                         aesthetics = "fill") +
    theme_void() +
    labs(title = "Lowest New Cases Semenanjung") -> p2

cowplot::plot_grid(
     p1,
     p2,
     labels = c("A", "B"))

Figure 1.6: Map of new cases for districts in Malaysia

This is where we get the first indication that networks will be useful. Figure 1.6 shows that neighboring districts have cases similar to neighboring areas. This makes sense. If we have an outbreak in one place we expect it to spread to neighboring places.

Figure 1.6 also confirms the bubble plots in Figures 1.2 and 1.4. Only a few districts can be deemed as dangerous districts. So why the nationwide lockdown?

High case numbers spread from one area to neighboring areas. If we can define a network graph of the districts in Semenanjung we can further analyze the data based on the properties or features of a network graph.⁹

1.8 The Network

To define a network, it is enough to know what makes up the nodes, and how they join together (i.e. the edges). In our case, the nodes are just the districts. We know we want the edges to be such that two nodes are joined if the two districts they represent are immediate (first-degree) neighbors on our map. The poly2nb() function (read “polygon to neighbors”) from the {spdep} package helps us do just this. It can take a shape file and tell us which areas are neighbors.

west_sf %>% distinct(daerah, .keep_all = TRUE) -> west_sf
# Use the poly2nb to get the neighborhood of each district
west_district_neighbours <- spdep::poly2nb(west_sf)

We use the nb2mat() function (“neighbors to matrix”) to turn this into an adjacency matrix. An adjacency matrix is such that cell i, j is equal to 1 if there is an edge connecting node i to node j, and a 0 otherwise. Thus an adjacency matrix defines a network. We can use the graph_from_adjacency_matrix function from the {igraph} package to create a network graph. We have to declare the parameter zero.policy = TRUE to cater for empty neighbor sets.¹⁰

# Use nb2mat to turn this into an adjacency matrix
adj_mat <- spdep::nb2mat(west_district_neighbours,
                         style = "B",
                         zero.policy = TRUE)
rownames(adj_mat) <- west_sf$daerah
colnames(adj_mat) <- west_sf$daerah
west_g <- igraph::graph_from_adjacency_matrix(adj_mat, mode = "undirected")

The {tidygraph} package handles network graphs using standard {dplyr}. We convert west_g into a tidy graph object using as_tbl_graph().

west_g <- tidygraph::as_tbl_graph(west_g)
west_g

## # A tbl_graph: 90 nodes and 196 edges
## #
## # An undirected simple graph with 8 components
## #
## # Node Data: 90 x 1 (active)
##   name       
##   <chr>      
## 1 GUA MUSANG 
## 2 KOTA TINGGI
## 3 SEGAMAT    
## 4 BERA       
## 5 ROMPIN     
## 6 LIPIS      
## # ... with 84 more rows
## #
## # Edge Data: 196 x 2
##    from    to
##   <int> <int>
## 1     1     6
## 2     1     7
## 3     1     8
## # ... with 193 more rows

Our network has 90 nodes and 196 edges. We also see that it is defined by two tibbles, one for the node data, and one for the edge data.

The node data just contains the names of the districts, and the edge data tells us which nodes are connected based on their position in the node data. For example, the first row of the edge data tells us “GUA MUSANG” and “LIPIS” are neighbors.

We can visualize the network using {ggraph}, a network-specific extension to {ggplot2}.

Figure 1.7: Semenanjung district network

We must overlay this network on top of our map from Figure 1.5. This makes it easier to translate between the network and reality.

We first need to get central coordinates for each district. The shape file we downloaded does not come with a “clean” pair of latitude and longitude coordinates for each district. Unfortunately, the polygons themselves are under a different spatial projection.

We get some information about our specific shape file from west_sf.

str(west_sf)

## Classes 'sf' and 'data.frame':   90 obs. of  16 variables:
##  $ f_code      : chr  "FA001" "FA001" "FA001" "FA001" ...
##  $ coc         : chr  "MYS" "MYS" "MYS" "MYS" ...
##  $ negeri.x    : chr  "KELANTAN" "JOHOR" "JOHOR" "PAHANG" ...
##  $ daerah      : chr  "GUA MUSANG" "KOTA TINGGI" "SEGAMAT" "BERA" ...
##  $ pop         : int  -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 -99999999 ...
##  $ ypc         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ adm_code    : chr  "UNK" "UNK" "UNK" "UNK" ...
##  $ salb        : chr  "UNK" "UNK" "UNK" "UNK" ...
##  $ soc         : chr  "MYS" "MYS" "MYS" "MYS" ...
##  $ negeri.y    : chr  "KELANTAN" "JOHOR" "JOHOR" "PAHANG" ...
##  $ days_of_data: num  49 49 49 49 49 49 49 49 49 49 ...
##  $ avg_new     : num  6.04 24.92 21.27 9.43 1.98 ...
##  $ med_new     : num  2 21 9 4 1 3 7 5 6 6 ...
##  $ lo_new      : num  0 2 0 0 0 0 0 0 0 0 ...
##  $ hi_new      : num  36 84 291 74 18 55 64 92 72 80 ...
##  $ geometry    :sfc_POLYGON of length 90; first list element: List of 1
##   ..$ : num [1:7319, 1:2] 102 102 102 102 102 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:15] "f_code" "coc" "negeri.x" "daerah" ...

st_coordinates(st_centroid(west_sf$geometry)) %>% head(10)

##           X        Y
## 1  101.9594 4.947106
## 2  103.9484 1.763144
## 3  102.9634 2.484834
## 4  102.5586 3.168661
## 5  103.1134 2.872403
## 6  101.9135 4.348188
## 7  101.0947 4.833658
## 8  101.2978 5.435118
## 9  102.5519 4.258134
## 10 103.2244 4.278661

We follow the same steps as per the earlier work to extract the latitude and longitude and add it to west_sf using mutate.

west_sf <- west_sf %>%
    dplyr::mutate(lon = st_coordinates(st_centroid(west_sf$geometry))[,1],
                  lat = st_coordinates(st_centroid(west_sf$geometry))[,2])

We extract the coordinates and transform the projection.

coords <- west_sf %>% 
  as.data.frame() %>% 
  select(lon, lat, daerah)
coords %>% head(10)

##         lon      lat        daerah
## 1  101.9594 4.947106    GUA MUSANG
## 2  103.9484 1.763144   KOTA TINGGI
## 3  102.9634 2.484834       SEGAMAT
## 4  102.5586 3.168661          BERA
## 5  103.1134 2.872403        ROMPIN
## 6  101.9135 4.348188         LIPIS
## 7  101.0947 4.833658 KUALA KANGSAR
## 8  101.2978 5.435118    HULU PERAK
## 9  102.5519 4.258134      JERANTUT
## 10 103.2244 4.278661       KEMAMAN

This makes it easy to join the coordinates onto our network object. We essentially want to left_join onto our nodes tibble.

# Join on coordinates and sf geometry
west_g <- west_g %>% 
  activate("nodes") %>% 
  left_join(coords, by = c("name" = "daerah"))

We can use as_tibble() to extract the activated tibble (i.e. either nodes or edges). We see the coordinates have been successfully joined.

west_g %>% 
  as_tibble() %>% 
  head(10)

## # A tibble: 10 x 3
##    name            lon   lat
##    <chr>         <dbl> <dbl>
##  1 GUA MUSANG     102.  4.95
##  2 KOTA TINGGI    104.  1.76
##  3 SEGAMAT        103.  2.48
##  4 BERA           103.  3.17
##  5 ROMPIN         103.  2.87
##  6 LIPIS          102.  4.35
##  7 KUALA KANGSAR  101.  4.83
##  8 HULU PERAK     101.  5.44
##  9 JERANTUT       103.  4.26
## 10 KEMAMAN        103.  4.28

Now we just create a ggraph plot, where we use the district shape file for a geom_sf layer, and then overlay our network.

ggraph(west_g, layout = "manual", x = lon, y = lat)  +
  geom_sf(data = west_sf, fill = "white") +
  theme_void() +
  geom_edge_link(colour = 'black', width = 1) + 
  geom_node_point(size = 2, colour = 'darkred') +
  geom_node_text(aes(label = name), size = 1.5, repel = TRUE) +
  theme_void() +
  labs(title = "Network plot of districts in Semenanjung",
       subtitle = "Each node is a district, and an edge joins two 
       nodes if they are neighbors")

Figure 1.8: Semenanjung district network on map

The network is exactly what we want it to be; every node is a district, and an edge connects two nodes if the districts are neighbors.

Next, we will see how the network graph allows us to create powerful parameters quickly, and then use these new parameters to have better insights into our Covid data.

1.9 Network parameters

Kindly refer to the earlier work for the definitions of the network parameters and the code used to generate the parameters and the plots.

We start with our network object west_g. We activate the nodes, so it knows the following mutate is to be applied to the underlying node data. We use mutate to define our new parameters to be the corresponding {tidygraph} function applied to west_g. These functions calculate some measures that the define the characteristics of nodes in a network graph.

sel_features <- west_g %>%
    activate(nodes) %>%
    mutate(
        degree = centrality_degree(),
        betweenness = centrality_betweenness(),
        closeness = centrality_closeness(normalized = T),
        pg_rank = centrality_pagerank(),
        eigen = centrality_eigen(),
        coreness = node_coreness(),
        center = node_is_center(),
        dist_to_center = node_distance_to(node_is_center())
    )

node_measures <- sel_features %>% as_tibble()

1.9.1 Which district is the most important influencer?

We ask some questions about which is the “most important” node but what does “most important” mean? It of course depends on the definition and this is where network centrality measures come into play. We will look at four of these.

node_measures %>% arrange(desc(degree)) %>% head(10)

## # A tibble: 10 x 11
##    name    lon   lat degree betweenness closeness pg_rank eigen coreness center
##    <chr> <dbl> <dbl>  <dbl>       <dbl>     <dbl>   <dbl> <dbl>    <dbl> <lgl> 
##  1 GUA ~  102.  4.95      9       1168.    0.0893  0.0193 0.982        3 FALSE 
##  2 SEGA~  103.  2.48      9        588.    0.0828  0.0205 0.543        3 FALSE 
##  3 JERA~  103.  4.26      9       1235.    0.0901  0.0197 1            3 FALSE 
##  4 HULU~  103.  5.05      8        477.    0.0874  0.0190 0.665        3 FALSE 
##  5 BATA~  101.  4.04      8        262.    0.0854  0.0172 0.823        3 FALSE 
##  6 KULIM  101.  5.40      8        355.    0.0815  0.0191 0.307        3 FALSE 
##  7 BERA   103.  3.17      7        630.    0.0866  0.0151 0.757        3 FALSE 
##  8 HULU~  101.  5.44      7        985.    0.0860  0.0157 0.555        3 FALSE 
##  9 BENT~  102.  3.38      7        466.    0.0865  0.0148 0.866        3 FALSE 
## 10 TAMP~  102.  2.55      7        202.    0.0788  0.0162 0.408        3 FALSE 
## # ... with 1 more variable: dist_to_center <dbl>

Please refer to the earlier work for the codes to produce the following plots and a brief explanation of the centrality measures. We will only show the example for the degree centrality measure. Also

1.9.2 Degree network

sel_features %>% 
    activate("nodes") %>% 
    ggraph(layout = "manual", x = lon, y = lat)  +
    geom_sf(data = west_sf, fill = "white", color = "grey") +
    theme_void() +
    geom_edge_link(colour = 'black', width = 1) + 
    geom_node_point(aes(color = degree), size = 2) +
    geom_node_text(aes(label = name), size = 1.5, repel = TRUE) +
    scale_color_viridis_c() +
    ggtitle("District network for Semenanjung") +
    guides(color=guide_legend(title="Degree", reverse = TRUE)) -> p3
p3

Figure 1.9: Degree centrality of Semenanjung district network

1.9.3 Betweenness network

Figure 1.10: Betweenness centrality of Semenanjung district network

1.9.4 Closeness network

Figure 1.11: Closeness centrality of Semenanjung district network

1.9.5 Eigenvector network

Figure 1.12: Eigenvector centrality of Semenanjung district network

Eigenvector Centrality can denote Connections. Is a district connected to other “well-connected” districts? Petaling is “well-connected”. What other districts are “well-connected” to Petaling.

As we have seen, there is more than one definition of “most important”. It will depend on the context (and the available information) which one to choose.

When we compare Figures 1.9, 1.10, 1.11, and 1.12 with Figures 1.4 and 1.6 on the Covid cases per district, the districts with the higher centrality measures have lower new cases and are quite far from Petaling, the district with the highest cases. This strengthens our doubt for the need of a nation wide lockdown. In a later section we will show how many network hops does it take to reach out from Petaling to the other districts.

One network parameter we calculated is center = node_is_center(). Which districts can take the role of a “CENTER”?

node_measures %>% filter(center == "TRUE") %>% 
  arrange(-eigen) %>%
  select(-lon, -lat)

## # A tibble: 6 x 9
##   name  degree betweenness closeness pg_rank eigen coreness center
##   <chr>  <dbl>       <dbl>     <dbl>   <dbl> <dbl>    <dbl> <lgl> 
## 1 LANG~      0           0    0.0111 0.00177     0        0 TRUE  
## 2 PULA~      0           0    0.0111 0.00177     0        0 TRUE  
## 3 PULA~      0           0    0.0111 0.00177     0        0 TRUE  
## 4 PULA~      0           0    0.0111 0.00177     0        0 TRUE  
## 5 ALOR~      0           0    0.0111 0.00177     0        0 TRUE  
## 6 PULA~      0           0    0.0111 0.00177     0        0 TRUE  
## # ... with 1 more variable: dist_to_center <dbl>

These are the central districts for the whole of Semananjung Malaysia and these are also the districts with the highest cases as per Figure 1.4. So to contain the spread of Covid in Semenanjung, the network graph analysis has identified the districts that need to be controlled using more stringent controls compared to the remaining districts.

We should learn a lot more when we populate our sel_features network with Covid data.

1.10 Features based on neighbors

We now want to generate new parameters based on what is happening in neighboring districts, for example, by combining the number of new cases with a particular network parameter like degree.

We left_join from the cases data to our network graph.

west_cases_network <- sel_features %>%  
  activate("nodes") %>% 
  left_join(west_cases, 
            by = c("name" = "daerah"))

What we want here is quite simple, and the most efficient way of calculating it is to use an adjacency matrix. We have presented the details earlier.

adj_matrix <- igraph::get.adjacency(west_cases_network)

west_cases_network <- west_cases_network %>% 
  mutate(cases_in_neighbourhood = 
             as.vector(adj_matrix %*% .N()$hi_new),
         cases_average_degree =     
             cases_in_neighbourhood/degree)

We apply mutate to our tidygraph object. We get our vector of cases by using .N() which accesses the node data underlying the network and then using $hi_new to take the vector of new cases. We multiply our adjacency matrix to this using matrix multiplication, which in R is denoted by %*%. Finally, we wrap the result in as.vector() so mutate knows to treat this as a vector. We calculate the average number of cases by dividing by the degree of the node.

Figure 1.13: Cases of district network of neighbors

Figure 1.14: Cases of district network of neighbors

1.11 Network distance from selected district

Starting from any node (district or daerah in our case), what is its distance to the other nodes in the network? The answer is 1 for the immediate neighbors. igraph has a shortest.path function to calculate the shortest distance (number of hops) between any 2 nodes in a network graph. We will combine west_g as shown in Figure 1.7 and west_sf as shown in Figure 1.8.

sp_mat <- shortest.paths(west_g)
sp_mat2 <- as.data.frame(sp_mat)
names(sp_mat2) %>% head(10)

##  [1] "GUA MUSANG"    "KOTA TINGGI"   "SEGAMAT"       "BERA"         
##  [5] "ROMPIN"        "LIPIS"         "KUALA KANGSAR" "HULU PERAK"   
##  [9] "JERANTUT"      "KEMAMAN"

cbind(west_sf, sp_mat2) -> west_sf

The sample names from the shortest path matrix sp_mat correctly reflect the daerah. We use cbind to combine the 2 datasets.

ggplot2 now directly supports spatial data through geom_sf format. We have customized it a bit by removing elements which are not required and added a background.¹¹ The final map is shown below in Figure 1.15. We show the case for Petaling, the hottest district.

ggplot(west_sf) +
     geom_sf(aes(fill = PETALING)) +
     geom_sf(fill="blue", alpha =ifelse(west_sf$daerah == "PETALING", 1, 0)) +
     scale_fill_gradient2(low="darkred", mid="darksalmon", 
                          high="cornsilk", midpoint = 5) +
     labs(title="Shortest path network",
          fill="Distance from \nPetaling") +
     theme(
         axis.line = element_blank(),
         axis.text.x = element_blank(),
         axis.text.y = element_blank(),
         axis.ticks = element_blank(),
         axis.title.x = element_blank(),
         axis.title.y = element_blank(),
         panel.grid.minor = element_blank(),
         panel.grid.major = element_blank(),
         plot.background = element_rect(fill = "#f5f5f2", color = NA), 
         panel.background = element_rect(fill = "#f5f5f2", color = NA), 
         legend.background = element_rect(fill = "#f5f5f2", color = NA),
         panel.border = element_blank()) -> p6
p6

Figure 1.15: Distance from selected district

Langkawi and Penang islands in gray are physically isolated from Petaling. They also have relatively low cases. The national lockdown will hurt them for no good reason. Only districts a “short distance” away from Petaling (perhaps only two hops) need more stringent physical access restrictions.

1.12 Discussion

Figure 1.16: Combined plots

Figure 1.16 combines some of the earlier figures. It shows that when we combine our sample Semenanjung Covid case data with the graph network parameters, we get a richer analysis. Firstly, we just have the highest recorded cases within the period that we collected the data for each district. We focus on the simplest network centrality parameter, the degree. By just incorporating the concept of node neighbors and degree we can get more insights.

Obviously, Petaling is the most serious district for Covid cases in Semenanjung. It is a CENTER district in the network. So are GOMBAK, KUALA SELANGOR, and KLANG. These three districts also have high Covid cases. The plot clearly tells us that the priority strategy is to “isolate and treat” these districts. Thus, “limiting access to and from” these districts should be an effective NPI (non-pharmaceutical intervention). Vaccination must be prioritized for them too.

Sabak Bernam is a district close to the CENTER (just 2 hops away from Petaling) but offers a different scenario. It is low on all measures and also not “influenced” by its neighbors. It is a good candidate for a Randomized Control Trial (RCT) where vaccination is accelerated, access is controlled but the restrictions on social distancing within the district are being relaxed. This can allow a gradual opening of businesses, secondary schools, and mosques. There are many other districts like Sabak Bernam as can be seen from Figure 1.16.

Kuala Selangor and Kuala Langat highlight the value of networks. They appear “safe” on their own but not so when the neighbors are considered. Kuala Selangor has a relatively high betweenness, closeness, and eigenvalue measure.

There are many other network graph parameters¹² like Eigenvectors which measures influence and Closeness which measures the ease of reaching other nodes. Incorporating these in the analysis will certainly give other insights.

The analysis for Semenanjung Malaysia can be repeated for Sabah, Sarawak, and Labuan. But no district from these 3 states has anomalous high cases as shown in Figure 1.2 and Figure 1.4. Furthermore, they are physically not connected to any of the “hot” districts in Semanjung.

1.13 Conclusion

Networks are real in our world. Physical networks like the Internet, electricity grids, mobile networks, social networks, and virus networks are all real examples that have been extensively studied using network graphs.¹³ Any analysis of Covid data that does not incorporate network graphs will certainly miss out on important and rich insights.

This posting has illustrated how to combine Covid case data with a physical geographical network. The various plots exhibited show the difference and added value when the network parameters are incorporated in the analysis.

Figure 1.16 really challenges the wisdom of the third total lockdown for Malaysia. Was the decision guided by data? The “hottest” districts are only 8 out of 90 in Semenanjung. They can be “isolated and treated” with more stringent controls without involving the other districts as shown by the shortest path network.

Visualizing Covid Data With Network Parameters For Malaysia

Argument against a national lock-down

Azman Hussin

2021-06-28

1