1 Introduction

In our previous two posts, using Covid data from the districts in Selangor[1] and then Semenanjung Malaysia (Peninsular Malaysia)[2], we stressed the importance of the vaccination program. The vaccination data has since been published[3].

This blog post shows examples of data visualization of the vaccination data, including the use of vector maps[4]. In the later sections we also combine the Covid case data[1,2] with the vaccination data in the visualizations. This involves some data wrangling to align the vaccination and Covid case data with the vector map data frame.

Some of the visuals raise questions about the vaccination strategy adopted by the government. In addition to the slow vaccination rate, we do not see a clear approach to using the vaccination program as part of a larger exit strategy from the current national lockdown.

2 Supporting libraries

library(tidyverse)  # attaches ggplot2, dplyr, tidyr and readr
library(sf)         # read and plot the shapefile
library(maps)
library(patchwork)
library(leaflet)
# cowplot must also be installed for cowplot::plot_grid() used below

3 Data Loading and Preparation

We use the vaccination data published by CITF-Malaysia[5] and do some data cleaning to make it usable for graphing. For those familiar with the tidyverse in R: most of its functions, ggplot2 included, work best with data in long format.

3.1 Malaysia vaccine data

mys_vax <- read.csv("https://raw.githubusercontent.com/CITF-Malaysia/citf-public/main/vaccination/vax_state.csv")
head(mys_vax)
##         date           state dose1_daily dose2_daily total_daily dose1_cumul
## 1 2021-02-24           Johor           0           0           0           0
## 2 2021-02-24           Kedah           0           0           0           0
## 3 2021-02-24        Kelantan           0           0           0           0
## 4 2021-02-24          Melaka           0           0           0           0
## 5 2021-02-24 Negeri Sembilan           0           0           0           0
## 6 2021-02-24          Pahang           0           0           0           0
##   dose2_cumul total_cumul
## 1           0           0
## 2           0           0
## 3           0           0
## 4           0           0
## 5           0           0
## 6           0           0

3.2 Prepare data

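mys_vax %>%
     dplyr::select(date,state,total_daily,
                   dose1_cumul,dose2_cumul,total_cumul) %>%
     # stack columns 3:5 (total_daily, dose1_cumul, dose2_cumul) into
     # DoseType/Measure pairs; total_cumul stays as an identifier column
     gather(DoseType, Measure, 3:5) -> df1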
head(df1)
##         date           state total_cumul    DoseType Measure
## 1 2021-02-24           Johor           0 total_daily       0
## 2 2021-02-24           Kedah           0 total_daily       0
## 3 2021-02-24        Kelantan           0 total_daily       0
## 4 2021-02-24          Melaka           0 total_daily       0
## 5 2021-02-24 Negeri Sembilan           0 total_daily       0
## 6 2021-02-24          Pahang           0 total_daily       0
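To make the reshape concrete, here is a minimal base-R sketch of what gather() does with columns 3:5: the value columns are stacked one after another, and each identifier row is repeated once per stacked column. The data frame and its values are hypothetical, chosen only for illustration. (In current tidyr, gather() is superseded by pivot_longer(), which returns the same rows but ordered row by row rather than column by column.)

```r
# Hypothetical wide data: one row per state per date (values are made up)
wide <- data.frame(
  date        = c("2021-07-01", "2021-07-01"),
  state       = c("Perlis", "Sarawak"),
  total_daily = c(100, 2000),
  dose1_cumul = c(5000, 90000),
  dose2_cumul = c(2000, 30000)
)

value_cols <- c("total_daily", "dose1_cumul", "dose2_cumul")

# Stack the value columns one after another, like gather(DoseType, Measure, ...):
# the id columns are repeated once per stacked value column
long <- data.frame(
  date     = rep(wide$date,  times = length(value_cols)),
  state    = rep(wide$state, times = length(value_cols)),
  DoseType = rep(value_cols, each = nrow(wide)),
  Measure  = unlist(wide[value_cols], use.names = FALSE)
)
print(long)
```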

3.3 Initial point plot

The daily and cumulative vaccination counts in df1 are shown in Figure 3.1.

Figure 3.1: Point plot of Malaysia vaccination data


Figure 3.2: Point plot of vaccination data faceted by state

Figure 3.1 and Figure 3.2 show that Selangor and Sarawak have ramped up the daily number of vaccinations. A simple point or line plot does not give the same insight as a map. Before mapping, we compute summary statistics per state and focus our analysis on these new measures.

# Convert state to upper case since we want to do a join later
mys_vax$state <- toupper(mys_vax$state)

mys_vax %>% 
  group_by(state) %>% 
  summarize(days_of_data = n(),
            avg_daily = mean(total_daily),
            med_daily = median(total_daily),
            min_daily = min(total_daily),
            max_daily = max(total_daily),
            dose1 = max(dose1_cumul),
            dose2 = max(dose2_cumul),
            total = max(total_cumul)) -> vaxsum
We do a bubble plot of the summary data.

Figure 3.3: Vaccination summary by state

Figure 3.3 shows that Sarawak, Selangor and Kuala Lumpur lead in both the average and highest numbers of vaccinations per day. They also lead in the total vaccinations.
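For readers who want to see the per-state summary without dplyr, here is a base-R sketch using aggregate() on a small hypothetical data frame. Taking max() of a cumulative column gives its latest value, assuming the cumulative counts never decrease (no downward corrections in the data).

```r
# Hypothetical daily data for two states
vax <- data.frame(
  state       = c("PERLIS", "PERLIS", "PERLIS", "SARAWAK", "SARAWAK", "SARAWAK"),
  total_daily = c(10, 30, 20, 100, 400, 300),
  total_cumul = c(10, 40, 60, 100, 500, 800)
)

# One summary function applied per state; returns a named numeric vector
summarise_state <- function(x) {
  c(days = length(x), avg = mean(x), med = median(x), min = min(x), max = max(x))
}

daily_stats <- aggregate(total_daily ~ state, data = vax, FUN = summarise_state)
# aggregate() stores the vector results as a matrix column; flatten it
daily_stats <- do.call(data.frame, daily_stats)

# Cumulative totals: the maximum equals the latest value if counts never decrease
totals <- aggregate(total_cumul ~ state, data = vax, FUN = max)

vaxsum_toy <- merge(daily_stats, totals, by = "state")
print(vaxsum_toy)
```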

3.4 Create and plot maps of states

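To plot a map, we download the shapefile of Malaysian states (administrative level 1)[6]. Download and unzip the whole folder, since the .shp file needs its companion files (such as .shx, .dbf and .prj), then load it with sf::st_read().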

# Adjust the path to wherever the shapefile was unzipped
state_sf <- sf::st_read("F:/RProjects/Covid19/MYS_adm1.shp")
## Reading layer `MYS_adm1' from data source `F:\RProjects\Covid19\MYS_adm1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 13 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 99.64072 ymin: 0.855001 xmax: 119.2697 ymax: 7.380556
## Geodetic CRS:  WGS 84
# Rename NAME_1 (the state name) to state, to match vaxsum
state_sf %>% rename(state = NAME_1) -> state_sf

Figure 3.4: Map of all states in Malaysia

Comparing Figure 3.3 with Figure 3.4 shows that Kuala Lumpur and Putrajaya are not separate features in the map, so we merge their vaccination statistics into Selangor. We also have to correct some of the state names.

vaxsum %>% 
  filter(state %in% c("SELANGOR", "W.P. KUALA LUMPUR", "W.P. PUTRAJAYA")) %>% 
  summarize(days_of_data = mean(days_of_data),
            # daily averages add up across regions that cover the same dates
            avg_daily = sum(avg_daily),
            # the next three are only approximations for the combined region
            med_daily = median(med_daily),
            min_daily = min(min_daily),
            max_daily = max(max_daily),
            dose1 = sum(dose1),
            dose2 = sum(dose2),
            total = sum(total)) %>% 
  mutate(state = "SELANGOR") %>% 
  select(state, everything()) -> tmpkl

vaxsum %>% 
  filter(!state %in% c("SELANGOR", "W.P. KUALA LUMPUR", "W.P. PUTRAJAYA")) %>% 
  rbind(tmpkl) -> vaxsum
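Merging per-state summaries can only approximate some statistics of the combined region. When every region covers the same dates, the combined daily average is the sum of the per-region averages, but the combined median, minimum and maximum cannot be recovered from the summaries alone. One alternative, sketched below in base R on hypothetical numbers, is to relabel the federal territories as Selangor in the daily data, add up the daily counts per date, and only then summarise.

```r
# Hypothetical daily totals for Selangor and the two federal territories
daily <- data.frame(
  date        = rep(c("2021-07-01", "2021-07-02"), times = 3),
  state       = rep(c("SELANGOR", "W.P. KUALA LUMPUR", "W.P. PUTRAJAYA"), each = 2),
  total_daily = c(100, 200, 50, 60, 5, 10),
  stringsAsFactors = FALSE
)

# Relabel the territories so they fall under the Selangor polygon
merged_names <- c("W.P. KUALA LUMPUR", "W.P. PUTRAJAYA")
daily$state[daily$state %in% merged_names] <- "SELANGOR"

# Sum the three regions per date: one combined daily total per date
combined <- aggregate(total_daily ~ state + date, data = daily, FUN = sum)

# Statistics of the combined series are now exact, not approximations
avg_daily <- mean(combined$total_daily)
max_daily <- max(combined$total_daily)
print(combined)
```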

Next we join the vaccination summary to the map data, so that each state can be filled according to its vaccination statistics. We keep only the states that appear in both data sets, and we also add population data for each state[7].

# Convert state names to upper case so the join keys match
vaxsum$state <- toupper(vaxsum$state)
state_sf$state <- toupper(state_sf$state)
# The shapefile uses the older spelling "Trengganu"
state_sf$state[state_sf$state == "TRENGGANU"] <- "TERENGGANU"
# Keep only the states present in both data sets
intersect(vaxsum$state, state_sf$state) -> same_state
vaxsum %>% filter(state %in% same_state) -> vaxsum
# Keep only the state name (column 5) and the geometry
state_sf %>% select(-c(1:4, 6:9)) -> state_sf
# join
state_sf %>%
    left_join(vaxsum, by = "state") -> state_sf

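# State population in thousands[7]; the values are assigned by position,
# so they must follow the row order of state_sf
popn <- c(3776.6, 2193.9, 1904.9, 936.9, 1135.9, 1682.2, 2518.6, 
          255, 1783.6, 3907.5, 2828.7, 8452.3, 1259)
state_sf %>% mutate(population = popn) -> state_sf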
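Because popn is matched to rows by position, a change in the row order of state_sf would silently give a state the wrong population. A name-keyed lookup avoids that. The sketch below uses base R with hypothetical dose counts (the Perlis and Sarawak populations are taken from popn, which appears to be in thousands). It also shows the kind of percentage calculation behind a map like Figure 3.6, assuming the percentage is simply doses divided by population; the post does not show that code.

```r
# Population in thousands, keyed by state name (two entries from popn above)
popn_by_state <- c(PERLIS = 255, SARAWAK = 2828.7)

# Hypothetical stand-in for state_sf after the join (dose counts are made up)
states <- data.frame(
  state = c("SARAWAK", "PERLIS"),
  dose1 = c(1000000, 50000),
  dose2 = c(400000, 20000),
  stringsAsFactors = FALSE
)

# Look up by name, so row order does not matter; unknown names would give NA
states$population <- unname(popn_by_state[states$state])

# Convert thousands to persons before taking the percentage
states$pct_dose1 <- 100 * states$dose1 / (states$population * 1000)
states$pct_dose2 <- 100 * states$dose2 / (states$population * 1000)
print(states)
```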

With vaxsum joined to state_sf, we plot the average daily vaccinations and the total vaccinations to date; the other statistics can be plotted the same way. A note on the filter() calls above: %in% returns a logical vector of TRUE and FALSE, and putting ! in front of the test negates it, which is how the merged states were excluded from vaxsum.
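A quick illustration on hypothetical vectors, adding setdiff() to list the names that intersect() would drop. The TRENGGANU/TERENGGANU pair mirrors the spelling fix made above.

```r
map_states <- c("JOHOR", "SELANGOR", "TRENGGANU")   # spelling as in the shapefile
vax_states <- c("JOHOR", "SELANGOR", "TERENGGANU")  # spelling as in the vaccination data

in_map     <- vax_states %in% map_states   # TRUE TRUE FALSE
not_in_map <- !in_map                      # FALSE FALSE TRUE

# Names that intersect() would silently drop: handy for catching spelling mismatches
dropped <- setdiff(vax_states, map_states) # "TERENGGANU"
print(dropped)
```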

state_sf %>% 
    ggplot() +
    geom_sf(aes(geometry = geometry, fill = avg_daily)) +
# Everything past this point is only formatting:
    scale_fill_gradient2(low = "green",
                         mid = "yellow",
                         high = "red",
                         midpoint = 0,  # values are all positive, so only the yellow-to-red half is used
                         space = "Lab",
                         na.value = "grey80",
                         guide = "colourbar",
                         aesthetics = "fill") +
    theme_void() +
    labs(title = "Average daily vaccinations") -> p1

state_sf %>% 
    ggplot() +
    geom_sf(aes(geometry = geometry, fill = total)) +
# Everything past this point is only formatting:
    scale_fill_gradient2(low = "green",
                         mid = "yellow",
                         high = "red",
                         midpoint = 0,  # values are all positive, so only the yellow-to-red half is used
                         space = "Lab",
                         na.value = "grey80",
                         guide = "colourbar",
                         aesthetics = "fill") +
    theme_void() +
    labs(title = "Total vaccinations to date") -> p2

cowplot::plot_grid(
     p1,
     p2,
     ncol = 1)

Figure 3.5: Vaccination of states in Malaysia

Figure 3.5 reconfirms Sarawak and Selangor as the two states where vaccination is most advanced in absolute numbers. However, Figure 3.8 shows that this does not match the number of Covid cases per state, which raises questions about the planning of the Malaysian vaccination program.

Next we plot the population of each state and the percentage of its population vaccinated.

Figure 3.6: Map of population and percent vaccinated per state

More than 30% of Sarawak's population has received the first dose. This is still very far from the 70% required for herd immunity. About 15% of Perlis's population has received the second dose, and the state also has few cases, as shown in Figure 3.8. Should it be the first state to exit the national lockdown? (Vaccination data as of 2021-07-06.)

3.5 Integrate with Covid case data

We now integrate the Covid-19 case data[8].

3.5.1 Loading the data

The case data are loaded into the tibble mys_cases:

## # A tibble: 7,512 x 7
##    daerah     Date         New `14 Days` Active Total negeri
##    <chr>      <date>     <dbl>     <dbl>  <dbl> <dbl> <chr> 
##  1 Alor Gajah 2021-04-26    22        77     NA    NA Melaka
##  2 Alor Gajah 2021-04-27     5        85     NA    NA Melaka
##  3 Alor Gajah 2021-05-03    12        NA     NA    NA Melaka
##  4 Alor Gajah 2021-05-04    16       184     NA    NA Melaka
##  5 Alor Gajah 2021-05-05    19       166     NA    NA Melaka
##  6 Alor Gajah 2021-05-06    30       188     NA    NA Melaka
##  7 Alor Gajah 2021-05-07    20       203     NA    NA Melaka
##  8 Alor Gajah 2021-05-08    39       235     NA    NA Melaka
##  9 Alor Gajah 2021-05-09    30       243     NA    NA Melaka
## 10 Alor Gajah 2021-05-10    39       274     NA    NA Melaka
## # ... with 7,502 more rows

The columns are:

  • daerah : district, the Local Authority District (LAD)
  • Date : date of the report
  • New : number of new cases
  • 14 Days : number of new cases over the last 14 days
  • Active : active cases
  • Total : total cases
  • negeri : state
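The 14 Days column comes with the source data, and this post does not document how it was computed there. As an illustration only, a trailing 14-day count can be computed by date rather than by row position, so that gaps in reporting (such as the jump from 2021-04-27 to 2021-05-03 for Alor Gajah above) do not stretch the window. A base-R sketch on hypothetical numbers:

```r
# Hypothetical daily new cases for one district, with gaps in reporting
cases <- data.frame(
  Date = as.Date(c("2021-05-01", "2021-05-02", "2021-05-10", "2021-05-20")),
  New  = c(10, 20, 30, 40)
)

# For each row, sum New over the 14 days ending on that date (inclusive).
# Comparing dates, not row positions, keeps the window at 14 calendar days.
cases$last14 <- vapply(
  seq_len(nrow(cases)),
  function(i) {
    in_window <- cases$Date > cases$Date[i] - 14 & cases$Date <= cases$Date[i]
    sum(cases$New[in_window])
  },
  numeric(1)
)
print(cases)
```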

As with the vaccination data, we compute summary statistics per state and focus our analysis on these new measures.

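# Convert district and state names to upper case for the join later
mys_cases$daerah <- toupper(mys_cases$daerah)
mys_cases$negeri <- toupper(mys_cases$negeri)
# mys_cases has one row per district per date, so the statistics below
# are per district-day within each state
mys_cases %>% 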
  group_by(negeri) %>% 
  summarize(avg_new = mean(New),
            med_new = median(New),
            lo_new = min(New),
            hi_new = max(New)) -> state_cases
We do a bubble plot of the summary data.

Figure 3.7: New daily case summary by state

Selangor has by far the highest number of new cases per day. It is interesting to see where Sarawak, one of the leaders in vaccinations, stands on the Covid cases.
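Because mys_cases has one row per district per date, mean(New) inside group_by(negeri) averages over district-days. If a state-level daily figure is wanted instead, the district counts can first be summed per state and date. A base-R sketch on hypothetical numbers contrasting the two:

```r
# Hypothetical district-level rows: two districts in one state, two dates
district_cases <- data.frame(
  negeri = "MELAKA",
  daerah = c("ALOR GAJAH", "MELAKA TENGAH", "ALOR GAJAH", "MELAKA TENGAH"),
  Date   = as.Date(c("2021-05-04", "2021-05-04", "2021-05-05", "2021-05-05")),
  New    = c(10, 30, 20, 40),
  stringsAsFactors = FALSE
)

# Mean over district-days (what mean(New) within group_by(negeri) gives)
per_district_day <- tapply(district_cases$New, district_cases$negeri, mean)

# Sum the districts per state and date first, then average over dates
state_daily   <- aggregate(New ~ negeri + Date, data = district_cases, FUN = sum)
per_state_day <- tapply(state_daily$New, state_daily$negeri, mean)

print(per_district_day)  # average per district-day
print(per_state_day)     # average per state-day
```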

Next we combine some of the case data to match the states defined in our map, Figure 3.4.

state_cases %>% 
  filter(negeri %in% c("SELANGOR", "KUALA LUMPUR", "PUTRAJAYA")) %>% 
  summarize(avg_new = mean(avg_new),
            med_new = median(med_new),
            lo_new = min(lo_new),
            hi_new = max(hi_new)) %>% 
  mutate(negeri = "SELANGOR") %>% 
  select(negeri, everything()) -> tmpkl

state_cases %>% 
  filter(!negeri %in% c("SELANGOR", "KUALA LUMPUR", "PUTRAJAYA", "LABUAN")) %>% 
  rbind(tmpkl) -> state_cases

We join state_cases to state_sf.

state_sf %>%
    left_join(state_cases, by=c("state" = "negeri")) -> state_sf
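left_join() with by = c("state" = "negeri") keeps every row of state_sf and fills states with no matching case data with NA. The base-R counterpart is merge() with all.x = TRUE, shown here on hypothetical data frames; note that merge() sorts the result by the key by default, whereas left_join() keeps the row order of its first argument.

```r
map_df   <- data.frame(state = c("SELANGOR", "PERLIS", "TERENGGANU"),
                       stringsAsFactors = FALSE)
cases_df <- data.frame(negeri = c("SELANGOR", "PERLIS"), avg_new = c(500, 2),
                       stringsAsFactors = FALSE)

# Keep every map row; states without case data get NA
joined <- merge(map_df, cases_df, by.x = "state", by.y = "negeri", all.x = TRUE)
print(joined)
```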

Figure 3.8: Covid cases and vaccination per state

Figure 3.8 raises some important issues and questions.

  1. Melaka (Malacca) has the highest average number of daily cases relative to its population, while Selangor looks unremarkable on this ratio. This ratio is rarely highlighted.
  2. Although the vaccination rate is high in Selangor, it is not enough given its population and the "hotness" of the Covid situation in the state. Much more vaccination is needed in Selangor. As the richest state, it should be allowed to roll out its own vaccination program that is not constrained by the federal program.
  3. Sabah has a high vaccination rate relative to its low case numbers. More granular data at the district and mukim levels may give a better picture. The same applies to Sarawak.
  4. Perlis is a good candidate to start a phased exit from the current national lockdown. Perhaps an RCT (Randomized Controlled Trial) could be piloted there.

However, the data do not show a clear strategy for the vaccination program, except perhaps for prioritizing categories of people such as frontliners and senior citizens. Again, we need more granular data to analyze this.

4 Conclusion

This posting has illustrated how to combine the Malaysian Covid case and vaccination data and visualize some combinations of the data through vector maps. The plots show the difference, and the added value, when ratios such as vaccinations or cases per population are incorporated in the analysis.

Figure 3.6 and Figure 3.8 raise real questions about the Malaysian vaccination program. What are its objectives? Is it to achieve herd immunity for the nation? At less than 15% vaccinated, we are not doing well as of today. Is it to exit from the lockdown state by state? Is it to control the rise in infections in the most problematic state, Selangor? The initial data shows a confusing picture.

Vaccination is a key intervention being considered for moving a country toward normalcy[9]. Except for the supply of vaccines (which is improving), everything else related to vaccination is in our own hands. But we must administer the vaccines with a clear purpose, as part of an exit strategy out of Covid. Right now, the data does not show this. We need more granular data to answer some of the issues and questions raised.

5 References


  1. https://rpubs.com/azmanH/782998

  2. https://rpubs.com/azmanH/786112

  3. https://github.com/CITF-Malaysia/citf-public

  4. Geocomputation with R, https://geocompr.robinlovelace.net/index.html

  5. https://raw.githubusercontent.com/CITF-Malaysia/citf-public/9739fc10f85f2fbab06d98e9ce4ab43456c63fa3/vaccination/vax_state.csv

  6. https://www.diva-gis.org/gdata

  7. https://www.statista.com/statistics/1040670/malaysia-population-distribution-by-state/

  8. https://www.pascacovid.my/data/sources_of_data.pdf

  9. https://www.nature.com/articles/d41586-021-00728-2