In our previous two posts, using Covid data from the districts in Selangor[1] and then Semenanjung Malaysia,[2] we stressed the importance of the vaccination program. The vaccination data has since been published.[3]
This blog post shows examples of data visualization of the vaccination data, including the use of vector maps.[4] In the later sections we also combine the Covid case data[1,2] into the visualization. This involves some data wrangling to align the vaccination and Covid case data with the vector map data frame.
Some of the visuals raise questions about the vaccination strategy adopted by the government. Besides the slow vaccination rate, we do not see a clear approach to using the vaccination program as part of a larger exit strategy from the current national lockdown.
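library(ggplot2)
library(dplyr)
library(tidyverse)  # also attaches ggplot2, dplyr, tidyr and readr
library(tidyr)
library(readr)
library(sf)         # shapefiles and geom_sf() maps
library(maps)
library(patchwork)
library(leaflet)
# cowplot is called below as cowplot::plot_grid(), so it must be installed too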
We use the vaccination data published by CITF-Malaysia on GitHub.[5] Some data cleaning is needed to make it usable for graphing. As those familiar with the tidyverse in R will know, most of its functions work best with data in long format.
mys_vax <- read.csv("https://raw.githubusercontent.com/CITF-Malaysia/citf-public/main/vaccination/vax_state.csv")
head(mys_vax)
## date state dose1_daily dose2_daily total_daily dose1_cumul
## 1 2021-02-24 Johor 0 0 0 0
## 2 2021-02-24 Kedah 0 0 0 0
## 3 2021-02-24 Kelantan 0 0 0 0
## 4 2021-02-24 Melaka 0 0 0 0
## 5 2021-02-24 Negeri Sembilan 0 0 0 0
## 6 2021-02-24 Pahang 0 0 0 0
## dose2_cumul total_cumul
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
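# Reshape to long format: columns 3:5 (total_daily, dose1_cumul, dose2_cumul)
# are stacked into DoseType/Measure pairs; total_cumul stays as its own column
mys_vax %>%
dplyr::select(date,state,total_daily,
dose1_cumul,dose2_cumul,total_cumul) %>%
gather(DoseType, Measure, 3:5) -> df1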
head(df1)
## date state total_cumul DoseType Measure
## 1 2021-02-24 Johor 0 total_daily 0
## 2 2021-02-24 Kedah 0 total_daily 0
## 3 2021-02-24 Kelantan 0 total_daily 0
## 4 2021-02-24 Melaka 0 total_daily 0
## 5 2021-02-24 Negeri Sembilan 0 total_daily 0
## 6 2021-02-24 Pahang 0 total_daily 0
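gather() still works but is superseded in current tidyr, which recommends pivot_longer() instead. The sketch below applies the equivalent call to a made-up two-row frame with the same columns as the selection above. One difference worth knowing: pivot_longer() keeps each original row's values together, whereas gather() stacks column by column, which is why head(df1) shows only total_daily rows.

```r
library(tidyr)

# Made-up rows with the same columns as the select() step above
vax_toy <- data.frame(
  date        = c("2021-07-01", "2021-07-01"),
  state       = c("STATE A", "STATE B"),
  total_daily = c(100, 10),
  dose1_cumul = c(500, 60),
  dose2_cumul = c(200, 30),
  total_cumul = c(700, 90)
)

# Same reshaping as gather(DoseType, Measure, 3:5):
# columns 3:5 become name/value pairs, total_cumul stays as an id column
df1_toy <- pivot_longer(vax_toy, cols = 3:5,
                        names_to = "DoseType", values_to = "Measure")
df1_toy
```

On the real data the call would be the same with mys_vax in place of vax_toy.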
Figure 3.1: Point plot of Malaysia vaccination data
Figure 3.2: Point plot of vaccination data faceted by state
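The code for Figures 3.1 and 3.2 is not shown. A minimal sketch of how such plots could be built from long data like df1 follows, assuming only the daily totals (DoseType == "total_daily") are plotted and that the date column is converted with as.Date(), since read.csv() reads it as text. The values and state names are made up.

```r
library(ggplot2)

# Made-up long-format rows shaped like df1 (daily totals only)
df1_toy <- data.frame(
  date     = as.Date(rep(c("2021-07-01", "2021-07-02"), each = 2)),
  state    = rep(c("STATE A", "STATE B"), times = 2),
  DoseType = "total_daily",
  Measure  = c(300, 250, 320, 270)
)

# Figure 3.1 style: every state on one panel, coloured by state
p_all <- ggplot(df1_toy, aes(x = date, y = Measure, colour = state)) +
  geom_point() +
  labs(x = "Date", y = "Daily vaccinations", colour = "State")

# Figure 3.2 style: the same plot split into one panel per state
p_facet <- p_all + facet_wrap(~ state)
```

With the real data, the input would be something like df1 %>% filter(DoseType == "total_daily") %>% mutate(date = as.Date(date)).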
Figure 3.1 and Figure 3.2 show that Selangor and Sarawak have ramped up the daily number of vaccinations. The simple point or line plot will not give us the same insight as if we plot it on a map. Before that we will take some summary statistics and focus our analysis on these new measures.
# Convert state to upper case since we want to do a join later
mys_vax$state <- toupper(mys_vax$state)
mys_vax %>%
group_by(state) %>%
summarize(days_of_data = n(),
avg_daily = mean(total_daily),
med_daily = median(total_daily),
min_daily = min(total_daily),
max_daily = max(total_daily),
dose1 = max(dose1_cumul),
dose2 = max(dose2_cumul),
total = max(total_cumul)) -> vaxsum
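Taking max() of a cumulative column as the state total assumes the cumulative counts never decrease and agree with the daily counts. The hypothetical helper below checks this per state; the toy data is made up, with one consistent state and one where the cumulative column has a gap. Running check_cumul(mys_vax) on the real data would flag any disagreeing state (we have not reported that result here).

```r
library(dplyr)

# Hypothetical helper: compare max() of the cumulative column (as used in
# vaxsum) with the sum of the daily column, state by state
check_cumul <- function(df) {
  df %>%
    group_by(state) %>%
    summarize(from_cumul = max(total_cumul),
              from_daily = sum(total_daily),
              consistent = from_cumul == from_daily)
}

# Made-up data: STATE A is consistent, STATE B's cumulative column lags
toy <- data.frame(
  state       = rep(c("STATE A", "STATE B"), each = 3),
  total_daily = c(0, 100, 50, 10, 20, 30),
  total_cumul = c(0, 100, 150, 10, 30, 50)
)
chk <- check_cumul(toy)
```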
We do a bubble plot of the summary data.
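The bubble plot code is not shown either. The sketch below is one plausible mapping, consistent with what Figure 3.3 is described as showing: average daily vaccinations on the x axis, the highest daily count on the y axis, and bubble area scaled by total vaccinations. The data and the exact mapping are assumptions.

```r
library(ggplot2)

# Made-up summary rows shaped like vaxsum
vaxsum_toy <- data.frame(
  state     = c("STATE A", "STATE B", "STATE C"),
  avg_daily = c(400, 300, 20),
  max_daily = c(900, 600, 50),
  total     = c(30000, 15000, 1000)
)

# One bubble per state; scale_size_area() makes bubble area proportional to total
p_bubble <- ggplot(vaxsum_toy, aes(x = avg_daily, y = max_daily, size = total)) +
  geom_point(alpha = 0.6) +
  geom_text(aes(label = state), size = 3, vjust = -1) +  # fixed text size
  scale_size_area(max_size = 15) +
  labs(x = "Average daily vaccinations", y = "Highest daily vaccinations",
       size = "Total vaccinations")
```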
Figure 3.3: Vaccination summary by state
Figure 3.3 shows that Sarawak, Selangor and Kuala Lumpur lead in both the average and highest numbers of vaccinations per day. They also lead in the total vaccinations.
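To plot a map, we download the shapefiles of Malaysian administrative boundaries.[6] Save and unzip the whole folder, since a shapefile is made up of several files (.shp, .shx, .dbf and others) that must stay together. We then load the state-level layer.
# Adjust the path to wherever you unzipped the shapefile folder
state_sf <- sf::st_read("F:/RProjects/Covid19/MYS_adm1.shp")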
## Reading layer `MYS_adm1' from data source `F:\RProjects\Covid19\MYS_adm1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 13 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 99.64072 ymin: 0.855001 xmax: 119.2697 ymax: 7.380556
## Geodetic CRS: WGS 84
# Rename columns to follow vaxsum
state_sf %>% rename(state = NAME_1) -> state_sf
Figure 3.4: Map of all states in Malaysia
Comparing Figure 3.3 with Figure 3.4 shows that some of the states in the vaccination data are not defined separately in the map. We have to merge these and also correct some of the state names.
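Before joining, it helps to list which names fail to match in each direction. The sketch uses setdiff() on illustrative subsets of the two name columns; the spellings come from this post, but the subsets themselves are hypothetical. On the real data the calls would be setdiff(vaxsum$state, state_sf$state) and the reverse, after both columns are upper-cased.

```r
# Illustrative subsets of the upper-cased state names in each data set
vax_states <- c("SELANGOR", "W.P. KUALA LUMPUR", "W.P. PUTRAJAYA", "TERENGGANU", "PERLIS")
map_states <- c("SELANGOR", "TRENGGANU", "PERLIS")

# Names in the vaccination data with no polygon in the map
unmatched_vax <- setdiff(vax_states, map_states)
# Names in the map with no row in the vaccination data
unmatched_map <- setdiff(map_states, vax_states)
```

Here the federal territories need merging, and TERENGGANU/TRENGGANU is a spelling difference to correct.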
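# Fold Kuala Lumpur and Putrajaya into Selangor, since the map does not show them separately
vaxsum %>%
filter(state %in% c("SELANGOR", "W.P. KUALA LUMPUR", "W.P. PUTRAJAYA")) %>%
summarize(days_of_data = mean(days_of_data),
# all three cover the same dates, so the combined average daily count
# is the sum of the per-area averages, not their mean
avg_daily = sum(avg_daily),
# median, min and max of per-area summaries are not the combined
# region's values; exact figures need recomputing from the daily data
med_daily = median(med_daily),
min_daily = min(min_daily),
max_daily = max(max_daily),
dose1 = sum(dose1),
dose2 = sum(dose2),
total = sum(total)) %>%
mutate(state = "SELANGOR") %>%
select(state, everything()) -> tmpkl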
vaxsum %>%
filter(!state %in% c("SELANGOR", "W.P. KUALA LUMPUR", "W.P. PUTRAJAYA")) %>%
rbind(tmpkl) -> vaxsum
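Medians, minimums and maximums cannot be recovered exactly from per-area summaries, and an average of averages understates the combined region's daily count. A sketch that avoids both issues recodes the federal territories to SELANGOR in the daily data, adds up the three areas on each date, and only then summarises. The toy numbers are made up; in the post the input would be mys_vax after toupper().

```r
library(dplyr)

# Made-up daily rows for the three areas that form one polygon in the map
daily_toy <- data.frame(
  date        = rep(c("2021-07-01", "2021-07-02"), each = 3),
  state       = rep(c("SELANGOR", "W.P. KUALA LUMPUR", "W.P. PUTRAJAYA"), times = 2),
  total_daily = c(100, 50, 10, 200, 60, 20)
)

daily_toy %>%
  # fold the two federal territories into Selangor to match the map
  mutate(state = ifelse(state %in% c("W.P. KUALA LUMPUR", "W.P. PUTRAJAYA"),
                        "SELANGOR", state)) %>%
  # add up the three areas on each date first ...
  group_by(state, date) %>%
  summarize(total_daily = sum(total_daily), .groups = "drop") %>%
  # ... then compute the statistics on the combined daily series
  group_by(state) %>%
  summarize(avg_daily = mean(total_daily),
            med_daily = median(total_daily),
            min_daily = min(total_daily),
            max_daily = max(total_daily)) -> merged_toy
```

In this toy case the combined average is 220 per day, which equals the sum of the three per-area averages (150 + 55 + 15), while their mean would be about 73.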
We now join the vaccination summary to the map data so that we can fill the states by these statistics. We restrict the join to states that appear in both data sets. We also add population data for the states.[7]
# Convert state names to upper case since we want to do a join
vaxsum$state <- toupper(vaxsum$state)
state_sf$state <- toupper(state_sf$state)
state_sf$state[state_sf$state == "TRENGGANU"] <- "TERENGGANU"
# limit to only similar state in both data sets
intersect(vaxsum$state, state_sf$state) -> same_state
vaxsum %>% filter (state %in% same_state) -> vaxsum
# keep only the state column (column 5, the renamed NAME_1);
# sf keeps the geometry column automatically
state_sf %>% select(-c(1:4, 6:9)) -> state_sf
# join
state_sf %>%
left_join(vaxsum, by = "state") -> state_sf
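# State populations in thousands, listed in the row order of state_sf;
# check the order with state_sf$state before assigning by position
popn <- c(3776.6, 2193.9, 1904.9, 936.9, 1135.9, 1682.2, 2518.6,
255, 1783.6, 3907.5, 2828.7, 8452.3, 1259)
state_sf %>% mutate(population = popn) -> state_sf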
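Assigning popn by position silently gives wrong results if the row order of state_sf ever changes. A safer sketch keys the population figures by state name and joins them, then fails loudly if any state is left without a value. The state names and numbers below are hypothetical stand-ins.

```r
library(dplyr)

# Hypothetical stand-ins: map rows in one order, population keyed by name
# (population in thousands; numbers made up)
map_toy  <- data.frame(state = c("STATE C", "STATE A", "STATE B"))
popn_toy <- data.frame(state = c("STATE A", "STATE B", "STATE C"),
                       population = c(4000, 2200, 250))

# Join on the state name, so the result does not depend on row order
map_toy %>% left_join(popn_toy, by = "state") -> map_popn

# Stop if any state did not receive a population value
stopifnot(!anyNA(map_popn$population))
```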
With vaxsum joined to state_sf, we plot the average daily vaccinations and the total vaccinations to date. The other statistics can be plotted the same way.
Note that %in% returns a logical vector of TRUE and FALSE values. To negate it, as in the filter(!state %in% c(...)) step earlier, put ! in front of the expression.
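A small illustration with made-up name vectors: %in% tests membership element by element, and because %in% binds more tightly than !, the expression !states %in% ft is read as !(states %in% ft).

```r
# Illustrative names; ft holds the federal territories to exclude
states <- c("JOHOR", "SELANGOR", "W.P. PUTRAJAYA", "PERLIS")
ft     <- c("W.P. KUALA LUMPUR", "W.P. PUTRAJAYA")

in_ft     <- states %in% ft    # FALSE FALSE TRUE FALSE
not_in_ft <- !states %in% ft   # TRUE TRUE FALSE TRUE
kept      <- states[not_in_ft] # "JOHOR" "SELANGOR" "PERLIS"
```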
state_sf %>%
ggplot() +
geom_sf(aes(geometry = geometry, fill = avg_daily)) +
# Everything past this point is only formatting:
scale_fill_gradient2(low = "green",
mid = "yellow",
high = "red",
midpoint = 0,
space = "Lab",
na.value = "grey80",
guide = "colourbar",
aesthetics = "fill") +
theme_void() +
labs(title = "Average daily vaccinations") -> p1
state_sf %>%
ggplot() +
geom_sf(aes(geometry = geometry, fill = total)) +
# Everything past this point is only formatting:
scale_fill_gradient2(low = "green",
mid = "yellow",
high = "red",
midpoint = 0,
space = "Lab",
na.value = "grey80",
guide = "colourbar",
aesthetics = "fill") +
theme_void() +
labs(title = "Total vaccinations to date") -> p2
cowplot::plot_grid(
p1,
p2,
ncol = 1)
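The two plots above differ only in the fill column and the title. A hypothetical helper, vax_map(), removes the duplication; it keeps the same colour scale, with the space, guide and aesthetics arguments left at their defaults, which match the values used above. Since patchwork is already loaded, p1 / p2 stacks the plots vertically, similar to plot_grid(ncol = 1).

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: one choropleth per statistic, same styling as above.
# fill_var is a column name given as a string; .data[[fill_var]] looks it up.
vax_map <- function(sf_data, fill_var, title) {
  ggplot(sf_data) +
    geom_sf(aes(fill = .data[[fill_var]])) +
    scale_fill_gradient2(low = "green", mid = "yellow", high = "red",
                         midpoint = 0, na.value = "grey80") +
    theme_void() +
    labs(title = title)
}

# Usage with the post's state_sf (not run here):
# p1 <- vax_map(state_sf, "avg_daily", "Average daily vaccinations")
# p2 <- vax_map(state_sf, "total", "Total vaccinations to date")
# p1 / p2
```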
Figure 3.5: Vaccination of states in Malaysia
Figure 3.5 confirms that Sarawak and Selangor are the two states where vaccination is most advanced in absolute numbers. Figure 3.8, however, shows that this does not match the distribution of Covid cases across the states. This raises some questions about the planning of the Malaysian vaccination program.
Next we plot the population per state and the percentage of each state's population that has been vaccinated.
Figure 3.6: Map of population and percent vaccinated per state
More than 30% of Sarawak's population has received the first dose. This is still very far from the 70% required for herd immunity. About 15% of Perlis's population has received the second dose, and Perlis also has few cases, as shown in Figure 3.8. Should it be the first state to exit the national lockdown? (Vaccination data as of 2021-07-06.)
We now bring in the Covid-19 case data.[8]
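The code for Figure 3.6 is not shown. The percentages can be computed from the joined dose columns and the population column; since the population figures are in thousands (judging by their magnitude), they are multiplied by 1000 first. A sketch with made-up values and generic state names:

```r
library(dplyr)

# Made-up rows shaped like state_sf after the joins
# (doses in persons, population in thousands)
state_toy <- data.frame(state      = c("STATE A", "STATE B"),
                        dose1      = c(50000, 900000),
                        dose2      = c(37500, 400000),
                        population = c(250, 2800))

# Percentage of each state's population that has received each dose
state_toy %>%
  mutate(pct_dose1 = 100 * dose1 / (population * 1000),
         pct_dose2 = 100 * dose2 / (population * 1000)) -> pct_toy
```

On the real data the same mutate() would run on state_sf, and the new columns could be mapped with geom_sf() like the plots above.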
## # A tibble: 7,512 x 7
## daerah Date New `14 Days` Active Total negeri
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Alor Gajah 2021-04-26 22 77 NA NA Melaka
## 2 Alor Gajah 2021-04-27 5 85 NA NA Melaka
## 3 Alor Gajah 2021-05-03 12 NA NA NA Melaka
## 4 Alor Gajah 2021-05-04 16 184 NA NA Melaka
## 5 Alor Gajah 2021-05-05 19 166 NA NA Melaka
## 6 Alor Gajah 2021-05-06 30 188 NA NA Melaka
## 7 Alor Gajah 2021-05-07 20 203 NA NA Melaka
## 8 Alor Gajah 2021-05-08 39 235 NA NA Melaka
## 9 Alor Gajah 2021-05-09 30 243 NA NA Melaka
## 10 Alor Gajah 2021-05-10 39 274 NA NA Melaka
## # ... with 7,502 more rows
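The columns are daerah (district), Date, New (new cases reported that day), 14 Days (cases over the preceding 14 days), Active (active cases), Total (cumulative cases) and negeri (state).
As with the vaccination data, we compute summary statistics per state and focus our analysis on these new measures.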
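# Convert district and state names to upper case since we want to do a join later
mys_cases$daerah <- toupper(mys_cases$daerah)
mys_cases$negeri <- toupper(mys_cases$negeri)
# New is reported per district (daerah) per day, so these statistics
# summarise district-days within each state
mys_cases %>%
group_by(negeri) %>%
summarize(avg_new = mean(New, na.rm = TRUE),
med_new = median(New, na.rm = TRUE),
lo_new = min(New, na.rm = TRUE),
hi_new = max(New, na.rm = TRUE)) -> state_cases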
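Because the case data is at district level, grouping by negeri directly averages over district-days. If per-state daily figures are wanted instead, one option is to sum the districts on each date first, as sketched below on made-up rows. Note that sum(..., na.rm = TRUE) treats a missing district report as zero, which is itself an assumption.

```r
library(dplyr)

# Made-up district-level rows shaped like mys_cases
cases_toy <- data.frame(
  daerah = c("D1", "D2", "D1", "D2"),
  negeri = "STATE A",
  Date   = as.Date(c("2021-07-01", "2021-07-01", "2021-07-02", "2021-07-02")),
  New    = c(10, 30, 20, NA)
)

cases_toy %>%
  # state total per day (a missing district counts as zero here)
  group_by(negeri, Date) %>%
  summarize(New = sum(New, na.rm = TRUE), .groups = "drop") %>%
  # then statistics over the state-level daily series
  group_by(negeri) %>%
  summarize(avg_new = mean(New), hi_new = max(New)) -> state_day
```

Here the state averages 30 new cases per day, while the district-day average of the same rows is 20.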
We do a bubble plot of the summary data.
Figure 3.7: New daily case summary by state
Selangor has by far the highest number of new cases per day. Note where Sarawak, one of the leaders in vaccinations, stands in terms of cases.
Next we combine some of the case data to match the states defined in our map (Figure 3.4).
# Fold Kuala Lumpur and Putrajaya into Selangor to match the map
state_cases %>%
filter(negeri %in% c("SELANGOR", "KUALA LUMPUR", "PUTRAJAYA")) %>%
summarize(avg_new = mean(avg_new),
med_new = median(med_new),
lo_new = min(lo_new),
hi_new = max(hi_new)) %>%
mutate(negeri = "SELANGOR") %>%
select(negeri, everything()) -> tmpkl
# Drop the merged areas and Labuan, which is not a separate state in the map
state_cases %>%
filter(!negeri %in% c("SELANGOR", "KUALA LUMPUR", "PUTRAJAYA", "LABUAN")) %>%
rbind(tmpkl) -> state_cases
We join state_cases to state_sf.
state_sf %>%
left_join(state_cases, by=c("state" = "negeri")) -> state_sf
Figure 3.8: Covid cases and vaccination per state
Figure 3.8 raises some important issues and questions.
However, the data does not show a clear strategy for the vaccination program, except perhaps for priority groups such as frontliners and senior citizens. Again, we need more granular data to analyze this.
This post has illustrated how to combine the Malaysian Covid case and vaccination data and visualize combinations of them on vector maps. The plots show the added value of incorporating ratios, such as the percentage of the population vaccinated, into the analysis.
Figure 3.6 and Figure 3.8 raise real questions about the Malaysian vaccination program. What are its objectives? Is it to achieve herd immunity for the nation? With less than 15% vaccination coverage as of today, we are not doing well. Is it to exit the lockdown state by state? Is it to control the rise in infections in the most problematic state, Selangor? The initial data paints a confusing picture.
Vaccination is a key intervention being considered for moving a country toward normalcy.[9] Apart from the supply of vaccines (which is improving), everything else related to vaccination is in our own hands. But we must administer the vaccines with a clear purpose, as part of an exit strategy out of Covid. The current data does not show this. We need more granular data to answer the issues and questions raised.