Overview and Objective

Dengue cases in Singapore have passed 20,000 this year.

A dengue cluster, according to Singapore’s National Environment Agency (NEA), indicates a locality with active transmission where intervention is needed.

A cluster is formed when two or more cases have onset within 14 days and are located within 150 m of each other (based on residential and workplace addresses). Clusters are color-coded by one of three alert statuses:

  • High-risk area with 10 or more cases: RED

  • Medium-risk area with fewer than 10 cases: YELLOW

  • No new cases, under surveillance for the next 21 days: GREEN
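To make the linking rule concrete, here is a simplified sketch in R of the pairwise check it describes. It assumes each case is reduced to a single point (one address) and a single onset date, treats "within" as inclusive, and uses the haversine formula for distance. The function names and sample cases are hypothetical, and NEA's actual procedure (which considers both residential and workplace addresses) is not reproduced here.

```r
# Simplified pairwise version of the NEA linking rule (assumptions: one
# point per case, one onset date per case, "within" treated as inclusive).
haversine_m <- function(lat1, lon1, lat2, lon2) {
  earth_radius_m <- 6371000
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_radius_m * asin(sqrt(h))   # great-circle distance in metres
}

# TRUE if two cases meet both conditions: onsets at most max_days apart
# and locations at most max_m metres apart.
cases_linked <- function(case_a, case_b, max_days = 14, max_m = 150) {
  days_apart <- abs(as.numeric(case_b$onset - case_a$onset, units = "days"))
  metres_apart <- haversine_m(case_a$lat, case_a$lon, case_b$lat, case_b$lon)
  days_apart <= max_days && metres_apart <= max_m
}

# Hypothetical cases; coordinates and dates are made up for illustration
a     <- list(lat = 1.3170, lon = 103.8860, onset = as.Date("2020-07-01"))
b     <- list(lat = 1.3179, lon = 103.8860, onset = as.Date("2020-07-10"))
c_far <- list(lat = 1.3300, lon = 103.8860, onset = as.Date("2020-07-10"))

cases_linked(a, b)      # TRUE: about 100 m and 9 days apart
cases_linked(a, c_far)  # FALSE: about 1.4 km apart
```

This only checks one pair of cases; how linked pairs are grouped into the larger cluster polygons on the map is not covered by the sketch.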

While the NEA Dengue Cluster Map shows the locality and boundary of each dengue cluster, the visualization could be improved by showing each cluster's color code and number of cases on the map. This would better serve its purpose of conveying the severity of dengue to residents living nearby.

1. Data and Design Challenges

1.1 Data Challenge - Data Cleaning HTML Tags

  • The first data challenge is obtaining reliable information about the clusters. The data comes from the Data.gov.sg website at https://data.gov.sg/dataset/dengue-clusters; it was last updated on 29th July 2020 with data as of 23rd July 2020.

  • Reading the KML file with the st_read function from the sf package gives us the coordinates that form the cluster polygons. However, the Description field is mixed with HTML tags and other extraneous information, so we have to clean the data and extract the useful parts.

  • I used the replace_html function from the textclean package and applied it across the column containing HTML tags, which removed all the tags.

  • I noticed that the fields are separated by capitalized words. Using a custom function built on a regular expression, I split the strings at every run of four consecutive capital letters and extracted the LOCALITY and CASE_SIZE values.

strsplit2 <- function(x){
  return(strsplit(x, split="[A-Z]{4}"))
} 
  • Next, I trimmed whitespace, stored the extracted values in lists using for loops, and finally merged them into a data frame.
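To see why the locality and case count end up as the 3rd and 5th fragments, here is strsplit2 applied to a made-up Description string. The layout (LOCALITY followed by CASE_SIZE) is an assumption based on the field names above, not a copy of the real KML text.

```r
# strsplit2 as defined above: split wherever four consecutive capitals occur;
# the matched letters are dropped from the output.
strsplit2 <- function(x) {
  strsplit(x, split = "[A-Z]{4}")
}

# Made-up cleaned Description (real field layout assumed, not copied)
desc  <- "LOCALITY Aljunied Cres CASE_SIZE 12"
parts <- unlist(strsplit2(desc))
# parts is c("", "", " Aljunied Cres ", "_", " 12"):
#   "LOCA" and "LITY" are consumed, leaving two empty fragments;
#   "CASE" and "SIZE" are consumed, leaving the "_" between them.

locality  <- trimws(parts[3])              # "Aljunied Cres"
case_size <- as.numeric(trimws(parts[5]))  # 12

# Caveat: four capitals inside a value also trigger a split, shifting every
# later index ("ABCD" is a hypothetical all-caps token).
length(unlist(strsplit2("LOCALITY Jalan ABCD CASE_SIZE 4")))  # 6, not 5
```

Because the positions are fixed, any locality name containing an all-caps word of four or more letters would push CASE_SIZE out of position 5, so it is worth checking the parsed values for NA after the numeric conversion.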

1.2 Data Challenge - Green Zones

NEA defines “Green Clusters” as those under surveillance for the next 21 days. Technically, this means these areas are no longer active and dengue transmission there is under control. However, this raises a few problems:

  • there is no data regarding the coordinates of these green zone polygons

  • some areas that are currently in the green zone lie within the polygon of an active dengue cluster

I therefore decided to focus on the active clusters in dengue-cluster.csv, as these are the places where residents should be more vigilant.

1.3 Design Challenge - Zone Coloring

In light of this new understanding and the data challenges, I decided to recategorize the dengue color codes. I recommend:

  • keeping RED for more than 10 cases (11-100)

  • splitting YELLOW into YELLOW for 6-10 cases and GREEN for 1-5 cases

  • adding BLACK for the 'super-clusters' with more than 100 cases, many of which have emerged this year

  • This helps users see the cluster locations clearly by their color code (danger level). The meaning of each color can be explained in the map's caption.

1.4 Design Challenge - Additional layer to represent Case Sizes

While the color code offers a clearer view of where the dengue clusters are, it does not show how many cases each cluster has, and case counts can vary widely within a single color. To address this, I initially considered a bubble plot, with bubble size representing the number of cases, as an additional layer on top of the choropleth map.

The problem with this approach is that the bubbles may overlap and obscure one another when there are many large clusters within a small area.

I also considered using text labels to show case numbers directly. However, too many numbers would clutter the map when users only want slightly more detail than the cluster color codes provide.

In the end, I chose a combination of choropleth color-coded zones, bubble fill color and text labels, so that users can pick the level of detail they want with the interactive layer selector on the left side of the map. Legends explain what each color code means and how the bubble fill color maps to the number of cases.

1.5 Design Add-on: Mosquito Breeding Sites

  • While parsing the data, I realized I could add value to the overall visualization by showing the top 10 places where mosquito breeding sites are found. This helps residents and construction-site workers stay vigilant about potential breeding sites, especially if they live or work within an active cluster.

2. Step-by-step description of how the data visualization was prepared

2.1 Import the necessary libraries:

We need 5 libraries here:

  • sf

  • tmap

  • tidyverse

  • textclean

  • plotly

packages <- c('sf', 'tmap', 'tidyverse', 'textclean', 'plotly')
# Install any package that is missing, then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

2.2 Import the KML file

  • Import the KML file using the st_read function from the sf package, and call summary() to inspect the resulting data frame.
dengue <- st_read("data/geo/dengue-clusters-kml.kml")
## Reading layer `DENGUE_CLUSTER' from data source `C:\Users\User\Desktop\tmp\Assignment5\Assignment5\data\geo\dengue-clusters-kml.kml' using driver `KML'
## Simple feature collection with 437 features and 2 fields
## geometry type:  POLYGON
## dimension:      XYZ
## bbox:           xmin: 103.6283 ymin: 1.265024 xmax: 103.9685 ymax: 1.454956
## z_range:        zmin: 0 zmax: 0
## geographic CRS: WGS 84
summary(dengue)
##      Name           Description                 geometry  
##  Length:437         Length:437         POLYGON Z    :437  
##  Class :character   Class :character   epsg:4326    :  0  
##  Mode  :character   Mode  :character   +proj=long...:  0

2.3 Data Cleaning

  • Convert the sf data frame dengue into a plain data frame, dengue1
dengue1 <- data.frame(dengue)
  • Apply the replace_html function to the Description column of dengue1 to strip the HTML tags, store the result in a new data frame, dengue2, and rename its column to Description
dengue2<- data.frame(apply(dengue1["Description"], 1, FUN=replace_html))
names(dengue2)[1] <- "Description"
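  • Create a new function, strsplit2, that splits a string at every run of four consecutive capital letters (i.e. at the capitalized field labels)
strsplit2 <- function(x){
  # The matched capitals are consumed, so the labels do not appear in the output
  return(strsplit(x, split="[A-Z]{4}"))
}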
  • Apply the strsplit2 function to the Description column of dengue2
dengue3 <- apply(dengue2["Description"], 1, FUN="strsplit2")
  • initialize LOCALITY, CASE_SIZE and NAME as empty lists
LOCALITY <- list()
CASE_SIZE <- list()
NAME <- list()
  • extract the LOCALITY and CASE_SIZE fragments from dengue3

  • unlist and trim whitespace

  • store the values in the empty lists initialized earlier

for (i in seq_along(dengue3)) {          # one element per cluster (437 here)
    fields <- trimws(unlist(dengue3[i]))
    NAME[i]      <- paste0("kml_", i)
    LOCALITY[i]  <- fields[3]            # 3rd fragment: locality
    CASE_SIZE[i] <- fields[5]            # 5th fragment: number of cases
}
  • rbind the filled lists into a new data frame, dengue4, and rename its columns

  • convert the CASE_SIZE column to numeric for the color coding later

dengue4 <- do.call(rbind, Map(data.frame, A=NAME, B=LOCALITY, C=CASE_SIZE))
names(dengue4) <- c("NAME", "LOCALITY","CASE_SIZE")
dengue4$CASE_SIZE <- as.numeric(dengue4$CASE_SIZE)
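As an alternative to the for loops and the do.call(rbind, Map(...)) step, the same table can be built in one pass with vapply(). This is a sketch on a toy stand-in for dengue3; dengue3_toy and pluck_field are hypothetical names, and it assumes the 3rd and 5th fragments hold the locality and case count as above.

```r
# Toy stand-in for dengue3: one element per cluster, each wrapping the
# character vector returned by strsplit2 (values are made up).
dengue3_toy <- list(
  list(c("", "", " Aljunied Cres ", "_", " 12")),
  list(c("", "", " Senja Rd ", "_", " 3"))
)

# Return the trimmed index-th fragment of every cluster; vapply() insists on
# exactly one character value per cluster (NA when the fragment is missing).
pluck_field <- function(splits, index) {
  vapply(splits, function(s) trimws(unlist(s)[index]), character(1))
}

dengue4_toy <- data.frame(
  NAME      = paste0("kml_", seq_along(dengue3_toy)),
  LOCALITY  = pluck_field(dengue3_toy, 3),
  CASE_SIZE = as.numeric(pluck_field(dengue3_toy, 5)),
  stringsAsFactors = FALSE
)
dengue4_toy
```

On the real data the calls would be the same with dengue3 in place of dengue3_toy. A design advantage of vapply() over growing lists in a loop is that it fails loudly if any cluster yields something other than a single string.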

2.4 Data Transformation

  • Create the color codes BLACK (>100), RED (11-100), YELLOW (6-10) and GREEN (1-5) by mutating over CASE_SIZE

  • Convert COLOR_CODE_CASES into a factor whose four levels run from most to least severe, so the map legend lists them in that order

dengue_clusters <- dengue4 %>%
  select(NAME, LOCALITY, CASE_SIZE) %>%
  mutate(COLOR_CODE_CASES = case_when(
    CASE_SIZE > 100 ~ "BLACK (>100)",
    CASE_SIZE > 10 ~ "RED (11-100)",
    CASE_SIZE > 5 ~ "YELLOW (6-10)",
    TRUE ~ "GREEN (1-5)")) %>%
  mutate(COLOR_CODE_CASES = factor(COLOR_CODE_CASES,  
                                    levels = c("BLACK (>100)", "RED (11-100)", 
                                               "YELLOW (6-10)", "GREEN (1-5)")))
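The case_when() bins can be cross-checked with base R cut(), which makes the boundaries explicit. This is a swapped-in equivalent for checking, not the code used for the map, and color_code is a hypothetical helper name. One difference to be aware of: a missing CASE_SIZE falls through to the TRUE branch of case_when() and becomes GREEN, whereas cut() leaves it NA.

```r
# Base R equivalent of the case_when() bins above. cut() intervals are
# right-closed: (0,5], (5,10], (10,100], (100,Inf].
color_code <- function(case_size) {
  codes <- cut(case_size,
               breaks = c(0, 5, 10, 100, Inf),
               labels = c("GREEN (1-5)", "YELLOW (6-10)",
                          "RED (11-100)", "BLACK (>100)"))
  # Put the most severe level first, matching the factor levels used above
  factor(codes, levels = rev(levels(codes)))
}

# Check the values on either side of each boundary
boundaries <- c(1, 5, 6, 10, 11, 100, 101)
data.frame(CASE_SIZE = boundaries, COLOR_CODE_CASES = color_code(boundaries))
```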
  • extract Name and geometry from the original dengue sf data frame as dengue_sel

  • full_join the dengue_clusters data frame with dengue_sel

  • convert the joined result back into an sf object with st_sf()

dengue_sel <- dengue %>% select("Name","geometry")

dengue_clusters_chloropleth<- full_join(dengue_clusters, dengue_sel,
                                by=c("NAME"="Name"))

dengue_clusters_chloropleth <- st_sf(dengue_clusters_chloropleth)
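The full_join relies on the generated names kml_1, kml_2, and so on lining up with the Name field of the KML features. A quick check, sketched here with a hypothetical helper and toy data frames, is to compare the two key sets before joining; base R merge(..., all = TRUE) stands in for full_join to show what happens to an unmatched key.

```r
# Hypothetical helper: keys present on only one side of the join
check_join_keys <- function(left_keys, right_keys) {
  list(
    only_left  = setdiff(left_keys, right_keys),   # rows that would lack geometry
    only_right = setdiff(right_keys, left_keys)    # polygons that would lack attributes
  )
}

# Toy tables: kml_3 has no polygon and kml_4 has no attributes
attrs  <- data.frame(NAME = c("kml_1", "kml_2", "kml_3"),
                     CASE_SIZE = c(12, 3, 7), stringsAsFactors = FALSE)
shapes <- data.frame(Name = c("kml_1", "kml_2", "kml_4"),
                     shape_id = c("A", "B", "D"), stringsAsFactors = FALSE)

check_join_keys(attrs$NAME, shapes$Name)

# merge(all = TRUE) is base R's full outer join: unmatched keys are kept
# and the missing side is filled with NA
joined <- merge(attrs, shapes, by.x = "NAME", by.y = "Name", all = TRUE)
joined
```

On the real data, check_join_keys(dengue_clusters$NAME, dengue_sel$Name) should return two empty vectors if the naming assumption holds.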

2.5 Data Layers

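  • Create two more copies of the sf object, one for the bubble layer and one for the text layer, so that each appears as a separately named layer users can toggle to choose the level of detail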
dengue_case_size <- dengue_clusters_chloropleth
dengue_case_size_bubbles <- dengue_clusters_chloropleth

2.6 Create Interactive Map

tmap_mode("view")
## tmap mode set to interactive viewing
  • Include the choropleth map with color-coded zones
tm <- tm_shape(dengue_clusters_chloropleth) +
  tm_fill("COLOR_CODE_CASES",
          id = 'LOCALITY',
          palette = c("black", "red2", "yellow2", "darkgreen"),
          alpha = 0.75) +
  # tm_fill() only fills the polygons; outlines are drawn by tm_borders()
  tm_borders(alpha = 0.5) +
  tm_basemap("Esri.WorldTopoMap")
  • Add bubbles whose fill color represents the number of cases
tm <- tm + tm_shape(dengue_case_size_bubbles) +
  tm_bubbles(col = 'CASE_SIZE',
             id = 'LOCALITY',
             palette = NULL,
             size = 0.05,
             border.col = "black",
             border.lwd = 1,
             alpha = 0.8)
  • Include text as the final layer of granularity
tm <- tm + tm_shape(dengue_case_size) +
  tm_text(text = 'CASE_SIZE',
          fontface = "bold")
tm   # print the layered map to render it in the interactive viewer

2.7 Add-on data (Top 10 Breeding Sites at Homes/Construction Sites)

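  • Starting from dengue3, I reused the steps I used to parse CASE_SIZE and LOCALITY to extract the reported breeding habitats at homes.

  • I removed the leftover capital letters at the start of each entry, split the entries on commas, and used lapply to apply str_to_upper and trimws to the list; I then counted each habitat with table() and sorted the counts by Freq in descending order.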

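HOMES <- list()

# 9th fragment of each split Description holds the home breeding habitats
for (i in seq_along(dengue3)) {
    HOMES[i] <- paste0(trimws(unlist(dengue3[i])[9]))
}

HOMES <- trimws(gsub("^S", "", HOMES))   # drop the leftover "S" at the start
HOMES <- strsplit(HOMES, split = ',')    # one entry per habitat
HOMES <- lapply(HOMES, str_to_upper)
HOMES <- lapply(HOMES, trimws)
HOMES <- table(unlist(HOMES))            # count occurrences of each habitat
HOMES <- cbind.data.frame(HOMES)         # columns Var1 (habitat) and Freq
HOMES <- HOMES[order(-HOMES$Freq), ]
HOMES10 <- HOMES[1:10, ]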
  • for Construction Sites, I repeated the steps used for the homes
SITES <- list()
# 11th fragment holds the construction-site breeding habitats
for (i in seq_along(dengue3)) {
    SITES[i] <- paste0(trimws(unlist(dengue3[i])[11]))
}

SITES <- trimws(gsub("^ES", "", SITES))  # drop the leftover "ES" at the start
SITES <- strsplit(SITES, split = ',')
SITES <- lapply(SITES, str_to_upper)
SITES <- lapply(SITES, trimws)
SITES <- table(unlist(SITES))
SITES <- cbind.data.frame(SITES)
SITES <- SITES[order(-SITES$Freq), ]
SITES10 <- SITES[1:10, ]
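Since the HOMES and SITES pipelines differ only in the fragment index and the leftover prefix, they could be folded into one function. Below is a hypothetical helper, top_habitats, sketched in base R (toupper stands in for str_to_upper). Unlike the steps above, it also drops blank and missing entries, which table() would otherwise tally.

```r
# Hypothetical helper folding the HOMES/SITES steps into one function.
# raw: one fragment per cluster; prefix: leftover capitals to strip.
top_habitats <- function(raw, prefix, n = 10) {
  cleaned <- trimws(sub(paste0("^", prefix), "", trimws(raw)))
  # One entry per comma-separated habitat, upper-cased and trimmed
  items <- trimws(toupper(unlist(strsplit(cleaned, split = ","))))
  items <- items[!is.na(items) & items != ""]    # skip blanks and missing values
  counts <- as.data.frame(table(items), stringsAsFactors = FALSE)
  names(counts) <- c("Var1", "Freq")              # same column names as HOMES10
  counts <- counts[order(-counts$Freq), ]         # most frequent first
  rownames(counts) <- NULL
  head(counts, n)
}

# Made-up fragments for illustration
raw <- c("S Pail, Flower pot", "S pail, Vase", "S Vase, pail")
top_habitats(raw, prefix = "S", n = 2)
```

Applied to the raw 9th fragments (the contents of HOMES right after its for loop) with prefix "S", it would play the role of HOMES10; the 11th fragments with prefix "ES" would give SITES10.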

2.8 Add-on data: Lollipop Plot (Top 10 Breeding Sites at Homes/Construction Sites)

  • I combined geom_segment and geom_point from the ggplot2 package with coord_flip() to create a lollipop plot, which is visually more pleasing than a simple horizontal bar plot.
lp1 <- HOMES10 %>%
  ggplot(aes(x = reorder(Var1, Freq), y = Freq)) +
  geom_segment(aes(xend = Var1, yend = 0)) +    # the lollipop "stick"
  geom_point(size = 4, color = "blue") +        # the lollipop "head"
  scale_y_continuous(breaks = c(4, 8, 12, 16, 20, 24, 28, 32)) +
  coord_flip() +
  theme_bw() +
  xlab("") +
  ggtitle("Top 10 Home Mosquito Breeding Areas")

lp2 <- SITES10 %>%
  ggplot(aes(x = reorder(Var1, Freq), y = Freq)) +
  geom_segment(aes(xend = Var1, yend = 0)) +
  geom_point(size = 4, color = "orange") +
  scale_y_continuous(breaks = c(2, 4, 6, 8, 10, 12)) +
  coord_flip() +
  theme_bw() +
  xlab("") +
  ggtitle("Top 10 Construction Site Mosquito Breeding Areas")

3. Final Visualization and Insights

3.1 New Dengue Cluster Map

  1. The map shows a heavy concentration of cluster zones (regardless of color code) in the eastern region of Singapore during this period.

  2. While the west is sparsely affected, it has two distinct, heavily hit clusters: Bukit Panjang (Senja Rd vicinity) with 275 cases and Hillview with 134 cases.

  3. The largest cluster is in the Aljunied area, which has many adjacent red/black cluster zones. The government should carry out a Dengue Blitz to eliminate dengue in that area.

3.2 Top 10 Mosquito Breeding Areas at Homes and Construction Sites

  1. Pails, vases and flower pots are the three most commonly reported mosquito breeding areas at homes. Residents should take note and watch for mosquito breeding in these containers.

  2. Closed perimeter drains, tree holes and scupper drains are the three most commonly reported mosquito breeding areas at construction sites. Workers and supervisors should take note and watch for mosquito breeding in these areas.