Analysis of Role of Dams on Recorded Dry Season Malaria Incidences in Kasungu

This geospatial statistical model uses routinely collected malaria case data, population data and remotely sensed data on open and vegetated water bodies to estimate the population living around open water bodies, the expected number of malaria cases, and the standardised morbidity ratio (SMR) of malaria. Ultimately, it quantifies the association between proximity to larval habitat and malaria risk in health facility catchment areas in Kasungu. The SMR compares the risk of morbidity in a population of interest with that of a standard population. In this case, our interest is to find out whether the number of dry season malaria cases in each catchment area is greater than we would expect given the malaria rate for the entire Kasungu district.

We do this by comparing what we observe (O) with what we would expect (E) if the risk of malaria were equal throughout Kasungu. In statistical notation, the SMR of catchment i is: \[SMR_i = \frac{O_i}{E_i}\]

Buffers around water bodies are created and then combined with population data in raster format to estimate the proportion of the catchment population living within 1 km, 2 km and 3 km of water bodies. Subsequently, the observed malaria cases are modelled using Poisson regression to find out whether living within various distances of water bodies is associated with variability in malaria risk in Kasungu district. We hypothesize that the risk of being a case in a catchment depends on proximity to water bodies. The data span 2017 to 2020 and were derived from digitized DHIS2 malaria records, accessibility mapping, an aggregated population geospatial layer and the TropWet tool in Google Earth Engine.
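The modelling step described above can be sketched as a Poisson regression of observed cases on the proportion of the catchment population living near water, with the log of the expected cases as an offset so that the coefficients describe variation in the SMR. This is a minimal sketch on simulated data: the data frame `toy_catchments`, its column names, the assumed effect size and all values are hypothetical, and the model fitted to the Kasungu data may be specified differently.

```r
# Hypothetical illustration: simulate catchment-level data and fit a Poisson model
set.seed(42)
n_catchments <- 27

toy_catchments <- data.frame(
  expected        = round(runif(n_catchments, 500, 5000)),  # expected cases E_i
  prop_within_1km = runif(n_catchments, 0, 0.6)             # share of population within 1 km of water
)

# Simulate observed counts under an assumed log rate ratio of 0.8
true_beta <- 0.8
toy_catchments$observed <- rpois(
  n_catchments,
  lambda = toy_catchments$expected *
    exp(-0.2 + true_beta * toy_catchments$prop_within_1km)
)

# Poisson regression: log(E[O_i]) = log(E_i) + b0 + b1 * prop_within_1km
toy_fit <- glm(observed ~ prop_within_1km + offset(log(expected)),
               family = poisson(link = "log"),
               data = toy_catchments)

# exp(b1) compares a catchment where everyone lives near water
# with one where nobody does
rate_ratio <- exp(coef(toy_fit)[["prop_within_1km"]])
rate_ratio
```

Using `offset(log(expected))` fixes the coefficient of the expected count at 1, so the remaining terms model the ratio of observed to expected cases rather than the raw counts.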

Load packages

Load the R packages used to read, view, transform and model the malaria case and spatial datasets.

library(SpatialEpi)
library(spdep)
library(spaMM)
library(popEpi)
library(Epi)
library(tidyverse)
library(ggpubr)
library(plotly)
library(lubridate)
library(knitr)
library(raster)
library(rgdal)
library(rgeos)
library(sf)
library(sp)
library(tmap)
library(spdep)
library(maptools)
library(gridExtra)
library(ggsci)
library(grid)
library(exactextractr)
library(DataExplorer)
library(mapview)
`%>%` <- magrittr::`%>%`

Tell R where the data is

here::here()

## [1] "C:/Users/cnkolokosa/Documents/R/upscaled_2021_updated_May/upscaled_2021"

Load datasets

The total dry season malaria cases recorded at health-care facilities in Kasungu from 2017 to 2019 are contained in the KasunguData.csv sourced from https://dhis2.health.gov.mw/. The kasungu_facility_catchments_2004.shp shapefile also contains the population and health information within each health-facility catchment area in Kasungu district.

The aggregated population raster layers for Malawi, e.g., ku_pop_2017_1km_aggregated.tif, were downloaded from the WorldPop (Open Spatial Demographic Data and Research) website: https://www.worldpop.org/geodata/country?iso3=MWI. These layers estimate the total number of people per grid cell. The units are number of people per pixel, with country totals adjusted to match the corresponding official United Nations population estimates. The datasets were downloaded in GeoTIFF format at a resolution of 1 km and are projected in the Geographic Coordinate System, WGS84.

The kasungu_water.shp and water_bodies layers contain open and vegetated water body polygons, detected using the Tropical Wetland Unmixing Tool (TropWet). TropWet is a Google Earth Engine hosted toolbox that uses the Landsat archive to map tropical wetlands and can be accessed through: https://www.aber.ac.uk/en/dges/research/earth-observation-laboratory/research/tropwet/

# Kasungu dry season malaria data
dry_season_malaria_2017_2020 <- read.csv(here::here("data/dry_season_malaria_2017_2020.csv"))

# Kasungu district boundary shapefile 
kasungu_district <- sf::st_read(here::here("data", "kasungu_district.shp"))

## Reading layer `kasungu_district' from data source 
##   `C:\Users\cnkolokosa\Documents\R\upscaled_2021_updated_May\upscaled_2021\data\kasungu_district.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 5 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 491272.7 ymin: 8494349 xmax: 609044.2 ymax: 8632164
## Projected CRS: WGS 84 / UTM zone 36S

# Kasungu health facility catchments generated from accessibility mapping
malire_new <- sf::st_read(here::here("data", "new_catchments.shp")) %>% 
              sf::st_transform(32736) # reproject to WGS UTM Zone 36 South

## Reading layer `new_catchments' from data source 
##   `C:\Users\cnkolokosa\Documents\R\upscaled_2021_updated_May\upscaled_2021\data\new_catchments.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 27 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 32.925 ymin: -13.61667 xmax: 34.00833 ymax: -12.375
## Geodetic CRS:  WGS 84

# Kasungu population raster layer
kasungu_population_2017 <- raster(here::here("data", "ku_pop_2017_1km_aggregated.tif"))

kasungu_population_2018 <- raster(here::here("data", "ku_pop_2018_1km_aggregated.tif"))

kasungu_population_2019 <- raster(here::here("data", "ku_pop_2019_1km_aggregated.tif"))

kasungu_population_2020 <- raster(here::here("data", "ku_pop_2020_1km_aggregated.tif"))

# Read in waterbodies polygons 
dryseason_waterbodies_2017 <- sf::st_read(here::here("data", "water_bodies_2017.shp"))

## Reading layer `water_bodies_2017' from data source 
##   `C:\Users\cnkolokosa\Documents\R\upscaled_2021_updated_May\upscaled_2021\data\water_bodies_2017.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 168 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 514497 ymin: 8495941 xmax: 603149.8 ymax: 8620169
## Projected CRS: WGS 84 / UTM zone 36S

dryseason_waterbodies_2018 <- sf::st_read(here::here("data", "kasungu_2018_water.shp"))

## Reading layer `kasungu_2018_water' from data source 
##   `C:\Users\cnkolokosa\Documents\R\upscaled_2021_updated_May\upscaled_2021\data\kasungu_2018_water.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1105 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 496807.6 ymin: 8494693 xmax: 607913.8 ymax: 8607747
## Projected CRS: WGS 84 / UTM zone 36S

dryseason_waterbodies_2019 <- sf::st_read(here::here("data", "kasungu_2019_water.shp"))

## Reading layer `kasungu_2019_water' from data source 
##   `C:\Users\cnkolokosa\Documents\R\upscaled_2021_updated_May\upscaled_2021\data\kasungu_2019_water.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1941 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 494197.2 ymin: 8494693 xmax: 607913.8 ymax: 8617573
## Projected CRS: WGS 84 / UTM zone 36S

dryseason_waterbodies_2020 <- sf::st_read(here::here("data", "water_bodies_2020.shp"))

## Reading layer `water_bodies_2020' from data source 
##   `C:\Users\cnkolokosa\Documents\R\upscaled_2021_updated_May\upscaled_2021\data\water_bodies_2020.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 266 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 508985.6 ymin: 8495793 xmax: 585761.1 ymax: 8620169
## Projected CRS: WGS 84 / UTM zone 36S

# Add a field ID to water bodies polygons 
dryseason_waterbodies_2017$ID <- 1:nrow(dryseason_waterbodies_2017)

dryseason_waterbodies_2018$ID <- 1:nrow(dryseason_waterbodies_2018)

dryseason_waterbodies_2019$ID <- 1:nrow(dryseason_waterbodies_2019)

dryseason_waterbodies_2020$ID <- 1:nrow(dryseason_waterbodies_2020)

View the dry season malaria case data

We observe that Kasungu district has 30 health facilities classified as dispensary, health centre, district hospital and rural hospital, and that the highest number of malaria cases was recorded at Kasungu District Hospital.

dry_season_malaria_2017_2020 %>% 
  summary()

##        X             rowID          Names              dr_2017       
##  Min.   : 1.00   Min.   : 1.00   Length:36          Min.   :    0.0  
##  1st Qu.: 9.75   1st Qu.: 9.75   Class :character   1st Qu.:  918.8  
##  Median :18.50   Median :18.50   Mode  :character   Median : 1505.0  
##  Mean   :18.50   Mean   :18.50                      Mean   : 1818.1  
##  3rd Qu.:27.25   3rd Qu.:27.25                      3rd Qu.: 2150.0  
##  Max.   :36.00   Max.   :36.00                      Max.   :11976.0  
##                                                                      
##     dr_2018        dr_2019         dr_2020         LONGITU     
##  Min.   :   0   Min.   :    0   Min.   :    0   Min.   :33.18  
##  1st Qu.:1180   1st Qu.: 1152   1st Qu.: 1650   1st Qu.:33.38  
##  Median :1706   Median : 1569   Median : 2698   Median :33.50  
##  Mean   :2093   Mean   : 2039   Mean   : 3404   Mean   :33.52  
##  3rd Qu.:2754   3rd Qu.: 2512   3rd Qu.: 4722   3rd Qu.:33.68  
##  Max.   :9820   Max.   :11399   Max.   :16435   Max.   :33.87  
##                                                 NA's   :6      
##     LATITUD      
##  Min.   :-13.57  
##  1st Qu.:-13.25  
##  Median :-12.98  
##  Mean   :-12.99  
##  3rd Qu.:-12.79  
##  Max.   :-12.42  
##  NA's   :6

# Plotly bar chart -------------------------------------------------------------
bar_chart <- dry_season_malaria_2017_2020 %>%  
  dplyr::filter(Names != "K2 Taso Clinic",          # Have missing malaria records
                Names != "Kalikeni Private Clinic",
                Names != "Kakwale Health Centre",
                Names != "St Andrews Community Hospital",
                Names != "St. Faith Health Centre",
                Names != "Chambwe Health Centre") %>% 
  plotly::plot_ly(y = ~Names,
                  x = ~dr_2017,
                  type = "bar",
                  orientation = 'h',
                  name = "2017") %>%
  plotly::add_trace(x = ~ dr_2018,
                    name = "2018") %>%
  plotly::add_trace(x = ~ dr_2019,
                    name = "2019") %>% 
  plotly::add_trace(x = ~ dr_2020,
                    name = "2020") %>% 
  plotly::layout(xaxis = list(title = "Total malaria cases"),
                 yaxis = list(title = " "),
                 hovermode = "compare",
                 margin = list(b = 10,
                               t = 10,
                               pad = 2))
bar_chart

Fig. 1: The total malaria cases recorded at each health-care facility in Kasungu district

# Pivot longer -----------------------------------------------------------------
# dry_season_malaria_longer <- dry_season_malaria_2017_2020 %>% 
#   dplyr::filter(Names != "K2 Taso Clinic",                       
#                 Names != "Kalikeni Private Clinic",
#                 Names != "Kakwale Health Centre",
#                 Names != "St Andrews Community Hospital",
#                 Names != "St. Faith Health Centre",
#                 Names != "Chambwe Health Centre") %>% 
#   dplyr::rename(`2017` = dr_2017,
#                 `2018` = dr_2018,
#                 `2019` = dr_2019,
#                 `2020` = dr_2020) %>% 
#   tidyr::pivot_longer(cols = `2017`:`2020`,
#                       names_to = 'year',
#                       values_to = 'malaria_cases')
# 
# ggplot2::ggplot(dry_season_malaria_longer, 
#                 aes(x = malaria_cases, 
#                     y = Names,
#                     fill = year))+
#   ggplot2::geom_bar(stat ='identity',
#                     position = "dodge")+
#   ggplot2::labs(x = "Dry season malaria cases",
#                 y = " ")+
#   ggplot2::theme_classic()+
#   ggsci::scale_fill_jama()

Kasungu health-care facilities and their catchment areas

A health facility catchment area is the area from which a health facility attracts patients. The new health facility catchment polygons were generated with a generic accessibility mapping script adapted from https://malariaatlas.org/wp-content/uploads/accessibility/R_generic_accessibilty_mapping_script.r. The script requires two user-supplied datasets: the 2015 friction surface, which is available here: http://www.map.ox.ac.uk/accessibility_to_cities/, and a .csv of points (here, dry_season_malaria_2017_2020). The accumulated cost algorithm accCost and the r.Cost algorithm in QGIS were run to make the final output map of new health facility catchment boundaries.

# Using the complete.cases() function to select health centres with complete 
# longitude and latitude coordinates.
zipatala_aggregated <- dry_season_malaria_2017_2020[complete.cases(dry_season_malaria_2017_2020),] 

# Aggregate health facilities close to each other: 
#  a) Kasalika Health Centre and Kasungu District Hospital, 
#  b) Bua and Mziza Health Centres, and 
#  c) Kaluluma and Nkhamenya Rural Hospitals in order to 
# generate catchment areas that are geographically correct

zipatala_aggregated$dr_2017[which(
  zipatala_aggregated$Names == "Kasungu District Hospital")] <- zipatala_aggregated$dr_2017[which(
    zipatala_aggregated$Names == "Kasungu District Hospital")] + zipatala_aggregated$dr_2017[which(
      zipatala_aggregated$Names == "Kasalika Health Centre")]

zipatala_aggregated$dr_2018[which(
  zipatala_aggregated$Names == "Kasungu District Hospital")] <- zipatala_aggregated$dr_2018[which(
    zipatala_aggregated$Names == "Kasungu District Hospital")] + zipatala_aggregated$dr_2018[which(
      zipatala_aggregated$Names == "Kasalika Health Centre")]

zipatala_aggregated$dr_2019[which(
  zipatala_aggregated$Names == "Kasungu District Hospital")] <- zipatala_aggregated$dr_2019[which(
    zipatala_aggregated$Names == "Kasungu District Hospital")] + zipatala_aggregated$dr_2019[which(
      zipatala_aggregated$Names == "Kasalika Health Centre")]

zipatala_aggregated$dr_2020[which(
  zipatala_aggregated$Names == "Kasungu District Hospital")] <- zipatala_aggregated$dr_2020[which(
    zipatala_aggregated$Names == "Kasungu District Hospital")] + zipatala_aggregated$dr_2020[which(
      zipatala_aggregated$Names == "Kasalika Health Centre")]
  
zipatala_aggregated$dr_2017[which(
  zipatala_aggregated$Names == "Nkhamenya Rural Hospital")] <- zipatala_aggregated$dr_2017[which(
    zipatala_aggregated$Names == "Nkhamenya Rural Hospital")] + zipatala_aggregated$dr_2017[which(
      zipatala_aggregated$Names == "Kaluluma Rural Hospital")]

zipatala_aggregated$dr_2018[which(
  zipatala_aggregated$Names == "Nkhamenya Rural Hospital")] <- zipatala_aggregated$dr_2018[which(
    zipatala_aggregated$Names == "Nkhamenya Rural Hospital")] + zipatala_aggregated$dr_2018[which(
      zipatala_aggregated$Names == "Kaluluma Rural Hospital")]

zipatala_aggregated$dr_2019[which(
  zipatala_aggregated$Names == "Nkhamenya Rural Hospital")] <- zipatala_aggregated$dr_2019[which(
    zipatala_aggregated$Names == "Nkhamenya Rural Hospital")] + zipatala_aggregated$dr_2019[which(
      zipatala_aggregated$Names == "Kaluluma Rural Hospital")]

zipatala_aggregated$dr_2020[which(
  zipatala_aggregated$Names == "Nkhamenya Rural Hospital")] <- zipatala_aggregated$dr_2020[which(
    zipatala_aggregated$Names == "Nkhamenya Rural Hospital")] + zipatala_aggregated$dr_2020[which(
      zipatala_aggregated$Names == "Kaluluma Rural Hospital")]

zipatala_aggregated$dr_2017[which(
  zipatala_aggregated$Names == "Mziza Health Centre")] <- zipatala_aggregated$dr_2017[which(
    zipatala_aggregated$Names == "Mziza Health Centre")] + zipatala_aggregated$dr_2017[which(
      zipatala_aggregated$Names == "Bua Health Centre")]

# Each year must be aggregated from that same year's counts
zipatala_aggregated$dr_2018[which(
  zipatala_aggregated$Names == "Mziza Health Centre")] <- zipatala_aggregated$dr_2018[which(
    zipatala_aggregated$Names == "Mziza Health Centre")] + zipatala_aggregated$dr_2018[which(
      zipatala_aggregated$Names == "Bua Health Centre")]

zipatala_aggregated$dr_2019[which(
  zipatala_aggregated$Names == "Mziza Health Centre")] <- zipatala_aggregated$dr_2019[which(
    zipatala_aggregated$Names == "Mziza Health Centre")] + zipatala_aggregated$dr_2019[which(
      zipatala_aggregated$Names == "Bua Health Centre")]

zipatala_aggregated$dr_2020[which(
  zipatala_aggregated$Names == "Mziza Health Centre")] <- zipatala_aggregated$dr_2020[which(
    zipatala_aggregated$Names == "Mziza Health Centre")] + zipatala_aggregated$dr_2020[which(
      zipatala_aggregated$Names == "Bua Health Centre")]

# Drop out the other health facilities
zipatala_aggregated <- zipatala_aggregated %>%
  dplyr::filter(Names != "Kasalika Health Centre",
                Names != "Bua Health Centre",
                Names != "Kaluluma Rural Hospital")

# write.csv(zipatala_aggregated, "data/health_facilities_aggregated.csv")

# Convert the data frame read from the csv to a spatial (sf) object
health_facility_aggr_sf <- sf::st_as_sf(zipatala_aggregated,
                                        coords = c("LONGITU", "LATITUD"),
                                        crs = 4326, agr = "constant")

# st_write(health_facility_aggr_sf, "data/health_facilities_aggregated.shp")
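The facility merging above follows one pattern: add the absorbed facility's counts to the retained facility for every year, then drop the absorbed facility. A compact sketch of that pattern on a toy data frame is given here. The helper `merge_facility_counts`, the facility "Facility C" and all counts are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: add the counts of `absorbed` to `kept` for every year column,
# then remove the absorbed facility from the data frame
merge_facility_counts <- function(df, kept, absorbed,
                                  year_cols = c("dr_2017", "dr_2018",
                                                "dr_2019", "dr_2020")) {
  kept_row     <- which(df$Names == kept)
  absorbed_row <- which(df$Names == absorbed)
  stopifnot(length(kept_row) == 1, length(absorbed_row) == 1)

  # Each year column is updated from the same year's values
  df[kept_row, year_cols] <- df[kept_row, year_cols] + df[absorbed_row, year_cols]

  # Drop the absorbed facility
  df[-absorbed_row, , drop = FALSE]
}

# Toy data with made-up counts
toy_facilities <- data.frame(
  Names   = c("Mziza Health Centre", "Bua Health Centre", "Facility C"),
  dr_2017 = c(100, 10, 50),
  dr_2018 = c(200, 20, 60),
  dr_2019 = c(300, 30, 70),
  dr_2020 = c(400, 40, 80)
)

toy_merged <- merge_facility_counts(toy_facilities,
                                    kept = "Mziza Health Centre",
                                    absorbed = "Bua Health Centre")
toy_merged
```

Handling all year columns in one step makes it harder for one year's counts to be filled from another year by mistake.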

View location of the health facilities in the new catchment areas

# Plot map
tm_shape(malire_new)+
  tm_polygons()+
  tm_shape(health_facility_aggr_sf)+
  tm_dots(size = .3, 
          col = "blue", 
          alpha = 0.5)+
  tm_text("Names", 
          size = .3, 
          just = "top", 
          col = "black", 
          remove.overlap = TRUE)+
  tm_layout(frame = FALSE,
            title = "New Kasungu health facility \n catchment boundaries",
            title.size = .8, 
            title.position = c("left", "top"))+
  tm_compass(position=c("right", "top"))+
  tm_scale_bar(breaks = c(0, 10, 20), 
               text.size = .5)

Fig. 2: Kasungu health-care facilities and catchment areas

Kasungu district estimated population per grid-cell

# Take a glimpse at the WorldPop raster layers
kasungu_population_2017

## class      : RasterLayer 
## dimensions : 150, 128, 19200  (nrow, ncol, ncell)
## resolution : 920.0898, 918.7667  (x, y)
## extent     : 491272.7, 609044.2, 8494349, 8632164  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : C:/Users/cnkolokosa/Documents/R/upscaled_2021_updated_May/upscaled_2021/data/ku_pop_2017_1km_aggregated.tif 
## names      : ku_pop_2017_1km_aggregated

kasungu_population_2018

## class      : RasterLayer 
## dimensions : 150, 128, 19200  (nrow, ncol, ncell)
## resolution : 920.0898, 918.7667  (x, y)
## extent     : 491272.7, 609044.2, 8494349, 8632164  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : C:/Users/cnkolokosa/Documents/R/upscaled_2021_updated_May/upscaled_2021/data/ku_pop_2018_1km_aggregated.tif 
## names      : ku_pop_2018_1km_aggregated 
## values     : 0, 6253.557  (min, max)

kasungu_population_2019

## class      : RasterLayer 
## dimensions : 150, 128, 19200  (nrow, ncol, ncell)
## resolution : 920.0898, 918.7667  (x, y)
## extent     : 491272.7, 609044.2, 8494349, 8632164  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : C:/Users/cnkolokosa/Documents/R/upscaled_2021_updated_May/upscaled_2021/data/ku_pop_2019_1km_aggregated.tif 
## names      : ku_pop_2019_1km_aggregated 
## values     : 0, 6483.727  (min, max)

kasungu_population_2020

## class      : RasterLayer 
## dimensions : 150, 128, 19200  (nrow, ncol, ncell)
## resolution : 920.0898, 918.7667  (x, y)
## extent     : 491272.7, 609044.2, 8494349, 8632164  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : C:/Users/cnkolokosa/Documents/R/upscaled_2021_updated_May/upscaled_2021/data/ku_pop_2020_1km_aggregated.tif 
## names      : ku_pop_2020_1km_aggregated 
## values     : 0, 7949.033  (min, max)

# Helper function to create a raster population map
create.population.map <- function(population.raster, title){
  # raster population map
  # arguments:
  #   population.raster:  aggregated population raster layer from WorldPop
  #   title: legend title (as character, in quotes)
  # returns:
  #   a tmap-element (plots a map)
  tm_shape(population.raster)+
    tm_raster(palette = "-viridis", 
              title = title,
              breaks = c(0,100,200,400,600,800,1000,2000,4000,6000,8000))+
    tm_layout(legend.position = c("right", "bottom"),
              frame = FALSE)+
    tm_scale_bar(position = c("left", "bottom"))
}
# Set to static map
tmap_mode("plot")

estimated_pop_2017 <- create.population.map(kasungu_population_2017, title = "2017 Population")

estimated_pop_2018 <- create.population.map(kasungu_population_2018, title = "2018 Population")

estimated_pop_2019 <- create.population.map(kasungu_population_2019, title = "2019 Population")

estimated_pop_2020 <- create.population.map(kasungu_population_2020, title = "2020 Population")

# Layout the maps
tmap_arrange(estimated_pop_2017, estimated_pop_2018, estimated_pop_2019, estimated_pop_2020, nrow = 2)

Fig. 3: Estimated total number of people per 1km grid-cell

Assign dry season malaria cases and population density to new health facility catchments

The WorldPop aggregated population layers (e.g. kasungu_population_2017.tif) and the DHIS2 malaria dataset (dry_season_malaria_2017_2020) are assigned to the new health facility catchments.

# Helper function that assigns malaria data from health facilities to their catchments areas ----------------
assign.malaria.data <- function(catchment_boundary, malaria_data){
  # arguments:
  #   catchment_boundary: sf polygon object of new catchment boundaries
  #   malaria_data: sf point object with a data frame containing the dry season malaria cases
  # returns:
  #   catchments_malaria_sf: sf polygon object with a data frame containing dry season malaria cases


  # Convert sf objects to spatial
  catchment_shp <- as(catchment_boundary, "Spatial")
  
  malaria_shp <- as(malaria_data, "Spatial")

  # Match CRS
  malaria_shp <- spTransform(malaria_shp, crs(catchment_shp))

  # Overlay aggregated health facility points and extract 2017 - 2020 malaria cases
  # Using 'point.in.poly' to return a point spatial object, in this case location of health facilities
  # and estimated population instead of sp::over function, which simply returns 
  # a data frame, with the same no. rows.
  # Argument 'sp = TRUE' returns an sp class object, else returns sf class object
  # Joining the malaria and population dataset using only 'merge' function can't work due to 
  # non-unique columns and differences in row numbers
  
  hospitals_in_catchment <- spatialEco::point.in.poly(malaria_shp, catchment_shp, sp = TRUE) 

  # Add the extracted ID, health facility names and dry season malaria cases to 
  # the health facility catchments (hfc)
  hfc_malaria_shp <- merge(catchment_shp, hospitals_in_catchment, by.x = "DN", by.y = "rowID")

  # Convert the shapefile containing malaria data to sf-object
  hfc_malaria_sf <- sf::st_as_sf(hfc_malaria_shp)

  # Tidy the data by dropping columns not needed
  catchment_malaria <- hfc_malaria_sf %>% 
    dplyr::select(-c(coords.x1, coords.x2))

  return(out = catchment_malaria)
}


# Invoking the function ----------------------------------------------------------------------------------
malaria_by_catchment <- assign.malaria.data(malire_new, health_facility_aggr_sf)

Assign population data to the health catchment areas

# Helper function to extract population from WorldPop raster file and assign --------------------------
# the values to the new catchments.

extract.pop.values <- function(kasungu_pop_raster, catchments){
  # function to extract population from raster file and assign the population to catchments
  # arguments:
  #   kasungu_pop_raster: population raster file clipped to Kasungu district
  #   catchments: shapefile containing the polygons that we wish to use as boundaries
  # returns:
  #   catchments_malaria_pop_sf: sf polygon object containing malaria and population data
  
  # convert from sf to sp
  catchments_sp <- as(catchments, "Spatial")
  
  # Match the CRS (projection) of the population raster
  catchments_sp <- spTransform(catchments_sp, proj4string(kasungu_pop_raster))
  
  # Crop and mask the population raster to exclude Kasungu National Park
  pop_raster_clip <- raster::mask(raster::crop(kasungu_pop_raster, extent(catchments_sp)), catchments_sp)
  
  # Extract zonal statistics from the population raster layer. 
  # The population raster is a continuous gridded surface that holds an 
  # estimated population value for every cell in its grid. 
  # The cell values are summed within each catchment polygon and assigned to it
  # catchments_malaria_pop <- catchments %>% 
  #   dplyr::mutate(pop = round(raster::extract(pop_raster_clip, catchments, fun = sum, na.rm = TRUE)))
  
  pop_by_catchment <- round(raster::extract(pop_raster_clip, catchments, fun = sum, na.rm = TRUE))
  
  pop_by_catchment_df <-  pop_by_catchment %>%  
  # apply unlist to the lists to have vectors as the list elements
  lapply(unlist) %>% 
  # convert vectors to data.frames
  lapply(as_tibble) %>% 
  # combine the list of data.frames
  bind_rows(., .id = "rowID") %>% 
  # rename the value variable
  dplyr::rename(pop = value)
  
  # Add row ID to column to catchment layer
  catchments$rowID <- 1:nrow(catchments)
  
  # Merge catchment areas and population data 
  pop_by_catchments <- merge(catchments, pop_by_catchment_df, by = "rowID")
  
  # Replace any infinite values with NA; the result is assigned back so the
  # cleaning takes effect. Only double columns are touched.
  pop_by_catchments <- pop_by_catchments %>% 
    dplyr::mutate_if(is.double, list(~replace(., is.infinite(.), NA)))

  return(pop_by_catchments)
  
}

# Invoking the function ---------------------------------------------------------------------------------------
malaria_pop_by_catchment_2017 <- extract.pop.values(kasungu_population_2017, malaria_by_catchment)

malaria_pop_by_catchment_2018 <- extract.pop.values(kasungu_population_2018, malaria_by_catchment)

malaria_pop_by_catchment_2019 <- extract.pop.values(kasungu_population_2019, malaria_by_catchment)

malaria_pop_by_catchment_2020 <- extract.pop.values(kasungu_population_2020, malaria_by_catchment)
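The zonal-statistics step inside extract.pop.values() sums the population of all grid cells that fall inside each catchment. The idea can be pictured with a toy grid in base R. The object `toy_cells`, the catchment labels and the counts are hypothetical; the function above works on the WorldPop layers through raster::extract() instead.

```r
# Hypothetical 3 x 3 grid: each cell has a population count and belongs to one catchment
toy_cells <- data.frame(
  cell_id    = 1:9,
  population = c(120, 80, NA, 40, 300, 60, 10, 0, 90),  # NA = masked-out cell
  catchment  = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
)

# Sum the cell populations within each catchment, ignoring masked cells
toy_zonal_pop <- tapply(toy_cells$population, toy_cells$catchment,
                        sum, na.rm = TRUE)
toy_zonal_pop
```

Masked cells (for example, cells removed by cropping and masking) carry NA and are skipped by `na.rm = TRUE`, so they contribute nothing to the catchment total.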

View Kasungu population by catchment maps

Estimated total number of people within health facility catchment areas.

# Helper function to create maps of estimated population by catchment areas --------------------------------
create.population.map <- function(catchment.area, 
                                  variable = "pop", 
                                  title, 
                                  legend.title = "Estimated \n population"){
  # estimated population map
  # catchment.area: sf polygon layer with estimated population (output of extract.pop.values)
  # variable: variable name (as character, in quotes)
  # title: map title in quotes
  # legend.title: legend title in quotes
  # returns:
  #   a tmap-element (plots a map)
  tm_shape(catchment.area)+
    tm_fill(col = variable, 
            breaks = c(0, 13000, 19000, 27000, 35000, 70000, 140000, 200000),
            palette = "YlOrBr",
            title = legend.title)+
    tm_borders(col = "grey",
               lwd = 0.4)+
    tm_layout(legend.position = c(0.75, "bottom"),
              legend.text.size = 0.6,
              legend.title.size = 0.8,
              frame = FALSE)+
    tm_credits(title, 
               position = c(0.3, 0.8), 
               size = 1)
}

# Invoking the function --------------------------------------------------------------------------------
pop_by_catchment_2017 <- create.population.map(malaria_pop_by_catchment_2017, title = "2017")

pop_by_catchment_2018 <- create.population.map(malaria_pop_by_catchment_2018, title = "2018")

pop_by_catchment_2019 <- create.population.map(malaria_pop_by_catchment_2019, title = "2019")

pop_by_catchment_2020 <- create.population.map(malaria_pop_by_catchment_2020, title = "2020")

tmap::tmap_arrange(pop_by_catchment_2017, pop_by_catchment_2018,
                   pop_by_catchment_2019, pop_by_catchment_2020, ncol = 2)

Fig. 4: Estimated population by health facility catchment areas

Population density by catchment

# Helper function to calculate population density by catchment -----------------
calculate.population.density <- function(pop.data){
  
  # Convert to spatial object
  pop.sp <- as(pop.data, "Spatial")

  # Calculate area of catchment polygon in square kilometres
  pop.sp$area_sqkm <- round(rgeos::gArea(pop.sp, byid = TRUE) / (1000 * 1000))

  # Calculate population density
  pop.sp$pop_density <- round(pop.sp$pop / pop.sp$area_sqkm)
  
  # Convert back to sf object
  pop.sf <- sf::st_as_sf(pop.sp)
  
  return(pop.sf)
}

# Invoking function ------------------------------------------------------------
pop_density_2017 <- calculate.population.density(malaria_pop_by_catchment_2017)

pop_density_2018 <- calculate.population.density(malaria_pop_by_catchment_2018)

pop_density_2019 <- calculate.population.density(malaria_pop_by_catchment_2019)

pop_density_2020 <- calculate.population.density(malaria_pop_by_catchment_2020)

# Helper function to create population density maps ----------------------------
create.pop.density.map <- function(pop.density.data,
                                   variable = "pop_density", 
                                   title = NA, 
                                   legend.title = "Population \ndensity/km^2"){
  tm_shape(pop.density.data)+
    tm_fill(col = variable, 
            breaks = c(0, 50, 100, 150, 200, 250, 300, 350),
            palette = "-magma",
            title = legend.title)+
    tm_borders()+
    tm_layout(legend.position = c(0.75, "bottom"),
              legend.text.size = 0.6,
              legend.title.size = 0.8,
              frame = FALSE)+
    tm_credits(title, 
               position = c(0.3, 0.8), 
               size = 1)
}

# Invoking function ------------------------------------------------------------
pop_density_2017_map <- create.pop.density.map(pop_density_2017, title = "2017")

pop_density_2018_map <- create.pop.density.map(pop_density_2018, title = "2018")

pop_density_2019_map <- create.pop.density.map(pop_density_2019, title = "2019")

pop_density_2020_map <- create.pop.density.map(pop_density_2020, title = "2020")

# Layout maps
tmap::tmap_arrange(pop_density_2017_map, pop_density_2018_map,
                   pop_density_2019_map, pop_density_2020_map, ncol = 2)

Fig. 5: Estimated population density by health facility catchment areas

Calculate the expected number of cases for each catchment area

The expected number of dry season malaria cases in catchment i is calculated as the overall risk of malaria in the district (r), i.e. the total number of malaria cases in Kasungu district divided by the total population of the district, multiplied by the number of people in the catchment area (N_i): \[E_i = \frac{\sum_j O_j}{\sum_j N_j}\times N_i\]

The expected numbers of dry season malaria cases are calculated under the assumption that there is no spatial variation in risk, i.e., no difference in infection rates between the catchment areas.
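The formula above can be checked by hand on a small example. The sketch below uses three hypothetical catchments with made-up counts and populations; the object names `toy_expected` and `district_risk` are assumptions for illustration only.

```r
# Hypothetical catchments with made-up observed cases and populations
toy_expected <- data.frame(
  catchment  = c("A", "B", "C"),
  observed   = c(300, 500, 200),
  population = c(20000, 20000, 10000)
)

# District-wide risk: total cases divided by total population
district_risk <- sum(toy_expected$observed) / sum(toy_expected$population)

# Expected cases per catchment if every catchment shared the district-wide risk
toy_expected$expected <- district_risk * toy_expected$population
toy_expected
```

By construction the expected counts add up to the total number of observed cases, which is a quick sanity check on real data as well.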

# Calculate expected malaria cases --------------------------------------------------------------
expected_malaria_2017 <- malaria_pop_by_catchment_2017 %>% 
  dplyr::rename(
    observed_2017 = dr_2017,
     pop_2017 = pop) %>% 
  dplyr::mutate(
    expected_2017 = round(sum(observed_2017)/sum(pop_2017, na.rm = TRUE)*pop_2017))

expected_malaria_2018 <- malaria_pop_by_catchment_2018 %>% 
  dplyr::rename(
    observed_2018 = dr_2018,
    pop_2018 = pop) %>% 
  dplyr::mutate(
    expected_2018 = round(sum(observed_2018)/sum(pop_2018, na.rm = TRUE)*pop_2018))

expected_malaria_2019 <- malaria_pop_by_catchment_2019 %>% 
  dplyr::rename(
    observed_2019 = dr_2019,
    pop_2019 = pop) %>%
  dplyr::mutate(
    expected_2019 = round(sum(observed_2019)/sum(pop_2019, na.rm = TRUE)*pop_2019)) 

expected_malaria_2020 <- malaria_pop_by_catchment_2020 %>% 
  dplyr::rename(
    observed_2020 = dr_2020,
    pop_2020 = pop) %>% 
  dplyr::mutate(
    expected_2020 = round(sum(observed_2020)/sum(pop_2020, na.rm = TRUE)*pop_2020))

Calculate the Standardised Morbidity Ratio of malaria incidences for each catchment area

The SMR compares the risk of morbidity in a population of interest with that of a standard population. In this case, our interest is to find out whether the number of dry season malaria cases in each catchment area is greater than we would expect given the malaria rate for the entire Kasungu district.

We do this by comparing what we observe (O) with what we would expect (E) if the risk of malaria was equal throughout Kasungu. The SMR of catchment i can be calculated as follows: \[SMR_i = \frac{O_i}{E_i}\]
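As a worked example using the 2017 table below: Wimbe Health Centre recorded 2558 dry season cases against 988 expected, so \[SMR_{Wimbe} = \frac{2558}{988} \approx 2.6\] That is, roughly 2.6 times as many cases were recorded as would be expected under the district-wide rate. Conversely, Livwezi Health Centre recorded 594 cases against 1833 expected, giving \[SMR_{Livwezi} = \frac{594}{1833} \approx 0.3\]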

# Calculate Standardised Morbidity Ratio (SMR) -------------------------------------
SMR_2017 <- expected_malaria_2017 %>% 
  dplyr::mutate(SMR = round(observed_2017/expected_2017, 1)) %>% 
  dplyr::select(rowID, Names, pop_2017, observed_2017, expected_2017, SMR) 

SMR_2018 <- expected_malaria_2018 %>% 
  dplyr::mutate(SMR = round(observed_2018/expected_2018, 1)) %>% 
  dplyr::select(rowID, Names, pop_2018, observed_2018, expected_2018, SMR) 

SMR_2019 <- expected_malaria_2019 %>% 
  dplyr::mutate(SMR = round(observed_2019/expected_2019, 1)) %>% 
  dplyr::select(rowID, Names, pop_2019, observed_2019, expected_2019, SMR) 

SMR_2020 <- expected_malaria_2020 %>% 
  dplyr::mutate(SMR = round(observed_2020/expected_2020, 1)) %>% 
  dplyr::select(rowID, Names, pop_2020, observed_2020, expected_2020, SMR)


# Create SMR tables ------------------------------------------------------------
# knitr::kable() is called with its namespace so the tables do not depend on
# knitr being attached
SMR_table_2017 <- SMR_2017 %>% 
  dplyr::as_tibble() %>% 
  dplyr::select(-rowID, -geometry) %>% 
  knitr::kable() %>%
  kableExtra::kable_styling(full_width = FALSE)

SMR_table_2018 <- SMR_2018 %>% 
  dplyr::as_tibble() %>% 
  dplyr::select(-rowID, -geometry) %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling(full_width = FALSE)

SMR_table_2019 <- SMR_2019 %>% 
  dplyr::as_tibble() %>% 
  dplyr::select(-rowID, -geometry) %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling(full_width = FALSE)

SMR_table_2020 <- SMR_2020 %>% 
  dplyr::as_tibble() %>% 
  dplyr::select(-rowID, -geometry) %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling(full_width = FALSE)

SMR_table_2017

Names	pop_2017	observed_2017	expected_2017	SMR
Lodjwa Health Centre	9923	564	826	0.7
Nkhamenya Rural Hospital	40154	2720	3344	0.8
Newa Mpasazi Health Centre	13879	216	1156	0.2
Mpepa /Chisinga Health Centre	27459	1523	2287	0.7
Mnyanja Health Centre	39950	1480	3327	0.4
Simlemba Health Centre	26999	1159	2249	0.5
Ofesi Health Centre	28098	1930	2340	0.8
Chulu Health Centre	27906	3482	2324	1.5
Kapelula Health Centre	35727	2970	2976	1.0
Livwezi Health Centre	22009	594	1833	0.3
Gogode Dispensary	13061	1553	1088	1.4
Dwangwa Dispensary	32704	1153	2724	0.4
Chamama Health Facility	20026	1005	1668	0.6
Wimbe Health Centre	11864	2558	988	2.6
Chinyama	12768	1140	1063	1.1
Mdunga Health Centre	18177	1382	1514	0.9
Mtunthama Health Centre	18744	1982	1561	1.3
Kasungu District Hospital	143490	14663	11951	1.2
Chamwabvi Health Centre	35353	2031	2945	0.7
Linyangwa Health Centre	17772	1987	1480	1.3
Kawamba Health Centre	22865	3845	1904	2.0
Mziza Health Centre	44189	4098	3681	1.1
Kamboni Health Centre	21226	2588	1768	1.5
Khola Health Centre	16956	1012	1412	0.7
Santhe Health Centre	6096	4000	508	7.9
Anchor Farm	48861	1668	4070	0.4
Mkhota Health Centre	21621	1487	1801	0.8

SMR_table_2018

Names	pop_2018	observed_2018	expected_2018	SMR
Lodjwa Health Centre	10281	1151	978	1.2
Nkhamenya Rural Hospital	41642	3343	3962	0.8
Newa Mpasazi Health Centre	14248	434	1356	0.3
Mpepa /Chisinga Health Centre	28488	2616	2710	1.0
Mnyanja Health Centre	41856	1715	3982	0.4
Simlemba Health Centre	27455	1506	2612	0.6
Ofesi Health Centre	29002	1773	2759	0.6
Chulu Health Centre	28832	3330	2743	1.2
Kapelula Health Centre	37630	3480	3580	1.0
Livwezi Health Centre	22544	1128	2145	0.5
Gogode Dispensary	13368	2550	1272	2.0
Dwangwa Dispensary	33534	1216	3191	0.4
Chamama Health Facility	20372	1226	1938	0.6
Wimbe Health Centre	11814	3167	1124	2.8
Chinyama	13138	1673	1250	1.3
Mdunga Health Centre	18928	1894	1801	1.1
Mtunthama Health Centre	19074	3358	1815	1.9
Kasungu District Hospital	147175	12019	14003	0.9
Chamwabvi Health Centre	36167	2079	3441	0.6
Linyangwa Health Centre	18032	1500	1716	0.9
Kawamba Health Centre	22902	3881	2179	1.8
Mziza Health Centre	46208	5689	4396	1.3
Kamboni Health Centre	21430	3250	2039	1.6
Khola Health Centre	17315	1697	1647	1.0
Santhe Health Centre	6244	4158	594	7.0
Anchor Farm	49871	2037	4745	0.4
Mkhota Health Centre	22167	4218	2109	2.0

SMR_table_2019

Names	pop_2019	observed_2019	expected_2019	SMR
Lodjwa Health Centre	10608	1168	942	1.2
Nkhamenya Rural Hospital	43293	3932	3843	1.0
Newa Mpasazi Health Centre	14780	626	1312	0.5
Mpepa /Chisinga Health Centre	29456	4169	2615	1.6
Mnyanja Health Centre	43783	2504	3887	0.6
Simlemba Health Centre	28076	1788	2492	0.7
Ofesi Health Centre	30065	2124	2669	0.8
Chulu Health Centre	29731	3537	2639	1.3
Kapelula Health Centre	39747	3357	3528	1.0
Livwezi Health Centre	22945	435	2037	0.2
Gogode Dispensary	13641	1469	1211	1.2
Dwangwa Dispensary	34415	1370	3055	0.4
Chamama Health Facility	20701	1127	1838	0.6
Wimbe Health Centre	11855	2162	1052	2.1
Chinyama	13475	1260	1196	1.1
Mdunga Health Centre	19960	1485	1772	0.8
Mtunthama Health Centre	19385	1718	1721	1.0
Kasungu District Hospital	151079	13052	13411	1.0
Chamwabvi Health Centre	36899	1180	3275	0.4
Linyangwa Health Centre	18279	2692	1623	1.7
Kawamba Health Centre	23041	3469	2045	1.7
Mziza Health Centre	48340	5689	4291	1.3
Kamboni Health Centre	21509	2537	1909	1.3
Khola Health Centre	17761	2139	1577	1.4
Santhe Health Centre	6435	4424	571	7.7
Anchor Farm	50995	1369	4527	0.3
Mkhota Health Centre	22677	2268	2013	1.1

SMR_table_2020

Names	pop_2020	observed_2020	expected_2020	SMR
Lodjwa Health Centre	13081	1788	1537	1.2
Nkhamenya Rural Hospital	53692	8539	6308	1.4
Newa Mpasazi Health Centre	18311	2182	2151	1.0
Mpepa /Chisinga Health Centre	36317	5186	4266	1.2
Mnyanja Health Centre	54649	6117	6420	1.0
Simlemba Health Centre	34240	5310	4022	1.3
Ofesi Health Centre	37240	2323	4375	0.5
Chulu Health Centre	36638	7160	4304	1.7
Kapelula Health Centre	50214	7297	5899	1.2
Livwezi Health Centre	27786	1028	3264	0.3
Gogode Dispensary	16681	2767	1960	1.4
Dwangwa Dispensary	42282	2869	4967	0.6
Chamama Health Facility	25248	635	2966	0.2
Wimbe Health Centre	14367	2233	1688	1.3
Chinyama	16463	1605	1934	0.8
Mdunga Health Centre	25108	3169	2950	1.1
Mtunthama Health Centre	23501	1882	2761	0.7
Kasungu District Hospital	185282	19393	21767	0.9
Chamwabvi Health Centre	45106	1128	5299	0.2
Linyangwa Health Centre	22144	4380	2601	1.7
Kawamba Health Centre	27961	7073	3285	2.2
Mziza Health Centre	60510	5689	7109	0.8
Kamboni Health Centre	25750	4665	3025	1.5
Khola Health Centre	21929	3426	2576	1.3
Santhe Health Centre	7917	4891	930	5.3
Anchor Farm	62633	1665	7358	0.2
Mkhota Health Centre	27830	4592	3269	1.4

View observed and expected dry season malaria cases

# Helper function to create maps of observed and expected dry season malaria cases
create.malaria.map <- function(malaria.data, 
                               variable = NA, 
                               title = NA, 
                               legend.title = NA){
  # observed and expected malaria incidence map
  # malaria.data: data frame containing observed and expected malaria cases
  # variable: variable name (as character, in quotes e.g. "observed")
  # title: map title in quotes
  # legend.title: legend title in quotes
  # returns:
  #   a tmap-element (plots a map)
  tm_shape(malaria.data)+
    tm_fill(col = variable, 
            breaks = c(0, 500, 1000, 2500, 5000, 10000, 15000, 20000, 25000),
            palette = "YlOrRd",
            title = legend.title)+
    tm_borders(lwd = 0.3)+
    tm_layout(legend.position = c(0.75,"bottom"),
              legend.text.size = 0.5,
              legend.title.size = 0.7,
              frame = FALSE)+
    tm_credits(title, 
               position = c(0.2, 0.8), 
               size = 1)
}

# Invoking the function
# 2017 observed and expected malaria cases -------------------------------------
observed_malaria_2017_map <- create.malaria.map(malaria_pop_by_catchment_2017, 
                                                variable = "dr_2017",
                                                title = "2017",
                                                legend.title = "Observed malaria")

expected_malaria_2017_map <- create.malaria.map(expected_malaria_2017,
                                                variable = "expected_2017",
                                                title = "2017",
                                                legend.title = "Expected malaria")

# 2018 observed and expected malaria cases -------------------------------------
observed_malaria_2018_map <- create.malaria.map(malaria_pop_by_catchment_2018,
                                                variable = "dr_2018",
                                                title = "2018",
                                                legend.title = "Observed malaria")

expected_malaria_2018_map <- create.malaria.map(expected_malaria_2018,
                                                variable = "expected_2018",
                                                title = "2018",
                                                legend.title = "Expected malaria")

# 2019 observed and expected malaria cases -------------------------------------
observed_malaria_2019_map <- create.malaria.map(malaria_pop_by_catchment_2019,
                                                variable = "dr_2019",
                                                title = "2019",
                                                legend.title = "Observed malaria")

expected_malaria_2019_map <- create.malaria.map(expected_malaria_2019,
                                                variable = "expected_2019",
                                                title = "2019",
                                                legend.title = "Expected malaria")

# 2020 observed and expected malaria cases -------------------------------------
observed_malaria_2020_map <- create.malaria.map(malaria_pop_by_catchment_2020,
                                                variable = "dr_2020",
                                                title = "2020",
                                                legend.title = "Observed malaria")

expected_malaria_2020_map <- create.malaria.map(expected_malaria_2020,
                                                variable = "expected_2020",
                                                title = "2020",
                                                legend.title = "Expected malaria")

# Layout maps ------------------------------------------------------------------
tmap::tmap_arrange(observed_malaria_2017_map, expected_malaria_2017_map,
                   observed_malaria_2018_map, expected_malaria_2018_map, 
                   observed_malaria_2019_map, expected_malaria_2019_map,
                   observed_malaria_2020_map, expected_malaria_2020_map, ncol = 2)

Fig. 6: Observed and expected malaria incidence by health facility catchment area, Kasungu

View SMR by catchment

A ratio greater than 1.0 indicates that more malaria cases have occurred than would have been expected, while a ratio less than 1.0 indicates that fewer cases have occurred than expected.
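The interpretation rule can be sketched in a few lines of R. The three SMR values below are copied from the 2017 table above; the data frame `smr_example` and the `interpretation` column are names introduced here purely for illustration. Because the SMR values are rounded to one decimal place, a value of exactly 1.0 should be read as "approximately as expected".

```r
# Three catchments taken from the 2017 SMR table (SMR rounded to 1 decimal)
smr_example <- data.frame(
  Names = c("Wimbe Health Centre", "Kapelula Health Centre", "Livwezi Health Centre"),
  SMR   = c(2.6, 1.0, 0.3)
)

# Label each catchment according to whether its SMR is above or below 1
smr_example$interpretation <- ifelse(
  smr_example$SMR > 1, "more cases than expected",
  ifelse(smr_example$SMR < 1, "fewer cases than expected", "as expected")
)

print(smr_example)
```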

# max(SMR_2017$SMR)
# [1] 7.9
# max(SMR_2018$SMR)
# [1] 7
# max(SMR_2019$SMR)
# [1] 7.7
# max(SMR_2020$SMR)
# [1] 5.3

# Helper function to create maps of SMR by catchment ----------------------------------
create.smr.map <- function(smr.data, 
                           variable = "SMR", 
                           title = NA, 
                           legend.title = "SMR"){
  # SMR by catchment map
  # smr.data: sf polygon object containing SMR by catchment data
  # variable: variable name (as character, in quotes)
  # title: map title in quotes
  # legend.title: legend title in quotes
  # returns:
  #   a tmap-element (plots a map)
 
  tm_shape(smr.data)+
    tm_fill(col = variable, 
            breaks = c(0, 0.5, 1, 1.5, 2, 2.5, 5, 8),
            palette = "-magma",
            title = legend.title)+
    tm_borders(lwd = 0.3)+
    tm_layout(legend.position = c(0.75,"bottom"),
              legend.text.size = 0.5,
              legend.title.size = 0.7,
              frame = FALSE)+
    tm_credits(title, 
               position = c(0.2, 0.8), 
               size = 1)
}

# Invoking function -------------------------------------------------------------------
SMR_2017_map <- create.smr.map(SMR_2017, title = "2017")

SMR_2018_map <- create.smr.map(SMR_2018, title = "2018")

SMR_2019_map <- create.smr.map(SMR_2019, title = "2019")

SMR_2020_map <- create.smr.map(SMR_2020, title = "2020")

# Layout maps -------------------------------------------------------------------------
tmap::tmap_arrange(SMR_2017_map, SMR_2018_map, SMR_2019_map, SMR_2020_map, ncol = 2)

Fig. 7: Standardised morbidity ratio of malaria by health facility catchment

Calculate the proportion of the catchment population living within 1km, 2km, 3km of water bodies

First, using st_buffer, we compute 1km, 2km and 3km buffers around dry season water bodies derived from Landsat satellite imagery using the TropWet tool in Google Earth Engine. The buffer geometries are then combined (dissolved) so that internal boundaries between overlapping buffers are removed, which allows population values to be extracted from the WorldPop raster without counting overlapping areas twice. Finally, we calculate the proportion of people in each catchment area living within these buffers.
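The buffer-and-dissolve step can be illustrated with a minimal sketch on two hypothetical water bodies. The point coordinates and the choice of EPSG:32736 (UTM zone 36S) are assumptions made for this example only; what matters is that the data are in a projected coordinate system with metre units so that the buffer distance is interpreted in metres.

```r
library(sf)

# Two hypothetical water bodies 1.5 km apart, in a projected CRS (metres).
# EPSG:32736 and the coordinates are illustrative assumptions.
water <- st_sfc(st_point(c(550000, 8560000)),
                st_point(c(551500, 8560000)),
                crs = 32736)

# 1 km buffer around each water body: two overlapping polygons
buffers <- st_buffer(water, dist = 1000)

# Dissolve the buffers: the overlap is merged into one geometry
dissolved <- st_union(buffers)

length(buffers)    # number of separate buffers before dissolving
length(dissolved)  # number of geometries after dissolving

# The dissolved area is smaller than the summed areas of the separate
# buffers, because the overlap is no longer counted twice
as.numeric(st_area(dissolved)) < sum(as.numeric(st_area(buffers)))
```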

# Combine and transform TropWet derived waterbody polygons -------------------------------
surface_waterbodies_2017 <- sf::st_as_sf(
  st_cast(
    st_union(
      st_buffer(dryseason_waterbodies_2017, dist = 30)), "POLYGON"))

surface_waterbodies_2018 <- sf::st_as_sf(
  st_cast(
    st_union(
      st_buffer(dryseason_waterbodies_2018, dist = 30)), "POLYGON"))

surface_waterbodies_2019 <- sf::st_as_sf(
  st_cast(
    st_union(
      st_buffer(dryseason_waterbodies_2019, dist = 30)), "POLYGON"))

surface_waterbodies_2020 <- sf::st_as_sf(
  st_cast(
    st_union(
      st_buffer(dryseason_waterbodies_2020, dist = 30)), "POLYGON"))

# Helper function to compute 1km, 2km and 3km buffers around the water bodies ---------------------

create.waterbody.buffer <- function(waterbody, distance, catchment){
  # function for creating buffers around waterbodies
  # arguments:
  #   waterbody:  waterbody shapefile
  #   distance: buffer distance in meters
  #   catchment: catchment area shapefile
  # returns:
  #   buffered waterbodies 
  
  # Create buffers around water bodies
  buffer_radius <- sf::st_buffer(waterbody, distance)
  
  # Dissolve overlapping buffers into a single geometry
  buffer_union <- sf::st_as_sf(
    sf::st_cast(sf::st_union(buffer_radius), "MULTIPOLYGON"))
  
  # Clip the dissolved buffer by catchment so that each piece carries
  # the attributes of the catchment it falls in
  buffer_intersect <- sf::st_intersection(buffer_union, catchment)
  
  buffer_intersect_sf <- sf::st_as_sf(buffer_intersect)
  
  # Apply a zero-width buffer (a common way to tidy up invalid geometries),
  # then split the MULTIPOLYGON features into individual POLYGON features
  buffer_intersect_polygons <- sf::st_cast(
    sf::st_buffer(buffer_intersect_sf, 0.0), "MULTIPOLYGON") %>% 
    sf::st_cast("POLYGON")
  
  # Make the assumption that the attribute is constant throughout the geometry
  sf::st_agr(buffer_intersect_polygons) <- "constant"
  
  return(buffer_intersect_polygons)
}


# Invoking function
# For 2017 TropWet surface water polygons --------------------------------------------------------
buffer_1km_2017 <- create.waterbody.buffer(waterbody = surface_waterbodies_2017, 
                                           distance = 1000, 
                                           catchment = malire_new)

buffer_2km_2017 <- create.waterbody.buffer(waterbody = surface_waterbodies_2017, 
                                           distance = 2000, 
                                           catchment = malire_new)

buffer_3km_2017 <- create.waterbody.buffer(waterbody = surface_waterbodies_2017, 
                                           distance = 3000,
                                           catchment = malire_new)

# For 2018 TropWet surface water polygons --------------------------------------------------------
buffer_1km_2018 <- create.waterbody.buffer(waterbody = surface_waterbodies_2018, 
                                           distance = 1000, 
                                           catchment = malire_new)

buffer_2km_2018 <- create.waterbody.buffer(waterbody = surface_waterbodies_2018, 
                                           distance = 2000, 
                                           catchment = malire_new)

buffer_3km_2018 <- create.waterbody.buffer(waterbody = surface_waterbodies_2018, 
                                           distance = 3000, 
                                           catchment = malire_new)
 
# For 2019 TropWet surface water polygons ------------------------------------------------------
buffer_1km_2019 <- create.waterbody.buffer(waterbody = surface_waterbodies_2019, 
                                           distance = 1000, 
                                           catchment = malire_new)

buffer_2km_2019 <- create.waterbody.buffer(waterbody = surface_waterbodies_2019, 
                                           distance = 2000, 
                                           catchment = malire_new)

buffer_3km_2019 <- create.waterbody.buffer(waterbody = surface_waterbodies_2019, 
                                           distance = 3000, 
                                           catchment = malire_new)

# For 2020 TropWet surface water polygons ------------------------------------------------------
buffer_1km_2020 <- create.waterbody.buffer(waterbody = surface_waterbodies_2020, 
                                           distance = 1000, 
                                           catchment = malire_new)

buffer_2km_2020 <- create.waterbody.buffer(waterbody = surface_waterbodies_2020, 
                                           distance = 2000, 
                                           catchment = malire_new)

buffer_3km_2020 <- create.waterbody.buffer(waterbody = surface_waterbodies_2020, 
                                           distance = 3000, 
                                           catchment = malire_new)

View the created waterbody buffers

# Map the buffers
create.buffer.map <- function(buffers, boundary = malire_new, title = NA){
  # function for creating buffer map in ggplot
  # arguments:
  #   buffers:  waterbodies buffer polygon layer
  #   boundary: health facility catchment polygons
  #   title: main title
  # returns:
  #   a map-element (plots a map)
  ggplot(data = buffers)+
     geom_sf()+
     geom_sf(data = boundary, 
             fill = NA)+
     theme_void()+
     labs(title = title)
}

# Invoking the function
# For 2017 -------------------------------------------------------------------------------
buffer_1km_2017_map <- create.buffer.map(buffer_1km_2017, title = "2017: 1km Buffers")

buffer_2km_2017_map <- create.buffer.map(buffer_2km_2017, title = "2017: 2km Buffers")

buffer_3km_2017_map <- create.buffer.map(buffer_3km_2017, title = "2017: 3km Buffers")

# For 2018 --------------------------------------------------------------------------------
buffer_1km_2018_map <- create.buffer.map(buffer_1km_2018, title = "2018: 1km Buffers")

buffer_2km_2018_map <- create.buffer.map(buffer_2km_2018, title = "2018: 2km Buffers")

buffer_3km_2018_map <- create.buffer.map(buffer_3km_2018, title = "2018: 3km Buffers")

# For 2019 ---------------------------------------------------------------------------------
buffer_1km_2019_map <- create.buffer.map(buffer_1km_2019, title = "2019: 1km Buffers")

buffer_2km_2019_map <- create.buffer.map(buffer_2km_2019, title = "2019: 2km Buffers")

buffer_3km_2019_map <- create.buffer.map(buffer_3km_2019, title = "2019: 3km Buffers")

# For 2020 --------------------------------------------------------------------------------
buffer_1km_2020_map <- create.buffer.map(buffer_1km_2020, title = "2020: 1km Buffers")

buffer_2km_2020_map <- create.buffer.map(buffer_2km_2020, title = "2020: 2km Buffers")

buffer_3km_2020_map <- create.buffer.map(buffer_3km_2020, title = "2020: 3km Buffers")
 
# Layout maps (grid.arrange() comes from the gridExtra package)
gridExtra::grid.arrange(
  buffer_1km_2017_map, buffer_1km_2018_map, buffer_1km_2019_map, buffer_1km_2020_map,
  buffer_2km_2017_map, buffer_2km_2018_map, buffer_2km_2019_map, buffer_2km_2020_map, 
  buffer_3km_2017_map, buffer_3km_2018_map, buffer_3km_2019_map, buffer_3km_2020_map, ncol = 4)

Fig. 8: Buffers around dry season waterbodies in Kasungu

Extract the population living within waterbody buffers by catchment area

# Helper function to calculate estimated number of people living within waterbody buffers
# in each catchment area
estimate.buffer.pop <- function(catchment.population, buffers, catchment.area){
  
  # Extract population estimates from WorldPop raster
  buffers$buffer_pop <- raster::extract(catchment.population,
                                        buffers, 
                                        fun = sum, 
                                        na.rm = TRUE)
                                               
                                              
  # Find which catchment each buffer polygon belongs to using its centroid,
  # i.e. the point at the geographic centre of the polygon
  buffer_by_catchment <- st_intersection(st_centroid(buffers), catchment.area)
  
  # A catchment can contain several buffer polygons, so their populations are
  # summed per catchment (identified by DN). 
  # group_by and summarize from the dplyr package, applied to an sf object, 
  # also combine the geometries in each group, much like the "dissolve" 
  # operation in QGIS or ArcMap. 
  buffer_pop_aggregated <- buffer_by_catchment %>% 
    dplyr::group_by(DN) %>%
    dplyr::summarize(
      buffer_pop_aggregated = round(sum(buffer_pop, na.rm = TRUE)))
  
  # Join the aggregated buffer population back to the catchment polygons,
  # keeping every catchment (all.x = TRUE)
  buffer_pop <- merge(
    catchment.area, st_drop_geometry(
      buffer_pop_aggregated), by = 'DN', all.x = TRUE)
  
  return(buffer_pop)
  
}

# Invoking the function and calculating proportion of 
# catchment population living within buffers
# 2017 buffer population -------------------------------------------------------
buffer_pop_1km_2017 <- estimate.buffer.pop(
  kasungu_population_2017, 
  buffer_1km_2017, 
  malaria_pop_by_catchment_2017) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100)) 

buffer_pop_2km_2017 <- estimate.buffer.pop(
  kasungu_population_2017,
  buffer_2km_2017,
  malaria_pop_by_catchment_2017) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100)) 

buffer_pop_3km_2017 <- estimate.buffer.pop(
  kasungu_population_2017,
  buffer_3km_2017,
  malaria_pop_by_catchment_2017) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))

# 2018 buffer population -------------------------------------------------------
buffer_pop_1km_2018 <- estimate.buffer.pop(
  kasungu_population_2018,
  buffer_1km_2018,
  malaria_pop_by_catchment_2018) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))

buffer_pop_2km_2018 <- estimate.buffer.pop(
  kasungu_population_2018,
  buffer_2km_2018,
  malaria_pop_by_catchment_2018) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))

buffer_pop_3km_2018 <- estimate.buffer.pop(
  kasungu_population_2018,
  buffer_3km_2018,
  malaria_pop_by_catchment_2018) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))

# 2019 buffer population -------------------------------------------------------
buffer_pop_1km_2019 <- estimate.buffer.pop(
  kasungu_population_2019,
  buffer_1km_2019,
  malaria_pop_by_catchment_2019) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))

buffer_pop_2km_2019 <- estimate.buffer.pop(
  kasungu_population_2019,
  buffer_2km_2019,
  malaria_pop_by_catchment_2019) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))

buffer_pop_3km_2019 <- estimate.buffer.pop(
  kasungu_population_2019,
  buffer_3km_2019,
  malaria_pop_by_catchment_2019) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))

# 2020 buffer population -------------------------------------------------------
buffer_pop_1km_2020 <- estimate.buffer.pop(
  kasungu_population_2020,
  buffer_1km_2020,
  malaria_pop_by_catchment_2020) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))

buffer_pop_2km_2020 <- estimate.buffer.pop(
  kasungu_population_2020,
  buffer_2km_2020,
  malaria_pop_by_catchment_2020) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))

buffer_pop_3km_2020 <- estimate.buffer.pop(
  kasungu_population_2020,
  buffer_3km_2020,
  malaria_pop_by_catchment_2020) %>% 
  dplyr::rename(catchment_pop = pop,
                buffer_pop = buffer_pop_aggregated) %>% 
  dplyr::mutate(
    prop_buffer_catchment_pop = round((buffer_pop/catchment_pop)*100))
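The percentage calculated in the calls above can be checked on a small toy table. The data frame `toy_buffer_pop` and its values are hypothetical. The third row sketches what we would expect for a catchment that contains no buffer polygon: because the helper merges with `all.x = TRUE`, such a catchment is kept but its buffer population is missing, so its percentage is `NA` rather than 0.

```r
# Hypothetical catchments; DN 3 has no buffer polygon, so buffer_pop is NA
toy_buffer_pop <- data.frame(
  DN            = c(1, 2, 3),
  catchment_pop = c(10000, 20000, 5000),
  buffer_pop    = c(2500, 5000, NA)
)

# Percentage of the catchment population living within the buffers
toy_buffer_pop$prop_buffer_catchment_pop <- round(
  (toy_buffer_pop$buffer_pop / toy_buffer_pop$catchment_pop) * 100
)

print(toy_buffer_pop)
```

Whether such missing values should be treated as zero before modelling is a judgment call that depends on how the buffers were derived.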

Map the proportion of the catchment population living within waterbody buffers

# Helper function to create maps of proportion of people living in proximity ----------
# to water bodies in each catchment area
create.pop.proportion.map <- function(pop.data, 
                                      variable = "prop_buffer_catchment_pop", 
                                      title = NA, 
                                      legend.title = NA){
 
  # pop.data: sf polygon object containing proportion of catchment population 
  #           living within waterbody buffers
  # variable: variable name (as character, in quotes)
  # title: map title in quotes
  # legend.title: legend title in quotes
  # returns:
  #   a tmap-element (plots a map)
 
  tm_shape(pop.data)+
    tm_fill(col = variable, 
            breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
            palette = "YlOrBr",
            title = legend.title)+
    tm_borders(lwd = 0.3)+
    tm_layout(legend.position = c(0.8,"bottom"),
              legend.text.size = 0.5,
              legend.title.size = 0.7,
              frame = FALSE)+
    tm_credits(title, 
               position = c(0.25, 0.75), 
               size = 1)
}

# Invoking function 
# 2017 population proportion ---------------------------------------------------
pop_proportion_1km_2017_map <- create.pop.proportion.map(
  buffer_pop_1km_2017, 
  title = "2017",
  legend.title = "Population within \n1km buffers (%)")

pop_proportion_2km_2017_map <- create.pop.proportion.map(
  buffer_pop_2km_2017, 
  title = "2017",
  legend.title = "Population within \n2km buffers (%)")

pop_proportion_3km_2017_map <- create.pop.proportion.map(
  buffer_pop_3km_2017,
  title = "2017",
  legend.title = "Population within \n3km buffers (%)")

# 2018 population proportion ---------------------------------------------------
pop_proportion_1km_2018_map <- create.pop.proportion.map(
  buffer_pop_1km_2018,
  title = "2018",
  legend.title = "Population within \n1km buffers (%)")

pop_proportion_2km_2018_map <- create.pop.proportion.map(
  buffer_pop_2km_2018,
  title = "2018",
  legend.title = "Population within \n2km buffers (%)")

pop_proportion_3km_2018_map <- create.pop.proportion.map(
  buffer_pop_3km_2018,
  title = "2018",
  legend.title = "Population within \n3km buffers (%)")

# 2019 population proportion ---------------------------------------------------
pop_proportion_1km_2019_map <- create.pop.proportion.map(
  buffer_pop_1km_2019,
  title = "2019",
  legend.title = "Population within \n1km buffers (%)")

pop_proportion_2km_2019_map <- create.pop.proportion.map(
  buffer_pop_2km_2019,
  title = "2019",
  legend.title = "Population within \n2km buffers (%)")

pop_proportion_3km_2019_map <- create.pop.proportion.map(
  buffer_pop_3km_2019,
  title = "2019",
  legend.title = "Population within \n3km buffers (%)")

# 2020 population proportion ---------------------------------------------------
pop_proportion_1km_2020_map <- create.pop.proportion.map(
  buffer_pop_1km_2020,
  title = "2020",
  legend.title = "Population within \n1km buffers (%)")

pop_proportion_2km_2020_map <- create.pop.proportion.map(
  buffer_pop_2km_2020,
  title = "2020",
  legend.title = "Population within \n2km buffers (%)")

pop_proportion_3km_2020_map <- create.pop.proportion.map(
  buffer_pop_3km_2020,
  title = "2020",
  legend.title = "Population within \n3km buffers (%)")

# Layout maps ------------------------------------------------------------------
tmap::tmap_arrange(pop_proportion_1km_2017_map, pop_proportion_2km_2017_map, 
                   pop_proportion_3km_2017_map, pop_proportion_1km_2018_map,
                   pop_proportion_2km_2018_map, pop_proportion_3km_2018_map,
                   pop_proportion_1km_2019_map, pop_proportion_2km_2019_map,
                   pop_proportion_3km_2019_map, pop_proportion_1km_2020_map,
                   pop_proportion_2km_2020_map, pop_proportion_3km_2020_map, ncol = 3)

Fig. 9: Proportion of catchment population living within waterbody buffers

Scatter plots of SMR against the proportion of the catchment population living within waterbody buffers

A correlation coefficient greater than zero (here we take r > 0.1) indicates some positive association between the SMR and the buffer population variables. That is, the SMR of dry season malaria tends to increase as the proportion of the catchment population living near water bodies increases.
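As a minimal sketch of how such a coefficient is obtained, the example below runs a Pearson correlation test on a hypothetical data frame `toy_smr_pop`. The values are invented for illustration and do not come from the Kasungu data. As far as we are aware, ggpubr::stat_cor, which is used in the scatter plots, also reports Pearson's r by default.

```r
# Hypothetical data: percentage of population within 1km buffers and SMR
toy_smr_pop <- data.frame(
  prop_pop_1km = c(5, 10, 20, 40, 60),
  SMR          = c(0.4, 0.8, 1.0, 1.5, 2.1)
)

# Pearson correlation between buffer population percentage and SMR
cor_result <- cor.test(toy_smr_pop$prop_pop_1km, toy_smr_pop$SMR,
                       method = "pearson")

cor_result$estimate  # correlation coefficient r
cor_result$p.value   # p-value for the null hypothesis r = 0
```

With only a few catchments per year, a single influential catchment can shift r considerably, so the coefficient is best read together with the scatter plot.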

# Helper function to tidy and bind the SMR and proportion of -------------------
# buffer-catchment population data frames
tidy.data <- function(smr.df, 
                      proportion.pop.1km, 
                      proportion.pop.2km,
                      proportion.pop.3km){

  # Convert the sf objects to data frames ----------------------------------------
  smr_df <- as.data.frame(smr.df) %>% 
    dplyr::select(rowID, Names, SMR)

  proportion_pop_1km_df <- as.data.frame(proportion.pop.1km) %>% 
    dplyr::select(rowID, prop_pop_1km = prop_buffer_catchment_pop)

  proportion_pop_2km_df <- as.data.frame(proportion.pop.2km) %>% 
    dplyr::select(rowID, prop_pop_2km = prop_buffer_catchment_pop)

  proportion_pop_3km_df <- as.data.frame(proportion.pop.3km) %>% 
    dplyr::select(rowID, prop_pop_3km = prop_buffer_catchment_pop)

  # Merge SMR and population data frames ---------------------------------------
  combined_1 <- merge(smr_df, proportion_pop_1km_df, by = "rowID", all = TRUE)

  combined_2 <- merge(proportion_pop_2km_df, proportion_pop_3km_df, by = "rowID")

  combined_fully <- merge(combined_1, combined_2, by = "rowID", all = TRUE)

  # Return the combined data frame explicitly
  return(combined_fully)
}

# Invoking the function --------------------------------------------------------
smr_pop_2017 <- tidy.data(SMR_2017, buffer_pop_1km_2017, buffer_pop_2km_2017, buffer_pop_3km_2017)

smr_pop_2018 <- tidy.data(SMR_2018, buffer_pop_1km_2018, buffer_pop_2km_2018, buffer_pop_3km_2018)

smr_pop_2019 <- tidy.data(SMR_2019, buffer_pop_1km_2019, buffer_pop_2km_2019, buffer_pop_3km_2019)

smr_pop_2020 <- tidy.data(SMR_2020, buffer_pop_1km_2020, buffer_pop_2km_2020, buffer_pop_3km_2020)

# Helper function to create scatter plots --------------------------------------
create.scatter.plot <- function(smr.pop.df, 
                                independent.var = NA,
                                dependent.var = "SMR",
                                x.label = NA,
                                plot.title = NA){
  
  scatter.plot <- ggpubr::ggscatter(smr.pop.df,          # data frame
                                    x = independent.var, # x-axis variable
                                    y = dependent.var,   # y-axis variable
                                    add = "reg.line",    # Add regression line
                                    conf.int = TRUE,     # Add confidence interval
                                    add.params = list(color = "red",
                                                      fill = "lightgray"),
                                    palette = "jco",     # journal color palette. see ?ggpar
                                    xlab = x.label,      # x-axis label
                                    ylab = "SMR",        # y-axis label
                                    title = plot.title)+    
                  ggpubr::stat_cor(label.y = 4)+         # Add correlation coefficient
                  ggpubr::font("title", size = 10, face = "bold")+
                  ggpubr::font("xlab", size = 10)+
                  ggpubr::font("ylab", size = 10)
 
  return(scatter.plot)
  
}

# Invoking function 
# 2017 scatter plots ------------------------------------------------------------

scatter_1km_2017 <- create.scatter.plot(smr_pop_2017, independent.var = "prop_pop_1km",
                                       x.label = "Percentage of catchment population \nliving in 1km buffer",
                                       plot.title = "2017")

scatter_2km_2017 <- create.scatter.plot(smr_pop_2017, independent.var = "prop_pop_2km",
                                        x.label = "Percentage of catchment population \nliving in 2km buffer",
                                        plot.title = "2017")

scatter_3km_2017 <- create.scatter.plot(smr_pop_2017, independent.var = "prop_pop_3km",
                                        x.label = "Percentage of catchment population \nliving in 3km buffer",
                                        plot.title = "2017")

# 2018 scatter plots -----------------------------------------------------------
scatter_1km_2018 <- create.scatter.plot(smr_pop_2018, independent.var = "prop_pop_1km",
                                        x.label = "Percentage of catchment population \nliving in 1km buffer",
                                        plot.title = "2018")

scatter_2km_2018 <- create.scatter.plot(smr_pop_2018, independent.var = "prop_pop_2km",
                                        x.label = "Percentage of catchment population \nliving in 2km buffer",
                                        plot.title = "2018")

scatter_3km_2018 <- create.scatter.plot(smr_pop_2018, independent.var = "prop_pop_3km",
                                        x.label = "Percentage of catchment population \nliving in 3km buffer",
                                        plot.title = "2018")

# 2019 scatter plots -----------------------------------------------------------
scatter_1km_2019 <- create.scatter.plot(smr_pop_2019, independent.var = "prop_pop_1km",
                                        x.label = "Percentage of catchment population \nliving in 1km buffer",
                                        plot.title = "2019")

scatter_2km_2019 <- create.scatter.plot(smr_pop_2019, independent.var = "prop_pop_2km",
                                        x.label = "Percentage of catchment population \nliving in 2km buffer",
                                        plot.title = "2019")

scatter_3km_2019 <- create.scatter.plot(smr_pop_2019, independent.var = "prop_pop_3km",
                                        x.label = "Percentage of catchment population \nliving in 3km buffer",
                                        plot.title = "2019")

# 2020 scatter plots -----------------------------------------------------------
scatter_1km_2020 <- create.scatter.plot(smr_pop_2020, independent.var = "prop_pop_1km",
                                        x.label = "Percentage of catchment population \nliving in 1km buffer",
                                        plot.title = "2020")

scatter_2km_2020 <- create.scatter.plot(smr_pop_2020, independent.var = "prop_pop_2km",
                                        x.label = "Percentage of catchment population \nliving in 2km buffer",
                                        plot.title = "2020")

scatter_3km_2020 <- create.scatter.plot(smr_pop_2020, independent.var = "prop_pop_3km",
                                        x.label = "Percentage of catchment population \nliving in 3km buffer",
                                        plot.title = "2020")

# Arrange the plots ------------------------------------------------------------
ggpubr::ggarrange(scatter_1km_2017, scatter_2km_2017, scatter_3km_2017,
                  scatter_1km_2018, scatter_2km_2018, scatter_3km_2018,
                  scatter_1km_2019, scatter_2km_2019, scatter_3km_2019, 
                  scatter_1km_2020, scatter_2km_2020, scatter_3km_2020,
                  ncol = 3, nrow = 4)

Fig 10. Relationship between standardised morbidity ratio and living near waterbodies

Model fitting
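
For reference, the Poisson models fitted in this section can be written as follows. This is a sketch in our own notation: \(x_i\) stands for the buffer population variable (prop_pop_1km, prop_pop_2km or prop_pop_3km) of catchment i, \(\theta_i\) for the relative risk in catchment i, and \(\beta_0, \beta_1\) for the regression coefficients. \(O_i\) and \(E_i\) are the observed and expected cases used in the SMR definition.

```latex
\[
O_i \sim \mathrm{Poisson}(E_i \, \theta_i), \qquad
\log \theta_i = \beta_0 + \beta_1 x_i
\]
\[
\log \mathbb{E}[O_i] = \log E_i + \beta_0 + \beta_1 x_i
\]
```

The term \(\log E_i\) enters with a fixed coefficient of 1, which is what offset(log(expected_*)) does in the glm calls. \(\exp(\beta_1)\) is the multiplicative change in risk per one-unit increase in \(x_i\).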

# Combine data for model fitting -----------------------------------------------
model_data_2017 <- merge(expected_malaria_2017, smr_pop_2017, by = "rowID", all = TRUE) %>% 
  dplyr::select(-Names.y) %>% 
  dplyr::rename(Names = Names.x)

model_data_2018 <-  merge(expected_malaria_2018, smr_pop_2018, by = "rowID", all = TRUE) %>% 
  dplyr::select(-Names.y) %>% 
  dplyr::rename(Names = Names.x)

model_data_2019 <-  merge(expected_malaria_2019, smr_pop_2019, by = "rowID", all = TRUE) %>% 
  dplyr::select(-Names.y) %>% 
  dplyr::rename(Names = Names.x)

model_data_2020 <-  merge(expected_malaria_2020, smr_pop_2020, by = "rowID", all = TRUE) %>% 
  dplyr::select(-Names.y) %>% 
  dplyr::rename(Names = Names.x)

# Fit generalised linear model -------------------------------------------------
# Defining model parameters:
# response variable: observed_2017, observed_2018, observed_2019, observed_2020 are
#                    recorded dry season malaria cases in that year
# risk factor: prop_pop_1km,  prop_pop_2km,  prop_pop_3km are the percentage of people living
#             within 1km, 2km and 3km buffers of water bodies, respectively.
# offset: expected_* is the number of malaria cases we would expect if the malaria rate
#          was equal in all the catchment areas

# 2017 -------------------------------------------------------------------------
model_1km_2017 <- glm(observed_2017~1+prop_pop_1km+offset(log(expected_2017)),
                      data = model_data_2017, family = 'poisson')

summary(model_1km_2017)

## 
## Call:
## glm(formula = observed_2017 ~ 1 + prop_pop_1km + offset(log(expected_2017)), 
##     family = "poisson", data = model_data_2017)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -41.710  -16.212   -4.071   15.861  102.821  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.2263102  0.0076125  -29.73   <2e-16 ***
## prop_pop_1km  0.0259495  0.0006364   40.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 22888  on 23  degrees of freedom
## Residual deviance: 21257  on 22  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 21485
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_1km_2017, digits = 3, digits.re = 3)

	observed_2017
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.797	0.786 – 0.809	<0.001
prop_pop_1km	1.026	1.025 – 1.028	<0.001
Observations	24
R² Nagelkerke	1.000
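
The incidence rate ratios in the table above are the exponentiated coefficients from summary(model_1km_2017). The sketch below redoes that arithmetic with the estimates and standard errors copied from the output above. The interval shown is a Wald interval (estimate ± z × SE), which is an assumption on our part and may differ slightly from the method tab_model uses internally.

```r
# Estimates and standard errors copied from summary(model_1km_2017)
estimate  <- c(intercept = -0.2263102, prop_pop_1km = 0.0259495)
std_error <- c(intercept =  0.0076125, prop_pop_1km = 0.0006364)

# Exponentiate the log-scale coefficients to get incidence rate ratios
irr <- exp(estimate)

# Wald 95% confidence limits, computed on the log scale and back-transformed
z         <- stats::qnorm(0.975)
wald_low  <- exp(estimate - z * std_error)
wald_high <- exp(estimate + z * std_error)

round(cbind(irr, wald_low, wald_high), 3)
```

An IRR of about 1.026 for prop_pop_1km means that each additional percentage point of the catchment population living within 1km of a water body is associated with roughly 2.6% more dry season malaria cases than expected.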

model_2km_2017 <- glm(observed_2017~1+prop_pop_2km+offset(log(expected_2017)),
                      data = model_data_2017, family = 'poisson')

summary(model_2km_2017)

## 
## Call:
## glm(formula = observed_2017 ~ 1 + prop_pop_2km + offset(log(expected_2017)), 
##     family = "poisson", data = model_data_2017)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -37.790  -16.287   -3.521   14.516  102.891  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.3376108  0.0081235  -41.56   <2e-16 ***
## prop_pop_2km  0.0143939  0.0002717   52.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 22981  on 24  degrees of freedom
## Residual deviance: 20288  on 23  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 20525
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_2km_2017, digits = 3, digits.re = 3)

	observed_2017
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.713	0.702 – 0.725	<0.001
prop_pop_2km	1.014	1.014 – 1.015	<0.001
Observations	25
R² Nagelkerke	1.000

model_3km_2017 <- glm(observed_2017~1+prop_pop_3km+offset(log(expected_2017)),
                      data = model_data_2017, family = 'poisson')

summary(model_3km_2017)

## 
## Call:
## glm(formula = observed_2017 ~ 1 + prop_pop_3km + offset(log(expected_2017)), 
##     family = "poisson", data = model_data_2017)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -41.588  -15.524   -1.491   12.971  105.199  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.421804   0.009805  -43.02   <2e-16 ***
## prop_pop_3km  0.010746   0.000210   51.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 22981  on 24  degrees of freedom
## Residual deviance: 20355  on 23  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 20592
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_3km_2017, digits = 3, digits.re = 3)

	observed_2017
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.656	0.643 – 0.669	<0.001
prop_pop_3km	1.011	1.010 – 1.011	<0.001
Observations	25
R² Nagelkerke	1.000

# 2018 -------------------------------------------------------------------------
model_1km_2018 <- glm(observed_2018~1+prop_pop_1km+offset(log(expected_2018)),
                      data = model_data_2018, family = 'poisson')

summary(model_1km_2018)

## 
## Call:
## glm(formula = observed_2018 ~ 1 + prop_pop_1km + offset(log(expected_2018)), 
##     family = "poisson", data = model_data_2018)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -42.095  -16.161    8.221   11.932  101.646  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.4013420  0.0077538  -51.76   <2e-16 ***
## prop_pop_1km  0.0320765  0.0005163   62.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 26459  on 25  degrees of freedom
## Residual deviance: 22588  on 24  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 22842
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_1km_2018, digits = 3, digits.re = 3)

	observed_2018
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.669	0.659 – 0.680	<0.001
prop_pop_1km	1.033	1.032 – 1.034	<0.001
Observations	26
R² Nagelkerke	1.000

model_2km_2018 <- glm(observed_2018~1+prop_pop_2km+offset(log(expected_2018)),
                      data = model_data_2018, family = 'poisson')

summary(model_2km_2018)

## 
## Call:
## glm(formula = observed_2018 ~ 1 + prop_pop_2km + offset(log(expected_2018)), 
##     family = "poisson", data = model_data_2018)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -47.897  -14.455    5.506   13.274   84.708  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.5331448  0.0085344  -62.47   <2e-16 ***
## prop_pop_2km  0.0157525  0.0002141   73.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 26459  on 25  degrees of freedom
## Residual deviance: 20965  on 24  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 21219
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_2km_2018, digits = 3, digits.re = 3)

	observed_2018
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.587	0.577 – 0.597	<0.001
prop_pop_2km	1.016	1.015 – 1.016	<0.001
Observations	26
R² Nagelkerke	1.000

model_3km_2018 <- glm(observed_2018~1+prop_pop_3km+offset(log(expected_2018)),
                      data = model_data_2018, family = 'poisson')

summary(model_3km_2018)

## 
## Call:
## glm(formula = observed_2018 ~ 1 + prop_pop_3km + offset(log(expected_2018)), 
##     family = "poisson", data = model_data_2018)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -46.83  -10.85    7.01   15.01   78.89  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.8134482  0.0112691  -72.18   <2e-16 ***
## prop_pop_3km  0.0155327  0.0001932   80.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 26459  on 25  degrees of freedom
## Residual deviance: 19572  on 24  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 19826
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_3km_2018, digits = 3, digits.re = 3)

	observed_2018
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.443	0.434 – 0.453	<0.001
prop_pop_3km	1.016	1.015 – 1.016	<0.001
Observations	26
R² Nagelkerke	1.000

# 2019 -------------------------------------------------------------------------
model_1km_2019 <- glm(observed_2019~1+prop_pop_1km+offset(log(expected_2019)),
                      data = model_data_2019, family = 'poisson')

summary(model_1km_2019)

## 
## Call:
## glm(formula = observed_2019 ~ 1 + prop_pop_1km + offset(log(expected_2019)), 
##     family = "poisson", data = model_data_2019)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -60.843  -11.742    3.299   11.319   98.797  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.2351021  0.0085176  -27.60   <2e-16 ***
## prop_pop_1km  0.0127932  0.0004057   31.54   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 24276  on 26  degrees of freedom
## Residual deviance: 23273  on 25  degrees of freedom
## AIC: 23534
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_1km_2019, digits = 3, digits.re = 3)

	observed_2019
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.790	0.777 – 0.804	<0.001
prop_pop_1km	1.013	1.012 – 1.014	<0.001
Observations	27
R² Nagelkerke	1.000

model_2km_2019 <- glm(observed_2019~1+prop_pop_2km+offset(log(expected_2019)),
                      data = model_data_2019, family = 'poisson')

summary(model_2km_2019)

## 
## Call:
## glm(formula = observed_2019 ~ 1 + prop_pop_2km + offset(log(expected_2019)), 
##     family = "poisson", data = model_data_2019)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -57.871  -12.608    3.258   13.949   99.011  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.1813931  0.0096831  -18.73   <2e-16 ***
## prop_pop_2km  0.0040086  0.0001946   20.60   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 24276  on 26  degrees of freedom
## Residual deviance: 23847  on 25  degrees of freedom
## AIC: 24107
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_2km_2019, digits = 3, digits.re = 3)

	observed_2019
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.834	0.818 – 0.850	<0.001
prop_pop_2km	1.004	1.004 – 1.004	<0.001
Observations	27
R² Nagelkerke	1.000

model_3km_2019 <- glm(observed_2019~1+prop_pop_3km+offset(log(expected_2019)),
                      data = model_data_2019, family = 'poisson')

summary(model_3km_2019)

## 
## Call:
## glm(formula = observed_2019 ~ 1 + prop_pop_3km + offset(log(expected_2019)), 
##     family = "poisson", data = model_data_2019)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -56.890  -12.331    2.847   14.390   98.273  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.2456583  0.0117497  -20.91   <2e-16 ***
## prop_pop_3km  0.0035830  0.0001604   22.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 24276  on 26  degrees of freedom
## Residual deviance: 23765  on 25  degrees of freedom
## AIC: 24026
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_3km_2019, digits = 3, digits.re = 3)

	observed_2019
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.782	0.764 – 0.800	<0.001
prop_pop_3km	1.004	1.003 – 1.004	<0.001
Observations	27
R² Nagelkerke	1.000

# 2020 -------------------------------------------------------------------------
model_1km_2020 <- glm(observed_2020~1+prop_pop_1km+offset(log(expected_2020)),
                      data = model_data_2020, family = 'poisson')

summary(model_1km_2020)

## 
## Call:
## glm(formula = observed_2020 ~ 1 + prop_pop_1km + offset(log(expected_2020)), 
##     family = "poisson", data = model_data_2020)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -80.514  -14.770    4.389   23.530   91.020  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   0.0508589  0.0060781   8.368   <2e-16 ***
## prop_pop_1km -0.0052146  0.0004923 -10.592   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 35338  on 22  degrees of freedom
## Residual deviance: 35226  on 21  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 35458
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_1km_2020, digits = 3, digits.re = 3)

	observed_2020
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	1.052	1.040 – 1.065	<0.001
prop_pop_1km	0.995	0.994 – 0.996	<0.001
Observations	23
R² Nagelkerke	0.992

model_2km_2020 <- glm(observed_2020~1+prop_pop_2km+offset(log(expected_2020)),
                      data = model_data_2020, family = 'poisson')

summary(model_2km_2020)

## 
## Call:
## glm(formula = observed_2020 ~ 1 + prop_pop_2km + offset(log(expected_2020)), 
##     family = "poisson", data = model_data_2020)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -76.405  -25.721    7.898   27.184   90.141  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.1647670  0.0066634  -24.73   <2e-16 ***
## prop_pop_2km  0.0052467  0.0002156   24.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 36442  on 23  degrees of freedom
## Residual deviance: 35851  on 22  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 36093
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_2km_2020, digits = 3, digits.re = 3)

	observed_2020
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.848	0.837 – 0.859	<0.001
prop_pop_2km	1.005	1.005 – 1.006	<0.001
Observations	24
R² Nagelkerke	1.000

model_3km_2020 <- glm(observed_2020~1+prop_pop_3km+offset(log(expected_2020)),
                      data = model_data_2020, family = 'poisson')

summary(model_3km_2020)

## 
## Call:
## glm(formula = observed_2020 ~ 1 + prop_pop_3km + offset(log(expected_2020)), 
##     family = "poisson", data = model_data_2020)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -74.75  -25.76   10.26   28.33   88.63  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.2402844  0.0077064  -31.18   <2e-16 ***
## prop_pop_3km  0.0048994  0.0001564   31.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 36442  on 23  degrees of freedom
## Residual deviance: 35442  on 22  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 35683
## 
## Number of Fisher Scoring iterations: 5

sjPlot::tab_model(model_3km_2020, digits = 3, digits.re = 3)

	observed_2020
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.786	0.775 – 0.798	<0.001
prop_pop_3km	1.005	1.005 – 1.005	<0.001
Observations	24
R² Nagelkerke	1.000

Yearly variation in malaria cases as a risk factor

# Gather year ------------------------------------------------------------------
# Reshape to one row per catchment and year: the yearly counts go into
# 'malaria_cases' and the year label into 'year'
model_data_2017_longer <- as.data.frame(model_data_2017) %>%
  dplyr::select(rowID, Names, observed_2017, dr_2018, dr_2019, dr_2020, 
                expected_2017, prop_pop_1km, prop_pop_2km, prop_pop_3km) %>% 
  dplyr::rename(`2017` = observed_2017,
                `2018` = dr_2018,
                `2019` = dr_2019,
                `2020` = dr_2020) %>%
  tidyr::pivot_longer(cols = `2017`:`2020`,
                      names_to = 'year',
                      values_to = 'malaria_cases')

# Model fitting ----------------------------------------------------------------
# The response is the yearly count (malaria_cases). A copy of observed_2017
# repeated for every year would leave the year terms with nothing to explain.
# Note that the offset is still the 2017 expected count for every year.
multivariate_1km_2017 <- glm(malaria_cases~1+prop_pop_1km+year+offset(log(expected_2017)),
                             data = model_data_2017_longer, family = 'poisson')

summary(multivariate_1km_2017)
sjPlot::tab_model(multivariate_1km_2017)

multivariate_2km_2017 <- glm(malaria_cases~1+prop_pop_2km+year+offset(log(expected_2017)),
                             data = model_data_2017_longer, family = 'poisson')

summary(multivariate_2km_2017)
sjPlot::tab_model(multivariate_2km_2017)

multivariate_3km_2017 <- glm(malaria_cases~1+prop_pop_3km+year+offset(log(expected_2017)),
                             data = model_data_2017_longer, family = 'poisson')

summary(multivariate_3km_2017)
sjPlot::tab_model(multivariate_3km_2017)

Check how well the fitted values line up with the observations

The fitted values appear to line up particularly well with the observed data, suggesting that prop_pop_* (i.e., proportion of catchment population living near water bodies) can help us understand malaria risk in the catchment areas.

# Helper function to create scatter plots to see how well 
# fitted values line up with observed malaria cases
plot.fitted.values <- function(model, model.df, title){
  
  # Take both vectors from the fitted model: glm() drops the rows with missing
  # values in the variables it uses, so model$y and the fitted values always
  # have the same length. model.df is kept in the signature so that existing
  # calls still work, but it is no longer needed.
  plot.df <- data.frame(fitted_cases = stats::fitted(model),
                        observed_cases = model$y)

  # Plot fitted versus observed values
  scatter.plot <- ggplot2::ggplot(plot.df, 
                                  ggplot2::aes(x = fitted_cases, y = observed_cases))+ 
                  ggplot2::geom_point()+
                  ggplot2::theme_classic()+
                  ggplot2::labs(y = "Observed values", 
                                x = "Fitted values",
                                title = title)
  return(scatter.plot)
}

# Invoking function 
# 2017 -------------------------------------------------------------------------
fitted_1km_2017 <- plot.fitted.values(model_1km_2017, model_data_2017, "2017: 1km model")

fitted_2km_2017 <- plot.fitted.values(model_2km_2017, model_data_2017, "2017: 2km model")

fitted_3km_2017 <- plot.fitted.values(model_3km_2017, model_data_2017, "2017: 3km model")

# 2018 -------------------------------------------------------------------------
fitted_1km_2018 <- plot.fitted.values(model_1km_2018, model_data_2018, "2018: 1km model")

fitted_2km_2018 <- plot.fitted.values(model_2km_2018, model_data_2018, "2018: 2km model")

fitted_3km_2018 <- plot.fitted.values(model_3km_2018, model_data_2018, "2018: 3km model")

# 2019 -------------------------------------------------------------------------
fitted_1km_2019 <- plot.fitted.values(model_1km_2019, model_data_2019, "2019: 1km model")

fitted_2km_2019 <- plot.fitted.values(model_2km_2019, model_data_2019, "2019: 2km model")

fitted_3km_2019 <- plot.fitted.values(model_3km_2019, model_data_2019, "2019: 3km model")

# 2020 -------------------------------------------------------------------------
fitted_1km_2020 <- plot.fitted.values(model_1km_2020, model_data_2020, "2020: 1km model")

fitted_2km_2020 <- plot.fitted.values(model_2km_2020, model_data_2020, "2020: 2km model")

fitted_3km_2020 <- plot.fitted.values(model_3km_2020, model_data_2020, "2020: 3km model")

# Layout scatter plots ---------------------------------------------------------
cowplot::plot_grid(fitted_1km_2017, fitted_2km_2017, fitted_3km_2017,
                   fitted_1km_2018, fitted_2km_2018, fitted_3km_2018,
                   fitted_1km_2019, fitted_2km_2019, fitted_3km_2019,
                   fitted_1km_2020, fitted_2km_2020, fitted_3km_2020,
                   ncol = 3, nrow = 4)

Fig 12. How well the percentage of catchment population living around water bodies explains observed malaria incidence

Test for residual spatial autocorrelation using adjacency as criterion
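
For reference, the global Moran's I statistic computed at the end of this section with spdep::moran can be written as below. The notation is ours: \(x_i\) is the value in catchment i (observed_2017 in the call), \(\bar{x}\) its mean, \(w_{ij}\) the spatial weight between catchments i and j taken from nb2listw, \(n\) the number of catchments and \(S_0\) the sum of all weights. \(n\) and \(S_0\) correspond to the third and fourth arguments of spdep::moran.

```latex
\[
I = \frac{n}{S_0} \,
    \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}
\]
```

Values of \(I\) above its expectation of \(-1/(n-1)\) suggest that neighbouring catchments have similar values, while values below it suggest that neighbours differ.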

# Find adjacent polygons,
# Contiguity neighbors - all that share a boundary point
catchment_neighbours <- spdep::poly2nb(model_data_2017)  # Queen contiguity

summary(catchment_neighbours)

## Neighbour list object:
## Number of regions: 27 
## Number of nonzero links: 106 
## Percentage nonzero weights: 14.54047 
## Average number of links: 3.925926 
## Link number distribution:
## 
## 1 2 3 4 5 6 7 8 
## 1 4 7 6 5 2 1 1 
## 1 least connected region:
## 1 with 1 link
## 1 most connected region:
## 18 with 8 links

# Get coordinates from catchment polygons
catchment_points <- as(model_data_2017, "Spatial")

coords <- coordinates(catchment_points)

# Get catchment boundaries and convert to spatial object
catchment_polygons <- as(model_data_2017, "Spatial")

# View the neighbours: draw the catchment polygons, then overlay the links
plot(catchment_polygons, asp = 1)
plot(catchment_neighbours, coords, col = "blue", add = TRUE)

Fig 13. Neighbourhood matrix

# Run a Conditional Autoregressive (CAR) model, which allows us to incorporate 
# the spatial autocorrelation between neighbours within our GLM

# First, generate a weights matrix from a neighbours list with spatial weights
adj_matrix <- spdep::nb2mat(catchment_neighbours, style = "B") # see ?nb2mat

# Match row and column names with those of geographic location index 
rownames(adj_matrix) <- colnames(adj_matrix) <- model_data_2017$rowID
# row.names(adj_matrix) <- NULL # alternatively

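# Now we can fit the model. In spaMM the spatial effect is requested by adding an
# adjacency(1|rowID) term to the formula, where rowID is the grouping factor 
# (i.e. the rowID of each catchment area).
# Note: the formula below contains no such term, so the fixed effects printed 
# below are identical to those of the non-spatial GLM model_1km_2017.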

CAR_model_1km_2017 <- spaMM::fitme(observed_2017~prop_pop_1km+offset(log(expected_2017)),
                                   adjMatrix = adj_matrix, 
                                   data = model_data_2017, 
                                   family = 'poisson')

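# Extract the fixed-effects table (Estimate and Cond. SE), from which 
# 95% CIs can be derived; summary() also prints the model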
coefs <- as.data.frame(summary(CAR_model_1km_2017)$beta_table)

## formula: observed_2017 ~ prop_pop_1km + offset(log(expected_2017))
## Estimation of fixed effects by ML.
## family: poisson( link = log ) 
##  ------------ Fixed effects (beta) ------------
##              Estimate  Cond. SE t-value
## (Intercept)  -0.22631 0.0076125  -29.73
## prop_pop_1km  0.02595 0.0006364   40.78
##  ------------- Likelihood values  -------------
##                         logLik
## p(h)   (Likelihood): -10740.45
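
As a hedged sketch of how an adjacency (CAR) random effect is specified in spaMM, the example below uses the Scottish lip cancer data that ship with the package rather than the Kasungu data. Loading `scotlip` also loads the adjacency matrix `Nmatrix`. The columns `cases`, `expec`, `prop.ag` and `gridcode` belong to that example dataset. The object name `car_fit` is ours.

```r
library(spaMM)

# Example data shipped with spaMM, NOT the Kasungu data.
# data("scotlip") loads the data frame 'scotlip' and the adjacency matrix 'Nmatrix'.
data("scotlip")

# Poisson model with an offset for expected cases and a CAR random effect:
# adjacency(1 | gridcode) links each area to its neighbours through adjMatrix
car_fit <- spaMM::fitme(
  cases ~ I(prop.ag / 10) + adjacency(1 | gridcode) + offset(log(expec)),
  adjMatrix = Nmatrix,
  data      = scotlip,
  family    = poisson()
)

summary(car_fit)
```

For the catchment data, the analogous change would be to add `+ adjacency(1|rowID)` to the model formula while passing `adjMatrix = adj_matrix`, assuming the row and column names of adj_matrix match the rowID values.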

# Moran's I contiguity test
MI_2017 <- spdep::moran(model_data_2017$observed_2017, nb2listw(catchment_neighbours),
                        length(model_data_2017$observed_2017),Szero(nb2listw(catchment_neighbours)))