Handling Missing Values in DHS Data

Author

Jesse McDevitt-Irwin

Overview

In DHS datasets, missing values are often stored as special numeric codes (such as 997, 998, or 999) rather than as true missing values. Some special cases, such as “on premises”, are assigned their own code (e.g., 996).

In this handout, we will learn how to:

  • Recode missing values properly
  • Recode special cases like “on premises”
  • Visualize changes using histograms

We will use the example of time to water source (v115) from the Mozambique 2022 DHS Individual Recode file (IR file).
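
To see why these codes matter, here is a minimal sketch using made-up values (not the Mozambique data). The vector toy_v115 and its contents are assumptions for illustration only; the sketch shows how leaving the codes in place distorts a simple mean.

```r
# Made-up times to water (minutes) plus the special codes 996, 997, 998
toy_v115 <- c(5, 10, 30, 996, 997, 998)

# Naive mean treats the codes as if they were minutes
naive_mean <- mean(toy_v115)

# Recode by hand: 996 ("on premises") -> 0, 997/998 -> NA
toy_clean <- ifelse(toy_v115 == 996, 0,
                    ifelse(toy_v115 %in% c(997, 998), NA, toy_v115))
clean_mean <- mean(toy_clean, na.rm = TRUE)

naive_mean  # 506
clean_mean  # 11.25
```

With only six observations, three coded values are enough to move the mean from about 11 minutes to over 500.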


Step 1: Load the Data

library(tidyverse)
library(haven)

# Define variables to load
vars_ir <- c("v115") # Only one variable for this illustration

# Load only the necessary columns
ir_raw <- read_dta("Raw/MZIR81FL.DTA",
                   col_select = all_of(vars_ir))

Step 2: Recode v115

attr(ir_raw$v115, "labels")
          on premises not a dejure resident            don't know 
                  996                   997                   998 
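  • Values 997 (not a de jure resident) and 998 (don't know) are types of missing data. The code below also treats 999 as missing: it has no label here, but it is one of the common DHS missing-value codes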
  • Value 996 means “On premises” and we want to set it to 0 minutes
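
Before recoding, it can help to count how many observations sit at each special code. The following is a rough sketch on a made-up labelled vector (toy_v115 is hypothetical and only mimics the labels shown above); with the real data you would apply the same steps to ir_raw$v115.

```r
library(haven)

# Made-up labelled vector mimicking v115 (illustrative values only)
toy_v115 <- labelled(
  c(5, 30, 996, 997, 998, 10, 996),
  labels = c("on premises" = 996,
             "not a dejure resident" = 997,
             "don't know" = 998)
)

codes <- attr(toy_v115, "labels")   # named vector of labelled codes
vals  <- as.numeric(toy_v115)       # plain numeric copy of the values

# Count observations at each labelled code
special_counts <- table(factor(vals[vals %in% codes],
                               levels = codes,
                               labels = names(codes)))
special_counts
```

Note that this only counts codes that carry a label, so an unlabelled code such as 999 would need a separate check.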
ir_raw <- ir_raw %>%
  mutate(time_to_water = case_when(
    v115 %in% c(997, 998, 999) ~ NA_real_,  # Set missing codes to NA (typed NA so all branches are numeric)
    v115 == 996 ~ 0,                        # "On premises" = 0 minutes
    TRUE ~ as.numeric(v115)                 # Otherwise keep the original value
  ))
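
One way to confirm the recode did what we intended is to cross-tabulate the original codes against the new variable. This is a sketch on a made-up data frame (toy is hypothetical and stands in for ir_raw); the recoding logic is the same as above.

```r
library(dplyr)

# Made-up data frame standing in for ir_raw (illustrative values only)
toy <- tibble(v115 = c(5, 30, 996, 997, 998, 999, 10))

toy <- toy %>%
  mutate(time_to_water = case_when(
    v115 %in% c(997, 998, 999) ~ NA_real_,
    v115 == 996 ~ 0,
    TRUE ~ as.numeric(v115)
  ))

# Every 997/998/999 should now be NA, and every 996 should be 0
check <- toy %>%
  filter(v115 >= 996) %>%
  count(v115, time_to_water)
check

# No special code should survive in the recoded variable
stopifnot(all(toy$time_to_water < 996, na.rm = TRUE))
```

If a code shows up next to an unexpected value in check, the case_when() conditions need another look.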

Step 3: Visualize Before and After

# Histogram of original v115
hist(ir_raw$v115,
     main = "Original v115 (Time to Water)",
     xlab = "Minutes",
     col = "lightblue")

In this figure, a large number of observations cluster just below 1000. These do not reflect time to water; they are the codes (996 to 998) that the DHS uses for special cases and missing values.

# Histogram after recoding
hist(ir_raw$time_to_water,
     main = "Recoded Time to Water",
     xlab = "Minutes",
     col = "lightgreen")
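
The histograms can also be checked numerically. As a sketch, hist() with plot = FALSE returns the bin counts without drawing, so the bin just below 1000 can be compared before and after recoding. The vectors below are made up for illustration and are not the Mozambique data.

```r
# Made-up values standing in for v115 and time_to_water (illustration only)
toy_raw   <- c(5, 10, 30, 996, 997, 998)
toy_clean <- c(5, 10, 30, 0, NA, NA)

breaks <- seq(0, 1000, by = 100)

# plot = FALSE returns the bin counts without drawing anything
raw_counts   <- hist(toy_raw,   breaks = breaks, plot = FALSE)$counts
clean_counts <- hist(toy_clean, breaks = breaks, plot = FALSE)$counts

raw_counts[10]    # count in the (900, 1000] bin before recoding
clean_counts[10]  # same bin after recoding
```

After a correct recode, the last bin should be empty and the "on premises" cases should appear in the first bin instead.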


Summary

  • Always check the value labels for special missing-value codes in DHS data
  • Use case_when() to recode cleanly
  • Visualizing before and after can help confirm your transformations are correct!