Creating a Data Dictionary for BDHS Data Using R

Working with BDHS (Bangladesh Demographic and Health Survey) data? Organizing variables with a data dictionary saves time and effort! Here’s how to quickly summarize variable names, labels, unique values, and more.

Step-by-Step Guide

1️⃣ Load the Required Library

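Use the expss package to work with variable and value labels in R, and the haven package to read SPSS (.SAV) files such as the BDHS recode files.

if (!require(expss)) install.packages("expss")
if (!require(haven)) install.packages("haven")   # haven provides read_sav()
library(expss)
library(haven)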

2️⃣ Load Your Dataset

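Import the BDHS dataset you’re working with, such as the PR (Household Member Recode) file. read_sav() comes from the haven package. Change the path to wherever the file is saved on your computer.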

PR <- read_sav("C:/R/Data/BD_2022_DHS_11112024_545_222526/BDPR81SV/BDPR81FL.SAV")
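To try the dictionary code before downloading a BDHS file, you can build a tiny stand-in by hand. This is a sketch under assumptions: the variable names and values below are invented, and it relies on haven::read_sav() storing a variable's label in the "label" attribute and its value labels in the "labels" attribute. As far as I know, expss's var_lab() and val_lab() read these same attributes. A real read_sav() column also carries the haven_labelled class, which this toy omits.

```r
# Hypothetical stand-in for a BDHS file such as PR (all values invented)
toy <- data.frame(
  hhid = c(101, 101, 102, 103),
  sex  = c(1, 2, 2, NA),
  age  = c(34, 29, NA, 61)
)

# Variable labels: a single string in the "label" attribute
attr(toy$hhid, "label") <- "Household identification"
attr(toy$sex,  "label") <- "Sex of household member"
attr(toy$age,  "label") <- "Age of household member"

# Value labels: a named vector in the "labels" attribute
# (names = label text, values = the numeric codes stored in the data)
attr(toy$sex, "labels") <- c(Male = 1, Female = 2)

# Inspect the attributes the dictionary code will read
attributes(toy$sex)
```

You can then run the Step 3 code with toy in place of PR to see what the dictionary looks like.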

3️⃣ Create the Data Dictionary Table

This code creates a summary table with one row per variable, listing its name, label, unique values, value labels, and number of missing values.

# Create a summary table for the PR dataset (one row per variable)
dd.PR <- data.frame(
  Variable = names(PR),                                                  # 1️⃣ Column: Variable names
  Label = sapply(PR, function(x) {                                       # 2️⃣ Column: Variable labels
    lab <- var_lab(x)                                                    # NULL if the variable has no label
    if (is.null(lab)) NA_character_ else as.character(lab)               # Use NA so every row gets exactly one value
  }),
  Values = sapply(PR, function(x) paste(unique(x), collapse = ", ")),    # 3️⃣ Column: Unique values
  Value_Labels = sapply(PR, function(x) {                                # 4️⃣ Column: Value labels
    val_labels <- val_lab(x)                                             # Named vector: names = label text, values = codes
    if (!is.null(val_labels)) {                                          # Check if labels exist
      paste(val_labels, "=", names(val_labels), collapse = ", ")         # Format as "code = label"
    } else {
      NA_character_                                                      # If no labels, assign NA
    }
  }),
  Missing_Values = sapply(PR, function(x) sum(is.na(x))),                # 5️⃣ Column: Count of missing values
  Total_Rows = nrow(PR),                                                 # 6️⃣ Column: Total rows in the dataset
  row.names = NULL                                                       # Plain 1..n row numbers instead of variable names
)
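Some BDHS variables, such as IDs and sample weights, can have thousands of unique values, which makes the Values column unwieldy. The following is a base-R sketch of the same table that works without expss. It reads the haven-style "label" and "labels" attributes directly and caps the number of values listed. The function name make_dictionary() and its max_values parameter are hypothetical and do not come from any package.

```r
# Base-R sketch of the Step 3 dictionary, reading haven-style attributes directly.
# make_dictionary() and max_values are hypothetical names, not part of any package.
make_dictionary <- function(df, max_values = 20) {
  data.frame(
    Variable = names(df),
    Label = vapply(df, function(x) {
      lab <- attr(x, "label", exact = TRUE)   # exact = TRUE avoids partially matching "labels"
      if (is.null(lab)) NA_character_ else as.character(lab)
    }, character(1)),
    Values = vapply(df, function(x) {
      u <- sort(unique(x))                    # sort() drops NA; missingness is counted separately
      txt <- paste(head(u, max_values), collapse = ", ")
      if (length(u) > max_values) paste0(txt, ", ... (", length(u), " unique)") else txt
    }, character(1)),
    Value_Labels = vapply(df, function(x) {
      labs <- attr(x, "labels", exact = TRUE)
      if (is.null(labs)) NA_character_ else paste(labs, "=", names(labs), collapse = ", ")
    }, character(1)),
    Missing_Values = vapply(df, function(x) sum(is.na(x)), integer(1)),
    Total_Rows = nrow(df),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Tiny invented example
toy <- data.frame(sex = c(1, 2, 2, NA), age = c(34, 29, NA, 61))
attr(toy$sex, "label")  <- "Sex of household member"
attr(toy$sex, "labels") <- c(Male = 1, Female = 2)

dd.toy <- make_dictionary(toy, max_values = 2)
print(dd.toy)
```

With max_values = 2, the age row lists its first two sorted values and notes how many unique values there are in total.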

4️⃣ Export as CSV

Save your data dictionary to a CSV file for easy sharing.

write.csv(dd.PR, "PR_data_dictionary.csv", row.names = FALSE)
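Variable and value labels can contain non-ASCII characters, so it can help to set the file encoding explicitly and read the file back to confirm that nothing was lost. This sketch uses a small invented dictionary in place of dd.PR and writes it to a temporary folder. The fileEncoding argument is passed through write.csv() to write.table().

```r
# Hypothetical small dictionary standing in for dd.PR
dd <- data.frame(
  Variable = c("sex", "age"),
  Label = c("Sex of household member", "Age of household member"),
  Value_Labels = c("1 = Male, 2 = Female", NA),
  stringsAsFactors = FALSE
)

out <- file.path(tempdir(), "toy_data_dictionary.csv")

# Write with an explicit UTF-8 encoding so label text survives across platforms
write.csv(dd, out, row.names = FALSE, fileEncoding = "UTF-8")

# Read it back to confirm the round trip
back <- read.csv(out, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
print(back)
```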

Why Use a Data Dictionary?

  • Quick Analysis: Understand variables and values at a glance.
  • Team Collaboration: Share dataset structure with others easily.
  • Data Quality Check: Spot variables with many missing values or unexpected codes before you start the analysis.
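For the data quality check, the Missing_Values and Total_Rows columns can be turned into a percentage and used to flag variables for review. This is a sketch with invented numbers in place of dd.PR. The 50% threshold is an arbitrary example, not a DHS rule. Note that is.na() counts only true NAs. DHS recode files often store responses such as "don't know" as numeric codes, and those codes will not be counted here.

```r
# Hypothetical dictionary rows; in practice use dd.PR from Step 3
dd <- data.frame(
  Variable = c("sex", "age", "weight"),
  Missing_Values = c(0L, 12L, 950L),
  Total_Rows = 1000L,
  stringsAsFactors = FALSE
)

# Share of rows missing for each variable, as a percentage
dd$Pct_Missing <- round(100 * dd$Missing_Values / dd$Total_Rows, 1)

# Flag variables above an arbitrary 50% threshold for closer review
threshold <- 50
flagged <- dd[dd$Pct_Missing > threshold, c("Variable", "Pct_Missing")]
print(flagged)
```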

Customize for Any BDHS Dataset

To create a data dictionary for a different BDHS dataset (e.g., KR, IR, VA), point the path in Step 2 at that file, then replace PR with the new dataset name throughout, including the output file name in Step 4.
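If you work with several recode files, wrapping the dictionary code in a function saves editing names by hand. This is a sketch under stated assumptions: the datasets below are invented stand-ins for files you would load with read_sav(), and build_dictionary() is a hypothetical, trimmed-down helper. You can add the remaining Step 3 columns to it as needed.

```r
# Hypothetical stand-ins for datasets loaded with read_sav(), keyed by recode name
datasets <- list(
  KR = data.frame(x1 = c(1, 2), x2 = c(12, NA)),
  IR = data.frame(y1 = c(25, 31, 40))
)
attr(datasets$KR$x1, "label") <- "Invented label for x1"

# Trimmed-down dictionary builder; add the other Step 3 columns as needed
build_dictionary <- function(df) {
  data.frame(
    Variable = names(df),
    Label = vapply(df, function(x) {
      lab <- attr(x, "label", exact = TRUE)
      if (is.null(lab)) NA_character_ else as.character(lab)
    }, character(1)),
    Missing_Values = vapply(df, function(x) sum(is.na(x)), integer(1)),
    Total_Rows = nrow(df),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Write one CSV per dataset, named after its list entry (e.g. KR_data_dictionary.csv)
out_dir <- tempdir()
paths <- vapply(names(datasets), function(nm) {
  path <- file.path(out_dir, paste0(nm, "_data_dictionary.csv"))
  write.csv(build_dictionary(datasets[[nm]]), path, row.names = FALSE)
  path
}, character(1))
print(paths)
```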
