Using R to Extract, Organize, & Analyze Complex SDOH Data

Prasad Bhoite, MS, MPH, MBA

Ingrid Gonzalez, MPH

Outline

  • Background
    • NeighborhoodHELP (NHELP)
    • NHELP Partnerships
  • Management of Data
    • End-to-End Process
  • Social Stability Impact Metric
    • Data Preparation Challenges
    • Data Analysis

Background

NeighborhoodHELP

  • Neighborhood Health Education Learning Program
  • Community outreach program
    • Outreach
    • Clinical
    • Curricular

NeighborhoodHELP

  • Provide household-centered care
    • Definition: the identification and management of the social needs within a given household
      • Employment
      • Housing
      • Healthcare Access
      • Transportation
      • Education

NeighborhoodHELP Partnership

  • FIU NHELP and Baptist Health South Florida Partnership
    • Implementation of NHELP in South Miami and West Kendall neighborhood
    • Received Funding
  • Collaboration with FIU-Department of Biostatistics for evaluation
    • Quarterly Reporting
    • Includes Impact Metrics

Impact Metrics

  • Social Stability Metric
  • Emergency Department (ED) Utilization
  • Health Stability
  • Connection to Primary Care Provider (PCP)

Management of Data

Data Collection

Data Collection

Portal

Health Risk Profile (HRP)

Database

Data Extraction

Data Extraction

# Import the required libraries
library(odbc)        # For Connecting to the database

# Create a Database Connection
# Note you need to set up DNS configuration before you run the following code
con <- DBI::dbConnect(odbc::odbc(), "**********")


# Read the data tables from MS SQL Server
# HRP Tables

HRP_Household <- DBI::dbGetQuery(con, "select * from Household")
glimpse(HRP_Household)

HRP_Member <- DBI::dbGetQuery(con, "select * from Member")
glimpse(HRP_Member)

HRP_Encounter <- DBI::dbGetQuery(con, "select * from Encounter")
glimpse(HRP_Encounter)

HRP_Category <- DBI::dbGetQuery(con, "select * from Service")
glimpse(HRP_Service)

HRP_Need <- DBI::dbGetQuery(con, "select * from Need")
glimpse(HRP_Need)

HRP_outcome <- DBI::dbGetQuery(con, "select * from Outcome")
glimpse(HRP_Outcome)

Data Storage

Data Analysis

Social Stability Metric

Social Stability Impact Metric


Data Preparation Challenges

  • Nested data


Data Preparation Challenges

  • Multiple data sources & data tables
#### Merging Multiple datasets ########################################

## Creating HRP Dataframe 
HRP_Merged_Data <- HRP_CarePlanEntry %>%

# Merging the Category table
  left_join(
    select(HRP_Category, CategoryId, CategoryName),
    by = c("CategoryId" = "CategoryId")) %>%

# Merging the Need table
  left_join(
    select(HRP_Need, NeedId, NeedName),
    by = c("UrgentNeedId" = "NeedId")) %>%

# Merging the Service table
  left_join(
    select(HRP_Service, ServiceId, ServiceName),
    by = c("ServiceId" = "ServiceId")) %>%

# Merging Another HRP table
  left_join(
    select(TableName, Column1, Column2, Column3),
    by = c("Primary_Key_X" = "Primary_Key_Y")) %>%

# Merging Another HRP table
  left_join(
    select(TableName, Column1, Column2, Column3),
    by = c("Primary_Key_X" = "Primary_Key_Y"))


## Creating Portal Dataset
Portal_Participating_HH <- Portal_HH_Status_History %>%
  group_by(HH_ID) %>%
  mutate(end_date = dplyr::lead(as.Date(Status_Date)-1, n = 1, default = NA)) %>% 
  mutate(end_date = ifelse(is.na(end_date), "2099-12-31", as.character(end_date))) %>%
  filter(Status %in% c('Active') & 
           (end_date >= Start_Date_Reporting_Period) & 
           (Status_Date < A_Day_After_End_Date_Reporting_Period)) %>%
  distinct(HH_ID, .keep_all = TRUE) %>% 
  mutate(Participating_HH = "Participating_HH")



## Merging HRP and Portal
HRP_Portal_Merged_Dataset <- HRP_Merged_Data %>%
  left_join(select(Portal_Participating_HH, HH_ID, Participating_HH),
            by = c("HH_ID" = "HH_ID")) %>%
  filter(Participating_HH %in% c("Participating_HH")) 

Data Preparation Challenges

  • Identifying households that are currently active
    • Using the Window Function - lead()

# Introducing the Window Function - Lead
Portal_Participating_HH <- Portal_HH_Status_History %>%
  group_by(HH_ID) %>%
  mutate(end_date = dplyr::lead(as.Date(Status_Date)-1, n = 1, default = NA)) %>% 
  mutate(end_date = ifelse(is.na(end_date), "2099-12-31", as.character(end_date))) %>%
  filter(Status %in% c('Active') & 
           (end_date >= Start_Date_Reporting_Period) & 
           (status_date < A_Day_After_End_Date_Reporting_Period)) %>%
  distinct(HH_ID, .keep_all = TRUE) %>% 
  mutate(Participating_HH = "Participating_HH")

Data Analysis

  • The % of HHMs reporting a social need were offered a service
    • # of household members reporting a social need
    • # of household members that were offered a service
  • The % of social needs that were successfully met.
    • Total # of social needs reported
    • Total # of social needs being successfully met

Results

#### Social Needs #############################################################

## Create a new file for future matching -- Denominator //
## Subset of all the unique patients within given time frame
SocialNeed_Denominator <- NHELP_URFO_Q2 %>% 
  filter(! NeedName %in% c("No needs identified", "NULL", NA)) %>% 
  distinct(MemberID, CategoryId, UrgentNeedId, .keep_all = TRUE) %>%
  mutate(DummyVariable_Baseline_Need = as.numeric(paste0(MemberID, CategoryId, UrgentNeedId)))

## Total number of unique patients that reported a social need 
## within the given time frame
Number_Unique_Members_w_SocialNeed <-SocialNeed_Denominator %>% 
  summarise(Number_Unique_Members_w_SocialNeed = n_distinct(MemberID))

## Total number of unique needs within given time frame
Number_Unique_SocialNeeds <- SocialNeed_Denominator %>% 
  summarise(Number_SocialNeeds = n())


#### Services Provided #######################################################

## Create a new file for future matching -- Denominator //
## Subset of all the services offered for the social needs reported
Services <- NHELP_URFO_Q2 %>%
  filter(!ServiceName %in% c("NULL")) %>%
  mutate(DummyVariable_Baseline_Services = as.numeric(paste0(MemberID, CategoryId, UrgentNeedId))) 

## Total number of unique members that received services for social needs 
## reported within the given time frame
Number_Unique_Members_that_Received_Services_for_SocialNeed <- Services %>% 
  filter(!NeedName %in% c("No needs identified", "NULL", NA)) %>% 
  summarise(Number_Unique_Members_that_Received_Services_for_SocialNeed = n_distinct(MemberID)) 

## Total number of services provided for the social needs reported within the
## given time frame
Number_Services_Provided_for_SocialNeed <- SocialNeed_Denominator %>% 
  left_join(select(Services, DummyVariable_Baseline_Services, ServiceName), 
            by = c("DummyVariable_Baseline_Need" = "DummyVariable_Baseline_Services")) %>%
  filter(!ServiceName.y %in% c(NA, "NULL")) %>% 
  distinct(DummyVariable_Baseline_Need, .keep_all = TRUE) %>%
  summarise(Number_Services_Provided_for_SocialNeed = n())


#### Need Met/ Outcome  #######################################################

## Let's check what are the different unique options in the data set for need met
NHELP_URFO_Q2 %>% 
  distinct(OutcomeName)

## Create a new file for future matching -- Denominator
## Subset of all the needs that were met within the given time frame
Outcome <- NHELP_URFO_Q2 %>%
  filter((OutcomeDate >= "2022-04-01") & (OutcomeDate <= "2022-06-30"), 
         OutcomeName %in% c("Service received",
                            "Seen a doctor/Need met",
                            "Application approved/Need MET",
                            "Need resolved due to other factors:")) %>% 
  mutate(DummyVariable_Baseline_Outcome = as.numeric(paste0(MemberID, CategoryId, UrgentNeedId)))

## Total number of unique members that Met the need for social needs 
## within the given time frame
Number_Unique_Members_that_met_at_least_1_SocialNeed <- Outcome %>% 
  filter(!NeedName %in% c("No needs identified", "NULL", NA)) %>% 
  summarise(Number_Unique_Members_that_met_at_least_1_SocialNeed = n_distinct(MemberID))

## Number of social needs met among the social need reported within the given
## time frame
Number_of_SocialNeeds_Met_among_socialneed_reported <- SocialNeed_Denominator %>% 
  left_join(select(Outcome, DummyVariable_Baseline_Outcome, OutcomeName), 
            by = c("DummyVariable_Baseline_Need" = "DummyVariable_Baseline_Outcome")) %>% 
  filter(!OutcomeName.y %in% c(NA, "NULL")) %>% 
  distinct(DummyVariable_Baseline_Need, .keep_all = TRUE) %>%
  summarise(Number_of_SocialNeeds_Met_among_socialneed_reported = n())

Results


% of HHMs reporting a social need were offered a service 97.4%
# of household members reporting a social need 39
# of household members that were offered a service 38


% of social needs that were successfully met 46.3%
Total # of social needs reported 54
Total # of social needs being successfully met 25

Closing Remarks

  • Took lots of efforts to create analysis data frame due to nested and complex data from multiple tables
  • R can connect to the databases using ODBC connectors, so use it!
  • Some SDOH needs are complex and can take months/years to resolve (met).
    • For instance, need for higher education, immigration issues, housing.

Acknowledgements

  • Dr. David Brown, MD
  • Dr. Rachel Clarke, PhD
  • Dr. Zoran Bursac, PhD, MPH
  • Ms. Emmet Kiliddjian
  • Baptist Health South Florida

Thank you!