MDG Spring Health R&D Assignment

Load Packages

library(tidyverse)
library(ggplot2)
library(ggpubr)

Data import

data <- read.csv("Data/spring_health_take_home_df.csv")

Q1 How many individuals used our platform?

users_sum<-data%>%
  summarise(sum_users=n_distinct(member_id_hashed))

users_sum

##   sum_users
## 1      1166

The number of unique individuals using the platform is 1166.

Q2 What is the average number of times that a member interacts with the platform?

users_av_time<-data%>%
  count(member_id_hashed)%>%
  summarize(mean_time=mean(n))

users_av_time

##   mean_time
## 1  2.803602

On average, a member interacts with the platform 2.8 times, counting each row of the data as one interaction.
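As a quick check on the calculation above, here is a minimal sketch on a hypothetical toy data frame (the member IDs are invented, and it assumes each row is one interaction). It shows that the mean of the per-member counts equals total rows divided by distinct members, and computes the median alongside it, since interaction counts are often right-skewed.

```r
library(dplyr)

# Hypothetical toy data: one row per interaction (not the real dataset)
toy <- tibble(member_id_hashed = c("a", "a", "a", "b", "c", "c"))

# Number of rows (interactions) per member
per_member <- toy %>% count(member_id_hashed)

# Mean of the per-member counts, as in the Q2 calculation
mean_interactions <- mean(per_member$n)

# Equivalent shortcut: total rows divided by distinct members
ratio <- nrow(toy) / n_distinct(toy$member_id_hashed)

# The median is less sensitive to a few very active members
median_interactions <- median(per_member$n)
```

On the real data, the same shortcut would be `nrow(data) / n_distinct(data$member_id_hashed)`.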

Q3 What is the distribution of baseline PHQ9 total scores for members on the platform?

# Q3a Please include a visualization
# Q3b Please calculate summary statistics

scores<-data%>%
  filter(questionnaire_kind=="PHQ9")%>%
  mutate_at(c("PHQ9_score"), as.numeric)%>% 
  arrange(member_id_hashed,assessment_created_at)%>%
  group_by(member_id_hashed)%>%
  summarise(number_assestments=n(),
            first_score = dplyr::first(PHQ9_score))

gghistogram(scores, x = "first_score", add = "mean",main='Baseline PHQ9 Score',xlab='PHQ9 Baseline Score')

summary(scores$first_score)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   4.000   7.000   8.018  11.000  26.000

Q4 What is the average change in total PHQ9 score for members using the platform?

I assumed the change is the difference between the last and the first PHQ9_score (last minus first), so a negative value means symptoms improved. I ignored members who completed only one PHQ9 questionnaire.

scores_change<-data%>%
  filter(questionnaire_kind=="PHQ9")%>%
  mutate_at(c("PHQ9_score"), as.numeric)%>% 
  arrange(member_id_hashed,assessment_created_at)%>%
  group_by(member_id_hashed)%>%
  summarise(number_assestments=n(),
            first = dplyr::first(PHQ9_score),
            last = dplyr::last(PHQ9_score), 
            change=last-first)%>%
  filter(number_assestments>1)%>%
  summarize(mean_score_change=mean(change))

scores_change

## # A tibble: 1 × 1
##   mean_score_change
##               <dbl>
## 1            -0.902

Among members with at least two PHQ9 assessments, scores decreased by an average of 0.902 points between the first and the last assessment, which indicates their depression levels are decreasing.
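To make the first-to-last calculation concrete, here is a small worked sketch on hypothetical toy data (the IDs, dates, and scores are invented). It assumes the timestamp column sorts chronologically. In the toy data it is a Date; if `assessment_created_at` is read in as text in a non-ISO format, it would need to be parsed before `arrange()`.

```r
library(dplyr)

# Hypothetical toy data, not the real dataset
toy <- tibble(
  member_id_hashed = c("a", "a", "a", "b", "c", "c"),
  assessment_created_at = as.Date(c("2021-01-01", "2021-02-01", "2021-03-01",
                                    "2021-01-15", "2021-01-10", "2021-02-10")),
  PHQ9_score = c(12, 9, 6, 4, 5, 8)
)

toy_change <- toy %>%
  arrange(member_id_hashed, assessment_created_at) %>%  # oldest first within member
  group_by(member_id_hashed) %>%
  summarise(n_assessments = n(),
            first_score = first(PHQ9_score),
            last_score = last(PHQ9_score),
            change = last_score - first_score) %>%       # negative = improvement
  filter(n_assessments > 1)                              # drops member "b"

mean_change <- mean(toy_change$change)
```

Member "a" goes from 12 to 6 (change of -6), member "c" goes from 5 to 8 (change of +3), and member "b" is dropped for having a single assessment, so the mean change is -1.5.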

Q5 What is the average change in total PHQ9 score for depressed individuals using the platform?

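I assumed that assessments where PHQ9_positive is TRUE indicate depression. The filter below keeps only those assessments, so the change is computed between each member's first and last positive PHQ9 assessment.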

depressed<-data%>%
  filter(PHQ9_positive==TRUE)%>%
  filter(questionnaire_kind=="PHQ9")%>%
  mutate_at(c("PHQ9_score"), as.numeric)%>% 
  arrange(member_id_hashed,assessment_created_at)%>%
  group_by(member_id_hashed)%>%
  summarise(number_assestments=n(),
            first = dplyr::first(PHQ9_score),
            last = dplyr::last(PHQ9_score), 
            change=last-first)%>%
  filter(number_assestments>1)%>%
  summarize(mean_score_change=mean(change))

depressed

## # A tibble: 1 × 1
##   mean_score_change
##               <dbl>
## 1            -0.353

Across assessments flagged as depressed, members with at least two such assessments are seeing their PHQ9 scores decrease by an average of 0.353 points, which indicates their depression levels are decreasing.
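One alternative way to define the depressed group, shown here only as a sketch and not as the method used above, is to flag members by their baseline assessment and then follow all of their later assessments. The toy data below is hypothetical, and it assumes `PHQ9_positive` is a logical column. It compares that baseline definition with the row-level filter.

```r
library(dplyr)

# Hypothetical toy data, not the real dataset
toy <- tibble(
  member_id_hashed = c("a", "a", "b", "b", "c", "c"),
  assessment_created_at = as.Date(c("2021-01-01", "2021-03-01",
                                    "2021-01-05", "2021-03-05",
                                    "2021-01-10", "2021-03-10")),
  PHQ9_score = c(15, 6, 4, 3, 12, 11),
  PHQ9_positive = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# Baseline definition: keep members whose first assessment was positive,
# then use all of their assessments to compute the change
baseline_depressed <- toy %>%
  arrange(member_id_hashed, assessment_created_at) %>%
  group_by(member_id_hashed) %>%
  summarise(n_assessments = n(),
            baseline_positive = first(PHQ9_positive),
            change = last(PHQ9_score) - first(PHQ9_score)) %>%
  filter(n_assessments > 1, baseline_positive)

# Row-level definition: drop non-positive assessments before summarising
row_filtered <- toy %>%
  filter(PHQ9_positive) %>%
  arrange(member_id_hashed, assessment_created_at) %>%
  group_by(member_id_hashed) %>%
  summarise(n_assessments = n(),
            change = last(PHQ9_score) - first(PHQ9_score)) %>%
  filter(n_assessments > 1)

mean_baseline <- mean(baseline_depressed$change)
mean_row_filtered <- mean(row_filtered$change)
```

In this toy example member "a" recovers (15 to 6), so the follow-up assessment is no longer positive. The row-level filter leaves "a" with a single assessment and drops the member, giving a mean change of -1, while the baseline definition keeps "a" and gives a mean change of -5. Which definition is appropriate depends on how "depressed individuals" is meant in the question.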

Q6 What is the average change in total workplace productivity (SDS_days_unproductive) for members interacting with the platform?

I’m assuming the change in workplace productivity is the last assessment minus the first assessment of SDS_days_unproductive, so a negative value tells us the number of previously unproductive days that became productive.

prod<-data%>%
  filter(questionnaire_kind=="SDS")%>%
  filter(SDS_days_unproductive !="N/A")%>%
  mutate_at(c("SDS_days_unproductive"), as.numeric)%>%
  arrange(member_id_hashed,assessment_created_at)%>%
  group_by(member_id_hashed)%>%
  summarise(number_assestments=n(),
            first = dplyr::first(SDS_days_unproductive),
            last = dplyr::last(SDS_days_unproductive), 
            change=last-first)%>%
  filter(number_assestments>1)%>%
  summarize(mean_score_change=mean(change))

prod

## # A tibble: 1 × 1
##   mean_score_change
##               <dbl>
## 1           -0.0947

Members are reducing their unproductive days by an average of 0.0947 days, which is a small increase in productive days.

Q7 A core goal of treating depression is to improve function (e.g. SDS_days_unproductive) as well as symptoms (e.g. PHQ9_score).

# Q7a Please explore and explain the relationship between symptomatic improvement and functional improvement amongst members who interacted with the Spring platform.
# Q7b Do you think that members benefit from interacting with the Spring platform? Why?

#Data format
prod_2<-data%>%
  filter(questionnaire_kind=="SDS")%>%
  filter(SDS_days_unproductive !="N/A")%>%
  mutate_at(c("SDS_days_unproductive"), as.numeric)%>%
  arrange(member_id_hashed,assessment_created_at)%>%
  group_by(member_id_hashed)%>%
  summarise(number_assestments=n(),
            first_SDS = dplyr::first(SDS_days_unproductive),
            last_SDS = dplyr::last(SDS_days_unproductive), 
            change_SDS=last_SDS-first_SDS)%>%
  filter(number_assestments>1)%>%
  dplyr::select(-c(number_assestments))

scores_change_2<-data%>%
  filter(questionnaire_kind=="PHQ9")%>%
  mutate_at(c("PHQ9_score"), as.numeric)%>% 
  arrange(member_id_hashed,assessment_created_at)%>%
  group_by(member_id_hashed)%>%
  summarise(number_assestments=n(),
            first_PHQ9 = dplyr::first(PHQ9_score),
            last_PHQ9 = dplyr::last(PHQ9_score), 
            change_PHQ9=last_PHQ9-first_PHQ9)%>%
  filter(number_assestments>1)%>%
  dplyr::select(-c(number_assestments))

#Data join
new_df<-scores_change_2%>%left_join(prod_2, by="member_id_hashed")%>%drop_na()

#Linear Model
# Store the model as `fit` so the object does not mask the lm() function
fit<-lm(change_SDS~change_PHQ9, data=new_df)
summary(fit)

## 
## Call:
## lm(formula = change_SDS ~ change_PHQ9, data = new_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.3935 -0.8575  0.1471  0.9848  4.9940 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.06598    0.12118  -0.544    0.587    
## change_PHQ9  0.15314    0.02349   6.519 5.22e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.725 on 209 degrees of freedom
## Multiple R-squared:  0.169,  Adjusted R-squared:  0.165 
## F-statistic:  42.5 on 1 and 209 DF,  p-value: 5.221e-10

#Plotting
new_df%>%ggplot(aes(x=change_PHQ9,y=change_SDS))+
  geom_point()+
  geom_smooth(method="lm")+
  # 0.165 is the adjusted R-squared (multiple R-squared is 0.169)
  annotate("text", x = 11, y = 6, label = "Adjusted~R^2 == 0.165", parse = TRUE) +
  annotate("text", x = 11, y = 5, label = "italic(p) == 5.221e-10", parse = TRUE) +
  xlab("Change in PHQ9 scores")+ylab("Change in Number of Days Unproductive")+
  theme_bw()+ theme(panel.grid.major = element_blank(),
                    panel.grid.minor = element_blank(),
                    panel.border = element_rect(colour = "black"))+theme(legend.position = "none")

## `geom_smooth()` using formula 'y ~ x'

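There is a positive association between the change in PHQ9 scores and the change in the number of unproductive days (slope = 0.153, p < 0.001, R-squared = 0.169): members whose PHQ9 scores decreased more also tended to report larger decreases in unproductive days. On average both measures moved in the favorable direction, with PHQ9 scores changing by -0.902 points and unproductive days by -0.0947 days, which I’m assuming means members have decreased depression and increased productivity. Overall, the Spring Health platform appears to have positive effects on productivity and on a person’s depression levels, although the model explains only about 17% of the variance in the change in unproductive days.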
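To illustrate how the slope and R-squared of a one-predictor model like this can be read, here is a minimal sketch on simulated data. The numbers are generated, not taken from the dataset, and the slope of 0.15 and noise level are assumptions chosen only to resemble the fit. It extracts a 95% confidence interval for the slope and checks that R-squared in a simple linear regression equals the squared Pearson correlation.

```r
set.seed(1)

# Simulated toy data (hypothetical values, not the real dataset)
toy <- data.frame(change_PHQ9 = rnorm(50, mean = -1, sd = 4))
toy$change_SDS <- 0.15 * toy$change_PHQ9 + rnorm(50, sd = 1.7)

fit_toy <- lm(change_SDS ~ change_PHQ9, data = toy)

# 95% confidence interval for the slope
slope_ci <- confint(fit_toy)["change_PHQ9", ]

# With a single predictor, R-squared is the squared Pearson correlation
r <- cor(toy$change_PHQ9, toy$change_SDS)
r_squared <- summary(fit_toy)$r.squared
```

Applied to the reported Multiple R-squared of 0.169, the same identity implies a correlation of about 0.41 between the two change scores.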

There may be external factors at play that are not accounted for in this data or in the model assumptions. Individual members may be having more productive days and feeling less depressed because of events in their lives that are unrelated to the platform. Future modeling efforts should explore the roles of these factors to better determine Spring Health’s effects on member productivity and depression.