library(tidyverse)
library(ggplot2)
library(ggpubr)
data <- read.csv("Data/spring_health_take_home_df.csv")
users_sum<-data%>%
summarise(sum_users=n_distinct(member_id_hashed))
users_sum
## sum_users
## 1 1166
The number of unique individuals using the platform is 1166.
users_av_time<-data%>%
count(member_id_hashed)%>%
summarize(mean_time=mean(n))
users_av_time
## mean_time
## 1 2.803602
On average, a user interacts with the platform 2.8 times.
# Q3a Please include a visualization
# Q3b Please calculate summary statistics
scores<-data%>%
filter(questionnaire_kind=="PHQ9")%>%
mutate_at(c("PHQ9_score"), as.numeric)%>%
arrange(member_id_hashed,assessment_created_at)%>%
group_by(member_id_hashed)%>%
summarise(number_assestments=n(),
first_score = dplyr::first(PHQ9_score))
gghistogram(scores, x = "first_score", add = "mean",main='Baseline PHQ9 Score',xlab='PHQ9 Baseline Score')
summary(scores$first_score)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 4.000 7.000 8.018 11.000 26.000
I assumed the change is the difference between the last and the first PHQ9_score for each member. I ignored members who completed only 1 questionnaire.
scores_change<-data%>%
filter(questionnaire_kind=="PHQ9")%>%
mutate_at(c("PHQ9_score"), as.numeric)%>%
arrange(member_id_hashed,assessment_created_at)%>%
group_by(member_id_hashed)%>%
summarise(number_assestments=n(),
first = dplyr::first(PHQ9_score),
last = dplyr::last(PHQ9_score),
change=last-first)%>%
filter(number_assestments>1)%>%
summarize(mean_score_change=mean(change))
scores_change
## # A tibble: 1 × 1
## mean_score_change
## <dbl>
## 1 -0.902
Among members who completed more than one PHQ9 questionnaire, PHQ9 scores decreased by an average of 0.902 points between the first and the last assessment, which indicates their depression levels are decreasing.
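An average change on its own does not say how much it varies between members. One way to check this, sketched here on made-up numbers rather than the real data, is a one-sample t-test on the per-member change (equivalent to a paired t-test on first versus last score). The `toy` data frame and its values are hypothetical.

```r
# Hypothetical first/last PHQ9 scores for six members (not the real data)
toy <- data.frame(
  first = c(12, 8, 15, 5, 9, 11),
  last  = c(10, 8, 11, 6, 7, 9)
)
toy$change <- toy$last - toy$first

# One-sample t-test of the per-member change against zero
tt <- t.test(toy$change, mu = 0)

mean_change <- mean(toy$change)  # average change across members
ci <- tt$conf.int                # 95% confidence interval for the mean change
```

If the confidence interval excludes zero, the average decrease is unlikely to be due to sampling noise alone, although this still says nothing about what caused it.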
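I assumed members who indicate TRUE for PHQ9_positive are depressed. Because the filter below is applied to each assessment rather than to each member, the first and last scores are taken only from assessments flagged as positive.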
depressed<-data%>%
filter(PHQ9_positive==TRUE)%>%
filter(questionnaire_kind=="PHQ9")%>%
mutate_at(c("PHQ9_score"), as.numeric)%>%
arrange(member_id_hashed,assessment_created_at)%>%
group_by(member_id_hashed)%>%
summarise(number_assestments=n(),
first = dplyr::first(PHQ9_score),
last = dplyr::last(PHQ9_score),
change=last-first)%>%
filter(number_assestments>1)%>%
summarize(mean_score_change=mean(change))
depressed
## # A tibble: 1 × 1
## mean_score_change
## <dbl>
## 1 -0.353
Depressed members who interact with the platform are seeing their PHQ9 scores decrease by an average of 0.353 points, which indicates their depression levels are decreasing.
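The `filter(PHQ9_positive==TRUE)` step in the code above acts on individual assessments, so a member who starts positive and later screens negative keeps only the positive rows. An alternative, sketched here as an assumption rather than the method used above, is to classify each member by the first assessment and keep all of that member's later scores. The toy data are hypothetical and only reuse the column names from the file.

```r
library(dplyr)

# Hypothetical toy data reusing the column names of the take-home file
toy <- data.frame(
  member_id_hashed       = c("a", "a", "b", "b", "c", "c"),
  assessment_created_at  = c("2021-01-01", "2021-02-01", "2021-01-05",
                             "2021-02-05", "2021-01-10", "2021-03-10"),
  PHQ9_score             = c(14, 6, 4, 3, 12, 11),
  PHQ9_positive          = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

baseline_positive <- toy %>%
  arrange(member_id_hashed, assessment_created_at) %>%
  group_by(member_id_hashed) %>%
  summarise(
    n_assessments     = n(),
    baseline_positive = first(PHQ9_positive),          # status at first assessment
    change            = last(PHQ9_score) - first(PHQ9_score)
  ) %>%
  # Keep members with a follow-up who screened positive at baseline
  filter(n_assessments > 1, baseline_positive)

mean_change_baseline_positive <- mean(baseline_positive$change)
```

In this toy example member "a" (14 to 6) stays in the result, whereas a row-level filter would leave that member with a single positive assessment and drop them.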
I’m assuming the change in workplace productivity is the difference between the last and the first assessment of SDS_days_unproductive, which tells us how many previously unproductive days became productive days.
prod<-data%>%
filter(questionnaire_kind=="SDS")%>%
filter(SDS_days_unproductive !="N/A")%>%
mutate_at(c("SDS_days_unproductive"), as.numeric)%>%
arrange(member_id_hashed,assessment_created_at)%>%
group_by(member_id_hashed)%>%
summarise(number_assestments=n(),
first = dplyr::first(SDS_days_unproductive),
last = dplyr::last(SDS_days_unproductive),
change=last-first)%>%
filter(number_assestments>1)%>%
summarize(mean_score_change=mean(change))
prod
## # A tibble: 1 × 1
## mean_score_change
## <dbl>
## 1 -0.0947
Users are reducing their number of unproductive work days by an average of 0.0947 days.
# Q7a Please explore and explain the relationship between symptomatic improvement and functional improvement amongst members who interacted with the Spring platform.
# Q7b Do you think that members benefit from interacting with the Spring platform? Why?
#Data format
prod_2<-data%>%
filter(questionnaire_kind=="SDS")%>%
filter(SDS_days_unproductive !="N/A")%>%
mutate_at(c("SDS_days_unproductive"), as.numeric)%>%
arrange(member_id_hashed,assessment_created_at)%>%
group_by(member_id_hashed)%>%
summarise(number_assestments=n(),
first_SDS = dplyr::first(SDS_days_unproductive),
last_SDS = dplyr::last(SDS_days_unproductive),
change_SDS=last_SDS-first_SDS)%>%
filter(number_assestments>1)%>%
dplyr::select(-c(number_assestments))
scores_change_2<-data%>%
filter(questionnaire_kind=="PHQ9")%>%
mutate_at(c("PHQ9_score"), as.numeric)%>%
arrange(member_id_hashed,assessment_created_at)%>%
group_by(member_id_hashed)%>%
summarise(number_assestments=n(),
first_PHQ9 = dplyr::first(PHQ9_score),
last_PHQ9 = dplyr::last(PHQ9_score),
change_PHQ9=last_PHQ9-first_PHQ9)%>%
filter(number_assestments>1)%>%
dplyr::select(-c(number_assestments))
#Data join
new_df<-scores_change_2%>%left_join(prod_2, by="member_id_hashed")%>%drop_na()
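The join above keeps only members who have a change value in both tables. A minimal sketch on hypothetical data shows that, when neither table has missing values of its own, `left_join()` followed by `drop_na()` returns the same rows as `inner_join()`, and how to count the members lost in the join.

```r
library(dplyr)
library(tidyr)

# Hypothetical per-member changes (not the real data)
phq <- data.frame(member_id_hashed = c("a", "b", "c"),
                  change_PHQ9 = c(-3, 0, 2))
sds <- data.frame(member_id_hashed = c("a", "c", "d"),
                  change_SDS = c(-1, 1, 0))

# Same pattern as the join above: unmatched members get NA, then are dropped
via_left <- phq %>%
  left_join(sds, by = "member_id_hashed") %>%
  drop_na()

# Equivalent here: keep only members present in both tables
via_inner <- phq %>%
  inner_join(sds, by = "member_id_hashed")

# Number of PHQ9 members without a matching SDS change
n_dropped <- nrow(phq) - nrow(via_left)
```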
#Linear Model
# Store the model as fit so the lm() function is not masked
fit<-lm(change_SDS~change_PHQ9, data=new_df)
summary(fit)
##
## Call:
## lm(formula = change_SDS ~ change_PHQ9, data = new_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.3935 -0.8575 0.1471 0.9848 4.9940
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.06598 0.12118 -0.544 0.587
## change_PHQ9 0.15314 0.02349 6.519 5.22e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.725 on 209 degrees of freedom
## Multiple R-squared: 0.169, Adjusted R-squared: 0.165
## F-statistic: 42.5 on 1 and 209 DF, p-value: 5.221e-10
#Plotting
new_df%>%ggplot(aes(x=change_PHQ9,y=change_SDS))+
geom_point()+
geom_smooth(method="lm")+
annotate("text", x = 11, y = 6, label = "Adjusted~R^2 == 0.165", parse = TRUE) +
annotate("text", x = 11, y = 5, label = "italic(p) == 5.221e-10", parse = TRUE) +
xlab("Change in PHQ9 scores")+ylab("Change in Number of Days Unproductive")+
theme_bw()+ theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black"))+theme(legend.position = "none")
## `geom_smooth()` using formula 'y ~ x'
There is a positive correlation between the change in PHQ9 scores and the change in the number of unproductive days (slope 0.153, R-squared 0.169, p < 0.001). Members whose PHQ9 scores decreased also tended to report fewer unproductive days, which is consistent with the platform having a positive effect. On average, users who interact with the platform have reduced their number of unproductive days and their PHQ9 scores, which I’m assuming means their depression has decreased. Overall, the Spring Health platform appears to have positive effects on productivity and on reducing a person’s depression levels.
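For a regression with a single predictor, R-squared is the square of the Pearson correlation, so the reported multiple R-squared of 0.169 corresponds to a correlation of about 0.41 between the two changes. The sketch below checks that identity on simulated data (the simulated `x` and `y` are hypothetical, not the member data) and then applies it to the reported value.

```r
# Simulated data only, used to illustrate the identity
set.seed(1)
x <- rnorm(50)
y <- 0.15 * x + rnorm(50)

fit <- lm(y ~ x)
r <- cor(x, y)                        # Pearson correlation
r_squared <- summary(fit)$r.squared   # multiple R-squared of the model

# For a one-predictor model the two agree up to floating point error
identity_holds <- isTRUE(all.equal(r^2, r_squared))

# Correlation implied by the reported multiple R-squared (slope is positive)
r_reported <- sqrt(0.169)
```

A correlation of this size means the change in PHQ9 score explains roughly 17% of the variation in the change in unproductive days, leaving most of the variation unexplained.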
There may be some external factors at play that are not accounted for by this data or by the model assumptions. Individual users may be having more productive days and feeling less depressed because of factors in their lives that are outside the scope of this assessment. Future modeling efforts should explore the roles of these factors to better determine Spring Health’s effects on member productivity and depression.