This document summarizes the methods and results of the wellness study at the Duke neurosurgery department.
We grouped the data for the following analyses into weeks. The data span April 30, 2018 to October 8, 2018. Responses from April 30 through the week of June 4 look different from the rest, possibly because this was the pilot period. For all analyses, the study period therefore runs from the week of June 11 through the week of October 8.
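As a rough illustration of the weekly grouping and study-period filter described above, here is a minimal sketch in base R. The data frame `responses`, its `date` column, and the Monday-based week label are assumptions made for this example (April 30, June 11, and October 8, 2018 all fall on a Monday); the original grouping code is not shown in this report.

```r
# Made-up response log; `responses` and `date` are assumed names, not the study data
responses <- data.frame(
  date   = as.Date(c("2018-04-30", "2018-06-06", "2018-06-11", "2018-06-13", "2018-10-08")),
  answer = c(5, 6, 0, 7, 4)
)

# Label each response with the Monday that starts its week.
# format(date, "%u") returns the weekday number (1 = Monday, ..., 7 = Sunday).
weekday <- as.integer(format(responses$date, "%u"))
responses$week_start <- responses$date - (weekday - 1)

# Keep only the weeks starting June 11 through October 8, 2018
study <- subset(
  responses,
  week_start >= as.Date("2018-06-11") & week_start <= as.Date("2018-10-08")
)
```

In this sketch the April 30 and June 6 rows fall in pilot-period weeks and are dropped, while the June 13 response is assigned to the week of June 11.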
The wellness mobile application presents four questions, one at a time, when it is opened. The user can respond to the question or choose to skip it. The questions were:
1. Through my work, I feel that I have a positive influence on people.
2. I accomplish many worthwhile things in this job.
3. I have become more insensitive to people since I’ve been working.
4. Working with people all day long requires a great deal of effort.
The possible answers range from zero to seven. For all questions, zero means the question was skipped. For questions one and two, a score of seven is the most positive and a score of one is the most negative. For questions three and four, a score of one is the most positive and a score of seven is the most negative.
Please note that for some of the analyses the scale for questions three and four was inverted (i.e., a score of one was switched to seven, a score of two was switched to six, and so on) to facilitate pooling the data. This inversion aligns the direction of positive and negative across all four questions.
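On a 1 to 7 scale, the inversion described above is equivalent to replacing each score with 8 minus the score. The following is a minimal sketch with made-up rows; the `question` column and the helper `invert_score` are hypothetical names, and the sketch assumes skipped responses have already been set to `NA` rather than 0.

```r
# Reverse a 1-7 score so that 1 <-> 7, 2 <-> 6, and so on; NA (skipped) stays NA
invert_score <- function(score) 8 - score

# Made-up rows; `question` is an assumed column name
example <- data.frame(
  question = c(1, 2, 3, 4, 3),
  answer   = c(7, 2, 1, 6, NA)
)

# Invert only questions three and four; leave questions one and two unchanged
to_flip <- example$question %in% c(3, 4)
example$answer_aligned <- ifelse(to_flip, invert_score(example$answer), example$answer)
```

Keeping the aligned score in a separate column preserves the scale as administered, so either version can be reported.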
One thing to think about as you look at the results: do we want to invert the scale for questions three and four (i.e., 1 becomes 7, 2 becomes 6)? The benefit is that questions three and four then run in the same direction as questions one and two, so we can group or average all four questions to get an overall score. The downside is that this differs from how the scale was administered. Also think about which will be easier to explain at the conference: “remember, low scores for questions three and four are good” or “we flipped the scale for three and four so higher scores are better for all questions.”
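# Recode an answer of 0 (skipped) to a true missing value (NA, not the string 'NA')
# so that skips are excluded from score summaries
data_w_na <- data5
data_w_na$answer <- car::recode(data5$answer, "'0' = NA")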
There were 31 unique users of the application over the study period: 19 residents and 12 attendings (Table 1). Overall, residents answered more questions than attendings. Question one was the most answered (82 responses), followed by question two (76), question three (66), and question four (62). Skipping follows the reverse pattern: question four was skipped 48% of the time, followed by question three (47%), question two (42%), and question one (41%). The average score was calculated for each question. Please note that the scale for questions three and four is inverted, so higher scores on all four questions indicate better wellness. Attendings scored higher than residents on questions one, three, and four; on question two, residents scored slightly higher (6.29 vs. 6.00).
| Variable | All | Residents | Attendings |
|---|---|---|---|
| Unique users | 31 | 19 | 12 |
| Question 1 | | | |
| Q1 complete, n (proportion) | 82 (0.59) | 64 (0.55) | 18 (0.75) |
| Q1 skipped, n (proportion) | 58 (0.41) | 52 (0.45) | 6 (0.25) |
| Q1 average score, mean (SD) | 5.63 (3.07) | 5.44 (3.04) | 6.33 (2.89) |
| Question 2 | | | |
| Q2 complete, n (proportion) | 76 (0.58) | 63 (0.57) | 13 (0.68) |
| Q2 skipped, n (proportion) | 54 (0.42) | 48 (0.43) | 6 (0.32) |
| Q2 average score, mean (SD) | 6.24 (3.20) | 6.29 (3.20) | 6.00 (3.21) |
| Question 3 | | | |
| Q3 complete, n (proportion) | 66 (0.53) | 56 (0.51) | 10 (0.67) |
| Q3 skipped, n (proportion) | 59 (0.47) | 54 (0.49) | 5 (0.33) |
| Q3 average score, mean (SD) | 4.14 (2.48) | 4.00 (2.40) | 4.90 (2.91) |
| Question 4 | | | |
| Q4 complete, n (proportion) | 62 (0.52) | 53 (0.51) | 9 (0.60) |
| Q4 skipped, n (proportion) | 57 (0.48) | 51 (0.49) | 6 (0.40) |
| Q4 average score, mean (SD) | 4.05 (2.48) | 3.85 (2.38) | 5.22 (2.97) |
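The per-question rows of the table can be thought of as a skip proportion plus a mean and standard deviation of the answered scores, split by role. The sketch below shows one way to compute such a summary in base R; the data are made up, and the data frame `q1` and its `role` column are assumed names rather than the study's actual variables.

```r
# Made-up responses to a single question; `role` is an assumed column, 0 = skipped
q1 <- data.frame(
  role   = c("resident", "resident", "resident", "resident",
             "attending", "attending", "attending"),
  answer = c(6, 0, 4, 0, 7, 0, 5)
)

# Flag skips and blank them out so they do not enter the score summaries
q1$skipped <- q1$answer == 0
q1$score   <- ifelse(q1$skipped, NA, q1$answer)

# Proportion skipped, and mean and SD of the answered scores, by role
skip_prop  <- tapply(q1$skipped, q1$role, mean)
mean_score <- tapply(q1$score, q1$role, mean, na.rm = TRUE)
sd_score   <- tapply(q1$score, q1$role, sd, na.rm = TRUE)
```

The completed proportion is simply `1 - skip_prop`.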
The following figures show the trend in the average score for each question, and for all questions combined, over the entire study period. The plotted values are weekly averages. Please note that the scale for questions three and four is inverted, so higher scores on all four questions indicate better wellness.
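A weekly average of this kind can be computed with `aggregate()`. This is a minimal sketch on made-up data; `weekly_input` and `week_start` are assumed names, and skipped responses are assumed to be coded as `NA`.

```r
# Made-up responses; `week_start` is an assumed column, NA = skipped
weekly_input <- data.frame(
  week_start = as.Date(c("2018-06-11", "2018-06-11", "2018-06-18",
                         "2018-06-18", "2018-06-25")),
  answer     = c(6, 4, NA, 5, 7)
)

# One row per week with the mean of the answered questions.
# The formula interface drops rows with NA before averaging (na.action = na.omit).
weekly_mean <- aggregate(answer ~ week_start, data = weekly_input, FUN = mean)
```

The resulting `weekly_mean` has one row per week and could be passed to a plotting function to draw the trend lines.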
The average score for all questions combined (leftmost plot) shows a consistent horizontal trend. Question one and question three have positive slopes. Initially, question four had a negative slope, which became positive starting in August.
Residents' scores had a positive slope over the course of the study. This trend is visible in the plots for questions one, three, and four. Question two scores were largely constant throughout the study period.
The small number of data points, particularly for individual questions, limited the analyses for the attendings. For the average score of all questions combined, we see a slight negative slope. Finally, the curves are mostly higher for attendings than for residents. This finding is further illustrated in the next plot.
The following is a density plot comparing the responses of residents and attendings. The density of responses concentrates at higher scores for attendings (gray) than for residents (blue).
The final figures comparing attendings and residents are the boxplots below. These boxplots provide a side-by-side comparison of residents and attendings by question. The pattern of responses is nearly identical for questions one and two. Attendings tend to score higher on questions three and four. Please note that the scale for questions three and four is inverted, so higher scores on all four questions indicate better wellness.
This boxplot is identical to the one above, with the addition of color and dots showing individual participant scores.
To determine whether residents and attendings had significantly different results, we performed a series of two-tailed t-tests. The first tested for a difference between the two groups with responses to questions one and two grouped together (row 1); there was no significant difference. The second tested for a difference with questions three and four grouped together (row 2); here attendings scored significantly higher than residents. We grouped questions three and four together for two reasons: 1) to increase sample size, and 2) questions three and four behave differently from questions one and two. This difference, lower scores for three and four, can be seen in the boxplot. Finally, we compared responses from residents and attendings with questions one through four grouped together (row 3). We again see attendings scoring significantly higher than residents.
| Question | Attendings mean | Residents mean | Confidence interval for difference | p-value |
|---|---|---|---|---|
| Q1 and Q2 together | 6.19 | 5.86 | (-0.21, 0.88) | 0.22 |
| Q3 and Q4 together | 5.05 | 3.93 | (0.15, 2.10) | 0.025 |
| Q1 - Q4 together | 5.76 | 4.97 | (0.27, 1.32) | 0.034 |
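Each row of the table above has the shape of a two-sample, two-tailed t-test. The sketch below runs one with `t.test()` on made-up scores. Note the assumptions: `t.test()` defaults to Welch's unequal-variance test and a 95% confidence level, and this report does not state which variant or confidence level was actually used.

```r
# Made-up scores for illustration only; these are not the study data
attending_scores <- c(6, 7, 5, 7, 6, 4)
resident_scores  <- c(4, 5, 3, 6, 2, 5, 4)

# Two-sided, two-sample t-test (Welch's version by default)
result <- t.test(attending_scores, resident_scores, alternative = "two.sided")

group_means <- result$estimate   # mean of each group
diff_ci     <- result$conf.int   # confidence interval for the difference in means
p_value     <- result$p.value
```

Passing `var.equal = TRUE` would give the classical pooled-variance test instead.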
Next, we investigated the proportion of questions skipped by participants over the course of the study. We see a clear increase in non-responses over time.
The following figure provides a more detailed view of the non-response trend. The leftmost panel is the same as the figure above: non-responses for the entire study population. The middle and right panels show the non-response trends for attendings and residents, respectively. The non-response rate increases rapidly for attendings while remaining constant for residents.
We were interested in whether those who more commonly skipped questions were more likely to report lower wellness scores. This figure plots each participant's proportion of skipped questions over the entire study on the x-axis against the average score for all questions they answered during the study on the y-axis. We see a slight negative correlation, in line with our hypothesis.
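The per-participant quantities in this figure can be derived from the response log as follows. This is a sketch on made-up data; `resp` and `user_id` are assumed names, and the use of a Pearson correlation is an assumption, since the report describes the relationship only from the plot.

```r
# Made-up response log; `user_id` is an assumed column name, 0 = skipped
resp <- data.frame(
  user_id = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  answer  = c(6, 0, 7, 0, 0, 3, 5, 4, 0, 6)
)

resp$skipped <- resp$answer == 0
resp$score   <- ifelse(resp$skipped, NA, resp$answer)

# One value per participant: share of questions skipped, and mean answered score
skip_prop  <- as.vector(tapply(resp$skipped, resp$user_id, mean))
mean_score <- as.vector(tapply(resp$score, resp$user_id, mean, na.rm = TRUE))

# Pearson correlation between skip proportion and mean score
r <- cor(skip_prop, mean_score)
```

In this toy data the correlation is negative, but with only three participants it carries no meaning beyond showing the computation.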
We performed a two-tailed t-test to determine whether those who often skipped questions (high non-responders) scored differently from those who did not skip many questions (low non-responders). We set the cut-off between high and low non-responders at 0.23, or 23% (the median non-response rate). High non-responders were those who skipped questions more than 23% of the time; low non-responders were those who skipped questions less than 23% of the time. The t-test results are below. High non-responders had higher response scores than low non-responders. This finding was not significant.
| Test | High non-responder mean score | Low non-responder mean score | Confidence interval for difference | p-value |
|---|---|---|---|---|
| two-tailed t test | 5.38 | 4.98 | (-0.09, 0.87) | 0.11 |
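The median split behind this comparison can be sketched as below. The per-participant values are made up, `per_user` and its columns are assumed names, and assigning participants exactly at the median to the low group is a choice made for this sketch only; the report does not say how ties were handled.

```r
# Made-up per-participant summaries; column names are assumptions
per_user <- data.frame(
  skip_prop  = c(0.05, 0.10, 0.20, 0.23, 0.30, 0.45, 0.60, 0.80),
  mean_score = c(5.0, 4.5, 5.5, 4.0, 6.0, 5.5, 5.0, 6.5)
)

# Cut-off at the median non-response rate
cutoff <- median(per_user$skip_prop)

# Above the cut-off = high non-responder; at or below = low (tie rule assumed)
per_user$group <- ifelse(per_user$skip_prop > cutoff, "high", "low")

# Two-sided t-test of mean score by group (Welch's version by default)
split_test <- t.test(mean_score ~ group, data = per_user)
```

`split_test$estimate`, `split_test$conf.int`, and `split_test$p.value` then hold the group means, the confidence interval for their difference, and the p-value.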