Introduction
Background:
Whoop is a fitness tracker that took the market by storm. The more prominent fitness wearable technologies are Fitbit, Apple Watch, Garmin, and others. While those products are great in their own way, Whoop's mindset is simple: provide technology and data that allow people to learn about their bodies and improve every day. As a Whoop user, I have been fascinated by the data the platform provides and am excited to dig into the health metrics at hand.
Context:
Individuals who use this device have an interest in bettering their health, and that begins with waking up every morning ready to attack the day. In this analysis, I will dive into one of Whoop's core metrics, Recovery. Recovery is a score from 1 to 100 that Whoop provides. How is this metric computed? We do not know for certain, but the best guess is that some combination of features feeds a model that predicts and assigns a recovery score.
Study Focus:
Research states that sleep is important for good health and for waking up feeling refreshed, but how true is this? Whoop has gone through extensive study and research in developing this product, and they believe Recovery is the metric to increase. I will work to find out whether sleep is the most important metric with respect to Recovery. In the process, I will identify which variables can be used to enhance and predict Recovery. Whoop provides many metrics; for this analysis I will focus on the following as the main feature set when fitting models and testing for significance:
Data
- Independent Variables:
    - Sleep Efficiency: the balance between the different stages of sleep (Light, REM, Deep).
    - Hours of Sleep: total hours and minutes asleep.
    - Sleep Consistency: are you going to sleep and waking up at the same times every day and night?
    - Heart Rate Variability (HRV): the variation in time between successive heartbeats.
    - Resting Heart Rate (RHR): heart rate while at rest or near rest.
- Dependent Variable:
    - Recovery Score: a 0-100 scale of how ready your body is to take on additional strain during the day.
My goal is to find relationships between these variables and, potentially, to implement the model that best predicts Recovery.
The data can be found on a spreadsheet here : https://docs.google.com/spreadsheets/d/101VFdWUNabrT6nu9_EokXFK-Pq4pQLxUbUiooeK5H8M/export?format=csv&gid=1713221660
After reading the data, I removed rows with any missing values and selected the columns of interest, shown below as a data frame.
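A minimal sketch of how this step might look, assuming the tidyverse is available; the short column names match the preview below, but the original sheet's names for the sleep metrics (in backticks) are guesses, as are the object names.

library(tidyverse)

url <- "https://docs.google.com/spreadsheets/d/101VFdWUNabrT6nu9_EokXFK-Pq4pQLxUbUiooeK5H8M/export?format=csv&gid=1713221660"

df <- read_csv(url) %>%
  # keep only the metrics of interest, renaming to shorter column names
  select(RHR,
         SleepConsistency = `Sleep Consistency`,
         HoursOfSleep     = `Hours of Sleep`,
         HRV,
         SleepEfficiency  = `Sleep Efficiency`,
         Recovery) %>%
  # drop rows with any missing values
  drop_na()

df

The column-specification message and the tibble preview below come from this reading step.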
## Rows: 481 Columns: 42
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Day of Week, timezone_offset
## dbl (36): User ID, cycle_id, RHR, HRV, Recovery, Sleep Score, Hours in Bed,...
## date (1): Date
## time (3): Sleep Onset, Wake Onset, Last Nap End Time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 475 × 6
## RHR SleepConsistency HoursOfSleep HRV SleepEfficiency Recovery
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 50 76 6.94 106 93 68
## 2 51 84 7.43 78 87 52
## 3 45 86 7.11 110 81 78
## 4 57 87 6.81 72 75 36
## 5 48 84 6.82 104 87 88
## 6 49 80 7.39 89 87 65
## 7 50 70 6.97 111 80 91
## 8 52 70 4.8 74 77 36
## 9 53 67 7.66 95 86 67
## 10 49 53 6.33 116 83 91
## # … with 465 more rows
Analysis
The first step of the analysis is to examine what the Recovery distribution looks like. I do not want to assume normality, so I will plot the scores to back that assumption.
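A sketch of how this check might be plotted, assuming ggplot2 and the df data frame from above:

library(ggplot2)

# Histogram of Recovery scores to eyeball how close the distribution is to normal
ggplot(df, aes(x = Recovery)) +
  geom_histogram(binwidth = 5, colour = "white") +
  labs(x = "Recovery score", y = "Count",
       title = "Distribution of Recovery scores")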
As seen, the distribution is a rough bell curve. The lean toward higher recovery scores comes down to the individual: since this is my own data, and I took care of the other metrics and paid attention, a chart shifted toward higher scores is perfectly fine. I can assume approximate normality and continue.
The next step is to look for correlations between these metrics, displayed below.
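One way to produce such a correlation view is a correlation matrix plot; a sketch assuming the corrplot package:

library(corrplot)

# Pairwise Pearson correlations between the selected metrics
corr_matrix <- cor(df)

# Visualise the matrix; printing the numbers makes the strongest pairs easy to spot
corrplot(corr_matrix, method = "number", type = "upper")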
We see a very strong correlation between HRV and Recovery, and a slight correlation between Sleep Efficiency and Hours of Sleep. Some strong negative relationships also stand out: RHR is associated with lower Recovery and with lower HRV, which aligns with the studies. Heart rate variability and resting heart rate usually work in opposites: when one trends higher, the other trends lower.
I will now test different models to find the most accurate at predicting Recovery. Since the target is numerical rather than categorical, regression models are a better fit than Naive Bayes or K-Nearest-Neighbors; those methods work well for categorical data or binary outcomes, but here we want a recovery score on a 1-100 scale.
Because Recovery is reported on a 1-100 scale, the values in the target column vary widely. I am going to run two different models and compare them using the multiclass ROC and the area under the curve (AUC). The first model will use the full Recovery scale and fine-tune that.
The second will use a shifted Recovery score. Whoop's web dashboard displays the score split into green, yellow, and red recovery zones, so I will map 1-33 to red, 34-66 to yellow, and 67-100 to green, encoded as 0, 1, and 2 respectively.
Linear Regression with Original Recovery Scores
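A sketch of how this first model might be fit; the formula and the data name train match the summary below, while the seed, split proportion, and the object names model and test are my own assumptions.

set.seed(123)  # assumed seed, for reproducibility
# split proportion is a guess; the summary below implies roughly 344 training rows
idx   <- sample(nrow(df), size = floor(0.72 * nrow(df)))
train <- df[idx, ]
test  <- df[-idx, ]

# Linear model predicting the raw 1-100 Recovery score from all five metrics
model <- lm(Recovery ~ HRV + RHR + SleepConsistency + SleepEfficiency +
              HoursOfSleep, data = train)
summary(model)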
## [1] "Full Model with original Recovery data"
##
## Call:
## lm(formula = Recovery ~ HRV + RHR + SleepConsistency + SleepEfficiency +
## HoursOfSleep, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.726 -6.352 -0.576 6.696 33.069
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.8241356 15.9321066 -0.428 0.668687
## HRV 0.9817616 0.0380533 25.800 < 2e-16 ***
## RHR -0.4924093 0.1821563 -2.703 0.007214 **
## SleepConsistency -0.0005756 0.0521227 -0.011 0.991195
## SleepEfficiency -0.0384792 0.1317360 -0.292 0.770395
## HoursOfSleep 2.2654309 0.5825373 3.889 0.000121 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.872 on 338 degrees of freedom
## Multiple R-squared: 0.7588, Adjusted R-squared: 0.7552
## F-statistic: 212.7 on 5 and 338 DF, p-value: < 2.2e-16
There are a few notable metrics to mention in this summary.
Firstly - t-value: the coefficient estimate divided by its standard error, indicating how strongly each variable is associated with the target (Recovery). HRV has the largest t-value at 25.800, followed by Hours of Sleep at 3.889. In the negative direction, RHR has a t-value of -2.703, while Sleep Consistency, at -0.011, carries essentially no weight.
Secondly - Pr(>|t|): this p-value indicates whether a variable is significant with respect to the target. If the value is less than 0.05, we can call the variable significant; otherwise, we can remove it from the model. Based on this summary, the variables that qualify are HRV, RHR, and Hours of Sleep.
Thirdly - Multiple R-squared: the proportion of variance in Recovery explained by the model, which comes out to 0.7588 (about 76%). While this is not awful, it can certainly be improved, so we will keep an eye on it.
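As a quick check, the t-value can be reproduced directly from the coefficient table above, since it is just the estimate divided by its standard error:

# HRV row of the summary above: estimate / standard error
0.9817616 / 0.0380533   # ~ 25.80, the t value reported for HRV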
## [1] "Error Rate"
## [1] 0.1624333
## [1] "AUC score, Area under Curve, and the closer to 100, the better the model"
## [1] 0.9410604
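A sketch of how these two numbers might be computed, assuming the pROC package and a small helper that maps scores into the 0/1/2 bands; the helper and the object names model and test are my own assumptions.

library(pROC)

# Hypothetical helper: map a 1-100 recovery score to the red/yellow/green bands
to_band <- function(x) ifelse(x <= 33, 0, ifelse(x <= 66, 1, 2))

pred <- predict(model, newdata = test)

# Error rate: share of test rows whose predicted band misses the observed band
mean(to_band(pred) != to_band(test$Recovery))

# Multiclass AUC: banded outcome as the response, raw prediction as the score
auc(multiclass.roc(to_band(test$Recovery), pred))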
The next step is to refit the model with only the significant variables and check the residual standard error, error rate, and AUC again.
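A sketch of that refit; the formula and the data name train_new match the summary below, while the name model_new is my own.

model_new <- lm(Recovery ~ HRV + RHR + HoursOfSleep, data = train_new)
summary(model_new)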
##
## Call:
## lm(formula = Recovery ~ HRV + RHR + HoursOfSleep, data = train_new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.042 -6.192 -0.304 6.337 33.811
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.97858 11.11587 0.358 0.720639
## HRV 0.97406 0.03597 27.080 < 2e-16 ***
## RHR -0.72745 0.17641 -4.124 4.76e-05 ***
## HoursOfSleep 1.96212 0.55500 3.535 0.000467 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.231 on 321 degrees of freedom
## Multiple R-squared: 0.7936, Adjusted R-squared: 0.7917
## F-statistic: 411.4 on 3 and 321 DF, p-value: < 2.2e-16
## [1] "Error Rate"
## [1] 0.1518952
## [1] "AUC score, Area under Curve, and the closer to 100, the better the model"
## [1] 0.9090646
This new model does not differ much from the first, but the multiple R-squared improved to roughly 79% (0.7936) and the residual standard error dropped from 9.872 to 9.231. The error rate also edged down slightly, from about 0.162 to 0.152.
Based on this chart, the relationship between expected and observed Recovery is positive, indicating a roughly linear relationship. This backs the model: the predicted outcome for each input set is close to the actual observed score.
The chart is also split into four groups. The colored groups indicate that the observed and predicted scores fall in the same range, i.e., the red, yellow, or green (0, 1, 2) band. Black means they do not fall in the same band, which can happen when the deviation is minimal: if the threshold for yellow is 66 and two points are only 2 apart, they can land in different regions even though the difference is not significant enough to worry about. The majority of the data points land in the correct category.
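A simplified sketch of how a chart like this might be drawn, reusing the to_band() helper from earlier and assuming the reduced model's test-set predictions (pred_new and test are assumed names).

library(ggplot2)

pred_new <- predict(model_new, newdata = test)

plot_df <- data.frame(
  observed  = test$Recovery,
  predicted = pred_new,
  same_band = to_band(pred_new) == to_band(test$Recovery)
)

# Points coloured by whether the predicted and observed scores land in the same band
ggplot(plot_df, aes(x = observed, y = predicted, colour = same_band)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "Observed Recovery", y = "Predicted Recovery", colour = "Same band")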
The Second Model With Adjusted Recovery
df_copy <- data.frame(df)
# Red: 1-33 -> 0
df_copy$Recovery[df_copy$Recovery <= 33] <- 0
# Yellow: 34-66 -> 1
df_copy$Recovery[df_copy$Recovery > 33 & df_copy$Recovery <= 66] <- 1
# Green: 67-100 -> 2
df_copy$Recovery[df_copy$Recovery >= 67] <- 2
## [1] "Adjusted Recovery data to 0,1,2 scale"
## [1] 0.9371142
Comparing the AUC of the two models, the adjusted Recovery yields slightly better results. This is expected given the nature of the data: when the target is collapsed into binary or multi-class categories, the model faces an easier separation problem, whereas with a numerical target the goal is to minimize the residuals and the model is graded on the lowest residual error.
summary(modelB)
##
## Call:
## lm(formula = Recovery ~ HRV + RHR + SleepConsistency + SleepEfficiency +
## HoursOfSleep, data = trainB)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03404 -0.25114 -0.00333 0.25667 1.13780
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.064944 0.645298 -0.101 0.91990
## HRV 0.028674 0.001537 18.655 < 2e-16 ***
## RHR -0.021711 0.007145 -3.039 0.00257 **
## SleepConsistency -0.002271 0.001937 -1.173 0.24185
## SleepEfficiency -0.001049 0.005948 -0.176 0.86016
## HoursOfSleep 0.053979 0.021908 2.464 0.01427 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3801 on 321 degrees of freedom
## Multiple R-squared: 0.6356, Adjusted R-squared: 0.6299
## F-statistic: 112 on 5 and 321 DF, p-value: < 2.2e-16
print("AIC Score for Adjusted Recovery")## [1] "AIC Score for Adjusted Recovery"
AIC(modelB)## [1] 303.2264
AIC is much lower for the adjusted Recovery, but that is expected: it is the simpler setup, with a three-class target rather than the full numerical scale, and because the response is on a different scale the two AIC values are not directly comparable. However, the R-squared is much lower than in the full-recovery model.
As seen above, the variables with a p-value < 0.05 were HRV, RHR, and Hours of Sleep, meaning those variables had the strongest impact on predicting the Recovery score. I will now implement a model that uses only those variables and compare the results to the original models.
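A sketch of that model; the formula and the data name trainC match the summary below, while the name modelC is my own.

modelC <- lm(Recovery ~ HRV + RHR + HoursOfSleep, data = trainC)
summary(modelC)
AIC(modelC)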
##
## Call:
## lm(formula = Recovery ~ HRV + RHR + HoursOfSleep, data = trainC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.08504 -0.25875 -0.01994 0.23632 1.17549
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.349117 0.448433 -0.779 0.436802
## HRV 0.026322 0.001420 18.538 < 2e-16 ***
## RHR -0.020787 0.007246 -2.869 0.004381 **
## HoursOfSleep 0.077868 0.022361 3.482 0.000562 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3753 on 339 degrees of freedom
## Multiple R-squared: 0.6175, Adjusted R-squared: 0.6141
## F-statistic: 182.4 on 3 and 339 DF, p-value: < 2.2e-16
To summarize the results, we will view them side by side. In the chart below, remember that we want to minimize AIC and maximize AUC:
| Formula | AUC | AIC | R2 |
|---|---|---|---|
| Original Recovery w/ All 5 Variables | 0.9090646 | 2372.9591 | 0.7935866 |
| Adjusted Recovery w/ All 5 Variables | 0.9371142 | 303.2264 | 0.6355711 |
| Adjusted Recovery w/ top 3 Variables | 0.9423041 | 307.1124 | 0.6174923 |
There are some trade-offs: the AUC is about 3 points higher in the adjusted model with 3 variables, but its AIC is slightly higher. AUC is the area under the ROC curve, which measures how well the model can differentiate between the classes, i.e., the recovery bands. A higher AUC means the model better recognizes the difference between the 0, 1, 2 scores, or even the full recovery score.
##
## ===============================================
## Dependent variable:
## ---------------------------
## Recovery
## -----------------------------------------------
## HRV 0.026***
## (0.001)
##
## RHR -0.021***
## (0.007)
##
## HoursOfSleep 0.078***
## (0.022)
##
## Constant -0.349
## (0.448)
##
## -----------------------------------------------
## Observations 343
## R2 0.617
## Adjusted R2 0.614
## Residual Std. Error 0.375 (df = 339)
## F Statistic 182.419*** (df = 3; 339)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
One concern is the R-squared value seen above. This is a statistic we want to maximize, yet here it falls well short of the 90-95% a very strong model might reach. This is a common issue when moving between regression and binary or multi-class classification. AUC can be thought of as how well the model assigns new examples to their classes; our 3-variable model achieved a high AUC, which is expected given the simpler target and lower variable count. However, the R-squared for this third model is surprisingly low.
There are different views on how to weigh the metrics used to compare these models. R-squared is a statistic we want to maximize, along with AUC, whereas AIC we want to minimize. In our situation, the model with the best R-squared has the worst AIC, and the models with better AIC have low R-squared values. This comes down to the target variable: with numerical regression, R-squared matters because it describes how well the model's output tracks the original data. AUC and AIC are not to be ignored, but they represent a trade-off. In our case, we need to decide what kind of Recovery we want to predict:
Do we want to predict a specific numerical value, in the domain of Recovery, and minimize the residual error?
or
Do we want to predict a Recovery category among 3 classes, and would that be sufficient for our analysis?
In short, if we want to predict a real number in the domain of the original target variable, we should use model 1: numerical prediction with the 3 significant variables (HRV, RHR, and Hours of Sleep).
However, to try a different route, since the regression approach did not handle the category version especially well, let us test a different classifier, perhaps a decision tree model.
Decision Tree Model
Another way to test this data is a decision tree, where the tree is formed by finding the best 'route' to a decision, in our case the recovery band. This model requires a categorical target, so we will use the adjusted data frame with the 0, 1, 2 Recovery ranking (red, yellow, green).
Additionally, to keep the splits simple and interpretable, I will encode each predictor in a binary format rather than leaving it numerical: each column becomes 0 or 1 depending on whether the value is below or above that column's median. I will then pass the result to the decision tree classifier and test the results, as sketched below.
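A sketch of this preparation and the tree fit, assuming the rpart package; the formula, method, and the name train_dt match the output below, while the other object names, the seed, and the split proportion are my own assumptions.

library(rpart)

# Start from the 0/1/2-coded copy and binarise each predictor at its median
df_dt <- df_copy
for (col in c("RHR", "SleepConsistency", "HoursOfSleep", "HRV", "SleepEfficiency")) {
  df_dt[[col]] <- as.integer(df_dt[[col]] > median(df_dt[[col]]))
}
df_dt$Recovery <- factor(df_dt$Recovery)

# Assumed roughly 50/50 split (the printcp output below reports n = 238 training rows)
set.seed(123)
idx      <- sample(nrow(df_dt), size = floor(0.5 * nrow(df_dt)))
train_dt <- df_dt[idx, ]
test_dt  <- df_dt[-idx, ]

# Classification tree on the banded Recovery target
fit <- rpart(Recovery ~ ., data = train_dt, method = "class")
printcp(fit)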
## [1] "Train Table Recovery Distribution"
##
## 0 1 2
## 0.06722689 0.52100840 0.41176471
## [1] "Test Table Recovery Distribution"
##
## 0 1 2
## 0.1012658 0.5316456 0.3670886
We see that the distribution of recovery bands is similar between the train and test splits. Class 1, or yellow (34-66 percent recovery), is still the most common value recorded.
In this tree, the 0 band (red, 1-33) is never used, because the share of 0's is so small (roughly 7% of the training data) that splitting toward it produces essentially no information gain (the statistic that leads the classifier to pick the best attribute to split on).
##
## Classification tree:
## rpart(formula = Recovery ~ ., data = train_dt, method = "class")
##
## Variables actually used in tree construction:
## [1] HoursOfSleep HRV RHR
##
## Root node error: 114/238 = 0.47899
##
## n= 238
##
## CP nsplit rel error xerror xstd
## 1 0.42105 0 1.00000 1.00000 0.067604
## 2 0.02193 1 0.57895 0.57895 0.060582
## 3 0.01000 3 0.53509 0.63158 0.062162
Something to take note of is the set of variables actually used in the tree: HRV, Hours of Sleep, and RHR, the same three variables we found to be the most significant in the regression models implemented earlier.
## predict_unseen
## 0 1 2
## 0 0 24 0
## 1 0 104 22
## 2 0 18 69
## [1] 0.7299578
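A sketch of how this evaluation might be produced, reusing the fit and test_dt objects from the earlier sketch; the name predict_unseen matches the table above.

# Predicted band for each unseen test row
predict_unseen <- predict(fit, test_dt, type = "class")

# Confusion matrix: observed bands in rows, predicted bands in columns
conf <- table(test_dt$Recovery, predict_unseen)
conf

# Accuracy: share of test rows whose band is predicted correctly
sum(diag(conf)) / sum(conf)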
With this test, the decision tree's accuracy (about 73%) is comparable to the R-squared value found for the linear regression model used for full numerical prediction. This leads us to believe, with the support of the analysis, that for full prediction in the original 1-100 domain we want to use the regression model, while for classifying the Recovery band we can use a decision tree model.
Summary
During this analysis, we were able to narrow the Whoop data set down and compare three different models. As a recap:
Linear Regression with Full Recovery
Linear Regression with Adjusted Recovery Classes
Decision Tree with Adjusted Recovery Classes
Within each model, we found the significant variables to use and compared many different statistics. Circling back to our original question: is sleep the most important metric to worry about for Recovery? No, it is not. In this analysis, we found that both HRV and RHR had more significance in predicting Recovery, as seen in the correlation plot and in the significance values of the variables in the models. Hours of Sleep did have some importance, but not nearly as much.
The updated question to ponder is: what helps increase HRV, which in turn increases the Recovery score? According to Whoop's article on HRV, it can be increased in a number of ways:
Firstly, exercise and train within a healthy range: do not skip exercise, but do not push past your personal limits.
Secondly, nutrition is key: eating healthy foods and following a smart diet leads to better body function and better HRV.
Thirdly, hydration: consuming water in line with your activity level is key, as dehydration influences many factors, from sleep quality to resting heart rate.
The list goes on to cover sleep consistency and quality, but the first three points all affect sleep itself. To increase or maintain a favorable recovery score, work on increasing HRV over time; by following the steps listed above and in the article, you will see an overall increase in Recovery scores.