Contributors: Brandon Hao, Karishma Raghuram, Yujin Lee, Chris Thornton, Pawan (Sine) Polcharoen, Kellen Whetstone
Date: June 11, 2023
Sleep plays a vital role in maintaining overall health and well-being. Quality rest is essential for cognitive function, mental health, and physical rejuvenation. However, in today’s fast-paced world, consistent and adequate sleep is often put on the back burner. According to the Centers for Disease Control and Prevention, about 1 in 3 adults report not getting enough sleep each day. Furthermore, 50 to 70 million adults in the US are affected by a sleep disorder, with 25 million impacted by obstructive sleep apnea. Children and teenagers also experience sleep deprivation. According to a study by the National Sleep Foundation, more than 65% of people aged 5 to 17 do not get 8 hours of sleep per day.
Our study analyzes data from patients of the UCLA Sleep Laboratory. The main variable of interest is the Epworth Sleepiness Scale (ESS), a crucial measure of daytime sleepiness. This statistical report explores the factors that influence the ESS and the relationships between those factors, with the goal of better understanding how the average person’s sleep can be improved. Specifically, we look at correlations between ESS and other factors such as sleep-disordered breathing, periodic limb movements, and sleep stage. To further our research, we also examine these relationships stratified by age and gender. Beyond non-parametric correlation, we use logistic regression models to determine the key predictors of ESS and the Apnea Hypopnea Index (AHI).
As for our second area of interest, we will be investigating the relationship between oxygen desaturation counts and cardiac arrhythmia. Given data regarding a patient’s total oxygen desaturation events below thresholds of 90%, 80%, and 70%, we will be analyzing whether these events correlate with the presence of cardiac arrhythmia.
On the Epworth Sleepiness Scale (ESS), a score of 10 or greater is considered problematic, and a score of 16 or greater can indicate a severe sleep disorder such as narcolepsy. During the assessment, the respondent rates their tendency to fall asleep while engaged in eight different activities. Each item is rated on a Likert scale from 0 to 3, so the highest possible score is 24. The scale was initially introduced in 1990 by an Australian psychologist, Dr. Murray Johns. As of 2023, the ESS is available in almost every language, and each question is allotted a maximum of three minutes. Because it is a convenient measuring tool, many doctors and clinical researchers have used the ESS, and developed other formulas, to identify sleep disorders and recognize possible treatments. However, the accuracy of the test is somewhat contentious given the design of the survey and its lack of efficacy. This becomes problematic when it leads to misdiagnosed patients and abuse of insurance payouts.
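The scoring rule described above can be sketched in a few lines. This is a minimal illustration in Python (the report's own analysis was done in R); the function names and the category labels "normal", "abnormal", and "severe" are hypothetical and chosen only for this example, while the cutoffs of 10 and 16 come from the text.

```python
from typing import Sequence

ESS_ITEMS = 8          # eight activities rated by the respondent
ESS_ITEM_MAX = 3       # each item is rated from 0 to 3
ABNORMAL_CUTOFF = 10   # "problematic" threshold stated in the text
SEVERE_CUTOFF = 16     # "severe" threshold stated in the text


def ess_total(item_ratings: Sequence[int]) -> int:
    """Sum eight 0-3 ratings into a total ESS score between 0 and 24."""
    if len(item_ratings) != ESS_ITEMS:
        raise ValueError(f"expected {ESS_ITEMS} ratings, got {len(item_ratings)}")
    if any(r not in range(ESS_ITEM_MAX + 1) for r in item_ratings):
        raise ValueError("each rating must be an integer from 0 to 3")
    return sum(item_ratings)


def ess_category(score: int) -> str:
    """Map a total score to an illustrative label (labels are hypothetical)."""
    if not 0 <= score <= ESS_ITEMS * ESS_ITEM_MAX:
        raise ValueError("ESS score must be between 0 and 24")
    if score >= SEVERE_CUTOFF:
        return "severe"
    if score >= ABNORMAL_CUTOFF:
        return "abnormal"
    return "normal"
```

For instance, a respondent who rates every activity 2 has a total of 16 and falls into the severe band under these cutoffs.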
In diagnosing sleep disorders, the Apnea Hypopnea Index (AHI) is a critical measure and can be a more reliable indicator of abnormal sleep patterns. AHI is the combined number of apnea and hypopnea events per hour of sleep. It has shortcomings: it pools apnea and hypopnea events, which connote different severities of sleep abnormality (the former is a complete obstruction of airflow, the latter a less severe blockage). Even so, the AHI in combination with other valuable predictors related to oxygen desaturation, sleep time, etc., proves more effective in deriving critical information for sleep disease prognosis.
Our data was collected by the UCLA Sleep Laboratory under the guidance of Dr. Ravi Aysola. It includes 400 studies of patient sleep observations, collected from January 1, 2023, through March 15, 2023. We use this dataset for the modeling in research questions 1 and 2. For the correlation testing in question 1, however, we use a separate data set.
Our main variables of interest other than ESS for the correlation testing are: Apnea Hypopnea Index (AHI), Periodic limb movement (PLM), DESATS LT, Body Mass Index (BMI), and Sleep Efficiency. The Apnea Hypopnea Index (AHI) is defined as the total number of apnea and hypopnea occurrences divided by total sleep time in hours. Values of 5-14/hour are considered mild, 15-30/hour moderate, and greater than 30/hour severe. The movement parameter periodic limb movement (PLM) index is the number of leg movements per hour of sleep. Values equal to or greater than 15 are considered abnormal. In turn, the LEG1+LEG2 Index variables are the number of individual leg movements per hour of sleep. To measure the degree of hypoxemia, we use the variable DESAT, which measures insufficient blood oxygen during sleep. Specifically, DESATS LT 70, 80, and 90 measure the number of times that SaO2 dips below 70%, 80%, and 90% respectively.
The Epworth Sleepiness Scale data is obtained by sleep technicians during the lab meeting and is listed under the variable name SleepData: ESS. Patients report a value between 0 and 24. We first deleted extraneous values of this variable because they were inconsistent or imprecise: values such as 888 or 999 fall outside the permissible range and would heavily distort the mean ESS value. Among the remaining valid values, any score of 10 or higher denotes abnormal sleep patterns. Our analysis requires a model that captures the relationship between ESS and predictors including, but not limited to, age, weight, and sleep latency index. Data cleaning also involved removing non-response (0) values in the sleep efficiency index, sleep time, and Time:WAKE, as well as unreasonably large weight or BMI values.
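The index definitions above reduce to simple arithmetic. The following Python sketch (illustrative only; the report's analysis used R) computes an events-per-hour index and applies the severity bands from the text. The function names, the "normal" label for values below 5, and the treatment of values between 14 and 15 as mild are assumptions made for this example.

```python
def events_per_hour(event_count: int, total_sleep_minutes: float) -> float:
    """Generic index: number of events divided by hours of sleep."""
    if total_sleep_minutes <= 0:
        raise ValueError("total sleep time must be positive")
    return event_count / (total_sleep_minutes / 60.0)


def ahi(apneas: int, hypopneas: int, total_sleep_minutes: float) -> float:
    """Apnea Hypopnea Index: combined apnea and hypopnea events per hour."""
    return events_per_hour(apneas + hypopneas, total_sleep_minutes)


def ahi_severity(ahi_value: float) -> str:
    """Severity bands from the text; 'normal' below 5 is an assumed label."""
    if ahi_value < 5:
        return "normal"
    if ahi_value < 15:
        return "mild"
    if ahi_value <= 30:
        return "moderate"
    return "severe"


def plm_abnormal(plm_index: float) -> bool:
    """A PLM index of 15 or more leg movements per hour is abnormal."""
    return plm_index >= 15
```

As a usage example, 30 apneas and 60 hypopneas over 360 minutes of sleep give an AHI of 15.0, which falls in the moderate band.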
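The cleaning rules described above could be expressed as a single record filter. This Python sketch is illustrative only (the report's analysis used R): the field names are hypothetical, and the two plausibility cutoffs are placeholders, since the text does not state which weight or BMI values were treated as unreasonably large.

```python
from typing import Dict, List

# Hypothetical cutoffs: the report does not state the actual limits used.
MAX_PLAUSIBLE_BMI = 100.0
MAX_PLAUSIBLE_WEIGHT = 700.0

# Fields where a value of 0 is treated as a non-response.
NONZERO_FIELDS = ("sleep_efficiency", "sleep_time", "time_wake")


def is_valid(record: Dict[str, float]) -> bool:
    """Return True if a record passes every cleaning rule."""
    # ESS must lie in 0-24; this also drops values such as 888 and 999.
    if not 0 <= record["ess"] <= 24:
        return False
    # A zero in these fields is a non-response, not a measurement.
    if any(record[field] == 0 for field in NONZERO_FIELDS):
        return False
    # Drop implausibly large body measurements.
    if record["bmi"] > MAX_PLAUSIBLE_BMI or record["weight"] > MAX_PLAUSIBLE_WEIGHT:
        return False
    return True


def clean_records(records: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep only the records that pass is_valid."""
    return [record for record in records if is_valid(record)]
```

Filtering whole records, rather than replacing invalid values, matches the deletion approach described in the text, at the cost of a smaller sample.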
Our first step in answering the first research question was to run correlation tests on the relationship between the Epworth Scale score and indices such as periodic limb movements, apnea counts, and oxygen desaturation. We began with a correlation plot of all of our variables, using a heatmap to better visualize the correlations.
Include figure 1: Correlation Heat Map for ESS > 10
Looking at the heatmap, darker colors represent correlation scores closer to 1, which indicates a stronger positive relationship between the two variables. For example, the variables “LEG1 Index” and “LEG2 Index” have a correlation of 0.91, meaning a patient who shows movement in one leg is very likely to show similar movement in the other leg as well. The diagonal of the plot shows all 1s, as each variable is perfectly correlated with itself.
To determine the significance of the correlation between ESS and PLM: Total, we ran a Pearson’s correlation test. We stratified by gender to better investigate the relationship between the variables while accounting for differences within subgroups.
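The numbers behind such a heatmap are pairwise Pearson correlations. The Python sketch below is a minimal illustration (the report's analysis used R) of how a correlation matrix with 1s on the diagonal is built; the function names and the toy column names in the usage note are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Nested dict of pairwise correlations; the diagonal is 1 by definition."""
    names = list(columns)
    return {
        a: {b: 1.0 if a == b else pearson_r(columns[a], columns[b]) for b in names}
        for a in names
    }
```

Calling `correlation_matrix({"leg1": [...], "leg2": [...], "ess": [...]})` returns a symmetric table whose entries are the values a heatmap would color.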
Include figure 2: Pearson’s correlation test results
In the table above, the correlation between ESS and PLM: Total is significant at the alpha = 0.05 level only for males, with a p-value of 0.002. This tells us that there is a significant positive relationship between male patients’ Epworth Scale scores and their total leg movements while sleeping.
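A stratified correlation test can be sketched as follows in Python (illustrative only; the report's analysis used R). One substitution should be noted plainly: Pearson's test is normally based on a t distribution, which the Python standard library does not provide, so this sketch uses Fisher's z transformation with a normal approximation for the p-value instead. The function and key names are hypothetical.

```python
from collections import defaultdict
from math import atanh, sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation (Fisher's z)."""
    if n < 4:
        raise ValueError("Fisher's z approximation needs at least 4 pairs")
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation: atanh is undefined at +/-1
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def stratified_correlation(rows, group_key, x_key, y_key):
    """Compute r and an approximate p-value separately within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append((row[x_key], row[y_key]))
    results = {}
    for group, pairs in groups.items():
        xs, ys = zip(*pairs)
        r = pearson_r(xs, ys)
        results[group] = {"n": len(pairs), "r": r, "p": fisher_z_p_value(r, len(pairs))}
    return results
```

Running the test within each gender separately is what allows a relationship to be significant in one subgroup and not in the other.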
Include figures 3-5: Shapiro-Wilk Test results and Kendall, Spearman, Pearson correlation methods
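Figures 3-5 report Shapiro-Wilk normality test results alongside the Kendall, Spearman, and Pearson correlations. The Shapiro-Wilk test checks the normality assumption behind Pearson’s method, while the Kendall and Spearman methods are rank-based and do not rely on it. Across all three methods the correlation values are very close to 0, which indicates a weak, if not negligible, relationship between Arousals with PLM and ESS regardless of gender.
In addition to correlation, we used a logistic regression model to identify which variables in our data set are effective in predicting whether a patient’s ESS falls in the normal or abnormal range.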
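For readers unfamiliar with the rank-based methods, the Python sketch below (illustrative only; the report's analysis used R) shows how Spearman's rho and Kendall's tau-b are computed. Spearman's rho is the Pearson correlation of the ranks, with tied values sharing their average rank; tau-b compares concordant and discordant pairs with a correction for ties. Whether the report used the tau-b variant is not stated, so that choice is an assumption.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b via a direct O(n^2) comparison of all pairs."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every count
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + only_x_tied) * (untied + only_y_tied))
```

Both statistics depend only on the ordering of the observations, which is why they remain usable when a normality check fails.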
Include figure 6: ESS Prediction Logistic Regression Model Summary
Among the predictors examined, the significant predictors are Age, Weight, and Sleep Efficiency Index. We then applied this logistic regression model to the held-out test data to generate predictions.
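To make the fit-then-predict workflow concrete, here is a minimal logistic regression sketch in Python (illustrative only; the report's models were fit in R). It fits the model by plain gradient descent on the log-loss, which is a stand-in for the fitting routine of a statistics package and not the authors' method. The function names, learning rate, and epoch count are assumptions, and predictors are assumed to be on comparable scales.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, row):
    """Predicted probability of the positive class; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))


def predict_class(weights, row, threshold=0.5):
    """Classify as 1 (e.g. abnormal ESS) when the probability reaches the threshold."""
    return int(predict_proba(weights, row) >= threshold)


def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit intercept and coefficients by batch gradient descent on the log-loss."""
    n, p = len(X), len(X[0])
    weights = [0.0] * (p + 1)
    for _ in range(epochs):
        gradient = [0.0] * (p + 1)
        for row, label in zip(X, y):
            error = predict_proba(weights, row) - label
            gradient[0] += error
            for j, value in enumerate(row):
                gradient[j + 1] += error * value
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights
```

In use, `fit_logistic` would be called on the training rows only, and `predict_proba` or `predict_class` on the held-out rows, so that the test data play no part in estimating the coefficients.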
Include figure 7: ESS Prediction Logistic Regression Model ROC Curve
The ROC curve bends towards the top left corner, indicating that our logistic regression model does fairly well at predicting ESS categories (normal vs abnormal sleep patterns). The area under the curve (AUC) is approximately 0.75, which suggests that the model has an acceptable ability to discriminate between the two classes.
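An ROC curve and its AUC can be computed directly from predicted scores and true labels. The Python sketch below is illustrative only (the report's curves were produced in R): it sweeps the classification threshold from high to low, records the false and true positive rates at each step, and integrates with the trapezoidal rule. The function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points for an ROC curve."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 0.5 corresponds to scores that carry no information about the class, and 1.0 to scores that separate the classes perfectly, which is the scale on which a value such as 0.75 is read.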
For our second research question, we aimed to investigate whether there is an association between oxygen desaturation events (hypoxemia) and the presence of cardiac arrhythmia. The specific oxygen desaturation levels of interest were below 90%, 80%, and 70% saturation, and the corresponding counts of desaturation events were used as predictors.
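To illustrate what a desaturation count represents, the Python sketch below counts dips in a sequence of SaO2 readings (illustrative only; the report's analysis used R). In the study these counts come from the sleep laboratory data, so the event definition here is an assumption: one event is counted each time the reading moves from at or above the threshold to below it, and a trace that starts below the threshold counts as one event.

```python
def count_desaturations(sao2_trace, threshold):
    """Count the number of separate dips below the threshold (assumed definition)."""
    events = 0
    currently_below = False
    for value in sao2_trace:
        is_below = value < threshold
        if is_below and not currently_below:
            events += 1  # a new dip starts here
        currently_below = is_below
    return events


def desaturation_counts(sao2_trace, thresholds=(90, 80, 70)):
    """Counts for each threshold, keyed like the DESATS LT variables."""
    return {f"DESATS LT {t}": count_desaturations(sao2_trace, t) for t in thresholds}
```

For the toy trace `[95, 89, 88, 92, 79, 85, 91, 69, 95]`, this gives three events below 90%, two below 80%, and one below 70%.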
Before diving into modeling, we first examined the distribution of oxygen desaturation events within our dataset.
Include figure 8: Histogram of Oxygen Desaturation Events
The histogram above shows the distribution of oxygen desaturation events below 90% saturation. It’s clear that the majority of patients experience relatively few desaturation events, though there are outliers who experience significantly more.
To assess the relationship between hypoxemia and cardiac arrhythmia, we conducted logistic regression analyses using the counts of oxygen desaturation events as predictors. We also considered additional covariates such as BMI, Age, and AHI.
Include figure 9: Logistic Regression Model Summary for Hypoxemia and Cardiac Arrhythmia
The summary of our logistic regression model shows that oxygen desaturation events below 70% and 80% are significant predictors of cardiac arrhythmia, even after controlling for other covariates. This suggests that more severe episodes of hypoxemia are associated with a higher likelihood of arrhythmia.
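A logistic regression coefficient is easier to interpret once it is converted to an odds ratio: exponentiating the coefficient gives the multiplicative change in the odds of arrhythmia per additional desaturation event. The Python sketch below shows that conversion with a Wald-type confidence interval (illustrative only; the report's analysis used R). The function name is hypothetical, and any coefficient or standard error passed in would have to come from the fitted model, since the text does not quote them.

```python
from math import exp
from statistics import NormalDist


def odds_ratio(coefficient, std_error, confidence=0.95):
    """Return (odds ratio, lower bound, upper bound) for a logistic coefficient."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return (
        exp(coefficient),
        exp(coefficient - z * std_error),
        exp(coefficient + z * std_error),
    )
```

An odds ratio above 1 whose interval excludes 1 corresponds to a significant positive association at the chosen level.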
Include figure 10: ROC Curve for Hypoxemia and Cardiac Arrhythmia Prediction Model
The ROC curve for our final model indicates a good level of prediction accuracy, with an AUC of approximately 0.78. This is consistent with the finding of a significant association between severe hypoxemia and the occurrence of cardiac arrhythmia.
We performed 10-fold cross-validation (a 90-10 train-test split in each fold) on the Lasso, logistic regression, and KNN models to produce confusion matrices and accuracy measures for the final models; forward stepwise selection was performed on the whole dataset. The cross-validation results helped us check that our models were not overfitting and could generalize well to new data.
Each implementation was run in a computing environment with an Intel Core i3 CPU, 8 GB RAM, and the 64-bit Windows 10 operating system. The analysis was written in R, using libraries such as stringr, glmnet, caret, leaps, and ROCR.
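The cross-validation procedure can be sketched generically in Python (illustrative only; the report's analysis used R). Each observation is held out exactly once, the model is refit on the remaining nine folds, and the held-out predictions are pooled into one confusion matrix. The function names are hypothetical, and `fit` and `predict` are placeholders for whichever model is being evaluated.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["tn" if a == 0 else "fn"] += 1
    return counts


def accuracy(counts):
    """Share of correct predictions in a confusion matrix."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Pool held-out predictions from k folds into one confusion matrix."""
    total = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = [predict(model, X[i]) for i in fold]
        fold_counts = confusion_matrix([y[i] for i in fold], predictions)
        for key in total:
            total[key] += fold_counts[key]
    return total
```

Because the model is refit inside every fold, any step that looks at the outcome, such as variable selection, influences the estimate unless it is also repeated within each fold.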
Overall, our analysis provided valuable insights into the predictors of ESS and the relationship between hypoxemia and cardiac arrhythmia. While our logistic regression models showed good predictive power, there is room for further improvement, particularly in terms of reducing false positives and negatives.
Future research could focus on exploring additional variables or incorporating more advanced machine learning techniques to further improve predictive accuracy. Additionally, investigating other health outcomes related to sleep disorders could provide a more comprehensive understanding of the broader impacts of sleep-related health issues.