“Does a person’s weight, along with other health problems, affect how severe their sleep apnea is, especially in Black males?”
Introduction
The Hypoxia MAP dataset is a vital clinical resource for investigating the link between chronic intermittent hypoxia (oxygen deprivation during sleep) and intraoperative blood pressure stability. This study specifically examines how nocturnal oxygen saturation and the Apnea-Hypopnea Index (AHI) affect cardiovascular health in patients undergoing surgical procedures.
I chose this topic because I have a deep personal and professional connection to respiratory health. My previous career as a Respiratory Therapist gave me a front-row seat to the toll that untreated sleep disorders take on the body, especially in the African American community. On a personal level, I have witnessed the transformative power of treatment: my mother’s health improved significantly after she began using a CPAP (Continuous Positive Airway Pressure) machine to manage her own sleep apnea. This dataset allowed me to bridge my clinical experience and family history with data-driven insights into a critical issue I believe plagues the Black community.
Data Cleaning: I was blessed to have a Data Dictionary accompanying my dataset, which allowed me to skip some steps in the aggregation process. The raw data (Hypoxia.xlsx) was filtered to focus strictly on African American participants (Race = 1, as coded in the data dictionary and confirmed in the raw data). I converted the original Female variable into a Male variable (1 = Male, 0 = Female) to streamline my categorical analysis and removed the redundant Female column. All variables were checked for missing values and formatted as either numeric or factor to ensure statistical accuracy in my regression, and all health factors I used were converted to factors for modeling.
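A binary health factor can be stored as a 0/1 number or converted with factor(). Either way, lm() gives it the same estimate, and only the coefficient label changes. The minimal sketch below compares the two codings on a small, made-up data frame (toy, not the Hypoxia data). The regression output later in this report labels its terms Male, Smoking, and Diabetes. That is the label lm() uses for numeric 0/1 columns; a factor column would instead appear as Male1, Smoking1, and so on.

```r
# Hypothetical toy data (made-up values, NOT from the Hypoxia dataset)
toy <- data.frame(
  AHI  = c(1.1, 2.3, 0.9, 2.8, 1.6, 2.0, 1.2, 2.9),
  BMI  = c(28, 33, 25, 39, 30, 35, 27, 41),
  Male = c(0, 1, 0, 1, 0, 1, 1, 0)
)

# Male entered as a numeric 0/1 indicator
fit_numeric <- lm(AHI ~ BMI + Male, data = toy)

# Male entered as a factor; level "0" (female) becomes the reference level
fit_factor <- lm(AHI ~ BMI + factor(Male), data = toy)

names(coef(fit_numeric))  # "(Intercept)" "BMI" "Male"
names(coef(fit_factor))   # "(Intercept)" "BMI" "factor(Male)1"

# The sex coefficient is the same under both codings
all.equal(unname(coef(fit_numeric)["Male"]),
          unname(coef(fit_factor)["factor(Male)1"]))
```

For variables with more than two levels the choice does matter: a factor gets one coefficient per non-reference level, while a numeric code is treated as a single linear trend.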
Step Explanation: I start by performing the initial data transformation. I generate the Male variable and then explicitly drop the original Female column. Sex is then recorded in a single variable, and the redundant column no longer remains in the working environment. I include comments to track every modification made to the dataset.

```r
library(dplyr)

## Generate the Male variable and drop the original Female column
## Male = 1 where the original Female variable is coded 0
Hypoxia <- Hypoxia %>%
  dplyr::mutate(Male = ifelse(Female == 0, 1, 0)) %>%
  dplyr::select(-Female)
```
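A quick way to confirm the recoding is to cross-tabulate the old and new columns before Female is dropped. The sketch below applies the same ifelse() rule to a hypothetical toy column (made-up values). It checks that every non-missing row is assigned exactly one sex. A missing Female value stays missing in Male rather than being assigned a sex.

```r
# Hypothetical toy values standing in for the original Female column
# (1 = female, 0 = male); NA marks a missing entry
toy <- data.frame(Female = c(1, 0, 0, 1, NA, 0))

# Same rule as the mutate() call above: Male = 1 where Female == 0
toy$Male <- ifelse(toy$Female == 0, 1, 0)

# Cross-tabulate old and new coding; each sex should land in exactly one cell
table(Female = toy$Female, Male = toy$Male, useNA = "ifany")

# For every non-missing row, Male and Female must sum to 1
ok <- !is.na(toy$Female)
stopifnot(all(toy$Male[ok] + toy$Female[ok] == 1))
```

Because Female is coded 0/1, `1 - Female` produces the same Male column as the ifelse() call.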
Variable Exploration
Step Explanation: Simple plots were used here to explore the distributions of my chosen variables. This helps identify the baseline characteristics of our groups, such as the prevalence of smoking or the average BMI, before running the more complex statistical models. Notice that the smoking distribution is almost evenly split; this is discussed in the regression part of this presentation.

```r
library(ggplot2)

## Exploration of Age (Quantitative)
ggplot(Hypoxia, aes(x = Age)) +
  geom_histogram(bins = 25, fill = "#2E86C1", color = "#154360", alpha = 0.8) +
  theme_light() +
  labs(title = "Age Distribution of Participants")

## Exploration of Male versus total pool (Categorical)
ggplot(Hypoxia, aes(x = factor(Male))) +
  geom_bar(fill = "#1B4F72") +
  labs(title = "Categorical Distribution: Male Variable (1=Yes)")

## Exploration of BMI (Quantitative)
ggplot(Hypoxia, aes(x = BMI)) +
  geom_histogram(bins = 25, fill = "#2E86C1", color = "#154360", alpha = 0.8) +
  theme_light() +
  labs(title = "Distribution of BMI", x = "BMI (kg/m²)", y = "Count")

## Exploration of Smoking (Categorical)
ggplot(Hypoxia, aes(x = factor(Smoking))) +
  geom_bar(fill = "#A93226", color = "#641E16", width = 0.7) +
  theme_light() +
  labs(title = "Distribution of Smoking Status", x = "Smoker (1=Yes, 0=No)")
```
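The plots give a visual sense of these baseline characteristics, and the matching numbers can be computed directly. Below is a minimal base-R sketch on made-up toy values (not the study data). The mean of a 0/1 indicator such as Smoking is the proportion of participants coded 1, which is the prevalence the bar chart shows.

```r
# Hypothetical toy values (made up, NOT from the Hypoxia dataset)
toy <- data.frame(
  BMI     = c(31.2, 27.5, 44.0, 36.8, NA, 29.9),
  Smoking = c(1, 0, 1, 0, 1, 0)
)

# Average BMI, ignoring missing values
mean(toy$BMI, na.rm = TRUE)

# Prevalence of smoking: the mean of a 0/1 indicator is the share coded 1
mean(toy$Smoking)

# Counts per category, matching the bar heights in the smoking plot
table(Smoking = toy$Smoking)

# Numeric counterpart of the BMI histogram (quartiles, mean, NA count)
summary(toy$BMI)
```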
dplyr Inclusion/Exclusion Criteria
Step Explanation: This step uses dplyr to refine the study population. I filter for African American participants (Race = 1) and ensure that the final data frame contains only the eight variables selected for analysis. I also remove records missing AHI or BMI, so every participant in the model has both the outcome and the main predictor of interest.
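The regression below is fit to a data frame named cleaned_data. Here is a minimal base-R sketch of how this filtering step could produce it. It assumes the eight retained variables are the outcome AHI, the six model predictors, and Hyper, which the Shiny app also uses. The rows are made-up toy values. In dplyr, the same step would be `filter(Race == 1, !is.na(AHI), !is.na(BMI))` followed by `select()` on the same eight columns.

```r
# Hypothetical toy rows standing in for the raw Hypoxia data (values are made up);
# column names match those used by the model and the Shiny app
raw <- data.frame(
  Race       = c(1, 1, 2, 1, 1),
  AHI        = c(1.2, NA, 0.8, 2.1, 1.5),
  Age        = c(54, 61, 47, 70, 38),
  BMI        = c(31.5, 28.0, 35.2, NA, 42.7),
  "Min Sao2" = c(85, 90, 88, 79, 82),
  Male       = c(1, 0, 1, 1, 0),
  Smoking    = c(0, 1, 0, 1, 1),
  Diabetes   = c(1, 0, 0, 1, 0),
  Hyper      = c(1, 1, 0, 1, 0),
  check.names = FALSE  # keep the space in "Min Sao2"
)

# The eight analysis variables (assumed): four quantitative, four categorical
keep_vars <- c("AHI", "Age", "BMI", "Min Sao2",
               "Male", "Smoking", "Diabetes", "Hyper")

# Inclusion: African American participants (Race == 1)
# Exclusion: rows missing the outcome (AHI) or BMI
# which() drops rows where the condition is NA, as dplyr::filter() does
keep_rows <- which(raw$Race == 1 & !is.na(raw$AHI) & !is.na(raw$BMI))
cleaned_data <- raw[keep_rows, keep_vars]

nrow(cleaned_data)
```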
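Step Explanation: I ran a linear regression to see which health factors predict AHI. The model regresses AHI on six predictors to identify the strongest ones: three quantitative (Age, BMI, and Min Sao2) and three binary (Male, Smoking, and Diabetes). Hypertension (Hyper) is kept in the cleaned data for the Shiny plot but is not a model term.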
Analysis of Results:
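The regression output tells a more nuanced story than the study question anticipated. Only two predictors are statistically significant at the 0.05 level: minimum oxygen saturation (Min Sao2, estimate -0.040, p = 0.0036) and Male (estimate 0.671, p = 0.031). The negative Min Sao2 coefficient means that lower nighttime oxygen saturation is associated with higher AHI when the other factors are held constant. BMI (p = 0.64) and Diabetes (p = 0.88) are not statistically significant. This model therefore does not show that weight or diabetes independently predicts AHI once oxygen saturation and sex are accounted for.
The Male coefficient points to physiological differences between the sexes: holding the other predictors constant (including BMI), males have an AHI about 0.67 units higher than females on average.
Smoking is a respiratory irritant that can increase airway inflammation, but its estimate in this model (0.309) is not statistically significant (p = 0.33). The same is true of Age (p = 0.34).
Overall, the model explains about 43% of the variation in AHI (Multiple R-squared = 0.4349; Adjusted R-squared = 0.3657 after accounting for the six predictors). The overall F-test is significant (F = 6.285 on 6 and 49 DF, p < 0.001). For African American patients in this dataset, the clearest signals are low oxygen saturation and male sex rather than BMI or diabetes. Because this is an observational regression on 56 participants, these results describe associations, not causes. Smaller effects, such as BMI's, may simply be hard to detect in a sample this size.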
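The fit statistics printed in the regression output below are linked by standard formulas, so recomputing them is a quick consistency check. The output reports 49 residual degrees of freedom and p = 6 predictors, so the sample size is n = 49 + 6 + 1 = 56.

$$
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - (1 - 0.4349)\cdot\frac{55}{49} \approx 0.3657
$$

$$
F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)} = \frac{0.4349/6}{0.5651/49} \approx 6.285
$$

The adjusted value is lower than the multiple R-squared because it penalizes each added predictor.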
```r
# Step 8: Regression Analysis
model_final <- lm(AHI ~ Age + BMI + `Min Sao2` + Male + Smoking + Diabetes,
                  data = cleaned_data)
summary(model_final)
```

```
Call:
lm(formula = AHI ~ Age + BMI + `Min Sao2` + Male + Smoking +
    Diabetes, data = cleaned_data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.05110 -0.54145  0.02033  0.51259  1.86221

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.755919   1.804941   3.189  0.00249 **
Age         -0.012757   0.013175  -0.968  0.33766
BMI          0.007909   0.016864   0.469  0.64115
`Min Sao2`  -0.040046   0.013075  -3.063  0.00356 **
Male         0.671187   0.302245   2.221  0.03103 *
Smoking      0.309342   0.314147   0.985  0.32961
Diabetes     0.044952   0.286757   0.157  0.87608
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9039 on 49 degrees of freedom
Multiple R-squared:  0.4349,  Adjusted R-squared:  0.3657
F-statistic: 6.285 on 6 and 49 DF,  p-value: 5.982e-05
```
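Each column of the coefficient table follows from the Estimate and Std. Error columns and the 49 residual degrees of freedom. The base-R sketch below recomputes the t values, two-sided p-values, and 95% confidence intervals for three terms from the printed numbers. A 95% interval that excludes zero corresponds to p < 0.05.

```r
# Estimates and standard errors copied from the summary(model_final) output above
est <- c("BMI" = 0.007909, "Min Sao2" = -0.040046, "Male" = 0.671187)
se  <- c("BMI" = 0.016864, "Min Sao2" = 0.013075, "Male" = 0.302245)
df_resid <- 49  # residual degrees of freedom reported by summary()

# t value = Estimate / Std. Error
t_val <- est / se

# Two-sided p-value from the t distribution with 49 df
p_val <- 2 * pt(-abs(t_val), df = df_resid)

# 95% confidence interval: Estimate +/- t critical value * Std. Error
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

round(cbind(t_val, p_val, ci), 4)
```

With the fitted model available, `confint(model_final)` returns the same kind of interval for every term directly.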
Shiny Interactive Visualization
Step Explanation: This final step provides an interactive Shiny dashboard that lets users filter the data and see in real time how the regression factors relate to AHI. I also added a regression-line checkbox, smoker and diabetic yes/no checkboxes, and a theme dropdown. The theme idea came to me as a sort of brightness setting.
Steps in creating the interactive Shiny model:
Input Mapping (The Sidebar): I created a control panel using a mix of sliderInput, checkboxGroupInput, checkboxInput, and selectInput. The sliders and checkbox groups filter on four of the regression variables. The sliderInput controls allow range filtering on BMI and Age, while the checkboxGroupInput controls handle the categorical Smoking and Diabetes variables. The checkboxInput toggles the regression line, and the selectInput picks the plot theme.
Reactive Filtering: Inside the server logic, renderPlot() subsets the data with a dplyr filter() call. Because renderPlot() is reactive, the dataset is re-filtered every time a user moves a slider or clicks a box, and the plot is redrawn from the new subset.
Layered Aesthetic Mapping: To show the remaining variables alongside BMI and AHI, I mapped them to specific ggplot2 aesthetics:
X/Y Axes: BMI vs. AHI (the primary relationship).
Color: Hypertension (Hyper).
Size: Oxygen levels (Min Sao2).
Shape: Sex (Male).
Conditional Rendering: I used if statements to handle the regression line and theme selection. The geom_smooth layer and the chosen theme function are added to the plot object only when the user requests them through the UI.
Cleaning up Clusters: Finally, I used geom_jitter instead of geom_point to reduce “overplotting,” so AHI values remain visible even when several participants share the same coordinates.

```r
library(shiny)
library(dplyr)
library(ggplot2)

ui <- fluidPage(
  titlePanel("Comprehensive Sleep Apnea Analysis: All Regression Factors"),
  sidebarLayout(
    sidebarPanel(
      # Quantitative filters
      sliderInput("bmi_range", "1. BMI Range:", min = 15, max = 60, value = c(20, 45)),
      sliderInput("age_range", "2. Age Range:", min = 18, max = 90, value = c(18, 90)),
      # Categorical filters (ticked values reach the server as "0"/"1" strings)
      checkboxGroupInput("smoke_check", "3. Smoking Status:",
                         choices = list("Non-Smoker" = 0, "Smoker" = 1), selected = c(0, 1)),
      checkboxGroupInput("diab_check", "4. Diabetes Status:",
                         choices = list("No" = 0, "Yes" = 1), selected = c(0, 1)),
      # Plot options
      checkboxInput("show_line", "5. Show Regression Line (BMI vs AHI)", value = TRUE),
      selectInput("theme_choice", "6. Choose Theme:",
                  choices = c("Minimal" = "minimal", "Classic" = "classic", "Dark" = "dark"))
    ),
    mainPanel(
      plotOutput("ahiPlot"),
      helpText("Color = Hypertension | Size = Oxygen Saturation (Min Sao2) | Shape = Male Status")
    )
  )
)

server <- function(input, output) {
  output$ahiPlot <- renderPlot({
    # Wait until at least one box is ticked in each checkbox group
    req(input$smoke_check, input$diab_check)

    plot_df <- cleaned_data %>%
      filter(BMI >= input$bmi_range[1] & BMI <= input$bmi_range[2],
             Age >= input$age_range[1] & Age <= input$age_range[2],
             Smoking %in% input$smoke_check,
             Diabetes %in% input$diab_check) %>%
      # Labelled factors keep colors and shapes matched to the right group
      # even when the filters leave only one level in the data
      mutate(Hypertension = factor(Hyper, levels = c(0, 1), labels = c("No", "Yes")),
             Sex = factor(Male, levels = c(0, 1), labels = c("Female", "Male")))

    validate(need(nrow(plot_df) > 0, "No participants match the current filters."))

    p <- ggplot(plot_df, aes(x = BMI, y = AHI, color = Hypertension,
                             size = `Min Sao2`, shape = Sex)) +
      # Jitter horizontally only, so the plotted AHI values are not shifted
      geom_jitter(alpha = 0.7, width = 0.15, height = 0) +
      scale_color_manual(values = c(No = "#27AE60", Yes = "#C0392B"), name = "Hypertension") +
      scale_shape_manual(values = c(Female = 1, Male = 19), name = "Sex") +
      labs(title = "AHI vs BMI: Impact of All Regression Factors",
           x = "BMI", y = "AHI Severity", caption = "Source: TSHS Hypoxia MAP")

    # Optional single BMI-vs-AHI line; inherit.aes = FALSE keeps the line from
    # being split by the discrete color/shape groups
    if (input$show_line) {
      p <- p + geom_smooth(aes(x = BMI, y = AHI), inherit.aes = FALSE,
                           method = "lm", formula = y ~ x, se = FALSE,
                           color = "black", linetype = "dashed")
    }

    # Apply the selected theme
    if (input$theme_choice == "minimal") p <- p + theme_minimal()
    if (input$theme_choice == "classic") p <- p + theme_classic()
    if (input$theme_choice == "dark")    p <- p + theme_dark()

    p
  })
}

shinyApp(ui, server)
```
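The subsetting rule inside renderPlot() depends only on the input values, so it can be pulled out into an ordinary function and checked without launching the app. In the sketch below, filter_participants() is a hypothetical helper (not part of the original code), and the rows are made-up toy values. checkboxGroupInput() delivers the ticked values as character strings such as "1", and %in% matches them against the numeric 0/1 columns after coercing both sides to character. If no box is ticked, the input is NULL and the filter returns zero rows, as the second call shows.

```r
# Hypothetical helper (not in the original app): the same subsetting rule used
# inside renderPlot(), written as a plain function of the input values
filter_participants <- function(df, bmi_range, age_range, smoke_choices, diab_choices) {
  keep <- df$BMI >= bmi_range[1] & df$BMI <= bmi_range[2] &
    df$Age >= age_range[1] & df$Age <= age_range[2] &
    df$Smoking %in% smoke_choices &   # e.g. 1 %in% "1" is TRUE after coercion
    df$Diabetes %in% diab_choices
  df[which(keep), , drop = FALSE]
}

# Hypothetical toy participants (values are made up)
toy <- data.frame(
  BMI      = c(22, 31, 38, 50),
  Age      = c(30, 45, 60, 75),
  Smoking  = c(0, 1, 1, 0),
  Diabetes = c(0, 0, 1, 1)
)

# App defaults for BMI and Age, but only "Smoker" ticked
smokers <- filter_participants(toy, bmi_range = c(20, 45), age_range = c(18, 90),
                               smoke_choices = "1", diab_choices = c("0", "1"))

# With no smoking box ticked the input is NULL, so no rows survive
none <- filter_participants(toy, c(20, 45), c(18, 90), NULL, c("0", "1"))
nrow(none)
```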