Project 2 Data 110

Author

Arinze Ugbah

Topic Question:

“Do a person’s weight and other health problems affect how severe their sleep apnea is, especially in Black males?”

Introduction

The Hypoxia MAP dataset serves as a vital clinical resource for investigating the link between chronic intermittent hypoxia (oxygen deprivation during sleep) and intraoperative blood pressure stability. This study specifically examines how nocturnal oxygen saturation and the Apnea Hypopnea Index (AHI) impact cardiovascular health in patients undergoing surgical procedures.

I chose this topic because I have a deep personal and professional connection to respiratory health. My previous career as a Respiratory Therapist gave me a front-row seat to the toll that untreated sleep disorders take on the body, especially in the African American community. On a personal level, I have witnessed the transformative power of treatment; my mother’s health significantly improved after she began using a CPAP (Continuous Positive Airway Pressure) machine to manage her own sleep apnea. This dataset allowed me to bridge my clinical experience and family history with data-driven insights into a critical issue I believe plagues the Black community.

Data Cleaning: I was fortunate to have a Data Dictionary accompanying my dataset, which allowed me to skip some steps in the cleaning process. The raw data (hypoxia.xlsx) was filtered to focus strictly on African American participants using Race = 1, as coded in the Data Dictionary and confirmed in the raw data. I mutated the original Female variable into a Male variable (1 = Male, 0 = Female) to streamline my categorical analysis and removed the redundant Female column. All variables were checked for missing values before the regression. The categorical health factors are coded as 0/1 indicators and are wrapped in factor() wherever a plot needs discrete groups.

  • Dataset Source: TSHS Resources Portal - Hypoxia MAP

  • Variables Utilized:

    • Quantitative (4): Age, BMI, Min SaO2 (Min nocturnal oxygen), AHI (Apnea Hypopnea Index - Outcome).

    • Categorical (4): Hyper (Hypertension), Male (Recoded from Female), Smoking, Diabetes.
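The missing-value check and factor conversion described above could look something like the sketch below. It uses a small made-up data frame (`toy`) rather than the real Hypoxia data, and base R only, so every value and the "No"/"Yes" labels are assumptions for illustration.

```r
# Toy data standing in for the real dataset (values are invented for illustration)
toy <- data.frame(
  AHI      = c(1, 3, NA, 4),
  BMI      = c(44.5, 40.6, 61.7, NA),
  Smoking  = c(0, 1, 0, 1),
  Diabetes = c(0, 1, 0, 0)
)

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Convert the 0/1 indicator columns to labelled factors
indicator_cols <- c("Smoking", "Diabetes")
toy[indicator_cols] <- lapply(toy[indicator_cols], factor,
                              levels = c(0, 1), labels = c("No", "Yes"))
str(toy)
```

Labelled factors make plot legends and model output easier to read than bare 0/1 codes.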

# Loading libraries needed for project.
library(readxl)  # To read the initial raw data.
library(dplyr)   # dplyr for cleaning

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2) # ggplot for plotting
library(shiny)   # for interactivity

# Loading and renaming the Excel file
readxl::read_excel("hypoxia.xlsx")
# A tibble: 281 × 36
     Age Female  Race   BMI Sleeptime `Min Sao2`   AHI Smoking Diabetes Hyper
   <dbl>  <dbl> <dbl> <dbl>     <dbl>      <dbl> <dbl>   <dbl>    <dbl> <dbl>
 1  29.9      1     2  44.5       0.9         90     1       0        0     0
 2  52.2      1     2  40.6       0           94     3       1        1     1
 3  37.3      1     1  61.7       1.9         76     1       0        0     0
 4  52.2      1     2  40.2      17           52     4       1        0     0
 5  26.7      1     2  57.6       0           95     3       0        0     1
 6  54.6      1     2  36        13           51     3       0        1     1
 7  54        1     2  39.2       0.5         86     2       1        1     1
 8  53.2      1     2  54.7      18.4         56     3       1        0     0
 9  47.4      1     1  43.9       3           85     1       0        1     1
10  57.4      0     2  47.2      33.9         68     3       0        1     1
# ℹ 271 more rows
# ℹ 26 more variables: CAD <dbl>, `Preop AntiHyper Med` <dbl>, CPAP <dbl>,
#   `Type Surg` <dbl>, `Duration of Surg` <dbl>, `Duration of Surg1` <dbl>,
#   `Duration of Surg2` <dbl>, `TWA MAP` <dbl>, `TWA MAP1` <dbl>,
#   `TWA MAP2` <dbl>, `TWA HR` <dbl>, `TWA HR1` <dbl>, `TWA HR2` <dbl>,
#   `Intraop AntiHyper Med` <dbl>, Vasopressor <dbl>, Ephedrine <dbl>,
#   `Ephedrine Amt` <dbl>, Epinephrine <dbl>, `Epinephrine Amt` <dbl>, …
Hypoxia<- read_excel("hypoxia.xlsx")

Cleaning and Exploration of Variables

Step Explanation: I start by performing the initial data transformation. I generate the Male variable and then explicitly drop the original Female column so that sex is stored only once in the working data. I include comments to track every modification made to the dataset.

## Generating the Male variable and dropping the original Female column
## Male = 1 when Female == 0 in the original coding; otherwise Male = 0

Hypoxia <- Hypoxia %>%
  dplyr::mutate(Male = ifelse(Female == 0, 1, 0)) %>%
  dplyr::select(-Female)
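Before dropping the Female column, it can help to confirm that the recode did what was intended. The sketch below is one possible check on a made-up `toy` data frame (not the real data), using the same `ifelse()` rule as above.

```r
# Toy Female column using the 1 = Female, 0 = Male coding described above
toy <- data.frame(Female = c(1, 0, 1, 0, 0))

# Same recode as in the report: Male is 1 exactly when Female is 0
toy$Male <- ifelse(toy$Female == 0, 1, 0)

# Cross-tabulate before dropping Female; the diagonal cells should be zero
check <- table(Female = toy$Female, Male = toy$Male)
print(check)

# Every row should satisfy Female + Male == 1
stopifnot(all(toy$Female + toy$Male == 1))
```

If any count lands on the diagonal of the table, the recode and the original coding disagree and should be reviewed before the column is removed.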

Variable Exploration

Step Explanation: Simple plots were used here to explore the distribution of my chosen variables. This helps identify the baseline characteristics of the groups, such as the prevalence of Smoking or the typical BMI, before running the more complex statistical models. Notice that the smoking distribution is almost evenly split; I return to this in the regression section.

## Exploration of Age  (Quantitative)
ggplot(Hypoxia, aes(x = Age)) + 
  geom_histogram(bins = 25, fill = "#2E86C1", color = "#154360", alpha = 0.8) + 
  theme_light() + 
  labs(title = "Age Distribution of Participants")

## Exploration of Male versus total pool (Categorical)
ggplot(Hypoxia, aes(x = factor(Male))) + 
  geom_bar(fill = "#1B4F72") + 
  labs(title = "Categorical Distribution: Male Variable (1=Yes)")

## Exploration of BMI (Quantitative)
ggplot(Hypoxia, aes(x = BMI)) + 
  geom_histogram(bins = 25, fill = "#2E86C1", color = "#154360", alpha = 0.8) + 
  theme_light() +
  labs(title = "Distribution of BMI", x = "BMI (kg/m²)", y = "Count")

## Exploration of Smoking (Categorical)
ggplot(Hypoxia, aes(x = factor(Smoking))) + 
  geom_bar(fill = "#A93226", color = "#641E16", width = 0.7) + 
  theme_light() +
  labs(title = "Distribution of Smoking Status", x = "Smoker (1=Yes, 0=No)")
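The plots can be paired with quick numeric summaries. As a rough sketch, the block below computes a mean BMI and the share of smokers from the first five rows shown in the raw-data printout; on the full dataset the same calls would be applied to `Hypoxia$BMI` and `Hypoxia$Smoking`.

```r
# First five rows of BMI and Smoking as shown in the printout of the raw data
toy <- data.frame(
  BMI     = c(44.5, 40.6, 61.7, 40.2, 57.6),
  Smoking = c(0, 1, 0, 1, 0)
)

# Average BMI for these rows
bmi_mean <- mean(toy$BMI)
print(bmi_mean)

# Proportion of non-smokers (0) and smokers (1)
smoke_share <- prop.table(table(toy$Smoking))
print(smoke_share)
```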

Dplyr Inclusion/Exclusion Criteria

Step Explanation: This step uses dplyr to refine the study population. I filter for African Americans (Race = 1) and keep only the 8 selected variables in the final data frame. I also remove records with missing AHI, BMI, or Male values to keep the regression model statistically valid.

# Step 7: Final Filtering and Selection
cleaned_data <- Hypoxia %>%
  filter(Race == 1) %>% 
  select(Age, BMI, `Min Sao2`, AHI, Hyper, Male, Smoking, Diabetes) %>%
  filter(!is.na(AHI), !is.na(BMI), !is.na(Male))
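To see how many records each inclusion/exclusion step removes, the row count can be tracked along the way. The following is a minimal base R sketch on an invented `toy` data frame; the real pipeline above uses dplyr on `Hypoxia`.

```r
# Invented rows for illustration only
toy <- data.frame(
  Race = c(1, 2, 1, 1, 2, 1),
  AHI  = c(1, 3, NA, 4, 2, 2),
  BMI  = c(61.7, 40.6, 43.9, NA, 36.0, 39.2),
  Male = c(0, 0, 0, 1, 1, 1)
)

n_start <- nrow(toy)

# Inclusion: keep Race == 1 only
race_ok <- toy[toy$Race == 1, ]
n_race <- nrow(race_ok)

# Exclusion: drop rows with missing AHI, BMI, or Male
complete <- race_ok[complete.cases(race_ok[, c("AHI", "BMI", "Male")]), ]
n_final <- nrow(complete)

cat("Start:", n_start, "| after Race filter:", n_race,
    "| after NA removal:", n_final, "\n")
```

Reporting these counts alongside the model makes it clear how large the analyzed sample is.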

Multiple Linear Regression

Step Explanation: I ran a multiple linear regression to see which health factors predict AHI. The fitted model uses three quantitative predictors (Age, BMI, Min SaO2) and three categorical predictors (Male, Smoking, Diabetes) to identify the strongest predictors.

The Model Equation:

AHI = [Intercept] + ([Age Coef] × Age) + ([BMI Coef] × BMI) + ([Min SaO2 Coef] × Min SaO2) + ([Male Coef] × Male) + ([Smoking Coef] × Smoking) + ([Diabetes Coef] × Diabetes)

Analysis of Results:
The regression analysis shows that two predictors are statistically significant at the 0.05 level. Min SaO2 has a negative coefficient (-0.040, p = 0.004): lower minimum nocturnal oxygen saturation is associated with higher AHI severity.

The Male variable highlights physiological differences: holding the other predictors constant, males show higher apnea severity than females (coefficient 0.671, p = 0.031).

BMI (p = 0.641) and Diabetes (p = 0.876) are not significant in this model, so within this sample the data do not support treating them as primary drivers of AHI once oxygen saturation and sex are accounted for. Smoking is a respiratory irritant that can theoretically increase inflammation in the airway, but its effect here is also not significant (p = 0.330), and neither is Age (p = 0.338).

Overall, the Adjusted R-squared of 0.3657 means the combined factors explain about 37% of the variation in AHI, and the model as a whole is significant (F = 6.285, p < 0.001). For the African American patients in this sample, minimum nocturnal oxygen saturation and male sex are the clearest predictors of sleep apnea severity.
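As a worked check on the Adjusted R-squared, it can be recomputed from the Multiple R-squared and the degrees of freedom reported by `summary(model_final)`. The sketch below uses the standard adjustment formula; the only inputs are the rounded figures from the regression summary, so the result is approximate.

```r
# Figures taken from the regression summary (rounded as printed)
r2       <- 0.4349  # Multiple R-squared
df_resid <- 49      # residual degrees of freedom
p        <- 6       # number of predictors in the model

# Sample size implied by the degrees of freedom: n = df_resid + p + 1
n <- df_resid + p + 1

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```

The gap between the two values reflects the penalty for fitting six predictors on a modest sample.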

# Step 8: Regression Analysis
model_final <- lm(AHI ~ Age + BMI + `Min Sao2` + Male + Smoking + Diabetes, data = cleaned_data)
summary(model_final)

Call:
lm(formula = AHI ~ Age + BMI + `Min Sao2` + Male + Smoking + 
    Diabetes, data = cleaned_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.05110 -0.54145  0.02033  0.51259  1.86221 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept)  5.755919   1.804941   3.189  0.00249 **
Age         -0.012757   0.013175  -0.968  0.33766   
BMI          0.007909   0.016864   0.469  0.64115   
`Min Sao2`  -0.040046   0.013075  -3.063  0.00356 **
Male         0.671187   0.302245   2.221  0.03103 * 
Smoking      0.309342   0.314147   0.985  0.32961   
Diabetes     0.044952   0.286757   0.157  0.87608   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9039 on 49 degrees of freedom
Multiple R-squared:  0.4349,    Adjusted R-squared:  0.3657 
F-statistic: 6.285 on 6 and 49 DF,  p-value: 5.982e-05
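To make the fitted equation concrete, the coefficients above can be applied to a single hypothetical patient. The patient values below (age 50, BMI 40, Min SaO2 of 80, male, non-smoker, non-diabetic) are invented for illustration; only the coefficients come from the summary output.

```r
# Coefficient estimates copied from the summary output above
coefs <- c(
  "(Intercept)" = 5.755919,
  Age           = -0.012757,
  BMI           = 0.007909,
  "Min Sao2"    = -0.040046,
  Male          = 0.671187,
  Smoking       = 0.309342,
  Diabetes      = 0.044952
)

# Hypothetical patient (assumed values, not a row from the dataset)
patient <- c(
  "(Intercept)" = 1, Age = 50, BMI = 40, "Min Sao2" = 80,
  Male = 1, Smoking = 0, Diabetes = 0
)

# Predicted AHI is the sum of coefficient * value over all terms
predicted_ahi <- sum(coefs * patient[names(coefs)])
print(round(predicted_ahi, 3))

# Effect of a 10-point drop in Min SaO2, holding everything else fixed
delta_sao2 <- coefs[["Min Sao2"]] * (-10)
print(round(delta_sao2, 3))
```

With the fitted model object available, `predict(model_final, newdata = ...)` would perform the same calculation without copying coefficients by hand.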

Shiny Interactive Visualization

Step Explanation: This final step provides an interactive Shiny dashboard. Users can filter the data and see in real time how the regression factors relate to AHI in the plot. I also added a regression line checkbox, smoker yes/no checkboxes, diabetic yes/no checkboxes, and a theme selection dropdown; the theme selection idea came to me as a sort of brightness setting.

Steps in creating the Shiny Interactive Model:

  1. Input Mapping (The Sidebar): I created a control panel using a mix of sliderInput, checkboxGroupInput, and selectInput. The filters map to regression predictors: the sliderInput controls for BMI and Age allow range filtering, while the checkboxGroupInput controls handle the categorical Smoking and Diabetes variables.

  2. Reactive Filtering: Inside the server logic, I filter the data with dplyr inside renderPlot, which is a reactive context. Every time a user moves a slider or clicks a box, the dataset is subset to match those criteria before the plot is redrawn.

  3. Layered Aesthetic Mapping: To visualize all variables at once, I mapped the remaining factors to specific ggplot2 aesthetics:

    • X/Y Axes: BMI vs. AHI (the primary relationship).

    • Color: Hypertension (Hyper).

    • Size: Oxygen levels (Min Sao2).

    • Shape: Sex (Male).

  4. Conditional Rendering: I implemented “if-statement” logic to handle the Regression Line and Theme Selection. This allows the geom_smooth layer and the theme functions to be added to the plot object only if the user has requested them through the UI.

  5. Cleaning up Clusters: Finally, geom_jitter was used instead of geom_point to prevent “overplotting,” so that the AHI levels remain visible even when multiple data points share the same coordinates.
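The reactive filtering in step 2 can be sketched as a plain function that is testable outside Shiny. The function name `filter_patients` and the `toy` data are hypothetical; the sketch assumes the checkbox selections arrive as character values such as "0" and "1", which is how `checkboxGroupInput` returns them.

```r
# Hypothetical helper mirroring the filter logic used in the server function
filter_patients <- function(df, bmi_range, age_range, smoke_check, diab_check) {
  keep <- df$BMI >= bmi_range[1] & df$BMI <= bmi_range[2] &
    df$Age >= age_range[1] & df$Age <= age_range[2] &
    df$Smoking %in% as.numeric(smoke_check) &   # checkbox values are strings
    df$Diabetes %in% as.numeric(diab_check)
  df[keep, , drop = FALSE]
}

# Invented rows for illustration only
toy <- data.frame(
  Age      = c(37.3, 47.4, 60.1, 25.0),
  BMI      = c(61.7, 43.9, 30.2, 22.5),
  Smoking  = c(0, 0, 1, 1),
  Diabetes = c(0, 1, 1, 0)
)

# BMI 20 to 45, any age in 18 to 90, both smoking groups, diabetics only
subset_df <- filter_patients(toy, c(20, 45), c(18, 90), c("0", "1"), "1")
print(subset_df)

# Unchecking every smoking box (NULL input) leaves no rows to plot
empty_df <- filter_patients(toy, c(20, 45), c(18, 90), NULL, "1")
print(nrow(empty_df))
```

Keeping the filter in its own function makes it easy to test edge cases, such as all boxes unchecked, before wiring it into `renderPlot`.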

ui <- fluidPage(
  titlePanel("Comprehensive Sleep Apnea Analysis: All Regression Factors"),
  sidebarLayout(
    sidebarPanel(
      # Quantitative Filters
      sliderInput("bmi_range", "1. BMI Range:", min = 15, max = 60, value = c(20, 45)),
      sliderInput("age_range", "2. Age Range:", min = 18, max = 90, value = c(18, 90)),
      
      # Categorical Filters
      checkboxGroupInput("smoke_check", "3. Smoking Status:", choices = list("Non-Smoker"=0, "Smoker"=1), selected = c(0,1)),
      checkboxGroupInput("diab_check", "4. Diabetes Status:", choices = list("No"=0, "Yes"=1), selected = c(0,1)),
      
      # Plot Options
      checkboxInput("show_line", "5. Show Regression Line (BMI vs AHI)", value = TRUE),
      selectInput("theme_choice", "6. Choose Theme:", choices = c("Minimal"="minimal", "Classic"="classic", "Dark"="dark"))
    ),
    mainPanel(
      plotOutput("ahiPlot"),
      helpText("Color = Hypertension | Size = Oxygen Saturation (Min Sao2) | Shape = Male Status")
    )
  )
)

server <- function(input, output) {
  output$ahiPlot <- renderPlot({
    plot_df <- cleaned_data %>% 
      filter(BMI >= input$bmi_range[1] & BMI <= input$bmi_range[2],
             Age >= input$age_range[1] & Age <= input$age_range[2],
             Smoking %in% input$smoke_check,
             Diabetes %in% input$diab_check)
    
    p <- ggplot(plot_df, aes(x = BMI, y = AHI, color = factor(Hyper), size = `Min Sao2`, shape = factor(Male))) +
      geom_jitter(alpha = 0.7, width = 0.15) +
      scale_color_manual(values = c("#27AE60", "#C0392B"), name = "Hypertension") +
      scale_shape_manual(values = c(1, 19), name = "Sex", labels = c("Female", "Male")) +
      labs(title = "AHI vs BMI: Impact of All Regression Factors", x = "BMI", y = "AHI Severity", caption = "Source: TSHS Hypoxia MAP")
    
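    # Draw one overall BMI-vs-AHI line; inherit.aes = FALSE keeps the color and shape mappings from splitting it into per-group lines
    if (input$show_line) p <- p + geom_smooth(aes(x = BMI, y = AHI), inherit.aes = FALSE, method = "lm", se = FALSE, color = "black", linetype = "dashed")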
    if (input$theme_choice == "minimal") p <- p + theme_minimal()
    if (input$theme_choice == "classic") p <- p + theme_classic()
    if (input$theme_choice == "dark")    p <- p + theme_dark()
    p
  })
}
shinyApp(ui, server)
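The BMI slider above is fixed at 15 to 60, but the printout of the raw data shows a Race = 1 participant with a BMI of 61.7, who could never be selected. One possible alternative is to derive the slider limits from the data. The helper `slider_bounds` below is hypothetical, and the BMI values are a few Race = 1 rows taken from the printout.

```r
# Hypothetical helper: round the observed range outward to whole numbers
slider_bounds <- function(x) {
  c(min = floor(min(x, na.rm = TRUE)), max = ceiling(max(x, na.rm = TRUE)))
}

# BMI values for two Race = 1 rows visible in the printout of the raw data
bmi_values <- c(61.7, 43.9)

bounds <- slider_bounds(bmi_values)
print(bounds)

# In the UI, these could replace the fixed limits, for example:
# sliderInput("bmi_range", "1. BMI Range:",
#             min = bounds[["min"]], max = bounds[["max"]],
#             value = unname(bounds))
```

On the real data the helper would be called with `cleaned_data$BMI`, so the slider always spans every participant in the filtered sample.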

