The dataset I used was created by Dr. Amy Nowacki, Associate Professor, Cleveland Clinic. It records surgical outcomes in relation to the time of day and other circumstantial conditions. The categorical variables include gender, race, American Society of Anesthesiologists (ASA) Physical Status, day of the week, month of the year, phase of the moon, in-hospital complication, and the presence of many different diseases. The numerical variables are age, complication rate, mortality rate, the statistical risks, and BMI.
I would like to explore how the time of the week or month affects in-hospital complications for cancer patients of various ages. I expect factors such as age to correlate with hospital complications because of the health problems that commonly come with aging. Much of this data is routinely collected in hospitals, but this dataset was drawn specifically from surgical patients at the Cleveland Clinic between January 2005 and September 2010.
I chose to work with this dataset because I plan to enter the medical field. Although I will not be a surgeon, this topic is still relevant because it investigates the complications that might come with healthcare worker fatigue and working conditions. Medical care (especially surgery) has little to no room for error, so investigating the causes of complications could greatly improve the American healthcare system.
Load In the Libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Rows: 32001 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): ahrq_ccs
dbl (24): age, gender, race, asa_status, bmi, baseline_cancer, baseline_cvd,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Create a filtered subset of data to use
summary(surgery_timing$age)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.00 48.20 58.60 57.66 68.30 90.00 2
Mutate the Age Ranges
I used mutate to create age ranges instead of using individual ages in my visualizations because the volume of data was overwhelming. The original ages were also hard to work with because they were recorded as decimals rather than whole numbers. Age ranges in steps of 10 are easier to plot and make it easier for the viewer to see connections.
surgery_timing_cancer <- surgery_timing |>
  # Keep cancer patients with complete values for the variables used below
  filter(baseline_cancer == 1) |>
  filter(!is.na(bmi) & !is.na(hour) & !is.na(asa_status) & !is.na(race) & !is.na(age)) |>
  # Bin ages by decade; each bin runs from its lower bound up to (not including) the next
  mutate(age_range = case_when(
    age < 20 ~ "1 to 20 Year Olds",
    age >= 20 & age < 30 ~ "20 Year Olds",
    age >= 30 & age < 40 ~ "30 Year Olds",
    age >= 40 & age < 50 ~ "40 Year Olds",
    age >= 50 & age < 60 ~ "50 Year Olds",
    age >= 60 & age < 70 ~ "60 Year Olds",
    age >= 70 & age < 80 ~ "70 Year Olds",  # covers the whole decade, 70 up to 80
    age >= 80 ~ "Over 80"
  )) |>
  filter(!is.na(age_range))
head(surgery_timing_cancer)
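Another way to build decade bins is base R's cut(), which makes it easy to confirm that every age lands in some bin. The sketch below uses a handful of made-up ages (an assumption for illustration, not rows from the dataset) and reuses the same labels as the case_when() call.

```r
# Hypothetical ages with one decimal place; not real patients.
ages <- c(1.0, 19.9, 20.0, 29.9, 45.3, 70.0, 75.2, 79.9, 80.0, 90.0)

# Left-closed bins: [0,20), [20,30), ..., [70,80), [80,Inf)
age_breaks <- c(0, seq(20, 80, by = 10), Inf)
age_labels <- c("1 to 20 Year Olds", "20 Year Olds", "30 Year Olds",
                "40 Year Olds", "50 Year Olds", "60 Year Olds",
                "70 Year Olds", "Over 80")

age_range <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)

# Count how many ages fall in each bin; an NA row would signal a gap in the bins.
table(age_range, useNA = "ifany")
```

Checking the table for an NA row is a quick way to catch a bin whose bounds accidentally leave out part of a decade.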
I replaced the numeric codes of the categorical variables with their word descriptions. It was challenging to keep track of what each number meant, so this made the graphs more understandable for me and the viewer. It also ensured that my categorical variables were treated as categories (such as a “yes or no” answer) rather than as a numeric scale.
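A minimal sketch of this kind of recoding is shown below. The toy data frame and the meaning of the codes (0 = absent, 1 = present) are assumptions for illustration; the dataset's own codebook should be checked before applying labels.

```r
# Toy data with hypothetical numeric codes; the real codebook may differ.
toy <- data.frame(
  baseline_diabetes = c(0, 1, 0, 0, 1),
  bmi = c(24.1, 33.0, 27.5, 22.8, 30.2)
)

# ASSUMPTION: 0 = no diabetes, 1 = diabetes.
toy$baseline_diabetes <- factor(
  toy$baseline_diabetes,
  levels = c(0, 1),
  labels = c("No Diabetes", "Diabetes")
)

# The column is now a factor, so models and plots treat it as categories.
str(toy$baseline_diabetes)
levels(toy$baseline_diabetes)
```

When a factor is used in glm(), its first level is the reference category, so the order of the levels determines which label appears in the coefficient table. A plain character column is converted with its levels in alphabetical order.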
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.01612 0.09748 0.10937 0.13977 0.19730 0.46613
Logistic Model
fit1 <- glm(
  complication ~ age + as.factor(gender) + race + asa_status + bmi +
    baseline_diabetes + as.factor(baseline_pulmonary) + hour + dow,
  data = surgery_timing_cancer,
  family = binomial()
)
summary(fit1)
Call:
glm(formula = complication ~ age + as.factor(gender) + race +
asa_status + bmi + baseline_diabetes + as.factor(baseline_pulmonary) +
hour + dow, family = binomial(), data = surgery_timing_cancer)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.5679757 0.3004321 -8.548 < 2e-16 ***
age 0.0035397 0.0027406 1.292 0.19650
as.factor(gender)1 0.1361787 0.0648446 2.100 0.03572 *
raceCaucasian -0.2776929 0.0992556 -2.798 0.00515 **
raceOther -0.2273877 0.1872009 -1.215 0.22449
asa_statusASA III 0.3556634 0.0697687 5.098 3.44e-07 ***
asa_statusASA IV & V 0.7488361 0.1575948 4.752 2.02e-06 ***
bmi 0.0079624 0.0048067 1.657 0.09762 .
baseline_diabetesNo Diabetes 0.2485521 0.1004695 2.474 0.01336 *
as.factor(baseline_pulmonary)1 -0.0241456 0.1062825 -0.227 0.82028
hour 0.0004462 0.0106881 0.042 0.96670
dow 0.0354087 0.0220798 1.604 0.10879
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6789.5 on 8146 degrees of freedom
Residual deviance: 6719.9 on 8135 degrees of freedom
(1 observation deleted due to missingness)
AIC: 6743.9
Number of Fisher Scoring iterations: 4
fit2 <- glm(
  complication ~ age + as.factor(gender) + race + asa_status + bmi + baseline_diabetes,
  data = surgery_timing_cancer,
  family = binomial()
)
summary(fit2)
Call:
glm(formula = complication ~ age + as.factor(gender) + race +
asa_status + bmi + baseline_diabetes, family = binomial(),
data = surgery_timing_cancer)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.456520 0.271578 -9.045 < 2e-16 ***
age 0.003558 0.002741 1.298 0.19421
as.factor(gender)1 0.131463 0.064561 2.036 0.04172 *
raceCaucasian -0.276680 0.099248 -2.788 0.00531 **
raceOther -0.224009 0.187039 -1.198 0.23105
asa_statusASA III 0.355280 0.069449 5.116 3.13e-07 ***
asa_statusASA IV & V 0.747620 0.156630 4.773 1.81e-06 ***
bmi 0.007929 0.004805 1.650 0.09889 .
baseline_diabetesNo Diabetes 0.245996 0.100400 2.450 0.01428 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6789.5 on 8146 degrees of freedom
Residual deviance: 6722.5 on 8138 degrees of freedom
(1 observation deleted due to missingness)
AIC: 6740.5
Number of Fisher Scoring iterations: 4
Analysis
I initially included many of the dataset’s variables, then fit a second logistic model narrowed down to the more significant ones. Removing hour, day of week, and baseline pulmonary disease lowered the AIC from 6743.9 to 6740.5, which indicates a slightly better trade-off between fit and complexity. This reduced model gives a clearer picture of the data. The most significant variable by far was the patient’s ASA status, which had the lowest p-values in the model. Race, gender, and baseline diabetes are also statistically significant, which is why I chose to include those factors in my visualizations and explore them further.
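The comparison between a full and a reduced logistic model can be sketched as follows. The data here are simulated (the variable names asa_high, full, and reduced are hypothetical and the effect sizes are made up), so the printed numbers will not match the output above; only the last line uses figures from the two summaries, namely the residual deviances 6722.5 and 6719.9 and the 3 dropped terms.

```r
set.seed(42)
n <- 2000

# Simulated predictors; 'hour' has no effect on the outcome by construction.
sim <- data.frame(
  asa_high = rbinom(n, 1, 0.3),  # hypothetical indicator for a high ASA class
  age = runif(n, 20, 90),
  hour = runif(n, 7, 20)
)
linear_predictor <- -2.5 + 0.8 * sim$asa_high + 0.005 * sim$age
sim$complication <- rbinom(n, 1, plogis(linear_predictor))

full <- glm(complication ~ asa_high + age + hour, data = sim, family = binomial())
reduced <- glm(complication ~ asa_high + age, data = sim, family = binomial())

# Lower AIC is preferred; dropping a noise variable usually lowers it slightly.
AIC(full, reduced)

# Likelihood-ratio test: does the extra term improve the fit significantly?
anova(reduced, full, test = "Chisq")

# Odds ratios with Wald confidence intervals are easier to read than log-odds.
exp(cbind(odds_ratio = coef(reduced), confint.default(reduced)))

# The same likelihood-ratio idea applied to the deviances reported above:
# a drop of 6722.5 - 6719.9 = 2.6 on 3 degrees of freedom.
pchisq(6722.5 - 6719.9, df = 3, lower.tail = FALSE)
```

A large p-value from the last line would be consistent with the three removed variables adding little to the model.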
Background Research
The article cited below asks whether the ASA classification alone can predict potential surgical complications. The American Society of Anesthesiologists Physical Status scale classifies a level I patient as healthy with minimal risk, level II as a patient with mild disease and minimal risk, level III as a patient with severe disease that does not incapacitate their functions, level IV as a patient with a severe, life-threatening disease, and level V as a patient who is not expected to survive without the operation. Using the 2005 through 2016 NSQIP Participant Use Data Files, the authors found that a patient’s ASA status is consistently effective in predicting complication risk. However, they note that “other factors beyond medical covariates may influence medical complication and mortality after ambulatory surgery. For example, since patients must provide self-care after ambulatory surgery, it is possible that poor health literacy may be associated with adverse outcomes”. Many other factors may also play a role in a patient’s complications, such as surgeon error, the resources available at the hospital, or unknown allergies and reactions to medical procedures.
Citation:
Foley, Colin, et al. “American Society of Anesthesiologists Physical Status Classification as a Reliable Predictor of Postoperative Medical Complications and Mortality Following Ambulatory Surgery: An Analysis of 2,089,830 ACS-NSQIP Outpatient Cases.” BMC Surgery, vol. 21, no. 1, 2021, pp. 253–57, https://doi.org/10.1186/s12893-021-01256-6.
Summary of Cancer Patients BMI Across Variables
summary(surgery_timing_cancer$bmi)
Min. 1st Qu. Median Mean 3rd Qu. Max.
9.92 24.61 27.93 29.04 32.07 75.56
Plot One
plot <- ggplot(data = surgery_timing_cancer, aes(x = hour, y = bmi, color = asa_status)) +
  geom_point() +
  xlim(7, 20) +
  theme_bw() +
  scale_color_brewer(palette = "Paired") +
  labs(
    title = "Cancer Patient BMI Each Hour of the Day Distinguished by ASA Status",
    caption = "Source: TSHS",
    x = "Hour of the Day",
    y = "BMI",
    color = "ASA Status"
  )
plot
Warning: Removed 149 rows containing missing values or values outside the scale range
(`geom_point()`).
This plot shows the types of patients who receive surgery as the day goes on, comparing cancer patients’ ASA status to their BMI. It is interesting to see patients labeled as “lower risk” with a concerningly high BMI. I set the x-axis limits to 7 through 20 because that is the time frame in which elective surgeries appear to be scheduled in the dataset. The plot therefore shows not only the times of day at which many elective surgeries take place, but also how BMI relates to a patient’s ASA status. It was unexpected to find that many cancer patients classified as ASA level I also have a BMI in the morbidly obese range. In contrast, there are patients at ASA IV and above with a very low BMI, because disease can have many causes other than weight. Low- to moderate-risk patients also make up a much larger share of the data and visually crowd out the most severe group, but they fall across all levels of BMI.
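One way to check the claim about low-risk patients with very high BMI is to cross-tabulate ASA status against standard BMI categories. The sketch below uses the common WHO cutoffs (BMI of 40 or more is obesity class III, often called morbid obesity) on a few invented patients; the toy rows and the "ASA I & II" label are assumptions, not values taken from the dataset.

```r
# Hypothetical patients for illustration only; not rows from the dataset.
toy <- data.frame(
  asa_status = c("ASA I & II", "ASA I & II", "ASA III", "ASA IV & V", "ASA I & II"),
  bmi = c(41.2, 23.5, 31.0, 17.8, 27.9)
)

# WHO BMI categories, using left-closed bins such as [18.5, 25).
toy$bmi_class <- cut(
  toy$bmi,
  breaks = c(0, 18.5, 25, 30, 35, 40, Inf),
  labels = c("Underweight", "Normal", "Overweight",
             "Obese class I", "Obese class II", "Obese class III"),
  right = FALSE
)

# Rows are ASA groups, columns are BMI categories, cells are patient counts.
bmi_by_asa <- table(toy$asa_status, toy$bmi_class)
bmi_by_asa
```

Applied to the real data, the count in the lowest ASA row and the "Obese class III" column would put a number on what the scatterplot suggests visually.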
Plot Two
library(ggalluvial)
plot2 <- surgery_timing_cancer |>
  ggplot(aes(axis1 = asa_status, axis2 = baseline_diabetes, y = bmi)) +
  geom_alluvium(aes(fill = baseline_diabetes)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_brewer(palette = "Set2") +
  theme_void() +
  labs(
    title = "ASA Status of Diabetic or Non-diabetic Cancer Patients",
    x = "Volume of Population",
    y = "ASA Status",
    fill = "Diabetes",
    caption = "Source: TSHS"
  )
plot2
This plot compares cancer patients’ ASA status with the presence or absence of diabetes. The stratum visual was useful for showing how diabetes overlaps with (but is not bound by) ASA status. In this graph, a majority of ASA I and II patients (mild to moderate disease) do not have diabetes, while a larger portion of ASA III patients do. It might be expected that more patients at ASA IV and above would have diabetes, but this is a smaller population, possibly because patients in a life-threatening condition are more likely to be given emergency surgery.
Plot Three
plot3 <- surgery_timing_cancer |>
  ggplot(aes(x = age_range, y = ccsComplicationRate, fill = race)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_brewer(palette = "Accent") +
  labs(
    title = "Cancer Patient Surgical Complication Rate by Age and Race",
    caption = "Source: TSHS",
    x = "Age",
    fill = "Race",  # legend title must match the fill aesthetic
    y = "Complication Rate"
  )
plot3
This plot shows the predicted complication rate of cancer patients categorized by age and race. Demographics play a role in every aspect of our lives, so I broke the rates down by age and race, using “stat = identity” so that the y-axis shows a variable from the data rather than a count. There were very few patients who were not Caucasian, which may be related to the privilege it takes to receive medical care, especially elective procedures with no urgent emergency. In addition, in every age group except “1 to 20 Year Olds”, African American patients have higher complication rates than Caucasian patients or patients of other races.
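Because geom_bar(stat = "identity") stacks one segment per row, bar heights reflect the sum of the rates over all patients in a group, so larger groups produce taller bars. A hedged way to compare groups directly is to average the rate within each age and race group first. The sketch below does this with dplyr on a few invented rows; the toy values are assumptions, not results from the dataset.

```r
library(dplyr)

# Hypothetical rows for illustration only; not values from the dataset.
toy <- tibble(
  age_range = c("50 Year Olds", "50 Year Olds", "50 Year Olds",
                "60 Year Olds", "60 Year Olds"),
  race = c("Caucasian", "Caucasian", "African American",
           "Caucasian", "African American"),
  ccsComplicationRate = c(0.10, 0.20, 0.12, 0.10, 0.30)
)

# One row per age group and race: group size and mean complication rate.
rate_summary <- toy |>
  group_by(age_range, race) |>
  summarise(
    n_patients = n(),
    mean_rate = mean(ccsComplicationRate),
    .groups = "drop"
  )

rate_summary
```

Plotting mean_rate with geom_col(position = "dodge") would then place the groups side by side, and n_patients shows how many patients each average is based on.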
Conclusion
Overall, this dataset tells a large story about demographic disparities in access to healthcare, the impact a patient’s ASA status can have on their likelihood of complications, and the impact diabetes and many other diseases can have on a person’s overall health risks. Among the points that affected me most, it was incredibly surprising to see the many relatively healthy (ASA I) patients with high BMI or diabetes. A patient must meet certain criteria to be classified at a given ASA status, so regardless of how reliable the BMI scale is in general, a severely obese patient is at greater risk of health issues later in life. It was also shocking to see the consistently high complication rates of African American patients compared to other races. The dark history of racism in the medical field still shapes medical practices and beliefs today, leading to a dangerous pattern in which Black Americans either avoid medical care or are mistreated when they do seek it. There are many more relationships between variables in this dataset that I would like to explore in the future. If I had more time, I would have liked to include other diseases, such as dementia, pulmonary disease, or cardiovascular/cerebrovascular disease, and the rates of surgical complications associated with them.