Colon cancer is a type of cancer that begins in the large intestine (colon), the final part of the digestive tract. It is the second leading cause of cancer-related deaths, which is why early detection and treatment are so important. While colon cancer typically affects older adults, it can happen at any age. Rates of colon cancer in younger people have been increasing recently, and doctors do not yet know why. Fortunately, many treatments are available, including surgery. However, treatment may require patients to stop eating and drinking for a limited time, which can affect their body mass index (BMI). According to a study in the World Journal of Surgical Oncology (McKay et al., 2014), young age was an independent predictor of better survival from colorectal cancer, while male sex was associated with poorer survival. The role of genetics in colorectal cancer (CRC) has become critical to disease prevention, early detection, and effective treatment: over the last century, CRC genetics has grown from an unrecognized area into a specialized field that encompasses all aspects of cancer care. Everyone should keep these risks in mind, especially people with a family history of the disease.
This "urgent colon surgery" dataset is large: it records preoperative risk factors, intraoperative variables, and 30-day postoperative mortality and morbidity outcomes for patients undergoing major surgical procedures in both inpatient and outpatient settings. Because it contains so many variables, it supports many different study questions. That is one of the things I really like about this dataset: it is very detailed and can be explored through many factors, for example the patient's age, the time of the procedure, race, type of physician, weight, height, and much more.
Source of data: The Participant Use Data File (PUF) is a Health Insurance Portability and Accountability Act (HIPAA)-compliant data file containing cases submitted to the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP). A certified Surgical Clinical Reviewer (SCR) captures these data using a variety of methods, including medical chart abstraction. To prevent bias in choosing cases for assessment, a systematic sampling process was developed. An important tool in this process is the 8-Day Cycle Schedule, which ensures that, over time, cases from each day of the week have an equal chance of being selected.
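As a rough illustration of why an 8-day cycle spreads selection across the week, the sketch below assumes that each sampling cycle is eight consecutive days and that the first cycle begins on a Monday. Both are assumptions for illustration, and the rules for choosing cases within a cycle are not modeled. Because 8 leaves a remainder of 1 when divided by 7, each new cycle starts one weekday later than the previous one, so over seven cycles every weekday takes a turn as the start day.

```r
# Hypothetical illustration: 8-day sampling cycles, first cycle starting on a Monday
weekdays_vec <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
cycle_length <- 8

# Day offset (counted from the first Monday) on which each of 7 cycles begins
cycle_starts <- (0:6) * cycle_length

# Because 8 %% 7 == 1, each new cycle begins one weekday later than the last
start_days <- weekdays_vec[(cycle_starts %% 7) + 1]
print(start_days)
```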
Since there are 314 variables in this data set, I will describe the ones that I intend to use:
SEX - categorical variable
RACE_NEW - categorical variable
PRNCPTX - categorical variable - the principal operative procedure (CPT description); renamed infection in the code below
Homeyes - categorical variable - discharged home or not
AGE - quantitative variable
Expiredyes - categorical variable - survival after surgery (alive or dead)
BMI - quantitative variable
How does gender affect survival after colon cancer surgery? Do men have a greater chance of poor survival?
Can age predict survival? Do younger patients have a better chance of survival?
What are the most common procedures patients undergo, and which procedures account for the most deaths?
Is there a correlation between age and BMI?
What proportion of patients are discharged home after surgery rather than to another care unit?
To answer these questions, I computed summary statistics and plotted survival counts by age group, gender, and procedure. I used chi-squared tests to check for associations between these variables, a scatterplot with a regression line to examine the relationship between age and BMI, and a confidence interval for the proportion of patients discharged home after surgery. Finally, I fit a logistic regression of survival on age, sex, and BMI.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(psych)
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(RColorBrewer)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(BSDA)
## Loading required package: lattice
##
## Attaching package: 'BSDA'
##
## The following object is masked from 'package:datasets':
##
## Orange
surgery_data <- read_csv("urgent_colon surgery.csv")
## New names:
## • `otheryes` -> `otheryes...8`
## • `Homeyes` -> `Homeyes...15`
## • `facilityyes` -> `facilityyes...16`
## • `otheryes` -> `otheryes...20`
## • `Homeyes` -> `Homeyes...35`
## • `facilityyes` -> `facilityyes...37`
## • `otheryes` -> `otheryes...38`
## • `` -> `...252`
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 43607 Columns: 314
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (69): SEX, RACE_NEW, ETHNICITY_HISPANIC, PRNCPTX, INOUT, TRANST, AGE, D...
## dbl (213): PUFYEAR, CASEID, femaleyes, whiteyes, blackyes, otheryes...8, CPT...
## lgl (32): OTHERPROC10, OTHERCPT10, OTHERWRVU10, CONCURR7, CONCPT7, CONWRVU7...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
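# summary(surgery_data)
# Recode the 0/1 indicators for discharge home and death as labelled factors
surgery <- mutate(surgery_data,
                  Homeyes...15 = as.factor(case_when(Homeyes...15 == 0 ~ "not home",
                                                     Homeyes...15 == 1 ~ "at home")),
                  Expiredyes = as.factor(case_when(Expiredyes == 1 ~ "dead",
                                                   Expiredyes == 0 ~ "alive")))
# AGE is read as text; strip the "+" from top-coded ages (e.g. "90+") before converting.
# fixed = TRUE treats "+" as a literal character rather than a regex quantifier.
surgery$AGE <- gsub("+", "", surgery$AGE, fixed = TRUE)
surgery$AGE <- as.numeric(surgery$AGE)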
## Warning: NAs introduced by coercion
summary(surgery$AGE)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 18.00 54.00 65.00 63.22 75.00 90.00 1333
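The gsub() call above strips the plus sign from top-coded ages before conversion. A minimal sketch on a hypothetical vector, assuming that ages of 90 and over are stored as the text "90+" (suggested by the 90+ indicator column in the data), shows the conversion. Using fixed = TRUE matches "+" literally, so the result does not depend on how the regular-expression engine treats a lone "+".

```r
# Hypothetical values mimicking the AGE column, with "90+" for ages 90 and over
age_raw <- c("45", "90+", "67", "18")

# Remove the literal "+" and convert to numbers
age_num <- as.numeric(gsub("+", "", age_raw, fixed = TRUE))
print(age_num)
```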
surgery2 <- rename(surgery, survival = Expiredyes, infection = PRNCPTX)
removena_surgery <- surgery2 %>%
filter(!is.na(survival) & !is.na(AGE) & !is.na(BMI)) # remove rows with missing survival, age, or BMI
head(removena_surgery) # view the updated data
## # A tibble: 6 × 314
## PUFYEAR CASEID SEX femal…¹ RACE_…² white…³ black…⁴ other…⁵ ETHNI…⁶ infec…⁷
## <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 2018 8634492 male 0 White 1 0 0 N COLECT…
## 2 2017 6834792 female 1 White 1 0 0 Y COLECT…
## 3 2018 8328996 female 1 Black … 0 1 0 N COLECT…
## 4 2016 5055443 female 1 White 1 0 0 N COLECT…
## 5 2018 8694016 male 0 White 1 0 0 N COLECT…
## 6 2018 9030828 female 1 White 1 0 0 N COLECT…
## # … with 304 more variables: CPT <dbl>, WORKRVU <dbl>, INOUT <chr>,
## # TRANST <chr>, Homeyes...15 <fct>, facilityyes...16 <dbl>,
## # transferyes <dbl>, transferacuteyes <dbl>, transferEDyes <dbl>,
## # otheryes...20 <dbl>, AGE <dbl>, `65+` <dbl>, `18-29` <dbl>, `30-39` <dbl>,
## # `40-49` <dbl>, `50-59` <dbl>, `60-69` <dbl>, `70-79` <dbl>, `80-89` <dbl>,
## # `90+` <dbl>, Decade <dbl>, ADMYR <dbl>, OPERYR <dbl>, DISCHDEST <chr>,
## # Homeyes...35 <dbl>, survival <fct>, facilityyes...37 <dbl>, …
surgery_Age <- removena_surgery %>%
  mutate(AGEGroup = ifelse(AGE < 10.5, "00-10 Years", # ages 10 and under
                    ifelse(AGE < 20.5, "10-20 Years", # ages 11-20
                    ifelse(AGE < 30.5, "20-30 Years", # ages 21-30
                    ifelse(AGE < 40.5, "30-40 Years", # ages 31-40
                    ifelse(AGE < 50.5, "40-50 Years", # ages 41-50
                    ifelse(AGE < 60.5, "50-60 Years", # ages 51-60
                    ifelse(AGE < 70.5, "60-70 Years", "70+ Years")))))))) # ages 61-70, then 71 and over
unique(surgery_Age$AGEGroup)
## [1] "30-40 Years" "40-50 Years" "60-70 Years" "50-60 Years" "70+ Years"
## [6] "20-30 Years" "10-20 Years"
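The nested ifelse() calls above can also be written with base R's cut(). The sketch below uses hypothetical ages chosen to sit on either side of each cut point; right = FALSE makes each interval closed on the left and open on the right, which reproduces the AGE < x.5 tests exactly.

```r
# Hypothetical ages on either side of each cut point
ages <- c(18, 20, 21, 30, 31, 45, 60, 61, 70, 71, 90)

# right = FALSE gives intervals [a, b), matching the `AGE < x.5` tests
age_group <- cut(
  ages,
  breaks = c(-Inf, 10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, Inf),
  labels = c("00-10 Years", "10-20 Years", "20-30 Years", "30-40 Years",
             "40-50 Years", "50-60 Years", "60-70 Years", "70+ Years"),
  right = FALSE
)
print(table(age_group))
```

In the pipeline this would be mutate(AGEGroup = cut(AGE, ...)). One difference is that cut() returns a factor, so the groups keep their age order, and the empty "00-10 Years" level still appears in tables unless it is removed with droplevels().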
by_death <- removena_surgery %>%
  group_by(survival, AGE, SEX, BMI) %>% # one group per unique survival/age/sex/BMI combination
  summarize(count = n(),
            bmi = mean(BMI)) # BMI is a grouping variable, so this equals each group's BMI
## `summarise()` has grouped output by 'survival', 'AGE', 'SEX'. You can override
## using the `.groups` argument.
unique(by_death$survival)
## [1] alive dead
## Levels: alive dead
plot1 <- surgery_Age %>%
  ggplot(aes(x = AGEGroup, fill = survival)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = c('#00BFC4', '#F8766D')) +
  ggtitle("Age Groups and Survival Counts") +
  xlab("Age Groups") + ylab("Number of Patients") +
  theme(axis.text.x = element_text(angle = 45)) # tilt the x-axis labels
plot1
Most patients in every age group survived surgery, but the number of deaths climbs steadily with age.
table(surgery_Age$survival, surgery_Age$AGEGroup)
##
## 10-20 Years 20-30 Years 30-40 Years 40-50 Years 50-60 Years 60-70 Years
## alive 226 1259 2144 4013 7433 9146
## dead 3 24 52 140 405 846
##
## 70+ Years
## alive 12401
## dead 1762
chisq.test(table(surgery_Age$survival, surgery_Age$AGEGroup))
##
## Pearson's Chi-squared test
##
## data: table(surgery_Age$survival, surgery_Age$AGEGroup)
## X-squared = 752.83, df = 6, p-value < 2.2e-16
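The p-value is reported as < 2.2e-16, the smallest value R prints, which is far below the significance level of 0.05. We therefore reject the null hypothesis that survival is independent of age group and conclude that survival is associated with age. A p-value this small shows that the association is very unlikely to be due to chance, but it does not by itself measure how strong the association is.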
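To gauge how large the age effect is, a short sketch using the counts from the table above computes the death rate in each age group and Cramér's V, a common effect size for contingency tables. Cramér's V is an addition for illustration, not part of the original analysis.

```r
# Counts copied from the table(survival, AGEGroup) output above
age_tab <- rbind(
  alive = c(226, 1259, 2144, 4013, 7433, 9146, 12401),
  dead  = c(3, 24, 52, 140, 405, 846, 1762)
)
colnames(age_tab) <- c("10-20", "20-30", "30-40", "40-50",
                       "50-60", "60-70", "70+")

# Proportion of patients in each age group who died
death_rate <- age_tab["dead", ] / colSums(age_tab)
print(round(death_rate, 3))

# Pearson chi-squared test on the same table (no continuity correction
# is applied to tables larger than 2 x 2)
age_test <- chisq.test(age_tab)

# Cramer's V: sqrt(X^2 / (n * (min(rows, cols) - 1)))
n <- sum(age_tab)
cramers_v <- sqrt(unname(age_test$statistic) / (n * (min(dim(age_tab)) - 1)))
print(round(cramers_v, 3))
```

With these counts the death rate rises in every successive age group, from about 1.3% to about 12.4%, while Cramér's V is roughly 0.14, a small effect by common rules of thumb despite the tiny p-value.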
unique(removena_surgery$infection)
## [1] "COLECTOMY PRTL W/SKIN LEVEL CECOST/COLOSTOMY"
## [2] "COLECTOMY PARTIAL W/ANASTOMOSIS"
## [3] "COLECTOMY PRTL W/END COLOSTOMY & CLSR DSTL SGMT"
## [4] "COLECTOMY PRTL W/RMVL TERMINAL ILEUM & ILEOCOLOS"
## [5] "RPR UMBILICAL HERNIA AGE 5 YRS/> INCARCERATED"
## [6] "COLECTOMY PRTL W/COLOST/ILEOST & MUCOFISTULA"
## [7] "COLCT TOT ABDL W/O PRCTECT W/ILEOST/ILEOPXTS"
## [8] "COLECTOMY PRTL W/COLOPROCTOSTOMY"
## [9] "RPR 1ST INCAL/VNT HERNIA INCARCERATED"
## [10] "COLECTOMY PRTL W/COLOPROCTOSTOMY & COLOSTOMY"
## [11] "COLECTOMY PRTL ABDOMINAL & TRANSANAL APPROACH"
## [12] "COLECTOMY TOT ABDL W/PROCTECTOMY W/ILEOSTOMY"
## [13] "REPAIR FIRST ABDOMINAL WALL HERNIA"
## [14] "GSTRCT PRTL DSTL W/GASTROJEJUNOSTOMY"
## [15] "ENTRC RESCJ SMALL INTESTINE 1 RESCJ & ANAST"
## [16] "ENTEROLSS FRING INTSTINAL ADHESION SPX"
## [17] "ABLTJ OPN 1/> LVR TUM RF"
## [18] "EXPLORATORY LAPAROTOMY CELIOTOMY W/WO BIOPSY SPX"
## [19] "EXC/DESTRUCTION OPEN ABDOMINAL TUMORS >10.0 CM"
## [20] "APPENDEC RPTD APPENDIX ABSC/PRITONITIS"
## [21] "HEPATECTOMY RESCJ PARTIAL LOBECTOMY"
## [22] "APPENDECTOMY"
## [23] "EXC/DESTRUCTION OPEN ABDMNL TUMORS 5.1-10.0 CM"
## [24] "PNCRTECT DSTL NR-TOT W/PRSRV DUO CHLD-TYP PX"
## [25] "RPR RECRT INCAL/VNT HERNIA INCARCERATED"
## [26] "RPR 1ST INGUN HRNA AGE 5 YRS/> INCARCERATED"
## [27] "CLSR ENTEROVES FSTL W/INTESTINE&/BLADDER RESCJ"
## [28] "LAPAROSCOPIC APPENDECTOMY"
## [29] "RPR PARAESOPH HIATAL HERNIA W/LAPT W/O MESH"
## [30] "RESECJ/DBRDMT PANCREAS NECROTIZING PANCREATITIS"
## [31] "REVJ COLOSTOMY COMP RCNSTJ IN-DEPTH SPX"
## [32] "CLSR ENTEROENTERIC/ENTEROCOLIC FSTL"
## [33] "URETEROCOLON CONDUIT INTESTINE ANASTOMOSIS"
## [34] "PNCRTECT DSTL STOT W/O PNCRTCOJEJUNOSTOMY"
## [35] "DBRDMT SKN SUBQ T/M/F NECRO INFCTJ GENT/ABDL"
## [36] "CHOLECYSTECTOMY W/CHOLANGIOGRAPHY"
## [37] "CHOLECYSTECTOMY"
## [38] "EXCISION/DESTRUCTION OPEN ABDOMINAL TUMOR 5 CM/<"
## [39] "DRAINAGE PERITON ABSCESS/LOCAL PERITONITIS OPEN"
## [40] "PNCRTECT WHIPPLE W/O PANCREATOJEJUNOSTOMY"
## [41] "CLSR NTRSTM LG/SM RESCJ & COLORECTAL ANASTOMOSIS"
## [42] "MOBLJ SPLENIC FLXR PFRMD CONJUNCT W/PRTL COLCT"
## [43] "CLOSURE GASTROCOLIC FISTULA"
## [44] "LAPAROSCOPY COLECTOMY PARTIAL W/ANASTOMOSIS"
## [45] "INCISION AND DRAINAGE APPENDICEAL ABSCESS OPEN"
## [46] "GASTROJEJUNOSTOMY W/O VAGOTOMY"
## [47] "IMPLANT MESH OPN HERNIA RPR/DEBRIDEMENT CLOSURE"
## [48] "EMBLC/THRMBC RNL CELIAC MESENTRY AORTO-ILIAC ART"
## [49] "REVJ GSTR/JJ ANAST W/RCNSTJ W/O VGTMY"
## [50] "CLOSURE INTESTINAL CUTANEOUS FISTULA"
## [51] "GSTRCT PRTL DSTL W/GASTRODUODENOSTOMY"
## [52] "REVJ COLOSTOMY W/RPR PARACLST HERNIA SPX"
## [53] "ENTERECTOMY RESCJ SMALL INTESTINE W/ENTEROSTOMY"
## [54] "GASTRIC RSTCV W/PRTL GASTRECTOMY 50-100 CM"
## [55] "SPLENECTOMY TOTAL SEPARATE PROCEDURE"
## [56] "PRCTECT COMPL CMBN ABDOMINOPRNL W/CLST"
## [57] "RPR 1ST INGUN HRNA AGE 5 YRS/> REDUCIBLE"
## [58] "LAPS COLECTOMY PRTL W/COLOPXTSTMY LW ANAST"
## [59] "ENTEROENTEROST ANAST INT W/WO CUTAN NTRSTM SPX"
## [60] "ENTERORRHAPHY 1PERFORATION"
## [61] "RPR PARAESOPH HIATAL HERNIA W/THORCOM W/O MESH"
## [62] "LAPAROSCOPY SURG CHOLECYSTECTOMY"
## [63] "COLECTOMY TOT ABD W/PROCTECTOMY ILEOANAL ANAST"
## [64] "EXPL PO HEMRRG THROMBOSIS/INFCTJ ABD"
## [65] "EXC LOCAL MALIGNANT TUMOR STOMACH"
## [66] "MUSC MYOCUTANEOUS/FASCIOCUTANEOUS FLAP TRUNK"
## [67] "PRCTECT PRTL RESCJ RECTUM TABDL APPR"
## [68] "RPR BLOOD VESSEL DIRECT INTRA-ABDOMINAL"
## [69] "COLOSTOMY/SKIN LEVEL CECOSTOMY"
## [70] "CLSR NTRSTM LG/SM RESCJ & ANAST OTH/THN CLRCT"
## [71] "PNCRTECT PROX STOT W/PANCREATOJEJUNOSTOMY"
## [72] "RPR UMBILICAL HRNA 5 YRS/> REDUCIBLE"
## [73] "OOPHORECTOMY PARTIAL/TOTAL UNI/BI"
## [74] "RDCTJ VOLVULUS INTUSSUSCEPTION INT HRNA LAPT"
## [75] "NEPHRECTOMY W/PRTL URETERECTOMY W/OPEN RIB RESCJ"
## [76] "EXC 1/> SMALL/LARGE LESIONS INTESTINE ENTEROTOM"
## [77] "HEPATECTOMY RESCJ TOTAL LEFT LOBECTOMY"
## [78] "DEBRIDEMENT MUSCLE & FASCIA 20 SQ CM/<"
## [79] "UNLISTED LAPAROSCOPIC PX ABD PERTONEUM & OMENTUM"
## [80] "PNCRTECT W/PANCREATOJEJUNOSTOMY"
## [81] "GASTRORRHAPHY SUTR PRF8 DUOL/GSTR ULCER WND/INJ"
## [82] "UNLISTED PROCEDURE STOMACH"
## [83] "ENTERORRHAPHY MULTIPLE PERFORATIONS"
## [84] "I&D SOFT TISSUE ABSCESS SUBFASC"
## [85] "PRCTECT CMBN PULL-THRU W/RSVR W/NTRSTM"
## [86] "LAPS COLECTMY PRTL W/COLOPXTSTMY LW ANAST W/CLST"
## [87] "HEPATECTOMY RESCJ TRISEGMENTECTOMY"
## [88] "LAPS COLECTOMY PRTL W/RMVL TERMINAL ILEUM"
## [89] "LAPS MOBLJ SPLENIC FLXR PFRMD W/PRTL COLECTOMY"
## [90] "CLSR ENTEROVES FSTL W/O INTSTINAL/BLADDER RESCJ"
## [91] "LAPS COLECTOMY PRTL W/END CLST & CLSR DSTL SGM"
## [92] "ENTERECTOMY RESCJ SMALL INTESTINE EA RESCJ & ANA"
## [93] "ILEOSTOMY/JEJUNOSTOMY NON-TUBE"
## [94] "UNLISTED PROCEDURE PANCREAS"
## [95] "RPR LG OMPHALOCELE/GASTROSCHISIS W/WO PROSTH"
## [96] "OMENTAL FLAP INTRA-ABDOMINAL"
## [97] "HEPATECTOMY RESCJ TOTAL RIGHT LOBECTOMY"
## [98] "RPR DIPHRG HRNA OTH/THN NEONATAL TRAUMTC AQT"
## [99] "GSTRCT TOT W/ROUX-EN-Y RCNSTJ"
## [100] "LAPAROSCOPY SURG RPR INITIAL INGUINAL HERNIA"
## [101] "GSTRCT PRTL DSTL W/ROUX-EN-Y RCNSTJ"
## [102] "REMOVAL TUNNELED INTRAPERITONEAL CATHETER"
## [103] "UNLISTED LAPAROSCOPY PROCEDURE APPENDIX"
## [104] "CLSR RECTOVAGINAL FISTULA ABDOMINAL APPROACH"
## [105] "CORRJ MALROTATION BANDS&/RDCTJ VOLVULUS"
## [106] "REOPENING RECENT LAPAROTOMY"
## [107] "COLOTOMY EXPLORATION/BIOPSY/FOREIGN BODY REMOVAL"
## [108] "URINARY UNIDIVERSION"
## [109] "PANCREATICOJEJUNOSTOMY SIDE-TO-SIDE ANAST"
## [110] "RPR RECRT INGUN HERNIA ANY AGE INCARCERATED"
## [111] "RPR RECRT INCAL/VNT HERNIA REDUCIBLE"
## [112] "PELVIC EXENTERATION COLORECTAL MALIGNANCY"
## [113] "PROCTOPEXY ABDOMINAL APPROACH"
## [114] "CSTC COMPL W/URTROILEAL CONDUIT/BLDR W/INT ANAST"
## [115] "PRTL ESOPHECT DSTL W/WO PROX GASTRECT/PYLORPLSTY"
## [116] "PNCRTECT DSTL STOT W/PNCRTCOJEJUNOSTOMY"
## [117] "CLOSURE RECTOVESICAL FISTULA"
## [118] "LAPS SURG W/ASPIR CAVITY/CYST SINGLE/MULTIPLE"
## [119] "UNLISTED PROCEDURE ABDOMEN PERITONEUM & OMENTUM"
## [120] "REPAIR LACERATION DIAPHRAGM ANY APPROACH"
## [121] "RESCJ PRIM PRTL MAL W/BSO & OMNTC RAD DEBULKING"
## [122] "LAPAROSCOPY ENTEROLYSIS SEPARATE PROCEDURE"
## [123] "DRAINAGE OF RETROPERITONEAL ABSCESS OPEN"
## [124] "LAPS RPR PARAESPHGL HRNA INCL FUNDPLSTY W/O MESH"
## [125] "TOTAL ABDOMINAL HYSTERECT W/WO RMVL TUBE OVARY"
## [126] "RPR 1ST FEM HERNIA ANY AGE INCARCERATED"
## [127] "DEBRIDEMENT SUBCUTANEOUS TISSUE 20 SQ CM/<"
## [128] "ANORECTAL MYOMECTOMY"
## [129] "CLOSURE ENTEROSTOMY LG/SMALL INTESTINE"
## [130] "LAPS ENTERECT RESCJ 1 SMALL INTEST RESCJ & ANA"
## [131] "DIR RPR ANEURYSM ABDOMINAL AORTA"
## [132] "SUTR LG INTESTINE 1/MULT PERFORAT W/COLOSTOMY"
## [133] "BYP OTH/THN VEIN AORTOCELIAC AORTOMSN AORTORNL"
## [134] "RESECTION RECRT MAL W/OMENTECTOMY PEL LMPHADEC"
## [135] "CSTC COMPL W/CONDUIT/SIGMOID BLDR PEL LMPHADEC"
## [136] "SPLENC TOT EN BLOC EXTNSV DS CONJUNCT W/OTH PX"
## [137] "HEPATOTOMY OPEN DRAINAGE ABSCESS/CYST 1/2 STAGES"
## [138] "TOT ESOPHG W/THORCOM W/COLON NTRPSTJ/INT RCNSTJ"
## [139] "GSTRCT TOT W/ESOPHAGOENTEROSTOMY"
## [140] "CYSTECTOMY PARTIAL COMPLICATED"
## [141] "BYPASS GRAFT W/OTHER THAN VEIN ILIO-MESENTERIC"
## [142] "REVJ GASTRODUOL ANAST W/RCNSTJ W/O VAGOTOMY"
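# Note: survival = "dead" inside group_by() creates a new constant column
# rather than filtering, so count is the total number of patients per
# procedure, not the number of deaths. Counting deaths alone would need
# filter(survival == "dead") before group_by().
infections <- removena_surgery %>%
  group_by(infection, survival = "dead") %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  head(10)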
## `summarise()` has grouped output by 'infection'. You can override using the
## `.groups` argument.
infections
## # A tibble: 10 × 3
## # Groups: infection [10]
## infection survival count
## <chr> <chr> <int>
## 1 COLECTOMY PARTIAL W/ANASTOMOSIS dead 10275
## 2 COLECTOMY PRTL W/END COLOSTOMY & CLSR DSTL SGMT dead 9946
## 3 COLECTOMY PRTL W/RMVL TERMINAL ILEUM & ILEOCOLOS dead 7759
## 4 COLCT TOT ABDL W/O PRCTECT W/ILEOST/ILEOPXTS dead 2608
## 5 COLECTOMY PRTL W/SKIN LEVEL CECOST/COLOSTOMY dead 2519
## 6 COLECTOMY PRTL W/COLOPROCTOSTOMY dead 2383
## 7 COLECTOMY PRTL W/COLOST/ILEOST & MUCOFISTULA dead 2034
## 8 COLECTOMY PRTL W/COLOPROCTOSTOMY & COLOSTOMY dead 860
## 9 COLECTOMY TOT ABDL W/PROCTECTOMY W/ILEOSTOMY dead 331
## 10 ENTRC RESCJ SMALL INTESTINE 1 RESCJ & ANAST dead 143
ggplot(data = infections, mapping = aes(x = reorder(infection, count), y = count, col = infection)) +
geom_point() +
ggtitle("Patient Count by Principal Procedure (Top 10)")+
labs(x=NULL, y="Patient Count")+
scale_x_discrete( guide = "none")+
theme_bw()
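Three procedures stand out in this plot: COLECTOMY PRTL W/END COLOSTOMY & CLSR DSTL SGMT, COLECTOMY PARTIAL W/ANASTOMOSIS, and COLECTOMY PRTL W/RMVL TERMINAL ILEUM & ILEOCOLOS. Because survival = "dead" inside group_by() labels every row as dead instead of filtering, the count here is the total number of patients per procedure, so these were the most common surgeries overall rather than the ones with the worst survival. The tables below count deaths separately for women and men.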
infections_2 <- removena_surgery %>%
group_by (infection) %>%
filter(SEX == "female", survival == "dead") %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
head(3)
infections_2
## # A tibble: 3 × 2
## infection count
## <chr> <int>
## 1 COLECTOMY PRTL W/END COLOSTOMY & CLSR DSTL SGMT 412
## 2 COLECTOMY PARTIAL W/ANASTOMOSIS 392
## 3 COLECTOMY PRTL W/RMVL TERMINAL ILEUM & ILEOCOLOS 258
infections_3 <- removena_surgery %>%
group_by (infection) %>%
filter(SEX == "male", survival == "dead") %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
head(3)
infections_3
## # A tibble: 3 × 2
## infection count
## <chr> <int>
## 1 COLECTOMY PARTIAL W/ANASTOMOSIS 389
## 2 COLECTOMY PRTL W/END COLOSTOMY & CLSR DSTL SGMT 306
## 3 COLECTOMY PRTL W/RMVL TERMINAL ILEUM & ILEOCOLOS 275
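Raw death counts partly reflect how often each procedure was performed. A minimal sketch divides deaths by the number of patients for the three leading procedures. It assumes that the count values in infections are patient totals (the survival = "dead" argument inside group_by() adds a constant column rather than filtering) and that female and male are the only SEX values, as the sex-by-survival table suggests, so the two death tables together cover every death. The overall rate uses the 3,232 deaths among 39,854 patients in the age-group table.

```r
# Short labels for the three most common procedures
procedure <- c("Partial colectomy w/ anastomosis",
               "Partial colectomy w/ end colostomy",
               "Partial colectomy w/ removal of terminal ileum")
total  <- c(10275, 9946, 7759)                  # patients, from `infections`
deaths <- c(392 + 389, 412 + 306, 258 + 275)    # female + male deaths

proc_death_rate <- deaths / total
overall_rate <- 3232 / 39854   # all deaths / all patients in the age table

print(data.frame(procedure, total, deaths, rate = round(proc_death_rate, 3)))
print(round(overall_rate, 3))
```

Each of the three procedures has a death rate of roughly 7%, slightly below the overall rate of about 8%, so their high death counts appear to come from how common they are rather than from unusually high risk.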
surgery_hist <- surgery_Age%>%
group_by(SEX, survival, AGEGroup) %>%
# filter( survival == "dead") %>%
summarise(count = n())
## `summarise()` has grouped output by 'SEX', 'survival'. You can override using
## the `.groups` argument.
surgery_hist
## # A tibble: 28 × 4
## # Groups: SEX, survival [4]
## SEX survival AGEGroup count
## <chr> <fct> <chr> <int>
## 1 female alive 10-20 Years 83
## 2 female alive 20-30 Years 516
## 3 female alive 30-40 Years 953
## 4 female alive 40-50 Years 1827
## 5 female alive 50-60 Years 3660
## 6 female alive 60-70 Years 4912
## 7 female alive 70+ Years 7254
## 8 female dead 10-20 Years 1
## 9 female dead 20-30 Years 13
## 10 female dead 30-40 Years 25
## # … with 18 more rows
plot2 <- ggplot(surgery_hist, aes(x=AGEGroup, y=count, fill=survival))+
geom_bar(stat = "identity") + xlab("Age Groups") +
ylab("Count") +
ggtitle("Survival Counts according to ages and gender") +
scale_fill_manual(values = c("dark blue", "red")) +
theme_minimal(base_size = 10) + # theme for the graph
theme(plot.title = element_text(hjust = 0.5)) +
facet_grid(~SEX) # facet grid to see two bar plots according to each gender
ggplotly(plot2)
According to this plot, more women than men died after surgery, although there were also more women than men in the data. Even though most people survived colon cancer surgery, we should not overlook the patients who did not. Their count looks small next to the survivors on the bar graph, but it is still almost 1,000 for women in the 70+ age group.
H0 - Death after surgery is independent of gender; gender does not affect deaths after colon cancer surgery.
HA - Death after surgery is associated with gender; gender does affect deaths after colon cancer surgery.
table(removena_surgery$survival, removena_surgery$SEX) %>% prop.table()
##
## female male
## alive 0.48188388 0.43702012
## dead 0.04248005 0.03861595
chisq.test(table(removena_surgery$survival, removena_surgery$SEX))
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(removena_surgery$survival, removena_surgery$SEX)
## X-squared = 0.00209, df = 1, p-value = 0.9635
The p-value (0.9635) is well above the significance level of 0.05, so we fail to reject the null hypothesis: there is not enough evidence to conclude that gender and death are associated.
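Death counts alone can mislead when the groups differ in size. The sketch below rebuilds the 2 x 2 table by multiplying the proportions printed by prop.table() by the 39,854 patients in the age-group table. This is a reconstruction, assumed to match the original counts because it reproduces the printed X-squared value. It then compares the share of each sex that died.

```r
# Counts reconstructed from the prop.table() proportions times n = 39,854
sex_tab <- matrix(c(19205, 1693, 17417, 1539), nrow = 2,
                  dimnames = list(survival = c("alive", "dead"),
                                  sex = c("female", "male")))

# Share of each sex that died after surgery
death_rate_sex <- sex_tab["dead", ] / colSums(sex_tab)
print(round(death_rate_sex, 4))

# Same test as above: a 2 x 2 table, so Yates' correction is applied by default
sex_test <- chisq.test(sex_tab)
print(sex_test)
```

Both sexes have a death rate of about 8.1%, which is consistent with the large p-value.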
x <- by_death$AGE
y <- by_death$bmi
plot(x, y, main="Scatterplot of Age Vs bmi", xlab="Age ", ylab="bmi ", pch=19)
abline(lm(y~x), col="red") # regression line (y~x)
res <- cor(by_death$AGE, by_death$bmi)
res
## [1] -0.06037877
fit_lm <- lm(AGE ~ bmi, data = by_death) # a distinct name avoids masking the lm() function
summary(fit_lm)
##
## Call:
## lm(formula = AGE ~ bmi, data = by_death)
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.476 -9.658 1.804 12.120 30.733
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 66.11895 0.31623 209.09 <2e-16 ***
## bmi -0.12120 0.01071 -11.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.77 on 35012 degrees of freedom
## Multiple R-squared: 0.003646, Adjusted R-squared: 0.003617
## F-statistic: 128.1 on 1 and 35012 DF, p-value: < 2.2e-16
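The correlation between age and BMI is very weak (r ≈ -0.06). The slope is statistically significant (p < 2e-16) only because the sample is so large: the R-squared value of about 0.004 means that less than 0.4% of the variance in age is explained by BMI, which is minuscule. In practical terms, age and BMI are essentially unrelated. Note also that by_death has one row per unique combination of survival, age, sex, and BMI rather than one row per patient, so patients with identical values are counted only once.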
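With about 35,000 rows, even a very small correlation produces a large t statistic. A short sketch using the correlation printed above shows that, in simple linear regression, R-squared is the square of r and the slope's t value equals r * sqrt(n - 2) / sqrt(1 - r^2). Here n = 35,014 is inferred from the 35,012 residual degrees of freedom.

```r
r <- -0.06037877   # cor(AGE, bmi) printed above
n <- 35014         # rows in by_death: residual df 35012 + 2

r_squared <- r^2                           # equals "Multiple R-squared"
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)  # equals the slope's t value
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(r_squared = r_squared, t = t_stat, p = p_value))
```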
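n <- nrow(removena_surgery)      # 39,854 patients after removing missing values
p <- 33576 / n                   # sample proportion discharged home
se <- sqrt(p * (1 - p) / n)      # standard error of a proportion
moe <- c(-1.96 * se, 1.96 * se)  # margin of error for 95% confidence
conf_int <- p + moe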
conf_int
## [1] 0.8388984 0.8460517
The 95% confidence interval is (0.839, 0.846)
Thus we are 95% confident that the proportion of patients who will get discharged to go home after surgery is between 0.839 and 0.846.
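With n = 39,854 (the number of rows in removena_surgery, which reproduces the interval above), the normal-approximation (Wald) interval is reliable. As a cross-check, base R's prop.test() with correct = FALSE returns the Wilson score interval. The sketch below compares the two, taking 33,576 as the number of patients discharged home, as in the code above.

```r
home <- 33576   # patients discharged home (from the code above)
n    <- 39854   # patients after removing missing values

# Wald interval, as computed above
p_hat <- home / n
se <- sqrt(p_hat * (1 - p_hat) / n)
wald <- p_hat + c(-1, 1) * qnorm(0.975) * se

# Wilson score interval; correct = FALSE drops the continuity correction
wilson <- as.numeric(prop.test(home, n, correct = FALSE)$conf.int)

print(rbind(wald = wald, wilson = wilson))
```

The two intervals agree closely here; the Wilson interval matters mainly for small samples or proportions near 0 or 1.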
fit <- glm(survival ~ AGE + SEX + BMI, data = removena_surgery, family = "binomial")
summary(fit)
##
## Call:
## glm(formula = survival ~ AGE + SEX + BMI, family = "binomial",
## data = removena_surgery)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8463 -0.4675 -0.3781 -0.2897 3.0144
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.721643 0.138458 -41.324 < 2e-16 ***
## AGE 0.040781 0.001479 27.575 < 2e-16 ***
## SEXmale 0.135092 0.037434 3.609 0.000308 ***
## BMI 0.017940 0.002435 7.366 1.76e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 22433 on 39853 degrees of freedom
## Residual deviance: 21534 on 39850 degrees of freedom
## AIC: 21542
##
## Number of Fisher Scoring iterations: 6
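According to this logistic regression model, age, sex, and BMI are all statistically significant predictors of death after colon cancer surgery (all p < 0.001). The positive coefficients mean that the odds of death rise with age and BMI and are higher for men than for women of the same age and BMI. Sex is significant here even though the chi-squared test above found no association: the regression compares men and women with the same age and BMI, while the chi-squared test does not adjust for anything.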
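Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are easier to interpret. The sketch below uses the rounded estimates and standard errors printed by summary(fit) to compute odds ratios with 95% Wald intervals; on the fitted model itself, exp(coef(fit)) and exp(confint.default(fit)) would give the same quantities.

```r
# Estimates and standard errors copied from summary(fit) above
est <- c(AGE = 0.040781, SEXmale = 0.135092, BMI = 0.017940)
se  <- c(AGE = 0.001479, SEXmale = 0.037434, BMI = 0.002435)

z <- qnorm(0.975)
odds_ratios <- data.frame(
  OR    = exp(est),
  lower = exp(est - z * se),
  upper = exp(est + z * se)
)
print(round(odds_ratios, 3))

# Odds ratio for a 10-year increase in age
print(round(exp(10 * est[["AGE"]]), 2))
```

Under this model, each additional year of age multiplies the odds of death by about 1.04 (about 1.5 per decade), men have about 1.14 times the odds of women with the same age and BMI, and each BMI unit multiplies the odds by about 1.02.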
Age is clearly associated with survival after colon cancer surgery. While the majority of patients survive, the share who die rises with age, from about 1% of patients aged 20 and under to about 12% of those over 70. The number of patients in each age group also increases with age, which suggests that more people are admitted for colon cancer surgery at older ages.
Younger patients have a higher chance of survival than older age groups.
COLECTOMY PRTL W/END COLOSTOMY & CLSR DSTL SGMT, COLECTOMY PARTIAL W/ANASTOMOSIS, and COLECTOMY PRTL W/RMVL TERMINAL ILEUM & ILEOCOLOS had the most deaths among both women and men, largely because they were also the most frequently performed procedures.
Females had more deaths after surgery than males, but there were also more female patients; the proportion who died was about 8.1% for both sexes.
Age and BMI are essentially uncorrelated (r ≈ -0.06).
In the logistic regression, age, gender, and BMI are all significant predictors of survival after colon cancer surgery.
Association between age group and survival - p < 2.2e-16, far below the significance level of 0.05
Association between gender and death - p = 0.9635, above the significance level of 0.05, so we fail to reject the null hypothesis
COLECTOMY PRTL W/END COLOSTOMY & CLSR DSTL SGMT - female death count - 412
COLECTOMY PRTL W/END COLOSTOMY & CLSR DSTL SGMT - male death count - 306
Correlation between age and BMI - r ≈ -0.06, R-squared ≈ 0.004
Patients discharged home - 95% confidence interval (0.839, 0.846)
This is a very rich data set that supports many different studies. I believe the statistics were fairly thorough and that these techniques answered most of my questions. Although reaching the statistics took a lot of factoring and dplyr work, I was able to get almost all the results I wanted. I would have liked to compute more correlations between variables, but this data set has very few quantitative variables. Overall, most patients survive colon cancer surgery regardless of gender, although the risk of death rises with age, and there are many treatment methods available. I hope more people are diagnosed at a younger age, when the chance of a cure is higher.
“Colon Cancer Treatment.” Johns Hopkins Medicine, 7 Mar. 2022, https://www.hopkinsmedicine.org/health/conditions-and-diseases/colon-cancer/colon-cancer-treatment#:~:text=The%20most%20common%20treatment%20for,survival%20rate%20is%2090%20percent.
McKay, Andrew, et al. “Does Young Age Influence the Prognosis of Colorectal Cancer: A Population-Based Analysis.” World Journal of Surgical Oncology, BioMed Central, 2 Dec. 2014, https://wjso.biomedcentral.com/articles/10.1186/1477-7819-12-370.
“Problems after Surgery.” Bowel Cancer, Cancer Research UK, 11 Feb. 2022, https://www.cancerresearchuk.org/about-cancer/bowel-cancer/treatment/treatment-rectal/surgery-rectal/problems-after-surgery.
“Colon Cancer.” Mayo Clinic, Mayo Foundation for Medical Education and Research, 8 Oct. 2022, https://www.mayoclinic.org/diseases-conditions/colon-cancer/symptoms-causes/syc-20353669.