##Introduction
The data used in my analysis for assignment 3 came from the paper “Imaging and Clinical Data Archive for Head and Neck Squamous Cell Carcinoma Patients Treated with Radiotherapy.” The paper details the collection and processing of computed tomography (CT) imaging in patients with head and neck squamous cell carcinoma who were treated with radiotherapy. Using these data, my hypothesis is that individuals with a history of smoking have a lower survival probability than individuals without a history of smoking. The outcome of interest, survival, is measured in months from the date of diagnosis to the date of death (with head and neck squamous cell carcinoma as the primary cause of death). The primary exposure of interest, smoking history, is coded as “0” (never smoked), “1” (less than 10 pack-years), and “2” (greater than or equal to 10 pack-years). Current smoking status is recorded separately as yes (currently smoking) or no (not currently smoking). The variables identified and controlled for are gender, age group, specific diagnosis, grade and stage of diagnosis, and BMI.
library(readxl)
PT_data <- read_excel("Patient and Treatment Characteristics.xls")
View(PT_data)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(table1)
##
## Attaching package: 'table1'
## The following objects are masked from 'package:base':
##
## units, units<-
cleandat <- PT_data %>% mutate(
agegroup = factor(case_when(
Age <= 39 ~ "39 and under",
Age >= 40 & Age <= 49 ~ "40-49",
Age >= 50 & Age <= 59 ~ "50-59",
Age >= 60 & Age <= 69 ~ "60-69",
Age >= 70 ~ "70+"
))
) %>%
mutate(Stage = recode_factor(Stage,
"I" = "a",
"II" = "b",
"III" = "c",
"IVA" = "d",
"IVB" = "e"))
##Table 1
Our sample data set included a total of 215 participants, classified below as alive or dead and broken down by gender, age group, diagnosis, grade, stage, smoking status, and BMI. Our study population was majority male (84.7%), and most participants were between 50 and 69 years of age.
table1(~ Sex + agegroup + Diag + Grade + Stage + SmokingHistory + CurrentSmoker + BMIstarttreat | AliveorDead, data=cleandat, overall = "Total", rowlabelhead = "Characteristics", caption = "Table 1: Characteristics and Survival Status of Participants", topclass="Rtable1-zebra")
| Characteristics | Alive (N=138) | Dead (N=77) | Total (N=215) |
|---|---|---|---|
| Sex | |||
| Female | 22 (15.9%) | 11 (14.3%) | 33 (15.3%) |
| Male | 116 (84.1%) | 66 (85.7%) | 182 (84.7%) |
| agegroup | |||
| 39 and under | 6 (4.3%) | 1 (1.3%) | 7 (3.3%) |
| 40-49 | 23 (16.7%) | 9 (11.7%) | 32 (14.9%) |
| 50-59 | 64 (46.4%) | 26 (33.8%) | 90 (41.9%) |
| 60-69 | 33 (23.9%) | 31 (40.3%) | 64 (29.8%) |
| 70+ | 12 (8.7%) | 10 (13.0%) | 22 (10.2%) |
| Diag | |||
| CA BOT | 59 (42.8%) | 20 (26.0%) | 79 (36.7%) |
| CA glossopharyngeal sulcus | 2 (1.4%) | 0 (0%) | 2 (0.9%) |
| CA larynx | 3 (2.2%) | 3 (3.9%) | 6 (2.8%) |
| CA maxillary sinus | 1 (0.7%) | 2 (2.6%) | 3 (1.4%) |
| CA oral tongue | 3 (2.2%) | 3 (3.9%) | 6 (2.8%) |
| CA oropharynx | 2 (1.4%) | 0 (0%) | 2 (0.9%) |
| CA posteriot pharyngeal wall | 1 (0.7%) | 0 (0%) | 1 (0.5%) |
| CA pyriform sinus | 2 (1.4%) | 7 (9.1%) | 9 (4.2%) |
| CA retromolar trigone | 1 (0.7%) | 0 (0%) | 1 (0.5%) |
| CA soft palate | 2 (1.4%) | 1 (1.3%) | 3 (1.4%) |
| CA supraglottic | 9 (6.5%) | 9 (11.7%) | 18 (8.4%) |
| CA tonsil | 47 (34.1%) | 20 (26.0%) | 67 (31.2%) |
| CUP | 4 (2.9%) | 2 (2.6%) | 6 (2.8%) |
| NPC | 2 (1.4%) | 4 (5.2%) | 6 (2.8%) |
| CA alveolar ridge | 0 (0%) | 1 (1.3%) | 1 (0.5%) |
| CA buccal mucosa | 0 (0%) | 1 (1.3%) | 1 (0.5%) |
| CA hypopharynx | 0 (0%) | 2 (2.6%) | 2 (0.9%) |
| CA pharyngeal | 0 (0%) | 1 (1.3%) | 1 (0.5%) |
| recurrence CA retromolar trigone | 0 (0%) | 1 (1.3%) | 1 (0.5%) |
| Grade | |||
| moderately diff. | 55 (39.9%) | 34 (44.2%) | 89 (41.4%) |
| moderately to poorly diff. | 4 (2.9%) | 8 (10.4%) | 12 (5.6%) |
| poorly diff. | 65 (47.1%) | 24 (31.2%) | 89 (41.4%) |
| undiff. | 2 (1.4%) | 1 (1.3%) | 3 (1.4%) |
| well diff. | 11 (8.0%) | 8 (10.4%) | 19 (8.8%) |
| Well to moderately diff. | 1 (0.7%) | 2 (2.6%) | 3 (1.4%) |
| Stage | |||
| a | 0 (0%) | 4 (5.2%) | 4 (1.9%) |
| b | 3 (2.2%) | 2 (2.6%) | 5 (2.3%) |
| c | 19 (13.8%) | 12 (15.6%) | 31 (14.4%) |
| d | 108 (78.3%) | 48 (62.3%) | 156 (72.6%) |
| e | 8 (5.8%) | 11 (14.3%) | 19 (8.8%) |
| SmokingHistory | |||
| Mean (SD) | 1.10 (0.938) | 1.26 (0.923) | 1.16 (0.934) |
| Median [Min, Max] | 1.00 [0, 2.00] | 2.00 [0, 2.00] | 2.00 [0, 2.00] |
| CurrentSmoker | |||
| Mean (SD) | 0.275 (0.448) | 0.416 (0.496) | 0.326 (0.470) |
| Median [Min, Max] | 0 [0, 1.00] | 0 [0, 1.00] | 0 [0, 1.00] |
| BMIstarttreat | |||
| Mean (SD) | 29.5 (5.60) | 26.3 (4.73) | 28.4 (5.51) |
| Median [Min, Max] | 28.6 [18.2, 49.3] | 25.4 [17.3, 39.8] | 27.8 [17.3, 49.3] |
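In Table 1, SmokingHistory and CurrentSmoker are summarized with a mean and SD because they are stored as numeric codes. A minimal sketch of converting such codes to labeled factors, so that a table function can report counts per category, is shown below. The data frame `toy` and its values are made up for illustration; the labels follow the coding described in the Introduction.

```r
# Hypothetical toy data using the same 0/1/2 and 0/1 coding as the report
toy <- data.frame(
  SmokingHistory = c(0, 1, 2, 2, 1, 0, 2),
  CurrentSmoker  = c(0, 0, 1, 1, 0, 0, 1)
)

# Convert the numeric codes to labeled factors
toy$SmokingHistory <- factor(
  toy$SmokingHistory,
  levels = c(0, 1, 2),
  labels = c("Never smoked", "<10 pack-years", ">=10 pack-years")
)
toy$CurrentSmoker <- factor(
  toy$CurrentSmoker,
  levels = c(0, 1),
  labels = c("No", "Yes")
)

# Counts per category instead of a mean of the codes
smoking_counts <- table(toy$SmokingHistory)
current_counts <- table(toy$CurrentSmoker)
print(smoking_counts)
print(current_counts)
```

Applying the same conversion to the analysis data before calling table1 would be expected to give rows of counts and percentages for each smoking category.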
##Graph1
Graph 1: Survival Probability and Survival Time in Months Based on Age Group. Based on the graph, those aged 39 and younger had the highest survival probability, with other age groups varying over time.
library(ggplot2)
library(survival)
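# Note: the status argument here is SmokingHistory == 1, so the "event" in Y is
# membership in the <10 pack-year smoking group, not death.
Y = Surv(cleandat$Survival, cleandat$SmokingHistory == 1)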
kmfit = survfit(Y ~ cleandat$agegroup)
summary(kmfit, times = c(seq(0, 120, by = 10)))
## Call: survfit(formula = Y ~ cleandat$agegroup)
##
## cleandat$agegroup=39 and under
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 7 0 1 0 1 1
## 10 7 0 1 0 1 1
## 20 7 0 1 0 1 1
## 30 7 0 1 0 1 1
## 40 7 0 1 0 1 1
## 50 6 0 1 0 1 1
## 60 5 0 1 0 1 1
## 70 5 0 1 0 1 1
## 80 4 0 1 0 1 1
## 90 2 0 1 0 1 1
## 100 2 0 1 0 1 1
## 110 1 0 1 0 1 1
##
## cleandat$agegroup=40-49
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 32 0 1.000 0.000 1.000 1
## 10 31 0 1.000 0.000 1.000 1
## 20 26 0 1.000 0.000 1.000 1
## 30 23 0 1.000 0.000 1.000 1
## 40 23 0 1.000 0.000 1.000 1
## 50 20 0 1.000 0.000 1.000 1
## 60 19 0 1.000 0.000 1.000 1
## 70 15 1 0.944 0.054 0.844 1
## 80 11 2 0.804 0.103 0.626 1
## 90 3 0 0.804 0.103 0.626 1
## 100 2 0 0.804 0.103 0.626 1
##
## cleandat$agegroup=50-59
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 90 0 1.000 0.0000 1.000 1.000
## 10 88 1 0.989 0.0110 0.967 1.000
## 20 83 1 0.978 0.0157 0.947 1.000
## 30 74 1 0.965 0.0196 0.928 1.000
## 40 70 1 0.952 0.0236 0.907 0.999
## 50 66 0 0.952 0.0236 0.907 0.999
## 60 56 0 0.952 0.0236 0.907 0.999
## 70 49 1 0.935 0.0288 0.880 0.993
## 80 35 2 0.892 0.0403 0.817 0.975
## 90 17 2 0.812 0.0661 0.693 0.953
## 100 5 3 0.589 0.1242 0.390 0.891
## 110 1 0 0.589 0.1242 0.390 0.891
##
## cleandat$agegroup=60-69
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 64 0 1.000 0.0000 1.000 1
## 10 62 0 1.000 0.0000 1.000 1
## 20 51 2 0.966 0.0236 0.921 1
## 30 49 0 0.966 0.0236 0.921 1
## 40 46 0 0.966 0.0236 0.921 1
## 50 42 0 0.966 0.0236 0.921 1
## 60 32 1 0.939 0.0350 0.873 1
## 70 29 0 0.939 0.0350 0.873 1
## 80 21 1 0.902 0.0499 0.809 1
## 90 11 1 0.837 0.0774 0.698 1
## 100 5 0 0.837 0.0774 0.698 1
## 110 2 0 0.837 0.0774 0.698 1
##
## cleandat$agegroup=70+
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 22 0 1.000 0.0000 1.000 1
## 10 22 0 1.000 0.0000 1.000 1
## 20 19 0 1.000 0.0000 1.000 1
## 30 17 1 0.947 0.0512 0.852 1
## 40 16 0 0.947 0.0512 0.852 1
## 50 13 0 0.947 0.0512 0.852 1
## 60 10 1 0.861 0.0944 0.695 1
## 70 9 0 0.861 0.0944 0.695 1
## 80 7 0 0.861 0.0944 0.695 1
## 90 3 1 0.646 0.1995 0.353 1
## 100 1 0 0.646 0.1995 0.353 1
plot(kmfit, lty = c("solid", "solid", "dashed", "dashed", "solid"), col = c("red", "blue", "orange", "green", "black"), xlab = "Survival Time (In Months)", ylab = "Survival Probabilities")
legend("bottomleft", c("39 and Younger", "40-49", "50-59", "60-69", "70+"), lty = c("solid", "solid", "dashed", "dashed", "solid"), col = c("red", "blue", "orange", "green", "black"))
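Since the outcome of interest is death, a Kaplan-Meier fit would normally take a death indicator as its status argument. The sketch below illustrates that pattern on made-up data. The data frame `toy` is hypothetical, and it assumes the vital status column holds the values "Alive" and "Dead", as the Table 1 column headings suggest.

```r
library(survival)

# Hypothetical toy data: follow-up time in months, vital status, smoking status
toy <- data.frame(
  Survival      = c(12, 30, 45, 60, 75, 20, 35, 50, 80, 95),
  AliveorDead   = c("Dead", "Alive", "Dead", "Alive", "Alive",
                    "Dead", "Dead", "Alive", "Dead", "Alive"),
  CurrentSmoker = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
)

# Event indicator: 1 when the participant died, 0 when censored (still alive)
toy$died <- as.integer(toy$AliveorDead == "Dead")

# Kaplan-Meier estimate of survival by current smoking status
km_toy <- survfit(Surv(Survival, died) ~ CurrentSmoker, data = toy)
print(summary(km_toy, times = seq(0, 90, by = 30)))
```

With this setup, n.event counts deaths and the curve estimates the probability of being alive at each time point.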
##Graph2
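Graph 2: Survival Probability and Survival Time in Months Based on Smoking Status. In the output below, the estimated curve for current smokers (CurrentSmoker = 1) stays above the curve for participants who were not current smokers (0.881 versus 0.651 at 100 months), so this output does not show lower survival probabilities for current smokers. Because Y uses SmokingHistory == 1 as its status indicator, these curves should be read with caution.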
kmfit2 = survfit(Y ~ cleandat$CurrentSmoker)
summary(kmfit2, times = c(seq(0, 120, by = 10)))
## Call: survfit(formula = Y ~ cleandat$CurrentSmoker)
##
## cleandat$CurrentSmoker=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 145 0 1.000 0.00000 1.000 1.000
## 10 140 1 0.993 0.00692 0.980 1.000
## 20 126 3 0.971 0.01412 0.944 0.999
## 30 116 2 0.956 0.01761 0.922 0.991
## 40 113 0 0.956 0.01761 0.922 0.991
## 50 104 0 0.956 0.01761 0.922 0.991
## 60 88 2 0.935 0.02250 0.892 0.980
## 70 76 2 0.914 0.02672 0.863 0.967
## 80 54 4 0.856 0.03756 0.785 0.933
## 90 24 3 0.780 0.05485 0.680 0.895
## 100 10 3 0.651 0.08334 0.507 0.837
## 110 1 0 0.651 0.08334 0.507 0.837
##
## cleandat$CurrentSmoker=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 70 0 1.000 0.0000 1.000 1
## 10 70 0 1.000 0.0000 1.000 1
## 20 60 0 1.000 0.0000 1.000 1
## 30 54 0 1.000 0.0000 1.000 1
## 40 49 1 0.980 0.0194 0.943 1
## 50 43 0 0.980 0.0194 0.943 1
## 60 34 0 0.980 0.0194 0.943 1
## 70 31 0 0.980 0.0194 0.943 1
## 80 24 1 0.949 0.0363 0.880 1
## 90 12 1 0.881 0.0735 0.748 1
## 100 5 0 0.881 0.0735 0.748 1
## 110 3 0 0.881 0.0735 0.748 1
plot(kmfit2, lty = c("dashed", "solid"), col = c("red", "blue"), xlab = "Survival Time (In Months)", ylab = "Survival Probabilities")
legend("bottomright", c("Not a Current Smoker", "Current Smoker"), lty = c("dashed", "solid"), col = c("red", "blue"))
##Graph3
Graph 3: Survival Probability and Survival Time in Months Based on Stage. Based on the graph, participants classified as Stage c (Stage III) had lower survival probabilities over time than participants in other stages.
kmfit3 = survfit(Y ~ cleandat$Stage)
summary(kmfit3, times = c(seq(0, 120, by = 10)))
## Call: survfit(formula = Y ~ cleandat$Stage)
##
## cleandat$Stage=a
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 4 0 1 0 1 1
## 10 4 0 1 0 1 1
## 20 3 0 1 0 1 1
## 30 2 0 1 0 1 1
## 40 1 0 1 0 1 1
##
## cleandat$Stage=b
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 5 0 1 0 1 1
## 10 5 0 1 0 1 1
## 20 5 0 1 0 1 1
## 30 5 0 1 0 1 1
## 40 5 0 1 0 1 1
## 50 5 0 1 0 1 1
## 60 3 0 1 0 1 1
## 70 3 0 1 0 1 1
## 80 3 0 1 0 1 1
## 90 3 0 1 0 1 1
##
## cleandat$Stage=c
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 31 0 1.000 0.0000 1.000 1
## 10 29 0 1.000 0.0000 1.000 1
## 20 25 1 0.966 0.0339 0.901 1
## 30 25 0 0.966 0.0339 0.901 1
## 40 24 1 0.927 0.0499 0.834 1
## 50 21 0 0.927 0.0499 0.834 1
## 60 20 1 0.883 0.0641 0.766 1
## 70 17 0 0.883 0.0641 0.766 1
## 80 12 1 0.828 0.0804 0.684 1
## 90 6 0 0.828 0.0804 0.684 1
## 100 3 1 0.621 0.1891 0.342 1
## 110 1 0 0.621 0.1891 0.342 1
##
## cleandat$Stage=d
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 156 0 1.000 0.00000 1.000 1.000
## 10 153 1 0.994 0.00639 0.981 1.000
## 20 137 2 0.980 0.01132 0.958 1.000
## 30 126 2 0.966 0.01511 0.937 0.996
## 40 123 0 0.966 0.01511 0.937 0.996
## 50 113 0 0.966 0.01511 0.937 0.996
## 60 92 1 0.955 0.01811 0.921 0.992
## 70 81 2 0.934 0.02321 0.890 0.981
## 80 60 3 0.895 0.03132 0.836 0.959
## 90 26 4 0.798 0.05447 0.698 0.912
## 100 11 2 0.707 0.07851 0.569 0.879
## 110 2 0 0.707 0.07851 0.569 0.879
##
## cleandat$Stage=e
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 19 0 1.00 0.000 1.000 1
## 10 19 0 1.00 0.000 1.000 1
## 20 16 0 1.00 0.000 1.000 1
## 30 12 0 1.00 0.000 1.000 1
## 40 9 0 1.00 0.000 1.000 1
## 50 8 0 1.00 0.000 1.000 1
## 60 7 0 1.00 0.000 1.000 1
## 70 6 0 1.00 0.000 1.000 1
## 80 3 1 0.75 0.217 0.426 1
## 90 1 0 0.75 0.217 0.426 1
## 100 1 0 0.75 0.217 0.426 1
## 110 1 0 0.75 0.217 0.426 1
plot(kmfit3, lty = c("solid", "solid", "dashed", "dashed", "solid"), col = c("red", "blue", "orange", "green", "black"), xlab = "Survival Time (In Months)", ylab = "Survival Probabilities")
legend("bottomleft", c("a", "b", "c", "d", "e"), lty = c("solid", "solid", "dashed", "dashed", "solid"), col = c("red", "blue", "orange", "green", "black"))
##DAG
The DAG below shows the hypothesized association between the exposure (smoking history) and the outcome (survival). Smoking history can affect survival through the mediator, cancer status. Age and sex are identified as potential confounders, because both affect smoking status and survival probability. HPV status is also included in the DAG, as it is associated with cancer status and survival probability. The DAG shows cancer status as being caused by either smoking history or HPV status, with cancer status in turn affecting the outcome, survival probability.
A potential issue with this DAG is the lack of information on HPV status in this dataset. HPV status is not available for most entries, so even though other studies have linked HPV to these cancers, we are not able to draw any conclusion about HPV from this dataset.
testImplications <- function( covariance.matrix, sample.size ){
library(ggm)
tst <- function(i){ pcor.test( pcor(i,covariance.matrix), length(i)-2, sample.size )$pvalue }
tos <- function(i){ paste(i,collapse=" ") }
implications <- list(c("Age","Sex"),
c("Age","Cancer Status","HPV Status","History of Smoking","Current Smoker"),
c("History of Smoking","HPV Status","Age","Sex"),
c("Current Smoker","HPV Status","Age","Sex"),
c("Sex","Cancer Status","HPV Status","History of Smoking","Current Smoker"))
data.frame( implication=unlist(lapply(implications,tos)),
pvalue=unlist( lapply( implications, tst ) ) )
}
dag {
"Cancer Status" [exposure,pos="0.225,0.433"]
"Current Smoker" [exposure,pos="-1.593,-0.003"]
"HPV Status" [pos="-1.395,-0.442"]
"History of Smoking" [exposure,pos="-1.030,0.426"]
"Survival Probability" [outcome,pos="0.793,0.014"]
Age [pos="-0.366,-0.632"]
Sex [pos="0.393,-0.439"]
"Cancer Status" -> "Survival Probability"
"Current Smoker" -> "Cancer Status"
"Current Smoker" -> "Survival Probability"
"HPV Status" -> "Cancer Status"
"HPV Status" -> "Survival Probability"
"History of Smoking" -> "Cancer Status"
"History of Smoking" -> "Current Smoker"
"History of Smoking" -> "Survival Probability"
Age -> "Current Smoker"
Age -> "HPV Status"
Age -> "History of Smoking"
Age -> "Survival Probability"
Sex -> "Current Smoker"
Sex -> "HPV Status"
Sex -> "History of Smoking"
Sex -> "Survival Probability"
}
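A DAG of this kind can also be written and queried directly from R. The sketch below assumes the dagitty package is installed and uses a simplified version of the DAG with shortened, hypothetical node names (HPV status and current smoking are left out), so it is an illustration rather than the full model specified above.

```r
library(dagitty)

# Simplified, hypothetical version of the DAG (node names shortened)
g <- dagitty('dag {
  Smoking [exposure]
  Survival [outcome]
  Age -> Smoking
  Age -> Survival
  Sex -> Smoking
  Sex -> Survival
  Smoking -> Cancer
  Smoking -> Survival
  Cancer -> Survival
}')

# Minimal adjustment sets for the total effect of Smoking on Survival
sets <- adjustmentSets(g)
print(sets)

# Conditional independencies implied by the DAG, which can be checked against data
print(impliedConditionalIndependencies(g))
```

In this simplified graph, Cancer lies on the causal path from Smoking to Survival, so it is not part of an adjustment set for the total effect.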
uniAge <- lm(Survival ~ agegroup, data=cleandat)
print(uniAge)
##
## Call:
## lm(formula = Survival ~ agegroup, data = cleandat)
##
## Coefficients:
## (Intercept) agegroup40-49 agegroup50-59 agegroup60-69 agegroup70+
## 80.42 -22.29 -14.88 -20.56 -22.59
uniSex <- lm(Survival ~ Sex, data=cleandat)
print(uniSex)
##
## Call:
## lm(formula = Survival ~ Sex, data = cleandat)
##
## Coefficients:
## (Intercept) SexMale
## 63.539 -1.293
uniStage <- lm(Survival ~ Stage, data=cleandat)
print(uniStage)
##
## Call:
## lm(formula = Survival ~ Stage, data = cleandat)
##
## Coefficients:
## (Intercept) Stageb Stagec Staged Stagee
## 30.76 47.82 33.29 33.32 18.10
multi <- lm(Survival ~ Sex + agegroup + Diag + Grade + Stage + SmokingHistory + CurrentSmoker + BMIstarttreat, data=cleandat)
library(gtsummary)
tbl_regression(multi)
| Characteristic | Beta | 95% CI¹ | p-value |
|---|---|---|---|
| Sex | |||
| Female | — | — | |
| Male | -4.9 | -17, 7.0 | 0.4 |
| agegroup | |||
| 39 and under | — | — | |
| 40-49 | -30 | -54, -6.0 | 0.014 |
| 50-59 | -22 | -45, -0.16 | 0.048 |
| 60-69 | -23 | -46, -0.29 | 0.047 |
| 70+ | -25 | -50, -0.52 | 0.045 |
| Diag | |||
| CA alveolar ridge | — | — | |
| CA BOT | 0.79 | -54, 55 | >0.9 |
| CA buccal mucosa | -58 | -133, 17 | 0.13 |
| CA glossopharyngeal sulcus | -21 | -87, 44 | 0.5 |
| CA hypopharynx | -51 | -118, 16 | 0.14 |
| CA larynx | -16 | -74, 43 | 0.6 |
| CA maxillary sinus | -37 | -99, 26 | 0.2 |
| CA oral tongue | -32 | -90, 26 | 0.3 |
| CA oropharynx | -11 | -78, 56 | 0.8 |
| CA pharyngeal | -57 | -134, 19 | 0.14 |
| CA posteriot pharyngeal wall | 24 | -50, 99 | 0.5 |
| CA pyriform sinus | -33 | -91, 25 | 0.3 |
| CA retromolar trigone | 39 | -37, 116 | 0.3 |
| CA soft palate | -1.8 | -63, 60 | >0.9 |
| CA supraglottic | -19 | -74, 37 | 0.5 |
| CA tonsil | -0.39 | -55, 54 | >0.9 |
| CUP | -2.4 | -61, 56 | >0.9 |
| NPC | -41 | -99, 17 | 0.2 |
| recurrence CA retromolar trigone | -4.2 | -85, 77 | >0.9 |
| Grade | |||
| moderately diff. | — | — | |
| moderately to poorly diff. | -0.50 | -19, 18 | >0.9 |
| poorly diff. | 1.8 | -6.9, 11 | 0.7 |
| undiff. | -8.1 | -40, 24 | 0.6 |
| well diff. | 3.4 | -11, 17 | 0.6 |
| Well to moderately diff. | -1.0 | -34, 32 | >0.9 |
| Stage | |||
| a | — | — | |
| b | 36 | -4.4, 76 | 0.081 |
| c | 36 | 2.4, 70 | 0.036 |
| d | 30 | -2.9, 62 | 0.074 |
| e | 16 | -18, 51 | 0.4 |
| SmokingHistory | 0.08 | -5.3, 5.4 | >0.9 |
| CurrentSmoker | 0.47 | -10, 11 | >0.9 |
| BMIstarttreat | 0.63 | -0.07, 1.3 | 0.077 |
¹ CI = Confidence Interval
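# Note: the status argument here is SmokingHistory (coded 0/1/2), not a death indicator;
# this is what triggers the "Invalid status value" warning below.
res.cox <- coxph(Surv(cleandat$Survival, cleandat$SmokingHistory) ~ Sex + agegroup + Diag + Grade + Stage + CurrentSmoker + BMIstarttreat, data = cleandat)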
## Warning in Surv(cleandat$Survival, cleandat$SmokingHistory): Invalid status
## value, converted to NA
## Warning in fitter(X, Y, istrat, offset, init, control, weights = weights, : Ran
## out of iterations and did not converge
res.cox
## Call:
## coxph(formula = Surv(cleandat$Survival, cleandat$SmokingHistory) ~
## Sex + agegroup + Diag + Grade + Stage + CurrentSmoker + BMIstarttreat,
## data = cleandat)
##
## coef exp(coef) se(coef) z
## SexMale -4.264e-01 6.528e-01 2.797e-01 -1.524
## agegroup40-49 -1.162e-01 8.903e-01 2.842e-01 -0.409
## agegroup50-59 -6.018e-01 5.478e-01 1.993e-01 -3.020
## agegroup60-69 -5.123e-01 5.991e-01 1.963e-01 -2.610
## agegroup70+ 3.679e-01 1.445e+00 3.370e-01 1.092
## DiagCA BOT -8.597e-01 4.233e-01 2.016e-01 -4.265
## DiagCA buccal mucosa 4.466e+00 8.698e+01 1.155e+00 3.867
## DiagCA glossopharyngeal sulcus 1.046e+00 2.845e+00 1.014e+00 1.031
## DiagCA hypopharynx 1.628e+00 5.091e+00 1.037e+00 1.570
## DiagCA larynx -1.544e+00 2.135e-01 1.011e+00 -1.526
## DiagCA maxillary sinus 1.277e+00 3.586e+00 6.191e-01 2.063
## DiagCA oral tongue 1.318e+00 3.738e+00 5.362e-01 2.459
## DiagCA oropharynx 6.790e-01 1.972e+00 1.014e+00 0.670
## DiagCA pharyngeal -7.616e+00 4.925e-04 4.063e+02 -0.019
## DiagCA posteriot pharyngeal wall -1.954e+00 1.417e-01 1.011e+00 -1.933
## DiagCA pyriform sinus 1.178e+00 3.247e+00 4.785e-01 2.461
## DiagCA soft palate -2.850e+00 5.785e-02 1.015e+00 -2.808
## DiagCA supraglottic 6.821e-02 1.071e+00 3.551e-01 0.192
## DiagCA tonsil -9.666e-01 3.804e-01 2.176e-01 -4.442
## DiagCUP -1.953e+00 1.418e-01 7.410e-01 -2.636
## DiagNPC 1.539e+00 4.658e+00 7.319e-01 2.102
## Diagrecurrence CA retromolar trigone -8.300e-01 4.361e-01 1.021e+00 -0.813
## Grademoderately to poorly diff. -1.338e-01 8.747e-01 3.584e-01 -0.373
## Gradepoorly diff. 1.433e-01 1.154e+00 2.032e-01 0.705
## Gradeundiff. 1.775e+00 5.902e+00 7.234e-01 2.454
## Gradewell diff. 6.156e-01 1.851e+00 3.535e-01 1.741
## GradeWell to moderately diff. 1.780e+00 5.931e+00 7.219e-01 2.466
## Stageb -2.922e+00 5.381e-02 5.191e-01 -5.630
## Stagec -2.790e+00 6.142e-02 2.673e-01 -10.440
## Staged -2.400e+00 9.076e-02 2.105e-01 -11.398
## Stagee -1.624e+00 1.972e-01 3.583e-01 -4.531
## CurrentSmoker 4.176e-01 1.518e+00 1.965e-01 2.126
## BMIstarttreat -1.382e-02 9.863e-01 1.920e-02 -0.720
## p
## SexMale 0.12740
## agegroup40-49 0.68270
## agegroup50-59 0.00253
## agegroup60-69 0.00904
## agegroup70+ 0.27490
## DiagCA BOT 2.00e-05
## DiagCA buccal mucosa 0.00011
## DiagCA glossopharyngeal sulcus 0.30256
## DiagCA hypopharynx 0.11651
## DiagCA larynx 0.12692
## DiagCA maxillary sinus 0.03915
## DiagCA oral tongue 0.01393
## DiagCA oropharynx 0.50294
## DiagCA pharyngeal 0.98504
## DiagCA posteriot pharyngeal wall 0.05319
## DiagCA pyriform sinus 0.01384
## DiagCA soft palate 0.00498
## DiagCA supraglottic 0.84766
## DiagCA tonsil 8.92e-06
## DiagCUP 0.00839
## DiagNPC 0.03554
## Diagrecurrence CA retromolar trigone 0.41607
## Grademoderately to poorly diff. 0.70886
## Gradepoorly diff. 0.48069
## Gradeundiff. 0.01412
## Gradewell diff. 0.08161
## GradeWell to moderately diff. 0.01367
## Stageb 1.80e-08
## Stagec < 2e-16
## Staged < 2e-16
## Stagee 5.87e-06
## CurrentSmoker 0.03351
## BMIstarttreat 0.47158
##
## Likelihood ratio test=78.1 on 33 df, p=1.603e-05
## n= 136, number of events= 113
## (79 observations deleted due to missingness)
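To estimate the effect of smoking history on survival, a Cox model would usually take death as the event and smoking history as a covariate. The sketch below shows that structure on simulated data. Every value in `toy` is generated for illustration, and the variable names `died`, `Age`, and `death_time` are assumptions rather than columns known to be in the study data.

```r
library(survival)

# Hypothetical simulated data; none of these values come from the study
set.seed(1)
n <- 200
toy <- data.frame(
  SmokingHistory = factor(sample(0:2, n, replace = TRUE)),
  Age            = round(runif(n, 35, 80))
)

# Simulated time to death in months, with censoring at 100 months of follow-up
death_time   <- rexp(n, rate = 0.01 * exp(0.4 * (toy$SmokingHistory == "2")))
toy$Survival <- pmin(death_time, 100)
toy$died     <- as.integer(death_time <= 100)

# Death is the event; smoking history enters the model as a covariate
fit_toy <- coxph(Surv(Survival, died) ~ SmokingHistory + Age, data = toy)

# Hazard ratios with 95% confidence intervals
hr_table <- exp(cbind(HR = coef(fit_toy), confint(fit_toy)))
print(hr_table)
```

In this layout, the rows for SmokingHistory give hazard ratios relative to the reference group (code 0, never smoked).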
test.ph <- cox.zph(res.cox)
## Warning in Surv(cleandat$Survival, cleandat$SmokingHistory): Invalid status
## value, converted to NA
test.ph
## chisq df p
## Sex 8.980 1 0.0027
## agegroup 7.326 4 0.1196
## Diag 36.203 17 0.0043
## Grade 6.728 5 0.2417
## Stage 7.268 4 0.1224
## CurrentSmoker 4.100 1 0.0429
## BMIstarttreat 0.645 1 0.4219
## GLOBAL 62.419 33 0.0015
plot(cox.zph(res.cox, transform = "log"))
## Warning in Surv(cleandat$Survival, cleandat$SmokingHistory): Invalid status
## value, converted to NA
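The cox.zph output tests the proportional hazards assumption for each term, not the effect of a covariate on survival: a small p-value is evidence that the hazard ratio for that term changes over time. The sketch below shows how the table can be read. It uses the `lung` example data that ships with the survival package, not the study data.

```r
library(survival)

# Built-in example data from the survival package (not the study data);
# in lung, status is coded 1 = censored, 2 = dead
fit_lung <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Test of the proportional hazards assumption: one row per term plus GLOBAL
zph_lung <- cox.zph(fit_lung)
print(zph_lung$table)

# Terms with p < 0.05, i.e. evidence against proportional hazards
flagged <- rownames(zph_lung$table)[zph_lung$table[, "p"] < 0.05]
print(flagged)
```

The GLOBAL row is a joint test across all terms in the model, so it does not belong to any single covariate.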
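My hypothesis is that individuals with a history of smoking have a lower survival probability than individuals without a history of smoking. The analysis above does not allow me to confirm this hypothesis. In the Cox proportional hazards model, current smoking had an estimated hazard ratio of 1.518 (p = 0.0335), but the model used SmokingHistory as the status variable rather than death, dropped 79 observations because of missingness, and did not converge, so this estimate should not be interpreted as an effect on survival. The chi-square values of 62.419 (GLOBAL, p = 0.0015) and 4.100 (CurrentSmoker, p = 0.0429) come from cox.zph, which tests the proportional hazards assumption rather than the effect of smoking. Together with the significant results for Sex (p = 0.0027) and Diag (p = 0.0043), they suggest that the proportional hazards assumption is violated for these terms. In the multivariable linear regression, neither SmokingHistory nor CurrentSmoker was associated with survival time (both p > 0.9).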