Exercise 3: Practicum II (Spring 2022)

Rudy Martinez

3/9/2022


Libraries

library(survival)
library(survminer)
library(tidyverse)

Question 1

You are analyzing a 16-day study of patients who have Krusty the Clown disease. Your data includes a subject ID, a survival time, and a status for each patient. A status of 0 means the patient is alive, while a status of 1 means the patient died of the disease.

    1. Construct a survival table for the data, similar to the one in the lecture. Show your work within the table: write each division with its result (e.g. 1/5 = 0.2) and each multiplication with the numbers being multiplied and the result (e.g. 0.1 * 4 = 0.4).
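# Load the hand-built survival table (the first row of the CSV holds the column names)
data = read.csv("/Users/rudymartinez/Desktop/MSDA/Data-Analytics-Practicum-II-Spring-22/Exercises/Exercise 3/Question 1_Data.csv", header = TRUE)

# Show the first six columns (R column indices start at 1, so 0:6 and 1:6 select the same columns)
print(data[1:6])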
##   Time At_Risk Num_Died Risk_of_Dying Prob_not_Dying              Survival
## 1    0      10        0          0/10          10/10                     1
## 2    1      10        1          1/10           9/10            9/10 = 0.9
## 3    4       9        1           1/9            8/9       0.9 * 8/9 = 0.8
## 4    5       8        1           1/8            7/8       0.8 * 7/8 = 0.7
## 5    7       6        1           1/6            5/6     0.7 * 5/6 = 0.583
## 6   12       3        1           1/3            2/3  0.583 * 2/3 = 0.3886
## 7   14       2        1           1/2            1/2 0.3886 * 1/2 = 0.1943
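The Survival column above can be cross-checked by recomputing the running product in R. This is a minimal sketch that assumes only the At_Risk and Num_Died values printed in the table; the data frame `km` is a name introduced here for illustration and is not part of the original CSV.

```r
# Values copied from the Time, At_Risk and Num_Died columns printed above.
km = data.frame(
  Time     = c(0, 1, 4, 5, 7, 12, 14),
  At_Risk  = c(10, 10, 9, 8, 6, 3, 2),
  Num_Died = c(0, 1, 1, 1, 1, 1, 1)
)

# Conditional probability of surviving each event time, given being at risk.
km$Prob_not_Dying = 1 - km$Num_Died / km$At_Risk

# Kaplan-Meier estimate: running product of the conditional probabilities.
km$Survival = cumprod(km$Prob_not_Dying)

print(round(km, 4))
```

Because this version carries full precision from row to row, the last values come out as 0.5833, 0.3889 and 0.1944, slightly different in the fourth decimal from the hand-rounded 0.3886 and 0.1943 in the table.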


    1. Plot a survival curve for the probabilities you generate in part a.
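# Plot survival against the observed event times rather than the row index,
# and use a step function because the estimate only changes at event times.
krusty_chart = data %>% select(Time, Survival.Total)

ggplot(krusty_chart, aes(x=Time, y=Survival.Total)) + geom_step() + ylim(0, 1)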


Question 2

2. You are studying three different new drugs that may help slow the progress of La Traviata disease which compels people to sing opera until they exhaust themselves and die. Do the following:

    1. Draw a survival plot that shows the survival curves for all three drugs.
survival_data = read.csv("/Users/rudymartinez/Desktop/MSDA/Data-Analytics-Practicum-II-Spring-22/Exercises/Exercise 3/Question 2_Data.csv", header = TRUE)

head(survival_data)
##   Group Time Event
## 1     1  681     0
## 2     1  602     0
## 3     1  996     0
## 4     1 1162     0
## 5     1  833     0
## 6     1  477     0
survival_model = survfit(Surv(Time,Event)~Group, data=survival_data)

ggsurvplot(survival_model,
           conf.int=FALSE,
           pval=FALSE,
           risk.table=FALSE,
           legend.labs=c("Group 1", "Group 2", "Group 3"),
           legend.title="Groups:",
           palette=c("steelblue", "grey", "black"),
           title="Kaplan-Meier Curves",
           risk.table.height=.20)


    1. Test to see if overall there is an effect of any of the drugs on survival taken as a global set.
survival_effect = survdiff(Surv(Time,Event)~Group, data=survival_data)
survival_effect
## Call:
## survdiff(formula = Surv(Time, Event) ~ Group, data = survival_data)
## 
##          N Observed Expected (O-E)^2/E (O-E)^2/V
## Group=1 38       24     12.3     11.07     13.66
## Group=2 54       25     46.0      9.58     22.50
## Group=3 45       34     24.7      3.51      5.04
## 
##  Chisq= 25.7  on 2 degrees of freedom, p= 3e-06

Null Hypothesis: There is no difference in survival among the three drug groups.

Alternative Hypothesis: At least one of the three drug groups differs from the others in survival.

The log-rank Chi-Squared test statistic is 25.7 with 2 degrees of freedom, and the corresponding p-value is less than .05 (p= 3e-06). Therefore we reject the null hypothesis and conclude that survival differs significantly among the three drug groups.
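As a quick sanity check, the reported p-value can be recovered from the test statistic and degrees of freedom shown in the survdiff output. This is a small sketch using base R's `pchisq`; the variable names are introduced here for illustration.

```r
# Test statistic and degrees of freedom as printed by survdiff above.
chisq_stat = 25.7
deg_free   = 2

# Upper-tail probability of the chi-squared distribution.
p_value = pchisq(chisq_stat, df = deg_free, lower.tail = FALSE)

print(signif(p_value, 1))
```

The result rounds to the 3e-06 printed in the output, well below the .05 threshold.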


    1. Compare the survival curves for each of the three drugs with each other (three comparisons) and see if any of the curves are different from each other. Note that you should be sure to adjust for multiple group comparisons.

Group 1 and 2: Group 1 shows a steep drop in survival probability within the first 500 units of time, and its curve ends just after unit 1000. In contrast, Group 2 maintains a higher survival probability than Group 1 for the full duration of the study, with a gradual decrease over this period.

Group 2 and 3: Both groups still have surviving participants at the end of the study, but Group 3 shows a much steeper decrease in survival probability within the first 1000 units of time. Group 2 maintains a higher survival probability than Group 3.

Group 3 and 1: Both groups exhibit a steep decrease in survival probability within the first 500 units of time; however, Group 3 maintains a higher probability during this first segment. Between 500 and 1000 units of time, the decline in Group 1’s survival probability slows (a Kaplan-Meier curve can level off but never rise); however, the curve ends when Group 1 appears to have no more participants. From that point on, Group 3 retains a nonzero survival probability for the remainder of the study.
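The visual comparisons above could be backed by pairwise log-rank tests with a multiple-comparison adjustment, as the question asks. The sketch below runs `survdiff` on each pair of groups and applies a Bonferroni correction with `p.adjust`. It uses simulated placeholder data (`sim_data`, a hypothetical stand-in with the same column names as `survival_data`), so its numbers say nothing about the real drugs; swapping in `survival_data` would give the actual comparisons.

```r
library(survival)

# Simulated placeholder data with the same column names as survival_data.
# These values are illustrative only and are NOT the study data.
set.seed(1)
sim_data = data.frame(
  Group = rep(1:3, each = 40),
  Time  = ceiling(rexp(120, rate = rep(c(1/400, 1/1200, 1/700), each = 40))),
  Event = rbinom(120, 1, 0.7)
)

# All pairs of groups, one pair per column: (1,2), (1,3), (2,3).
pairs = combn(sort(unique(sim_data$Group)), 2)

# Log-rank test for each pair; two groups give 1 degree of freedom.
raw_p = apply(pairs, 2, function(g) {
  two_groups = subset(sim_data, Group %in% g)
  fit = survdiff(Surv(Time, Event) ~ Group, data = two_groups)
  pchisq(fit$chisq, df = 1, lower.tail = FALSE)
})

# Bonferroni adjustment for the three comparisons.
results = data.frame(
  comparison = paste("Group", pairs[1, ], "vs Group", pairs[2, ]),
  raw_p      = raw_p,
  adjusted_p = p.adjust(raw_p, method = "bonferroni")
)
print(results)
```

Bonferroni is chosen here only because it is the simplest adjustment to explain; `p.adjust` also supports other methods such as "holm" and "BH". The survminer package loaded earlier provides `pairwise_survdiff`, which wraps the same idea in a single call.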