Assignment

In this homework, you will use your own data set to conduct a statistical analysis of your own choice. Make sure to meet the following criteria:

  1. Use Zelig as your estimation and simulation tool;
  2. Try to use outcome variables that are either count, time to event, nominal, or ordinal variables (try not to repeat your earlier linear or logit models and learn something new);
  3. Read the relevant documentation on the Zelig homepage;
  4. Make an argument and then support it with statistical evidence and use all the relevant techniques you have learned so far to make it as convincing as you can. This homework is due on next Sunday (November 4).

Introduction

Organ donation remains a serious medical concern since there is a general acute shortage of organ donors, and people in need of a new organ often remain on waiting lists for long periods of times before a suitable donated organ can be found. Blood type is an important factor when it comes to organ transplants. Blood type AB, for example, is known as a universal recipient, since they can accept donor organs from all other blood types, though AB donors are limited to AB recipients only. Conversely, blood type O is considered a universal donor since it lacks the antigens of the other types – organs from Type-O can be accepted by someone with any blood type, but Type-O recipients can only accept from Type-O donors. Therefore, one can reasonably assume that people with Type O blood types will have the longest wait times on the liver transplant list, regardless of final disposition. Likewise, we might expect type AB to have the shortest wait times on the liver transplant list. The exponential regression model allows censorship

Data

This dataset is part of the survival package, and describes subjects on a liver transplant waiting list from 1990-1999. There are 815 observations, and 6 variables: age at addition to the waiting list, sex, blood type, year they entered the waiting list, time from entering to waiting list, and final dispoisition. The options for disposition are: received a transplant, died while waiting, withdrew from the list, or are still waiting (censored).

The exponential regression model allows for censorship. In this model, the dependent variable is the number of days spent on the organ transplant waiting list, represented by futime. The variable event represents the final disposition of the patient, which has 4 categories including censored. Ultimately, we are interested in wait time regardless of outcome, so the died while waiting, withdrew from list, or received transplant can serve as a “no longer waiting” category, while still waiting refers to those in the ‘censored’ variable.

Results

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
library(Zelig)
library(ZeligChoice)
library(faraway)
library(survival)
data(transplant)
tibble::glimpse(transplant)
## Observations: 815
## Variables: 6
## $ age    <dbl> 47, 55, 52, 40, 70, 66, 41, 55, 50, 61, 58, 61, 38, 37,...
## $ sex    <fct> m, m, m, f, m, f, f, m, m, m, m, m, f, m, f, m, m, f, f...
## $ abo    <fct> B, A, B, O, O, O, A, O, A, A, A, A, O, O, B, B, O, A, A...
## $ year   <dbl> 1994, 1991, 1996, 1995, 1996, 1996, 1998, 1998, 1990, 1...
## $ futime <dbl> 1197, 28, 85, 231, 1271, 58, 392, 30, 12, 139, 9, 451, ...
## $ event  <fct> death, ltx, ltx, ltx, censored, ltx, ltx, ltx, ltx, ltx...
str(transplant)
## 'data.frame':    815 obs. of  6 variables:
##  $ age   : num  47 55 52 40 70 66 41 55 50 61 ...
##  $ sex   : Factor w/ 2 levels "m","f": 1 1 1 2 1 2 2 1 1 1 ...
##  $ abo   : Factor w/ 4 levels "A","B","AB","O": 2 1 2 4 4 4 1 4 1 1 ...
##  $ year  : num  1994 1991 1996 1995 1996 ...
##  $ futime: num  1197 28 85 231 1271 ...
##  $ event : Factor w/ 4 levels "censored","death",..: 2 3 3 3 1 3 3 3 3 3 ...
summary(transplant)
##       age        sex     abo           year          futime      
##  Min.   :17.00   m:447   A :325   Min.   :1990   Min.   :   0.0  
##  1st Qu.:44.00   f:368   B :103   1st Qu.:1993   1st Qu.:  50.0  
##  Median :52.00           AB: 41   Median :1995   Median : 128.0  
##  Mean   :50.52           O :346   Mean   :1995   Mean   : 213.6  
##  3rd Qu.:58.00                    3rd Qu.:1997   3rd Qu.: 276.5  
##  Max.   :72.00                    Max.   :1999   Max.   :2055.0  
##  NA's   :18                                                      
##       event    
##  censored: 76  
##  death   : 66  
##  ltx     :636  
##  withdraw: 37  
##                
##                
## 
transplant$evcode[transplant$event=="death"] <- "2"
transplant$evcode[transplant$event=="ltx"] <- "1"
transplant$evcode[transplant$event=="withdraw"] <- "1"
transplant$evcode[transplant$event=="censored"] <- "1"
transplant$evcode <- factor(transplant$evcode)
head(transplant)
##   age sex abo year futime    event evcode
## 1  47   m   B 1994   1197    death      2
## 2  55   m   A 1991     28      ltx      1
## 3  52   m   B 1996     85      ltx      1
## 4  40   f   O 1995    231      ltx      1
## 5  70   m   O 1996   1271 censored      1
## 6  66   f   O 1996     58      ltx      1
str(transplant)
## 'data.frame':    815 obs. of  7 variables:
##  $ age   : num  47 55 52 40 70 66 41 55 50 61 ...
##  $ sex   : Factor w/ 2 levels "m","f": 1 1 1 2 1 2 2 1 1 1 ...
##  $ abo   : Factor w/ 4 levels "A","B","AB","O": 2 1 2 4 4 4 1 4 1 1 ...
##  $ year  : num  1994 1991 1996 1995 1996 ...
##  $ futime: num  1197 28 85 231 1271 ...
##  $ event : Factor w/ 4 levels "censored","death",..: 2 3 3 3 1 3 3 3 3 3 ...
##  $ evcode: Factor w/ 2 levels "1","2": 2 1 1 1 1 1 1 1 1 1 ...
head(transplant)
##   age sex abo year futime    event evcode
## 1  47   m   B 1994   1197    death      2
## 2  55   m   A 1991     28      ltx      1
## 3  52   m   B 1996     85      ltx      1
## 4  40   f   O 1995    231      ltx      1
## 5  70   m   O 1996   1271 censored      1
## 6  66   f   O 1996     58      ltx      1
transplant <- transplant %>%
  mutate(futime = as.integer(futime+1)) %>% 
  mutate(evcode = as.integer(sjmisc::rec(evcode, rec = "2=0; 1=1"))) %>% 
  select(futime, evcode, everything())
## Warning: package 'bindrcpp' was built under R version 3.4.4
head(transplant)
##   futime evcode age sex abo year    event
## 1   1198      0  47   m   B 1994    death
## 2     29      1  55   m   A 1991      ltx
## 3     86      1  52   m   B 1996      ltx
## 4    232      1  40   f   O 1995      ltx
## 5   1272      1  70   m   O 1996 censored
## 6     59      1  66   f   O 1996      ltx

Recoding futime and new variable evcode as integers. Added the +1 in the futime mutation because my time value included a zero entry, which made the surv formula not work. I’m pretty sure there is a better solution out there for eliminating the observations with a 0 in the futime column, but this was a quick/cheap way to fix it that doesn’t affect the rest of the data too much.

z.exp <- zelig(Surv(futime, evcode) ~ age + sex + abo,
               model = "exp", data = transplant, cite = F)
summary(z.exp)
## Model: 
## 
## Call:
## z5$zelig(formula = Surv(futime, evcode) ~ age + sex + abo, data = transplant)
##                Value Std. Error      z         p
## (Intercept)  4.95314    0.20195 24.526 7.76e-133
## age          0.00457    0.00379  1.205  2.28e-01
## sexf        -0.07811    0.07450 -1.048  2.94e-01
## aboB         0.32649    0.11946  2.733  6.28e-03
## aboAB       -0.10573    0.17437 -0.606  5.44e-01
## aboO         0.56767    0.08177  6.942  3.86e-12
## 
## Scale fixed at 1 
## 
## Exponential distribution
## Loglik(model)= -4693.6   Loglik(intercept only)= -4721.1
##  Chisq= 54.95 on 5 degrees of freedom, p= 1.3e-10 
## Number of Newton-Raphson Iterations: 5 
## n=797 (18 observations deleted due to missingness)
## 
## Next step: Use 'setx' method
x.low <- setx(z.exp, abo = "AB")
x.high <- setx(z.exp, abo = "O")
s.out <- sim(z.exp, x = x.low, x1 = x.high)
summary(s.out)
## 
##  sim x :
##  -----
## ev
##      mean       sd      50%     2.5%    97.5%
## 1 162.362 27.91202 159.5818 114.0414 226.4184
## pv
##          mean       sd      50%     2.5%    97.5%
## [1,] 155.1781 150.6581 114.7674 3.715012 584.3541
## 
##  sim x1 :
##  -----
## ev
##       mean       sd      50%     2.5%    97.5%
## 1 315.3864 20.90255 313.9083 278.7506 360.2394
## pv
##          mean      sd      50%     2.5%    97.5%
## [1,] 318.0286 306.839 238.0688 6.408346 1133.942
## fd
##       mean       sd      50%     2.5%   97.5%
## 1 153.0244 33.49507 155.5817 82.22952 215.316
s.out$graph()

Looking at the expected value and predicted value of waiting list time between blood type AB and type O, we see a longer duration for type O. The mean wait time for type AB is about 147 days, while type O has a mean of 308 days – it’s close to double the duration!

s.out$get_qi(xvalue="x1",qi="fd")%>%
  data.frame()%>%
  summary()
##        X1        
##  Min.   : 12.53  
##  1st Qu.:133.53  
##  Median :155.58  
##  Mean   :153.02  
##  3rd Qu.:174.75  
##  Max.   :291.15

The graphs are truly the most impressive means of communicating this information, but the numbers also tell the story pretty well. Looking at the mean expected value and interquartile Range for first difference between type AB and O, we can see a pretty huge range. Blood type O seems to be a real disadvantage on the liver transplant waiting list, lingering on the list between 45 and 264 days longerb than type AB. To put it in greater perspective, the minimum is about a month and half, and the max is close to 9 months!

Discussion

The dataset documentation makes it clear that the way that people move up and down along the transplant list has changed a lot since the 90s, and now accounts for other factors like how sick a person is and how much they need the transplant for survival. The shortage of available organs for transplant is still quite a big deal, however, so it’s still important to educate people about organ donation – and if this data is anything to go by, the greatest impact would be to focus on encouraging people with blood type O to become organ donors.