Data Description

Primary biliary cirrhosis (PBC) is a progressive and potentially fatal condition in which the bile ducts of the liver are destroyed. The bile ducts carry bile, which is needed for proper digestion of fats, for the elimination of bilirubin (a breakdown product of old red blood cells), and for detoxification. PBC therefore leads to a buildup of harmful substances in the body, as well as extensive irreversible scarring of the liver. Current research considers PBC an autoimmune disease, in which the immune system attacks the body's own tissue. This data set comes from the Mayo Clinic, where a double-blind, randomized, placebo-controlled trial conducted between 1974 and 1984 compared a potential treatment, D-penicillamine, with a placebo.

The data include 312 patients who were enrolled in the randomized clinical trial during this decade. An additional 112 patients did not enroll in the trial but did have some measurements recorded, for a total of 424 cases. These 112 patients have considerably more missing data. Six of them were lost to follow-up, which leaves a final count of 418 cases: 312 randomized and 106 nonrandomized.
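The case counts above can be checked directly in R. The following is a minimal sketch that assumes the copy of the Mayo PBC data bundled with R's survival package (`survival::pbc`) matches the file analysed in this report. In that copy, the treatment column `trt` is missing (`NA`) for the nonrandomized patients.

```r
# Assumption: survival::pbc contains the same 418 Mayo PBC cases used here
pbc_surv <- survival::pbc

n_total      <- nrow(pbc_surv)             # all cases with follow-up
n_randomized <- sum(!is.na(pbc_surv$trt))  # trt is recorded only for trial patients
n_nonrandom  <- sum(is.na(pbc_surv$trt))   # nonrandomized patients have trt = NA

c(total = n_total, randomized = n_randomized, nonrandomized = n_nonrandom)
```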

I chose to work with this data set because it is relevant to my own research. I work in a lab that studies embryonic liver development, with a particular focus on how our findings can be applied to adult pathologies such as non-alcoholic fatty liver disease, hepatocellular carcinoma, and cirrhosis.

The main objective of the trial was to measure the time to death or liver transplant, whichever came first, and to assess what effect D-penicillamine had on that time. Time to event is not the only variable in the data set, however: extensive clinical data were recorded for each patient. The full list of variables is available here: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/Cpbc.html. The variables I chose to work with are listed below:

# Table describing the variables used in this analysis (read.csv() is part of base R)
read.csv("pbcvariables.csv")
##      Name                            Description        Kind Missing
## 1    bili                Serum Bilirubin (mg/dl)  continuous       0
## 2 albumin                        Albumin (gm/dl)  continuous       0
## 3 fu_days Time to Death or Liver Transplantation  continuous       0
## 4     age                                    Age  continuous       0
## 5    drug                     Drug for treatment categorical       0
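library(foreign)  # provides read.dta() for Stata .dta files
# Adjust this path to the folder that contains the data file
setwd("/Users/abigailray/Desktop")
pbc <- read.dta("pbc (1).dta")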

Univariate Distributions

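library(lattice)
# Keep only the variables used in this analysis
redpbc <- subset(pbc, select = c(bili, albumin, sex, fu_days, age, drug))
# attach() lets later code refer to the columns (bili, fu_days, ...) by name
attach(redpbc)
bili_plot <- histogram(bili, xlab = "Serum bilirubin (mg/dl)")
bili_plot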

The bilirubin histogram is right-skewed: values span a broad range, but most patients have serum bilirubin below 10 mg/dl. The histogram of time to death or liver transplant is closer to symmetric than the bilirubin histogram. It is still skewed to the right, however, with most observations below 3000 days.
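The shape described above can also be checked numerically. When the mean is well above the median, the distribution has a long right tail. The following sketch assumes that `survival::pbc` (R's bundled copy of the Mayo data) matches the file used here, with its `time` column standing in for `fu_days`.

```r
# Assumption: survival::pbc matches this data set; `time` plays the role of fu_days
pbc_surv <- survival::pbc

# Mean versus median: a mean well above the median indicates right skew
shape_summary <- function(x) c(mean = mean(x), median = median(x))

bili_shape <- shape_summary(pbc_surv$bili)
time_shape <- shape_summary(pbc_surv$time)

# Share of patients below the thresholds mentioned in the text
prop_bili_below_10   <- mean(pbc_surv$bili < 10)
prop_time_below_3000 <- mean(pbc_surv$time < 3000)

bili_shape
time_shape
c(bili_below_10 = prop_bili_below_10, time_below_3000 = prop_time_below_3000)
```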

fu_days_plot <- histogram(fu_days, xlab = "Time to death or liver transplant (days)")
fu_days_plot

# Draw the two histograms side by side on one page
print(bili_plot, position = c(0, 0, 0.5, 1), more = TRUE)
print(fu_days_plot, position = c(0.5, 0, 1, 1))
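attach() is convenient, but it can silently mask other objects that share a column's name. A common alternative is to pass the data frame explicitly with `data =`. The following sketch shows the bilirubin histogram and a simple model written that way. To stay self-contained, it uses `survival::pbc`, assuming that copy matches the .dta file loaded above. In that copy the follow-up column is named `time` rather than `fu_days`.

```r
library(lattice)

# Assumption: survival::pbc is the same Mayo PBC data; `time` stands in for fu_days
pbc_surv <- survival::pbc

# Formula interface with an explicit data argument instead of attach()
bili_plot2 <- histogram(~ bili, data = pbc_surv,
                        xlab = "Serum bilirubin (mg/dl)")
print(bili_plot2)

# The same idea for models: name the data frame rather than relying on attach()
fit_bili <- lm(time ~ bili, data = pbc_surv)
coef(fit_bili)
```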

Pairs Plot

# plot() on a data frame draws a scatterplot matrix of every pair of columns
plot(redpbc)
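The pairs plot gives a visual impression of how the variables relate, and a correlation matrix puts a number on each pairwise linear association. The following sketch keeps only the numeric columns, because sex and drug are categorical. It uses `survival::pbc`, on the assumption that this copy matches the data analysed here, with `time` in place of `fu_days`.

```r
# Assumption: survival::pbc matches this data set; `time` stands in for fu_days
pbc_surv <- survival::pbc

# Numeric variables only; correlation is not defined for the factor columns
num_vars <- pbc_surv[, c("bili", "albumin", "time", "age")]

# Pearson correlations, rounded for readability
cor_mat <- round(cor(num_vars, use = "complete.obs"), 2)
cor_mat
```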

Simple Linear Regression

library(ggplot2)
lm1 <- lm(fu_days ~ bili)
summary(lm1)
## 
## Call:
## lm(formula = fu_days ~ bili)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1980.0  -752.2  -158.8   660.2  2733.4 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2243.87      61.32  36.595   <2e-16 ***
## bili         -101.24      11.24  -9.007   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1012 on 416 degrees of freedom
## Multiple R-squared:  0.1632, Adjusted R-squared:  0.1612 
## F-statistic: 81.12 on 1 and 416 DF,  p-value: < 2.2e-16
plot(lm1)  # diagnostics: residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage

ggplot(redpbc, aes(bili, fu_days)) + geom_point() + geom_smooth(method="lm", se=FALSE)
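Evaluating the fitted line from summary(lm1) makes the slope concrete and also exposes a limitation of a straight-line fit. The following sketch plugs a few bilirubin values into the printed coefficients. The values are illustrative and are not taken from specific patients.

```r
# Coefficients copied from summary(lm1) above
b0 <- 2243.87   # intercept (days)
b1 <- -101.24   # change in days per 1 mg/dl of bilirubin

fitted_days <- function(bili) b0 + b1 * bili

fitted_days(c(1, 5, 10))   # 2142.63 1737.67 1231.47

# Bilirubin level at which the line predicts 0 days
-b0 / b1                   # about 22.2 mg/dl
```

Any bilirubin value above roughly 22 mg/dl produces a negative predicted time, so the straight line should not be trusted at the upper end of the bilirubin range. With R-squared = 0.1632, bilirubin alone explains only about 16% of the variation in time to death or transplant.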

lm2 <- lm(fu_days ~ albumin)
summary(lm2)
## 
## Call:
## lm(formula = fu_days ~ albumin)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2730.6  -700.2  -142.0   597.2  3165.5 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1999.0      405.2  -4.933 1.17e-06 ***
## albumin       1119.9      115.0   9.737  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 998.1 on 416 degrees of freedom
## Multiple R-squared:  0.1856, Adjusted R-squared:  0.1837 
## F-statistic: 94.81 on 1 and 416 DF,  p-value: < 2.2e-16
plot(lm2)  # diagnostics: residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage

ggplot(redpbc, aes(albumin, fu_days)) + geom_point() + geom_smooth(method="lm", se=FALSE)
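The albumin model's negative intercept is easier to interpret once the fitted line is evaluated at realistic albumin levels. The following sketch plugs illustrative values into the coefficients printed by summary(lm2).

```r
# Coefficients copied from summary(lm2) above
a0 <- -1999.0   # intercept (days): the prediction at albumin = 0
a1 <- 1119.9    # change in days per 1 gm/dl of albumin

fitted_days_alb <- function(albumin) a0 + a1 * albumin

fitted_days_alb(c(3, 4))   # 1360.7 2480.6

# Albumin level below which the line predicts a negative time
-a0 / a1                   # about 1.8 gm/dl
```

The intercept of -1999 days is an extrapolation to an albumin level of 0, which is not physiologically possible, so only the slope carries meaning: each additional 1 gm/dl of albumin corresponds to about 1120 more days. Albumin alone (R-squared = 0.1856) explains slightly more of the variation than bilirubin alone.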


Multiple Linear Regression

mlr1 <- lm(fu_days ~ bili + albumin)
summary(mlr1)
## 
## Call:
## lm(formula = fu_days ~ bili + albumin)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2669.3  -683.5  -105.2   600.0  2996.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -907.22     418.04  -2.170   0.0306 *  
## bili          -74.69      11.11  -6.726 5.80e-11 ***
## albumin       876.52     115.18   7.610 1.86e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 948.9 on 415 degrees of freedom
## Multiple R-squared:  0.2657, Adjusted R-squared:  0.2621 
## F-statistic: 75.07 on 2 and 415 DF,  p-value: < 2.2e-16

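Holding albumin constant, each additional 1 mg/dl of serum bilirubin is associated with death or transplant occurring about 75 days sooner (estimate -74.69). Holding bilirubin constant, each additional 1 gm/dl of albumin is associated with about 877 more days (estimate 876.52) before death or transplant. The intercept (-907.22) is the predicted time for a patient whose bilirubin and albumin are both 0. Because an albumin level of 0 and a negative time are both impossible, the intercept has no practical interpretation here. Together, the two predictors explain about 27% of the variation in time to death or transplant (R-squared = 0.2657), compared with about 16% for bilirubin alone and 19% for albumin alone.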
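The following sketch shows how the two slopes combine. It evaluates the fitted multiple regression for a hypothetical patient (bilirubin 2 mg/dl, albumin 3.5 gm/dl; illustrative values only) using the coefficients printed by summary(mlr1). It then raises bilirubin by 1 mg/dl while holding albumin fixed.

```r
# Coefficients copied from summary(mlr1) above
c0     <- -907.22   # intercept (days)
c_bili <- -74.69    # days per 1 mg/dl of bilirubin, albumin held fixed
c_alb  <- 876.52    # days per 1 gm/dl of albumin, bilirubin held fixed

fitted_mlr <- function(bili, albumin) c0 + c_bili * bili + c_alb * albumin

p_base   <- fitted_mlr(bili = 2, albumin = 3.5)  # 2011.22 days
p_higher <- fitted_mlr(bili = 3, albumin = 3.5)  # one more mg/dl of bilirubin

p_higher - p_base   # -74.69, i.e. exactly the bilirubin coefficient
```

Within the report itself, predict(mlr1, newdata = data.frame(bili = 2, albumin = 3.5)) would return essentially the same value, computed from the unrounded coefficients.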

Data provided by: Mayo Clinic Primary Biliary Cirrhosis Data, from Fleming TR and Harrington DP (1991), Counting Processes and Survival Analysis, New York: Wiley, Appendix D. Courtesy of Dr. Terry Therneau, Mayo Clinic.