Primary biliary cirrhosis (PBC) is a potentially fatal condition in which the bile ducts of the liver are progressively destroyed. The bile ducts are necessary for proper digestion of fats, removal of damaged or old red blood cells, and detoxification. PBC leads to a buildup of harmful toxins in the body as well as extensive, irreversible scarring of the liver. Current research considers PBC an autoimmune disease, in which the body attacks its own tissue. This data set comes from the Mayo Clinic, where a double-blind trial conducted between 1974 and 1984 compared D-penicillamine, a potential treatment, against placebo.
The data include 312 patients who enrolled in the clinical trial during this decade. An additional 112 patients did not enroll in the trial but did have some measurements recorded, for 424 cases in total. Much of the data for these 112 patients is missing, however. Six patients in this group were lost to follow-up, leaving a final count of 418 cases: 312 randomized and 106 nonrandomized.
I chose to work with this data set because it is relevant to my own research. I work in a lab analyzing embryonic liver development, with a special focus on how our research can be applied to adult pathologies like non-alcoholic fatty liver disease, hepatocellular carcinoma, and cirrhosis.
The main objective of this experiment was to determine the time to death or liver transplant, whichever came first, and to observe the effect that D-penicillamine may have on that time. This time is not the only variable included, however; extensive data were recorded for each patient. The full list of variables can be viewed here: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/Cpbc.html. The variables I have chosen to work with are listed below:
# Summary table of the chosen variables (read.csv() is part of base R)
read.csv("pbcvariables.csv")
##      Name                            Description        Kind Missing
## 1    bili                Serum Bilirubin (mg/dl)  continuous       0
## 2 albumin                        Albumin (gm/dl)  continuous       0
## 3 fu_days Time to Death or Liver Transplantation  continuous       0
## 4     age                                    Age  continuous       0
## 5    drug                     Drug for treatment categorical       0
library(foreign)  # provides read.dta() for Stata files
setwd("/Users/abigailray/Desktop")  # machine-specific path; adjust to where the data file lives
pbc <- read.dta("pbc (1).dta")
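# Keep only the variables used in this analysis
redpbc <- subset(pbc, select = c(bili, albumin, sex, fu_days, age, drug))
attach(redpbc)  # lets later calls refer to columns (e.g. bili) by bare name
library(lattice)
# Histogram of serum bilirubin
bili_plot <- histogram(bili)
bili_plot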
The histogram of bilirubin is strongly right-skewed: most values fall below 10 mg/dl, with a long tail of higher values. The histogram of time to death or liver transplant is closer to a normal distribution than bilirubin, but it is also skewed to the right, with most of the data below the 3000-day mark.
# Histogram of follow-up time (lattice is already loaded above)
fu_days_plot <- histogram(fu_days)
fu_days_plot
# Show the two histograms side by side on one page
print(bili_plot, position = c(0, 0, 0.5, 1), more = TRUE)
print(fu_days_plot, position = c(0.5, 0, 1, 1))
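The direction of skew read off a histogram can also be checked numerically. The sketch below is only an illustration: the vector `x` holds made-up toy values (not PBC measurements), and `sample_skewness` is a hypothetical helper written for this example. It compares the mean with the median and computes a moment-based skewness, where a positive value indicates a longer right tail.

```r
# Toy values only (hypothetical, NOT taken from the PBC data): most values are
# small, with a few large ones forming a long right tail.
x <- c(1, 1, 2, 2, 3, 3, 4, 5, 9, 20)

# Moment-based sample skewness: the mean cubed deviation divided by the
# cubed standard deviation (both moments computed by dividing by n).
sample_skewness <- function(v) {
  v <- v[!is.na(v)]
  d <- v - mean(v)
  mean(d^3) / (mean(d^2))^1.5
}

skew <- sample_skewness(x)
direction <- if (skew > 0) {
  "right-skewed"
} else if (skew < 0) {
  "left-skewed"
} else {
  "symmetric"
}

cat("mean:", mean(x), " median:", median(x), "\n")
cat("skewness:", round(skew, 2), "->", direction, "\n")
```

Assuming the data frame from this report is loaded, the same helper could be applied as `sample_skewness(redpbc$bili)` and `sample_skewness(redpbc$fu_days)` to compare the two variables.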
# Scatterplot matrix of all selected variables
plot(redpbc)
library(ggplot2)
lm1 <- lm(fu_days ~ bili)
summary(lm1)
##
## Call:
## lm(formula = fu_days ~ bili)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1980.0  -752.2  -158.8   660.2  2733.4
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2243.87      61.32  36.595   <2e-16 ***
## bili         -101.24      11.24  -9.007   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1012 on 416 degrees of freedom
## Multiple R-squared: 0.1632, Adjusted R-squared: 0.1612
## F-statistic: 81.12 on 1 and 416 DF, p-value: < 2.2e-16
plot(lm1)
ggplot(redpbc, aes(bili, fu_days)) + geom_point() + geom_smooth(method="lm", se=FALSE)
lm2 <- lm(fu_days ~ albumin)
summary(lm2)
##
## Call:
## lm(formula = fu_days ~ albumin)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2730.6  -700.2  -142.0   597.2  3165.5
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -1999.0      405.2  -4.933 1.17e-06 ***
## albumin       1119.9      115.0   9.737  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 998.1 on 416 degrees of freedom
## Multiple R-squared: 0.1856, Adjusted R-squared: 0.1837
## F-statistic: 94.81 on 1 and 416 DF, p-value: < 2.2e-16
plot(lm2)
ggplot(redpbc, aes(albumin, fu_days)) + geom_point() + geom_smooth(method="lm", se=FALSE)
mlr1 <- lm(fu_days ~ bili + albumin)
summary(mlr1)
##
## Call:
## lm(formula = fu_days ~ bili + albumin)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2669.3  -683.5  -105.2   600.0  2996.7
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -907.22     418.04  -2.170   0.0306 *
## bili          -74.69      11.11  -6.726 5.80e-11 ***
## albumin       876.52     115.18   7.610 1.86e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 948.9 on 415 degrees of freedom
## Multiple R-squared: 0.2657, Adjusted R-squared: 0.2621
## F-statistic: 75.07 on 2 and 415 DF, p-value: < 2.2e-16
Holding albumin constant, each additional 1 mg/dl of bilirubin is associated with death or transplant about 75 days sooner. Holding bilirubin constant, each additional 1 gm/dl of albumin is associated with about 877 more days before death or transplant. The intercept of -907 days is the model's prediction for a patient with bilirubin and albumin levels of 0; such a patient is not physiologically realistic and a negative time is impossible, so the intercept has no meaningful clinical interpretation.
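To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below copies the three estimates from the `summary(mlr1)` output; the function name `predict_fu_days` and the two patient profiles are hypothetical, chosen only for illustration.

```r
# Estimates copied from the summary(mlr1) output above.
b0     <- -907.22   # intercept (days)
b_bili <-  -74.69   # change in days per 1 mg/dl of bilirubin
b_alb  <-  876.52   # change in days per 1 gm/dl of albumin

# Predicted time to death or transplant (days) under the fitted linear model.
predict_fu_days <- function(bili, albumin) {
  b0 + b_bili * bili + b_alb * albumin
}

# Two hypothetical patients who differ only by 1 mg/dl of bilirubin.
p1 <- predict_fu_days(bili = 1, albumin = 3.5)
p2 <- predict_fu_days(bili = 2, albumin = 3.5)

print(p1)                     # prediction for the first profile
print(p2 - p1)                # equals the bilirubin coefficient
print(predict_fu_days(0, 0))  # equals the intercept
```

With the fitted model object available, `predict(mlr1, newdata = data.frame(bili = 1, albumin = 3.5))` should give the same value as `p1`, up to the rounding of the printed coefficients.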
Data provided by: Mayo Clinic Primary Biliary Cirrhosis Data. From Fleming TR & Harrington DP (1991), Counting Processes & Survival Analysis. New York: Wiley; Appendix D. Courtesy of Dr. Terry Therneau, Mayo Clinic.