Phenobarbital is a benzodiazepine that is used to prevent convulsions in newborns. Pediatricians wonder whether the pharmacokinetics of phenobarbital in newborns that are treated with therapeutic hypothermia (TH) are the same for normal (normo-thermic) newborns. They have concentration (mg/L) – time (h) data available from patients that were treated with TH (=1) and patients of similar age that are not treated with TH (=0). All patients received a dose of 40 mg in this study.
Notes:
Read in the dataset Phenobarbital_TH.csv and describe
the design of the study in terms of number of individuals, treatments in
the study arms and number and timing of PK samples. Show the table using
DT (https://rstudio.github.io/DT/).
NB: Don’t change the file; all changes should be done
with R code
Pheno <- read_csv("C:/Users/fabia/OneDrive/Documenten/Fabian/Universiteit Leiden/BFW 3/Introduction to computational thinking/Phenobarbital_TH.csv")
## Rows: 210 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (4): ID, TIME, DV, TH
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(DT)
datatable(Pheno, colnames = c("ID", "Time (h)", "Concentration (mg/L)", "Treatment"), filter = 'top')
The concentration of phenobarbital was measured in 30 patients over the course of 48 hours. Concentration measurements were taken after administration at the following seven time points (in hours) for patients with therapeutic hypothermia (TH): 0, 0.5, 2, 4, 8, 24 and 48. For patients that do not suffer from TH the concentrations were measured at the following seven time points (again in hours): 0, 1, 2, 6, 12, 24 and 48. Patients with TH are indicated with a 1 in the treatment column, those without TH are indicated with a 0.
How many unique individuals are included in the data set for both treatment arms?
Pheno %>% filter(TIME == 0) %>% group_by(TH) %>% count()
## # A tibble: 2 × 2
## # Groups: TH [2]
## TH n
## <dbl> <int>
## 1 0 15
## 2 1 15
To get only one value per individual, the time was filtered on 0, as all participants had a measurement at this time point. The output shows that 15 patients were included in both groups.
Are there any differences between the pharmacokinetic profiles of phenobarbital patients treated with TH and patients that are not treated with TH? Base your assessment on a plot of the concentration-time data that includes all patients and uses different colors for patients with and without TH. Do this on a linear scale and semi-logarithmic scale.
ggplot(Pheno, aes(x = TIME, y = DV, colour = as.factor(TH), group = as.factor(ID))) + geom_line() + scale_color_manual(values = c("black", "grey")) + labs(y = "Concentration (mg/L)", x = "Time (h)", colour = "TH")
ggplot(Pheno, aes(x = TIME, y = DV, colour = as.factor(TH), group = as.factor(ID))) + geom_line() + scale_color_manual(values = c("black", "grey")) + scale_y_log10() + labs(y = "log(Concentration (mg/L))", x = "Time (h)", colour = "TH")
## Warning: Transformation introduced infinite values in continuous y-axis
It seems like the patients treated for TH remain on a higher concentration of phenobarbital after the initial uptake (the clearance appears to be slower). This can be seen in the profiles in both graphs, the grey lines remains higher over time than the black lines. This trend can be seen in both the normal and semi-logarithmic graph.
Can you detect any outlying data points? Make sure that you can differentiate ID numbers and treatment arms in the visualization. Do this on a linear scale and semi-logarithmic scale.
Hint: Fancy statistical outlier analysis not needed, with the correct visualization you should be able to identify the outliers by eye (also don’t worry too much about getting the correct ones, this question and the next are more about the removal and traceability than the actual points, however, you should explain why you think the points you removed are outliers)
ggplot(Pheno, aes(x = TIME, y = DV, colour = as.factor(ID), group = as.factor(ID))) + geom_line() + facet_wrap(~as.factor(TH)) + labs(y = "Concentration (mg/L)", x = "Time (h)", colour = "ID")
ggplot(Pheno, aes(x = TIME, y = DV, colour = as.factor(ID), group = as.factor(ID))) + geom_line() + facet_wrap(~as.factor(TH)) + scale_y_log10() + labs(y = "log(Concentration (mg/L))", x = "Time (h)", colour = "ID")
## Warning: Transformation introduced infinite values in continuous y-axis
From the graphs, two individuals seem to be outliers. The first one is 2018135 in TH group 0, its peak is much higher than every other patient. It seems unrealisticly high, therefore it is considered as an outlier. The second outlier is 2017154 from TH group 1. Around 6 hours after the administration of phenobarbital it shows a peak, which is much later than all the other patients. Then, the concentration drops drastically. Therefore, this is also considered as an outlier.
Make a subset of the dataset that does not include the outlying data points and use that dataset for the following questions. Make sure that it is clear in your script what subset of the data you have used (or what data points you have taken out), because traceability is very important!
Pheno2 <- Pheno %>% filter (!ID == 2018135, !ID == 2017154)
The new dataset contains every patient, except for 2018135 and 2017154.
What are the average Cmax (peak concentration), average Cmin (trough concentration), and average Cav (average concentration) for both treatment arms? Does it make sense to compare these values between both groups given the study design?
Hint: You need to summarize the output from summarize. (First summarize is per patient, second is summarizing over the patients)
#mean of the maxes
Pheno2 %>% filter(TH == 0) %>% group_by(ID) %>% summarize(max1 = max(DV)) %>% summarize(mean0 = mean(max1))
## # A tibble: 1 × 1
## mean0
## <dbl>
## 1 41.6
Pheno2 %>% filter(TH == 1) %>% group_by(ID) %>% summarize(max2 = max(DV)) %>% summarize(mean1 = mean(max2))
## # A tibble: 1 × 1
## mean1
## <dbl>
## 1 42.2
#mean of the mins
Pheno2 %>% filter(TH == 0) %>% filter(!TIME == 0) %>% group_by(ID) %>% summarize(min1 = min(DV)) %>% summarize(mean0 = mean(min1))
## # A tibble: 1 × 1
## mean0
## <dbl>
## 1 1.02
Pheno2 %>% filter(TH == 1) %>% filter(!TIME == 0) %>% group_by(ID) %>% summarize(min2 = min(DV)) %>% summarize(mean1 = mean(min2))
## # A tibble: 1 × 1
## mean1
## <dbl>
## 1 5.28
#mean of the averages
Pheno2 %>% filter(TH == 0) %>% group_by(ID) %>% summarize(av1 = mean(DV)) %>% summarize(mean0 = mean(av1))
## # A tibble: 1 × 1
## mean0
## <dbl>
## 1 18.4
Pheno2 %>% filter(TH == 1) %>% group_by(ID) %>% summarize(av2 = mean(DV)) %>% summarize(mean1 = mean(av2))
## # A tibble: 1 × 1
## mean1
## <dbl>
## 1 22.6
Comparing the maximum values makes sense, as this represents the distribution volume of the groups. The other two are not very purposeful to compare, as you want to know what the rate of clearance. To test that you do not need one value, but need to know the degree of decrease of the concentration
The graph of concentration-time data reveals a linear profile on a semi-logarithmic scale. In order to perform a linear regression on the data, add a new column of (natural) log-transformed values of the concentrations to the data set.
Pheno3 <- Pheno2 %>% mutate(lnDV = log(DV))
Pheno3$TH <- as.factor(Pheno3$TH)
Pheno3$ID <- as.factor(Pheno3$ID)
Pheno3[Pheno3 == "-Inf"] <- NA
Because the computer cannot read -Inf values, this value is changed to NA.
Fit a linear model to the ln-transformed concentration data, using both time and treatment as covariates. Make models with and without interaction and explain which model is better.
The peak concentration is mainly determined by the distribution volume of phenobarbital in this population, while the slope of the profiles is mainly determined by the clearance. Do you anticipate differences in distribution volume and clearance between patients on TH treatment and patients that are not on TH treatment, given the outcome of the model fit?
Hint: Which model should you use for your anticipation?
Bonus: Using the data_grid() and
gather_predictions() functions from the modelr package, can
you show the difference between the (predictions of the) two models in a
plot?
summary(lm(lnDV~TIME+TH, data = Pheno3))
##
## Call:
## lm(formula = lnDV ~ TIME + TH, data = Pheno3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.98507 -0.25437 0.01175 0.25735 2.09440
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.51008 0.06974 50.34 <2e-16 ***
## TIME -0.06637 0.00247 -26.87 <2e-16 ***
## TH1 0.42559 0.08247 5.16 7e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5342 on 165 degrees of freedom
## (28 observations deleted due to missingness)
## Multiple R-squared: 0.8213, Adjusted R-squared: 0.8191
## F-statistic: 379.2 on 2 and 165 DF, p-value: < 2.2e-16
summary(lm(lnDV~TIME*TH, data = Pheno3))
##
## Call:
## lm(formula = lnDV ~ TIME * TH, data = Pheno3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.02324 -0.16937 0.01345 0.18409 1.53444
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.783440 0.067585 55.980 < 2e-16 ***
## TIME -0.084006 0.002990 -28.093 < 2e-16 ***
## TH1 -0.088154 0.093511 -0.943 0.347
## TIME:TH1 0.034310 0.004171 8.226 5.66e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4508 on 164 degrees of freedom
## (28 observations deleted due to missingness)
## Multiple R-squared: 0.8735, Adjusted R-squared: 0.8712
## F-statistic: 377.5 on 3 and 164 DF, p-value: < 2.2e-16
The distribution volume is determined by the intercept. The model with interaction does not show a significant difference between TH0 and TH1, while the model without interaction does. The difference in clearance is determined by the interaction value between Time and TH1. This is a significant value, so there could be a difference between the clearance of the two groups.
Fit a linear model to the ln-transformed concentration data of all individual patients in both treatment arms and make a table that includes ID number, treatment arm and the concentration at time = 0 (time0).
Hints:
Pheno3_nest <-
Pheno3 %>%
group_by(ID) %>%
nest()
Pheno3_nest1 <- Pheno3_nest %>%
rowwise() %>%
mutate(model = list(lm(lnDV ~ TIME, data = data)))
Pheno3_nest2 <- Pheno3_nest1 %>%
rowwise() %>%
mutate(
coef_intercept = model$coefficients[[1]],
coef_x = model$coefficients[[2]])
Pheno3_nest3 <- Pheno3_nest2 %>% group_by(ID) %>% unnest(cols = data) %>% select(ID, TH, coef_intercept)
Pheno_mooi <- Pheno3_nest3[!duplicated(Pheno3_nest3), ]
datatable(Pheno_mooi)
The concentration at time zero, \(time0 \approx \frac{dose}{distribution volume}\). Assume that the distribution volume in the population is normally distributed. Can you determine whether there is a statistically significant difference in the distribution volume of phenobarbital for patients with TH treatment and without TH treatment?
Pheno_mooi2 <- Pheno_mooi %>% mutate(Time0 = 40/coef_intercept)
t.test(Pheno_mooi2$Time0[1:14], Pheno_mooi2$Time0[15:28])
##
## Welch Two Sample t-test
##
## data: Pheno_mooi2$Time0[1:14] and Pheno_mooi2$Time0[15:28]
## t = 1.4916, df = 25.997, p-value = 0.1478
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.09518002 0.59869751
## sample estimates:
## mean of x mean of y
## 10.84125 10.58949
To test the difference in the distribution volume, a t-test is performed on the Time0 of both groups. The p-value is higher than 0.05, so there is no statistical difference in distribution volume of patients with TH and without TH treatment.
In conclusion, the initial data seemed to show a significant difference in clearance of the drug, when looking at it in the graph and grouping it by TH group. However, there did not seem to be a difference in the distribution volume, as the peaks did not seem to be higher for the two different groups. After removing two outliers with concentration values that seemed too high, statistical tests were performed on the groups. Making a linear model out of the (log transformed) data makes it possible to compare the slope and the intercept. The intercept represents the distribution volume, the slope represents the clearance. The model without interaction shows a significant difference in intercept, but the model with does not. The difference in slope is significant, which would mean that the clearance between the groups is significant. When the intercept was tested with a t-test, it showed that there was not a significant difference in intercept. Therefore, it can be assumed that the distribution volume in the two patient groups is the same.