Introduction

More effective treatment regimens for human immunodeficiency virus (HIV) in recent years has led to a greater number of virally suppressed cases in the United States. However, it remains that tuberculous (TB) is the leading cause of death among those living with HIV. Both HIV and TB are easily treated independently but treatment in conjunction by rifampicin, isoniazid, ethambutol, or pyrazinamide with combination antiretroviral therapy (cART) has potential to cause hepatotoxicity. Thus, given a high burden of mortality within the first two months of TB treatment among HIV patients, a randomized clinical trial was undertaken by Amogne et al. to assess the optimal time to initiate cART after TB treatment. Presented here is an attempt to reconstruct findings from the original publication, particularly Figure 2 and Table 2, to verify Kaplan-Meier survival and hazard ratio estimates.1

Methods

To reproduce both Figure 2 and Table 2, data from the supplement of the original publication was downloaded in the form of an Excel file. The file was read into R version 3.3.2 using the readxl package and column names were formatted by regular expression syntax and assigned using the magrittr package.2, 3, 4 Variables were coerced to appropriate types using the dplyr package.5 All syntax used for data manipulation is available on GitHub via the following link, https://github.com/schifferl/assignment3.

Survival as a function of time was estimated by the Kaplan-Meier product limit estimator as shown in Equation 1, whereby the probability of survival was given by the difference of the random variable \(T\) and \(t\), the observed survival at a given time. A right-continuous distribution of survival is given by the integral from \(t\) to infinity where \(f(x)\ dx\) represents the change in survival at a given time \(t\). The method was implemented in R using the survival package. 6, 7

\[S(t) = \Pr\{ T \ge t \} = 1 - F(t) = \int_t^\infty f(x)\ dx\]

Equation 1 - Survival as a Function of Time

Similarly, proportional hazards were estimated by the Cox proportional hazards model assuming only a semi-parametric distribution of baseline hazard within the population. That is, as the limit of survival went to zero, the hazard at time \(t\) was given by the probability that it was less or greater than \(T\), which represented the population distribution of hazard. Again, the method was implemented in R using the survival package and can be seen in Equation 2. 8, 9

\[h(t) = \lim_{s\rightarrow0} \frac{\Pr\{t \le T < t + s\ |\ T \ge t \} }{s}\]

Equation 2 - Proportional Hazard as a Function of Time

Finally, the survminer package was used to plot Kaplan-Meier curves and three custom functions using the knitr package were written to abstract the results of fitted objects and statistical tests into tables.10, 11, 12, 13

Results

It was possible to reproduce Figure 2 of the original publication and its corresponding risk set accurately and completely, as can be seen in Figure 1 below. Likewise, it was possible to replicate the chi-squared statistics for the log-rank test by week of randomization, as can be seen in Table 1 below. Among the risk strata, initiation of cART at week eight following TB therapy yielded the highest cumulative probability of survival, followed by initiation at week four, and finally with week one giving the lowest cumulative probability of survival. However, as indicated by the log-rank chi-squared test, the differences were not found to be statistically significant (P = 0.235).

Figure 1 - Survival by Week of Randomization

Chi-Squared Degrees of Freedom P-Value
2.895 2.000 0.235

Table 1 - Chi-Squared Statistics for Log-Rank Test by Week of Randomization

As for Table 2 of the original publication, in the absence of further methodological details concerning hepatotoxicity outcomes it was not possible to reproduce the findings accurately. Although it was possible to reconstruct Cox proportional hazards models concerning survival outcomes that were verified against the computations of another statistician. Of note, the univariate models by week of randomization and body mass index (BMI) did accurately reproduce the findings of the original publication; however, all other figures deviated significantly in comparison. Both univariate and multivariate analysis are shown below in Table 2 and Table 3, respectively.

Hazard Ratio 95% CI Lower Limit 95% CI Upper Limit P-Value
CD4_0 < 50 cells/μL 1.966 1.200 3.222 0.007
ALBUMIN < 3 g/dL 2.039 1.238 3.358 0.005
RANDOM week four 1.178 0.617 2.248 0.620
RANDOM week one 1.640 0.894 3.010 0.110
BMI 0.900 0.821 0.987 0.025

Table 2 - Univariate Analysis of Proportional Hazards

As relates to univariate analysis, reproduced findings were as follows. Baseline CD4 count less than 50 cells / µL was found to have a hazard ratio of 1.97 (95% CI 1.20 – 3.22; P < 0.01) as compared to baseline CD4 count of 51-199 cells / µL. Serum albumin less than 3 g / dL was found to have a hazard ratio of 2.04 (95% CI 1.24 – 3.36; P < 0.01) as compared to serum albumin greater than or equal to 3 g / dL. Concerning week of randomization, with week eight as the reference category, week four was found to have a hazard ratio of 1.18 (95% CI 0.62 – 2.25; P = 0.62) and week one was found to have a hazard ratio of 1.64 (95% CI 0.89 – 3.01; P = 0.11). Finally, BMI, computed as a continuous variable, was found to have a hazard ratio of 0.90 (95% CI 0.82 – 0.99; P = 0.03).

Hazard Ratio 95% CI Lower Limit 95% CI Upper Limit P-Value
CD4_0 < 50 cells/μL 1.690 1.023 2.792 0.041
ALBUMIN < 3 g/dL 1.749 1.045 2.926 0.033
RANDOM week four 1.196 0.625 2.288 0.588
RANDOM week one 1.667 0.901 3.084 0.104
BMI 0.918 0.835 1.010 0.079

Table 3 - Multivariate Analysis of Proportional Hazards

As relates to multivariate analysis, reproduced findings were as follows. Baseline CD4 count less than 50 cells / µL was found to have a hazard ratio of 1.69 (95% CI 1.02 – 2.79; P = 0.04) as compared to baseline CD4 count of 51-199 cells / µL. Serum albumin less than 3 g / dL was found to have a hazard ratio of 1.75 (95% CI 1.05 – 2.93; P = 0.03) as compared to serum albumin greater than or equal to 3 g / dL. Concerning week of randomization, with week eight as the reference category, week four was found to have a hazard ratio of 1.20 (95% CI 0.63 – 2.29; P = 0.59) and week one was found to have a hazard ratio of 1.67 (95% CI 0.90 – 3.08; P = 0.10). Finally, BMI, computed as a continuous variable, was found to have a hazard ratio of 0.92 (95% CI 0.84 – 1.01; P = 0.08).

Discussion

Through this reproduction it was found that results of the original publication were only ostensibly reproducible, suffering from a number of methodological shortcomings and syntactical issues. The Amogne et al. article would have done well to have included further information concerning the calculation of hepatotoxicity outcomes and how these related to the interruption of TB therapy. In the absence of such information, the hepatotoxicity outcomes had to be ignored and the mortality outcomes calculated in the relation to the other variables alone. A further barrier to the reproduction of the original findings was the formatting of the supplementary file, in which column names were written without concern for naming conventions or consistency. This represents an issue that is pervasive in research and indicates a need for further training of researchers in data science.

Regardless of issues impacting reproducibility, reproduction indicated similar findings concerning week of randomization, with later initiation of cART associated with higher survival. Thus, given a high burden of mortality within the first two months of TB treatment among HIV patients, these and the findings of Amogne et al. indicate that treatment of TB in HIV patients by rifampicin, isoniazid, ethambutol, or pyrazinamide is more safe and effective when cART is initiated after the first week of TB treatment. Even though the findings were not particularly strong, further study in the area should consider the burden of excess mortality that can be avoided by initiating cART after the first week of TB treatment.

References


  1. Amogne, W. et al. Efficacy and Safety of Antiretroviral Therapy Initiated One Week after Tuberculosis Therapy in Patients with CD4 Counts < 200 Cells/μL: TB-HAART Study, a Randomized Clinical Trial. PLoS ONE 10, e0122587 (2015)

  2. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

  3. Hadley Wickham (2016). readxl: Read Excel Files. R package version 0.1.1. https://CRAN.R-project.org/package=readxl

  4. Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr

  5. Hadley Wickham and Romain Francois (2016). dplyr: A Grammar of Data Manipulation. R package version 0.5.0. https://CRAN.R-project.org/package=dplyr

  6. Therneau T (2015). A Package for Survival Analysis in S. version 2.38, https://CRAN.R-project.org/package=survival.

  7. Terry M. Therneau and Patricia M. Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York. ISBN 0-387-98784-3.

  8. Therneau T (2015). A Package for Survival Analysis in S. version 2.38, https://CRAN.R-project.org/package=survival.

  9. Terry M. Therneau and Patricia M. Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York. ISBN 0-387-98784-3.

  10. Alboukadel Kassambara and Marcin Kosinski (2016). survminer: Drawing Survival Curves using ‘ggplot2’. R package version 0.2.4. https://CRAN.R-project.org/package=survminer

  11. Yihui Xie (2016). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.15.1.

  12. Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963

  13. Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595