library(bookdown)
library(minpack.lm)
library(readxl)
library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)
library(ggpubr)
library(tibble)
library(ggsci)
library(latex2exp)
library(AICcmodavg)
library(vtreat)
library(kableExtra)
library(broom)
# automatically create a bib database for R packages
knitr::write_bib(c(
  .packages(), 'bookdown', 'knitr', 'rmarkdown'
), 'packages.bib')

Figure 0.1: Statistical analysis roadmap.
An inevitable difficulty in the comparative analysis of variably parameterized binding models lies in the differences in degrees of freedom that accompany model complexity. For this reason, the binding models were evaluated with the Akaike Information Criterion (\(AIC\)), which accounts for goodness-of-fit as quantified by the \(RMSE\) of the residuals while correcting for complexity differences between binding models as determined by the number of parameters. The \(AIC\) approximates the information-theoretic Kullback-Leibler (KL) distance, which captures the information lost when a model is used to approximate the true data-generating process. Hence, the KL divergence between a model and the data-generating process can be used as a measure of performance for that particular model. In general, this approach can be thought of as a restricted regression. The idea is to evaluate the set of binding models by computing \(AIC\) scores and aligning its members on an ordinal scale with respect to the top-ranked, best-describing model based on \(AIC\) differences.
\[\begin{equation} AIC_c = n\,\log\left(\frac{\sum\limits_{i=1}^n \hat{\epsilon}^2_i}{n}\right)+2k+\frac{2k(k+1)}{n-k-1} \tag{2.1} \end{equation}\]
\[\begin{equation} \Delta_i=AIC_{c,i}-AIC_{c,min} \tag{2.2} \end{equation}\]
\[\begin{equation} \omega_i=\frac{\exp(-\frac{1}{2}\Delta_i)}{\sum\limits_{r=1}^R \exp(-\frac{1}{2}\Delta_r)} \tag{2.3} \end{equation}\]
\[\begin{equation} \frac{\omega_i}{\omega_j} \tag{2.4} \end{equation}\]
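Equations (2.1) to (2.4) translate directly into a few lines of R. The following is a minimal sketch, not the analysis code itself; the residual vector, the function name aicc_from_resid, and the three illustrative \(AIC_c\) scores are hypothetical placeholders used only to show how the differences, Akaike weights, and evidence ratios are obtained.
# Minimal sketch of equations (2.1)-(2.4); 'resid' is a hypothetical residual
# vector, 'k' the number of estimated parameters, 'n' the sample size.
aicc_from_resid <- function(resid, k) {
  n <- length(resid)
  n * log(sum(resid^2) / n) + 2 * k + (2 * k * (k + 1)) / (n - k - 1)   # equation (2.1)
}
aicc  <- c(m1 = -52.4, m2 = -49.0, m3 = -47.6)     # illustrative AICc scores
delta <- aicc - min(aicc)                          # equation (2.2)
omega <- exp(-delta / 2) / sum(exp(-delta / 2))    # equation (2.3), Akaike weights
omega["m1"] / omega["m2"]                          # equation (2.4), evidence ratio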
Figure 2.1: Level of empirical support for binding model \(i\). Shown is a plausibility key expressed in terms of \(AIC_c\) differences (\(\Delta_i\)) as described by Anderson and Burnham (2004) based on empirical results. \(0 \leq \Delta_i \leq 2\): substantial support for binding model \(i\). \(4 \leq \Delta_i \leq 7\): considerably less support for binding model \(i\). \(\Delta_i > 10\): essentially no support for binding model \(i\).
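The plausibility key can also be written as a small helper function. This is an illustrative addition (support_level is not part of the original analysis) that maps \(\Delta_i\) values onto the support categories listed above.
# Illustrative helper translating the plausibility key into support labels;
# delta values falling between the listed intervals are left unclassified.
support_level <- function(delta) {
  dplyr::case_when(
    delta <= 2               ~ "substantial support",
    delta >= 4 & delta <= 7  ~ "considerably less support",
    delta > 10               ~ "essentially no support",
    TRUE                     ~ "unclassified"
  )
}
support_level(c(0, 3.35, 4.71, 6.20))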
The bm1to2.deg.add model (blue diamond) was found to be the KL best model in the set since it received the lowest \(AIC_c\) score, amounting to -52.4. \(AIC_c\) differences (equation (2.2)) were then calculated between the top-ranked bm1to2.deg.add model (\(\Delta_i\) = 0) and all other competing models (figure 3.1B). In addition, Akaike weights (equation (2.3)) for the bm1to2.deg.add model and the remaining models in the set were computed. The bm1to2.deg.add model received 70.8% of the total weight of evidence of being the actual KL best model considering all models in the set. Moreover, the bm1to2.deg.add model was the only one found to have substantial support of being the KL best model, since no other model received a \(\Delta_i\) value of less than 2 (see plausibility key, figure 2.1). The bm1to2.coop.add model is ranked second best in the set with an \(AIC_c\) score of -49.0, corresponding to a \(\Delta_i\) value of 3.35, which accounts for a weight of evidence of 13.3%. In terms of the evidence ratio (equation (2.4)), the bm1to2.deg.add model is thus 5.3 times more likely to be the KL best model than the bm1to2.coop.add model. At the other end of the ranking lies the bm1to1 model, which received the highest \(AIC_c\) score (-46.2) and a weight of evidence of only 3.19% of being the KL best model in the set; it is therefore considerably more likely that the data were generated by the bm1to2.deg.add model than by the bm1to1 model.

####.......Loading.....#####
bm1to1_nls <- readRDS("Output/model1.rds")
bm1to2.deg.add_nls <- readRDS("Output/model2.rds")
bm1to2.deg_nls <- readRDS("Output/model3.rds")
bm1to2.coop.add_nls <- readRDS("Output/model4.rds")
bm1to2.coop_nls <- readRDS("Output/model5.rds")
fi <- readRDS("Output/fi.rds")
rmse <- readRDS("Output/rmse.rds")
####..........model performance analysis by aicc..............####
model.list <- list(bm1to1 = bm1to1_nls,
bm1to2.deg.add = bm1to2.deg.add_nls,
bm1to2.deg = bm1to2.deg_nls,
bm1to2.coop.add = bm1to2.coop.add_nls,
bm1to2.coop = bm1to2.coop_nls)
aicc_score <- model.list %>%
  map(`[[`, 1) %>%              # the first list element holds the fitted nls object
  map_dbl(~ AICc(.x))           # second-order AIC from AICcmodavg
k <- model.list %>%
  map(`[[`, 1) %>%
  map_dbl(~ length(coef(.x)))   # number of estimated model parameters
model.aicc <- tibble(model = names(model.list),
                     k = k + 1,          # +1 for the estimated residual variance
                     RMSE = rmse,
                     aicc = aicc_score)
model.aicc <- model.aicc %>%
  mutate(del_aicc = aicc - min(aicc),                                  # equation (2.2)
         aicc_wt = (exp(-del_aicc/2)/sum(exp(-del_aicc/2)))*100) %>%   # equation (2.3), in %
  arrange(del_aicc) %>%
  mutate(cum_wt = cumsum(aicc_wt))                                     # cumulative weight of evidence
model.aicc.best <- model.aicc[1,]
model.aicc.reality <- model.aicc %>%
  select(model, aicc)
model.aicc.reality[6, 1] <- "reality"   # pseudo-entry for the data-generating process
model.aicc.reality[6, 2] <- -60         # arbitrary score used as the radar plot origin
only.reality <- model.aicc.reality[6, ]
only.best.model <- model.aicc.reality %>%
  filter(model == "bm1to2.deg.add")

Figure 3.1: Ranking of binding models according to their AICc scores. A: AICc scores evaluating model performances in describing the ThiITm•\(tRNA^{Phe}_{Bs}\) titration profile are highlighted as radial grid lines in the radar plot. They represent approximated KL distances on an ordinal scale with respect to the true data-generating process (“reality”, red diamond). “Reality” represents the origin of the plot and is arbitrarily set to -60 for visualization purposes. Models were ranked based on their AICc scores, with the bm1to2.deg.add model (blue diamond) being the top-ranked model that received the smallest AICc score (-52.4) and therefore best describes the binding curve. B: Binding models were further ranked based on AICc differences (\(\Delta_i\)) with respect to the top-ranked bm1to2.deg.add model (\(\Delta_i = 0\)), so that “reality”, which enters as a constant for all models, cancels out. The calculated \(\Delta_i\) values point towards the bm1to2.coop.add model as the most competitive model (\(\Delta_i = 3.3\)) in the set with respect to the top-ranked bm1to2.deg.add model.
| \(Model\) | \(k\) | \(RMSE\) | \(AIC_c\) | \(\Delta_i\) | \(\omega_i\,(\%)\) | \(CUSUM\,(\%)\) |
|---|---|---|---|---|---|---|
| bm1to2.deg.add | 3 | 0.0205627 | -52.35595 | 0.000000 | 70.847168 | 70.84717 |
| bm1to2.coop.add | 4 | 0.0204736 | -49.01018 | 3.345772 | 13.298334 | 84.14550 |
| bm1to2.deg | 4 | 0.0216753 | -47.64134 | 4.714610 | 6.707457 | 90.85296 |
| bm1to2.coop | 5 | 0.0178691 | -47.40347 | 4.952481 | 5.955316 | 96.80828 |
| bm1to1 | 3 | 0.0266239 | -46.15603 | 6.199927 | 3.191725 | 100.00000 |
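The table can be produced directly from the model.aicc tibble; one possible rendering chunk, assuming the column layout built above (it is not necessarily the chunk used originally), is:
# Possible rendering of the ranking table with knitr/kableExtra (sketch).
model.aicc %>%
  knitr::kable(
    col.names = c("Model", "k", "RMSE", "AICc", "Delta_i", "omega_i (%)", "CUSUM (%)"),
    digits = 5
  ) %>%
  kableExtra::kable_styling(full_width = FALSE)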
Model selection based on the \(AIC_c\) efficiently and reliably trades off goodness-of-fit against model complexity. The global bm1to2.coop model, for instance, which is known to be over-parameterized and was ranked first according to the \(RMSE\) of the residuals, drops to fourth place when the \(AIC_c\) score is the basis. Conversely, the bm1to2.deg.add model, formerly ranked third according to the \(RMSE\), is top-ranked based on the \(AIC_c\) score. By re-expressing \(AIC_c\) differences in terms of evidence weights as well as evidence ratios, the relative strength of support for each of the binding models in the set could be assessed. The bm1to2.deg.add model is the actual KL best model that most accurately captures the ThiITm•tRNA\(\mathrm{^{Phe}_{Bs}}\) binding profile (\(\omega_i = 70.8 \%\)). It is currently \(\frac{\omega_{bm1to2.deg.add}}{\omega_{bm1to2.coop}} = 11.9\) times more likely that the data were generated by the bm1to2.deg.add model than by the global bm1to2.coop model. The bm1to2.coop.add model is the most competitive model in the set (\(\Delta_i\) = 3.3). It may, however, remain elusive whether the KL divergence between this model and the top-ranked bm1to2.deg.add model translates into a strength of evidence large enough to convincingly support a single best data-describing model in the set when the \(AIC_c\) alone is used as the basis for evaluation. With respect to the binding process, this means that no distinction can yet be made as to whether tRNA\(\mathrm{^{Phe}_{Bs}}\) binding to ThiITm is best described as purely statistical or as positively cooperative.

However, valuable information about the underlying binding stoichiometry of the ThiITm•tRNA\(\mathrm{^{Phe}_{Bs}}\) complex could be gained from the cumulative weights of evidence (\(CUSUM\)). The binding models ranked in places one to four, namely the bm1to2.deg.add, bm1to2.coop.add, bm1to2.deg, and bm1to2.coop models, together unite a weight of evidence of 96.8% that the KL best model is among the two-site binding models, as opposed to the one-site bm1to1 model, whose weight of evidence lags far behind (3.2%).
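The evidence ratios and cumulative weights quoted above follow directly from the aicc_wt column; a brief sketch, assuming the model.aicc tibble built earlier, is:
# Evidence ratios and cumulative two-site weight, computed from model.aicc.
w <- setNames(model.aicc$aicc_wt, model.aicc$model)
w["bm1to2.deg.add"] / w["bm1to2.coop"]       # ~11.9 vs. the global 1:2 cooperative model
w["bm1to2.deg.add"] / w["bm1to2.coop.add"]   # ~5.3 vs. the closest competitor
sum(w[names(w) != "bm1to1"])                 # ~96.8 %: joint weight of the two-site models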