Predictors (X): Homologous pre-vaccinated HAI (D0, composite), Age, Sex, BMI, year, year of participants, prior vaccination history, formula changed (Yes/No)
Individually
H1N1 formula change
library(nlme)  # provides lme()
fit_ind_H1N1chg <- lme(comp_post_homo ~ H1N1_fmlchg, random = ~ 1 | ID, data = mydata)
summary(fit_ind_H1N1chg)$tTable
Outcome (Y): Composite heterologous post-vaccinated HAI (D21/28) = log2(H1N1_GMT of all heterologous strains) + log2(H3N2_GMT of all heterologous strains) + log2(Yama_GMT of all heterologous strains) + log2(Vict_GMT of all heterologous strains)
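As a rough illustration of the composite outcome defined above, the sketch below sums the log2 geometric mean titres (GMT) over the four subtypes for a single participant. The titre values and the `titres` list are hypothetical, and reading "GMT of all heterologous strains" as the geometric mean across strains within a subtype is an assumption.

```r
# Hypothetical post-vaccination titres for one participant:
# two heterologous strains per subtype (values are made up).
titres <- list(
  H1N1 = c(40, 160),
  H3N2 = c(80, 80),
  Yama = c(20, 80),
  Vict = c(10, 40)
)

# Geometric mean titre across the strains of one subtype (assumed definition).
gmt <- function(x) exp(mean(log(x)))

# Composite = sum over subtypes of log2(GMT).
log2_gmt  <- sapply(titres, function(x) log2(gmt(x)))
composite <- sum(log2_gmt)

print(round(log2_gmt, 3))
print(composite)
```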
Predictors (X): Homologous pre-vaccinated HAI (D0, composite), Heterologous pre-vaccinated HAI (D0, composite), Age, Sex, BMI, year, year of participants, prior vaccination history, formula changed (composite; Yes/No)
Individually
H1N1 formula change
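library(dplyr)  # provides %>%
library(tidyr)  # provides drop_na()
# Keep only rows with complete heterologous outcome, heterologous baseline, and BMI
mydata_heter <- mydata %>% drop_na(comp_post_heter, comp_pre_heter, BMI)
fit_ind_H1N1chg <- lme(comp_post_heter ~ H1N1_fmlchg, random = ~ 1 | ID, data = mydata_heter)
summary(fit_ind_H1N1chg)$tTable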
To evaluate predictive performance, the data were divided into two subsets: a training sample and a testing sample.
Because the data are longitudinal, the first 70% of each participant's observations formed the training sample and the remainder formed the testing sample.
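The within-participant split can be sketched as follows. The `visits` data frame, its columns, and the rounding rule (floor of 70%, with at least one training row) are assumptions made for illustration; the text only specifies "the first 70%".

```r
# Hypothetical long-format data: one row per visit.
set.seed(1)
visits <- data.frame(
  ID   = rep(1:3, times = c(10, 5, 3)),
  time = c(1:10, 1:5, 1:3)
)
visits$y <- rnorm(nrow(visits))

# Sort by participant and time so "first 70%" means the earliest visits.
visits <- visits[order(visits$ID, visits$time), ]

# Position of each row within its participant, and rows per participant.
rank_in_id <- ave(visits$time, visits$ID, FUN = seq_along)
n_in_id    <- ave(visits$time, visits$ID, FUN = length)

# Assumed rounding rule: floor(0.7 * n), but never fewer than one training row.
n_train    <- pmax(1, floor(0.7 * n_in_id))
visits$set <- ifelse(rank_in_id <= n_train, "train", "test")

train <- visits[visits$set == "train", ]
test  <- visits[visits$set == "test", ]
print(table(visits$ID, visits$set))
```

Splitting by time within each participant, rather than sampling rows at random, keeps every test observation later than the training observations for the same ID.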
Methods comparison
Linear mixed effects model
Piecewise linear mixed effects model (used when a single line is not sufficient to model the data)
Tree-based methods: mixed-effects regression trees and random forests
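To make the difference between the first two methods in the list above concrete, the sketch below fits a linear and a piecewise linear mixed effects model with `nlme::lme` on simulated data. The data, the knot at `time = 3`, and the variable names are all hypothetical; in practice the change point would need to be chosen or estimated.

```r
library(nlme)

set.seed(42)
n_id <- 30
d <- expand.grid(time = 0:6, ID = factor(seq_len(n_id)))
b <- rnorm(n_id, sd = 1)                       # simulated random intercepts

# Simulated outcome whose slope changes at time = 3.
d$y <- 2 + 1.0 * d$time - 1.5 * pmax(d$time - 3, 0) +
  b[as.integer(d$ID)] + rnorm(nrow(d), sd = 0.5)

knot <- 3                                      # hypothetical change point
d$time_after <- pmax(d$time - knot, 0)         # hinge term: 0 before the knot

# Single-line model versus piecewise model, both with a random intercept.
fit_lin <- lme(y ~ time, random = ~ 1 | ID, data = d, method = "ML")
fit_pw  <- lme(y ~ time + time_after, random = ~ 1 | ID, data = d, method = "ML")

print(fixef(fit_pw))
print(AIC(fit_lin, fit_pw))
```

The coefficient on `time_after` is the change in slope after the knot, so the slope before the knot is the `time` coefficient and the slope after it is the sum of the two.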
We evaluated the generalization performance of each model in the training and testing samples.
The following criteria will be used to compare performance:
mean squared error (MSE)
mean absolute error (MAE)
mean absolute prediction error (MAPE)
coefficient of determination (\(R^2\))
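A minimal sketch of three of these criteria is given below, using the usual definitions. The function names and the toy vectors are illustrative only; MAPE is left out because the text does not give its formula.

```r
# obs = observed values, pred = model predictions (same length).
mse <- function(obs, pred) mean((obs - pred)^2)
mae <- function(obs, pred) mean(abs(obs - pred))
r2  <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Toy example with made-up numbers.
obs  <- c(3, 5, 7, 9)
pred <- c(2, 5, 8, 9)
print(c(MSE = mse(obs, pred), MAE = mae(obs, pred), R2 = r2(obs, pred)))
```

Applying the same functions to the training and testing samples separately gives a direct view of how much performance drops out of sample.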
Variable importance
We will use a permutation procedure with 100 iterations to quantify the importance of each variable for prediction.
First, the original MAE is obtained from the predictions on the original dataset. In each permutation, the values of one variable are shuffled while the other variables are held fixed.
Each variable is permuted 100 times, and a new MAE is computed after each permutation.
The mean difference between the new MAEs and the original MAE is taken as the importance criterion for that variable.
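The permutation procedure can be sketched as below. To keep the example self-contained it uses an ordinary `lm` fit on simulated data as a stand-in for the mixed effects and tree-based models; the helper `perm_importance` and all variable names are hypothetical.

```r
set.seed(7)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 3 * dat$x1 + rnorm(n, sd = 0.5)       # x2 is pure noise by construction
fit <- lm(y ~ x1 + x2, data = dat)             # stand-in model (assumption)

mae <- function(obs, pred) mean(abs(obs - pred))

perm_importance <- function(fit, data, outcome, vars, n_perm = 100) {
  # Original MAE from predictions on the unpermuted data.
  base_mae <- mae(data[[outcome]], predict(fit, newdata = data))
  sapply(vars, function(v) {
    diffs <- replicate(n_perm, {
      shuffled <- data
      shuffled[[v]] <- sample(shuffled[[v]])   # permute one variable only
      mae(data[[outcome]], predict(fit, newdata = shuffled)) - base_mae
    })
    mean(diffs)                                # mean increase in MAE
  })
}

imp <- perm_importance(fit, dat, "y", c("x1", "x2"))
print(round(imp, 3))
```

A variable the model relies on shows a large mean increase in MAE when shuffled, while an irrelevant variable stays close to zero.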
Vaccine Dynamic prediction – Methodology
Two papers I recently read.
Functional varying coefficient models for longitudinal data (JASA, 2010) (Şentürk and Müller 2010)
Brief Introduction
It provides a versatile and flexible analysis tool for relating longitudinal responses to longitudinal predictors.
Specifically, this approach provides a novel representation of varying coefficient functions through suitable auto and cross-covariances of the underlying stochastic processes, which is particularly advantageous for sparse and irregular designs.
The proposed approach extends the varying coefficient models to a more general setting, in which not only current but also recent past values of the predictor time course may have an impact on the current value of the response time course.
The influence of past predictor values is modeled by a smooth history index function, while the effects on the response are described by smooth varying coefficient functions.
Illustration
The proposed functional varying coefficient model
\[E\{Y(t)|X(t)\}=\beta_0(t)+\beta_1(t)\int^\Delta_0 \gamma(u)X(t-u)du\] for \(t \in [\Delta, T]\) with a suitable \(T>0\).
In this model, the current value of the response process \(Y(t)\) at time \(t\) depends on the recent history of the predictor process \(X\) in a sliding window of length \(\Delta\).
The history index function \(\gamma(\cdot)\) defines the history index, that is, the integral term that multiplies \(\beta_1(t)\), and quantifies the influence of the recent history of the predictor values on the response.
The varying coefficient function \(\beta_1(\cdot)\) represents the magnitude of this influence as a function of time.
Functions \(\gamma\), \(\beta_1\), and the intercept function \(\beta_0\) are assumed to be smooth.
An assumption implicit in this model is that the history index function \(\gamma\) itself does not change over time. This leads to a clear separation of time effects, encoded in \(\beta_1\), and history effects, encoded in \(\gamma\), thus decomposing the functional regression of \(Y\) on \(X\) into these two one-dimensional component functions.
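For intuition about how the model combines \(\gamma\), \(\beta_0\), and \(\beta_1\), the sketch below evaluates the conditional mean for a densely observed predictor path using the trapezoidal rule. Every function and value here (`gamma`, `X`, `beta0`, `beta1`, `delta`) is a hypothetical choice for illustration, not an estimate from the paper.

```r
# Trapezoidal rule on a grid (u, f(u)).
trapz <- function(u, f) sum(diff(u) * (head(f, -1) + tail(f, -1)) / 2)

delta <- 2                                      # window length (assumed)
u     <- seq(0, delta, by = 0.01)               # lag grid on [0, delta]
gamma <- function(u) rep(1 / delta, length(u))  # hypothetical: uniform weights
X     <- function(t) 2 * t                      # hypothetical predictor path
beta0 <- function(t) 1                          # hypothetical intercept function
beta1 <- function(t) 0.5                        # hypothetical coefficient function

# History index: integral over [0, delta] of gamma(u) * X(t - u) du.
history_index <- function(t) trapz(u, gamma(u) * X(t - u))

# Conditional mean: beta0(t) + beta1(t) * history index.
cond_mean <- function(t) beta0(t) + beta1(t) * history_index(t)

print(history_index(5))
print(cond_mean(5))
```

With uniform weights the history index is simply the average of \(X\) over the window \([t-\Delta, t]\); a decreasing \(\gamma\) would instead give more weight to the most recent predictor values.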
Second step:
Once \(\gamma(\cdot)\) has been estimated, the model reduces to a varying coefficient model. Note that even if \(\gamma(\cdot)\) is assumed to be known, obtaining the predictor of the reduced varying coefficient model, that is, \(\int_0^\Delta \gamma(u)X(t-u)\,du\), may not be straightforward: the measurements of the predictor process in the history window \([t-\Delta,t]\) are sparse, which renders numerical integration infeasible.
They use functional estimation tools to develop a novel procedure for this step. Existing methods such as polynomial splines, smoothing splines, and local polynomial smoothing may perform poorly under sparse designs.
The key for their functional approach is to target the covariance structure of \(X\) and cross-covariance structure of \(X\) and \(Y\); estimates of these covariance surfaces behave well even under sparse designs.
Nonparametric predictive model for sparse and irregular longitudinal data (Biometrics, 2024) (Wang et al. 2024)
Background
In the literature on functional data analysis, many studies have focused on predicting mean response profiles from repeated measures of functional predictors.
Although these methods make regression on functional predictors more flexible, the complexity of estimation increases substantially:
Estimation of smooth functions in the additive model
Choice of basis functions and knots
Regularization
This complexity may cause convergence issues, especially with sparse and irregular longitudinal data.
As an alternative, they propose a novel nonparametric approach with a kernel-weighted estimator.
Brief Introduction
The core idea of the kernel regression is that the prediction of the response variable over time is weighted to reflect the similarity of the predictors' histories.
Because longitudinal observations are often measured irregularly, at time points that differ among subjects, measurement-wise distances between subjects cannot be used directly. Instead, they adopt the \(L_2\) metric on individual predictor trajectories, estimated by functional principal component analysis, to measure the similarity between histories of repeatedly measured predictors.
The methodology assumes that the more similar the predictors’ trajectories are to each other, the more likely their response trajectories travel in a similar fashion.
To deal with the curse of dimensionality that multiple predictors cause in purely nonparametric approaches, they propose a multiplicative model with a multivariate Gaussian kernel function.
This model is capable of achieving dimension reduction as well as selecting functional covariates with predictive significance.
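The kernel-weighting idea described above can be sketched for a single predictor as follows. This is a simplified Nadaraya-Watson style illustration on simulated, fully observed trajectories, not the multiplicative multi-predictor estimator of Wang et al.; the simulated data, the bandwidth `h`, and the helper names are assumptions, and the trajectories are taken as already reconstructed on a common grid.

```r
set.seed(3)
grid <- seq(0, 1, length.out = 50)              # common time grid
n    <- 40
a    <- runif(n, -2, 2)                         # subject-specific slopes

# Rows = subjects. Predictor trajectories X_i(t) = a_i * t, assumed to be
# already reconstructed on the grid (the paper uses FPCA for this step).
Xmat <- outer(a, grid)
# Simulated responses that follow the predictor shape, plus small noise.
Ymat <- sin(Xmat) + matrix(rnorm(n * length(grid), sd = 0.05), n)

# Squared L2 distance between a new trajectory x and each row of Xmat,
# approximated by a Riemann sum on the grid.
l2_sq <- function(x, Xmat, grid) {
  step <- grid[2] - grid[1]
  rowSums(sweep(Xmat, 2, x)^2) * step
}

# Kernel-weighted average of response trajectories (Gaussian kernel).
predict_traj <- function(x_new, Xmat, Ymat, grid, h = 0.2) {
  w <- exp(-l2_sq(x_new, Xmat, grid) / (2 * h^2))
  colSums(w * Ymat) / sum(w)
}

a_new  <- 1.2
y_hat  <- predict_traj(a_new * grid, Xmat, Ymat, grid)
y_true <- sin(a_new * grid)

kernel_err <- mean(abs(y_hat - y_true))
naive_err  <- mean(abs(colMeans(Ymat) - y_true))  # unweighted average
print(round(c(kernel = kernel_err, naive = naive_err), 3))
```

Subjects whose predictor trajectories lie close to the new one in \(L_2\) distance dominate the weighted average, which is why the kernel prediction tracks the new subject's response better than the unweighted mean.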
References
Şentürk, Damla, and Hans-Georg Müller. 2010. “Functional Varying Coefficient Models for Longitudinal Data.” Journal of the American Statistical Association 105 (491): 1256–64. https://doi.org/10.1198/jasa.2010.tm09228.
Wang, Shixuan, Seonjin Kim, Hyunkeun Ryan Cho, and Won Chang. 2024. “Nonparametric Predictive Model for Sparse and Irregular Longitudinal Data.” Biometrics 80 (1). https://doi.org/10.1093/biomtc/ujad023.