Shen Lab Meeting 05/01

Vaccine Project – Application paper

New predictors

Whether or not the vaccine formula changed compared with last year (four separate covariates for four strains)

Exploratory analysis

Implement linear mixed model with random intercept:

Only one predictor for each model.
Put all variables together into the model.
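
For reference, the random-intercept model behind both variants can be written in the standard form below. The symbols \(b_i\), \(\sigma_b^2\), and \(\sigma^2\) are notation introduced here for illustration and do not come from the analysis code; \(i\) indexes participants (ID) and \(j\) indexes a participant's repeated observations.

\[Y_{ij} = \beta_0 + \mathbf{x}_{ij}^\top \boldsymbol{\beta} + b_i + \varepsilon_{ij}, \qquad b_i \sim N(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2)\]

In the single-predictor models \(\mathbf{x}_{ij}\) holds one covariate; in the full model it holds all of them.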

Homologous responses

Outcome (Y): Composite homologous post-vaccination HAI (D21/28) = log2(H1N1) + log2(H3N2) + log2(Yama) + log2(Vict)
Predictors (X): Homologous pre-vaccination HAI (D0, composite), Age, Sex, BMI, year, years of participation, prior vaccination history, formula changed (Yes/No)
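
A minimal sketch of how the composite outcome could be assembled, assuming one row per participant-season. The toy data and the titer column names (`H1N1_post`, `H3N2_post`, `Yama_post`, `Vict_post`) are hypothetical and are not the names used in the actual dataset.

```r
# Hypothetical toy data: raw post-vaccination HAI titers, one row per participant-season.
toy <- data.frame(
  ID        = c(1, 1, 2),
  H1N1_post = c(40, 80, 160),
  H3N2_post = c(20, 40, 320),
  Yama_post = c(80, 80, 40),
  Vict_post = c(10, 20, 640)
)

# Composite homologous outcome: sum of log2 titers over the four strains.
titer_cols <- c("H1N1_post", "H3N2_post", "Yama_post", "Vict_post")
toy$comp_post_homo <- unname(rowSums(log2(as.matrix(toy[, titer_cols]))))
```

Summing log2 titers is the same as taking log2 of the product of the four titers, so a one-unit change in the composite corresponds to a doubling of that product.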

Individually

H1N1 formula change

library(nlme)  # provides lme()
fit_ind_H1N1chg <- lme(comp_post_homo ~ H1N1_fmlchg, random = ~1|ID, data = mydata)
summary(fit_ind_H1N1chg)$tTable

                 Value Std.Error   DF    t-value    p-value
(Intercept) 26.2923683 0.2078873 1227 126.474116 0.00000000
H1N1_fmlchg -0.3374622 0.1756038 1227  -1.921725 0.05487177

H3N2 formula change

fit_ind_H3N2chg <- lme(comp_post_homo ~ H3N2_fmlchg, random = ~1|ID, data = mydata)
summary(fit_ind_H3N2chg)$tTable

                Value Std.Error   DF   t-value      p-value
(Intercept) 29.392967 0.2552670 1227 115.14599 0.000000e+00
H3N2_fmlchg -3.871537 0.2268083 1227 -17.06965 8.808298e-59

Victoria formula change

fit_ind_Victchg <- lme(comp_post_homo ~ Vict_fmlchg, random = ~1|ID, data = mydata)
summary(fit_ind_Victchg)$tTable

                Value Std.Error   DF    t-value      p-value
(Intercept) 26.581665 0.1827535 1227 145.450937 0.000000e+00
Vict_fmlchg -1.293879 0.1540816 1227  -8.397366 1.245756e-16

Multiple variables

fit_comp_homo <- lme(comp_post_homo ~ Age + Gender + BMI + comp_pre_homo  + years_of_part + history + H1N1_fmlchg + H3N2_fmlchg + Vict_fmlchg, random = ~1|ID, data = mydata_rm.na)
summary(fit_comp_homo)$tTable

                    Value   Std.Error   DF     t-value       p-value
(Intercept)   25.11512418 0.747143861 1211  33.6148438 1.563266e-175
Age           -0.07612985 0.005766635 1211 -13.2017817  2.814554e-37
GenderM       -0.24853585 0.221328583 1211  -1.1229270  2.616911e-01
BMI            0.05542235 0.015914997 1211   3.4823980  5.146557e-04
comp_pre_homo  0.37810821 0.020178820 1211  18.7378750  5.431599e-69
years_of_part -0.02779429 0.060997217 1211  -0.4556648  6.487127e-01
history       -3.21545175 0.208491894 1211 -15.4224305  3.932607e-49
H1N1_fmlchg   -1.19405298 0.184845030 1211  -6.4597516  1.515555e-10
H3N2_fmlchg   -2.28248164 0.240875156 1211  -9.4757869  1.340547e-20
Vict_fmlchg   -0.59613842 0.165223935 1211  -3.6080633  3.210674e-04

Individual effects for formula change

H1N1

fit_H1N1chg_homo <- lme(H1N1_post_homo ~ H1N1_fmlchg + H1N1_pre_homo, random = ~1|ID, data = mydata)
summary(fit_H1N1chg_homo)$tTable

                   Value  Std.Error   DF   t-value       p-value
(Intercept)    4.1418596 0.11818101 1226 35.046744 5.362546e-187
H1N1_fmlchg   -0.3342229 0.06415713 1226 -5.209443  2.220047e-07
H1N1_pre_homo  0.4971611 0.01908929 1226 26.043983 2.236341e-119

H3N2

fit_H3N2chg_homo <- lme(H3N2_post_homo ~ H3N2_fmlchg + H3N2_pre_homo, random = ~1|ID, data = mydata)
summary(fit_H3N2chg_homo)$tTable

                   Value  Std.Error   DF   t-value       p-value
(Intercept)    4.0919463 0.13967872 1226 29.295416 1.913594e-143
H3N2_fmlchg   -0.1756029 0.09125600 1226 -1.924289  5.454961e-02
H3N2_pre_homo  0.5297709 0.01756893 1226 30.153852 6.831319e-150

Vict

fit_Victchg_homo <- lme(Vict_post_homo ~ Vict_fmlchg + Vict_pre_homo, random = ~1|ID, data = mydata)
summary(fit_Victchg_homo)$tTable

                   Value  Std.Error   DF   t-value       p-value
(Intercept)    3.9928381 0.10952991 1226 36.454316 1.051694e-197
Vict_fmlchg   -0.3448097 0.05847964 1226 -5.896236  4.802501e-09
Vict_pre_homo  0.4840010 0.01951466 1226 24.801916 2.198458e-110
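
To help read these coefficients, the sketch below back-transforms the three formula-change estimates, assuming the strain-specific outcomes are log2 titers (the notes do not state the scale of `H1N1_post_homo` and the other outcomes explicitly). Under that assumption, \(2^{\beta}\) is the multiplicative change in titer associated with a formula change, holding the pre-vaccination titer fixed.

```r
# Formula-change coefficients copied from the three strain-specific tables above.
coef_fmlchg <- c(H1N1 = -0.3342229, H3N2 = -0.1756029, Vict = -0.3448097)

# If the outcome is a log2 titer, 2^beta is the multiplicative (fold) change in titer.
fold_change <- 2^coef_fmlchg
round(fold_change, 2)
```

All three estimates are negative, so each fold change is below 1.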

Heterologous responses

Outcome (Y): Composite heterologous post-vaccination HAI (D21/28) = log2(H1N1_GMT of all heterologous strains) + log2(H3N2_GMT of all heterologous strains) + log2(Yama_GMT of all heterologous strains) + log2(Vict_GMT of all heterologous strains)
Predictors (X): Homologous pre-vaccination HAI (D0, composite), Heterologous pre-vaccination HAI (D0, composite), Age, Sex, BMI, year, years of participation, prior vaccination history, formula changed (composite; Yes/No)

Individually

H1N1 formula change

library(dplyr); library(tidyr)  # for %>% and drop_na()
mydata_heter <- mydata %>% drop_na(comp_post_heter, comp_pre_heter, BMI)
fit_ind_H1N1chg <- lme(comp_post_heter ~ H1N1_fmlchg, random = ~1|ID, data = mydata_heter)
summary(fit_ind_H1N1chg)$tTable

                Value Std.Error  DF   t-value      p-value
(Intercept) 21.620017 0.1930256 830 112.00597 0.000000e+00
H1N1_fmlchg -2.019703 0.1685105 830 -11.98562 1.213728e-30

H3N2 formula change

fit_ind_H3N2chg <- lme(comp_post_heter ~ H3N2_fmlchg, random = ~1|ID, data = mydata_heter)
summary(fit_ind_H3N2chg)$tTable

               Value Std.Error  DF   t-value      p-value
(Intercept) 22.60097 0.2494797 830  90.59242 0.000000e+00
H3N2_fmlchg -2.61195 0.2285381 830 -11.42895 3.393231e-28

Victoria formula change

fit_ind_Victchg <- lme(comp_post_heter ~ Vict_fmlchg, random = ~1|ID, data = mydata_heter)
summary(fit_ind_Victchg)$tTable

                Value Std.Error  DF    t-value      p-value
(Intercept) 20.830698 0.1918451 830 108.580805 0.000000e+00
Vict_fmlchg -0.731336 0.1720935 830  -4.249644 2.383835e-05

Multiple variables

fit_comp_heter <- lme(comp_post_heter ~ Age + Gender + BMI + comp_pre_homo  + comp_pre_heter + years_of_part + history + H1N1_fmlchg + H3N2_fmlchg + Vict_fmlchg, random = ~1|ID, data = mydata_heter)
summary(fit_comp_heter)$tTable

                     Value   Std.Error  DF    t-value       p-value
(Intercept)    16.66027439 0.667755610 821  24.949658 1.060853e-102
Age            -0.05901610 0.004713775 821 -12.519922  4.887145e-33
GenderM        -0.08117910 0.176158055 821  -0.460831  6.450419e-01
BMI             0.03286951 0.012549828 821   2.619120  8.978310e-03
comp_pre_homo  -0.15070708 0.027000183 821  -5.581706  3.236530e-08
comp_pre_heter  0.70772289 0.034960689 821  20.243391  3.155301e-74
years_of_part  -0.41829894 0.051054151 821  -8.193240  9.718452e-16
history        -1.81793200 0.182232158 821  -9.975912  3.356632e-22
H1N1_fmlchg    -1.10283040 0.161669349 821  -6.821518  1.747747e-11
H3N2_fmlchg    -0.81728000 0.204837797 821  -3.989889  7.201117e-05
Vict_fmlchg    -0.49113531 0.134770924 821  -3.644223  2.850616e-04

Individual effects for formula change

H1N1

fit_H1N1chg_heter <- lme(H1N1_post_heter ~ H1N1_fmlchg + H1N1_pre_heter, random = ~1|ID, data = mydata_heter)
summary(fit_H1N1chg_heter)$tTable

(Output to be regenerated: the table pasted here came from summary(fit_H1N1chg_homo) and duplicated the homologous H1N1 results above, with DF = 1226 and an H1N1_pre_homo term.)

H3N2

fit_H3N2chg_heter <- lme(H3N2_post_heter ~ H3N2_fmlchg + H3N2_pre_heter, random = ~1|ID, data = mydata_heter)
summary(fit_H3N2chg_heter)$tTable

                   Value  Std.Error  DF   t-value       p-value
(Intercept)    1.5090344 0.09929053 829 15.198170  3.355520e-46
H3N2_fmlchg    0.3582428 0.05921830 829  6.049529  2.197160e-09
H3N2_pre_heter 0.8243381 0.01624540 829 50.742854 1.748897e-256

Vict

fit_Victchg_heter <- lme(Vict_post_heter ~ Vict_fmlchg + Vict_pre_homo, random = ~1|ID, data = mydata_heter)
summary(fit_Victchg_heter)$tTable

                  Value  Std.Error  DF   t-value       p-value
(Intercept)   4.0589042 0.12405918 829 32.717485 2.078727e-151
Vict_fmlchg   0.3338928 0.06230179 829  5.359281  1.083603e-07
Vict_pre_homo 0.3593616 0.02154984 829 16.675835  4.667443e-54

Longitudinal prediction framework (Ongoing):

Training and Test split

To evaluate predictive performance, the data are divided into two subsets: a training sample and a testing sample.
Because the data are longitudinal, the first 70% of each participant's observations (in time order) form the training sample, and the remaining observations form the testing sample.
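
The per-participant split can be sketched in base R as follows. The data frame `long` and its columns (`ID`, `time`, `y`) are hypothetical stand-ins for the real longitudinal data, and rounding the 70% cutoff down with `floor()` is an assumption, since the notes do not say how fractional cutoffs are handled.

```r
# Hypothetical long-format data: one row per participant-visit.
set.seed(1)
long <- data.frame(
  ID   = rep(1:3, times = c(10, 5, 4)),
  time = c(1:10, 1:5, 1:4)
)
long$y <- rnorm(nrow(long))

# Order by participant and time so "first 70%" means the earliest visits.
long <- long[order(long$ID, long$time), ]

# Within each participant, rank the visits and flag the first 70% as training.
visit_rank <- ave(long$time, long$ID, FUN = seq_along)
n_visits   <- ave(long$time, long$ID, FUN = length)
long$set   <- ifelse(visit_rank <= floor(0.7 * n_visits), "train", "test")

train <- long[long$set == "train", ]
test  <- long[long$set == "test", ]
```

With this rule a participant with a single visit contributes nothing to the training sample, so participants with very few visits may need separate handling.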

Methods comparison

Linear mixed effects model
Piecewise linear mixed effects model (for when a single line is not sufficient to describe the trend)
Tree-based methods: mixed-effects regression trees and random forest
Support-vector machine incorporating mixed effects
Neural networks
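
As an illustration of the second method in the list, a piecewise linear mixed model with one knot can be fitted with `lme()` by adding a hinge term. This is a sketch on simulated data: the knot location, the design, and all variable names (`sim`, `time_after`, `fit_pw`) are hypothetical, and in practice the knot would have to be chosen or estimated.

```r
library(nlme)

set.seed(3)
n_id  <- 40
times <- 0:6
knot  <- 3                                   # hypothetical knot location
sim   <- expand.grid(time = times, ID = factor(seq_len(n_id)))
b0    <- rnorm(n_id, sd = 1)                 # subject-specific random intercepts

# Hinge term: zero before the knot, linear after it.
sim$time_after <- pmax(sim$time - knot, 0)
sim$y <- 5 + b0[as.integer(sim$ID)] + 1.0 * sim$time - 1.5 * sim$time_after +
  rnorm(nrow(sim), sd = 0.3)

# Random-intercept model with a change in slope at the knot.
fit_pw <- lme(y ~ time + time_after, random = ~ 1 | ID, data = sim)
fixef(fit_pw)   # slope before the knot: `time`; slope after: `time + time_after`
```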

Evaluation Criteria

We will evaluate the generalization performance of each model in the training and testing samples.

The following criteria will be used to compare performance:

mean squared error (MSE)
mean absolute error (MAE)
mean absolute percentage error (MAPE)
coefficient of determination (R²)
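
The four criteria can be written as small R functions. This is a sketch rather than the project's evaluation code: MAPE is taken here to be the mean absolute percentage error (an assumption about the intended definition), and R² is computed as 1 - SSE/SST.

```r
# Prediction-error metrics for observed values y and predictions y_hat.
mse <- function(y, y_hat) mean((y - y_hat)^2)
mae <- function(y, y_hat) mean(abs(y - y_hat))

# MAPE as mean absolute percentage error; undefined when any y equals 0.
mape <- function(y, y_hat) 100 * mean(abs((y - y_hat) / y))

# R^2 as 1 - SSE/SST; on a testing sample this can be negative.
r2 <- function(y, y_hat) 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)

# Small made-up example.
y     <- c(4, 5, 6, 8)
y_hat <- c(4, 6, 5, 8)
c(MSE = mse(y, y_hat), MAE = mae(y, y_hat), MAPE = mape(y, y_hat), R2 = r2(y, y_hat))
```

Since the composite outcomes are sums of log2 titers and stay well away from zero, the division in MAPE should be safe for them, but it would need care for any outcome that can equal zero.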

Variable importance

We will use a permutation procedure with 100 iterations to quantify the importance of each variable for prediction.

In the permutation procedure, one variable is permuted while the others are held fixed. The original MAE is obtained from predictions on the original dataset.
Each variable is then permuted 100 times, and a new MAE is obtained from each permutation.
The mean difference between the new and original MAEs is taken as the importance measure for that variable.
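
The procedure can be sketched as below. To keep the example self-contained it uses an ordinary `lm()` fit on simulated data in place of the mixed-effects and machine-learning models being compared; the data, the variable names, and the helper `perm_importance()` are all hypothetical.

```r
set.seed(42)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n, sd = 0.5)   # x2 is pure noise by construction

fit <- lm(y ~ x1 + x2, data = dat)
mae <- function(y, y_hat) mean(abs(y - y_hat))

# Original MAE from predictions on the unpermuted data.
mae_orig <- mae(dat$y, predict(fit, newdata = dat))

# Importance of one variable: mean increase in MAE over n_perm permutations.
perm_importance <- function(var, n_perm = 100) {
  diffs <- replicate(n_perm, {
    shuffled <- dat
    shuffled[[var]] <- sample(shuffled[[var]])   # permute one variable, keep the rest fixed
    mae(dat$y, predict(fit, newdata = shuffled)) - mae_orig
  })
  mean(diffs)
}

importance <- sapply(c("x1", "x2"), perm_importance)
importance
```

In this toy setup `x1` should receive a clearly larger importance than `x2`. For longitudinal data one design choice left open here is whether to permute a variable across all rows or only within participants.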

Vaccine Dynamic prediction – Methodology

Two papers I recently read.

Functional varying coefficient models for longitudinal data (JASA, 2010) (Şentürk and Müller 2010)

Brief Introduction

The paper provides a versatile and flexible analysis tool for relating longitudinal responses to longitudinal predictors.
Specifically, this approach provides a novel representation of varying coefficient functions through suitable auto and cross-covariances of the underlying stochastic processes, which is particularly advantageous for sparse and irregular designs.
The proposed approach extends the varying coefficient models to a more general setting, in which not only current but also recent past values of the predictor time course may have an impact on the current value of the response time course.
The influence of past predictor values is modeled by a smooth history index function, while the effects on the response are described by smooth varying coefficient functions.

Illustration

The proposed functional varying coefficient model

\[E\{Y(t)|X(t)\}=\beta_0(t)+\beta_1(t)\int^\Delta_0 \gamma(u)X(t-u)du\] for \(t \in [\Delta, T]\) with a suitable \(T>0\).

In this model, the current value of the response process \(Y(t)\) at time \(t\) depends on the recent history of the predictor process \(X\) in a sliding window of length \(\Delta\).
The history index function \(\gamma(\cdot)\) defines the history index factor at \(\beta_1(t)\), by quantifying the influence of the recent history of the predictor values on the response.
The varying coefficient function \(\beta_1(\cdot)\) represents the magnitude of this influence as a function of time.
Functions \(\gamma\), \(\beta_1\), and the intercept function \(\beta_0\) are assumed to be smooth.
An assumption implicit in this model is that the history index function \(\gamma\) itself does not change over time. This cleanly separates the time effects encoded in \(\beta_1\) from the history effects encoded in \(\gamma\), and decomposes the functional regression of \(Y\) on \(X\) into these two one-dimensional component functions.
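
To make the history index term concrete, the sketch below evaluates \(\int^\Delta_0 \gamma(u)X(t-u)du\) by the trapezoidal rule for a densely observed trajectory. Everything here is a hypothetical illustration: the window length, the linear form of \(\gamma\), and the sine-shaped \(X\) are made up, and this direct numerical integration is only possible because the trajectory is known on a dense grid, which is not the sparse setting the paper targets.

```r
# Trapezoidal rule for integrating values f over grid points u.
trapz <- function(u, f) sum(diff(u) * (head(f, -1) + tail(f, -1)) / 2)

Delta   <- 2                             # window length (hypothetical)
gamma_f <- function(u) 1 - u / Delta     # hypothetical history index: recent values weigh more
x_f     <- function(t) sin(t)            # hypothetical, densely observed predictor trajectory

# History index at time t: integral over u in [0, Delta] of gamma(u) * X(t - u).
history_index <- function(t, n_grid = 2001) {
  u <- seq(0, Delta, length.out = n_grid)
  trapz(u, gamma_f(u) * x_f(t - u))
}

# Evaluate over the admissible time range [Delta, T], here with T = 10.
t_grid <- seq(Delta, 10, by = 0.5)
h      <- sapply(t_grid, history_index)
```

The resulting values `h` play the role of the single predictor in the reduced varying coefficient model \(\beta_0(t) + \beta_1(t)\,h(t)\).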

Second step:

Once \(\gamma(\cdot)\) has been estimated, the model reduces to a varying coefficient model. Note that even if \(\gamma(\cdot)\) is assumed to be known, obtaining the predictors of the reduced varying coefficient model, that is, \(\int^\Delta_0 \gamma(u)X(t-u)du\), may not be straightforward, due to the sparsity of the measurements for the predictor process in the history window \([t-\Delta,t]\), which renders numerical integration infeasible.

They use functional estimation tools to develop a novel procedure for this step. Existing methods such as polynomial splines, smoothing splines, and local polynomial smoothing may perform poorly under sparse designs.
The key to their functional approach is to target the covariance structure of \(X\) and the cross-covariance structure of \(X\) and \(Y\); estimates of these covariance surfaces behave well even under sparse designs.

Nonparametric predictive model for sparse and irregular longitudinal data (Biometrics, 2024) (Wang et al. 2024)

Background

In the functional data analysis literature, many studies have focused on predicting mean response profiles from repeated measures of functional predictors.

Although these methods make regression on functional predictors more flexible, estimation becomes substantially more complex:

Estimation of smooth functions in the additive model
Choice of basis functions and knots
Regularization

The complexity may cause a convergence issue, especially in sparse and irregular longitudinal data.

As an alternative, they propose a novel nonparametric approach with a kernel-weighted estimator.

Brief Introduction

The core idea of the kernel regression is that the prediction of the response variable over time is weighted to reflect the similarity between predictor histories.
Due to the inherent difficulty of longitudinal data, where observations are often measured irregularly at different time points among subjects, measurement-wise distances between subjects cannot be employed directly. Instead, they adopt the \(L_2\) metric based on individual trajectories of predictors estimated by the functional principal component analysis to measure the similarity between histories of repeatedly measured predictors.
The methodology assumes that the more similar the predictors’ trajectories are to each other, the more likely their response trajectories travel in a similar fashion.
To deal with the curse of dimensionality that multiple predictors cause in purely nonparametric approaches, they propose a multiplicative model with a multivariate Gaussian kernel function.
This model is capable of achieving dimension reduction as well as selecting functional covariates with predictive significance.
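
The similarity-weighting idea can be illustrated with a generic Nadaraya-Watson estimator. This is not the authors' estimator: it handles a single predictor and a scalar response, uses a hypothetical fixed bandwidth `h`, and assumes the predictor trajectories have already been reconstructed on a common grid (for example by functional principal component analysis), which is simulated here.

```r
set.seed(7)
grid <- seq(0, 1, length.out = 50)
n    <- 100

# Hypothetical reconstructed predictor trajectories: one row per subject, on a common grid.
slope <- runif(n, -2, 2)
X     <- outer(slope, grid)                  # X_i(t) = slope_i * t
y     <- 3 * slope + rnorm(n, sd = 0.1)      # response depends on the trajectory shape

# Approximate L2 distance between two trajectories on the grid (Riemann sum).
l2_dist <- function(a, b) sqrt(sum((a - b)^2) * (grid[2] - grid[1]))

# Kernel-weighted (Nadaraya-Watson) prediction for a new trajectory x_new.
predict_kernel <- function(x_new, X, y, h = 0.2) {
  d <- apply(X, 1, l2_dist, b = x_new)
  w <- exp(-0.5 * (d / h)^2)                 # Gaussian kernel weights
  sum(w * y) / sum(w)
}

x_new <- 1 * grid                            # new subject with slope 1
pred  <- predict_kernel(x_new, X, y)
```

Subjects whose trajectories are close to `x_new` in \(L_2\) distance dominate the weighted average, so `pred` should land near 3, the noise-free response for slope 1.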

References

Şentürk, Damla, and Hans-Georg Müller. 2010. “Functional Varying Coefficient Models for Longitudinal Data.” Journal of the American Statistical Association 105 (491): 1256–64. https://doi.org/10.1198/jasa.2010.tm09228.

Wang, Shixuan, Seonjin Kim, Hyunkeun Ryan Cho, and Won Chang. 2024. “Nonparametric Predictive Model for Sparse and Irregular Longitudinal Data.” Biometrics 80 (1). https://doi.org/10.1093/biomtc/ujad023.