1 General Introduction

Two-level non-linear mixed effects models will be considered for both the exercises required for this project.

Using a two-stage approach, the general form of the first, deterministic stage of the model formulation would be:

\[{y_{ijk}} = f(\phi_{ijk}, v_{ijk}) + \epsilon_{ijk}, \\\]

Where, $i = 1,\dots ,M$, the number of levels for the first-level factor; $\quad j = 1,\dots ,M_i$, the number of levels for the second-level factor; $\quad k = 1,\dots ,n_{ij}$, the number of observations for each of the experimental conditions; $\quad \epsilon_{ijk} \sim N(0, \Delta) \\$, the residual errors. $\phi_{ijk}$ represents the parameter value in the non-linear deterministic function, and $v_{ijk}$ represents covariates that feature in the function.

The second stage of the model formulation considers the role of fixed and random effects on the parameters, $\phi_{ijk}$ in the deterministic model and has a general form described by:

\[\phi_{ijk} = {A_{ijk}}\beta + B_{i,jk}b_i + B_{ijk}b_{ij}, \\\]

where, $A_{ijk}$ is the design matrix for the fixed effects; $\beta$ the vector of estimated coefficients for the fixed effects; $B_{i,jk}$ the design matrix for the first level random effects; $b_i$ the vector of random effect estimates, with distribution $\sim N(0,\Psi_1)$; $B_{ijk}$ the design matrix for the second, nested level of random effects; and finally $b_{ij}$ the second, nested effect estimates with distribution $\sim N(0,\Psi_2)$.

For both exercises, the first stage models will be described in the respective introductions, while the final models’ second stage design matrix structures will be illustrated for selected experimental conditions in the respective conclusions.

2 Exercise 8.2 - dataset: “PBG”

2.1 Introduction

Initially the first stage deterministic profile of the response variable for this dataset is described by a four-parameter logistic model, illustrated below:

\[ \\ f(\phi_{i,j,k}, v_{i,j,k}) = \phi_1 + \frac{(\phi_2 - \phi_3)}{(1 + (exp((\phi_3 - dose)/\phi_4))}, \\ \] where $i = 1\dots,6; \quad j = 1,2; \quad k = 1 \dots 6$

and where:

$\phi_1$ = Horizontal asymptote on the left
$\phi_2$ = Horizontal asymptote on the right
$\phi_3$ = Value of input at inflection point
$\phi_4$ = Scale parameter on the input axis

In the final section of this exercise a 3-parameter logistic model will be considered.

2.2 Initial Exploration

An initial exploration of the dataset indicated a grouping with the change of blood pressure (deltaBP) as response against a single predictor variable, dose. The grouping structure of the supplied dataset was based on single level ( $deltaBP \sim dose | Rabbit$ ).

The dataset is balanced with 60 observations of 5 rabbits showing the changes in blood pressure for two different treatments each of which used six different doses.

Figure 1: Changes of Blood Pressure by Rabbit by Treatment

The plots shown in Figure 1 above support the logistic nature of the deterministic model, although since the origins all appear to be around zero it may be possible to reduce the parameterisation by using a three parameter logistic function, which will be considered in the final section of this exercise. The plots also appear to indicate not only a Rabbit effect, but also a general, systematic horizontal effect for Treatment.

In order to facilitate the analysis of a two-level procedure, the grouping structure was changed to a nested structure with Treatment nested within Rabbit. It was also considered prudent to change the ordered nature of the Rabbit variable to avoid possible complications.

An initial non linear least squares without random effects was fitted to the data, using the self-starting R formula SSfpl(). The summary of the model indicates that the horizontal asymptote A ($\phi_1$) may not be significant, further supporting the potential of a 3-parameter model. Also, residual plots shown in Figure 2 below appear to indicate that both Treatment and Rabbit may well require random effects.

## 
## Formula: deltaBP ~ SSfpl(log(dose), A, B, xmid, scal)
## 
## Parameters:
##      Estimate Std. Error t value Pr(>|t|)    
## A      1.5912     1.3316   1.195  0.23712    
## B     27.7954     2.7241  10.203 2.19e-14 ***
## xmid   4.1214     0.1389  29.675  < 2e-16 ***
## scal   0.3642     0.1219   2.988  0.00416 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.598 on 56 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 1.361e-07

Figure 2: Plots of Residuals vs Fitted

In summary: the initial exploration indicates that a logistic (possibly only a 3-parameter model) may be appropriate to model the first stage of the procedure. The second stage would most likely require random effects for both Rabbit and for Treatment nested within Rabbit.

2.3 a)

In this section seperate 4-parameter non-linear least squares logistic models were fit to the two-factor-level data with deltaBP as outcome variable and log(dose) as the only predictor variable.

2.4 b)

In this section the confidence intervals on the coefficients from the seperate models are considered in Figure 3 below.

From the plots, it would appear that random effects for A and scal ($\phi_1$ and $\phi_4$) may not be needed, while the distribution of the intervals for B & xmid ($\phi_2$ and $\phi_3$) would indicate the opposite.

Furthermore, although the failure of two of the models to converge creates some uncertainty, it does look as if there is a systematic pattern for Treatment on the xmid ($\phi_3$) parameter at least for rabbits 1,2 & 4.

From the pairs plots in Figure 4 below it does not look as if there are strong correlations between the parameters.

Finally, from the plots in Figure 5 below it would appear as if the discrepencies in residuals for both Treatment and Rabbit are successfully dealt with in the seperation of models.

Figure 3: Confidence Intervals for Parameter Estimates

$**Figure 4:** Correlations of Parameters$

Figure 4: Correlations of Parameters

Figure 5: Adjusted Residual Plots

2.5 c)

In this section a non-linear mixed effects model was fit as per the requirements of the exercise, namely with a fixed effect for Treatment on the xmid ($\phi_3$) parameter and random effects for Rabbit on the B ($\phi_2$) parameter and for Treatment within Rabbit on the B as well as the xmid parameters ($\phi_2$ & $\phi_3$). Also, the model was initially fit with a diagonal $\Psi_2$ matrix before considering a general positive-definite alternative.

From the anova results, shown below, in which the first model represented the model with the diagonal $\Psi_2$ matrix and the second the general positive-definite alternative, the first - more parsimonious, diagonal - model will be the preferred one.

##           Model df      AIC      BIC    logLik   Test   L.Ratio p-value
## pbg_nlme      1  9 290.8261 308.8921 -136.4131                         
## pbg_nlme2     2 10 292.2766 312.3499 -136.1383 1 vs 2 0.5495725  0.4585

2.6 d)

In this section the preferred model’s summary was considered and indicates that treatment on xmid ($\phi_3$) appears to be significant. This is confirmed in the confidence intervals printed below.

##                            Value  Std.Error DF   t-value      p-value
## A                      1.6689747 0.30559707 46  5.461357 1.842263e-06
## B                     28.1729016 2.48622318 46 11.331606 6.592674e-15
## xmid.(Intercept)       4.5386405 0.09706396 46 46.759274 1.968217e-40
## xmid.TreatmentPlacebo -0.7616943 0.13456456 46 -5.660438 9.319719e-07
## scal                   0.2702343 0.02262587 46 11.943598 1.069965e-15

##                            lower       est.      upper
## A                      1.0538397  1.6689747  2.2841097
## B                     23.1683939 28.1729016 33.1774093
## xmid.(Intercept)       4.3432608  4.5386405  4.7340201
## xmid.TreatmentPlacebo -1.0325587 -0.7616943 -0.4908299
## scal                   0.2246908  0.2702343  0.3157778
## attr(,"label")
## [1] "Fixed effects:"

2.7 e)

In this section augmented predictions are plotted according to the requirement of the exercise in Figure 6 below. These show reasonably good fits to the observed data, indicating support for the existing model.

Figure 6: Prediction vs Observed

2.8 f)

As part of the model diagnostics, a plot of standardised residuals against fitted values was considered and illustrated in Figure 7 below. Although its difficult to be certain, there does appear to be some cause for concern with a larger variation at higher fitted values. An updated model including weights in proportion to the fitted values was applied.

Figure 7: Considering Constant Variance

With added adjustment of the error term, the distribution of variances appears to be improved as seen in Figure 8 below. This is confirmed in the analysis of variance comparison with lower AIC and BIC values shown below also.

Figure 8: Residual Plot with VarPower

##           Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## pbg_nlme      1  9 290.8261 308.8921 -136.4131                        
## pbg_nlme3     2 10 270.4750 290.5483 -125.2375 1 vs 2 22.35117  <.0001

Considering random effects against other covariates in Figure 9 does not appear to suggest the inclusion of any additional covariates in the model’s fixed effects.

Figure 9: Plot of Random Effects vs Covariates

2.9 g)

In this section a 3-parameter model was considered and compared with the existing 4-parameter version that was developed above.

In this case the first stage deterministic profile of the response will be described by a 3-parameter logistic model, illustrated below:

\[ \\ f(\phi_{i,j,k}, v_{i,j,k}) = \frac{\phi_1}{(1 + exp((\phi_2 - dose)/\phi_3)}, \\ \] where, as above, $i = 1\dots,6; \quad j = 1,2; \quad k = 1 \dots 6$

but where:

$\phi_1$ = The asymtote (Asym)
$\phi_2$ = Value of input at inflection point (xmid)
$\phi_3$ = Scale parameter on the input axis (scal)

Initially seperate non-linear models were considered in order to review the parameters that required random effect and to generate appropriate starting values for the mixed effects modelling.

The intervals shown in Figure 10 below confirm Asym ($\phi_1$) and xmid ($\phi_2$) as requiring random effects, although only two of the non-placebo conditions converged and may be cause for some concern.

Figure 10: Intervals for Parameter Coefficients

A non-linear mixed effect model with similar fixed and random effects as well as error weighting as in the model above was developed. The summary output below as well as the augmented predictions plots shown in Figure 11 did not provide cause for concern. In fact it’s difficult to distinguish between the two augmented prediction plots. It would therefore make sense to use the 3-parameter logistic function as a final, preferred model.

## Nonlinear mixed-effects model fit by REML
##   Model: deltaBP ~ SSlogis(log(dose), Asym, xmid, scal) 
##  Data: PBG_nu 
##        AIC      BIC    logLik
##   307.2692 325.4974 -144.6346
## 
## Random effects:
##  Formula: Asym ~ 1 | Rabbit
##             Asym
## StdDev: 5.004783
## 
##  Formula: list(Asym ~ 1, xmid ~ 1)
##  Level: Treatment %in% Rabbit
##  Structure: Diagonal
##             Asym xmid.(Intercept) Residual
## StdDev: 3.107361        0.1978429 1.869293
## 
## Variance function:
##  Structure: Power of variance covariate
##  Formula: ~fitted(.) 
##  Parameter estimates:
##     power 
## 0.0549134 
## Fixed effects: list(Asym ~ 1, xmid ~ Treatment, scal ~ 1) 
##                           Value Std.Error DF  t-value p-value
## Asym                  29.067986 2.6488309 47 10.97389       0
## xmid.(Intercept)       4.515351 0.1089473 47 41.44527       0
## xmid.TreatmentPlacebo -0.782796 0.1448044 47 -5.40588       0
## scal                   0.348934 0.0320544 47 10.88568       0
##  Correlation: 
##                       Asym   xm.(I) xmd.TP
## xmid.(Intercept)       0.159              
## xmid.TreatmentPlacebo -0.049 -0.701       
## scal                   0.249  0.271 -0.071
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.3754749 -0.1280988  0.3080933  0.8012587  2.2681759 
## 
## Number of Observations: 60
## Number of Groups: 
##                Rabbit Treatment %in% Rabbit 
##                     5                    10

Figure 11: Predictions vs Observed

2.10 Conclusion

In conclusion, the details of the second stage formulation for the final, preferred model will be described.

Considering Rabbit = 1 and Treatment = 1, the design matrices for this final 3-parameter model would look as in the expression below:

\[ \mathbf{\phi_{1,1}} = \left[\begin{array} {rrr} \phi_{1,1,1} \\ \phi_{1,1,2} \\ \phi_{1,1,3} \\ \end{array}\right] = \left[\begin{array} {rrr} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{array}\right] \left[\begin{array} {rrr} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \end{array}\right] + \left[\begin{array} {rrr} 1 \\ 0 \\ 0 \\ \end{array}\right] \left[\begin{array} {rrr} b_1 \end{array}\right] + \left[\begin{array} {rrr} 1 & 0 \\ 0 & 1\\ 0 & 0\\ \end{array}\right] \left[\begin{array} {rrr} b_{11,1} \\ b_{11,2} \end{array}\right] \]

In this case: \[\Psi_1 = 5.005^2\] and

\[\Psi_2 = \left[\begin{array} {rrr} 3.2^2 & 0 \\ 0 & 0.2^2\\ \end{array}\right]\]

while the distribution of the error term $\epsilon_{ijk}$ in the model is given by

\[\Delta \sim N(0,E[y_{ij}/\phi_i]^\theta)\]

3 Exercise 8.3 - dataset: “Glucose2”

3.1 Introduction

In this section the first stage deterministic part of the model development was provided by the exercise and is shown below.

\[f(\phi_{i,j,k}, v_{i,j,k}) = \phi_1 + \phi_2Time^3exp(-\phi_3Time)\] where $i = 1\dots,7; \quad j = 1,2; \quad k = 1 \dots 14$

This equation is used to create an R-function that will be used in the model developments below.

3.2 Initial Exploration

An initial exploration of the dataset indicated a grouped object with glucose concentration (glucose) as a response variable on Time. The grouping structure of the dataset already contained a nesting of second order factor Date within the first order factor Subject. The grouping structure: $glucose \sim Time | Subject/Date$ .

The dataset is balanced with 196 observations of 7 subjects showing glucose measurements for two different days with Date = 2 representing the measurements with a food additive and measured at 14 different time slots.

The plots shown Figure 12 below show similar increases in concentration followed by gradual decreases. Although they all exhibit much the same general pattern the differences between subjects and days within subjects certainly appear to vary quite considerably. The profile for subject 6 on day 1 appears to be an anomoly.

Figure 12: Plots of Time vs Glucose Concentration

3.3 a)

In this section the data needs to be displayed at the Subject level and these plots need to be considered for systematic differences. From the plots in Figure 13 below it is difficult to confirm systematic differences, although in four out of the seven Subjects it would appear as if the second day (ie., with the food additive) results in lower peaks with somewhat slower increases of absorption, but with faster declines and lower levels of dispersion.

Figure 13: Glucose Concentration vs Time by Subject and Date

3.4 b)

In this section seperate non-linear least squares models will be considered for each Subject and also for each Date within Subject, utilising given starting values. The confidence intervals for the three parameters will be considered.

From the plots in Figure 14 below it would appear as if $\phi_1$ and $\phi_2$ vary significantly while $\phi_3$ does not. In both cases it would appear that values for $\phi_1$ and $\phi_2$ may benefit from the adding of random effects. The intervals for $\phi_3$ do not appear to support this.

An additional plot shown in Figure 15 to consider possible correlation between the parameters does show what could indicate a positive linear relationship between $\phi_3$ and $\phi_2$.

Figure 15: Parameter Scatterplots

3.5 c)

In this section a two-level non-linear mixed effects model will be considered, utilising starting values provided by the object created for seperate models that were considered in (b).

The summary of the model, printed below, does show the expected positive linear relationship between $\phi_3$ and $\phi_2$, but at 0.56 it is only moderate.

The perfect correlation between the random effects’ standard deviation estimates between $\phi_1$ and $\phi_2$ in the second level of random effects estimation does indicate that a diagonal matrix may result in an improved model. The very wide confidence intervals (essentially between -1 and + 1), also shown below, supports this notion.

## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: glucose ~ g_function(Time, phi1, phi2, phi3) 
##  Data: Glucose2 
##        AIC      BIC    logLik
##   398.8875 431.6686 -189.4437
## 
## Random effects:
##  Formula: list(phi1 ~ 1, phi2 ~ 1)
##  Level: Subject
##  Structure: General positive-definite, Log-Cholesky parametrization
##      StdDev     Corr  
## phi1 0.26239722 phi1  
## phi2 0.05986605 -0.148
## 
##  Formula: list(phi1 ~ 1, phi2 ~ 1)
##  Level: Date %in% Subject
##  Structure: General positive-definite, Log-Cholesky parametrization
##          StdDev     Corr
## phi1     0.09205164 phi1
## phi2     0.12276276 1   
## Residual 0.57120972     
## 
## Fixed effects: list(phi1 ~ 1, phi2 ~ 1, phi3 ~ 1) 
##         Value  Std.Error  DF  t-value p-value
## phi1 3.661504 0.11602056 180 31.55910       0
## phi2 0.428246 0.05301183 180  8.07831       0
## phi3 0.589642 0.01386016 180 42.54219       0
##  Correlation: 
##      phi1  phi2 
## phi2 0.100      
## phi3 0.174 0.567
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.80924346 -0.50336389 -0.06286356  0.46613005  2.66997995 
## 
## Number of Observations: 196
## Number of Groups: 
##           Subject Date %in% Subject 
##                 7                14

## Approximate 95% confidence intervals
## 
##  Random Effects:
##   Level: Subject 
##                       lower        est.     upper
## sd(phi1)        0.131340410  0.26239722 0.5242279
## sd(phi2)        0.005554645  0.05986605 0.6452156
## cor(phi1,phi2) -0.963488646 -0.14798382 0.9346890
##   Level: Date 
##                      lower       est.     upper
## sd(phi1)        0.01807350 0.09205164 0.4688358
## sd(phi2)        0.06253253 0.12276276 0.2410057
## cor(phi1,phi2) -1.00000000 0.99960772 1.0000000
## 
##  Within-group standard error:
##     lower      est.     upper 
## 0.5141497 0.5712097 0.6346023

3.6 d)

In this section the finding from section (c) will be implemented using diagonal matrices for both grouping levels. An analysis of variance, shown below, comparing the two models suggest the model with diagonal matrices for both levels of random effects would be the preferred model.

##         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## g_nlme      1 10 398.8875 431.6686 -189.4437                        
## g_nlme2     2  8 396.7356 422.9606 -190.3678 1 vs 2 1.848193  0.3969

An exploration of the confidence intervals of the random effects suggest that with the very broad interval (around 6 units) and near zero lower bounds, a model with only $\phi_1$ at the first Subject level and with only $\phi_2$ at the second, Date-within-Subject level, may be an improvement.

## Approximate 95% confidence intervals
## 
##  Fixed effects:
##          lower      est.     upper
## phi1 3.4297950 3.6586184 3.8874417
## phi2 0.3207326 0.4237908 0.5268490
## phi3 0.5607373 0.5876033 0.6144694
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: Subject 
##                 lower       est.     upper
## sd(phi1) 0.1370823640 0.26854930 0.5260978
## sd(phi2) 0.0002830917 0.04221404 6.2948687
##   Level: Date 
##                 lower       est.     upper
## sd(phi1) 0.0007296575 0.06662523 6.0835692
## sd(phi2) 0.0716967383 0.13624615 0.2589101
## 
##  Within-group standard error:
##     lower      est.     upper 
## 0.5152563 0.5732428 0.6377550

In order to consider this question exhaustively various alternative models were considered resulting in a “best” option model, g_nlme5, with random effect at the Subject level for $\phi_1$ and at the Date-within- Subject level for $\phi_2$ - as was predicted from considering the confidence intervals above. Only the final analysis of variance is printed out below.

##         Model df      AIC      BIC    logLik   Test   L.Ratio p-value
## g_nlme2     1  8 396.7356 422.9606 -190.3678                         
## g_nlme5     2  6 392.8488 412.5175 -190.4244 1 vs 2 0.1131936   0.945

Plotting standardised residuals againt Time for the current preferred model as shown in Figure 16 indicate what appear to be some cyclical variation over time, although this is not very obvious and will not be considered in this exercise.

Figure 16: Residuals vs Time

Furthermore, considering standardised residuals against the fitted values shown in Figure 17 below also gives some cause for concern about the assumption of homoscedasticity.

Figure 17: Residuals against Fitted Values

3.7 e)

In this section a semivariogram will be plotted to consider the possibility of autocorrelated data.

The plot in Figure 18 does indicate an increase in variance over a short distance. Although the difference between the nugget and the sill is relatively small, as is the overall size of the range. Therefore, although this plot does provide some evidence of auto-correlation in the dataset, this autocorrelation appears to apply only to data at small distances.

Figure 18: Variogram against Time

3.8 f)

In this section a continuous autoregression model will be imposed on the within-group errors, using Time as a covariate. From the earlier plots on residuals and the exploration of homoscedasticity, weights will also be added to adjust for what appeared to be heteroscedastic behaviour. The subsequent analysis of variance, printed below, does indicate that these adjustments improve the current model.

##         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## g_nlme5     1  6 392.8488 412.5175 -190.4244                        
## g_nlme7     2  8 369.0642 395.2891 -176.5321 1 vs 2 27.78464  <.0001

Plots of the new model in Figure 19 below show that both the auto-correlation and the heteroscadasticity has been addressed, although the slope in the adjusted variogram has only improved and not been eliminated.

Figure 19: Variogram against Time (adjusted)

3.9 g)

In this section a model used by Hand and Crowder (1996) will be compared with the model developed thus far in this exercise.

The analysis of variance, printed below, suggest that the model developed thus far in this exercise is a better model.

##         Model df      AIC      BIC    logLik
## g_nlme7     1  8 369.0642 395.2891 -176.5321
## hc_mod      2  8 384.7317 410.9567 -184.3659

The two models were also compared by considering the predictions against observed values.

The Hand and Crowder model predictions shown in Figure 20 below only considered random effects at the first level of Subject while the predictions for the currently preferred model shown in Figure 21 considered random errors at two levels, namely Subject and Date-within-Subject.

Comparing these two groups of plots do not offer conclusive support for either model, although the simpler one-level Hand and Crowder model may be preferred based on a) its simplicity and also; b) the fact that it’s predictions are quite reasonable and c) the very small value (0.0002323) of the standard deviation for $\phi_2$ random effect on Date-within-Subject.

Figure 20: Model: Hand and Crowder (1996)

Figure 21: Model: Current

3.10 h)

In this final section of the exercise a non-linear mixed effects model will be fitted, but this time incorporating a Date effect in the fixed effects of the model. The summary of the model, printed below, shows that the Date interaction may not be significant at all levels. Furthermore, the predictions resulting from the inclusion of Date, shown in the plots in Figure 22 result in considerably poorer fits.

## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: glucose ~ g_function(Time, phi1, phi2, phi3) 
##  Data: Glucose2 
##        AIC      BIC    logLik
##   443.9108 479.9701 -210.9554
## 
## Random effects:
##  Formula: phi1 ~ 1 | Subject
##         phi1.(Intercept)
## StdDev:        0.2939087
## 
##  Formula: phi2 ~ 1 | Date %in% Subject
##         phi2.(Intercept)    Residual
## StdDev:     3.939733e-12 0.006161728
## 
## Correlation Structure: Continuous AR(1)
##  Formula: ~Time | Subject/Date 
##  Parameter estimate(s):
##       Phi 
## 0.7799865 
## Variance function:
##  Structure: Power of variance covariate
##  Formula: ~fitted(.) 
##  Parameter estimates:
##    power 
## 3.404843 
## Fixed effects: phi1 + phi2 + phi3 ~ Date 
##                      Value Std.Error  DF   t-value p-value
## phi1.(Intercept)  5.129411 0.4071838 178 12.597288  0.0000
## phi1.Date2       -0.166322 0.5244574   6 -0.317132  0.7619
## phi2.(Intercept) -0.001390 0.0007122 178 -1.951732  0.0525
## phi2.Date2       -0.001396 0.0013903 178 -1.004335  0.3166
## phi3.(Intercept)  0.101755 0.0134890 178  7.543558  0.0000
## phi3.Date2        0.035806 0.0192526 178  1.859795  0.0646
##  Correlation: 
##                  p1.(I) ph1.D2 p2.(I) ph2.D2 p3.(I)
## phi1.Date2       -0.715                            
## phi2.(Intercept) -0.741  0.576                     
## phi2.Date2        0.380 -0.682 -0.512              
## phi3.(Intercept)  0.438 -0.340 -0.908  0.465       
## phi3.Date2       -0.307  0.282  0.636 -0.799 -0.701
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.014303852 -0.576333583  0.005061725  0.691882111  3.272925658 
## 
## Number of Observations: 196
## Number of Groups: 
##           Subject Date %in% Subject 
##                 7                14

Figure 22: Model including Date

3.11 Conclusion

In conclusion, although the simpler model of Hand and Crowder (1996) would be this author’s preferred model in application, the final model as it was developed in this exercise will be use to demonstrate the second stage details.

Considering Subject = 7 and Date = 2, the design matrices for this final model would look as in the expression below:

\[ \mathbf{\phi_{7,2}} = \left[\begin{array} {rrr} \phi_{7,2,1} \\ \phi_{7,2,2} \\ \phi_{7,2,3} \\ \end{array}\right] = \left[\begin{array} {rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array}\right] \left[\begin{array} {rrr} \beta_1 \\ \beta_2 \\ \beta_3 \\ \end{array}\right] + \left[\begin{array} {rrr} 1 \\ 0 \\ 0 \\ \end{array}\right] \left[\begin{array} {rrr} b_7 \end{array}\right] + \left[\begin{array} {rrr} 0 \\ 1 \\ 0 \\ \end{array}\right] \left[\begin{array} {rrr} b_{7,2} \\ \end{array}\right] \]

In this case: \[\Psi_1 = 0.73^2\] and

\[\Psi_2 = 0.00023^2\]

The variance $\Delta$ of the distribution of the error term $\epsilon_{ijk}$ in the model is determined by $E[y_{ij}|\phi_i]^\theta$ due to its dependence on fitted values, as well as on the adjustment to the auto-correlation identified in the data.

4 Appendix : R-Code

knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
knitr::opts_chunk$set(cache=TRUE)
knitr::knit_hooks$set(inline = function(x) { if(!is.numeric(x)){ x }else{ prettyNum(round(x,2), big.mark=" ") } })
# setting fig reference facilities and options
figRef <- local({
    tag <- numeric()
    created <- logical()
    used <- logical()
    function(label, caption, prefix = options("figcap.prefix"), 
        sep = options("figcap.sep"), prefix.highlight = options("figcap.prefix.highlight")) {
        i <- which(names(tag) == label)
        if (length(i) == 0) {
            i <- length(tag) + 1
            tag <<- c(tag, i)
            names(tag)[length(tag)] <<- label
            used <<- c(used, FALSE)
            names(used)[length(used)] <<- label
            created <<- c(created, FALSE)
            names(created)[length(created)] <<- label
        }
        if (!missing(caption)) {
            created[label] <<- TRUE
            paste0(prefix.highlight, prefix, " ", i, sep, prefix.highlight, 
                " ", caption)
        } else {
            used[label] <<- TRUE
            paste(prefix, tag[label])
        }
    }
})
options(figcap.prefix = "Figure", figcap.sep = ":", figcap.prefix.highlight = "**")

# load required package
library(nlme)

# some basic exploration
data(PBG)
head(PBG)
isBalanced(PBG) # yes
table(PBG$dose) #  6 doses (dose), 10 measurements each
table(PBG$Treatment) # 2 Treatments (Treatment), 30 measurements each
table(PBG$Rabbit) # 5 Rabbits (Rabbit), 12 measurements each

# exploring structure
plot(PBG, inner = ~Treatment)

# create desired grouped structure:
PBG_nu <- groupedData(deltaBP ~ dose | Rabbit/Treatment, data = PBG)

#str(PBG_nu) # sure Rabbit should not an ordered factor

# changing ordered to unordered factor
PBG_nu$Rabbit <- factor(unclass(PBG_nu$Rabbit))
# str(PBG_nu)
# head(PBG_nu)

# try single nls regression model
pbg_nls <- nls(deltaBP ~ SSfpl(log(dose), A, B, xmid, scal),
               data = PBG_nu)

summary(pbg_nls) # suggests A may not be significant.

# exploring level residuals
plot(pbg_nls, Treatment ~ resid(.), abline=0) # looks like a definite Treatment effect
plot(pbg_nls, Rabbit ~ resid(.), abline=0) # not strong evidence of Rabbit effects

# a) nls list
pbg_lst <- nlsList(deltaBP ~ SSfpl(log(dose), A, B, xmid, scal), data = PBG_nu)

# b) Intervals
plot(intervals(pbg_lst))

# Exploring correlations

plot(pairs(pbg_lst)) # no discernable correlation between four parameters. maybe between B and scal

#  considering residual plots again
plot(pbg_lst, Rabbit ~ resid(.),abline=0)
plot(pbg_lst, Treatment ~ resid(.),abline=0)

# creating vector of fixed effect estimates
fix_start <- fixef(pbg_lst)

# first nlme model
pbg_nlme <- nlme(deltaBP ~ SSfpl(log(dose), A, B, xmid, scal), data = PBG_nu,
                 fixed = list(A + B ~ 1, xmid ~ Treatment, scal ~ 1),
                 random = list(Rabbit = B ~ 1,
                               Treatment = pdDiag(list(B ~ 1, xmid ~ 1))),
                 start = c(fix_start[c(1,2)], fix_start[3], 0, fix_start[4]),
                 method = "REML")

# second nlme model
pbg_nlme2 <- update(pbg_nlme, random = list(Rabbit = B ~ 1,
                                            Treatment = list(B ~ 1, xmid ~ 1)))

# anova
anova(pbg_nlme, pbg_nlme2) # second model does not improve... so stick to first

# a summary of preferred model
summary(pbg_nlme)$tTable # does show that fixed effect for treatment on xmid is significant

# considering intervals of coefficients.
intervals(pbg_nlme)$fixed

#e_augmented prediction plot
plot(augPred(pbg_nlme, scales = list(x = list(log = 2))))

# considering heteroscedasticity
plot(pbg_nlme)

# adding weights
pbg_nlme3 <- update(pbg_nlme,
                    weights = varPower())
plot(pbg_nlme3)

anova(pbg_nlme, pbg_nlme3)

# considering other covariates
pairs(ranef(pbg_nlme3))

# first considering seperate models and also generate starting values
pbg_lst3 <- nlsList(deltaBP ~ SSlogis(log(dose), Asym, xmid, scal), data = PBG_nu)

plot(intervals(pbg_lst3))

# vector of starting values
fix_start3 <- fixef(pbg_lst3)

# constructing the model
pbg_nlme4 <- nlme(deltaBP ~ SSlogis(log(dose), Asym, xmid, scal), data = PBG_nu,
                 fixed = list(Asym ~ 1, xmid ~ Treatment, scal ~ 1),
                 random = list(Rabbit = Asym ~ 1,
                               Treatment = pdDiag(list(Asym ~ 1, xmid ~ 1))),
                 start = c(fix_start3[1], fix_start3[2],0, fix_start3[3]),
                 weights = varPower(),
                 method = "REML")

# checking the summary information
summary(pbg_nlme4)

# plotting prections
plot(augPred(pbg_nlme4, scales = list(x = list(log = 2))))

# defining the first stage function:
g_function <- function(Time, phi1, phi2, phi3) {
        phi1 + phi2*(Time^3)*exp(-phi3*Time)
}
## test the function
#g_function(5,6,4,2)

data(Glucose2)
# head(Glucose2)
# isBalanced(Glucose2)
# table(Glucose2$Subject)
# table(Glucose2$Date)
# table(Glucose2$Time)
plot(Glucose2, layout = c(2,7))  # looks like SSfol (First-Order Compartment Model)

# first plot
plot(Glucose2, display = 1)
g_lst1 <- nlsList(glucose ~ g_function(Time, phi1, phi2, phi3),
                 data = Glucose2,
                 level = 1,
                 start = c(phi1 = 5, phi2 = -1, phi3 = 1))
plot(intervals(g_lst1), layout = c(3,1))

g_lst2 <- nlsList(glucose ~ g_function(Time, phi1,phi2,phi3),
                  data = Glucose2,
                  level = 2,
                  start = c(phi1 = 5, phi2 = -1, phi3 = 1))
plot(intervals(g_lst2), layout = c(3,1))

# pais plots
plot(pairs(g_lst2))# phi2 and phi3 may be correlated

g_nlme <- nlme(glucose ~ g_function(Time, phi1, phi2, phi3),
               data = Glucose2,
               fixed = list(phi1 ~ 1, phi2 ~ 1, phi3 ~ 1),
               random = list(Subject = phi1 + phi2 ~ 1,
                             Date = phi1 + phi2 ~ 1),
               start = fixef(g_lst2)
               )

summary(g_nlme)

intervals(g_nlme, which = "var-cov") # basically no precision (-1 to 1) so dont really add value
# should consider a diagonal model
g_nlme2 <- update(g_nlme,
                  random = list(Subject = pdDiag(phi1 + phi2 ~ 1),
                                Date = pdDiag(phi1 + phi2 ~ 1)))

anova(g_nlme, g_nlme2) # argues for second model
# intervals for model 2
intervals(g_nlme2) # checking if I can drop random effects...


# various alternative models to consider
g_nlme3 <- update(g_nlme2,
                  random = list(Subject = phi1 ~ 1,
                                Date = phi1 ~ 1))
g_nlme4 <- update(g_nlme2,
                  random = list(Subject = phi2 ~ 1,
                                Date = phi2 ~ 1))
g_nlme5 <- update(g_nlme2,
                  random = list(Subject = phi1 ~ 1,
                                Date = phi2 ~ 1))
g_nlme6 <- update(g_nlme2,
                  random = list(Subject = phi2 ~ 1,
                                Date = phi1 ~ 1))

## analyses of variance
#anova(g_nlme2, g_nlme3) # no improvement
#anova(g_nlme2, g_nlme4) # no improvement
anova(g_nlme2, g_nlme5) # improvement
#anova(g_nlme2, g_nlme6) # no improvement

# plotting residuals against time
plot(g_nlme5, resid(., type = "p") ~ Time, abline = 0)

# considering residuals against fitted values
# 
plot(g_nlme5) # confirms a need to consider the variance in the residuals...

# variogram
plot(Variogram(g_nlme5, form = ~ Time, maxDist = 22))
g_nlme7 <- update(g_nlme5,
                  weights = varPower(),
                  corr = corCAR1(form = ~ Time))

anova(g_nlme5, g_nlme7) # does drop AIC and BIC values and loglik increases

# second variogram
plot(Variogram(g_nlme7, form = ~ Time, maxDist = 22))

# considering residuals again.
plot(g_nlme7)
hc_mod <- nlme(glucose ~ g_function(Time, phi1, phi2, phi3),
                 data = Glucose2,
                 fixed = list(phi1 ~ 1, phi2 ~ 1, phi3 ~ 1),
                 random = phi1 + phi2 ~ 1 | Subject,
                 corr = corCAR1(form = ~ Time | Subject/Date),
                 start = fixef(g_lst2))

anova(g_nlme7, hc_mod) # suggests my model is better with lower AIC and BIC values...

# considering predictions for the Hand and Crowder model
plot(augPred(hc_mod))

# considering predictions for the current model
plot(augPred(g_nlme7))


# final model including Date interaction in fixed effects
fin_nlme <- update(g_nlme7,
                   fixed = phi1 + phi2 + phi3 ~  Date,
                   start = (c(fixef(g_lst2), 0, 0, 0)))

summary(fin_nlme)

plot(augPred(fin_nlme))

References

Bates, José C Pinheiro; Douglas M. 2000. Mixed-Effects Models in S and S-Plus. Springer.

Humburg, P. 2014. “Using Knitr and Pandoc to Create Reproducible Scientific Reports.” october. http://galahad.well.ox.ac.uk/repro/.

Pinheiro, Jose, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, and R Core Team. 2017. nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.

RStudio Team. 2015. RStudio: Integrated Development Environment for R. Boston, MA: RStudio, Inc. http://www.rstudio.com/.

Assignment: Longitudinal Data Analysis

Hans-Peter Bakker (BKKHAN001)

10/24/2017