0. Accessing the owprm Package

install.packages("devtools") ## if not installed already
library("devtools")
devtools::install_github("wgalvord/owprm")
library(owprm)
help(package = "owprm")

1. Introduction

Repeated measures data arise in a wide array of different research environments. The term “repeated” is used here to describe measurements that “are made on the same characteristic on the same observational unit but on more than one occasion.” (Crowder and Hand, 1990, p. 1)

Because the measurements for a particular individual, subject, patient or observational unit are repeated, they are not independent within the individual. Therefore, special care must be taken in the statistical analysis of such data.

We analyze some repeated measures data taken from the textbook by Crowder and Hand (1990), Example 3.3 on page 32. Twelve hospital patients underwent a dietary regime treatment. Over the course of seven occasions, measurements were taken on ascorbic acid for each patient. The seven occasions were divided into three treatment phases - twice before, thrice during, and twice after the treatment regime. The seven occasions occurred at weeks 1, 2, 6, 10, 14, 15, & 16. The ascorbic acid measurements are made on the same patient (subject) over time, and therefore, are not statistically independent within the individual patient.

2. The Model

For this problem, the model may be written as

\[ y_{ijk} = \pi_{i} + \Phi_{j} + o_{k(j)} + \pi\Phi_{ij} + \pi o_{ik(j)} \]

where \(y_{ijk}\) is \(ijk^{th}\) response for the \(i^{th}\) patient on the \(k^{th}\) occasion in the \(j^{th}\) phase, \(\pi_{i}\) is the \(i^{th}\) patient, \(\Phi_{j}\) is the \(j^{th}\) phase, \(o_{k(j)}\) is the \(k^{th}\) occasion nested within the \(j^{th}\) phase, and \(\pi o_{ik(j)}\) is the interaction between the \(i^{th}\) patient and the \(k^{th}\) occasion nested within the \(j^{th}\) phase. In this analysis, the patients (patient), occasions (occ) and phases (Phase) are considered as factors in the statistical setup. In particular, the occ variable is a factor variable, not a numeric variable. Therefore, this statistical model is fully-saturated. It contains no error term. That is, there is no \(\epsilon_{ijk}\) represented in the model. The error term for testing the hypothesis of interest in the anova is constructed in a special way, as described in the sequel.

The patient, patient, represented as \(\pi_{i}\), is a random factor. A patient is randomly selected from a potentially large population of patients. However, Phase (Phase) and occ (Occasion) are fixed factors. They are measured or controlled by the investigator and are comprised of a relatively low number of finite levels. In the model, any interaction between a fixed factor and a random factor is a random factor.

In order to test for the hypothesis of whether some relative upward or downward shift occurs among Phase’s, it is necessary to construct the \(F\) statistic in a non-traditional way. To obtain the correct \(F\) statistic, it is necessary to generate the expected mean-squares from the structural model. The expected mean-squares can be derived from different algorithms, such as the Cornfield-Tukey algorithm, as outlined, for example, in Winer (1971).

3. Plots and Data Description

Figure 1 shows a plot of the data from Crowder and Hand (1990). This plot is constructed from a groupedData data frame from package nlme (Pinheiro & Bates, 2000). The data appear to show a rise at week 6 and a fall (drop) at week 15.

Reaction of twelve `patient`s to treatment (`Rx`) across seven Occasions (`occ`) in three `Phase`s.

Reaction of twelve patients to treatment (Rx) across seven Occasions (occ) in three Phases.

Figure 2 shows an alternative plot of the ascorbic data, emphasizing that the occasions on the \(x\)-axis are to be considered as a factor dimension as opposed to a numeric dimension. This plot is constructed from the ggplot2 package (Wickham, 2009). [Note: In the interest of reconstructing this plot exactly, set.seed(11) before each layer of the plot produces the jittered data values positioned as seen here. {See Leek (2015), Peng (2015), Xie (2014) in connection with dynamic documentation and reproducible research}]

Reaction of twelve `patient`s to treatment (`Rx`) across seven Occasions (`occ`) in three `Phase`s.

Reaction of twelve patients to treatment (Rx) across seven Occasions (occ) in three Phases.

On the Occasions axis, the responses are jittered to avoid over-plotting. Legends for the patients, patient, are included in the plot. For these data, we have three phases, Phase, which are designated as pre, Rx, and post to correspond to the respones before, during, and after the treatment regimen. With respect to the spread or variability of the data, it appears that the ascorbic acid responses are reasonably homogeneous throughout the seven occasions.

In these analyses, we are interested in detecting Phase shifts. The responses appear to be relatively low in the first phase, pre. They are followed by an increase in the second phase, Rx and then by a drop in the third phase, post.

4. Analysis

In this problem, there is one hypothesis of interest. It is motivated by the question: Is there a difference in the mean levels of responses across the three Phases under consideration, pre, Rx, and post? The criterion variable is ascorbic acid, i.e., ascorbic.acid in the ascorbic data frame. Here is an abbreviated anova (analysis of variance) summary table for this analysis.

##               Df Sum Sq Mean Sq
## patient       11  3.682  0.3347
## Phase          2  5.961  2.9805
## occ            4  0.203  0.0509
## patient:Phase 22  2.397  0.1090
## patient:occ   44  2.052  0.0466

The correct test for determining whether there is an effect due to Phase is constructed by taking the mean square for Phase (2.9805) and dividing it by the mean square for patient:Phase (0.1090), and entering the \(F\) distribution with the appropriate degrees of freedom, shown here. The printing of the probability value to four significant digits is deliberate.

## 
##       F (null) Num df Den df       p.val
## Value    27.36      2     22 0.000001079

5. The owprm Package

We examine/analyze this problem with the use of the owprm package. First load the owprm package.

library(owprm)
data(ascorbic)
str(ascorbic)
## 'data.frame':    84 obs. of  7 variables:
##  $ patient      : Factor w/ 12 levels "Pat.01","Pat.02",..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ week         : num  1 2 6 10 14 15 16 1 2 6 ...
##  $ occ          : Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ phase.num    : num  1 1 2 2 2 3 3 1 1 2 ...
##  $ Phase        : Factor w/ 3 levels "post","pre","Rx": 2 2 3 3 3 1 1 2 2 3 ...
##  $ ascorbic.acid: num  0.22 0 1.03 0.67 0.75 0.65 0.59 0.18 0 0.96 ...
##  $ Occasion     : int  1 2 3 4 5 6 7 1 2 3 ...

ascorbic.acid is the criterion variable of interest. We fit a fully-saturated analysis of variance (aov) object with the interaction of three factor variables, patient, Phase and occ. Next we display the aovObj

aovObj <- aov(ascorbic.acid ~ patient*Phase*occ, data = ascorbic)
aovObj
## Call:
##    aov(formula = ascorbic.acid ~ patient * Phase * occ, data = ascorbic)
## 
## Terms:
##                  patient    Phase      occ patient:Phase patient:occ
## Sum of Squares  3.682024 5.961060 0.203488      2.396947    2.052163
## Deg. of Freedom       11        2        4            22          44
## 
## 168 out of 252 effects not estimable
## Estimated effects may be unbalanced

Before proceeding further, use the summary method on the aovObj.

summary(aovObj)
##               Df Sum Sq Mean Sq
## patient       11  3.682  0.3347
## Phase          2  5.961  2.9805
## occ            4  0.203  0.0509
## patient:Phase 22  2.397  0.1090
## patient:occ   44  2.052  0.0466

We next apply the owprm() function to the aovObj. Because the aovObj is fully saturated, a warning is printed to the screen with the following command.

owprm_aovObj <- owprm(aovObj)

The warning messages can be avoided by executing the suppressWarnings() command. Now display the owprm_aovObj object.

owprm_aovObj <- suppressWarnings(owprm(aovObj))
owprm_aovObj
## Call: 
## owprm.default(aovObj = aovObj)
## 
## Anova Summary Table for Occ w/in Phases Rep Meas Analy: 
##       F (null) Num df Den df        p.val
## Value 27.35632      2     22 1.079227e-06
## 
## Summary of Saturated ANOVA Object: 
##               Df Sum Sq Mean Sq
## patient       11  3.682  0.3347
## Phase          2  5.961  2.9805
## occ            4  0.203  0.0509
## patient:Phase 22  2.397  0.1090
## patient:occ   44  2.052  0.0466

The owprm_aovObj object is a list containing two components: (1) Anova Summary Table for Occ w/in Phases Rep Meas Analy, and (2) Summary of Saturated ANOVA Object. The Anova Summary Table for Occ w/in Phases Rep Meas Analy shows the correct \(F\) statistic under the null hypothesis of no Phase shifts, the numerator and denominator degrees of freedom, and the probability value. The Anova Summary Table for Occ w/in Phases Rep Meas Analy simply shows the anova summary table for the fully saturated aov fitted object. No probability values are shown because there is no “error” term in the fully saturated model.

In practice, we are generally most interest in the first Anova Summary Table for Occ w/in Phases Rep Meas Analy component. This conveniently provided by the summary method applied to the owprm_aovObj object.

summary(owprm_aovObj)
## 
##       F (null) Num df Den df        p.val
## Value 27.35632      2     22 1.079227e-06

This result may be displayed in a “prettier” form with the following code.

options(digits = 4, scipen = 2)
summary(owprm_aovObj)
## 
##       F (null) Num df Den df       p.val
## Value    27.36      2     22 0.000001079

References

Alvord WG and Carchedi N (2016) “Occasions Within Phases Repeated Measures Analysis - Derivation of Expected Mean Squares”, https://rpubs.com/wgalvord

Cornfield J and Tukey JW (1956) ‘Average values of mean squares in factorials’ Annals of Mathematical Statistics, 27, 907-949.

Crowder MJ and Hand DJ (1990) Analysis of Repeated Measures, Chapman and Hall, London.

Del Prete GQ, Oswald K, Lara A, Shoemaker R, Smedley J, Macallister R, Coalter V, Wiles A, Wiles R, Li Y, Fast R, Kaiser R, Lu B, Zheng J, Alvord WG, Trubey CM, Piatak M, Deleage C, Keele BF, Estes JD, Hesselgesser J, Geleziunas R, Lifson JD (2015) “Elevated plasma viral loads in romidepsin treated SIV-infected rhesus macaques on suppressive combination antiretroviral therapy.” Antimicrobial Agents and Chemotherapy, 60: 1560-1572, doi:10.1128/AAC.02625-15.

Del Prete GQ, Shoemaker R, Oswald K, Lara A, Trubey CM, Fast R, Schneider DK, Kiser R, Coalter V, Wiles A, Wiles R, Freemire B, Keele BF, Estes JD, Quiñones OA, Smedley J, Macallister R, Sanchez RI, Waid JS, Tane CM, Alvord WG, Hazuda DJ, Piatak M, Lifson JD “Effect of suberoylanilide hydroxamic acid (SAHA) administration on the residual virus pool in a model of combination antiretroviral therapy-mediated suppression in SIVmac239-infected indian rhesus macaques.” Antimicrobial Agents and Chemotherapy, 58: 6790-806. PMID 25182644 DOI: 10.1128/AAC.03746-14

Pinheiro jC and Bates DM (2000) Mixed-Effects Models in S and S-Plus, Springer, New York.

Pinheiro J, Bates D, DebRoy S, Sarkar D and R Core Team (2015). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-122, .

Leek J (2015) The Elements of Data Analytic Style, Leanpub.

Peng R (2015) Report Writing for Data Science in R, Leanpub.

Winer BJ (1971) Statistical Principles in Experimental Design, 2nd ed., McGraw-Hill.

Xie Y (2014) Dynamic Documents with R and knitr, CRC Press.