Design of Experiments

Statistical Background: Distributions

The most popular distribution is the normal distribution. The normal distribution is characterized by two parameters, a mean $\mu$ and a variance $\sigma^2$. Below is a plot of the standard normal distribution, which is just a normal distribution with zero mean and unit variance.

x <- seq(-3, 3, length = 200)
plot(x, dnorm(x, 0, 1), 
     type = "l", 
     ylab = "Density", 
     xlab = "Normal Variates",
     main = "Standard Normal Distribution")

A similar looking distribution is the t distribution, which depends on the degrees of freedom parameter $\nu$. Below is a plot of t distributions with varying degrees of freedom.

# Display the Student's t distributions with various
# degrees of freedom and compare to the normal distribution

x <- seq(-3, 3, length = 200)
hx <- dnorm(x, mean = 0, sd = 1)

degf <- c(1, 5, 15, 30)
colors <- c("red", "blue", "darkgreen", "orange", "black")
labels <- c("df = 1", "df = 3", "df = 8", "df = 30", "Normal")

plot(x, 
     hx, 
     type = "l", 
     lty = 1,
     lwd = 2,
     xlab = "Random variates", 
     ylab = "Density", 
     main = "Comparison of t Distributions")

for(i in 1:4){
  lines(x, 
        dt(x, degf[i]), 
        lwd = 2, 
        col = colors[i])
}

legend("topright", 
       inset = .05, 
       title = "Distributions",
       labels, 
       lwd = 2, 
       lty = c(1, 1, 1, 1, 1), 
       col = colors)

The chi-square distribution is import for the t-tests and analysis of variance models. Below is a plot of different chi-square distributions with varying degrees of freedom. The distribution is characterized by one parameter, $k$, its degrees of freedom.

x <- seq(0, 5, length = 200)

degf <- c(1, 2, 4, 8)
colors <- c("red", "blue", "darkgreen", "orange", "black")
labels <- c("df = 1", "df = 2", "df = 4", "df = 8", "df = 10")

plot(x, 
     dchisq(x, 10), 
     type = "l", 
     lty = 1, 
     lwd = 2,
     xlab = "Random variates", 
     ylab = "Density", 
     main = expression(paste("Comparison of ", chi^2, " Distributions")),
     ylim = c(0, .5))

for (i in 1:4){
  lines(x, 
        dchisq(x, degf[i]), 
        lwd = 2, 
        col = colors[i])
}

legend("topright", 
       inset = .05, 
       title = "Distributions",
       labels, 
       lwd = 2, 
       lty = c(1, 1, 1, 1, 1), 
       col = colors)

The F distribution is import for analysis of variance models. Below is a plot of different F distributions with varying degrees of freedom. The distribution is characterized by two parameters, $\nu_1$ and $\nu_2$; these are the degrees of freedom parameters.

x <- seq(0, 5, length = 200)

degf <- c(1, 5, 10, 20)
colors <- c("red", "blue", "darkgreen", "orange", "black")
labels <- c("df = 1", "df = 5", "df = 10", "df = 20", "df = 30")

plot(x, 
     df(x, 30, 30), 
     type = "l", 
     lty = 1,
     lwd = 2,
     xlab = "Random variates", 
     ylab = "Density", 
     main = "Comparison of F Distributions (df, 30)",
     ylim = c(0, max(df(x, 30, 30))))

for(i in 1:4){
  lines(x, 
        df(x, degf[i], 30), 
        lwd = 2, 
        col = colors[i])
}

legend("topright", 
       inset = .05, 
       title = "Distributions",
       labels, 
       lwd = 2, 
       lty = c(1, 1, 1, 1, 1), 
       col = colors)

Statistical Background: Rejection Regions and Acceptance Regions in Hypothesis Testing

The rejection region(s) for a hypothesis test can either be one-tailed or two-tailed, depending on the way the hypotheses are framed. For instance, suppose you have been given the task of determining if the reliability of a new widget for smartphones is significantly different than the reliability of the gold standard widget. In this case, there is no directional hypothesis. That is, the reliability of the new widget could be either less than or greater than the reliability of the gold standard widget. In this case, if we are using a two-tailed z-test with $\alpha$ = .05, the rejection region would have the shape, bounded by critical values of -1.96 and 1.96.

x <- seq(-3, 3, length = 200)
plot(x, 
     dnorm(x, 0, 1), 
     type = "l", 
     ylab = "Density",
     xlab = "Z",
     main = "Two-Tailed Rejection Region")

area_x <- seq(-1.96, 1.96, length = 100)
area_y <- dnorm(area_x, mean = 0, sd = 1)
polygon(x = c(-1.96, area_x, 1.96), y = c(0, area_y, 0), col = "green")

reject_x <- seq(-3, -1.96, length = 100)
reject_y <- dnorm(reject_x, mean = 0, sd = 1)
polygon(x = c(-3, reject_x, -1.96), y = c(0, reject_y, 0), col = "red")


reject_x <- seq(1.96, 3, length = 100)
reject_y <- dnorm(reject_x, mean = 0, sd = 1)
polygon(x = c(1.96, reject_x, 3), y = c(0, reject_y, 0), col = "red")

text(0, .15, "Accept \n 95% Pr")
arrows(-2.5, 0.08, -2.2, 0.02)
text(-2.55, .13, "Reject \n 2.5% Pr")

arrows(2.5, 0.08, 2.2, 0.02)
text(2.55, .13, "Reject \n 2.5% Pr")

Now suppose you are given the task of determining if the reliability of a new widget for smartphones is significantly better than the reliability of the gold standard widget. In this case, there is a directional hypothesis. Therefore, the acceptance and rejection regions would be shifted to reflect this hypothesis. Now the rejection region is separated by z = 1.65,

x <- seq(-3, 3, length = 200)
plot(x, 
     dnorm(x, 0, 1), 
     type = "l", 
     ylab = "Density", 
     xlab = "Z",
     main = "One-Tailed Rejection Region")

area_x <- seq(-3, 1.65, length = 100)
area_y <- dnorm(area_x, mean = 0, sd = 1)
polygon(x = c(-3, area_x, 1.65), y = c(0, area_y, 0), col = "green")


reject_x <- seq(1.65, 3, length = 100)
reject_y <- dnorm(reject_x, mean = 0, sd = 1)
polygon(x = c(1.65, reject_x, 3), y = c(0, reject_y, 0), col = "red")

text(0, .15, "Accept \n 95% Pr")

arrows(2.5, 0.08, 2.2, 0.02)
text(2.55, .13, "Reject \n 5% Pr")

Example 1: Two Formulations of Portland Cement Mortar as Related to Tension Bond Strength

Let’s load and view the raw data. The variables are defined as follow:

Group = Factor variable with 2 levels
1. Modified = Modified mortar
2. Unmodified = Unmodified mortar
Strength = Tension bond strength

ex1 <- data.frame(Group = factor(c(rep('Modified', 10), rep('Unmodified', 10))), 
                  Strength = c(16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57,
                               16.62, 16.75, 17.37, 17.12, 16.98, 16.87, 17.34, 17.02, 17.08, 17.27)
                  )
print(ex1)

##         Group Strength
## 1    Modified    16.85
## 2    Modified    16.40
## 3    Modified    17.21
## 4    Modified    16.35
## 5    Modified    16.52
## 6    Modified    17.04
## 7    Modified    16.96
## 8    Modified    17.15
## 9    Modified    16.59
## 10   Modified    16.57
## 11 Unmodified    16.62
## 12 Unmodified    16.75
## 13 Unmodified    17.37
## 14 Unmodified    17.12
## 15 Unmodified    16.98
## 16 Unmodified    16.87
## 17 Unmodified    17.34
## 18 Unmodified    17.02
## 19 Unmodified    17.08
## 20 Unmodified    17.27

Before proceeding to hypothesis testing, let’s calculate some basic statistics and plots. With only two groups, a dot plot or chart is sometimes helpful to visualize the data separated within each group.

dotchart(x = ex1$Strength,                  # x = the response variable
         groups = ex1$Group,                # groups = the factor variable
         main = "Dot Plot",                 # main = title of plot
         xlab = "Tension Bond Strength")    # xlab = x-axis label

We can see that compared to the Unmodified mortar group, the Modified mortar group has more data on the lower end of Strength. Another plot to visualize the group differences is a boxplot.

boxplot(ex1$Strength ~ ex1$Group,          # formula to create boxplot: response ~ factor
        main = "Boxplot",                  # main = title of plot
        xlab = "Factor Levels",            # xlab = x-axis label
        ylab = "Tension Bond Strength")    # ylab = y-axis label

Again, we see that the spread of Strength in the Modified group is lower than the Unmodified group. Although plots are very useful to describe these data, let’s calculate some descriptive statistics for each group.

g1 <- subset(ex1, ex1$Group == "Modified")
cat("Modified Group", "\nMean:", mean(g1$Strength), "\nVariance", var(g1$Strength))

## Modified Group 
## Mean: 16.764 
## Variance 0.1001378

g2 <- subset(ex1, ex1$Group == "Unmodified")
cat("Unmodified Group", "\nMean:", mean(g2$Strength), "\nVariance", var(g2$Strength))

## Unmodified Group 
## Mean: 17.042 
## Variance 0.06146222

We can test the null hypothesis of interest, $H_0:\mu_{Modified} = \mu_{Unmodified}$, using an independent samples t-test. First, let’s test the assumptions of normality and equal variances.

par(mfrow = c(1, 2)) 
qqnorm(g1$Strength, main = "Modified Group")
qqline(g1$Strength)
qqnorm(g2$Strength, main = "Unmodified Group")
qqline(g2$Strength)

H0_var <- var.test(ex1$Strength ~ ex1$Group,
                   ratio = 1,
                   alternative = "two.sided",
                   conf.level = 0.95)
print(H0_var)

## 
##  F test to compare two variances
## 
## data:  ex1$Strength by ex1$Group
## F = 1.6293, num df = 9, denom df = 9, p-value = 0.4785
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.4046845 6.5593806
## sample estimates:
## ratio of variances 
##           1.629257

Assuming the standard $\alpha = .05$, since .4785 > .05, we can conclude that the variances of scores for the two groups are not significantly different. Therefore, we have support for our equal variance assumption. For the t-test assuming equal variances, we have

H0_means <- t.test(ex1$Strength ~ ex1$Group,
                   alternative = "two.sided",
                   paired = FALSE,
                   var.equal = TRUE,
                   conf.level = 0.95)
print(H0_means)

## 
##  Two Sample t-test
## 
## data:  ex1$Strength by ex1$Group
## t = -2.1869, df = 18, p-value = 0.0422
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.54507339 -0.01092661
## sample estimates:
##   mean in group Modified mean in group Unmodified 
##                   16.764                   17.042

How do we interpret this output? At $\alpha = .05$, since .0422 < .05, we can reject the null hypothesis and conclude that the mean tension bond strengths between the two cement mortar formulations are significantly different. What recommendations would you make to the engineer about the results of the DOE?

Example 2: Power Analysis Independent Samples t-Test

To perform a power analysis in R, we will first install the prw package and load it into our current session.

install.packages("pwr", repos = 'http://cran.us.r-project.org')

## Installing package into 'C:/Users/rmilleti/Documents/R/win-library/3.1'
## (as 'lib' is unspecified)

## package 'pwr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\rmilleti\AppData\Local\Temp\1\RtmpSQBXfv\downloaded_packages

library(pwr)

## Warning: package 'pwr' was built under R version 3.1.3

To find the optimal number of subjects in each group, based on $\alpha$ = .05 and power = .80, we first need to specify the effect size and a pooled standard deviation. We are given information about the expected average difference between groups, ~ 10, so we can use this number as our effect. For the pooled standard deviation, we can average the standard deviations, $\sqrt{\frac{15^2 + 17^2}{2}} = 16.03$.

pwr.t.test(d = 10/16.03,
           power = .8,
           sig.level = .05,
           type = "two.sample",
           alternative = "two.sided")

## 
##      Two-sample t test power calculation 
## 
##               n = 41.31968
##               d = 0.6238303
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Based on our results, we can round the sample size and conclude that approximately 42 subjects would be needed per group. We can take the analysis a step further and see what would happen if we varied the effect sizes.

ptab <- cbind(NULL, NULL)       

 for (i in c(.2, .3, .4, .5, .6, .7, .8, .9, 1, 1.1, 1.2)){
   pwrt <- pwr.t.test(d = i,
                      power = .8,
                      sig.level = .05,
                      type = "two.sample",
                      alternative = "two.sided")
   
   ptab <- rbind(ptab, cbind(pwrt$d, pwrt$n))
 }

print(ptab)

##       [,1]      [,2]
##  [1,]  0.2 393.40570
##  [2,]  0.3 175.38467
##  [3,]  0.4  99.08032
##  [4,]  0.5  63.76561
##  [5,]  0.6  44.58579
##  [6,]  0.7  33.02458
##  [7,]  0.8  25.52457
##  [8,]  0.9  20.38633
##  [9,]  1.0  16.71473
## [10,]  1.1  14.00190
## [11,]  1.2  11.94226

plot(ptab[, 1],
     ptab[, 2],
     type = "b",
     xlab = "Effect Size",
     ylab ="Sample Size")

ptab <- cbind(NULL, NULL)       

 for (i in c(.2, .3, .4, .5, .6, .7, .8, .9, 1, 1.1, 1.2)){
   pwrt <- pwr.t.test(d = i,
                      power = .8,
                      sig.level = .05,
                      type = "two.sample",
                      alternative = "two.sided")
   
   ptab <- rbind(ptab, cbind(pwrt$d, pwrt$n))
 }

ptab

##       [,1]      [,2]
##  [1,]  0.2 393.40570
##  [2,]  0.3 175.38467
##  [3,]  0.4  99.08032
##  [4,]  0.5  63.76561
##  [5,]  0.6  44.58579
##  [6,]  0.7  33.02458
##  [7,]  0.8  25.52457
##  [8,]  0.9  20.38633
##  [9,]  1.0  16.71473
## [10,]  1.1  14.00190
## [11,]  1.2  11.94226

As effect sizes increase, we need a smaller sample to detect them (assuming power = .80). If we view the plot as a function of sample sizes instead of effect sizes…

pwrt<-pwr.t.test(d = .7,
                 n = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
                 sig.level = .05,
                 type = "two.sample",
                 alternative = "two.sided")

plot(pwrt$n,pwrt$power, 
     type = "b",
     xlab = "Sample size",
     ylab = "Power")

As expected, power increases with sample size.

Example 3: Cotton Tensile Strength

Let’s load and view the raw data. The variables are defined as follow:

Group = Factor variable with 5 levels
1. p15 = 15% of cotton combined with sythetic fibers
2. p20 = 20% of cotton combined with sythetic fibers
3. p25 = 25% of cotton combined with sythetic fibers
4. p30 = 30% of cotton combined with sythetic fibers
5. p35 = 35% of cotton combined with sythetic fibers
Strength = Tensile strength of cotton fibers

ex2 <- data.frame(Group = factor(c(rep('p15', 5), 
                                   rep('p20', 5),
                                   rep('p25', 5),
                                   rep('p30', 5),
                                   rep('p35', 5))), 
                  Strength = c(7, 7, 12, 11, 9,
                               12, 17, 13, 16, 17,
                               16, 17, 18, 15, 19,
                               19, 25, 22, 19, 23,
                               9, 10, 13, 15, 11)
                  )
print(ex2)

##    Group Strength
## 1    p15        7
## 2    p15        7
## 3    p15       12
## 4    p15       11
## 5    p15        9
## 6    p20       12
## 7    p20       17
## 8    p20       13
## 9    p20       16
## 10   p20       17
## 11   p25       16
## 12   p25       17
## 13   p25       18
## 14   p25       15
## 15   p25       19
## 16   p30       19
## 17   p30       25
## 18   p30       22
## 19   p30       19
## 20   p30       23
## 21   p35        9
## 22   p35       10
## 23   p35       13
## 24   p35       15
## 25   p35       11

First, we can examine the marginal distributions of the response variable for each group. These will give us an idea of the spread of the data and also help to identify univariate outliers.

boxplot(ex2$Strength ~ ex2$Group,          # formula to create boxplot: response ~ factor
        main = "Boxplot",                  # main = title of plot
        xlab = "Factor Levels",            # xlab = x-axis label
        ylab = "Tensile Strength")         # ylab = y-axis label

The boxplots reveal that tensile strength scores increase as the percentage of cotton combined with fibers increases, but then decreases at 35% cotton. Let’s proceed to test hypotheses of interest using the analysis of variance (ANOVA) model and $\alpha$ = .05.

model_ex2 <- aov(ex2$Strength ~ ex2$Group)
summary(model_ex2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## ex2$Group    4  463.4  115.86   22.45 3.69e-07 ***
## Residuals   20  103.2    5.16                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the results of the ANOVA model, since $p$ < .05, we can reject the null hypothesis of equal means and conclude that at least one pair of means is significantly different. Before we determine which means are significantly different, we should verify that the model assumptions are met. Proceeding to trust any results of a statistical model without verifying the assumptions can lead to biased estimates and inferences (e.g., Type I and II Errors).

We can visually and statistically check the normality assumption and the homogeneity of variance assumption. We will use a Q-Q plot to visually check for normality first.

qqnorm(model_ex2$residuals, 
       main = "Q-Q Plot of Model Residuals")
qqline(model_ex2$residuals)

Aside from the tail ends of the distribution, it appears that the residuals are normally distributed. A more formal way to test for normality is using a hypothesis test. One such test is the Shapiro-Wilk’s test of normality. In this case, the null hypothesis is that the variable has a normal distribution, whereas, the alternative hypothesis is the complement. We will use $\alpha$ = .05.

shapiro.test(model_ex2$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model_ex2$residuals
## W = 0.9233, p-value = 0.06099

The p-value here is not less than .05, therefore, we do not have evidence to reject the normality assumption. Notice, in some cases we do not want to reject the null hypothesis!

Now, to check the homogeneity of variance assumption. We will construct a scatterplot of the predicted values (x-axis) by the residual values (y-axis) and see if there is a constant spread of the data across predicted values. In the ideal case, the scatterplot will not show a discernable trend. That is, no fanning effect or weird patterns in the residuals.

plot(model_ex2$fitted.values, 
     model_ex2$residuals,
     xlab = "Fitted Values",
     ylab = "Residuals",
     main = "Scatterplot of Residuals by Fitted Values")

abline(h = 0,
       v = 0,
       col = "red",
       lty = 2,
       lwd = 2)

Based on this plot, the homogeneity of variance assumption seems reasonable. However, a more formal (statistical) test is always recommended. We will use Bartlett’s test for equal variances with the standard alpha.

bartlett.test(ex2$Strength ~ ex2$Group)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  ex2$Strength by ex2$Group
## Bartlett's K-squared = 0.9462, df = 4, p-value = 0.9178

Supporting our visual inspection, results of Barlett’s test do not reveal a concern for unequal variances. Overall, the assumptions of the ANOVA seem to be met so we can proceed with post hoc analyses. We will use Tukey’s Honestly Significant Difference (HSD) tests to conduct all possible pairwise comparisons. In general, with $k$ groups, there are $\frac{k(k-1)}{2}$ comparisons to be made.

mult_comp_ex2 <- TukeyHSD(model_ex2)
print(mult_comp_ex2)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = ex2$Strength ~ ex2$Group)
## 
## $`ex2$Group`
##          diff         lwr        upr     p adj
## p20-p15   5.8   1.5009668 10.0990332 0.0051840
## p25-p15   7.8   3.5009668 12.0990332 0.0002257
## p30-p15  12.4   8.1009668 16.6990332 0.0000003
## p35-p15   2.4  -1.8990332  6.6990332 0.4729848
## p25-p20   2.0  -2.2990332  6.2990332 0.6394031
## p30-p20   6.6   2.3009668 10.8990332 0.0014743
## p35-p20  -3.4  -7.6990332  0.8990332 0.1656435
## p30-p25   4.6   0.3009668  8.8990332 0.0323454
## p35-p25  -5.4  -9.6990332 -1.1009668 0.0096553
## p35-p30 -10.0 -14.2990332 -5.7009668 0.0000085

plot(mult_comp_ex2)

From the plot, we see that 7 out of 10 comparisons are significant. Let’s just keep the significant ones.

print(round(mult_comp_ex2$`ex2$Group`[mult_comp_ex2$`ex2$Group`[, 4] < .05, ], 7))

##          diff         lwr       upr     p adj
## p20-p15   5.8   1.5009668 10.099033 0.0051840
## p25-p15   7.8   3.5009668 12.099033 0.0002257
## p30-p15  12.4   8.1009668 16.699033 0.0000003
## p30-p20   6.6   2.3009668 10.899033 0.0014743
## p30-p25   4.6   0.3009668  8.899033 0.0323454
## p35-p25  -5.4  -9.6990332 -1.100967 0.0096553
## p35-p30 -10.0 -14.2990332 -5.700967 0.0000085

How do we interpret these effects?

Example 4: Effect of Vitamin C on Tooth Growth in Guinea Pigs

Let’s load and view the raw data. The variables are defined as follow:

supp = Factor variable with 2 levels
1. OJ = Orange juice
2. VC = Ascorbic acid
dose = Factor variable with 3 levels
1. Low = Low dose
2. Medium = Medium dose
3. High = High dose
len = Length of tooth

data(ToothGrowth)
print(ToothGrowth)

##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5
## 11 16.5   VC  1.0
## 12 16.5   VC  1.0
## 13 15.2   VC  1.0
## 14 17.3   VC  1.0
## 15 22.5   VC  1.0
## 16 17.3   VC  1.0
## 17 13.6   VC  1.0
## 18 14.5   VC  1.0
## 19 18.8   VC  1.0
## 20 15.5   VC  1.0
## 21 23.6   VC  2.0
## 22 18.5   VC  2.0
## 23 33.9   VC  2.0
## 24 25.5   VC  2.0
## 25 26.4   VC  2.0
## 26 32.5   VC  2.0
## 27 26.7   VC  2.0
## 28 21.5   VC  2.0
## 29 23.3   VC  2.0
## 30 29.5   VC  2.0
## 31 15.2   OJ  0.5
## 32 21.5   OJ  0.5
## 33 17.6   OJ  0.5
## 34  9.7   OJ  0.5
## 35 14.5   OJ  0.5
## 36 10.0   OJ  0.5
## 37  8.2   OJ  0.5
## 38  9.4   OJ  0.5
## 39 16.5   OJ  0.5
## 40  9.7   OJ  0.5
## 41 19.7   OJ  1.0
## 42 23.3   OJ  1.0
## 43 23.6   OJ  1.0
## 44 26.4   OJ  1.0
## 45 20.0   OJ  1.0
## 46 25.2   OJ  1.0
## 47 25.8   OJ  1.0
## 48 21.2   OJ  1.0
## 49 14.5   OJ  1.0
## 50 27.3   OJ  1.0
## 51 25.5   OJ  2.0
## 52 26.4   OJ  2.0
## 53 22.4   OJ  2.0
## 54 24.5   OJ  2.0
## 55 24.8   OJ  2.0
## 56 30.9   OJ  2.0
## 57 26.4   OJ  2.0
## 58 27.3   OJ  2.0
## 59 29.4   OJ  2.0
## 60 23.0   OJ  2.0

We need to do some basic preprocessing of the data so R understands that the dose variable is a factor with several levels instead of numerical.

ToothGrowth$dose = factor(ToothGrowth$dose, 
                          levels=c(0.5,1.0,2.0),
                          labels=c("low","med","high")
)
str(ToothGrowth)   # str is a function to examine the structure of a data frame

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "low","med","high": 1 1 1 1 1 1 1 1 1 1 ...

Proceeding as we have before, let’s start by examining a boxplot of the marginal distribution of the response for each group. We can use the attach function so we do not need to use the $ after the dataset name to reference the variable of interest.

attach(ToothGrowth)
boxplot(len ~ supp * dose, 
        data = ToothGrowth,
        ylab = "Tooth Length", 
        main = "Boxplots of Tooth Growth Data")

From these plots, we see that len increases as dose increases and that at each level of dose, the len for OJ is generally higher than the len for VC.

When there are multiple factors, it is helpful to visual the interaction between the factors in terms of the response of interest. In this example, we will examine the plot of dose by len at different levels of supp.

with(ToothGrowth, 
     interaction.plot(x.factor = dose, 
                      trace.factor = supp,
                      response = len, 
                      fun = mean, 
                      type = "b", 
                      legend = T,
                      ylab = "Tooth Length", 
                      main = "Interaction Plot",
                      pch = c(1,19)))

In terms of plots, when there is no interaction between factors, we expect to see parallel lines. In this case, the lines are not parallel. Therefore, we can test if this interaction is statistically significant. Before proceeding with hypothesis testing, we will use another plot to visualize the data.

coplot(len ~ dose | supp, 
       data = ToothGrowth, 
       panel = panel.smooth,
       xlab = "ToothGrowth data: length vs dose, given type of supplement")

These conditional plots show us the effect of len vs. dose, conditional on the type of supp. Here we see that the effect is more linaer in the VC group than the OJ group. How do we interpret this?

We can now run the factorial ANOVA, check model assumptions, and then proceed to conduct post hoc analyses. We will continue to use $\alpha$ = .05.

aov.out = aov(len ~ supp * dose, 
              data = ToothGrowth)
summary(aov.out)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## supp         1  205.4   205.4  15.572 0.000231 ***
## dose         2 2426.4  1213.2  92.000  < 2e-16 ***
## supp:dose    2  108.3    54.2   4.107 0.021860 *  
## Residuals   54  712.1    13.2                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on our output, we see that we have a significant main effect for supp, dose, and a significant interaction. In the case of significant main effects and significant interaction, the recommended strategy is to interpret the interaction effect. The reason is because the main effect assumes a constant effect between a factor and the response, conditional at values of the other factor(s). In the case of a significant interaction, the interaction is telling us that the effect of a factor and the response is not constant across levels of another factor.

There are several ways to follow-up a significant interaction effect. We could conduct complex contrasts if we had theoretical reason to do so. Here, we will stick with simple contrasts, namely, pairwise comparisons. First, let’s check the two assumptions from before. We will use the graphical and statistical tests for normality and graphical for homogeneity of variance. For normality,

qqnorm(aov.out$residuals,
      main = "Q-Q Plot of Model Residuals")
qqline(aov.out$residuals)

shapiro.test(aov.out$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  aov.out$residuals
## W = 0.985, p-value = 0.6694

What can we conclude based on normality?

plot(aov.out$fitted.values, 
     aov.out$residuals,
     xlab = "Fitted Values",
     ylab = "Residuals",
     main = "Scatterplot of Residuals by Fitted Values")

abline(h = 0,
       v = 0,
       col = "red",
       lty = 2,
       lwd = 2)

Does the scatterplot appear to have a pattern?

Let’s proceed to figure out where the mean differences are. Again, we will use Tukey’s HSD post hoc analyses. The first output below contains the grand mean, marginal means, and cell means. What means do the main effects compare? What about the interaction effect?

model.tables(aov.out, 
             type = "means", 
             se = FALSE)

## Tables of means
## Grand mean
##          
## 18.81333 
## 
##  supp 
## supp
##     OJ     VC 
## 20.663 16.963 
## 
##  dose 
## dose
##    low    med   high 
## 10.605 19.735 26.100 
## 
##  supp:dose 
##     dose
## supp low   med   high 
##   OJ 13.23 22.70 26.06
##   VC  7.98 16.77 26.14

TukeyHSD(aov.out)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = len ~ supp * dose, data = ToothGrowth)
## 
## $supp
##       diff       lwr       upr     p adj
## VC-OJ -3.7 -5.579828 -1.820172 0.0002312
## 
## $dose
##            diff       lwr       upr   p adj
## med-low   9.130  6.362488 11.897512 0.0e+00
## high-low 15.495 12.727488 18.262512 0.0e+00
## high-med  6.365  3.597488  9.132512 2.7e-06
## 
## $`supp:dose`
##                  diff        lwr        upr     p adj
## VC:low-OJ:low   -5.25 -10.048124 -0.4518762 0.0242521
## OJ:med-OJ:low    9.47   4.671876 14.2681238 0.0000046
## VC:med-OJ:low    3.54  -1.258124  8.3381238 0.2640208
## OJ:high-OJ:low  12.83   8.031876 17.6281238 0.0000000
## VC:high-OJ:low  12.91   8.111876 17.7081238 0.0000000
## OJ:med-VC:low   14.72   9.921876 19.5181238 0.0000000
## VC:med-VC:low    8.79   3.991876 13.5881238 0.0000210
## OJ:high-VC:low  18.08  13.281876 22.8781238 0.0000000
## VC:high-VC:low  18.16  13.361876 22.9581238 0.0000000
## VC:med-OJ:med   -5.93 -10.728124 -1.1318762 0.0073930
## OJ:high-OJ:med   3.36  -1.438124  8.1581238 0.3187361
## VC:high-OJ:med   3.44  -1.358124  8.2381238 0.2936430
## OJ:high-VC:med   9.29   4.491876 14.0881238 0.0000069
## VC:high-VC:med   9.37   4.571876 14.1681238 0.0000058
## VC:high-OJ:high  0.08  -4.718124  4.8781238 1.0000000

plot(TukeyHSD(aov.out))

How do we interpret this output?

Design of Experiments - Examples

Robert Milletich

2015-10-28

Statistical Background: Distributions

Statistical Background: Rejection Regions and Acceptance Regions in Hypothesis Testing

Example 2: Power Analysis Independent Samples t-Test

Example 3: Cotton Tensile Strength

Example 4: Effect of Vitamin C on Tooth Growth in Guinea Pigs