R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

## package 'kableExtra' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\fiona\AppData\Local\Temp\RtmpwV2X5E\downloaded_packages

Week4Practice

Question

Do the drugs help? YES

Results

There is a significant interaction between “Drug1” and “Drug2” (F(4,80) = 2.961). Both “Drug1” alone and “Drug2” alone also show significant effects (F(2,80) = 405.411, P < 0.001) (F(2,80) = 313.645, P < 0.001). All assumptions of ANOVA held, so no adjustment was made.

After performing a post-hoc Tukey test:

  • Considering “Drug1” “a”, the addition of “Drug2” “x” to “y” (p < 0.001), “y” to “z” (p < 0.001), and “x” to “z” (p < 0.001) are significant.
  • Considering “Drug1” “b”, the addition of “Drug2” “x” to “y” (p < 0.001), “y” to “z” (p < 0.001), and “x” to “z” (p < 0.001) are significant.
  • Considering “Drug1” “c”, the addition of “Drug2” “x” to “y” (p < 0.001), “y” to “z” (p < 0.001), and “x” to “z” (p < 0.001) are significant.

Code

Explore Data

ggplot(data, aes(x = Drug1, y = Rating, fill = Drug1)) +
  theme_bw() +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Dark2") +
  geom_boxplot() + 
  labs(title = "Does Drug1 impact Ratings")

ggplot(data, aes(x = Drug2, y = Rating, fill = Drug2)) +
  theme_bw() +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Dark2") +
  geom_boxplot() + 
  labs(title = "Does Drug2 impact Ratings")

Test Data

bartlett.test(Rating ~ Drug1, data = data)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Rating by Drug1
## Bartlett's K-squared = 1.0587, df = 2, p-value = 0.589
bartlett.test(Rating ~ Drug2, data = data)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Rating by Drug2
## Bartlett's K-squared = 0.14381, df = 2, p-value = 0.9306
shapiro.test(data$Rating)
## 
##  Shapiro-Wilk normality test
## 
## data:  data$Rating
## W = 0.9755, p-value = 0.09018
summary(aov(Rating ~ Drug1, data = data))
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Drug1        2   1550   774.9   48.48 7.97e-15 ***
## Residuals   86   1374    16.0                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(Rating ~ Drug2, data = data))
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## Drug2        2   1199   599.5   29.89 1.4e-10 ***
## Residuals   86   1725    20.1                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

No Blocking !

Factorial

summary(aov(Rating ~ Drug1 + Drug2, data = data))
##             Df Sum Sq Mean Sq F value Pr(>F)    
## Drug1        2 1549.7   774.9   370.8 <2e-16 ***
## Drug2        2 1198.9   599.5   286.9 <2e-16 ***
## Residuals   84  175.5     2.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(Rating ~ Drug1 + Drug2 + Drug1:Drug2, data = data))
##             Df Sum Sq Mean Sq F value Pr(>F)    
## Drug1        2 1549.7   774.9 405.411 <2e-16 ***
## Drug2        2 1198.9   599.5 313.645 <2e-16 ***
## Drug1:Drug2  4   22.6     5.7   2.961 0.0246 *  
## Residuals   80  152.9     1.9                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
data <- 
  data %>%
  mutate(cocktail = factor(paste(Drug1,Drug2)))

ggplot(data, aes(x = cocktail, y = Rating, fill = cocktail)) +
  theme_bw() +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Set1") +
  geom_boxplot() + 
  labs(title = "Does the cocktail impact Ratings")

summary(aov(Rating ~ Drug1 + Drug2 + Drug1:Drug2, data = data))
##             Df Sum Sq Mean Sq F value Pr(>F)    
## Drug1        2 1549.7   774.9 405.411 <2e-16 ***
## Drug2        2 1198.9   599.5 313.645 <2e-16 ***
## Drug1:Drug2  4   22.6     5.7   2.961 0.0246 *  
## Residuals   80  152.9     1.9                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(Rating ~ Drug1 + Drug2 + cocktail, data = data))
##             Df Sum Sq Mean Sq F value Pr(>F)    
## Drug1        2 1549.7   774.9 405.411 <2e-16 ***
## Drug2        2 1198.9   599.5 313.645 <2e-16 ***
## cocktail     4   22.6     5.7   2.961 0.0246 *  
## Residuals   80  152.9     1.9                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc test

TukeyHSD(aov(Rating ~ Drug1 + Drug2 + Drug1:Drug2, data = data))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Rating ~ Drug1 + Drug2 + Drug1:Drug2, data = data)
## 
## $Drug1
##           diff        lwr       upr p adj
## b-a  -4.512792  -5.365725 -3.659859     0
## c-a -10.310345 -11.177377 -9.443313     0
## c-b  -5.797553  -6.650486 -4.944620     0
## 
## $Drug2
##          diff       lwr       upr p adj
## y-x -4.189467 -5.057015 -3.321919     0
## z-x -9.005432 -9.866199 -8.144666     0
## z-y -4.815965 -5.661522 -3.970409     0
## 
## $`Drug1:Drug2`
##                  diff         lwr        upr     p adj
## b:x-a:x -2.922222e+00  -4.9474745  -0.896970 0.0005097
## c:x-a:x -9.666667e+00 -11.7445316  -7.588802 0.0000000
## a:y-a:x -2.922222e+00  -4.9474745  -0.896970 0.0005097
## b:y-a:x -8.722222e+00 -10.7474745  -6.696970 0.0000000
## c:y-a:x -1.342222e+01 -15.4474745 -11.396970 0.0000000
## a:z-a:x -8.022222e+00 -10.0474745  -5.996970 0.0000000
## b:z-a:x -1.276768e+01 -14.7488433 -10.786510 0.0000000
## c:z-a:x -1.872222e+01 -20.7474745 -16.696970 0.0000000
## c:x-b:x -6.744444e+00  -8.7696967  -4.719192 0.0000000
## a:y-b:x  7.105427e-15  -1.9712358   1.971236 1.0000000
## b:y-b:x -5.800000e+00  -7.7712358  -3.828764 0.0000000
## c:y-b:x -1.050000e+01 -12.4712358  -8.528764 0.0000000
## a:z-b:x -5.100000e+00  -7.0712358  -3.128764 0.0000000
## b:z-b:x -9.845455e+00 -11.7713685  -7.919541 0.0000000
## c:z-b:x -1.580000e+01 -17.7712358 -13.828764 0.0000000
## a:y-c:x  6.744444e+00   4.7191922   8.769697 0.0000000
## b:y-c:x  9.444444e-01  -1.0808078   2.969697 0.8583631
## c:y-c:x -3.755556e+00  -5.7808078  -1.730303 0.0000028
## a:z-c:x  1.644444e+00  -0.3808078   3.669697 0.2078478
## b:z-c:x -3.101010e+00  -5.0821766  -1.119844 0.0001161
## c:z-c:x -9.055556e+00 -11.0808078  -7.030303 0.0000000
## b:y-a:y -5.800000e+00  -7.7712358  -3.828764 0.0000000
## c:y-a:y -1.050000e+01 -12.4712358  -8.528764 0.0000000
## a:z-a:y -5.100000e+00  -7.0712358  -3.128764 0.0000000
## b:z-a:y -9.845455e+00 -11.7713685  -7.919541 0.0000000
## c:z-a:y -1.580000e+01 -17.7712358 -13.828764 0.0000000
## c:y-b:y -4.700000e+00  -6.6712358  -2.728764 0.0000000
## a:z-b:y  7.000000e-01  -1.2712358   2.671236 0.9674422
## b:z-b:y -4.045455e+00  -5.9713685  -2.119541 0.0000001
## c:z-b:y -1.000000e+01 -11.9712358  -8.028764 0.0000000
## a:z-c:y  5.400000e+00   3.4287642   7.371236 0.0000000
## b:z-c:y  6.545455e-01  -1.2713685   2.580459 0.9750171
## c:z-c:y -5.300000e+00  -7.2712358  -3.328764 0.0000000
## b:z-a:z -4.745455e+00  -6.6713685  -2.819541 0.0000000
## c:z-a:z -1.070000e+01 -12.6712358  -8.728764 0.0000000
## c:z-b:z -5.954545e+00  -7.8804594  -4.028631 0.0000000

Charting

data %>%
  group_by(Drug1, Drug2) %>%
  summarize(
    count = n(),
    mean = mean(Rating),
    sd = sd(Rating),
    .groups = "keep") %>%
  kable(
    caption = "Does the cocktail impact Ratings",
    col.names = c("Drug 1", "Drug 2", "Count", "Mean", "Standard Deviation"),
    digits = c(0,0,0,2,2)) %>%
  kable_styling() %>%
  collapse_rows(columns = c(1))
Does the cocktail impact Ratings
Drug 1 Drug 2 Count Mean Standard Deviation
a x 9 36.22 1.30
a y 10 33.30 1.34
a z 10 28.20 1.32
b x 10 33.30 1.77
b y 10 27.50 1.35
b z 11 23.45 1.21
c x 9 26.56 1.13
c y 10 22.80 1.55
c z 10 17.50 1.35

ANOVAExample2

Question

Do the main effects of “PG” and “Condition” impact “Valuation”? Yes

Results

1.There is evidence to suggest that there is difference in group means among Conditions(F(19.61,2), p<0.05) * No evidence of different variance * There is evidence of non-normality for Valuation (w = 0.988, p = 0.036)

2.There is evidence to suggest that there is difference in group means among Productor Gamble( t=-6.0463,df = 237.76, p<0.05) * No evidence of different variance * There is evidence of non-normality for Valuation (w = 0.982, p = 0.036)

Code

data <- read_excel("C:/Users/fiona/OneDrive/Desktop/Anly 510 Fall 2022/Assignments/Factorial ANOVA/ANOVAExample2.xlsx")
data <- 
  data %>%
  mutate(
    PG = factor(ProductorGamble),
    Condition = factor(Condition),
    Session = factor(Session),
    ProductorGamble = NULL)

Explore Data

ggplot(data, aes(x = PG, y = Valuation, fill = PG)) +
  theme_bw() +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Dark2") +
  geom_boxplot() + 
  labs(title = "Does PG impact Valuation")

ggplot(data, aes(x = Condition, y = Valuation, fill = Condition)) +
  theme_bw() +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Dark2") +
  geom_boxplot() + 
  labs(title = "Does Condition impact Valuation")

Test Data

bartlett.test(Valuation ~ PG, data = data)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Valuation by PG
## Bartlett's K-squared = 0.11719, df = 1, p-value = 0.7321
bartlett.test(Valuation ~ Condition, data = data)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Valuation by Condition
## Bartlett's K-squared = 0.57097, df = 2, p-value = 0.7516
bartlett.test(Valuation ~ Session, data = data)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Valuation by Session
## Bartlett's K-squared = 0.48295, df = 2, p-value = 0.7855
shapiro.test(data$Valuation)
## 
##  Shapiro-Wilk normality test
## 
## data:  data$Valuation
## W = 0.97406, p-value = 0.0002225
xtabs(~PG + Condition + Session, data = data)
## , , Session = 1
## 
##          Condition
## PG        Buyer Chooser Seller
##   Gamble     14      13     13
##   Product    13      14     13
## 
## , , Session = 2
## 
##          Condition
## PG        Buyer Chooser Seller
##   Gamble     13      14     13
##   Product    13      13     14
## 
## , , Session = 3
## 
##          Condition
## PG        Buyer Chooser Seller
##   Gamble     13      13     14
##   Product    14      13     13
summary(aov(Valuation ~ PG, data = data))
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## PG            1  429.3   429.3   36.56 5.69e-09 ***
## Residuals   238 2795.1    11.7                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(Valuation ~ Condition, data = data))
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Condition     2  457.9  228.95   19.61 1.31e-08 ***
## Residuals   237 2766.5   11.67                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Factorial + Blocking

summary(aov(Valuation ~ PG + Condition + Session, data = data))
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## PG            1  429.3   429.3  44.180 2.09e-10 ***
## Condition     2  457.9   229.0  23.560 4.76e-10 ***
## Session       2   63.1    31.6   3.249   0.0406 *  
## Residuals   234 2274.0     9.7                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(Valuation ~ PG*Condition*Session, data = data))
##                       Df Sum Sq Mean Sq F value   Pr(>F)    
## PG                     1  429.3   429.3  44.813 1.75e-10 ***
## Condition              2  457.9   229.0  23.898 3.98e-10 ***
## Session                2   63.1    31.6   3.295   0.0389 *  
## PG:Condition           2   55.2    27.6   2.882   0.0581 .  
## PG:Session             2   12.0     6.0   0.624   0.5366    
## Condition:Session      4   23.7     5.9   0.619   0.6492    
## PG:Condition:Session   4   56.2    14.0   1.467   0.2133    
## Residuals            222 2126.9     9.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Charting

data %>%
  group_by(PG) %>%
  summarize(
    count = n(),
    mean = mean(Valuation),
    sd = sd(Valuation)) %>%
  kable(
    caption = "Does the main effect of PG impact Valuation",
    col.names = c("PG", "Count", "Mean", "Standard Deviation"),
    digits = c(0,0,2,2)) %>%
  kable_styling() 
Does the main effect of PG impact Valuation
PG Count Mean Standard Deviation
Gamble 120 4.89 3.37
Product 120 7.57 3.48
data %>%
  group_by(Condition) %>%
  summarize(
    count = n(),
    mean = mean(Valuation),
    sd = sd(Valuation)) %>%
  kable(
    caption = "Does the main effect of Condition impact Valuation",
    col.names = c("Condition", "Count", "Mean", "Standard Deviation"),
    digits = c(0,0,2,2)) %>%
  kable_styling() 
Does the main effect of Condition impact Valuation
Condition Count Mean Standard Deviation
Buyer 80 4.40 3.30
Chooser 80 6.55 3.37
Seller 80 7.74 3.58

1. Why must we include interactions in factorial experiments?

An interaction occurs if the effect of one independent variable on the dependent variable (the main effect) changes based on the level of the other independent variable. Therefore, if we do not include interactions in factorial experiments, we might interpret the main effect incorrectly because in this case, we ignore the effect of the presence of the other independent variable on the main effect (which might be significant depending on the level of the other independent variable).

2. If we find a higher order interaction is significant but its lower order components are not should we include the lower order components?

In order to get a robust estimate, we must always include the lower order components if they are not significant. For e.g. If we have an interaction of type PQR, then we must include PQ, QR, P*R, P, Q, and R. The lower order components will be measuring an altogether different effect than what they were without an interaction. In case, we ignore the lower order components than we are missing an effect created by them on the output. In addition, these components may be insignificant but would help in evaluating the hypothesis.