Analysis of variance with one continuous dependent variable and two categorical (factor) independent variables
library(readxl)
library(car)
## Loading required package: carData
library(agricolae)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::recode() masks car::recode()
## ✖ purrr::some() masks car::some()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sjstats)
library(ggpubr)
# read the dataset
rabbit <- read_excel("datasets.xlsx", sheet = "anova", range = "J14:O77")
str(rabbit)
## tibble [63 × 6] (S3: tbl_df/tbl/data.frame)
## $ sr. : num [1:63] 1 2 3 4 5 6 7 8 9 10 ...
## $ Treatment : chr [1:63] "Control" "Control" "Control" "Probiotic" ...
## $ Replication: chr [1:63] "R1" "R2" "R3" "R1" ...
## $ Day : num [1:63] 0 0 0 0 0 0 0 0 0 15 ...
## $ Weight : num [1:63] 10.5 11 11.8 12 11.5 10.7 10 12.5 11.3 12 ...
## $ Mortality : num [1:63] 1 1 1 0 1 0 2 0 1 1 ...
# Dependent variable: Weight
# Independent variable: Treatment, Replication, Day = as.factor()
rabbit$Replication = as.factor(rabbit$Replication)
rabbit$Treatment = as.factor(rabbit$Treatment)
rabbit$Day = as.factor(rabbit$Day)
str(rabbit)
## tibble [63 × 6] (S3: tbl_df/tbl/data.frame)
## $ sr. : num [1:63] 1 2 3 4 5 6 7 8 9 10 ...
## $ Treatment : Factor w/ 3 levels "Antibiotic","Control",..: 2 2 2 3 3 3 1 1 1 2 ...
## $ Replication: Factor w/ 3 levels "R1","R2","R3": 1 2 3 1 2 3 1 2 3 1 ...
## $ Day : Factor w/ 7 levels "0","15","30",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ Weight : num [1:63] 10.5 11 11.8 12 11.5 10.7 10 12.5 11.3 12 ...
## $ Mortality : num [1:63] 1 1 1 0 1 0 2 0 1 1 ...
# anova model: aov(dv ~ iv + iv + ..., data)
# 3 independent variables: 3 ways anova
mod = aov(Weight ~ Replication + Treatment + Day, data = rabbit)
anova(mod)
## Analysis of Variance Table
##
## Response: Weight
## Df Sum Sq Mean Sq F value Pr(>F)
## Replication 2 7.9 3.95 0.7489 0.4779
## Treatment 2 535.3 267.65 50.7622 5.963e-13 ***
## Day 6 7687.2 1281.21 242.9953 < 2.2e-16 ***
## Residuals 52 274.2 5.27
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this additive model, replications do not have any effects on the weight of the rabbits. So, local control was alright. Now, we will remove this replication from the model. However, we will add interaction term in the model.
# Interaction: Treatment + Day + Treatment:Day
# or Interaction: Treatment*Day
mod_int = aov(Weight ~ Treatment + Day + Treatment:Day, data = rabbit)
summary(mod_int)
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 2 535 267.6 243.35 < 2e-16 ***
## Day 6 7687 1281.2 1164.90 < 2e-16 ***
## Treatment:Day 12 236 19.7 17.87 8.87e-13 ***
## Residuals 42 46 1.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that Treatment (control, probiotic and antibiotic) and day intervals along with their interactions have significant effects on the rabbit weights. Let us see the interaction plot.
# interaction.plot(x.factor = rabbit$Day, trace.factor = rabbit$Treatment, response = rabbit$Weight)
# with(data, ............), does not need $ signs
with(rabbit, interaction.plot(Day, Treatment, Weight))
This figure shows that the effect of treatments increased with the increase in the day of observation. In the earlier days, the effects of the treatments were lower while in the later days the effects were larger. If there was no interaction effect, the lines would be parallel (not converging or diverging).
Assumptions of ANOVA: Normality of dependent variable. It is not usually checked. Because, when observations per group less than 10, insufficient observations to check normality. So, we can check the normality of the residuals.
boxplot(Weight ~ Day*Treatment, rabbit)
ggqqplot(rabbit, x = 'Weight', facet.by = c('Treatment', 'Day'))
We will now check the residuals (observed value - predicted value).
plot(mod_int, which = 2)
Now, check the equality of variances across the groups/factors. \(H_0: Variances~are~equal~for~all~factor~levels\)
car::leveneTest(Weight ~ Day*Treatment, rabbit)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 20 0.5276 0.9375
## 42
Equality of variance of residuals can also be checked by the first diagnostic plot.
plot(mod_int, which = 1)
CV of dependent variable = 46.31%
CV of model: quality of model fit = 4.15% (very low level of errors in the model)
\(\eta^2\) = variance explained by the factor
Post-hoc analysis to see which means are different
# Create a function to calculate the CV
cv = function (x) {sd(x)/mean(x)*100}
# Calculate cv of the dependent variable
cv(rabbit$Weight)
## [1] 46.30991
# Model cv
# Model standard error = Residuals standard error = Residual standard deviation = 1.048733
mod_int
## Call:
## aov(formula = Weight ~ Treatment + Day + Treatment:Day, data = rabbit)
##
## Terms:
## Treatment Day Treatment:Day Residuals
## Sum of Squares 535.292 7687.232 235.877 46.193
## Deg. of Freedom 2 6 12 42
##
## Residual standard error: 1.048733
## Estimated effects may be unbalanced
# model cv = 4.15%
1.048733/mean(rabbit$Weight)*100
## [1] 4.146751
Variance explained by different factors (\(\eta^2\)): Treatment explained 6.3% of the variance in the weight of the rabbit and Day explained most of the variance (90.4%).
anova_stats(mod_int)
## etasq | partial.etasq | omegasq | partial.omegasq | epsilonsq | cohens.f
## ------------------------------------------------------------------------
## 0.063 | 0.921 | 0.063 | 0.885 | 0.063 | 3.404
## 0.904 | 0.994 | 0.903 | 0.991 | 0.903 | 12.900
## 0.028 | 0.836 | 0.026 | 0.763 | 0.026 | 2.260
## | | | | |
##
## etasq | term | sumsq | df | meansq | statistic | p.value | power
## ------------------------------------------------------------------------------
## 0.063 | Treatment | 535.292 | 2 | 267.646 | 243.350 | < .001 | 1
## 0.904 | Day | 7687.232 | 6 | 1281.205 | 1164.900 | < .001 | 1
## 0.028 | Treatment:Day | 235.877 | 12 | 19.656 | 17.872 | < .001 | 1
## | Residuals | 46.193 | 42 | 1.100 | | |
Post hoc analysis with Tukey HSD
posthoc = HSD.test(mod_int, trt = c('Treatment', 'Day'), console = TRUE)
##
## Study: mod_int ~ c("Treatment", "Day")
##
## HSD Test for Weight
##
## Mean Square Error: 1.099841
##
## Treatment:Day, means
##
## Weight std r se Min Max Q25 Q50 Q75
## Antibiotic:0 11.26667 1.2503333 3 0.6054864 10.0 12.5 10.65 11.3 11.90
## Antibiotic:15 13.73333 0.6658328 3 0.6054864 13.0 14.3 13.45 13.9 14.10
## Antibiotic:30 18.96667 0.2516611 3 0.6054864 18.7 19.2 18.85 19.0 19.10
## Antibiotic:45 24.80000 0.4358899 3 0.6054864 24.3 25.1 24.65 25.0 25.05
## Antibiotic:60 30.90000 1.5716234 3 0.6054864 29.2 32.3 30.20 31.2 31.75
## Antibiotic:75 38.16667 1.6653328 3 0.6054864 36.3 39.5 37.50 38.7 39.10
## Antibiotic:90 47.56667 0.6027714 3 0.6054864 47.0 48.2 47.25 47.5 47.85
## Control:0 11.10000 0.6557439 3 0.6054864 10.5 11.8 10.75 11.0 11.40
## Control:15 12.80000 0.7549834 3 0.6054864 12.0 13.5 12.45 12.9 13.20
## Control:30 15.63333 0.4041452 3 0.6054864 15.2 16.0 15.45 15.7 15.85
## Control:45 20.13333 1.0503968 3 0.6054864 19.1 21.2 19.60 20.1 20.65
## Control:60 24.30000 0.8000000 3 0.6054864 23.5 25.1 23.90 24.3 24.70
## Control:75 28.86667 2.1548395 3 0.6054864 27.2 31.3 27.65 28.1 29.70
## Control:90 36.10000 0.9000000 3 0.6054864 35.2 37.0 35.65 36.1 36.55
## Probiotic:0 11.40000 0.6557439 3 0.6054864 10.7 12.0 11.10 11.5 11.75
## Probiotic:15 15.50000 1.1789826 3 0.6054864 14.2 16.5 15.00 15.8 16.15
## Probiotic:30 20.46667 0.8736895 3 0.6054864 19.5 21.2 20.10 20.7 20.95
## Probiotic:45 26.50000 0.6557439 3 0.6054864 25.9 27.2 26.15 26.4 26.80
## Probiotic:60 34.50000 1.1789826 3 0.6054864 33.5 35.8 33.85 34.2 35.00
## Probiotic:75 40.06667 1.3203535 3 0.6054864 38.9 41.5 39.35 39.8 40.65
## Probiotic:90 48.33333 0.8504901 3 0.6054864 47.5 49.2 47.90 48.3 48.75
##
## Alpha: 0.05 ; DF Error: 42
## Critical Value of Studentized Range: 5.382464
##
## Minimun Significant Difference: 3.259009
##
## Treatments with the same letter are not significantly different.
##
## Weight groups
## Probiotic:90 48.33333 a
## Antibiotic:90 47.56667 a
## Probiotic:75 40.06667 b
## Antibiotic:75 38.16667 bc
## Control:90 36.10000 cd
## Probiotic:60 34.50000 d
## Antibiotic:60 30.90000 e
## Control:75 28.86667 ef
## Probiotic:45 26.50000 fg
## Antibiotic:45 24.80000 g
## Control:60 24.30000 g
## Probiotic:30 20.46667 h
## Control:45 20.13333 h
## Antibiotic:30 18.96667 h
## Control:30 15.63333 i
## Probiotic:15 15.50000 i
## Antibiotic:15 13.73333 ij
## Control:15 12.80000 ij
## Probiotic:0 11.40000 j
## Antibiotic:0 11.26667 j
## Control:0 11.10000 j
group1 = posthoc$means[ , c(2:3)] %>% rownames_to_column('Combination')
group2 = posthoc$groups %>% rownames_to_column('Combination')
group = left_join(group1, group2, by = 'Combination')
plot_data = group %>% mutate(
ll = Weight - 1.96*(std/sqrt(r)),
ul = Weight + 1.96*(std/sqrt(r))
)
plot_data
## Combination std r Weight groups ll ul
## 1 Antibiotic:0 1.2503333 3 11.26667 j 9.851781 12.68155
## 2 Antibiotic:15 0.6658328 3 13.73333 ij 12.979873 14.48679
## 3 Antibiotic:30 0.2516611 3 18.96667 h 18.681885 19.25145
## 4 Antibiotic:45 0.4358899 3 24.80000 g 24.306744 25.29326
## 5 Antibiotic:60 1.5716234 3 30.90000 e 29.121541 32.67846
## 6 Antibiotic:75 1.6653328 3 38.16667 bc 36.282165 40.05117
## 7 Antibiotic:90 0.6027714 3 47.56667 a 46.884567 48.24877
## 8 Control:0 0.6557439 3 11.10000 j 10.357956 11.84204
## 9 Control:15 0.7549834 3 12.80000 ij 11.945656 13.65434
## 10 Control:30 0.4041452 3 15.63333 i 15.176000 16.09067
## 11 Control:45 1.0503968 3 20.13333 h 18.944698 21.32197
## 12 Control:60 0.8000000 3 24.30000 g 23.394715 25.20529
## 13 Control:75 2.1548395 3 28.86667 ef 26.428236 31.30510
## 14 Control:90 0.9000000 3 36.10000 cd 35.081554 37.11845
## 15 Probiotic:0 0.6557439 3 11.40000 j 10.657956 12.14204
## 16 Probiotic:15 1.1789826 3 15.50000 i 14.165856 16.83414
## 17 Probiotic:30 0.8736895 3 20.46667 h 19.477994 21.45534
## 18 Probiotic:45 0.6557439 3 26.50000 fg 25.757956 27.24204
## 19 Probiotic:60 1.1789826 3 34.50000 d 33.165856 35.83414
## 20 Probiotic:75 1.3203535 3 40.06667 b 38.572546 41.56079
## 21 Probiotic:90 0.8504901 3 48.33333 a 47.370913 49.29575
ggplot(plot_data) +
aes(x = Combination, y = Weight, fill = Combination) +
geom_col() +
geom_text(aes(label = round(Weight, 0), y = Weight - 5 )) +
geom_text(aes(label = groups, y = Weight + 5), color = 'blue') +
geom_errorbar(aes(ymin = ll, ymax = ul), width = 0.2, alpha = 0.7) +
theme_test() +
theme(legend.position = 'none') +
labs(x = 'Combination (Treatment:Day)', y = 'Average weight (kg)') +
theme(axis.text.x = element_text(angle = 90, vjust = 0.4, hjust = 1))
# ggsave('Anova.png', dpi = 600, height = 6, width = 8, units = 'in')