Introduction

In this blog post, we will explore the analysis of variance (ANOVA) in the context of a synthetic dataset. ANOVA tests whether the means of two or more groups differ by more than we would expect from the variation within the groups alone.

Synthetic Dataset

Let’s generate a synthetic dataset for illustration purposes.

set.seed(123)  # for reproducibility
num_groups <- 3
group_sizes <- c(30, 25, 35)

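# Creating a synthetic categorical variable (e.g., treatment groups).
# rep()'s `each` argument takes a single value, so only the first group size (30)
# applies: every group gets 30 rows, 90 rows in total.
groups <- rep(1:num_groups, each = group_sizes[1])

# Generating synthetic values for a quantitative variable (e.g., dependent variable).
# rnorm() recycles `mean` row by row (10, 15, 20, 10, ...), not group by group,
# so each group is an equal mix of the three means and has an expected mean of 15.
values <- rnorm(sum(group_sizes), mean = c(10, 15, 20), sd = 3)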

# Creating a data frame
mydata <- data.frame(Group = factor(groups), DependentVar = values)
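
In the code above, `each` uses a single repeat count and `rnorm()` recycles `mean` across rows, so all three groups end up with 30 rows and the same expected mean. If the goal is groups of different sizes with different means, a variant could look like the sketch below. The names `group_means` and `unbalanced` are hypothetical and not part of the original post, and the results shown later in this post do not come from this variant.

```r
set.seed(123)  # for reproducibility

group_means <- c(10, 15, 20)   # assumed target mean for each group
group_sizes <- c(30, 25, 35)   # rows per group

# `times` accepts one count per element, so group sizes may differ
groups <- rep(seq_along(group_sizes), times = group_sizes)

# Index the mean vector by group so every row uses its own group's mean
values <- rnorm(length(groups), mean = group_means[groups], sd = 3)

unbalanced <- data.frame(Group = factor(groups), DependentVar = values)

table(unbalanced$Group)                                   # rows per group
tapply(unbalanced$DependentVar, unbalanced$Group, mean)   # sample mean per group
```

Indexing with `group_means[groups]` is what ties each row to its group; passing `group_means` directly would recycle it across rows instead.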

Analysis of Variance (ANOVA)

Now, let’s run ANOVA and interpret the results.

# Load required libraries
library(car)
## Loading required package: carData
# Run ANOVA
model <- lm(DependentVar ~ Group, data = mydata)
anova_result <- Anova(model, type = "II")

# Print the ANOVA table
anova_result
## Anova Table (Type II tests)
## 
## Response: DependentVar
##             Sum Sq Df F value Pr(>F)
## Group        7.167  2  0.1555 0.8562
## Residuals 2004.486 87

With F(2, 87) = 0.16 and p = 0.856, there is no evidence that the group means differ.
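
The F value and p-value reported above follow directly from the sums of squares and degrees of freedom in the same output, taking 7.167 on 2 degrees of freedom as the Group term and 2004.486 on 87 degrees of freedom as the residual term. The sketch below redoes that arithmetic in base R; the variable names are chosen here for illustration only.

```r
# Values copied from the ANOVA output above
ss_group <- 7.167      # sum of squares for Group
df_group <- 2          # degrees of freedom for Group
ss_resid <- 2004.486   # residual sum of squares
df_resid <- 87         # residual degrees of freedom

# Mean squares: sum of squares divided by degrees of freedom
ms_group <- ss_group / df_group
ms_resid <- ss_resid / df_resid

# F is the ratio of between-group to within-group mean squares
f_value <- ms_group / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_group, df_resid, lower.tail = FALSE)

round(c(MS_group = ms_group, MS_resid = ms_resid, F = f_value, p = p_value), 4)
```

An F value well below 1 means the group means vary less than the within-group noise would suggest, which is why the p-value is so large.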

Post Hoc Tests

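Post hoc tests for pairwise comparisons are normally run only when the ANOVA is significant. Here it is not (p = 0.856), so the comparisons below are shown for illustration and, as expected, find no differences between groups.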

# Load required libraries
library(agricolae)

# Run post hoc tests
posthoc_result <- LSD.test(model, "Group", console = TRUE)
## 
## Study: model ~ "Group"
## 
## LSD t Test for DependentVar 
## 
## Mean Square Error:  23.04007 
## 
## Group,  means and individual ( 95 %) CI
## 
##   DependentVar      std  r        se      LCL      UCL      Min      Max
## 1     14.85869 4.986279 30 0.8763574 13.11683 16.60054 8.124882 25.14519
## 2     15.53502 5.048204 30 0.8763574 13.79316 17.27687 6.203811 24.10581
## 3     15.07326 4.332767 30 0.8763574 13.33141 16.81512 6.944274 23.44642
##        Q25      Q50      Q75
## 1 11.20294 14.20481 17.90791
## 2 11.90035 14.84283 19.50612
## 3 12.05657 13.95511 17.53644
## 
## Alpha: 0.05 ; DF Error: 87
## Critical Value of t: 1.987608 
## 
## least Significant Difference: 2.463355 
## 
## Treatments with the same letter are not significantly different.
## 
##   DependentVar groups
## 2     15.53502      a
## 3     15.07326      a
## 1     14.85869      a
# Inspect the full result object (statistics, parameters, means, groups)
print(posthoc_result)
## $statistics
##    MSerror Df     Mean       CV  t.value      LSD
##   23.04007 87 15.15565 31.67139 1.987608 2.463355
## 
## $parameters
##         test p.ajusted name.t ntr alpha
##   Fisher-LSD      none  Group   3  0.05
## 
## $means
##   DependentVar      std  r        se      LCL      UCL      Min      Max
## 1     14.85869 4.986279 30 0.8763574 13.11683 16.60054 8.124882 25.14519
## 2     15.53502 5.048204 30 0.8763574 13.79316 17.27687 6.203811 24.10581
## 3     15.07326 4.332767 30 0.8763574 13.33141 16.81512 6.944274 23.44642
##        Q25      Q50      Q75
## 1 11.20294 14.20481 17.90791
## 2 11.90035 14.84283 19.50612
## 3 12.05657 13.95511 17.53644
## 
## $comparison
## NULL
## 
## $groups
##   DependentVar groups
## 2     15.53502      a
## 3     15.07326      a
## 1     14.85869      a
## 
## attr(,"class")
## [1] "group"
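
The critical t value, standard error, and least significant difference in the output above can be reproduced by hand from the mean square error, the error degrees of freedom, and the 30 observations per group. The sketch below assumes the equal-group-size form of the LSD, which matches the numbers reported here; the variable names are illustrative.

```r
# Values copied from the LSD.test output above
mse      <- 23.04007   # Mean Square Error
df_error <- 87         # DF Error
r        <- 30         # observations per group
alpha    <- 0.05

t_crit <- qt(1 - alpha / 2, df_error)   # two-sided critical t value
se     <- sqrt(mse / r)                 # standard error of one group mean
lsd    <- t_crit * sqrt(2 * mse / r)    # least significant difference

# Largest gap between two group means (group 2 vs. group 1)
max_diff <- 15.53502 - 14.85869

round(c(t_crit = t_crit, se = se, lsd = lsd, max_diff = max_diff), 4)
max_diff > lsd   # FALSE: no pair of means differs by more than the LSD
```

Because even the largest difference (about 0.68) is far below the LSD (about 2.46), all three groups receive the same letter.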

Visualizations

Let’s create visualizations to better understand our data.

Bar Chart

# Load required library
library(ggplot2)

# Bar chart
ggplot(mydata, aes(x = Group, y = DependentVar)) +
  geom_bar(stat = "summary", fun = "mean", fill = "skyblue") +
  labs(title = "Mean Values Across Groups", x = "Group", y = "Mean Dependent Variable")
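
The bar heights produced by `stat = "summary"` with `fun = "mean"` are simply the per-group means. As a cross-check, the sketch below computes those means, plus a standard error for each, in base R. It rebuilds the data frame with code equivalent to the generation step so that it runs on its own; `group_means`, `group_se`, and `bar_summary` are names introduced here for illustration.

```r
# Recreate the data frame so this block is self-contained
set.seed(123)
groups <- rep(1:3, each = 30)
values <- rnorm(90, mean = c(10, 15, 20), sd = 3)
mydata <- data.frame(Group = factor(groups), DependentVar = values)

# Per-group mean: the height of each bar
group_means <- tapply(mydata$DependentVar, mydata$Group, mean)

# Per-group standard error of the mean, based on each group's own spread
group_se <- tapply(mydata$DependentVar, mydata$Group,
                   function(x) sd(x) / sqrt(length(x)))

bar_summary <- data.frame(
  Group = names(group_means),
  mean  = as.vector(group_means),
  se    = as.vector(group_se)
)
bar_summary
```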

Box Plot

# Box plot
ggplot(mydata, aes(x = Group, y = DependentVar, fill = Group)) +
  geom_boxplot() +
  labs(title = "Box Plot of Dependent Variable Across Groups", x = "Group", y = "Dependent Variable")

Scatter Plot with Trend Line

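# Scatter plot with trend line.
# Group is a factor; as.numeric() converts it to its integer codes (1, 2, 3),
# so the fitted line treats the group labels as an evenly spaced numeric scale.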
ggplot(mydata, aes(x = as.numeric(Group), y = DependentVar)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Scatter Plot with Trend Line", x = "Group", y = "Dependent Variable")
## `geom_smooth()` using formula = 'y ~ x'
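
The scatter plot relies on `as.numeric()` applied to a factor, which returns the position of each level rather than the label itself. That is harmless when the labels are 1, 2, 3, as here, but it matters if the labels carry real numeric meaning. The sketch below uses a hypothetical factor `dose` to show the difference.

```r
# Hypothetical factor whose labels are meaningful numbers
dose <- factor(c(10, 20, 30))

codes  <- as.numeric(dose)                 # level positions
labels <- as.numeric(as.character(dose))   # the original numeric labels

codes    # 1 2 3
labels   # 10 20 30
```

Converting through `as.character()` first is the usual way to recover the original numbers when the x axis should reflect them.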