ch12-lab-2.knit

title: “Weekly Lab Homework Assignment: Interaction Effects” author: “Mario Rizzardi” date: “5/6/2025” output: html_document: toc: true toc_depth: 3 toc_float: true theme: flatly highlight: pygments —

Instructions

In this lab assignment, you will practice interpreting interactions by visualizing them with different types of graphs. You’ll work with simulated datasets and explore interactions between categorical x categorical, linear x linear, and categorical x linear variables.

Please follow the instructions for each exercise, and use ggplot2 for all visualizations.

Exercise 1: Categorical x Categorical Interaction

Task: Use the dataset with two categorical variables and one outcome variable. Fit a model with a categorical x categorical interaction, and visualize the interaction using a bar graph with error bars.

Use the dataset with two categorical variables: Education_Level (e.g., “High School”, “College”) and Job_Type (e.g., “Office”, “Field”), and an outcome variable Job_Satisfaction.
Fit a linear model including the interaction between Education_Level and Job_Type on Job_Satisfaction.
Create a bar graph with error bars to visualize the interaction.
Interpret the interaction term and the graph.
Run emmeans to compare all groups. First run emmeans and then pairs.

library(ggplot2)
library(dplyr)
library(interactions)
library(emmeans)

# Simulate data
set.seed(123)
Education_Level <- factor(rep(c("High School", "College"), each = 50))
Job_Type <- factor(rep(c("Office", "Field"), each = 25, times = 2))
Job_Satisfaction <- ifelse(Education_Level == "College", 
                           70 + 10 * (Job_Type == "Office"), 
                           60 + 5 * (Job_Type == "Field")) + rnorm(100, sd = 5)

data1 <- data.frame(Education_Level, Job_Type, Job_Satisfaction)

# Calculate means and standard errors with correct handling
means1 <- data1 %>%
  group_by(Education_Level, Job_Type) %>%
  summarise(
    Job_Satisfaction_Mean = mean(Job_Satisfaction),
    SE = (sd(Job_Satisfaction) / sqrt(n())),  # Calculate SE 
    lower = Job_Satisfaction_Mean - 1.96 * SE,  # Lower bound of the confidence interval
    upper = Job_Satisfaction_Mean + 1.96 * SE   # Upper bound of the confidence interval
  )

# Check the calculated values
print(means1)

## # A tibble: 4 × 6
## # Groups:   Education_Level [2]
##   Education_Level Job_Type Job_Satisfaction_Mean    SE lower upper
##   <fct>           <fct>                    <dbl> <dbl> <dbl> <dbl>
## 1 College         Field                     71.4 0.830  69.8  73.0
## 2 College         Office                    80.1 0.972  78.1  82.0
## 3 High School     Field                     65.5 0.919  63.7  67.3
## 4 High School     Office                    59.8 0.947  58.0  61.7

#Run your lm() model and summary() below using the data1 dataset
# Dummy Coding:
# High school is coded as 1, college is coded as 0
# Office is coded as 1, Field is coded as 0.

#Run your summary() below

Interpret any significant Effects:

# Plot the bar graph of means with error bars.

Interpretation of Plot:

Below run emmeans to estimate the marginal means.

Below run pairs to perform pairwise comparisons (post hoc tests) with Tukey adjustment for multiple comparisons.

Exercise 2: Linear x Linear Interaction

Task: Use the dataset with two continuous variables and one outcome variable. Fit a model with a linear x linear interaction, and visualize the interaction using a 2D plot with the interactions library.

Use the two continious predictors from the dataset: Age and Weekly_Hours_Worked, and an outcome variable Income.
Fit a linear model including the interaction between Age and Weekly_Hours_Worked.
Create a 2D plot using interact_plot() to visualize the interaction.
Interpret the interaction term and the graph.

# Simulate data
set.seed(123)
Age <- rnorm(100, mean = 40, sd = 10)
Weekly_Hours_Worked <- rnorm(100, mean = 40, sd = 5)
Income <- 30000 + 500 * Age + 1000 * Weekly_Hours_Worked + 50 * Age * Weekly_Hours_Worked + rnorm(100, sd = 5000)

data2 <- data.frame(Age, Weekly_Hours_Worked, Income)

# Fit the lm() model

# summary() of the model

Interpretation of the plot:

# Simulated data: linear x linear interaction
set.seed(123)
dataset2 <- data.frame(
  Stress = rnorm(100, mean = 50, sd = 10),
  Sleep = rnorm(100, mean = 7, sd = 1.5),
  Performance = NA
)
dataset2$Performance <- 70 + 0.5 * dataset2$Sleep - 0.4 * dataset2$Stress + 0.05 * dataset2$Sleep * dataset2$Stress + rnorm(100)

# Fit the model with interaction
model2 <- lm(Performance ~ Sleep * Stress, data = dataset2)
summary(model2)

## 
## Call:
## lm(formula = Performance ~ Sleep * Stress, data = dataset2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8719 -0.6777 -0.1086  0.5897  2.3166 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  74.157254   2.596623  28.559  < 2e-16 ***
## Sleep        -0.007462   0.382275  -0.020    0.984    
## Stress       -0.483531   0.051592  -9.372 3.32e-15 ***
## Sleep:Stress  0.060607   0.007633   7.940 3.80e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9468 on 96 degrees of freedom
## Multiple R-squared:  0.9587, Adjusted R-squared:  0.9574 
## F-statistic: 742.8 on 3 and 96 DF,  p-value: < 2.2e-16

# Visualization using ggplot2
library(ggplot2)
ggplot(dataset2, aes(x = Sleep, y = Performance, color = Stress)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x * Stress, se = FALSE) +
  labs(title = "Interaction Between Sleep and Stress on Performance")

## Warning: Failed to fit group -1.
## Caused by error:
## ! object 'Stress' not found

Interpretation: The interaction term indicates that the effect of sleep on performance depends on the level of stress. If the interaction term is significant and positive, higher stress increases the benefit of sleep on performance.

# Use emmeans to probe interaction
library(emmeans)
dataset2$Stress_group <- cut(dataset2$Stress, breaks = quantile(dataset2$Stress, probs = c(0, 0.33, 0.66, 1)), labels = c("Low", "Medium", "High"))
model2b <- lm(Performance ~ Sleep * Stress_group, data = dataset2)
emmeans(model2b, ~ Sleep | Stress_group)

## Stress_group = Low:
##  Sleep emmean    SE df lower.CL upper.CL
##   6.84   71.3 0.188 93     70.9     71.6
## 
## Stress_group = Medium:
##  Sleep emmean    SE df lower.CL upper.CL
##   6.84   70.6 0.191 93     70.2     71.0
## 
## Stress_group = High:
##  Sleep emmean    SE df lower.CL upper.CL
##   6.84   69.9 0.185 93     69.5     70.2
## 
## Confidence level used: 0.95

Exercise 3: Categorical x Linear Interaction

Task: Use the simulated dataset with one categorical variable and one continuous variable as predictors. Fit a model with a categorical x linear interaction, and visualize the interaction using an interaction plot.

Use the dataset with the two predictos: one categorical variable Gender and one continuous variable Study_Hours, and an outcome variable Test_Score.
Fit a linear model including the interaction between Gender and Study_Hours.
Create an interaction plot using ggplot2 to visualize the interaction.
Interpret the interaction term and the graph.

# Simulate data
set.seed(123)
Gender <- factor(rep(c("Male", "Female"), each = 50))
Study_Hours <- rnorm(100, mean = 5, sd = 2)
Test_Score <- 60 + 10 * (Gender == "Female") + 5 * Study_Hours + 5 * (Gender == "Female") * Study_Hours + rnorm(100, sd = 5)

data3 <- data.frame(Gender, Study_Hours, Test_Score)

# Fit the model
#Dummy coding: Female coded as 0, Male coded as 1


# Summary of the model

Interpretation for significant main effects:

# Plot the interaction

Interpretation of graph:

# Simulated data: categorical x linear interaction
set.seed(456)
dataset3 <- data.frame(
  Treatment = rep(c("A", "B"), each = 50),
  Dosage = rep(seq(1, 10, length.out = 50), 2),
  Response = NA
)
dataset3$Response <- 5 + 2 * (dataset3$Treatment == "B") + 1.5 * dataset3$Dosage - 
  0.8 * (dataset3$Treatment == "B") * dataset3$Dosage + rnorm(100)

# Fit model
model3 <- lm(Response ~ Treatment * Dosage, data = dataset3)
summary(model3)

## 
## Call:
## lm(formula = Response ~ Treatment * Dosage, data = dataset3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.30401 -0.64065 -0.03181  0.68451  2.11604 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        5.22392    0.33092  15.786  < 2e-16 ***
## TreatmentB         1.77801    0.46800   3.799 0.000255 ***
## Dosage             1.48608    0.05420  27.418  < 2e-16 ***
## TreatmentB:Dosage -0.76939    0.07665 -10.037  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.016 on 96 degrees of freedom
## Multiple R-squared:  0.9178, Adjusted R-squared:  0.9153 
## F-statistic: 357.5 on 3 and 96 DF,  p-value: < 2.2e-16

# Visualization
ggplot(dataset3, aes(x = Dosage, y = Response, color = Treatment)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Interaction Between Treatment and Dosage on Response")

## `geom_smooth()` using formula = 'y ~ x'

Interpretation: The interaction shows that the slope of dosage on response differs by treatment type. For example, if Treatment A shows a strong positive dosage effect and Treatment B a weaker one, the interaction term is significant.

# Estimated marginal means for dosage
library(emmeans)
dataset3$Dosage_group <- cut(dataset3$Dosage, breaks = 3)
model3b <- lm(Response ~ Treatment * Dosage_group, data = dataset3)
emmeans(model3b, ~ Treatment * Dosage_group)

##  Treatment Dosage_group emmean    SE df lower.CL upper.CL
##  A         (0.991,4]      9.02 0.368 94     8.30     9.75
##  B         (0.991,4]      9.02 0.368 94     8.29     9.75
##  A         (4,7]         13.15 0.379 94    12.40    13.90
##  B         (4,7]         10.58 0.379 94     9.83    11.34
##  A         (7,10]        18.00 0.368 94    17.27    18.73
##  B         (7,10]        13.21 0.368 94    12.48    13.94
## 
## Confidence level used: 0.95

Exercise 4: Visualizing Multivariate Interactions

Task: Given a multivariate dataset, create different types of graphs to visualize interactions and discuss which type of graph is most appropriate.

Use the simulated multivariate dataset with both continuous and categorical variables.
Create a graph for each type of interaction (categorical x categorical, categorical x continuous, continuous x continuous) . Use a interact_plot for the continuous x continuous interaction. Use any variables you’d like as long as they fit the variable type.

# Simulate data
set.seed(123)
Age <- rnorm(100, mean = 35, sd = 8)
Experience <- rnorm(100, mean = 10, sd = 5)
Gender <- factor(rep(c("Male", "Female"), each = 50))
Job_Type <- factor(rep(c("Office", "Field"), each = 25, times = 2))
Salary <- 30000 + 1000 * Age + 2000 * Experience + 150 * Age * Experience + 
  5000 * (Gender == "Female") + rnorm(100, sd = 5000)

data4 <- data.frame(Age, Experience, Gender, Job_Type, Salary)

head(data4)

##        Age Experience Gender Job_Type   Salary
## 1 30.51619   6.447967   Male   Office 113921.3
## 2 33.15858  11.284419   Male   Office 148415.8
## 3 47.46967   8.766541   Male   Office 156098.7
## 4 35.56407   8.262287   Male   Office 128880.7
## 5 36.03430   5.241907   Male   Office 102779.7
## 6 48.72052   9.774861   Male   Office 167324.5

tail(data4)

##          Age Experience Gender Job_Type    Salary
## 95  45.88522   3.445992 Female    Field 121507.64
## 96  30.19792  19.986067 Female    Field 196034.22
## 97  52.49866  13.003544 Female    Field 225240.32
## 98  47.26089   3.743643 Female    Field 109532.84
## 99  33.11440   6.944170 Female    Field 116600.46
## 100 26.78863   4.072600 Female    Field  92548.31

# Categorical x Categorical Interaction (Bar graph with means and SE)

Graph interpretation:

# Categorical x Continuous plot

Graph interpretation:

# Continuous x Continuous plot

Graph interpretation:

Submission Instructions:

Ensure to knit your document to HTML format, checking that all content is correctly displayed before submission.

# Your own dataset or simulation to demonstrate a three-way interaction
set.seed(789)
dataset4 <- data.frame(
  Gender = rep(c("Male", "Female"), each = 100),
  Experience = rep(rnorm(100, 5, 2), 2),
  Department = rep(c("HR", "Tech"), 100),
  Salary = NA
)
dataset4$Salary <- 30000 + 2000 * (dataset4$Gender == "Female") +
  1500 * dataset4$Experience + 
  1000 * (dataset4$Department == "Tech") +
  500 * (dataset4$Gender == "Female") * dataset4$Experience +
  400 * (dataset4$Gender == "Female") * (dataset4$Department == "Tech") +
  rnorm(200, 0, 2000)

# Fit model with 3-way interaction
model4 <- lm(Salary ~ Gender * Experience * Department, data = dataset4)
summary(model4)

## 
## Call:
## lm(formula = Salary ~ Gender * Experience * Department, data = dataset4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5659.0 -1327.8   200.8  1192.8  6149.8 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                          30903.89     716.03  43.160  < 2e-16 ***
## GenderMale                            -802.42    1012.62  -0.792  0.42909    
## Experience                            2060.37     139.89  14.728  < 2e-16 ***
## DepartmentTech                        2707.06    1102.39   2.456  0.01495 *  
## GenderMale:Experience                 -646.17     197.83  -3.266  0.00129 ** 
## GenderMale:DepartmentTech            -1523.10    1559.01  -0.977  0.32982    
## Experience:DepartmentTech              -16.04     207.66  -0.077  0.93851    
## GenderMale:Experience:DepartmentTech    49.43     293.68   0.168  0.86651    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1997 on 192 degrees of freedom
## Multiple R-squared:  0.8327, Adjusted R-squared:  0.8266 
## F-statistic: 136.5 on 7 and 192 DF,  p-value: < 2.2e-16

# Visualization using interaction plot
ggplot(dataset4, aes(x = Experience, y = Salary, color = Department, linetype = Gender)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Three-Way Interaction: Gender x Experience x Department on Salary")

## `geom_smooth()` using formula = 'y ~ x'

Interpretation: A significant 3-way interaction indicates the effect of experience on salary differs based on both gender and department. For example, women in Tech might see greater salary gains per year of experience compared to other groups.

# Use emmeans for 3-way interaction probing
library(emmeans)
dataset4$Experience_group <- cut(dataset4$Experience, breaks = 3)
model4b <- lm(Salary ~ Gender * Department * Experience_group, data = dataset4)
emmeans(model4b, ~ Gender * Department * Experience_group)

##  Gender Department Experience_group emmean   SE  df lower.CL upper.CL
##  Female HR         (-1.19,2.45]      34915 1090 188    32774    37056
##  Male   HR         (-1.19,2.45]      32199 1090 188    30058    34340
##  Female Tech       (-1.19,2.45]      36512 1530 188    33484    39540
##  Male   Tech       (-1.19,2.45]      32855 1530 188    29827    35883
##  Female HR         (2.45,6.07]       39367  477 188    38425    40309
##  Male   HR         (2.45,6.07]       35978  477 188    35036    36920
##  Female Tech       (2.45,6.07]       42536  494 188    41562    43510
##  Male   Tech       (2.45,6.07]       37637  494 188    36663    38611
##  Female HR         (6.07,9.71]       46147  737 188    44692    47602
##  Male   HR         (6.07,9.71]       40704  737 188    39250    42159
##  Female Tech       (6.07,9.71]       47949  627 188    46713    49186
##  Male   Tech       (6.07,9.71]       41468  627 188    40232    42705
## 
## Confidence level used: 0.95

Graph interpretation:

Submission Instructions:

Ensure to knit your document to HTML format, checking that all content is correctly displayed before submission.