title: “Weekly Lab Homework Assignment: Interaction Effects” author: “Mario Rizzardi” date: “5/6/2025” output: html_document: toc: true toc_depth: 3 toc_float: true theme: flatly highlight: pygments —
Replace “Your Name” with your actual name.
In this lab assignment, you will practice interpreting interactions by visualizing them with different types of graphs. You’ll work with simulated datasets and explore interactions between categorical x categorical, linear x linear, and categorical x linear variables.
Please follow the instructions for each exercise, and use
ggplot2
for all visualizations.
Task: Use the dataset with two categorical variables and one outcome variable. Fit a model with a categorical x categorical interaction, and visualize the interaction using a bar graph with error bars.
Education_Level
(e.g., “High School”, “College”) and
Job_Type
(e.g., “Office”, “Field”), and an outcome variable
Job_Satisfaction
.Education_Level
and Job_Type
on
Job_Satisfaction
.emmeans
to compare all groups. First run
emmeans
and then pairs
.library(ggplot2)
library(dplyr)
library(interactions)
library(emmeans)
# Simulate data
set.seed(123)
Education_Level <- factor(rep(c("High School", "College"), each = 50))
Job_Type <- factor(rep(c("Office", "Field"), each = 25, times = 2))
Job_Satisfaction <- ifelse(Education_Level == "College",
70 + 10 * (Job_Type == "Office"),
60 + 5 * (Job_Type == "Field")) + rnorm(100, sd = 5)
data1 <- data.frame(Education_Level, Job_Type, Job_Satisfaction)
# Calculate means and standard errors with correct handling
means1 <- data1 %>%
group_by(Education_Level, Job_Type) %>%
summarise(
Job_Satisfaction_Mean = mean(Job_Satisfaction),
SE = (sd(Job_Satisfaction) / sqrt(n())), # Calculate SE
lower = Job_Satisfaction_Mean - 1.96 * SE, # Lower bound of the confidence interval
upper = Job_Satisfaction_Mean + 1.96 * SE # Upper bound of the confidence interval
)
# Check the calculated values
print(means1)
## # A tibble: 4 Ă— 6
## # Groups: Education_Level [2]
## Education_Level Job_Type Job_Satisfaction_Mean SE lower upper
## <fct> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 College Field 71.4 0.830 69.8 73.0
## 2 College Office 80.1 0.972 78.1 82.0
## 3 High School Field 65.5 0.919 63.7 67.3
## 4 High School Office 59.8 0.947 58.0 61.7
#Run your lm() model and summary() below using the data1 dataset
# Dummy Coding:
# High school is coded as 1, college is coded as 0
# Office is coded as 1, Field is coded as 0.
#Run your summary() below
Interpret any significant Effects:
# Plot the bar graph of means with error bars.
Interpretation of Plot:
Below run emmeans
to estimate the marginal means.
Below run pairs
to perform pairwise comparisons (post
hoc tests) with Tukey adjustment for multiple comparisons.
Task: Use the dataset with two continuous variables
and one outcome variable. Fit a model with a linear x linear
interaction, and visualize the interaction using a 2D plot with the
interactions
library.
Age
and Weekly_Hours_Worked
, and an outcome variable
Income
.Age
and Weekly_Hours_Worked
.interact_plot()
to visualize the
interaction.# Simulate data
set.seed(123)
Age <- rnorm(100, mean = 40, sd = 10)
Weekly_Hours_Worked <- rnorm(100, mean = 40, sd = 5)
Income <- 30000 + 500 * Age + 1000 * Weekly_Hours_Worked + 50 * Age * Weekly_Hours_Worked + rnorm(100, sd = 5000)
data2 <- data.frame(Age, Weekly_Hours_Worked, Income)
# Fit the lm() model
# summary() of the model
Interpretation of the plot:
# Simulated data: linear x linear interaction
set.seed(123)
dataset2 <- data.frame(
Stress = rnorm(100, mean = 50, sd = 10),
Sleep = rnorm(100, mean = 7, sd = 1.5),
Performance = NA
)
dataset2$Performance <- 70 + 0.5 * dataset2$Sleep - 0.4 * dataset2$Stress + 0.05 * dataset2$Sleep * dataset2$Stress + rnorm(100)
# Fit the model with interaction
model2 <- lm(Performance ~ Sleep * Stress, data = dataset2)
summary(model2)
##
## Call:
## lm(formula = Performance ~ Sleep * Stress, data = dataset2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8719 -0.6777 -0.1086 0.5897 2.3166
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.157254 2.596623 28.559 < 2e-16 ***
## Sleep -0.007462 0.382275 -0.020 0.984
## Stress -0.483531 0.051592 -9.372 3.32e-15 ***
## Sleep:Stress 0.060607 0.007633 7.940 3.80e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9468 on 96 degrees of freedom
## Multiple R-squared: 0.9587, Adjusted R-squared: 0.9574
## F-statistic: 742.8 on 3 and 96 DF, p-value: < 2.2e-16
# Visualization using ggplot2
library(ggplot2)
ggplot(dataset2, aes(x = Sleep, y = Performance, color = Stress)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x * Stress, se = FALSE) +
labs(title = "Interaction Between Sleep and Stress on Performance")
## Warning: Failed to fit group -1.
## Caused by error:
## ! object 'Stress' not found
Interpretation: The interaction term indicates that the effect of sleep on performance depends on the level of stress. If the interaction term is significant and positive, higher stress increases the benefit of sleep on performance.
# Use emmeans to probe interaction
library(emmeans)
dataset2$Stress_group <- cut(dataset2$Stress, breaks = quantile(dataset2$Stress, probs = c(0, 0.33, 0.66, 1)), labels = c("Low", "Medium", "High"))
model2b <- lm(Performance ~ Sleep * Stress_group, data = dataset2)
emmeans(model2b, ~ Sleep | Stress_group)
## Stress_group = Low:
## Sleep emmean SE df lower.CL upper.CL
## 6.84 71.3 0.188 93 70.9 71.6
##
## Stress_group = Medium:
## Sleep emmean SE df lower.CL upper.CL
## 6.84 70.6 0.191 93 70.2 71.0
##
## Stress_group = High:
## Sleep emmean SE df lower.CL upper.CL
## 6.84 69.9 0.185 93 69.5 70.2
##
## Confidence level used: 0.95
Task: Use the simulated dataset with one categorical variable and one continuous variable as predictors. Fit a model with a categorical x linear interaction, and visualize the interaction using an interaction plot.
Gender
and one continuous variable
Study_Hours
, and an outcome variable
Test_Score
.Gender
and Study_Hours
.ggplot2
to visualize
the interaction.# Simulate data
set.seed(123)
Gender <- factor(rep(c("Male", "Female"), each = 50))
Study_Hours <- rnorm(100, mean = 5, sd = 2)
Test_Score <- 60 + 10 * (Gender == "Female") + 5 * Study_Hours + 5 * (Gender == "Female") * Study_Hours + rnorm(100, sd = 5)
data3 <- data.frame(Gender, Study_Hours, Test_Score)
# Fit the model
#Dummy coding: Female coded as 0, Male coded as 1
# Summary of the model
Interpretation for significant main effects:
# Plot the interaction
Interpretation of graph:
# Simulated data: categorical x linear interaction
set.seed(456)
dataset3 <- data.frame(
Treatment = rep(c("A", "B"), each = 50),
Dosage = rep(seq(1, 10, length.out = 50), 2),
Response = NA
)
dataset3$Response <- 5 + 2 * (dataset3$Treatment == "B") + 1.5 * dataset3$Dosage -
0.8 * (dataset3$Treatment == "B") * dataset3$Dosage + rnorm(100)
# Fit model
model3 <- lm(Response ~ Treatment * Dosage, data = dataset3)
summary(model3)
##
## Call:
## lm(formula = Response ~ Treatment * Dosage, data = dataset3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.30401 -0.64065 -0.03181 0.68451 2.11604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.22392 0.33092 15.786 < 2e-16 ***
## TreatmentB 1.77801 0.46800 3.799 0.000255 ***
## Dosage 1.48608 0.05420 27.418 < 2e-16 ***
## TreatmentB:Dosage -0.76939 0.07665 -10.037 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.016 on 96 degrees of freedom
## Multiple R-squared: 0.9178, Adjusted R-squared: 0.9153
## F-statistic: 357.5 on 3 and 96 DF, p-value: < 2.2e-16
# Visualization
ggplot(dataset3, aes(x = Dosage, y = Response, color = Treatment)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Interaction Between Treatment and Dosage on Response")
## `geom_smooth()` using formula = 'y ~ x'
Interpretation: The interaction shows that the slope of dosage on response differs by treatment type. For example, if Treatment A shows a strong positive dosage effect and Treatment B a weaker one, the interaction term is significant.
# Estimated marginal means for dosage
library(emmeans)
dataset3$Dosage_group <- cut(dataset3$Dosage, breaks = 3)
model3b <- lm(Response ~ Treatment * Dosage_group, data = dataset3)
emmeans(model3b, ~ Treatment * Dosage_group)
## Treatment Dosage_group emmean SE df lower.CL upper.CL
## A (0.991,4] 9.02 0.368 94 8.30 9.75
## B (0.991,4] 9.02 0.368 94 8.29 9.75
## A (4,7] 13.15 0.379 94 12.40 13.90
## B (4,7] 10.58 0.379 94 9.83 11.34
## A (7,10] 18.00 0.368 94 17.27 18.73
## B (7,10] 13.21 0.368 94 12.48 13.94
##
## Confidence level used: 0.95
Task: Given a multivariate dataset, create different types of graphs to visualize interactions and discuss which type of graph is most appropriate.
interact_plot
for the continuous x continuous interaction.
Use any variables you’d like as long as they fit the variable type.# Simulate data
set.seed(123)
Age <- rnorm(100, mean = 35, sd = 8)
Experience <- rnorm(100, mean = 10, sd = 5)
Gender <- factor(rep(c("Male", "Female"), each = 50))
Job_Type <- factor(rep(c("Office", "Field"), each = 25, times = 2))
Salary <- 30000 + 1000 * Age + 2000 * Experience + 150 * Age * Experience +
5000 * (Gender == "Female") + rnorm(100, sd = 5000)
data4 <- data.frame(Age, Experience, Gender, Job_Type, Salary)
head(data4)
## Age Experience Gender Job_Type Salary
## 1 30.51619 6.447967 Male Office 113921.3
## 2 33.15858 11.284419 Male Office 148415.8
## 3 47.46967 8.766541 Male Office 156098.7
## 4 35.56407 8.262287 Male Office 128880.7
## 5 36.03430 5.241907 Male Office 102779.7
## 6 48.72052 9.774861 Male Office 167324.5
tail(data4)
## Age Experience Gender Job_Type Salary
## 95 45.88522 3.445992 Female Field 121507.64
## 96 30.19792 19.986067 Female Field 196034.22
## 97 52.49866 13.003544 Female Field 225240.32
## 98 47.26089 3.743643 Female Field 109532.84
## 99 33.11440 6.944170 Female Field 116600.46
## 100 26.78863 4.072600 Female Field 92548.31
# Categorical x Categorical Interaction (Bar graph with means and SE)
Graph interpretation:
# Categorical x Continuous plot
Graph interpretation:
# Continuous x Continuous plot
Graph interpretation:
Submission Instructions:
Ensure to knit your document to HTML format, checking that all content is correctly displayed before submission.
# Your own dataset or simulation to demonstrate a three-way interaction
set.seed(789)
dataset4 <- data.frame(
Gender = rep(c("Male", "Female"), each = 100),
Experience = rep(rnorm(100, 5, 2), 2),
Department = rep(c("HR", "Tech"), 100),
Salary = NA
)
dataset4$Salary <- 30000 + 2000 * (dataset4$Gender == "Female") +
1500 * dataset4$Experience +
1000 * (dataset4$Department == "Tech") +
500 * (dataset4$Gender == "Female") * dataset4$Experience +
400 * (dataset4$Gender == "Female") * (dataset4$Department == "Tech") +
rnorm(200, 0, 2000)
# Fit model with 3-way interaction
model4 <- lm(Salary ~ Gender * Experience * Department, data = dataset4)
summary(model4)
##
## Call:
## lm(formula = Salary ~ Gender * Experience * Department, data = dataset4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5659.0 -1327.8 200.8 1192.8 6149.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30903.89 716.03 43.160 < 2e-16 ***
## GenderMale -802.42 1012.62 -0.792 0.42909
## Experience 2060.37 139.89 14.728 < 2e-16 ***
## DepartmentTech 2707.06 1102.39 2.456 0.01495 *
## GenderMale:Experience -646.17 197.83 -3.266 0.00129 **
## GenderMale:DepartmentTech -1523.10 1559.01 -0.977 0.32982
## Experience:DepartmentTech -16.04 207.66 -0.077 0.93851
## GenderMale:Experience:DepartmentTech 49.43 293.68 0.168 0.86651
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1997 on 192 degrees of freedom
## Multiple R-squared: 0.8327, Adjusted R-squared: 0.8266
## F-statistic: 136.5 on 7 and 192 DF, p-value: < 2.2e-16
# Visualization using interaction plot
ggplot(dataset4, aes(x = Experience, y = Salary, color = Department, linetype = Gender)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Three-Way Interaction: Gender x Experience x Department on Salary")
## `geom_smooth()` using formula = 'y ~ x'
Interpretation: A significant 3-way interaction indicates the effect of experience on salary differs based on both gender and department. For example, women in Tech might see greater salary gains per year of experience compared to other groups.
# Use emmeans for 3-way interaction probing
library(emmeans)
dataset4$Experience_group <- cut(dataset4$Experience, breaks = 3)
model4b <- lm(Salary ~ Gender * Department * Experience_group, data = dataset4)
emmeans(model4b, ~ Gender * Department * Experience_group)
## Gender Department Experience_group emmean SE df lower.CL upper.CL
## Female HR (-1.19,2.45] 34915 1090 188 32774 37056
## Male HR (-1.19,2.45] 32199 1090 188 30058 34340
## Female Tech (-1.19,2.45] 36512 1530 188 33484 39540
## Male Tech (-1.19,2.45] 32855 1530 188 29827 35883
## Female HR (2.45,6.07] 39367 477 188 38425 40309
## Male HR (2.45,6.07] 35978 477 188 35036 36920
## Female Tech (2.45,6.07] 42536 494 188 41562 43510
## Male Tech (2.45,6.07] 37637 494 188 36663 38611
## Female HR (6.07,9.71] 46147 737 188 44692 47602
## Male HR (6.07,9.71] 40704 737 188 39250 42159
## Female Tech (6.07,9.71] 47949 627 188 46713 49186
## Male Tech (6.07,9.71] 41468 627 188 40232 42705
##
## Confidence level used: 0.95
Graph interpretation:
Submission Instructions:
Ensure to knit your document to HTML format, checking that all content is correctly displayed before submission.