Replace “Your Name” with your actual name.
In this lab assignment, you will practice interpreting interactions by visualizing them with different types of graphs. You’ll work with simulated datasets and explore interactions between categorical x categorical, linear x linear, and categorical x linear variables.
Please follow the instructions for each exercise, and use
ggplot2
for all visualizations.
Task: Use the dataset with two categorical variables and one outcome variable. Fit a model with a categorical x categorical interaction, and visualize the interaction using a bar graph with error bars.
Education_Level
(e.g., “High School”, “College”) and
Job_Type
(e.g., “Office”, “Field”), and an outcome variable
Job_Satisfaction
.Education_Level
and Job_Type
on
Job_Satisfaction
.emmeans
to compare all groups. First run
emmeans
and then pairs
.# Simulate data
set.seed(123)
Education_Level <- factor(rep(c("High School", "College"), each = 50))
Job_Type <- factor(rep(c("Office", "Field"), each = 25, times = 2))
Job_Satisfaction <- ifelse(Education_Level == "College",
70 + 10 * (Job_Type == "Office"),
60 + 5 * (Job_Type == "Field")) + rnorm(100, sd = 5)
data1 <- data.frame(Education_Level, Job_Type, Job_Satisfaction)
# Calculate means and standard errors with correct handling
means1 <- data1 %>%
group_by(Education_Level, Job_Type) %>%
summarise(
Job_Satisfaction_Mean = mean(Job_Satisfaction),
SE = (sd(Job_Satisfaction) / sqrt(n())), # Calculate SE
lower = Job_Satisfaction_Mean - 1.96 * SE, # Lower bound of the confidence interval
upper = Job_Satisfaction_Mean + 1.96 * SE # Upper bound of the confidence interval
)
# Check the calculated values
print(means1)
## # A tibble: 4 × 6
## # Groups: Education_Level [2]
## Education_Level Job_Type Job_Satisfaction_Mean SE lower upper
## <fct> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 College Field 71.4 0.830 69.8 73.0
## 2 College Office 80.1 0.972 78.1 82.0
## 3 High School Field 65.5 0.919 63.7 67.3
## 4 High School Office 59.8 0.947 58.0 61.7
#Run your lm() model and summary() below using the data1 dataset
# Dummy Coding:
# High school is coded as 1, college is coded as 0
# Office is coded as 1, Field is coded as 0.
#Run your summary() below
mod.1 <- lm(Job_Satisfaction ~ Job_Type*Education_Level, data = data1)
summary(mod.1)
##
## Call:
## lm(formula = Job_Satisfaction ~ Job_Type * Education_Level, data = data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.5970 -2.8385 -0.2066 3.0467 10.3341
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.4129 0.9187 77.735 < 2e-16
## Job_TypeOffice 8.6383 1.2992 6.649 1.80e-09
## Education_LevelHigh School -5.9022 1.2992 -4.543 1.61e-05
## Job_TypeOffice:Education_LevelHigh School -14.3157 1.8373 -7.792 7.83e-12
##
## (Intercept) ***
## Job_TypeOffice ***
## Education_LevelHigh School ***
## Job_TypeOffice:Education_LevelHigh School ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.593 on 96 degrees of freedom
## Multiple R-squared: 0.7344, Adjusted R-squared: 0.7261
## F-statistic: 88.47 on 3 and 96 DF, p-value: < 2.2e-16
Interpret any significant Effects:
Intercept: When job type is field and education level is college, the
predicted job satisfaction is 71.41.
There is a significant main effect of job type such that in comparison to workers on the workers on the field, workers in the office have a 8.64 higher job satisfaction (b = 8.64, se = 1.30, p < 0.001).
There is a significant main effect of education level such that in comparison to college educated workers, high school educated workers have 5.9 lower job satisfaction. (b = -5.90, se = 1.3, p < 0.001).
# Plot the bar graph of means with error bars.
ggplot(data1, aes(x = Job_Type, y = Job_Satisfaction, fill = Education_Level, color = Education_Level)) +
stat_summary(fun.y = mean, position = position_dodge(), geom = "bar") +
stat_summary(fun.data = mean_cl_normal, position = position_dodge(), geom = "errorbar") +
theme_apa() + scale_fill_grey() + scale_color_grey()
Interpretation of Plot: There is a significant interaction between job type and education level, such that, while overall college degree workers have higher job satisfaction, this difference is significantly larger in office job setting (b = -14.3157, se = 1.8373, p < 00.1).
Below run emmeans
to estimate the marginal means.
Below run pairs
to perform pairwise comparisons (post
hoc tests) with Tukey adjustment for multiple comparisons.
## contrast estimate SE df t.ratio p.value
## Field College - Office College -8.64 1.3 96 -6.649 <.0001
## Field College - Field High School 5.90 1.3 96 4.543 0.0001
## Field College - Office High School 11.58 1.3 96 8.913 <.0001
## Office College - Field High School 14.54 1.3 96 11.192 <.0001
## Office College - Office High School 20.22 1.3 96 15.562 <.0001
## Field High School - Office High School 5.68 1.3 96 4.370 0.0002
##
## P value adjustment: tukey method for comparing a family of 4 estimates
Task: Use the dataset with two continuous variables
and one outcome variable. Fit a model with a linear x linear
interaction, and visualize the interaction using a 2D plot with the
interactions
library.
Age
and Weekly_Hours_Worked
, and an outcome variable
Income
.Age
and Weekly_Hours_Worked
.interact_plot()
to visualize the
interaction.# Simulate data
set.seed(123)
Age <- rnorm(100, mean = 40, sd = 10)
Weekly_Hours_Worked <- rnorm(100, mean = 40, sd = 5)
Income <- 30000 + 500 * Age + 1000 * Weekly_Hours_Worked + 50 * Age * Weekly_Hours_Worked + rnorm(100, sd = 5000)
data2 <- data.frame(Age, Weekly_Hours_Worked, Income)
# Fit the lm() model
mod.2 <- lm(Income ~ Age*Weekly_Hours_Worked, data = data2)
# summary() of the model
summary(mod.2)
##
## Call:
## lm(formula = Income ~ Age * Weekly_Hours_Worked, data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9360 -3389 -543 2948 11583
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56644.49 18039.76 3.140 0.00225 **
## Age -182.83 446.51 -0.409 0.68311
## Weekly_Hours_Worked 397.91 461.04 0.863 0.39025
## Age:Weekly_Hours_Worked 65.91 11.45 5.757 1.03e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4734 on 96 degrees of freedom
## Multiple R-squared: 0.9678, Adjusted R-squared: 0.9668
## F-statistic: 962.5 on 3 and 96 DF, p-value: < 2.2e-16
interact_plot(mod.2, pred = Age, modx = Weekly_Hours_Worked, modx.values = "plus-minus", colors = c("black", "black")) +
theme_apa()
Interpretation of the plot: There is a significant interaction between age and weekly hours worked on income such that the positive relationship between age and income is stronger for those who work more hours (b = 65.91, se = 11.45, p < 0.001).
Task: Use the simulated dataset with one categorical variable and one continuous variable as predictors. Fit a model with a categorical x linear interaction, and visualize the interaction using an interaction plot.
Gender
and one continuous variable
Study_Hours
, and an outcome variable
Test_Score
.Gender
and Study_Hours
.ggplot2
to visualize
the interaction.# Simulate data
set.seed(123)
Gender <- factor(rep(c("Male", "Female"), each = 50))
Study_Hours <- rnorm(100, mean = 5, sd = 2)
Test_Score <- 60 + 10 * (Gender == "Female") + 5 * Study_Hours + 5 * (Gender == "Female") * Study_Hours + rnorm(100, sd = 5)
data3 <- data.frame(Gender, Study_Hours, Test_Score)
# Fit the model
#Dummy coding: Female coded as 0, Male coded as 1
mod.3 <- lm(Test_Score ~ Gender*Study_Hours, data = data3)
summary(mod.3)
##
## Call:
## lm(formula = Test_Score ~ Gender * Study_Hours, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.1191 -3.3752 -0.4846 3.0552 15.0753
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.3200 2.1265 34.009 < 2e-16 ***
## GenderMale -13.9836 2.9233 -4.784 6.22e-06 ***
## Study_Hours 9.5983 0.3805 25.223 < 2e-16 ***
## GenderMale:Study_Hours -4.5206 0.5323 -8.493 2.54e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.824 on 96 degrees of freedom
## Multiple R-squared: 0.9624, Adjusted R-squared: 0.9613
## F-statistic: 820.2 on 3 and 96 DF, p-value: < 2.2e-16
Interpretation for significant main effects: Intercept: When female have 0 study hours the test score is predicte to be 72.32. GenderMale: In comparison to females, males are predicted to have a 13.98 lower test score (b = -13.98, se = 2.92, p < 0.001). Study_Hours: There is a significant main effect of study hours such that each additional hour of studying, test scored increase by 9.60 (b = 9.60, se = 0.38, p < 0.001).
# Plot the interaction
ggplot(data3, aes(x = Study_Hours, y = Test_Score, color = Gender)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, aes(linetype = Gender)) +
theme_apa() +
scale_color_manual(values = c("Female" = "black", "Male" = "black"))
## `geom_smooth()` using formula = 'y ~ x'
Interpretation of graph: There is a significant interaction of gener and study hours such that the positive relation between study hours and test score is stronger for females (b = -4.5206, se = 0.5323, p )
Task: Given a multivariate dataset, create different types of graphs to visualize interactions and discuss which type of graph is most appropriate.
interact_plot
for the continuous x continuous interaction.
Use any variables you’d like as long as they fit the variable type.# Simulate data
set.seed(123)
Age <- rnorm(100, mean = 35, sd = 8)
Experience <- rnorm(100, mean = 10, sd = 5)
Gender <- factor(rep(c("Male", "Female"), each = 50))
Job_Type <- factor(rep(c("Office", "Field"), each = 25, times = 2))
Salary <- 30000 + 1000 * Age + 2000 * Experience + 150 * Age * Experience +
5000 * (Gender == "Female") + rnorm(100, sd = 5000)
data4 <- data.frame(Age, Experience, Gender, Job_Type, Salary)
head(data4)
## Age Experience Gender Job_Type Salary
## 1 30.51619 6.447967 Male Office 113921.3
## 2 33.15858 11.284419 Male Office 148415.8
## 3 47.46967 8.766541 Male Office 156098.7
## 4 35.56407 8.262287 Male Office 128880.7
## 5 36.03430 5.241907 Male Office 102779.7
## 6 48.72052 9.774861 Male Office 167324.5
## Age Experience Gender Job_Type Salary
## 95 45.88522 3.445992 Female Field 121507.64
## 96 30.19792 19.986067 Female Field 196034.22
## 97 52.49866 13.003544 Female Field 225240.32
## 98 47.26089 3.743643 Female Field 109532.84
## 99 33.11440 6.944170 Female Field 116600.46
## 100 26.78863 4.072600 Female Field 92548.31
# Categorical x Categorical Interaction (Bar graph with means and SE)
ggplot(data4, aes(x = Job_Type, y = Salary, fill = Gender, color = Gender)) +
stat_summary(fun.y = mean, position = position_dodge(), geom = "bar") +
stat_summary(fun.data = mean_cl_normal, position = position_dodge(), geom = "errorbar") +
theme_apa() + scale_fill_grey() + scale_color_grey()
Graph interpretation:
# Categorical x Continuous plot
ggplot(data4, aes(x = Experience, y = Salary, color = Gender)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, aes(linetype = Gender)) +
theme_apa() +
scale_color_manual(values = c("Female" = "black", "Male" = "black"))
Graph interpretation:
# Continuous x Continuous plot
mod.4 <- lm(Salary ~ Age*Experience, data =data4)
interact_plot(mod.4, pred = Age, modx = Experience, modx.values = "plus-minus", colors = c("black", "black")) +
theme_apa()
Graph interpretation:
Submission Instructions:
Ensure to knit your document to HTML format, checking that all content is correctly displayed before submission.