Replace “Your Name” with your actual name.

Instructions

In this lab assignment, you will practice interpreting interactions by visualizing them with different types of graphs. You’ll work with simulated datasets and explore interactions between categorical x categorical, linear x linear, and categorical x linear variables.

Please follow the instructions for each exercise, and use ggplot2 for all visualizations.

Exercise 1: Categorical x Categorical Interaction

Task: Use the dataset with two categorical variables and one outcome variable. Fit a model with a categorical x categorical interaction, and visualize the interaction using a bar graph with error bars.

  1. Use the dataset with two categorical variables: Education_Level (e.g., “High School”, “College”) and Job_Type (e.g., “Office”, “Field”), and an outcome variable Job_Satisfaction.
  2. Fit a linear model including the interaction between Education_Level and Job_Type on Job_Satisfaction.
  3. Create a bar graph with error bars to visualize the interaction.
  4. Interpret the interaction term and the graph.
  5. Run emmeans to compare all groups. First run emmeans and then pairs.
library(ggplot2)
library(dplyr)
library(interactions)
library(emmeans)
# Simulate data
set.seed(123)
Education_Level <- factor(rep(c("High School", "College"), each = 50))
Job_Type <- factor(rep(c("Office", "Field"), each = 25, times = 2))
Job_Satisfaction <- ifelse(Education_Level == "College", 
                           70 + 10 * (Job_Type == "Office"), 
                           60 + 5 * (Job_Type == "Field")) + rnorm(100, sd = 5)

data1 <- data.frame(Education_Level, Job_Type, Job_Satisfaction)

# Calculate means and standard errors with correct handling
means1 <- data1 %>%
  group_by(Education_Level, Job_Type) %>%
  summarise(
    Job_Satisfaction_Mean = mean(Job_Satisfaction),
    SE = (sd(Job_Satisfaction) / sqrt(n())),  # Calculate SE 
    lower = Job_Satisfaction_Mean - 1.96 * SE,  # Lower bound of the confidence interval
    upper = Job_Satisfaction_Mean + 1.96 * SE   # Upper bound of the confidence interval
  )

# Check the calculated values
print(means1)
## # A tibble: 4 × 6
## # Groups:   Education_Level [2]
##   Education_Level Job_Type Job_Satisfaction_Mean    SE lower upper
##   <fct>           <fct>                    <dbl> <dbl> <dbl> <dbl>
## 1 College         Field                     71.4 0.830  69.8  73.0
## 2 College         Office                    80.1 0.972  78.1  82.0
## 3 High School     Field                     65.5 0.919  63.7  67.3
## 4 High School     Office                    59.8 0.947  58.0  61.7
mod.1 <- lm(Job_Satisfaction ~ Job_Type*Education_Level, data=data1)
summary(mod.1)
## 
## Call:
## lm(formula = Job_Satisfaction ~ Job_Type * Education_Level, data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5970  -2.8385  -0.2066   3.0467  10.3341 
## 
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                71.4129     0.9187  77.735  < 2e-16
## Job_TypeOffice                              8.6383     1.2992   6.649 1.80e-09
## Education_LevelHigh School                 -5.9022     1.2992  -4.543 1.61e-05
## Job_TypeOffice:Education_LevelHigh School -14.3157     1.8373  -7.792 7.83e-12
##                                              
## (Intercept)                               ***
## Job_TypeOffice                            ***
## Education_LevelHigh School                ***
## Job_TypeOffice:Education_LevelHigh School ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.593 on 96 degrees of freedom
## Multiple R-squared:  0.7344, Adjusted R-squared:  0.7261 
## F-statistic: 88.47 on 3 and 96 DF,  p-value: < 2.2e-16

Interpret any significant Effects:
Intercept: when job type is field and education level is college, the predicted job satisf. is 71.41.

There is sig. main effect of job type between feild and office workers. Office satisf. is 8.64 higher then field work. (b = 8.64, se 1.30, p < 0.001)

There is sig. main effect of education level shown by college educated worker having 5.9 higher job satisf. over high school educated workers. (b = -5.90, se 1.30, p < 0.001)

ggplot(data1, aes(x = Job_Type, y = Job_Satisfaction, fill = Education_Level, color = Education_Level)) +
  stat_summary(fun.y = mean, position = position_dodge(), geom = "bar") +
  stat_summary(fun.data = mean_cl_normal, position = position_dodge(), geom = "errorbar") + theme_apa() + scale_fill_grey() + scale_color_grey()

Interpretation of Plot: There is a sig. interaction between job type and education level such that while overall college degree workers have higher job satisfaction, this difference is sig. larger in office job settings. (b = -14.3157, se = 1.8373, p < 0.001)

Below run emmeans to estimate the marginal means.

emmeans.model <- emmeans(mod.1, ~ Job_Type*Education_Level)

Below run pairs to perform pairwise comparisons (post hoc tests) with Tukey adjustment for multiple comparisons.

pairs(emmeans.model, adjust = "Tukey")
##  contrast                               estimate  SE df t.ratio p.value
##  Field College - Office College            -8.64 1.3 96  -6.649  <.0001
##  Field College - Field High School          5.90 1.3 96   4.543  0.0001
##  Field College - Office High School        11.58 1.3 96   8.913  <.0001
##  Office College - Field High School        14.54 1.3 96  11.192  <.0001
##  Office College - Office High School       20.22 1.3 96  15.562  <.0001
##  Field High School - Office High School     5.68 1.3 96   4.370  0.0002
## 
## P value adjustment: tukey method for comparing a family of 4 estimates

Exercise 2: Linear x Linear Interaction

Task: Use the dataset with two continuous variables and one outcome variable. Fit a model with a linear x linear interaction, and visualize the interaction using a 2D plot with the interactions library.

  1. Use the two continious predictors from the dataset: Age and Weekly_Hours_Worked, and an outcome variable Income.
  2. Fit a linear model including the interaction between Age and Weekly_Hours_Worked.
  3. Create a 2D plot using interact_plot() to visualize the interaction.
  4. Interpret the interaction term and the graph.
# Simulate data
set.seed(123)
Age <- rnorm(100, mean = 40, sd = 10)
Weekly_Hours_Worked <- rnorm(100, mean = 40, sd = 5)
Income <- 30000 + 500 * Age + 1000 * Weekly_Hours_Worked + 50 * Age * Weekly_Hours_Worked + rnorm(100, sd = 5000)

data2 <- data.frame(Age, Weekly_Hours_Worked, Income)
mod.2 <- lm(Income ~ Age*Weekly_Hours_Worked, data = data2)
summary(mod.2)
## 
## Call:
## lm(formula = Income ~ Age * Weekly_Hours_Worked, data = data2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9360  -3389   -543   2948  11583 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             56644.49   18039.76   3.140  0.00225 ** 
## Age                      -182.83     446.51  -0.409  0.68311    
## Weekly_Hours_Worked       397.91     461.04   0.863  0.39025    
## Age:Weekly_Hours_Worked    65.91      11.45   5.757 1.03e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4734 on 96 degrees of freedom
## Multiple R-squared:  0.9678, Adjusted R-squared:  0.9668 
## F-statistic: 962.5 on 3 and 96 DF,  p-value: < 2.2e-16

(Intercept) 56644.49 18039.76 3.140 0.00225 ** Age -182.83 446.51 -0.409 0.68311
Weekly_Hours_Worked 397.91 461.04 0.863 0.39025
Age:Weekly_Hours_Worked 65.91 11.45 5.757 1.03e-07 ***

Intercept: When Age & Weekly hours are 0m the predicted income is 56,644.

There is no significant main effect of age on income (b = -182.83, se = 446.51, p > 0.05).

There is no sig. main effect of weekly hours worker on income (b = 397.91, se = 461.04, p > 0.05).

interact_plot(mod.2, pred = Age, modx = Weekly_Hours_Worked, modx.values = "plus-minus", colors = c("black","black")) + theme_apa()

Interpretation of the plot: There is a sig. interaction between age and weekly hours worker on income such that the positive relationship between age and income is stronger for thos who work more hours. (b = 65.91, se = 11.45, p < 0.001)

Exercise 3: Categorical x Linear Interaction

Task: Use the simulated dataset with one categorical variable and one continuous variable as predictors. Fit a model with a categorical x linear interaction, and visualize the interaction using an interaction plot.

  1. Use the dataset with the two predictos: one categorical variable Gender and one continuous variable Study_Hours, and an outcome variable Test_Score.
  2. Fit a linear model including the interaction between Gender and Study_Hours.
  3. Create an interaction plot using ggplot2 to visualize the interaction.
  4. Interpret the interaction term and the graph.
# Simulate data
set.seed(123)
Gender <- factor(rep(c("Male", "Female"), each = 50))
Study_Hours <- rnorm(100, mean = 5, sd = 2)
Test_Score <- 60 + 10 * (Gender == "Female") + 5 * Study_Hours + 5 * (Gender == "Female") * Study_Hours + rnorm(100, sd = 5)

data3 <- data.frame(Gender, Study_Hours, Test_Score)
mod.3 <- lm(Test_Score ~ Gender*Study_Hours, data = data3)
summary(mod.3)
## 
## Call:
## lm(formula = Test_Score ~ Gender * Study_Hours, data = data3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.1191 -3.3752 -0.4846  3.0552 15.0753 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             72.3200     2.1265  34.009  < 2e-16 ***
## GenderMale             -13.9836     2.9233  -4.784 6.22e-06 ***
## Study_Hours              9.5983     0.3805  25.223  < 2e-16 ***
## GenderMale:Study_Hours  -4.5206     0.5323  -8.493 2.54e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.824 on 96 degrees of freedom
## Multiple R-squared:  0.9624, Adjusted R-squared:  0.9613 
## F-statistic: 820.2 on 3 and 96 DF,  p-value: < 2.2e-16

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 72.3200 2.1265 34.009 < 2e-16 GenderMale -13.9836 2.9233 -4.784 6.22e-06 Study_Hours 9.5983 0.3805 25.223 < 2e-16 GenderMale:Study_Hours -4.5206 0.5323 -8.493 2.54e-13 Interpretation for significant main effects: Intercept: When females have 0 study hours, the test score is predicted to be 72.32. GenderMale: In comparison to females, males are predicted to have 13.98 lower test score (b + -13.98, se = 2.92, p <0.001). Study_Hours: There is sig. main effect of study hours such that for each additional hour of studying, test scored increased by 9.60 (b = 9.60, se = 0.38, p < 0.001)

ggplot(data3, aes(x = Study_Hours, y = Test_Score, color = Gender)) + 
  geom_point() +
  geom_smooth(method = "lm" , se = FALSE, aes(linetype = Gender)) +
  theme_apa() +
  scale_color_manual(values = c("Female" = "black", "Male"="Blue"))
## `geom_smooth()` using formula = 'y ~ x'

Interpretation of graph: There is a signficant interaction of gender and study hours such that the positive realtion between study hours and test score is stronger for females (b = -4.5206, se = 0.5323)

Exercise 4: Visualizing Multivariate Interactions

Task: Given a multivariate dataset, create different types of graphs to visualize interactions and discuss which type of graph is most appropriate.

  1. Use the simulated multivariate dataset with both continuous and categorical variables.
  2. Create a graph for each type of interaction (categorical x categorical, categorical x continuous, continuous x continuous) . Use a interact_plot for the continuous x continuous interaction. Use any variables you’d like as long as they fit the variable type.
# Simulate data
set.seed(123)
Age <- rnorm(100, mean = 35, sd = 8)
Experience <- rnorm(100, mean = 10, sd = 5)
Gender <- factor(rep(c("Male", "Female"), each = 50))
Job_Type <- factor(rep(c("Office", "Field"), each = 25, times = 2))
Salary <- 30000 + 1000 * Age + 2000 * Experience + 150 * Age * Experience + 
  5000 * (Gender == "Female") + rnorm(100, sd = 5000)

data4 <- data.frame(Age, Experience, Gender, Job_Type, Salary)

head(data4)
##        Age Experience Gender Job_Type   Salary
## 1 30.51619   6.447967   Male   Office 113921.3
## 2 33.15858  11.284419   Male   Office 148415.8
## 3 47.46967   8.766541   Male   Office 156098.7
## 4 35.56407   8.262287   Male   Office 128880.7
## 5 36.03430   5.241907   Male   Office 102779.7
## 6 48.72052   9.774861   Male   Office 167324.5
tail(data4)
##          Age Experience Gender Job_Type    Salary
## 95  45.88522   3.445992 Female    Field 121507.64
## 96  30.19792  19.986067 Female    Field 196034.22
## 97  52.49866  13.003544 Female    Field 225240.32
## 98  47.26089   3.743643 Female    Field 109532.84
## 99  33.11440   6.944170 Female    Field 116600.46
## 100 26.78863   4.072600 Female    Field  92548.31
ggplot(data4, aes(x = Job_Type, y = Salary, fill = Gender, color = Gender)) +
  stat_summary(fun.y = mean, position = position_dodge(), geom = "bar") +
  stat_summary(fun.data = mean_cl_normal, position = position_dodge(), geom = "errorbar") + theme_apa() + scale_fill_grey() + scale_color_grey()

Graph interpretation: The graph is showing that women compared to men overall have a higher salary whether it is field or office work.

ggplot(data4, aes(x = Experience, y = Salary, color = Gender)) + 
  geom_point() +
  geom_smooth(method = "lm" , se = FALSE, aes(linetype = Gender)) +
  theme_apa() +
  scale_color_manual(values = c("Female" = "black", "Male"="Black"))

Graph interpretation: Overall women and men as they both gain experience, women make on average always more than men until late in their careers when it levels out and they make about the same.

# Continuous x Continuous plot
mod.4 <- lm (Salary ~Age*Experience, sata = data4)
interact_plot(mod.4, pred = Age, modx = Experience, modx.values = "plus-minus", colors = c("black","black")) + theme_apa() 

Graph interpretation: As age increases salary increases aswell.

Submission Instructions:

Ensure to knit your document to HTML format, checking that all content is correctly displayed before submission.