simpleslopes_example

Simple slopes example

Get example data and packages:

library(dplyr) # I always load this


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(rempsyc)

Suggested APA citation: Thériault, R. (2023). rempsyc: Convenience functions for psychology. 
Journal of Open Source Software, 8(87), 5466. https://doi.org/10.21105/joss.05466

head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The tutorial I am using, says we need to standardize our variables. We scale them, which means we subtract each variable by its mean and then divide by the standard deviation. This puts all the variables on the same scale, so like a -5 value in cylinders is 5 “points” below its mean in the same way a -5 in weight is 5 “points” below its mean.

mtcars2 <- lapply(mtcars, scale) %>% as.data.frame()

I guess simple slopes are often used in the context of “moderation” analyses, which is the same as an interaction effect. But I’ve never done it this way before! We are looking at whether there is an interaction between weight and gear on mpg, or we can say that we are looking to see if weight is a moderator on the effect of gear on mpg.

It also prints out a fancy table!

moderations <- nice_mod(
  data = mtcars2,
  response = "mpg",
  predictor = "gear",
  moderator = "wt"
)

moderations

  Dependent Variable Predictor df           B          t            p
1                mpg      gear 28 -0.08718042 -0.7982999 4.314156e-01
2                mpg        wt 28 -0.94959988 -8.6037724 2.383144e-09
3                mpg   gear:wt 28 -0.23559962 -2.1551077 3.989970e-02
          sr2     CI_lower   CI_upper
1 0.004805465 0.0000000000 0.02702141
2 0.558188818 0.3142326391 0.80214500
3 0.035022025 0.0003502202 0.09723370

(my_table <- nice_table(moderations, highlight = TRUE))

Dependent Variable	Predictor	df	b*	t	p	sr2	95% CI
mpg	gear	28	-0.09	-0.80	.431	.00	[0.00, 0.03]
	wt	28	-0.95	-8.60	< .001***	.56	[0.31, 0.80]
	gear × wt	28	-0.24	-2.16	.040*	.04	[0.00, 0.10]

Note about the sr² : it’s called a delta R-squared, and it shows the proportion of variance explained (unique contribution) of that variable on the whole model.

So now we do the simple slopes. It’s going to give us three slopes: a low, medium, and high slope for the effect of the moderating variable (here its weight).

slopes <- nice_slopes(
  data = mtcars2,
  response = "mpg",
  predictor = "gear",
  moderator = "wt"
)
slopes

  Dependent Variable Predictor (+/-1 SD) df           B          t          p
1                mpg       gear (LOW-wt) 28  0.14841920  1.0767040 0.29080233
2                mpg      gear (MEAN-wt) 28 -0.08718042 -0.7982999 0.43141565
3                mpg      gear (HIGH-wt) 28 -0.32278004 -1.9035367 0.06729622
          sr2 CI_lower   CI_upper
1 0.008741702        0 0.03886052
2 0.004805465        0 0.02702141
3 0.027322839        0 0.08179662

nice_table(slopes, highlight = TRUE)

Dependent Variable	Predictor (+/-1 SD)	df	b*	t	p	sr2	95% CI
mpg	gear (LOW-wt)	28	0.15	1.08	.291	.01	[0.00, 0.04]
	gear (MEAN-wt)	28	-0.09	-0.80	.431	.00	[0.00, 0.03]
	gear (HIGH-wt)	28	-0.32	-1.90	.067	.03	[0.00, 0.08]

So it shows that none of the individual slopes are that meaningful, which maybe demonstrates that in this example, the overall effect of the interaction between weight and gear tells the story better than looking at each level of weight, also probably that the interaction was weak anyway.

Simple slopes on my actual data

Now let’s try it on my data for MU and Study time.

sg_data <- read.csv("C:/Users/David/Documents/GitHub/Kool_R_club/random/data/sg_data.csv", header = T)

First we scale everything…

sg_data_scaled <- sg_data
num_vars <- c("mean_MU", "mean_StudyTime", "age")

sg_data_scaled[num_vars] <- lapply(sg_data[num_vars], function(x) as.vector(scale(x)))
# This just scales the IVs and keeps it a regular vector instead of a weird matrix

# make sure condition is a factor
sg_data_scaled$condsFile <- as.factor(sg_data_scaled$condsFile)

My original model for MU was a linear mixed effects model. Not sure if that will work with this. Let’s try it first.

library(lme4)

Loading required package: Matrix

MU_LMM <- lmer(mean_MU ~ condsFile + age + age*condsFile + (1|participant), 
                    data = sg_data_scaled)

Try the nice_slopes() on it…

nice_lm_slopes(MU_LMM, predictor = "mean_MU", moderator = "age") %>%
  nice_table(highlight = TRUE)

Error in nice_lm_slopes(MU_LMM, predictor = "mean_MU", moderator = "age"): Model must be of class 'lm' or be a 'list()' of lm models (using 'c()' won't work).

It does not work! OK let’s downgrade to a linear model and try it:

MU_LM <- lm(mean_MU ~ condsFile + age + age*condsFile, 
                    data = sg_data_scaled)

nice_lm_slopes(MU_LM, predictor = "age", moderator = "condsFile") %>%
  nice_table(highlight = TRUE)

Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm): Calling var(x) on a factor x is defunct.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.

This is not going to work because condition is a factor and this only words with two numeric predictors…

But it looks like something called a Johnson-Neyman technique can help us!

Johnson Neyman technique!

From the interactions library, the Johnson Neyman method “tells you the range of values of the moderator in which the slope of the predictor is significant vs. nonsignificant at a specified alpha level” - that’s literally exactly what I need! And apparently it is compatible with GLM… I hope that is the case because we need a GLM to fit the model for Study time! Let’s try MU first:

library(interactions)

sim_slopes(MU_LMM, pred = age, modx = condsFile, jnplot = TRUE)

Warning: Johnson-Neyman intervals are not available for factor predictors or
moderators.

SIMPLE SLOPES ANALYSIS

Slope of age when condsFile = longDelay: 

  Est.   S.E.   t val.      p
------ ------ -------- ------
  0.22   0.10     2.12   0.04

Slope of age when condsFile = noDelay: 

  Est.   S.E.   t val.      p
------ ------ -------- ------
  0.11   0.11     1.06   0.29

Slope of age when condsFile = totalResponse: 

  Est.   S.E.   t val.      p
------ ------ -------- ------
  0.55   0.10     5.29   0.00

Wow it still does not work for factor moderators aggghhh!

Try an alternative method using jtools:

library(jtools)

johnson_neyman(MU_LMM, pred = "age", modx = "condsFile")

Error in `johnson_neyman()`:
! Could not find interaction term in the model, so Johnson-Neyman interval
could not be calculated.

# Doesn't work!

johnson_neyman(MU_LM, pred = "age", modx = "condsFile")

Error in `johnson_neyman()`:
! Could not find interaction term in the model, so Johnson-Neyman interval
could not be calculated.

# Doesn't work!

Emmeans?

OK nothing is working. A clunky way we could maybe look at this is via pairwise comparisons across age via emmeans.

library(emmeans)

Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'

# Literally just making arbitrary separations
ages_to_probe <- seq(min(sg_data_scaled$age), max(sg_data_scaled$age), length.out = 5)

# Run the pairwise comparisons
emm <- emmeans(MU_LMM, ~ condsFile | age, at = list(age = ages_to_probe))

pairs(emm)

age = -1.60552:
 contrast                  estimate    SE  df t.ratio p.value
 longDelay - noDelay        -0.0254 0.255 159  -0.100  0.9945
 longDelay - totalResponse   0.2890 0.252 157   1.148  0.4862
 noDelay - totalResponse     0.3144 0.255 159   1.234  0.4347

age = -0.79777:
 contrast                  estimate    SE  df t.ratio p.value
 longDelay - noDelay         0.0610 0.172 159   0.355  0.9331
 longDelay - totalResponse   0.0244 0.170 157   0.143  0.9888
 noDelay - totalResponse    -0.0367 0.172 159  -0.213  0.9753

age =  0.00997:
 contrast                  estimate    SE  df t.ratio p.value
 longDelay - noDelay         0.1475 0.134 159   1.097  0.5172
 longDelay - totalResponse  -0.2402 0.133 157  -1.807  0.1707
 noDelay - totalResponse    -0.3877 0.134 159  -2.884  0.0124

age =  0.81772:
 contrast                  estimate    SE  df t.ratio p.value
 longDelay - noDelay         0.2340 0.174 159   1.346  0.3720
 longDelay - totalResponse  -0.5048 0.172 157  -2.938  0.0106
 noDelay - totalResponse    -0.7388 0.174 159  -4.250  0.0001

age =  1.62546:
 contrast                  estimate    SE  df t.ratio p.value
 longDelay - noDelay         0.3204 0.257 159   1.247  0.4276
 longDelay - totalResponse  -0.7694 0.254 157  -3.029  0.0080
 noDelay - totalResponse    -1.0899 0.257 159  -4.241  0.0001

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 3 estimates

OK so that worked, but let’s get age back into its original scale.

# Get mean and SD from the original age
age_mean <- mean(sg_data$age)
age_sd   <- sd(sg_data$age)

# Back-transform the scaled ages
ages_original <- ages_to_probe * age_sd + age_mean

# Get ages for the pairwise comparisons
emm_df <- as.data.frame(emm)
emm_df$age_original <- emm_df$age * age_sd + age_mean
emm_df

age = -1.6055202:
 condsFile         emmean        SE     df   lower.CL   upper.CL age_original
 longDelay     -0.3809926 0.1953639 224.26 -0.7659764  0.0039912          4.5
 noDelay       -0.3555554 0.1991460 225.89 -0.7479768  0.0368660          4.5
 totalResponse -0.6699680 0.1953639 224.26 -1.0549518 -0.2849842          4.5

age = -0.7977740:
 condsFile         emmean        SE     df   lower.CL   upper.CL age_original
 longDelay     -0.2042349 0.1320656 224.26 -0.4644832  0.0560133          5.1
 noDelay       -0.2652660 0.1345362 225.84 -0.5303727 -0.0001592          5.1
 totalResponse -0.2286061 0.1320656 224.26 -0.4888543  0.0316422          5.1

age =  0.0099722:
 condsFile         emmean        SE     df   lower.CL   upper.CL age_original
 longDelay     -0.0274772 0.1031795 224.26 -0.2308025  0.1758481          5.7
 noDelay       -0.1749765 0.1050513 225.79 -0.3819828  0.0320298          5.7
 totalResponse  0.2127558 0.1031795 224.26  0.0094305  0.4160811          5.7

age =  0.8177184:
 condsFile         emmean        SE     df   lower.CL   upper.CL age_original
 longDelay      0.1492805 0.1333457 224.26 -0.1134905  0.4120514          6.3
 noDelay       -0.0846870 0.1358914 225.87 -0.3524642  0.1830901          6.3
 totalResponse  0.6541177 0.1333457 224.26  0.3913468  0.9168887          6.3

age =  1.6254646:
 condsFile         emmean        SE     df   lower.CL   upper.CL age_original
 longDelay      0.3260382 0.1970953 224.26 -0.0623577  0.7144340          6.9
 noDelay        0.0056024 0.2009779 225.92 -0.3904285  0.4016334          6.9
 totalResponse  1.0954796 0.1970953 224.26  0.7070838  1.4838754          6.9

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

Let’s now try this for Study time. Since we aren’t using simple slopes or Johnson Neyman, we go back to the un-scaled data (but we still center age).

library(lme4)

# Center age
sg_data$age.ct <- sg_data$age - mean(sg_data$age)

# make sure condition is a factor
sg_data$condsFile <- as.factor(sg_data$condsFile)

studyTimeGlmm <- glmer(mean_StudyTime ~ condsFile + age.ct + age.ct*condsFile + (1|participant), 
                         data = sg_data, 
                         family = inverse.gaussian(link = "identity"), nAGQ = 0)

ages_to_probe <- seq(min(sg_data$age.ct), max(sg_data$age.ct), length.out = 5)

# Run the pairwise comparisons
emm <- emmeans(studyTimeGlmm, ~ condsFile | age.ct, at = list(age.ct = ages_to_probe))

pairs(emm)

age.ct = -1.19259:
 contrast                  estimate    SE  df z.ratio p.value
 longDelay - noDelay          0.715 0.748 Inf   0.955  0.6054
 longDelay - totalResponse    2.789 0.694 Inf   4.016  0.0002
 noDelay - totalResponse      2.074 0.610 Inf   3.402  0.0019

age.ct = -0.59259:
 contrast                  estimate    SE  df z.ratio p.value
 longDelay - noDelay          0.758 0.504 Inf   1.505  0.2887
 longDelay - totalResponse    1.537 0.455 Inf   3.376  0.0021
 noDelay - totalResponse      0.779 0.395 Inf   1.972  0.1191

age.ct =  0.00741:
 contrast                  estimate    SE  df z.ratio p.value
 longDelay - noDelay          0.801 0.345 Inf   2.320  0.0531
 longDelay - totalResponse    0.285 0.377 Inf   0.755  0.7304
 noDelay - totalResponse     -0.516 0.336 Inf  -1.535  0.2743

age.ct =  0.60741:
 contrast                  estimate    SE  df z.ratio p.value
 longDelay - noDelay          0.844 0.396 Inf   2.134  0.0830
 longDelay - totalResponse   -0.967 0.536 Inf  -1.806  0.1674
 noDelay - totalResponse     -1.812 0.493 Inf  -3.675  0.0007

age.ct =  1.20741:
 contrast                  estimate    SE  df z.ratio p.value
 longDelay - noDelay          0.888 0.605 Inf   1.468  0.3063
 longDelay - totalResponse   -2.220 0.801 Inf  -2.771  0.0154
 noDelay - totalResponse     -3.107 0.739 Inf  -4.205  0.0001

P value adjustment: tukey method for comparing a family of 3 estimates

# Back-transform to original age
emm_df <- as.data.frame(emm)
emm_df$age_original <- emm_df$age.ct + age_mean
emm_df

age.ct = -1.1925926:
 condsFile       emmean        SE  df asymp.LCL asymp.UCL age_original
 longDelay     7.575235 0.7397812 Inf  6.125290  9.025179          4.5
 noDelay       6.860713 0.6681407 Inf  5.551182  8.170245          4.5
 totalResponse 4.786248 0.5916572 Inf  3.626621  5.945875          4.5

age.ct = -0.5925926:
 condsFile       emmean        SE  df asymp.LCL asymp.UCL age_original
 longDelay     7.208520 0.4976884 Inf  6.233068  8.183971          5.1
 noDelay       6.450700 0.4496234 Inf  5.569454  7.331945          5.1
 totalResponse 5.671677 0.3904410 Inf  4.906426  6.436927          5.1

age.ct =  0.0074074:
 condsFile       emmean        SE  df asymp.LCL asymp.UCL age_original
 longDelay     6.841805 0.3635853 Inf  6.129191  7.554419          5.7
 noDelay       6.040686 0.3281628 Inf  5.397499  6.683873          5.7
 totalResponse 6.557105 0.3584652 Inf  5.854527  7.259684          5.7

age.ct =  0.6074074:
 condsFile       emmean        SE  df asymp.LCL asymp.UCL age_original
 longDelay     6.475090 0.4480824 Inf  5.596865  7.353315          6.3
 noDelay       5.630672 0.4037336 Inf  4.839368  6.421975          6.3
 totalResponse 7.442534 0.5274633 Inf  6.408725  8.476343          6.3

age.ct =  1.2074074:
 condsFile       emmean        SE  df asymp.LCL asymp.UCL age_original
 longDelay     6.108375 0.6733829 Inf  4.788569  7.428181          6.9
 noDelay       5.220658 0.6067053 Inf  4.031537  6.409779          6.9
 totalResponse 8.327963 0.7756320 Inf  6.807752  9.848174          6.9

Confidence level used: 0.95

Sliding window?

This one is kind of weird. Let’s just try it.

MU_LMM <- lmer(mean_MU ~ condsFile + age + age*condsFile + (1|participant), 
                    data = sg_data)

ages_seq <- seq(min(sg_data$age), max(sg_data$age), by = .25)

results <- lapply(ages_seq, function(a) {
  em <- emmeans(MU_LMM, ~ condsFile | age, at = list(age = a))
  test <- contrast(em, "pairwise")  
  cbind(age = a, as.data.frame(test))
})

results_df <- do.call(rbind, results)

results_df

    age                  contrast  age    estimate        SE       df
1  4.50       longDelay - noDelay 4.50 -0.04908257 0.4914631 158.8683
2  4.50 longDelay - totalResponse 4.50  0.55759458 0.4857785 157.1642
3  4.50   noDelay - totalResponse 4.50  0.60667715 0.4914631 158.8683
4  4.75       longDelay - noDelay 4.75  0.02043637 0.4196752 158.8498
5  4.75 longDelay - totalResponse 4.75  0.34485748 0.4148739 157.1642
6  4.75   noDelay - totalResponse 4.75  0.32442111 0.4196752 158.8498
7  5.00       longDelay - noDelay 5.00  0.08995530 0.3550162 158.8240
8  5.00 longDelay - totalResponse 5.00  0.13212037 0.3510171 157.1642
9  5.00   noDelay - totalResponse 5.00  0.04216507 0.3550162 158.8240
10 5.25       longDelay - noDelay 5.25  0.15947424 0.3020988 158.7918
11 5.25 longDelay - totalResponse 5.25 -0.08061674 0.2987619 157.1642
12 5.25   noDelay - totalResponse 5.25 -0.24009097 0.3020988 158.7918
13 5.50       longDelay - noDelay 5.50  0.22899317 0.2679716 158.7647
14 5.50 longDelay - totalResponse 5.50 -0.29335384 0.2650611 157.1642
15 5.50   noDelay - totalResponse 5.50 -0.52234701 0.2679716 158.7647
16 5.75       longDelay - noDelay 5.75  0.29851211 0.2601380 158.7654
17 5.75 longDelay - totalResponse 5.75 -0.50609095 0.2573113 157.1642
18 5.75   noDelay - totalResponse 5.75 -0.80460306 0.2601380 158.7654
19 6.00       longDelay - noDelay 6.00  0.36803105 0.2808072 158.7979
20 6.00 longDelay - totalResponse 6.00 -0.71882805 0.2776939 157.1642
21 6.00   noDelay - totalResponse 6.00 -1.08685910 0.2808072 158.7979
22 6.25       longDelay - noDelay 6.25  0.43754998 0.3245790 158.8379
23 6.25 longDelay - totalResponse 6.25 -0.93156516 0.3208920 157.1642
24 6.25   noDelay - totalResponse 6.25 -1.36911514 0.3245790 158.8379
25 6.50       longDelay - noDelay 6.50  0.50706892 0.3836253 158.8688
26 6.50 longDelay - totalResponse 6.50 -1.14430226 0.3791866 157.1642
27 6.50   noDelay - totalResponse 6.50 -1.65137118 0.3836253 158.8688
28 6.75       longDelay - noDelay 6.75  0.57658786 0.4519989 158.8893
29 6.75 longDelay - totalResponse 6.75 -1.35703937 0.4467061 157.1642
30 6.75   noDelay - totalResponse 6.75 -1.93362722 0.4519989 158.8893
       t.ratio      p.value
1  -0.09987030 9.945164e-01
2   1.14783718 4.862276e-01
3   1.23443066 4.347070e-01
4   0.04869567 9.986935e-01
5   0.83123450 6.842255e-01
6   0.77302896 7.201011e-01
7   0.25338362 9.652343e-01
8   0.37639302 9.249225e-01
9   0.11876941 9.922536e-01
10  0.52788770 8.577892e-01
11 -0.26983609 9.606678e-01
12 -0.79474323 7.068023e-01
13  0.85454280 6.696703e-01
14 -1.10674063 5.113452e-01
15 -1.94926282 1.284410e-01
16  1.14751455 4.864071e-01
17 -1.96684308 1.238791e-01
18 -3.09298579 6.586909e-03
19  1.31061817 3.913708e-01
20 -2.58856307 2.827538e-02
21 -3.87048129 4.628103e-04
22  1.34805373 3.708630e-01
23 -2.90304883 1.171018e-02
24 -4.21812557 1.217477e-04
25  1.32178191 3.851972e-01
26 -3.01778102 8.310225e-03
27 -4.30464672 8.616529e-05
28  1.27563984 4.110115e-01
29 -3.03787936 7.816653e-03
30 -4.27794636 9.590928e-05

Jiminy crickets! Well it does show that the significant effect seems to begin around age 5.75, and the difference between no-delay and long-delay was never significant so we don’t need to worry about that always been non-sig.