---
title: "Interactions Tutorial"
output: html_notebook
---

Loading some packages and data. 

```{r message=FALSE}
library(tidyverse)
library(broom)
library(emmeans)
library(interactions)

icon <- read_csv('perry_winter_2017_iconicity.csv')
sim <- read_csv('winter_matlock_2013_similarity.csv')
lonely <- read_csv('sidhu&pexman_2017_iconicity.csv')
vinson_yelp <- read_csv('vinson_dale_2014_yelp.csv')

```


# Categorical-Continuous Interaction: Different slopes for different groups

We'll be looking at how sensory experience ratings (SER) predict iconicity *for different classes of words*.

```{r}
NV <- filter(icon, POS %in% c('Noun', 'Verb'), !is.na(SER))
table(NV$POS)
```

As always, plot first, analyze later:

```{r warning= F}
NV %>% ggplot(aes(x = SER, y = Iconicity, color = POS))+
  geom_point()+
  geom_smooth(method = 'lm')+
  theme_bw()
```
or, a bit more clearly:

```{r}
NV %>% ggplot(aes(x = SER, y = Iconicity, color = POS))+
  geom_point()+
  geom_smooth(method = 'lm')+
  theme_bw()+
  facet_wrap(~POS)
```
What we see is a rather substantial difference in slopes.

What happens if we don't account for that difference?

First, a standard linear model without interaction:
```{r}
NV_mdl <- lm(Iconicity ~ SER + POS, data = NV)
summary(NV_mdl)
```

We see that both variables are significant predictors, and the model explains a small share of the total variation - roughly 8% (though perhaps not bad for linguistic data like this).
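
For a compact view of those model-level numbers, `broom::glance()` (broom was loaded above) returns the fit statistics as a one-row data frame:

```{r}
# One-row summary of model fit (R^2, adjusted R^2, F, df, ...)
glance(NV_mdl)
```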

Graphically, we can plot the predicted values over the empirical values.

```{r}
NV_mdl_pred <- cbind(NV, predict(NV_mdl, interval = "confidence"))

NV_mdl_pred %>% ggplot(aes(x = SER, y = Iconicity, shape = POS, color = POS))+
  geom_point(alpha = .4)+
  geom_line(aes(x = SER, y = fit, color = POS)) +
  geom_ribbon(aes(ymin=lwr,ymax=upr, fill = POS), alpha=0.3)+
  theme_bw()
```

Now we'll model the interaction:

```{r}
NV_int_mdl <- lm(Iconicity ~ SER + POS + SER:POS, data = NV)
summary(NV_int_mdl)
```

The interaction is significant, and overall the model explains a fair amount more variance - almost 12%, up from about 8% without the interaction. What's a bit counter-intuitive here is that the POSVerb coefficient is now negative - but that doesn't mean the overall average for verbs is lower, as it would in the first model. With the interaction in the model, POSVerb is the noun-verb difference at SER = 0: the verb line starts lower, and its steeper slope lets it overtake the noun line as SER increases.
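
To make the coefficient arithmetic concrete, we can reconstruct each group's intercept and slope directly from the fitted coefficients (a quick sanity check on the model above):

```{r}
b <- coef(NV_int_mdl)
b['(Intercept)']                  # noun intercept (reference level)
b['(Intercept)'] + b['POSVerb']   # verb intercept - lower at SER = 0
b['SER']                          # noun slope
b['SER'] + b['SER:POSVerb']       # verb slope - much steeper
```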

Let's plot this one, too:


```{r}
NV_int_mdl_pred <- cbind(NV, predict(NV_int_mdl, interval = "confidence"))

NV_int_mdl_pred %>% ggplot(aes(x = SER, y = Iconicity, shape = POS, color = POS))+
  geom_point(alpha = .4)+
  geom_line(aes(x = SER, y = fit, color = POS)) +
  geom_ribbon(aes(ymin=lwr,ymax=upr, fill = POS), alpha=0.3)+
  theme_bw()
```

Let's center SER to get a slightly different picture/interpretation, though the overall explanatory power of the model will not change.

```{r}
NV <- mutate(NV, SER_c = SER - mean(SER, na.rm = T)) #center
NV_int_mdl_c <- lm(Iconicity ~ SER_c*POS, data = NV) #model
summary(NV_int_mdl_c)
```

Now POSVerb has a positive value - with the centered predictor, the coefficient compares nouns and verbs at the average level of SER, where verbs tend to have higher Iconicity ratings. This interpretation seems to better characterize the overall pattern in the data.
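
If we want the group-specific slopes themselves, with confidence intervals, `emmeans::emtrends()` pulls them out of the fitted model directly (a brief sketch using the centered model):

```{r}
# SER_c slope for each POS, with 95% CIs
emtrends(NV_int_mdl_c, ~ POS, var = "SER_c")
```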

Let's also plot this model now:

```{r}
NV_int_mdl_c_pred <- cbind(NV, predict(NV_int_mdl_c, interval = "confidence"))

NV_int_mdl_c_pred %>% ggplot(aes(x = SER_c, y = Iconicity, shape = POS, color = POS))+
  geom_point(alpha = .4)+
  geom_line(aes(x = SER_c, y = fit, color = POS)) +
  geom_ribbon(aes(ymin=lwr,ymax=upr, fill = POS), alpha=0.3)+
  geom_vline(xintercept = 0, linetype = 2)+
  theme_bw()
```

# Categorical*Categorical Interactions - an added effect for certain combinations of categories

Examining the data

```{r}
table(sim$Phon, sim$Sem)
```

OR

```{r}
sim %>% count(Phon, Sem)
```
Let's do the basic model, no interactions:

```{r}
sim_mdl <- lm(Distance ~ Phon + Sem, data = sim)

summary(sim_mdl)
```

Here, phonological similarity had no significant effect, but there was an effect of semantic similarity. Now let's try an interaction:

```{r}
sim_mdl_int <- lm(Distance ~ Phon*Sem, data = sim)
summary(sim_mdl_int)
```

Notably, now none of the individual predictors is significant, and the overall variance explained is only around 1-2%. Not a great model! But what's interesting is that there might be an additional reduction in distance when both the phonological and semantic properties of the word pairs are similar.
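
One way to ask whether the interaction term earns its keep is a nested-model comparison; `anova()` on the two fits gives the corresponding F-test (both models drop the same incomplete row, so they are comparable):

```{r}
# F-test for the improvement from adding the Phon:Sem interaction
anova(sim_mdl, sim_mdl_int)
```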

What's useful here is generating predictions for the various combinations of categories. You can feed some 'new data' to a regression model and get predicted Y values. This can be handy for summarizing or illustrating the model predictions.

```{r}
Phon <- rep(c('Different', 'Similar'), each = 2)
Sem <- rep(c('Different', 'Similar'), times = 2)
newdata <- tibble(Phon, Sem)
newdata
```
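
As an aside, `expand_grid()` from tidyr builds the same grid a bit more directly:

```{r}
# Same four-row prediction grid via expand_grid()
expand_grid(Phon = c('Different', 'Similar'), Sem = c('Different', 'Similar'))
```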

Now predict...

```{r}
newdata$fit <- predict(sim_mdl_int, newdata)

newdata
```
Compute averages...
```{r}
newdata %>% group_by(Sem) %>% summarise(distM = mean(fit))
```
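
If we also want uncertainty around those cell predictions, `predict()` can return confidence intervals over the same `newdata` grid:

```{r}
# Predicted cell means with 95% confidence intervals
predict(sim_mdl_int, newdata, interval = "confidence")
```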

Let's try sum coding to get main effects:

```{r}
sim <- sim %>% mutate(Phon_sum = factor(Phon),
                      Sem_sum = factor(Sem))

# contr.sum codes the first factor level ('Different') as +1 and 'Similar' as -1
contrasts(sim$Phon_sum) <- contr.sum(2)
contrasts(sim$Sem_sum) <- contr.sum(2)
```

And now a new model...

```{r}
sum_mdl <- lm(Distance ~ Phon_sum*Sem_sum, data = sim)
summary(sum_mdl)
```

So this model gives a significant main effect of semantic similarity. This is hard to reconcile with the dummy-coded model until you look at what each coefficient estimates. As Winter noted, the sum-coded predictor runs from -1 to 1 rather than 0 to 1, so its coefficient is half of the full Different-vs-Similar difference; more importantly, it is a *main* effect averaged over both levels of the other factor, rather than a *simple* effect at the other factor's reference level, and that changes the answer to "how likely is this difference if the true difference is 0?"
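
We can check the scaling directly: doubling the sum-coded coefficient recovers the full averaged difference between the two semantic conditions (a quick check using `sum_mdl` from above):

```{r}
contrasts(sim$Sem_sum)          # Different = +1, Similar = -1
-2 * coef(sum_mdl)['Sem_sum1']  # averaged Similar-minus-Different difference
```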


# Interlude: Factorial ANOVA

```{r}
sim_anova <- aov(Distance ~ Phon*Sem, data = sim)
summary(sim_anova)
```

Notice that the statistical significance aligns (roughly) with the sum-coded linear model above - with nearly balanced cells, the ANOVA tests target the same averaged main effects. You can follow up with `emmeans`.

```{r}
sim_anova_emm <- emmeans(sim_anova, specs = c("Phon", "Sem"))

sim_anova_emm
```
and then...

```{r}
pairs(sim_anova_emm)
```
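
For a visual companion to these comparisons, `emmip()` (also from `emmeans`) draws the estimated cell means as a classic interaction plot:

```{r}
# Interaction plot of the estimated marginal means, with 95% CIs
emmip(sim_anova_emm, Phon ~ Sem, CIs = TRUE)
```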


# Continuous*Continuous Interaction - changing the slope as you get more of X and Z

```{r}
lonely <- filter(lonely, Iconicity >= 0)  # keep only words with non-negative iconicity ratings

lonely_mdl <- lm(Iconicity ~ SER*ARC, data = lonely)

summary(lonely_mdl)
```

And now with standardized predictors (but not the outcome variable...):

```{r}
# scale() centers and divides by the SD (it returns a one-column matrix, which lm() handles fine)
lonely <- mutate(lonely, SER_z = scale(SER), ARC_z = scale(ARC))

lonely_mdl_z <- lm(Iconicity ~ SER_z*ARC_z, data = lonely)

summary(lonely_mdl_z)
```

Again, we run into this somewhat strange phenomenon of a single-predictor effect changing/becoming significant after a transformation. The reason is the same as with centering above: in a model with an interaction, each lower-order coefficient is the slope of that predictor when the other predictor equals 0, and standardizing moves that zero to the mean. But what's most important here is the interaction.
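
A quick check makes this concrete: the raw model's ARC slope evaluated at the mean of SER matches the standardized ARC coefficient once we undo the SD scaling (a sketch using the two models above, assuming no missing values in SER or ARC):

```{r}
b <- coef(lonely_mdl)
b['ARC'] + b['SER:ARC'] * mean(lonely$SER)    # ARC slope at the average SER
coef(lonely_mdl_z)['ARC_z'] / sd(lonely$ARC)  # standardized coef, back in raw ARC units
```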

Coming up with a really good visualization of continuous*continuous interaction effects is hard. I don't like the 3D plane plots that Winter uses - they're really cumbersome to try and wrap your brain around. A more suitable approach is to pick a few values for one continuous variable (e.g., the mean, and then 1 SD above and 1 SD below) and plot the other continuous variable against the dependent variable to illustrate how the slope changes. We'll use the `interactions` package to do this quickly.

```{r}
interact_plot(lonely_mdl, pred = SER, modx = ARC, 
              modx.values = "mean-plus-minus",
              plot.points = T, interval = T,
              colors = "Blues")
```
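
The same summary can be computed by hand with `emtrends()`, asking for the SER slope at the mean of ARC and one SD on either side (a sketch mirroring what `interact_plot()` displays):

```{r}
# SER slopes at low, average, and high ARC, with 95% CIs
emtrends(lonely_mdl, ~ ARC, var = "SER",
         at = list(ARC = mean(lonely$ARC) + c(-1, 0, 1) * sd(lonely$ARC)))
```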


# Nonlinear effects

In this example, there's a pretty clear non-linear relationship between number of stars in the Yelp review and the information density in the text of the review.

```{r}
vinson_yelp %>% ggplot(aes(x = stars, y = across_uni_info))+
  geom_jitter(alpha = .3, height = 0, width = .25)+
  stat_summary(fun = "mean", fun.min = "mean", fun.max = "mean", size = 0.3, geom = "crossbar",
               color = "red")+
  geom_smooth()+
  theme_bw()
```

Or, as Winter did...

```{r}
vinson_yelp %>% group_by(stars) %>%
  summarise(AUI_mean = mean(across_uni_info)) %>%
  ggplot(aes(x = stars, y = AUI_mean))+
  geom_line(linetype = 2)+
  geom_point(size = 3)+
  theme_bw()
```

Centering or other transformations can be helpful when working with polynomials. With raw stars, the squared term just keeps growing across the scale (3^2 is greater than 2^2) and is highly correlated with the linear term. Centering puts 0 near the middle of the scale - right around the 3-star valley we see in the plot - which reduces that collinearity and makes the coefficients easier to interpret (the fitted curve itself is unchanged).

```{r}
vinson_yelp <- vinson_yelp %>% mutate(stars_c = stars - mean(stars),
                                      stars_c2 = stars_c^2)
```

Now to model...

```{r}
yelp_mdl <- lm(across_uni_info ~ stars_c + stars_c2, data = vinson_yelp)
summary(yelp_mdl)
```
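
As an aside, the same model can be specified with `poly()`; with `raw = TRUE` it reproduces the hand-built terms exactly, while the default orthogonal polynomials would also remove the remaining correlation between the linear and quadratic terms:

```{r}
# Identical fit to yelp_mdl, just written with poly()
yelp_mdl2 <- lm(across_uni_info ~ poly(stars_c, 2, raw = TRUE), data = vinson_yelp)
summary(yelp_mdl2)
```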

And now for some predictions and plots...

```{r}
yelp_preds <- tibble(stars_c = sort(unique(vinson_yelp$stars_c)))

yelp_preds <- mutate(yelp_preds, stars_c2 = stars_c^2)

yelp_preds$fit <- predict(yelp_mdl, yelp_preds)

yelp_preds
```

Plot:

```{r}
yelp_preds %>% ggplot(aes(x = stars_c, y = fit))+
  geom_point(size = 3)+
  geom_line(linetype = 2)+
  theme_bw()
```

While this illustrates the trend, we don't have a great sense of how large the effect is or how strongly the ratings are characterized by this pattern. It's generally a good idea for plots to include the whole range of possible values, or at least the whole range of observed values...

```{r}
yelp_preds %>% ggplot(aes(x = stars_c, y = fit))+
  geom_point(size = 3)+
  geom_line(linetype = 2)+
  scale_y_continuous(limits = c(min(vinson_yelp$across_uni_info), max(vinson_yelp$across_uni_info)))+
  theme_bw()
```

So this looks a bit different! It also lines up a bit more with the very modest R^2 value from the regression model.

Such a finding might be theoretically interesting or have some practical impact, but that requires some more thinking and understanding of the constructs and context.
