In this R Markdown Notebook, I give a few examples on how to plot interactions and marginal effects. Many people, faculy and students alike, often come to the PRC Consultants to understand interactions in different models. This Notebook will outline one important element to interpretting interactions: visualizations.
In R, there are often times multiple packages to perform similar functions given that it is a de-centralized, open sourced software/language. Therefore, I will go through multiple different packages, each with their own pros and cons, to help give more options.
We include interactions in our model to test for moderation effects, which can be formally defined as:
If the effect of X on Y depends on M, a moderator effect takes place
## Installing different packages that we will be using throughout this demonstration
install.packages("car") #An extremely useful/in-depth regression package
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/car_3.0-3.tgz'
Content type 'application/x-gzip' length 1590381 bytes (1.5 MB)
==================================================
downloaded 1.5 MB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
install.packages("stargazer") #Produces easy to read regression results
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/stargazer_5.2.2.tgz'
Content type 'application/x-gzip' length 621898 bytes (607 KB)
==================================================
downloaded 607 KB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
install.packages("effects") #We will use this to create our interactions
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/effects_4.1-2.tgz'
Content type 'application/x-gzip' length 2833258 bytes (2.7 MB)
==================================================
downloaded 2.7 MB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
install.packages("ggplot2") #Our incredibly powerful and versatile graphing package
Error in install.packages : Updating loaded packages
install.packages("sjPlot") #Marginal effects package
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/sjPlot_2.7.2.tgz'
Content type 'application/x-gzip' length 1479163 bytes (1.4 MB)
==================================================
downloaded 1.4 MB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
install.packages("ggplot2")
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/ggplot2_3.2.1.tgz'
Content type 'application/x-gzip' length 3973186 bytes (3.8 MB)
==================================================
downloaded 3.8 MB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
install.packages("sjmisc") #Marginal effects Package
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/sjmisc_2.8.2.tgz'
Content type 'application/x-gzip' length 532576 bytes (520 KB)
==================================================
downloaded 520 KB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
install.packages("dplyr") #For data manipulation
Error in install.packages : Updating loaded packages
install.packages("corrplot") #Visualizing correlations
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/corrplot_0.84.tgz'
Content type 'application/x-gzip' length 5452777 bytes (5.2 MB)
==================================================
downloaded 5.2 MB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
install.packages("dplyr")
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/dplyr_0.8.3.tgz'
Content type 'application/x-gzip' length 6850794 bytes (6.5 MB)
==================================================
downloaded 6.5 MB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
install.packages("msm")
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/msm_1.6.7.tgz'
Content type 'application/x-gzip' length 1531698 bytes (1.5 MB)
==================================================
downloaded 1.5 MB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
install.packages("jtools")
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.6/jtools_2.0.1.tgz'
Content type 'application/x-gzip' length 2104486 bytes (2.0 MB)
==================================================
downloaded 2.0 MB
The downloaded binary packages are in
/var/folders/1f/vrdsjx2x7lj8kfqywtgw95_m0000gn/T//Rtmp4u0y2n/downloaded_packages
For these examples, I will be using a sample dataset coming from my own research on LGBTI policy adoption from 1991-2018. In this first model, we will be predicting a country’s score on an LGBTI Policy Index based on being in the Global South/North, level of democracy, and percent of women in parliament.
Before starting, I first just want to visualize the correlations of the variables in my dataset
library(corrplot)
corrplot 0.84 loaded
M <- cor(lgbti_data[,2:10]) #dataset is called lgbti_data and I'm getting the correlations for all variables between the 2nd and the 10th variables (the 1st and last variable are strings)
#Plotting the correlations
corrplot(M, method = "color")
#Re-doing the plot but only keeping half of the matrix
corrplot(M, method = "color", type= "upper")
#This time, using the corrplot.mixed function, I'm going to also include the correlation coefficient itself in addition to the color visuals.
corrplot.mixed(M)
Now, we are going to move into detecting if we need interaction terms in our model through visual detection. The two variables we will be using are region (Global South vs. Global North) and the percent of women in parliament. These are categorical and continuous variables, respectively.
library(ggplot2)
#QPlot stands for "Quick Plot"
#First, we are going to see if there's any visual association between the percent of women in parliament and more progressive LGBTI policies
qplot(x = women_parliament, y = policy_index, data = lgbti_data) +
geom_smooth(method = "lm") #this option is including the line of best fit
#Now, we are going to see if this association changes based on whether a country is in the Global North or the Global South
qplot(x = women_parliament, y = policy_index, facets = ~region, data = lgbti_data) +
geom_smooth(method = "lm") + #this option is including the line of best fit
theme_bw() #Removes the gray background
#Alternatively, we can plot them on the same graph in case the side by side comparisons were too difficult to visually detect.
qplot(x = women_parliament, y = policy_index, data = lgbti_data, color = region) +
geom_smooth(method = "lm") +
theme_bw()+ #Removes the gray background
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
legend.key = element_blank()) + #Removes the lines
xlab("Percent (%) of Women in Parliament") + #changing the labels of the axes
ylab("LGBTI Policy Index Score")
Based on this visual inspection, it appears that an interaction term is warranted because the slopes of the two lines are different. The effect of women in parliament on progressive LGBTI policies may be contingent, therefore, on the world region
#Predicting a county's score on the LGBTI Policy Index using women in parliament (continuous) and region (categorical)
lgbti.model.1 <- lm(policy_index ~
women_parliament +
region,
data = lgbti_data)
#Re-Running the model with the interaction
lgbti.model.2 <- lm(policy_index ~
women_parliament * # using the astrisk (*) instead of the plus (+)
region,
data = lgbti_data)
library(stargazer)
Please cite as:
Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
# Now, we can look at the results from our models to see if there is indeed a significant interaction.
#R does not automatically report the results from a model, we must ask for them. There are a few ways of doing so, the built-in function is to type summary(x). The output isn't visually appealing, however, so I will be using the stargazer package that produces a more quality regression table.
stargazer(lgbti.model.1, lgbti.model.2,type="text",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=FALSE,
notes.append = FALSE,
header=FALSE)
length of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changednumber of rows of result is not a multiple of vector length (arg 2)
=========================================================================================
Dependent variable:
-----------------------------------------------------
policy_index
Main Effects Interaction
(1) (2)
-----------------------------------------------------------------------------------------
Constant 2.399*** 1.056***
(0.097) (0.143)
women_parliament 0.072*** 0.136***
(0.003) (0.006)
regionGlobal South -2.842*** -1.128***
(0.083) (0.160)
women_parliament:regionGlobal South -0.089***
(0.007)
-----------------------------------------------------------------------------------------
Observations 4,341 4,341
R2 0.332 0.355
Adjusted R2 0.332 0.355
Residual Std. Error 2.377 (df = 4338) 2.336 (df = 4337)
F Statistic 1,078.067*** (df = 2; 4338) 796.457*** (df = 3; 4337)
=========================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
library(jtools) # for summ()
#Besides stargazer, the summ() command in the jtools package also allows for cleaner regression tables.
summ(lgbti.model.2)
[4mMODEL INFO:[24m
[3mObservations:[23m 4341
[3mDependent Variable:[23m policy_index
[3mType:[23m OLS linear regression
[4mMODEL FIT:[24m
[3mF[23m(3,4337) = 796.46, [3mp[23m = 0.00
[3mR² = [23m0.36
[3mAdj. R² = [23m0.35
[3mStandard errors: OLS[23m
----------------------------------------------------------------
Est. S.E. t val. p
--------------------------------- ------- ------ -------- ------
(Intercept) 1.06 0.14 7.36 0.00
women_parliament 0.14 0.01 22.59 0.00
regionGlobal South -1.13 0.16 -7.06 0.00
women_parliament:regionGlobal -0.09 0.01 -12.50 0.00
South
----------------------------------------------------------------
library(interactions)
#Plotting the interactions with the Interaction package
interact_plot(lgbti.model.2, pred = women_parliament, modx = region)
#One thing you'll notice is that the default plot visuals are different for each package
Visually inspecting whether two continuous variables are moderating one another and, therefore, in need of an interaction in your model is a bit more difficult. One strategy is to convert one of the continuous variables into factor variables. For example, let’s say that the effect of women in pariament is conditioned on the level of democracy. Since democracy is continuous, I will covert it into three groups: mean and +/- 1 standard deviation from the mean.
#Loading dplyr for data manipulation
library(dplyr)
#I'm putting the democracy variable, polity2_vdem, into an object called "x," which I can then manipulate.
x <- lgbti_data$polity2_vdem
#Now, I am creating a new variable called "democracy" that will group countries together based on their value of x (the polity2_vdem variable)
lgbti_data$democracy <-
case_when(x > mean(x)+sd(x) ~ "Strongly Democratic",
x < mean(x)+sd(x) & x > mean(x)-sd(x) ~ "Average Level of Democracy",
x < mean(x)-sd(x) ~ "Weakly Democratic")
# Seeing how many observations fall into each category
count(lgbti_data, democracy)
Now, we are going to plot this.
library(ggplot2)
lgbti_data %>% # These symbols are operators that essentially mean "do the following function"
ggplot() +
aes(x = women_parliament, y = policy_index, group = democracy, color = democracy) +
geom_point(color = "grey", alpha = .7) +
geom_smooth(method = "lm") +
theme_bw()+ #Removes the gray background
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
legend.key = element_blank()) + #Removes the lines
xlab("Percent (%) of Women in Parliament") + #changing the labels of the axes
ylab("LGBTI Policy Index Score")
Given that there are different slopes based on the level of democracy, it indeed appears that an interaction effect may again be warranted within our models. Next, we are going to go ahead and test for this.
#Running the model with an interaction
lgbti.model.3 <- lm(policy_index
~polity2_vdem +
women_parliament,
data =lgbti_data)
#Re-Running the model with an interaction
lgbti.model.4 <- lm(policy_index
~polity2_vdem *
women_parliament,
data =lgbti_data)
stargazer(lgbti.model.3, lgbti.model.4,type="text",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=FALSE,
notes.append = FALSE,
header=FALSE)
length of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changednumber of rows of result is not a multiple of vector length (arg 2)
=====================================================================================
Dependent variable:
-------------------------------------------------------
policy_index
Main Effects Interaction
(1) (2)
-------------------------------------------------------------------------------------
Constant -0.537*** -0.174***
(0.063) (0.061)
polity2_vdem 0.206*** 0.020**
(0.006) (0.009)
women_parliament 0.085*** 0.057***
(0.003) (0.003)
polity2_vdem:women_parliament 0.011***
(0.0004)
-------------------------------------------------------------------------------------
Observations 4,341 4,341
R2 0.358 0.446
Adjusted R2 0.358 0.446
Residual Std. Error 2.331 (df = 4338) 2.165 (df = 4337)
F Statistic 1,209.942*** (df = 2; 4338) 1,164.134*** (df = 3; 4337)
=====================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
Next, we are going to plot this interaction to better understand the effect of women in parliament and democracy on LGBTI policies. Because this is a continuous x continuous interaction, we are going to plot the effect of women in parliament at the mean and +/- one standard deviation around the mean
#Getting the standard deviation and means for women in parliament
women.sd <- c(mean(lgbti_data$women_parliament)-sd(lgbti_data$women_parliament),
mean(lgbti_data$women_parliament),
mean(lgbti_data$women_parliament)+sd(lgbti_data$women_parliament))
#rounding to two decimal places
women.sd <- round(women.sd, 2)
women.sd
[1] 5.15 16.57 27.98
#Now we are going to run the interaction using the effects package
library(effects)
Loading required package: carData
lattice theme set by effectsTheme()
See ?effectsTheme for details.
# Running the interaction at mean and +/- one standard deviation
interaction.sd <- effect(c("polity2_vdem*women_parliament"), lgbti.model.4,
xlevels=list(polity2_vdem=c(-3, 3, 9),
women_parliament=c(5.1, 15.39, 27.69)))
# put data in data frame
interaction.sd <- as.data.frame(interaction.sd)
# Create factors of the different variables in your interaction
interaction.sd$women <-factor(interaction.sd$women,
levels=c(5.1, 15.39, 27.69),
labels=c("1 SD Below Mean", "Mean", "1 SD Above Mean"))
interaction.sd$democracy <-factor(interaction.sd$polity2_vdem,
levels=c(-3, 3, 9),
labels=c("Authoritarian", "Transition", "Democratic"))
Now making the plot
library(ggplot2)
# Time to plot the interaction
Plot.SD<-ggplot(data=interaction.sd, aes(x=women, y=fit, group=democracy))+
geom_line(size=1, aes(color=democracy))+ #Can adjust the thickness of your lines
geom_point(aes(colour = democracy), size=2)+ #Can adjust the size of your points
geom_ribbon(aes(ymin=fit-se, ymax=fit+se),fill="gray",alpha=.6)+ #Can adjust your error bars
ylim(-2,10)+ #Puts a limit on the y-axis
ylab("LGBTI Policy Index: Predicted Values")+ #Adds a label to the y-axis
xlab("Women in Parliament")+ #Adds a label to the x-axis
ggtitle("Standard Deviation Plot")+ #Title
theme_bw()+ #Removes the gray background
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
legend.key = element_blank())+ #Removes the lines
scale_fill_grey()
Plot.SD
library(sjPlot)
Learn more about sjPlot with 'browseVignettes("sjPlot")'.
library(sjmisc)
Attaching package: ‘sjmisc’
The following objects are masked from ‘package:jtools’:
%nin%, center
library(ggplot2)
plot_model(lgbti.model.4, type = "int", terms = c("women_parliament", "polity2_vdem")) + theme_bw() #Removes the gray background
Still using the same package, we are now going to plot using the standard deviations similar to before
plot_model(lgbti.model.4, type = "int", mdrt.values = "meansd") + theme_bw() #Removes the gray background
NA
NA
#With this package, adding the plot.points = TRUE option includes the specific plot points and automatically color codes them spending on which category they fall into on the moderating variable (modx)
interact_plot(lgbti.model.4, pred = women_parliament, modx = polity2_vdem,
plot.points = TRUE)
Next, we are going to compute the slope (the effect or Beta coefficient) for women in parliament on LGBTI Policy Index while holding the value of the moderator variable, democracy, constant at values running from -10 to 10. To do this, we will find the total coefficient for women in parliament in the model equation for each value of democracy.
library(msm)
lgbti.model.4$coef
(Intercept) polity2_vdem women_parliament
-0.17380034 0.02039863 0.05719045
polity2_vdem:women_parliament
0.01098982
at.democracy <- seq(-10, 10, 2) #go from -10 to 10 by 2
slopes <- lgbti.model.4$coef[3] + lgbti.model.4$coef[4]*at.democracy
slopes
[1] -0.052707742 -0.030728102 -0.008748463 0.013231176 0.035210816 0.057190455
[7] 0.079170094 0.101149734 0.123129373 0.145109012 0.167088652
estmean<-coef(lgbti.model.4)
var<-vcov(lgbti.model.4)
SEs <- rep(NA, length(at.democracy))
for (i in 1:length(at.democracy)){
j <- at.democracy[i]
SEs[i] <- deltamethod (~ (x3) + (x4)*j, estmean, var)
}
upper <- slopes + 1.96*SEs
lower <- slopes - 1.96*SEs
cbind(at.democracy, slopes, upper, lower)
at.democracy slopes upper lower
[1,] -10 -0.052707742 -0.0409849244 -0.064430559
[2,] -8 -0.030728102 -0.0204095842 -0.041046620
[3,] -6 -0.008748463 0.0002464115 -0.017743337
[4,] -4 0.013231176 0.0210242690 0.005438084
[5,] -2 0.035210816 0.0419891190 0.028432512
[6,] 0 0.057190455 0.0632358727 0.051145037
[7,] 2 0.079170094 0.0848742446 0.073465944
[8,] 4 0.101149734 0.1069734921 0.095325975
[9,] 6 0.123129373 0.1295077404 0.116751005
[10,] 8 0.145109012 0.1523780945 0.137839930
[11,] 10 0.167088652 0.1754781758 0.158699127
#Now, we are going to plot it
plot(at.democracy, slopes, type = "l", lty = 1, ylim = c(-.1, 1), xlab = "Level of Democracy", ylab = "Marginal Effect of Women in Parliament")
points(at.democracy, upper, type = "l", lty = 2)
points(at.democracy, lower, type = "l", lty = 2)
points(at.democracy, rep(0, length(at.democracy)), type = "l", col = "gray")
NA
NA
Interpretting the above plot: What this plot demonstrates is the slope/effect/beto coefficient of women in parliament at diffferent values of democracy. For example, when democracy is -5, there is no effect for women in parliament. When democracy is 10, however, the effect of women in parliament goes up to .20. So when democracy is at 10, for every one percent increase in the amount of women in parliament, this is associated with a .20 increase on the LGBTI Policy Index.
# Three-Way Interaction
lgbti.model.5 <- lm(policy_index
~polity2_vdem +
women_parliament +
region,
data =lgbti_data)
lgbti.model.6 <- lm(policy_index
~polity2_vdem *
women_parliament *
region,
data =lgbti_data)
stargazer(lgbti.model.5, lgbti.model.6,type="text",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=FALSE,
notes.append = FALSE,
header=FALSE)
length of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changednumber of rows of result is not a multiple of vector length (arg 2)
======================================================================================================
Dependent variable:
-----------------------------------------------------
policy_index
Main Effects Interaction
(1) (2)
------------------------------------------------------------------------------------------------------
Constant 1.285*** 1.744***
(0.098) (0.247)
polity2_vdem 0.154*** -0.115***
(0.006) (0.030)
women_parliament 0.070*** 0.022**
(0.003) (0.010)
regionGlobal South -1.948*** -1.954***
(0.083) (0.254)
polity2_vdem:women_parliament 0.016***
(0.001)
polity2_vdem:regionGlobal South 0.139***
(0.031)
women_parliament:regionGlobal South 0.019*
(0.010)
polity2_vdem:women_parliament:regionGlobal South -0.009***
(0.001)
------------------------------------------------------------------------------------------------------
Observations 4,341 4,341
R2 0.430 0.513
Adjusted R2 0.429 0.512
Residual Std. Error 2.197 (df = 4337) 2.031 (df = 4333)
F Statistic 1,089.338*** (df = 3; 4337) 652.501*** (df = 7; 4333)
======================================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
#Plotting using the sjPlot package
plot_model(lgbti.model.6, type = "pred", terms = c("polity2_vdem", "women_parliament[4.73, 15.75,26.77]", "region")) + theme_bw() #Removes the gray background
#Plotting using interactions package
probe_interaction(lgbti.model.6, pred = women_parliament, modx = region, mod2 = polity2_vdem,
alpha = .1)
Johnson-Neyman intervals are not available for factor moderators.
███████████████████ While polity2_vdem (2nd moderator) = -3.16 (- 1 SD) ███████████████████
[1m[4mSIMPLE SLOPES ANALYSIS[24m[22m
[3mSlope of women_parliament when region = Global South:
[23m Est. S.E. t val. p
------ ------ -------- ------
0.02 0.00 4.65 0.00
[3mSlope of women_parliament when region = Global North:
[23m Est. S.E. t val. p
------- ------ -------- ------
-0.03 0.01 -2.38 0.02
████████████████████ While polity2_vdem (2nd moderator) = 3.31 (Mean) ████████████████████
[1m[4mSIMPLE SLOPES ANALYSIS[24m[22m
[3mSlope of women_parliament when region = Global South:
[23m Est. S.E. t val. p
------ ------ -------- ------
0.06 0.00 18.18 0.00
[3mSlope of women_parliament when region = Global North:
[23m Est. S.E. t val. p
------ ------ -------- ------
0.08 0.01 11.24 0.00
███████████████████ While polity2_vdem (2nd moderator) = 9.77 (+ 1 SD) ███████████████████
[1m[4mSIMPLE SLOPES ANALYSIS[24m[22m
[3mSlope of women_parliament when region = Global South:
[23m Est. S.E. t val. p
------ ------ -------- ------
0.11 0.01 20.00 0.00
[3mSlope of women_parliament when region = Global North:
[23m Est. S.E. t val. p
------ ------ -------- ------
0.18 0.01 29.75 0.00
Interaction tutorials provided by University of Illinois-Chicago