All weekly notes and code are licensed under CC BY-SA 4.0 unless otherwise stated. For other resources linked from this page, please refer to the authors' websites for licensing information.
Refer to the Week 3 notes for an introduction to R Markdown. You may download the code for this HTML document or read the notes on this site.
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 3.6.3
## Warning: package 'survival' was built under R version 3.6.3
library(dplyr)
library(psych) #for mediation model
library(gt) #for drawing tables
## Warning: package 'gt' was built under R version 3.6.3
library(gtsummary) #for drawing tables
library(ggplot2) #for plotting graphs
library(emmeans) #for simple slopes
library(lavaan) #for path/SEM
## Warning: package 'lavaan' was built under R version 3.6.3
We will continue to use the World Value Survey - Wave 6 Algeria data for this week's examples. Last week, we discussed how to recode values into missing for a specific variable. However, the World Value Survey data code missingness uniformly: any negative value in the raw variables is missing. In this case, we can set NA for all variables at the beginning.
data <- read.csv("https://raw.githubusercontent.com/manyu26/PsychLearnR/master/psychlearnr_data.csv")
data[data<0]<-NA
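To double-check that the recoding worked, you can confirm that no negative values remain; a quick sanity check:
sum(data<0, na.rm=TRUE) #should return 0 now that every negative code is NA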
Linear models handle both continuous and categorical variables. Categorical variables we have used so far include sex (V240) and perceived social status (V238).
The World Value Survey has few continuous variables, but in the Week 2 notes we created subscale scores for the Schwartz value scale. (Note that they do not have good reliability, but we are using them to demonstrate our examples.)
The code is repeated here.
caretake <- dplyr::select(data, V74B,V78,V79)
caretake[sapply(caretake, function(x) x %in% "-2")] <- NA
data$caretake_mean<-rowMeans(caretake)
security<-dplyr::select(data,V72,V77)
security[sapply(security,function(x) x %in% "-2")] <-NA
data$security_mean<-rowMeans(security)
excite <- dplyr::select(data, V70, V71, V73, V75, V76)
excite[sapply(excite, function(x) x %in% "-2")] <- NA
data$excite_mean<-rowMeans(excite)
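(If you want to verify the reliability point above yourself, the alpha() function in the psych package computes Cronbach's alpha for each item set; a quick sketch:)
psych::alpha(caretake) #Cronbach's alpha for the caretaking items
psych::alpha(excite) #and for the excitement items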
Additionally, the World Value Survey calculated two value scores: secular values (SACSECVAL) and emancipative values (RESEMAVAL).
Correlations are often performed before analyzing a linear model. Feel free to skip to the next section if you are familiar with correlation analysis in R.
There are various ways to calculate correlations (Pearson's, Spearman's, Kendall's). We will only cover one way here (my favorite method). It produces a full correlation matrix rather than pairwise correlations only, and it takes two steps: first, select the variables you want to correlate; second, run the correlation using rcorr() from the Hmisc package.
schwartz<-dplyr::select(data,security_mean, caretake_mean, excite_mean, SACSECVAL, RESEMAVAL)
rcorr(as.matrix(schwartz), type="pearson")
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 1.00 0.52 0.31 0.12 0.06
## caretake_mean 0.52 1.00 0.15 0.27 0.10
## excite_mean 0.31 0.15 1.00 -0.02 -0.05
## SACSECVAL 0.12 0.27 -0.02 1.00 0.22
## RESEMAVAL 0.06 0.10 -0.05 0.22 1.00
##
## n
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 1123 1085 1053 1103 1113
## caretake_mean 1085 1101 1040 1083 1094
## excite_mean 1053 1040 1059 1041 1052
## SACSECVAL 1103 1083 1041 1165 1151
## RESEMAVAL 1113 1094 1052 1151 1173
##
## P
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 0.0000 0.0000 0.0000 0.0640
## caretake_mean 0.0000 0.0000 0.0000 0.0010
## excite_mean 0.0000 0.0000 0.5178 0.0831
## SACSECVAL 0.0000 0.0000 0.5178 0.0000
## RESEMAVAL 0.0640 0.0010 0.0831 0.0000
The default of rcorr() is pairwise deletion. To do listwise deletion, you have to remove incomplete cases first.
schwartz<-dplyr::select(data,security_mean, caretake_mean, excite_mean, SACSECVAL, RESEMAVAL)
schwartz_complete<-schwartz[complete.cases(schwartz),]
cor.matrix<-rcorr(as.matrix(schwartz_complete), type="pearson")
cor.matrix
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 1.00 0.51 0.29 0.12 0.03
## caretake_mean 0.51 1.00 0.13 0.27 0.08
## excite_mean 0.29 0.13 1.00 -0.02 -0.06
## SACSECVAL 0.12 0.27 -0.02 1.00 0.21
## RESEMAVAL 0.03 0.08 -0.06 0.21 1.00
##
## n= 1014
##
##
## P
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 0.0000 0.0000 0.0002 0.4172
## caretake_mean 0.0000 0.0000 0.0000 0.0124
## excite_mean 0.0000 0.0000 0.5044 0.0726
## SACSECVAL 0.0002 0.0000 0.5044 0.0000
## RESEMAVAL 0.4172 0.0124 0.0726 0.0000
To make it into a beautiful table, you first have to understand your output. The rcorr() output contains three components: r, n, and P. In the correlation table we just obtained, n was the same across all pairs (because we used listwise deletion), so we only need to include r and P in our main table. To extract these components, simply use the $ operator (e.g., cor.matrix$r).
cor_df_r<-as.data.frame.matrix(round(cor.matrix$r,2)) #round to 2 d.p.
cor_df_r
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 1.00 0.51 0.29 0.12 0.03
## caretake_mean 0.51 1.00 0.13 0.27 0.08
## excite_mean 0.29 0.13 1.00 -0.02 -0.06
## SACSECVAL 0.12 0.27 -0.02 1.00 0.21
## RESEMAVAL 0.03 0.08 -0.06 0.21 1.00
The code for the p-values is a little longer since we need to adjust for APA reporting style.
cor_df_p<-as.data.frame.matrix(round(cor.matrix$P,3)) #round to 3 d.p.
cor_df_p[cor_df_p == 0] <- "< .001"
cor_df_p[is.na(cor_df_p)] <- "-"
cor_df_p
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean - < .001 < .001 < .001 0.417
## caretake_mean < .001 - < .001 < .001 0.012
## excite_mean < .001 < .001 - 0.504 0.073
## SACSECVAL < .001 < .001 0.504 - < .001
## RESEMAVAL 0.417 0.012 0.073 < .001 -
Now that we have data frames for our output, we can use different functions to turn them into a table (e.g., gt, mentioned in Week 3, or kable). There are many ways to do it, but note that it is more complicated if you want to use asterisks instead of reporting exact p-values.
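For example, a minimal version of the r matrix with kable() from the knitr package (the caption text is just a placeholder):
library(knitr)
kable(cor_df_r, caption="Correlations among the Schwartz subscales and the WVS value scores")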
To conduct a simple linear regression testing how emancipative values (RESEMAVAL) predict secular values (SACSECVAL), we place the outcome variable on the left-hand side of the formula, and the predicting variable on the right-hand side of the formula. Similar to ANOVA, we use summary() and plot() to obtain summary statistics and relevant diagnostic plots.
reg<- lm(SACSECVAL~RESEMAVAL, data=data)
summary(reg)
##
## Call:
## lm(formula = SACSECVAL ~ RESEMAVAL, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38898 -0.09001 0.00012 0.09121 0.54104
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.28453 0.01047 27.185 < 2e-16 ***
## RESEMAVAL 0.22469 0.02953 7.609 5.74e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1407 on 1149 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.04797, Adjusted R-squared: 0.04714
## F-statistic: 57.89 on 1 and 1149 DF, p-value: 5.74e-14
plot(reg)
To transform it into a table, the gtsummary package currently seems to be the easiest option. It automatically identifies linear, logistic, and Cox regression models.
simpletable<- reg %>%
tbl_regression(label = list(RESEMAVAL="Emancipative Values"))
#or
#tbl_regression(reg, label = list(RESEMAVAL="Emancipative Values"))
simpletable
| Characteristic | Beta | 95% CI¹ | p-value |
|---|---|---|---|
| Emancipative Values | 0.22 | 0.17, 0.28 | <0.001 |

¹ CI = Confidence Interval
Note that it doesn't print the standard error, so it may be better to report standardized coefficients rather than unstandardized ones.
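(If you do need the unstandardized standard errors for reporting, they are easy to pull straight from the model object; a quick base-R sketch:)
coef(summary(reg)) #columns: Estimate, Std. Error, t value, Pr(>|t|)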
regstd<- lm(scale(SACSECVAL)~scale(RESEMAVAL), data=data)
summary(regstd)
##
## Call:
## lm(formula = scale(SACSECVAL) ~ scale(RESEMAVAL), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6797 -0.6201 0.0008 0.6284 3.7273
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.01179 0.02857 0.413 0.68
## scale(RESEMAVAL) 0.21822 0.02868 7.609 5.74e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9692 on 1149 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.04797, Adjusted R-squared: 0.04714
## F-statistic: 57.89 on 1 and 1149 DF, p-value: 5.74e-14
simpletable<-tbl_regression(regstd, label = list("scale(RESEMAVAL)"="Emancipative Values"))
Multiple linear regression simply adds more terms to your SLR formula using "+" signs. For example, to predict secular values from emancipative values and the excitement subscale of the Schwartz value scale:
regmult<- lm(SACSECVAL~RESEMAVAL+excite_mean, data=data)
summary(regmult)
##
## Call:
## lm(formula = SACSECVAL ~ RESEMAVAL + excite_mean, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38307 -0.08928 0.00138 0.09285 0.46561
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2847387 0.0159289 17.876 < 2e-16 ***
## RESEMAVAL 0.2219528 0.0311529 7.125 1.95e-12 ***
## excite_mean -0.0009693 0.0039379 -0.246 0.806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1389 on 1033 degrees of freedom
## (164 observations deleted due to missingness)
## Multiple R-squared: 0.04721, Adjusted R-squared: 0.04536
## F-statistic: 25.59 on 2 and 1033 DF, p-value: 1.42e-11
To check for multicollinearity among the predictors, we can compute variance inflation factors (VIF) with the car package.
library(car)
## Warning: package 'car' was built under R version 3.6.1
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
## The following object is masked from 'package:dplyr':
##
## recode
vif(regmult)
## RESEMAVAL excite_mean
## 1.003214 1.003214
For the standardized version:
regmultstd<- lm(scale(SACSECVAL)~scale(RESEMAVAL)+scale(excite_mean), data=data)
summary(regmultstd)
##
## Call:
## lm(formula = scale(SACSECVAL) ~ scale(RESEMAVAL) + scale(excite_mean),
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6390 -0.6150 0.0095 0.6397 3.2076
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.011434 0.029743 -0.384 0.701
## scale(RESEMAVAL) 0.215557 0.030255 7.125 1.95e-12 ***
## scale(excite_mean) -0.007377 0.029969 -0.246 0.806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9571 on 1033 degrees of freedom
## (164 observations deleted due to missingness)
## Multiple R-squared: 0.04721, Adjusted R-squared: 0.04536
## F-statistic: 25.59 on 2 and 1033 DF, p-value: 1.42e-11
To create a table, use the same method as for the SLR.
multitable<- regmultstd %>%
tbl_regression(label = list("scale(RESEMAVAL)"="Emancipative Values", "scale(excite_mean)"="Value of Excitement"))
multitable
| Characteristic | Beta | 95% CI¹ | p-value |
|---|---|---|---|
| Emancipative Values | 0.22 | 0.16, 0.27 | <0.001 |
| Value of Excitement | -0.01 | -0.07, 0.05 | 0.8 |

¹ CI = Confidence Interval
To combine the two models into a single table (e.g., for reporting hierarchical regression), use tbl_merge() from gtsummary.
mergetable<-
  tbl_merge(
    tbls = list(simpletable, multitable),
    tab_spanner = c("**Step 1 Model**", "**Step 2 Model**")) #the double asterisks bold the spanner text
mergetable
| Characteristic | Step 1 Model | | | Step 2 Model | | |
|---|---|---|---|---|---|---|
| | Beta | 95% CI¹ | p-value | Beta | 95% CI¹ | p-value |
| Emancipative Values | 0.22 | 0.16, 0.27 | <0.001 | 0.22 | 0.16, 0.27 | <0.001 |
| Value of Excitement | | | | -0.01 | -0.07, 0.05 | 0.8 |

¹ CI = Confidence Interval
If you want to report the results of a certain row of your table in text, gtsummary has a helpful function, inline_text(), that generates a text representation of your results. Simply indicate which row you want to report. (You can also use the merged table, but the code becomes unnecessarily complicated.)
inline_text(multitable, variable = "scale(RESEMAVAL)")
## [1] "0.22 (95% CI 0.16, 0.27; p<0.001)"
Practice: Build a multiple linear regression by picking variables you want to examine (e.g., sex and excitement values predicting secular values, etc.). See if you can generate a table with the gtsummary package.
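If you get stuck, here is one possible starting point (sex is V240 in this dataset):
practice_mod<- lm(SACSECVAL~factor(V240)+excite_mean, data=data)
tbl_regression(practice_mod) #add label = list(...) as in the examples above for nicer row labels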
To compare models or conduct hierarchical linear regression, simply run each model and then use anova() to obtain F-tests for the differences between models. However, make sure the sample size is the same across all the regressions (i.e., remove missing data first so that listwise deletion is applied across the models).
demo<-dplyr::select(data, SACSECVAL, V240,RESEMAVAL,excite_mean)
demo<-demo[complete.cases(demo),]
model1<-lm(SACSECVAL~V240, data=demo)
model2<-lm(SACSECVAL~V240+RESEMAVAL, data=demo)
model3<-lm(SACSECVAL~V240+RESEMAVAL+excite_mean, data=demo)
anova(model1, model2, model3)
## Analysis of Variance Table
##
## Model 1: SACSECVAL ~ V240
## Model 2: SACSECVAL ~ V240 + RESEMAVAL
## Model 3: SACSECVAL ~ V240 + RESEMAVAL + excite_mean
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1034 20.918
## 2 1033 19.864 1 1.05442 54.7837 2.786e-13 ***
## 3 1032 19.863 1 0.00086 0.0445 0.8329
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
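If you also want to report the change in R-squared at each step (commonly reported for hierarchical regression), you can compute it from the model summaries; a minimal sketch:
r2<- c(summary(model1)$r.squared, summary(model2)$r.squared, summary(model3)$r.squared)
data.frame(step=1:3, R2=round(r2,3), delta_R2=round(c(NA, diff(r2)),3))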
Moderation can be conducted using the lm() function plus a simple slope analysis with the emmeans package. We covered the categorical x categorical simple slope test using emmeans in Week 3, so we will cover the other situations here.
When conducting moderation using linear regression, zero-centering is common practice. In R, the scale() function allows you to center and/or standardize. Since we only want to center without standardizing, we set center = TRUE and scale = FALSE in the arguments.
describe(data$excite_mean)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1059 2.77 1.1 2.6 2.72 1.19 1 6 5 0.45 -0.18 0.03
data$excite_mean<-scale(data$excite_mean, center=T, scale=F)
describe(data$excite_mean) #check that mean is zero.
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1059 0 1.1 -0.17 -0.05 1.19 -1.77 3.23 5 0.45 -0.18 0.03
data$security_mean<-scale(data$security_mean, center=T, scale=F)
describe(data$security_mean)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1123 0 1.11 -0.26 -0.12 1.48 -1.26 3.74 5 0.85 0.52 0.03
data$SACSECVAL<-scale(data$SACSECVAL, center=T, scale=F)
describe(data$SACSECVAL)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1165 0 0.15 0 0 0.14 -0.36 0.63 0.98 0.05 0.14 0
data$RESEMAVAL<-scale(data$RESEMAVAL, center=T, scale=F)
describe(data$RESEMAVAL)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1173 0 0.14 0.01 0 0.15 -0.32 0.46 0.79 0.12 -0.32 0
In this example, we are predicting excitement values (i.e., perceived importance of having stimulation/excitement in life) from security values (i.e., perceived importance of behaving properly and living in security). The example also tests whether perceived social status moderates this relationship (e.g., would people with high social status show a stronger relationship between security and excitement values than people with low social status?).
mod1<-lm(excite_mean~ security_mean*factor(V238), data)
summary(mod1)
##
## Call:
## lm(formula = excite_mean ~ security_mean * factor(V238), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1237 -0.7511 -0.1013 0.6258 3.0137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.3589 0.2624 -1.368 0.1717
## security_mean 0.6504 0.2824 2.303 0.0215 *
## factor(V238)2 0.3752 0.2701 1.389 0.1651
## factor(V238)3 0.3703 0.2682 1.381 0.1677
## factor(V238)4 0.4344 0.2704 1.606 0.1086
## factor(V238)5 0.4145 0.2856 1.452 0.1469
## security_mean:factor(V238)2 -0.4626 0.2887 -1.602 0.1095
## security_mean:factor(V238)3 -0.3757 0.2869 -1.309 0.1907
## security_mean:factor(V238)4 -0.5106 0.2893 -1.765 0.0779 .
## security_mean:factor(V238)5 -0.1275 0.2978 -0.428 0.6688
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.037 on 964 degrees of freedom
## (226 observations deleted due to missingness)
## Multiple R-squared: 0.09098, Adjusted R-squared: 0.0825
## F-statistic: 10.72 on 9 and 964 DF, p-value: 5.645e-16
The interaction/moderation results were not significant, but what if they were? We would need to run a simple slope analysis to examine the pattern. I outline the methods here, but for a detailed explanation, Decomposing, Probing, and Plotting Interactions in R by IDRE at UCLA is a great guide.
To obtain simple slopes predicting excitement from security values for each level of social status (i.e., each level of a categorical variable), we use the emtrends() function in the emmeans package.
emtrends(mod1, ~V238, var="security_mean")
## V238 security_mean.trend SE df lower.CL upper.CL
## 1 0.650 0.2824 964 0.0962 1.205
## 2 0.188 0.0600 964 0.0700 0.306
## 3 0.275 0.0505 964 0.1756 0.374
## 4 0.140 0.0629 964 0.0164 0.263
## 5 0.523 0.0946 964 0.3373 0.709
##
## Confidence level used: 0.95
To test differences in slopes (i.e., whether the upper class differs from the working class in how well security values predict excitement values - would people in the upper class show a stronger/weaker relationship between security and excitement values?), we add pairwise to the formula.
emtrends(mod1, pairwise~V238, var="security_mean")
## $emtrends
## V238 security_mean.trend SE df lower.CL upper.CL
## 1 0.650 0.2824 964 0.0962 1.205
## 2 0.188 0.0600 964 0.0700 0.306
## 3 0.275 0.0505 964 0.1756 0.374
## 4 0.140 0.0629 964 0.0164 0.263
## 5 0.523 0.0946 964 0.3373 0.709
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## 1 - 2 0.4626 0.2887 964 1.602 0.4964
## 1 - 3 0.3757 0.2869 964 1.309 0.6854
## 1 - 4 0.5106 0.2893 964 1.765 0.3949
## 1 - 5 0.1275 0.2978 964 0.428 0.9930
## 2 - 3 -0.0869 0.0785 964 -1.108 0.8026
## 2 - 4 0.0481 0.0869 964 0.553 0.9816
## 2 - 5 -0.3351 0.1120 964 -2.991 0.0238
## 3 - 4 0.1350 0.0806 964 1.674 0.4508
## 3 - 5 -0.2482 0.1072 964 -2.315 0.1409
## 4 - 5 -0.3832 0.1136 964 -3.374 0.0069
##
## P value adjustment: tukey method for comparing a family of 5 estimates
To visualize the moderation, you may use the emmip() function and specify the variables you want as the x-axis and as the grouping (trace) variable, t, using the format t~x. The at=list(...) argument defines the range for the x-axis and for t (optional). (Note: ggplot can achieve the same purpose, but emmip is easier since it directly takes the linear model you ran and applies it to the chart; see the ggplot-based sketch right after the emmip() call below.)
emmip(mod1, V238~security_mean, at=list(security_mean=seq(1,6,by=.5)),
xlab = "Security Value",
ylab = "Excitement Value",
tlab = "Perceived Social Status")
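If you do want to build the chart with ggplot, one option is to ask emmip() to return its plotting data instead of drawing (plotit=FALSE); per the emmip() help page, the plotted variables in the returned data frame are named xvar, yvar, and tvar. A sketch (the x-axis range here is my own choice, roughly matching the centered security_mean rather than the original 1-6 scale):
plotdat<- emmip(mod1, V238~security_mean,
                at=list(security_mean=seq(-1, 3.5, by=.5)),
                plotit=FALSE) #return the data emmip would have plotted
ggplot(plotdat, aes(x=xvar, y=yvar, colour=factor(tvar)))+
  geom_line()+
  labs(x="Security Value (centered)", y="Predicted Excitement Value",
       colour="Perceived Social Status")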
The example above treats the categorical variable as the moderator and the continuous variable as the explanatory variable. If you want to flip the roles, it is more complicated but doable. First, we no longer examine simple slopes (because it doesn't make sense to interpret the relationship between a categorical variable and a continuous outcome as a slope). Instead, we examine the simple effects of the categorical predictor on the continuous outcome at different levels of the continuous moderator. Therefore, we use the emmeans() function instead of emtrends().
Use at=list(...) to indicate what levels of the continuous moderator you want to test. Traditionally, we use -1SD, mean, and +1SD of the variable.
(Note that because I did not remove missing data earlier, I have to handle it here with na.rm=TRUE; mean() and sd() return NA when missing values are present. Our regression analysis also excludes missing cases, so it is actually better to remove missingness upfront.)
security_low<-mean(data$security_mean, na.rm=T)-sd(data$security_mean, na.rm=T)
security_mid<-mean(data$security_mean, na.rm=T)
security_high<-mean(data$security_mean, na.rm=T)+sd(data$security_mean, na.rm=T)
security_lmh<-c(security_low, security_mid, security_high)
contmod<-emmeans(mod1, ~ security_mean*V238, at=list(security_mean=security_lmh))
contrast(contmod, "pairwise", by="security_mean")
## security_mean = -1.11:
## contrast estimate SE df t.ratio p.value
## 1 - 2 -0.88810 0.2667 964 -3.330 0.0080
## 1 - 3 -0.78689 0.2637 964 -2.984 0.0243
## 1 - 4 -1.00057 0.2705 964 -3.699 0.0021
## 1 - 5 -0.55585 0.2996 964 -1.856 0.3423
## 2 - 3 0.10121 0.1211 964 0.836 0.9194
## 2 - 4 -0.11247 0.1353 964 -0.831 0.9209
## 2 - 5 0.33225 0.1867 964 1.780 0.3861
## 3 - 4 -0.21368 0.1292 964 -1.654 0.4632
## 3 - 5 0.23104 0.1823 964 1.267 0.7115
## 4 - 5 0.44472 0.1921 964 2.315 0.1408
##
## security_mean = 0.00:
## contrast estimate SE df t.ratio p.value
## 1 - 2 -0.37519 0.2701 964 -1.389 0.6347
## 1 - 3 -0.37033 0.2682 964 -1.381 0.6403
## 1 - 4 -0.43436 0.2704 964 -1.606 0.4938
## 1 - 5 -0.41452 0.2856 964 -1.452 0.5943
## 2 - 3 0.00486 0.0849 964 0.057 1.0000
## 2 - 4 -0.05917 0.0916 964 -0.646 0.9674
## 2 - 5 -0.03933 0.1296 964 -0.303 0.9982
## 3 - 4 -0.06403 0.0861 964 -0.744 0.9461
## 3 - 5 -0.04419 0.1257 964 -0.351 0.9967
## 4 - 5 0.01984 0.1304 964 0.152 0.9999
##
## security_mean = 1.11:
## contrast estimate SE df t.ratio p.value
## 1 - 2 0.13772 0.5289 964 0.260 0.9990
## 1 - 3 0.04623 0.5261 964 0.088 1.0000
## 1 - 4 0.13185 0.5282 964 0.250 0.9991
## 1 - 5 -0.27319 0.5399 964 -0.506 0.9868
## 2 - 3 -0.09148 0.1220 964 -0.750 0.9446
## 2 - 4 -0.00587 0.1307 964 -0.045 1.0000
## 2 - 5 -0.41090 0.1721 964 -2.388 0.1193
## 3 - 4 0.08561 0.1188 964 0.720 0.9518
## 3 - 5 -0.31942 0.1632 964 -1.957 0.2883
## 4 - 5 -0.40503 0.1698 964 -2.385 0.1201
##
## P value adjustment: tukey method for comparing a family of 5 estimates
To visualize,
p<- emmip(mod1, security_mean~V238, at=list(security_mean=security_lmh),
xlab = "Perceived Social Status",
ylab = "Excitement Value",
tlab = "Security Value",
style="factor"
,CI=TRUE) #you can add CI to your plot as well.
p
emmip() generates a line graph. To show a bar graph (since our x-axis is categorical), use ggplot().
data$security_group[data$security_mean > security_high]<- "High"
data$security_group[data$security_mean < security_high & data$security_mean>= security_low]<-"Mid"
data$security_group[data$security_mean< security_low]<-"Low"
count(data, security_group)
## security_group n
## 1 High 200
## 2 Low 268
## 3 Mid 655
## 4 <NA> 77
filter(data, !is.na(data$security_group)) %>% #filter NA so it doesn't appear on graph
ggplot(aes(x=V238, y=excite_mean, fill=security_group))+
geom_bar(stat="identity",position="dodge")
## Warning: Removed 149 rows containing missing values (geom_bar).
#stat="identity" tells geom_bar to use the supplied y values (excite_mean) rather than simply counting the number of participants in each x category.
#position="dodge" places the bars side by side instead of stacking them.
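One caveat: with stat="identity" and many participants per category, geom_bar draws one bar per row, overlaid at the same position, so the visible bar height is not a group mean. If what you want is the mean excitement value per group, stat_summary() is a cleaner option; a sketch (the fun argument requires ggplot2 >= 3.3; older versions use fun.y):
filter(data, !is.na(data$security_group)) %>%
  ggplot(aes(x=factor(V238), y=excite_mean, fill=security_group))+
  stat_summary(fun=mean, geom="bar", position="dodge")+ #bar height = group mean
  labs(x="Perceived Social Status", y="Mean Excitement Value (centered)", fill="Security Value")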
In this example, we are predicting security values (i.e., perceived importance of behaving properly/having security in life) from secular values. The example also tests whether emancipative values moderate the relationship.
mod2<-lm(security_mean~SACSECVAL*RESEMAVAL, data)
summary(mod2)
##
## Call:
## lm(formula = security_mean ~ SACSECVAL * RESEMAVAL, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7002 -0.9457 -0.1963 0.7597 3.9535
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01938 0.03370 -0.575 0.565436
## SACSECVAL 0.81578 0.24049 3.392 0.000718 ***
## RESEMAVAL 0.21238 0.24072 0.882 0.377828
## SACSECVAL:RESEMAVAL 3.61411 1.67772 2.154 0.031444 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.094 on 1093 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.01852, Adjusted R-squared: 0.01583
## F-statistic: 6.874 on 3 and 1093 DF, p-value: 0.0001376
We get a significant interaction! Let's move on to the simple slope analysis. Since we are dealing with a continuous moderator, we need to pick the levels at which to probe it. Again, traditionally, we use -1SD, the mean, and +1SD.
eman_low<-mean(data$RESEMAVAL, na.rm=T)-sd(data$RESEMAVAL, na.rm=T)
eman_mid<-mean(data$RESEMAVAL, na.rm=T)
eman_high<-mean(data$RESEMAVAL, na.rm=T)+sd(data$RESEMAVAL, na.rm=T)
(As noted earlier, because missing data were not removed upfront, na.rm=TRUE is needed here as well.)
Since we are doing a simple slope analysis (not simple effects), we use emtrends() to obtain a separate slope at each level of emancipative values we defined (i.e., -1SD, mean, +1SD).
emtrends(mod2, ~RESEMAVAL, var="SACSECVAL", at = list(RESEMAVAL=c(eman_low, eman_mid, eman_high)))
## RESEMAVAL SACSECVAL.trend SE df lower.CL upper.CL
## -0.141 0.306 0.349 1093 -0.378 0.991
## 0.000 0.816 0.240 1093 0.344 1.288
## 0.141 1.325 0.325 1093 0.687 1.963
##
## Confidence level used: 0.95
People with high (+1SD) emancipative values show the strongest relationship between secular values and security values. In other words, for people who really desire freedom, secular values strongly and negatively predict their perceived importance of security (remember that the security scores are reverse-coded, such that a high score reflects placing low value on security).
To visualize the results, we use similar code to the continuous x categorical situation, with the format emmip(model_name, moderator~predictor, at = list(...)).
emmip(mod2, RESEMAVAL~SACSECVAL
, at=list(SACSECVAL=seq(-0.3, 0.6, by=.1),RESEMAVAL= c(eman_low, eman_mid, eman_high)) #SACSECVAL is scored 0-1 (and centered here), so the range should match the data rather than 1-6
, xlab="Secular values"
, ylab="Security values"
, tlab="Emancipative values"
,CI=T #optional
)
Let's level up and do a three-way interaction.
data$sex<-factor(data$V240, #optional - for plots only
levels=c(1,2)
,labels=c("Male","Female"))
mod3<-lm(security_mean~SACSECVAL*RESEMAVAL*sex, data)
summary(mod3)
##
## Call:
## lm(formula = security_mean ~ SACSECVAL * RESEMAVAL * sex, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6983 -0.9335 -0.2216 0.7402 3.8965
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.04282 0.04879 -0.878 0.38036
## SACSECVAL 0.86663 0.33100 2.618 0.00896 **
## RESEMAVAL 0.26404 0.34570 0.764 0.44517
## sexFemale 0.04785 0.06895 0.694 0.48783
## SACSECVAL:RESEMAVAL 1.02362 2.27666 0.450 0.65308
## SACSECVAL:sexFemale -0.23353 0.49121 -0.475 0.63458
## RESEMAVAL:sexFemale -0.17572 0.49151 -0.358 0.72078
## SACSECVAL:RESEMAVAL:sexFemale 6.14254 3.42832 1.792 0.07346 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.094 on 1089 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.02263, Adjusted R-squared: 0.01635
## F-statistic: 3.602 on 7 and 1089 DF, p-value: 0.0007641
Let's visualize first. The formula is formatted as emmip(model_name, legend/trace variable ~ x-axis variable | facet variable).
emmip(mod3, RESEMAVAL~SACSECVAL | sex
, at = list(SACSECVAL=seq(-0.3, 0.6, by=.1),RESEMAVAL= c(eman_low, eman_mid, eman_high)) #again, use a range that matches the centered SACSECVAL
, xlab="Secular values"
, ylab="Security values"
, tlab="Emancipative values"
)
To examine simple slopes/contrasts, we request simple slopes for each two-way relationship while holding the third variable "constant". For continuous variables, we hold them "constant" by setting them at -1SD, the mean, and +1SD (examples below).
emtrends(mod3, pairwise~sex|RESEMAVAL, var="SACSECVAL"
, at=list(RESEMAVAL= c(eman_low, eman_mid, eman_high))
)
## $emtrends
## RESEMAVAL = -0.141:
## sex SACSECVAL.trend SE df lower.CL upper.CL
## Male 0.722 0.440 1089 -0.1418 1.586
## Female -0.377 0.573 1089 -1.5011 0.747
##
## RESEMAVAL = 0.000:
## sex SACSECVAL.trend SE df lower.CL upper.CL
## Male 0.867 0.331 1089 0.2172 1.516
## Female 0.633 0.363 1089 -0.0790 1.345
##
## RESEMAVAL = 0.141:
## sex SACSECVAL.trend SE df lower.CL upper.CL
## Male 1.011 0.481 1089 0.0675 1.954
## Female 1.643 0.443 1089 0.7736 2.513
##
## Confidence level used: 0.95
##
## $contrasts
## RESEMAVAL = -0.141:
## contrast estimate SE df t.ratio p.value
## Male - Female 1.099 0.723 1089 1.522 0.1284
##
## RESEMAVAL = 0.000:
## contrast estimate SE df t.ratio p.value
## Male - Female 0.234 0.491 1089 0.475 0.6346
##
## RESEMAVAL = 0.141:
## contrast estimate SE df t.ratio p.value
## Male - Female -0.632 0.654 1089 -0.967 0.3337
To look at the relationship between the other continuous variable and the outcome variable by sex, simply swap the two continuous variables' roles:
secu_low<-mean(data$SACSECVAL, na.rm=T)-sd(data$SACSECVAL, na.rm=T)
secu_mid<-mean(data$SACSECVAL, na.rm=T)
secu_high<-mean(data$SACSECVAL, na.rm=T)+sd(data$SACSECVAL, na.rm=T)
emtrends(mod3, pairwise~sex|SACSECVAL , var="RESEMAVAL"
, at=list(SACSECVAL= c(secu_low, secu_mid, secu_high))
)
## $emtrends
## SACSECVAL = -0.145:
## sex RESEMAVAL.trend SE df lower.CL upper.CL
## Male 0.1154 0.510 1089 -0.886 1.1165
## Female -0.9519 0.521 1089 -1.974 0.0702
##
## SACSECVAL = 0.000:
## sex RESEMAVAL.trend SE df lower.CL upper.CL
## Male 0.2640 0.346 1089 -0.414 0.9424
## Female 0.0883 0.349 1089 -0.597 0.7739
##
## SACSECVAL = 0.145:
## sex RESEMAVAL.trend SE df lower.CL upper.CL
## Male 0.4126 0.444 1089 -0.459 1.2838
## Female 1.1285 0.500 1089 0.148 2.1089
##
## Confidence level used: 0.95
##
## $contrasts
## SACSECVAL = -0.145:
## contrast estimate SE df t.ratio p.value
## Male - Female 1.067 0.729 1089 1.464 0.1435
##
## SACSECVAL = 0.000:
## contrast estimate SE df t.ratio p.value
## Male - Female 0.176 0.492 1089 0.358 0.7208
##
## SACSECVAL = 0.145:
## contrast estimate SE df t.ratio p.value
## Male - Female -0.716 0.668 1089 -1.071 0.2844
To hold the categorical variable "constant", we do not use the pairwise argument.
emtrends(mod3, ~RESEMAVAL|sex, var="SACSECVAL", at = list(RESEMAVAL=c(eman_low, eman_mid, eman_high)))
## sex = Male:
## RESEMAVAL SACSECVAL.trend SE df lower.CL upper.CL
## -0.141 0.722 0.440 1089 -0.1418 1.586
## 0.000 0.867 0.331 1089 0.2172 1.516
## 0.141 1.011 0.481 1089 0.0675 1.954
##
## sex = Female:
## RESEMAVAL SACSECVAL.trend SE df lower.CL upper.CL
## -0.141 -0.377 0.573 1089 -1.5011 0.747
## 0.000 0.633 0.363 1089 -0.0790 1.345
## 0.141 1.643 0.443 1089 0.7736 2.513
##
## Confidence level used: 0.95
Running mediation models in R is very convenient with the mediate() function from the psych package. In this example, we predict security values from emancipative values (RESEMAVAL), with secular values (SACSECVAL) as the mediator; in mediate(), the mediator is the variable wrapped in parentheses.
medmod<-mediate(security_mean~RESEMAVAL+(SACSECVAL), data=data)
medmod
##
## Mediation/Moderation Analysis
## Call: mediate(y = security_mean ~ RESEMAVAL + (SACSECVAL), data = data)
##
## The DV (Y) was security_mean . The IV (X) was RESEMAVAL . The mediating variable(s) = SACSECVAL .
##
## Total effect(c) of RESEMAVAL on security_mean = 0.44 S.E. = 0.23 t = 1.92 df= 1198 with p = 0.055
## Direct effect (c') of RESEMAVAL on security_mean removing SACSECVAL = 0.25 S.E. = 0.23 t = 1.09 df= 1197 with p = 0.28
## Indirect effect (ab) of RESEMAVAL on security_mean through SACSECVAL = 0.18
## Mean bootstrapped indirect effect = 0.18 with standard error = 0.06 Lower CI = 0.07 Upper CI = 0.31
## R = 0.12 R2 = 0.01 F = 8.65 on 2 and 1197 DF p-value: 0.000187
##
## To see the longer output, specify short = FALSE in the print statement or ask for the summary
You can add more mediators to the model if you need them.
medmod2<-mediate(security_mean~V240+(RESEMAVAL)+(SACSECVAL), data=data)
medmod2
##
## Mediation/Moderation Analysis
## Call: mediate(y = security_mean ~ V240 + (RESEMAVAL) + (SACSECVAL),
## data = data)
##
## The DV (Y) was security_mean . The IV (X) was V240 . The mediating variable(s) = RESEMAVAL SACSECVAL .
##
## Total effect(c) of V240 on security_mean = 0.05 S.E. = 0.06 t = 0.8 df= 1198 with p = 0.42
## Direct effect (c') of V240 on security_mean removing RESEMAVAL SACSECVAL = 0.05 S.E. = 0.06 t = 0.74 df= 1196 with p = 0.46
## Indirect effect (ab) of V240 on security_mean through RESEMAVAL SACSECVAL = 0
## Mean bootstrapped indirect effect = 0 with standard error = 0.02 Lower CI = -0.03 Upper CI = 0.04
## R = 0.12 R2 = 0.01 F = 5.95 on 3 and 1196 DF p-value: 0.000502
##
## To see the longer output, specify short = FALSE in the print statement or ask for the summary
You can even add a moderator to create a mediated moderation or moderated mediation model (not highly recommended, as we discussed in PSYC513 this past semester…).
medmod3<-mediate(security_mean~V240*V238+(RESEMAVAL)+(SACSECVAL), data=data)
medmod3
##
## Mediation/Moderation Analysis
## Call: mediate(y = security_mean ~ V240 * V238 + (RESEMAVAL) + (SACSECVAL),
## data = data)
##
## The DV (Y) was security_mean . The IV (X) was V240 V238 V240*V238 . The mediating variable(s) = RESEMAVAL SACSECVAL .
##
## Total effect(c) of V240 on security_mean = 0.07 S.E. = 0.06 t = 1.09 df= 1196 with p = 0.28
## Direct effect (c') of V240 on security_mean removing RESEMAVAL SACSECVAL = 0.06 S.E. = 0.07 t = 0.96 df= 1194 with p = 0.34
## Indirect effect (ab) of V240 on security_mean through RESEMAVAL SACSECVAL = 0.01
## Mean bootstrapped indirect effect = 0.01 with standard error = 0.02 Lower CI = -0.02 Upper CI = 0.04
##
## Total effect(c) of V238 on security_mean = 0.08 S.E. = 0.03 t = 2.54 df= 1196 with p = 0.011
## Direct effect (c') of V238 on security_mean removing RESEMAVAL SACSECVAL = 0.08 S.E. = 0.03 t = 2.35 df= 1194 with p = 0.019
## Indirect effect (ab) of V238 on security_mean through RESEMAVAL SACSECVAL = 0.01
## Mean bootstrapped indirect effect = 0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0.02
##
## Total effect(c) of V240*V238 on security_mean = -0.05 S.E. = 0.06 t = -0.74 df= 1196 with p = 0.46
## Direct effect (c') of V240*V238 on security_mean removing RESEMAVAL SACSECVAL = -0.01 S.E. = 0.06 t = -0.18 df= 1194 with p = 0.86
## Indirect effect (ab) of V240*V238 on security_mean through RESEMAVAL SACSECVAL = -0.04
## Mean bootstrapped indirect effect = -0.04 with standard error = 0.01 Lower CI = -0.07 Upper CI = -0.01
## R = 0.14 R2 = 0.02 F = 4.7 on 5 and 1194 DF p-value: 0.000296
##
## To see the longer output, specify short = FALSE in the print statement or ask for the summary
If you have more than one outcome variable, you may choose to run a path model or a structural equation model (SEM) using the lavaan package. There are two steps involved: (1) stating your model and (2) running the model.
For a path model, there are no latent variables (i.e., no measurement model). Remember that all formulas follow the format y~x1+x2+...+xn. In this example, our two outcome variables are the two value scores (secular and emancipative), and our predictors are the three Schwartz subscales. Emancipative values may change across age, so let's control for age (V242). This will also give us an overidentified model (so we have fit indices to examine).
pathmodel<- '
RESEMAVAL~ V242+security_mean+caretake_mean+excite_mean
SACSECVAL~security_mean+caretake_mean+excite_mean'
fitpath<-sem(pathmodel, data= data)
summary(fitpath, fit.measures=TRUE, standardized=TRUE)
## lavaan 0.6-6 ended normally after 54 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 10
##
## Used Total
## Number of observations 1014 1200
##
## Model Test User Model:
##
## Test statistic 12.252
## Degrees of freedom 1
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 153.336
## Degrees of freedom 9
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.922
## Tucker-Lewis Index (TLI) 0.298
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) 1178.619
## Loglikelihood unrestricted model (H1) 1184.745
##
## Akaike (AIC) -2337.237
## Bayesian (BIC) -2288.021
## Sample-size adjusted Bayesian (BIC) -2319.781
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.105
## 90 Percent confidence interval - lower 0.058
## 90 Percent confidence interval - upper 0.162
## P-value RMSEA <= 0.05 0.028
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.022
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## RESEMAVAL ~
## V242 -0.001 0.000 -2.786 0.005 -0.001 -0.093
## security_mean 0.000 0.005 0.067 0.947 0.000 0.002
## caretake_mean 0.008 0.005 1.615 0.106 0.008 0.060
## excite_mean -0.005 0.004 -1.126 0.260 -0.005 -0.039
## SACSECVAL ~
## security_mean -0.002 0.005 -0.432 0.666 -0.002 -0.016
## caretake_mean 0.040 0.005 8.226 0.000 0.040 0.288
## excite_mean -0.007 0.004 -1.684 0.092 -0.007 -0.053
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .RESEMAVAL ~~
## .SACSECVAL 0.004 0.001 5.892 0.000 0.004 0.188
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .RESEMAVAL 0.019 0.001 22.517 0.000 0.019 0.982
## .SACSECVAL 0.018 0.001 22.517 0.000 0.018 0.922
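If you prefer the path estimates as a plain data frame (e.g., to feed into one of the table functions above), lavaan's parameterEstimates() returns them along with confidence intervals; a sketch:
est<- parameterEstimates(fitpath, standardized=TRUE)
subset(est, op=="~") #keep only the regression paths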
Alternatively, you may use a measurement model to define latent variables. For the purpose of this example, we assume that the Schwartz subscales form a general world values factor, and that the secular and emancipative values form a materialism factor. Then, we predict materialism from general world values.
semmodel<- '
GeneralWorldValues =~ security_mean+caretake_mean+excite_mean
Materialism =~ RESEMAVAL+SACSECVAL
Materialism~GeneralWorldValues
' #note: latent variable names cannot contain spaces in lavaan syntax
fitsem<-sem(semmodel, data= data)
summary(fitsem, fit.measures=TRUE, standardized=TRUE)
## lavaan 0.6-6 ended normally after 82 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 11
##
## Used Total
## Number of observations 1014 1200
##
## Model Test User Model:
##
## Test statistic 68.016
## Degrees of freedom 4
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 530.176
## Degrees of freedom 10
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.877
## Tucker-Lewis Index (TLI) 0.692
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3181.570
## Loglikelihood unrestricted model (H1) -3147.562
##
## Akaike (AIC) 6385.140
## Bayesian (BIC) 6439.278
## Sample-size adjusted Bayesian (BIC) 6404.341
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.126
## 90 Percent confidence interval - lower 0.100
## 90 Percent confidence interval - upper 0.153
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.054
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## GeneralWorldValues =~
## security_mean 1.000 0.876 0.801
## caretake_mean 0.743 0.090 8.264 0.000 0.651 0.641
## excite_mean 0.389 0.056 6.962 0.000 0.341 0.310
## Materialism =~
## RESEMAVAL 1.000 0.030 0.218
## SACSECVAL 4.537 3.455 1.313 0.189 0.138 0.976
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Materialism ~
## GeneralWrldVls 0.008 0.006 1.277 0.202 0.217 0.217
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .security_mean 0.429 0.091 4.721 0.000 0.429 0.359
## .caretake_mean 0.610 0.056 10.840 0.000 0.610 0.590
## .excite_mean 1.093 0.051 21.454 0.000 1.093 0.904
## .RESEMAVAL 0.018 0.001 17.203 0.000 0.018 0.952
## .SACSECVAL 0.001 0.014 0.068 0.946 0.001 0.048
## GeneralWrldVls 0.768 0.102 7.537 0.000 1.000 1.000
## .Materialism 0.001 0.001 1.304 0.192 0.953 0.953
Running a logistic regression is similar to running a linear regression, except that you use the glm() function instead of lm(). Suppose we are predicting whether people would mind having neighbors of another religion (V41), where 1 = would not want neighbors of a different religion and 2 = would not mind. We predict this from sex (V240) and secular values (SACSECVAL).
data$religneighbor[data$V41==1]<-1 #not want
data$religneighbor[data$V41==2]<-0 #ok
logimod<-glm(religneighbor~factor(V240)+SACSECVAL, data, family = "binomial")
summary(logimod)
##
## Call:
## glm(formula = religneighbor ~ factor(V240) + SACSECVAL, family = "binomial",
## data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3391 -1.0690 -0.9199 1.2569 1.6927
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.35356 0.08475 -4.172 3.02e-05 ***
## factor(V240)2 0.16193 0.11939 1.356 0.175
## SACSECVAL -1.83507 0.41908 -4.379 1.19e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1594.3 on 1164 degrees of freedom
## Residual deviance: 1572.5 on 1162 degrees of freedom
## (35 observations deleted due to missingness)
## AIC: 1578.5
##
## Number of Fisher Scoring iterations: 4
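Because the coefficients of a logistic regression are on the log-odds scale, you will often want to report odds ratios instead; a short sketch (exponentiate by hand, or let tbl_regression() do it with exponentiate=TRUE):
exp(coef(logimod)) #odds ratios
tbl_regression(logimod, exponentiate=TRUE) #the table now reports odds ratios instead of log-odds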