All weekly notes and code are licensed under CC BY-SA 4.0 unless otherwise stated. For other resources linked from this page, please refer to the authors' websites for licensing information.
Refer to the Week 3 notes for an introduction to R Markdown. You may download the code for this HTML document or read the notes on this site.
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 3.6.3
## Warning: package 'survival' was built under R version 3.6.3
library(dplyr)
library(psych) #for mediation model
library(gt) #for drawing tables
## Warning: package 'gt' was built under R version 3.6.3
library(gtsummary) #for drawing tables
library(ggplot2) #for plotting graphs
library(emmeans) #for simple slopes
library(lavaan) #for path/SEM
## Warning: package 'lavaan' was built under R version 3.6.3
We will continue to use the World Value Survey - Wave 6 Algeria data for this week's examples. Last week, we discussed how to recode values into missing for a specific variable. However, the World Value Survey data code missingness uniformly: any negative value in the raw variables is missing. In this case, we can set NA for all variables at the beginning.
data <- read.csv("https://raw.githubusercontent.com/manyu26/PsychLearnR/master/psychlearnr_data.csv")
data[data<0]<-NA
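To double-check that the recoding worked, you can confirm that no negative values remain; a quick sanity check:
sum(data<0, na.rm=TRUE) #should return 0 now that every negative code is NA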
Linear models handle both continuous and categorical variables. Categorical variables we have used so far include sex (V240) and perceived social status (V238).
The World Value Survey has few continuous variables, but in the Week 2 notes we created subscale scores for the Schwartz value scale. (Note that they do not have good reliability, but we are using them to demonstrate our examples.)
The code is repeated here.
caretake <- dplyr::select(data, V74B,V78,V79)
caretake[sapply(caretake, function(x) x %in% "-2")] <- NA
data$caretake_mean<-rowMeans(caretake)
security<-dplyr::select(data,V72,V77)
security[sapply(security,function(x) x %in% "-2")] <-NA
data$security_mean<-rowMeans(security)
excite <- dplyr::select(data, V70, V71, V73, V75, V76)
excite[sapply(excite, function(x) x %in% "-2")] <- NA
data$excite_mean<-rowMeans(excite)
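(If you want to verify the reliability point above yourself, the alpha() function in the psych package computes Cronbach's alpha for each item set; a quick sketch:)
psych::alpha(caretake) #Cronbach's alpha for the caretaking items
psych::alpha(excite) #and for the excitement items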
Additionally, the World Value Survey calculated two value scores: secular values (SACSECVAL) and emancipative values (RESEMAVAL).
Correlations are often performed before analyzing a linear model. Feel free to skip to the next section if you are familiar with correlation analysis in R.
There are various ways to calculate correlations (Pearson's, Spearman's, Kendall's). We will only cover one way here (my favorite method). It produces a full correlation matrix rather than pairwise correlations only, and it takes two steps: first, select the variables you want to correlate; second, run the correlation using rcorr() from the Hmisc package.
schwartz<-dplyr::select(data,security_mean, caretake_mean, excite_mean, SACSECVAL, RESEMAVAL)
rcorr(as.matrix(schwartz), type="pearson")
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 1.00 0.52 0.31 0.12 0.06
## caretake_mean 0.52 1.00 0.15 0.27 0.10
## excite_mean 0.31 0.15 1.00 -0.02 -0.05
## SACSECVAL 0.12 0.27 -0.02 1.00 0.22
## RESEMAVAL 0.06 0.10 -0.05 0.22 1.00
##
## n
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 1123 1085 1053 1103 1113
## caretake_mean 1085 1101 1040 1083 1094
## excite_mean 1053 1040 1059 1041 1052
## SACSECVAL 1103 1083 1041 1165 1151
## RESEMAVAL 1113 1094 1052 1151 1173
##
## P
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 0.0000 0.0000 0.0000 0.0640
## caretake_mean 0.0000 0.0000 0.0000 0.0010
## excite_mean 0.0000 0.0000 0.5178 0.0831
## SACSECVAL 0.0000 0.0000 0.5178 0.0000
## RESEMAVAL 0.0640 0.0010 0.0831 0.0000
The default of rcorr() is pairwise deletion. To do listwise deletion, you have to remove incomplete cases first.
schwartz<-dplyr::select(data,security_mean, caretake_mean, excite_mean, SACSECVAL, RESEMAVAL)
schwartz_complete<-schwartz[complete.cases(schwartz),]
cor.matrix<-rcorr(as.matrix(schwartz_complete), type="pearson")
cor.matrix
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 1.00 0.51 0.29 0.12 0.03
## caretake_mean 0.51 1.00 0.13 0.27 0.08
## excite_mean 0.29 0.13 1.00 -0.02 -0.06
## SACSECVAL 0.12 0.27 -0.02 1.00 0.21
## RESEMAVAL 0.03 0.08 -0.06 0.21 1.00
##
## n= 1014
##
##
## P
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 0.0000 0.0000 0.0002 0.4172
## caretake_mean 0.0000 0.0000 0.0000 0.0124
## excite_mean 0.0000 0.0000 0.5044 0.0726
## SACSECVAL 0.0002 0.0000 0.5044 0.0000
## RESEMAVAL 0.4172 0.0124 0.0726 0.0000
To make it into a beautiful table, you first have to understand your output. The rcorr() output contains three components: r, n, and P. In the correlation table we just obtained, n was the same across all pairs (because we used listwise deletion), so we only need to include r and P in our main table. To extract these components, simply use the $ operator (e.g., cor.matrix$r).
cor_df_r<-as.data.frame.matrix(round(cor.matrix$r,2)) #round to 2 d.p.
cor_df_r
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean 1.00 0.51 0.29 0.12 0.03
## caretake_mean 0.51 1.00 0.13 0.27 0.08
## excite_mean 0.29 0.13 1.00 -0.02 -0.06
## SACSECVAL 0.12 0.27 -0.02 1.00 0.21
## RESEMAVAL 0.03 0.08 -0.06 0.21 1.00
The code for the p-values is a little longer since we need to adjust for APA reporting style.
cor_df_p<-as.data.frame.matrix(round(cor.matrix$P,3)) #round to 3 d.p.
cor_df_p[cor_df_p == 0] <- "< .001"
cor_df_p[is.na(cor_df_p)] <- "-"
cor_df_p
## security_mean caretake_mean excite_mean SACSECVAL RESEMAVAL
## security_mean - < .001 < .001 < .001 0.417
## caretake_mean < .001 - < .001 < .001 0.012
## excite_mean < .001 < .001 - 0.504 0.073
## SACSECVAL < .001 < .001 0.504 - < .001
## RESEMAVAL 0.417 0.012 0.073 < .001 -
Now that we have data frames for our output, we can use different functions to turn them into a table (e.g., gt, mentioned in Week 3, or kable). There are many ways to do it, but note that it is more complicated if you want to use asterisks instead of reporting exact p-values.
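For example, a minimal version of the r matrix with kable() from the knitr package (the caption text is just a placeholder):
library(knitr)
kable(cor_df_r, caption="Correlations among the Schwartz subscales and the WVS value scores")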
To conduct a simple linear regression testing how emancipative values (RESEMAVAL) predict secular values (SACSECVAL), we place the outcome variable on the left-hand side of the formula, and the predicting variable on the right-hand side of the formula. Similar to ANOVA, we use summary() and plot() to obtain summary statistics and relevant diagnostic plots.
reg<- lm(SACSECVAL~RESEMAVAL, data=data)
summary(reg)
##
## Call:
## lm(formula = SACSECVAL ~ RESEMAVAL, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38898 -0.09001 0.00012 0.09121 0.54104
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.28453 0.01047 27.185 < 2e-16 ***
## RESEMAVAL 0.22469 0.02953 7.609 5.74e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1407 on 1149 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.04797, Adjusted R-squared: 0.04714
## F-statistic: 57.89 on 1 and 1149 DF, p-value: 5.74e-14
plot(reg)
To transform it into a table, the gtsummary package currently seems to be the easiest option. It automatically identifies linear, logistic, and Cox regression models.
simpletable<- reg %>%
tbl_regression(label = list(RESEMAVAL="Emancipative Values"))
#or
#tbl_regression(reg, label = list(RESEMAVAL="Emancipative Values"))
simpletable
| Characteristic | Beta | 95% CI¹ | p-value |
|---|---|---|---|
| Emancipative Values | 0.22 | 0.17, 0.28 | <0.001 |

¹ CI = Confidence Interval
Note that it doesn't print the standard error, so it may be better to report standardized coefficients rather than unstandardized ones.
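(If you do need the unstandardized standard errors for reporting, they are easy to pull straight from the model object; a quick base-R sketch:)
coef(summary(reg)) #columns: Estimate, Std. Error, t value, Pr(>|t|)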
regstd<- lm(scale(SACSECVAL)~scale(RESEMAVAL), data=data)
summary(regstd)
##
## Call:
## lm(formula = scale(SACSECVAL) ~ scale(RESEMAVAL), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6797 -0.6201 0.0008 0.6284 3.7273
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.01179 0.02857 0.413 0.68
## scale(RESEMAVAL) 0.21822 0.02868 7.609 5.74e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9692 on 1149 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.04797, Adjusted R-squared: 0.04714
## F-statistic: 57.89 on 1 and 1149 DF, p-value: 5.74e-14
simpletable<-tbl_regression(regstd, label = list("scale(RESEMAVAL)"="Emancipative Values"))
Multiple linear regression simply adds more terms to your SLR formula using "+" signs. For example, to predict secular values from emancipative values and the excitement subscale of the Schwartz value scale:
regmult<- lm(SACSECVAL~RESEMAVAL+excite_mean, data=data)
summary(regmult)
##
## Call:
## lm(formula = SACSECVAL ~ RESEMAVAL + excite_mean, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38307 -0.08928 0.00138 0.09285 0.46561
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2847387 0.0159289 17.876 < 2e-16 ***
## RESEMAVAL 0.2219528 0.0311529 7.125 1.95e-12 ***
## excite_mean -0.0009693 0.0039379 -0.246 0.806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1389 on 1033 degrees of freedom
## (164 observations deleted due to missingness)
## Multiple R-squared: 0.04721, Adjusted R-squared: 0.04536
## F-statistic: 25.59 on 2 and 1033 DF, p-value: 1.42e-11
To check for multicollinearity among the predictors, we can compute variance inflation factors (VIF) with the car package.
library(car)
## Warning: package 'car' was built under R version 3.6.1
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
## The following object is masked from 'package:dplyr':
##
## recode
vif(regmult)
## RESEMAVAL excite_mean
## 1.003214 1.003214
For the standardized version:
regmultstd<- lm(scale(SACSECVAL)~scale(RESEMAVAL)+scale(excite_mean), data=data)
summary(regmultstd)
##
## Call:
## lm(formula = scale(SACSECVAL) ~ scale(RESEMAVAL) + scale(excite_mean),
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6390 -0.6150 0.0095 0.6397 3.2076
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.011434 0.029743 -0.384 0.701
## scale(RESEMAVAL) 0.215557 0.030255 7.125 1.95e-12 ***
## scale(excite_mean) -0.007377 0.029969 -0.246 0.806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9571 on 1033 degrees of freedom
## (164 observations deleted due to missingness)
## Multiple R-squared: 0.04721, Adjusted R-squared: 0.04536
## F-statistic: 25.59 on 2 and 1033 DF, p-value: 1.42e-11
To create a table, use the same method as for the SLR.
multitable<- regmultstd %>%
tbl_regression(label = list("scale(RESEMAVAL)"="Emancipative Values", "scale(excite_mean)"="Value of Excitement"))
multitable
| Characteristic | Beta | 95% CI¹ | p-value |
|---|---|---|---|
| Emancipative Values | 0.22 | 0.16, 0.27 | <0.001 |
| Value of Excitement | -0.01 | -0.07, 0.05 | 0.8 |

¹ CI = Confidence Interval
To combine the two models into a single table (e.g., for reporting hierarchical regression), use tbl_merge() from gtsummary.
mergetable<-
  tbl_merge(
    tbls = list(simpletable, multitable),
    tab_spanner = c("**Step 1 Model**", "**Step 2 Model**")) #the double asterisks bold the spanner text
mergetable
| Characteristic | Step 1 Model | | | Step 2 Model | | |
|---|---|---|---|---|---|---|
| | Beta | 95% CI¹ | p-value | Beta | 95% CI¹ | p-value |
| Emancipative Values | 0.22 | 0.16, 0.27 | <0.001 | 0.22 | 0.16, 0.27 | <0.001 |
| Value of Excitement | | | | -0.01 | -0.07, 0.05 | 0.8 |

¹ CI = Confidence Interval
If you want to report the results of a certain row of your table in text, gtsummary has a helpful function, inline_text(), that generates a text representation of your results. Simply indicate which row you want to report. (You can also use the merged table, but the code becomes unnecessarily complicated.)
inline_text(multitable, variable = "scale(RESEMAVAL)")
## [1] "0.22 (95% CI 0.16, 0.27; p<0.001)"
Practice: Build a multiple linear regression by picking variables you want to examine (e.g., sex and excitement values predicting secular values, etc.). See if you can generate a table with the gtsummary package.
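If you get stuck, here is one possible starting point (sex is V240 in this dataset):
practice_mod<- lm(SACSECVAL~factor(V240)+excite_mean, data=data)
tbl_regression(practice_mod) #add label = list(...) as in the examples above for nicer row labels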
To compare models or conduct hierarchical linear regression, simply run each model and then use anova() to obtain F-tests for the differences between models. However, make sure the sample size is the same across all the regressions (i.e., remove missing data first so that listwise deletion is applied across the models).
demo<-dplyr::select(data, SACSECVAL, V240,RESEMAVAL,excite_mean)
demo<-demo[complete.cases(demo),]
model1<-lm(SACSECVAL~V240, data=demo)
model2<-lm(SACSECVAL~V240+RESEMAVAL, data=demo)
model3<-lm(SACSECVAL~V240+RESEMAVAL+excite_mean, data=demo)
anova(model1, model2, model3)
## Analysis of Variance Table
##
## Model 1: SACSECVAL ~ V240
## Model 2: SACSECVAL ~ V240 + RESEMAVAL
## Model 3: SACSECVAL ~ V240 + RESEMAVAL + excite_mean
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1034 20.918
## 2 1033 19.864 1 1.05442 54.7837 2.786e-13 ***
## 3 1032 19.863 1 0.00086 0.0445 0.8329
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
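If you also want to report the change in R-squared at each step (commonly reported for hierarchical regression), you can compute it from the model summaries; a minimal sketch:
r2<- c(summary(model1)$r.squared, summary(model2)$r.squared, summary(model3)$r.squared)
data.frame(step=1:3, R2=round(r2,3), delta_R2=round(c(NA, diff(r2)),3))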
Moderation can be conducted using the lm() function plus a simple slope analysis with the emmeans package. We covered the categorical x categorical simple slope test using emmeans in Week 3, so we will cover the other situations here.
When conducting moderation using linear regression, zero-centering is common practice. In R, the scale() function allows you to center and/or standardize. Since we only want to center without standardizing, we set center = TRUE and scale = FALSE in the arguments.
describe(data$excite_mean)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1059 2.77 1.1 2.6 2.72 1.19 1 6 5 0.45 -0.18 0.03
data$excite_mean<-scale(data$excite_mean, center=T, scale=F)
describe(data$excite_mean) #check that mean is zero.
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1059 0 1.1 -0.17 -0.05 1.19 -1.77 3.23 5 0.45 -0.18 0.03
data$security_mean<-scale(data$security_mean, center=T, scale=F)
describe(data$security_mean)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1123 0 1.11 -0.26 -0.12 1.48 -1.26 3.74 5 0.85 0.52 0.03
data$SACSECVAL<-scale(data$SACSECVAL, center=T, scale=F)
describe(data$SACSECVAL)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1165 0 0.15 0 0 0.14 -0.36 0.63 0.98 0.05 0.14 0
data$RESEMAVAL<-scale(data$RESEMAVAL, center=T, scale=F)
describe(data$RESEMAVAL)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1173 0 0.14 0.01 0 0.15 -0.32 0.46 0.79 0.12 -0.32 0
In this example, we are predicting excitement values (i.e., perceived importance of having stimulation/excitement in life) from security values (i.e., perceived importance of behaving properly and living in security). The example also tests whether perceived social status moderates this relationship (e.g., would people with high social status show a stronger relationship between security and excitement values than people with low social status?).
mod1<-lm(excite_mean~ security_mean*factor(V238), data)
summary(mod1)
##
## Call:
## lm(formula = excite_mean ~ security_mean * factor(V238), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1237 -0.7511 -0.1013 0.6258 3.0137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.3589 0.2624 -1.368 0.1717
## security_mean 0.6504 0.2824 2.303 0.0215 *
## factor(V238)2 0.3752 0.2701 1.389 0.1651
## factor(V238)3 0.3703 0.2682 1.381 0.1677
## factor(V238)4 0.4344 0.2704 1.606 0.1086
## factor(V238)5 0.4145 0.2856 1.452 0.1469
## security_mean:factor(V238)2 -0.4626 0.2887 -1.602 0.1095
## security_mean:factor(V238)3 -0.3757 0.2869 -1.309 0.1907
## security_mean:factor(V238)4 -0.5106 0.2893 -1.765 0.0779 .
## security_mean:factor(V238)5 -0.1275 0.2978 -0.428 0.6688
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.037 on 964 degrees of freedom
## (226 observations deleted due to missingness)
## Multiple R-squared: 0.09098, Adjusted R-squared: 0.0825
## F-statistic: 10.72 on 9 and 964 DF, p-value: 5.645e-16
The interaction/moderation results were not significant, but what if they were? We would need to run a simple slope analysis to examine the pattern. I outline the methods here, but for a detailed explanation, Decomposing, Probing, and Plotting Interactions in R by IDRE at UCLA is a great guide.
To obtain simple slopes predicting excitement from security values for each level of social status (i.e., each level of a categorical variable), we use the emtrends() function in the emmeans package.
emtrends(mod1, ~V238, var="security_mean")
## V238 security_mean.trend SE df lower.CL upper.CL
## 1 0.650 0.2824 964 0.0962 1.205
## 2 0.188 0.0600 964 0.0700 0.306
## 3 0.275 0.0505 964 0.1756 0.374
## 4 0.140 0.0629 964 0.0164 0.263
## 5 0.523 0.0946 964 0.3373 0.709
##
## Confidence level used: 0.95
To test differences in slopes (i.e., whether the upper class differs from the working class in how well security values predict excitement values - would people in the upper class show a stronger/weaker relationship between security and excitement values?), we add pairwise to the formula.
emtrends(mod1, pairwise~V238, var="security_mean")
## $emtrends
## V238 security_mean.trend SE df lower.CL upper.CL
## 1 0.650 0.2824 964 0.0962 1.205
## 2 0.188 0.0600 964 0.0700 0.306
## 3 0.275 0.0505 964 0.1756 0.374
## 4 0.140 0.0629 964 0.0164 0.263
## 5 0.523 0.0946 964 0.3373 0.709
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## 1 - 2 0.4626 0.2887 964 1.602 0.4964
## 1 - 3 0.3757 0.2869 964 1.309 0.6854
## 1 - 4 0.5106 0.2893 964 1.765 0.3949
## 1 - 5 0.1275 0.2978 964 0.428 0.9930
## 2 - 3 -0.0869 0.0785 964 -1.108 0.8026
## 2 - 4 0.0481 0.0869 964 0.553 0.9816
## 2 - 5 -0.3351 0.1120 964 -2.991 0.0238
## 3 - 4 0.1350 0.0806 964 1.674 0.4508
## 3 - 5 -0.2482 0.1072 964 -2.315 0.1409
## 4 - 5 -0.3832 0.1136 964 -3.374 0.0069
##
## P value adjustment: tukey method for comparing a family of 5 estimates
To visualize the moderation, you may use the emmip() function and specify the variables you want as the x-axis and as the grouping (trace) variable, t, using the format t~x. The at=list(...) argument defines the range for the x-axis and for t (optional). (Note: ggplot can achieve the same purpose, but emmip is easier since it directly takes the linear model you ran and applies it to the chart; see the ggplot-based sketch right after the emmip() call below.)
emmip(mod1, V238~security_mean, at=list(security_mean=seq(1,6,by=.5)),
xlab = "Security Value",
ylab = "Excitement Value",
tlab = "Perceived Social Status")
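If you do want to build the chart with ggplot, one option is to ask emmip() to return its plotting data instead of drawing (plotit=FALSE); per the emmip() help page, the plotted variables in the returned data frame are named xvar, yvar, and tvar. A sketch (the x-axis range here is my own choice, roughly matching the centered security_mean rather than the original 1-6 scale):
plotdat<- emmip(mod1, V238~security_mean,
                at=list(security_mean=seq(-1, 3.5, by=.5)),
                plotit=FALSE) #return the data emmip would have plotted
ggplot(plotdat, aes(x=xvar, y=yvar, colour=factor(tvar)))+
  geom_line()+
  labs(x="Security Value (centered)", y="Predicted Excitement Value",
       colour="Perceived Social Status")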
The example above treats the categorical variable as the moderator and the continuous variable as the explanatory variable. If you want to flip the roles, it is more complicated but doable. First, we no longer examine simple slopes (because it doesn't make sense to interpret the relationship between a categorical variable and a continuous outcome as a slope). Instead, we examine the simple effects of the categorical predictor on the continuous outcome at different levels of the continuous moderator. Therefore, we use the emmeans() function instead of emtrends().
Use at=list(...) to indicate what levels of the continuous moderator you want to test. Traditionally, we use -1SD, mean, and +1SD of the variable.
(Note that because I did not remove missing data earlier, I have to handle it here with na.rm=TRUE; mean() and sd() return NA when missing values are present. Our regression analysis also excludes missing cases, so it is actually better to remove missingness upfront.)
security_low<-mean(data$security_mean, na.rm=T)-sd(data$security_mean, na.rm=T)
security_mid<-mean(data$security_mean, na.rm=T)
security_high<-mean(data$security_mean, na.rm=T)+sd(data$security_mean, na.rm=T)
security_lmh<-c(security_low, security_mid, security_high)
contmod<-emmeans(mod1, ~ security_mean*V238, at=list(security_mean=security_lmh))
contrast(contmod, "pairwise", by="security_mean")
## security_mean = -1.11:
## contrast estimate SE df t.ratio p.value
## 1 - 2 -0.88810 0.2667 964 -3.330 0.0080
## 1 - 3 -0.78689 0.2637 964 -2.984 0.0243
## 1 - 4 -1.00057 0.2705 964 -3.699 0.0021
## 1 - 5 -0.55585 0.2996 964 -1.856 0.3423
## 2 - 3 0.10121 0.1211 964 0.836 0.9194
## 2 - 4 -0.11247 0.1353 964 -0.831 0.9209
## 2 - 5 0.33225 0.1867 964 1.780 0.3861
## 3 - 4 -0.21368 0.1292 964 -1.654 0.4632
## 3 - 5 0.23104 0.1823 964 1.267 0.7115
## 4 - 5 0.44472 0.1921 964 2.315 0.1408
##
## security_mean = 0.00:
## contrast estimate SE df t.ratio p.value
## 1 - 2 -0.37519 0.2701 964 -1.389 0.6347
## 1 - 3 -0.37033 0.2682 964 -1.381 0.6403
## 1 - 4 -0.43436 0.2704 964 -1.606 0.4938
## 1 - 5 -0.41452 0.2856 964 -1.452 0.5943
## 2 - 3 0.00486 0.0849 964 0.057 1.0000
## 2 - 4 -0.05917 0.0916 964 -0.646 0.9674
## 2 - 5 -0.03933 0.1296 964 -0.303 0.9982
## 3 - 4 -0.06403 0.0861 964 -0.744 0.9461
## 3 - 5 -0.04419 0.1257 964 -0.351 0.9967
## 4 - 5 0.01984 0.1304 964 0.152 0.9999
##
## security_mean = 1.11:
## contrast estimate SE df t.ratio p.value
## 1 - 2 0.13772 0.5289 964 0.260 0.9990
## 1 - 3 0.04623 0.5261 964 0.088 1.0000
## 1 - 4 0.13185 0.5282 964 0.250 0.9991
## 1 - 5 -0.27319 0.5399 964 -0.506 0.9868
## 2 - 3 -0.09148 0.1220 964 -0.750 0.9446
## 2 - 4 -0.00587 0.1307 964 -0.045 1.0000
## 2 - 5 -0.41090 0.1721 964 -2.388 0.1193
## 3 - 4 0.08561 0.1188 964 0.720 0.9518
## 3 - 5 -0.31942 0.1632 964 -1.957 0.2883
## 4 - 5 -0.40503 0.1698 964 -2.385 0.1201
##
## P value adjustment: tukey method for comparing a family of 5 estimates
To visualize,
p<- emmip(mod1, security_mean~V238, at=list(security_mean=security_lmh),
xlab = "Perceived Social Status",
ylab = "Excitement Value",
tlab = "Security Value",
style="factor"
,CI=TRUE) #you can add CI to your plot as well.
p
emmip() generates a line graph. To show a bar graph (since our x-axis is categorical), use ggplot().
data$security_group[data$security_mean > security_high]<- "High"
data$security_group[data$security_mean < security_high & data$security_mean>= security_low]<-"Mid"
data$security_group[data$security_mean< security_low]<-"Low"
count(data, security_group)
## security_group n
## 1 High 200
## 2 Low 268
## 3 Mid 655
## 4 <NA> 77
filter(data, !is.na(data$security_group)) %>% #filter NA so it doesn't appear on graph
ggplot(aes(x=V238, y=excite_mean, fill=security_group))+
geom_bar(stat="identity",position="dodge")
## Warning: Removed 149 rows containing missing values (geom_bar).
#stat="identity" tells geom_bar to use the supplied y values (excite_mean) rather than simply counting the number of participants in each x category.
#position="dodge" places the bars side by side instead of stacking them.
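One caveat: with stat="identity" and many participants per category, geom_bar draws one bar per row, overlaid at the same position, so the visible bar height is not a group mean. If what you want is the mean excitement value per group, stat_summary() is a cleaner option; a sketch (the fun argument requires ggplot2 >= 3.3; older versions use fun.y):
filter(data, !is.na(data$security_group)) %>%
  ggplot(aes(x=factor(V238), y=excite_mean, fill=security_group))+
  stat_summary(fun=mean, geom="bar", position="dodge")+ #bar height = group mean
  labs(x="Perceived Social Status", y="Mean Excitement Value (centered)", fill="Security Value")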
In this example, we are predicting security values (i.e., perceived importance of behaving properly/having security in life) from secular values. The example also tests whether emancipative values moderate the relationship.
mod2<-lm(security_mean~SACSECVAL*RESEMAVAL, data)
summary(mod2)
##
## Call:
## lm(formula = security_mean ~ SACSECVAL * RESEMAVAL, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7002 -0.9457 -0.1963 0.7597 3.9535
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01938 0.03370 -0.575 0.565436
## SACSECVAL 0.81578 0.24049 3.392 0.000718 ***
## RESEMAVAL 0.21238 0.24072 0.882 0.377828
## SACSECVAL:RESEMAVAL 3.61411 1.67772 2.154 0.031444 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.094 on 1093 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.01852, Adjusted R-squared: 0.01583
## F-statistic: 6.874 on 3 and 1093 DF, p-value: 0.0001376
We get a significant interaction! Let's move on to the simple slope analysis. Since we are dealing with a continuous moderator, we need to pick the levels at which to probe it. Again, traditionally, we use -1SD, the mean, and +1SD.
eman_low<-mean(data$RESEMAVAL, na.rm=T)-sd(data$RESEMAVAL, na.rm=T)
eman_mid<-mean(data$RESEMAVAL, na.rm=T)
eman_high<-mean(data$RESEMAVAL, na.rm=T)+sd(data$RESEMAVAL, na.rm=T)
(As noted earlier, because missing data were not removed upfront, na.rm=TRUE is needed here as well.)
Since we are doing a simple slope analysis (not simple effects), we use emtrends() to obtain a separate slope at each level of emancipative values we defined (i.e., -1SD, mean, +1SD).
emtrends(mod2, ~RESEMAVAL, var="SACSECVAL", at = list(RESEMAVAL=c(eman_low, eman_mid, eman_high)))
## RESEMAVAL SACSECVAL.trend SE df lower.CL upper.CL
## -0.141 0.306 0.349 1093 -0.378 0.991
## 0.000 0.816 0.240 1093 0.344 1.288
## 0.141 1.325 0.325 1093 0.687 1.963
##
## Confidence level used: 0.95
People with high (+1SD) emancipative values show the strongest relationship between secular values and security values. In other words, for people who really desire freedom, secular values strongly and negatively predict their perceived importance of security (remember that the security scores are reverse-coded, such that a high score reflects placing low value on security).
To visualize the results, we use similar code to the continuous x categorical situation, with the format emmip(model_name, moderator~predictor, at = list(...)).
emmip(mod2, RESEMAVAL~SACSECVAL
, at=list(SACSECVAL=seq(-0.3, 0.6, by=.1),RESEMAVAL= c(eman_low, eman_mid, eman_high)) #SACSECVAL is scored 0-1 (and centered here), so the range should match the data rather than 1-6
, xlab="Secular values"
, ylab="Security values"
, tlab="Emancipative values"
,CI=T #optional
)
Let's level up and do a three-way interaction.
data$sex<-factor(data$V240, #optional - for plots only
levels=c(1,2)
,labels=c("Male","Female"))
mod3<-lm(security_mean~SACSECVAL*RESEMAVAL*sex, data)
summary(mod3)
##
## Call:
## lm(formula = security_mean ~ SACSECVAL * RESEMAVAL * sex, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6983 -0.9335 -0.2216 0.7402 3.8965
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.04282 0.04879 -0.878 0.38036
## SACSECVAL 0.86663 0.33100 2.618 0.00896 **
## RESEMAVAL 0.26404 0.34570 0.764 0.44517
## sexFemale 0.04785 0.06895 0.694 0.48783
## SACSECVAL:RESEMAVAL 1.02362 2.27666 0.450 0.65308
## SACSECVAL:sexFemale -0.23353 0.49121 -0.475 0.63458
## RESEMAVAL:sexFemale -0.17572 0.49151 -0.358 0.72078
## SACSECVAL:RESEMAVAL:sexFemale 6.14254 3.42832 1.792 0.07346 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.094 on 1089 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.02263, Adjusted R-squared: 0.01635
## F-statistic: 3.602 on 7 and 1089 DF, p-value: 0.0007641
Let's visualize first. The formula is formatted as emmip(model_name, legend/trace variable ~ x-axis variable | facet variable).
emmip(mod3, RESEMAVAL~SACSECVAL | sex
, at = list(SACSECVAL=seq(-0.3, 0.6, by=.1),RESEMAVAL= c(eman_low, eman_mid, eman_high)) #again, use a range that matches the centered SACSECVAL
, xlab="Secular values"
, ylab="Security values"
, tlab="Emancipative values"
)
To examine simple slopes/contrasts, we request simple slopes for each two-way relationship while holding the third variable "constant". For continuous variables, we hold them "constant" by setting them at -1SD, the mean, and +1SD (examples below).
emtrends(mod3, pairwise~sex|RESEMAVAL, var="SACSECVAL"
, at=list(RESEMAVAL= c(eman_low, eman_mid, eman_high))
)
## $emtrends
## RESEMAVAL = -0.141:
## sex SACSECVAL.trend SE df lower.CL upper.CL
## Male 0.722 0.440 1089 -0.1418 1.586
## Female -0.377 0.573 1089 -1.5011 0.747
##
## RESEMAVAL = 0.000:
## sex SACSECVAL.trend SE df lower.CL upper.CL
## Male 0.867 0.331 1089 0.2172 1.516
## Female 0.633 0.363 1089 -0.0790 1.345
##
## RESEMAVAL = 0.141:
## sex SACSECVAL.trend SE df lower.CL upper.CL
## Male 1.011 0.481 1089 0.0675 1.954
## Female 1.643 0.443 1089 0.7736 2.513
##
## Confidence level used: 0.95
##
## $contrasts
## RESEMAVAL = -0.141:
## contrast estimate SE df t.ratio p.value
## Male - Female 1.099 0.723 1089 1.522 0.1284
##
## RESEMAVAL = 0.000:
## contrast estimate SE df t.ratio p.value
## Male - Female 0.234 0.491 1089 0.475 0.6346
##
## RESEMAVAL = 0.141:
## contrast estimate SE df t.ratio p.value
## Male - Female -0.632 0.654 1089 -0.967 0.3337
To look at the relationship between the other continuous variable and the outcome variable by sex, simply swap the two continuous variables' roles:
secu_low<-mean(data$SACSECVAL, na.rm=T)-sd(data$SACSECVAL, na.rm=T)
secu_mid<-mean(data$SACSECVAL, na.rm=T)
secu_high<-mean(data$SACSECVAL, na.rm=T)+sd(data$SACSECVAL, na.rm=T)
emtrends(mod3, pairwise~sex|SACSECVAL , var="RESEMAVAL"
, at=list(SACSECVAL= c(secu_low, secu_mid, secu_high))
)
## $emtrends
## SACSECVAL = -0.145:
## sex RESEMAVAL.trend SE df lower.CL upper.CL
## Male 0.1154 0.510 1089 -0.886 1.1165
## Female -0.9519 0.521 1089 -1.974 0.0702
##
## SACSECVAL = 0.000:
## sex RESEMAVAL.trend SE df lower.CL upper.CL
## Male 0.2640 0.346 1089 -0.414 0.9424
## Female 0.0883 0.349 1089 -0.597 0.7739
##
## SACSECVAL = 0.145:
## sex RESEMAVAL.trend SE df lower.CL upper.CL
## Male 0.4126 0.444 1089 -0.459 1.2838
## Female 1.1285 0.500 1089 0.148 2.1089
##
## Confidence level used: 0.95
##
## $contrasts
## SACSECVAL = -0.145:
## contrast estimate SE df t.ratio p.value
## Male - Female 1.067 0.729 1089 1.464 0.1435
##
## SACSECVAL = 0.000:
## contrast estimate SE df t.ratio p.value
## Male - Female 0.176 0.492 1089 0.358 0.7208
##
## SACSECVAL = 0.145:
## contrast estimate SE df t.ratio p.value
## Male - Female -0.716 0.668 1089 -1.071 0.2844
To hold the categorical variable "constant", we do not use the pairwise argument.
emtrends(mod3, ~RESEMAVAL|sex, var="SACSECVAL", at = list(RESEMAVAL=c(eman_low, eman_mid, eman_high)))
## sex = Male:
## RESEMAVAL SACSECVAL.trend SE df lower.CL upper.CL
## -0.141 0.722 0.440 1089 -0.1418 1.586
## 0.000 0.867 0.331 1089 0.2172 1.516
## 0.141 1.011 0.481 1089 0.0675 1.954
##
## sex = Female:
## RESEMAVAL SACSECVAL.trend SE df lower.CL upper.CL
## -0.141 -0.377 0.573 1089 -1.5011 0.747
## 0.000 0.633 0.363 1089 -0.0790 1.345
## 0.141 1.643 0.443 1089 0.7736 2.513
##
## Confidence level used: 0.95
Running mediation models in R is very convenient with the mediate() function from the psych package. In this example, we predict security values from emancipative values (RESEMAVAL), with secular values (SACSECVAL) as the mediator; in mediate(), the mediator is the variable wrapped in parentheses.
medmod<-mediate(security_mean~RESEMAVAL+(SACSECVAL), data=data)
medmod
##
## Mediation/Moderation Analysis
## Call: mediate(y = security_mean ~ RESEMAVAL + (SACSECVAL), data = data)
##
## The DV (Y) was security_mean . The IV (X) was RESEMAVAL . The mediating variable(s) = SACSECVAL .
##
## Total effect(c) of RESEMAVAL on security_mean = 0.44 S.E. = 0.23 t = 1.92 df= 1198 with p = 0.055
## Direct effect (c') of RESEMAVAL on security_mean removing SACSECVAL = 0.25 S.E. = 0.23 t = 1.09 df= 1197 with p = 0.28
## Indirect effect (ab) of RESEMAVAL on security_mean through SACSECVAL = 0.18
## Mean bootstrapped indirect effect = 0.18 with standard error = 0.06 Lower CI = 0.07 Upper CI = 0.31
## R = 0.12 R2 = 0.01 F = 8.65 on 2 and 1197 DF p-value: 0.000187
##
## To see the longer output, specify short = FALSE in the print statement or ask for the summary
You can add more mediators to the model if you need them.
medmod2<-mediate(security_mean~V240+(RESEMAVAL)+(SACSECVAL), data=data)
medmod2
##
## Mediation/Moderation Analysis
## Call: mediate(y = security_mean ~ V240 + (RESEMAVAL) + (SACSECVAL),
## data = data)
##
## The DV (Y) was security_mean . The IV (X) was V240 . The mediating variable(s) = RESEMAVAL SACSECVAL .
##
## Total effect(c) of V240 on security_mean = 0.05 S.E. = 0.06 t = 0.8 df= 1198 with p = 0.42
## Direct effect (c') of V240 on security_mean removing RESEMAVAL SACSECVAL = 0.05 S.E. = 0.06 t = 0.74 df= 1196 with p = 0.46
## Indirect effect (ab) of V240 on security_mean through RESEMAVAL SACSECVAL = 0
## Mean bootstrapped indirect effect = 0 with standard error = 0.02 Lower CI = -0.03 Upper CI = 0.04
## R = 0.12 R2 = 0.01 F = 5.95 on 3 and 1196 DF p-value: 0.000502
##
## To see the longer output, specify short = FALSE in the print statement or ask for the summary
You can even add a moderator to create a mediated moderation or moderated mediation model (not highly recommended, as we discussed in PSYC513 this past semester…).
medmod3<-mediate(security_mean~V240*V238+(RESEMAVAL)+(SACSECVAL), data=data)
medmod3
##
## Mediation/Moderation Analysis
## Call: mediate(y = security_mean ~ V240 * V238 + (RESEMAVAL) + (SACSECVAL),
## data = data)
##
## The DV (Y) was security_mean . The IV (X) was V240 V238 V240*V238 . The mediating variable(s) = RESEMAVAL SACSECVAL .
##
## Total effect(c) of V240 on security_mean = 0.07 S.E. = 0.06 t = 1.09 df= 1196 with p = 0.28
## Direct effect (c') of V240 on security_mean removing RESEMAVAL SACSECVAL = 0.06 S.E. = 0.07 t = 0.96 df= 1194 with p = 0.34
## Indirect effect (ab) of V240 on security_mean through RESEMAVAL SACSECVAL = 0.01
## Mean bootstrapped indirect effect = 0.01 with standard error = 0.02 Lower CI = -0.02 Upper CI = 0.04
##
## Total effect(c) of V238 on security_mean = 0.08 S.E. = 0.03 t = 2.54 df= 1196 with p = 0.011
## Direct effect (c') of V238 on security_mean removing RESEMAVAL SACSECVAL = 0.08 S.E. = 0.03 t = 2.35 df= 1194 with p = 0.019
## Indirect effect (ab) of V238 on security_mean through RESEMAVAL SACSECVAL = 0.01
## Mean bootstrapped indirect effect = 0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0.02
##
## Total effect(c) of V240*V238 on security_mean = -0.05 S.E. = 0.06 t = -0.74 df= 1196 with p = 0.46
## Direct effect (c') of V240*V238 on security_mean removing RESEMAVAL SACSECVAL = -0.01 S.E. = 0.06 t = -0.18 df= 1194 with p = 0.86
## Indirect effect (ab) of V240*V238 on security_mean through RESEMAVAL SACSECVAL = -0.04
## Mean bootstrapped indirect effect = -0.04 with standard error = 0.01 Lower CI = -0.07 Upper CI = -0.01
## R = 0.14 R2 = 0.02 F = 4.7 on 5 and 1194 DF p-value: 0.000296
##
## To see the longer output, specify short = FALSE in the print statement or ask for the summary
If you have more than one outcome variable, you may choose to run a path model or a structural equation model (SEM) using the lavaan package. There are two steps involved: (1) stating your model and (2) running the model.
For a path model, there are no latent variables (i.e., no measurement model). Remember that all formulas follow the format y~x1+x2+...+xn. In this example, our two outcome variables are the two value scores (secular and emancipative), and our predictors are the three Schwartz subscales. Emancipative values may change across age, so let's control for age (V242). This will also give us an overidentified model (so we have fit indices to examine).
pathmodel<- '
RESEMAVAL~ V242+security_mean+caretake_mean+excite_mean
SACSECVAL~security_mean+caretake_mean+excite_mean'
fitpath<-sem(pathmodel, data= data)
summary(fitpath, fit.measures=TRUE, standardized=TRUE)
## lavaan 0.6-6 ended normally after 54 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 10
##
## Used Total
## Number of observations 1014 1200
##
## Model Test User Model:
##
## Test statistic 12.252
## Degrees of freedom 1
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 153.336
## Degrees of freedom 9
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.922
## Tucker-Lewis Index (TLI) 0.298
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) 1178.619
## Loglikelihood unrestricted model (H1) 1184.745
##
## Akaike (AIC) -2337.237
## Bayesian (BIC) -2288.021
## Sample-size adjusted Bayesian (BIC) -2319.781
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.105
## 90 Percent confidence interval - lower 0.058
## 90 Percent confidence interval - upper 0.162
## P-value RMSEA <= 0.05 0.028
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.022
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## RESEMAVAL ~
## V242 -0.001 0.000 -2.786 0.005 -0.001 -0.093
## security_mean 0.000 0.005 0.067 0.947 0.000 0.002
## caretake_mean 0.008 0.005 1.615 0.106 0.008 0.060
## excite_mean -0.005 0.004 -1.126 0.260 -0.005 -0.039
## SACSECVAL ~
## security_mean -0.002 0.005 -0.432 0.666 -0.002 -0.016
## caretake_mean 0.040 0.005 8.226 0.000 0.040 0.288
## excite_mean -0.007 0.004 -1.684 0.092 -0.007 -0.053
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .RESEMAVAL ~~
## .SACSECVAL 0.004 0.001 5.892 0.000 0.004 0.188
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .RESEMAVAL 0.019 0.001 22.517 0.000 0.019 0.982
## .SACSECVAL 0.018 0.001 22.517 0.000 0.018 0.922
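If you prefer the path estimates as a plain data frame (e.g., to feed into one of the table functions above), lavaan's parameterEstimates() returns them along with confidence intervals; a sketch:
est<- parameterEstimates(fitpath, standardized=TRUE)
subset(est, op=="~") #keep only the regression paths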
Alternatively, you may use a measurement model to define latent variables. For the purpose of this example, we assume that the Schwartz subscales form a general world values factor, and that the secular and emancipative values form a materialism factor. Then, we predict materialism from general world values.
semmodel<- '
GeneralWorldValues =~ security_mean+caretake_mean+excite_mean
Materialism =~ RESEMAVAL+SACSECVAL
Materialism~GeneralWorldValues
' #note: latent variable names cannot contain spaces in lavaan syntax
fitsem<-sem(semmodel, data= data)
summary(fitsem, fit.measures=TRUE, standardized=TRUE)
## lavaan 0.6-6 ended normally after 82 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 11
##
## Used Total
## Number of observations 1014 1200
##
## Model Test User Model:
##
## Test statistic 68.016
## Degrees of freedom 4
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 530.176
## Degrees of freedom 10
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.877
## Tucker-Lewis Index (TLI) 0.692
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3181.570
## Loglikelihood unrestricted model (H1) -3147.562
##
## Akaike (AIC) 6385.140
## Bayesian (BIC) 6439.278
## Sample-size adjusted Bayesian (BIC) 6404.341
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.126
## 90 Percent confidence interval - lower 0.100
## 90 Percent confidence interval - upper 0.153
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.054
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## GeneralWorldValues =~
## security_mean 1.000 0.876 0.801
## caretake_mean 0.743 0.090 8.264 0.000 0.651 0.641
## excite_mean 0.389 0.056 6.962 0.000 0.341 0.310
## Materialism =~
## RESEMAVAL 1.000 0.030 0.218
## SACSECVAL 4.537 3.455 1.313 0.189 0.138 0.976
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Materialism ~
## GeneralWrldVls 0.008 0.006 1.277 0.202 0.217 0.217
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .security_mean 0.429 0.091 4.721 0.000 0.429 0.359
## .caretake_mean 0.610 0.056 10.840 0.000 0.610 0.590
## .excite_mean 1.093 0.051 21.454 0.000 1.093 0.904
## .RESEMAVAL 0.018 0.001 17.203 0.000 0.018 0.952
## .SACSECVAL 0.001 0.014 0.068 0.946 0.001 0.048
## GeneralWrldVls 0.768 0.102 7.537 0.000 1.000 1.000
## .Materialism 0.001 0.001 1.304 0.192 0.953 0.953
Running a logistic regression is similar to running a linear regression, except that you use the glm() function instead of lm(). Suppose we are predicting whether people would mind having neighbors of another religion (V41), where 1 = would not want neighbors of a different religion and 2 = would not mind. We predict this from sex (V240) and secular values (SACSECVAL).
data$religneighbor[data$V41==1]<-1 #not want
data$religneighbor[data$V41==2]<-0 #ok
logimod<-glm(religneighbor~factor(V240)+SACSECVAL, data, family = "binomial")
summary(logimod)
##
## Call:
## glm(formula = religneighbor ~ factor(V240) + SACSECVAL, family = "binomial",
## data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3391 -1.0690 -0.9199 1.2569 1.6927
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.35356 0.08475 -4.172 3.02e-05 ***
## factor(V240)2 0.16193 0.11939 1.356 0.175
## SACSECVAL -1.83507 0.41908 -4.379 1.19e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1594.3 on 1164 degrees of freedom
## Residual deviance: 1572.5 on 1162 degrees of freedom
## (35 observations deleted due to missingness)
## AIC: 1578.5
##
## Number of Fisher Scoring iterations: 4
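Because the coefficients of a logistic regression are on the log-odds scale, you will often want to report odds ratios instead; a short sketch (exponentiate by hand, or let tbl_regression() do it with exponentiate=TRUE):
exp(coef(logimod)) #odds ratios
tbl_regression(logimod, exponentiate=TRUE) #the table now reports odds ratios instead of log-odds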