MMM WT 2023/24: Exercise 2 and 3
Set-up
Load the packages dplyr and psych (we use them later) and read in the data.
library(dplyr) # for data wrangling
library(psych) # for correlation matrices and factor analysis
data <- openxlsx::read.xlsx("MMM_influencer_data.xlsx")
Please also make sure that you’ve installed the packages car and GGally, since we will need individual functions from them later.
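One way to check up front whether car and GGally are available is sketched below. It uses only base R; the names needed, is_installed, and missing_pkgs are placeholders chosen for this illustration, and the install.packages() call is left commented out so that nothing is installed unintentionally.

```r
# Packages whose functions are called later via car:: and GGally::
needed <- c("car", "GGally")

# requireNamespace() returns TRUE if the package's namespace can be loaded
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install
} else {
  message("All required packages are installed.")
}
```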
Factor Analysis
Correlations
Extract the relevant variables into a new data frame and inspect the data with the head()-function (head() displays only the first few rows of the data frame but all columns).
EFA_Corr_data <- data %>%
  select(., SC02_01:PL01_04, PI01_01:PI01_06)
head(EFA_Corr_data)
Remove PI01_03 since it was an attention check question. All participants had to indicate 7 here.
EFA_Corr_data <- EFA_Corr_data %>%
  select(-PI01_03)
Extract a correlation matrix by running psych’s corr.test-function and store the result.
Cor_matrix <- corr.test(EFA_Corr_data, use = "pairwise")
The object Cor_matrix has multiple elements. Call the correlation matrix.
Cor_matrix$r
Plot the correlation matrix
cor.plot(Cor_matrix$r)
Show an interpretation
# We find one clear-cut component "SC".
# The other three constructs (PQ, PL, and PI) are less clear-cut, with some high correlations between variables from different constructs.
Identify the number of factors
Check the Eigenvalues and plot the result
eigen(cor(EFA_Corr_data))$values
 [1] 10.47339722  3.34080929  1.40173541  0.76343556  0.52606090  0.50866373
 [7]  0.39407254  0.34323587  0.30823287  0.27065357  0.26582184  0.23724117
[13]  0.21171803  0.18780047  0.17747215  0.16053528  0.14995846  0.11344859
[19]  0.09708119  0.06862588
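A common rule of thumb, the Kaiser criterion, retains as many factors as there are Eigenvalues above 1. That this is the rule intended by the exercise is an assumption. The sketch below applies it to the values printed above, copied by hand into a vector called ev (a name chosen for this illustration), and also derives the share of total variance per component.

```r
# Eigenvalues copied from the output above
ev <- c(10.47339722, 3.34080929, 1.40173541, 0.76343556, 0.52606090,
        0.50866373, 0.39407254, 0.34323587, 0.30823287, 0.27065357,
        0.26582184, 0.23724117, 0.21171803, 0.18780047, 0.17747215,
        0.16053528, 0.14995846, 0.11344859, 0.09708119, 0.06862588)

# Kaiser criterion: count the Eigenvalues greater than 1
n_factors_kaiser <- sum(ev > 1)
n_factors_kaiser

# The Eigenvalues of a correlation matrix sum to the number of variables
# (20 here), so dividing by the sum gives the variance share per component
prop_var <- ev / sum(ev)
round(cumsum(prop_var)[1:4], 3)
```

For these values, n_factors_kaiser is 3.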
plot(eigen(cor(EFA_Corr_data))$values)
Show an interpretation
# The Eigenvalues point to three factors (three values are greater than 1), although we assume four constructs.
Factor analysis with 4 factors
Run a factor analysis
EFA_psychFA <- psych::fa(EFA_Corr_data, nfactors = 4, rotate = "varimax", SMC = TRUE, fm = "ml")
Communalities
All values should be greater than 0.5 (the factors should account for, i.e. explain, more than 50% of each variable’s variance).
EFA_psychFA$communality
  SC02_01   SC02_02   SC02_03   SC02_04   SC02_05   SC02_06   SC02_07   PQ01_01
0.7226784 0.7487418 0.7452686 0.5179542 0.5498860 0.5847570 0.8014818 0.8170496
  PQ01_02   PQ01_03   PQ01_04   PL01_01   PL01_06   PL01_07   PL01_04   PI01_01
0.7536231 0.7428789 0.8392795 0.8724670 0.9494724 0.6910174 0.7408838 0.7124327
  PI01_02   PI01_05   PI01_04   PI01_06
0.7188199 0.7888735 0.7951454 0.9111116
plot(EFA_psychFA$communality,
     ylim = c(0,1))
abline(h = 0.5, col = "darkgreen")
Show an interpretation
# All communalities are above 0.5 --> looks good!
Loadings
EFA_psychFA$loadings
Loadings:
        ML3   ML2   ML4   ML1
SC02_01       0.816 0.155 0.150
SC02_02 0.164 0.813 0.204 0.137
SC02_03 0.108 0.824 0.190 0.136
SC02_04 0.155 0.688 0.136
SC02_05 0.176 0.720
SC02_06 0.150 0.749
SC02_07 0.128 0.865 0.167
PQ01_01 0.368 0.184 0.793 0.136
PQ01_02 0.297 0.162 0.774 0.200
PQ01_03 0.309 0.120 0.782 0.147
PQ01_04 0.310 0.236 0.814 0.158
PL01_01 0.508 0.175 0.502 0.576
PL01_06 0.470 0.185 0.481 0.680
PL01_07 0.460 0.245 0.339 0.551
PL01_04 0.609 0.279 0.337 0.423
PI01_01 0.774 0.223 0.183 0.173
PI01_02 0.781 0.133 0.277 0.124
PI01_05 0.779 0.168 0.354 0.168
PI01_04 0.810 0.177 0.314
PI01_06 0.884 0.201 0.260 0.147

                 ML3   ML2   ML4   ML1
SS loadings    4.873 4.809 3.765 1.557
Proportion Var 0.244 0.240 0.188 0.078
Cumulative Var 0.244 0.484 0.672 0.750
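With an orthogonal rotation such as varimax, a variable's communality equals the sum of its squared loadings across the factors, and "Proportion Var" equals the SS loadings divided by the number of variables. The sketch below checks both against the printed output; the loadings for two example rows are copied by hand, and the object names L, h2, and ss_loadings are chosen for this illustration. Small deviations from the reported communalities come from the three-decimal rounding of the printed loadings.

```r
# Varimax-rotated loadings copied from the 4-factor output above (two rows)
L <- rbind(
  PL01_06 = c(ML3 = 0.470, ML2 = 0.185, ML4 = 0.481, ML1 = 0.680),
  PI01_06 = c(ML3 = 0.884, ML2 = 0.201, ML4 = 0.260, ML1 = 0.147)
)

# Communality = row-wise sum of squared loadings
h2 <- rowSums(L^2)
round(h2, 3)

# "Proportion Var" = SS loadings / number of variables (20 in this analysis)
ss_loadings <- c(ML3 = 4.873, ML2 = 4.809, ML4 = 3.765, ML1 = 1.557)
prop_var <- ss_loadings / 20
prop_var
```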
Show an interpretation
# The factor loadings indicate:
## ML3 corresponds to purchase intention (PI... variables), but also a product liking variable (PL01_04) is associated with this factor.
## ML2 corresponds to self-influencer connection (SC... variables)
## ML4 corresponds to product quality (PQ... variables)
## ML1's factor loadings are highest for the product liking variables (PL... variables)
## However, the PL-variables also exhibit strong relations to other factors (ML3 and ML4)
Factor analysis with 3 factors
Run a factor analysis
EFA_psychFA_3factors <- psych::fa(EFA_Corr_data, nfactors = 3, rotate = "varimax", SMC = TRUE, fm = "ml")
Communalities
EFA_psychFA_3factors$communality
  SC02_01   SC02_02   SC02_03   SC02_04   SC02_05   SC02_06   SC02_07   PQ01_01
0.7214971 0.7499084 0.7447587 0.5135368 0.5457729 0.5817134 0.8012191 0.7833764
  PQ01_02   PQ01_03   PQ01_04   PL01_01   PL01_06   PL01_07   PL01_04   PI01_01
0.7615636 0.7359412 0.8056511 0.7530281 0.7352660 0.5696186 0.7025687 0.7147376
  PI01_02   PI01_05   PI01_04   PI01_06
0.7141982 0.7852945 0.7773128 0.8976801
plot(EFA_psychFA_3factors$communality,
     ylim = c(0,1))
abline(h = 0.5, col = "darkgreen")
Show an interpretation
# All communalities are above 0.5 --> looks good!
Loadings
EFA_psychFA_3factors$loadings
Loadings:
        ML1   ML2   ML3
SC02_01 0.118 0.819 0.191
SC02_02 0.178 0.814 0.236
SC02_03 0.125 0.826 0.218
SC02_04 0.159 0.686 0.135
SC02_05 0.178 0.717
SC02_06 0.152 0.747
SC02_07 0.137 0.865 0.185
PQ01_01 0.369 0.174 0.785
PQ01_02 0.301 0.155 0.804
PQ01_03 0.306 0.110 0.794
PQ01_04 0.317 0.228 0.808
PL01_01 0.575 0.194 0.620
PL01_06 0.553 0.212 0.620
PL01_07 0.528 0.262 0.471
PL01_04 0.658 0.286 0.434
PI01_01 0.788 0.220 0.214
PI01_02 0.785 0.126 0.288
PI01_05 0.788 0.162 0.372
PI01_04 0.806 0.168 0.314
PI01_06 0.885 0.195 0.278

                 ML1   ML2   ML3
SS loadings    5.219 4.813 4.362
Proportion Var 0.261 0.241 0.218
Cumulative Var 0.261 0.502 0.720
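Cross-loadings can also be flagged programmatically. The sketch below uses a few rows copied by hand from the 3-factor output above and counts, per item, the factors with a loading at or above a cutoff. The cutoff of 0.4 is an assumption made for this illustration and is not stated in the exercise; the object names are likewise placeholders.

```r
# Loadings copied from the 3-factor output above (selected rows)
L3 <- rbind(
  PL01_01 = c(ML1 = 0.575, ML2 = 0.194, ML3 = 0.620),
  PL01_06 = c(ML1 = 0.553, ML2 = 0.212, ML3 = 0.620),
  PL01_07 = c(ML1 = 0.528, ML2 = 0.262, ML3 = 0.471),
  PL01_04 = c(ML1 = 0.658, ML2 = 0.286, ML3 = 0.434),
  PI01_06 = c(ML1 = 0.885, ML2 = 0.195, ML3 = 0.278),
  PQ01_04 = c(ML1 = 0.317, ML2 = 0.228, ML3 = 0.808)
)

cutoff <- 0.4  # assumed threshold for a salient loading (not from the exercise)

# Number of factors on which each item loads at or above the cutoff
n_salient <- rowSums(abs(L3) >= cutoff)

# Items with salient loadings on more than one factor
cross_loading_items <- names(n_salient)[n_salient > 1]
cross_loading_items
```

With this cutoff, the four product liking items are flagged, while PI01_06 and PQ01_04 each load on a single factor.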
Show an interpretation
# The factor loadings indicate:
## ML1 corresponds to purchase intention (PI... variables), but also two product liking variables (PL01_04 and PL01_07) are associated with this factor.
## ML2 corresponds to self-influencer connection (SC... variables)
## ML3 corresponds to product quality (PQ... variables), but also two product liking variables (PL01_01 and PL01_06) are associated with this factor.
## Again, the product liking variables exhibit strong relations to multiple factors (ML1 and ML3)
Regression analysis
Prepare the data
Compute mean scores
Independent variables: percLik_mean, SIC_mean, PQ_mean
Dependent variable: PI_mean
data <- data %>%
  rowwise() %>% # use rowwise() so that the mean is calculated for every case (row)
  mutate(percLik_mean = mean(c( # use c() to concatenate the variables
    PL01_01, PL01_04, PL01_06, PL01_07
  ))) %>%
  mutate(SIC_mean = mean(c(
    SC02_01, SC02_02, SC02_03, SC02_04, SC02_05, SC02_06, SC02_07
  ))) %>%
  mutate(PQ_mean = mean(c(
    PQ01_01, PQ01_02, PQ01_03, PQ01_04
  ))) %>%
  mutate(PI_mean = mean(c(
    PI01_01, PI01_02, PI01_04, PI01_05, PI01_06
  ))) %>%
  as.data.frame() # we need a data.frame format later on (let's make sure we have it)
Examine mean scores
percLik_mean
data %>%
  summarize(
    Mean = mean(percLik_mean, na.rm=TRUE),
    SD = sd(percLik_mean, na.rm=TRUE),
    n = n())
      Mean       SD   n
1 3.570628 1.680506 223
SIC_mean
data %>%
  summarize(
    Mean = mean(SIC_mean, na.rm=TRUE),
    SD = sd(SIC_mean, na.rm=TRUE),
    n = n())
      Mean       SD   n
1 1.936579 1.094078 223
PQ_mean
data %>%
  summarize(
    Mean = mean(PQ_mean, na.rm=TRUE),
    SD = sd(PQ_mean, na.rm=TRUE),
    n = n())
      Mean       SD   n
1 4.227578 1.399023 223
PI_mean
data %>%
  summarize(
    Mean = mean(PI_mean, na.rm=TRUE),
    SD = sd(PI_mean, na.rm=TRUE),
    n = n())
     Mean       SD   n
1 2.69417 1.666823 223
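For reference, the same kind of mean score and summary can be sketched in base R. The data frame toy and its three rows are invented for illustration and are not taken from MMM_influencer_data.xlsx; rowMeans() is a swapped-in alternative to the rowwise() pipeline used in this exercise, not the method of the exercise itself.

```r
# Hypothetical toy data: three respondents, four product quality items
toy <- data.frame(
  PQ01_01 = c(4, 5, 2),
  PQ01_02 = c(3, 5, 1),
  PQ01_03 = c(5, 4, 2),
  PQ01_04 = c(4, 6, 3)
)

# Mean score per respondent (row) across the four items
pq_items <- c("PQ01_01", "PQ01_02", "PQ01_03", "PQ01_04")
toy$PQ_mean <- rowMeans(toy[, pq_items])

# Mean, standard deviation, and number of cases of the mean score
pq_summary <- c(
  Mean = mean(toy$PQ_mean, na.rm = TRUE),
  SD   = sd(toy$PQ_mean, na.rm = TRUE),
  n    = nrow(toy)
)
pq_summary
```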
Correlations
Get the correlations (same as above) and show the correlation matrix
Cor_matrix_reg <- corr.test(data %>%
                              select(., percLik_mean, SIC_mean, PQ_mean, PI_mean),
                            use = "pairwise")$r # call $r at the end of the function to directly store the correlation matrix
Cor_matrix_reg
             percLik_mean  SIC_mean   PQ_mean   PI_mean
percLik_mean    1.0000000 0.4490395 0.7354400 0.7791595
SIC_mean        0.4490395 1.0000000 0.3882637 0.3970893
PQ_mean         0.7354400 0.3882637 1.0000000 0.6436723
PI_mean         0.7791595 0.3970893 0.6436723 1.0000000
Plot the correlation matrix (same as above)
cor.plot(Cor_matrix_reg)
Make a more comprehensive plot with GGally::ggpairs
GGally::ggpairs(data %>%
                  select(., percLik_mean, SIC_mean, PQ_mean, PI_mean))
Registered S3 method overwritten by 'GGally':
  method from
  +.gg   ggplot2
Show an interpretation
# The plot indicates moderate to high positive correlations between all variables (r = 0.39 to 0.78).
# It is reasonable to assume that the independent variables (percLik_mean + SIC_mean + PQ_mean) are associated with the dependent variable (PI_mean).
# Also: the high correlation between independent variables (esp. perceived quality and liking with r = 0.735) may point to a collinearity problem.
Regression
Run a regression analysis with lm (i.e., “linear model”).
PI_mean –> dependent variable
percLik_mean + SIC_mean + PQ_mean –> independent variables
~ –> operator for “as a function of”
Summarize the results with summary().
simpleLM_full <- lm(PI_mean~percLik_mean + SIC_mean + PQ_mean,
data)
summary(simpleLM_full)
Call:
lm(formula = PI_mean ~ percLik_mean + SIC_mean + PQ_mean, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.90618 -0.68283  0.09944  0.70991  2.48490

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.49047    0.22666  -2.164   0.0316 *
percLik_mean  0.64348    0.06319  10.182   <2e-16 ***
SIC_mean      0.07374    0.07137   1.033   0.3026
PQ_mean       0.17604    0.07360   2.392   0.0176 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.035 on 219 degrees of freedom
Multiple R-squared:  0.6198, Adjusted R-squared:  0.6146
F-statistic:   119 on 3 and 219 DF,  p-value: < 2.2e-16
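Several numbers in this output follow directly from one another. The sketch below recomputes the residual degrees of freedom, the adjusted R-squared, the F-statistic, and one t value from the printed figures. It assumes that all 223 cases entered the model, which is consistent with the 219 residual degrees of freedom; the object names are chosen for this illustration. Small deviations from the printed values come from rounding in the output.

```r
r2 <- 0.6198  # Multiple R-squared from the output above
n  <- 223     # number of cases (assumed to be complete)
p  <- 3       # number of predictors

# Residual degrees of freedom: cases minus predictors minus intercept
df_resid <- n - p - 1

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# F-statistic of the overall model
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

# t value of a coefficient = estimate / standard error (percLik_mean here)
t_percLik <- 0.64348 / 0.06319

round(c(df_resid = df_resid, adj_r2 = adj_r2,
        f_stat = f_stat, t_percLik = t_percLik), 3)
```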
Show an interpretation
# The overall model is significant with F(3, 219) = 119, p < .001.
# The individual parameter tests show that only liking (percLik_mean, t(219) = 10.182, p < .001) and quality (PQ_mean, t(219) = 2.392, p = .018) are significantly associated with purchase intention (PI); self-influencer connection (SIC_mean, t(219) = 1.033, p = .303) is not.
Inspect the VIF values
car::vif(simpleLM_full)
percLik_mean     SIC_mean      PQ_mean
    2.338347     1.264173     2.198232
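The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. For a model with numeric predictors only, the same values are the diagonal of the inverted predictor correlation matrix. The sketch below reproduces the VIF values from the three predictor correlations reported in Cor_matrix_reg, copied by hand; the object names R_x, vif_manual, and r2_j are chosen for this illustration.

```r
# Correlations among the three predictors, copied from Cor_matrix_reg
preds <- c("percLik_mean", "SIC_mean", "PQ_mean")
R_x <- matrix(
  c(1.0000000, 0.4490395, 0.7354400,
    0.4490395, 1.0000000, 0.3882637,
    0.7354400, 0.3882637, 1.0000000),
  nrow = 3, byrow = TRUE, dimnames = list(preds, preds)
)

# VIF = diagonal of the inverse of the predictor correlation matrix
vif_manual <- diag(solve(R_x))
round(vif_manual, 3)

# Share of each predictor's variance explained by the other predictors
r2_j <- 1 - 1 / vif_manual
round(r2_j, 3)
```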
Show an interpretation
# The VIF values indicate no multicollinearity problem since all values are below 10 (even below 5)