MMM WT 2023/24: Exercise 2 and 3

Author: Susanne Adler
Affiliation: Institute for Marketing, Ludwig-Maximilians-University Munich

Set-up

Load the packages dplyr and psych (we will use them later).

Please also make sure that you have installed the packages car, GGally, and openxlsx, since we will call individual functions from them later.

library(dplyr) # for data wrangling
library(psych) # for correlation matrices and factor analysis

data <- openxlsx::read.xlsx("MMM_influencer_data.xlsx") # read the survey data (requires openxlsx)

Factor Analysis

Correlations

Extract the relevant variables into a new data frame and inspect the data with the head() function (head() displays only the first few rows of the data frame, but all columns).

EFA_Corr_data <- data %>%
  select(., SC02_01:PL01_04, PI01_01:PI01_06)

head(EFA_Corr_data)

Remove PI01_03 since it was an attention check question. All participants had to indicate 7 here.

EFA_Corr_data <- EFA_Corr_data %>% 
  select(-PI01_03)
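Before dropping the attention-check item, it can be worth confirming that every respondent really selected 7. The base-R sketch below does this on a made-up data frame. `resp` and its values are hypothetical; with the real data, the same check would be applied to `data$PI01_03`.

```r
# Hypothetical toy responses; PI01_03 is the attention-check item (required answer: 7)
resp <- data.frame(
  PI01_01 = c(5, 2, 6),
  PI01_02 = c(4, 1, 7),
  PI01_03 = c(7, 7, 7)
)

# TRUE if every respondent (with a non-missing answer) selected 7
all_passed <- all(resp$PI01_03 == 7, na.rm = TRUE)

# Drop the check item so that it does not enter the factor analysis
resp_clean <- resp[, setdiff(names(resp), "PI01_03"), drop = FALSE]
```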

Compute a correlation matrix with psych's corr.test() function and store the result.

Cor_matrix <- corr.test(EFA_Corr_data, use = "pairwise")

The object Cor_matrix is a list with multiple elements (e.g., the correlations r, the sample sizes n, and the p-values p). Access the correlation matrix:

Cor_matrix$r
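`use = "pairwise"` means pairwise deletion. Each correlation is computed from all cases that have values on both variables, not only from cases that are complete on every variable. The base-R sketch below shows the difference with `cor()` on a small made-up data frame. `toy` is hypothetical and not part of the exercise data.

```r
# Hypothetical toy data with missing values
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(2, 1, 4, 3, NA, 6),
  z = c(1, 3, 2, 5, 4, 6)
)

# Pairwise deletion: each coefficient uses all rows observed on that pair
r_pair <- cor(toy, use = "pairwise.complete.obs")

# Listwise deletion: only rows complete on all variables (rows 1-4 here)
r_list <- cor(toy, use = "complete.obs")

# x-z correlation: pairwise uses rows 1-5, listwise only rows 1-4
r_pair["x", "z"]  # 0.8
r_list["x", "z"]
```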

Plot the correlation matrix

cor.plot(Cor_matrix$r)

Interpretation
# We find one clear-cut component (SC).
# The other three constructs (PQ, PL, and PI) are less clear-cut, with some high correlations between variables from different constructs.

Identify the number of factors

Check the eigenvalues and plot them (scree plot).

eigen(cor(EFA_Corr_data))$values
 [1] 10.47339722  3.34080929  1.40173541  0.76343556  0.52606090  0.50866373
 [7]  0.39407254  0.34323587  0.30823287  0.27065357  0.26582184  0.23724117
[13]  0.21171803  0.18780047  0.17747215  0.16053528  0.14995846  0.11344859
[19]  0.09708119  0.06862588
plot(eigen(cor(EFA_Corr_data))$values)
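The reading below uses the Kaiser criterion, which counts the eigenvalues of the correlation matrix that are greater than 1. The sketch re-enters the eigenvalues printed above. It also uses the fact that the eigenvalues of a correlation matrix sum to the number of variables (here 20), so each eigenvalue can be read as a share of the total variance.

```r
# Eigenvalues as printed above by eigen(cor(EFA_Corr_data))$values
ev <- c(10.47339722, 3.34080929, 1.40173541, 0.76343556, 0.52606090,
        0.50866373, 0.39407254, 0.34323587, 0.30823287, 0.27065357,
        0.26582184, 0.23724117, 0.21171803, 0.18780047, 0.17747215,
        0.16053528, 0.14995846, 0.11344859, 0.09708119, 0.06862588)

# Kaiser criterion: retain components with an eigenvalue > 1
n_kaiser <- sum(ev > 1)

# Eigenvalues of a correlation matrix sum to the number of variables (20),
# so ev / sum(ev) is the share of total variance per component
share <- ev / sum(ev)
cum_share_3 <- sum(share[1:3])

n_kaiser                # 3
round(cum_share_3, 2)   # 0.76
```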

Interpretation
# Three eigenvalues exceed 1 (Kaiser criterion), which points to three factors (although we assume four constructs).

Factor analysis with 4 factors

Run a factor analysis

EFA_psychFA <- psych::fa(EFA_Corr_data, nfactors = 4, rotate = "varimax", SMC = TRUE, fm = "ml")
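psych::fa() with fm = "ml" and rotate = "varimax" fits a maximum-likelihood factor model and then applies a varimax rotation. Base R's `stats::factanal()` fits the same type of model (ML estimation, varimax rotation by default), so it can serve as a dependency-free cross-check. The sketch below runs it on simulated data with two known factors. That data-generating setup is an assumption made purely for illustration. On the real data, estimates may differ slightly from psych::fa(), e.g. because of different starting values and optimizers.

```r
set.seed(1)
n  <- 500
f1 <- rnorm(n)  # simulated latent factor 1
f2 <- rnorm(n)  # simulated latent factor 2

# Six hypothetical items: three per factor, plus item-specific noise
X <- data.frame(
  a1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a2 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a3 = 0.8 * f1 + rnorm(n, sd = 0.6),
  b1 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b2 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b3 = 0.8 * f2 + rnorm(n, sd = 0.6)
)

# Maximum-likelihood factor analysis with varimax rotation (base R)
fit <- factanal(X, factors = 2, rotation = "varimax")

L <- unclass(fit$loadings)           # plain loading matrix (items x factors)
communality <- 1 - fit$uniquenesses  # variance share explained by the factors

# Factor on which each item loads highest
primary <- apply(abs(L), 1, which.max)
```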

Communalities

All values should be greater than 0.5, i.e., the factors should account for (explain) more than 50% of each variable's variance.

EFA_psychFA$communality
  SC02_01   SC02_02   SC02_03   SC02_04   SC02_05   SC02_06   SC02_07   PQ01_01 
0.7226784 0.7487418 0.7452686 0.5179542 0.5498860 0.5847570 0.8014818 0.8170496 
  PQ01_02   PQ01_03   PQ01_04   PL01_01   PL01_06   PL01_07   PL01_04   PI01_01 
0.7536231 0.7428789 0.8392795 0.8724670 0.9494724 0.6910174 0.7408838 0.7124327 
  PI01_02   PI01_05   PI01_04   PI01_06 
0.7188199 0.7888735 0.7951454 0.9111116 
plot(EFA_psychFA$communality,
     ylim = c(0,1))
abline(h = 0.5, col = "darkgreen")

Interpretation
# All communalities are above 0.5 (lowest: SC02_04 with 0.52) --> looks good!
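With an orthogonal rotation such as varimax, an item's communality is the sum of its squared loadings across all factors. The sketch below checks this for two items whose four loadings are all visible in the Loadings output below. The printout hides absolute loadings below 0.1, so not every row can be recomputed from it. For the full fitted object, `rowSums(unclass(EFA_psychFA$loadings)^2)` should reproduce `EFA_psychFA$communality`.

```r
# Rounded varimax loadings for two items, copied from the 4-factor Loadings output
L <- rbind(
  PQ01_01 = c(ML3 = 0.368, ML2 = 0.184, ML4 = 0.793, ML1 = 0.136),
  PI01_06 = c(ML3 = 0.884, ML2 = 0.201, ML4 = 0.260, ML1 = 0.147)
)

# Communality = sum of squared loadings across factors (orthogonal rotation)
h2 <- rowSums(L^2)
round(h2, 3)  # approx. 0.817 and 0.911, matching the communalities printed above
```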

Loadings

EFA_psychFA$loadings

Loadings:
        ML3   ML2   ML4   ML1  
SC02_01       0.816 0.155 0.150
SC02_02 0.164 0.813 0.204 0.137
SC02_03 0.108 0.824 0.190 0.136
SC02_04 0.155 0.688 0.136      
SC02_05 0.176 0.720            
SC02_06 0.150 0.749            
SC02_07 0.128 0.865 0.167      
PQ01_01 0.368 0.184 0.793 0.136
PQ01_02 0.297 0.162 0.774 0.200
PQ01_03 0.309 0.120 0.782 0.147
PQ01_04 0.310 0.236 0.814 0.158
PL01_01 0.508 0.175 0.502 0.576
PL01_06 0.470 0.185 0.481 0.680
PL01_07 0.460 0.245 0.339 0.551
PL01_04 0.609 0.279 0.337 0.423
PI01_01 0.774 0.223 0.183 0.173
PI01_02 0.781 0.133 0.277 0.124
PI01_05 0.779 0.168 0.354 0.168
PI01_04 0.810 0.177 0.314      
PI01_06 0.884 0.201 0.260 0.147

                 ML3   ML2   ML4   ML1
SS loadings    4.873 4.809 3.765 1.557
Proportion Var 0.244 0.240 0.188 0.078
Cumulative Var 0.244 0.484 0.672 0.750

Interpretation
# The factor loadings indicate:

## ML3 corresponds to purchase intention (PI... variables), but one product liking variable (PL01_04) also loads highest on this factor.

## ML2 corresponds to self-influencer connection (SC... variables).

## ML4 corresponds to product quality (PQ... variables).

## ML1's factor loadings are highest for the product liking variables (PL... variables), although ML1 explains only 7.8% of the variance.
## However, the PL variables also load substantially on other factors (ML3 and ML4).
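The summary rows under the loadings follow directly from the loadings:

- SS loadings are the column sums of the squared loadings.
- Proportion Var divides them by the number of variables (20: 7 SC, 4 PQ, 4 PL, and 5 PI items).
- Because communalities are row sums of the same squared loadings, the total of the SS loadings equals the total of the communalities.

The sketch below reproduces this from the rounded values printed above.

```r
# SS loadings of the 4-factor solution, copied from the output above
ss <- c(ML3 = 4.873, ML2 = 4.809, ML4 = 3.765, ML1 = 1.557)
p  <- 20  # number of analysed variables

prop_var <- ss / p            # "Proportion Var"
cum_var  <- cumsum(prop_var)  # "Cumulative Var"

# Communalities copied from EFA_psychFA$communality above
h2 <- c(0.7226784, 0.7487418, 0.7452686, 0.5179542, 0.5498860,
        0.5847570, 0.8014818, 0.8170496, 0.7536231, 0.7428789,
        0.8392795, 0.8724670, 0.9494724, 0.6910174, 0.7408838,
        0.7124327, 0.7188199, 0.7888735, 0.7951454, 0.9111116)

# Column totals (per factor) and row totals (per item) of squared loadings agree
c(sum_ss = sum(ss), sum_h2 = sum(h2))  # both approx. 15.00
```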

Factor analysis with 3 factors

Run a factor analysis

EFA_psychFA_3factors <- psych::fa(EFA_Corr_data, nfactors = 3, rotate = "varimax", SMC = TRUE, fm = "ml")

Communalities

EFA_psychFA_3factors$communality
  SC02_01   SC02_02   SC02_03   SC02_04   SC02_05   SC02_06   SC02_07   PQ01_01 
0.7214971 0.7499084 0.7447587 0.5135368 0.5457729 0.5817134 0.8012191 0.7833764 
  PQ01_02   PQ01_03   PQ01_04   PL01_01   PL01_06   PL01_07   PL01_04   PI01_01 
0.7615636 0.7359412 0.8056511 0.7530281 0.7352660 0.5696186 0.7025687 0.7147376 
  PI01_02   PI01_05   PI01_04   PI01_06 
0.7141982 0.7852945 0.7773128 0.8976801 
plot(EFA_psychFA_3factors$communality,
     ylim = c(0,1))
abline(h = 0.5, col = "darkgreen")

Interpretation
# All communalities are above 0.5 (lowest: SC02_04 with 0.51) --> looks good!

Loadings

EFA_psychFA_3factors$loadings

Loadings:
        ML1   ML2   ML3  
SC02_01 0.118 0.819 0.191
SC02_02 0.178 0.814 0.236
SC02_03 0.125 0.826 0.218
SC02_04 0.159 0.686 0.135
SC02_05 0.178 0.717      
SC02_06 0.152 0.747      
SC02_07 0.137 0.865 0.185
PQ01_01 0.369 0.174 0.785
PQ01_02 0.301 0.155 0.804
PQ01_03 0.306 0.110 0.794
PQ01_04 0.317 0.228 0.808
PL01_01 0.575 0.194 0.620
PL01_06 0.553 0.212 0.620
PL01_07 0.528 0.262 0.471
PL01_04 0.658 0.286 0.434
PI01_01 0.788 0.220 0.214
PI01_02 0.785 0.126 0.288
PI01_05 0.788 0.162 0.372
PI01_04 0.806 0.168 0.314
PI01_06 0.885 0.195 0.278

                 ML1   ML2   ML3
SS loadings    5.219 4.813 4.362
Proportion Var 0.261 0.241 0.218
Cumulative Var 0.261 0.502 0.720

Interpretation
# The factor loadings indicate:

## ML1 corresponds to purchase intention (PI... variables), but two product liking variables (PL01_04 and PL01_07) also load highest on this factor.

## ML2 corresponds to self-influencer connection (SC... variables).

## ML3 corresponds to product quality (PQ... variables), but two product liking variables (PL01_01 and PL01_06) also load highest on this factor.

## Again, the product liking variables load substantially on multiple factors (ML1 and ML3).
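One way to make the cross-loading pattern explicit is to flag items whose second-highest loading is still large. The sketch below uses the rounded 3-factor loadings printed above for the four PL items and, for comparison, one PQ item and one PI item. The cutoff of 0.4 is an assumed rule of thumb, not a value taken from the exercise.

```r
# Rounded loadings of the 3-factor solution, copied from the output above
L3 <- rbind(
  PL01_01 = c(ML1 = 0.575, ML2 = 0.194, ML3 = 0.620),
  PL01_06 = c(ML1 = 0.553, ML2 = 0.212, ML3 = 0.620),
  PL01_07 = c(ML1 = 0.528, ML2 = 0.262, ML3 = 0.471),
  PL01_04 = c(ML1 = 0.658, ML2 = 0.286, ML3 = 0.434),
  PQ01_01 = c(ML1 = 0.369, ML2 = 0.174, ML3 = 0.785),
  PI01_06 = c(ML1 = 0.885, ML2 = 0.195, ML3 = 0.278)
)

cutoff <- 0.4  # assumed threshold for a "substantial" secondary loading

# Factor with the highest loading per item
primary <- colnames(L3)[apply(L3, 1, which.max)]

# Second-highest loading per item
second <- apply(L3, 1, function(row) sort(row, decreasing = TRUE)[2])

cross_loading <- second > cutoff
data.frame(primary, second, cross_loading)
```

With this cutoff, all four PL items are flagged, while the PQ and PI items are not.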

Regression analysis

Prepare the data

Compute mean scores

  • Independent variables: percLik_mean, SIC_mean, PQ_mean

  • Dependent variable: PI_mean

data <- data %>% 
  rowwise() %>%  # rowwise() makes sure that the means are calculated for every case (row)
  mutate(percLik_mean = mean(c( # c() concatenates the values of the listed variables
    PL01_01, PL01_04, PL01_06, PL01_07 
    ))) %>% 
  mutate(SIC_mean = mean(c( 
    SC02_01, SC02_02, SC02_03, SC02_04, SC02_05, SC02_06, SC02_07 
    ))) %>% 
  mutate(PQ_mean = mean(c( 
    PQ01_01, PQ01_02, PQ01_03, PQ01_04 
    ))) %>% 
  mutate(PI_mean = mean(c( # PI01_03 (attention check) is excluded
    PI01_01, PI01_02, PI01_04, PI01_05, PI01_06 
    ))) %>% 
  as.data.frame() # we need a plain data.frame later on (this also drops the rowwise grouping)
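rowwise() combined with mean(c(...)) computes one mean per row. A vectorized base-R alternative is `rowMeans()` on the selected item columns, sketched below on a made-up data frame. `items` and its values are hypothetical. Like mean(), rowMeans() returns NA for a row with any missing item unless `na.rm = TRUE` is set. Whether to average over the available items only is a methodological choice.

```r
# Hypothetical toy data: two respondents, three liking items
items <- data.frame(
  PL01_01 = c(2, 5),
  PL01_04 = c(3, 6),
  PL01_06 = c(4, NA)
)
item_cols <- c("PL01_01", "PL01_04", "PL01_06")

# Vectorized: one mean per row over the selected columns (row 2 is NA)
items$percLik_mean <- rowMeans(items[, item_cols])

# Row-by-row equivalent of rowwise() + mean(c(...))
by_row <- apply(items[, item_cols], 1, mean)

# Alternative that averages over the available items only (row 2 becomes 5.5)
available_mean <- rowMeans(items[, item_cols], na.rm = TRUE)
```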

Examine mean scores

percLik_mean

data %>% 
  summarize( 
    Mean = mean(percLik_mean, na.rm=TRUE), 
    SD = sd(percLik_mean, na.rm=TRUE), 
    n = n())
      Mean       SD   n
1 3.570628 1.680506 223

SIC_mean

data %>% 
  summarize( 
    Mean = mean(SIC_mean, na.rm=TRUE), 
    SD = sd(SIC_mean, na.rm=TRUE), 
    n = n())
      Mean       SD   n
1 1.936579 1.094078 223

PQ_mean

data %>% 
  summarize( 
    Mean = mean(PQ_mean, na.rm=TRUE), 
    SD = sd(PQ_mean, na.rm=TRUE), 
    n = n())
      Mean       SD   n
1 4.227578 1.399023 223

PI_mean

data %>% 
  summarize( 
    Mean = mean(PI_mean, na.rm=TRUE), 
    SD = sd(PI_mean, na.rm=TRUE), 
    n = n())
     Mean       SD   n
1 2.69417 1.666823 223
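The four summarize() calls can also be collapsed into a single step. The base-R sketch below computes the mean, the SD, and the number of rows for every column of a small made-up data frame. `scores` is hypothetical; with the real data, one would pass the four mean-score columns instead.

```r
# Hypothetical toy data standing in for the computed mean scores
scores <- data.frame(
  percLik_mean = c(3.5, 4.0, NA, 2.5),
  PI_mean      = c(2.0, 3.0, 4.0, 1.0)
)

# Mean and SD (ignoring missing values) plus number of rows, one row per variable
summary_tbl <- t(sapply(scores, function(x) c(
  Mean = mean(x, na.rm = TRUE),
  SD   = sd(x, na.rm = TRUE),
  n    = length(x)  # like dplyr::n(): counts all rows, including missing values
)))

summary_tbl
```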

Correlations

Compute the correlations (as above) and show the correlation matrix.

Cor_matrix_reg <- corr.test(data %>% 
                              select(., percLik_mean, SIC_mean, PQ_mean, PI_mean), 
                            use = "pairwise")$r # call $r at the end to directly store only the correlation matrix

Cor_matrix_reg
             percLik_mean  SIC_mean   PQ_mean   PI_mean
percLik_mean    1.0000000 0.4490395 0.7354400 0.7791595
SIC_mean        0.4490395 1.0000000 0.3882637 0.3970893
PQ_mean         0.7354400 0.3882637 1.0000000 0.6436723
PI_mean         0.7791595 0.3970893 0.6436723 1.0000000

Plot the correlation matrix (same as above)

cor.plot(Cor_matrix_reg)

Make a more comprehensive plot with GGally::ggpairs

GGally::ggpairs(data %>% 
                  select(., percLik_mean, SIC_mean, PQ_mean, PI_mean))

Interpretation
# The plot indicates positive correlations between all variables: high among percLik_mean, PQ_mean, and PI_mean (r = 0.64 to 0.78) and moderate for SIC_mean (r = 0.39 to 0.45).
# It is therefore reasonable to assume that the independent variables (percLik_mean, SIC_mean, PQ_mean) are associated with the dependent variable (PI_mean).

# Also: the high correlation between independent variables (esp. perceived quality and liking, r = 0.735) may point to a collinearity problem.

Regression

Run a regression analysis with lm (i.e., “linear model”).

PI_mean –> dependent variable

percLik_mean + SIC_mean + PQ_mean –> independent variables

~ –> operator for “as a function of”

Summarize the results with summary().

simpleLM_full <- lm(PI_mean~percLik_mean + SIC_mean + PQ_mean, 
                    data)

summary(simpleLM_full)

Call:
lm(formula = PI_mean ~ percLik_mean + SIC_mean + PQ_mean, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.90618 -0.68283  0.09944  0.70991  2.48490 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.49047    0.22666  -2.164   0.0316 *  
percLik_mean  0.64348    0.06319  10.182   <2e-16 ***
SIC_mean      0.07374    0.07137   1.033   0.3026    
PQ_mean       0.17604    0.07360   2.392   0.0176 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.035 on 219 degrees of freedom
Multiple R-squared:  0.6198,    Adjusted R-squared:  0.6146 
F-statistic:   119 on 3 and 219 DF,  p-value: < 2.2e-16
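Interpretation
# The overall model is significant with F(3, 219) = 119, p < .001, and explains about 62% of the variance in purchase intention (R² = 0.620, adjusted R² = 0.615).

# The individual parameter tests show that only liking (percLik_mean, t(219) = 10.182, p < .001) and quality (PQ_mean, t(219) = 2.392, p = .018) are significantly associated with purchase intention (PI); self-influencer connection (SIC_mean, t(219) = 1.033, p = .303) is not.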
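The key numbers in the summary() output are linked by standard formulas. The sketch below recomputes two groups of results:

- the F statistic and the adjusted R² from R² = 0.6198, n = 223, and k = 3 predictors;
- the t values and two-sided p-values from the printed estimates and standard errors.

All inputs are copied from the rounded output above, so the results match only up to rounding.

```r
# Values from the summary(simpleLM_full) output above
n  <- 223     # observations
k  <- 3       # predictors
r2 <- 0.6198  # multiple R-squared

# Overall F-test: explained vs. unexplained variance per degree of freedom
F_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Adjusted R-squared penalizes the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Parameter tests: t = Estimate / Std. Error, two-sided p-value from t(n - k - 1)
est   <- c(Intercept = -0.49047, percLik_mean = 0.64348,
           SIC_mean = 0.07374, PQ_mean = 0.17604)
se    <- c(0.22666, 0.06319, 0.07137, 0.07360)
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = n - k - 1)
```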

Inspect the VIF values.

car::vif(simpleLM_full)
percLik_mean     SIC_mean      PQ_mean 
    2.338347     1.264173     2.198232 
Interpretation
# The VIF values indicate no multicollinearity problem, since all values are below 10 (and even below 5).
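The VIF of predictor j is VIF_j = 1 / (1 - R_j²), where R_j² is the R² from regressing predictor j on the other predictors. Equivalently, the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The sketch below applies this to the correlations printed in Cor_matrix_reg and should reproduce the car::vif() values up to rounding. This assumes the correlations and the regression are based on the same cases. That holds here: the 219 residual degrees of freedom imply that all 223 cases entered the regression.

```r
# Predictor correlations, copied from Cor_matrix_reg above
vars <- c("percLik_mean", "SIC_mean", "PQ_mean")
R <- matrix(c(1.0000000, 0.4490395, 0.7354400,
              0.4490395, 1.0000000, 0.3882637,
              0.7354400, 0.3882637, 1.0000000),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# VIFs as the diagonal of the inverse correlation matrix
vif_from_R <- diag(solve(R))

# Same result via VIF = 1 / (1 - R_j^2) for percLik_mean,
# with R_j^2 from its regression on SIC_mean and PQ_mean
r_a <- R["percLik_mean", "SIC_mean"]
r_b <- R["percLik_mean", "PQ_mean"]
r_c <- R["SIC_mean", "PQ_mean"]
r2_lik  <- (r_a^2 + r_b^2 - 2 * r_a * r_b * r_c) / (1 - r_c^2)
vif_lik <- 1 / (1 - r2_lik)

round(vif_from_R, 3)  # approx. 2.338, 1.264, 2.198 (cf. car::vif output above)
```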