Data analysis pilot study: test framing and bioinspired items
Author
Julius Fenn, Stephanie Bugler
1 Notes
2 global variables
Define your global variables (can take some time to run):
3 create raw data files
# sets the directory of location of this script as the current directory
# setwd(dirname(rstudioapi::getSourceEditorContext()$path))

### load packages
require(pacman)
p_load('tidyverse', 'jsonlite',
       'stargazer', 'DT', 'psych',
       'writexl', 'moments', 'lavaan', 'semPlot', 'mirt', 'MplusAutomation',
       'afex', 'emmeans', 'jtools')

setwd("../01_dataPreperation/outputs")

# Load dataset
dat <- read_rds(file = "questionnaire.rds")

# --- Remove duplicate Prolific IDs ---
tmp_removeIDs <- names(table(dat$PROLIFIC_PID))[table(dat$PROLIFIC_PID) >= 2]
dat <- dat[!dat$PROLIFIC_PID %in% tmp_removeIDs, ]

# Check new N
dim(dat) # should now be 582 rows
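The duplicate-ID filter above removes every participant whose ID occurs two or more times, rather than keeping the first occurrence. A minimal sketch on toy data shows the effect; the data frame `toy` and its IDs are made up for illustration and are not part of the study data.

```r
# Toy data: hypothetical IDs, "p2" occurs twice
toy <- data.frame(
  PROLIFIC_PID = c("p1", "p2", "p2", "p3"),
  score        = c(4, 5, 3, 2),
  stringsAsFactors = FALSE
)

# IDs that occur two or more times
id_counts     <- table(toy$PROLIFIC_PID)
tmp_removeIDs <- names(id_counts)[id_counts >= 2]

# Drop all rows belonging to those IDs (not only the repeated rows)
toy_clean <- toy[!toy$PROLIFIC_PID %in% tmp_removeIDs, ]

print(toy_clean$PROLIFIC_PID)
```

If the first submission of a duplicated ID should be retained instead, `!duplicated(toy$PROLIFIC_PID)` would be the usual alternative; which rule is appropriate depends on the study protocol.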
The EFA uses promax rotation and the minimum residual (minres) factoring method. If a scale is a Likert scale with seven or fewer answering options, the EFA or parallel analysis is computed on a polychoric correlation matrix to account for the non-normality of the data (for details, see the R code in “helperFunctions”).
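The rule described above can be sketched as follows. This is not the code from “helperFunctions”, which is not reproduced in this report; the function name `choose_cor_type` and the way answering options are counted (distinct observed values per item) are assumptions made purely for illustration.

```r
# Hypothetical helper: pick the correlation type for EFA / parallel analysis.
# Assumption: the number of answering options is approximated by the largest
# number of distinct observed values across the items.
choose_cor_type <- function(items, max_options = 7) {
  n_options <- max(vapply(items,
                          function(x) length(unique(na.omit(x))),
                          integer(1)))
  if (n_options <= max_options) "poly" else "cor"
}

likert5 <- data.frame(i1 = c(1, 2, 3, 4, 5), i2 = c(5, 4, 3, 2, 1))
slider  <- data.frame(s1 = seq(0, 100, by = 10))

choose_cor_type(likert5)
choose_cor_type(slider)
```

The return values mirror strings accepted by the `cor` argument of `psych::fa` ("cor" for Pearson, "poly" for polychoric correlations).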
#### Overall EFA
regExOverall <- "^Bioinspiration|^EcologicalDimension"

psych::cor.plot(r = cor(dat[, str_detect(string = colnames(dat),
                                         pattern = regExOverall)],
                        use = "pairwise.complete.obs"),
                upper = FALSE, xlas = 2, main = "Overall")
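To illustrate what the call above hands to the plotting function, the following base-R sketch selects columns with the same regular expression and computes pairwise-complete correlations. The toy data frame and its values are invented, and `grepl` stands in for `str_detect`.

```r
# Toy data with hypothetical values; one missing response in the first item
toy <- data.frame(
  "Bioinspiration-PN1"    = c(1, 2, 3, 4, 5, NA),
  "EcologicalDimension-1" = c(2, 1, 4, 3, 5, 5),
  "age"                   = c(23, 35, 41, 29, 52, 38),
  check.names = FALSE
)

regExOverall <- "^Bioinspiration|^EcologicalDimension"

# Keep only the questionnaire items (base-R equivalent of str_detect)
items <- toy[, grepl(pattern = regExOverall, x = colnames(toy)), drop = FALSE]

# Each correlation uses all rows in which both items were answered
r <- cor(items, use = "pairwise.complete.obs")
print(round(r, 2))
```

With pairwise deletion, different cells of the correlation matrix can rest on different numbers of respondents, which is worth keeping in mind when missingness is substantial.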
Factor Analysis using method = minres
Call: fa(r = tmp_dat, nfactors = nfac, rotate = "promax", cor = "cor")
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 h2 u2 com
EcologicalDimension-1 -0.13 0.87 0.740 0.26 1.0
EcologicalDimension-4 0.09 0.57 0.348 0.65 1.1
EcologicalDimension-2 -0.11 0.84 0.685 0.32 1.0
EcologicalDimension-3 -0.06 0.85 0.713 0.29 1.0
Bioinspiration-PN2r -0.19 0.07 0.036 0.96 1.3
Bioinspiration-PN1 0.72 0.23 0.617 0.38 1.2
Bioinspiration-IPI4 0.84 -0.04 0.698 0.30 1.0
Bioinspiration-VRtN4r 0.67 -0.05 0.446 0.55 1.0
Bioinspiration-VRtN2 0.85 -0.03 0.722 0.28 1.0
Bioinspiration-IPI1 0.85 -0.13 0.704 0.30 1.0
Bioinspiration-VRtN1 0.76 0.09 0.601 0.40 1.0
Bioinspiration-IPI3 0.81 -0.03 0.653 0.35 1.0
Bioinspiration-VRtN3 0.76 0.07 0.597 0.40 1.0
Bioinspiration-PN3 0.48 0.35 0.404 0.60 1.8
Bioinspiration-IPI2r 0.45 -0.14 0.204 0.80 1.2
Bioinspiration-PN4 0.59 0.21 0.425 0.57 1.2
MR1 MR2
SS loadings 5.80 2.80
Proportion Var 0.36 0.17
Cumulative Var 0.36 0.54
Proportion Explained 0.67 0.33
Cumulative Proportion 0.67 1.00
With factor correlations of
MR1 MR2
MR1 1.00 0.16
MR2 0.16 1.00
Mean item complexity = 1.1
Test of the hypothesis that 2 factors are sufficient.
df null model = 120 with the objective function = 9.41 with Chi Square = 5411.06
df of the model are 89 and the objective function was 0.81
The root mean square of the residuals (RMSR) is 0.04
The df corrected root mean square of the residuals is 0.05
The harmonic n.obs is 582 with the empirical chi square 223.85 with prob < 1.3e-13
The total n.obs was 582 with Likelihood Chi Square = 462.95 with prob < 6.6e-52
Tucker Lewis Index of factoring reliability = 0.904
RMSEA index = 0.085 and the 90 % confidence intervals are 0.077 0.093
BIC = -103.67
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy
MR1 MR2
Correlation of (regression) scores with factors 0.97 0.95
Multiple R square of scores with factors 0.94 0.90
Minimum correlation of possible factor scores 0.88 0.79
Two-factor structure; within the bioinspired dimensions there are no strong differences in the overall answering patterns. One item (Bioinspiration-PN2r) is dysfunctional (negatively correlated).
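Because promax is an oblique rotation, an item's communality (h2) is not simply the sum of its squared pattern loadings; the factor correlation enters as well. For two factors, h2 = λ1² + λ2² + 2·λ1·λ2·φ. The sketch below checks this against values from the output above. The loadings and φ = 0.16 are copied from the printed (rounded) output, so small deviations from the reported h2 are expected.

```r
# Communality of an item under an oblique two-factor solution:
# h2 = lambda' %*% Phi %*% lambda
communality <- function(lambda, phi) {
  Phi <- matrix(c(1, phi, phi, 1), nrow = 2)
  as.numeric(t(lambda) %*% Phi %*% lambda)
}

phi <- 0.16  # factor correlation MR1-MR2 from the output above

h2_eco1 <- communality(c(-0.13, 0.87), phi)  # EcologicalDimension-1
h2_pn3  <- communality(c(0.48, 0.35), phi)   # Bioinspiration-PN3

round(c(EcologicalDimension1 = h2_eco1, BioinspirationPN3 = h2_pn3), 3)
```

Both values agree with the reported h2 (0.740 and 0.404) to within rounding error of the printed loadings.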
4.2.1 for “Bioinspiration Items”
#### Overall EFAregExOverall <-"^Bioinspiration"psych::cor.plot(r =cor(dat[, str_detect(string =colnames(dat),pattern = regExOverall)] , use ="pairwise.complete.obs"),upper =FALSE, xlas =2, main ="Overall")
Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
The estimated weights for the factor scores are probably incorrect. Try a
different factor score estimation method.
tmp_EFA[[1]]
Factor Analysis using method = minres
Call: fa(r = tmp_dat, nfactors = nfac, rotate = "promax", cor = "cor")
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 h2 u2 com
Bioinspiration-PN2r -0.11 -0.08 0.03 0.97 1.8
Bioinspiration-PN1 0.03 0.82 0.71 0.29 1.0
Bioinspiration-IPI4 0.81 0.08 0.75 0.25 1.0
Bioinspiration-VRtN4r 0.53 0.18 0.45 0.55 1.2
Bioinspiration-VRtN2 0.58 0.32 0.71 0.29 1.6
Bioinspiration-IPI1 0.85 0.02 0.75 0.25 1.0
Bioinspiration-VRtN1 0.27 0.57 0.62 0.38 1.4
Bioinspiration-IPI3 0.76 0.10 0.69 0.31 1.0
Bioinspiration-VRtN3 0.37 0.46 0.60 0.40 1.9
Bioinspiration-PN3 -0.20 0.82 0.47 0.53 1.1
Bioinspiration-IPI2r 0.65 -0.20 0.27 0.73 1.2
Bioinspiration-PN4 -0.01 0.71 0.49 0.51 1.0
MR1 MR2
SS loadings 3.60 2.94
Proportion Var 0.30 0.24
Cumulative Var 0.30 0.55
Proportion Explained 0.55 0.45
Cumulative Proportion 0.55 1.00
With factor correlations of
MR1 MR2
MR1 1.00 0.73
MR2 0.73 1.00
Mean item complexity = 1.3
Test of the hypothesis that 2 factors are sufficient.
df null model = 66 with the objective function = 7.03 with Chi Square = 4051.16
df of the model are 43 and the objective function was 0.35
The root mean square of the residuals (RMSR) is 0.04
The df corrected root mean square of the residuals is 0.04
The harmonic n.obs is 582 with the empirical chi square 97.5 with prob < 4.1e-06
The total n.obs was 582 with Likelihood Chi Square = 200.33 with prob < 3.7e-22
Tucker Lewis Index of factoring reliability = 0.939
RMSEA index = 0.079 and the 90 % confidence intervals are 0.068 0.091
BIC = -73.43
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy
MR1 MR2
Correlation of (regression) scores with factors 0.96 0.94
Multiple R square of scores with factors 0.92 0.88
Minimum correlation of possible factor scores 0.84 0.77
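The RMSEA values in the two EFA outputs can be approximately reproduced from the reported likelihood chi-square, the model degrees of freedom, and the sample size. The sketch uses a common textbook formula; the exact expression used inside `psych` may differ in small details, so this is a plausibility check rather than a reimplementation.

```r
# RMSEA from chi-square, degrees of freedom and sample size
# (common textbook form; software implementations differ in small details)
rmsea <- function(chisq, df, n) {
  sqrt(max((chisq / df - 1) / (n - 1), 0))
}

rmsea_overall <- rmsea(chisq = 462.95, df = 89, n = 582)  # 16-item EFA
rmsea_bio     <- rmsea(chisq = 200.33, df = 43, n = 582)  # Bioinspiration-only EFA

round(c(overall = rmsea_overall, bioinspiration = rmsea_bio), 3)
```

Both results match the reported values (0.085 and 0.079) at three decimals.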
4.3 Descriptives, correlation plot, EFA, CFA for “Ecological Items”
A self-written function was applied, for example to check the reliability and the amount of variance explained by the first factor:
Some items ( Bioinspiration-PN2r ) were negatively correlated with the first principal component and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
Cronbachs Alpha: 0.89
Parallel analysis suggests that the number of factors = 4 and the number of components = 1
Bioinspiration
Number of components: 1
EFA factor loadings (1 factor solution):
Loadings:
MR1
BioinspirationPN2r -0.164
BioinspirationPN1 0.792
BioinspirationIPI4 0.866
BioinspirationVRtN4r 0.693
BioinspirationVRtN2 0.882
BioinspirationIPI1 0.842
BioinspirationVRtN1 0.815
BioinspirationIPI3 0.840
BioinspirationVRtN3 0.815
BioinspirationPN3 0.576
BioinspirationIPI2r 0.459
BioinspirationPN4 0.665
MR1
SS loadings 6.390
Proportion Var 0.533
CFA summary and fit statistics:
lavaan 0.6-19 ended normally after 57 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 24
Number of observations 582
Model Test User Model:
Standard Scaled
Test Statistic 503.816 395.086
Degrees of freedom 54 54
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.275
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 4092.174 3059.934
Degrees of freedom 66 66
P-value 0.000 0.000
Scaling correction factor 1.337
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.888 0.886
Tucker-Lewis Index (TLI) 0.863 0.861
Robust Comparative Fit Index (CFI) 0.891
Robust Tucker-Lewis Index (TLI) 0.867
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -8228.204 -8228.204
Scaling correction factor 1.219
for the MLR correction
Loglikelihood unrestricted model (H1) -7976.296 -7976.296
Scaling correction factor 1.258
for the MLR correction
Akaike (AIC) 16504.408 16504.408
Bayesian (BIC) 16609.203 16609.203
Sample-size adjusted Bayesian (SABIC) 16533.012 16533.012
Root Mean Square Error of Approximation:
RMSEA 0.120 0.104
90 Percent confidence interval - lower 0.110 0.096
90 Percent confidence interval - upper 0.129 0.113
P-value H_0: RMSEA <= 0.050 0.000 0.000
P-value H_0: RMSEA >= 0.080 1.000 1.000
Robust RMSEA 0.118
90 Percent confidence interval - lower 0.107
90 Percent confidence interval - upper 0.129
P-value H_0: Robust RMSEA <= 0.050 0.000
P-value H_0: Robust RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.059 0.059
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Bioinspiration =~
BioinsprtnPN2r 1.000 0.195 0.180
BioinspirtnPN1 -3.678 0.989 -3.721 0.000 -0.716 -0.738
BioinsprtnIPI4 -4.210 1.139 -3.695 0.000 -0.819 -0.847
BinsprtnVRtN4r -3.758 1.060 -3.546 0.000 -0.732 -0.661
BionsprtnVRtN2 -4.760 1.280 -3.720 0.000 -0.927 -0.844
BioinsprtnIPI1 -4.250 1.135 -3.744 0.000 -0.827 -0.838
BionsprtnVRtN1 -4.066 1.091 -3.727 0.000 -0.791 -0.763
BioinsprtnIPI3 -3.952 1.068 -3.701 0.000 -0.769 -0.818
BionsprtnVRtN3 -3.846 1.041 -3.696 0.000 -0.749 -0.762
BioinspirtnPN3 -2.465 0.708 -3.483 0.000 -0.480 -0.522
BionsprtnIPI2r -2.374 0.703 -3.380 0.001 -0.462 -0.435
BioinspirtnPN4 -3.366 0.906 -3.714 0.000 -0.655 -0.615
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.BioinsprtnPN2r 1.127 0.054 20.847 0.000 1.127 0.967
.BioinspirtnPN1 0.429 0.038 11.386 0.000 0.429 0.456
.BioinsprtnIPI4 0.264 0.027 9.922 0.000 0.264 0.283
.BinsprtnVRtN4r 0.691 0.064 10.748 0.000 0.691 0.564
.BionsprtnVRtN2 0.346 0.033 10.538 0.000 0.346 0.287
.BioinsprtnIPI1 0.291 0.029 10.170 0.000 0.291 0.298
.BionsprtnVRtN1 0.448 0.039 11.504 0.000 0.448 0.417
.BioinsprtnIPI3 0.293 0.026 11.225 0.000 0.293 0.331
.BionsprtnVRtN3 0.405 0.033 12.379 0.000 0.405 0.420
.BioinspirtnPN3 0.615 0.038 16.319 0.000 0.615 0.728
.BionsprtnIPI2r 0.913 0.057 15.896 0.000 0.913 0.810
.BioinspirtnPN4 0.707 0.044 15.955 0.000 0.707 0.622
Bioinspiration 0.038 0.021 1.824 0.068 1.000 1.000
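The incremental fit indices in the CFA output above follow directly from the model and baseline chi-square statistics. The sketch recomputes CFI and TLI from the "Standard" column using their standard definitions, and also checks that a standardized residual variance equals one minus the squared standardized loading. All inputs are copied from the printed output and are therefore rounded.

```r
# Fit indices recomputed from the "Standard" column of the lavaan output above
chisq_m <- 503.816;  df_m <- 54   # user model
chisq_b <- 4092.174; df_b <- 66   # baseline model

# CFI: 1 - (model noncentrality / baseline noncentrality)
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)

# TLI: compares the chi-square/df ratios of baseline and user model
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

# Standardized residual variance = 1 - squared standardized loading
resid_pn1 <- 1 - (-0.738)^2  # BioinspirtnPN1, Std.all loading

round(c(CFI = cfi, TLI = tli, resid_PN1 = resid_pn1), 3)
```

The recomputed CFI and TLI match the reported 0.888 and 0.863; the residual variance for PN1 agrees with the reported 0.456 up to rounding of the loading.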
Some items ( Bioinspiration-PN2r ) were negatively correlated with the first principal component and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
Cronbachs Alpha: 0.54
Parallel analysis suggests that the number of factors = 2 and the number of components = 1
PN
Number of components: 1
KMO criteria is to low (< .6) for:
BioinspirationPN2r
mean KMO: 0.68
EFA factor loadings (1 factor solution):
Loadings:
MR1
BioinspirationPN2r -0.120
BioinspirationPN1 0.904
BioinspirationPN3 0.690
BioinspirationPN4 0.768
MR1
SS loadings 1.896
Proportion Var 0.474
CFA summary and fit statistics:
lavaan 0.6-19 ended normally after 44 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 8
Number of observations 582
Model Test User Model:
Standard Scaled
Test Statistic 21.009 16.546
Degrees of freedom 2 2
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.270
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 557.717 416.170
Degrees of freedom 6 6
P-value 0.000 0.000
Scaling correction factor 1.340
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.966 0.965
Tucker-Lewis Index (TLI) 0.897 0.894
Robust Comparative Fit Index (CFI) 0.966
Robust Tucker-Lewis Index (TLI) 0.899
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3050.099 -3050.099
Scaling correction factor 1.055
for the MLR correction
Loglikelihood unrestricted model (H1) -3039.595 -3039.595
Scaling correction factor 1.098
for the MLR correction
Akaike (AIC) 6116.198 6116.198
Bayesian (BIC) 6151.130 6151.130
Sample-size adjusted Bayesian (SABIC) 6125.733 6125.733
Root Mean Square Error of Approximation:
RMSEA 0.128 0.112
90 Percent confidence interval - lower 0.082 0.071
90 Percent confidence interval - upper 0.180 0.158
P-value H_0: RMSEA <= 0.050 0.003 0.008
P-value H_0: RMSEA >= 0.080 0.957 0.904
Robust RMSEA 0.126
90 Percent confidence interval - lower 0.075
90 Percent confidence interval - upper 0.185
P-value H_0: Robust RMSEA <= 0.050 0.009
P-value H_0: Robust RMSEA >= 0.080 0.931
Standardized Root Mean Square Residual:
SRMR 0.043 0.043
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
PN =~
BioinsprtnPN2r 1.000 0.149 0.138
BioinspirtnPN1 -5.625 2.157 -2.608 0.009 -0.837 -0.862
BioinspirtnPN3 -4.042 1.611 -2.509 0.012 -0.601 -0.654
BioinspirtnPN4 -5.168 1.953 -2.646 0.008 -0.769 -0.721
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.BioinsprtnPN2r 1.142 0.053 21.384 0.000 1.142 0.981
.BioinspirtnPN1 0.242 0.049 4.955 0.000 0.242 0.257
.BioinspirtnPN3 0.484 0.039 12.523 0.000 0.484 0.573
.BioinspirtnPN4 0.546 0.052 10.436 0.000 0.546 0.480
PN 0.022 0.017 1.290 0.197 1.000 1.000
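The low Cronbach's alpha for the PN items (0.54) goes together with one item loading in the opposite direction. The following sketch shows the mechanism on invented data: a single item that runs against the others pulls alpha down sharply, and reverse-scoring it restores alpha. The toy responses and the 1 to 5 response range are assumptions. Whether PN2r was recoded correctly in the real data cannot be judged from the output alone.

```r
# Cronbach's alpha from an item matrix (rows = persons, columns = items)
cronbach_alpha <- function(items) {
  S <- cov(items)
  k <- ncol(items)
  (k / (k - 1)) * (1 - sum(diag(S)) / sum(S))
}

# Hypothetical 5-point responses; item4 runs against the other three
toy <- data.frame(
  item1 = c(1, 2, 3, 4, 5),
  item2 = c(2, 2, 3, 5, 5),
  item3 = c(1, 3, 3, 4, 4),
  item4 = c(5, 4, 3, 2, 1)
)

alpha_raw <- cronbach_alpha(toy)

# Reverse-score item4 on a 1-5 scale: new = (min + max) - old
toy_rev       <- toy
toy_rev$item4 <- 6 - toy_rev$item4
alpha_rev     <- cronbach_alpha(toy_rev)

round(c(raw = alpha_raw, reversed = alpha_rev), 2)
```

In this toy case alpha is even negative before reversal, which happens when negative covariances outweigh the positive ones.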
4.6 Item Response Theory (polytomous) for “Bioinspiration Items”
Factor loadings (F1) indicate how strongly each item is associated with the latent trait. Rule of thumb:
≥ 0.70 = strong
0.40–0.69 = moderate
< 0.40 = weak
Communality (h²) is the proportion of variance in each item explained by the factor. Rule of thumb:
h² > 0.40 → item is well represented
h² < 0.30 → potentially problematic item
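As an illustration of these rules of thumb, the sketch below classifies the loadings of the one-factor EFA solution reported earlier (the mirt summary values themselves are not reproduced in this section). It treats 0.70 as the lower bound of "strong" and classifies by absolute value; both choices are assumptions about how the rule is meant to be applied.

```r
# One-factor EFA loadings copied from the output earlier in this report
loadings <- c(PN2r = -0.164, PN1 = 0.792, IPI4 = 0.866, VRtN4r = 0.693,
              VRtN2 = 0.882, IPI1 = 0.842, VRtN1 = 0.815, IPI3 = 0.840,
              VRtN3 = 0.815, PN3 = 0.576, IPI2r = 0.459, PN4 = 0.665)

# Classify absolute loadings: < 0.40 weak, 0.40-0.69 moderate, >= 0.70 strong
strength <- cut(abs(loadings),
                breaks = c(0, 0.40, 0.70, Inf),
                labels = c("weak", "moderate", "strong"),
                right = FALSE)

# With a single factor, the communality is the squared loading
h2 <- loadings^2

data.frame(loading = loadings, strength = strength, h2 = round(h2, 2))
```

Under these cut-offs only PN2r counts as weak, and its communality (about 0.03) falls well below the 0.30 threshold.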
# Choose dataset and regular expression
regEx <- "^Bioinspiration"

# Filter variables matching the pattern
irt_items <- dat[, str_detect(colnames(dat), pattern = regEx)]

# Drop rows with missing data (complete-case analysis)
irt_items <- na.omit(irt_items)

# Convert all items to numeric response codes
irt_items[] <- lapply(irt_items, function(x) as.numeric(as.character(x)))

# Fit Graded Response Model (1-factor)
mod_grm <- mirt(data = irt_items, model = 1, itemtype = "graded", verbose = FALSE)

# Summarize model
summary(mod_grm)
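For readers unfamiliar with the graded response model, the following base-R sketch shows how category probabilities for a single item arise from a discrimination parameter and ordered thresholds. It does not use `mirt`, and the parameter values are invented, not estimates from this study.

```r
# Graded response model for one item: category probabilities at a given theta.
# a = discrimination, b = ordered thresholds (hypothetical values below)
grm_probs <- function(theta, a, b) {
  # Cumulative probabilities P(X >= k) for k = 2..K, padded with 1 and 0
  p_star <- c(1, plogis(a * (theta - b)), 0)
  # Category probabilities are differences of adjacent cumulative curves
  -diff(p_star)
}

a <- 1.8                        # assumed discrimination
b <- c(-1.5, -0.5, 0.5, 1.5)    # assumed thresholds for a 5-category item

probs_low  <- grm_probs(theta = -2, a = a, b = b)
probs_high <- grm_probs(theta =  2, a = a, b = b)

round(rbind(theta_minus2 = probs_low, theta_plus2 = probs_high), 3)
```

Respondents low on the latent trait are most likely to choose the lowest category and respondents high on the trait the highest; a larger discrimination makes this separation sharper.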
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
# Plots the mean scores on the Bioinspiration & Ecological Dimension by Framing Condition
ggplot(dat, aes(x = framingCondition, y = mean_Ecological)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Ecological Dimension by Framing", y = "Mean Score") +
  theme_minimal()

ggplot(dat, aes(x = framingCondition, y = mean_Bioinspiration)) +
  geom_boxplot(fill = "skyblue") +
  labs(title = "Bioinspiration by Framing", y = "Mean Score") +
  theme_minimal()
dat <- dat[!dat$PROLIFIC_PID %in% tmp_removeIDs, ]
dim(dat)
[1] 582 51
ANOVAs + post hoc tests:
# --- ANOVA for Bioinspiration ---
aov_bio <- aov_ez(id = "PROLIFIC_PID", # replace with your participant ID column name
                  dv = "mean_Bioinspiration",
                  data = dat,
                  between = "framingCondition")
print(aov_bio)
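To make the logic of the one-way between-subjects ANOVA explicit, the sketch below fits the same kind of model with base R's `aov` (instead of `afex::aov_ez`) on invented data and recomputes the F statistic by hand. The condition labels "A", "B", "C" and all scores are hypothetical, and `TukeyHSD` is shown as one possible post hoc test, not necessarily the one used in this analysis.

```r
# Toy data: hypothetical framing conditions and mean scores
toy <- data.frame(
  framingCondition    = factor(rep(c("A", "B", "C"), each = 4)),
  mean_Bioinspiration = c(3.0, 3.5, 4.0, 3.5,
                          4.0, 4.5, 5.0, 4.5,
                          2.5, 3.0, 3.5, 3.0)
)

# One-way between-subjects ANOVA (base R)
fit   <- aov(mean_Bioinspiration ~ framingCondition, data = toy)
f_aov <- summary(fit)[[1]][["F value"]][1]

# The same F statistic by hand: between-group vs. within-group mean squares
grand  <- mean(toy$mean_Bioinspiration)
groups <- split(toy$mean_Bioinspiration, toy$framingCondition)
ss_between <- sum(vapply(groups, function(g) length(g) * (mean(g) - grand)^2, numeric(1)))
ss_within  <- sum(vapply(groups, function(g) sum((g - mean(g))^2), numeric(1)))
f_manual   <- (ss_between / (length(groups) - 1)) /
              (ss_within / (nrow(toy) - length(groups)))

# Tukey HSD as one possible post hoc test
posthoc <- TukeyHSD(fit)

print(c(aov = f_aov, manual = f_manual))
print(posthoc)
```

The hand computation and `aov` give the same F value, which is the ratio of the variance between condition means to the pooled variance within conditions.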