A full survey project usually consists of six phases.
1. Item Generation. Start by generating a list of candidate survey items. With help from SMEs, you evaluate the equivalence (interrater reliability) and content validity of the candidate survey items and pare down the list into the final survey.
2. Survey Administration. Administer the survey to respondents and perform an exploratory data analysis. Summarize the Likert items with plots and look for correlations among the variables.
3. Item Reduction. Explore the dimensions of the latent variable in the survey data with parallel analysis and exploratory factor analysis. Assess the internal consistency of the items with Cronbach’s alpha and split-half tests, and remove items that do not add value and/or amend your theory of the number of dimensions.
4. Confirmatory Factor Analysis. Perform a formal hypothesis test of the theory that emerged from the exploratory factor analysis.
5. Convergent/Discriminant Validity. Test for convergent and discriminant construct validity.
6. Replication. Establish test-retest reliability and criterion validity.
Quality surveys should be both reliable (consistent) and valid (accurate).
A survey’s reliability is the extent to which the results can be reproduced when repeated under similar conditions. You assess reliability three ways:
You use Cohen’s kappa to test equivalence (inter-rater reliability) in the Item Generation phase (1). You use Cronbach’s alpha and split-half tests to assess internal consistency in the Item Reduction phase (3). You use test-retest analysis to assess stability in the Replication phase (6).
A survey’s validity is how accurately it measures the latent variable it was intended to measure. A high-validity survey produces results that correspond to real characteristics of the social world. Reliability is a necessary, but not sufficient, condition for validity. You assess validity three ways.
You use Lawshe’s CVR to assess content validity in the Item Generation phase (1). You use convergent analysis and discriminant analysis to assess construct validity in the Convergent/Discriminant Validity phase (5). You use concurrent analysis and predictive analysis to assess criterion validity in the Replication phase (6).
After you generate a list of candidate survey items, enlist SMEs to assess their inter-rater reliability with Cohen’s Kappa and content validity with Lawshe’s CVR.
An item has interrater reliability if it produces consistent results across raters. One way to test this is by having SMEs take the survey. Their answers should be close to each other. Conduct an inter-rater reliability test by measuring the statistical significance of SME response agreement using Cohen’s kappa test statistic.
Suppose your survey measures brand loyalty and two SMEs answer 13 survey items like this (showing first 6):
## RATER_A RATER_B
## 1 1 1
## 2 2 2
## 3 3 2
## 4 2 3
## 5 1 3
## 6 1 1
You could measure SME agreement with a simple correlation matrix (with cor(sme)) or by measuring the percentage of items they rate identically (with irr::agree(sme)), but these measures do not test for statistical significance.
cor(sme)
## RATER_A RATER_B
## RATER_A 1.0000000 0.3291403
## RATER_B 0.3291403 1.0000000
irr::agree(sme)
## Percentage agreement (Tolerance=0)
##
## Subjects = 13
## Raters = 2
## %-agree = 46.2
Instead, calculate Cohen’s kappa test statistic to assess the statistical significance of the agreement. In general, Cohen’s kappa compares the observed agreement (accuracy) to the probability of chance agreement. \(\kappa\) < 0.4 indicates poor agreement, \(\kappa\) >= 0.4 is moderate, \(\kappa\) >= 0.6 is substantial, and \(\kappa\) >= 0.8 is very strong agreement.
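The call that produced the output below is not shown; a minimal sketch, assuming the two raters’ answers are in the sme data frame used above:
psych::cohen.kappa(sme)  # unweighted and weighted kappa with confidence bounds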
## Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels)
##
## Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries
## lower estimate upper
## unweighted kappa -0.179 0.19 0.55
## weighted kappa -0.099 0.32 0.73
##
## Number of subjects = 13
Use the weighted kappa for ordinal measures like Likert items (see Wikipedia). In this example, \(\kappa\) is only 0.32 (poor agreement).
An item has content validity if SMEs agree on its relevance to the latent variable. Test content validity with Lawshe’s content validity ratio (CVR),
\[CVR = \frac{E - N/2}{N/2}\] where \(N\) is the number of SMEs and \(E\) is the number who rate the item as essential. CVR can range from -1 to 1. E.g., suppose three SMEs (A, B, and C) assess the relevance of 5 survey items as “Not Necessary”, “Useful”, or “Essential”:
## item A B C
## 1 1 Essential Useful Not necesary
## 2 2 Useful Not necesary Useful
## 3 3 Not necesary Not necesary Essential
## 4 4 Essential Useful Essential
## 5 5 Essential Essential Essential
Use the psychometric::CVratio() function to calculate CVR. The threshold CVR to keep or drop an item depends on the number of raters: CVR should be >= 0.99 for 5 experts, >= 0.49 for 15, and >= 0.29 for 40.
sme2 %>%
pivot_longer(-item, names_to = "expert", values_to = "rating") %>%
group_by(item) %>%
summarize(.groups = "drop",
n_sme = length(unique(expert)),
n_ess = sum(rating == "Essential"),
CVR = psychometric::CVratio(NTOTAL = n_sme, NESSENTIAL = n_ess))
## # A tibble: 5 x 4
## item n_sme n_ess CVR
## <int> <int> <int> <dbl>
## 1 1 3 1 -0.333
## 2 2 3 0 -1
## 3 3 3 1 -0.333
## 4 4 3 2 0.333
## 5 5 3 3 1
In this example, content validity varies widely across items, from unanimous agreement that an item is essential (CVR = 1) to unanimous agreement that it is not (CVR = -1).
The second phase of a survey analysis is to collect the responses and perform an exploratory data analysis to familiarize yourself with the data.
brand_rep is a brand reputation survey of n = 559 respondents answering nine 5-point Likert-scale items. The responses come in as numeric, and you will want to leave them that way for most analyses.
brand_rep <- read_csv(url("https://assets.datacamp.com/production/repositories/4494/datasets/59b5f2d717ddd647415d8c88aa40af6f89ed24df/brandrep-cleansurvey-extraitem.csv"))
psych::response.frequencies(brand_rep)
## 1 2 3 4 5 miss
## well_made 0.6493739 0.2665474 0.04830054 0.014311270 0.02146691 0
## consistent 0.6779964 0.2343470 0.04651163 0.019677996 0.02146691 0
## poor_workman_r 0.7584973 0.1699463 0.04472272 0.007155635 0.01967800 0
## higher_price 0.3130590 0.2593918 0.18425760 0.119856887 0.12343470 0
## lot_more 0.1484794 0.1735242 0.15563506 0.162790698 0.35957066 0
## go_up 0.1842576 0.1824687 0.19677996 0.177101968 0.25939177 0
## stands_out 0.4275492 0.3041145 0.13953488 0.067978533 0.06082290 0
## unique 0.4579606 0.2933810 0.11985689 0.069767442 0.05903399 0
## one_of_a_kind 0.6225403 0.1949911 0.08944544 0.023255814 0.06976744 0
Summarize Likert responses with the likert::likert() function. This is the one place where you will need the items to be treated as factors.
brand_rep %>%
data.frame() %>% # read_csv() returns a tibble
mutate(across(everything(), as.factor)) %>% # likert() uses factors
likert::likert() %>%
plot() +
labs(title = "Brand Reputation Survey") +
theme(legend.position = "top")
Missing values may mean respondents did not understand the question or did not want to reveal their answer. If fewer than 5% of responses have missing values, you can usually just drop those responses. If missing values are a bigger problem, try Hmisc::naclus() to see which items tend to be missing in the same record. This survey is clean.
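A quick missingness check might look like the following sketch (the naclus() call is commented out because this survey has no missing values to cluster):
colSums(is.na(brand_rep))          # count of missing values per item
# plot(Hmisc::naclus(brand_rep))   # cluster items by shared missingness when NAs exist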
You will want to identify items that correlate highly with each other, but not highly outside their group. These patterns are the basis of mapping factors to the latent variables. Factors are the concrete survey items; latent variables are the abstract concepts they are intended to measure, like brand loyalty or customer satisfaction. The correlation plot below appears to have 3 groups, plus a stand-alone variable (one_of_a_kind).
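The plot itself is not reproduced here; one way to create a similar correlation plot, assuming the corrplot package, is:
corrplot::corrplot(cor(brand_rep), order = "hclust")  # cluster correlated items together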
The third phase explores the mapping of the factors (aka “manifest variables”) to the latent variable’s “dimensions” and refines the survey to exclude factors that do not map to a dimension. A latent variable may have several dimensions. E.g., “brand loyalty” may consist of “brand identification”, “perceived value”, and “brand trust”. Exploratory factor analysis (EFA) identifies the dimensions in the data and flags any items that do not reveal information about the latent variable. EFA establishes internal reliability: whether similar items produce similar scores.
Start with a parallel analysis and scree plot. This will suggest the number of factors in the data. Use this number as the input to an exploratory factor analysis.
A scree plot is a line plot of the eigenvalues. An eigenvalue is the proportion of variance explained by each factor. Only factors with eigenvalues greater than those from uncorrelated data are useful. You want to find a sharp reduction in the size of the eigenvalues (like a cliff), with the rest of the smaller eigenvalues constituting rubble (scree!). After the eigenvalues drop dramatically in size, additional factors add relatively little to the information already extracted.
Parallel analysis helps to make the interpretation of scree plots more objective. The eigenvalues are plotted along with eigenvalues of simulated variables with population correlations of 0. The number of eigenvalues above the point where the two lines intersect is the suggested number of factors. The rationale for parallel analysis is that useful factors account for more variance than could be expected by chance.
psych::fa.parallel() compares a scree plot of your data set to one from a random data set to identify the number of factors. The elbow here is at 3 factors.
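A minimal version of the call that produces the message below:
psych::fa.parallel(brand_rep)  # scree of observed vs. simulated/resampled eigenvalues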
## Parallel analysis suggests that the number of factors = 3 and the number of components = 2
Use psych::fa() to perform the factor analysis with your chosen number of factors. The number of factors may come from your parallel analysis or from the opinion of the SMEs. In this case, we’ll go with the 3 factors identified by the parallel analysis.
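A sketch of the call (brand_rep_3f is a hypothetical name; the default extraction and oblique rotation require the GPArotation package):
brand_rep_3f <- psych::fa(brand_rep, nfactors = 3)  # 3-factor EFA on all nine items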
## Loading required namespace: GPArotation
Using EFA, you may tweak the number of factors or drop poorly-loading items. Each item should load highly to one and only one dimension. This one dimension is the item’s primary loading. Generally, a primary loading > .7 is excellent, >.6 is very good, >.5 is good, >.4 is fair, and <.4 is poor. Here are the factor loadings from the 3 factor model.
##
## Loadings:
## MR2 MR1 MR3
## well_made 0.896
## consistent 0.947
## poor_workman_r 0.739
## higher_price 0.127 0.632 0.149
## lot_more 0.850
## go_up 0.896
## stands_out 1.008
## unique 0.896
## one_of_a_kind 0.309 0.295 0.115
##
## MR2 MR1 MR3
## SS loadings 2.361 2.012 1.858
## Proportion Var 0.262 0.224 0.206
## Cumulative Var 0.262 0.486 0.692
The brand-rep survey items load well to 3 factors except for the one_of_a_kind item. Its primary factor loading (0.309) is poor. The others are in the very good (.6-.7) or excellent (>.7) range.
Look at the model eigenvalues. There should be one eigenvalue per dimension. Factors with eigenvalues only a little less than one may be contaminating the model.
## [1] 4.35629549 1.75381015 0.98701607 0.67377072 0.39901205 0.35865598 0.23915591
## [8] 0.14238807 0.08989556
Look at the factor score correlations. They should all be around 0.6. Much smaller means they are not describing the same latent variable. Much larger means they are describing the same dimension of the latent variable.
## [,1] [,2] [,3]
## [1,] 1.0000000 0.3147485 0.4778759
## [2,] 0.3147485 1.0000000 0.5428218
## [3,] 0.4778759 0.5428218 1.0000000
If you have a poorly loading item, drop items one at a time from the scale. one_of_a_kind loads across all three factors, but does not load strongly onto any one of them. one_of_a_kind is not clearly measuring any dimension of the latent variable. Drop it and try again.
brand_rep_efa <- psych::fa(brand_rep %>% select(-one_of_a_kind), nfactors = 3)
brand_rep_efa$loadings
##
## Loadings:
## MR2 MR3 MR1
## well_made 0.887
## consistent 0.958
## poor_workman_r 0.735
## higher_price 0.120 0.596 0.170
## lot_more 0.845
## go_up 0.918
## stands_out 0.990
## unique 0.916
##
## MR2 MR3 MR1
## SS loadings 2.261 1.915 1.850
## Proportion Var 0.283 0.239 0.231
## Cumulative Var 0.283 0.522 0.753
## [1] 4.00884429 1.75380537 0.97785883 0.42232527 0.36214534 0.24086772 0.14404264
## [8] 0.09011054
## [,1] [,2] [,3]
## [1,] 1.0000000 0.2978009 0.4802138
## [2,] 0.2978009 1.0000000 0.5351654
## [3,] 0.4802138 0.5351654 1.0000000
This is better. We have three dimensions of brand reputation:
* well_made, consistent, and poor_workman_r describe Product Quality,
* higher_price, lot_more, and go_up describe Willingness to Pay, and
* stands_out and unique describe Product Differentiation.
Even if the data and your theory suggest otherwise, explore what happens when you include more or fewer factors in your EFA.
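The two- and four-factor loadings below can be produced with calls along these lines:
psych::fa(brand_rep, nfactors = 2)$loadings  # two-factor solution
psych::fa(brand_rep, nfactors = 4)$loadings  # four-factor solution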
##
## Loadings:
## MR1 MR2
## well_made 0.884
## consistent 0.932
## poor_workman_r 0.728
## higher_price 0.745
## lot_more 0.784 -0.131
## go_up 0.855 -0.103
## stands_out 0.591 0.286
## unique 0.540 0.307
## one_of_a_kind 0.378 0.305
##
## MR1 MR2
## SS loadings 2.688 2.485
## Proportion Var 0.299 0.276
## Cumulative Var 0.299 0.575
##
## Loadings:
## MR2 MR1 MR3 MR4
## well_made 0.879
## consistent 0.964
## poor_workman_r 0.732
## higher_price 0.552 0.150 0.168
## lot_more 0.835
## go_up 0.929
## stands_out 0.976
## unique 0.932
## one_of_a_kind 0.999
##
## MR2 MR1 MR3 MR4
## SS loadings 2.244 1.866 1.846 1.029
## Proportion Var 0.249 0.207 0.205 0.114
## Cumulative Var 0.249 0.457 0.662 0.776
The two-factor model worked okay. The four-factor model loaded only one variable onto the fourth factor. In this example the SMEs expected a three-factor model and the data did not contradict the theory, so stick with three factors.
Whereas the item generation phase tested for item equivalence, the EFA phase tests for internal reliability (consistency) of the items. Internal reliability means the survey produces consistent results. The most common statistics for assessing internal reliability are Cronbach’s alpha and split-half reliability.
In general, an alpha <.6 is unacceptable, <.65 is undesirable, <.7 is minimally acceptable, <.8 is respectable, <.9 is very good, and >=.9 suggests items are too alike. A very low alpha means items may not be measuring the same construct, so you should drop items. A very high alpha means items are multicollinear, and you should drop items. Here is Cronbach’s alpha for the brand reputation survey, after removing the poorly-loading one_of_a_kind variable.
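A call along these lines produces the value below (mirroring the per-dimension calls that follow):
psych::alpha(brand_rep[, 1:8])$total$std  # standardized alpha for the 8 retained items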
## [1] 0.8557356
This value is in the “very good” range. Cronbach’s alpha is often used to measure the reliability of a single dimension. Here are the values for the 3 dimensions.
psych::alpha(brand_rep[, 1:3])$total$std # Product Quality
## [1] 0.8918025
psych::alpha(brand_rep[, 4:6])$total$std # Willingness to Pay
## [1] 0.8517566
psych::alpha(brand_rep[, 7:8])$total$std # Product Differentiation
## [1] 0.951472
Alpha is >0.7 for each dimension. Sometimes the alpha for our survey as a whole is greater than that of the dimensions. This can happen because Cronbach’s alpha is sensitive to the number of items. Over-inflation of the alpha statistic can be a concern when working with surveys containing a large number of items.
Use psych::splitHalf() to split the survey in half and test whether all parts of the survey contribute equally to measurement. This method is much less popular than Cronbach’s alpha.
## Split half reliabilities
## Call: psych::splitHalf(r = brand_rep[, 1:8])
##
## Maximum split half reliability (lambda 4) = 0.93
## Guttman lambda 6 = 0.92
## Average split half reliability = 0.86
## Guttman lambda 3 (alpha) = 0.86
## Guttman lambda 2 = 0.87
## Minimum split half reliability (beta) = 0.66
## Average interitem r = 0.43 with median = 0.4
Whereas EFA is used to develop a theory of the number of factors needed to explain the relationships among the survey items, confirmatory factor analysis (CFA) is a formal hypothesis test of the EFA theory. CFA measures construct validity, that is, whether you are really measuring what you claim to measure.
These notes explain how to use CFA, but do not explain the theory. For that you need to learn about dimensionality reduction, and structural equation modeling.
Use the lavaan package (latent variable analysis), passing in the model definition. Here is the model for the three dimensions in the brand reputation survey. lavaan’s default estimator is maximum likelihood, which assumes normality. You can change it to MLR, which uses robust standard errors to mitigate non-normality. The summary prints a ton of output; concentrate on the lambda matrix, the factor loadings.
brand_rep_mdl <- paste(
"PrdQl =~ well_made + consistent + poor_workman_r",
"WillPay =~ higher_price + lot_more + go_up",
"PrdDff =~ stands_out + unique",
sep = "\n"
)
brand_rep_cfa <- lavaan::cfa(model = brand_rep_mdl, data = brand_rep[, 1:8], estimator = "MLR")
# lavaan::summary(brand_rep_cfa, fit.measures = TRUE, standardized = TRUE)
semPlot::semPaths(brand_rep_cfa, rotation = 4)
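The standardized loadings (lambda) shown below can be extracted with something like:
lavaan::inspect(brand_rep_cfa, what = "std")$lambda  # standardized factor loadings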
## PrdQl WillPy PrdDff
## well_made 0.914 0.000 0.000
## consistent 0.932 0.000 0.000
## poor_workman_r 0.730 0.000 0.000
## higher_price 0.000 0.733 0.000
## lot_more 0.000 0.812 0.000
## go_up 0.000 0.902 0.000
## stands_out 0.000 0.000 0.976
## unique 0.000 0.000 0.930
The CFA hypothesis test is a chi-square test, so it is sensitive to normality assumptions and n-size. Other fit measures are reported too:
* Comparative Fit Index (CFI) (look for values > .9)
* Tucker-Lewis Index (TLI) (look for values > .9)
* Root Mean Squared Error of Approximation (RMSEA) (look for values < .05)
There are actually 71 fit measures to choose from! Focus on CFI and TLI.
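For example:
lavaan::fitMeasures(brand_rep_cfa, c("cfi", "tli"))  # extract selected fit indices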
## cfi tli
## 0.980 0.967
This output indicates a good model because both measures are > .9. Check the standardized estimates for each item. The standardized factor loadings are the basis of establishing construct validity. While we call these measures ‘loadings’, they are better described as correlations of each manifest item with the dimensions. The difference between a perfect correlation and the observed correlation is considered ‘error’. This relationship between the so-called ‘true’ and ‘observed’ scores is the basis of classical test theory.
lavaan::standardizedSolution(brand_rep_cfa) %>%
filter(op == "=~") %>%
select(lhs, rhs, est.std, pvalue)
## lhs rhs est.std pvalue
## 1 PrdQl well_made 0.914 0
## 2 PrdQl consistent 0.932 0
## 3 PrdQl poor_workman_r 0.730 0
## 4 WillPay higher_price 0.733 0
## 5 WillPay lot_more 0.812 0
## 6 WillPay go_up 0.902 0
## 7 PrdDff stands_out 0.976 0
## 8 PrdDff unique 0.930 0
If you have a survey that meets your assumptions, performs well under EFA, but fails under CFA, return to your survey and revisit your scale, examine the CFA modification indices, factor variances, etc.
Construct validity means the survey measures what it intends to measure. It is composed of convergent validity and discriminant validity. Convergent validity means the items within a factor converge on the same concept. Discriminant validity means the factors capture distinct aspects of the concept.
Test for construct validity after assessing CFA model strength (with CFI, TLI, and RMSEA); a poor-fitting model may have greater construct validity than a better-fitting model. Use the semTools::reliability() function. The average variance extracted (AVE) measures convergent validity (avevar) and should be > .5. The composite reliability (CR) measures discriminant validity (omega) and should be > .7.
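The output below comes from a call like:
semTools::reliability(brand_rep_cfa)  # alpha, omega (CR), and avevar (AVE) per dimension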
## PrdQl WillPay PrdDff
## alpha 0.8926349 0.8521873 0.9514719
## omega 0.9017514 0.8605751 0.9520117
## omega2 0.9017514 0.8605751 0.9520117
## omega3 0.9019287 0.8646975 0.9520114
## avevar 0.7573092 0.6756174 0.9084654
These values look good for all three dimensions. As an aside, alpha is Cronbach’s alpha. Do not be tempted to test reliability and validity in the same step. Start with reliability because it is a necessary but insufficient condition for validity. By checking internal consistency first (alpha) and then construct validity (AVE and CR), you establish that the scale as a whole is reliable, and then take it to the next level by checking construct validity among the individual dimensions.
At this point you have established that the latent and manifest variables are related as hypothesized, and that the survey measures what you intended to measure, in this case, brand reputation.
The replication phase establishes criterion validity and stability (reliability). Criterion validity is a measure of the relationship between the construct and some external measure of interest. Measure criterion validity with concurrent validity, how well items correlate with an external metric measured at the same time, and with predictive validity, how well an item predicts an external metric. Stability means the survey produces similar results over repeated test-retest administrations.
Concurrent validity is a measure of whether our latent construct is significantly correlated to some outcome measured at the same time.
Suppose you have an additional data set of consumer spending on the brand. The consumer’s perception of the brand should correlate with their spending. Before checking for concurrent validity, standardize the data so that Likert items and other variable types are on the same scale.
set.seed(20201004)
brand_rep <- brand_rep %>%
mutate(spend = ((well_made + consistent + poor_workman_r)/3 * 5 +
(higher_price + lot_more + go_up)/3 * 3 +
(stands_out + unique)/2 * 2) / 10)
brand_rep$spend <- brand_rep$spend + rnorm(559, 5, 4) # add randomness
brand_rep_scaled <- scale(brand_rep)
Do respondents with higher scores on the brand reputation scale also tend to spend more at the store? Build the model, latentize spend as Spndng, and relate it to the dimensions with the ~~ (covariance) operator. Fit the model with the lavaan::sem() function.
brand_rep_cv_mdl <- paste(
"PrdQl =~ well_made + consistent + poor_workman_r",
"WillPay =~ higher_price + lot_more + go_up",
"PrdDff =~ stands_out + unique",
"Spndng =~ spend",
"Spndng ~~ PrdQl + WillPay + PrdDff",
sep = "\n"
)
brand_rep_cv <- lavaan::sem(data = brand_rep_scaled, model = brand_rep_cv_mdl)
Here are the standardized covariances. Because the data is standardized, interpret these as correlations.
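One way to pull just the covariances involving Spndng (a sketch; the exact filtering may differ):
lavaan::standardizedSolution(brand_rep_cv) %>%
  filter(op == "~~", rhs == "Spndng") %>%
  select(lhs, est.std, se, z, pvalue, ci.lower, ci.upper)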
## lhs est.std se z pvalue ci.lower ci.upper
## 1 PrdQl 0.174 0.043 4.092 0 0.091 0.258
## 2 WillPay 0.241 0.043 5.654 0 0.157 0.324
## 3 PrdDff 0.198 0.041 4.779 0 0.117 0.279
## 4 Spndng 1.000 0.000 NA NA 1.000 1.000
Each dimension of brand reputation is positively correlated to spending history and the relationships are all significant.
Predictive validity is established by regressing some future outcome on your established construct. Assess predictive validity just as you would any linear regression: regression estimates and p-values (from standardizedSolution()), and the r-squared coefficient of determination (from inspect()).
Build a regression model with the single ~ operator. Then fit the model to the data as before.
brand_rep_pv_mdl <- paste(
"PrdQl =~ well_made + consistent + poor_workman_r",
"WillPay =~ higher_price + lot_more + go_up",
"PrdDff =~ stands_out + unique",
"spend ~ PrdQl + WillPay + PrdDff",
sep = "\n"
)
brand_rep_pv <- lavaan::sem(data = brand_rep_scaled, model = brand_rep_pv_mdl)
#lavaan::summary(brand_rep_pv, standardized = T, fit.measures = T, rsquare = T)
semPlot::semPaths(brand_rep_pv, whatLabels = "est.std", edge.label.cex = .8, rotation = 2)
lavaan::standardizedSolution(brand_rep_pv) %>%
filter(op == "~") %>%
mutate_if(is.numeric, round, digits = 3)
## lhs op rhs est.std se z pvalue ci.lower ci.upper
## 1 spend ~ PrdQl 0.094 0.048 1.949 0.051 -0.001 0.189
## 2 spend ~ WillPay 0.182 0.052 3.486 0.000 0.080 0.284
## 3 spend ~ PrdDff 0.060 0.055 1.092 0.275 -0.047 0.167
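The R-squared values below can be extracted with something like:
lavaan::inspect(brand_rep_pv, what = "rsquare")  # R-squared for each endogenous variable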
## well_made consistent poor_workman_r higher_price lot_more
## 0.837 0.867 0.533 0.537 0.658
## go_up stands_out unique spend
## 0.816 0.951 0.866 0.072
There is a statistically significant relationship between one dimension of brand reputation (Willingness to Pay) and spending. At this point you may want to drop the other two dimensions. However, the R-squared is not good: only 7% of the variability in spending is explained by the three dimensions of our construct.
Factor scores represent individual respondents’ standing on a latent factor. While not used for scale validation per se, factor scores can be used for customer segmentation via clustering, network analysis and other statistical techniques.
brand_rep_cfa <- lavaan::cfa(brand_rep_pv_mdl, data = brand_rep_scaled)
brand_rep_cfa_scores <- lavaan::predict(brand_rep_cfa) %>% as.data.frame()
psych::describe(brand_rep_cfa_scores)
## vars n mean sd median trimmed mad min max range skew kurtosis
## PrdQl 1 559 0 0.88 -0.50 -0.19 0.09 -0.57 4.02 4.59 2.31 6.18
## WillPay 2 559 0 0.69 0.00 0.01 0.86 -1.15 1.16 2.31 -0.05 -1.18
## PrdDff 3 559 0 0.96 -0.04 -0.15 1.17 -0.88 2.54 3.41 1.09 0.35
## se
## PrdQl 0.04
## WillPay 0.03
## PrdDff 0.04
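The normality tests below were produced with something along these lines:
purrr::map(brand_rep_cfa_scores, shapiro.test)  # Shapiro-Wilk test for each factor score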
## $PrdQl
##
## Shapiro-Wilk normality test
##
## data: .x[[i]]
## W = 0.66986, p-value < 2.2e-16
##
##
## $WillPay
##
## Shapiro-Wilk normality test
##
## data: .x[[i]]
## W = 0.94986, p-value = 7.811e-13
##
##
## $PrdDff
##
## Shapiro-Wilk normality test
##
## data: .x[[i]]
## W = 0.82818, p-value < 2.2e-16
These scores are not normally distributed, which makes model-based clustering a good choice for modeling factor scores. Clustering here does not mean distance-based clustering, such as K-means. Mixture models treat the data as coming from a distribution which is itself a mixture of clusters. To learn more about model-based clustering, see the Hierarchical and Mixed Effects Models DataCamp course.
Factor scores can be extracted from a structural equation model and used as inputs in other models. For example, you can use the factor scores from the brand reputation dimensions as regressors in a regression on spending.
brand_rep_fs_reg_dat <- bind_cols(brand_rep_cfa_scores, spend = brand_rep$spend)
brand_rep_fs_reg <- lm(spend ~ PrdQl + WillPay + PrdDff, data = brand_rep_fs_reg_dat)
summary(brand_rep_fs_reg)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.1555354 0.1591195 44.9695738 3.363705e-187
## PrdQl 0.4260875 0.2062002 2.0663774 3.925620e-02
## WillPay 1.1365087 0.2805960 4.0503388 5.842799e-05
## PrdDff 0.1714031 0.2181813 0.7855993 4.324375e-01
The coefficients and r-squared of the lm() and sem() models closely resemble each other, but keeping the regression inside the lavaan framework provides more information (as witnessed in the higher estimates and r-squared). A construct, once validated, can be combined with a wide range of outcomes and models to produce valuable information about consumer behavior and habits.
Test-retest reliability is the ability to achieve the same result from a respondent at two closely-spaced points in time (repeated measures).
Suppose you had two surveys, identified by an id field.
brand_rep$id <- 1:nrow(brand_rep) # assumed here: testRetest() matches respondents by an id column
svy_1 <- brand_rep[sample(1:559, 300),] %>% as.data.frame()
svy_2 <- brand_rep[sample(1:559, 300),] %>% as.data.frame()
survey_test_retest <- psych::testRetest(t1 = svy_1, t2 = svy_2, id = "id")
## boundary (singular) fit: see ?isSingular
## [1] -0.08346322
A correlation < .7 is unacceptable, < .9 is good, and > .9 is very good. This one is unacceptable.
One way to check for replication is by splitting the data in half.
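The descriptive statistics below assume a combined data set svy that stacks the two samples with a time indicator; a sketch:
svy <- bind_rows(svy_1 %>% mutate(time = 1),
                 svy_2 %>% mutate(time = 2))
psych::describeBy(svy, group = svy$time)  # descriptive statistics by group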
##
## Descriptive statistics by group
## group: 1
## vars n mean sd median trimmed mad min max range skew
## time* 1 300 1.00 0.00 1 1.00 0.00 1.00 1.00 0.00 NaN
## well_made 2 300 1.50 0.82 1 1.33 0.00 1.00 5.00 4.00 2.26
## consistent 3 300 1.50 0.87 1 1.30 0.00 1.00 5.00 4.00 2.19
## poor_workman_r 4 300 1.37 0.80 1 1.18 0.00 1.00 5.00 4.00 2.68
## higher_price 5 300 2.63 1.44 2 2.53 1.48 1.00 5.00 4.00 0.41
## lot_more 6 300 3.43 1.47 4 3.54 1.48 1.00 5.00 4.00 -0.34
## go_up 7 300 3.21 1.43 3 3.26 1.48 1.00 5.00 4.00 -0.20
## stands_out 8 300 2.12 1.21 2 1.94 1.48 1.00 5.00 4.00 0.97
## unique 9 300 2.09 1.22 2 1.89 1.48 1.00 5.00 4.00 1.03
## one_of_a_kind 10 300 1.82 1.25 1 1.54 0.00 1.00 5.00 4.00 1.49
## spend 11 300 7.13 4.04 7 7.10 3.94 -5.59 18.92 24.51 0.02
## kurtosis se
## time* NaN 0.00
## well_made 6.08 0.05
## consistent 5.11 0.05
## poor_workman_r 7.76 0.05
## higher_price -1.17 0.08
## lot_more -1.34 0.09
## go_up -1.28 0.08
## stands_out 0.04 0.07
## unique 0.10 0.07
## one_of_a_kind 1.07 0.07
## spend -0.09 0.23
## ------------------------------------------------------------
## group: 2
## vars n mean sd median trimmed mad min max range skew
## time* 1 300 1.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 NaN
## well_made 2 300 1.52 0.89 1.00 1.32 0.00 1.00 5.00 4.00 2.21
## consistent 3 300 1.48 0.85 1.00 1.28 0.00 1.00 5.00 4.00 2.29
## poor_workman_r 4 300 1.39 0.81 1.00 1.20 0.00 1.00 5.00 4.00 2.60
## higher_price 5 300 2.46 1.36 2.00 2.33 1.48 1.00 5.00 4.00 0.58
## lot_more 6 300 3.34 1.50 3.00 3.42 2.97 1.00 5.00 4.00 -0.22
## go_up 7 300 3.05 1.45 3.00 3.07 1.48 1.00 5.00 4.00 -0.03
## stands_out 8 300 1.98 1.16 2.00 1.77 1.48 1.00 5.00 4.00 1.15
## unique 9 300 1.92 1.15 2.00 1.71 1.48 1.00 5.00 4.00 1.22
## one_of_a_kind 10 300 1.73 1.21 1.00 1.44 0.00 1.00 5.00 4.00 1.68
## spend 11 300 7.14 3.77 7.05 7.08 3.89 -1.96 16.99 18.94 0.11
## kurtosis se
## time* NaN 0.00
## well_made 5.26 0.05
## consistent 5.70 0.05
## poor_workman_r 7.40 0.05
## higher_price -0.85 0.08
## lot_more -1.43 0.09
## go_up -1.34 0.08
## stands_out 0.47 0.07
## unique 0.60 0.07
## one_of_a_kind 1.73 0.07
## spend -0.37 0.22
brand_rep_test_retest <- psych::testRetest(
t1 = filter(svy, time == 1),
t2 = filter(svy, time == 2),
id = "id")
## boundary (singular) fit: see ?isSingular
## [1] -0.08346322
If the correlation of scaled scores across time 1 and time 2 is greater than .9, that indicates very strong test-retest reliability. This measure can be difficult to collect because it requires the same respondents to answer the survey at two points in time. However, it’s a good technique to have in your survey development toolkit.
When validating a scale, it’s a good idea to split the survey results into two samples, using one for EFA and one for CFA. This works as a sort of cross-validation such that the overall fit of the model is less likely due to chance of any one sample’s makeup.
brand_rep_efa_data <- brand_rep[1:280,]
brand_rep_cfa_data <- brand_rep[281:559,]
efa <- psych::fa(brand_rep_efa_data, nfactors = 3)
efa$loadings
##
## Loadings:
## MR1 MR2 MR3
## well_made 0.879
## consistent 0.931
## poor_workman_r 0.795
## higher_price 0.139 0.640 0.137
## lot_more 0.868
## go_up 0.893
## stands_out 0.980
## unique 0.942
## one_of_a_kind 0.360 0.279 0.132
## spend 0.107 0.223
##
## MR1 MR2 MR3
## SS loadings 2.433 2.091 1.896
## Proportion Var 0.243 0.209 0.190
## Cumulative Var 0.243 0.452 0.642
brand_rep_cfa <- lavaan::cfa(brand_rep_mdl, data = brand_rep_cfa_data)
lavaan::inspect(brand_rep_cfa, what = "call")
## [[1]]
## lavaan::lavaan
##
## $model
## brand_rep_mdl
##
## $data
## brand_rep_cfa_data
##
## $model.type
## [1] "cfa"
##
## $int.ov.free
## [1] TRUE
##
## $int.lv.free
## [1] FALSE
##
## $auto.fix.first
## [1] TRUE
##
## $auto.fix.single
## [1] TRUE
##
## $auto.var
## [1] TRUE
##
## $auto.cov.lv.x
## [1] TRUE
##
## $auto.cov.y
## [1] TRUE
##
## $auto.th
## [1] TRUE
##
## $auto.delta
## [1] TRUE
##
## $auto.efa
## [1] TRUE
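The fit measures below can be extracted as before (sketch):
lavaan::fitMeasures(brand_rep_cfa, c("cfi", "tli", "rmsea"))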
## cfi tli rmsea
## 0.97503646 0.95888359 0.09006769