Reliability of pre, post measures
Notes
Theory
brief theory of latent variable models
In the context of answering survey questions, the Cognitive Aspects of Survey Methodology framework (also called the „Optimizing-Satisficing-Model“) tries to explain how people finally arrive at a response, which is a great heuristic for identifying possible sources of error (Tourangeau, Rips, and Rasinski 2000; Moosbrugger and Kelava 2020):
Despite not knowing the exact processes of answering a question (black box), we fundamentally assume that the answer / reporting depends on a score on a latent variable:
the central task of test theory is to determine the relationship between test behaviour and the (psychological) characteristic to be assessed
Latent variable definition: random variables whose realized values are hidden (Bollen 2002; Borsboom 2008)
A possible operationalisation of a latent variable is a linear measurement model with the equation Y = \lambda * \eta + \epsilon.
this corresponds to the fundamental equation of Classical Test Theory: Y = T + \epsilon
\rightarrow models which are dealing with latent variables are called Latent Variable Models (Skrondal and Rabe-Hesketh 2004, 2007), whereby the central aim of latent variable models is to infer unobservable (latent) psychological traits or abilities from responses to test items
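To make the linear measurement model concrete, here is a minimal R sketch with purely illustrative values (not taken from the cited sources): a single latent variable \eta generates an observed indicator via Y = \lambda \eta + \epsilon.
# minimal sketch of Y = lambda * eta + epsilon (illustrative values)
set.seed(1)
n      <- 1000
eta    <- rnorm(n)                          # latent variable (standardized)
lambda <- 0.7                               # factor loading
eps    <- rnorm(n, sd = sqrt(1 - lambda^2)) # measurement error, independent of eta
Y      <- lambda * eta + eps                # observed indicator
cor(Y, eta)^2                               # variance in Y explained by eta, ~ lambda^2 = 0.49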
theoretical aspects “What is a Latent Variable?”
- the theoretical status of latent variables has not been clarified (Schurig 2017)
- are these variables representations of real entities or just useful inventions?
- Advantage: the use of latent variables allows more generalisable reasoning than manifest variables
- latent variables need a substantive scientific foundation, whereby the bridging problem between observed and latent variable must be solved by theoretical assumptions and statistical modelling
Central is the local independence assumption: in latent variable models it is assumed that, once the latent variable (e.g., a psychological trait or ability) is accounted for, the observed variables (such as responses to test items) are statistically independent of each other. This means that any correlation between the observed variables is fully explained by the latent variable, and no further direct relationships exist between the observed variables. In essence, the latent variable “absorbs” the shared variance, allowing the model to “get rid” of any inter-dependencies among the observed variables, simplifying the analysis (Skrondal and Rabe-Hesketh 2007).
Pr(y_j \mid \eta_j) = \prod_{i=1}^{n} Pr(y_{ij} \mid \eta_j)
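As a minimal illustration of local independence (simulated, purely illustrative values): two indicators of the same latent variable are correlated, but become approximately uncorrelated once the latent variable is conditioned on.
# minimal sketch: local independence given the latent variable (illustrative values)
set.seed(2)
n   <- 5000
eta <- rnorm(n)                               # latent variable
y1  <- 0.8 * eta + rnorm(n, sd = 0.6)         # indicator 1
y2  <- 0.8 * eta + rnorm(n, sd = 0.6)         # indicator 2
cor(y1, y2)                                   # clearly positive: shared cause eta
cor(resid(lm(y1 ~ eta)), resid(lm(y2 ~ eta))) # ~ 0 once eta is controlled for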
This assumption does not hold in more complex data sets (e.g., multi-dimensionality, response styles, …); in more complex models (e.g., the bifactor model) the variance of the indicators of single measurement models (CFAs) is divided into common and unique variance:
The variance shared between the indicators is the communality; the remaining variance is the unique variance, which is divided into indicator-specific method variance (specific) and measurement error variance.
quality criteria of measurements
Measurements and test administration should be carried out taking into account three central quality criteria of tests, which build on each other (no reliable measurement is possible without objectivity, etc.):
\text{objectivity} \rightarrow \text{reliability} \rightarrow \text{validity}
- Objectivity: Test score is objective if it is independent of any influences outside the tested person (e.g., situational conditions, experimenter – all exogenous variables whose covariance structure is not explained by the statistical model, including error terms, unobserved influencing variables, and exogenous latent constructs)
- Implementation objectivity (Durchführungsobjektivität): Standardization of implementation conditions (writing a test manual, training test leader, standardization of all other conditions).
- Objectivity of evaluation (Auswertungsobjektivität): The interpretation of the test result is not dependent on the person who evaluates the test (measurable by inter-rater reliability, such as Kendall’s coefficient of concordance).
- Objectivity of interpretation (Interpretationsobjektivität): Different test users come to the same conclusions with identical test scores.
- Reliability: The degree of accuracy (precision) with which a test measures whatever it measures. The focus here is on measurement accuracy, not on what is measured. Reliability is demonstrated theoretically by the fact that repeated measurements under the same conditions produce the same measurement results (the central contribution to the development of reliability measurement is made by classical test theory, which establishes a theory of measurement error).
- Reliability can be estimated by different methods, often as a measure of internal consistency - Cronbach’s Alpha (a measure of how items in a scale correlate with one another) is commonly used.
Classical test theory assumes that the test performance of a person on item i is composed as x_{i} = \tau_{i} + \epsilon_{i}. Here x_{i} is the observed item response, \tau_{i} is the person’s true score on item i, and \epsilon_{i} is the measurement error, which is assumed to be unbiased (if there are systematic components in the errors, apply models which can do variance splitting); see the small simulation sketch below.
- Validity: A test is considered valid if it actually measures the characteristic it is supposed to measure and not some other characteristic. The measurement of validity is done in two steps:
- Via structure-searching (such as exploratory factor analysis) and structure-testing (such as confirmatory factor analysis) procedures, construct validity is determined. This indicates the extent to which conclusions can be drawn from test results, for example, about psychological personality traits.
- The agreement of the results of the individual constructs should be high with constructs that measure the same or similar characteristics (convergent validity), and the agreement with results from constructs that measure other characteristics should be low (discriminant validity). This can be analyzed via (latent) correlations, see also construct validity (Cronbach and Meehl 1955)
Importantly, more recently there is also an argument-based approach to validation: to validate an interpretation or use of measurements is to evaluate the rationale, or argument, for the proposed conclusions and decisions … Ultimately, the need for validation derives from the scientific and social requirement that public claims and decisions need to be justified:
- interpretive argument: specifies the proposed interpretations and applications of assessment results by laying out a network of inferences and assumptions leading from the observed performances to the conclusions and decisions based on the assessment scores
- validity argument: provides an evaluation of the interpretive argument’s coherence and the plausibility of its inferences and assumptions
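To make the CTT decomposition x_{i} = \tau_{i} + \epsilon_{i} from the reliability criterion above concrete, here is a minimal simulation sketch (illustrative values): reliability is the share of true-score variance in the observed variance, and the correlation of two parallel measurements estimates the same quantity.
# minimal sketch of classical test theory: observed score = true score + unbiased error (illustrative values)
set.seed(3)
n   <- 10000
tau <- rnorm(n)                 # true scores, variance = 1
x1  <- tau + rnorm(n, sd = 0.5) # two parallel measurements of the same true score
x2  <- tau + rnorm(n, sd = 0.5)
var(tau) / var(x1)              # reliability = true-score variance / observed variance, here 1 / 1.25 = 0.8
cor(x1, x2)                     # correlation of parallel tests estimates the same reliability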
A variety of further quality criteria of indicators were developed by the „Key National Indicators Initiative“ (outdated, ~ 2005).
To reflect on all possible factors influencing the quality of a survey, there is also the concept of the Total Survey Error (see Biemer et al. 2017; Groves and Lyberg 2010).
assumptions of the 1- and 2-parameter logistic models (also called Rasch and Birnbaum models)
Before running any statistical model, the model's assumptions are normally tested - at least if the model is sensitive to the violation of a specific assumption. For example, the t-test is fairly robust to violations of normality, so pragmatically it is often not necessary to test this assumption.
For an important “Item Response Theory” (IRT) model, the Rasch model (1PL), we have the following assumptions (from my master thesis):
- Unidimensionality: The probability of a correct response depends only on a single latent trait and is determined by the model parameters \theta_v, \beta_i. Besides the model parameters, there are no other influencing variables \varphi: P(X_{vi} = 1 \mid \theta_v, \beta_i, \varphi) = P(X_{vi} = 1 \mid \theta_v, \beta_i)
- Local stochastic independence: When the person ability \theta_v is held constant at a specific value, the correlation between any possible item pair X_{vi}, X_{vj} in the test disappears (where i \neq j): X_{vi} \perp X_{vj} \mid \theta_v, \; \forall \, i \neq j
- Sufficiency of sum scores: The sum scores R_v = \sum_{i=1}^{k} X_{vi} of a test with length k are sufficient for estimating a person’s ability \theta_v. The same applies analogously to the item scores C_i = \sum_{v=1}^{n} X_{vi}
- Monotonicity: The probability of a correct response to an item x_{vi} increases monotonically with higher values of person ability \theta. The more able a person is, the more likely they are to answer an item correctly. This is expressed in the ICC f(x_{vi} \mid \theta_v, \beta_i) as follows: \theta_v > \theta_w: f(x_{vi} \mid \theta_v, \beta_i) > f(x_{wi} \mid \theta_w, \beta_i), \; \forall \, \theta_v, \theta_w
In the 2PL model, each item has additionally its own discrimination parameter, allowing some items to be better at differentiating between individuals with slightly different abilities.
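A hedged sketch of how such models could be fitted: the notes do not prescribe a package, but the mirt package is one common choice (assumed to be installed here). Dichotomous responses are simulated under a 1PL model, then a 1PL and a 2PL are fitted and compared.
# minimal sketch: fit and compare a 1PL (Rasch) and a 2PL model with the 'mirt' package (assumption: mirt is installed)
library(mirt)
set.seed(4)
n <- 500; k <- 6
theta <- rnorm(n)                         # person abilities
beta  <- seq(-1.5, 1.5, length.out = k)   # item difficulties
p     <- plogis(outer(theta, beta, "-"))  # 1PL response probabilities P(X = 1) = logistic(theta - beta)
resp  <- as.data.frame((matrix(runif(n * k), n, k) < p) * 1)
mod_1pl <- mirt(resp, 1, itemtype = "Rasch", verbose = FALSE)
mod_2pl <- mirt(resp, 1, itemtype = "2PL",   verbose = FALSE)
coef(mod_2pl, IRTpars = TRUE, simplify = TRUE) # discrimination (a) and difficulty (b) per item
anova(mod_1pl, mod_2pl)                        # the 1PL is nested in the 2PL, so the models can be compared
plot(mod_1pl, type = "trace")                  # item characteristic curves (monotonically increasing)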
If the 1PL or 2PL model does not fit, this can have multiple reasons:
- Multidimensionality, items are influenced by more than one latent trait
- Poorly fitting items or data: some items behave in an unexpected or erratic manner, e.g., an item may not exhibit the expected increasing probability of a correct response as ability increases, especially if the item discrimination parameter \alpha_i is estimated to be negative (monotonicity violated)
- Ceiling or floor effects: the test contains items that are either too easy or too difficult for the population being measured, so the probability of a correct response might not vary meaningfully with ability over a range of \theta values.
- …
To solve these problems, normally:
- the assumptions of the models are tested (e.g., Exploratory Factor Analysis to test for uni-dimensionality; see the sketch after this list)
- if necessary, the data is accounted for by more complex statistical models (e.g., computing Multidimensional Rasch Models)
in the context of survey data these are normally models from the model family of the “Latent Variable Models” (see above)
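A minimal sketch of such a dimensionality check with the psych package (also used in the simulation below), on purely illustrative simulated data with two underlying factors:
# minimal sketch: checking (uni-)dimensionality with parallel analysis and EFA (illustrative simulated data)
library(psych)
set.seed(5)
f1 <- rnorm(300)
f2 <- 0.3 * f1 + sqrt(1 - 0.3^2) * rnorm(300)   # second factor, correlated .3 with the first
make_item <- function(f) 0.8 * f + rnorm(300, sd = 0.6)
items <- data.frame(A = make_item(f1), B = make_item(f1), C = make_item(f1),
                    D = make_item(f2), E = make_item(f2), F = make_item(f2))
fa.parallel(items)               # parallel analysis suggests 2 factors -> unidimensionality violated
print(fa(items, nfactors = 2))   # EFA: loadings of the two-factor solution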
Test reliability of pre, post measures
Correction for Attenuation theory
see YouTube video: https://www.youtube.com/watch?v=jJ-qLImQYZs
attenuation formula developed by Charles Spearman, based on “Classical Test Theory”
Corr(\tau_1, \tau_2) = \frac{Corr(Y_1, Y_2)}{\sqrt{Rel(Y_1) \cdot Rel(Y_2)}}
Rule of thumb: if the reliability of the measures Y_1, Y_2 is low, there is shrinkage (attenuation), which leads to the observed correlation underestimating the true correlation
just imagine: the lower the reliabilities, the smaller the denominator, and the larger the corrected (true) correlation relative to the observed one
fairly reliable items:
r12 = .174 # corr in numerator
re1 = .81  # reliability 1 in denominator
re2 = .88  # reliability 2 in denominator
r12 / sqrt(re1 * re2) # plug into formula
[1] 0.206094
non-reliable items:
r12 = .174 # corr in numerator
re1 = .65  # reliability 1 in denominator
re2 = .59  # reliability 2 in denominator
r12 / sqrt(re1 * re2) # plug into formula
[1] 0.2809743
only if the items are perfectly reliable is there no attenuation:
! not a plausible assumption, see classical test theory, which is basically a measurement (error) theory
r12 = .174 # corr in numerator
re1 = 1    # reliability 1 in denominator
re2 = 1    # reliability 2 in denominator
r12 / sqrt(re1 * re2) # plug into formula
[1] 0.174
so we usually use latent variables
we compute a “Confirmatory Factor Analysis” (CFA) containing the measurement models of the measures Y_1, Y_2, whereby this “latent” correlation is corrected for the unreliability of the measures
you typically get higher (absolute) correlations, or in the context of regression models higher coefficients, thereby reducing Type II errors, …
simulation study
using the following R package: https://cran.r-project.org/web/packages/faux/
if you want to change the corr matrix, just ask ChatGPT: https://chatgpt.com/share/6718a0ff-05f4-8007-b424-4af35980fc49
set.seed(123)
# Load necessary libraries
library(faux)
library(psych)
# Set up a correlation matrix
# We have two measurement models with three items each,
# with correlations of 0.7 within each measurement model,
# and correlations of 0.3 between the two models.
cor_matrix <- matrix(c(
1, 0.7, 0.7, 0.3, 0.3, 0.3, # Items A, B, C with D, E, F
0.7, 1, 0.7, 0.3, 0.3, 0.3, # Same structure
0.7, 0.7, 1, 0.3, 0.3, 0.3,
0.3, 0.3, 0.3, 1, 0.7, 0.7,
0.3, 0.3, 0.3, 0.7, 1, 0.7,
0.3, 0.3, 0.3, 0.7, 0.7, 1
), nrow=6, byrow=TRUE)
# Generate data with the specified correlation structure
data <- rnorm_multi(
n = 300,
mu = rep(3, 6), # Set mean of 3 for all items
sd = rep(1, 6), # Set standard deviation of 1 for all items
r = cor_matrix # Use the defined correlation matrix
)
# Assign column names to represent items
colnames(data) <- c("A", "B", "C", "D", "E", "F")
# Check the correlation matrix
# cor(data)
# Visualize the correlation matrix
psych::cor.plot(cor(data))
Latent variable approach:
true correlation corrected for un-reliability is around .393
# Load necessary libraries
library(lavaan)
library(semPlot)
# fit model
myModel <- '
f1 =~ A + B + C
f2 =~ D + E + F
'
fit <- cfa(myModel, data=data)
summary(fit, standardized = TRUE)
lavaan 0.6.17 ended normally after 26 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 13
Number of observations 300
Model Test User Model:
Test statistic 7.458
Degrees of freedom 8
P-value (Chi-square) 0.488
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
f1 =~
A 1.000 0.738 0.806
B 0.989 0.074 13.362 0.000 0.731 0.773
C 1.126 0.080 14.005 0.000 0.831 0.841
f2 =~
D 1.000 0.784 0.831
E 1.121 0.070 16.105 0.000 0.879 0.865
F 1.032 0.067 15.402 0.000 0.809 0.814
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
f1 ~~
f2 0.227 0.043 5.304 0.000 0.393 0.393
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.A 0.293 0.037 7.942 0.000 0.293 0.350
.B 0.359 0.040 8.889 0.000 0.359 0.402
.C 0.286 0.042 6.738 0.000 0.286 0.292
.D 0.276 0.034 8.082 0.000 0.276 0.309
.E 0.261 0.038 6.787 0.000 0.261 0.252
.F 0.333 0.039 8.632 0.000 0.333 0.337
f1 0.545 0.070 7.785 0.000 1.000 1.000
f2 0.615 0.074 8.340 0.000 1.000 1.000
semPlot::semPaths(object = fit, what = "std")
Manifest variable approach (simply computing mean scores):
not corrected for the unreliability of the measurements \rightarrow the correlation is lower
Y1 <- rowMeans(data[,1:3])
Y2 <- rowMeans(data[,4:6])
cor(Y1, Y2)
[1] 0.336648
check reliability of measures:
# average inter-item correlation of the first factor
# (note: averaging the full correlation matrix includes the diagonal of 1s, which inflates the value;
#  the off-diagonal average is about .65, matching average_r in the alpha output below)
mean(colMeans(x = cor(data[,1:3])))
[1] 0.766947
# compute Cronbach's Alpha
re_factor1 <- psych::alpha(subset(data, select = c(A, B, C)))
re_factor1$total
  raw_alpha std.alpha   G6(smc) average_r      S/N        ase     mean       sd  median_r
  0.8478307 0.8480644 0.7895914 0.6504204 5.581738 0.01516258 2.969903 0.833271 0.6569992
re_factor2 <- psych::alpha(subset(data, select = c(D, E, F)))
re_factor2$total
  raw_alpha std.alpha   G6(smc) average_r      S/N        ase     mean        sd  median_r
  0.8745568 0.8750114 0.8240772 0.7000219 7.000729 0.01250763 2.979012 0.8823433 0.7000894
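As a cross-check tying the two approaches together (reusing the objects created above), plugging the manifest correlation and the two alpha estimates into Spearman's attenuation formula recovers approximately the latent correlation from the CFA:
# disattenuate the manifest correlation with the two reliability (alpha) estimates from above
r_manifest <- cor(Y1, Y2)
rel1 <- re_factor1$total$raw_alpha
rel2 <- re_factor2$total$raw_alpha
r_manifest / sqrt(rel1 * rel2)   # approx. .39, close to the latent correlation of .393 from the CFA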
Random stuff
cite literature in Quarto (rmarkdown)
- Blah blah (see Yarkoni and Westfall 2017, 33–35; also Speck et al. 2017, ch. 1).
- Blah blah (Yarkoni and Westfall 2017, 33–35).
- Blah blah (Yarkoni and Westfall 2017; Speck et al. 2017).
- Rutkowski et al. say blah (2017).
- Yarkoni and Westfall (2017) say blah.