General Factor Modeling

  • GFM in Research on Scientific Thinking
    • Review
    • Simulations
    • Discussion
  • GFM in Broader Psychology
    • Intelligence
    • Personality
    • Clinical
  • GFM Issues
    • Statistical
    • Theoretical
    • Indeterminacy

General Factor Modeling

“A general factor (GF) is assumed to influence all observed variables”; “a general second-order factor (GF) having an influence on all first-order factors” (Eid, Geiser, Koch, & Heene, 2016).

General factor GF: A latent random variable that directly or indirectly predicts all observed variables that belong to the represented construct.

General Factor Modeling

  • Latent variable (Borsboom, 2008):
    • Inference from data structure to variable structure is prone to error (epistemic accessibility).
  • Observed variable:
    • Relation between variable structure and data structure is deterministic, causally isolated, and of equivalent cardinality.
  • Psychological variables: Latent.

General Factor Modeling

GFM in Research on Scientific Thinking

  • GFM in Research on Scientific Thinking
    • Review
    • Simulations
    • Discussion

Scientific Thinking

  • Hypothesis generation, evidence generation, evidence evaluation, drawing conclusions (Mayer, 2007).
  • Processes for intentional knowledge seeking and coordination of theory and evidence (Mayer et al., 2014).

Scientific Thinking

  • Long interest in education, development, assessment (Blair, 1940; Nerring, 1918; Piaget & Inhelder, 1958).

  • Two waves of test development (Opitz, Fischer, & Heene, submitted)
    • wave I: 1970-1990.
    • wave II: since the 2000s. Following PISA etc.: application of the Rasch model, i.e., GFM.

Research Questions

  1. Practices in the application of the Rasch Model.

  2. Practices in the interpretation of the Rasch Model.

The Rasch Model

\(p(x_{pi})=\frac{\exp(x_{pi}(\theta_p-\sigma_i))}{1+\exp(\theta_p-\sigma_i)}\)

person ability \(\theta_p\), item difficulty \(\sigma_i\)

  • Why is this GFM?
    • Random Effect: All systematic answer differences depend on a single person characteristic.
    • GLMM: Generalized linear mixed model with crossed random effects and a logit link function.
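
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the talk: the function names, the sample size, and the parameter values are assumptions chosen for the example.

```python
import numpy as np

def rasch_prob(theta, sigma):
    """P(X_pi = 1) for every person-item pair under the Rasch model."""
    # theta: person abilities, shape (n_persons,)
    # sigma: item difficulties, shape (n_items,)
    logit = theta[:, None] - sigma[None, :]
    return 1.0 / (1.0 + np.exp(-logit))

def simulate_rasch(theta, sigma, rng):
    """Draw a persons-by-items matrix of 0/1 responses."""
    p = rasch_prob(theta, sigma)
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, size=500)  # hypothetical person abilities
sigma = np.linspace(-1.5, 1.5, 8)       # hypothetical item difficulties
X = simulate_rasch(theta, sigma, rng)
```

The only person-side quantity entering the response probability is the single value theta_p, which is what makes the model an instance of GFM.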

The Rasch Model

\(p(x_{pi})=\frac{\exp(x_{pi}(\theta_p-\sigma_i))}{1+\exp(\theta_p-\sigma_i)}\)

person ability \(\theta_p\), item difficulty \(\sigma_i\)

  • Why Rasch?
    • Sufficient statistics: All information is in the sum score.
    • Specific objectivity: Same information for any persons and items.
    • Invariant measurement: Linking of items and tests in large-scale and longitudinal studies.
    • 2PL: No measurement and no exchangeability; everyone wants Rasch!
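
The sufficiency claim can be checked numerically. The sketch below, with hypothetical item difficulties and helper names, computes the probability of every response pattern that has a given sum score and shows that, once the sum score is fixed, these conditional probabilities no longer depend on the person ability.

```python
from itertools import product
import numpy as np

def pattern_prob(pattern, theta, sigma):
    """Probability of one 0/1 response pattern under the Rasch model."""
    x = np.asarray(pattern)
    p = 1.0 / (1.0 + np.exp(-(theta - sigma)))
    return float(np.prod(p**x * (1.0 - p)**(1 - x)))

def conditional_on_sum(theta, sigma, score):
    """P(pattern | sum score) for all patterns with the given sum score."""
    patterns = [pt for pt in product([0, 1], repeat=len(sigma))
                if sum(pt) == score]
    probs = np.array([pattern_prob(pt, theta, sigma) for pt in patterns])
    return patterns, probs / probs.sum()

sigma = np.array([-1.0, 0.0, 1.5])  # hypothetical item difficulties
_, low = conditional_on_sum(-2.0, sigma, score=1)   # low-ability person
_, high = conditional_on_sum(2.0, sigma, score=1)   # high-ability person
print(np.allclose(low, high))
```

Under a model with unequal discriminations the same check would fail, because the weighted rather than the simple sum score would carry the information.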

The Rasch Model: Strong Assumptions

  • Unidimensionality (GFM)
    • One coherent psychological trait
  • Local (stochastic) independence
    • No additional systematic influences on answers
  • Homogeneous item discriminations
    • Parallel Item Characteristic Curves (equal factor loadings)

Practices in the Field: Rasch Application

| Reference | Infit | Criterion | Reliability | LRT | Item removal | Software |
|---|---|---|---|---|---|---|
| Mayer et al. (2014) | x | - | EAP/PV | - | - | ConQuest |
| Koerber et al. (2014) | x | 0.85-1.15 (-) | EAP/PV | - | x | ConQuest |
| Hartmann et al. (2015) | x | - | EAP/PV | - | x | ConQuest |
| Nowak et al. (2013) | x | 0.8-1.2 (Adams, 2002) | EAP/PV | x | x | ConQuest |
| Grube (2010) | x | 0.8-1.2 (Adams, 2000) | EAP/PV | x | x | ConQuest |
| Heene (2007) | x | 0.8-1.2 (Wright, 2000) | PSR/ISR | - | x | ConQuest, WS, FC, WM |
| Brown et al. (2010) | x | - | PSR | - | - | ConQuest |

Practices in the Field: Interpretation

| Reference | Theoretical models | Fitted models | Best fit | Reliability | Itemfit |
|---|---|---|---|---|---|
| Mayer et al. (2014) | 4D | 1D | na | 1D | 1D |
| Koerber et al. (2014) | 1D, 5D | 1D | na | 1D | 1D |
| Hartmann et al. (2015) | 1D | 1D | na | 1D | 1D |
| Nowak et al. (2013) | 1D, 3D | 1D, 3D | 3D | 1D | 1D |
| Grube (2010) | 4D | 1D, 4D | 4D | 1D | 1D |
| Heene (2007) | 1D | 1D | na | 1D | 1D |
| Brown et al. (2010) | 1D | 1D | na | 1D | 1D |

Practices in the Field: Rasch Application

  • 7 instruments based on Rasch model, mostly since 2010 (Review: Opitz, Fischer, & Heene, submitted)
  • Typical approach is to fit unidimensional Rasch model in ConQuest software, remove items based on infit statistics

Itemfit

  • Based on the individual response residuals
    \[ \begin{aligned} R_{pi} &= X_{pi} - E_{pi}\\ Z_{pi}^2 &= \frac{R_{pi}^2}{\mathrm{VAR}(X_{pi})} \end{aligned} \]
  • \[ \mathrm{INFIT}_i = \frac{\sum^n_{p=1}R^2_{pi}}{\sum^n_{p=1}\mathrm{VAR}(X_{pi})} \]
  • Degree of item-level deviation from expected scores given the Rasch model
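
As a rough illustration of these formulas, the following sketch computes the infit mean square (and, from the squared standardized residuals, the outfit mean square) for simulated Rasch data. It plugs in the true person and item parameters, which is an assumption made to keep the example short; software such as ConQuest works with estimates instead. Sample size and difficulties are hypothetical.

```python
import numpy as np

def infit_outfit(X, theta, sigma):
    """Item infit and outfit mean squares for given person and item parameters."""
    E = 1.0 / (1.0 + np.exp(-(theta[:, None] - sigma[None, :])))  # E_pi
    V = E * (1.0 - E)                    # VAR(X_pi) for a 0/1 response
    R = X - E                            # residuals R_pi
    infit = (R**2).sum(axis=0) / V.sum(axis=0)
    outfit = (R**2 / V).mean(axis=0)     # mean of Z_pi^2 over persons
    return infit, outfit

rng = np.random.default_rng(7)
theta = rng.normal(size=2000)            # hypothetical abilities
sigma = np.linspace(-1.0, 1.0, 10)       # hypothetical difficulties
P = 1.0 / (1.0 + np.exp(-(theta[:, None] - sigma[None, :])))
X = (rng.random(P.shape) < P).astype(int)

infit, outfit = infit_outfit(X, theta, sigma)
print(np.round(infit, 2))
```

When the data follow the Rasch model, the infit values scatter closely around 1, which is why cut-offs such as 0.8 to 1.2 are in common use.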

Itemfit: Simulations

  • Dichotomous data

Itemfit: Simulations

  • Sim1: Unequal item discriminations
    • \(d_i \sim \text{lognormal}(0, \sigma)\)
    • \(\sigma\) either .1, .3, .5
  • Sim2: Typical theoretical model
    • 3-5 correlated facets of Scientific Reasoning assumed
    • Simulated factor structure:
    • \(\left( \begin{array}{cccc} 1.0 & 0.6 & 0.5 & 0.4\\ 0.6 & 1.0 & 0.7 & 0.4\\ 0.5 & 0.7 & 1.0 & 0.3\\ 0.4 & 0.4 & 0.3 & 1.0 \end{array} \right)\)
    • How do infit statistics respond to such data?
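
The two data-generating designs could be set up roughly as follows. This is a sketch under stated assumptions, not the original simulation code: the number of persons and items, the distribution of item difficulties, and the assignment of items to facets are all hypothetical choices; only the lognormal discriminations and the facet correlation matrix come from the slides.

```python
import numpy as np

def simulate_2pl(theta, sigma, disc, rng):
    """Dichotomous responses with item-specific discriminations d_i."""
    logit = disc[None, :] * (theta[:, None] - sigma[None, :])
    p = 1.0 / (1.0 + np.exp(-logit))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(2016)
n_persons, n_items = 1000, 20          # assumed sizes, not from the slides
sigma = rng.normal(size=n_items)       # assumed difficulty distribution

# Sim1: one trait, discriminations d_i ~ lognormal(0, s), s in {.1, .3, .5}
theta = rng.normal(size=n_persons)
sim1 = {}
for s in (0.1, 0.3, 0.5):
    disc = rng.lognormal(mean=0.0, sigma=s, size=n_items)
    sim1[s] = simulate_2pl(theta, sigma, disc, rng)

# Sim2: four correlated facets, each item measures exactly one facet
R = np.array([[1.0, 0.6, 0.5, 0.4],
              [0.6, 1.0, 0.7, 0.4],
              [0.5, 0.7, 1.0, 0.3],
              [0.4, 0.4, 0.3, 1.0]])
facets = rng.multivariate_normal(np.zeros(4), R, size=n_persons)
item_facet = np.arange(n_items) % 4    # assumed item-to-facet assignment
theta_items = facets[:, item_facet]    # ability on the facet of each item
p = 1.0 / (1.0 + np.exp(-(theta_items - sigma[None, :])))
sim2 = (rng.random(p.shape) < p).astype(int)
```

Fitting a unidimensional Rasch model to `sim1[s]` or `sim2` and inspecting the item infit values then shows how many items the usual cut-offs would flag.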

Itemfit: Simulations

  • Item infit statistics do not test the assumptions of the Rasch model; they test its predictions, given that the model holds.
  • A few items have to be removed.
  • Reliability is low, similar to the reviewed articles (.5 to .6).

Practices in the Field: Interpretation

  • Conclusions in reviewed articles:
    • scale measures unitary construct.
    • support for unidimensional theoretical model.
    • instrument possesses adequate reliability and validity.

Discussion

  • Inconsistent assumptions, models, testing, and inferences?
    • Different paradigms (Andersen, 2010; Linacre, 2010)
  • Two traditional perspectives on Rasch: Utility vs. hypothesis testing (cf. Linacre, 2010).
  • Researchers apply practices of the utility/Rasch school. Potential reasons?
    • Following large scale programs
    • Limiting software (ConQuest)
    • Lack of knowledge about Rasch foundations and model tests

School 1: Rasch-Paradigm

  • Testing fit of data to model.
    • Rasch makes measurement from psychological data.
    • Sufficient fit: Valid measurement.
    • Choose your grain size.
  • Itemfit strongly suggested, to test affordances/predictions of the model (Wu, Wilson, Wang, Boone)
    • Points towards measurement problems.
    • Central criterion for fit of data.

School 2: IRT-Paradigm

  • Testing of fit of model to the data.
    • Rasch tests whether measurement took place.
    • Absolute fit: Valid measurement.
    • Examine the grain size.
  • Itemfit strongly criticized (Christensen & Kreiner, 2013; Heene et al., 2014; Smith, 1995; Smith, 1996; Smith et al., 1998)
    • Underpowered.
    • Apply only after inspection of global model fit.
    • Item removal can decrease construct validity.
    • Itemfit statistics are, in general, not an adequate means of testing the Rasch model.
    • …if Rasch does not fit:

School 2: IRT-Paradigm

  • Testing the Rasch model
    • Andersen test (Person homogeneity; Andersen, 1973)
    • Martin-Löf test (Item homogeneity; Christensen et al., 2002)
    • Recursive Partitioning (continuous DIF - Rasch Trees; Strobl et al., 2013)
    • M2 statistic (Maydeu-Olivares, 2013)
    • Nonparametric statistics (t1, t10; Koller et al., 2015)
    • Posterior predictive checking (Bayesian; Fox, 2010; Sinharay, 2006)
    • SEM-like statistics (RMSEA, CFI, SRMR; Maydeu-Olivares, 2014)
    • ! Battle-testing !

School 2: IRT-Paradigm

  • Match of theoretical and statistical model
    • IRT: Multidimensional Rasch model, 2PL, Mixtures (Fischer & Molenaar, 1995)
    • Nonparametric IRT: Mokken Scaling (van der Graaf et al., 2015)
    • Applied IRT: cognitive diagnosis modeling (de la Torre & Minchen, 2014)
    • R-Software

Discussion

  • Psychological: What is ST?
    • Dynamics/Intercorrelations, or a psychological trait?
  • Pedagogical: How to intervene?
    • Differential validities?

Discussion

  • Acknowledgement and modeling of complexity
    • Network modeling (Schmittman et al., 2013)
    • Longitudinal and experimental designs
  • Acknowledgement and modeling of uncertainty
    • Estimate and compare different models
    • Technical appendices
  • Systematic research into Rasch-schools.
    • Empirical practices?
    • Consequences on model testing?
    • American school (IRT/2PL), European school (Rasch)?
    • Vienna school (Hypothesis testing)?

GFM in Broader Psychology

  • GFM in Broader Psychology
    • Intelligence
    • Personality
    • Clinical

Intelligence

Charles Spearman (1904)

Intelligence

Any alternative after 100 years?

Intelligence

  • Thomson (1916): Bonds:
    • elements of activity at the lower level are entirely specific, but those at the higher level are such that they may come into play in different activities. Any activity is a sample of these elements.
  • van der Maas et al., 2006: Mutualism.
  • Consensus still on G (Stern & Neubauer, 2016).
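
Thomson's sampling idea can be illustrated with a small simulation. In this sketch, which uses assumed sizes and an assumed sampling rate of one half, every person has many mutually independent bonds and every test score is the sum of a random subset of them. No general factor is built in, yet all tests correlate positively because their samples of bonds overlap.

```python
import numpy as np

rng = np.random.default_rng(1916)
n_persons, n_bonds, n_tests = 2000, 200, 6   # assumed sizes for illustration

# Each person has many mutually independent elementary bonds.
bonds = rng.normal(size=(n_persons, n_bonds))

# Each test draws on its own random sample of the bonds (here about half).
sampled = rng.random((n_tests, n_bonds)) < 0.5

# A test score is the sum of the bonds that the test happens to sample.
scores = bonds @ sampled.T.astype(float)

corr = np.corrcoef(scores, rowvar=False)
off_diag = corr[~np.eye(n_tests, dtype=bool)]
print(off_diag.min() > 0)
```

A one-factor model would fit such a correlation matrix well, which is the point of the alternative: a positive manifold alone does not identify a single common cause.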

Personality

Musek (2007)

Heavily criticized (e.g., Prinz, 2014)

Clinical

  • "The p factor explains why it is challenging to find causes, consequences, biomarkers, and treatments with specificity to individual mental disorders"
  • "…researchers should no longer assume a specific relation between the disorder they study and a biomarker/ cause/consequence/treatment without empirical verification. Rather, our finding suggests the default assumption must be that biomarkers/causes/consequences/ treatments relate first to p." (Caspi et al., 2014, p. 137).

GFM in broader Psychology

  • Else? (Musek, 2007).
  • emotionality (Diener, Smith, & Fujita, 1995; Larsen & Diener, 1992; Tellegen, 1985; Watson & Clark, 1993)
  • motivation (Cattell, 1957; Cattell, Radcliffe, & Sweney, 1963; Elliot & Thrash, 2002)
  • self-concept (Marsh, 1990; Marsh & Shavelson, 1985; Marsh, Byrne, & Shavelson, 1988)
  • values.

GFM Issues

  • GFM Issues
    • Statistical
    • Theoretical
    • Indeterminacy

Statistical Issues

  • Model equivalence (MacCallum, 2000; Raykov & Penev, 1999; Raykov & Marcoulides, 2001).

  • Bias (Murray & Johnson, 2013).

  • Sampling (Eid & Koch, 2014; Eid, Geiser, Koch, & Heene, 2016).

Statistical Issues: Model equivalence

  • Model equivalence (MacCallum, 2000; Raykov & Penev, 1999; Raykov & Marcoulides, 2001). We compare only a few models. Confirmation bias (MacCallum, 2000)
  • "An especially interesting phenomenon in the context of alternative models is the existence of equivalent models, which are alternative models that fit any data to the same degree. Such models can be distinguished only in terms of substantive meaning." (MacCallum, 2000)

Statistical Issues: Model equivalence

  • Raykov & Marcoulides, 2001:
  • Good model fit does not indicate the correctness of a model. (See also Roberts & Pashler, 2000)
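
A small worked case shows what equivalence means for GFM. With three first-order factors, a model with freely correlated factors and a model with one second-order general factor imply exactly the same correlations, so fit cannot separate them. The correlations below are hypothetical values chosen for the example; the closed-form loadings solve l_i * l_j = r_ij for all three pairs.

```python
import numpy as np

# Hypothetical correlations among three first-order factors.
r12, r13, r23 = 0.6, 0.5, 0.7
phi = np.array([[1.0, r12, r13],
                [r12, 1.0, r23],
                [r13, r23, 1.0]])

# Second-order loadings that solve l_i * l_j = r_ij for all three pairs.
loadings = np.array([np.sqrt(r12 * r13 / r23),
                     np.sqrt(r12 * r23 / r13),
                     np.sqrt(r13 * r23 / r12)])

# Correlations implied by a single general factor with unit variance.
implied = np.outer(loadings, loadings)
np.fill_diagonal(implied, 1.0)   # residual variances absorb 1 - l_i**2

print(np.allclose(implied, phi))
```

Both models reproduce `phi` without error, so the choice between "correlated facets" and "general factor" rests on substantive meaning only, as the MacCallum quote states.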

Statistical Issues: Unmodelled Complexity

  • Bias (Murray & Johnson, 2013). Bi-factor GFM is biased in model comparisons due to unmodelled complexity.

Statistical Issues: Hierarchical sampling

  • Sampling (Eid & Koch, 2014; Eid, Geiser, Koch, & Heene, 2016). "domains have to be randomly chosen from a set of possible domains. they are considered levels of a random GT-facet"

Theoretical Issues

  • Quantity (Michell, 1997, 2008)

  • Interindividual vs. intra-individual structure (Borsboom, 2005; Molenaar, 2004, 2008; Molenaar & Campbell, 2009)

Theoretical Issues: Quantity

"a discrete quantitative difference need not be caused by a quantitative factor at all, let alone one that is a continuous quantity." (Michell, 2013)

"when the ideological support structures of a science sustain serious blind spots like this, then that science is in the grip of some kind of thought disorder." (Michell, 1997)

Scientific measurement? See also Inventing Temperature (Chang, 2007)

there is no single piece of evidence more important to a construct's definition than the causal relationship between the construct and its indicators

Theoretical Issues

  • Interindividual vs. intra-individual structure (Borsboom, 2005; Molenaar, 2004, 2008; Molenaar & Campbell, 2009)
  • We have been modelling inter-individual variance for 110 years.
  • What between-subjects latent variables models do is specify sources of between-subjects differences, but because they are silent with respect to the question of how individual scores are produced, they cannot be interpreted as posing intelligence as a causal force within Einstein. (Borsboom, 2003)
  • variations in height have demonstrably the same effect within and between subjects. But it remains to be shown that the same holds true for psychological variables like intelligence. (See Borkenau & Ostendorf, 1998: Big Five)
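
That between-person and within-person structures can diverge is easy to show with simulated data. The sketch below is purely illustrative, with assumed sample sizes and effect sizes: stable person levels of two variables are positively related, while the occasion-to-occasion deviations within each person are negatively related.

```python
import numpy as np

rng = np.random.default_rng(2004)
n_persons, n_occasions = 200, 50     # assumed sizes for illustration

# Between persons: stable levels of two variables are positively related.
level_x = rng.normal(size=n_persons)
level_y = 0.8 * level_x + 0.6 * rng.normal(size=n_persons)

# Within persons: occasion-to-occasion deviations are negatively related.
dev_x = rng.normal(size=(n_persons, n_occasions))
dev_y = -0.8 * dev_x + 0.6 * rng.normal(size=(n_persons, n_occasions))

x = level_x[:, None] + dev_x
y = level_y[:, None] + dev_y

# Between-person correlation of the person means.
r_between = np.corrcoef(x.mean(axis=1), y.mean(axis=1))[0, 1]

# Within-person correlation, computed per person and then averaged.
r_within = np.mean([np.corrcoef(x[p], y[p])[0, 1] for p in range(n_persons)])

print(round(r_between, 2), round(r_within, 2))
```

A between-subjects latent variable model fitted to the person means would recover the positive relation and say nothing about the negative relation operating inside each person.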

Model Comparison and Indeterminacy

Argument 1: Statistically, a General Factor represents covariances. It is a behaviorist construct based on logical positivism and therefore formative. Thus, it cannot inform cognitivist theories, and the assumption that something "exists" beyond covariance dynamics is such a theory.

GFM is applied behaviorist

GFM is interpreted cognitivist

Model Comparison and Indeterminacy

Argument 2: Correlational data cannot be used to test the assumption of a reflective latent construct because it is a causal assumption. Thus, GFM cannot be used to test the assumption of a GF. This needs experimental or longitudinal data. This renders degrees of freedom meaningless.

GFM is applied correlationally.

GFM is interpreted causally.

Model Comparison and Indeterminacy

van der Maas et al. (2016): If we just count parameters, the g-factor model seems simpler than an unconstrained mutualism model. This might not be true for constrained versions of the mutualism model. But more importantly, what are the costs of introducing a mysterious latent variable, as the common cause of ‘everything’ in the g-factor model?

Underestimation of the flexibility and thereby the complexity of GF-theories.

Degrees of freedom are not a valid means of model comparison in GFM.

Model Comparison and Indeterminacy

Alternative routes:

  • Experimentation (e.g., multinomial modeling)
  • Longitudinal Data (e.g., intra-individual)
  • Complexity Modeling (e.g., [dynamic] network modeling)

Theoretical Research into model complexity.

Conclusions

  • GFM has statistical problems.
  • GFM has theoretical problems.
  • GFM does not work.
  • We have been engaging in behaviorist modeling of cognitive constructs for more than 110 years.
  • Psychometrics is a mess.

thank you.

  • References

Bollen, K. A., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110, 305-314.

Borsboom, D. (2005). Measuring the Mind. Cambridge, UK: Cambridge University Press.

Burdick, D. S., Stone, M. H., & Stenner, A. J. (2006). The combined gas law and a Rasch reading law. Rasch Measurement Transactions, 20, 1059-1060.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

Edwards, J. R., & Bagozzi, R. P. (2000). On the nature and direction of relationships between constructs and measures. Psychological Methods, 5, 155-174.

Eid, M., Geiser, C., Koch, T., & Heene, M. (2016). Anomalous Results in G-Factor Models: Explanations and Alternatives. Psychological Methods. Advance online publication. http://dx.doi.org/10.1037/met0000083

Eid, M., & Koch, T. (2014). The Meaning of Higher-Order Factors in Reflective-Measurement Models. Measurement: Interdisciplinary Research and Perspectives, 12, 96-101.

Hoefer, C., & Rosenberg, A. (1994). Empirical Equivalence, Underdetermination, and Systems of the World. Philosophy of Science, 61, 592-607.

Holzinger, K. J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2, 41–54.

Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355–383.

Michell, J. (2008). Is Psychometrics Pathological Science? Measurement: Interdisciplinary Research & Perspective, 6, 7–24. http://doi.org/10.1080/15366360802035489

Molenaar, P. C. M. (2004). A Manifesto on Psychology as Idiographic Science: Bringing the Person Back Into Scientific Psychology, This Time Forever. Measurement: Interdisciplinary Research & Perspective, 2, 201–218. http://doi.org/10.1207/s15366359mea0204_1

Molenaar, P. C. M. (2008). On the implications of the classical ergodic theorems: Analysis of developmental processes has to focus on intra-individual variation. Developmental Psychobiology, 50, 60–69. http://doi.org/10.1002/dev.20262

Molenaar, P. C. M., & Campbell, C. G. (2009). The New Person-Specific Paradigm in Psychology. Current Directions in Psychological Science, 18, 112–117. http://doi.org/10.1111/j.1467-8721.2009.01619.x

Murray, A. L., & Johnson, W. (2013). The limitations of model fit in comparing the bi-factor versus higher-order models of human cognitive ability structure. Intelligence, 41, 407–422. http://doi.org/10.1016/j.intell.2013.06.004

Musek, J. (2007). A general factor of personality: Evidence for the Big One in the five-factor model. Journal of Research in Personality, 41, 1213-1233. doi:10.1016/j.jrp.2007.02.003

Perline, R., Wainer, H., & Wright, B. D. (1979). The Rasch model as additive conjoint measurement. Applied Psychological Measurement, 3, 237-255.

Raykov, T., & Marcoulides, G. A. (2001). Can there be infinitely many models equivalent to a given covariance structure model? Structural Equation Modeling, 8, 142–149.

Raykov, T., & Penev, S. (1999). On Structural Equation Model Equivalence. Multivariate Behavioral Research, 34, 199–244. http://doi.org/10.1207/S15327906Mb340204

Reise, S. P. (2012). The Rediscovery of Bifactor Measurement Models. Multivariate Behavioral Research, 47, 667–696. http://doi.org/10.1080/00273171.2012.715555