We followed the consecutive approach to multidimensionality: we downloaded and aggregated all the Pretest scores from BASS, then used ConQuest1 to fit a unidimensional Partial Credit Model (PCM) to each construct separately. We refer to these calibrations as the original calibrations.
For each construct, we simulated a complete response matrix \(X_{pi}\) in which each student \(p\) responded to each item \(i\). We treated the estimated item parameters from the original calibration (in ConQuest parameterization), and the associated WLE person estimates as the true \(\delta_i\), \(\tau_{ik}\), and \(\theta_p\). With these margins fixed, we used the ConQuest generate command to simulate a random response matrix using the PCM as the data-generating model.
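The data-generating step can be illustrated outside ConQuest. The following is a minimal sketch, not the ConQuest generate command itself: it assumes step difficulties \(\delta_{ik} = \delta_i + \tau_{ik}\) have already been formed from the ConQuest parameters, and the function names (`pcm_probs`, `simulate_responses`) and the example values are hypothetical.

```python
import math
import random


def pcm_probs(theta, deltas):
    """Category probabilities (scores 0..A) for one PCM item.

    deltas holds the step difficulties for steps 1..A; the cumulative
    sum for category 0 is 0 by convention.
    """
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + (theta - d))
    top = max(cum)  # subtract the maximum for numerical stability
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]


def simulate_responses(thetas, items, seed=0):
    """Draw a complete persons-by-items response matrix under the PCM."""
    rng = random.Random(seed)
    matrix = []
    for theta in thetas:
        row = []
        for deltas in items:
            probs = pcm_probs(theta, deltas)
            # Sample one category with probability proportional to probs.
            row.append(rng.choices(range(len(probs)), weights=probs)[0])
        matrix.append(row)
    return matrix


# Hypothetical generating values, not the estimates from the calibrations.
items = [[-0.5], [0.0, 1.0], [-1.0, 0.2, 1.5]]
thetas = [-1.0, 0.0, 1.0, 2.0]
X = simulate_responses(thetas, items, seed=1)
```

Each row of `X` is one simulated student and each column one item, so every student has a response to every item, as in the complete matrix described above.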
We then fit the PCM to this response matrix, with all the structural parameters (item parameters \(\delta_i\) and \(\tau_{ik}\), and person distribution2 \(\mu_\theta\) and \(\sigma^{2}_\theta\)) anchored.
Below, we refer to results from the original calibrations as original, and the results from fitting the PCM to the simulated response matrices as simulated. We also refer to the WLE estimates from the original calibrations as the generating \(\theta\) and those from fitting the PCM to the simulated response matrices as the recovered \(\theta\). Since the item parameters were anchored, our interest is primarily in the measurement errors (WLE standard errors) and the associated reliability estimates.
Additionally, we used an analytic method to estimate measurement error and reliability, based on the additivity of Fisher information across items (assuming local independence).
In the PCM, the item information function3 is:
\[ \begin{equation} I_i(\theta) = a_i^2 \left[ \sum_{k=1}^m {k^2 P_{ik}(\theta)} - \left(\sum_{k=1}^m {k P_{ik}(\theta)}\right)^2 \right] \end{equation} \]
where \(a_i = 1\) in Rasch models, \(m\) is the maximum score on item \(i\) (written \(A_i\) in the next equation), and \(P_{ik}(\theta)\) is the usual PCM likelihood4:
\[ \begin{equation}\begin{split} P_{ik}(\theta) &= Pr(X_{pi} = k \mid \theta_p=\theta, \delta_{i1}, \ldots, \delta_{iA_i} ) \\ &= \frac { \exp \sum_{j=0}^k ( \theta - \delta_{ij}) } { \sum_{h=0}^{A_i} \exp \sum_{j=0}^h ( \theta - \delta_{ij}) } \end{split}\end{equation} \] where \(\delta_{ij}\) are the step difficulties (in Masters’ parameterization), \(A_i\) is the maximum score on item \(i\), and by convention \(\sum_{j=0}^{0} ( \theta - \delta_{ij}) \equiv 0\).
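As an illustration of the two formulas above, here is a minimal sketch of the category probabilities and the item information function. The function names are hypothetical, and the code is not taken from ConQuest or from the analysis scripts. The information is computed as the variance of the item score at a given \(\theta\), which matches the bracketed expression (the \(k = 0\) term contributes nothing to either sum).

```python
import math


def pcm_probs(theta, deltas):
    """PCM category probabilities for scores 0..A, given step difficulties."""
    cum = [0.0]  # cumulative sum for category 0 is 0 by convention
    for d in deltas:
        cum.append(cum[-1] + (theta - d))
    top = max(cum)  # subtract the maximum for numerical stability
    expo = [math.exp(c - top) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]


def item_information(theta, deltas, a=1.0):
    """a^2 * [sum k^2 P_k - (sum k P_k)^2], with a = 1 in Rasch models."""
    probs = pcm_probs(theta, deltas)
    mean = sum(k * p for k, p in enumerate(probs))
    mean_sq = sum(k * k * p for k, p in enumerate(probs))
    return a * a * (mean_sq - mean ** 2)


# Sanity check on a dichotomous item (one step): the formula reduces to
# P(1 - P), which is 0.25 when theta equals the step difficulty.
info_at_difficulty = item_information(0.0, [0.0])
```

For a dichotomous item the information peaks at the step difficulty; for polytomous items the shape depends on the spacing of the steps.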
Because we are acting as if all the students responded to all the items, we compute the test information function as the sum of all the item information functions. The Standard Error of Measurement (SEM) function is the inverse square root of the test information function: \[ \begin{align} I(\theta) &= \sum_{i} I_i(\theta) \\ \text{SEM}(\theta) &= \frac{1}{\sqrt{I(\theta)}} \\ \end{align} \] We refer to these as analytic standard errors.
For reliability, we use a formula from Classical Test Theory: \[ \begin{equation} \rho_{xx'} = \frac {\sigma^2_T} {\sigma^2_T + \sigma^2_E} \end{equation} \] For the true score variance \(\sigma^2_T\) we use the population variance from the original calibration. We compute the error variance \(\sigma^2_E\) by marginalizing the analytic measurement error variance function (\(\text{SEM}^2\) or \(1/I(\theta)\)) over the person distribution. We do this by taking a weighted average along a grid of \(\theta\) values, with weights from the Normal PDF, having mean and variance from the original calibration. We refer to these as analytic reliability estimates.
We plan to repeat this analysis using the empirical distribution of \(\theta\) (i.e., taking an unweighted average of the analytic measurement error variance function at the WLE estimates from the original calibration).
In the table and figure below, analytic represents the analytic reliability estimates described above. The original reliability estimates from ConQuest are orig.EAP, orig.MLE, and orig.WLE (based on EAP, MLE, and WLE, respectively). Likewise, sim.EAP, sim.MLE, and sim.WLE are the simulated reliability estimates.
Initial Observations
In the tables and figures below, orig.theta and sim.theta are the original and simulated WLE person estimates, respectively; orig.error and sim.error are the associated standard errors. Applying the analytic measurement error function to the original and simulated WLEs yields analytic.orig and analytic.sim, respectively. Original raw scores are orig.tot points out of orig.max possible, and simulated raw scores are sim.tot points out of sim.max possible.
The scatter plots below show the original (“generating”) vs. simulated (“recovered”) WLEs, with a regression line added.
Each of the constructs is a tab. Click the name of a construct to switch to that tab.
Initial Observations
Modeling Applied Problems—Conceptual Model
Modeling Applied Problems—Mathematical Model
Multiple Mathematical Representations—Contextual
Multiple Mathematical Representations—Relational
Position, Rate, and Acceleration
Interpreting Mathematical Results
The graphs below compare different methods of computing the standard error of measurement, for different locations along the logit scale. Each consists of multiple overlaid scatter plots of \(\theta\) vs. \(\text{SEM}(\theta)\) with LOESS curves to show trends. Outliers, as shown in the box plots in the next section, have been excluded.
The upper graphs show the original standard errors for each person, the simulated (or recovered) standard error for that person, and the analytic measurement error function (evaluated at their original WLE).
The lower graphs show the simulated standard error (“estimated.sim”) for each person, and the analytic measurement error function (“analytic.sim”, evaluated at their simulated WLE).
The graphs below compare the distributions of standard errors across all three methods: original are the WLE standard errors from the original calibration, simulated are the WLE standard errors from the simulated data, and analytic are from the measurement error function (evaluated at the original \(\theta\)). Outliers are included in the box plots, but excluded from the histograms to make them more readable.
1. ConQuest version 5.1.4, build Jan 22 2020
2. We assumed a Normal distribution for the person locations, both in the original calibrations and when fitting the PCM to the simulated data matrix. Because the distribution of respondents on several constructs was decidedly non-Normal (see Wright Maps), we may want to repeat these analyses using histogram distributions.
3. From Veldkamp (2003), Equation 1, p. 2
4. From Masters (2016), Equation 7.4, p. 111