Items

The following 13 items constituted the Reasoning factor:

B1 “I make insightful remarks”
B2 “I know the answers to many questions”
B3 “I tend to analyze things”
B4 “I use my brain”
B5 “I learn quickly”
B6 “I counter others’ arguments”
B7 “I reflect on things before acting”
B8 “I weigh the pros against the cons”
B9 “I consider myself an average person” [Reversed]
B10 “I get confused easily” [Reversed]
B11 “I know that I am not a special person” [Reversed]
B12 “I have a poor vocabulary” [Reversed]
B13 “I skip difficult words while reading” [Reversed]

Analysis

Analysis was undertaken with the assistance of the CTT, TAM, WrightMap, and psych R packages. The CTT package provided the frequencies of the response categories and the point biserial and biserial discrimination indices (by default, each item is removed from the total score before its correlation is estimated). The CTT package also reports Cronbach’s alpha as an assessment of test reliability.

IRT analysis was undertaken with the assistance of the TAM package. Weighted Likelihood Estimation (WLE) was used to estimate person abilities (Warm, 1989). Whereas Maximum Likelihood Estimation (MLE) takes the mode of the likelihood function, WLE maximises a weighted version of the likelihood, with the weight chosen to reduce the bias of the MLE. Although ability estimates from the two methods have asymptotically equivalent standard errors, WLE estimates are generally slightly more central (Warm, 1989); in practice, there is little difference in the results. Using WLE, this study reports the mean, SD, variance, and the (WLE) reliability coefficient.
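The difference between the two estimators can be illustrated with a minimal sketch. It assumes a dichotomous Rasch model rather than the polytomous model fitted in TAM, and the item difficulties and response pattern are hypothetical. The correction term J / (2 I) follows Warm's (1989) weighted likelihood for this simple case; this is not the TAM implementation.

```python
from math import exp

def p_correct(theta, b):
    """Rasch probability of endorsing a dichotomous item of difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def solve_decreasing(f, lo=-8.0, hi=8.0, iters=80):
    """Bisection for a root of f, assuming f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def ml_score(theta, x, b):
    """Derivative of the Rasch log-likelihood with respect to theta."""
    return sum(xi - p_correct(theta, bi) for xi, bi in zip(x, b))

def wl_score(theta, x, b):
    """ML score plus Warm's bias-correction term J / (2 I)."""
    p = [p_correct(theta, bi) for bi in b]
    info = sum(pi * (1 - pi) for pi in p)               # test information I
    j = sum(pi * (1 - pi) * (1 - 2 * pi) for pi in p)   # J term
    return ml_score(theta, x, b) + j / (2 * info)

# Hypothetical item difficulties and one response pattern (not the study data).
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [1, 1, 1, 1, 0]

theta_mle = solve_decreasing(lambda t: ml_score(t, responses, difficulties))
theta_wle = solve_decreasing(lambda t: wl_score(t, responses, difficulties))
print(round(theta_mle, 3), round(theta_wle, 3))
```

For this above-average response pattern the WLE falls between zero and the MLE, which is the "more central" behaviour described above. A perfect score has no finite MLE, whereas the weighted score equation still has a finite root.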

Because this study was interested in the population from which the sample was drawn (i.e., the world!), the expected a posteriori (EAP) Reasoning ability distribution was also generated. This distribution is based on Bayesian adjustments that capitalise on an assumed normal prior distribution of Reasoning ability in the population. Following this procedure, EAP-adjusted variance estimates, item parameters, item expected score curves, item characteristic curves, and Thurstonian thresholds were produced.
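The Bayesian adjustment can be sketched as a posterior mean computed on a grid of ability values. The example again assumes a dichotomous Rasch model with hypothetical difficulties and a standard normal prior; the grid size and range are arbitrary choices for illustration, not TAM settings.

```python
from math import exp, sqrt

def rasch_likelihood(theta, x, b):
    """Likelihood of a dichotomous response pattern x given ability theta."""
    like = 1.0
    for xi, bi in zip(x, b):
        p = 1.0 / (1.0 + exp(-(theta - bi)))
        like *= p if xi == 1 else 1.0 - p
    return like

def eap(x, b, prior_mean=0.0, prior_sd=1.0, n_nodes=81, lo=-6.0, hi=6.0):
    """Posterior mean and SD of theta under a normal prior, on an even grid."""
    step = (hi - lo) / (n_nodes - 1)
    nodes = [lo + i * step for i in range(n_nodes)]
    weights = []
    for t in nodes:
        prior = exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)  # unnormalised
        weights.append(prior * rasch_likelihood(t, x, b))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(nodes, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(nodes, weights)) / total
    return mean, sqrt(var)

# Hypothetical difficulties; a perfect and a zero score (not the study data).
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
eap_high, sd_high = eap([1, 1, 1, 1, 1], difficulties)
eap_low, sd_low = eap([0, 0, 0, 0, 0], difficulties)
print(round(eap_high, 3), round(sd_high, 3))
```

Even for a perfect score the EAP estimate is finite, because the prior pulls extreme estimates towards the population mean. That shrinkage is why the variance of EAP estimates is typically smaller than the variance of WLE estimates.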

Finally, a Wright Map was produced with the assistance of the R WrightMap package.

Frequency of Raw Responses

The frequencies of the raw responses were as follows:

##     B1   B2   B3   B4   B5   B6   B7   B8   B9  B10  B11  B12  B13
## 0   85   82   21   16   36   93  128   55  315  161  313   66  160
## 1  282  481  142   54  190  457  519  278 1238  769  809  219  442
## 2  787  826  295  201  462  849  696  608  523  656  660  403  357
## 3 1661 1436 1666 1475 1672 1506 1535 1676  935 1362 1019 1306 1240
## 4  589  586 1288 1666 1049  498  528  791  403  464  607 1418 1212
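As a quick check on the table above, the counts can be summarised per item. The sketch below copies the frequencies from the table and derives the number of responses, the most common category, and the mean raw score; the helper name `summarise` is illustrative only.

```python
# Response frequencies copied from the table above (keys = categories 0-4).
items = ["B1", "B2", "B3", "B4", "B5", "B6", "B7",
         "B8", "B9", "B10", "B11", "B12", "B13"]
freq = {
    0: [85, 82, 21, 16, 36, 93, 128, 55, 315, 161, 313, 66, 160],
    1: [282, 481, 142, 54, 190, 457, 519, 278, 1238, 769, 809, 219, 442],
    2: [787, 826, 295, 201, 462, 849, 696, 608, 523, 656, 660, 403, 357],
    3: [1661, 1436, 1666, 1475, 1672, 1506, 1535, 1676, 935, 1362, 1019, 1306, 1240],
    4: [589, 586, 1288, 1666, 1049, 498, 528, 791, 403, 464, 607, 1418, 1212],
}

def summarise(freq, items):
    """Return n, modal category, and mean raw score for each item."""
    out = {}
    for j, name in enumerate(items):
        counts = [freq[k][j] for k in range(5)]
        n = sum(counts)
        modal = max(range(5), key=lambda k: counts[k])
        mean = sum(k * c for k, c in enumerate(counts)) / n
        out[name] = {"n": n, "modal": modal, "mean": round(mean, 2)}
    return out

summary = summarise(freq, items)
not_modal_3 = [name for name in items if summary[name]["modal"] != 3]
print(not_modal_3)
```

The column totals differ slightly between items, which suggests a small amount of item-level missing data.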

Point Biserial Discrimination Indices

The 13 point biserial discrimination indices output by the CTT package are as follows:

##  [1] 0.45 0.47 0.42 0.54 0.51 0.32 0.29 0.33 0.37 0.42 0.27 0.50 0.36
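For polytomous items, an index of this kind can be read as the correlation between an item and the total of the remaining items. The sketch below computes that item-rest correlation on a small hypothetical data set. It assumes the index is a plain Pearson correlation with the item removed from the total, and it is not the CTT package's code.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_rest_correlations(rows):
    """Correlate each item with the total score of all other items."""
    n_items = len(rows[0])
    totals = [sum(r) for r in rows]
    result = []
    for j in range(n_items):
        item = [r[j] for r in rows]
        rest = [t - r[j] for t, r in zip(totals, rows)]  # item j removed
        result.append(pearson(item, rest))
    return result

# Hypothetical responses: 5 respondents x 3 items scored 0-4 (not the study data).
rows = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 4], [4, 4, 3]]
discrimination = item_rest_correlations(rows)
print([round(r, 2) for r in discrimination])
```

Removing the item from the total avoids the inflation that arises when an item is correlated with a score that includes itself.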

Biserial Discrimination Indices

The 13 biserial discrimination indices output by the CTT package, which are less affected than the point biserial by extreme p-q proportions, are as follows:

##  [1] 0.50 0.50 0.48 0.63 0.56 0.34 0.32 0.37 0.37 0.45 0.29 0.56 0.40

Pearson Correlation Matrix

The Pearson correlation matrix, computed with the psych package, is as follows:

## Call: mixedCor(data = Reasoning2, c = 1:14)
##            B1   B2   B3   B4   B5   B6   B7   B8   B9   B10  B11  B12  B13  Rw_Tt
## B1         1.00                                                                  
## B2         0.34 1.00                                                             
## B3         0.27 0.25 1.00                                                        
## B4         0.30 0.34 0.40 1.00                                                   
## B5         0.25 0.40 0.28 0.42 1.00                                              
## B6         0.29 0.29 0.23 0.20 0.20 1.00                                         
## B7         0.18 0.11 0.30 0.27 0.18 0.03 1.00                                    
## B8         0.19 0.16 0.29 0.27 0.18 0.15 0.35 1.00                               
## B9         0.22 0.22 0.16 0.18 0.20 0.19 0.09 0.06 1.00                          
## B10        0.18 0.21 0.16 0.33 0.37 0.12 0.18 0.17 0.19 1.00                     
## B11        0.17 0.13 0.09 0.16 0.16 0.06 0.08 0.09 0.40 0.21 1.00                
## B12        0.29 0.34 0.21 0.34 0.33 0.18 0.14 0.16 0.22 0.30 0.13 1.00           
## B13        0.18 0.22 0.13 0.24 0.25 0.13 0.10 0.14 0.13 0.25 0.04 0.48 1.00      
## Row_Totals 0.56 0.58 0.52 0.62 0.60 0.45 0.43 0.45 0.52 0.55 0.44 0.60 0.50 1.00

Heterogeneous Correlation Matrix

The heterogeneous correlation matrix, computed with the psych package, is as follows:

## Call: mixedCor(data = Reasoning2, c = 14, p = 1:13)
##            B1   B2   B3   B4   B5   B6   B7   B8   B9   B10  B11  B12  B13  Rw_Tt
## B1         1.00                                                                  
## B2         0.40 1.00                                                             
## B3         0.35 0.32 1.00                                                        
## B4         0.39 0.43 0.51 1.00                                                   
## B5         0.32 0.47 0.35 0.52 1.00                                              
## B6         0.34 0.35 0.29 0.27 0.24 1.00                                         
## B7         0.23 0.15 0.37 0.34 0.22 0.05 1.00                                    
## B8         0.24 0.20 0.37 0.35 0.24 0.19 0.41 1.00                               
## B9         0.25 0.25 0.19 0.23 0.25 0.21 0.10 0.07 1.00                          
## B10        0.22 0.25 0.20 0.41 0.44 0.14 0.20 0.21 0.20 1.00                     
## B11        0.21 0.15 0.11 0.20 0.19 0.07 0.09 0.10 0.44 0.23 1.00                
## B12        0.36 0.42 0.28 0.44 0.41 0.22 0.18 0.20 0.26 0.36 0.15 1.00           
## B13        0.24 0.28 0.20 0.33 0.31 0.17 0.13 0.18 0.15 0.30 0.04 0.57 1.00      
## Row_Totals 0.60 0.62 0.59 0.72 0.66 0.47 0.46 0.49 0.55 0.59 0.46 0.68 0.55 1.00

Note. The bottom row of the correlation matrix contains polyserial correlations between Row_Totals (the continuous variable) and each of the 13 Reasoning variables (polytomous variables). The 13x13 lower triangle (above the bottom row) contains all possible polychoric correlations between the polytomous items. Estimation of these correlations assumes that each polytomous item reflects an underlying, normally distributed latent variable.

Classical Reliability (Var[O]-Var[E])/Var[O]

Cronbach’s alpha is a popular classical estimate of test reliability. It assumes tau equivalence (i.e., equal contribution of items to the total score). When this assumption does not hold, as is usual because items tend not to have the same discrimination, alpha is a lower-bound estimate of reliability (i.e., the true reliability is probably no lower than alpha). Cronbach’s alpha was 0.77.
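The coefficient can be computed directly from the item variances and the variance of the total score. The sketch below uses the standard formula, alpha = k/(k-1) * (1 - sum of item variances / total-score variance), on hypothetical data; it is not the CTT package's code.

```python
def variance(values):
    """Sample variance (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents x items matrix of scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents x 3 items scored 0-4 (not the study data).
rows = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 4], [4, 4, 3]]
alpha = cronbach_alpha(rows)

# Identical items are perfectly consistent, so alpha reaches its maximum of 1.
alpha_identical = cronbach_alpha([[0, 0, 0], [1, 1, 1], [2, 2, 2], [4, 4, 4]])
print(round(alpha, 3), round(alpha_identical, 3))
```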

Mean Ability and Standard Deviation of Ability (Theta)

The mean and SD of the Reasoning ability estimates (thetas) were M = 0.01 and SD = 0.74 (variance = 0.54).

The Item Parameters

##                   xsi     se.xsi
## B1_Cat1  -1.761829314 0.11317816
## B1_Cat2  -1.354695817 0.05854444
## B1_Cat3  -0.819736460 0.03893803
## B1_Cat4   1.262600490 0.04766460
## B2_Cat1  -2.293276918 0.11441257
## B2_Cat2  -0.832362095 0.04937342
## B2_Cat3  -0.591450555 0.03795423
## B2_Cat4   1.151886972 0.04809249
## B3_Cat1  -2.609533451 0.22230341
## B3_Cat2  -1.189732369 0.08380207
## B3_Cat3  -1.937844838 0.05294356
## B3_Cat4   0.340042845 0.03728386
## B4_Cat1  -1.983859029 0.25525469
## B4_Cat2  -1.832874559 0.12450121
## B4_Cat3  -2.255816135 0.06574697
## B4_Cat4  -0.094813551 0.03602155
## B5_Cat1  -2.311490098 0.17090744
## B5_Cat2  -1.301282372 0.07227449
## B5_Cat3  -1.448035965 0.04543872
## B5_Cat4   0.594634424 0.03926907
## B6_Cat1  -2.112537168 0.10792305
## B6_Cat2  -0.905922743 0.04979419
## B6_Cat3  -0.603937591 0.03781591
## B6_Cat4   1.373496208 0.05106749
## B7_Cat1  -1.907450704 0.09312220
## B7_Cat2  -0.572386430 0.04731924
## B7_Cat3  -0.818499168 0.03837263
## B7_Cat4   1.333788431 0.04995517
## B8_Cat1  -2.215684454 0.13907363
## B8_Cat2  -1.144431892 0.06106263
## B8_Cat3  -1.123691437 0.04116799
## B8_Cat4   0.934281208 0.04288768
## B9_Cat1  -1.731117393 0.06154432
## B9_Cat2   0.729695459 0.03867111
## B9_Cat3  -0.462067135 0.03979499
## B9_Cat4   1.246722787 0.05704693
## B10_Cat1 -2.022836208 0.08339372
## B10_Cat2 -0.072081364 0.04223534
## B10_Cat3 -0.710983500 0.03805108
## B10_Cat4  1.388192834 0.05287718
## B11_Cat1 -1.365129488 0.06245532
## B11_Cat2  0.008149747 0.04076611
## B11_Cat3 -0.389115911 0.03875212
## B11_Cat4  0.840561989 0.04849616
## B12_Cat1 -1.839351390 0.12840311
## B12_Cat2 -1.028530814 0.06610231
## B12_Cat3 -1.356613177 0.04615824
## B12_Cat4  0.013239775 0.03717380
## B13_Cat1 -1.566551011 0.08496266
## B13_Cat2 -0.128065662 0.04979743
## B13_Cat3 -1.357990173 0.04255492
## B13_Cat4  0.177572554 0.03872000

The Expected A Posteriori (EAP) Variance, and EAP Reliability

The EAP variance and reliability estimates are based on a Bayesian-adjusted ability distribution that assumes a normally distributed population. The EAP variance of the model was 0.36, which is lower than the observed (WLE) variance of 0.54. The estimated reliability for the EAP distribution was 0.78.
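Both reliability coefficients can be read as the share of variance that is not measurement error. The sketch below uses two commonly cited definitions: for WLE, one minus the mean squared standard error over the variance of the estimates; for EAP, the variance of the posterior means over that variance plus the mean posterior variance. These definitions are assumptions for illustration, not a confirmed description of TAM's internals, and the ability estimates and errors are hypothetical.

```python
def variance(values):
    """Sample variance (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def wle_reliability(estimates, standard_errors):
    """(Var[O] - Var[E]) / Var[O], with Var[E] the mean squared standard error."""
    observed = variance(estimates)
    error = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return (observed - error) / observed

def eap_reliability(posterior_means, posterior_sds):
    """Variance of posterior means over the total (between + within) variance."""
    between = variance(posterior_means)
    within = sum(sd ** 2 for sd in posterior_sds) / len(posterior_sds)
    return between / (between + within)

# Hypothetical ability estimates and errors for five respondents (not the study data).
thetas = [-1.2, -0.4, 0.1, 0.6, 1.3]
errors = [0.40, 0.35, 0.35, 0.35, 0.40]

rel_wle = wle_reliability(thetas, errors)
rel_eap = eap_reliability(thetas, errors)
print(round(rel_wle, 3), round(rel_eap, 3))
```

The same five values are used for both functions only to keep the example short. In a real analysis the EAP means are shrunken towards the prior mean and therefore less variable than the WLEs.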

All 13 Item Expected Curves

Following are all item expected curves:

# Plot the item expected score curves for the fitted TAM model (mod1)
plot(mod1)

## Iteration in WLE/MLE estimation  1   | Maximal change  1.2019 
## Iteration in WLE/MLE estimation  2   | Maximal change  0.3031 
## Iteration in WLE/MLE estimation  3   | Maximal change  0.0177 
## Iteration in WLE/MLE estimation  4   | Maximal change  2e-04 
## Iteration in WLE/MLE estimation  5   | Maximal change  0 
## ----
##  WLE Reliability = 0.785

## ....................................................
##  Plots exported in png format into folder:
##  /Users/matthewcourtney/Desktop/Wu.Course/Day 8/Plots

All Item Characteristic Curves

The item characteristic curves, by item, follow:

Thurstonian Thresholds

The Thurstonian Thresholds were as follows:

##          Cat1       Cat2        Cat3       Cat4
## B1  -2.195892 -1.3422546 -0.52688599 1.37503052
## B2  -2.491608 -1.0862732 -0.29434204 1.30709839
## B3  -2.848663 -1.7209167 -1.26370239 0.44540405
## B4  -2.589752 -2.0190125 -1.60482788 0.01950073
## B5  -2.615753 -1.5850525 -0.98318481 0.71749878
## B6  -2.356293 -1.1018372 -0.29031372 1.49880981
## B7  -2.142609 -0.9353943 -0.32748413 1.44552612
## B8  -2.497833 -1.3909607 -0.71438599 1.05331421
## B9  -1.825104 -0.1127014  0.25845337 1.44827271
## B10 -2.165131 -0.6418762 -0.12002563 1.51144409
## B11 -1.600250 -0.4466858  0.03561401 1.10678101
## B12 -2.211456 -1.3585510 -0.89749146 0.24765015
## B13 -1.835724 -0.8777161 -0.58584595 0.42343140
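A Thurstonian threshold for category k is the ability at which the probability of responding in category k or higher reaches 0.5. The sketch below computes the thresholds for item B1 by bisection. It assumes the xsi values reported for B1 are partial credit step parameters, so that the logit for category k is k*theta minus the sum of the first k steps; that parameterisation is an assumption about the fitted model, not something stated in the output.

```python
from math import exp

def pcm_probs(theta, steps):
    """Category probabilities under a partial credit model with step parameters."""
    logits = [0.0]
    for d in steps:
        logits.append(logits[-1] + theta - d)   # psi_k = psi_{k-1} + theta - delta_k
    top = max(logits)
    weights = [exp(v - top) for v in logits]    # subtract max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]

def thurstonian_thresholds(steps, lo=-10.0, hi=10.0, iters=60):
    """Ability at which P(X >= k) = 0.5, for k = 1..number of steps."""
    thresholds = []
    for k in range(1, len(steps) + 1):
        a, b = lo, hi
        for _ in range(iters):
            mid = (a + b) / 2.0
            if sum(pcm_probs(mid, steps)[k:]) < 0.5:
                a = mid      # cumulative probability too low: move up
            else:
                b = mid
        thresholds.append((a + b) / 2.0)
    return thresholds

# xsi estimates for B1_Cat1..B1_Cat4, copied from the item parameter table.
b1_steps = [-1.761829314, -1.354695817, -0.819736460, 1.262600490]
b1_thresholds = thurstonian_thresholds(b1_steps)
print([round(t, 3) for t in b1_thresholds])
```

Under that assumption the B1 values come out close to the thresholds reported in the table above. Because each threshold is defined on a cumulative probability, the thresholds are ordered even when the step parameters themselves are not.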

Wright Map

Finally, the Wright Map:

Summary

For 10 of the 13 items, the most common response category was “agree” (3 on the 0-4 scale); the exceptions were B4 and B12 (most common category 4) and B9 (most common category 1).
Both Cronbach’s alpha and the IRT (WLE) reliability estimate were good, at 0.77 and 0.79, respectively. The EAP reliability was similar, at 0.78.
Point biserial discrimination indices were between 0.27 and 0.54 (biserial: 0.29 to 0.63), suggesting that all test items contributed positively to the Reasoning construct.
Across all 13 items, expected score curves were well aligned: as self-reported reasoning ability increased, the expected score of respondents on each item (0-4) also increased. Likewise, the item characteristic curves suggested logical ordering of the probability plots for each of the five item categories. All Thurstonian thresholds were also ordered.
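The shape of an expected score curve can be sketched from the item parameters. The example below assumes a partial credit parameterisation in which the reported xsi values for B1 act as step parameters, and evaluates the expected item score over a grid of abilities. It illustrates the kind of curve plotted earlier; it is not the TAM plotting code.

```python
from math import exp

def pcm_probs(theta, steps):
    """Category probabilities under a partial credit model with step parameters."""
    logits = [0.0]
    for d in steps:
        logits.append(logits[-1] + theta - d)
    top = max(logits)
    weights = [exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, steps):
    """Expected item score: sum over categories of k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(pcm_probs(theta, steps)))

# xsi estimates for B1_Cat1..B1_Cat4, copied from the item parameter table.
b1_steps = [-1.761829314, -1.354695817, -0.819736460, 1.262600490]

grid = [-4.0 + 0.5 * i for i in range(17)]            # abilities from -4 to 4 logits
curve = [expected_score(t, b1_steps) for t in grid]
for t, e in zip(grid, curve):
    print(f"{t:5.1f}  {e:.2f}")
```

The expected score rises from close to 0 at low ability towards 4 at high ability. It cannot decrease, because its slope with respect to ability equals the conditional variance of the item score.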

The Wright Map provided much insight. Strong agreement with “I counter others’ arguments” and strong disagreement with “I get confused easily” tended to identify those with the highest self-reported reasoning. In contrast, strong disagreement with “I tend to analyze things”, “I use my brain”, and “I learn quickly” tended to identify those respondents with the lowest self-reported reasoning.

Compared with Confirmatory Factor Analysis (CFA), IRT analysis provides a better diagnostic tool for understanding indicative behaviours that might be associated with increased capacity in some latent trait. Whilst never being confused and often countering arguments might be associated with the highest levels of reasoning, never tending to analyze things or to learn quickly may be especially indicative of those persons at the lower end of the reasoning spectrum.

Given the Likert design of the questions (as opposed to a selection of indicative behaviours for each question), a CFA analytical approach may have been envisaged by the original researchers. Although a standard CFA would not have readily identified which item categories were associated with the extremes of reasoning ability, factor scores generated from the analysis would have taken account of the extent to which each of the items discriminates, making factor scores a useful means of measuring the latent trait (by accounting for systemic measurement error).

In sum, when the purpose of an instrument is to provide diagnostic feedback for some intended clinical or educational intervention, IRT analysis is useful. However, when the purpose of the instrument is to provide a broad gauge of the existence of some latent trait, whilst theoretically accounting for error associated with the items themselves, CFA is useful. A two-parameter IRT model might also be considered, as this model accounts for item discrimination when estimating ability logits. I can do this later, along with a CFA (ML), to test the equivalence of item discrimination estimates across these analytic methods.

IRT Analysis of the Reasoning Factor

By Matthew Courtney, PhD

Sunday 6 May, 2018

Purpose

Data Preparation