class: center, middle, inverse, title-slide .title[ # Psychometrics Applied to Organizational and Work Psychology ] .subtitle[ ##
Classical Test Theory ] .author[ ### Jorge Sinval
Thaís Zerbini ] .date[ ### 2024-10-01 ] --- class: inverse, center, middle # Readings <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> <style> .orange { color: #EB811B; } .kbd { display: inline-block; padding: .2em .5em; font-size: 0.75em; line-height: 1.75; color: #555; vertical-align: middle; background-color: #fcfcfc; border: solid 1px #ccc; border-bottom-color: #bbb; border-radius: 3px; box-shadow: inset 0 -1px 0 #bbb } </style>
--- # Readings --- class: inverse, center, middle # CTT <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> --- # CTT ## Readings .can-edit.key-measurement[ - thing one - thing two - ... ] --- # CTT ## Theory and Assumptions - Classical Test Theory (CTT), also known as weak or true-score test theory - Called "classical" relative to Item Response Theory (IRT), which is a more modern approach - CTT describes a set of psychometric procedures used to assess the reliability, difficulty, discrimination, etc. of items and scales .footnote[In the context of CTT, a psychometric instrument is said to have evidence of reliability if the error around the true score `\((\tau)\)` is minimal. ] --- # CTT - CTT analyses are the most widely used form of psychometric analysis. - The statistics can be computed by readily available statistical packages - CTT analyses are performed on the test as a whole rather than on the items, and although item statistics can be generated, they apply only to that group of examinees on that collection of items --- # CTT • Assumes that every person has a true score on an item or a scale that we would obtain if we could measure it directly, without error • CTT assumes that a person's test score is composed of their "true" score plus some measurement error. • This is the common true-score model: \begin{align} X = \tau + \varepsilon \label{truescore} \end{align} An observed test score of a person is the sum of that person's true score and an error of measurement <center> <div class="figure" style="text-align: center"> <img src="assets/img/ctt.gif" alt="Graphical representation of the CTT. This figure was extracted from <a href="https://conjointly.com/kb/true-score-theory/">https://conjointly.com/kb/true-score-theory/</a>" width="30%" /> <p class="caption">Graphical representation of the CTT. This figure was extracted from <a href="https://conjointly.com/kb/true-score-theory/">https://conjointly.com/kb/true-score-theory/</a></p> </div> </center> --- # CTT - Based on the expected values of each component for each person, we can see that, as in \eqref{expectation}: \begin{align} \mathbb{E}(X_i)=\tau_i \label{expectation} \end{align} The expected value of the observed scores is the true score `\(\varepsilon_i=X_i-\tau_i\)` `\(\mathbb{E}(X_i-\tau_i)=\mathbb{E}(X_i)-\mathbb{E}(\tau_i)=\tau_i-\tau_i=0\)` `\(\varepsilon\)` and `\(X\)` are random variables, while `\(\tau\)` is a constant. However, this is theoretical and **not done at the individual level**. --- # CTT \begin{align} \rho_{\varepsilon,\tau}=0 \label{corr1} \end{align} The error of measurement on a test and the true scores on that test are uncorrelated \begin{align} \rho_{\varepsilon_1,\varepsilon_2}=0 \label{corr2} \end{align} Error scores on two different tests are uncorrelated \begin{align} \rho_{\varepsilon_1,\tau_2}=0 \label{corr3} \end{align} The error of measurement on a test and the true scores on all other tests are uncorrelated If two tests have observed scores `\(X\)` and `\(X^\prime\)` that satisfy assumptions \eqref{truescore} to \eqref{corr3}, and if, for every population of examinees, `\(\tau = \tau^\prime\)` and `\(\sigma^2_\varepsilon=\sigma^2_{\varepsilon^\prime}\)`, then the tests are called **parallel tests**. In other words, parallel tests have the same true scores and error variances. 
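A minimal R sketch of these assumptions (the sample size, means, and standard deviations below are illustrative, not taken from any real instrument): two forms built from the same simulated true scores and equal error variances behave as parallel tests, and their correlation approximates `\(\sigma^2_{\tau}/\sigma^2_{X}\)`.

``` r
# Minimal simulation of the true-score model X = tau + epsilon
# (purely illustrative values, not from a real instrument)
set.seed(1)
n    <- 10000
tau  <- rnorm(n, mean = 50, sd = 10)      # true scores
X_p1 <- tau + rnorm(n, mean = 0, sd = 5)  # parallel form 1
X_p2 <- tau + rnorm(n, mean = 0, sd = 5)  # parallel form 2: same tau, same error variance

cor(X_p1, X_p2)       # correlation between the parallel forms
var(tau) / var(X_p1)  # sigma^2_tau / sigma^2_X; both values are close to .80
```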
If two tests have observed scores `\(X_1\)` and `\(X_2\)` that satisfy assumptions \eqref{truescore} to \eqref{corr3}, and if, for every population of examinees, `\(\tau_1=\tau_2+c\)`, where `\(c\)` is a constant, then the tests are called essentially `\(\tau\)`-equivalent tests. To put it differently, **essentially `\(\tau\)`-equivalent tests** have true scores that differ by a constant. --- # CTT If we assume that people are randomly selected, then `\(\tau\)` becomes a random variable as well (as seen in equation \eqref{truescore}) Therefore, in CTT we assume that `\(\varepsilon\)` (i.e., the error): * Has a mean of zero (i.e., `\(\mu = 0\)`) * Is normally distributed (i.e., `\(\mathcal{N}(0,\sigma)\)`) * Is uncorrelated with the true score (i.e., `\(\rho_{\varepsilon, \tau}=0\)`; see equation \eqref{corr1}) --- # CTT .pull-left[ .center[  ] ] .pull-right[ .center[  ] ] --- # CTT Measurement error `\((\varepsilon)\)` around a `\(\tau\)` can be large or small, for example `\(X_1\)`, `\(X_2\)`, and `\(X_3\)`. .center[  ] --- # CTT ## Domain Sampling Theory<sup>💡</sup> • Another central component of CTT • Another way of thinking about populations and samples • Domain — Population or universe of all possible items measuring a single concept or trait (theoretically infinite) • Test — a sample of items from that universe .footnote[<sup>💡</sup>Assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items. Domain sampling is the most common CTT model used for practical purposes.] --- # CTT ## Domain Sampling Theory • A person's true score would be obtained by having them respond to all items in the "universe" of items • We only see responses to the sample of items on the test • So, reliability is the proportion of variance in the "universe" explained by the test variance --- # CTT ## Domain Sampling Theory • A universe is made up of a (possibly infinitely) large number of items • So, as tests get longer they represent the domain better; therefore, longer tests should have higher reliability • Also, if we take multiple random samples from the population we can have a distribution of sample scores that represent the population --- # CTT ## Domain Sampling Theory • Each random sample from the universe would be "randomly parallel" to the others • Unbiased estimate of reliability: \begin{align} r_{1,\tau}=\sqrt{\bar{r}_{1,j}} \label{reliability} \end{align} * `\(r_{1,\tau} =\)` correlation between the test and the true score * `\(\bar r_{1,j} =\)` average correlation between the test and all other randomly parallel tests --- # CTT ## Reliability • Reliability is theoretically the correlation between `\(X\)` (the test score) and `\(\tau\)` (the true score), squared • Essentially, the proportion of `\(X\)` that is `\(\tau\)` `$$\rho^2_{X,\tau}=\frac{\sigma^2_{\tau}}{\sigma^2_{X}}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\varepsilon}} \label{reliabilityfrac}$$` • This can't be measured directly, so we use other methods to estimate it --- # CTT ## Reliability • Reliability can be viewed as a measure of consistency, or how well a test "holds together" • Reliability is measured on a scale from `\(0\)` to `\(1\)`. The greater the number, the higher the reliability<sup>⚠️</sup>. .footnote[<sup>⚠️</sup>Values very close to `\(1\)` can be seen as indicative of redundancy between the items.] 
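---

# CTT
## Domain Sampling and Reliability

A rough R sketch of the domain-sampling estimate in equation \eqref{reliability} (the number of "randomly parallel" tests, the sample size, and the error variance are illustrative assumptions, not values from these slides):

``` r
# Domain-sampling sketch: several "randomly parallel" tests from the same domain
# (simulated; the choice of 6 tests and sd = 1.5 is arbitrary)
set.seed(2)
n     <- 5000
tau   <- rnorm(n)                                # true scores
tests <- replicate(6, tau + rnorm(n, sd = 1.5))  # 6 randomly parallel tests

r_1j <- cor(tests[, 1], tests[, -1])  # correlations of test 1 with the other tests
sqrt(mean(r_1j))                      # estimated r_{1,tau} = sqrt of the average r_{1,j}
cor(tests[, 1], tau)                  # correlation with the true score, for comparison
```

With these simulated values both quantities land close to `\(1/\sqrt{3.25}\approx.55\)`, illustrating that the square root of the average correlation with randomly parallel tests recovers the correlation between the test and the true score.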
--- # CTT ## Reliability The approach to estimating reliability depends on: * Estimation of the "true" score * Source of measurement error Types of reliability: * Test-retest * Parallel forms * Split-half * Internal consistency --- # CTT ## Test-Retest Reliability • Evaluates the error associated with administering a test at two different times. • _Time Sampling Error_ • How-to: • Apply the psychometric instrument at Time 1 `\((X_1)\)` • Apply the psychometric instrument at Time 2 `\((X_2)\)` • Calculate `\(r_{X_1,X_2}\)` for the two scores • Easy to do; one test does it all. --- # CTT ## Test-Retest Reliability • Assume 2 administrations `\(X_1\)` and `\(X_2\)`: `$$\tau_{1,i} = \tau_{2,i} ~~~~~~ \sigma^2_{\varepsilon_{1,i}}=\sigma^2_{\varepsilon_{2,i}} \therefore \rho_{X_1,X_2}=\frac{\sigma_{X_1,X_2}}{\sigma_{X_1}\sigma_{X_2}}=\frac{\sigma^2_{\tau}}{\sigma^2_{X}}=\rho^2_{X,\tau}$$` • The correlation between the 2 administrations is the reliability --- # CTT ## Test-Retest Reliability • Sources of error: * random fluctuations in performance * uncontrolled testing conditions * extreme changes in weather * sudden noises/chronic noise * other distractions • Internal factors: * illness, fatigue, emotional strain, worry * recent experiences --- # CTT ## Test-Retest Reliability Generally used to evaluate constant traits: * Intelligence, personality Not appropriate for qualities that change rapidly over time: * Mood, hunger Problem: carryover effects (exposure to the test at Time 1 influences scores on the test at Time 2) Only a problem when the effects are random (not uniform): if everybody goes up 5 points, you still have the same variability --- # CTT ## Test-Retest Reliability • Practice effects * A type of carryover effect * Some skills improve with practice * Manual dexterity, ingenuity, or creativity * Practice effects may not benefit everybody in the same way. Carryover and practice effects are more of a problem with short inter-test intervals (ITIs). But longer ITIs have other problems: * developmental change, maturation, exposure to historical events --- # CTT ## Parallel Forms Reliability Evaluates the error associated with selecting a particular set of items. _Item Sampling Error_ How to: * Develop a large pool of items (i.e., a domain) of varying difficulty. * Choose equal distributions of difficult/easy items to produce multiple forms of the same test. * Give both forms close in time. * Calculate `\(r\)` for the two administrations. --- # CTT ## Parallel Forms Reliability Also known as _Alternative Forms_ or _Equivalent Forms_ Can give parallel forms at different points in time; produces error estimates of both time and item sampling. One of the most rigorous assessments of reliability currently in use. Infrequently used in practice, as it is too expensive to develop two tests. --- # CTT ## Parallel Forms Reliability Assume 2 parallel tests `\(X\)` and `\(X^\prime\)`: `$$\tau_i=\tau^\prime_i ~~~~~~ \sigma^2_{\varepsilon_i}=\sigma^2_{\varepsilon^\prime_i}$$` `$$\therefore \rho_{XX^\prime} = \frac{\sigma_{XX^\prime}}{\sigma_{X}\sigma_{X^\prime}}= \frac{\sigma^2_{\tau}}{\sigma^2_{X}}=\rho^2_{X,\tau}$$` • The correlation between the 2 parallel forms is the reliability --- # CTT ## Split-Half Reliability What if we treat the halves of one test as parallel forms? (A single test as the whole domain) That's what split-half reliability does This is testing for _Internal Consistency_ * Scores on one half of a test are correlated with scores on the second half of the test Big question: "How to split?": * First half vs. 
last half * Odd vs. even * Create item groups called testlets --- # CTT ## Split-Half Reliability How to: * Compute scores for the two halves of a single test, then calculate `\(r\)`. Problem: * Considering domain sampling theory, what's wrong with this approach? * A `\(20\)`-item test cut in half is two `\(10\)`-item tests; what does that do to the reliability? * If only we could correct for that… --- # CTT ## Spearman-Brown Formula Estimates the reliability for the entire test based on the split-half. Can also be used to estimate the effect that changing the number of items on a test has on the reliability `\(r^\ast = \frac{j(r)}{1+(j-1)r}\)` where `\(r^\ast\)` is the estimated reliability, `\(r\)` is the correlation between the halves, and `\(j\)` is the new length as a proportion of the old length --- # CTT ## Spearman-Brown Formula For a split-half it would be: `$$r^\ast=\frac{2(r)}{(1+r)}$$` Since the full length of the test is twice the length of each half --- # CTT ## Spearman-Brown Formula **Example 1:** a 30-item test with a split-half reliability of `\(.65\)` `$$r^\ast=\frac{2(.65)}{(1+.65)}=.79$$` • The `\(.79\)` is a much better reliability estimate than the `\(.65\)` --- # CTT ## Spearman-Brown Formula **Example 2**: a 30-item test with a test-retest reliability of `\(.65\)` is lengthened to `\(90\)` items `$$r^\ast=\frac{3(.65)}{1+(3-1).65}=\frac{1.95}{2.3}=.85$$` **Example 3**: a 30-item test with a test-retest reliability of `\(.65\)` is cut to 15 items `$$r^\ast=\frac{.5(.65)}{1+(.5-1).65}=\frac{.325}{.675}=.48$$` --- # CTT ## Detour 1: Variance Sum Law Often multiple items are combined in order to create a composite score. The variance of the composite is a combination of the variances and covariances of the items creating it. The general variance sum law states that if `\(X\)` and `\(Y\)` are random variables: `$$\sigma^2_{X \pm Y}=\sigma^2_{X}+\sigma^2_{Y}\pm2\sigma_{XY}$$` --- # CTT ## Detour 1: Variance Sum Law Given multiple variables we can create a variance/covariance matrix. For 3 items: <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> \(X_1\) </th> <th style="text-align:left;"> \(X_2\) </th> <th style="text-align:left;"> \(X_3\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X_1\) </td> <td style="text-align:left;"> \(\sigma^2_1\) </td> <td style="text-align:left;"> \(\sigma_{12}\) </td> <td style="text-align:left;"> \(\sigma_{13}\) </td> </tr> <tr> <td style="text-align:left;"> \(X_2\) </td> <td style="text-align:left;"> \(\sigma_{21}\) </td> <td style="text-align:left;"> \(\sigma^2_{2}\) </td> <td style="text-align:left;"> \(\sigma_{23}\) </td> </tr> <tr> <td style="text-align:left;"> \(X_3\) </td> <td style="text-align:left;"> \(\sigma_{31}\) </td> <td style="text-align:left;"> \(\sigma_{32}\) </td> <td style="text-align:left;"> \(\sigma^2_3\) </td> </tr> </tbody> </table> --- # CTT ## Detour 1: Variance Sum Law Example: variables `\(X\)`, `\(Y\)`, and `\(Z\)`. Covariance matrix: <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(X\) </th> <th style="text-align:right;"> \(Y\) </th> <th style="text-align:right;"> \(Z\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X\) </td> <td style="text-align:right;"> 55.83 </td> <td style="text-align:right;"> 29.52 </td> <td 
style="text-align:right;"> 30.33 </td> </tr> <tr> <td style="text-align:left;"> \(Y\) </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 17.49 </td> <td style="text-align:right;"> 16.15 </td> </tr> <tr> <td style="text-align:left;"> \(Z\) </td> <td style="text-align:right;"> 30.33 </td> <td style="text-align:right;"> 16.15 </td> <td style="text-align:right;"> 29.06 </td> </tr> </tbody> </table> By the variance sum law the composite variance would be: `$$\sigma^2_{X+Y+Z}=\sigma^2_{Total}=\sigma^2_{X}+\sigma^2_{Y}+\sigma^2_{Z}+2\sigma_{XY}+2\sigma_{XZ}+2\sigma_{YZ}$$` --- # CTT ## Detour 1: Variance Sum Law <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(X\) </th> <th style="text-align:right;"> \(Y\) </th> <th style="text-align:right;"> \(Z\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X\) </td> <td style="text-align:right;"> 55.83 </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 30.33 </td> </tr> <tr> <td style="text-align:left;"> \(Y\) </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 17.49 </td> <td style="text-align:right;"> 16.15 </td> </tr> <tr> <td style="text-align:left;"> \(Z\) </td> <td style="text-align:right;"> 30.33 </td> <td style="text-align:right;"> 16.15 </td> <td style="text-align:right;"> 29.06 </td> </tr> </tbody> </table> By the variance sum law the composite variance would be: `\(S^2_{total}=55.83+17.49+29.06+2\times29.52+2\times30.33+2\times16.15=254.38\)` --- # CTT ## Internal Consistency Reliability ⢠If items are measuring the same construct they should elicit similar if not identical responses ⢠Coefficient OR Cronbachās Alpha is a widely used measure of internal consistency for continuous data ⢠Knowing the a composite is a sum of the variances and covariances of a measure we can assess consistency by how much covariance exists between the items relative to the total variance --- # CTT ## Internal Consistency Reliability ⢠Coefficient Alpha is defined as: `$$\alpha = \frac{k}{k-1}\left(\frac{\sum S_{ij}}{S^2_{Total}}\right)$$` ⢠`\(S^2_{Total}\)` is the composite variance (if items were summed) ⢠`\(S_{ij}\)` is covariance between the `\(i^{th}\)` and `\(j^{th}\)` items where `\(i \neq j\)` ⢠`\(k\)` is the number of items --- # CTT ## Internal Consistency Reliability ⢠Using the same continuous items `\(X\)`, `\(Y\)` and `\(Z\)` ⢠The covariance matrix is: <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(X\) </th> <th style="text-align:right;"> \(Y\) </th> <th style="text-align:right;"> \(Z\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X\) </td> <td style="text-align:right;"> 55.83 </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 30.33 </td> </tr> <tr> <td style="text-align:left;"> \(Y\) </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 17.49 </td> <td style="text-align:right;"> 16.15 </td> </tr> <tr> <td style="text-align:left;"> \(Z\) </td> <td style="text-align:right;"> 30.33 </td> <td style="text-align:right;"> 16.15 </td> <td style="text-align:right;"> 29.06 </td> </tr> </tbody> </table> ⢠The total variance is `\(254.38\)` ⢠The sum of all the covariances is `\(152\)` 
`$$\alpha = \frac{k}{k-1}\left(\frac{\sum S_{ij}}{S^2_{Total}}\right)= \frac{3}{3-1}\left(\frac{152}{254.38}\right)=0.8962969$$` --- # CTT ## Internal Consistency Reliability • Coefficient Alpha can also be defined as: `$$\alpha=\frac{k}{k-1}\left(\frac{S^2_{Total}-\sum S^2_i}{S^2_{Total}}\right)$$` • `\(S^2_{Total}\)` is the composite variance (if the items were summed) • `\(S^2_{i}\)` is the variance of each item • `\(k\)` is the number of items --- # CTT ## Internal Consistency Reliability • Using the same continuous items `\(X\)`, `\(Y\)` and `\(Z\)` • The covariance matrix is: <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(X\) </th> <th style="text-align:right;"> \(Y\) </th> <th style="text-align:right;"> \(Z\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X\) </td> <td style="text-align:right;"> 55.83 </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 30.33 </td> </tr> <tr> <td style="text-align:left;"> \(Y\) </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 17.49 </td> <td style="text-align:right;"> 16.15 </td> </tr> <tr> <td style="text-align:left;"> \(Z\) </td> <td style="text-align:right;"> 30.33 </td> <td style="text-align:right;"> 16.15 </td> <td style="text-align:right;"> 29.06 </td> </tr> </tbody> </table> • The total variance is `\(254.38\)` • The sum of all the variances is `\(102.38\)` `\(\alpha=\frac{k}{k-1}\left(\frac{S^2_{Total}-\sum S^2_i}{S^2_{Total}}\right)=\frac{3}{3-1}\left(\frac{254.38-102.38}{254.38}\right)=0.8962969\)` --- # CTT ## Internal Consistency Reliability: Example <div class="pre-name">internal_consistency.R</div>
``` r
# download the example dataset
ds <- readr::read_csv('https://ndownloader.figshare.com/files/22299075')
# reliability (internal consistency) estimates via the "ufs" package
ufs::scaleStructure(dat = ds,
                    items = c("SIJS1", "SIJS2", "SIJS3", "SIJS4", "SIJS5"))
```
+ `dat` — sets the dataset.
+ `items` — sets the items for which the reliability (internal consistency) estimates should be computed.
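As a rough cross-check (assuming the SIJS items in `ds` are scored numerically), coefficient alpha can also be computed directly from the item variance/covariance matrix in base R, mirroring the formulas on the previous slides:

``` r
# Coefficient alpha from the item variance/covariance matrix
# (assumes the same ds and SIJS items as in the chunk above)
sijs <- ds[, c("SIJS1", "SIJS2", "SIJS3", "SIJS4", "SIJS5")]
S    <- cov(sijs, use = "pairwise.complete.obs")  # item variances and covariances
k    <- ncol(S)
(k / (k - 1)) * (sum(S) - sum(diag(S))) / sum(S)  # k/(k-1) * sum of covariances / S^2_Total
```

The result should be close to the Coefficient Alpha reported by `ufs::scaleStructure()` on the next slide.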
--- # CTT ## Internal Consistency Reliability: Example .scroll-output[ <div style="display:block;clear:both;" class="scale-structure-start"></div> <div class="scale-structure-container"> ### Scale structure #### Information about this scale <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Dataframe: </td> <td style="text-align:left;"> ds </td> </tr> <tr> <td style="text-align:left;"> Items: </td> <td style="text-align:left;"> SIJS1, SIJS2, SIJS3, SIJS4 & SIJS5 </td> </tr> <tr> <td style="text-align:left;"> Observations: </td> <td style="text-align:left;"> 1171 </td> </tr> <tr> <td style="text-align:left;"> Positive correlations: </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> Number of correlations: </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> Percentage positive correlations: </td> <td style="text-align:left;"> 100 </td> </tr> </tbody> </table> #### Estimates assuming interval level <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Omega (total): </td> <td style="text-align:right;"> 0.85 </td> </tr> <tr> <td style="text-align:left;"> Omega (hierarchical): </td> <td style="text-align:right;"> 0.81 </td> </tr> <tr> <td style="text-align:left;"> Revelle's Omega (total): </td> <td style="text-align:right;"> 0.88 </td> </tr> <tr> <td style="text-align:left;"> Greatest Lower Bound (GLB): </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> Coefficient H: </td> <td style="text-align:right;"> 0.89 </td> </tr> <tr> <td style="text-align:left;"> Coefficient Alpha: </td> <td style="text-align:right;"> 0.84 </td> </tr> </tbody> </table> ##### Confidence intervals <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Omega (total): </td> <td style="text-align:left;"> [0.83; 0.86] </td> </tr> <tr> <td style="text-align:left;"> Coefficient Alpha: </td> <td style="text-align:left;"> [0.83; 0.85] </td> </tr> </tbody> </table> #### Estimates assuming ordinal level <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ordinal Omega (total): </td> <td style="text-align:right;"> 0.88 </td> </tr> <tr> <td style="text-align:left;"> Ordinal Omega (hierarch.): </td> <td style="text-align:right;"> 0.88 </td> </tr> <tr> <td style="text-align:left;"> Ordinal Coefficient Alpha: </td> <td style="text-align:right;"> 0.88 </td> </tr> </tbody> </table> ##### Confidence intervals <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ordinal Omega (total): </td> <td style="text-align:left;"> [0.87; 0.89] </td> </tr> <tr> <td style="text-align:left;"> Ordinal Coefficient Alpha: </td> <td style="text-align:left;"> [0.86; 0.89] </td> </tr> </tbody> </table> Note: the normal point estimate and confidence interval for omega are based on the procedure suggested by Dunn, Baguley & Brunsden (2013) using the MBESS function ci.reliability, whereas the psych package point estimate was suggested in Revelle & Zinbarg (2008). See the help ('?ufs::scaleStructure') for more information. 
</div> <div style="display:block;clear:both;" class="scale-structure-end"></div> ] --- # CTT ## Internal Consistency Reliability • Coefficient Alpha is considered a lower-bound estimate of the reliability of continuous items • It was developed by Cronbach (1951) in the 1950s, but is based on an earlier formula by Kuder and Richardson (1937) that tackled internal consistency for dichotomous ("Yes"/"No", "Right"/"Wrong") items • ⚠️ Internal consistency estimates for ordinal data can be artificially attenuated if we assume interval-level data (Gadermann, Guhn, and Zumbo, 2012). --- # CTT ## Detour 2: Dichotomous Items • If `\(Y\)` is a dichotomous item: + `\(p\)` — proportion of successes OR items answered correctly + `\(q\)` — proportion of failures OR items answered incorrectly + `\(\bar Y=p\)` — observed proportion of successes + `\(S^2_Y = pq\)` --- # CTT ## Internal Consistency Reliability • Kuder and Richardson (1937) developed the `\(KR_{20}\)`, which is defined as: `$$KR_{20}=\alpha=\frac{k}{k-1}\left(\frac{S^2_{Total}-\sum pq}{S^2_{Total}}\right)$$` • Where `\(pq\)` is the variance of each dichotomous item • The `\(KR_{21}\)` is a quick-and-dirty estimate of the `\(KR_{20}\)` --- # CTT ## Reliability of Observations • What if you're not using a test but instead observing individuals' behaviors as a psychological assessment tool? • How can we tell if the judges (assessors) are reliable? --- # CTT ## Reliability of Observations • Typically a set of criteria is established for judging the behavior and the judge is trained on the criteria -- • Then, to establish the reliability of both the set of criteria and the judge, multiple judges rate the same series of behaviors -- • The correlation between the judges is the typical measure of reliability -- • But couldn't they agree by accident? Especially on dichotomous or ordinal scales? --- # CTT ## Reliability of Observations • `\(\kappa\)` (kappa) is a measure of inter-rater reliability that controls for chance agreement • Values range from `\(-1\)` (less agreement than expected by chance) to `\(1\)` (perfect agreement) • `\(\kappa \geq .75\)` — excellent • `\(.40 \leq \kappa < .75\)` — fair to good • `\(\kappa <.40\)` — poor --- # CTT ## Standard Error of Measurement • So far, the standard error of measurement has been approached as the error associated with trying to estimate a true score from a specific test • This error can come from many sources • We can calculate its size by: `$$S_{measurement} = S\sqrt{1-r}$$` • `\(S\)` is the standard deviation • `\(r\)` is the reliability --- # CTT ## Standard Error of Measurement • Using the same continuous items `\(X\)`, `\(Y\)` and `\(Z\)` • The total variance is 254.38 • `\(s = \sqrt{254.38} = 15.9492947\)` • `\(\alpha = 0.8962969\)` `$$s_{measurement}=15.9492947\times \sqrt{1-0.8962969}=5.1361464$$` --- # CTT ## The Prophecy Formula • How much reliability do we want? • Typically we want values above `\(.80\)` • What if we don't have them? • The Spearman-Brown formula can be algebraically rearranged to give `$$j=\frac{r_d\left(1-r_o\right)}{r_o\left(1-r_d\right)}$$` • `\(j\)` — number of tests of the current length that would be needed • `\(r_d\)` — desired reliability • `\(r_o\)` — observed reliability --- # CTT ## The Prophecy Formula • Using the same continuous items `\(X\)`, `\(Y\)` and `\(Z\)` • `\(\alpha = 0.8962969\)` • What if we want a `\(.95\)` reliability? 
`$$j=\frac{r_d\left(1-r_o\right)}{r_o\left(1-r_d\right)}=\frac{.95\left(1-0.8962969\right)}{0.8962969\left(1-.95\right)}=\frac{0.098518}{0.0448148}=2.1983333$$` • We need a test that is `\(2.2\)` times longer than the original • Nearly `\(7\)` items (`\(3 \times 2.2 \approx 6.6\)`) to achieve a `\(.95\)` reliability --- # CTT ## Attenuation • Correlations are typically sought at the true-score level, but the presence of measurement error can cloud (attenuate) the size of the relationship • We can correct the size of a correlation for the low reliability of the items. • This is called the correction for attenuation --- # CTT ## Attenuation • The correction for attenuation is calculated as: `$$\hat r_{12}=\frac{r_{12}}{\sqrt{r_{11}r_{22}}}$$` • `\(\hat r_{12}\)` — corrected correlation • `\(r_{12}\)` — uncorrected correlation • `\(r_{11}\)` and `\(r_{22}\)` — the reliabilities of the tests --- # CTT ## Attenuation • For example, if `\(X\)` and `\(Y\)` are correlated at `\(.45\)`, `\(X\)` has a reliability of `\(.8\)`, and `\(Y\)` has a reliability of `\(.6\)`, the corrected correlation is `$$\hat r_{12}=\frac{r_{12}}{\sqrt{r_{11}r_{22}}}=\frac{.45}{\sqrt{.8\times.6}}=\frac{.45}{\sqrt{.48}}=.65$$` --- # References Cronbach, L. J. (1951). "Coefficient alpha and the internal structure of tests". In: _Psychometrika_ 16.3, pp. 297-334. ISSN: 0033-3123. DOI: [10.1007/BF02310555](https://doi.org/10.1007%2FBF02310555). URL: [http://link.springer.com/10.1007/BF02310555](http://link.springer.com/10.1007/BF02310555). Gadermann, A. M., M. Guhn, and B. D. Zumbo (2012). "Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide". In: _Practical Assessment, Research & Evaluation_ 17.3, pp. 1-13. ISSN: 1531-7714. URL: [https://pareonline.net/pdf/v17n3.pdf](https://pareonline.net/pdf/v17n3.pdf). Kuder, G. F. and M. W. Richardson (1937). "The theory of the estimation of test reliability". In: _Psychometrika_ 2.3, pp. 151-160. ISSN: 0033-3123. DOI: [10.1007/BF02288391](https://doi.org/10.1007%2FBF02288391). URL: [http://link.springer.com/10.1007/BF02288391](http://link.springer.com/10.1007/BF02288391). --- # References --- class: center, bottom, inverse # More info -- Slides created with the <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> package [`xaringan`](https://github.com/yihui/xaringan). 
-- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> <g label="icon" id="layer6" groupmode="layer"> <path id="path2" d="M 132.62426,316.69067 C 119.2805,301.94483 112.56962,274.5073 112.56962,234.39862 v -54.79191 c 0,-37.32217 -5.81677,-63.58084 -17.532347,-78.83466 -11.6757,-15.293118 -31.159702,-22.922596 -58.353466,-22.922596 -5.958581,0 -11.409226,0.22492 -16.45319,0.5917 -5.04455,0.427121 -9.742846,1.037046 -14.1564111,1.83092 V 95.057199 H 16.671281 c 12.325533,0 20.908335,3.82414 25.667559,11.532201 4.77973,7.74964 7.139712,25.48587 7.139712,53.14663 v 68.01321 c 0,42.12298 13.016861,74.19672 39.233939,96.16314 19.627549,16.47424 46.636229,27.23363 81.030059,32.40064 v -20.17708 c -16.3928,-4.27176 -29.04346,-10.51565 -37.11829,-19.44413 z m 246.75144,0 c 13.34377,-14.74584 20.05466,-42.18337 20.05466,-82.29205 v -54.79191 c 0,-37.32217 5.81673,-63.58084 17.53235,-78.83466 11.67568,-15.293118 31.15971,-22.922596 58.35348,-22.922596 5.95858,0 11.40922,0.22492 16.45315,0.5917 5.04457,0.427121 9.74287,1.037046 14.15645,1.83092 v 14.785125 h -10.59712 c -12.32549,0 -20.90826,3.82414 -25.66752,11.532201 -4.77974,7.74964 -7.13972,25.48587 -7.13972,53.14663 v 68.01321 c 0,42.12298 -13.01688,74.19672 -39.23394,96.16314 -19.6275,16.47424 -46.63622,27.23363 -81.03006,32.40064 v -20.17708 c 16.39279,-4.27176 29.04347,-10.51565 37.11827,-19.44413 z M 303.95857,87.165762 c 8.42049,-6.691524 25.52576,-10.536158 51.23486,-11.492333 V 63.999997 H 156.80716 v 11.673432 c 26.1755,0.956175 43.38268,4.800809 51.68248,11.492333 8.31852,6.73139 12.40691,20.033568 12.40691,39.904818 V 384.6851 c 0,20.80641 -4.08839,34.5146 -12.40691,41.02332 -8.2998,6.56905 -25.50698,10.10729 -51.68248,10.65744 V 448 h 197.71597 l 0.67087,-11.63414 c -25.50471,-0.54955 -42.56835,-4.35266 -51.07201,-11.40918 -8.4182,-6.95638 -12.73153,-20.44184 -12.73153,-40.27158 V 127.07058 c 0,-19.87125 4.16983,-33.173428 12.56922,-39.904818 z" style="stroke-width:0.0753388"></path> </g></svg> + <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> = <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:red;"> [ comment ] <path d="M462.3 62.6C407.5 15.9 326 24.3 275.7 76.2L256 96.5l-19.7-20.3C186.1 24.3 104.5 15.9 49.7 62.6c-62.8 53.6-66.1 149.8-9.9 207.9l193.5 199.8c12.5 12.9 32.8 12.9 45.3 0l193.5-199.8c56.3-58.1 53-154.3-9.8-207.9z"></path></svg> -- <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 
14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> has infinite possibilities. -- Practice is the best strategy for learning. -- . -- _In God we trust, all others bring data_ -- Edwards Deming -- . -- . -- . -- THE END --- class: center, bottom, inverse 