class: center, middle, inverse, title-slide .title[ # Psychometrics Applied to Organizational and Work Psychology ] .subtitle[ ##
Classical Test Theory ] .author[ ### Jorge Sinval
Thaís Zerbini ] .date[ ### 2024-10-01 ] --- class: inverse, center, middle # Readings <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> <style> .orange { color: #EB811B; } .kbd { display: inline-block; padding: .2em .5em; font-size: 0.75em; line-height: 1.75; color: #555; vertical-align: middle; background-color: #fcfcfc; border: solid 1px #ccc; border-bottom-color: #bbb; border-radius: 3px; box-shadow: inset 0 -1px 0 #bbb } </style>
--- # Readings --- class: inverse, center, middle # CTT <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> --- # CTT ## Readings .can-edit.key-measurement[ - thing one - thing two - ... ] --- # CTT ## Theory and Assumptions - Classical Test Theory (CTT), also known as weak or true-score test theory - Called "classical" relative to Item Response Theory (IRT), which is a more modern approach - CTT describes a set of psychometric procedures used to assess the reliability, difficulty, discrimination, etc. of items and scales .footnote[In the context of CTT, a psychometric instrument is said to have evidence of reliability if the error around the true score `\((\tau)\)` is minimal. ] --- # CTT - CTT analyses are the most widely used form of psychometric analysis. - The statistics can be computed by readily available statistical packages - CTT analyses are performed on the test as a whole rather than on the items, and although item statistics can be generated, they apply only to that group of examinees on that collection of items --- # CTT • Assumes that every person has a true score on an item or a scale that we would obtain if we could measure it directly, without error • CTT assumes that a person's test score is composed of their "true" score plus some measurement error. • This is the common true-score model: \begin{align} X = \tau + \varepsilon \label{truescore} \end{align} An observed test score of a person is the sum of that person's true score and an error of measurement <center> <div class="figure" style="text-align: center"> <img src="assets/img/ctt.gif" alt="Graphical representation of the CTT. This figure was extracted from <a href="https://conjointly.com/kb/true-score-theory/">https://conjointly.com/kb/true-score-theory/</a>" width="30%" /> <p class="caption">Graphical representation of the CTT. This figure was extracted from <a href="https://conjointly.com/kb/true-score-theory/">https://conjointly.com/kb/true-score-theory/</a></p> </div> </center> --- # CTT - Based on the expected values of each component for each person, we can see that, as in \eqref{expectation}: \begin{align} \mathbb{E}(X_i)=\tau_i \label{expectation} \end{align} The expected value of the observed scores is the true score `\(\varepsilon_i=X_i-\tau_i\)` `\(\mathbb{E}(X_i-\tau_i)=\mathbb{E}(X_i)-\mathbb{E}(\tau_i)=\tau_i-\tau_i=0\)` `\(\varepsilon\)` and `\(X\)` are random variables, while `\(\tau\)` is a constant. However, this is theoretical and **not done at the individual level**. --- # CTT \begin{align} \rho_{\varepsilon,\tau}=0 \label{corr1} \end{align} The error of measurement on a test and the true scores on that test are uncorrelated \begin{align} \rho_{\varepsilon_1,\varepsilon_2}=0 \label{corr2} \end{align} Error scores on two different tests are uncorrelated \begin{align} \rho_{\varepsilon_1,\tau_2}=0 \label{corr3} \end{align} The error of measurement on a test and the true scores on all other tests are uncorrelated If two tests have observed scores `\(X\)` and `\(X^\prime\)` that satisfy assumptions \eqref{truescore} to \eqref{corr3}, and if, for every population of examinees, `\(\tau = \tau^\prime\)` and `\(\sigma^2_\varepsilon=\sigma^2_{\varepsilon^\prime}\)`, then the tests are called **parallel tests**. In other words, parallel tests have the same true scores and error variances. 
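A minimal R sketch of these assumptions (the sample size, means, and standard deviations below are illustrative, not taken from any real instrument): two forms built from the same simulated true scores and equal error variances behave as parallel tests, and their correlation approximates `\(\sigma^2_{\tau}/\sigma^2_{X}\)`.

``` r
# Minimal simulation of the true-score model X = tau + epsilon
# (purely illustrative values, not from a real instrument)
set.seed(1)
n    <- 10000
tau  <- rnorm(n, mean = 50, sd = 10)      # true scores
X_p1 <- tau + rnorm(n, mean = 0, sd = 5)  # parallel form 1
X_p2 <- tau + rnorm(n, mean = 0, sd = 5)  # parallel form 2: same tau, same error variance

cor(X_p1, X_p2)       # correlation between the parallel forms
var(tau) / var(X_p1)  # sigma^2_tau / sigma^2_X; both values are close to .80
```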
If two tests have observed scores `\(X_1\)` and `\(X_2\)` that satisfy assumptions \eqref{truescore} to \eqref{corr3}, and if, for every population of examinees, `\(\tau_1=\tau_2+c\)`, where `\(c\)` is a constant, then the tests are called essentially `\(\tau\)`-equivalent tests. To put it differently, **essentially `\(\tau\)`-equivalent tests** have true scores that differ by a constant. --- # CTT If we assume that people are randomly selected, then `\(\tau\)` becomes a random variable as well (as seen in equation \eqref{truescore}) Therefore, in CTT we assume that `\(\varepsilon\)` (i.e., the error): * Has a mean of zero (i.e., `\(\mu = 0\)`) * Is normally distributed (i.e., `\(\mathcal{N}(0,\sigma)\)`) * Is uncorrelated with the true score (i.e., `\(\rho_{\varepsilon, \tau}=0\)`; see equation \eqref{corr1}) --- # CTT .pull-left[ .center[  ] ] .pull-right[ .center[  ] ] --- # CTT Measurement error `\((\varepsilon)\)` around a `\(\tau\)` can be large or small, for example `\(X_1\)`, `\(X_2\)`, and `\(X_3\)`. .center[  ] --- # CTT ## Domain Sampling Theory<sup>💡</sup> • Another central component of CTT • Another way of thinking about populations and samples • Domain — Population or universe of all possible items measuring a single concept or trait (theoretically infinite) • Test — a sample of items from that universe .footnote[<sup>💡</sup>Assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items. Domain sampling is the most common CTT model used for practical purposes.] --- # CTT ## Domain Sampling Theory • A person's true score would be obtained by having them respond to all items in the "universe" of items • We only see responses to the sample of items on the test • So, reliability is the proportion of variance in the "universe" explained by the test variance --- # CTT ## Domain Sampling Theory • A universe is made up of a (possibly infinitely) large number of items • So, as tests get longer they represent the domain better; therefore, longer tests should have higher reliability • Also, if we take multiple random samples from the population we can have a distribution of sample scores that represent the population --- # CTT ## Domain Sampling Theory • Each random sample from the universe would be "randomly parallel" to the others • Unbiased estimate of reliability: \begin{align} r_{1,\tau}=\sqrt{\bar{r}_{1,j}} \label{reliability} \end{align} * `\(r_{1,\tau} =\)` correlation between the test and the true score * `\(\bar r_{1,j} =\)` average correlation between the test and all other randomly parallel tests --- # CTT ## Reliability • Reliability is theoretically the correlation between `\(X\)` (the test score) and `\(\tau\)` (the true score), squared • Essentially, the proportion of `\(X\)` that is `\(\tau\)` `$$\rho^2_{X,\tau}=\frac{\sigma^2_{\tau}}{\sigma^2_{X}}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\varepsilon}} \label{reliabilityfrac}$$` • This can't be measured directly, so we use other methods to estimate it --- # CTT ## Reliability • Reliability can be viewed as a measure of consistency, or how well a test "holds together" • Reliability is measured on a scale from `\(0\)` to `\(1\)`. The greater the number, the higher the reliability<sup>⚠️</sup>. .footnote[<sup>⚠️</sup>Values very close to `\(1\)` can be seen as indicative of redundancy between the items.] 
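---

# CTT
## Domain Sampling and Reliability

A rough R sketch of the domain-sampling estimate in equation \eqref{reliability} (the number of "randomly parallel" tests, the sample size, and the error variance are illustrative assumptions, not values from these slides):

``` r
# Domain-sampling sketch: several "randomly parallel" tests from the same domain
# (simulated; the choice of 6 tests and sd = 1.5 is arbitrary)
set.seed(2)
n     <- 5000
tau   <- rnorm(n)                                # true scores
tests <- replicate(6, tau + rnorm(n, sd = 1.5))  # 6 randomly parallel tests

r_1j <- cor(tests[, 1], tests[, -1])  # correlations of test 1 with the other tests
sqrt(mean(r_1j))                      # estimated r_{1,tau} = sqrt of the average r_{1,j}
cor(tests[, 1], tau)                  # correlation with the true score, for comparison
```

With these simulated values both quantities land close to `\(1/\sqrt{3.25}\approx.55\)`, illustrating that the square root of the average correlation with randomly parallel tests recovers the correlation between the test and the true score.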
--- # CTT ## Reliability The approach to estimating reliability depends on: * Estimation of the "true" score * Source of measurement error Types of reliability: * Test-retest * Parallel forms * Split-half * Internal consistency --- # CTT ## Test-Retest Reliability • Evaluates the error associated with administering a test at two different times. • _Time Sampling Error_ • How-to: • Apply the psychometric instrument at Time 1 `\((X_1)\)` • Apply the psychometric instrument at Time 2 `\((X_2)\)` • Calculate `\(r_{X_1,X_2}\)` for the two scores • Easy to do; one test does it all. --- # CTT ## Test-Retest Reliability • Assume 2 administrations `\(X_1\)` and `\(X_2\)`: `$$\tau_{1,i} = \tau_{2,i} ~~~~~~ \sigma^2_{\varepsilon_{1,i}}=\sigma^2_{\varepsilon_{2,i}} \therefore \rho_{X_1,X_2}=\frac{\sigma_{X_1,X_2}}{\sigma_{X_1}\sigma_{X_2}}=\frac{\sigma^2_{\tau}}{\sigma^2_{X}}=\rho^2_{X,\tau}$$` • The correlation between the 2 administrations is the reliability --- # CTT ## Test-Retest Reliability • Sources of error: * random fluctuations in performance * uncontrolled testing conditions * extreme changes in weather * sudden noises/chronic noise * other distractions • Internal factors: * illness, fatigue, emotional strain, worry * recent experiences --- # CTT ## Test-Retest Reliability Generally used to evaluate constant traits: * Intelligence, personality Not appropriate for qualities that change rapidly over time: * Mood, hunger Problem: carryover effects (exposure to the test at Time 1 influences scores on the test at Time 2) Only a problem when the effects are random (not uniform): if everybody goes up 5 points, you still have the same variability --- # CTT ## Test-Retest Reliability • Practice effects * A type of carryover effect * Some skills improve with practice * Manual dexterity, ingenuity, or creativity * Practice effects may not benefit everybody in the same way. Carryover and practice effects are more of a problem with short inter-test intervals (ITIs). But longer ITIs have other problems: * developmental change, maturation, exposure to historical events --- # CTT ## Parallel Forms Reliability Evaluates the error associated with selecting a particular set of items. _Item Sampling Error_ How to: * Develop a large pool of items (i.e., a domain) of varying difficulty. * Choose equal distributions of difficult/easy items to produce multiple forms of the same test. * Give both forms close in time. * Calculate `\(r\)` for the two administrations. --- # CTT ## Parallel Forms Reliability Also known as _Alternative Forms_ or _Equivalent Forms_ Can give parallel forms at different points in time; produces error estimates of both time and item sampling. One of the most rigorous assessments of reliability currently in use. Infrequently used in practice, as it is too expensive to develop two tests. --- # CTT ## Parallel Forms Reliability Assume 2 parallel tests `\(X\)` and `\(X^\prime\)`: `$$\tau_i=\tau^\prime_i ~~~~~~ \sigma^2_{\varepsilon_i}=\sigma^2_{\varepsilon^\prime_i}$$` `$$\therefore \rho_{XX^\prime} = \frac{\sigma_{XX^\prime}}{\sigma_{X}\sigma_{X^\prime}}= \frac{\sigma^2_{\tau}}{\sigma^2_{X}}=\rho^2_{X,\tau}$$` • The correlation between the 2 parallel forms is the reliability --- # CTT ## Split-Half Reliability What if we treat the halves of one test as parallel forms? (A single test as the whole domain) That's what split-half reliability does This is testing for _Internal Consistency_ * Scores on one half of a test are correlated with scores on the second half of the test Big question: "How to split?": * First half vs. 
last half * Odd vs. even * Create item groups called testlets --- # CTT ## Split-Half Reliability How to: * Compute scores for the two halves of a single test, then calculate `\(r\)`. Problem: * Considering domain sampling theory, what's wrong with this approach? * A `\(20\)`-item test cut in half is two `\(10\)`-item tests; what does that do to the reliability? * If only we could correct for that… --- # CTT ## Spearman-Brown Formula Estimates the reliability for the entire test based on the split-half. Can also be used to estimate the effect that changing the number of items on a test has on the reliability `\(r^\ast = \frac{j(r)}{1+(j-1)r}\)` where `\(r^\ast\)` is the estimated reliability, `\(r\)` is the correlation between the halves, and `\(j\)` is the new length as a proportion of the old length --- # CTT ## Spearman-Brown Formula For a split-half it would be: `$$r^\ast=\frac{2(r)}{(1+r)}$$` Since the full length of the test is twice the length of each half --- # CTT ## Spearman-Brown Formula **Example 1:** a 30-item test with a split-half reliability of `\(.65\)` `$$r^\ast=\frac{2(.65)}{(1+.65)}=.79$$` • The `\(.79\)` is a much better reliability estimate than the `\(.65\)` --- # CTT ## Spearman-Brown Formula **Example 2**: a 30-item test with a test-retest reliability of `\(.65\)` is lengthened to `\(90\)` items `$$r^\ast=\frac{3(.65)}{1+(3-1).65}=\frac{1.95}{2.3}=.85$$` **Example 3**: a 30-item test with a test-retest reliability of `\(.65\)` is cut to 15 items `$$r^\ast=\frac{.5(.65)}{1+(.5-1).65}=\frac{.325}{.675}=.48$$` --- # CTT ## Detour 1: Variance Sum Law Often multiple items are combined in order to create a composite score. The variance of the composite is a combination of the variances and covariances of the items creating it. The general variance sum law states that if `\(X\)` and `\(Y\)` are random variables: `$$\sigma^2_{X \pm Y}=\sigma^2_{X}+\sigma^2_{Y}\pm2\sigma_{XY}$$` --- # CTT ## Detour 1: Variance Sum Law Given multiple variables we can create a variance/covariance matrix. For 3 items: <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> \(X_1\) </th> <th style="text-align:left;"> \(X_2\) </th> <th style="text-align:left;"> \(X_3\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X_1\) </td> <td style="text-align:left;"> \(\sigma^2_1\) </td> <td style="text-align:left;"> \(\sigma_{12}\) </td> <td style="text-align:left;"> \(\sigma_{13}\) </td> </tr> <tr> <td style="text-align:left;"> \(X_2\) </td> <td style="text-align:left;"> \(\sigma_{21}\) </td> <td style="text-align:left;"> \(\sigma^2_{2}\) </td> <td style="text-align:left;"> \(\sigma_{23}\) </td> </tr> <tr> <td style="text-align:left;"> \(X_3\) </td> <td style="text-align:left;"> \(\sigma_{31}\) </td> <td style="text-align:left;"> \(\sigma_{32}\) </td> <td style="text-align:left;"> \(\sigma^2_3\) </td> </tr> </tbody> </table> --- # CTT ## Detour 1: Variance Sum Law Example: variables `\(X\)`, `\(Y\)`, and `\(Z\)`. Covariance matrix: <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(X\) </th> <th style="text-align:right;"> \(Y\) </th> <th style="text-align:right;"> \(Z\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X\) </td> <td style="text-align:right;"> 55.83 </td> <td style="text-align:right;"> 29.52 </td> <td 
style="text-align:right;"> 30.33 </td> </tr> <tr> <td style="text-align:left;"> \(Y\) </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 17.49 </td> <td style="text-align:right;"> 16.15 </td> </tr> <tr> <td style="text-align:left;"> \(Z\) </td> <td style="text-align:right;"> 30.33 </td> <td style="text-align:right;"> 16.15 </td> <td style="text-align:right;"> 29.06 </td> </tr> </tbody> </table> By the variance sum law the composite variance would be: `$$\sigma^2_{X+Y+Z}=\sigma^2_{Total}=\sigma^2_{X}+\sigma^2_{Y}+\sigma^2_{Z}+2\sigma_{XY}+2\sigma_{XZ}+2\sigma_{YZ}$$` --- # CTT ## Detour 1: Variance Sum Law <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(X\) </th> <th style="text-align:right;"> \(Y\) </th> <th style="text-align:right;"> \(Z\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X\) </td> <td style="text-align:right;"> 55.83 </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 30.33 </td> </tr> <tr> <td style="text-align:left;"> \(Y\) </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 17.49 </td> <td style="text-align:right;"> 16.15 </td> </tr> <tr> <td style="text-align:left;"> \(Z\) </td> <td style="text-align:right;"> 30.33 </td> <td style="text-align:right;"> 16.15 </td> <td style="text-align:right;"> 29.06 </td> </tr> </tbody> </table> By the variance sum law the composite variance would be: `\(S^2_{total}=55.83+17.49+29.06+2\times29.52+2\times30.33+2\times16.15=254.38\)` --- # CTT ## Internal Consistency Reliability ⢠If items are measuring the same construct they should elicit similar if not identical responses ⢠Coefficient OR Cronbachās Alpha is a widely used measure of internal consistency for continuous data ⢠Knowing the a composite is a sum of the variances and covariances of a measure we can assess consistency by how much covariance exists between the items relative to the total variance --- # CTT ## Internal Consistency Reliability ⢠Coefficient Alpha is defined as: `$$\alpha = \frac{k}{k-1}\left(\frac{\sum S_{ij}}{S^2_{Total}}\right)$$` ⢠`\(S^2_{Total}\)` is the composite variance (if items were summed) ⢠`\(S_{ij}\)` is covariance between the `\(i^{th}\)` and `\(j^{th}\)` items where `\(i \neq j\)` ⢠`\(k\)` is the number of items --- # CTT ## Internal Consistency Reliability ⢠Using the same continuous items `\(X\)`, `\(Y\)` and `\(Z\)` ⢠The covariance matrix is: <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(X\) </th> <th style="text-align:right;"> \(Y\) </th> <th style="text-align:right;"> \(Z\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X\) </td> <td style="text-align:right;"> 55.83 </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 30.33 </td> </tr> <tr> <td style="text-align:left;"> \(Y\) </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 17.49 </td> <td style="text-align:right;"> 16.15 </td> </tr> <tr> <td style="text-align:left;"> \(Z\) </td> <td style="text-align:right;"> 30.33 </td> <td style="text-align:right;"> 16.15 </td> <td style="text-align:right;"> 29.06 </td> </tr> </tbody> </table> ⢠The total variance is `\(254.38\)` ⢠The sum of all the covariances is `\(152\)` 
`$$\alpha = \frac{k}{k-1}\left(\frac{\sum S_{ij}}{S^2_{Total}}\right)= \frac{3}{3-1}\left(\frac{152}{254.38}\right)=0.8962969$$` --- # CTT ## Internal Consistency Reliability • Coefficient Alpha can also be defined as: `$$\alpha=\frac{k}{k-1}\left(\frac{S^2_{Total}-\sum S^2_i}{S^2_{Total}}\right)$$` • `\(S^2_{Total}\)` is the composite variance (if the items were summed) • `\(S^2_{i}\)` is the variance of each item • `\(k\)` is the number of items --- # CTT ## Internal Consistency Reliability • Using the same continuous items `\(X\)`, `\(Y\)` and `\(Z\)` • The covariance matrix is: <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(X\) </th> <th style="text-align:right;"> \(Y\) </th> <th style="text-align:right;"> \(Z\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(X\) </td> <td style="text-align:right;"> 55.83 </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 30.33 </td> </tr> <tr> <td style="text-align:left;"> \(Y\) </td> <td style="text-align:right;"> 29.52 </td> <td style="text-align:right;"> 17.49 </td> <td style="text-align:right;"> 16.15 </td> </tr> <tr> <td style="text-align:left;"> \(Z\) </td> <td style="text-align:right;"> 30.33 </td> <td style="text-align:right;"> 16.15 </td> <td style="text-align:right;"> 29.06 </td> </tr> </tbody> </table> • The total variance is `\(254.38\)` • The sum of all the variances is `\(102.38\)` `\(\alpha=\frac{k}{k-1}\left(\frac{S^2_{Total}-\sum S^2_i}{S^2_{Total}}\right)=\frac{3}{3-1}\left(\frac{254.38-102.38}{254.38}\right)=0.8962969\)` --- # CTT ## Internal Consistency Reliability: Example <div class="pre-name">internal_consistency.R</div>
``` r
# download the example dataset
ds <- readr::read_csv('https://ndownloader.figshare.com/files/22299075')
# reliability (internal consistency) estimates via the "ufs" package
ufs::scaleStructure(dat = ds,
                    items = c("SIJS1", "SIJS2", "SIJS3", "SIJS4", "SIJS5"))
```
+ `dat` — sets the dataset.
+ `items` — sets the items for which the reliability (internal consistency) estimates should be computed.
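As a rough cross-check (assuming the SIJS items in `ds` are scored numerically), coefficient alpha can also be computed directly from the item variance/covariance matrix in base R, mirroring the formulas on the previous slides:

``` r
# Coefficient alpha from the item variance/covariance matrix
# (assumes the same ds and SIJS items as in the chunk above)
sijs <- ds[, c("SIJS1", "SIJS2", "SIJS3", "SIJS4", "SIJS5")]
S    <- cov(sijs, use = "pairwise.complete.obs")  # item variances and covariances
k    <- ncol(S)
(k / (k - 1)) * (sum(S) - sum(diag(S))) / sum(S)  # k/(k-1) * sum of covariances / S^2_Total
```

The result should be close to the Coefficient Alpha reported by `ufs::scaleStructure()` on the next slide.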
--- # CTT ## Internal Consistency Reliability: Example .scroll-output[ <div style="display:block;clear:both;" class="scale-structure-start"></div> <div class="scale-structure-container"> ### Scale structure #### Information about this scale <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Dataframe: </td> <td style="text-align:left;"> ds </td> </tr> <tr> <td style="text-align:left;"> Items: </td> <td style="text-align:left;"> SIJS1, SIJS2, SIJS3, SIJS4 & SIJS5 </td> </tr> <tr> <td style="text-align:left;"> Observations: </td> <td style="text-align:left;"> 1171 </td> </tr> <tr> <td style="text-align:left;"> Positive correlations: </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> Number of correlations: </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> Percentage positive correlations: </td> <td style="text-align:left;"> 100 </td> </tr> </tbody> </table> #### Estimates assuming interval level <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Omega (total): </td> <td style="text-align:right;"> 0.85 </td> </tr> <tr> <td style="text-align:left;"> Omega (hierarchical): </td> <td style="text-align:right;"> 0.81 </td> </tr> <tr> <td style="text-align:left;"> Revelle's Omega (total): </td> <td style="text-align:right;"> 0.88 </td> </tr> <tr> <td style="text-align:left;"> Greatest Lower Bound (GLB): </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:left;"> Coefficient H: </td> <td style="text-align:right;"> 0.89 </td> </tr> <tr> <td style="text-align:left;"> Coefficient Alpha: </td> <td style="text-align:right;"> 0.84 </td> </tr> </tbody> </table> ##### Confidence intervals <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Omega (total): </td> <td style="text-align:left;"> [0.83; 0.86] </td> </tr> <tr> <td style="text-align:left;"> Coefficient Alpha: </td> <td style="text-align:left;"> [0.83; 0.85] </td> </tr> </tbody> </table> #### Estimates assuming ordinal level <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ordinal Omega (total): </td> <td style="text-align:right;"> 0.88 </td> </tr> <tr> <td style="text-align:left;"> Ordinal Omega (hierarch.): </td> <td style="text-align:right;"> 0.88 </td> </tr> <tr> <td style="text-align:left;"> Ordinal Coefficient Alpha: </td> <td style="text-align:right;"> 0.88 </td> </tr> </tbody> </table> ##### Confidence intervals <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ordinal Omega (total): </td> <td style="text-align:left;"> [0.87; 0.89] </td> </tr> <tr> <td style="text-align:left;"> Ordinal Coefficient Alpha: </td> <td style="text-align:left;"> [0.86; 0.89] </td> </tr> </tbody> </table> Note: the normal point estimate and confidence interval for omega are based on the procedure suggested by Dunn, Baguley & Brunsden (2013) using the MBESS function ci.reliability, whereas the psych package point estimate was suggested in Revelle & Zinbarg (2008). See the help ('?ufs::scaleStructure') for more information. 
</div> <div style="display:block;clear:both;" class="scale-structure-end"></div> ] --- # CTT ## Internal Consistency Reliability • Coefficient Alpha is considered a lower-bound estimate of the reliability of continuous items • It was developed by Cronbach (1951) in the 1950s, but is based on an earlier formula by Kuder and Richardson (1937) that tackled internal consistency for dichotomous ("Yes"/"No", "Right"/"Wrong") items • ⚠️ Internal consistency estimates for ordinal data can be artificially attenuated if we assume interval-level data (Gadermann, Guhn, and Zumbo, 2012). --- # CTT ## Detour 2: Dichotomous Items • If `\(Y\)` is a dichotomous item: + `\(p\)` — proportion of successes OR items answered correctly + `\(q\)` — proportion of failures OR items answered incorrectly + `\(\bar Y=p\)` — observed proportion of successes + `\(S^2_Y = pq\)` --- # CTT ## Internal Consistency Reliability • Kuder and Richardson (1937) developed the `\(KR_{20}\)`, which is defined as: `$$KR_{20}=\alpha=\frac{k}{k-1}\left(\frac{S^2_{Total}-\sum pq}{S^2_{Total}}\right)$$` • Where `\(pq\)` is the variance of each dichotomous item • The `\(KR_{21}\)` is a quick-and-dirty estimate of the `\(KR_{20}\)` --- # CTT ## Reliability of Observations • What if you're not using a test but instead observing individuals' behaviors as a psychological assessment tool? • How can we tell if the judges (assessors) are reliable? --- # CTT ## Reliability of Observations • Typically a set of criteria is established for judging the behavior and the judge is trained on the criteria -- • Then, to establish the reliability of both the set of criteria and the judge, multiple judges rate the same series of behaviors -- • The correlation between the judges is the typical measure of reliability -- • But couldn't they agree by accident? Especially on dichotomous or ordinal scales? --- # CTT ## Reliability of Observations • `\(\kappa\)` (kappa) is a measure of inter-rater reliability that controls for chance agreement • Values range from `\(-1\)` (less agreement than expected by chance) to `\(1\)` (perfect agreement) • `\(\kappa \geq .75\)` — excellent • `\(.40 \leq \kappa < .75\)` — fair to good • `\(\kappa <.40\)` — poor --- # CTT ## Standard Error of Measurement • So far, the standard error of measurement has been approached as the error associated with trying to estimate a true score from a specific test • This error can come from many sources • We can calculate its size by: `$$S_{measurement} = S\sqrt{1-r}$$` • `\(S\)` is the standard deviation • `\(r\)` is the reliability --- # CTT ## Standard Error of Measurement • Using the same continuous items `\(X\)`, `\(Y\)` and `\(Z\)` • The total variance is 254.38 • `\(s = \sqrt{254.38} = 15.9492947\)` • `\(\alpha = 0.8962969\)` `$$s_{measurement}=15.9492947\times \sqrt{1-0.8962969}=5.1361464$$` --- # CTT ## The Prophecy Formula • How much reliability do we want? • Typically we want values above `\(.80\)` • What if we don't have them? • The Spearman-Brown formula can be algebraically rearranged to give `$$j=\frac{r_d\left(1-r_o\right)}{r_o\left(1-r_d\right)}$$` • `\(j\)` — number of tests of the current length that would be needed • `\(r_d\)` — desired reliability • `\(r_o\)` — observed reliability --- # CTT ## The Prophecy Formula • Using the same continuous items `\(X\)`, `\(Y\)` and `\(Z\)` • `\(\alpha = 0.8962969\)` • What if we want a `\(.95\)` reliability? 
`$$j=\frac{r_d\left(1-r_o\right)}{r_o\left(1-r_d\right)}=\frac{.95\left(1-0.8962969\right)}{0.8962969\left(1-.95\right)}=\frac{0.098518}{0.0448148}=2.1983333$$` • We need a test that is `\(2.2\)` times longer than the original • Nearly `\(7\)` items (`\(3 \times 2.2 \approx 6.6\)`) to achieve a `\(.95\)` reliability --- # CTT ## Attenuation • Correlations are typically sought at the true-score level, but the presence of measurement error can cloud (attenuate) the size of the relationship • We can correct the size of a correlation for the low reliability of the items. • This is called the correction for attenuation --- # CTT ## Attenuation • The correction for attenuation is calculated as: `$$\hat r_{12}=\frac{r_{12}}{\sqrt{r_{11}r_{22}}}$$` • `\(\hat r_{12}\)` — corrected correlation • `\(r_{12}\)` — uncorrected correlation • `\(r_{11}\)` and `\(r_{22}\)` — the reliabilities of the tests --- # CTT ## Attenuation • For example, if `\(X\)` and `\(Y\)` are correlated at `\(.45\)`, `\(X\)` has a reliability of `\(.8\)`, and `\(Y\)` has a reliability of `\(.6\)`, the corrected correlation is `$$\hat r_{12}=\frac{r_{12}}{\sqrt{r_{11}r_{22}}}=\frac{.45}{\sqrt{.8\times.6}}=\frac{.45}{\sqrt{.48}}=.65$$` --- # References Cronbach, L. J. (1951). "Coefficient alpha and the internal structure of tests". In: _Psychometrika_ 16.3, pp. 297-334. ISSN: 0033-3123. DOI: [10.1007/BF02310555](https://doi.org/10.1007%2FBF02310555). URL: [http://link.springer.com/10.1007/BF02310555](http://link.springer.com/10.1007/BF02310555). Gadermann, A. M., M. Guhn, and B. D. Zumbo (2012). "Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide". In: _Practical Assessment, Research & Evaluation_ 17.3, pp. 1-13. ISSN: 1531-7714. URL: [https://pareonline.net/pdf/v17n3.pdf](https://pareonline.net/pdf/v17n3.pdf). Kuder, G. F. and M. W. Richardson (1937). "The theory of the estimation of test reliability". In: _Psychometrika_ 2.3, pp. 151-160. ISSN: 0033-3123. DOI: [10.1007/BF02288391](https://doi.org/10.1007%2FBF02288391). URL: [http://link.springer.com/10.1007/BF02288391](http://link.springer.com/10.1007/BF02288391). --- # References --- class: center, bottom, inverse # More info -- Slides created with the <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> package [`xaringan`](https://github.com/yihui/xaringan). 
-- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> <g label="icon" id="layer6" groupmode="layer"> <path id="path2" d="M 132.62426,316.69067 C 119.2805,301.94483 112.56962,274.5073 112.56962,234.39862 v -54.79191 c 0,-37.32217 -5.81677,-63.58084 -17.532347,-78.83466 -11.6757,-15.293118 -31.159702,-22.922596 -58.353466,-22.922596 -5.958581,0 -11.409226,0.22492 -16.45319,0.5917 -5.04455,0.427121 -9.742846,1.037046 -14.1564111,1.83092 V 95.057199 H 16.671281 c 12.325533,0 20.908335,3.82414 25.667559,11.532201 4.77973,7.74964 7.139712,25.48587 7.139712,53.14663 v 68.01321 c 0,42.12298 13.016861,74.19672 39.233939,96.16314 19.627549,16.47424 46.636229,27.23363 81.030059,32.40064 v -20.17708 c -16.3928,-4.27176 -29.04346,-10.51565 -37.11829,-19.44413 z m 246.75144,0 c 13.34377,-14.74584 20.05466,-42.18337 20.05466,-82.29205 v -54.79191 c 0,-37.32217 5.81673,-63.58084 17.53235,-78.83466 11.67568,-15.293118 31.15971,-22.922596 58.35348,-22.922596 5.95858,0 11.40922,0.22492 16.45315,0.5917 5.04457,0.427121 9.74287,1.037046 14.15645,1.83092 v 14.785125 h -10.59712 c -12.32549,0 -20.90826,3.82414 -25.66752,11.532201 -4.77974,7.74964 -7.13972,25.48587 -7.13972,53.14663 v 68.01321 c 0,42.12298 -13.01688,74.19672 -39.23394,96.16314 -19.6275,16.47424 -46.63622,27.23363 -81.03006,32.40064 v -20.17708 c 16.39279,-4.27176 29.04347,-10.51565 37.11827,-19.44413 z M 303.95857,87.165762 c 8.42049,-6.691524 25.52576,-10.536158 51.23486,-11.492333 V 63.999997 H 156.80716 v 11.673432 c 26.1755,0.956175 43.38268,4.800809 51.68248,11.492333 8.31852,6.73139 12.40691,20.033568 12.40691,39.904818 V 384.6851 c 0,20.80641 -4.08839,34.5146 -12.40691,41.02332 -8.2998,6.56905 -25.50698,10.10729 -51.68248,10.65744 V 448 h 197.71597 l 0.67087,-11.63414 c -25.50471,-0.54955 -42.56835,-4.35266 -51.07201,-11.40918 -8.4182,-6.95638 -12.73153,-20.44184 -12.73153,-40.27158 V 127.07058 c 0,-19.87125 4.16983,-33.173428 12.56922,-39.904818 z" style="stroke-width:0.0753388"></path> </g></svg> + <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> = <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:red;"> [ comment ] <path d="M462.3 62.6C407.5 15.9 326 24.3 275.7 76.2L256 96.5l-19.7-20.3C186.1 24.3 104.5 15.9 49.7 62.6c-62.8 53.6-66.1 149.8-9.9 207.9l193.5 199.8c12.5 12.9 32.8 12.9 45.3 0l193.5-199.8c56.3-58.1 53-154.3-9.8-207.9z"></path></svg> -- <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 
14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> has infinite possibilities. -- Practice is the best strategy for learning. -- . -- _In God we trust, all others bring data_ -- Edwards Deming -- . -- . -- . -- THE END --- class: center, bottom, inverse 