class: center, middle, inverse, title-slide

.title[
# Computational Psychometrics Applied to Engineering
]
.subtitle[
## Developing New Instruments
]
.author[
### Jorge Sinval
]
.date[
### 2025-11-21
]

---
class: inverse, center, middle

# .white[Create or Adapt?]

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>

<style> .orange { color: #EB811B; } .white { color: #FFFFFF; } .red { color: #FF0000; } .green { color: #00FF00; } .kbd { display: inline-block; padding: .2em .5em; font-size: 0.75em; line-height: 1.75; color: #555; vertical-align: middle; background-color: #fcfcfc; border: solid 1px #ccc; border-bottom-color: #bbb; border-radius: 3px; box-shadow: inset 0 -1px 0 #bbb } </style>
---

# Create or Adapt?

## Create

.can-edit.key-validity[
- advantages
- disadvantages
- ...
]

## Adapt

.can-edit.key-validity[
- advantages
- disadvantages
- ...
]

--

---

# Create or Adapt?

## Too many psychometric instruments? Yes...

.pull-left-1[
<iframe src="https://www.linkedin.com/embed/feed/update/urn:li:share:7217498040100696065" height="732" width="504" frameborder="0" allowfullscreen="" title="Embedded post"></iframe>
]

.pull-right-2[
<div class="figure" style="text-align: center">
<img src="assets/img/constructs_redundancy.jpeg" alt="See Banks, Gooty, Ross, Williams, and Harrington (2018)." width="50%" />
<p class="caption">See Banks, Gooty, Ross, Williams, and Harrington (2018).</p>
</div>
]

---

# Create or Adapt?

## Too many psychometric instruments? Yes...

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt4[
A large proportion of measures are only used once or twice, and the widespread lack of agreement about their implementation threatens research credibility. The <strong>S</strong>tandardisation <strong>O</strong>f <strong>BE</strong>havior <strong>R</strong>esearch (SOBER) guidelines specifically address issues of flexibility and norming in measurement.

.tr[
📖 (Elson, Hussey, Alsalti, and Arslan, 2023)
]]

---

# Create or Adapt?

## Too many new measures?

<div class="figure" style="text-align: center">
<img src="assets/img/Elson2023.png" alt="Measurement proliferation in psychology. Reprinted from Elson Hussey et al. (2023)." width="50%" />
<p class="caption">Measurement proliferation in psychology. Reprinted from Elson Hussey et al. (2023).</p>
</div>

---

# Create or Adapt?

## What to Report?

.font60[
|Policy|Compliance by authors|Enforcement by reviewers and editors|
|---|---|---|
|Demonstrate nonredundancy|When creating a new measure for a primary study, provide evidence of nonredundancy (with other measures and/or constructs) or incremental validity in a separate sample, e.g., through latent variable-based analyses of associations with a large selection of scales as opposed to simple correlations.|Require validation and norming in independent data, or an explanation why this is not necessary. Reject studies that use novel/ad hoc measures without providing validity evidence from independent data.|
|Demonstrate protocol adherence|When following a measurement procedure published elsewhere, cite relevant protocols, and demonstrate you adhered to them (e.g., by sharing study materials).|Check claims of protocol adherence by comparing materials and cited sources. Journals dedicate resources to this task.|
|Justify Modifications|Prove that any deviation or modification to an existing measure is either meaningful (e.g., to address non-invariance of a measure between samples) or irrelevant, e.g., by providing validity evidence in a separate sample. Document when deviations happened, and if possible assess the robustness of conclusions.|Discourage authors from modifying existing scales by, for instance, dropping “poorly performing” items if the authors cannot show how these deviations affect the measure out-of-sample. Journals incentivise methodological research primarily providing validity evidence for commonly used measures rather than answering substantive research questions.|
|Preregistration and Registered Reports|Determine procedural and scoring details ahead of time, reporting all deviations as they potentially weaken the strength of conclusions. Analytical and statistical decisions should be preregistered with code rather than a narrative description.|Require authors to provide rationales for decision making in measurement specifically. Compare preregistrations (and any reported deviations) with what was actually employed and reported.|
|Comprehensive Reporting|Report all of the items, stimuli, instructions, procedural parameters or other measurement characteristics used in a study or generated during the development process.|Check for comprehensive data and materials beyond what is reported in the manuscript.|
|Facilitate Research Synthesis|Report standard deviations and means (regardless of whether data are shared); do not exclusively report effect sizes relative to the in-sample variation (so-called standardised effect sizes like Cohen’s `\(d\)`, correlations, `\(r^2\)`).|Insist on complete descriptive statistics to make rigorous meta-analysis feasible. Suggest effect sizes be standardised by test norms instead of in-sample variation.|
]

The SOBER policies for psychological journals, how authors should comply with them, and how editors and reviewers should enforce them. Reprinted from Elson Hussey et al. (2023). For more detail, see Supplementary Table 1 (<a href="https://static-content.springer.com/esm/art%3A10.1038%2Fs44271-023-00026-9/MediaObjects/44271_2023_26_MOESM1_ESM.pdf">here</a>).

---

# Create or Adapt?

## Too many psychometric instruments? No...

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt4[
Measurement proliferation is an important part of theory development, and standardization comes at a cost to the sensitive treatment of the contextual characteristics of a given sample.

.tr[
📖 (Iliescu, Greiff, Ziegler, Nye, Geisinger, Sellbom, Samuel, and Saklofske, 2024)
]]

---

# Developing New Instruments

## What to Report?

Questionable measurement practices, including poor reporting (Flake and Fried, 2020), are a serious problem.

.font60[
|Question|Information to report|
|---|---|
|1. What is your construct?|Define the construct<br>Describe theories and research supporting the construct|
|2. Why and how did you select your measure?|Justify the measure selection<br>Report existing validity evidence|
|3. What measure did you use to operationalize the construct?|Describe the measure and administration procedure<br>Match the measure to the construct|
|4. How did you quantify your measure?|Describe response coding and transformation<br>Report the items or stimuli included in each score<br>Describe the calculation of scores<br>Describe all conducted (e.g., psychometric) analyses|
|5. Did you modify the scale? And if so, how and why?|Describe any modifications<br>Indicate if modifications occurred before or after data collection<br>Provide justification for modifications|
|6. Did you create a measure on the fly?|Justify why you did not use an existing measure<br>Report all measurement details for the new measure<br>Describe all available validity evidence; if there is no evidence, report that|
]

Six Questions to Promote Transparent Reporting of Measurement Practices. Reprinted from Flake and Fried (2020).

---

# Questionable Practices

.pull-left-1[
<iframe src="https://www.linkedin.com/embed/feed/update/urn:li:share:7202277872902836226" height="463" width="504" frameborder="0" allowfullscreen="" title="I-O Psych Memes"></iframe>
]

.pull-right-2[
<div class="figure" style="text-align: center">
<img src="assets/img/meme_items.jpeg" alt="Adding items from different instruments." width="65%" />
<p class="caption">Adding items from different instruments.</p>
</div>
]

---

# Choice of Instrument

Before embarking on the development of a new psychometric instrument for research or practice, it is advisable to search for an existing instrument that already meets the desired requirements (Gall, Gall, and Borg, 2014).

---

# Importance of Instrument Choice

Choosing an appropriate measurement instrument is essential, because instruments inform important decisions about individuals, organizations, and institutions.
---

# Strategy for Choosing Instruments

Nitko's (2015) review-and-evaluation strategy for choosing instruments includes six steps:

--

Step 1: Clarify the Purpose

--

Step 2: Use the Local Context

--

Step 3: Study Professional Reviews

--

Step 4: Obtain Copies of the Instrument

--

Step 5: Summarize Strengths and Weaknesses

--

Step 6: Reach a Conclusion

---

# Step 1: Clarify the Purpose

Define the objectives and specific needs for which the instrument will be used, considering the target population and intended outcomes.

---

# Step 2: Use the Local Context

Consider how the instrument will be used in the specific context, and identify the needs, challenges, and resources available.

---

# Step 3: Study Professional Reviews

Examine reviews of psychometric instruments from sources such as the Mental Measurements Yearbook or the [Test Collection website](https://www.ets.org/test-collection.html) of the Educational Testing Service, as well as scientific articles, technical manuals, and user guides.

---

# Step 4: Obtain Copies of the Instrument

Review the instrument materials, including test items, administration and scoring manuals, and any supplemental materials.

---

# Step 5: Summarize Strengths and Weaknesses

Based on the review and evaluation, identify the strengths and weaknesses of the instrument in relation to the specific context and needs.

---

# Step 6: Reach a Conclusion

Make a final decision on the appropriateness of the instrument for the intended purposes, and prepare a justification for the choice based on the evidence gathered during the evaluation.

---

# Development of New Psychometric Instruments

Creating a new instrument requires a sufficiently strong underlying reason to justify its construction (Moreau and Wiebels, 2022).

---

# What does the test measure?
Them: What is that test actually measuring?<br><br>The researcher:<br>

.center[
<video controls src="assets/vid/Aishaishcaliperce2024.mp4" height="400"></video>

(Aish [@aish_caliperce], 2024)
]

---

# Guidelines for Developing New Instruments

Several guidelines exist for developing new psychometric instruments. Some examples:

## Bandalos (2018)

1. State the purpose of the scale
2. Identify and define the domain
3. Determine whether a measure already exists
4. Determine the item format
5. Write out the testing objectives
6. Create the initial item pool
7. Conduct the initial item review
8. Conduct preliminary item tryouts
9. Conduct a large-scale field test of items
10. Prepare guidelines for administration

---

# Guidelines for Developing New Instruments

## Boateng, Neilands, Frongillo, Melgar-Quiñonez, and Young (2018)

1. Identification of the domain(s) and item generation
2. Consideration of content validity
3. Pre-testing questions
4. Sampling and survey administration
5. Item reduction
6. Extraction of latent factors
7. Tests of dimensionality
8. Tests of reliability
9. Tests of validity

--

Let's see it in more detail...

---

# Boateng Neilands et al. (2018)

.scroll-box-26[
<div class="figure" style="text-align: center">
<img src="assets/img/boateng2018.jpg" alt="An overview of the three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018)." width="50%" />
<p class="caption">An overview of the three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018).</p>
</div>
]

---

# Boateng Neilands et al. (2018)

<div class="figure" style="text-align: center">
<img src="assets/img/boateng2018_table1.png" alt="The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018)." width="100%" />
<p class="caption">The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018).</p>
</div>

---

# Boateng Neilands et al. (2018)

<div class="figure" style="text-align: center">
<img src="assets/img/boateng2018_table2.png" alt="The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018)." width="100%" />
<p class="caption">The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018).</p>
</div>

---

# Boateng Neilands et al. (2018)

<div class="figure" style="text-align: center">
<img src="assets/img/boateng2018_table3.png" alt="The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018)." width="69%" />
<p class="caption">The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018).</p>
</div>

---

# Boateng Neilands et al. (2018)

<div class="figure" style="text-align: center">
<img src="assets/img/boateng2018_table4.png" alt="The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018)." width="100%" />
<p class="caption">The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018).</p>
</div>

---

# Boateng Neilands et al. (2018)

<div class="figure" style="text-align: center">
<img src="assets/img/boateng2018_table5.png" alt="The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018)." width="100%" />
<p class="caption">The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018).</p>
</div>

---

# Boateng Neilands et al. (2018)

<div class="figure" style="text-align: center">
<img src="assets/img/boateng2018_table6.png" alt="The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018)." width="100%" />
<p class="caption">The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018).</p>
</div>

---

# Boateng Neilands et al. (2018)

<div class="figure" style="text-align: center">
<img src="assets/img/boateng2018_table7.png" alt="The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018)." width="75%" />
<p class="caption">The three phases and nine steps of scale development. Reprinted from Boateng Neilands et al. (2018).</p>
</div>

---

# Guidelines for Developing New Instruments

## Clark and Watson (2019)

### Substantive validity: Conceptualization and development of an initial item pool

- Conceptualization
- Literature review
- Hierarchical structure of constructs
- Creation of item pool
- Basic principles of item pool
- Choice of format
- Derivative versions

### Structural validity: Item selection and psychometric evaluation

- Test-construction strategies
- Initial data collection
- Psychometric evaluation: An iterative process

---

# Guidelines for Developing New Instruments

## Clark and Watson (2019)

### External validity: An ongoing process

- Convergent and discriminant validity
- Criterion validity
- Incremental validity
- Cross-method analyses

---

# Guidelines for Developing New Instruments

## And others...

- Cohen, Schneider, and Tobin (2022)
- Cooper (2023)
- Irwing and Hughes (2018)
- Lane, Raymond, and Haladyna (2016)
- Muñiz and Fonseca-Pedrero (2019)
- And, finally, the proposal by DeVellis and Thorpe (2021)...

---

# Steps for Developing New Scales

## DeVellis and Thorpe (2021)

I. One must clearly determine what one wants to measure

--

II. Generate an item pool

--

III. Determine the measurement format

--

IV. Have the initial set of items reviewed by experts

--

V. Cognitive interviewing

--

VI. Consider the inclusion of validation items

--

VII. Administer the items to a development sample

--

VIII. Evaluate the items

--

IX. Optimize scale length

---
class: inverse, center, middle

# .white[Applied Example]

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>

---

# I — Determine What to Measure

The first step in creating a new psychometric instrument is to clearly define what you want to measure. In our case, we want to measure "Satisfaction with Public Transport."

---

# I — Example

For instance, we might want to measure satisfaction with aspects such as the punctuality of the service, the comfort of the vehicles, the courtesy of the drivers, the cleanliness of the vehicles, and the cost of the service.

---

# I — Considerations

We must consider how specific the measure should be, what emphasis should be placed on one phenomenon rather than others, and whether it should be grounded in existing theory or instead open new intellectual paths.

---

# II — Generate an Item Pool

We should create an extensive pool of candidate items for possible inclusion in the scale.

---

# II — Example

For example, items could include: "The buses/trains always arrive on time", "The vehicles are always clean", "The drivers are always courteous", "The cost of the service is reasonable."

---

# II — Considerations

There is no specific number of initial items, nor an ideal ratio between the initial and the final number of items in scale development. During the initial phases of item bank construction, it is recommended that the number of initial items be .orange[at least twice] the number expected to form part of the final version of the measurement instrument.

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt3[
The item pool usually consists of at least 25 statements, and can be as many as 50 or more. The primary reason for using multiple items rather than a single statement is that each statement may have ambiguities and subtle biases, leading people to respond in a certain way.
By summing or averaging across multiple related items, the impact of biases and imperfections contained in individual items can be minimized. A secondary reason for using multiple items concerns breadth. Attitudes are often multifaceted, involving cognitions, emotions, and behavioral tendencies. **A single item is unlikely to capture the full scope of the attitude in question; using multiple items potentially ameliorates this problem.**

.tr[
(Fabrigar and Wood, 2007, p. 537)
]]

---

# II — Considerations

Some recommendations:

- Items should be **simple**, **straightforward**, and appropriate for the **target population's reading level**.

--

- Avoid:
  - Expressions that may become dated quickly
  - Colloquialisms that may not be familiar across different demographics
  - Items that everyone or no one will endorse
  - Complex or "double-barreled" items that assess more than one characteristic

--

- The exact phrasing of items can greatly influence the construct being measured. For example, including any negative mood term virtually guarantees a substantial neuroticism/negative affectivity component in an item (Clark and Watson, 2019).

---

# II — Considerations

- Items should allow the respondent to answer adequately.

--

- Complex questions make it hard for the respondent to answer correctly.

--

- The language used should be suitable for the target population.

--

- A balance between positively and negatively worded items is necessary.

---

# II — Considerations

- Negatively worded items sometimes hinder comprehension, so we must decide carefully whether to include items worded negatively as well as positively. The use of reversed items is a questionable practice, as it can introduce an additional source of variance (Suárez-Alvarez, Pedrosa, Lozano, García-Cueto, Cuesta, and Muñiz, 2018).

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt4[
When developing a new scale, researchers may want to reduce the risk of misresponse to reversed items by fully labeling their scales.
Otherwise, results may be biased against the inclusion of reversed items.

.tr[
(Weijters, Cabooter, and Schillewaert, 2010)
]]

---

# II — Considerations

## Effects on Factor Structure

Including both negatively and positively keyed items often introduces additional dimensions.

- This happens because items keyed in the same direction (positively or negatively) correlate more highly with each other than with items keyed in the opposite direction, a pattern that a single factor cannot explain.
- This can create confusion when assessing the dimensionality of a scale.
- Numerous studies have demonstrated this phenomenon, including:
  - Horan, DiStefano, and Motl (2003)
  - Hazlett-Stevens, Ullman, and Craske (2004)
  - Magazine, Williams, and Williams (1996)
  - Motl, Conroy, and Horan (2000)

---

# II — Considerations

**Possible solution for reversed items:**

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt4[
To minimize reversed item bias, researchers can disperse same-scale items throughout the questionnaire by including buffer items or items of other scales in between (Weijters, Geuens, and Schillewaert, 2009). In addition, when analyzing the data, researchers can include a method factor to account for reversed item bias (Weijters Geuens et al., 2009).

.tr[
(Weijters Cabooter et al., 2010)
]]

---

# II — Considerations

- Items should measure the respondent's actual behavior, not their idealized answer.

--

- Avoid having multiple items measuring the exact same facet of the construct.

--

- The instrument should cover different aspects of the variable.

--

- If all items are in the same format, the task may become tiring for the respondent.

--

- The instrument should measure what it proposes to measure and should appear to do so, although clear face validity can make a test more vulnerable to social desirability bias, as respondents may be influenced by what they perceive as socially desirable responses<sup>⚠️</sup>.
.footnote[
<sup>⚠️</sup> The term .red[face validity] originally aimed to incorporate the judgments of laypeople or experts into the final decisions on the "apparent validity" of an instrument. There has been considerable antipathy toward this term among many scholars (HajiPourNezhad, 2003).
]

---

# II — Considerations

- The instrument should have items that can differentiate among people with high, medium, and low scores.

- Items with words like "always", "never", or "extremely" can provoke distorted responses, so these terms should be used with care.

For more recommendations on item generation, see Angleitner and Wiggins (1986); Comrey (1988).

---

# III — Determine the Measurement Format

There are several measurement formats, and the item type must be compatible with the format to be used.

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt4[
Investigators... do not appear to recognise that the data they obtain will be partly dependent upon the scales and formats that they have used.

.tr[
(Hartley and Betts, 2010, p. 25)
]]

---

# III — Example

We could use a Likert scale, where 1 represents "Strongly Disagree" and 5 represents "Strongly Agree".

---

# III — Considerations

## Likert<sup>🤔</sup> scales (Likert, 1932)

- The most common format for measuring attitudes, opinions, and perceptions.

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
Likert scales are named after Rensis Likert (1903–1981) and are also known as summated rating scales because the scale score is a simple sum of responses over items. They are perhaps best known from items that allow respondents to express degrees of agreement, such as "Strongly agree," "Agree," "Agree somewhat," "Disagree somewhat," "Disagree," and "Strongly disagree" rather than a simple choice between agreement and disagreement. However, Likert scales contain a second element that Likert actually stressed more in his original article.
This involves using equally spaced integral scale values, most simply, 1, 2, 3,..., to scale items with ordinal response categories instead of a more formal algorithm. Consequently, the logic he used in his classic 1932 paper applies to any ordinal multicategory rating scale, which includes estimates of frequency, anchored scales, and confidence ratings.

.tr[
(Bernstein, 2005, p. 497)
]]

.footnote[<sup>🤔</sup> Despite his influence, Likert's name is often mispronounced with a long "i" ("lie-kurt"); Likert himself pronounced his name "lick-urt" with a short "i."]

---

# III — Considerations

## Likert scales (Likert, 1932)

- Likert (1932) developed a distinct method that involved rating the level of agreement or disagreement on empirically derived scales, preferably with 5 unweighted points. These ratings were assigned to one-directional attitude statements, with the middle point representing a neutral stance.

--

- A scale should only be referred to as a '**Likert scale**' if it adheres to Likert's guidelines. The term 'Likert-type'<sup>❓</sup> might be a more suitable substitute, but it doesn't clearly express how much the method differs from the original Likert procedure. For scales that do not fulfill Likert's criteria, I suggest using the more neutral term '.orange[ordered response scale]'<sup>💡</sup>.

--

- Although the Likert scale is widely used, it is not the only format available. Other formats include the Thurstone scale, the Guttman scale, the semantic differential scale, and the Stapel scale.

.footnote[<sup>❓ 💡</sup> See the discussion [here](http://core.ecu.edu/psyc/wuenschk/StatHelp/Likert.htm)]

---

# III — Considerations

## Ordered response scales

The term 'ordered response scale' is a more neutral term that can be used to describe any scale that has ordered response categories.
.font60[
|Response Set|1|2|3|4|5|
|:---:|:---:|:---:|:---:|:---:|:---:|
|**Frequency**|Never|Rarely|Sometimes|Often|Always|
|**Quality**|Very poor|Poor|Fair|Good|Excellent|
|**Intensity**|None|Very mild|Mild|Moderate|Severe|
|**Agreement**|Strongly disagree|Disagree|Neither agree nor disagree|Agree|Strongly agree|
|**Approval**|Strongly disapprove|Disapprove|Neutral|Approve|Strongly approve|
|**Awareness**|Not at all aware|Slightly aware|Moderately aware|Very aware|Extremely aware|
|**Importance**|Not at all important|Slightly important|Moderately important|Very important|Extremely important|
|**Familiarity**|Not at all familiar|Slightly familiar|Moderately familiar|Very familiar|Extremely familiar|
|**Satisfaction**|Very dissatisfied|Dissatisfied|Neither satisfied nor dissatisfied|Satisfied|Very satisfied|
|**Performance**|Far below average|Below average|Average|Above average|Far above average|
]

---

# III — Considerations

## Likert scales (Likert, 1932): The Implicit Model

Following classical psychometric theory, the model assumes that asking any question, such as one about attitudes toward a transportation mode, evokes a latent response, `\(\lambda_{ij}\)`, where `\(i\)` denotes the subject (out of `\(n\)` subjects) and `\(j\)` denotes the item (out of the `\(k\)` items defining the scale). Next, assume that `\(\lambda_{ij}\)` is a weighted combination of a systematic (true) component `\((\tau_{ij})\)` and an error component `\((\varepsilon_{ij})\)`, as defined by the following general model:

\begin{align}
\lambda\_{ij} = \beta\_j \tau\_{ij} + \gamma\_j \varepsilon\_{ij} \label{impmod}
\end{align}

where both the true and error components are standard normal random variables, `\(\beta_j\)` is the weight allocated to the true score, and `\(\gamma_j\)` is the weight allocated to the error term for that item. As a result, all that is assumed at this point is that the true and error scores are independent (uncorrelated) and that they combine linearly.
Further, assume that the error scores for different items are also uncorrelated. The two weights can be scaled to reflect the assumption that the true and error components are definable as standard normal variables (Bernstein, 2005). Because even nonlinear relations are often approximated by linear ones, and the very meanings of the terms "true score" and "error" imply independence, only one point is controversial.<sup>🤔</sup>

.footnote[<sup>🤔</sup> That is the view that the absolute magnitude of the response is the critical variable. This is generally regarded as a good assumption in such areas as intellectual functioning (the smarter you are, the better you perform). However, in many attitudinal areas, an unfolding model using distances from an ideal point may be more relevant; for example, a candidate may be either too liberal or too conservative to be preferred optimally by a given voter.]

---

# III — Considerations

## Likert scales (Likert, 1932): The Implicit Model

One common assumption is that the implicit responses are normally distributed; however, Micceri has made a well-stated objection to this Gaussian model. More critically, an important class of restrictions is often imposed that leads to the congeneric (i.e., measuring the same thing) test model:

1. True score components across items are linearly related to one another for a given respondent.
2. Errors are independent over items and respondents.

The parallel test model is even more restrictive. It further assumes that:

1. True score components for a given respondent across items are the same.
2. Their weighting is the same across items.
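---

# III — Considerations

## Implicit Model: A Simulation Sketch

Under the parallel test model just described (equal true scores and equal weights across items, independent errors), each standardized latent response can be written as `\(\lambda_{ij} = \beta \tau_i + \left(1-\beta^2\right)^{0.5} \varepsilon_{ij}\)`. The Python sketch below is only an illustration, assuming standard normal `\(\tau_i\)` and `\(\varepsilon_{ij}\)` and an arbitrary `\(\beta = 0.7\)`; the helper `simulate_parallel_items` is hypothetical. It shows that each item then has unit variance and that any two items correlate at about `\(\beta^2\)`.

```python
import numpy as np

def simulate_parallel_items(n_respondents, k_items, beta, seed=0):
    """Hypothetical helper: latent responses under the parallel test model,
    lambda_ij = beta * tau_i + sqrt(1 - beta**2) * eps_ij."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    rng = np.random.default_rng(seed)
    # One standard normal true score per respondent, shared by every item
    tau = rng.standard_normal((n_respondents, 1))
    # Independent standard normal errors for each respondent-item pair
    eps = rng.standard_normal((n_respondents, k_items))
    # Broadcasting tau across the k columns encodes "same true score on every item"
    return beta * tau + np.sqrt(1 - beta**2) * eps

lam = simulate_parallel_items(n_respondents=100_000, k_items=5, beta=0.7)

# Item variances: each should be close to 1 (standardized responses)
print(lam.var(axis=0).round(2))

# Mean inter-item correlation: should be close to beta**2 = 0.49
r = np.corrcoef(lam, rowvar=False)
print(r[~np.eye(5, dtype=bool)].mean().round(2))
```

Drawing `tau` once per respondent and broadcasting it across items is what encodes the first parallelism assumption; a congeneric variant would instead give each item its own weight.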
---

# III — Considerations

## Likert scales (Likert, 1932): The Implicit Model

This leads to:

\begin{align}
\lambda\_{ij} = \beta \tau\_{i} + \left(1-\beta^2\right)^{0.5} \varepsilon\_{ij} \label{impmod2}
\end{align}

Because of the first parallelism assumption, the item subscript can be dropped from the true score for a given respondent, and because of the second parallelism assumption, the item subscript can be dropped from the item weighting. The second congeneric assumption allows the same weight to be applied to each error observation. It can be shown that setting the error weight of Eq. \eqref{impmod} to `\(\left(1 -\beta^2\right)^{0.5}\)` simply standardizes the values of `\(\lambda_{ij}\)`. Given this standardization, `\(\beta\)` is the correlation between each response and the true score, so `\(\beta^2\)` can be interpreted as the item reliability.

---

# III — Considerations

## Likert scales (Likert, 1932): The Implicit Model

Finally, if we simply keep in mind that respondents inherently vary, their subscript can be eliminated, giving rise to:

\begin{align}
\lambda\_{j} = \beta \tau + \left(1-\beta^2\right)^{0.5} \varepsilon\_{j} \label{impmod3}
\end{align}

Strong as this model is, it probably fits quite well when items are sampled from a homogeneous pool (domain sampling). That is the case, for example, in an abilities test that requires respondents to add randomly selected pairs of three-digit numbers, in which the pool (domain) consists of the numbers `\(000, 001, \ldots, 999\)`. The concept of domain sampling may not apply to an unselected set of attitudinal items, but may at least be approximately true of a scale following item analysis in which the items have similar item/total correlations.

---

# III — Considerations

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt4[
If a researcher wants to relate variables and estimate linear relations using correlations, regression models, structural equation models (SEM), etc., a 5- (or 7-) point scale with endpoint labels is the best choice.
Respondents seem to use this format in a way that better conforms to linear models, thus providing higher criterion validity. The downside here is that reversed items may prove problematic in endpoint-labeled formats.

.tr[
(Weijters Cabooter et al., 2010, p. 245)
]]

---

# III — Considerations

From a purely statistical point of view, Likert scales offer advantages for later analysis, since they provide a specific metric, symmetry, a central neutral point, a measure of perceived agreement, and homogeneous intervals between the points of the scale.

--

- Two dominant response formats: dichotomous responding (e.g., true/false; yes/no) and Likert-type rating scales with three or more options.
- Recent research by Simms, Zelazny, Williams, and Bernstein (2019) shows that psychometric quality increases with up to six response options, but the number of options has less effect on validity.

---

# III — Considerations

Ordinal scales (e.g., Likert) come with different response formats: frequency, degree, similarity, and agreement.

--

Deciding whether to label all or only some response options is important.

--

Simms Zelazny et al. (2019) found no significant differences between odd and even numbers of response options.

---

# III — Considerations

**Less common formats**, such as checklists, are falling out of favor due to response biases.

--

Visual analog scales are regaining popularity, especially in medical studies.

--

Forced-choice formats are making a comeback, with the advantage of reducing the effect of socially desirable responding.

---

# III — Considerations

## Visual Analog Scales (VAS)

Visual analog scales (VAS) are continuous scales that allow respondents to mark a point along a line to indicate their response.

.pull-left[
Some advantages of VAS include (Voutilainen, Pitkäaho, Kvist, and Vehviläinen‐Julkunen, 2016):

- VAS are easy to use and understand.
- VAS are less prone to bias than Likert scales; - VAS better prevent ceiling effects than Likert scales; and - VAS questionnaires take `\(28\%\)` less time to complete than Likert questionnaires. ] .pull-right[ <div class="figure" style="text-align: center"> <img src="assets/img/visual_analog_scale.png" alt="The use of a ruler. Reprinted from Goetz and Causey (2009)." width="100%" /> <p class="caption">The use of a ruler. Reprinted from Goetz and Causey (2009).</p> </div> ] --- # III — Considerations ## VAS: Faces and Thermometer .pull-left[ <br> <br> <br> <div class="figure" style="text-align: center"> <img src="assets/img/vas_faces.jpg" alt="A series of smiley faces ranging from glum to happy. Reprinted from Chapman and Kirby-Turner (2002)." width="100%" /> <p class="caption">A series of "smiley" faces ranging from glum to happy. Reprinted from Chapman and Kirby-Turner (2002).</p> </div> ] .pull-right[ <div class="figure" style="text-align: center"> <img src="assets/img/vas_thermometer.jpg" alt="The use of a thermometer to self-report perceptions. Reprinted from Chapman and Kirby-Turner (2002)." width="10%" /> <p class="caption">The use of a thermometer to self-report perceptions. Reprinted from Chapman and Kirby-Turner (2002).</p> </div> ] --- # III — Considerations ## VAS: Example with Smiley Faces Please select the face that most accurately represents your feelings regarding each of the following questions. <div class="figure" style="text-align: center"> <img src="assets/img/vas_faces_ii.jpg" alt="Answer " width="55%" /> <p class="caption">Answer </p> </div> 1. How satisfied are you with the ease of navigating the new roundabout? 2. How comfortable did you feel while driving on the newly widened highway? 3. How safe do you feel at the pedestrian crosswalk with the new traffic light system? 4. How efficient do you find the new public transportation scheduling system? 5. How satisfactory is the new parking arrangement in the downtown area?
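---
# III — Considerations
## VAS: Scoring a Mark (Sketch)

A VAS response is usually scored by measuring the distance of the respondent's mark from the left anchor; on the common 100 mm line, that distance in millimetres is directly a 0-100 score. The minimal Python sketch below assumes the mark has already been measured in millimetres from the left end; the function name `vas_score` and the example line lengths are hypothetical.

```python
def vas_score(mark_mm: float, line_mm: float = 100.0) -> float:
    """Rescale a mark's distance from the left anchor (in mm) to a 0-100 score.

    Hypothetical helper: assumes the mark was measured from the left end of the line.
    """
    if line_mm <= 0:
        raise ValueError("line length must be positive")
    if not 0.0 <= mark_mm <= line_mm:
        raise ValueError("the mark must lie on the line")
    return 100.0 * mark_mm / line_mm


# Hypothetical marks: a 100 mm paper line and a 150 mm on-screen slider
print(vas_score(63.0))         # 63.0
print(vas_score(75.0, 150.0))  # 50.0
```

Rescaling by the line length keeps scores comparable when the same VAS is presented on lines of different physical lengths (e.g., paper versus on-screen sliders).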
--- # III — Considerations ## Numerical Rating Scales (NRS) and Verbal Rating Scales (VRS) Numerical rating scales (NRS) are a type of ordinal scale that uses numbers to rate a respondent's feelings or opinions. Verbal rating scales (VRS) are similar but use words instead of numbers. .pull-left[ <div class="figure" style="text-align: center"> <img src="assets/img/nrs.png" alt="Example of a numerical rating scale." width="100%" /> <p class="caption">Example of a numerical rating scale.</p> </div> ] .pull-right[ <div class="figure" style="text-align: center"> <img src="assets/img/vrs.jpg" alt="Example of a verbal rating scale." width="100%" /> <p class="caption">Example of a verbal rating scale.</p> </div> ] --- # III — Considerations ## VRS, VAS, Faces, NRS: An example <div class="figure" style="text-align: center"> <img src="assets/img/several_scales.png" alt="Examples of VRS, VAS, and NRS scales." width="100%" /> <p class="caption">Examples of VRS, VAS, and NRS scales.</p> </div> --- # III — Considerations ## Thurstone scale Thurstone scales are a type of attitude scale that uses a series of statements to gauge respondents' attitudes or opinions on a particular topic. Developing a Thurstone scale involves several steps. -- First, a set of statements about the topic is collected. Each statement is then rated by a panel of judges for how favorable it is toward the topic, typically on a scale ranging from 1 (most unfavorable) to 11 (most favorable). The average rating of each statement (often the median) becomes its scale value, and the statements are sorted from most unfavorable to most favorable. Finally, the statements are put into a survey or questionnaire and presented to respondents. -- Respondents indicate their agreement or disagreement with each statement. A respondent's overall attitude score is then typically computed as the mean (or median) scale value of the statements they agree with.
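---
# III — Considerations
## Thurstone scale: Scoring (Sketch)

The development steps above can be sketched in Python under two stated assumptions: each statement's scale value is the median of the judges' ratings, and a respondent's score is the mean scale value of the statements they agree with (a common Thurstone scoring rule). The judges' ratings, the statement keys, and the helper `thurstone_score` are hypothetical.

```python
from statistics import mean, median

# Hypothetical judges' ratings (1 = most unfavorable, 11 = most favorable)
judge_ratings = {
    "reliable": [9, 8, 10, 9, 9],
    "often_late": [2, 3, 2, 1, 2],
    "seat_available": [8, 7, 8, 9, 8],
    "overcrowded": [3, 3, 4, 2, 3],
}

# Scale value of each statement: the median of the judges' ratings
scale_values = {item: median(r) for item, r in judge_ratings.items()}


def thurstone_score(endorsed, values):
    """Mean scale value of the statements a respondent agrees with."""
    if not endorsed:
        raise ValueError("the respondent endorsed no statements")
    return mean([values[item] for item in endorsed])


print(scale_values)  # {'reliable': 9, 'often_late': 2, 'seat_available': 8, 'overcrowded': 3}
print(thurstone_score(["reliable", "seat_available"], scale_values))  # 8.5
```

The median makes a scale value less sensitive to a single judge with an extreme rating; statements on which the judges disagree widely are usually discarded before the final form is assembled.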
--- # III — Considerations ## Thurstone scale: Example Suppose a city's transportation department wants to gauge public opinion about the efficiency of its public transportation system. They might use a Thurstone scale with statements like: -- 1. The city's public transportation system is reliable. (Rated 9) 2. Buses and trains often run late. (Rated 2) 3. I can always find a seat on the bus or train. (Rated 8) 4. The city's public transportation system is overcrowded. (Rated 3) 5. The public transportation system is the easiest way to get around the city. (Rated 10) -- Respondents would then indicate whether they agree with each statement, and the scale values of the statements they endorse would be used to calculate an overall score reflecting their attitude toward the city's public transportation system. --- # III — Considerations ## Guttman scale Guttman scales (or scalograms) are a type of ordinal scale used in surveys and questionnaires to measure respondents' attitudes or behaviors on a unidimensional, hierarchical scale, where agreement with a higher-ranked item implies agreement with all lower-ranked items. -- For example, in transportation engineering, a Guttman scale might be used in a survey to gauge public opinion on a proposed transportation project, with statements such as: 1. "I am aware of the proposed transportation project." 2. "I understand the benefits of the proposed transportation project." 3. "I believe the proposed transportation project will improve traffic conditions." 4. "I am willing to support the proposed transportation project financially through increased taxes." 5. "I am actively advocating for the implementation of the proposed transportation project." --- # III — Considerations ## Semantic scale This type of scale makes extensive use of words rather than numbers. Respondents describe their feelings about the object of interest using a series of adjectives.
When bipolar adjectives are used at the end points of the scales, these are termed semantic differential scales. <div class="figure" style="text-align: center"> <img src="assets/img/semantic_scales.png" alt="The semantic scale and the semantic differential scales." width="35%" /> <p class="caption">The semantic scale and the semantic differential scales.</p> </div> --- # III — Considerations ## Stapel scale The Stapel scale is a type of unipolar rating scale that uses a single adjective placed at the scale's midpoint<sup>0️⃣</sup>. Respondents indicate their agreement or disagreement with the adjective by marking a point on a scale that ranges from `\(-X\)` to `\(+X\)` `\(\left(X \in \mathbb{Z}^{+}\right)\)`. -- It is not typically used in transportation engineering, but for illustration consider a hypothetical scenario in which a transportation engineer evaluates the performance of various transport systems in a city. The criteria might be safety, efficiency, and environmental impact, each rated from -5 (very poor) to +5 (excellent). .font60[ | -5 |-4|-3|-2|-1|Criteria<sup>🤔</sup>|1|2|3|4|5| |---|---|---|---|---|---|---|---|---|---|---| | -5 |-4|-3|-2|-1|Safe|1|2|3|4|5| | -5 |-4|-3|-2|-1|Efficient|1|2|3|4|5| | -5 |-4|-3|-2|-1|Environmental Impact|1|2|3|4|5| ] .footnote[<sup>0️⃣</sup>No neutral point. <sup>🤔</sup> Usually the criteria are presented in a random order to avoid bias.] --- # III — Considerations ## Stapel scale: Example Applying the scale... .font60[ | Transport Mode | Safety | Efficiency | Environmental Impact | |------------------|--------|------------|----------------------| | Bus | -1 | 2 | 3 | | Train | 4 | 5 | 1 | | Car | 1 | -2 | -5 | | Bike | 2 | 3 | 5 | | Walk | 3 | -1 | 4 | ] In this example, each transport mode is evaluated with a Stapel scale on three criteria.
For instance, the bus has slightly below average safety (-1), above average efficiency (2), and good environmental impact (3). --- # III — Considerations ## Final thoughts .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ Far from being "neutral measurement devices," the response alternatives that are provided to respondents do constitute a source of information that respondents actively use in determining their task and in constructing a reasonable answer. .tr[ 📖 (Schwarz, Knauper, Hippler, Noelle-Neumann, and Clark, 1991, p. 578) ]] --- # IV — Have Items Reviewed by Experts Other researchers (with expertise in the target constructs) should be asked to evaluate whether each item is pertinent and should be included. --- # IV — Example We could invite urban planners, public transportation officials, and social scientists to review our items and provide feedback. --- # IV — Considerations The goal is to maximize the validity evidence based on the content of the instrument. Metrics such as the Content Validity Index (CVI; Polit, Beck, and Owen, 2007) or the Content Validity Ratio (CVR; Gilbert and Prion, 2016) can be used for this purpose. --- # V — Cognitive Interviewing This qualitative step aims to understand the reasoning process behind respondents’ interpretation and understanding of the new items. --- # V — Example We could conduct interviews with a small sample of public transport users and ask them to think out loud as they respond to each item. --- # V — Considerations Cognitive interviewing can help identify possible problems with the items’ wording or with the response scale used. --- # VI — Consider the Inclusion of Validation Items It may be possible, and relatively convenient, to include some additional items in the same questionnaire to help refine the final validity evidence for the scale.
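---
# VI — Consider the Inclusion of Validation Items
## Sketch: Correlating with a Validation Measure

As one illustration of how such additional items can feed into validity evidence, the minimal Python sketch below correlates hypothetical total scores on the new scale with scores on an included measure of a related construct. A high correlation would be read as convergent evidence, whereas a theoretically unrelated measure would be expected to correlate weakly (divergent evidence). The data and the helper `pearson_r` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (n >= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical totals for six respondents
new_scale = [12, 15, 9, 20, 17, 11]  # total score on the new scale
related = [30, 34, 25, 41, 38, 27]   # included measure of a related construct

print(f"convergent r = {pearson_r(new_scale, related):.2f}")  # convergent r = 0.99
```

A very high correlation (as in these invented data) can also signal that the new scale is redundant with the existing measure, so the interpretation rests on theory rather than on a fixed cutoff.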
--- # VI — Example We could include items that detect response problems, such as socially desirable responding, as well as items that measure related constructs; comparing the new items' performance with these can serve as concurrent or divergent validity evidence. --- # VI — Considerations Items can also be added to detect other problems, such as insufficient effort responding or acquiescence. --- # VII — Administer Items to a Development Sample After obtaining the set of items, one has to administer them together to a representative sample of the intended population. --- # VII — Example We could administer our items to a diverse group of public transport users, making sure to include individuals from different demographic groups and users of different types of public transport. --- # VII — Considerations Typically, the analysis used in the next phase of development is Exploratory Factor Analysis (EFA), which should be used when there is no strong evidence about the structure the instrument should have (i.e., its dimensionality). However, if there is a firm a priori sense of the number of factors and of which items relate to which factors, Confirmatory Factor Analysis (CFA) should be used in the next step. --- # VIII — Evaluate the Items This is the moment in which the distributional properties of each item are evaluated. --- # VIII — Example This is the step in which factor analyses (CFA or EFA) are conducted to investigate dimensionality. Items can also be examined through their means, variances, correlations with items that theoretically belong to the same construct, and the reliability coefficients of the scores. --- # VIII — Considerations Even with a generous set of items, there is no guarantee that the desired latent variables are well manifested in those items.
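---
# VIII — Evaluate the Items
## Sketch: Item Statistics and Coefficient Alpha

A minimal Python sketch of this item-level screening: item means and variances, corrected item-total correlations (each item against the total of the remaining items), and coefficient alpha with "alpha if item deleted". The 6 × 4 response matrix and the helper names are hypothetical, and item 4 was deliberately made to relate weakly to the others. Alpha assumes essentially tau-equivalent items, so this is only a first screen alongside the factor analyses.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(items):
    """Coefficient alpha; `items` holds one list of scores per item."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(col) for col in items)    # sum of sample item variances
    return k / (k - 1) * (1 - item_var / variance(totals))


def item_analysis(responses):
    """Per-item mean, variance, corrected item-total r and alpha if deleted."""
    items = [list(col) for col in zip(*responses)]  # rows -> item columns
    stats = []
    for j, col in enumerate(items):
        rest = [sum(row) - row[j] for row in responses]  # total without item j
        stats.append({
            "mean": mean(col),
            "variance": variance(col),
            "r_corrected": pearson_r(col, rest),
            "alpha_if_deleted": cronbach_alpha(items[:j] + items[j + 1:]),
        })
    return cronbach_alpha(items), stats


# Hypothetical 5-point responses: rows = respondents, columns = items
responses = [
    [4, 5, 4, 3],
    [3, 3, 4, 2],
    [5, 5, 5, 4],
    [2, 2, 3, 4],
    [4, 4, 4, 3],
    [1, 2, 1, 3],
]

alpha, stats = item_analysis(responses)
print(f"alpha = {alpha:.3f}")  # alpha = 0.846
for j, s in enumerate(stats, start=1):
    print(f"item {j}: mean={s['mean']:.2f} var={s['variance']:.2f} "
          f"r_it={s['r_corrected']:.2f} alpha_if_deleted={s['alpha_if_deleted']:.2f}")
```

In these invented data, item 4 has the weakest corrected item-total correlation and the highest alpha if deleted (about 0.96 versus 0.85 for the full scale), which would flag it for review when optimizing scale length.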
--- # IX — Optimize Scale Length Shorter instruments are less of a chore for respondents, but longer scales tend to yield higher reliability estimates. --- # IX — Example However, if scale reliability is too low, brevity is no longer a virtue. Several improvements can be made from a statistical point of view, such as eliminating items with lower factor loadings, investigating which items contribute least to the reliability of the scale, and splitting the sample to test the resulting modifications. --- # IX — Considerations Brevity and reliability are inversely related, and the researcher must strike the balance between them that best suits the study. --- class: inverse, center, middle # .white[Additional Recommendations] <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> --- # Psychometric Evaluation: An Iterative Process Good scale construction is an iterative process involving cycles of preliminary measure development, data collection, psychometric evaluation, and revisions. --- # Importance of Validity Evidence It is critical to remain open to rethinking one’s initial construct, to "listen to the data" rather than "make the data talk." Often this involves slight tweaking, but it may involve more fundamental reconceptualization (Clark and Watson, 2019). --- # Convergence of Findings Seeking convergence between the findings of independently conducted research on the topic of interest and the psychometric evaluation of the scale is recommended for scale development. --- # References Aish [@aish_caliperce] (2024). _"What do you do for fun outside of work?" [Video attached]_. URL: [https://x.com/aish_caliperce/status/1825039750362792420](https://x.com/aish_caliperce/status/1825039750362792420). Angleitner, A. and J. S. Wiggins, ed. (1986). _Personality assessment via questionnaires: Current issues in theory and measurement_. Springer Berlin Heidelberg. DOI: [10.1007/978-3-642-70751-3](https://doi.org/10.1007%2F978-3-642-70751-3). Bandalos, D.
L. (2018). _Measurement theory and applications for the social sciences_. Methodology in the social sciences. New York, NY: The Guilford Press. ISBN: 9781462532131. Banks, G. C., J. Gooty, et al. (2018). "Construct redundancy in leader behaviors: A review and agenda for the future". In: _The Leadership Quarterly_ 29 (1), pp. 236-251. ISSN: 10489843. DOI: [10.1016/j.leaqua.2017.12.005](https://doi.org/10.1016%2Fj.leaqua.2017.12.005). Bernstein, I. H. (2005). "Likert scale analysis". In: _Encyclopedia of social measurement_. Ed. by K. Kempf-Leonard. Vol. 2. Amsterdam: Elsevier, pp. 497-504. DOI: [10.1016/b0-12-369398-5/00104-3](https://doi.org/10.1016%2Fb0-12-369398-5%2F00104-3). --- # References Boateng, G. O., T. B. Neilands, et al. (2018). "Best practices for developing and validating scales for health, social, and behavioral research: A primer". In: _Frontiers in Public Health_ 6, pp. 1-18. ISSN: 2296-2565. DOI: [10.3389/fpubh.2018.00149](https://doi.org/10.3389%2Ffpubh.2018.00149). Chapman, H. R. and N. Kirby-Turner (2002). "Visual/verbal analogue scales: Examples of brief assessment methods to aid management of child and adult patients in clinical practice". In: _British Dental Journal_ 193 (8), pp. 447-450. ISSN: 0007-0610. DOI: [10.1038/sj.bdj.4801593](https://doi.org/10.1038%2Fsj.bdj.4801593). URL: [https://www.nature.com/articles/4801593](https://www.nature.com/articles/4801593). Clark, L. A. and D. Watson (2019). "Constructing validity: New developments in creating objective measuring instruments". In: _Psychological Assessment_ 31.12, pp. 1412-1427. ISSN: 1939-134X. DOI: [10.1037/pas0000626](https://doi.org/10.1037%2Fpas0000626). URL: [http://doi.apa.org/getdoi.cfm?doi=10.1037/pas0000626](http://doi.apa.org/getdoi.cfm?doi=10.1037/pas0000626). Cohen, R. J., W. J. Schneider, et al. (2022). _Psychological testing and assessment: An introduction to tests and measurement_. 10th ed. McGraw-Hill. ISBN: 9781265799731. Comrey, A. L. (1988).
"Factor-analytic methods of scale development in personality and clinical psychology". In: _Journal of Consulting and Clinical Psychology_ 56.5, pp. 754-761. ISSN: 1939-2117. DOI: [10.1037/0022-006X.56.5.754](https://doi.org/10.1037%2F0022-006X.56.5.754). URL: [https://doi.apa.org/doi/10.1037/0022-006X.56.5.754](https://doi.apa.org/doi/10.1037/0022-006X.56.5.754). --- # References Cooper, C. (2023). _An introduction to psychometrics and psychological assessment: Using, interpreting and developing tests_. 2nd ed. London: Routledge. ISBN: 9781003240181. DOI: [10.4324/9781003240181](https://doi.org/10.4324%2F9781003240181). URL: [https://www.routledge.com/An-Introduction-to-Psychometrics-and-Psychological-Assessment-Using-Interpreting/Cooper/p/book/9781032146171](https://www.routledge.com/An-Introduction-to-Psychometrics-and-Psychological-Assessment-Using-Interpreting/Cooper/p/book/9781032146171), [https://www.taylorfrancis.com/books/9781003240181](https://www.taylorfrancis.com/books/9781003240181). DeVellis, R. F. and C. T. Thorpe (2021). _Scale development: Theory and applications_. 5th ed. Thousand Oaks, CA: SAGE. ISBN: 978-1544379340. URL: [https://us.sagepub.com/en-us/nam/scale-development/book269114](https://us.sagepub.com/en-us/nam/scale-development/book269114). Elson, M., I. Hussey, et al. (2023). "Psychological measures aren’t toothbrushes". In: _Communications Psychology_ 1 (1), p. 25. ISSN: 2731-9121. DOI: [10.1038/s44271-023-00026-9](https://doi.org/10.1038%2Fs44271-023-00026-9). URL: [https://www.nature.com/articles/s44271-023-00026-9](https://www.nature.com/articles/s44271-023-00026-9). Fabrigar, L. and J. K. Wood (2007). "Likert scaling". In: _Encyclopedia of measurement and statistics_. Ed. by N. J. Salkind. Vol. 1. Thousand Oaks, CA: SAGE Publications, pp. 536-540. Flake, J. K. and E. I. Fried (2020). "Measurement schmeasurement: Questionable measurement practices and how to avoid them".
In: _Advances in Methods and Practices in Psychological Science_ 3.4, pp. 456-465. ISSN: 2515-2459. DOI: [10.1177/2515245920952393](https://doi.org/10.1177%2F2515245920952393). URL: [http://journals.sagepub.com/doi/10.1177/2515245920952393](http://journals.sagepub.com/doi/10.1177/2515245920952393). --- # References Gall, J. P., M. D. Gall, et al. (2014). _Applying educational research: How to read, do, and use research to solve problems of practice_. 6th ed. Harlow: Pearson. ISBN: 978-1-292-04168-1. Gilbert, G. E. and S. Prion (2016). "Making sense of methods and measurement: Lawshe's content validity index". In: _Clinical Simulation in Nursing_ 12.12, pp. 530-531. ISSN: 18761399. DOI: [10.1016/j.ecns.2016.08.002](https://doi.org/10.1016%2Fj.ecns.2016.08.002). URL: [https://linkinghub.elsevier.com/retrieve/pii/S1876139916300688](https://linkinghub.elsevier.com/retrieve/pii/S1876139916300688). Goetz, A. T. and K. Causey (2009). "Sex differences in perceptions of infidelity: Men often assume the worst". In: _Evolutionary Psychology_ 7 (2), p. 147470490900700. ISSN: 1474-7049. DOI: [10.1177/147470490900700208](https://doi.org/10.1177%2F147470490900700208). URL: [http://journals.sagepub.com/doi/10.1177/147470490900700208](http://journals.sagepub.com/doi/10.1177/147470490900700208). HajiPourNezhad, G. (2003). "An approach to the validation of judgments in language testing". In: _Proceedings of the 2nd JALT Pan-SIG Conference_. Ed. by T. Newfields, S. Yamashita, A. Howard and C. Rinnert. Kyoto, pp. 80-84. URL: [http://jalt.org/pansig/2003/HTML/HajiPourNezhad.htm](http://jalt.org/pansig/2003/HTML/HajiPourNezhad.htm). Hartley, J. and L. R. Betts (2010). "Four layouts and a finding: The effects of changes in the order of the verbal labels and numerical values on Likert‐type scales". In: _International Journal of Social Research Methodology_ 13.1, pp. 17-27. ISSN: 1364-5579. DOI: [10.1080/13645570802648077](https://doi.org/10.1080%2F13645570802648077). 
URL: [http://www.tandfonline.com/doi/abs/10.1080/13645570802648077](http://www.tandfonline.com/doi/abs/10.1080/13645570802648077). --- # References Hazlett-Stevens, H., J. B. Ullman, et al. (2004). "Factor structure of the Penn State Worry Questionnaire". In: _Assessment_ 11.4, pp. 361-370. ISSN: 1073-1911. DOI: [10.1177/1073191104269872](https://doi.org/10.1177%2F1073191104269872). URL: [https://journals.sagepub.com/doi/10.1177/1073191104269872](https://journals.sagepub.com/doi/10.1177/1073191104269872). Horan, P. M., C. DiStefano, et al. (2003). "Wording effects in self-esteem scales: Methodological artifact or response style?". In: _Structural Equation Modeling: A Multidisciplinary Journal_ 10.3, pp. 435-455. ISSN: 1070-5511. DOI: [10.1207/S15328007SEM1003_6](https://doi.org/10.1207%2FS15328007SEM1003_6). URL: [https://www.tandfonline.com/doi/full/10.1207/S15328007SEM1003_6](https://www.tandfonline.com/doi/full/10.1207/S15328007SEM1003_6). Iliescu, D., S. Greiff, et al. (2024). "Proliferation of measures contributes to advancing psychological science". In: _Communications Psychology_ 2 (1), p. 19. ISSN: 2731-9121. DOI: [10.1038/s44271-024-00065-w](https://doi.org/10.1038%2Fs44271-024-00065-w). URL: [https://www.nature.com/articles/s44271-024-00065-w](https://www.nature.com/articles/s44271-024-00065-w). Irwing, P. and D. J. Hughes (2018). "Test development". In: _The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development_. Ed. by P. Irwing, T. Booth and D. Hughes. Hoboken, NJ: John Wiley & Sons. Chap. 1, pp. 3-47. DOI: [10.1002/9781118489772.ch1](https://doi.org/10.1002%2F9781118489772.ch1). Lane, S., M. R. Raymond, et al., ed. (2016). _Handbook of test development_. 2nd ed. New York, NY: Routledge. ISBN: 978-0-415-62601-9. DOI: [10.4324/9780203102961](https://doi.org/10.4324%2F9780203102961). 
URL: [http://www.tandfonline.com/doi/abs/10.1080/15305050701813433](http://www.tandfonline.com/doi/abs/10.1080/15305050701813433). --- # References Likert, R. (1932). "A technique for the measurement of attitudes". In: _Archives of Psychology_ 22.140, pp. 1-55. URL: [http://www.voteview.com/pdf/Likert_1932.pdf](http://www.voteview.com/pdf/Likert_1932.pdf), [http://psycnet.apa.org/psycinfo/1933-01885-001](http://psycnet.apa.org/psycinfo/1933-01885-001). Magazine, S. L., L. J. Williams, et al. (1996). "A confirmatory factor analysis examination of reverse coding effects in Meyer and Allen's Affective and Continuance Commitment Scales". In: _Educational and Psychological Measurement_ 56.2, pp. 241-250. ISSN: 0013-1644. DOI: [10.1177/0013164496056002005](https://doi.org/10.1177%2F0013164496056002005). URL: [https://journals.sagepub.com/doi/10.1177/0013164496056002005](https://journals.sagepub.com/doi/10.1177/0013164496056002005). Moreau, D. and K. Wiebels (2022). "Psychological constructs as local optima". In: _Nature Reviews Psychology_ 1.4, pp. 188-189. ISSN: 2731-0574. DOI: [10.1038/s44159-022-00042-2](https://doi.org/10.1038%2Fs44159-022-00042-2). URL: [https://www.nature.com/articles/s44159-022-00042-2](https://www.nature.com/articles/s44159-022-00042-2). Motl, R. W., D. E. Conroy, et al. (2000). "The Social Physique Anxiety Scale: An example of the potential consequence of negatively worded items in factorial validity studies". In: _Journal of Applied Measurement_ 1.4, pp. 327-345. ISSN: 1529-7713. URL: [http://www.ncbi.nlm.nih.gov/pubmed/12077461](http://www.ncbi.nlm.nih.gov/pubmed/12077461). Muñiz, J. and E. Fonseca-Pedrero (2019). "Ten steps for test development". In: _Psicothema_ 31.1, pp. 7-16. DOI: [10.7334/psicothema2018.291](https://doi.org/10.7334%2Fpsicothema2018.291). URL: [https://www.psicothema.com](https://www.psicothema.com). --- # References Nitko, A. J. (2015). _Using a mental measurements yearbook review and other materials to evaluate a test_.
URL: [https://buros.org/using-mental-measurements-yearbook-review-and-other-materials-evaluate-test](https://buros.org/using-mental-measurements-yearbook-review-and-other-materials-evaluate-test) (visited on Jul. 08, 2023). Polit, D. F., C. T. Beck, et al. (2007). "Is the CVI an acceptable indicator of content validity? Appraisal and recommendations". In: _Research in Nursing & Health_ 30.4, pp. 459-467. ISSN: 01606891. DOI: [10.1002/nur.20199](https://doi.org/10.1002%2Fnur.20199). URL: [https://onlinelibrary.wiley.com/doi/10.1002/nur.20199](https://onlinelibrary.wiley.com/doi/10.1002/nur.20199). Schwarz, N., B. Knauper, et al. (1991). "Rating scales: Numeric values may change the meaning of scale labels". In: _Public Opinion Quarterly_ 55.4, pp. 570-582. ISSN: 0033362X. DOI: [10.1086/269282](https://doi.org/10.1086%2F269282). URL: [https://academic.oup.com/poq/article-lookup/doi/10.1086/269282](https://academic.oup.com/poq/article-lookup/doi/10.1086/269282). Simms, L. J., K. Zelazny, et al. (2019). "Does the number of response options matter? Psychometric perspectives using personality questionnaire data". In: _Psychological Assessment_ 31.4, pp. 557-566. ISSN: 1939-134X. DOI: [10.1037/pas0000648](https://doi.org/10.1037%2Fpas0000648). URL: [https://doi.apa.org/doi/10.1037/pas0000648](https://doi.apa.org/doi/10.1037/pas0000648). Suárez-Alvarez, J., I. Pedrosa, et al. (2018). "Using reversed items in Likert scales: A questionable practice". In: _Psicothema_ 30.2, pp. 149-158. DOI: [10.7334/psicothema2018.33](https://doi.org/10.7334%2Fpsicothema2018.33). 
--- class: center, bottom, inverse # More info -- Slides created with the <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> package [`xaringan`](https://github.com/yihui/xaringan). -- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> <g label="icon" id="layer6" groupmode="layer"> <path id="path2" d="M 132.62426,316.69067 C 119.2805,301.94483 112.56962,274.5073 112.56962,234.39862 v -54.79191 c 0,-37.32217 -5.81677,-63.58084 -17.532347,-78.83466 -11.6757,-15.293118 -31.159702,-22.922596 -58.353466,-22.922596 -5.958581,0 -11.409226,0.22492 -16.45319,0.5917 -5.04455,0.427121 -9.742846,1.037046 -14.1564111,1.83092 V 95.057199 H 16.671281 c 12.325533,0 20.908335,3.82414 25.667559,11.532201 4.77973,7.74964 7.139712,25.48587 7.139712,53.14663 v 68.01321 c 0,42.12298 13.016861,74.19672 39.233939,96.16314 19.627549,16.47424 46.636229,27.23363 81.030059,32.40064 v -20.17708 c -16.3928,-4.27176 -29.04346,-10.51565 -37.11829,-19.44413 z m 246.75144,0 c 13.34377,-14.74584 20.05466,-42.18337 20.05466,-82.29205 v -54.79191 c 0,-37.32217 5.81673,-63.58084 17.53235,-78.83466 11.67568,-15.293118 31.15971,-22.922596 58.35348,-22.922596 5.95858,0 11.40922,0.22492 16.45315,0.5917 5.04457,0.427121 9.74287,1.037046
14.15645,1.83092 v 14.785125 h -10.59712 c -12.32549,0 -20.90826,3.82414 -25.66752,11.532201 -4.77974,7.74964 -7.13972,25.48587 -7.13972,53.14663 v 68.01321 c 0,42.12298 -13.01688,74.19672 -39.23394,96.16314 -19.6275,16.47424 -46.63622,27.23363 -81.03006,32.40064 v -20.17708 c 16.39279,-4.27176 29.04347,-10.51565 37.11827,-19.44413 z M 303.95857,87.165762 c 8.42049,-6.691524 25.52576,-10.536158 51.23486,-11.492333 V 63.999997 H 156.80716 v 11.673432 c 26.1755,0.956175 43.38268,4.800809 51.68248,11.492333 8.31852,6.73139 12.40691,20.033568 12.40691,39.904818 V 384.6851 c 0,20.80641 -4.08839,34.5146 -12.40691,41.02332 -8.2998,6.56905 -25.50698,10.10729 -51.68248,10.65744 V 448 h 197.71597 l 0.67087,-11.63414 c -25.50471,-0.54955 -42.56835,-4.35266 -51.07201,-11.40918 -8.4182,-6.95638 -12.73153,-20.44184 -12.73153,-40.27158 V 127.07058 c 0,-19.87125 4.16983,-33.173428 12.56922,-39.904818 z" style="stroke-width:0.0753388"></path> </g></svg> + <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> = <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:red;"> <path d="M462.3 62.6C407.5 15.9 326 24.3 275.7 76.2L256 96.5l-19.7-20.3C186.1 24.3 104.5 15.9 49.7 62.6c-62.8 53.6-66.1 149.8-9.9 207.9l193.5 199.8c12.5 12.9 32.8
12.9 45.3 0l193.5-199.8c56.3-58.1 53-154.3-9.8-207.9z"></path></svg> -- <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> has infinite possibilities. -- Practice is the best strategy for learning. -- . -- _In God we trust, all others bring data_ -- Edwards Deming -- . -- . -- . -- THE END --- class: center, bottom, inverse