Validity ensures the tool accurately measures the intended concept,
providing meaningful and actionable insights.
Face Validity
Does the assessment appear to measure what it is supposed to at face
value?
Ask other people to review the “Extraction Tool” and its items and judge how suitable they appear for measuring the variable of interest.
Content Validity
Does the assessment cover the entire range of the concept being
measured?
Collect ratings from a panel of experts on whether each item is “essential”.
Calculate the Content Validity Ratio (CVR) for each item: CVR = (n_e - N/2) / (N/2),
where n_e is the number of expert panelists rating the item “essential” and N is the total number of
expert panelists.
Calculate the Content Validity Index (CVI): the average CVR score of all
questions in the test (values closer to 1 denote higher content
validity).
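The CVR and CVI calculations above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the panel of 10 experts rating 4 items are hypothetical, chosen only to make the arithmetic visible.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """CVR for one item: (n_e - N/2) / (N/2)."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must lie between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


def content_validity_index(essential_counts: list[int], n_panelists: int) -> float:
    """CVI as described above: the average CVR across all items."""
    if not essential_counts:
        raise ValueError("at least one item is required")
    cvrs = [content_validity_ratio(n, n_panelists) for n in essential_counts]
    return sum(cvrs) / len(cvrs)


# Hypothetical data: number of experts (out of 10) rating each item "essential".
essential_counts = [10, 8, 9, 5]
n_panelists = 10

for item, n_e in enumerate(essential_counts, start=1):
    print(f"Item {item}: CVR = {content_validity_ratio(n_e, n_panelists):.2f}")
print(f"CVI = {content_validity_index(essential_counts, n_panelists):.2f}")
```

An item rated “essential” by exactly half the panel gets a CVR of 0, and one rated “essential” by everyone gets 1, which is why a CVI close to 1 indicates strong agreement across items.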
Criterion-Related Validity
How well does the assessment predict outcomes based on other measures?
Compare the test results to criterion
variables. Criterion variables are often referred to as a “gold
standard” measurement.
Calculate the correlation (Pearson's r) between the results
of your measurement and the results of the criterion measurement. A
high correlation is a good indication that your test
is measuring what it is intended to measure.
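As an illustration of the correlation step, Pearson's r is written out below in plain Python so that each term is visible. The function name and both score lists are hypothetical; in practice a statistics library would normally be used instead.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's r between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when either list is constant")
    return numerator / denominator


# Hypothetical scores for the same six cases.
tool_scores = [12, 15, 9, 20, 17, 11]       # results of the measurement
criterion_scores = [14, 16, 10, 22, 18, 11]  # "gold standard" results

print(f"Pearson's r = {pearson_r(tool_scores, criterion_scores):.3f}")
```

The same calculation applies wherever this document asks for a correlation between two sets of results from the same respondents or samples.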
Construct Validity
Does the assessment truly measure the theoretical construct it claims to
measure?
It’s central to establishing the overall validity of a method.
Test out a new measure with a pilot study.
Test convergent and discriminant validity with correlations: results from the “Extraction Tool” should correlate strongly with established tests of the same or related constructs and weakly with tests of unrelated constructs.
Use regression analyses to assess whether the “Extraction Tool” actually predicts the outcomes it is theoretically expected to predict.
Reliability ensures that the assessment tool produces stable and
consistent results, which is crucial for making informed decisions based
on the data.
Test-Retest Reliability
Consistency of results when the same test is administered at different
points in time.
Conduct the same test on the same group of people at two different points in time. Then calculate the correlation between the two sets of results.
Inter-Rater Reliability
Consistency of results when different people score the same test.
Different users conduct the same measurement or observation on the same sample.
Then calculate the correlation between their different sets of results.
If all the users give similar ratings, the test has high inter-rater
reliability.
Parallel-Forms Reliability
Consistency of results between different versions of the same test.
Produce a large set of questions to evaluate the same thing, then divide
these randomly into two question sets.
Have the same group of respondents answer both sets, then calculate the
correlation between the results. High correlation between the two
indicates high parallel-forms reliability.
Internal Consistency
Consistency of results across items within a test.
Two common methods are used to measure
internal consistency:
- Average inter-item correlation: For a set of measures designed to
assess the same construct, calculate the correlation between the results
of all possible pairs of items and then calculate the average.
- Split-half reliability: randomly split a set of measures into two
sets. After testing the entire set on the respondents, calculate the
correlation between the two sets of responses.
A data quality traffic light or kite mark appears next to KPIs in
the dashboard to provide visual assurance on the quality of data
underpinning a performance indicator.
A visual indicator acknowledges the variability of data and makes
an explicit assessment of the quality of evidence on which the
performance measurement is based.
The kite mark covers four dimensions:
Process
A process has been documented to support consistency and understanding
of data capture requirements for all relevant staff.
Timeliness
The time taken between the end of the data period and when the
information can be produced and reviewed.
Validation
The data has been validated to ensure it is accurate and in compliance
with relevant requirements.
Completeness
The extent to which all the expected attributes of the data are populated
and all the records for the relevant population are
provided.
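The two parts of the completeness dimension could be quantified along the following lines. This is only a sketch under stated assumptions: the function names, the treatment of `None` and empty strings as "not populated", the example records and the expected record count are all hypothetical, and any thresholds for assigning a traffic-light colour would need to be defined separately.

```python
def attribute_completeness(records: list[dict], expected_attributes: list[str]) -> float:
    """Share of expected attribute values that are populated.

    Assumption: a value counts as missing if it is absent, None or an empty string.
    """
    total = len(records) * len(expected_attributes)
    if total == 0:
        raise ValueError("records and expected_attributes must be non-empty")
    populated = sum(
        1
        for record in records
        for attribute in expected_attributes
        if record.get(attribute) not in (None, "")
    )
    return populated / total


def record_coverage(records: list[dict], expected_record_count: int) -> float:
    """Share of the relevant population for which a record was provided."""
    if expected_record_count <= 0:
        raise ValueError("expected_record_count must be positive")
    return len(records) / expected_record_count


# Hypothetical extract: 3 records received out of an expected 4.
records = [
    {"id": 1, "dept": "A", "value": 10},
    {"id": 2, "dept": "", "value": 7},
    {"id": 3, "dept": "B", "value": None},
]
expected_attributes = ["id", "dept", "value"]

print(f"Attribute completeness = {attribute_completeness(records, expected_attributes):.2f}")
print(f"Record coverage = {record_coverage(records, 4):.2f}")
```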