2025-Jefferson_test-retest

Author

S Uribe

CREATED

June 2, 2025

UPDATED

July 17, 2025

Packages

[1] "/home/sergiouribe/Insync/sergio.uribe@gmail.com/Google Drive/Research Drive/2025_Sindija_Baltic Survey of Empathy Levels Jefferson scale/analysis_jefferson"

Docs

Abstract ADEE

Manuscript

Data

Keep only respondents who completed both the test and the retest


 1  2 
14 14

# A tibble: 0 × 2
# ℹ 2 variables: Respondent <dbl>, n <int>

Prepare the dataset

Check the correlation


    Pearson's product-moment correlation

data:  test_retest_wide$Time1 and test_retest_wide$Time2
t = 21.474, df = 278, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7412695 0.8302116
sample estimates:
     cor 
0.789858
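Pearson's r measures linear association rather than agreement, which is why the intraclass correlation is computed further down. A minimal sketch with made-up scores (the vectors `time1` and `time2` are hypothetical, not study data) shows a retest shifted by a constant still producing a perfect correlation:

```r
# Hypothetical scores, not study data: the retest equals the test plus 10
time1 <- c(10, 20, 30, 40, 50)
time2 <- time1 + 10

# Pearson's r is unaffected by the systematic shift
r <- cor(time1, time2)

# The mean difference exposes the disagreement that r cannot see
mean_shift <- mean(time2 - time1)

cat(sprintf("r = %.2f, mean difference = %.1f\n", r, mean_shift))
# prints: r = 1.00, mean difference = 10.0
```

An absolute-agreement ICC would be penalised by that shift, so it is the more appropriate index for test-retest reliability.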

Now a plot

Now intraclass correlation

Call: psych::ICC(x = select(test_retest_wide, Time1, Time2), alpha = 0.05, 
    lmer = TRUE, check.keys = FALSE)

Intraclass correlation coefficients 
                         type  ICC   F df1 df2       p lower bound upper bound
Single_raters_absolute   ICC1 0.79 8.5 279 280 1.9e-61        0.74        0.83
Single_random_raters     ICC2 0.79 8.5 279 279 1.8e-61        0.74        0.83
Single_fixed_raters      ICC3 0.79 8.5 279 279 1.8e-61        0.74        0.83
Average_raters_absolute ICC1k 0.88 8.5 279 280 1.9e-61        0.85        0.91
Average_random_raters   ICC2k 0.88 8.5 279 279 1.8e-61        0.85        0.91
Average_fixed_raters    ICC3k 0.88 8.5 279 279 1.8e-61        0.85        0.91

 Number of subjects = 280     Number of Judges =  2
See the help file for a discussion of the other 4 McGraw and Wong estimates,

ICC2 = 0.789

So, according to Koo and Li (2016), this is a good intraclass correlation (0.75 to 0.90).

Test-retest reliability by item was good, ICC = 0.79 (95% CI 0.74 to 0.83).
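The Koo and Li (2016) cut-offs can be written as a small helper. This is a sketch only: `classify_icc` is a hypothetical name, and assigning values that fall exactly on a cut-off to the higher category is an assumption. The inputs are the estimates and confidence bounds reported in this document.

```r
# Koo and Li (2016): < 0.50 poor, 0.50-0.75 moderate,
# 0.75-0.90 good, > 0.90 excellent.
# Assumption: a value exactly on a cut-off goes to the higher category.
classify_icc <- function(icc) {
  cut(icc,
      breaks = c(-Inf, 0.50, 0.75, 0.90, Inf),
      labels = c("poor", "moderate", "good", "excellent"),
      right = FALSE)
}

# Estimates reported in this document: lower bound, point estimate, upper bound
item_level  <- c(lower = 0.741, icc = 0.789, upper = 0.830)
total_score <- c(lower = 0.646, icc = 0.867, upper = 0.955)

item_labels  <- as.character(classify_icc(item_level))
total_labels <- as.character(classify_icc(total_score))

print(setNames(item_labels, names(item_level)))
print(setNames(total_labels, names(total_score)))
```

Koo and Li recommend judging reliability from the 95% confidence interval rather than the point estimate alone. Here both point estimates fall in the "good" band, while both lower bounds fall in the "moderate" band.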

Plot


Correcting for multiple measurements

Calculate Total Scores and ICC

Total Scores for Each Respondent
Test (Time1) and Retest (Time2)
Respondent	Time1	Time2
2	96.0	96.0
3	83.0	79.0
4	101.0	101.0
5	70.0	62.0
7	92.0	93.0
8	68.0	72.0
10	79.0	81.0
13	91.0	79.0
15	98.0	80.0
16	94.0	100.0
17	110.0	111.0
18	101.0	106.0
21	102.0	101.0
23	103.0	96.0
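As a check on the totals in the table above, the Bland-Altman mean difference and 95% limits of agreement can be computed in base R. This is a sketch that retypes the 14 pairs of totals by hand; the variable names are illustrative.

```r
# Total scores copied from the table above (14 respondents)
time1 <- c(96, 83, 101, 70, 92, 68, 79, 91, 98, 94, 110, 101, 102, 103)
time2 <- c(96, 79, 101, 62, 93, 72, 81, 79, 80, 100, 111, 106, 101, 96)

# Retest minus test, as in the Bland-Altman chunk of this report
difference <- time2 - time1

mean_diff <- mean(difference)
sd_diff   <- sd(difference)

# 95% limits of agreement: mean difference +/- 1.96 standard deviations
limits <- mean_diff + c(-1.96, 1.96) * sd_diff

cat(sprintf("Mean difference = %.1f (95%% LOA: %.1f to %.1f)\n",
            mean_diff, limits[1], limits[2]))
```

With these totals the mean difference is about -2.2 points, with limits of agreement of roughly -15.7 to 11.2.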

Call: psych::ICC(x = select(test_retest_totals, Time1, Time2), alpha = 0.05, 
    lmer = TRUE, check.keys = FALSE)

Intraclass correlation coefficients 
                         type  ICC  F df1 df2       p lower bound upper bound
Single_raters_absolute   ICC1 0.87 14  13  14 7.9e-06        0.65        0.95
Single_random_raters     ICC2 0.87 14  13  13 1.2e-05        0.65        0.95
Single_fixed_raters      ICC3 0.87 14  13  13 1.2e-05        0.64        0.96
Average_raters_absolute ICC1k 0.93 14  13  14 7.9e-06        0.78        0.98
Average_random_raters   ICC2k 0.93 14  13  13 1.2e-05        0.79        0.98
Average_fixed_raters    ICC3k 0.93 14  13  13 1.2e-05        0.78        0.98

 Number of subjects = 14     Number of Judges =  2
See the help file for a discussion of the other 4 McGraw and Wong estimates,
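The ICC2 row can be reproduced by hand from two-way ANOVA mean squares using the Shrout and Fleiss formula for ICC(2,1). The sketch below uses the 14 total-score pairs from the table above and base R only. It is not how `psych::ICC` computes the estimate when `lmer = TRUE`, although for complete, balanced data such as these the two approaches are expected to agree.

```r
# Total scores copied from the table above (14 respondents, 2 administrations)
time1 <- c(96, 83, 101, 70, 92, 68, 79, 91, 98, 94, 110, 101, 102, 103)
time2 <- c(96, 79, 101, 62, 93, 72, 81, 79, 80, 100, 111, 106, 101, 96)
scores <- cbind(time1, time2)

n <- nrow(scores)  # subjects
k <- ncol(scores)  # administrations ("judges" in the psych output)
grand <- mean(scores)

# Two-way ANOVA sums of squares
ss_rows  <- k * sum((rowMeans(scores) - grand)^2)  # between subjects
ss_cols  <- n * sum((colMeans(scores) - grand)^2)  # between administrations
ss_total <- sum((scores - grand)^2)
ss_error <- ss_total - ss_rows - ss_cols

msr <- ss_rows / (n - 1)
msc <- ss_cols / (k - 1)
mse <- ss_error / ((n - 1) * (k - 1))

# ICC(2,1): absolute agreement, single measurement
icc2 <- (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
# ICC(3,1): consistency, single measurement
icc3 <- (msr - mse) / (msr + (k - 1) * mse)

cat(sprintf("F = %.1f, ICC2 = %.3f, ICC3 = %.3f\n", msr / mse, icc2, icc3))
```

The hand calculation gives ICC2 of about 0.867 with F of about 14.4, matching the table above. ICC3 is slightly higher because it ignores the small mean shift between administrations.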

Item-Level ICC (Optional)

Item-level ICC (based on 280 individual item responses): 0.789 (95% CI: 0.741 to 0.83 )

Total score ICC (based on 14 respondents): 0.867 (95% CI: 0.646 to 0.955 )

Summary

Test-retest reliability by item was good: ICC = 0.789 (95% CI: 0.741 to 0.830), based on 280 item-level response pairs.

Test-retest reliability by respondent (n = 14) was good: ICC = 0.867 (95% CI: 0.646 to 0.955).

For the Methods section:

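Test-retest reliability was assessed in a subsample of 14 respondents who completed the Jefferson Scale of Empathy twice. Intraclass correlation coefficients (ICC) with 95% confidence intervals were calculated for both item-level and total scores using a two-way mixed-effects model (absolute agreement, single measures). This corresponds to the ICC2 row of psych::ICC, which is numerically identical to the two-way random-effects ICC(2,1). Values were interpreted following Koo and Li (2016).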

Cronbach’s Alpha for All 28 Responses (14 Respondents × 2 Administrations)

Some items ( Q_Norm_6 Q_Norm_7 Q_Norm_17 Q_Norm_18 ) were negatively correlated with the first principal component and 
probably should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option


Reliability analysis   
Call: psych::alpha(x = cronbach_data)

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
      0.74      0.74    0.96      0.12 2.8 0.069  4.5 0.66     0.13

    95% confidence boundaries 
         lower alpha upper
Feldt     0.57  0.74  0.86
Duhachek  0.60  0.74  0.87

 Reliability if an item is dropped:
          raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
Q_Norm_1       0.69      0.70    0.95      0.11 2.3    0.082 0.073  0.12
Q_Norm_2       0.74      0.74    0.96      0.13 2.9    0.069 0.073  0.14
Q_Norm_3       0.73      0.73    0.96      0.12 2.7    0.070 0.073  0.13
Q_Norm_4       0.73      0.73    0.95      0.12 2.7    0.071 0.074  0.13
Q_Norm_5       0.73      0.73    0.96      0.12 2.7    0.072 0.071  0.13
Q_Norm_6       0.78      0.78    0.96      0.15 3.4    0.059 0.058  0.15
Q_Norm_7       0.74      0.74    0.96      0.13 2.8    0.068 0.071  0.14
Q_Norm_8       0.74      0.74    0.96      0.13 2.8    0.068 0.069  0.14
Q_Norm_9       0.74      0.74    0.96      0.13 2.9    0.068 0.069  0.14
Q_Norm_10      0.72      0.71    0.96      0.12 2.5    0.073 0.071  0.13
Q_Norm_11      0.71      0.72    0.95      0.12 2.5    0.076 0.069  0.13
Q_Norm_12      0.68      0.70    0.94      0.11 2.3    0.084 0.068  0.12
Q_Norm_13      0.73      0.73    0.95      0.13 2.7    0.070 0.069  0.13
Q_Norm_14      0.71      0.72    0.96      0.12 2.5    0.075 0.070  0.13
Q_Norm_15      0.70      0.70    0.95      0.11 2.4    0.078 0.067  0.13
Q_Norm_16      0.72      0.72    0.96      0.12 2.6    0.074 0.067  0.13
Q_Norm_17      0.75      0.75    0.96      0.13 3.0    0.065 0.071  0.15
Q_Norm_18      0.75      0.75    0.96      0.14 3.0    0.067 0.071  0.14
Q_Norm_19      0.72      0.71    0.95      0.12 2.5    0.074 0.069  0.13
Q_Norm_20      0.71      0.70    0.95      0.11 2.3    0.077 0.065  0.13

 Item statistics 
           n raw.r std.r  r.cor r.drop mean   sd
Q_Norm_1  28  0.74  0.70  0.709  0.664  3.9 1.96
Q_Norm_2  28  0.17  0.26  0.229  0.126  6.8 0.61
Q_Norm_3  28  0.40  0.40  0.383  0.289  3.8 1.62
Q_Norm_4  28  0.35  0.40  0.390  0.273  5.5 1.17
Q_Norm_5  28  0.46  0.41  0.399  0.316  4.0 2.20
Q_Norm_6  28 -0.21 -0.22 -0.231 -0.330  3.4 1.73
Q_Norm_7  28  0.27  0.30  0.289  0.161  5.4 1.55
Q_Norm_8  28  0.22  0.27  0.259  0.117  5.2 1.42
Q_Norm_9  28  0.27  0.21  0.201  0.145  3.2 1.78
Q_Norm_10 28  0.50  0.55  0.544  0.416  5.9 1.40
Q_Norm_11 28  0.58  0.53  0.535  0.492  3.6 1.69
Q_Norm_12 28  0.78  0.74  0.755  0.711  3.6 1.97
Q_Norm_13 28  0.35  0.35  0.345  0.224  4.1 1.76
Q_Norm_14 28  0.54  0.53  0.524  0.445  4.6 1.62
Q_Norm_15 28  0.65  0.68  0.683  0.563  5.5 1.69
Q_Norm_16 28  0.49  0.50  0.490  0.399  5.1 1.55
Q_Norm_17 28  0.21  0.17  0.147  0.069  3.6 1.91
Q_Norm_18 28  0.16  0.12  0.099  0.045  1.9 1.51
Q_Norm_19 28  0.51  0.56  0.564  0.425  5.9 1.36
Q_Norm_20 28  0.65  0.71  0.712  0.583  5.9 1.35

Non missing response frequency for each item
             1    2    3    4    5    6    7 miss
Q_Norm_1  0.07 0.21 0.21 0.18 0.00 0.18 0.14    0
Q_Norm_2  0.00 0.00 0.00 0.04 0.00 0.07 0.89    0
Q_Norm_3  0.11 0.14 0.18 0.21 0.18 0.18 0.00    0
Q_Norm_4  0.00 0.00 0.00 0.29 0.21 0.25 0.25    0
Q_Norm_5  0.18 0.14 0.07 0.18 0.18 0.00 0.25    0
Q_Norm_6  0.14 0.32 0.00 0.25 0.14 0.14 0.00    0
Q_Norm_7  0.04 0.00 0.07 0.18 0.11 0.32 0.29    0
Q_Norm_8  0.00 0.04 0.14 0.07 0.29 0.29 0.18    0
Q_Norm_9  0.18 0.25 0.18 0.11 0.11 0.18 0.00    0
Q_Norm_10 0.00 0.04 0.00 0.18 0.11 0.18 0.50    0
Q_Norm_11 0.04 0.25 0.36 0.07 0.14 0.04 0.11    0
Q_Norm_12 0.14 0.18 0.21 0.18 0.07 0.07 0.14    0
Q_Norm_13 0.04 0.21 0.11 0.29 0.14 0.07 0.14    0
Q_Norm_14 0.00 0.11 0.18 0.21 0.21 0.11 0.18    0
Q_Norm_15 0.00 0.04 0.11 0.21 0.14 0.00 0.50    0
Q_Norm_16 0.00 0.11 0.07 0.07 0.29 0.29 0.18    0
Q_Norm_17 0.14 0.18 0.14 0.29 0.07 0.04 0.14    0
Q_Norm_18 0.64 0.14 0.00 0.07 0.14 0.00 0.00    0
Q_Norm_19 0.00 0.00 0.04 0.21 0.07 0.14 0.54    0
Q_Norm_20 0.00 0.04 0.04 0.04 0.29 0.14 0.46    0

Internal consistency (Cronbach’s alpha) for the 20-item Jefferson Scale across all 28 responses (14 respondents × 2 administrations):

Cronbach's alpha = 0.737 (standardized alpha = 0.738)
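For reference, raw Cronbach's alpha follows directly from the item variances and the variance of the total score: alpha = k / (k - 1) × (1 - sum of item variances / variance of totals). The sketch below applies that formula to a made-up 5 × 3 response matrix. The data and the helper name `cronbach_raw` are hypothetical, and this is not the `psych::alpha` implementation.

```r
# Hypothetical responses (rows = respondents, columns = items); not study data
toy_items <- cbind(
  item1 = c(1, 2, 3, 4, 5),
  item2 = c(2, 2, 4, 4, 5),
  item3 = c(1, 3, 3, 5, 4)
)

# Raw Cronbach's alpha from item variances and total-score variance
cronbach_raw <- function(items) {
  k <- ncol(items)
  item_vars <- apply(items, 2, var)  # variance of each item
  total_var <- var(rowSums(items))   # variance of the summed score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

alpha_toy <- cronbach_raw(toy_items)
cat(sprintf("Cronbach's alpha = %.3f\n", alpha_toy))
# prints: Cronbach's alpha = 0.936
```

The formula assumes every item is scored in the same direction. Items flagged as negatively correlated with the first principal component, as in the warning above, should be checked for reverse coding before alpha is interpreted.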

---
title: "2025-Jefferson_test-retest"
author: "S Uribe"
date: 2025-06-02
date-modified: last-modified
language:
  title-block-published: "CREATED"
  title-block-modified: "UPDATED"
format:
  html:
    toc: true
    toc-expand: 3
    code-fold: true
    code-tools: true
editor: visual
execute:
  echo: false
  cache: false
  warning: false
  message: false
---

![](images/clipboard-1615707102.png)

# Packages

```{r}
pacman::p_load(tidyverse,
               here,
               sjPlot,
               gt, # for the table
               psych, # for Cronbach's alpha
               ggstats, # for the new likert
               BlandAltmanLeh,
               irr) # for ICC
```

```{r}
here::here()
```

```{r}
theme_set(theme_minimal())
```

# Docs

[Abstract ADEE](https://docs.google.com/document/d/1hdRnpvj2s_lVpJtQwfAJDkZES_H5vUdn/edit)

[Manuscript](https://docs.google.com/document/d/18EtAMJm36GVRP5PKIuI_fwndDhnVS_lgfY2zpjZZBas/edit?tab=t.0#heading=h.suq8ocx8doy2)

# Data

```{r}
test_retest <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQDqJVG3B4KNbXXZp3KeBSSHY8wfH0AXvJwN10uRPC6gh4tYzGspQLVDOCmVFqNDOXMH65Higi4QT9H/pub?gid=1370812712&single=true&output=csv") |>
  select(Respondent, Test_retest, Q_Norm_1:Q_Norm_20) # the new normalized data
```

```{r}
# previous
# test_retest <- read_csv(here("data", "JSA_Latvia_validacija (atbildes) - Validacijai normalizēts (2025-06-02).csv")) |>
#   select(Respondent, Test_retest, Q_Norm_1:Q_Norm_20) # the new normalized data
```

```{r}
# glimpse(test_retest)
```

Keep only respondents who completed both the test and the retest

```{r}
test_retest <- test_retest |>
  filter(Test_retest %in% c(1, 2)) |>
  group_by(Respondent) |>
  filter(n_distinct(Test_retest) == 2) |>
  ungroup()
```

```{r}
table(test_retest$Test_retest)
```

```{r}
test_retest |>
  count(Respondent) |>
  filter(n != 2)
```

## Prepare the dataset

```{r}
test_retest_wide <- test_retest |>
  pivot_longer(
    cols = starts_with("Q_Norm_"),
    names_to = "Question",
    values_to = "Response"
  ) |>
  # second part
  pivot_wider(
    id_cols = c(Respondent, Question),
    names_from = Test_retest,
    values_from = Response,
    names_prefix = "Time"
  ) |>
  arrange(Respondent, Question)
```

### Check the correlation

```{r}
cor.test(test_retest_wide$Time1, test_retest_wide$Time2)
```

Now a plot

```{r}
test_retest_wide |>
  ggplot(aes(x = Time1, y = Time2)) +
  geom_jitter() +
  geom_smooth()
```

## Now intraclass correlation

```{r}
test_retest_wide |>
  select(Time1, Time2) |>
  psych::ICC(alpha = 0.05,
             lmer = TRUE, # Use a linear mixed-effects model (recommended for small or unbalanced samples)
             check.keys = FALSE) # Do not attempt to reverse-code items; assume correct coding
```

ICC2 = 0.789

So, according to Koo and Li (2016), this is a good intraclass correlation (0.75 to 0.90).

**Test-retest reliability by item was good, ICC = 0.79 (95% CI 0.74 to 0.83).**

## Plot

```{r}
BlandAltmanLeh::bland.altman.plot(test_retest_wide$Time1,
                                  test_retest_wide$Time2,
                                  main = "Test Retest",
                                  xlab = "Means",
                                  ylab = "Difference")
```

# Correcting for multiple measurements

## Calculate Total Scores and ICC

```{r}
# Calculate total scores for each respondent at each time point
test_retest_totals <- test_retest |>
  group_by(Respondent, Test_retest) |>
  summarise(
    total_score = sum(across(starts_with("Q_Norm_")), na.rm = TRUE),
    .groups = "drop"
  ) |>
  pivot_wider(
    id_cols = Respondent,
    names_from = Test_retest,
    values_from = total_score,
    names_prefix = "Time"
  )

# View the total scores
test_retest_totals |>
  gt() |>
  tab_header(
    title = "Total Scores for Each Respondent",
    subtitle = "Test (Time1) and Retest (Time2)"
  ) |>
  fmt_number(columns = c(Time1, Time2), decimals = 1)
```

```{r}
# Calculate ICC on total scores
total_scores_icc <- test_retest_totals |>
  select(Time1, Time2) |>
  psych::ICC(alpha = 0.05, lmer = TRUE, check.keys = FALSE)

# Print ICC results
total_scores_icc
```

```{r}
# Create correlation plot for total scores
ggplot(test_retest_totals, aes(x = Time1, y = Time2)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
  labs(
    title = "Test-Retest Reliability of Total Scores",
    subtitle = sprintf("ICC = %.2f", total_scores_icc$results[2, "ICC"]),
    x = "Test Score (Time 1)",
    y = "Retest Score (Time 2)"
  ) +
  theme_minimal()
```

```{r}
# Bland-Altman plot for total scores
bland_altman_data <- test_retest_totals |>
  mutate(
    mean_score = (Time1 + Time2) / 2,
    difference = Time2 - Time1
  )

mean_diff <- mean(bland_altman_data$difference)
sd_diff <- sd(bland_altman_data$difference)
limits <- mean_diff + c(-1.96, 1.96) * sd_diff

ggplot(bland_altman_data, aes(x = mean_score, y = difference)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = mean_diff, linetype = "dashed", color = "blue") +
  geom_hline(yintercept = limits, linetype = "dotted", color = "red") +
  labs(
    title = "Bland-Altman Plot of Total Scores",
    subtitle = sprintf("Mean difference = %.1f (95%% LOA: %.1f to %.1f)",
                       mean_diff, limits[1], limits[2]),
    x = "Mean of Test and Retest Scores",
    y = "Difference (Retest - Test)"
  ) +
  theme_minimal()
```

## Item-Level ICC (Optional)

```{r}
# For reference, we can also look at item-level reliability
item_level_icc <- test_retest_wide |>
  select(Time1, Time2) |>
  psych::ICC(alpha = 0.05, lmer = TRUE, check.keys = FALSE)

# Print results with explanation
cat("Item-level ICC (based on", nrow(test_retest_wide), "individual item responses):",
    round(item_level_icc$results[2, "ICC"], 3),
    "(95% CI:", round(item_level_icc$results[2, "lower bound"], 3),
    "to", round(item_level_icc$results[2, "upper bound"], 3), ")\n")
```

```{r}
cat("Total score ICC (based on", nrow(test_retest_totals), "respondents):",
    round(total_scores_icc$results[2, "ICC"], 3),
    "(95% CI:", round(total_scores_icc$results[2, "lower bound"], 3),
    "to", round(total_scores_icc$results[2, "upper bound"], 3), ")\n")
```

# Summary

-   Test-retest reliability **by item** was good: ICC = 0.789 (95% CI: 0.741 to 0.830), based on 280 item-level response pairs.

-   Test-retest reliability **by respondent** (n = 14) was good: ICC = 0.867 (95% CI: 0.646 to 0.955).

## For the Methods section:

Test-retest reliability was assessed in a subsample of 14 respondents who completed the Jefferson Scale of Empathy twice. Intraclass correlation coefficients (ICC) were calculated using a two-way mixed-effects model (absolute agreement, single measures) for both item-level and total scores.

# Cronbach's Alpha for All 28 Responses (14 Respondents × 2 Administrations)

```{r}
# Calculate Cronbach's alpha over all 28 responses
# (14 respondents x 2 administrations), using all Q_Norm_1:Q_Norm_20 columns
cronbach_data <- test_retest |>
  # filter(Test_retest == 1) |> # this would keep the first administration only; here we use all rows
  select(starts_with("Q_Norm_"))

# Compute Cronbach's alpha
cronbach_alpha <- psych::alpha(cronbach_data)

# Print the results
cronbach_alpha
```

**Internal consistency (Cronbach's alpha) for the 20-item Jefferson Scale across all 28 responses (14 respondents × 2 administrations):**

```{r}
cat(sprintf(
  "Cronbach's alpha = %.3f (standardized alpha = %.3f)",
  cronbach_alpha$total$raw_alpha,
  cronbach_alpha$total$std.alpha
))
```