Analyse PDF evaluation for Harmony

This is an analysis of Harmony’s improved PDF extraction model. You can try Harmony at https://harmonydata.ac.uk/

By Thomas Wood, https://fastdatascience.com/

Analyse results

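First we load the data. The data frame is available in both wide and long format; both files were produced by a Python preprocessing step. In each row, total is the number of questions in the questionnaire, and orig and new (or num_correct in the long format) are the number of questions each model got correct.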

data.wide <- read.csv("test_pdf_models_wide.csv")

data.wide

##                                               questionnaire total orig new
## 1                                 academic motivation scale    28    7  24
## 2     adverse childhood experience questionnaire for adults    10   10  10
## 3                questions about behavioral function (qabf)    25   25  24
## 4                          athletic coping skills inventory    28    0  27
## 5  attitudes toward seeking professional psychological help    10    1  10
## 6              benevolent childhood experiences (bce) scale    10    0  10
## 7                             the big five personality test    50    2  50
## 8                      brief big five personality inventory    10    0  10
## 9         carol dweck’s growth vs. fixed mindset assessment    20   17  20
## 10                                 aggression questionnaire    29    0  29
## 11                                   cohen percieved stress    10   10  10
## 12                                                    gad-7     7    7   7
## 13                                     the trait hope scale    12    0  12
## 14              davis' interpersonal reactivity index (iri)    28   22  28
## 15                             patient health questionnaire     9    9   9
## 16                          post traumatic growth inventory    21   20  21
## 17                        rosenberg self-esteem scale (rse)    10    3  10
## 18                         the satisfaction with life scale     5    0   5
## 19                neff’s self-compassion scale (short-form)    12    5  12
## 20                              general self-efficacy scale    10    0  10
## 21                       social dominance orientation scale    16    0  16
## 22                    ten-item personality inventory-(tipi)    10    0  10
## 23                               the sport motivation scale    28    9  26
## 24            the warwick–edinburgh mental well-being scale    14    3  14
## 25                                   brief resilience scale     6    6   0

data.long <- read.csv("test_pdf_models_long.csv")

data.long

##                                               questionnaire model total num_correct
## 1                                 academic motivation scale  orig    28           7
## 2                                 academic motivation scale   new    28          24
## 3     adverse childhood experience questionnaire for adults  orig    10          10
## 4     adverse childhood experience questionnaire for adults   new    10          10
## 5                questions about behavioral function (qabf)  orig    25          25
## 6                questions about behavioral function (qabf)   new    25          24
## 7                          athletic coping skills inventory  orig    28           0
## 8                          athletic coping skills inventory   new    28          27
## 9  attitudes toward seeking professional psychological help  orig    10           1
## 10 attitudes toward seeking professional psychological help   new    10          10
## 11             benevolent childhood experiences (bce) scale  orig    10           0
## 12             benevolent childhood experiences (bce) scale   new    10          10
## 13                            the big five personality test  orig    50           2
## 14                            the big five personality test   new    50          50
## 15                     brief big five personality inventory  orig    10           0
## 16                     brief big five personality inventory   new    10          10
## 17        carol dweck’s growth vs. fixed mindset assessment  orig    20          17
## 18        carol dweck’s growth vs. fixed mindset assessment   new    20          20
## 19                                 aggression questionnaire  orig    29           0
## 20                                 aggression questionnaire   new    29          29
## 21                                   cohen percieved stress  orig    10          10
## 22                                   cohen percieved stress   new    10          10
## 23                                                    gad-7  orig     7           7
## 24                                                    gad-7   new     7           7
## 25                                     the trait hope scale  orig    12           0
## 26                                     the trait hope scale   new    12          12
## 27              davis' interpersonal reactivity index (iri)  orig    28          22
## 28              davis' interpersonal reactivity index (iri)   new    28          28
## 29                             patient health questionnaire  orig     9           9
## 30                             patient health questionnaire   new     9           9
## 31                          post traumatic growth inventory  orig    21          20
## 32                          post traumatic growth inventory   new    21          21
## 33                        rosenberg self-esteem scale (rse)  orig    10           3
## 34                        rosenberg self-esteem scale (rse)   new    10          10
## 35                         the satisfaction with life scale  orig     5           0
## 36                         the satisfaction with life scale   new     5           5
## 37                neff’s self-compassion scale (short-form)  orig    12           5
## 38                neff’s self-compassion scale (short-form)   new    12          12
## 39                              general self-efficacy scale  orig    10           0
## 40                              general self-efficacy scale   new    10          10
## 41                       social dominance orientation scale  orig    16           0
## 42                       social dominance orientation scale   new    16          16
## 43                    ten-item personality inventory-(tipi)  orig    10           0
## 44                    ten-item personality inventory-(tipi)   new    10          10
## 45                               the sport motivation scale  orig    28           9
## 46                               the sport motivation scale   new    28          26
## 47            the warwick–edinburgh mental well-being scale  orig    14           3
## 48            the warwick–edinburgh mental well-being scale   new    14          14
## 49                                   brief resilience scale  orig     6           6
## 50                                   brief resilience scale   new     6           0
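The two files hold the same information in different shapes. As a minimal sketch of how they relate, the long format can be derived from the wide one with base R's reshape(). The three rows below are copied from the wide table; the names wide.example and long.example are placeholders for illustration and are not part of the original analysis, which loads both files from CSV.

```r
# Three rows copied from the wide table, for illustration only
wide.example <- data.frame(
  questionnaire = c("gad-7", "the trait hope scale", "brief resilience scale"),
  total = c(7, 12, 6),
  orig = c(7, 0, 6),
  new = c(7, 12, 0)
)

# Stack the orig and new columns into one num_correct column,
# recording which column each value came from in "model"
long.example <- reshape(
  wide.example,
  direction = "long",
  varying = list(c("orig", "new")),
  v.names = "num_correct",
  timevar = "model",
  times = c("orig", "new"),
  idvar = "questionnaire"
)

# Put the two rows of each questionnaire next to each other, orig first
long.example <- long.example[order(match(long.example$questionnaire,
                                         wide.example$questionnaire)), ]
rownames(long.example) <- NULL
long.example
```

Each questionnaire then occupies two rows, one per model, which is the layout the linear model below expects.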

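Plot mirrored density curves of the number of correct questions per questionnaire, with the new model above the axis and the original model below.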

library(ggplot2)

# after_stat(density) replaces the deprecated ..density.. notation.
# annotate() draws each label once instead of once per data row.
p <- ggplot(data.wide) +
  # Top
  geom_density(aes(x = new, y = after_stat(density)), fill = "#69b3a2") +
  annotate("label", x = 4.5, y = 0.25, label = "new", color = "#69b3a2") +
  # Bottom
  geom_density(aes(x = orig, y = -after_stat(density)), fill = "#404080") +
  annotate("label", x = 4.5, y = -0.25, label = "orig", color = "#404080") +
  xlab("Number of questions correct in questionnaire")
p

Statistical tests

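We will test whether the scores of the new model differ significantly from the scores of the old model. We fit a linear model with the model (orig or new) as a two-level predictor, which is equivalent to a two-tailed, two-sample t-test assuming equal variances, and we use a significance level of 0.05.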

model <- lm(formula = num_correct ~ model, data=data.long)
summary(model)

## 
## Call:
## lm(formula = num_correct ~ model, data = data.long)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -16.16  -6.22  -3.70   3.82  33.84 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   16.160      1.839   8.788 1.46e-11 ***
## modelorig     -9.920      2.601  -3.814  0.00039 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.195 on 48 degrees of freedom
## Multiple R-squared:  0.2326, Adjusted R-squared:  0.2166 
## F-statistic: 14.55 on 1 and 48 DF,  p-value: 0.0003899

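The new model is significantly better, with p = 0.00039. On average it gets 9.92 more questions correct per questionnaire (16.16 versus 6.24 for the original model).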
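The regression on a two-level factor is the same test as an unpaired Student t-test with pooled variance. Because both models were run on the same 25 questionnaires, a paired t-test is a natural alternative. The sketch below shows both with t.test(). The score vectors are copied from the wide table above so that the block runs on its own; the paired test is a suggestion and is not part of the original analysis.

```r
# Scores copied from the wide table above (25 questionnaires, same order)
orig <- c(7, 10, 25, 0, 1, 0, 2, 0, 17, 0, 10, 7, 0,
          22, 9, 20, 3, 0, 5, 0, 0, 0, 9, 3, 6)
new <- c(24, 10, 24, 27, 10, 10, 50, 10, 20, 29, 10, 7, 12,
         28, 9, 21, 10, 5, 12, 10, 16, 10, 26, 14, 0)

# Unpaired Student t-test with pooled variance:
# the same test as lm(num_correct ~ model)
student <- t.test(new, orig, var.equal = TRUE)
student$p.value

# Paired t-test: compares the two models questionnaire by questionnaire
paired <- t.test(new, orig, paired = TRUE)
paired$estimate   # mean of the per-questionnaire differences (new - orig)
paired$p.value
```

The paired version works on the per-questionnaire differences, so variation between questionnaires (some have 5 questions, others 50) does not enter the error term.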

Let’s do the analysis on the proportion correct in each questionnaire.

data.long$proportion_correct <- data.long$num_correct / data.long$total

model <- lm(formula = num_correct ~ model, data=data.long)
summary(model)

## 
## Call:
## lm(formula = num_correct ~ model, data = data.long)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -16.16  -6.22  -3.70   3.82  33.84 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   16.160      1.839   8.788 1.46e-11 ***
## modelorig     -9.920      2.601  -3.814  0.00039 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.195 on 48 degrees of freedom
## Multiple R-squared:  0.2326, Adjusted R-squared:  0.2166 
## F-statistic: 14.55 on 1 and 48 DF,  p-value: 0.0003899

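We got the same p-value, 0.00039. Note that the output is identical to the first model because the formula still uses num_correct as the response: the proportion_correct column was computed but not used. A test on proportions would need the formula proportion_correct ~ model.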
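A minimal sketch of the same regression with proportion_correct as the response is given here. It rebuilds the long data from the values in the wide table so that it runs on its own; the names long and prop.model are placeholders, and no output is shown because this model was not run in the original analysis.

```r
# Values copied from the wide table above (25 questionnaires, same order)
total <- c(28, 10, 25, 28, 10, 10, 50, 10, 20, 29, 10, 7, 12,
           28, 9, 21, 10, 5, 12, 10, 16, 10, 28, 14, 6)
orig <- c(7, 10, 25, 0, 1, 0, 2, 0, 17, 0, 10, 7, 0,
          22, 9, 20, 3, 0, 5, 0, 0, 0, 9, 3, 6)
new <- c(24, 10, 24, 27, 10, 10, 50, 10, 20, 29, 10, 7, 12,
         28, 9, 21, 10, 5, 12, 10, 16, 10, 26, 14, 0)

# Long layout: one row per questionnaire and model
long <- data.frame(
  model = rep(c("orig", "new"), each = length(total)),
  total = rep(total, times = 2),
  num_correct = c(orig, new)
)
long$proportion_correct <- long$num_correct / long$total

# Regress the proportion, not the raw count, on the model
prop.model <- lm(proportion_correct ~ model, data = long)
summary(prop.model)
```

With "new" as the reference level, the intercept is the mean proportion correct for the new model and the modelorig coefficient is the difference between the two models.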

Let’s find the average score

Average proportion correct for original model:

mean(data.wide$orig/data.wide$total)

## [1] 0.409219

Average proportion correct for new model:

mean(data.wide$new/data.wide$total)

## [1] 0.9484

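Therefore, averaged over the 25 questionnaires, the old model had an accuracy of 41% and the new model had an accuracy of 95%. The p-value of 0.00039 comes from the comparison of the number of correct questions per questionnaire above.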
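These accuracies give every questionnaire equal weight, whatever its length (a macro-average). Pooling all questions before dividing (a micro-average) weights long questionnaires more heavily and can give a different figure. The sketch below computes both from the values in the wide table; the micro-average is an addition for comparison and is not part of the original analysis.

```r
# Values copied from the wide table above (25 questionnaires, same order)
total <- c(28, 10, 25, 28, 10, 10, 50, 10, 20, 29, 10, 7, 12,
           28, 9, 21, 10, 5, 12, 10, 16, 10, 28, 14, 6)
orig <- c(7, 10, 25, 0, 1, 0, 2, 0, 17, 0, 10, 7, 0,
          22, 9, 20, 3, 0, 5, 0, 0, 0, 9, 3, 6)
new <- c(24, 10, 24, 27, 10, 10, 50, 10, 20, 29, 10, 7, 12,
         28, 9, 21, 10, 5, 12, 10, 16, 10, 26, 14, 0)

# Macro-average: mean of the per-questionnaire proportions
macro <- c(orig = mean(orig / total), new = mean(new / total))

# Micro-average: all correct questions divided by all questions
micro <- c(orig = sum(orig) / sum(total), new = sum(new) / sum(total))

round(rbind(macro, micro), 3)
```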


2025-10-17
