Background

For each exam, we have been asking participants to report their expected grades at two separate time points: immediately after completing the exam, and immediately before we release their grades (usually 2-3 days later). Using these two predictions for each exam, we can examine whether participants "adjust" their predictions during the window between completing an exam and learning the outcome.

Below I've included some descriptives and initial results on these prediction adjustments, and I've also replicated the original updating finding from the last markdown (https://rpubs.com/wvillano/652855) using predictions and PEs from both time points.


Prediction Adjustments

Below is the distribution of these so-called "adjustments," where an adjustment is defined as:

\[\text{Adjustment} = \text{Prediction}_2 - \text{Prediction}_1\]

|                        | n    | mean  | sd   | median | trimmed | mad  | min | max | range | skew  | kurtosis | se   |
|------------------------|------|-------|------|--------|---------|------|-----|-----|-------|-------|----------|------|
| Prediction Adjustments | 1652 | -2.73 | 7.44 | 0      | -2.39   | 4.45 | -47 | 50  | 97    | -0.37 | 8.17     | 0.18 |
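For reference, here is a minimal sketch of how these adjustments and their descriptives could be computed. `pred_change` matches the variable name in the model output below; `prediction_1` and `prediction_2` are placeholder column names, not necessarily those in the actual data:

```r
library(psych)

# Adjustment: pre-reveal prediction minus post-exam prediction
# (prediction_1 / prediction_2 are placeholder column names)
grades.nomiss$pred_change <- grades.nomiss$prediction_2 - grades.nomiss$prediction_1

# Descriptive statistics for the adjustment distribution
describe(grades.nomiss$pred_change)
```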



Do participants' expectations change systematically as grade reveal draws near?

On average, participants' expectations become more pessimistic as the grade reveal approaches (by about 2.7 points, per the intercept below).

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: pred_change ~ 1 + (1 | id)
##    Data: grades.nomiss
## 
## REML criterion at convergence: 11314.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.7936 -0.3382  0.2852  0.3799  6.9435 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)  3.534   1.880   
##  Residual             51.897   7.204   
## Number of obs: 1652, groups:  id, 528
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  -2.7470     0.1964 473.5662  -13.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
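
For completeness, a sketch of the corresponding model call, using the formula and data frame named in the output above:

```r
library(lmerTest)  # lmer() with Satterthwaite-approximated t-tests

# Intercept-only model: does the average adjustment differ from zero?
m_adjust <- lmer(pred_change ~ 1 + (1 | id), data = grades.nomiss)
summary(m_adjust)
```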



Do prediction adjustments change across exams?

Across the four exams of the semester, the raw data suggest that these adjustments shrink at the group level as the semester progresses:



Indeed, the multilevel model below shows that these adjustments become reliably smaller in magnitude over the course of the semester, while remaining negative (i.e., participants still lower their expectations):

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: `Prediction Adjustment` ~ 1 + Exam + (1 | id)
##    Data: temp_df
## 
## REML criterion at convergence: 11287.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.8643 -0.3946  0.2161  0.4133  6.9256 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)  3.792   1.947   
##  Residual             50.919   7.136   
## Number of obs: 1652, groups:  id, 528
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   -4.1623     0.3439 1636.1858 -12.103  < 2e-16 ***
## Exam2          1.7218     0.4720 1212.8322   3.648 0.000275 ***
## Exam3          2.2091     0.4781 1224.9208   4.620 4.24e-06 ***
## Exam4          2.0054     0.5392 1341.0165   3.720 0.000208 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr) Exam2  Exam3 
## Exam2 -0.685              
## Exam3 -0.676  0.494       
## Exam4 -0.600  0.439  0.433
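
A sketch of the exam-effect model follows; the `emmeans` call for per-exam predicted adjustments is an illustrative addition, not a confirmed part of the original analysis:

```r
library(lmerTest)
library(emmeans)

# Exam enters as a factor, with exam 1 as the reference level
# (matching the Exam2-Exam4 dummy codes in the output above)
m_exam <- lmer(`Prediction Adjustment` ~ 1 + Exam + (1 | id), data = temp_df)
summary(m_exam)

# Illustrative: model-predicted adjustment for each exam
emmeans(m_exam, ~ Exam)
```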




Prediction Errors

Since we have two predictions for each exam, we technically have two PEs for each exam as well. Here, I calculated PEs from the first and second predictions for each exam and overlaid the distributions.
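For clarity, and assuming the conventional outcome-minus-expectation signing (consistent with the positive updating slopes reported later in this document), the two PEs for a given exam are:

\[\text{PE}_i = \text{Grade} - \text{Prediction}_i, \quad i \in \{1, 2\}\]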

Do exam grade predictions become more precise throughout the semester?

Regardless of which PE is used, estimation seems to become more precise over the course of the semester:



To test this empirically, I regressed unsigned PEs (the absolute value of PE) on exam number to see whether unsigned PEs shrink reliably over the course of the semester.
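A sketch of this model for the first-prediction PEs; `grade` and `prediction_1` are placeholder column names, while `Unsigned PE`, `Exam`, and `grades.nomiss.mod` match the output below:

```r
library(lmerTest)

# Unsigned PE = |grade - prediction|; grade / prediction_1 are placeholders
grades.nomiss.mod$`Unsigned PE` <- abs(grades.nomiss.mod$grade -
                                         grades.nomiss.mod$prediction_1)

m_upe <- lmer(`Unsigned PE` ~ Exam + (1 | id), data = grades.nomiss.mod)
summary(m_upe)
```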

Taking the PEs computed from the first prediction, estimation error decreases over the course of the semester:


## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: `Unsigned PE` ~ Exam + (1 | id)
##    Data: grades.nomiss.mod
## 
## REML criterion at convergence: 12115.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7667 -0.6679 -0.1867  0.4293  8.6608 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept) 13.67    3.697   
##  Residual             77.59    8.809   
## Number of obs: 1654, groups:  id, 528
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   13.5464     0.4428 1583.7176  30.592  < 2e-16 ***
## Exam2         -1.3644     0.5843 1143.2367  -2.335   0.0197 *  
## Exam3         -3.2270     0.5927 1155.1796  -5.445 6.34e-08 ***
## Exam4         -5.0844     0.6719 1258.2696  -7.568 7.32e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr) Exam2  Exam3 
## Exam2 -0.658              
## Exam3 -0.648  0.494       
## Exam4 -0.574  0.438  0.432



A similar trend is present in the PEs computed from the second prediction. A key difference, however, is that the unsigned PEs at exams 1 and 2 are more similar when the second prediction is used. Recall also that the degree of prediction adjustment was greatest for the first exam.


## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: `Unsigned PE` ~ Exam + (1 | id)
##    Data: grades.nomiss.mod
## 
## REML criterion at convergence: 12176.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6656 -0.6625 -0.2284  0.4404  8.4768 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept) 12.25    3.499   
##  Residual             81.93    9.051   
## Number of obs: 1654, groups:  id, 528
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   12.0811     0.4501 1601.5750  26.842  < 2e-16 ***
## Exam2         -0.1273     0.5999 1158.2391  -0.212   0.8320    
## Exam3         -1.3995     0.6084 1170.4602  -2.300   0.0216 *  
## Exam4         -3.4790     0.6888 1277.2316  -5.051 5.04e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr) Exam2  Exam3 
## Exam2 -0.665              
## Exam3 -0.655  0.494       
## Exam4 -0.580  0.438  0.432



Updating Model: Replicated with first and second predictions/PEs

Importantly, the updating results replicate regardless of whether the first or second predictions/PEs are used.
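
A sketch of how these variables could be constructed: `pred_delta` and `PE_lag1` match the model output, while `grades`, `exam`, `prediction_2`, and `grade` are hypothetical names. The update at each exam is regressed on the PE from the previous exam.

```r
library(dplyr)
library(lmerTest)

# Hypothetical reconstruction: within each participant, the update at exam t
# is the change in prediction from exam t-1, regressed on the PE at exam t-1.
df.2 <- grades %>%
  group_by(id) %>%
  arrange(exam, .by_group = TRUE) %>%
  mutate(pred_delta = prediction_2 - lag(prediction_2),
         PE_lag1    = lag(grade - prediction_2)) %>%
  ungroup()

# Note: both fits reported below are singular (random-intercept variance ~0),
# so the random intercept contributes essentially nothing to these models.
m_update <- lmer(pred_delta ~ PE_lag1 + (1 | id), data = df.2)
summary(m_update)
```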

Below are the original results from the model using second predictions/PEs:

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: pred_delta ~ PE_lag1 + (1 | id)
##    Data: df.2
## 
## REML criterion at convergence: 11495.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.2301 -0.4976 -0.0121  0.5138  5.1139 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)   0.0     0.00   
##  Residual             206.4    14.37   
## Number of obs: 1407, groups:  id, 519
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept) 1.535e+00  3.846e-01 1.405e+03   3.991 6.92e-05 ***
## PE_lag1     3.922e-01  2.488e-02 1.405e+03  15.762  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr)
## PE_lag1 -0.091
## convergence code: 0
## boundary (singular) fit: see ?isSingular

And here are the replicated results using first predictions/PEs:

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: pred_delta ~ PE_lag1 + (1 | id)
##    Data: df.1
## 
## REML criterion at convergence: 8893.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6619 -0.4891 -0.0047  0.5453  4.6918 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)   0.0     0.00   
##  Residual             188.1    13.72   
## Number of obs: 1101, groups:  id, 475
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)    1.6243     0.4167 1099.0000   3.898 0.000103 ***
## PE_lag1        0.3093     0.0273 1099.0000  11.328  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr)
## PE_lag1 0.126 
## convergence code: 0
## boundary (singular) fit: see ?isSingular




Confidence data

We also asked participants to report their confidence in their predictions at both time points. Below are the distributions and descriptive statistics for both of these measures.



Confidence in first prediction:

|    | vars | n    | mean  | sd    | median | trimmed | mad   | min | max | range | skew  | kurtosis | se   |
|----|------|------|-------|-------|--------|---------|-------|-----|-----|-------|-------|----------|------|
| X1 | 1    | 1684 | 61.18 | 23.95 | 65     | 62.84   | 22.24 | 0   | 100 | 100   | -0.56 | -0.23    | 0.58 |



Confidence in second prediction:

|    | vars | n    | mean  | sd    | median | trimmed | mad  | min | max | range | skew  | kurtosis | se   |
|----|------|------|-------|-------|--------|---------|------|-----|-----|-------|-------|----------|------|
| X1 | 1    | 1131 | 55.29 | 24.69 | 54     | 56.31   | 25.2 | 0   | 100 | 100   | -0.32 | -0.55    | 0.73 |



Confidence and SSE
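
It isn't obvious from the output alone how SSE was constructed; given the residual degrees of freedom (roughly one row per participant), one plausible, assumed construction pairs each participant's mean confidence with their summed squared PEs. `conf_1` and `SSE_1` match the output below; everything else is hypothetical:

```r
library(dplyr)

# Assumed construction (not confirmed by the source): one row per participant,
# pairing average first-prediction confidence with summed squared PEs.
sse_df <- grades %>%
  group_by(id) %>%
  summarise(conf_1 = mean(confidence_1, na.rm = TRUE),
            SSE_1  = sum((grade - prediction_1)^2, na.rm = TRUE))

summary(lm(conf_1 ~ SSE_1, data = sse_df))
```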

First prediction confidence and SSE

## 
## Call:
## lm(formula = conf_1 ~ SSE_1, data = df.1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -61.796 -13.385   4.708  16.778  46.499 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  66.5046     3.6605  18.168   <2e-16 ***
## SSE_1        -0.1944     0.1455  -1.336    0.183    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.79 on 217 degrees of freedom
## Multiple R-squared:  0.008163,   Adjusted R-squared:  0.003592 
## F-statistic: 1.786 on 1 and 217 DF,  p-value: 0.1828



Second prediction confidence and SSE

## 
## Call:
## lm(formula = conf_2 ~ SSE_2, data = df.2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -58.124 -14.257  -0.093  20.545  46.535 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  61.5987     3.9473  15.605   <2e-16 ***
## SSE_2        -0.1779     0.1291  -1.378     0.17    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.18 on 228 degrees of freedom
## Multiple R-squared:  0.008254,   Adjusted R-squared:  0.003905 
## F-statistic: 1.898 on 1 and 228 DF,  p-value: 0.1697




Remaining questions:

How should we handle prediction adjustments? Can we incorporate them into an RL-type model somehow, perhaps as a proxy for uncertainty?

Which predictions/PEs should we use? Can we somehow incorporate both?