Background and Aims - The Pygmalion Effect and Our Data Source

This report covers a study in which a group of students attempted to replicate Rosenthal and Fode’s experimenter-expectancy effect (the Pygmalion, or “Clever Hans”, effect). Twelve students each trained a rat to run a maze and, over the course of 5 days, recorded its successful runs out of 50 daily attempts. Crucially, the study also recorded each student’s prior expectation of success and the “placebo” treatment each student received regarding the aptitude of their rat: every rat was arbitrarily labelled as either “bright” or “dull”, even though the two groups of rats were functionally identical.

This report aims to analyze this data and investigate whether a “Clever Hans” effect is present in the ex2120 data set. Our null hypothesis is that there is no notable difference in success between the “bright” and “dull” treatments, since the two groups of rats are functionally identical.

Computational Methods

All results were calculated and modeled using R version 4.1.0. Most functions come from base R: fisher.test was used to calculate the odds ratio for the aggregated data; cbind and tapply were used to build and reshape data frames for the regression and prediction steps; glm was used to fit the logistic regression models found in this report, with step facilitating variable selection; confint and exp were used to generate confidence intervals for the coefficients of the logistic regression models; and predict was used together with seq, c and expand.grid to generate the predictions found at the bottom of the report. The data set comes from the Sleuth3 package, and ggplot2 was used to create the facet plots found near the bottom of this report.
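
For reproducibility, the packages used in this report can be installed and attached as follows (a minimal setup sketch; the install step is only needed once per machine):

install.packages(c("Sleuth3", "ggplot2"))  # one-time installation
library(Sleuth3)   # provides the ex2120 data set
library(ggplot2)   # plotting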

Analysis

Pre-analysis Tasks: Preparing the Data Set for Analysis

The most crucial first step was loading the data into an R data frame from the Sleuth3 package. I also removed the Student column of the ex2120 data set, since it is an identifier rather than a useful explanatory variable.

library(Sleuth3)
library(ggplot2)
RatData <- ex2120[, 2:5]
head(RatData)
##   PriorExp Treatment Day Success
## 1       -7      dull   1      10
## 2       -7      dull   2      13
## 3       -7      dull   3      12
## 4       -7      dull   4      12
## 5       -7      dull   5       9
## 6       -6    bright   1      18

Calculating the Odds Ratio and Confidence Intervals for the Odds Ratio

To calculate an odds ratio for the data set, the daily successes of the “bright” and “dull” groups were summed into total successes, alongside a matching total-failures column (each day’s run comprises 50 trials), producing the 2×2 table required for Fisher’s exact test.

Success <- tapply(RatData$Success, RatData$Treatment, sum)
Failure <- tapply(50-RatData$Success, RatData$Treatment, sum)
RatOdds <- cbind(Success, Failure)

Setting aside the other explanatory variables for the moment, Fisher’s exact test was used to estimate both the odds ratio and a confidence interval for it.

fisher.test(RatOdds)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  RatOdds
## p-value < 0.00000000000000022
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.704414 2.317073
## sample estimates:
## odds ratio 
##   1.986617

As seen above, the result is highly statistically significant (p < 0.001), and the odds ratio is functionally 2 to 1 (95% CI 1.7 to 2.3): the odds of a successful run are roughly twice as high for a “bright” rat as for a “dull” rat.
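
As a sanity check, the sample odds ratio can be reproduced directly from the 2×2 table (a quick sketch; note that fisher.test reports the conditional maximum-likelihood estimate, which is close to but not identical to this raw cross-product ratio):

# Raw cross-product odds ratio from the aggregated table;
# rows of RatOdds are the treatments (bright, dull), columns are Success, Failure
(RatOdds["bright", "Success"] / RatOdds["bright", "Failure"]) /
  (RatOdds["dull", "Success"] / RatOdds["dull", "Failure"])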

Generating a Robust Logistic Regression Model

To prepare the data for logistic regression, I created a two-column response matrix of successes and failures.

binResponse <- cbind(RatData$Success, 50 - RatData$Success)
head(binResponse)
##      [,1] [,2]
## [1,]   10   40
## [2,]   13   37
## [3,]   12   38
## [4,]   12   38
## [5,]    9   41
## [6,]   18   32

The first step in interpreting the data was to fit an initial logistic regression model incorporating all of the explanatory variables. The model in question is seen below.

RatGLM <- glm(binResponse ~ Treatment+PriorExp+Day, family=binomial, data=RatData)
summary(RatGLM)
## 
## Call:
## glm(formula = binResponse ~ Treatment + PriorExp + Day, family = binomial, 
##     data = RatData)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2803  -0.8378  -0.1295   0.7614   3.7635  
## 
## Coefficients:
##               Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)   -0.54578    0.09789  -5.576         0.0000000247 ***
## Treatmentdull -0.71976    0.07798  -9.230 < 0.0000000000000002 ***
## PriorExp       0.02071    0.00736   2.814               0.0049 ** 
## Day            0.10910    0.02725   4.004         0.0000622190 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 201.543  on 59  degrees of freedom
## Residual deviance:  96.264  on 56  degrees of freedom
## AIC: 357.98
## 
## Number of Fisher Scoring iterations: 4

Before altering the model, it is worth noting that all of the coefficients are statistically significant predictors of maze success (p < 0.01). As a precaution, I also performed a deviance goodness-of-fit test, shown below.

residual.deviance <- summary(RatGLM)$deviance
deg.of.freedom <- summary(RatGLM)$df.residual
1-pchisq(residual.deviance,deg.of.freedom)
## [1] 0.0006624085
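
Since this goodness-of-fit check recurs below, it can be wrapped in a small convenience function (a sketch of my own; gof.pvalue is not a base R function):

gof.pvalue <- function(fit) {
  # Upper-tail chi-squared probability of the residual deviance:
  # small values indicate evidence of lack of fit
  1 - pchisq(summary(fit)$deviance, summary(fit)$df.residual)
}
gof.pvalue(RatGLM)  # reproduces the value above, ~0.00066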

The model does not appear to fit very well, so I investigated ways to improve the fit. The next step was to use the step function to check whether any interactions between the explanatory variables improved the model to any extent. This can be seen below.

RatStep <- step(RatGLM, scope = list(
  lower = . ~ 1,
  upper = . ~ Treatment * PriorExp * Day
))
## Start:  AIC=357.98
## binResponse ~ Treatment + PriorExp + Day
## 
##                      Df Deviance    AIC
## + Treatment:Day       1   93.490 357.20
## <none>                    96.264 357.98
## + PriorExp:Day        1   95.856 359.57
## + Treatment:PriorExp  1   96.252 359.96
## - PriorExp            1  104.234 363.95
## - Day                 1  112.383 372.10
## - Treatment           1  183.250 442.96
## 
## Step:  AIC=357.2
## binResponse ~ Treatment + PriorExp + Day + Treatment:Day
## 
##                      Df Deviance    AIC
## <none>                    93.490 357.20
## - Treatment:Day       1   96.264 357.98
## + PriorExp:Day        1   92.707 358.42
## + Treatment:PriorExp  1   93.475 359.19
## - PriorExp            1  101.473 363.19
summary(RatStep)
## 
## Call:
## glm(formula = binResponse ~ Treatment + PriorExp + Day + Treatment:Day, 
##     family = binomial, data = RatData)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9698  -0.8442  -0.1453   0.8494   3.7521  
## 
## Coefficients:
##                    Estimate Std. Error z value     Pr(>|z|)    
## (Intercept)       -0.671945   0.124306  -5.406 0.0000000646 ***
## Treatmentdull     -0.442043   0.183631  -2.407      0.01607 *  
## PriorExp           0.020741   0.007366   2.816      0.00487 ** 
## Day                0.150800   0.037096   4.065 0.0000480129 ***
## Treatmentdull:Day -0.091102   0.054709  -1.665      0.09587 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 201.54  on 59  degrees of freedom
## Residual deviance:  93.49  on 55  degrees of freedom
## AIC: 357.2
## 
## Number of Fisher Scoring iterations: 4
residual.deviance1 <- summary(RatStep)$deviance
deg.of.freedom1 <- summary(RatStep)$df.residual
1 - pchisq(residual.deviance1, deg.of.freedom1)
## [1] 0.0009291

After the step-wise variable selection there was a small reduction in residual deviance and AIC, which initially looked like an improvement, but it was worth repeating the deviance goodness-of-fit test. As seen above, the new model fits somewhat better than the previous one, but still not well. Rather than there being any straightforward fix, it is possible that this is simply the best fit available for this set of variables. Treating Day as a factor might help, but using it fully would also require factoring it into the interaction terms, increasing the complexity of the model to an unnecessary extent.

RatDay <- glm(binResponse ~ Treatment+PriorExp+as.factor(Day), family=binomial, data=RatData)
summary(RatDay)
## 
## Call:
## glm(formula = binResponse ~ Treatment + PriorExp + as.factor(Day), 
##     family = binomial, data = RatData)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2750  -0.9184   0.0360   0.6931   2.9640  
## 
## Coefficients:
##                  Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)     -0.518332   0.096274  -5.384         0.0000000729 ***
## Treatmentdull   -0.722176   0.078120  -9.244 < 0.0000000000000002 ***
## PriorExp         0.020776   0.007372   2.818             0.004829 ** 
## as.factor(Day)2  0.141523   0.125468   1.128             0.259335    
## as.factor(Day)3  0.536164   0.122899   4.363         0.0000128498 ***
## as.factor(Day)4  0.379453   0.123654   3.069             0.002150 ** 
## as.factor(Day)5  0.436938   0.123338   3.543             0.000396 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 201.54  on 59  degrees of freedom
## Residual deviance:  86.23  on 53  degrees of freedom
## AIC: 353.94
## 
## Number of Fisher Scoring iterations: 4
ggplot(data = ex2120, aes(x = Day, y = Success, group = Student, colour = Student)) + facet_wrap(~ Treatment) + geom_line()

Looking at the plot of the raw data above, faceted by treatment and drawn across all students and days, there are only loose day-to-day trends, most plausibly driven by the students themselves and by measurement error rather than by any systematic day effect. For this reason, the RatStep model is probably the best model to fit with the information we have.
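
The three candidate models can also be put side by side using the base AIC function (the values match those reported in the summaries above):

# Compare the candidate models by AIC: roughly 358.0 for RatGLM,
# 357.2 for RatStep and 353.9 for RatDay. RatDay wins on AIC but
# spends extra parameters on what appears to be day-to-day noise.
AIC(RatGLM, RatStep, RatDay)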

Using the Logistic Regression Model to Generate Multiplicative Effects

Deriving confidence intervals from the final model is a simple process, done below using the confint function.

confint(RatStep)
## Waiting for profiling to be done...
##                          2.5 %      97.5 %
## (Intercept)       -0.916934493 -0.42944336
## Treatmentdull     -0.802974663 -0.08292922
## PriorExp           0.006339361  0.03522122
## Day                0.078289572  0.22375860
## Treatmentdull:Day -0.198395560  0.01610701

These intervals are on the log-odds scale and are not particularly interpretable as they stand, so exponentiating them converts them into multiplicative effects on the odds, which are easier to understand.

exp(confint(RatStep))
## Waiting for profiling to be done...
##                       2.5 %    97.5 %
## (Intercept)       0.3997426 0.6508713
## Treatmentdull     0.4479943 0.9204163
## PriorExp          1.0063595 1.0358488
## Day               1.0814358 1.2507690
## Treatmentdull:Day 0.8200454 1.0162374
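
The point estimates from the model summary can be exponentiated in the same way, placing them on the same multiplicative scale as the intervals above:

exp(coef(RatStep))
# e.g. exp(-0.442) ≈ 0.64 for Treatmentdull, and exp(0.1508) ≈ 1.16 for Day,
# i.e. an estimated 16% increase in the odds of success per additional day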

As can be seen above, students given “dull” rats typically had lower odds of success, and those with higher preconceptions of their own success saw higher odds of success. It also seems that giving the students more time to train their rats increased their odds of success, although the interaction term suggests this day-to-day gain was smaller for the “dull” rats (its interval only just includes 1).

Discussion - Predictions

To visualize what the final model implies, predictions were generated over a grid of prior expectations, treatments and days.

priorexp <- seq(-7, 10, 1)
day <- seq(1, 5, 1)
treat <- c("bright", "dull")
grid <- expand.grid(PriorExp = priorexp, Treatment = treat, Day = day)
# predicted probability of a successful run for each combination
Successes <- predict(RatStep, newdata = grid, type = "response")
toPlot <- cbind(grid, Successes)
ggplot(data = toPlot, aes(x = Day, y = Successes)) + facet_wrap(~ Treatment) + geom_point()

The predictions plotted above are consistent with the previous findings: “dull” rats, despite there being no actual difference from their “bright” counterparts, are predicted to succeed less often in the maze. This plot does not account for the students’ prior expectations, which are examined in a second plot further below.
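
First, to make the scale of these predictions concrete, a single fitted probability can be converted into an expected number of successful runs out of the 50 daily trials (an illustrative sketch; the covariate values are arbitrary):

# Expected successes out of 50 for a "bright" rat on day 3,
# with a prior expectation of 0
p <- predict(RatStep, newdata = data.frame(PriorExp = 0, Treatment = "bright", Day = 3),
             type = "response")
50 * p  # roughly 22 successful runs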

ggplot(data = toPlot, aes(x = Day, y = Successes)) + geom_point() + facet_wrap(~ PriorExp)

This plot shows a notable increase in the predicted chance of success when students held higher expectations for themselves, compounding the day-on-day improvement of their rodents. Those with lower expectations of their own success notably did not reap the same benefit, showing only marginal increases in success compared to their more confident counterparts.

Discussion - Findings

This data set is largely a representative example of the Pygmalion effect (or “Clever Hans” effect) occurring in practice. Despite the two rat treatments being effectively identical, the students’ preconceptions about their rats and about their own success appear to have artificially inflated or deflated their chances of success, with “bright” rats enjoying functionally 2-to-1 odds of succeeding over their “dull” counterparts. Each one-unit increase in prior expectation of success was associated with an estimated 0.6% to 3.6% increase in the odds of a successful run. Furthermore, each additional day brought an estimated 8% to 25% increase in the odds of success, a benefit from which only the “bright” rats seemed to fully profit.
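
Those percentages follow directly from the exponentiated intervals computed earlier, as a quick check shows:

# Percent change in the odds of success per unit of PriorExp and per Day,
# taken from the exponentiated 95% CIs above
(exp(confint(RatStep))[c("PriorExp", "Day"), ] - 1) * 100
# PriorExp: roughly 0.6% to 3.6%; Day: roughly 8% to 25%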

There was an interesting trend in the plots above, where the “dull” rats markedly improved their performance on day 3 before dropping back down. The reason for this is unknown; it is likely a result of human error relating to expectations of rat performance, which is not unexpected considering that much of the data in this report is built upon human error, expectations and preconceptions.

With that knowledge and our data in mind, it is possible to reject the null hypothesis on solid ground: despite their functionally identical training, “dull” and “bright” rats experienced different rates of success depending on the preconceptions their trainers held about the rats and about their own likelihood of success.