Project 3 - Is your Rat the next Einstein?

Background and Aims - The Pygmalion Effect and Our Data Source

This report covers information taken from a study done by a group of students attempting to replicate Rosenthal and Fode’s “Clever Hans” effect. 12 Students instructed each a rat to make their way through a maze and over the course of 5 days, recorded their successful runs out of 50. More importantly, they also gathered data based on the student’s prior expectation of success of their rat and whether the students were supplied a “placebo” treatment regarding the aptitude of their rat.

This report aims to analyze this data set and investigate whether a “Clever Hans” effect is occurring within the ex2120 datasheet.

Computational Methods

All results were calculated and modelled using R Version 4.1.0. A majority of functions were taken from the base R package. The data set used for this report was taken from the Sleuth3 package.

Analysis

Pre-analysis tasks: Priming the Data-set for Analysis.

The most crucial step of analyzing this data was actually placing it within an Rdataframe, calling it from the Sleuth3 package. Furthermore, I removed the “student” column of the ex2120 dataset as it’s not actually of any value as a data analysis tool.

library(Sleuth3)
RatData <-ex2120[,2:5]
head(RatData)

##   PriorExp Treatment Day Success
## 1       -7      dull   1      10
## 2       -7      dull   2      13
## 3       -7      dull   3      12
## 4       -7      dull   4      12
## 5       -7      dull   5       9
## 6       -6    bright   1      18

Calculating the Odds Ratio and Confidence Intervals for the Odds Ratio

To calculate the Odds Ratio of the dataset it became necessary to truncate all of the successes of the “bright” and “dull” into a “total successes” as well as creating a “total failures” column to aid in using a Fisher’s Exact Test.

Success <- tapply(RatData$Success, RatData$Treatment, sum)
Failure <- tapply(50-RatData$Success, RatData$Treatment, sum)
RatOdds <- cbind(Success, Failure)

When not including confounding variables, A Fisher Exact Test was employed to determine both the Odds Ratio and CI’s for the Odds Ratio.

fisher.test(RatOdds)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  RatOdds
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.704414 2.317073
## sample estimates:
## odds ratio 
##   1.986617

As seen above, the p value is highly statistically significant(p<0.001) and the Odds ratio seems to be functionally a 2 to 1 ratio (CI 1.7,2.3) that a “bright” rat will succeed over a “dull” rat.

Generating a Robust Logistic Regression Model

In order to prepare the data set for logistical regression, I created a two column matrix marking successes and failures.

binResponse <- cbind(RatData$Success,50-RatData$Success)

Fitting an initial Logistic Regression model incorporating all of the coefficients in a single model was the first step to interpreting the data. The model in question is seen below.

RatGLM <- glm(binResponse~ RatData$Treatment+RatData$PriorExp+RatData$Day, family=binomial)
summary(RatGLM)

## 
## Call:
## glm(formula = binResponse ~ RatData$Treatment + RatData$PriorExp + 
##     RatData$Day, family = binomial)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2803  -0.8378  -0.1295   0.7614   3.7635  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -0.54578    0.09789  -5.576 2.47e-08 ***
## RatData$Treatmentdull -0.71976    0.07798  -9.230  < 2e-16 ***
## RatData$PriorExp       0.02071    0.00736   2.814   0.0049 ** 
## RatData$Day            0.10910    0.02725   4.004 6.22e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 201.543  on 59  degrees of freedom
## Residual deviance:  96.264  on 56  degrees of freedom
## AIC: 357.98
## 
## Number of Fisher Scoring iterations: 4

Before changing or altering the model as a whole, it’s good to note that the coefficients are all statistically significant predictors of success in the maze rats(p<0.01). As a precaution, I also did a Deviance Goodness-Of-Fit test, shown below.

residual.deviance <- summary(RatGLM)$deviance
deg.of.freedom <- summary(RatGLM)$df.residual
1-pchisq(residual.deviance,deg.of.freedom)

## [1] 0.0006624085

The model doesen’t appear to be fitting very well, so investigations were done into ways to improve the fit. The next step was using the step command to investigate if there were any interaction effects between the various coefficients and if they improved the model to any extent. This can be seen below

RatStep <- step(RatGLM, scope=list(
  lower=.~1,
  upper=.~RatData$Treatment+RatData$PriorExp+RatData$Day+(RatData$Treatment*RatData$PriorExp*RatData$Day)^3
))

## Start:  AIC=357.98
## binResponse ~ RatData$Treatment + RatData$PriorExp + RatData$Day
## 
##                                      Df Deviance    AIC
## + RatData$Treatment:RatData$Day       1   93.490 357.20
## <none>                                    96.264 357.98
## + RatData$PriorExp:RatData$Day        1   95.856 359.57
## + RatData$Treatment:RatData$PriorExp  1   96.252 359.96
## - RatData$PriorExp                    1  104.234 363.95
## - RatData$Day                         1  112.383 372.10
## - RatData$Treatment                   1  183.250 442.96
## 
## Step:  AIC=357.2
## binResponse ~ RatData$Treatment + RatData$PriorExp + RatData$Day + 
##     RatData$Treatment:RatData$Day
## 
##                                      Df Deviance    AIC
## <none>                                    93.490 357.20
## - RatData$Treatment:RatData$Day       1   96.264 357.98
## + RatData$PriorExp:RatData$Day        1   92.707 358.42
## + RatData$Treatment:RatData$PriorExp  1   93.475 359.19
## - RatData$PriorExp                    1  101.473 363.19

summary(RatStep)

## 
## Call:
## glm(formula = binResponse ~ RatData$Treatment + RatData$PriorExp + 
##     RatData$Day + RatData$Treatment:RatData$Day, family = binomial)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9698  -0.8442  -0.1453   0.8494   3.7521  
## 
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                       -0.671945   0.124306  -5.406 6.46e-08 ***
## RatData$Treatmentdull             -0.442043   0.183631  -2.407  0.01607 *  
## RatData$PriorExp                   0.020741   0.007366   2.816  0.00487 ** 
## RatData$Day                        0.150800   0.037096   4.065 4.80e-05 ***
## RatData$Treatmentdull:RatData$Day -0.091102   0.054709  -1.665  0.09587 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 201.54  on 59  degrees of freedom
## Residual deviance:  93.49  on 55  degrees of freedom
## AIC: 357.2
## 
## Number of Fisher Scoring iterations: 4

residual.deviance1 <- summary(RatStep$deviance)
deg.of.freedom1 <- summary(RatStep$df.residual)
1-pchisq(residual.deviance1,deg.of.freedom1)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0009291 0.0009291 0.0009291 0.0009291 0.0009291 0.0009291

As seen above, the Deviance goodness of fit test is better, but still not very useful as a fitted model.

Project 3 - Is your Rat the next Einstein?

Jesse Webb

Background and Aims - The Pygmalion Effect and Our Data Source

Computational Methods

Analysis

Pre-analysis tasks: Priming the Data-set for Analysis.

Calculating the Odds Ratio and Confidence Intervals for the Odds Ratio

Generating a Robust Logistic Regression Model

Using the Logistic Regression Model to Generate Confidence Intervals

Discussion

Conclusions and Predictions