Replication of Schema Induction and Analogical Transfer by Gick and Holyoak (1983, Cognitive Psychology)

# load packages
library(tidyverse) # for data munging

## ── Attaching packages ───────────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.7
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0

## ── Conflicts ──────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library('scales')     # for scale_y_continuous(label = percent)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

library('ggthemes')   # for scale_fill_few('medium')
knitr::opts_chunk$set(comment = NA)
options(ztable.type = 'html')

Introduction

In their original paper, “Schema Induction and Analogical Transfer,” Gick and Holyoak (1983) investigate the nature of analogical transfer between semantically distinct problems. The paper focused on “how analogies are noticed and then applied to generate solutions to novel problems.” According to Schwartz and Goldstone (2016), Gick and Holyoak’s work demonstrates a way to use analogies: students are given at least two examples, which they later use to identify the schema. The study participants were more likely to derive the problem schema as an unintended result of identifying the similarities between two given analogs, and the quality of the induced schema was highly predictive of subsequent transfer performance. This way of using analogies turned out to be powerful for learning and transfer. For this replication project, I focus on Part II of the study, specifically Experiment 5.

My research interests lie at the intersection of engineering education and language within the context of the multilingual science classroom. Specifically, I seek to identify opportunities for children to learn about engineering problem solving and design without the limitations of monoglossic learning environments. However, very little is known about how linguistic contexts may afford or limit the construction of knowledge and meaning in science and engineering. Although engineering has a long history both in higher education and as a profession, the recent K-12 science education standards bring the ideas and practices of engineering to the elementary and secondary levels (NRC, 2012; NGSS, 2013). Given this recent inclusion of engineering in the standards, I argue that a successful implementation of the framework demands a better understanding of how we can afford opportunities for multilingual learning in K-12 science and engineering. Contemporary research has yet to define those opportunities. It also needs to address how analogies are noticed in engineering problems and then applied to generate solutions to novel challenges, a question similar to the one addressed in Gick and Holyoak’s work.

The original paper followed a 2 × 2 factorial design, with participants exposed to two conditions. In Experiment 5, a total of 143 college students participated in the study. Participants were split approximately evenly between two conditions, the with-principle condition and the without-principle condition. The authors used “one pair of dissimilar story analogs: ‘The General’ and ‘The Fire Chief.’” Subjects in the without-principle condition read the two stories just as they appear in Appendix II of the article. Those in the with-principle condition read the identical stories, except that the verbal statement of principle was appended as the final paragraph of each story in the following format: “(The general or the fire chief) attributed his success to an important principle: If you need a large force to accomplish some purpose, but are prevented from applying such a force directly, many smaller forces applied simultaneously from different directions may work just as well.” This statement was designed to focus the subjects’ attention on the critical aspects of the schema implicit in each of the two analogs.

As part of the procedure, subjects were first told to study the two stories carefully for 5 min in preparation for answering questions about them. The stories were then collected, and the remainder of the initial story task was done from memory. Subjects were asked to briefly summarize each story, rate the comprehensibility of each, describe as clearly as possible the ways in which the situations in the two stories seemed similar, and rate their overall similarity. After this initial task was completed, the radiation problem was administered in the usual two-pass manner: first without a hint to use the prior story or statement, and then with such a hint. The challenges of performing this type of experiment are related to sampling and utility.

Link to the original paper

Publication in Rpubs

Github repository

Link to paradigm or survey data collection instrument

Image of the MTurk Sandbox trial

Draft of email to the author requesting feedback

#Raw data for Pilot B 
library(readr)
cruda <- read_csv("~/Desktop/PhD Stanford /PSYCH-251 Experimental Methods/Final Project /Pilot B/PilotB_120518/Comprehension_v3_11/PilotB_120518_Data.csv")

Parsed with column specification:
cols(
  `Duration (in seconds)` = col_integer(),
  RecordedDate = col_character(),
  Q24 = col_character(),
  Q26 = col_character(),
  Q27 = col_character(),
  Q28 = col_character(),
  Q29_1 = col_character(),
  Q54 = col_character(),
  Q58 = col_character(),
  Q59 = col_character(),
  Q60 = col_character(),
  Q61_1 = col_character(),
  Q47 = col_character(),
  Q41 = col_character(),
  Q1 = col_character(),
  Q1_3_TEXT = col_character(),
  Q2 = col_character(),
  Q3 = col_character(),
  Q34 = col_character()
)

head(cruda)

# A tibble: 6 x 19
  `Duration (in s… RecordedDate Q24   Q26   Q27   Q28   Q29_1 Q54   Q58  
             <int> <chr>        <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1               NA <NA>         Summ… summ… Comp… Desc… The … Summ… summ…
2              536 12/5/18 15:… Ther… Mr. … Inte… They… Mode… <NA>  <NA> 
3              410 12/5/18 15:… The … The … Good  The … Agree <NA>  <NA> 
4              635 12/5/18 16:… <NA>  <NA>  <NA>  <NA>  <NA>  A ge… The …
5             1051 12/5/18 16:… <NA>  <NA>  <NA>  <NA>  <NA>  The … Mr. …
6              641 12/5/18 16:… one … one … Inte… they… Mode… <NA>  <NA> 
# ... with 10 more variables: Q59 <chr>, Q60 <chr>, Q61_1 <chr>,
#   Q47 <chr>, Q41 <chr>, Q1 <chr>, Q1_3_TEXT <chr>, Q2 <chr>, Q3 <chr>,
#   Q34 <chr>

I gathered feedback on the paradigm by running it several times. The users advised me to put a break page between the stories and to let users click “continue” once they have finished reading each story. Please see below the clean data based on the raw data presented in the previous section.

d <- read_csv("~/Desktop/PhD Stanford /PSYCH-251 Experimental Methods/Final Project /PilotA/Trial 2_PilotA/Table A_Data_Trial2_PilotA.csv")

Parsed with column specification:
cols(
  Subject = col_integer(),
  Condition = col_integer(),
  BeforeHint = col_integer(),
  AfterHint = col_integer(),
  Comprenhensibility = col_character()
)

head(d)

# A tibble: 3 x 5
  Subject Condition BeforeHint AfterHint Comprenhensibility
    <int>     <int>      <int>     <int> <chr>             
1     111         1          0         1 Good              
2     112         1          1         1 Poor              
3     113         2          1         1 Intermediate

Methods

Power Analysis

#reproduces the exact test statistic from the original article
x <- matrix(c(45,28,28,42), nrow = 2, ncol = 2) 
c2 <- chisq.test(x)
DescTools::GTest(x)


    Log likelihood ratio (G-test) test of independence without
    correction

data:  x
G = 6.7515, X-squared df = 1, p-value = 0.009367

#Original effect size for the paper
w=sqrt(as.numeric(c2$statistic)/sum(x))
w

[1] 0.2024462
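
For reference, the effect size computed above is Cohen's w for a contingency table. Because `chisq.test()` applies Yates' continuity correction to 2 × 2 tables by default, `c2$statistic` is the corrected Pearson chi-square rather than the G statistic. The chi-square value of about 5.86 shown here is back-computed from the reported w and N = 143, so it should be read as approximate:

```latex
w = \sqrt{\frac{\chi^2}{N}} \approx \sqrt{\frac{5.86}{143}} \approx 0.202
```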

# 80% power
pwr::pwr.chisq.test(w=w, N=192, df=1)


     Chi squared power calculation 

              w = 0.2024462
              N = 192
             df = 1
      sig.level = 0.05
          power = 0.8010049

NOTE: N is the number of observations

# 90% power
pwr::pwr.chisq.test(w=w, N=257, df=1)


     Chi squared power calculation 

              w = 0.2024462
              N = 257
             df = 1
      sig.level = 0.05
          power = 0.9006905

NOTE: N is the number of observations

# 95% power
pwr::pwr.chisq.test(w=w, N=318, df=1)


     Chi squared power calculation 

              w = 0.2024462
              N = 318
             df = 1
      sig.level = 0.05
          power = 0.9505458

NOTE: N is the number of observations

# 68% is the power of the original study for experiment #5
pwr::pwr.chisq.test(w=w, N=143, df=1)


     Chi squared power calculation 

              w = 0.2024462
              N = 143
             df = 1
      sig.level = 0.05
          power = 0.6775852

NOTE: N is the number of observations

The original effect size for the Gick and Holyoak (1983) paper is w = 0.20. To detect an effect of this size, the power analysis indicates a sample of 192 participants for 80% power, 257 participants for 90% power, and 318 participants for 95% power. The original study was underpowered, at 68% power with 143 participants. In light of these results, and given the resource limitations of this replication, the replication will also be underpowered.
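
The sample sizes above were checked by plugging candidate values of N into the power function. As a cross-check, the following minimal base-R sketch solves for N directly. It assumes that power for this test is the upper tail of a noncentral chi-square distribution with noncentrality N·w², which is believed to match what `pwr::pwr.chisq.test()` computes. The helper names `chisq_power` and `required_n` are hypothetical and do not belong to any package.

```r
# Effect size reported in the power analysis above
w <- 0.2024462

# Power of a chi-square test with df degrees of freedom at sample size n
# (assumption: noncentral chi-square with noncentrality n * w^2)
chisq_power <- function(n, w, df = 1, alpha = 0.05) {
  crit <- qchisq(1 - alpha, df)
  pchisq(crit, df, ncp = n * w^2, lower.tail = FALSE)
}

# Smallest whole-number sample size that reaches the target power
required_n <- function(target, w, df = 1, alpha = 0.05) {
  f <- function(n) chisq_power(n, w, df, alpha) - target
  ceiling(uniroot(f, interval = c(2, 1e4))$root)
}

# Required N for 80%, 90%, and 95% power
sapply(c(0.80, 0.90, 0.95), required_n, w = w)

# Power of the original study's sample size
chisq_power(143, w)
```

If the assumption holds, the three sample sizes should agree with the values used in the `pwr` calls above.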

Planned Sample

A total of 143 college students participated in the original Experiment 5, split approximately evenly between the two conditions. For this replication project, I am planning a sample size equal to the one used by the authors in the original study. The data will be collected at the beginning of December 2018, during the Fall quarter of the 2018-2019 academic year. To be considered for this study, subjects must be currently enrolled in college. As in the original paper, participants in this study are assigned to one of two conditions, with principle or without principle.

Materials

All materials used in this experiment come from the original article and were used precisely as indicated by the original authors. The authors used “one pair of dissimilar story analogs: ‘The General’ and ‘The Fire Chief.’” Subjects served in one of two conditions. Those in the without-principle condition read the two stories just as they appear in Appendix II. Those in the with-principle condition read the identical stories, except that the verbal statement of the convergence principle, previously used in Experiment 2, was appended as the final paragraph of each (“…attributed his success to an important principle: If you need a large force to accomplish some purpose, but are prevented from applying such a force directly, many smaller forces applied simultaneously from different directions may work just as well”). The statement was worded exactly as it had been in the earlier experiment (except, of course, that the sentence for “The Fire Chief” began as follows: “The fire chief attributed his success...”). The statement was thus designed to focus the subjects’ attention on the critical aspects of the schema implicit in each of the two analogs.

Here are the stories included in the experiment:

Radiation Problem “Suppose you are a doctor faced with a patient who has a malignant tumor in his stomach. It is impossible to operate on the patient, but unless the tumor is destroyed the patient will die. There is a kind of ray that can be used to destroy the tumor. If the rays reach the tumor all at once at a sufficiently high intensity, the tumor will be destroyed. Unfortunately, at this intensity the healthy tissue that the rays pass through on the way to the tumor will also be destroyed. At lower intensities the rays are harmless to healthy tissue, but they will not affect the tumor either. What type of procedure might be used to destroy the tumor with the rays, and at the same time avoid destroying the healthy tissue?”

The General “A small country was ruled from a strong fortress by a dictator. The fortress was situated in the middle of the country, surrounded by farms and villages. Many roads led to the fortress through the countryside. A rebel general vowed to capture the fortress. The general knew that an attack by his entire army would capture the fortress. He gathered his army at the head of one of the roads, ready to launch a full-scale direct attack. However, the general then learned that the dictator had planted mines on each of the roads. The mines were set so that small bodies of men could pass over them safely, since the dictator needed to move his troops and workers to and from the fortress. However, any large force would detonate the mines. Not only would this blow up the road, but it would also destroy many neighboring villages. It therefore seemed impossible to capture the fortress. However, the general devised a simple plan. He divided his army into small groups and dispatched each group to the head of a different road. When all was ready he gave the signal and each group marched down a different road. Each group continued down its road to the fortress so that the entire army arrived together at the fortress at the same time. In this way, the general captured the fortress and overthrew the dictator.”

The Fire Chief “One night a fire broke out in a wood shed full of timber on Mr. Johnson’s place. As soon as he saw flames he sounded the alarm, and within minutes dozens of neighbors were on the scene armed with buckets. The shed was already burning fiercely, and everyone was afraid that if it wasn’t controlled quickly the house would go up next. Fortunately, the shed was right beside a lake, so there was plenty of water available. If a large volume of water could hit the fire at the same time, it would be extinguished. But with only small buckets to work with, it was hard to make any headway. The fire seemed to evaporate each bucket of water before it hit the wood. It looked like the house was doomed. Just then the fire chief arrived. He immediately took charge and organized everyone. He had everyone fill their bucket and then wait in a circle surrounding the burning shed. As soon as the last man was prepared, the chief gave a shout and everyone threw their bucket of water at the fire. The force of all the water together dampened the fire right down, and it was quickly brought under control. Mr. Johnson was relieved that his house was saved, and the village council voted the fire chief a raise in pay.”

Procedure

As indicated in the introduction of this paper, participants were first told to study the two stories carefully for 5 min in preparation for answering questions about them. The stories were then collected, and the remainder of the initial story task was done from memory. Subjects were asked to briefly summarize each story, rate the comprehensibility of each, describe as clearly as possible the ways in which the situations in the two stories seemed similar, and rate their overall similarity. After this initial task was completed, the radiation problem was administered in the usual two-pass manner.

There are two conditions included in the paradigm: with principle and without principle condition. A principle is a statement at the end of the stories that makes explicit the solution to the problem. In both conditions, participants are asked to solve the “Radiation Problem”, first without a hint and later with a hint.

Analysis Plan

For this replication of Experiment 5, I chose to calculate the maximum likelihood chi-square (G²), as done in the original experiment. I will calculate G² for the frequency data that compare the two conditions, with principle and without principle. In addition, the analysis will include a table with the frequency data and the corresponding percentages.
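
For reference, the maximum likelihood chi-square for a table of observed counts can be written in its standard form as follows, where the symbols O (observed count), E (expected count under independence), R and C (row and column totals), and N (grand total) are notation introduced here rather than taken from the original article:

```latex
G^2 = 2 \sum_{i} \sum_{j} O_{ij} \ln\!\left(\frac{O_{ij}}{E_{ij}}\right),
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
df = (r - 1)(c - 1)
```

For the 2 × 2 table of condition by solution used here, r = c = 2, so the test has 1 degree of freedom.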

Differences from Original Study

There are no specific differences in the sample size between the original study and this replication. However, instead of in-person administration of the experiment, this replication will be an experiment on Amazon Mechanical Turk (mturk.com). This is a deviation from the original population tested in the original study. All of the other requirements for the sample population have been kept consistent in order to minimize the differences that could arise from administering the experiment online. However, differences will inevitably occur.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

Load Relevant Libraries and Functions

# load packages
library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files



library('scales')     # for scale_y_continuous(label = percent)
library('ggthemes')   # for scale_fill_few('medium')
knitr::opts_chunk$set(comment = NA)
options(ztable.type = 'html')

Import data

First, let’s look at the “data” collected from me and my friends to confirm that the data are logging correctly. This was my first attempt at collecting raw data, with three responses in total. The data logged correctly, but the file contains more information than this experiment needs.

###Data Preparation
library(readr)
rawdata <- read_csv("~/Desktop/PhD Stanford /PSYCH-251 Experimental Methods/Final Project /PilotA/Trial 2_PilotA/Data_Trial2_PilotA.csv")

Parsed with column specification:
cols(
  .default = col_character()
)

See spec(...) for full column specifications.

rawdata

# A tibble: 4 x 24
  StartDate EndDate Progress `Duration (in s… RecordedDate ResponseId Q1   
  <chr>     <chr>   <chr>    <chr>            <chr>        <chr>      <chr>
1 Start Da… End Da… Progress Duration (in se… Recorded Da… Response … Gend…
2 11/25/18… 11/25/… 100      137              11/25/18 17… R_u8KIu3Y… Pref…
3 11/25/18… 11/25/… 100      35               11/25/18 17… R_2AMWJ5p… Pref…
4 11/25/18… 11/25/… 100      387              11/25/18 17… R_2Xb76xN… Fema…
# ... with 17 more variables: Q1_3_TEXT <chr>, Q2 <chr>, Q3 <chr>,
#   Q24 <chr>, Q22 <chr>, Q26 <chr>, Q27 <chr>, Q28 <chr>, Q29_1 <chr>,
#   Q54 <chr>, Q55 <chr>, Q58 <chr>, Q59 <chr>, Q60 <chr>, Q61_1 <chr>,
#   Q47 <chr>, Q41 <chr>
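
In the printout above, the first data row holds the question labels rather than a response, which is likely why every column is read as character. The following is a minimal base-R sketch of dropping that row and converting the duration column to a number. It uses a hypothetical toy data frame: only the three duration values come from the printout, and the `Q1` values are placeholders.

```r
# Hypothetical stand-in for the raw export (Q1 values are placeholders)
raw_toy <- data.frame(
  `Duration (in seconds)` = c("Duration (in seconds)", "137", "35", "387"),
  Q1 = c("question label", "response_a", "response_b", "response_c"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop the label row so that only participant responses remain
clean_toy <- raw_toy[-1, , drop = FALSE]

# Convert the duration from character to integer seconds
clean_toy$duration_sec <- as.integer(clean_toy[["Duration (in seconds)"]])

clean_toy
```

The same two steps could be applied to `rawdata` before selecting the columns needed for the analysis.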

Data exclusion / filtering

d <- read_csv("~/Desktop/PhD Stanford /PSYCH-251 Experimental Methods/Final Project /Pilot B/PilotB_120518/Comprehension_v3_11/PilotB_120518_Data_Clean.csv")

Parsed with column specification:
cols(
  Subject = col_integer(),
  Condition = col_integer(),
  Comprehensibility = col_character(),
  `Describe stories similarity` = col_character(),
  Scheme.Quality = col_character(),
  Radiation.WO.Hint = col_character(),
  BeforeHint = col_integer(),
  Radiation.W.Hint = col_character(),
  AfterHint = col_integer(),
  Gender = col_character(),
  Gender.Other = col_character(),
  Race = col_character(),
  Age = col_character()
)

head(d)

# A tibble: 6 x 13
  Subject Condition Comprehensibili… `Describe stori… Scheme.Quality
    <int>     <int> <chr>            <chr>            <chr>         
1     111         1 Intermediate     They were both … Intermediate  
2     112         1 Good             The force of al… Poor          
3     113         2 Good             Both required d… Poor          
4     114         2 Good             The two stories… Poor          
5     115         1 Intermediate     they are simila… Poor          
6     116         1 Good             In both stories… Poor          
# ... with 8 more variables: Radiation.WO.Hint <chr>, BeforeHint <int>,
#   Radiation.W.Hint <chr>, AfterHint <int>, Gender <chr>,
#   Gender.Other <chr>, Race <chr>, Age <chr>

Prepare data for analysis - create columns etc.

The code for my planned analyses and the confirmation that I can run the code on my data.

# Frequencies and percentages of radiation-problem solutions per condition
summary_table <- d %>%
  group_by(Condition) %>%
  summarise(beforehint_freq = sum(BeforeHint),                        # solved before the hint
            beforehint_perc = round(mean(BeforeHint) * 100, 0),
            after_freq = sum(AfterHint & !BeforeHint),                # solved only after the hint
            after_perc = round(mean(AfterHint & !BeforeHint) * 100, 0),
            notbefore_freq = n() - beforehint_freq,                   # did not solve before the hint
            notbefore_perc = round((n() - beforehint_freq) / n() * 100, 0))
summary_table

# A tibble: 2 x 7
  Condition beforehint_freq beforehint_perc after_freq after_perc
      <int>           <int>           <dbl>      <int>      <dbl>
1         1               1              25          1         25
2         2               2              67          0          0
# ... with 2 more variables: notbefore_freq <int>, notbefore_perc <dbl>

# 2 x 2 contingency table for the chi-square and G tests: rows = condition, columns = solved / not solved before the hint
contingency_table <- matrix(nrow = 2, ncol = 2, 
                            c(summary_table$beforehint_freq[summary_table$Condition == 1],
                              summary_table$beforehint_freq[summary_table$Condition == 2],
                              summary_table$notbefore_freq[summary_table$Condition == 1],
                              summary_table$notbefore_freq[summary_table$Condition == 2]))
contingency_table

     [,1] [,2]
[1,]    1    3
[2,]    2    1
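
The contingency table above is assembled by hand from the summary table. As an alternative, the same counts can be produced directly from participant-level data with `table()`. The following minimal sketch uses a hypothetical toy data frame whose values are made up to match the pilot counts; the column names follow the clean pilot file.

```r
# Hypothetical toy data with the same counts as the pilot contingency table
toy <- data.frame(
  Condition  = c(1, 1, 1, 1, 2, 2, 2),
  BeforeHint = c(1, 0, 0, 0, 1, 1, 0)
)

# Cross-tabulate condition by solution before the hint;
# fixing the factor levels keeps "solved" in the first column
contingency_alt <- table(
  Condition = toy$Condition,
  Solved    = factor(toy$BeforeHint, levels = c(1, 0),
                     labels = c("solved", "not solved"))
)
contingency_alt
```

Labelled rows and columns make it harder to mix up the cell order when the table is passed to a test.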

# Clean table for the graph, with conditions on the x-axis and percentages on the y-axis
summary_table.clean <- summary_table %>% 
  select(-beforehint_freq, -after_freq, -notbefore_freq, -notbefore_perc) 
summary_table.clean

# A tibble: 2 x 3
  Condition beforehint_perc after_perc
      <int>           <dbl>      <dbl>
1         1              25         25
2         2              67          0

# change data to a long format
data.long <- gather(summary_table.clean, Factor, Percentage, beforehint_perc, after_perc) %>%
  mutate(Condition = factor(Condition, levels = 1:2, labels = c("With Principle", "Without Principle")))
data.long

# A tibble: 4 x 3
  Condition         Factor          Percentage
  <fct>             <chr>                <dbl>
1 With Principle    beforehint_perc         25
2 Without Principle beforehint_perc         67
3 With Principle    after_perc              25
4 Without Principle after_perc               0

Confirmatory analysis

As indicated in the analysis plan, below are the chi-square test and the maximum likelihood chi-square (G-test) for the contingency table.

c2.contingency.table <- chisq.test(contingency_table)

Warning in chisq.test(contingency_table): Chi-squared approximation may be
incorrect

DescTools::GTest(contingency_table)


    Log likelihood ratio (G-test) test of independence without
    correction

data:  contingency_table
G = 1.2429, X-squared df = 1, p-value = 0.2649
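
As a cross-check on the G statistic, the same quantity can be computed in base R. This is a minimal sketch of the uncorrected likelihood-ratio test, not the `DescTools` implementation, and `g_test` is a hypothetical helper name.

```r
# Uncorrected G-test (likelihood-ratio test) of independence for a count matrix
g_test <- function(observed) {
  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  # Cells with zero observed count contribute nothing to the sum
  nonzero <- observed > 0
  g <- 2 * sum(observed[nonzero] * log(observed[nonzero] / expected[nonzero]))
  df <- (nrow(observed) - 1) * (ncol(observed) - 1)
  list(statistic = g, df = df,
       p.value = pchisq(g, df, lower.tail = FALSE))
}

# Pilot contingency table from above
pilot <- matrix(c(1, 2, 3, 1), nrow = 2)
g_test(pilot)

# Frequency table from the original article, as used in the power analysis
original <- matrix(c(45, 28, 28, 42), nrow = 2)
g_test(original)
```

Both results should match the `DescTools::GTest()` output shown in this report to the printed precision.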

There is no graph in the original paper. However, I have decided to produce the following graph to show the percentage of participants in each condition who solved the “Radiation Problem” before and after the hint.

# Stacked percent plot of the conditions and the percentages in each of the radiation problem passes
grafica <- ggplot(data.long, aes(Condition, Percentage, fill = Factor)) +
  geom_bar(stat = "identity", width = 0.5) +
  # stack the labels like the bars so each one sits inside its own segment
  geom_text(aes(label = Percentage), position = position_stack(vjust = 0.5),
            color = "white", size = 5) +
  theme_minimal() +
  scale_fill_brewer(labels = c("After Hint", "Before Hint"), palette = "Paired")
grafica

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

When contacted on December 2nd, 2018, the second author replied as follows:

“Hi Greses – the paradigm looks good to me. Maybe tweak the wording here:‘Please propose a solution to “The Radiation” problem that is suggested by one or both of the prior stories, “The General” and “The Fire Chief”. (If you previously proposed a solution suggested by the stories, you can simply repeat it.)’

You’ll need to score schema quality; details in original paper I think. There have been various conceptual replications (with different materials), e.g., studies by Jeff Loewenstein & Dedre Gentner on teaching negotiation strategies. Maybe check a review by Goldwater & Schalk (2016) in Psych Bulletin. Of course MTurk has its issues – hard to know whether people will pay attention. But scoring schema quality will give evidence as to whether or not Turkers were using their brains ☺

Good luck!

—Keith”

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.

Comments from MTurk Participants in Pilot B

Of the MTurk participants in Pilot B (n = 7), two did not provide feedback, one complained about lacking time, and the rest (n = 4) provided positive feedback. The participant who complained stated: “Not enough time given to do this without stressing out”. The positive feedback included: “Thank you for giving me the opportunity to participate in this study”, “awesome survey”, “All was good, thanks”, and “good”.

In terms of time that participants took to answer the questionnaire, the minimum time was about 7 minutes and the maximum time was about 18 minutes, with an average time of 11 minutes.

Pérez, Greses (greses@stanford.edu)

December 05, 2018