Introduction

Tworek & Cimpian 2016 investigates the is-ought fallacy by exploring how people tend to explain what is common (e.g., roses are commonly gifted on Valentine’s Day) in terms of intrinsic properties (e.g., roses are beautiful), which in turn promotes normative conclusions about what is good (e.g., it is good to give roses). The paper is relevant to my interest in how we represent categories and how those representations develop, including how we represent the descriptive and normative dimensions of categories, and the extent to which those dimensions are separate or interact. Tworek & Cimpian 2016 addresses this topic by investigating how our normative beliefs about categories can emerge from descriptive properties of those categories. By replicating this paper, I hope both to learn more about a topic relevant to my research area and to develop skills in running the kinds of studies used in that research.

Tworek & Cimpian Experiment 1 shows that individuals’ preference for intrinsic explanations of common phenomena correlates with their endorsement of normative judgments about common phenomena.

Adults on Mechanical Turk were administered, in random order, a task that assessed their preference for inherent explanations of common phenomena (inherence bias measure) and a task that assessed their endorsement of normative judgments about common phenomena (ought measure). The inherence bias task consisted of rating agreement with 15 sentences that proposed inherent explanations for common phenomena (e.g., “Black is associated with funerals because of something about the color black or about funerals - maybe because the darkness of black conveys how people feel at funerals”). The ought task consisted of reading 6 passages written like press releases, each describing a common phenomenon (e.g., the popularity of eating pizza among Americans), and then, for each passage, answering an ought question about that phenomenon phrased with either “should” or “good” (e.g., “Do you think it should be that so many Americans eat pizza?”), along with 4 filler questions. Finally, participants provided demographic information and were debriefed. The critical analysis showed that participants’ inherence bias measure was positively correlated with their ought measure. In other words, participants’ preference for inherent explanations of common phenomena correlated with their likelihood of endorsing normative ought judgments about those phenomena.

The study will be easy to administer and run, since it involves surveys administered on MTurk.

The repository for the replication, the preregistration for the replication, and the original paper can be found online.

Methods

Power Analysis

The original effect size was r=0.30. For 80% power, this effect size requires n=82, as calculated using an a priori two-tailed power analysis in G*Power.

This sample size is smaller than the original sample size of n=122, which achieved a post-hoc power of 93%.
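
As a cross-check, a comparable calculation can be approximated in R with the pwr package. This is a minimal sketch, not the G*Power computation itself: pwr.r.test relies on a Fisher-z approximation and may return a required n a few participants larger than the figure reported above.

# Approximate a priori power cross-check for r = .30, 80% power, two-tailed
library(pwr)
pwr.r.test(r = 0.30, power = 0.80, sig.level = 0.05, alternative = "two.sided")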

Planned Sample

Planned sample size is 82 adults recruited on Amazon Mechanical Turk. Participants will be required to be located within the United States, and will be required to have a HIT acceptance rate of 80% or above.

Materials

The following materials were used:

Ought inferences. Participants read six passages that were structured like and derived from actual press releases. The passages described a typical societal practice (i.e., what is). For example, one was titled “America’s Pizza Obsession: By The Numbers” and read as follows:

The quintessential American food may be apple pie, but its popularity pales beside our national love affair with pizza pies. The Daily reports that Americans consume a staggering 100 acres of pizza a day, according to data from the National Association of Pizza Operators (NAPO). Over $38 billion of pizza is sold in America annually, according to Pizza Today, and 3 billion pizzas are sold in the U.S. each year according to NAPO. 350 slices of pizza are sold every second, according to NAPO, and the average American eats an average of 46 slices of pizza a year, according to Packaged Facts. Overall, a total of 94% of Americans eat pizza (adapted from “America’s pizza obsession: By the numbers,” 2011).

After reading each press release, participants were asked five questions: one ought question (e.g., “Do you think it should be that so many Americans eat pizza?” 1 = definitely no, 9 = definitely yes) and four filler questions that served to camouflage the main focus of the study (e.g., “Do you think the amount of pizza sold will grow in the next 5 years?” “What do you think accounts for the current prices of pizza?”). For three of the press releases, the ought question was phrased with “should” (see the example in the first sentence of this paragraph), and for the other three, the ought questions were phrased with “good” - for example, “Do you think that it’s good that so many Americans drive to work?” (1 = really not good, 9 = really good), which was presented after a passage claiming that 88% of Americans drive to work. Participants’ average scores for the “good” and “should” questions were significantly correlated, r(122) = .37, 95% confidence interval (CI) = [.20, .51], p < .001.

Note that the press releases were purposely about behaviors that fall outside the scope of most existing accounts of sociomoral reasoning (eating pizza, driving to work, drinking coffee, owning a TV, using e-mail, and watching football) so that our results would highlight the unique contribution of our account. All passages were factual in tone, without evaluative language, to avoid influencing participants’ normative judgments (for the full text of the passages, see the Supplemental Material at Open Science Framework, https://osf.io/4kanr/).

Responses to the six ought questions were averaged into a composite score, which we refer to as the ought measure (α = .58). The lowest correlation between a particular question and the average of all six questions (i.e., the lowest item-total correlation) was .33. (Note that the results remained the same when excluding the item with the lowest item-total correlation.) The ought measure served as our main dependent variable.

Inherence bias. Fifteen items were used to assess the extent to which participants preferred explanations in terms of inherent facts (e.g., “Black is associated with funerals because of something about the color black or about funerals - maybe because the darkness of black conveys how people feel at funerals”; α = .85; lowest item-total correlation = .47; see Table 1 for other sample items). All items were rated using a 9-point scale (1 = disagree strongly, 9 = agree strongly) and were presented in random order. Note that, as with the ought measure, the items in the measure of inherence bias were worded factually and did not contain evaluative language. Two catch items were included to detect inattention (e.g., “Please click on the number three below to indicate that you are paying attention”). Participants who missed either of these attention checks were excluded (n = 7).

Control measures. Four control measures were administered to investigate alternative explanations for the predicted relationship between participants’ explanations and their ought inferences. These measures tapped into dimensions that could influence both variables of interest, giving rise to a correlation between them in the absence of a causal relationship. First, we measured participants’ level of education using a scale from 1, less than high school, to 6, doctoral (Ph.D., J.D., M.D.). Second, we measured their fluid intelligence with one 12-item set of Raven’s Progressive Matrices (Raven, 1960; see also Salomon & Cimpian, 2014). Third, we measured participants’ political views: “How would you describe your political attitudes?” (1 = strongly liberal, 9 = strongly conservative). (Because higher scores on this measure indicate more conservative attitudes, we occasionally refer to it as a measure of conservatism.)

Fourth, a measure related to the measure of political views assessed participants’ belief in a just world: for example, “Basically, the world is a just place” (1 = disagree strongly, 9 = agree strongly; Rubin & Peplau, 1975). Table S1 in the Supplemental Material (at Open Science Framework, https://osf.io/4kanr/) provides descriptive statistics for these measures.

Two control measures - the belief-in-a-just-world scale and Raven’s Progressive Matrices - were omitted to limit study duration and cost, and because the original study found a significant correlation between the inherence bias measure and the ought measure even without controlling for any of the four control measures.
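
Although not part of the confirmatory analysis, the reliability statistics quoted above (Cronbach’s α and item-total correlations) could be recomputed for the replication data. A minimal sketch using the psych package is below; `ih_items` is an assumed name for a data frame holding the 15 numeric inherence-bias responses (IH_1 through IH_15).

# Sketch: scale reliability for the replication's inherence bias items
library(psych)
ih_alpha <- psych::alpha(ih_items)
ih_alpha$total$raw_alpha    # Cronbach's alpha for the 15-item scale
ih_alpha$item.stats$r.drop  # corrected item-total correlations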

Procedure

The following procedure was followed, with exceptions noted below:

Participants were tested online via Qualtrics (Qualtrics Labs Inc., Provo, UT). The ought measure, the measure of inherence bias, the belief-in-a-just-world scale, and Raven’s Progressive Matrices were presented in random order. Item order was randomized for all scales except Raven’s Progressive Matrices, which were presented in increasing order of difficulty. The measures of participants’ education and conservatism were administered at the end of the sessions, along with other demographic questions. Finally, participants were debriefed.

The ought measure and the inherence bias measure were presented in random order. As noted under Materials, two control measures - the belief-in-a-just-world scale and fluid intelligence (a 12-item set of Raven’s Progressive Matrices) - were omitted to limit study duration and cost, and because the original study found a significant correlation between the inherence bias measure and the ought measure even without controlling for any of the four control measures. In the demographics section, participants were asked for their age instead of their date of birth out of concern about identifiable information.

The survey, adapted from the original paradigm, can be found on Qualtrics.

Analysis Plan

Exclusion criteria

Participants will be excluded if they do not provide the instructed answer to either of the two catch items embedded in the inherence heuristic scale, or if they indicate during debriefing that they had not paid attention.

Analysis of interest

A Pearson correlation will be computed between subjects’ inherence bias measure and ought measure. The correlation is expected to be positive and statistically significant (p < .05).
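
As a minimal sketch of the planned test (the full pipeline appears in the Results section), assuming `ih` and `ought` are vectors holding each subject’s composite scores:

# Planned confirmatory test: Pearson correlation between the two composites
cor.test(ih, ought, method = "pearson", alternative = "two.sided")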

Differences from Original Study

Two control measures, fluid intelligence (Raven’s Progressive Matrices) and belief in a just world, were omitted to limit study duration and cost. The final analysis in this replication computes the Pearson correlation between subjects’ inherence bias measure and ought measure without controlling for any control measures. Given that the original study reported a significant correlation both with and without the control measures, this analysis plan is also expected to find a significant correlation without them. If anything, omitting the control measures should add noise to the estimate rather than eliminate the effect, since controlling for them in the original study only sharpened the signal.

The sample size (n=82) is smaller than the original (n=122); it was chosen to achieve 80% power assuming the original effect size (r = .30).

In addition, the original study paid its Amazon Mechanical Turk participants $0.75 for completing the study. In this replication, participants will be paid $1.45, which corresponds to the current federal minimum wage ($7.25/hour) for the expected time to completion of 12 minutes ($7.25 × 12/60 ≈ $1.45). This increase in payment is not predicted to affect the result of the replication, since the increase is trivial.

Finally, in the demographics section, participants are asked for their age instead of their date of birth out of concern about identifiable information, and the race, income, and religion questions were changed from free response to multiple choice to make demographic reporting easier. These changes should have no effect on the target result, since demographics is the final set of questions and the target analysis does not involve controlling for any of these factors.

Methods Addendum (Post Data Collection)

Actual Sample

The final sample consisted of 74 participants located in the US and recruited on Amazon Mechanical Turk, which provides 75% power for an effect size of r = 0.30, as calculated using a post-hoc two-tailed power analysis in G*Power. An additional 8 participants were excluded for failing at least one catch item in the inherence heuristic task.
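
For reference, an approximate post-hoc power calculation for the achieved sample can also be run in R; a sketch with the pwr package (which uses a Fisher-z approximation) yields a figure close to the 75% reported above.

# Approximate post-hoc power for n = 74 at the original effect size
library(pwr)
pwr.r.test(n = 74, r = 0.30, sig.level = 0.05, alternative = "two.sided")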

Of the included 74 participants:

  • Gender. 34 were female and 40 were male.
  • Race. 60 were monoracial white, 6 monoracial black, 4 monoracial Asian, 1 monoracial Latino/a, and 2 multiracial.
  • Age. 22 were between 20 and 29 years of age, 30 between 30 and 39 years of age, 10 between 40 and 49 years of age, 9 between 50 and 59 years of age, and 3 60 years of age or older.
  • Income. 3 were in the $5,000 to $10,000 income bracket, 4 in the $10,000 to $15,000 bracket, 9 in the $15,000 to $25,000 bracket, 11 in the $25,000 to $35,000 bracket, 12 in the $35,000 to $50,000 bracket, 14 in the $50,000 to $65,000 bracket, 10 in the $65,000 to $80,000 bracket, 6 in the $80,000 to $100,000 bracket, and 5 in the over $100,000 bracket.
  • Religion. 17 were Protestant, 12 were Roman Catholic, 1 was Orthodox, 1 was Mormon, 4 were Jewish, 3 were Muslim, 2 were Buddhist, 14 were agnostic, 14 were atheist, 3 were nothing in particular, and 3 were other.
  • Political views. 36 were liberal (1-3 on the 9-point political views scale), 24 were moderate (4-6), and 14 were conservative (7-9).
  • Education. 13 had completed high school or a GED, 31 had completed some college, 23 had completed a bachelor’s degree, 6 had completed a master’s degree, and 1 had completed a doctoral degree.

Differences from pre-data collection methods plan

The final sample size (n=74) was smaller than the planned sample size (n=82) due to planned exclusions.

Results

Data preparation

Data preparation followed the analysis plan.

require("knitr")
## Loading required package: knitr
### Data Preparation


#### Load Relevant Libraries and Functions
library(tidyverse)
## -- Attaching packages -------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.8
## v tidyr   0.8.2     v stringr 1.3.1
## v readr   1.2.1     v forcats 0.3.0
## -- Conflicts ----------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
#### Import data for analysis
data <- read_csv("../data/Exp1_finalData.csv")
## Parsed with column specification:
## cols(
##   .default = col_character()
## )
## See spec(...) for full column specifications.
#### Initial data formatting
# Filter out those who didn't consent
data <- data %>%
  filter(consent == "I agree")


# Select relevant columns
data <- data %>%
  dplyr::select(IH_1:IH_C2, # inherence heuristic scale
         O_1, O_2, O_3, O_4, O_5, O_6, # ought inferences
         attn, # attention check
         debrief_1:debrief_4, # debriefing
         gender, age, education, income, religion, race, conservatism, English) #demographics

# Add subject ID
data <- mutate(data, subject = row_number())

# Remove all strings in task response columns and conservatism question (mainly Likert scale labels)
data <- data %>% 
  mutate_at(vars(IH_1:O_6, conservatism), ~gsub("([a-z]|[A-Z]|\\s)*", "", .)) %>% 
  mutate_at(vars(IH_1:O_6, conservatism), as.numeric)



#### Data exclusion / filtering

# replace NAs in debriefing questions with text " " to prep for str_detect, which can't handle NAs
data$debrief_1 <- str_replace_na(data$debrief_1, " ")
data$debrief_2 <- str_replace_na(data$debrief_2, " ")
data$debrief_3 <- str_replace_na(data$debrief_3, " ")
data$debrief_4 <- str_replace_na(data$debrief_4, " ")

# record exclusions
data_excl <- tibble(
  attn = sum(data$attn != "Yes" | is.na(data$attn)), 
  IH_C = sum(data$IH_C1 != 3 | is.na(data$IH_C1) | data$IH_C2 != 7 | is.na(data$IH_C2)),
  non_naive = sum(str_detect(data$debrief_1, "is-ought"))
  + sum(str_detect(data$debrief_2, "is-ought"))
  + sum(str_detect(data$debrief_3, "is-ought"))
  + sum(str_detect(data$debrief_4, "is-ought")))
data_excl
## # A tibble: 1 x 3
##    attn  IH_C non_naive
##   <int> <int>     <int>
## 1     0     8         0
# Excluding subjects
data <- data %>%
  filter(attn == "Yes" & # Exclude those who didn't pay attention
        IH_C1 == 3 & IH_C2 == 7 & # Exclude those who didn't answer correctly on control questions on the inherence heuristic scale
        !str_detect(debrief_1, "is-ought") & 
        !str_detect(debrief_2, "is-ought") & 
        !str_detect(debrief_3, "is-ought") & 
        !str_detect(debrief_4, "is-ought")) %>%  # Exclude those who mentioned "is-ought" in debriefing
  select(-attn, -(IH_C1:IH_C2), -(debrief_1:debrief_4)) # Delete attn and inherence heuristic scale control columns


#### Demographics analysis
# gender
data_demographics_gender <- data %>%
  count(gender)
data_demographics_gender
## # A tibble: 2 x 2
##   gender     n
##   <chr>  <int>
## 1 Female    34
## 2 Male      40
# age
data_demographics_age <- data %>%
  count(age)
data_demographics_age
## # A tibble: 32 x 2
##    age       n
##    <chr> <int>
##  1 20        1
##  2 24        2
##  3 25        2
##  4 26        4
##  5 27        5
##  6 28        6
##  7 29        2
##  8 30        7
##  9 31        1
## 10 32        4
## # ... with 22 more rows
# education
data_demographics_education <- data %>%
  count(education)
data_demographics_education
## # A tibble: 5 x 2
##   education                        n
##   <chr>                        <int>
## 1 Bachelor's (B.A., B.S.)         23
## 2 Doctoral (Ph.D., J.D., M.D.)     1
## 3 High school/GED                 13
## 4 Master's (M.A., M.S.)            6
## 5 Some college                    31
# income
data_demographics_income <- data %>%
  count(income)
data_demographics_income
## # A tibble: 9 x 2
##   income                  n
##   <chr>               <int>
## 1 $10,000 to $15,000      4
## 2 $15,000 to $25,000      9
## 3 $25,000 to $35,000     11
## 4 $35,000 to $50,000     12
## 5 $5,000 to $10,000       3
## 6 $50,000 to $65,000     14
## 7 $65,000 to $80,000     10
## 8 $80,000 to $100,000     6
## 9 Over $100,000           5
# religion
data_demographics_religion <- data %>%
  count(religion)
data_demographics_religion
## # A tibble: 11 x 2
##    religion                                       n
##    <chr>                                      <int>
##  1 Agnostic                                      14
##  2 Atheist                                       14
##  3 Buddhist                                       2
##  4 Jewish                                         4
##  5 Mormon                                         1
##  6 Muslim                                         3
##  7 Nothing in particular                          3
##  8 Orthodox such as Greek or Russian Orthodox     1
##  9 Other                                          3
## 10 Protestant                                    17
## 11 Roman Catholic                                12
# race
data_demographics_race <- data %>%
  count(race)
data_demographics_race
## # A tibble: 7 x 2
##   race                                                                   n
##   <chr>                                                              <int>
## 1 Asian/Asian American                                                   4
## 2 Black/African American                                                 6
## 3 Latino/Latina                                                          1
## 4 White/European American                                               60
## 5 White/European American,Asian/Asian American                           1
## 6 White/European American,Black/African American,Native American/Pa~     1
## 7 White/European American,Latino/Latina                                  1
# conservatism
data_demographics_conservatism <- data %>%
  count(conservatism)
data_demographics_conservatism
## # A tibble: 9 x 2
##   conservatism     n
##          <dbl> <int>
## 1            1    13
## 2            2    17
## 3            3     6
## 4            4     5
## 5            5    15
## 6            6     4
## 7            7     3
## 8            8     6
## 9            9     5
#### Prepare data for analysis - create columns etc.
# gather to tidy long form
data_tidy <- data %>% 
  gather(question, response, IH_1:O_6)

# classify items as inherence heuristic questions vs ought questions
data_tidy <- data_tidy %>% 
  separate(question, c("question_type", "question_number"), "_")



# summarize inherence bias average and ought measure average across subjects
data_means <- data_tidy %>%
  group_by(question_type) %>% 
  summarize(avg=mean(response, na.rm=TRUE), sd=sd(response, na.rm=TRUE), n())
data_means
## # A tibble: 2 x 4
##   question_type   avg    sd `n()`
##   <chr>         <dbl> <dbl> <int>
## 1 IH             6.01  2.31  1110
## 2 O              5.73  2.23   444
# summarize inherence bias and ought measure per subject
data_means_subj <- data_tidy %>% 
  group_by(subject, question_type) %>%  
  summarize(avg=mean(response, na.rm=TRUE)) %>% 
  spread(question_type, avg)
data_means_subj
## # A tibble: 74 x 3
## # Groups:   subject [74]
##    subject    IH     O
##      <int> <dbl> <dbl>
##  1       1  5.93  5.67
##  2       2  5.53  7.83
##  3       3  5.13  5.5 
##  4       5  2.4   5.17
##  5       6  7.33  5.67
##  6       7  6.27  7   
##  7       8  2.27  6   
##  8       9  5.2   4.83
##  9      10  7     6.33
## 10      11  5.67  4.67
## # ... with 64 more rows

Confirmatory analysis

The analyses were conducted as specified in the analysis plan.

The original result was r(120) = .30, 95% CI = [.13, .46], p < .001.

### Analysis of replication data
# scatterplot of inherence bias measure and ought measure
p <- ggplot(data_means_subj, aes(x = IH, y = O)) + 
  geom_point() + 
  labs(x="Inherence bias measure", 
       y="Ought measure") +
  geom_smooth(method = "lm") +
  # Ought measure is 1 to 9 scale
  scale_y_continuous(limits=c(1,9), breaks=seq(1, 9, 1)) +
  # Inherence bias measure is 1 to 9 scale
  scale_x_continuous(limits=c(1,9), breaks=seq(1, 9, 1)) +
  ggtitle("Replication: inherence bias measure and ought measure correlation")


# Pearson correlation between each subject's inherence bias measure and their ought measure 
cor <- cor.test(data_means_subj$IH, data_means_subj$O, method = "pearson")
cor
## 
##  Pearson's product-moment correlation
## 
## data:  data_means_subj$IH and data_means_subj$O
## t = 1.5283, df = 72, p-value = 0.1308
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05339746  0.38996738
## sample estimates:
##       cor 
## 0.1772641
### Scatterplot of original data
# Importing original data
data_original <- read_csv("../data/Exp1_originalData.csv")
## Warning: Duplicated column names deduplicated (long list of renamed timing and instruction-text columns omitted)
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `what do you think accounts for the current prices of pizza?` = col_character(),
##   `what are your pizza consumption habits?` = col_character(),
##   `what do you think accounts for why only 6% of football viewers / watch the games live (that is, in per...` = col_character(),
##   `what are your football viewing habits?` = col_character(),
##   `what do you think accounts for why so few / americans ride their bikes to work?` = col_character(),
##   `what are your driving habits?` = col_character(),
##   `what do you think accounts for the / growing number of available devices to watch videos?` = col_character(),
##   `what are your tv viewing habits?` = col_character(),
##   `what do you / think accounts for the recent rise in the population of internet users?` = col_character(),
##   `what are your email and internet search habits?` = col_character(),
##   `what do you think accounts for the success of the single cup / brew?` = col_character(),
##   `what are your coffee drinking habits?` = col_character(),
##   relig = col_character(),
##   ethnic = col_character(),
##   `1. did you find any aspect of the procedure odd or confusing?` = col_character(),
##   `2. what did you think we were studying?` = col_character(),
##   `3. do you think that there may have been more to this study than meets the eye? if so, what do you t...` = col_character(),
##   `4. do you have any additional thoughts or comments about the study?` = col_character()
## )
## See spec(...) for full column specifications.
# Add subject ID
data_original <- mutate(data_original, subject = row_number())

# Filter out excluded subjects
data_original <- data_original %>% 
  filter(excluded == 0)
## Warning: Mangling the following names (mis-encoded instruction-text column names; full warning omitted)
# Select relevant columns
data_original <- data_original %>%
  select(inhav, # inherence heuristic scale
         oughtav, # ought inferences
         subject) # subject ID

# Rename columns to be same as replication data
data_original <- data_original %>% 
  rename(IH = inhav,
         O = oughtav)

# scatterplot of inherence bias measure and ought measure
p_original <- ggplot(data_original, aes(x = IH, y = O)) + 
  geom_point() + 
  labs(x="Inherence bias measure", 
       y="Ought measure") +
  geom_smooth(method = "lm") +
  # Ought measure is 1 to 9 scale
  scale_y_continuous(limits=c(1,9), breaks=seq(1, 9, 1)) +
  # Inherence bias measure is 1 to 9 scale
  scale_x_continuous(limits=c(1,9), breaks=seq(1, 9, 1)) +
  ggtitle("Original: inherence bias measure and ought measure correlation")


# Pearson correlation between each subject's inherence bias measure and their ought measure 
cor_original <- cor.test(data_original$IH, data_original$O, method = "pearson")
cor_original
## 
##  Pearson's product-moment correlation
## 
## data:  data_original$IH and data_original$O
## t = 3.5013, df = 120, p-value = 0.0006509
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1339405 0.4574561
## sample estimates:
##       cor 
## 0.3044526
### Comparing replication data with original data
# difference between this correlation and original study's correlation (r(120) = .30, 95% CI = [.13, .46], p < .001)
diff_cor <- cor$estimate - cor_original$estimate
diff_cor
##        cor 
## -0.1271884
gridExtra::grid.arrange(p_original, p, nrow = 1)

The result in this replication was r(72) = 0.18, 95% CI = [-0.05, 0.39], p = .13.

The original result was r(120) = .30, 95% CI = [.13, .46], p < .001.
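
As a supplementary (non-preregistered) check, the replication and original correlations can be compared directly with a Fisher r-to-z test for independent samples. This is a sketch using the rounded values reported above; for these values the difference does not approach significance.

# Fisher r-to-z comparison of two independent correlations
r_rep <- 0.18;  n_rep <- 74    # replication
r_orig <- 0.30; n_orig <- 122  # original
z <- (atanh(r_rep) - atanh(r_orig)) / sqrt(1 / (n_rep - 3) + 1 / (n_orig - 3))
2 * pnorm(-abs(z))  # two-tailed p-value; well above .05 for these values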

Exploratory analyses

# SPREAD: more spread in replication data than original data?
# replication: inherence bias measure and ought meaure across subjects
data_means
## # A tibble: 2 x 4
##   question_type   avg    sd `n()`
##   <chr>         <dbl> <dbl> <int>
## 1 IH             6.01  2.31  1110
## 2 O              5.73  2.23   444
# original: spread in inherence bias measure and ought measure across subjects
data_original_spread <- data_original %>%
  summarize_at(vars(IH, O), sd)
data_original_spread
## # A tibble: 1 x 2
##      IH     O
##   <dbl> <dbl>
## 1  1.15  1.12

The data from the replication attempt appear to exhibit more spread than the original data. Note, however, that the replication SDs above (inherence measure SD = 2.31, ought measure SD = 2.23) are computed over individual item responses, whereas the original SDs (inherence measure SD = 1.15, ought measure SD = 1.12) are computed over subject-level composite scores, so the two sets of values are not directly comparable.
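
A more directly comparable figure for the replication would be the SD of the subject-level composites; a minimal sketch using the existing data_means_subj object:

# Subject-level SDs of the replication composites (comparable to the original's)
data_means_subj %>%
  ungroup() %>%
  summarize_at(vars(IH, O), sd, na.rm = TRUE)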

# ITEM DIFFERENCES WITHIN SCALES IN REPLICATION?
# create function to estimate confidence interval
sem <- function(x) {sd(x, na.rm=TRUE) / sqrt(sum(!is.na((x))))}
ci <- function(x) {sem(x) * 1.96} # reasonable approximation 

# inherence bias item and ought measure item averages
data_tidy$question_number <- as.numeric(data_tidy$question_number)

data_IH_items <- data_tidy %>% 
  filter(question_type == "IH") %>% 
  group_by(question_number) %>% 
  summarize(avg=mean(response, na.rm=TRUE), ci=ci(response), n())
data_IH_items
## # A tibble: 15 x 4
##    question_number   avg    ci `n()`
##              <dbl> <dbl> <dbl> <int>
##  1               1  7.47 0.364    74
##  2               2  6.61 0.444    74
##  3               3  5.38 0.568    74
##  4               4  4.35 0.550    74
##  5               5  5.99 0.475    74
##  6               6  5.39 0.529    74
##  7               7  4.73 0.565    74
##  8               8  6.43 0.524    74
##  9               9  6.76 0.444    74
## 10              10  6.65 0.419    74
## 11              11  5.45 0.576    74
## 12              12  6.19 0.497    74
## 13              13  5.24 0.540    74
## 14              14  6.80 0.411    74
## 15              15  6.68 0.408    74
data_O_items <- data_tidy %>% 
  filter(question_type == "O") %>% 
  group_by(question_number) %>% 
  summarize(avg=mean(response, na.rm=TRUE), ci=ci(response), n())
data_O_items
## # A tibble: 6 x 4
##   question_number   avg    ci `n()`
##             <dbl> <dbl> <dbl> <int>
## 1               1  5.58 0.513    74
## 2               2  5.46 0.453    74
## 3               3  4    0.441    74
## 4               4  6.08 0.487    74
## 5               5  7.69 0.381    74
## 6               6  5.57 0.379    74
# point plots of item averages with approximate 95% CIs
ggplot(data_IH_items, aes(x = question_number, y = avg)) +
  geom_point() +
  geom_errorbar(aes(ymin = avg - ci, ymax = avg + ci), 
                width=.2) + 
  labs(title = "Average rating per inherence heuristic item", x = "Item", y = "Average rating (1 [disagree strongly] to 9 [agree strongly])") +
  scale_y_continuous(limits = c(1,9), breaks = 1:9) +
  scale_x_continuous(breaks = 1:15) +
  theme(panel.grid.minor = element_blank())

ggplot(data_O_items, aes(x = question_number, y = avg)) +
  geom_point() +
  geom_errorbar(aes(ymin = avg - ci, ymax = avg + ci), 
                width=.2) + 
  labs(title = "Average rating per ought measure item", x = "Item", y = "Average rating (1 [really not good] to 9 [really good])") +
  scale_y_continuous(limits = c(1,9), breaks = 1:9) +
  scale_x_continuous(breaks = 1:6) +
  theme(panel.grid.minor = element_blank())

Within the replication data, there appears to be some variation across items on both the inherence bias measure and the ought measure.

Awareness of negative consequences and centrality to one’s own life are likely factors in how people rated the ought items. For example, the lowest-rated ought item was item 3, about driving to work, likely because most people are aware of the harmful environmental consequences of driving. The highest-rated ought item was item 5, about e-mail use, likely because people are not aware of notable negative consequences and e-mail is usually central to people’s work and lives.

Awareness of variation and the cultural familiarity of the proposed inherent explanation likely influenced how people rated the inherence items. The lowest-rated inherence item was item 4, which explained the use of green in currency by claiming that green intrinsically signals trust. This explanation was perhaps rated most dubious because people are aware that currency color varies across countries, and because green does not culturally signal trust in the US. The highest-rated inherence item was item 1, which explained the use of red in traffic lights by claiming that red intrinsically signals a warning. This explanation was perhaps most appealing because people are not aware of variation in traffic-light colors, and because red does symbolize a warning in many contexts, including traffic lights.

# CORRELATION ACROSS POLITICAL VIEWS, OR EDUCATION?
# summarize inherence bias and ought measure per subject, keeping political view and edu
data_tidy$education <- ordered(data_tidy$education, levels = c("High school/GED", "Some college", "Bachelor's (B.A., B.S.)", "Master's (M.A., M.S.)", "Doctoral (Ph.D., J.D., M.D.)"))

data_means_subj_conservatism_edu <- data_tidy %>% 
  group_by(subject, question_type, conservatism, education) %>%  
  summarize(avg=mean(response, na.rm=TRUE)) %>% 
  spread(question_type, avg)

# scatterplot of inherence bias measure and ought measure, by political view
ggplot(data_means_subj_conservatism_edu, aes(x = IH, y = O, color = conservatism)) + 
  geom_point() + 
  labs(x="Inherence bias measure", 
       y="Ought measure") +
  geom_smooth(method = "lm") +
  # Ought measure is 1 to 9 scale
  scale_y_continuous(limits=c(1,9), breaks=seq(1, 9, 1)) +
  # Inherence bias measure is 1 to 9 scale
  scale_x_continuous(limits=c(1,9), breaks=seq(1, 9, 1)) +
  ggtitle("Replication: inherence bias measure and ought measure correlation by political view")

# scatterplot of inherence bias measure and ought measure, by education
ggplot(data_means_subj_conservatism_edu, aes(x = IH, y = O, color = education)) + 
  geom_point() + 
  labs(x="Inherence bias measure", 
       y="Ought measure") +
  geom_smooth(method = "lm", se = FALSE) +
  # Ought measure is 1 to 9 scale
  scale_y_continuous(limits=c(1,9), breaks=seq(1, 9, 1)) +
  # Inherence bias measure is 1 to 9 scale
  scale_x_continuous(limits=c(1,9), breaks=seq(1, 9, 1)) +
  ggtitle("Replication: inherence bias measure and ought measure correlation by education")

An analysis of the correlation between participants’ inherence bias measure and ought measure by political views and education did not reveal anything of note.
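
To complement the visual check, the relationship could also be estimated while adjusting for the measured covariates (exploratory, not preregistered). This is a sketch using the existing data_means_subj_conservatism_edu data frame; education enters as an ordered factor.

# Exploratory: IH-O relationship adjusting for conservatism and education
covariate_model <- lm(O ~ IH + conservatism + education,
                      data = ungroup(data_means_subj_conservatism_edu))
summary(covariate_model)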

Discussion

Summary of Replication Attempt

The result of this replication attempt partially replicated the original result of Tworek & Cimpian 2016, Experiment 1. The replication found a positive but statistically non-significant correlation between participants’ inherence bias measure and ought measure: r(72) = 0.18, 95% CI = [-0.05, 0.39], p = .13. The original result was r(120) = 0.30, 95% CI = [0.13, 0.46], p < .001. The replication’s Pearson correlation, although positive in direction, is 0.13 smaller than the original correlation, and unlike the original it did not reach statistical significance.

Commentary

Follow-up exploratory analysis suggested that the replication data exhibited more spread than the original data (subject to the item-level versus subject-level caveat noted above). Additionally, there was some inter-item variation on the inherence bias measure, perhaps reflecting people’s varying awareness of cultural variation in the phenomena and the cultural familiarity of the proposed inherent explanations, as well as on the ought measure, perhaps reflecting people’s varying judgments of an activity’s negative consequences and its centrality to their own lives. Lastly, an analysis of the target correlation by political views and education did not reveal anything of note.

It is unlikely that changes to the study paradigm itself (all minor, detailed in the Methods section) affected the results of this replication.

It is possible that the small sample size influenced the results of the replication. The final sample size after planned exclusions (n=74) was slightly smaller than the sample size needed to achieve 80% power given the original effect size (n=82), and also smaller than the original sample size (n=122). It is also possible that the true effect size is smaller than the effect size reported by Tworek & Cimpian 2016, in which case this replication’s sample would have been underpowered to detect it.