I am replicating a recent study by Peer et al. (March 2021), which compared the quality of data gathered from the most popular sites used for online behavioral research: Amazon MTurk, CloudResearch (formerly TurkPrime), and Prolific. Peer et al. (2021) found “higher data quality for Prolific and CloudResearch compared to MTurk (the differences between Prolific and MTurk and between CR and MTurk were significant (p < .001), but the difference between CR and Prolific was not (p = 0.91)).”
In response, a team of investigators from CloudResearch, Litman et al. (2021, SSRN), attempted to replicate Peer et al.’s (2021) findings and found “undisclosed methodological decisions that would limit the inferences made in [the original publication]” (Litman et al., 2021). The team from CloudResearch was specifically concerned that Peer et al. (2021) “chose to turn off the recommended data quality filters and reputation qualifications, including filters that are on by default and were designed to address known data quality issues on MTurk.” Contrary to the findings of Peer et al. (2021), after implementing what they describe as the recommended and default filters on MTurk, the team at CloudResearch (Litman et al., 2021) found CloudResearch’s data quality to be superior to that of Prolific.
I am curious to examine the data quality filters that were used in each of these studies and to investigate these sites’ data quality (as originally operationalized by Peer et al., 2021) as a relatively objective third-party investigator.
The only challenge I anticipate at this stage is tracking down the specific automated (default) and manually set data quality filters that each team of researchers used in their data collection process.
Project repository: https://github.com/psych251/saadatian2021.git
Survey preview: https://ucbpsych.qualtrics.com/jfe/preview/SV_cSKr5voz0akZ7P8?Q_CHL=preview&Q_SurveyVersionID=current
Project OSF: https://osf.io/qw4sf/?view_only=49333769f6464316a805d48e8cd1d5ca
Original paper: https://github.com/psych251/saadatian2021/blob/3c2eeb3b8a352c51dea439f7878b2ab95f67acff/original-paper/Peer%20et%20al.%20(2021).pdf
Original study’s OSF: https://osf.io/342dp/
Here I aim to replicate the main finding in Study 2, where Peer et al. (2021) compare data quality across the three sites and find “statistically significant differences between the sites on [overall data quality score], F(2, 1458) = 129.4, p < .001, which showed higher scores for Prolific and CR (M = 5.87, 5.78, SD = 1.0, 1.1, respectively) compared to MTurk (M = 4.55, SD = 1.9)”.
To replicate an F statistic of 129.4 with a power of 0.95, I would need a sample of 43 participants minimum from each site. However, to get precise estimates of the differences in data quality across the three sites, I recruit 100 participants from each of the three platforms (MTurk, CloudResearch, and Prolific), for a total of 300 participants across the three independent samples.
One hundred participants who are 18 years or older and current residents of the United States are recruited from each of Amazon MTurk, CloudResearch, and Prolific. As in the original study, participants are paid $1.50 each for completing the study, which originally took an average of 9.8 minutes (SD = 5.2; Peer et al., 2021).
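A power analysis of this kind usually starts from an effect size, which can be recovered from the reported F statistic. The sketch below is illustrative only: it applies the standard conversion from F and its degrees of freedom to eta squared and Cohen's f, and the variable names are hypothetical. It does not reproduce the per-site sample size stated above, which depends on the power-analysis tool and settings used.

```r
# Values reported by Peer et al. (2021) for Study 2
f_stat <- 129.4
df_between <- 2
df_within <- 1458

# Standard conversion for a one-way design:
# eta^2 = (F * df_between) / (F * df_between + df_within)
eta_sq <- (f_stat * df_between) / (f_stat * df_between + df_within)

# Cohen's f = sqrt(eta^2 / (1 - eta^2))
cohens_f <- sqrt(eta_sq / (1 - eta_sq))

round(c(eta_sq = eta_sq, cohens_f = cohens_f), 3)
```

The resulting Cohen's f is the kind of input a power routine for one-way ANOVA typically expects.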
The present study implements all data quality pre-screening filters that were used in the original study (Peer et al., 2021). Peer et al. (2021) describe the data quality pre-screening and exclusion criteria as such:
I plan to implement all relevant exclusion criteria used by the original authors (numbers 1 and 2 from the list above).
*Note: since the original investigators had to post the survey twice on MTurk (once through their MTurk account and once through their CR account), some participants (N = 39) completed the survey twice. Peer et al. (2021) removed those participants’ later submissions, and I will do the same.
Unlike the original study, I do not plan to exclude participants who do not complete all of the study as I think that might significantly affect the data quality.
-> [ac1, ac2, ac3]
The original authors measure attention using one two-item explicit question and one covert question. As in the original study, “the first [attention check] asks participants to answer ‘six’ and ‘three’ to two items regardless of their actual preference (other responses are coded as failures); the second is an item within a scale worded ‘I currently don’t pay attention to the questions I’m being asked in the survey’ (response other than ‘strongly disagree’ is coded as a failure).”
-> [comprehension1 and comprehension2]
Comprehension is examined by coding participants’ written summaries of the instructions for two tasks: “The first task asks participants to identify faces in a picture, but includes an instruction to only report zero; the second includes instructions for the ‘Matrix task’ (Mazar et al., 2008).” In the original study, two independent, blind raters coded the participant responses. The investigators counted an answer as correct if both coders had coded it as correct, and used a third rater for responses with split votes. Due to time constraints in data coding and analysis, the PI, Kimia Saadatian, is the sole rater of open responses to comprehension items. Following this class replication project, Kimia plans to have the data coded by two blind raters independently.
-> [nfc1 to nfc18]
-> Not immediately relevant; will be reported later in exploratory analyses.
Similar to the original study, reliability is measured using Cronbach’s alpha for the eighteen-item, five-point Need for Cognition scale (Cacioppo et al., 1984). The original investigators chose this scale because it has been validated and found to be highly reliable across many studies.
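As a minimal sketch of what this reliability computation involves, the function below implements the textbook formula for Cronbach's alpha in base R. The function name and the toy responses are hypothetical and are not taken from the study data; with the real NFC items, any reverse-keyed items would need to be recoded before computing alpha.

```r
# Cronbach's alpha from an item-response table
# (rows = participants, columns = items); incomplete rows are dropped.
cronbach_alpha_sketch <- function(items) {
  items <- as.matrix(items[stats::complete.cases(items), , drop = FALSE])
  k <- ncol(items)                            # number of items
  item_var <- apply(items, 2, stats::var)     # variance of each item
  total_var <- stats::var(rowSums(items))     # variance of the summed scale
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Invented five-point responses from five participants on four items
toy <- data.frame(
  item1 = c(1, 2, 3, 4, 5),
  item2 = c(2, 2, 3, 4, 5),
  item3 = c(1, 3, 3, 5, 5),
  item4 = c(1, 2, 4, 4, 4)
)
alpha_toy <- cronbach_alpha_sketch(toy)
alpha_toy
```

In the actual analysis, the same idea would be applied to the nfc1 to nfc18 columns, either with a function like this one or with a package routine.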
-> [honesty1, honesty2, honesty3]
Replicating the honesty measure in the original study, honesty is measured using “an online version of the Matrix task (Mazar et al., 2008) that include(s) two unsolvable matrices. Reporting solving any of these two problems will be coded as a dishonest response. Additionally, we examine whether participants lie about their eligibility for a future study by asking them to indicate if they want to be invited to a study that samples participants of their own gender but whose age will be described as 5-10 years above the age participants reported in the beginning of the study.”
-> [ac_total + comprehension_total + honesty_total]
Overall data quality is computed as a composite of the attention, comprehension, and honesty scores.
The below measures are not immediately relevant to the main analyses reported in this class project report.
To replicate the original study as closely as possible, we also measure participants’ drop-out rates, response duration, overall response time and speed between sites, differences in responses to the NFC scale, demographics, and patterns of usage of the site (main purpose, frequency of usage, number of submissions and approval ratings, usage of other sites), and whether or not the participants have completed a similar study in the last months.
Mimicking the original survey introduction (Peer et al., 2021), the description of the current study states: “You are invited to complete a survey on individual differences in personal attitudes, opinions, and behaviors.”
I use the original .QSF file for the survey content, which begins with demographic questions, followed by the data quality measures described above.
Participants are prompted with questions related to their usage of the online platform (e.g., how often they use the site, why they use it, how much they earn on average, their percent of approved submissions (responses that participants submit and are approved by the researcher), and how often (if at all) they use other sites) before finishing the survey and receiving their online payment.
The new survey did not force responses on any of the questions, except the demographics questions on the first page (data from which is needed for measuring later items). Instead, all questions are set to “request” responses (a Qualtrics setting).
The present study uses the same analysis plan (one-way ANOVA) as the original study by Peer et al. (2021). I attempt to replicate Study 2, where the main finding is that data quality is significantly higher for data gathered from Prolific and CloudResearch (M = 5.87, 5.78, SD = 1.0, 1.1, respectively) than from MTurk (M = 4.55, SD = 1.9; F(2, 1458) = 129.4, p < .001).
In order to compute the overall data quality score, the original authors:
Relevant for present EXPLORATORY analyses:
Relevant for present CONFIRMATORY analyses:
I do not anticipate any deviations from the original analysis plan at this stage.
As of Nov 30 at 10 pm PST, the total number of recorded responses was 236. Before exporting the data file from Qualtrics, one MTurk submission was removed because the participant entered an invalid survey ID at the end. The final .CSV file included a total of 235 rows.
Unlike Peer et al. (2021), I do not exclude participants who have not responded to all of the survey, as I believe that would itself be a manipulation of data quality. Instead, I omit NAs from analyses.
*NOTE: I expect that there will be duplicated data from participants taking the survey multiple times (I am not sure how this happened). Duplicates have not yet been removed at the time of writing and presenting this report. I will search for (and later remove) duplicates by closely examining duplicate participation IDs, duplicate IP addresses, and duplicate qualitative responses to the open-ended questions. Once data collection from CloudResearch is complete, I will also remove participants who took the survey through both the CR posting and the MTurk posting of the project.
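One way the planned removal of later submissions could look is sketched below. This is a hedged illustration on invented rows, not the procedure actually applied to the data: the column names mirror the Qualtrics export (id, RecordedDate), but the values and the choice to keep the earliest submission per ID are assumptions made for the example.

```r
# Invented rows: participants A1 and B2 each submitted twice
toy <- data.frame(
  id = c("A1", "B2", "A1", "C3", "B2"),
  RecordedDate = as.POSIXct(
    c("2021-11-29 23:13", "2021-11-29 23:14", "2021-11-29 23:40",
      "2021-11-29 23:41", "2021-11-29 23:20"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Sort by submission time so that the first row for each id is the earliest one
toy_sorted <- toy[order(toy$RecordedDate), ]

# duplicated() flags the second and later occurrences of each id;
# negating it keeps only each participant's first submission
deduped <- toy_sorted[!duplicated(toy_sorted$id), ]
deduped
```

The same pattern could be repeated with IPAddress in place of id, although shared IP addresses do not necessarily indicate the same participant.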
I planned to apply the following data quality filters that were used by Peer et al. (2021):
However, on CloudResearch and Prolific, I was not able to find or create a filter that would restrict participation to those with 100 prior submissions. Other than that, I used the default filters from each site, with the exception of adding a filter on MTurk to only allow participants who have a HIT approval rating of 95% or more (premium qualifications on MTurk would incur additional fees).
Attention Checks
I scored three attention check questions, whereas the original investigators discussed and measured only two (even though all three were present in the .QSF file posted on their OSF page). I included all three in my calculation of overall data quality, which increases the possible range and maximum score for data quality.
Second Honesty Measure
Peer et al. say that 2 out of the 5 matrices used to calculate participants’ honesty were unsolvable. However, after looking at the .QSF file from their OSF page, I have come to strongly believe that there are 3 (as opposed to 2) unsolvable matrices. As such, I recoded responses to those three items so that if participants report that they have solved those matrices, their responses are coded as “0” for honesty. By including the third item in this honesty scale, I again increased the possible range and maximum score for data quality.
The composite Data Quality score ranged from 0 to 7 (M = 5.41; SD = 1; Med = 6). Overall data quality composite scores, ranked from highest to lowest, were:
1) Prolific (M = 5.87, SD = 1.0)
2) CR (M = 5.78, SD = 1.1)
3) MTurk (M = 4.55, SD = 1.9)
Peer et al. (2021) found statistically significant differences between the sites on Overall Data Quality Score, F(2, 1458) = 129.4, p < .001.
Post hoc tests with Bonferroni correction showed:
- difference between Prolific and MTurk: p < .001
- difference between CR and MTurk: p < .001
- difference between CR and Prolific: p = 0.91
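For readers who want to see the shape of this analysis, the sketch below runs a one-way ANOVA followed by Bonferroni-corrected pairwise comparisons in base R. The data are simulated: the means and SDs loosely follow the values reported above, but the group sizes and individual scores are invented, so the output will not match the original results.

```r
set.seed(1)

# Simulated composite scores for three sites (50 invented participants each)
sim <- data.frame(
  site = rep(c("Prolific", "CR", "MTurk"), each = 50),
  score = c(rnorm(50, 5.87, 1.0), rnorm(50, 5.78, 1.1), rnorm(50, 4.55, 1.9))
)

# Omnibus test: one-way ANOVA of score by site
fit <- aov(score ~ site, data = sim)
summary(fit)

# Post hoc pairwise t tests with Bonferroni-adjusted p values
posthoc <- pairwise.t.test(sim$score, sim$site, p.adjust.method = "bonferroni")
posthoc$p.value
```

The p-value matrix is lower triangular, with one adjusted p value per pair of sites.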
Data preparation following the analysis plan.
###Data Preparation
##### Starting a Script #####
# Clear environment
rm(list=ls())
# Checking working directory
getwd()
## [1] "/Users/kimiasaadatian/Desktop/ExpPsych/saadatian2021"
#### Loading in data ####
dat <- read.csv("incompletedata.csv",
na.strings="NA", strip.white=TRUE)
### Load Relevant Libraries and Functions
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
library(ggplot2)
library(ltm)
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## Loading required package: msm
## Loading required package: polycor
##### Looking at your data #####
#### Identifying problems ####
# Finding the number of duplicate cases
# WARNING: These do not include the first instance of the duplicated value.
length(dat$id) - length(unique(dat$id))
## [1] 29
# Finding rows of duplicate cases (based on the participant id column).
# WARNING: These do not include the first instance of the duplicated value.
which(duplicated(dat$id))
#### Looking at the dataset ####
colnames(dat) # list of column names in order
## [1] "StartDate"
## [2] "EndDate"
## [3] "Status"
## [4] "IPAddress"
## [5] "Progress"
## [6] "Duration..in.seconds."
## [7] "Finished"
## [8] "RecordedDate"
## [9] "ResponseId"
## [10] "RecipientLastName"
## [11] "RecipientFirstName"
## [12] "RecipientEmail"
## [13] "ExternalReference"
## [14] "LocationLatitude"
## [15] "LocationLongitude"
## [16] "DistributionChannel"
## [17] "UserLanguage"
## [18] "Q_BallotBoxStuffing"
## [19] "Q1.2_Browser"
## [20] "Q1.2_Version"
## [21] "Q1.2_Operating.System"
## [22] "Q1.2_Resolution"
## [23] "gender"
## [24] "gender_3_TEXT"
## [25] "age"
## [26] "ethnicity"
## [27] "ethnicity_5_TEXT"
## [28] "education"
## [29] "education_9_TEXT"
## [30] "income"
## [31] "country"
## [32] "ac1"
## [33] "ac2"
## [34] "nfc1"
## [35] "nfc2"
## [36] "nfc3"
## [37] "nfc4"
## [38] "nfc5"
## [39] "nfc6"
## [40] "nfc7"
## [41] "nfc8"
## [42] "nfc9"
## [43] "nfc10"
## [44] "nfc11"
## [45] "ac3"
## [46] "nfc12"
## [47] "nfc13"
## [48] "nfc14"
## [49] "nfc15"
## [50] "nfc16"
## [51] "nfc17"
## [52] "nfc18"
## [53] "nfctime_First.Click"
## [54] "nfctime_Last.Click"
## [55] "nfctime_Page.Submit"
## [56] "nfctime_Click.Count"
## [57] "comprehension1"
## [58] "comprehensiontime1_First.Click"
## [59] "comprehensiontime1_Last.Click"
## [60] "comprehensiontime1_Page.Submit"
## [61] "comprehensiontime1_Click.Count"
## [62] "comprehensiontime2_First.Click"
## [63] "comprehensiontime2_Last.Click"
## [64] "comprehensiontime2_Page.Submit"
## [65] "comprehensiontime2_Click.Count"
## [66] "accomprehension"
## [67] "comprehension2"
## [68] "X1_matrixtime_First.Click"
## [69] "X1_matrixtime_Last.Click"
## [70] "X1_matrixtime_Page.Submit"
## [71] "X1_matrixtime_Click.Count"
## [72] "X1_honesty1"
## [73] "X2_matrixtime_First.Click"
## [74] "X2_matrixtime_Last.Click"
## [75] "X2_matrixtime_Page.Submit"
## [76] "X2_matrixtime_Click.Count"
## [77] "X2_honesty1"
## [78] "X3_matrixtime_First.Click"
## [79] "X3_matrixtime_Last.Click"
## [80] "X3_matrixtime_Page.Submit"
## [81] "X3_matrixtime_Click.Count"
## [82] "X3_honesty1"
## [83] "X4_matrixtime_First.Click"
## [84] "X4_matrixtime_Last.Click"
## [85] "X4_matrixtime_Page.Submit"
## [86] "X4_matrixtime_Click.Count"
## [87] "X4_honesty1"
## [88] "X5_matrixtime_First.Click"
## [89] "X5_matrixtime_Last.Click"
## [90] "X5_matrixtime_Page.Submit"
## [91] "X5_matrixtime_Click.Count"
## [92] "X5_honesty1"
## [93] "usage1"
## [94] "usage2"
## [95] "usage2_4_TEXT"
## [96] "usage3"
## [97] "usage4"
## [98] "Q9.5_8"
## [99] "Q9.5_9"
## [100] "Q9.5_20"
## [101] "Q9.5_21"
## [102] "Q9.5_16"
## [103] "Q9.5_16_TEXT"
## [104] "usage6"
## [105] "usage6_3_TEXT"
## [106] "honesty2"
## [107] "honesty2_2_TEXT"
## [108] "id"
## [109] "comments"
## [110] "SC0"
## [111] "PROLIFIC_PID"
## [112] "workerid"
## [113] "assignmentId"
## [114] "hitId"
## [115] "site"
## [116] "gender.1"
## [117] "aid"
## [118] "randomId"
## [119] "sign"
## [120] "sample"
## [121] "Create.New.Field.or.Choose.From.Dropdown..."
head(dat) # first six rows
## StartDate EndDate Status IPAddress Progress
## 1 11/29/21 23:07 11/29/21 23:13 0 71.227.248.177 100
## 2 11/29/21 23:02 11/29/21 23:14 0 73.45.63.48 100
## 3 11/29/21 23:01 11/29/21 23:15 0 108.226.198.101 100
## 4 11/29/21 23:04 11/29/21 23:16 0 72.24.229.97 100
## 5 11/29/21 23:14 11/29/21 23:21 0 71.212.117.122 100
## 6 11/29/21 23:11 11/29/21 23:22 0 108.64.6.85 100
## Duration..in.seconds. Finished RecordedDate ResponseId
## 1 346 1 11/29/21 23:13 R_w05LdNz9t3MdMvT
## 2 712 1 11/29/21 23:14 R_1LYdFg61IZxKZJM
## 3 851 1 11/29/21 23:15 R_3ndAaQKXVjdw6HG
## 4 708 1 11/29/21 23:16 R_1Io7uoe6rKoNTZp
## 5 464 1 11/29/21 23:21 R_1RlLRVZvxcjk2Q1
## 6 670 1 11/29/21 23:22 R_vCZLiJb8raUMuXL
## RecipientLastName RecipientFirstName RecipientEmail ExternalReference
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## LocationLatitude LocationLongitude DistributionChannel UserLanguage
## 1 47.6722 -122.1257 anonymous EN
## 2 42.1097 -87.9399 anonymous EN
## 3 35.4145 -119.0403 anonymous EN
## 4 43.6138 -116.3972 anonymous EN
## 5 47.6459 -122.3995 anonymous EN
## 6 26.1250 -80.2670 anonymous EN
## Q_BallotBoxStuffing Q1.2_Browser Q1.2_Version Q1.2_Operating.System
## 1 NA Safari iPhone 15.1 iPhone
## 2 NA Chrome 96.0.4664.55 Macintosh
## 3 NA Chrome 96.0.4664.45 Windows NT 10.0
## 4 NA Chrome 96.0.4664.45 Windows NT 10.0
## 5 NA Chrome 96.0.4664.45 Windows NT 10.0
## 6 NA Safari 14 Macintosh
## Q1.2_Resolution gender gender_3_TEXT age ethnicity ethnicity_5_TEXT education
## 1 414x736 2 17 1 4
## 2 2560x1440 1 14 4 4
## 3 1536x864 2 23 1 4
## 4 1708x961 2 54 1 5
## 5 1920x1080 2 28 2,3 3
## 6 810x1080 2 15 1,4 4
## education_9_TEXT income country ac1 ac2 nfc1 nfc2 nfc3 nfc4 nfc5 nfc6 nfc7
## 1 NA 4 189 5 4 3 2 4 3 3 2 4
## 2 NA 4 189 7 4 3 4 2 4 2 1 5
## 3 NA 2 189 6 3 4 4 2 2 2 4 2
## 4 NA 1 189 6 3 2 3 4 3 4 2 3
## 5 NA 3 189 6 3 2 4 2 2 3 4 2
## 6 NA 1 189 6 3 4 4 2 2 2 3 2
## nfc8 nfc9 nfc10 nfc11 ac3 nfc12 nfc13 nfc14 nfc15 nfc16 nfc17 nfc18
## 1 3 3 3 4 1 2 2 3 2 4 2 2
## 2 3 4 3 4 2 1 3 3 3 1 4 3
## 3 2 2 4 4 1 2 4 4 4 2 1 4
## 4 4 4 2 2 1 2 2 2 2 4 4 2
## 5 3 4 4 4 1 2 3 4 4 4 3 4
## 6 3 2 3 4 1 2 3 4 4 3 2 4
## nfctime_First.Click nfctime_Last.Click nfctime_Page.Submit
## 1 4.722 101.132 101.757
## 2 28.934 171.462 172.227
## 3 10.681 112.922 115.176
## 4 10.855 129.400 131.725
## 5 7.694 90.258 91.612
## 6 2.902 121.400 122.619
## nfctime_Click.Count comprehension1 comprehensiontime1_First.Click
## 1 35 1 0.496
## 2 19 0 43.569
## 3 25 1 51.391
## 4 19 1 39.662
## 5 21 1 6.537
## 6 35 1 4.811
## comprehensiontime1_Last.Click comprehensiontime1_Page.Submit
## 1 48.968 52.746
## 2 78.131 84.274
## 3 129.717 134.719
## 4 84.654 86.418
## 5 58.372 67.765
## 6 88.782 90.455
## comprehensiontime1_Click.Count comprehensiontime2_First.Click
## 1 12 0.549
## 2 2 0.000
## 3 6 4.254
## 4 2 2.973
## 5 7 2.352
## 6 9 0.890
## comprehensiontime2_Last.Click comprehensiontime2_Page.Submit
## 1 1.264 3.038
## 2 0.000 20.011
## 3 5.707 7.464
## 4 4.202 5.866
## 5 3.828 5.161
## 6 2.139 4.178
## comprehensiontime2_Click.Count accomprehension comprehension2
## 1 3 1 1
## 2 0 NA NA
## 3 2 1 1
## 4 2 1 1
## 5 2 1 1
## 6 2 1 1
## X1_matrixtime_First.Click X1_matrixtime_Last.Click X1_matrixtime_Page.Submit
## 1 15.157 15.157 16.042
## 2 15.301 15.301 21.006
## 3 7.913 7.913 13.869
## 4 0.000 0.000 21.007
## 5 11.023 11.023 12.682
## 6 0.000 0.000 21.016
## X1_matrixtime_Click.Count X1_honesty1 X2_matrixtime_First.Click
## 1 1 0 1.707
## 2 1 0 13.320
## 3 1 0 0.000
## 4 0 NA 0.000
## 5 1 0 13.944
## 6 0 NA 20.287
## X2_matrixtime_Last.Click X2_matrixtime_Page.Submit X2_matrixtime_Click.Count
## 1 1.707 2.133 1
## 2 14.688 16.322 2
## 3 0.000 21.017 0
## 4 0.000 21.109 0
## 5 13.944 15.735 1
## 6 20.287 21.114 1
## X2_honesty1 X3_matrixtime_First.Click X3_matrixtime_Last.Click
## 1 0 1.633 1.633
## 2 0 9.912 9.912
## 3 NA 19.512 19.512
## 4 NA 20.468 20.468
## 5 0 0.000 0.000
## 6 1 17.386 17.386
## X3_matrixtime_Page.Submit X3_matrixtime_Click.Count X3_honesty1
## 1 1.886 1 0
## 2 10.812 1 0
## 3 21.010 1 0
## 4 21.007 1 1
## 5 21.023 0 NA
## 6 21.013 1 1
## X4_matrixtime_First.Click X4_matrixtime_Last.Click X4_matrixtime_Page.Submit
## 1 1.001 1.001 1.215
## 2 8.484 8.484 12.599
## 3 0.000 0.000 21.112
## 4 0.000 0.000 21.109
## 5 19.933 19.933 20.527
## 6 16.401 16.401 21.013
## X4_matrixtime_Click.Count X4_honesty1 X5_matrixtime_First.Click
## 1 1 0 0.644
## 2 1 0 6.091
## 3 0 NA 0.000
## 4 0 NA 19.508
## 5 1 1 18.299
## 6 1 1 14.585
## X5_matrixtime_Last.Click X5_matrixtime_Page.Submit X5_matrixtime_Click.Count
## 1 0.644 0.828 1
## 2 6.091 6.858 1
## 3 0.000 21.007 0
## 4 19.508 21.007 1
## 5 18.299 18.971 1
## 6 14.585 21.111 1
## X5_honesty1 usage1 usage2 usage2_4_TEXT usage3
## 1 0 3 2 $7
## 2 0 4 3 20 GBP
## 3 NA 4 2 $5-$10
## 4 1 3 2 20
## 5 1 2 2 4
## 6 1 2 2 I don’t know like 7$
## usage4
## 1 100%
## 2 97%
## 3 I'm not sure. I believe the vast majority, if not all. It will not let me check while doing this task, but it's likely over 95%.
## 4 100%
## 5 97
## 6 I don’t know like 97%
## Q9.5_8 Q9.5_9 Q9.5_20 Q9.5_21 Q9.5_16 Q9.5_16_TEXT usage6 usage6_3_TEXT
## 1 1 5 1 1 1 4 NA
## 2 1 5 1 1 1 4 NA
## 3 1 4 1 1 1 4 NA
## 4 1 4 1 1 5 Google 4 NA
## 5 5 5 3 1 1 none 4 NA
## 6 1 4 3 1 1 1 NA
## honesty2 honesty2_2_TEXT id
## 1 0 610aba8c00984e30b7fc809b
## 2 0 616507c81808c6caff7f6278
## 3 1 5ed70cb596035b197801f12f
## 4 1 614f783d947d56fab33f93fe
## 5 1 58a96911ea3d11000170e50f
## 6 1 6111b9c48538fdc476c7b8f1
## comments SC0 PROLIFIC_PID
## 1 5 610aba8c00984e30b7fc809b
## 2 5 616507c81808c6caff7f6278
## 3 2 5ed70cb596035b197801f12f
## 4 0 614f783d947d56fab33f93fe
## 5 2 58a96911ea3d11000170e50f
## 6 Sorry I couldn’t answer the math problems 0 6111b9c48538fdc476c7b8f1
## workerid assignmentId hitId site gender.1 aid randomId sign sample
## 1 NA NA NA Prolific NA NA NA GBP Prolific
## 2 NA NA NA Prolific NA NA NA GBP Prolific
## 3 NA NA NA Prolific NA NA NA GBP Prolific
## 4 NA NA NA Prolific NA NA NA GBP Prolific
## 5 NA NA NA Prolific NA NA NA GBP Prolific
## 6 NA NA NA Prolific NA NA NA GBP Prolific
## Create.New.Field.or.Choose.From.Dropdown...
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
# View(dat) # opens the data viewer; run in an interactive session only
nrow(dat) # number of rows in dataset ## N = 235
## [1] 235
ncol(dat) # number of columns in dataset
## [1] 121
#### Data exclusion / filtering
#### Removing unwanted data ####
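# Rows with missing values are kept; NAs are dropped analysis by analysis (na.rm = TRUE).
# Calling na.omit() on the full export would discard any row with an NA in any of the
# 121 columns, including metadata columns that are NA in the rows shown above
# (e.g., RecipientLastName).
# Removing columns we don't need for the primary analyses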
dat <- dat %>%
dplyr::select(ac1, ac2, ac3, comprehension1,
comprehension2, contains("honesty"),
ResponseId, site, sample
)
colnames(dat)
## [1] "ac1" "ac2" "ac3" "comprehension1"
## [5] "comprehension2" "X1_honesty1" "X2_honesty1" "X3_honesty1"
## [9] "X4_honesty1" "X5_honesty1" "honesty2" "honesty2_2_TEXT"
## [13] "ResponseId" "site" "sample"
I report differences between sites on each data quality measure and then aggregate those findings to a composite score of data quality, reporting differences across the three sites.
ATTENTION (originally : ACQs, χ2(4) = 548.48, 203.56, p < .001. )
#### recode ac1 so that: pass if ac1 = "6"; fail otherwise
#### recode ac2 so that: pass if ac2 = "3"; fail otherwise
#### recode ac3 so that: pass if ac3 = "1"; fail otherwise
dat <- dat %>%
mutate(ac1_pass = ifelse(ac1 == "6", 1,0),
ac2_pass = ifelse(ac2 == "3", 1,0),
ac3_pass = ifelse(ac3 == "1", 1,0),
ac_total = (ac1_pass + ac2_pass + ac3_pass)
)
### Basic Descriptive Stats for Attention from each Sample
group_by(dat, sample) %>%
summarise(
mean = mean(ac_total, na.rm = TRUE),
sd = sd(ac_total, na.rm = TRUE) # the na.rm tells R to ignore NA values
)
## # A tibble: 2 × 3
## sample mean sd
## <chr> <dbl> <dbl>
## 1 MTurk 2.41 0.920
## 2 Prolific 2.73 0.700
COMPREHENSION (originally : χ2(4) = 152.4, p < .001 )
#### Have 2 coders code any response that suggested a minimum level of understanding as indicating comprehension and only flag responses as incorrect if they were undoubtedly illegible. Responses that are flagged by both raters will be coded as incorrect answers.
### manually code open responses to comprehension1
### manually code open responses to comprehension2
dat <- dat %>%
mutate(comprehension_total = (comprehension1 + comprehension2)
)
dat$comprehension_total
## [1] 2 NA 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 NA 2 2 2 2 2
## [26] 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 1 2 2 2 2 2 2 2 2 2
## [51] 2 2 1 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2
## [76] 1 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2
## [101] NA NA NA 2 NA 0 0 1 2 0 0 2 1 2 0 1 0 2 0 2 2 0 2 0 0
## [126] 2 2 2 2 1 2 1 2 0 0 1 0 0 2 1 1 1 0 1 1 0 0 0 2 2
## [151] 2 2 1 0 2 1 2 2 2 2 2 2 1 1 2 2 0 1 2 2 2 1 2 2 2
## [176] 0 2 1 2 1 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 1 0
## [201] 2 2 2 2 2 2 2 NA NA NA 0 NA NA NA NA NA NA NA NA NA NA NA NA NA 2
## [226] 2 2 NA NA 2 2 NA NA 0 NA
### Basic Descriptive Stats for Comprehension from each Sample
group_by(dat, sample) %>%
summarise(
mean = mean(comprehension_total, na.rm = TRUE),
sd = sd(comprehension_total, na.rm = TRUE) # the na.rm tells R to ignore NA values
)
## # A tibble: 2 × 3
## sample mean sd
## <chr> <dbl> <dbl>
## 1 MTurk 1.36 0.822
## 2 Prolific 1.91 0.289
HONESTY (originally : χ2(4) = 153.44, p < .001 )
Peer et al. (2021) claim that 2 out of the 5 matrices in the first measure of honesty (X1_honesty1 to X5_honesty1) are unsolvable. However, I use the same .QSF file and find that three matrices are unsolvable.
More specifically, X1_honesty1 and X2_honesty1 are solvable, so we won’t be looking at participants’ responses to these matrices when calculating the overall honesty scores.
X3_honesty1, X4_honesty1, and X5_honesty1 are NOT solvable. Since responses were coded as 0 for “found it” and 1 for “no”, they do not need to be reverse-coded to serve as honesty scores (i.e., a response of 0 to any of these three items means the participant is NOT being honest).
#### code responses to honesty manually so that:
### honesty1 : 0 = "found it" = dishonest 1 = "no" = honest
### honesty2 : 0 = not honest 1 = honest
### honesty3 : 0 = not honest 1 = honest 2 = other
colnames(dat)
## [1] "ac1" "ac2" "ac3"
## [4] "comprehension1" "comprehension2" "X1_honesty1"
## [7] "X2_honesty1" "X3_honesty1" "X4_honesty1"
## [10] "X5_honesty1" "honesty2" "honesty2_2_TEXT"
## [13] "ResponseId" "site" "sample"
## [16] "ac1_pass" "ac2_pass" "ac3_pass"
## [19] "ac_total" "comprehension_total"
dat <- dat %>%
mutate(honesty1_matrix = (X3_honesty1 + X4_honesty1 +
X5_honesty1
),
honesty_total = (honesty1_matrix + honesty2
)
)
### Basic Descriptive Stats for Honesty from each Sample
group_by(dat, sample) %>%
summarise(
mean = mean(honesty_total, na.rm = TRUE),
sd = sd(honesty_total, na.rm = TRUE) # the na.rm tells R to ignore NA values
)
## # A tibble: 2 × 3
## sample mean sd
## <chr> <dbl> <dbl>
## 1 MTurk 2.04 1.51
## 2 Prolific 2.59 1.36
### We can now remove the unnecessary honesty columns from our dataset
## Don't forget to include the new variables (comprehension_total, etc.) in this process.
colnames(dat)
## [1] "ac1" "ac2" "ac3"
## [4] "comprehension1" "comprehension2" "X1_honesty1"
## [7] "X2_honesty1" "X3_honesty1" "X4_honesty1"
## [10] "X5_honesty1" "honesty2" "honesty2_2_TEXT"
## [13] "ResponseId" "site" "sample"
## [16] "ac1_pass" "ac2_pass" "ac3_pass"
## [19] "ac_total" "comprehension_total" "honesty1_matrix"
## [22] "honesty_total"
dat <- dat %>%
dplyr::select(ac1, ac2, ac3, ac_total, comprehension1,
comprehension2, comprehension_total,
X3_honesty1, X4_honesty1, X5_honesty1,
honesty1_matrix, honesty2, honesty_total,
ResponseId, site, sample
)
colnames(dat)
## [1] "ac1" "ac2" "ac3"
## [4] "ac_total" "comprehension1" "comprehension2"
## [7] "comprehension_total" "X3_honesty1" "X4_honesty1"
## [10] "X5_honesty1" "honesty1_matrix" "honesty2"
## [13] "honesty_total" "ResponseId" "site"
## [16] "sample"
OVERALL DATA QUALITY SCORES (From the original study: “The score gave participants a value between 0 and 7, showing whether they passed one or both ACQs, answered correctly one or two comprehension questions,.. claim to have solved the unsolvable problems, [and claimed to qualify for a future study that they did not qualify for based on their earlier responses in the survey]. … The overall composite score should not be considered as measuring the same construct. Rather, it is used here as a multifactorial measure that attests to the overall general level of data quality”)
dat <- dat %>%
mutate(dataquality = (ac_total + comprehension_total +
honesty_total)
)
colnames(dat)
## [1] "ac1" "ac2" "ac3"
## [4] "ac_total" "comprehension1" "comprehension2"
## [7] "comprehension_total" "X3_honesty1" "X4_honesty1"
## [10] "X5_honesty1" "honesty1_matrix" "honesty2"
## [13] "honesty_total" "ResponseId" "site"
## [16] "sample" "dataquality"
dat$dataquality
## [1] 3 NA NA NA NA 9 4 NA NA NA NA 7 NA NA NA 7 9 NA NA NA 7 NA NA NA NA
## [26] NA NA NA NA NA 9 NA NA NA 9 NA 8 8 NA NA NA NA NA NA NA NA NA NA NA NA
## [51] 9 NA 7 9 NA NA NA NA 6 NA 9 4 NA NA 6 NA NA NA NA NA NA NA NA 8 9
## [76] 5 NA 9 7 8 7 6 NA 7 NA NA NA NA NA NA NA 6 NA NA 6 7 8 NA NA NA
## [101] NA NA NA NA NA 1 2 5 9 3 5 6 5 9 NA 2 0 10 2 4 NA 3 9 2 0
## [126] NA 6 8 9 8 NA 3 NA 3 6 4 4 2 9 5 4 3 NA 6 6 1 NA 3 7 NA
## [151] NA 6 4 4 3 4 NA NA 5 NA NA 6 5 NA NA 6 NA NA 6 NA NA 4 NA 6 7
## [176] 3 NA 7 9 5 NA 9 NA 9 9 NA 5 8 6 NA 9 NA NA NA 9 7 9 NA NA 6
## [201] NA 9 9 9 9 7 NA NA NA NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [226] NA 9 NA NA NA NA NA NA NA NA
LOOKING AT ALL MEANS ON EACH PLATFORM
group_by(dat, sample) %>%
summarise(
mean = mean(dataquality, na.rm = TRUE),
sd = sd(dataquality, na.rm = TRUE) # the na.rm tells R to ignore NA values
)
## # A tibble: 2 × 3
## sample mean sd
## <chr> <dbl> <dbl>
## 1 MTurk 5.61 2.67
## 2 Prolific 7.19 1.66
summary(dat$dataquality) # dataquality ranges from 0 to 10 (M = 6.09)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 4.000 6.000 6.087 9.000 10.000 132
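## The original authors' data quality score ranged from 0 to 7, built from comp1, comp2, honesty1_1, honesty1_2, honesty2, ac1, and ac2. Adding ac3 and a third matrix item (honesty1_3) should raise the maximum to 9, so the observed maximum of 10 is unexpected. Given the component totals printed above, a 10 is only possible if at least one item takes a value above 1 (possibly honesty2, given the "2 = other" code in the coding notes above), which still needs to be checked.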
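A quick way to investigate both the unexpected maximum and the large number of missing composite scores is to inspect each component separately. The sketch below uses invented rows; the column names mirror those created above, and the honesty2 value of 2 is an assumption included only to show how a score of 10 could arise.

```r
# Invented component scores for three participants
toy <- data.frame(
  ac_total = c(3, 2, 3),
  comprehension_total = c(2, 2, 2),
  honesty1_matrix = c(3, NA, 3),
  honesty2 = c(1, 1, 2)   # a value of 2 is assumed here for illustration
)

# Observed maximum of each component: anything above the intended
# ceiling points to the item that inflates the composite
component_max <- sapply(toy, max, na.rm = TRUE)
component_max

# Adding columns with + returns NA whenever any component is missing
total_plus <- with(toy, ac_total + comprehension_total + honesty1_matrix + honesty2)
total_plus

# rowSums(na.rm = TRUE) instead counts a missing component as 0,
# which is a different analytic choice and not a neutral fix
total_rowsums <- rowSums(toy, na.rm = TRUE)
total_rowsums
```

Which treatment of missing components is appropriate depends on what a missing response means for each measure, so this is a diagnostic rather than a recommendation.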
COMPARING DATA QUALITY ACROSS THE GROUPS (the different sites)
# Compute t test since we currently only have 2 samples
ttest <- print(t.test(dat$dataquality ~ dat$sample))
##
## Welch Two Sample t-test
##
## data: dat$dataquality by dat$sample
## t = -3.6508, df = 87.815, p-value = 0.0004436
## alternative hypothesis: true difference in means between group MTurk and group Prolific is not equal to 0
## 95 percent confidence interval:
## -2.4438623 -0.7210123
## sample estimates:
## mean in group MTurk mean in group Prolific
## 5.611111 7.193548
library("report") # load the package before calling report()
report(ttest) # statistically significant, medium-sized difference; MTurk has lower data quality
# Once data collection is complete, compute a one-way ANOVA
singleANOVA <- aov(dataquality ~ sample, data = dat)
# Summary of one-way ANOVA
summary(singleANOVA)
## Df Sum Sq Mean Sq F value Pr(>F)
## sample 1 54.3 54.26 9.322 0.00289 **
## Residuals 101 587.9 5.82
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 132 observations deleted due to missingness
library("report") # Load the package every time you start R
report(singleANOVA) # medium significant difference with MTurk having lower data quality
## For one-way between subjects designs, partial eta squared is equivalent to eta squared.
## Returning eta squared.
## The ANOVA (formula: dataquality ~ sample) suggests that:
##
## - The main effect of sample is statistically significant and medium (F(1, 101) = 9.32, p = 0.003; Eta2 = 0.08, 95% CI [0.02, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
## checking normality even though it's not necessary given the size of the samples
# histogram
hist(singleANOVA$residuals)
# QQ-plot
library(car)
qqPlot(singleANOVA$residuals,
id = FALSE # id = FALSE to remove point identification
)
## Since residuals follow a normal distribution, we can now check whether the variances are equal across different groups. The result will determine whether we use the ANOVA or the Welch ANOVA.
par(mfrow = c(1, 2)) # combine plots
# 1. Homogeneity of variances
plot(singleANOVA, which = 3)
# 2. Normality
plot(singleANOVA, which = 2)
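The visual check of equal variances can be complemented with a formal test, and the Welch ANOVA can be run directly with base R. The sketch below uses simulated data (not the study data) whose group means and SDs loosely mimic the descriptives reported above; `bartlett.test()` and `oneway.test()` are both in the `stats` package. Note that Bartlett's test is sensitive to non-normality, so it is only one possible choice.

```r
# Simulated stand-in data; these values are NOT the collected responses.
set.seed(42)
toy <- data.frame(
  sample = rep(c("MTurk", "Prolific"), each = 40),
  dataquality = c(rnorm(40, mean = 5.6, sd = 2.7),
                  rnorm(40, mean = 7.2, sd = 1.7))
)

# Formal test of homogeneity of variances across groups.
var_check <- bartlett.test(dataquality ~ sample, data = toy)

# Welch ANOVA (does not assume equal variances) vs. classic one-way ANOVA.
welch <- oneway.test(dataquality ~ sample, data = toy, var.equal = FALSE)
classic <- oneway.test(dataquality ~ sample, data = toy, var.equal = TRUE)

print(var_check)
print(welch)
print(classic)
```

If the variance test suggests unequal variances, the Welch version is the safer default. With only two groups it is equivalent to the Welch two-sample t-test.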
Data quality scores differ significantly depending on the site used to collect the data.
The Welch Two Sample t-test testing the difference of data quality scores by sample (mean in group MTurk = 5.61, mean in group Prolific = 7.19) suggests that the effect is negative, statistically significant, and medium (difference = -1.58, 95% CI [-2.44, -0.72], t(87.81) = -3.65, p < .001; Cohen’s d = -0.71, 95% CI [-1.11, -0.31]).
The one-way ANOVA also demonstrates that the main effect of sample is statistically significant and medium (F(1, 101) = 9.32, p = 0.003; Eta2 = 0.08, 95% CI [0.02, 1.00]).
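As a sanity check, both effect sizes can be approximately reproduced by hand from the summary statistics printed above. The sketch below computes eta squared as the effect sum of squares over the total sum of squares, and Cohen's d using the unpooled average of the two group variances. Whether `report()` uses exactly this form of d internally is an assumption; it is shown only because it matches the reported value.

```r
# Sums of squares taken from the ANOVA table above.
ss_sample <- 54.26
ss_resid <- 587.9
eta2 <- ss_sample / (ss_sample + ss_resid)

# Group means and SDs taken from the descriptives and t-test output above.
m_mturk <- 5.611111
m_prolific <- 7.193548
sd_mturk <- 2.67
sd_prolific <- 1.66

# Cohen's d with the unpooled (average-variance) standardizer.
d <- (m_mturk - m_prolific) / sqrt((sd_mturk^2 + sd_prolific^2) / 2)

print(round(c(eta2 = eta2, d = d), 2)) # eta2 = 0.08, d = -0.71
```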
#### *Side-by-side graph with original graph is ideal here*
data <- dat %>%
group_by(sample, dataquality) %>%
summarise()
## `summarise()` has grouped output by 'sample'. You can override using the `.groups` argument.
# boxplots and t-tests since we only have 2 samples at this point
## Basic Boxplot
boxplot(dat$dataquality ~ dat$sample,
ylab = ("Composite Data Quality Score"),
xlab = ("Sites used to Gather the Data")
) ## there are no outliers
## Elementary Boxplot
ggplot(dat) +
  aes(x = sample, y = dataquality) +
  geom_boxplot() + # geom_smooth() dropped: it is not meaningful with a categorical x-axis
  theme_classic() + # ggplot2's built-in theme; theme_classic2() failed because ggpubr was not yet loaded
  ylab("Composite Data Quality Score") +
  xlab("Sites used to Gather the Data")
## Cool colorful boxplot with P values
library(ggpubr)
# Edit from here #
x <- which(names(dat) == "sample") # name of grouping variable
y <- which(names(dat) == "dataquality"
| names(dat) == "ac_total"
| names(dat) == "comprehension_total"
| names(dat) == "honesty_total"
) # names of variables to test
method <- "t.test" # one of "wilcox.test" or "t.test"
paired <- FALSE # if paired make sure that in the dataframe you have first all individuals at T1, then all individuals again at T2
# Edit until here
# Edit at your own risk
for (i in y) {
  for (j in x) {
    # Plain if/else instead of ifelse(): ifelse() is meant for vectors, not control flow
    if (paired) {
      p <- ggpaired(dat,
        x = colnames(dat[j]), y = colnames(dat[i]),
        color = colnames(dat[j]), line.color = "gray", line.size = 0.4,
        palette = "npg",
        legend = "none",
        xlab = colnames(dat[j]),
        ylab = colnames(dat[i]),
        add = "jitter"
      )
    } else {
      p <- ggboxplot(dat,
        x = colnames(dat[j]), y = colnames(dat[i]),
        color = colnames(dat[j]),
        palette = "npg",
        legend = "none",
        add = "jitter"
      )
    }
    # Add p-value
    print(p + stat_compare_means(aes(label = paste0(..method.., ", p-value = ", ..p.format..)),
      method = method,
      paired = paired,
      # group.by = NULL,
      ref.group = NULL
    ))
  }
}
## Warning: Removed 7 rows containing non-finite values (stat_boxplot).
## Warning: Removed 7 rows containing non-finite values (stat_compare_means).
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 27 rows containing non-finite values (stat_boxplot).
## Warning: Removed 27 rows containing non-finite values (stat_compare_means).
## Warning: Removed 27 rows containing missing values (geom_point).
## Warning: Removed 131 rows containing non-finite values (stat_boxplot).
## Warning: Removed 131 rows containing non-finite values (stat_compare_means).
## Warning: Removed 131 rows containing missing values (geom_point).
## Warning: Removed 132 rows containing non-finite values (stat_boxplot).
## Warning: Removed 132 rows containing non-finite values (stat_compare_means).
## Warning: Removed 132 rows containing missing values (geom_point).
### Run this code after data collection is complete to visualize the comparison for all 3 sites
# Edit from here
x <- which(names(dat) == "sample") # name of grouping variable
y <- which(names(dat) == "dataquality"
| names(dat) == "ac_total"
| names(dat) == "comprehension_total"
| names(dat) == "honesty_total"
) # names of variables to test
method1 <- "anova" # one of "anova" or "kruskal.test"
method2 <- "t.test" # one of "wilcox.test" or "t.test"
my_comparisons <- list(c("MTurk", "Prolific"),
c("MTurk", "CloudResearch"),
c("Prolific", "CloudResearch")) # comparisons for post-hoc tests
# Edit until here
# Edit at your own risk
for (i in y) {
for (j in x) {
p <- ggboxplot(dat,
x = colnames(dat[j]), y = colnames(dat[i]),
color = colnames(dat[j]),
legend = "none",
palette = "npg",
add = "jitter"
)
print(
p + stat_compare_means(aes(label = paste0(..method.., ", p-value = ", ..p.format..)),
method = method1, label.y = max(dat[, i], na.rm = TRUE)
)
+ stat_compare_means(comparisons = my_comparisons, method = method2, label = "p.format") # remove if p-value of ANOVA or Kruskal-Wallis test >= alpha
)
}
}
## Warning: Removed 7 rows containing non-finite values (stat_boxplot).
## Warning: Removed 7 rows containing non-finite values (stat_compare_means).
## Warning: Removed 7 rows containing non-finite values (stat_signif).
## Warning: Computation failed in `stat_signif()`:
## not enough 'y' observations
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 27 rows containing non-finite values (stat_boxplot).
## Warning: Removed 27 rows containing non-finite values (stat_compare_means).
## Warning: Removed 27 rows containing non-finite values (stat_signif).
## Warning: Computation failed in `stat_signif()`:
## not enough 'y' observations
## Warning: Removed 27 rows containing missing values (geom_point).
## Warning: Removed 131 rows containing non-finite values (stat_boxplot).
## Warning: Removed 131 rows containing non-finite values (stat_compare_means).
## Warning: Removed 131 rows containing non-finite values (stat_signif).
## Warning: Computation failed in `stat_signif()`:
## not enough 'y' observations
## Warning: Removed 131 rows containing missing values (geom_point).
## Warning: Removed 132 rows containing non-finite values (stat_boxplot).
## Warning: Removed 132 rows containing non-finite values (stat_compare_means).
## Warning: Removed 132 rows containing non-finite values (stat_signif).
## Warning: Computation failed in `stat_signif()`:
## not enough 'y' observations
## Warning: Removed 132 rows containing missing values (geom_point).
#### coolest Boxplot
library(ggstatsplot)
## You can cite this package as:
## Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
## Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167
ggbetweenstats(data = dat,
x = sample,
y = dataquality,
type = "parametric", # ANOVA or Kruskal-Wallis
var.equal = TRUE, # ANOVA or Welch ANOVA
plot.type = "box",
pairwise.comparisons = TRUE,
pairwise.display = "significant",
centrality.plotting = FALSE,
bf.message = FALSE
)
I truly wish I had more time to run exploratory analyses before my final presentation and the submission of this report. I began looking at data reliability and responses to the Need for Cognition (NFC) Scale but have not had a chance to complete that work.
For the time being, please disregard the below section on reliability and the NFC scale. I look forward to looking into the data more closely once the quarter ends!
RELIABILITY
#### Reliability is measured using Cronbach's alpha of the NFC scale
### The original authors recoded the negatively worded items of the NFC before running analyses; here, the code explicitly defines the reverse-scored items within this scale. Since the NFC is a 5-point Likert scale, I use 6 - item to reverse-code an item.
## Reverse scoring nfc items
dat$nfc3_reversed <- 6-dat$nfc3
## Error in `$<-.data.frame`(`*tmp*`, nfc3_reversed, value = numeric(0)): replacement has 0 rows, data has 235
dat$nfc4_reversed <- 6-dat$nfc4
## Error in `$<-.data.frame`(`*tmp*`, nfc4_reversed, value = numeric(0)): replacement has 0 rows, data has 235
dat$nfc5_reversed <- 6-dat$nfc5
## Error in `$<-.data.frame`(`*tmp*`, nfc5_reversed, value = numeric(0)): replacement has 0 rows, data has 235
dat$nfc7_reversed <- 6-dat$nfc7
## Error in `$<-.data.frame`(`*tmp*`, nfc7_reversed, value = numeric(0)): replacement has 0 rows, data has 235
dat$nfc8_reversed <- 6-dat$nfc8
## Error in `$<-.data.frame`(`*tmp*`, nfc8_reversed, value = numeric(0)): replacement has 0 rows, data has 235
dat$nfc9_reversed <- 6-dat$nfc9
## Error in `$<-.data.frame`(`*tmp*`, nfc9_reversed, value = numeric(0)): replacement has 0 rows, data has 235
dat$nfc12_reversed <- 6-dat$nfc12
## Error in `$<-.data.frame`(`*tmp*`, nfc12_reversed, value = numeric(0)): replacement has 0 rows, data has 235
dat$nfc16_reversed <- 6-dat$nfc16
## Error in `$<-.data.frame`(`*tmp*`, nfc16_reversed, value = numeric(0)): replacement has 0 rows, data has 235
dat$nfc17_reversed <- 6-dat$nfc17
## Error in `$<-.data.frame`(`*tmp*`, nfc17_reversed, value = numeric(0)): replacement has 0 rows, data has 235
## Creating the nfc scale
dat <- dat %>%
mutate(nfc_scale = (nfc1 + nfc2 + nfc3_reversed +
nfc4_reversed + nfc5_reversed +
nfc6 + nfc7_reversed + nfc8_reversed +
nfc9_reversed + nfc10 + nfc11 +
nfc12_reversed + nfc13 + nfc14 +nfc15 +
nfc16_reversed + nfc17_reversed + nfc18
)
)
## Error: Problem with `mutate()` column `nfc_scale`.
## ℹ `nfc_scale = (...)`.
## x object 'nfc1' not found
cronbach.alpha(nfc_total)
## Error in cronbach.alpha(nfc_total): object 'nfc_total' not found
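The errors above arise because the NFC item columns are not present in `dat` and because `nfc_total` was never created. Once the items are available, the intended computation could look roughly like the sketch below. It uses toy data with three hypothetical items (the real scale has 18) and computes Cronbach's alpha directly from its textbook formula in base R rather than through a package function, so every name here is illustrative.

```r
# Toy data: 50 simulated respondents, three hypothetical 5-point items.
set.seed(7)
n <- 50
trait <- rnorm(n)
to_likert <- function(x) pmin(5, pmax(1, round(3 + x))) # clamp to 1..5

toy <- data.frame(
  nfc1 = to_likert(trait + rnorm(n, sd = 0.7)),
  nfc2 = to_likert(trait + rnorm(n, sd = 0.7)),
  nfc3 = to_likert(-trait + rnorm(n, sd = 0.7)) # negatively worded item
)

# Reverse-score the negatively worded item on a 5-point scale.
toy$nfc3_reversed <- 6 - toy$nfc3

# Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of total).
items <- toy[, c("nfc1", "nfc2", "nfc3_reversed")]
k <- ncol(items)
alpha_hat <- (k / (k - 1)) *
  (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
print(alpha_hat)
```

The key point is that alpha is computed from the matrix of item responses after reverse-scoring, not from a single summed total.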
In short, the present study only partially replicated the findings from Peer et al. (2021), simply because no data have yet been collected or analyzed from participants on CloudResearch. For the comparison between Prolific and MTurk, the present results replicated those of the original study. As in the original study, there is a statistically significant, medium-sized difference in data quality between Prolific (M = 7.19, SD = 1.66) and MTurk (M = 5.61, SD = 2.67), with MTurk producing significantly lower data quality (t(87.81) = -3.65, p < .001; Cohen’s d = -0.71, 95% CI [-1.11, -0.31]). The one-way ANOVA confirmed the result of the Welch Two Sample t-test reported above: the main effect of sample is statistically significant and medium (F(1, 101) = 9.32, p = 0.003; Eta2 = 0.08, 95% CI [0.02, 1.00]).
Deviations from the original study (e.g., in procedure and exclusion criteria) are very unlikely to have moderated the results. Whether the original findings, and the aforementioned critiques of them, hold will be better answered once data collection is complete.