As science is constantly evolving, the media often presents new findings about the risks and benefits of certain health behaviours (e.g. fasting diets and alcohol consumption), often replacing information previously endorsed by experts. Viewers tend to perceive these diverging claims as conflicting, which can lead them to doubt the reliability of scientific consensus. For example, a real news headline by ABC News (2017) revealed that a “Fasting diet could regenerate pancreas and reverse diabetes, researchers say”, while an independent headline by The Guardian (2018) followed with the statement that “Fasting diets may raise risk of diabetes, researchers warn”. In the present study, Haigh & Birch (2020) hypothesised that changing the wording of headlines by adding the qualifier “some”, hence attributing new findings to “some researchers say” rather than “researchers say”, would reduce this perceived conflict in scientific consensus by cushioning the seemingly harsh departures of emerging research from previously established findings.
Experiments 1 and 2 adopted a 2 x 2 independent-samples design, whereby participants were presented with a set of either conflicting or non-conflicting headlines (Headline Conflict) that adopted either the aforementioned qualified format (e.g. “some researchers say”) or the generic format (e.g. “researchers say”) (Headline Format). Experiment 1 measured self-reported perceptions of contradiction, scientific advancement, and confusion following presentation of the headlines, while Experiment 2 measured more global beliefs about nutrition and scientific consensus via self-reports of nutritional confusion, nutritional backlash, mistrust of expertise, confidence in the scientific community, and awareness that scientific knowledge is uncertain and constantly evolving.
According to the results, conflicting headlines in Experiment 1 (N = 294) induced greater perceived contradiction, lower perceived scientific advancement and a greater sense of confusion relative to non-conflicting headlines, as the experimenters hypothesised. However, contrary to predictions, the qualified headline format did not reduce the effect of conflict: no significant interaction was detected between Headline Conflict and Headline Format. In Experiment 2 (N = 400), Headline Conflict had no effect on any of the six global measures, despite predictions that those exposed to conflicting headlines would display greater Nutritional Confusion, Nutritional Backlash and mistrust of expertise, lower confidence in the Scientific Community, and greater awareness that scientific knowledge is uncertain and constantly developing. Because the conflict manipulation itself produced no effect, a potential moderating role of Headline Format on perceived conflict could not be directly tested, although no significant interaction emerged either.
Haigh & Birch’s (2020) findings indicate that adjusting headlines to read “some researchers say”, rather than “researchers say”, does not reduce the perceived conflict associated with opposing research claims, while any effect on beliefs about scientific consensus more broadly is yet to be established.
I wasn’t convinced that adding the qualifier “some” to conflicting headlines would reduce the effect of Headline Conflict as the experimenters hypothesised. The literature suggests that readers automatically draw the scalar inference ‘some but not all’ from the qualifier ‘some’ (Bott & Noveck, 2004), so my initial assumption was that this inference would heighten perceived conflict and confusion about scientific consensus, as it could imply that scientists are now divided over the risks and benefits of certain health behaviours (only “some” researchers say so, not all). Since Headline Format had no effect on perceived contradiction, scientific advancement or confusion, nor on the more global beliefs about nutrition and scientific development, and no moderating effect on the conflict manipulation, the question becomes whether there is a more effective way of cushioning what may seem to the general public like harsh departures from scientific consensus.
I was surprised that, given the wide age range of participants in both samples (18 to 69 in Experiment 1 and 18 to 73 in Experiment 2), no assessment of age differences was conducted by the experimenters (see exploratory question B). This matters because theories about health and nutrition that have spanned decades are potentially more ingrained in the lifestyle decisions of older generations in our community. Thus, in the face of emerging research that replaces existing theories, it may very well be the case that older generations will demonstrate lower levels of trust or confidence in the scientific community. Any future attempt to cushion harsh departures from scientific consensus may need to consider these demographic differences and tailor its approach to its reach in the community. The same may apply to gender differences, whereby men could be differently susceptible to conflict manipulations than women (see exploratory question A). If so, risks and benefits of health behaviours that apply to one gender, such as eating sushi while pregnant, may need to be communicated differently to minimise the perceived conflict and confusion characteristic of that gender.
I want to learn more about the dependent variables selected for Experiment 2. The rationale behind the global measures seems to be underpinned by previous studies. For example, the nutritional confusion measure is a six-item scale previously established by Clark et al. (2019) to measure confusion about nutritional advice, and the mistrust of expertise measure is taken from Oliver and Rahn’s (2016) scale assessing general skepticism of science and expert opinion. While Haigh & Birch (2020) measured the internal reliability of each measure using Cronbach’s alpha (each indicated a good level of reliability, > 0.70), the relationships between these scales, created by separate experimenters, are yet to be established. Although neither Headline Conflict nor Headline Format had an effect on any of the six global measures, I feel that identifying correlations between scores would help justify why the experimenters expected a common theme, whereby participants who experience greater general Nutritional Confusion would also demonstrate Nutritional Backlash, mistrust of expertise and a lack of confidence in the Scientific Community. I am also interested in how the negative beliefs relate to the positive ones, as I struggle to grasp why the experimenters proposed that, after reading conflicting headlines, those with strong negative beliefs would also hold the sophisticated belief that scientific knowledge is uncertain and constantly developing (see exploratory question C).

# Part 2: Verification
Firstly, I downloaded the open data from the OSF repository (hyperlinked) in order to reproduce the descriptive statistics reported in Experiment 1 and Experiment 2. Specifically, the goal was to reproduce the…
Demographic descriptives reported on page 4:
Exp 1: “…a final sample of 294 participants (126 males, 168 females)…Participants were aged 18 - 69 (Mage = 34.29, SD = 12.97).”
Exp 2: “…the final sample was 400 participants…Participants were aged 18 - 73 (Mage = 33.5, SD = 12) with 150 identifying as male, 248 identifying as female and 2 identifying with neither of those categories.”
Pre-registered analysis of group means for experiment 1 reported on pages 6-7:
“Participants exposed to conflicting headlines perceived greater contradiction between the six headline pairs (M = 25.3) than those exposed to non-conflicting headlines (M = 13.4).”
“Participants exposed to conflicting headlines indicated greater agreement that ‘the headlines create confusion about how to be healthy’ than those exposed to non-conflicting headlines (4.52 vs 3.65…”
“Perceived Scientific Advancement…The mean response of those exposed to non-conflicting headlines (0.007) was greater than the mean response of those exposed to conflicting headlines (-0.25).”
Figures reported on pages 7-10:
Note that, where appropriate, messages/warnings/text output for chunks of code are hidden by setting the chunk options message = FALSE, warning = FALSE and/or results = FALSE (in order to keep my knitted file concise!).
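For instance, a chunk header along these lines suppresses all three kinds of output (an illustrative sketch, not a specific chunk from this file; results = 'hide' is the canonical spelling of results = FALSE):

{r example_chunk, message = FALSE, warning = FALSE, results = 'hide'}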
library(qualtRics) #for reading data, filtering irrelevant rows and setting variables to 'numeric'
library(tidyverse) #for dplyr and ggplot2
library(ggbeeswarm) #for a violin swarm plot
library(ggeasy) #for ggplot formatting shortcuts
library(patchwork) #for combining plots into a single output
library(gt) #for pretty tables
library(jmv) #for statistical analyses
library(ggpubr) #for 'publication ready' plots in exploratory analysis A
library(corrplot) #for correlation matrix in exploratory analysis C
mydata <- read_survey("Study_1_data.csv") #data now named "mydata" in Global Environment
mydata <- as.data.frame(mydata)
glimpse(mydata)
sapply(mydata, class) %>% as.data.frame.list() #Note that relevant columns for calculations should be 'numeric'
mydata <- mydata %>% rename(Recall_score = SC0, Condition = FL_10_DO) #new name = old name
mydata %>% summarise(n()) #371
## n()
## 1 371
mydata$Prolific_PID[duplicated(mydata$Prolific_PID)] #59 duplicates found
## [1] "5cd1836da6f34300017e240c" "5c76f2d92819ab0015b94b4c"
## [3] "5c72de7b96a9600001870966" "5c3d48955fd1050001a99364"
## [5] "5eb18525b95d6127da6815ee" "5eaac092d8a78e0172680a0e"
## [7] "5edff6e7c53e0f33aed588ce" "5d6a55cb78fce300014e078e"
## [9] "5e8b765e88065404c4d50fe2" "5ed22fc8bc5d7c01191e3f78"
## [11] "5ea4397b9b06662614d6e4f6" "5d371c527a04ed00018ac1c8"
## [13] "5e9c29bb77f86e13d6435f3d" "5ea3ef637e4bd537fe45cf6c"
## [15] "5ebb0f6aff8da30d1f29dc29" "5cd59eea8618af0001bdab4b"
## [17] "5eac3189c4a262061aed16d5" "5e3842b92fdfd2000fc286a4"
## [19] "5ec504eab4fbc3423f29cd97" "5e332f72c43078000cda48dd"
## [21] "5ea5e76a147a4909b3ff0482" "5c28b31a0091e40001ca5030"
## [23] "5eb089604b5931137dd66a19" "5ed0d82c81c0bd1bccc8ceea"
## [25] "5ed94fdd9a2ae04cf21bfa48" "5be1cc598c6a19000137a503"
## [27] "5cd813aca9a8c4001963a0aa" "5e925105a981b55934a34813"
## [29] "5eb30ae9bb223e0e176edecd" "5ec637f5e0bca2000ae6f211"
## [31] "5eb179d6d6fcb726f9275f97" "5dcbd8542bdeaa8740d52630"
## [33] "5b570f4cc146600001b82d8d" "5ed793104268812282fdc90d"
## [35] "5edf50e72ef80a1fe0267aeb" "5d7f598628843a00181eb444"
## [37] "5ec2ec9fdaef0d11e3109f74" "5d1290103b20b0000102e8e3"
## [39] "57eee744e62704000199d5ec" "5e6a839e2fd3b003a0f7f248"
## [41] "5e7365f19674532b961c2bb6" "5eb94e5731298c01178531c0"
## [43] "5e9db2995b38950c6f669b55" "584bb2b8bd873800015531da"
## [45] "5eac351c63858608351866b4" "5ec0260375bf15077a00e645"
## [47] "5cc3289b9e21e200015f0bb3" "5eb16a0602af57258fd3f8e1"
## [49] "5ed7578ef05e671db283844e" "5ebc1ae1ea22c801479541ab"
## [51] "5edfd94cb54d22309545cf06" "559ab96cfdf99b219a612bcf"
## [53] "5eac35df3043c62536d14f14" "5e9ff2a0cf50621a9b17c94f"
## [55] "5e5d7f349238db09c60bcbab" "5ea611dda778214a5e89fbf2"
## [57] "5c62d8e2a34174000187a003" "5ed6a937eb466b1029493c39"
## [59] "5ebfabc7676c2502837188cf"
mydata <- mydata[!duplicated(mydata$Prolific_PID), ] #remove duplicate rows
mydata %>% summarise(n()) #no. of observations are now 312 (371 - 59 = 312)
## n()
## 1 312
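As an aside, the same duplicate check and removal could be done with dplyr verbs alone; a minimal sketch, left commented out because the duplicates are already gone at this point (my assumption is that distinct() keeps the first occurrence, matching !duplicated() above):

#sketch only (run BEFORE removing duplicates): list IDs appearing more than once
#mydata %>% count(Prolific_PID) %>% filter(n > 1)
#keep the first row per participant, retaining all columns
#mydata <- mydata %>% distinct(Prolific_PID, .keep_all = TRUE)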
#Applying pre-registered exclusion criteria and creating a subset of variables
mydata <- mydata %>%
filter(
Consent == 1,
Finished == 1,
Serious_check == 1,
Recall_score >= 4) %>%
select(
Finished,
`Duration (in seconds)`,
Gender,
Age,
Serious_check,
Recall_score,
Condition,
contradiction_1:advancement)
mydata %>% summarise(n()) #final sample size = 294
## n()
## 1 294
write_csv(mydata, "MyDataSubset.csv") #data saved as "MyDataSubset.csv" in my working directory
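Out of curiosity, it would also be easy to tally how many observations each criterion would exclude on its own; a hedged sketch (mydata_raw is a hypothetical copy of the data taken before filtering, which I did not actually create):

#hypothetical: count how many rows fail each pre-registered criterion
#mydata_raw %>% summarise(
#  not_consented = sum(Consent != 1, na.rm = TRUE),
#  unfinished    = sum(Finished != 1, na.rm = TRUE),
#  not_serious   = sum(Serious_check != 1, na.rm = TRUE),
#  low_recall    = sum(Recall_score < 4, na.rm = TRUE))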
My data set was finally ready for calculating descriptive statistics!
#AGE: mean, SD, and range
exp1_age <- mydata %>% summarise(Mean = mean(Age), #name = function(Age)
SD = sd(Age),
Range_min = min(Age),
Range_max = max(Age))
exp1_age %>%
gt() %>% #format into pretty table
tab_header(title = "Experiment 1: Age demographics") %>%
fmt_number(c(Mean, SD), decimals = 2)
| Experiment 1: Age demographics | |||
|---|---|---|---|
| Mean | SD | Range_min | Range_max |
| 34.29 | 12.97 | 18 | 69 |
#GENDER:
exp1_gender <- mydata %>% count(Gender)
exp1_gender %>%
gt() %>%
tab_header(title = "Experiment 1: Gender demographics")
| Experiment 1: Gender demographics | |
|---|---|
| Gender | n |
| 1 | 126 |
| 2 | 168 |
Experiments 1 and 2 had an identical 2 x 2 between-subjects design. The first independent variable (IV) was Headline Conflict, whereby participants were randomly assigned to see headlines containing either non-conflicting or conflicting claims about six nutrition topics (levels = Conflict and Consistent). The second IV was Headline Format, whereby headline claims were either generically attributed to experts (e.g., “Scientists say…”) or hedged with a qualifier (e.g., “Some scientists say…”) (levels = Generic and Qualified). This resulted in four experimental conditions: participants were assigned to see headlines that were either Generic/Conflicting, Generic/Non-conflicting, Qualified/Conflicting or Qualified/Non-conflicting.
mydata <- separate(mydata, Condition, c("block", "number", "Format", "Conflict")) #split "Block_1_Generic_Conflict"-style strings into four columns
#Change
mydata$Format <- as.factor(mydata$Format)
mydata$Conflict <- as.factor(mydata$Conflict)
sapply(mydata, class) #(optional) to check
## Finished Duration (in seconds) Gender
## "numeric" "numeric" "numeric"
## Age Serious_check Recall_score
## "numeric" "numeric" "numeric"
## block number Format
## "character" "character" "factor"
## Conflict contradiction_1 contradiction_2
## "factor" "numeric" "numeric"
## contradiction_3 contradiction_4 contradiction_5
## "numeric" "numeric" "numeric"
## contradiction_6 confusion advancement
## "numeric" "numeric" "numeric"
#Rename
mydata$Conflict <- recode_factor(mydata$Conflict,
Conflict = "Conf.", #old name = new name
Consistent = "Non-Conf.") #old name = new name
In Experiment 1, participants completed self-reports on 3 dependent variables:

PERCEIVED CONTRADICTION: participants rated the degree of contradiction between each of the six headline pairs; the six ratings are summed below into a single ‘contradiction’ score (a higher score indicates greater perceived contradiction).
#NEW variable: sum of the six contradiction ratings
mydata <- mydata %>%
mutate(contradiction = contradiction_1 + contradiction_2 + contradiction_3 + contradiction_4 + contradiction_5 + contradiction_6)
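(If the scale ever grew beyond six items, the same sum could be written more compactly with rowSums() and across(); this assumes the six columns sit adjacently, as the class check above suggests:)

#equivalent, more compact version of the mutate above
mydata <- mydata %>%
mutate(contradiction = rowSums(across(contradiction_1:contradiction_6)))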
PERCEIVED SCIENTIFIC ADVANCEMENT (PSA): participants were asked “When we take the results reported in these headlines together, do we now know more, less or the same as we did before about how to be healthy?”. They indicated their response on a 3-point scale (-1 = we know less, 0 = we know the same amount, 1 = we know more).
CONFUSION: participants were presented with the statement “The headlines I was asked to remember create confusion about how to be healthy” and indicated the extent to which they agreed or disagreed on a 5-point Likert scale. Scores are recorded under the ‘confusion’ variable (a higher score indicates a greater level of confusion).
#Export the FINAL data to a .csv
write_csv(mydata, "MyDataFinalSubset.csv") # data saved as "MyDataFinalSubset.csv" in my WD
Part of the pre-registered analysis involves a conflict manipulation check, whereby the mean score of participants exposed to conflicting headlines (Conflict condition) is compared, on EACH dependent variable, to the mean score of participants exposed to non-conflicting headlines (Consistent condition).
This comparison of conflicting vs non-conflicting group means serves to ensure that the conflict manipulation was perceived as intended (i.e. there is a main effect of Headline Conflict). Results presented in the paper indicate that this was achieved:
1. Participants exposed to conflicting headlines perceived greater CONTRADICTION between the six headline pairs (M = 25.3) than those exposed to non-conflicting headlines (M = 13.4).
2. Participants exposed to non-conflicting headlines perceived greater SCIENTIFIC ADVANCEMENT (PSA) (M = 0.007) than those exposed to conflicting headlines (M = -0.25).
3. Participants exposed to conflicting headlines indicated greater agreement that ‘the headlines create CONFUSION about how to be healthy’ (M = 4.52) than those exposed to non-conflicting headlines (M = 3.65).
##Conflicting vs non-conflicting group means (pre-registered analysis)
#Perceived Contradiction:
contradiction_means_byconflict <- mydata %>%
group_by(Conflict) %>% #Specify group indicator
summarise(Mean = mean(contradiction)) #name = function(specify column)
contradiction_means_byconflict %>%
gt() %>%
tab_header(title = "Perceived Contradiction means (grouped by Conflict)") %>%
fmt_number(c(Mean), decimals = 1)
| Perceived Contradiction means (grouped by Conflict) | |
|---|---|
| Conflict | Mean |
| Conf. | 25.3 |
| Non-Conf. | 13.4 |
#Scientific advancement
advancement_means_byconflict <- mydata %>%
group_by(Conflict) %>%
summarise(Mean = mean(advancement))
advancement_means_byconflict %>%
gt() %>%
tab_header(title = "PSA means (grouped by Conflict)") %>%
fmt_number(c(Mean), decimals = 3)
| PSA means (grouped by Conflict) | |
|---|---|
| Conflict | Mean |
| Conf. | −0.245 |
| Non-Conf. | 0.007 |
#Confusion
confusion_means_byconflict <- mydata %>%
group_by(Conflict) %>%
summarise(Mean = mean(confusion))
confusion_means_byconflict %>%
gt() %>%
tab_header(title = "Confusion means (grouped by Conflict)") %>%
fmt_number(c(Mean), decimals = 2)
| Confusion means (grouped by Conflict) | |
|---|---|
| Conflict | Mean |
| Conf. | 4.52 |
| Non-Conf. | 3.65 |
It then came time to tackle the beast! The violin plots reported in the paper for both Experiments 1 and 2 were coded using pirateplot() in the authors’ script; however, the code was not accompanied by any annotations explaining its structure or what any of the arguments meant! This meant that reproducing the plots with intention, rather than just copying and pasting code, would be a struggle. Take a look at this gibberish!:
*author’s code for experiment 1 plots*
In all seriousness, while extensive Google searches would have clarified this code, that came with a caveat: my group and I questioned how much coding knowledge we could gain from working backwards from an already established chunk of code provided by the authors. It seemed much more enticing, despite being more difficult, to apply the basic ggplot skills we learned from Danielle Navarro and expand our knowledge until we could produce the more complex code characteristic of the plots in the paper. This was the rationale for using ggplot() from the ggplot2 package within tidyverse, as opposed to pirateplots. It meant that I had to start from scratch, and I can’t say I would have been successful in achieving the desired plot aesthetics without the help of my group members.
Figure 1 presents descriptive statistics for each variable measured in Experiment 1. Specifically, violin plots visualise the distribution of the data at each point on the DV scale, horizontal bars with 95% confidence intervals represent the condition means, and swarm plots are layered onto the violins. As the contradiction, advancement and confusion plots each follow the same structure, I will explain my memorable challenges and triumphs in coding the contradiction plot, which also apply to the latter two (each function is explained in depth in my final code):
ggplot(mydata) +
geom_violin(aes(x = Format, y = contradiction)) +
facet_wrap(~ Conflict + Format)
I must admit that at this point I was already puzzled. I tried using facet_grid() but had no luck as the violins were still stacked, so I turned to my group for help. After employing facet_wrap(vars()) with Format alone, and changing my x variable to Conflict (as Jia suggested), the violins now plotted side by side!
ggplot(mydata) +
geom_violin(aes(x = Conflict, y = contradiction)) +
facet_wrap(vars(Format))
That looks much better! But still far off from the published plots.
ggplot(mydata) +
geom_violin(aes(x = Conflict, y = contradiction)) +
facet_wrap(vars(Format), strip.position = "bottom") +
ggtitle(label = "Contradiction") + #adding title
scale_y_continuous(name = "Perceived Contradiction", limits = c(0,30)) +
scale_x_discrete(name = NULL) + #removing x-axis title
theme(plot.title = element_text(hjust = 0.5)) #centering the title
YAY! We’re getting warmer.
ggplot(mydata, aes(x = Conflict, y = contradiction, fill = Conflict)) +
#fill = colour grouping based on Conflict type
geom_violin() +
facet_wrap(vars(Format), strip.position = "bottom") +
stat_summary(
mapping = NULL,
data = NULL,
geom = "crossbar",
fun.data = "mean_ci") + #INSTEAD OF mean_se
geom_beeswarm() +
ggtitle(label = "Contradiction") +
scale_y_continuous(
name = "Perceived Contradiction",
limits = c(0,30)) +
scale_x_discrete(name = NULL) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.position = 'none') #removing legend
I must say I am pleasantly surprised with these plots; data-wise they are identical to the published plots!
ggplot(mydata,aes(x = Conflict, y = contradiction, fill = Conflict)) +
geom_violin() +
facet_wrap(vars(Format), strip.position = "bottom") +
stat_summary(fun.data = "mean_cl_normal",
geom = "crossbar",
fill = "white", #change the fill of the crossbars
alpha = .7) + #change the opacity of the fill
geom_beeswarm(cex = 0.2) + #'cex' = scaling for adjusting point spacing
theme_bw() + #change background to white
ggtitle(label = "Contradiction") +
easy_center_title() +
easy_remove_legend() +
scale_x_discrete(name = NULL) +
scale_y_continuous(name = "Perceived Contradiction", limits = c(0,30)) +
scale_fill_manual(values = c("slategray2", "lightpink1")) #change violin fill colours
Surely these same lines of code do not have to be repeated for Advancement and Confusion plots…
Given that each individual plot in the paper is derived from near-identical code, in the spirit of the D.R.Y. principle (Don’t Repeat Yourself) it seemed only logical to write a function for experiments 1 and 2 that would condense our code and avoid the redundancy of creating each plot separately:
For experiment 1, the three plots (Contradiction, Advancement and Confusion) were formed by CREATING our own function() to apply the same statements to all plots using a single chunk of code. The unique aspects of each plot (the Y-axis variable, titles and y-axis limits) were specified upon calling the function.
Here is a rundown of the function:
##Violin ggplots
#Making our own function
violin_fun <- function(dataset, y_var, plot_title, y_title, lim_1, lim_2) {
ggplot(data = dataset, aes(x = Conflict, y = y_var, fill = Conflict)) +
geom_violin() +
facet_wrap(vars(Format), strip.position = "bottom") +
stat_summary(fun.data = "mean_cl_normal", geom = "crossbar", fill = "white",
alpha = .7) +
geom_beeswarm(cex = 0.2) +
theme_bw() +
ggtitle(label = plot_title) +
easy_center_title() +
easy_remove_legend() +
scale_x_discrete(name = NULL) +
scale_y_continuous(name = y_title, limits = c(lim_1, lim_2)) +
scale_fill_manual(values = c("slategray2", "lightpink1")) }
#Plotting Contradiction, Advancement and Confusion plots using function
contradiction_plot <- violin_fun(dataset = mydata, y_var = mydata$contradiction, plot_title = "Contradiction", y_title = "Perceived Contradiction", lim_1 = 1, lim_2= 30)
advancement_plot <- violin_fun(dataset = mydata, y_var = mydata$advancement, plot_title = "Advancement", y_title = "Perceived Scientific Advancement", lim_1 = -1, lim_2 = 1)
confusion_plot <- violin_fun(dataset = mydata, y_var = mydata$confusion, plot_title = "Confusion", y_title = "Confusion", lim_1 = 1, lim_2 = 5)
combinedplots1 <- contradiction_plot + advancement_plot + confusion_plot + plot_layout(ncol = 2)
print(combinedplots1)
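One refinement I considered afterwards: passing whole vectors like mydata$contradiction into aes() works because ggplot evaluates aes() expressions in the data first and then in the enclosing environment, but tidy evaluation with {{ }} would let the function take bare column names instead. A minimal sketch of that alternative (same packages assumed; violin_fun2 is my own hypothetical name, not part of my actual analysis):

violin_fun2 <- function(dataset, y_var, plot_title, y_title, lim_1, lim_2) {
#identical to violin_fun, except {{ y_var }} accepts a bare column name
ggplot(dataset, aes(x = Conflict, y = {{ y_var }}, fill = Conflict)) +
geom_violin() +
facet_wrap(vars(Format), strip.position = "bottom") +
stat_summary(fun.data = "mean_cl_normal", geom = "crossbar", fill = "white", alpha = .7) +
geom_beeswarm(cex = 0.2) +
theme_bw() +
ggtitle(label = plot_title) +
easy_center_title() +
easy_remove_legend() +
scale_x_discrete(name = NULL) +
scale_y_continuous(name = y_title, limits = c(lim_1, lim_2)) +
scale_fill_manual(values = c("slategray2", "lightpink1")) }
#e.g. violin_fun2(mydata, contradiction, "Contradiction", "Perceived Contradiction", 1, 30)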
Figure 2 presents descriptive statistics for the PSA variable in Experiment 1. Specifically, a histogram displays the frequency of ‘-1’, ‘0’ and ‘1’ scores on the advancement scale, indicating “the number of participants in each condition who felt the body of research reported in the headlines resulted in us knowing more than before (+1), less than before (-1) or the same as before (0)”.
At first glance it was quite daunting to see all the colour coded conditions and value labels applied to the x axis, as I had no clue how to go about coding this. Once again, a tidyverse article came to my rescue (https://ggplot2.tidyverse.org/reference/geom_bar.html).
#FIRST adding value labels
advancementvalue <- factor(mydata$advancement,
levels = c(-1, 0, 1),
labels = c("Less", "Same", "More")) #labeling the levels of advancement
advancement_histogram <- ggplot(mydata) +
geom_bar(aes(x = advancementvalue, group = Conflict:Format, fill = Conflict:Format),
position = "dodge") +
scale_x_discrete(name = "Advancement") +
scale_y_continuous(name = "Number of Participants") +
scale_fill_grey(name = "Condition",
labels = c("Conflicting/Generic",
"Conflicting/Qualified",
"Non-conflicting/Generic",
"Non-conflicting/Qualified"))
print(advancement_histogram)
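On reflection, creating advancementvalue outside the data frame relies on ggplot() finding it in the global environment; a tidier approach (equivalent, as far as I can tell) keeps the labelled factor inside mydata:

#store the labelled factor as a column, then map x = advancement_label in aes()
mydata <- mydata %>%
mutate(advancement_label = factor(advancement,
levels = c(-1, 0, 1),
labels = c("Less", "Same", "More")))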
Having worked hard to learn the functions necessary to analyse the descriptive data in Experiment 1, transferring these skills to Experiment 2 was rather handy and had the additional payoff of helping me consolidate what I had learned!
The same functions are used as those justified in Experiment 1.
mydata2 <- read_survey("Study_2_data.csv")
mydata2 <- as.data.frame(mydata2)
glimpse(mydata2)
sapply(mydata2, class) %>% as.data.frame.list() #relevant columns for calculations should be 'numeric'
mydata2 <- mydata2 %>% rename(Recall_score = SC0, Condition = FL_12_DO, confidence = GSS)
#new name = old name
mydata2 %>% summarise(n()) #412
## n()
## 1 412
mydata2$Prolific_PID[duplicated(mydata2$Prolific_PID)] #3 duplicates found
## [1] "5b8800b14fe6450001f07d71" "5ae8d7b596845f0001c7948d"
## [3] "5e68d2b5576136000ba058a2"
mydata2 <- mydata2[!duplicated(mydata2$Prolific_PID), ] #remove duplicate rows
mydata2 %>% summarise(n()) #no. of observations are now 409
## n()
## 1 409
#Applying pre-registered exclusion criteria and filtering variables
mydata2 <- mydata2 %>%
filter(
Consent == 1,
Finished == 1,
Serious_check == 1,
Recall_score >= 4) %>%
select(
Finished,
`Duration (in seconds)`,
Gender,
Age,
Serious_check,
Recall_score,
Condition,
NC_1:Development_sci_know_6)
mydata2 %>% summarise(n()) #final sample size = 400
## n()
## 1 400
write_csv(mydata2, "MyDataSubset2.csv") #data saved as "MyDataSubset2.csv" in my WD
#AGE: mean, SD, and range
exp2_age <- mydata2 %>% summarise(Mean = mean(Age),
SD = sd(Age),
Range_min = min(Age),
Range_max = max(Age))
exp2_age %>%
gt() %>%
tab_header(
title = "Experiment 2: Age Demographics") %>%
fmt_number(c(Mean, SD), decimals = 1)
| Experiment 2: Age Demographics | |||
|---|---|---|---|
| Mean | SD | Range_min | Range_max |
| 33.5 | 12.0 | 18 | 73 |
#GENDER:
exp2_gender <- mydata2 %>% count(Gender)
exp2_gender %>%
gt() %>%
tab_header(
title = "Experiment 2: Gender Demographics")
| Experiment 2: Gender Demographics | |
|---|---|
| Gender | n |
| 1 | 150 |
| 2 | 248 |
| 3 | 2 |
Experiment 2 adopts a 2 x 2 between-subjects design identical to Experiment 1. The factors are Headline Conflict (levels = Conflict and Consistent) and Headline Format (levels = Generic and Qualified), yielding the same four experimental conditions: Generic/Conflicting, Generic/Non-conflicting, Qualified/Conflicting and Qualified/Non-conflicting.
#Separate
mydata2 <- separate(mydata2, Condition, c("block", "number", "Format", "Conflict"))
#Change
mydata2 <- mydata2 %>%
mutate(Format=as.factor(Format)) %>%
mutate(Conflict=as.factor(Conflict))
sapply(mydata2, class) #(optional) to check
## Finished Duration (in seconds) Gender
## "numeric" "numeric" "numeric"
## Age Serious_check Recall_score
## "numeric" "numeric" "numeric"
## block number Format
## "character" "character" "factor"
## Conflict NC_1 NC_2
## "factor" "numeric" "numeric"
## NC_3 NC_4 NC_5
## "numeric" "numeric" "numeric"
## NC_6 NBS_1 NBS_2
## "numeric" "numeric" "numeric"
## NBS_3 NBS_4 NBS_5
## "numeric" "numeric" "numeric"
## NBS_6 Mistrust_expertise_1 Mistrust_expertise_2
## "numeric" "numeric" "numeric"
## Mistrust_expertise_3 confidence Certainty_sci_know_1
## "numeric" "numeric" "numeric"
## Certainty_sci_know_2 Certainty_sci_know_3 Certainty_sci_know_4
## "numeric" "numeric" "numeric"
## Certainty_sci_know_5 Certainty_sci_know_6 Development_sci_know_1
## "numeric" "numeric" "numeric"
## Development_sci_know_2 Development_sci_know_3 Development_sci_know_4
## "numeric" "numeric" "numeric"
## Development_sci_know_5 Development_sci_know_6
## "numeric" "numeric"
#Rename
mydata2$Conflict <- recode_factor(mydata2$Conflict,
Conflict = "Conf.", #old name = new name
Consistent = "Non-Conf.")
Experiment 2 adopts six dependent variables different from those in Experiment 1, measuring more global beliefs about nutrition and the development of science:
##New variables: calculate the average for each scale
#nutritional confusion mean
mydata2 <- mydata2 %>%
mutate(confusion = (NC_1 + NC_2 + NC_3 + NC_4 + NC_5 + NC_6)/6)
#nutritional backlash mean
mydata2 <- mydata2 %>%
mutate(backlash = (NBS_1 + NBS_2 + NBS_3 + NBS_4 + NBS_5 + NBS_6)/6)
#Mistrust of expertise mean
mydata2 <- mydata2 %>%
mutate(mistrust = (Mistrust_expertise_1 + Mistrust_expertise_2 +
Mistrust_expertise_3)/3)
#Certainty of knowledge mean
mydata2 <- mydata2 %>%
mutate(certainty = (Certainty_sci_know_1 + Certainty_sci_know_2 +
Certainty_sci_know_3 + Certainty_sci_know_4 + Certainty_sci_know_5 +
Certainty_sci_know_6)/6)
#Development of knowledge mean
mydata2 <- mydata2 %>%
mutate(development = (Development_sci_know_1 + Development_sci_know_2 +
Development_sci_know_3 + Development_sci_know_4 + Development_sci_know_5 +
Development_sci_know_6)/6)
write_csv(mydata2, "MyDataFinalSubset2.csv") #data saved as "MyDataFinalSubset2.csv" in my WD
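These five near-identical mutate() calls are another spot where the D.R.Y. principle could apply: the scale means could equivalently be computed with rowMeans() and across(), assuming the item columns sit adjacently (as the class check above indicates):

#equivalent, more compact computation of the five scale means
mydata2 <- mydata2 %>%
mutate(confusion   = rowMeans(across(NC_1:NC_6)),
backlash    = rowMeans(across(NBS_1:NBS_6)),
mistrust    = rowMeans(across(Mistrust_expertise_1:Mistrust_expertise_3)),
certainty   = rowMeans(across(Certainty_sci_know_1:Certainty_sci_know_6)),
development = rowMeans(across(Development_sci_know_1:Development_sci_know_6)))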
Figure 3 in Haigh & Birch’s (2020) paper presents the descriptive statistics for each variable measured in Experiment 2, as was done for Experiment 1.
As the six plots (Nutritional Confusion, Nutritional Backlash, Mistrust of Expertise, Confidence in Scientific Community, Certainty of Knowledge and Development of Knowledge) were created using the same violin-plot format, I reused the ‘violin_fun’ function I created for Experiment 1, keeping it D.R.Y.! The unique aspects of each plot (the y-axis variable, y-axis label and title) and the limits that differ from Experiment 1 were specified upon calling the function.
#Plotting the six Experiment 2 plots using the violin_fun function
nutritionalconfusion_plot <- violin_fun(dataset = mydata2, y_var = mydata2$confusion, plot_title = "Nutritional Confusion", y_title = "Nutritional Confusion", lim_1 = 1, lim_2 = 5)
nutritionalbacklash_plot <- violin_fun(dataset = mydata2, y_var = mydata2$backlash, plot_title = "Nutritional Backlash", y_title = "Nutritional Backlash", lim_1 = 1, lim_2 = 5)
mistrust_plot <- violin_fun(dataset = mydata2, y_var = mydata2$mistrust, plot_title = "Mistrust of Expertise", y_title = "Mistrust of Expertise", lim_1 = 1, lim_2 = 5)
confidence_plot <- violin_fun(dataset = mydata2, y_var = mydata2$confidence, plot_title = "Confidence in Scientific Community", y_title = "Confidence in Scientific Community", lim_1 = 1, lim_2 = 3)
certainty_plot <- violin_fun(dataset = mydata2, y_var = mydata2$certainty, plot_title = "Certainty of Knowledge", y_title = "Certainty of Knowledge", lim_1 = 1, lim_2 = 5)
development_plot <- violin_fun(dataset = mydata2, y_var = mydata2$development, plot_title = "Development of Knowledge", y_title = "Development of Knowledge", lim_1 = 1, lim_2 = 5)
combinedplots2 <- nutritionalconfusion_plot + nutritionalbacklash_plot + mistrust_plot + confidence_plot + certainty_plot + development_plot + plot_layout(ncol = 2)
print(combinedplots2)
In Experiment 1, there was a significant effect of Headline Conflict on perceived contradiction, advancement (PSA) and confusion (those who saw conflicting headlines felt they were more contradictory, more confusing, and implied we now know less about how to be healthy than those who saw the non-conflicting headlines). I was thus interested to see whether this conflict manipulation produces similar effects across genders.
#Rename
mydata$Gender <- recode_factor(mydata$Gender,
"1" = "Male", #old name = new name
"2" = "Female")
#Change
mydata <- mydata %>%
mutate(Gender=as.factor(Gender))
library(ggpubr)
#Creating function
gender_boxplot_fun <- function(y_var, plot_title, y_title) {
#^formal arguments to be manually specified for each plot
ggboxplot(
mydata, x = "Conflict", y = y_var, #specify data and variables
color = "Conflict", #colour code points by Conflict
add = "jitter", #jittered dot points
title = plot_title, ylab = y_title, xlab = "Headline Conflict") +
facet_wrap(vars(Gender), #group conditions by Gender
strip.position = "bottom") + #move gender labels to bottom
easy_center_title() +
easy_remove_legend() }
#Creating Contradiction, Advancement and Confusion boxplots using the function
gender_contradiction_boxplot <- gender_boxplot_fun(y_var = "contradiction", plot_title = "Gender differences in Conflict manipulation: Contradiction", y_title = "Perceived Contradiction")
gender_advancement_boxplot <- gender_boxplot_fun(y_var = "advancement", plot_title = "Gender differences in Conflict manipulation: Advancement", y_title = "Perceived Scientific Advancement")
gender_confusion_boxplot <- gender_boxplot_fun(y_var = "confusion", plot_title = "Gender differences in Conflict manipulation: Confusion", y_title = "Confusion")
#Printing boxplots
print(gender_contradiction_boxplot)
print(gender_advancement_boxplot)
print(gender_confusion_boxplot)
Judging by the dot points, there does not seem to be a noticeable difference between males and females in the distribution of scores on the perceived contradiction scale, nor on the confusion scale. For male advancement scores, however, I can see a greater density of points towards the top of the scale, while females have a higher density of points towards the bottom. I can also see that the interquartile range of advancement scores lies higher for males exposed to non-conflicting headlines relative to conflicting headlines, whereas this is not evident for females. This leads me to believe that the effect of the conflict manipulation is greater for males than females; however, we won’t know for sure until we conduct a statistical analysis.
Assessing group means directly allows for an alternative descriptive comparison of scores for each gender…
##Conflicting vs non-conflicting group means
#Perceived Contradiction:
mydata %>%
group_by(Gender, Conflict) %>% # Specify data frame and group indicators
summarise(Mean = mean(contradiction), # Specify column and functions:
SD = sd(contradiction),
n = n(),
SE = SD/sqrt(n)) %>%
gt() %>%
tab_header(title = "Effect of Headline Conflict on Perceived Contradiction (grouped by Gender)") %>%
fmt_number(c(Mean, SD, SE), decimals = 2)
## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.
| Effect of Headline Conflict on Perceived Contradiction (grouped by Gender) | ||||
|---|---|---|---|---|
| Conflict | Mean | SD | n | SE |
| Male | ||||
| Conf. | 24.97 | 3.97 | 60 | 0.51 |
| Non-Conf. | 13.39 | 3.67 | 66 | 0.45 |
| Female | ||||
| Conf. | 25.46 | 3.56 | 87 | 0.38 |
| Non-Conf. | 13.49 | 3.98 | 81 | 0.44 |
#Perceived Scientific Advancement (PSA):
mydata %>%
group_by(Gender, Conflict) %>%
summarise(Mean = mean(advancement),
SD = sd(advancement),
n = n(),
SE = SD/sqrt(n)) %>%
gt() %>%
tab_header(title = "Effect of Headline Conflict on PSA (grouped by Gender)") %>%
fmt_number(c(Mean, SD, SE), decimals = 2)
## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.
| Effect of Headline Conflict on PSA (grouped by Gender) | ||||
|---|---|---|---|---|
| Conflict | Mean | SD | n | SE |
| Male | ||||
| Conf. | −0.10 | 0.71 | 60 | 0.09 |
| Non-Conf. | 0.12 | 0.67 | 66 | 0.08 |
| Female | ||||
| Conf. | −0.34 | 0.59 | 87 | 0.06 |
| Non-Conf. | −0.09 | 0.66 | 81 | 0.07 |
#Confusion:
mydata %>%
group_by(Gender, Conflict) %>%
summarise(Mean = mean(confusion),
SD = sd(confusion),
n = n(),
SE = SD/sqrt(n)) %>%
gt() %>%
tab_header(title = "Effect of Headline Conflict on Confusion (grouped by Gender)") %>%
fmt_number(c(Mean, SD, SE), decimals = 2)
## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.
| Effect of Headline Conflict on Confusion (grouped by Gender) | ||||
|---|---|---|---|---|
| Conflict | Mean | SD | n | SE |
| Male | ||||
| Conf. | 4.57 | 0.59 | 60 | 0.08 |
| Non-Conf. | 3.58 | 1.10 | 66 | 0.13 |
| Female | ||||
| Conf. | 4.49 | 0.70 | 87 | 0.07 |
| Non-Conf. | 3.70 | 1.05 | 81 | 0.12 |
By manually assessing the means I can see that advancement scores are much lower for females, and that the difference between conflicting and non-conflicting headlines is greater for males than females. This does not seem apparent for the contradiction or confusion scales. I can also see that there are more females than males in the sample; with fewer males, the male group means are estimated with less precision (note the larger SEs), so they may be less reflective of the true population values.
Since the difference in advancement scores between males and females caught my attention, I initially thought to compare the average scores of females and males within EACH level of the Headline Conflict factor: Conflict (Conf.) and Consistent (Non-Conf.). I did so using INDEPENDENT-SAMPLES T-TESTS (one per level) via the ttestIS() function in the jmv package.
#I filtered the data twice to compare males and females within the Conflicting and Consistent conditions separately
conflict <- mydata %>%
filter(Conflict == "Conf.")
consistent <- mydata %>%
filter(Conflict == "Non-Conf.")
#T-test comparing means of males and females
ttestIS(formula = advancement ~ Gender, data = conflict)
##
## INDEPENDENT SAMPLES T-TEST
##
## Independent Samples T-Test
## ────────────────────────────────────────────────────────────────────
## Statistic df p
## ────────────────────────────────────────────────────────────────────
## advancement Student's t 2.286083 145.0000 0.0236983
## ────────────────────────────────────────────────────────────────────
ttestIS(formula = advancement ~ Gender, data = consistent)
##
## INDEPENDENT SAMPLES T-TEST
##
## Independent Samples T-Test
## ────────────────────────────────────────────────────────────────────
## Statistic df p
## ────────────────────────────────────────────────────────────────────
## advancement Student's t 1.893226 145.0000 0.0603203
## ────────────────────────────────────────────────────────────────────
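As a sanity check on jmv’s output, the same Student’s t-tests can be reproduced in base R (var.equal = TRUE requests the Student’s rather than Welch’s statistic, matching the output above):

#base-R cross-check of the two t-tests above
t.test(advancement ~ Gender, data = conflict, var.equal = TRUE)
t.test(advancement ~ Gender, data = consistent, var.equal = TRUE)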
Results from the t-tests reveal a statistically significant difference between male (M = -0.10, SD = 0.71) and female (M = -0.34, SD = 0.59) perceived scientific advancement scores when exposed to conflicting headlines (t(145) = 2.29, p < .05), but no significant difference between males (M = 0.12, SD = 0.67) and females (M = -0.09, SD = 0.66) exposed to the non-conflicting headlines (t(145) = 1.89, p > .05). Hence, referring back to the group means calculated earlier, females exposed to conflicting headlines perceived that scientific knowledge had advanced to a lesser extent (M = -0.34) than males did (M = -0.10).
Although I arrived at a significant result, I realised that the t-tests I conducted for PSA cannot detect a main effect or an interaction between the two factors, so I switched my approach: a two-way ANOVA on each perceived conflict measure is required to investigate this.
Specifically, I questioned…
Is there a main effect of Gender on contradiction, scientific advancement and/or confusion (averaged across Headline Conflict)? And does the effect of the conflict manipulation (Conflict vs. Consistent) differ as a function of Gender (Male vs. Female), i.e. is there a significant interaction between Headline Conflict and Gender?
#ANOVA tests (formula = measurement variable ~ Factor 1 * Factor 2, data)
ANOVA(formula = advancement ~ Conflict * Gender, data = mydata)
##
## ANOVA
##
## ANOVA - advancement
## ──────────────────────────────────────────────────────────────────────────────────────
## Sum of Squares df Mean Square F p
## ──────────────────────────────────────────────────────────────────────────────────────
## Conflict 4.13300569 1 4.13300569 9.78581314 0.0019375
## Gender 3.67816208 1 3.67816208 8.70886941 0.0034251
## Conflict:Gender 0.02485749 1 0.02485749 0.05885565 0.8084853
## Residuals 122.48053717 290 0.42234668
## ──────────────────────────────────────────────────────────────────────────────────────
ANOVA(formula = contradiction ~ Conflict * Gender, data = mydata)
##
## ANOVA
##
## ANOVA - contradiction
## ────────────────────────────────────────────────────────────────────────────────────────
## Sum of Squares df Mean Square F p
## ────────────────────────────────────────────────────────────────────────────────────────
## Conflict 9954.864784 1 9954.864784 694.3783858 < .0000001
## Gender 6.317831 1 6.317831 0.4406856 0.5073191
## Conflict:Gender 2.778006 1 2.778006 0.1937733 0.6601224
## Residuals 4157.547018 290 14.336369
## ────────────────────────────────────────────────────────────────────────────────────────
ANOVA(formula = confusion ~ Conflict * Gender, data = mydata)
##
## ANOVA
##
## ANOVA - confusion
## ────────────────────────────────────────────────────────────────────────────────────────
## Sum of Squares df Mean Square F p
## ────────────────────────────────────────────────────────────────────────────────────────
## Conflict 57.01942311 1 57.01942311 72.05365069 < .0000001
## Gender 0.05540684 1 0.05540684 0.07001588 0.7915016
## Conflict:Gender 0.72126228 1 0.72126228 0.91143645 0.3405287
## Residuals 229.49056078 290 0.79134676
## ────────────────────────────────────────────────────────────────────────────────────────
I was glad to see that the results of each ANOVA replicate Haigh & Birch’s (2020) finding that the conflict manipulation was perceived as intended, indicated by significant main effects of Headline Conflict on perceived scientific advancement (p = .002), perceived contradiction (p < .001) and confusion (p < .001).
For the confusion and contradiction measures, there was no significant main effect of Gender and no Gender x Conflict interaction. For the advancement measure, there was likewise no significant Gender x Conflict interaction (F(1, 290) = 0.06, p > .05); however, there was a significant main effect of Gender (F(1, 290) = 8.71, p < .01).
To interpret the direction of this main effect I drew on the table of group means calculated earlier, and conclude that, averaged across the Conflict conditions, females perceived that scientific knowledge had advanced to a lesser extent than males. However, the size of the conflict manipulation’s effect (Conflict vs. Consistent) did not differ as a function of Gender.
In the face of huge leaps in scientific findings surrounding nutrition and diet, it would be naive to say that all age groups hold the same beliefs and the same willingness to accept changing recommendations from experts…wouldn’t it? I decided to assess participants’ levels of confidence in the scientific community (whereby higher scores indicate lower confidence). This stems from my personal experience of the opinions of older friends and relatives, who often display more skepticism towards nutritional and dietary advice, and towards developments in science more broadly. On the other hand, I tend to notice my younger friends and relatives being more open towards developments in science, and I personally am no stranger to the fact that disproving a theory is just as important as proving one (thanks to my psychology courses).
For this reason, I ask: is there a correlation between age and confidence in the scientific community, whereby older people are less confident in the scientific community than their younger counterparts?
mydata3b <- mutate(mydata2, age_group = cut_number(Age, n = 7)) #bin Age into 7 groups of roughly equal size
age_confidence_means <- mydata3b %>% # Specify data frame
group_by(age_group) %>% # Specify group indicators
summarise(Mean = mean(confidence), # Specify column and functions:
SD = sd(confidence), #name = function
n = n(),
SE = SD/sqrt(n))
age_confidence_means %>%
gt() %>%
tab_header(title = "Confidence in the Scientific Community (grouped by Age)") %>%
fmt_number(c(Mean, SD, SE), decimals = 2)
#I couldn't knit to a pdf when I included this output, so I hid results and took a screenshot below
It appears that, with increasing age, mean scores on the confidence scale increase slightly. It may be that age is moderately correlated with confidence; however, we won’t know for sure until we conduct a statistical analysis.
#Assessing scores across all conditions
age_confidence_scatterplot <- ggscatter(
mydata2, x = "Age", y = "confidence",
point = FALSE,
title = "Age and Confidence in the Scientific Community",
ylab = "Confidence in the Scientific Community") +
geom_jitter(colour = "orange") +
geom_smooth(method = "lm") +
stat_cor(method = "pearson", p.accuracy = 0.001, r.accuracy = 0.01) +
easy_center_title()
print(age_confidence_scatterplot)
Again, it’s hard to tell from the dot points alone whether scores increase as a function of age, but the line of best fit does slope upwards, which makes me think there is a correlation of some degree! And according to Pearson’s correlation test, there is a weak positive correlation (r = .13, p < .01) between age and confidence scale scores.
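The same Pearson test that stat_cor() prints on the plot can also be run directly, for the fuller output (t statistic, df and confidence interval):

#direct cross-check of the correlation shown on the scatterplot
cor.test(mydata2$Age, mydata2$confidence, method = "pearson")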
From this exploratory analysis I can conclude that, as higher scores on the scale reflect less confidence in the scientific community, the higher one’s age the lower their confidence in the scientific community, although this relationship is only weak in strength.
I am keen to understand the relationships between the measured variables, to better understand the rationale behind the experimenters’ hypotheses. Namely, the experimenters hypothesised that those exposed to conflicting headlines would display greater confusion, backlash and mistrust and lower confidence, but would also, positively, demonstrate more sophisticated beliefs that scientific knowledge is uncertain and constantly developing.
corr_data <- mydata2 %>%
select(confusion, backlash, mistrust, confidence, certainty, development) %>%
cor()
corrplot(corr_data, method = "color", type = "lower")
Observing this correlation matrix, I can see a roughly moderate positive relationship among the negative global beliefs (nutritional confusion, nutritional backlash, mistrust, and the confidence scale, where higher scores mean lower confidence), which I expected. The same holds for the positive beliefs, certainty x development of scientific knowledge. I found it interesting that the relationships between the positive and negative beliefs were roughly weak. However, at this point I began to doubt my coding choice, as I would be more confident seeing explicit r values and significance levels, rather than colours.
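(One quick remedy, before abandoning corrplot, would have been to overlay the coefficients on the matrix itself via corrplot’s addCoef.col argument; it still wouldn’t show significance levels, though, which is why I turned to corrMatrix() from jmv below:)

#same matrix, with r values printed in each cell
corrplot(corr_data, method = "color", type = "lower", addCoef.col = "black")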
corrMatrix(mydata2,
vars(confusion, backlash, mistrust, confidence, certainty, development))
##
## CORRELATION MATRIX
##
## Correlation Matrix
## ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## confusion backlash mistrust confidence certainty development
## ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## confusion Pearson's r —
## p-value —
##
## backlash Pearson's r 0.4664511 —
## p-value < .0000001 —
##
## mistrust Pearson's r 0.2594596 0.4602377 —
## p-value 0.0000001 < .0000001 —
##
## confidence Pearson's r 0.4092529 0.3327275 0.5122005 —
## p-value < .0000001 < .0000001 < .0000001 —
##
## certainty Pearson's r -0.0884529 -0.0514938 -0.2901025 -0.1170295 —
## p-value 0.0772306 0.3042629 < .0000001 0.0192159 —
##
## development Pearson's r -0.0929888 -0.0987324 -0.2675103 -0.1186132 0.4253782 —
## p-value 0.0631713 0.0484624 < .0000001 0.0176336 < .0000001 —
## ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
That is much better! It is directly apparent that the negative beliefs are at least moderately positively correlated with each other (r values of at least .3, with the exception of mistrust x confusion, r = .26), and these relationships are statistically significant (p < .001)! Beliefs in the certainty and development of scientific knowledge also share a moderate positive relationship (r = .43, p < .001).
There are significant, weak negative correlations between beliefs in the certainty or development of scientific knowledge and the negative beliefs (with the exceptions of certainty x confusion, certainty x backlash and development x confusion, p > .05). This is definitely insightful, as it is in line with my initial thought that people with stronger negative beliefs would not demonstrate the more sophisticated belief that science is uncertain and constantly developing. However, the experimenters expected a common theme in scores on the negative global beliefs, whereby CONFLICTING HEADLINES would induce a greater sense of general Nutritional Confusion, greater Nutritional Backlash, greater mistrust of expertise and lower confidence in the Scientific Community (as well as greater positive beliefs). It was therefore important that I accounted for this by assessing the relationships between the variables within each conflict condition.
While I didn’t plot all measurement pairs, I decided to plot Nutritional Confusion against Nutritional Backlash to obtain correlation coefficients for each level of Headline Conflict.
confusion_backlash_scatterplot <- ggscatter(
mydata2, x = "confusion", y = "backlash",
point = FALSE,
title = "Nutritional Confusion and Nutritional Backlash, grouped by Conflict",
ylab = "Nutritional Backlash", xlab = "Nutritional Confusion") +
stat_cor(p.accuracy = 0.001, r.accuracy = 0.01) +
geom_jitter(aes(colour = Conflict)) +
facet_grid(vars(Conflict)) + #grouping scores by conflict condition
geom_smooth(method = "lm") +
easy_center_title() +
easy_remove_legend()
print(confusion_backlash_scatterplot)
The significant, moderate positive relationship between Nutritional Confusion and Nutritional Backlash also holds when participants were presented with conflicting headlines! (r = .52, p < 0.001).
This is also the case for the certainty x development relationship (r = .36, p < 0.001):
certainty_development_scatterplot <- ggscatter(
mydata2, x = "certainty", y = "development",
point = FALSE,
title = "Certainty and Development of Scientific Knowledge, grouped by Conflict",
ylab = "Certainty of Scientific Knowledge", xlab = "Development of Scientific Knowledge") +
geom_jitter(aes(colour = Conflict)) +
facet_grid(vars(Conflict)) + #grouping scores by conflict condition
geom_smooth(method = "lm") +
easy_center_title() +
easy_remove_legend()
print(certainty_development_scatterplot)
I struggle to grasp how those exposed to conflicting headlines, alongside holding stronger negative beliefs, would demonstrate more sophisticated beliefs that scientific knowledge is uncertain and constantly developing (as the experimenters proposed). Results from the following plot indicate that, in the Conflict condition, participants’ mistrust of expertise was significantly and negatively correlated with the belief that science is uncertain (r = -.22, p < 0.01).
certainty_mistrust_scatterplot <- ggscatter( #renamed so it doesn't overwrite the previous plot
mydata2, x = "certainty", y = "mistrust",
point = FALSE,
title = "Mistrust of Expertise and Certainty of Scientific Knowledge, grouped by Conflict",
ylab = "Certainty of Scientific Knowledge", xlab = "Mistrust of Expertise") +
geom_jitter(aes(colour = Conflict)) +
facet_grid(vars(Conflict)) +
geom_smooth(method = "lm") +
stat_cor(p.accuracy = 0.001, r.accuracy = 0.01) +
easy_center_title() +
easy_remove_legend()
print(certainty_mistrust_scatterplot)
It thus seems much more plausible that those holding negative beliefs about science as a result of reading conflicting headlines would demonstrate weaker sophisticated beliefs in the uncertainty and development of science. After all, those lacking this understanding would be more inclined to reject changes to theories that were once supported, creating confusion and animosity towards scientific consensus.
Data is much easier to understand, and plots/statistics more effectively reproducible, if variables and their values are clearly labelled. In the current datasets from Haigh & Birch (2020) it is apparent that, in more than one instance, variable names and levels needed to be changed to more logical ones for the sake of plotting and of understanding the code in general. For example, the Recall_score/Condition/confidence variables were originally named SC0/FL_12_DO/GSS, and the Gender variable consisted of levels 1 and 2 to denote males and females, which needed to be manually relabelled. However, none of these variables were explained in annotations to the code, nor were males/females clearly labelled as 1 or 2 in the dataset, so it was a matter of deciphering the operations of the code chunks or going right back to the source to see the numbered options for male and female in the survey document. This is not always common knowledge to coders, especially novice ones.
The authors could have improved their code descriptions, or gone further and produced a codebook: a document that provides an overview of the variables in a dataset, describing their purpose and levels. It was also apparent in my attempts to decipher ‘Condition’ that some variables contained strings that weren’t easy to interpret (it held original inputs in the format “Block_1_Generic_Conflict”, etc.). It is therefore important that researchers ensure their variable names follow the same language as the article, whether that requires additional variables to individually specify Format/Conflict or explicitly stating in a codebook what must be done to separate the variable. Jeff Leek and colleagues provide great tips on ‘How to share data with a statistician’ (https://github.com/jtleek/datasharing). Professor Benjamin Le also provides guidance on what makes research transparent in open science, and delves into specific codebook practices that assist the reproducibility of data.
Leading on from my previous point, code should be well annotated so that those who are unfamiliar with it are still able to reproduce the published results. This means explaining the role each chunk of code plays in the analysis, commenting specific lines, detailing the function of particular packages, and stating the order in which to run scripts (if required). Other coders will then know which sections produce which analysis in the article, and which properties of the code are essential to aspects of the figures. In the current R scripts provided by the authors, such annotation would have been particularly useful for the pirate plots in Figures 1 and 3, as only the bare code was given in the script, which made me, a new coder with prior knowledge of ggplot only, hesitant to use it.
It is also essential that all analyses described in the article are included in the script. I was uncertain from the wording of the article whether the researchers applied unplanned exclusions to Experiment 2 as they did in Experiment 1 (where duplicate Prolific IDs were detected). Their Experiment 2 script also did not include code applying this exclusion, despite my finding three duplicate IDs in the dataset. I removed the duplicates anyway and correctly arrived at the final sample; I also ran the analysis without removing them and arrived at the same N. It may be that the later planned exclusion criteria filtered these observations out regardless, but some detail on this would definitely have been helpful. A good example of well-annotated code is provided by Weston and Jackson (2018). You can find their materials here: https://osf.io/s92i4
Coders mustn’t underestimate the importance of an organised folder of files when publishing to OSF. It is extremely important that external coders know where to source the right data for a particular analysis, as well as which scripts are necessary for it. Unfortunately, the authors’ materials and data on OSF were not at all consistent between Experiments 1 and 2. Not only were Experiments 1 and 2 labelled study 8 and study 7 respectively (which was highly annoying), but files were also missing from Experiment 1: while both an .Rmd and an .R script were provided for the second experiment, only an .R script was provided for Experiment 1. Consistency is much appreciated by external coders.
Also, a final tidied dataset was provided for study 2 (‘Study 7 data (tidied & with exclusion criteria applied N=400).csv’) but not for study 1. It would be more helpful to keep this consistent, as many coders may want to use the final datasets rather than filtering the data and creating the subset of variables themselves. Having access to a tidied file is also useful for cross-checking that exclusion criteria or mutations of the data were applied correctly (there were several instances where I wanted to check my coding along the way but couldn’t). You can find an awesome tutorial on project directory structure, ‘Transition your workflow, keep your sanity’ by Danielle Navarro, via this link: https://slides.com/djnavarro/workflow#/20.
The end :)