Do Generic Attributions to ‘Scientists’ and ‘Experts’ Amplify Perceived Conflict?

Might be easier to read in .html format on: https://rpubs.com/andersonly333/assignment_verification

Context

The paper was about how the constant conflicting information found in news headlines as well as their wording can have a strong effect on the reader’s interpretation. Specifically, one of the main factors that can cause confusion about and increase perceived conflict between experts/scientists is the tendency to generically attribute the headline claim to “Researchers”, “Experts” or “Scientists”. However, the researchers predicted that by including a qualifier word (“some” in “some researchers/experts”) could trigger a scalar inference (“some but not all”) and reduce the perceived confusion and backlash. Thus, using either conflicting and non-conflicting headlines about nutrition, the study investigated whether qualified (“some”) or generic (“all”) headlines were made less contradicting and less confusing (experiment 1). It also investigated whether the inclusion of a qualifier could reduce the conflicting headlines’ impact on overall beliefs about nutrition and the scientific community (experiment 2). The results showed that, compared to non-conflicting headlines, conflicting headlines increased perceived conflict, confusion and advancement of scientific knowledge but did not affect more global beliefs about nutrition and the scientific community. Lastly, the qualifiers had no significant impact on any of these measures.

Verification

Set Up

Install Packages (if not already installed)

# install.packages("tidyverse")
# install.packages("qualtRics")
# install.packages("yarrr")
# install.packages("here")
# install.packages("grid")
# install.packages("gridExtra")
# install.packages("broom")

Load Packages

First, we load the external packages that we will use while also giving them some background and justifying their uses.

The researchers’ used 17 packages and didn’t really explain them in much detail. We tried to limit the no. of external packages to only the essentials.

The packages are:

tidyverse

Collection of useful R packages particularly:
- ggplot2 for plotting and creating graphics
- dplyr for general data manipulation
- tidyr for tidying data and the separate() function which allows us to split one column into multiple by using “_” as a delimiter

qualtRics

Reads and assigns correct class to the .csv data generated from .qsf (qualtrics survey file) where metadata often leaks into rows 2 and 3 causing all variables to be read as text rather than numeric
Images below show the result of using read_csv() from readr

yarr (we refer to this as “the pirate package”)

The researchers use this to generate pirate plots - a plot that combines violin, mean/95 CI boxplot and jitter geoms
We used ggplot2 instead but will still use this package to compare our method to the researchers’

here

Quality of life; Makes it easier to navigate files
set file location with i_am() - here() is a replacement for file.path() that starts from file location

grid & gridExtra

Both help format and combine multiple plots into one image
rectGrob() from grid
grid.arrange() from gridExtra
Used for our pirate plots

broom

Used to tabulate statistical results in a tidy way i.e. results of 95% CIs
Not actually used in our final version; was used in one of our old methods

library(tidyverse) 
library(qualtRics) 
library(yarrr)  
library(here)  
library(grid)
library(gridExtra)
library(broom)

Set File Location using i_am()

here::i_am("Assignment_Draft.Rmd")

Experiment 1

Read the Data

read_survey() from qualtRics package reads and assigns correct classes of variables in .csv data generated from .qsf (qualtrics survey file) solving the issue mentioned in the Load Packages section.

data_study_1_all = read_survey(here("data", "Study 8 data.csv"))

CONVERT data_study_1_all to a data.frame.

data_study_1_all <- as.data.frame(data_study_1_all)
print(data_study_1_all)

Count survey opens and consent

For most of this verification, we will opt for a pipe operator structure with tidyverse functions compared to the base R functions used by the researchers. This will apply to most of this verification as we are more familiar with them and find them easier to read and understand.

Before we make any changes to the df, we use dplyr to filter() the dataframe (df) for rows where progress >= 1 and count() them. This counts the total number of times the survey was opened. Compare this to the researchers’ base R method: “length(which(data_study_1_all$Progress>=1))”.

Next, we filter() the df for participants who clicked the consent button (consent == 1) and count() them.

According to the article,

no. of times survey was opened = 371
no. of participants that consented (from code not article) = 363

# Count times the survey was opened (i.e. anyone with progress >= 1%)
data_study_1_all %>% 
  filter(Progress >= 1) %>% 
  count()

##     n
## 1 371

# Count no. of participants who consented (before unplanned/planned exclusions)
data_study_1_all %>% 
  filter(Consent == 1) %>% 
  count()

##     n
## 1 363

This matches the researchers’ data with 371 surveys opened in the df and 363 participants clicking the consent button.

Tidy new data frame for experiment 1

We now create a new df, data_study_1, that is distinct from the raw unedited one, data_study_1_all. This df will be adapted to fit the needs of the analysis and the plots.

The four stages of tidying our data for both experiments will be:

renaming columns
exclusions: unplanned (exp 1 only) and planned
separating the condition column
summing scores

1. Rename columns

The first adaptation is renaming the confusing column names using rename() from dplyr.

data_study_1 <- data_study_1_all %>% 
  rename(
    recall_score = SC0,
    condition = FL_10_DO 
  )

2. Exclusions

Unplanned Exclusions

Now, we have to to do some unplanned exclusions because some of the participants received an error message on study completion and decided to complete the study a second time.

These participants can be identified and their second attempts can be excluded using the “Prolific_PID” variable

Once again dplyr was used to group rows by Prolific_Id so they can be identified and counted.

We then use dplyr distinct() so that these repeated values are filtered.

According to the article, there were 59 unplanned exclusions leaving 312 participants left.

# List out the duplicated prolific IDs.
data_study_1 %>% 
  group_by(Prolific_PID) %>% 
  summarise(n = n()) %>% # Counts how many times each PID occurs in data
  filter(n > 1) %>%  # Filter ones that show up more than once
  ungroup() # Prevents future tibbles from being grouped by PID

## # A tibble: 59 × 2
##    Prolific_PID              n
##    <chr>                 <int>
##  1 truncated559ab96cfdf9     2
##  2 truncated57eee744e627     2
##  3 truncated584bb2b8bd87     2
##  4 truncated5b570f4cc146     2
##  5 truncated5be1cc598c6a     2
##  6 truncated5c28b31a0091     2
##  7 truncated5c3d48955fd1     2
##  8 truncated5c62d8e2a341     2
##  9 truncated5c72de7b96a9     2
## 10 truncated5c76f2d92819     2
## # ℹ 49 more rows

# Remove rows from duplicated IDs using distinct()
data_study_1 <- data_study_1 %>% 
  distinct(Prolific_PID, .keep_all = TRUE ) # .keep_all preserves other columns

# Count how many rows are left
data_study_1 %>% 
  filter(Progress >= 1) %>% 
  count()

##     n
## 1 312

This matches the researchers’ data with 59 unplanned exclusions, leaving 312 participants.

Planned Exclusions

Now, we need to exclude rows where participants did not meet preregistered inclusion criteria.

Create new data frame with dplyr filter() and select() so that it abides these conditions:

only includes participants who finished the study (Finished==1), declared that they answered seriously (seriousness_check==1) AND scored 4 or above on recall - i.e. excluding non-completers, non-serious responses and those who did not recall 4 or more items
only includes relevant columns required for analysis

We then count() the no. of participants remaining.

According to the article, after all these exclusions the final sample should consist of 294 participants.

data_study_1 <- data_study_1 %>% 
  filter(
    Finished == 1 & 
    Serious_check == 1 & 
    recall_score >= 4
) %>% 
  select(
    Finished, 
    `Duration (in seconds)`, 
    Gender, 
    Age, 
    Serious_check, 
    recall_score, 
    condition, 
    contradiction_1:advancement
  )

#Count how many participants remain after exclusions
data_study_1 %>% 
  count()

##     n
## 1 294

This matches the article. Thus, the final sample size of experiment 1 is 294.

3. separate() ‘condition’ column

We now need to create separate columns to identify the four levels of the IV. The values of the ‘condition’ column are currently written like this:

“Block_1_Generic_Conflict”

The separate() function from tidyr splits it into four separate columns using underscore as the delimiter so that we get four columns:

‘block’ is a meaningless column
‘number’ is meaningless (until we found a use for it later when plotting)
- Corresponds to which of the four conditions a row belongs to
‘Format’ and ‘Conflict’ are the two IVs, each with two levels
- Format: Generic/Qualified
- Conflict: Conflict/Consistent

I renamed Conflict and Consistent to Conf. and Non-Conf. respectively just like the researchers. This is something that is not part of my group’s verification but to recreate the researchers’pirate plot as a comparison later on, I need to do this here.

We then convert the two IV columns from character to factor class using as.factor and the extractor operator, $, which i use to select a particular variable in the df.

data_study_1 <- data_study_1 %>% separate(
    col = "condition",
    into = c("block", "number", "Format", "Conflict"))

# Rename so you don't have to rename in pirate plots
# (Only relevant for researchers' plot/Doesn't affect our plot)
data_study_1 <- data_study_1 %>%
  mutate(Conflict = case_when(
   Conflict == "Conflict" ~ "Conf.",
   Conflict == "Consistent" ~ "Non-Conf."
  ))

#set these new IV columns as factors 
data_study_1$Format <- as.factor(data_study_1$Format)
data_study_1$Conflict <- as.factor(data_study_1$Conflict)
levels(data_study_1$Format)

## [1] "Generic"   "Qualified"

levels(data_study_1$Conflict)

## [1] "Conf."     "Non-Conf."

4. Sum scores

For the last step of tidying the df, we mutate() a new column called “contradiction” that is the row sum of all the columns that start with “contradiction_” which are the six contradiction scores in the df.

We also use glimpse() to see what the final tidied df looks like.

# create new variable that is the row sum of the six contradiction ratings 
data_study_1 <- data_study_1 %>% 
 mutate(contradiction = rowSums(across(starts_with("contradiction_"))))

# final df of experiment 1 can be written to .csv using this
# write.csv(data_study_1, file = "data_study_1Tidied.csv")

# Final look at final df with variables of interest
glimpse(data_study_1)

## Rows: 294
## Columns: 19
## $ Finished                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ `Duration (in seconds)` <dbl> 428, 364, 384, 632, 430, 543, 363, 354, 377, 4…
## $ Gender                  <dbl> 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2…
## $ Age                     <dbl> 50, 23, 28, 24, 26, 27, 19, 21, 19, 57, 38, 47…
## $ Serious_check           <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ recall_score            <dbl> 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 6, 7, 7, 7…
## $ block                   <chr> "Block", "Block", "Block", "Block", "Block", "…
## $ number                  <chr> "3", "3", "2", "2", "1", "4", "2", "1", "3", "…
## $ Format                  <fct> Qualified, Qualified, Generic, Generic, Generi…
## $ Conflict                <fct> Conf., Conf., Non-Conf., Non-Conf., Conf., Non…
## $ contradiction_1         <dbl> 5, 5, 1, 3, 4, 1, 1, 4, 4, 5, 1, 2, 2, 1, 1, 5…
## $ contradiction_2         <dbl> 5, 5, 2, 4, 4, 3, 1, 4, 5, 4, 2, 3, 2, 4, 1, 5…
## $ contradiction_3         <dbl> 4, 5, 1, 2, 4, 1, 5, 2, 2, 5, 1, 2, 2, 2, 1, 5…
## $ contradiction_4         <dbl> 4, 5, 1, 2, 4, 4, 1, 4, 3, 3, 4, 3, 2, 3, 4, 4…
## $ contradiction_5         <dbl> 4, 5, 4, 2, 4, 1, 1, 3, 2, 5, 1, 2, 2, 1, 3, 5…
## $ contradiction_6         <dbl> 5, 5, 3, 1, 4, 1, 1, 4, 5, 3, 1, 2, 4, 2, 1, 5…
## $ confusion               <dbl> 4, 5, 1, 4, 4, 2, 5, 4, 5, 5, 1, 3, 5, 2, 2, 5…
## $ advancement             <dbl> -1, 0, 1, -1, 0, 0, -1, 0, -1, -1, 0, 0, -1, 0…
## $ contradiction           <dbl> 27, 30, 12, 14, 24, 11, 10, 21, 21, 25, 10, 14…

This is much concise than our old method:

data_study_1 <- data_study_1 %>% 
  mutate(contradiction1 = 
    contradiction_1 + 
    contradiction_2 + 
    contradiction_3 +
    contradiction_4 + 
    contradiction_5 + 
    contradiction_6
  )

Demographics

Participant Age and Gender

The researchers’ demographic age and gender data for experiment 1 is as follows:

126 males, 168 females (sums to 294 = final sample size)
Minimum Age = 18, Maximum Age = 69
Mean Age = 34.29, SD Age = 12.97

We can verify the age demographic data by using dplyr summarise() again to calculate the mean, standard deviation, minimum and maximum of the Age variable. These are all conveniently placed into a df called study_1_age.

For gender, we had to read through the researchers’ code to understand what gender response each number value in the Gender column represented. With this in mind, we use mutate() with case_when() so that values of 1, 2, 3, 4 in the Gender column are converted to Male, Female, Other and Prefer_not_say respectively.

We then use dplyr group_by(Gender) and summarise(n()) in order to create a new df with the count of each gender. However, we find out that there are no Other or Prefer_not_say responses in the df but feel that it is important to include that information in the final demographic df. So, we use add_row() with “count” arguments of 0 in order to include them.

# Age
study_1_age <- data_study_1 %>% 
  summarise(
    mean_age = round(mean(Age), 2), # Rounded to 2 dp
    sd_age = round(sd(Age), 2),
    min_age = min(Age),
    max_age = max(Age)
  )

# Gender
data_study_1 <- data_study_1 %>% 
  mutate(Gender = case_when(
   Gender == "1" ~ "Male", 
   Gender == "2" ~ "Female",
   Gender == "3" ~ "Other",
   Gender == "4" ~ "Prefer_not_say" # Gender coded as 1 to 4 in data
  ))

# Count gender 
study_1_gender <- data_study_1 %>%
  group_by(Gender) %>% 
  summarise(Count = n()) %>% 
  ungroup() 

print(study_1_gender)

## # A tibble: 2 × 2
##   Gender Count
##   <chr>  <int>
## 1 Female   168
## 2 Male     126

# We still want to include a count of zero for "Other" or "Prefer_not_say"
study_1_gender <- study_1_gender %>% 
  add_row(
    Gender = c("Other", "Prefer not to say"), 
    Count = c(0, 0)
  ) 

# Print all demographics
print(study_1_age)

##   mean_age sd_age min_age max_age
## 1    34.29  12.97      18      69

print(study_1_gender)

## # A tibble: 4 × 2
##   Gender            Count
##   <chr>             <dbl>
## 1 Female              168
## 2 Male                126
## 3 Other                 0
## 4 Prefer not to say     0

# 168 + 126 = 294 = Final Sample Size

Matches the article.

Mean Completion Time

We calculate the mean completion time by using the dplyr summarise() function to calculate the mean of the ‘Duration (in seconds)’ variable across all participants. This mean is placed into a df. We then use mutate() to form a new column in this df that divides that mean value by 60 in order to convert it into minutes.

According to the article, mean completion time should be 8.48 minutes.

data_study_1 %>% 
  summarise(mean_completion_time_sec = mean(`Duration (in seconds)`)) %>%
  mutate(mean_completion_time_min = mean_completion_time_sec/60)

##   mean_completion_time_sec mean_completion_time_min
## 1                 508.8197                 8.480329

Matches the article.

Pirate plots / CI Descriptives

For this section, we thought that relying on the researchers’ method of using the pirate package to recreat the figure above felt a bit like “cheating”. It also felt really bad because we weren’t familiar with its function/arguments. Instead we attempted to recreate the pirate plots without the pirate package and using ggplot2 (something that we were familiar with). We initially thought it would be easy because a pirate plot is just three geoms: violin, jitter and box. At least that it what we initially thought.

Our long coding journey

Below we discuss the three major issues we encountered across our coding journey (with some images and past iterations of code):

Issue 1: CI Calculation

The pirate package dumbs down the process of obtaining and displaying the mean and 95% CI for each variable. We found using base R and ggplot to do this tedious. Our original method consisted of 12 separate CI tests using t.test() for each condition & variable (4 * 3 = 12 in total). This was very long winded but was accurate. Secondly, we tried calculating the CI’s manually with dplyr functions in a piped structure but this provided inaccurate CIs. Thirdly, we found a much more concise method using a combination of both base R, t.test(), and dplyr functions, group_by() and summarise(). Using these results, we mapped the results of the CI to the hinges using the lower, middle and upper arguments of the box geom. However, we finally found out that we could just use statsummary() and mean_cl_normal from ggplot2 to plot these CIs. However, the exact values of each 95 CI will still be calculated later on and will be important in our comparison section.

The following is how our method of calculating 95 CIs and 95 CI box plots evolved over time. We only use the contradiction measure for our examples and set eval=FALSE so that these chunks don’t run/give output.

This was our first, long, tedious method of calculating CIs:

# Contradiction
contradiction_t1 <- data_study_1 %>%
  subset(number == 1) %>% 
  t.test(.$contradiction, conf.level = 0.95, data = .) # '.' is what's piped

contradiction_t2 <- data_study_1 %>%
  subset(number == 2) %>% 
  t.test(.$contradiction, conf.level = 0.95, data = .)

contradiction_t3 <- data_study_1 %>%
  subset(number == 3) %>% 
  t.test(.$contradiction, conf.level = 0.95, data = .)

contradiction_t4 <- data_study_1 %>%
  subset(number == 4) %>% 
  t.test(.$contradiction, conf.level = 0.95, data = .)

# map_df() is from purrr/tidyverse and tidy argument is from broom.
# Both work to create neat dataframe containing CI results 
contradiction_tab <- map_df(
  list(contradiction_t1, contradiction_t2, contradiction_t3, contradiction_t4),         tidy)

glimpse(contradiction_tab)

# Select relevant columns
contradiction_tab[c("estimate", "statistic", "p.value", "conf.low", "conf.high")]

This was our second, flawed, mathematical method of calculating CIs:

# Contradiction
contradiction_tab <- data_study_1 %>%
  group_by(number) %>%
  mutate(
    n = n()-1,
    mean = mean(contradiction),
    SD = sd(contradiction),
    SE = SD/sqrt(n),
    tc = qt(p=.05, df=n-1, lower.tail=F),
    ub = mean + tc * SE,
    lb = mean - tc * SE,
    ) %>%
  select(n, mean, SD, SE, tc, ub, lb) %>%
  distinct(number, .keep_all = TRUE) %>%
  arrange(number) %>%
  ungroup()

print(contradiction_tab)

Third (okay) method of calculating CIs:

contradiction_tab <- data_study_1 %>% 
  group_by(number) %>% 
  summarise(tidy(t.test(contradiction))) %>% 
  select(estimate, statistic, p.value, conf.low, conf.high) %>% 
  ungroup()
print(contradiction_tab)

# Mapped into boxplot geom later
geom_boxplot(
  ymax = contradiction_tab$conf.high, # Using CI results
  ymin = contradiction_tab$conf.low,
  upper = contradiction_tab$conf.high,
  lower = contradiction_tab$conf.low,
  middle = contradiction_tab$estimate,
  coef = 0, # Removes whiskers
  width = 0.5,
  outliers = FALSE,
  alpha = 0.7,
  linewidth = 1.1
) +

Final, Best Method: We simply use stat_summary and mean_cl_normal from ggplot2. This calculates means and CIs and goes straight into plot as a crossbar geom (has no whiskers, worked better than boxplot)

# We will use this in our pirate plots soon
stat_summary( 
    fun.data = "mean_cl_normal",  
    geom = "crossbar", 
    fill = "white", 
    alpha = .7,
    width = 0.5,
    linewidth = 1.1
  ) +
  
# We can later see the means and CIs like this:
contradiction_tab <- data_study_1 %>%
  group_by(Format,Conflict) %>%
  summarise(ci = mean_cl_normal(contradiction)) %>%
  unnest(cols = c(ci)) %>% # Expands out the nested values in 'ci'
  rename(Mean=y, Lower=ymin, Upper=ymax) # Rename the new columns
print(contradiction_tab)

Issue 2: Secondary Labels

The pirate plots have very specific placements of labels:

The secondary ‘Conflict’ and ‘Format’ labels on the left were difficult to recreate. We tried to replicate this at first using facet_wrap but this didn’t create the desired effect. I thought of the unconventional approach of creating a separate dataframe with the name of the label (e.g. ‘Conflict’) and, by eye, plotting it on an x y value that matched the location under the x axis of the plot using a text geom as well as using coord_cartesian(clip=“off”). This overrides the clipping of the cartesian plane so that the graph does not stretch the limits to include the text geom within the plot. However, I later found an an even easier method by using the annotate() function.

This code chunk below shows my original method:

# Original Method
ann_text_conflict <- data.frame(label = "Conflict")
ann_text_format <- data.frame(label = "Format")

  # This part was in the plot code later on
geom_text(data = ann_text_format, mapping = aes(x=0.5,y=-1.55, label = label), size = 3) +
geom_text(data = ann_text_format, mapping = aes(x=0.5,y=-2.2, label = label), size = 3) +
coord_cartesian( 
  ylim=c(0,30),
  clip="off"
) +

This shows our final method and the one I will use in my plot:

annotate(geom = "text", x = 0.5, y = -1.55, size = 3, label = "Conflict") + 
annotate(geom = "text", x = 0.5, y = -2.2, size = 3, label = "Format") +
coord_cartesian(
  ylim=c(0,30),
  clip="off"
) +

Issue 3: x axis variable

Finally, the plotting of two x variables (‘Format’ vs ‘Conflict’) was difficult because usually to achieve this you would facet_wrap the ‘Format’ variable but that causes even more issues (e.g. text geom labels double up, panel border shows up in middle). Instead I used the ‘number’ variable, that I initially thought was meaningless, to plot 4 distinct factors. However, the only thing in our entire plot that we could not replicate was the horizontal spacing between the left two ‘Generic’ geoms and the right two ‘Qualified’ geoms. But considering there are labels under that specify which of the two Formats they belong to, it should not affect the interpretation of the plot.

Pirate Plots (non-pirate package/our version)

To create our pirate plots, we use ggplot2 to plot the three main geoms violin, jitter and the mean/CI crossbar with x = the condition number, y = the measure and fill = the two conflict conditions. Then, we add the secondary labels (discussed before). We also, by eye, match a lot of the aesthetics of the original plot by adjusting the geom’s arguments, adding the titles/labels and also the overall formatting/theme of the plot. This is mostly just getting rid of the default grey background and adding in a black border. We do this for the three plots inside the figure.

At the end, we use grid.arrange to combine the plots and save the final plot using .png and dev.off as it allows us to resize the entire figure to match the researchers’.

Also note that there were so many iterations of pirate plot below but due to length of each plot’s code, I decided not to show our coding journey through an iterative structure but by discussing our main issues above.

Contradiction:

The only things that we have to alter between plots are the y variable, violin widths, secondary label coords, limits, major break sizes and the inclusion of minor breaks (this will come into play in the next section).

contradiction_plot <- ggplot(
  data = data_study_1,
  mapping = aes(
    x = number, 
    y = contradiction,
    fill = Conflict
  )
) +
geom_violin(
  width = 0.4, # Violin widths differ across plots
  linewidth = 0.9 
) +
geom_jitter(
  stroke = 0, # Jitter arguments are the same across pirate plots
  size = 1.5, 
  alpha = 0.3, 
  position = position_jitter(
    height = 0, 
    width = 0.04
  )
) +
stat_summary( 
    fun.data = "mean_cl_normal", # Calculate mean + 95 CI
    geom = "crossbar", # Display it through crossbar geom
    fill = "white", # crossbar arguments are the same across pirate plots
    alpha = .7,
    width = 0.5,
    linewidth = 1.1
  ) +
# 'Conflict' and 'Format' labels on left side
annotate(geom = "text", x = 0.5, y = -1.55, size = 3, label = "Conflict") + 
annotate(geom = "text", x = 0.5, y = -2.2, size = 3, label = "Format") +
coord_cartesian( 
  ylim=c(0,30), # Set y limits
  clip="off" # Override clipping
) +
labs(
  title = "Contradiction", 
  x = "Generic                                           Qualified", 
# White space added to create illusion of two Format labels 
  y = "Perceived Contradiction"
) +
theme( # All the aesthetics are matched by eye to the original
  plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
  axis.text.y = element_text(angle = 90, hjust=0.5), # Rotate y axis label
  panel.grid.minor = element_blank(), 
  panel.background = element_rect(fill = NA),
  panel.grid.major.x = element_blank(), # ^ Remove background/most grid lines
  panel.grid.major.y = element_line( # Keep major horizontal grid lines
    linewidth = 0.3,
    color="gray"
  ),
  panel.border = element_rect( # Black border around outside of plot
    fill = NA,
    colour = "black",
    linewidth = 0.5,
  ),
  legend.position = "none" # Hide legend
) +
scale_x_discrete(labels = c("Conf.", "Non-Conf.", "Conf.", "Non-Conf.")) +
scale_y_continuous(breaks = seq(0, 30, by = 5), # Adjust y axis major breaks
                   expand = expansion(add = c(1,1)) # Expansion beyond y limit
) +
scale_fill_manual(values = c( # Remember fill = Conflict , so
  "slategray2", # Blue = Conflicting Headline
  "lightpink1" # Red = Non-Conflicting
))

plot(contradiction_plot)

# These will be combined into final .png later so they don't look perfect yet

Advancement:

Repeat

advancement_plot <- ggplot(
  data = data_study_1,
  mapping = aes(
    x = number, 
    y = advancement,
    fill = Conflict
  )
) +
geom_violin(
  width = 1,
  linewidth = 0.9
) +
geom_jitter(
  stroke = 0,
  size = 1.5, 
  alpha = 0.3, 
  position = position_jitter(
    height = 0, 
    width = 0.04
  )
) +
stat_summary( 
    fun.data = "mean_cl_normal",  
    geom = "crossbar", 
    fill = "white", 
    alpha = .7,
    width = 0.5,
    linewidth = 1.1
  ) +
annotate(geom = "text", x = 0.5, y = -1.14, size = 3, label = "Conflict") + 
annotate(geom = "text", x = 0.5, y = -1.185, size = 3, label = "Format") +
coord_cartesian(
  ylim=c(-1,1),
  clip="off"
) +
labs(
  title = "Advancement", 
  x = "Generic                                           Qualified", 
  y = "Perceived Scientific Advancement"
) +
theme(
  plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
  axis.text.y = element_text(angle = 90, hjust=0.5),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = NA),
  panel.grid.major.x = element_blank(),
  panel.grid.major.y = element_line(
    linewidth = 0.3, 
    color="gray"
  ),
  panel.grid.minor.y = element_line(
    linewidth = 0.3, 
    color="gray"
  ),
  panel.border = element_rect(
    fill = NA, 
    colour = "black", 
    linewidth = 0.5,
  ),
  legend.position = "none"
) +
scale_x_discrete(labels = c("Conf.", "Non-Conf.", "Conf.", "Non-Conf.")) +
scale_y_continuous(breaks = seq(-1, 1, by = 1),
                   minor_breaks = seq(-1, 1, by = 0.2), # Adjust minor breaks
                   expand = expansion(add = c(0.1,0.1))
) +
scale_fill_manual(values = c(
  "slategray2", 
  "lightpink1"
))

plot(advancement_plot)

Confusion:

Repeat

confusion_plot <- ggplot(
  data = data_study_1,
  mapping = aes(
    x = number, 
    y = confusion,
    fill = Conflict
  )
) +
geom_violin(
  width = 1.1,
  linewidth = 0.9
) +
geom_jitter(
  stroke = 0,
  size = 1.5, 
  alpha = 0.3, 
  position = position_jitter(
    height = 0, 
    width = 0.04
  )
) +
stat_summary( 
    fun.data = "mean_cl_normal",  
    geom = "crossbar", 
    fill = "white", 
    alpha = .7,
    width = 0.5,
    linewidth = 1.1
  ) +
annotate(geom = "text", x = 0.5, y = 0.725, size = 3, label = "Conflict") + 
annotate(geom = "text", x = 0.5, y = 0.62, size = 3, label = "Format") +
coord_cartesian(
  ylim=c(1,5),
  clip="off"
) +
labs(
  title = "Confusion", 
  x = "Generic                                           Qualified", 
  y = "Perceived Confusion"
) +
theme(
  plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
  axis.text.y = element_text(angle = 90, hjust=0.5),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = NA),
  panel.grid.major.x = element_blank(),
  panel.grid.major.y = element_line(
    linewidth = 0.3, 
    color="gray"
  ),
  panel.grid.minor.y = element_line(
    linewidth = 0.3, 
    color="gray"
  ),
  panel.border = element_rect(
    fill = NA, 
    colour = "black", 
    linewidth = 0.5,
  ),
  legend.position = "none"
) +
scale_x_discrete(labels = c("Conf.", "Non-Conf.", "Conf.", "Non-Conf.")) +
scale_y_continuous(breaks = seq(1, 5, by = 1),
                   minor_breaks = seq(1, 5, by = 0.5),
                   expand = expansion(add = c(0.2,0.2))
) +
scale_fill_manual(values = c(
  "slategray2", 
  "lightpink1"
))

plot(confusion_plot)

Combine plots:

Finally, we combine the three pirate plots together using grid.arrange from gridExtra. Exporting this to .png ends up with ugly grey rectangle in bottom right quadrant so we create a white graphical object with rectGrob() from grid to fill that space in.

We use png() and dev.off() to create an image with the desired dimensions. The size of figure’s image in article is 793x1121 so that is what we use as reference for the heiight and width of the png.

We also tried ggsave() initially but it was problematic and caused clipping issues.

# Combine
pirate_all <- grid.arrange(
  contradiction_plot, advancement_plot, confusion_plot, 
  ncol = 2, nrow = 2
)

# Create white graphical object for empty quadrant of figure 
whitegrob <- rectGrob(gp = gpar( # Graphical Parameters
  fill= "white",
  col = "white" # col is the border colour
  )
)

# Combine with white grob
pirate_all <- grid.arrange(
  contradiction_plot, advancement_plot, confusion_plot, whitegrob, 
  ncol = 2, nrow = 2
)

# This figure shows why we need to resize

# Save resized image of plot
png(filename = "pirate_all.png", width = 793, height = 1121)
plot(pirate_all) # Final plot is the png saved after this
dev.off()

## quartz_off_screen 
##                 2

This is a good replication of the article’s plot but the code behind it is REALLY long (~300 lines) but we found a solution to this.

Functions

Useful, helped us learn to use functions: https://bookdown.org/ndphillips/YaRrr/a-worked-example-plot-advanced.html

Rewriting the plot we did as a function (pirate.fun) saves us a lot of time and stops us from repeating ourselves too much (DRY principle). The body of pirate.fun is essentially the same as our previous pirate plots but with the arguments that we need to manually change in each of our now parameterised. We also “mathematically” standardise some arguments so that we don’t have to manually parameterise more arguments. For example:

The y axis expansion value has been standardised into (lim2-lim1)/30. This causes all plots created using this function to look like they have the same amount of expansion.
Furthermore, the placement of the secondary ‘Conflict’ and ‘Format’ labels are standardised using the two lim parameters. E.g. for the Conflict label it is placed at y = (lim1 - 1.55*((lim2-lim1)/30)). This should set the location at standardised y under the x axis across the plots.

I should note that this function is specific to our use/our data. For example, the fill argument uses Conflict as an hard coded argument which is specific to our data.

pirate.fun(): Create a pirate plot

Description

Creates a pirate plot with customisable labels, limits, breaks, violin width, etc.

Input:

data: what dataframe are we using
y_var: what measure is being plotted on the y axis
plot_title: title of plot
y_title: label title of y axis
lim1: lower limit of y axis
lim2: upper limit of y axis
break_size: how big each major break is (in y axis value)
add.minor.break: true if you want minor breaks, false if you don’t
- Uses a Boolean expression so that it can be “switched” on and off: “if(add.minor.break){} else{}”
no.minor.breaks: desired number of minor breaks between major breaks
violin_width: how wide the violin geom is

Output:

pirate plot graph with:

Jitter showing raw data
Violin showing distribution
Crossbar showing mean and 95 CIs

Example:

See below

Packages:

ggplot2 for data visualisation

The Pirate Plot Function

pirate.fun <- function(data, y_var, plot_title, y_title, lim1, lim2, break_size, add.minor.break, no.minor.breaks = 0, violin_width) {
  
# Boolean for minor breaks
minor_break <- if (add.minor.break) {
    element_line(
      linewidth = 0.3,
      color = "gray"
    )
  } else {
    element_blank()  # Otherwise no lines
  }

# Plot
ggplot(
  data = data,
  mapping = aes(
    x = number, 
    y = {{y_var}}, # {{ }} indicates variable in dataset
    fill = Conflict
  )
) +
geom_violin(
  width = violin_width, # Violin width varies so it is a parameter
  linewidth = 0.9
) +
geom_jitter(
  stroke = 0,
  size = 1.5, 
  alpha = 0.3, 
  position = position_jitter(
    height = 0, 
    width = 0.04
  )
) +
stat_summary( 
    fun.data = "mean_cl_normal",  
    geom = "crossbar", 
    fill = "white",    
    alpha = .7,
    width = 0.5,
    linewidth = 1.1
  ) +
annotate( 
  geom = "text", 
  x = 0.5, 
  y = (lim1 - 1.55*((lim2-lim1)/30)), # location of labels are standardised
  size = 3, 
  label = "Conflict"
) +
annotate(
  geom = "text", 
  x = 0.5, 
  y = (lim1 - 2.55*((lim2-lim1)/30)), 
  size = 3, label = "Format"
) +
coord_cartesian(
  ylim=c(lim1, lim2),
  clip="off"
) +
labs(
  title = plot_title, 
  x = "Generic                                           Qualified", 
  y = y_title
) +
theme(
  plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
  axis.text.y = element_text(angle = 90, hjust=0.5),
  panel.background = element_rect(fill = NA),
  panel.grid.major.x = element_blank(),
  panel.grid.major.y = element_line(
    linewidth = 0.3, 
    color="gray"
  ),
  panel.border = element_rect(
    fill = NA, 
    colour = "black", 
    linewidth = 0.5,
  ),
  legend.position = "none",
  panel.grid.minor.y = minor_break
) +
scale_x_discrete(labels = c("Conf.", "Non-Conf.", "Conf.", "Non-Conf.")) +
scale_y_continuous(breaks = seq(lim1, lim2, by = break_size),
                   minor_breaks = seq(lim1, lim2, by = (1/no.minor.breaks)),
                   # So that y axis/divided desired amount of minor breaks
                   expand = expansion(mult = c(0,0),
                                      add = c(  # standardises expansion
                                        (lim2-lim1)/30, 
                                        (lim2-lim1)/30
))) +
scale_fill_manual(values = c(
  "slategray2", 
  "lightpink1"
)) 
  
}

Calling the function

We already know what each parameter value for each plot should be.

# Contradiction
contradiction_function_plot <- pirate.fun(
  data = data_study_1,
  y_var = contradiction, 
  plot_title = "Contradiction", 
  y_title = "Perceived Contradiction", 
  lim1 = 0, 
  lim2 = 30, 
  break_size = 5, 
  add.minor.break = FALSE,
  violin_width = 0.4
)
plot(contradiction_function_plot)

# Advancement
advancement_function_plot <- pirate.fun(
  data = data_study_1,
  y_var = advancement, 
  plot_title = "Advancement", 
  y_title = "Perceived Scientific Advancement", 
  lim1 = -1, 
  lim2 = 1, 
  break_size = 1, 
  add.minor.break = TRUE,
  no.minor.breaks = 5,
  violin_width = 1
)
plot(advancement_function_plot)

# Confusion
confusion_function_plot <- pirate.fun(
  data = data_study_1,
  y_var = confusion, 
  plot_title = "Confusion", 
  y_title = "Perceived Confusion", 
  lim1 = 1, 
  lim2 = 5, 
  break_size = 1, 
  add.minor.break = TRUE,
  no.minor.breaks = 2,
  violin_width = 1.1
)
plot(confusion_function_plot)

Combine plots

Same method

pirate_all_function <- grid.arrange(
  contradiction_function_plot, advancement_function_plot, 
  confusion_function_plot, whitegrob, 
  ncol = 2, nrow = 2
)

png(filename="pirate_all_function.png", width=793, height=1121)
pirate_all_function # Final plot is the png saved after this

## TableGrob (2 x 2) "arrange": 4 grobs
##   z     cells    name                grob
## 1 1 (1-1,1-1) arrange      gtable[layout]
## 2 2 (1-1,2-2) arrange      gtable[layout]
## 3 3 (2-2,1-1) arrange      gtable[layout]
## 4 4 (2-2,2-2) arrange rect[GRID.rect.354]

dev.off()

## quartz_off_screen 
##                 2

Creates the same plot in a much more succinct way (300 vs 50 lines)

Pirate Plots (pirate package/researchers’ version)

The following is researchers’ code to create their pirate plots. They use yarr::pirateplot() function.

We run it here so that:

we can check that it produces the same plot in the article and
we can use it as a comparison to our own plot

# Using dimensions found in original article 793×1121
png(filename="plot_pirate.png", width=793, height=1121)

# will place plots in a grid with 2 rows and 2 columns
par(mfrow = c(2, 2))

pirateplot(formula = contradiction ~ Conflict*Format, data = data_study_1,inf.method = 'ci', yaxt = "n", ylim =c(0,30), theme=1,main = "Contradiction", ylab = "Percieved Contradiction",cex.names = 0.75, cex.lab = 0.9)
axis(2, at = seq(from = 0, to = 30, by = 5))

pirateplot(formula = advancement ~ Conflict*Format, data = data_study_1,inf.method = 'ci', yaxt = "n", theme=1,main = "Advancement", ylab = "Percieved Scientific Advancement",cex.names = 0.75, cex.lab = 0.9)
axis(2, at = seq(from = -1, to = 1, by = 1))


pirateplot(formula = confusion ~ Conflict*Format, data = data_study_1,inf.method = 'ci', yaxt = "n", theme=1,main = "Confusion", ylab = "Percieved Confusion",cex.names = 0.75, cex.lab = 0.9)
axis(2, at = seq(from = 1, to = 5, by = 1))

#to set the grid back to default (just one plot) use the command below 
par(mfrow = c(1, 1))

dev.off()

## quartz_off_screen 
##                 2

This shows that:

Plot created by researchers’ code = Plot found in the article

We will now use the above image to compare it the plot created by our method.

Comparison/Sanity Check

As a final sanity check to test if these two versions of the pirate plot provide the same estimates of the mean and 95 CI limits, we need to calculate the actual values of the means and 95 CIs used in both plotting methods and compare them.

Our CI Descriptives

We can calculate the this by grouping by the Format and Conflict (the 4 conditions), summarising using mean_cl_normal from ggplot2. However, this produces nested mean, upper and lower values inside a column called “ci” but we want them in separate columns. We do this by using unnest() before renaming to things that make more sense.

# Contradiction CIs
contradiction_tab <- data_study_1 %>%
  group_by(Format,Conflict) %>%
  summarise(ci = mean_cl_normal(contradiction)) %>%
  unnest(cols = c(ci)) %>% # Expands out the nested values in 'ci'
  rename(Mean=y, Lower=ymin, Upper=ymax) # Rename the new columns

## `summarise()` has grouped output by 'Format'. You can override using the
## `.groups` argument.

print(contradiction_tab)

## # A tibble: 4 × 5
## # Groups:   Format [2]
##   Format    Conflict   Mean Lower Upper
##   <fct>     <fct>     <dbl> <dbl> <dbl>
## 1 Generic   Conf.      25.3  24.5  26.1
## 2 Generic   Non-Conf.  12.7  11.9  13.5
## 3 Qualified Conf.      25.2  24.3  26.1
## 4 Qualified Non-Conf.  14.2  13.3  15.1

# Advancement CIs
advancement_tab <- data_study_1 %>%
  group_by(Format,Conflict) %>%
  summarise(ci = mean_cl_normal(advancement)) %>%
  unnest(cols = c(ci)) %>% # Expands out the nested values in 'ci'
  rename(Mean=y, Lower=ymin, Upper=ymax) # Rename the new columns

## `summarise()` has grouped output by 'Format'. You can override using the
## `.groups` argument.

print(advancement_tab)

## # A tibble: 4 × 5
## # Groups:   Format [2]
##   Format    Conflict     Mean   Lower   Upper
##   <fct>     <fct>       <dbl>   <dbl>   <dbl>
## 1 Generic   Conf.     -0.183  -0.345  -0.0215
## 2 Generic   Non-Conf.  0.0959 -0.0605  0.252 
## 3 Qualified Conf.     -0.303  -0.442  -0.163 
## 4 Qualified Non-Conf. -0.0811 -0.233   0.0711

# Confusion CIs
confusion_tab <- data_study_1 %>%
  group_by(Format,Conflict) %>%
  summarise(ci = mean_cl_normal(confusion)) %>%
  unnest(cols = c(ci)) %>% # Expands out the nested values in 'ci'
  rename(Mean=y, Lower=ymin, Upper=ymax) # Rename the new columns

## `summarise()` has grouped output by 'Format'. You can override using the
## `.groups` argument.

print(confusion_tab)

## # A tibble: 4 × 5
## # Groups:   Format [2]
##   Format    Conflict   Mean Lower Upper
##   <fct>     <fct>     <dbl> <dbl> <dbl>
## 1 Generic   Conf.      4.58  4.45  4.70
## 2 Generic   Non-Conf.  3.51  3.25  3.76
## 3 Qualified Conf.      4.47  4.30  4.65
## 4 Qualified Non-Conf.  3.78  3.54  4.02

Researchers’ CI Descriptives

Using yarr::pirateplot() function and plot=FALSE.

# Contradiction
contradiction.pp <- pirateplot(formula = contradiction ~ Conflict*Format,data = data_study_1,inf.method = 'ci',plot = FALSE)
contradiction.pp

## $summary
##    Conflict    Format bean.num  n      avg   inf.lb   inf.ub
## 1     Conf.   Generic        1 71 25.32394 24.52199 26.12590
## 2 Non-Conf.   Generic        2 73 12.69863 11.87954 13.51772
## 3     Conf. Qualified        3 76 25.19737 24.27573 26.11901
## 4 Non-Conf. Qualified        4 74 14.18919 13.26034 15.11803
## 
## $avg.line.fun
## [1] "mean"
## 
## $inf.method
## [1] "ci"
## 
## $inf.p
## [1] 0.95

# Advancement
advancement.pp <- pirateplot(formula = advancement ~ Conflict*Format,data = data_study_1,inf.method = 'ci',plot = FALSE)
advancement.pp

## $summary
##    Conflict    Format bean.num  n         avg      inf.lb      inf.ub
## 1     Conf.   Generic        1 71 -0.18309859 -0.34467660 -0.02152058
## 2 Non-Conf.   Generic        2 73  0.09589041 -0.06045142  0.25223224
## 3     Conf. Qualified        3 76 -0.30263158 -0.44235175 -0.16291141
## 4 Non-Conf. Qualified        4 74 -0.08108108 -0.23330321  0.07114105
## 
## $avg.line.fun
## [1] "mean"
## 
## $inf.method
## [1] "ci"
## 
## $inf.p
## [1] 0.95

# Confusion
confusion.pp <- pirateplot(formula = confusion ~ Conflict*Format,data = data_study_1,inf.method = 'ci', plot = FALSE)
confusion.pp

## $summary
##    Conflict    Format bean.num  n      avg   inf.lb   inf.ub
## 1     Conf.   Generic        1 71 4.577465 4.453102 4.701827
## 2 Non-Conf.   Generic        2 73 3.506849 3.251490 3.762209
## 3     Conf. Qualified        3 76 4.473684 4.300765 4.646603
## 4 Non-Conf. Qualified        4 74 3.783784 3.543471 4.024096
## 
## $avg.line.fun
## [1] "mean"
## 
## $inf.method
## [1] "ci"
## 
## $inf.p
## [1] 0.95

Now that the means and CIs used in both methods have been calculated, we use ==, the equality operator, to check if the mean/CI values matches their corresponding value in the researchers’ method. However, I quickly encounter an issue and change my approach to subtracting their absolute values.

# Contradiction Sanity Check
contradiction_tab$Lower==contradiction.pp$summary$inf.lb

## [1]  TRUE FALSE  TRUE FALSE

# Seems that some values fail the equality check

# Instead I try to subtract their absolute values
abs(contradiction_tab$Lower)-abs(contradiction.pp$summary$inf.lb)

## [1]  0.000000e+00  1.776357e-15  0.000000e+00 -1.776357e-15

abs(contradiction_tab$Upper)-abs(contradiction.pp$summary$inf.ub)

## [1] 0.000000e+00 1.776357e-15 3.552714e-15 0.000000e+00

abs(contradiction_tab$Mean)-abs(contradiction.pp$summary$avg)

## [1] 0 0 0 0

# Advancement Sanity Check
abs(advancement_tab$Lower)-abs(advancement.pp$summary$inf.lb)

## [1] 0.000000e+00 5.551115e-17 0.000000e+00 5.551115e-17

abs(advancement_tab$Upper)-abs(advancement.pp$summary$inf.ub)

## [1]  1.144917e-16 -5.551115e-17 -2.775558e-17  0.000000e+00

abs(advancement_tab$Mean)-abs(advancement.pp$summary$avg)

## [1]  8.326673e-17 -5.551115e-17  0.000000e+00  2.775558e-17

# Confusion Sanity Check
abs(confusion_tab$Lower)-abs(confusion.pp$summary$inf.lb)

## [1]  0.000000e+00  4.440892e-16 -8.881784e-16  4.440892e-16

abs(confusion_tab$Upper)-abs(confusion.pp$summary$inf.ub)

## [1] -8.881784e-16 -4.440892e-16  8.881784e-16  0.000000e+00

abs(confusion_tab$Mean)-abs(confusion.pp$summary$avg)

## [1] 0.000000e+00 0.000000e+00 0.000000e+00 4.440892e-16

Subtracting the absolute values of each mean estimate and CI limit results in at most a trivially small difference with the largest difference having a negative exponent of 15. While I can’t be certain that base R and the pirate package have different methods of calculating CI’s, it should noted that there is a trivially small difference in their outputs for this analysis. Ultimately, this difference would not affect the interpretation of the plots (see below for comparison) or any of the descriptive statistics for that matter.

Both plots are visually almost identical in appearance and the information they convey. We show that the researchers’ plot (right) can be replicated relatively easily by using the pirate package. However, we were more familiar with and liked the versatility of ggplot2. We also enjoyed the personal challenge of understanding and working out the geoms/layers of a pirate plot rather than blindly relying on an external package. So in our eyes, our method (left) is more flexible and customisable which may be useful if someone wanted to further tweak the plot’s aesthetics. It may also be useful in our exploratory analysis.

Hence, we were able to verify the researchers’ plot as our plot has points, mean bars and CIs that seem to line up with their and we confirmed this through calculating and receiving (essentially) the same mean and CI values as them. We also as a personal challenge were able to really nicely mirror the pirate package’s aesthetics.

Summary Statistics

Consists of the marginal means, sd and n by format and by conflict for the three measures:

Contradiction

Advancement

Confusion

Only the differences in marginal means by conflict are mentioned in the paper likely because the researchers did not find a significant difference between format groups in their inferential tests. We will only calculate the marginal mean/sd by conflict below but they can easily be calculated by format using the same method.

# Contradiction
stats_contradiction_conflict <- data_study_1 %>% 
  group_by(Conflict) %>%
  summarise(
    n = n(),
    mean = mean(contradiction), 
    sd = sd(contradiction)
  ) %>% 
  ungroup()
print(stats_contradiction_conflict)

## # A tibble: 2 × 4
##   Conflict      n  mean    sd
##   <fct>     <int> <dbl> <dbl>
## 1 Conf.       147  25.3  3.72
## 2 Non-Conf.   147  13.4  3.83

# Advancement
stats_advancement_conflict <- data_study_1 %>% 
  group_by(Conflict) %>%
  summarise(
    n = n(),
    mean = mean(advancement),
    sd = sd(advancement)
  ) %>% 
  ungroup()
print(stats_advancement_conflict)

## # A tibble: 2 × 4
##   Conflict      n     mean    sd
##   <fct>     <int>    <dbl> <dbl>
## 1 Conf.       147 -0.245   0.647
## 2 Non-Conf.   147  0.00680 0.667

# Confusion
stats_confusion_conflict <- data_study_1 %>% 
  group_by(Conflict) %>%
  summarise(
    n = n(),
    mean = mean(confusion),
    sd = sd(confusion)
  ) %>% 
  ungroup()
print(stats_confusion_conflict)

## # A tibble: 2 × 4
##   Conflict      n  mean    sd
##   <fct>     <int> <dbl> <dbl>
## 1 Conf.       147  4.52 0.655
## 2 Non-Conf.   147  3.65 1.07

The marginal mean differences between conflict (left) and non conflicting (right) groups that the authors reported were:

Mean Contradiction: 25.3 vs. 13.4

Mean Scientific Advancement: -0.25 vs. 0.007

Mean Confusion: 4.52 vs. 3.65

Our summary statistics match these values when rounded to the same dp.

Histogram

Now we attempt to recreate the frequency histogram plot shown above. It displays the no. of people in each of the four conditions who selected -1 / 0 / +1 for their scientific advancement response which represents less, same or more respectively. We use mutate() and case_when() to create a new variable called “advancementvalue” that converts the numeric responses of “advancement” to their corresponding character responses - “less”, “same” or “more” (in terms of scientific advancement).

This is then plotted using geom_bar with position = “dodge” so that the 4 IV groups are placed next to each other as vertical bars for for each advancement value. The : operator in group and fill parameters create the four permutations of Conflict and Format conditions (i.e. the four IV groups). We also needed to manually set the order of the x axis using “levels = c(‘Less’, ‘Same’, ‘More’)” as otherwise they would appear in alphabetical order. The bars are grey scale filled by condition. Once again, we use the png. and dev.off() method of resizing and saving a final resized .png.

data_study_1 <- data_study_1 %>% 
  mutate(advancementvalue = case_when(
    advancement == -1 ~ "Less",
    advancement == 0 ~ "Same",
    advancement == 1 ~ "More"
  ))

png(filename="plot_advancement_histogram.png", width=737, height=1062)

ggplot(
  data = data_study_1, 
  mapping = aes(
    x = factor(advancementvalue, levels = c('Less', 'Same', 'More')),
    group = Conflict:Format, # Group by the 4 conditions
    fill = Conflict:Format # Fill by the 4 conditions
  )
) + 
geom_bar(position = "dodge") + # put the conditions next to each other
labs(
  x = "Advancement",
  y = "Number of Participants"
) +
scale_fill_grey(name = "Condition", # Legend name and labels
                 labels = c("Conflicting/Generic", 
                            "Conflicting/Qualified", 
                            "Non-conflicting/Generic", 
                            "Non-conflicting/Qualified")) 

dev.off()

## quartz_off_screen 
##                 2

As we can see above, our plot (left) matches the one found in the researchers’ article (right).

Experiment 2

The verification of experiment 2 follows a very similar structure and uses the same functions as experiment 1. As a result, we will keep it super concise so that it is mostly just the reproducibility goal -> code chunks -> output.

Read the Data

data_study_2_all = read_survey(here("data", "Study 7 data.csv"))

Convert data_study_2_all to a data.frame.

data_study_2_all <- as.data.frame(data_study_2_all)
print(data_study_2_all)

Count survey opens and consent

According to the article,

no. of times survey was opened = 412
no. of participants that consented (from code not article) = 412

# Count how many times the survey was opened (i.e., anyone with progress >= 1%)
data_study_2_all %>% 
  filter(Progress >= 1) %>% 
  count()

##     n
## 1 412

# Count no. of participants who consented (before unplanned/planned exclusions) 
data_study_2_all %>% 
  filter(Consent == 1) %>% 
  count()

##     n
## 1 412

Matches researchers’ data, 412 opens and all 412 responses consented.

Tidy new data frame for experiment 2

Once again, we do the same four stages of tidying:

1. Rename columns

# Create new df and rename confusing column names
data_study_2 <- data_study_2_all %>% 
  rename(
    recall_score = SC0,
    condition = FL_12_DO,
    confidence = GSS
)

2. Exclusions

Unlike experiment 1, there were no issues with double responding so there are no unplanned exclusions.

Planned Exclusions

Exclude rows where participants did not meet preregistered inclusion criteria:

only includes participants who finished the study (Finished==1), declared that they answered seriously (seriousness_check==1) AND scored 4 or above on recall
- i.e., excluding non-completers, non-serious responses and those who did not recall 4 or more items
only includes relevant columns required for analysis

According to the article, after all these exclusions the final sample should consist of 400 participants.

data_study_2 <- data_study_2 %>% 
  filter(
    Finished == 1 & 
    Serious_check == 1 & 
    recall_score >= 4
) %>% 
  select(
    Finished, 
    `Duration (in seconds)`, 
    Gender, 
    Age, 
    Serious_check, 
    recall_score, 
    condition, 
    NC_1:Development_sci_know_6
  )

#Count how many participants remain after exclusions (final sample size = 400)
data_study_2 %>% 
  count()

##     n
## 1 400

Final sample size of 400 matches.

3. separate() ‘condition’ column

data_study_2 <- data_study_2 %>% separate(
    col = "condition",
    into = c("block", "number", "Format", "Conflict"))

# Rename so you don't have to rename in pirate plots
# (Only relevant for researchers' plot/Doesn't affect our plot)
data_study_2 <- data_study_2 %>% 
  mutate(Conflict = case_when(
   Conflict == "Conflict" ~ "Conf.",
   Conflict == "Consistent" ~ "Non-Conf."
  ))

# set these new IV columns as factors 
data_study_2$Format <- as.factor(data_study_2$Format)
data_study_2$Conflict <- as.factor(data_study_2$Conflict)
levels(data_study_2$Format)

## [1] "Generic"   "Qualified"

levels(data_study_2$Conflict)

## [1] "Conf."     "Non-Conf."

4. Sum scores

There are five mean scores we need to calculate in experiment 2’s df so using rowSums(across(starts_with())) method (and dividing by no. of scores) saves a lot of space here.

# Nutritional confusion mean
data_study_2 <- data_study_2 %>% 
  mutate(confusion = rowSums(across(starts_with("NC_")))/6)

# Nutritional backlash mean
data_study_2 <- data_study_2 %>% 
  mutate(backlash = rowSums(across(starts_with("NBS_")))/6)

# Mistrust of expertise mean
data_study_2 <- data_study_2 %>% 
  mutate(mistrust = rowSums(across(starts_with("Mistrust_expertise_")))/3)

# Certainty of knowledge mean
data_study_2 <- data_study_2 %>% 
  mutate(certainty = rowSums(across(starts_with("Certainty_sci_know_")))/6)

# Development of knowledge mean
data_study_2 <- data_study_2 %>% 
  mutate(development = rowSums(across(starts_with("Development_sci_know_")))/6)

# final df of experiment 2 can be written to .csv using this
# write.csv(data_study_2, file = "data_study2Tidied.csv")

# Final look at final df with variables of interest
glimpse(data_study_2)

## Rows: 400
## Columns: 43
## $ Finished                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ `Duration (in seconds)` <dbl> 430, 454, 513, 486, 745, 584, 476, 674, 445, 6…
## $ Gender                  <dbl> 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1…
## $ Age                     <dbl> 51, 19, 31, 32, 49, 20, 20, 29, 29, 18, 31, 21…
## $ Serious_check           <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ recall_score            <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 5, 7, 7, 7, 7, 7…
## $ block                   <chr> "Block", "Block", "Block", "Block", "Block", "…
## $ number                  <chr> "3", "3", "4", "1", "1", "2", "4", "2", "2", "…
## $ Format                  <fct> Qualified, Qualified, Qualified, Generic, Gene…
## $ Conflict                <fct> Conf., Conf., Non-Conf., Conf., Conf., Non-Con…
## $ NC_1                    <dbl> 2, 2, 2, 5, 4, 4, 4, 1, 4, 4, 3, 4, 4, 2, 4, 1…
## $ NC_2                    <dbl> 2, 2, 1, 4, 4, 4, 4, 1, 4, 4, 3, 4, 3, 2, 4, 1…
## $ NC_3                    <dbl> 2, 2, 1, 3, 4, 3, 2, 4, 2, 2, 3, 2, 2, 1, 2, 1…
## $ NC_4                    <dbl> 2, 4, 2, 4, 4, 4, 5, 1, 4, 4, 3, 3, 2, 4, 4, 3…
## $ NC_5                    <dbl> 2, 3, 2, 2, 4, 2, 4, 4, 2, 2, 3, 3, 3, 4, 4, 3…
## $ NC_6                    <dbl> 1, 3, 2, 2, 4, 2, 2, 4, 2, 2, 3, 2, 2, 2, 4, 2…
## $ NBS_1                   <dbl> 2, 5, 1, 4, 4, 4, 5, 3, 4, 5, 4, 4, 3, 3, 3, 3…
## $ NBS_2                   <dbl> 2, 3, 2, 4, 2, 4, 4, 4, 2, 2, 4, 2, 2, 2, 2, 1…
## $ NBS_3                   <dbl> 2, 2, 2, 4, 2, 5, 4, 4, 2, 2, 3, 4, 1, 1, 3, 1…
## $ NBS_4                   <dbl> 1, 2, 2, 3, 2, 1, 1, 2, 2, 1, 3, 2, 2, 2, 2, 1…
## $ NBS_5                   <dbl> 1, 2, 2, 3, 2, 1, 2, 2, 2, 2, 3, 3, 3, 2, 2, 1…
## $ NBS_6                   <dbl> 1, 4, 2, 4, 2, 2, 5, 2, 4, 4, 3, 4, 2, 2, 4, 3…
## $ Mistrust_expertise_1    <dbl> 2, 2, 1, 3, 2, 2, 3, 1, 3, 2, 3, 3, 3, 1, 3, 2…
## $ Mistrust_expertise_2    <dbl> 2, 1, 1, 2, 1, 1, 2, 1, 3, 3, 3, 3, 1, 1, 3, 1…
## $ Mistrust_expertise_3    <dbl> 2, 2, 5, 3, 2, 1, 4, 2, 2, 2, 3, 2, 2, 2, 2, 1…
## $ confidence              <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 2, 2…
## $ Certainty_sci_know_1    <dbl> 4, 5, 5, 3, 4, 4, 5, 2, 4, 5, 3, 4, 1, 5, 5, 4…
## $ Certainty_sci_know_2    <dbl> 2, 3, 4, 2, 2, 3, 5, 1, 4, 2, 3, 2, 5, 4, 5, 3…
## $ Certainty_sci_know_3    <dbl> 4, 5, 5, 4, 5, 5, 5, 5, 3, 4, 3, 4, 5, 5, 5, 5…
## $ Certainty_sci_know_4    <dbl> 4, 5, 3, 3, 2, 3, 5, 4, 4, 2, 3, 4, 2, 2, 5, 4…
## $ Certainty_sci_know_5    <dbl> 4, 5, 5, 5, 5, 4, 5, 3, 4, 4, 3, 4, 3, 4, 5, 5…
## $ Certainty_sci_know_6    <dbl> 4, 5, 5, 5, 4, 4, 4, 5, 3, 2, 3, 4, 5, 5, 5, 5…
## $ Development_sci_know_1  <dbl> 4, 5, 5, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 5, 5, 5…
## $ Development_sci_know_2  <dbl> 4, 4, 5, 4, 5, 5, 5, 5, 4, 5, 3, 4, 4, 5, 5, 5…
## $ Development_sci_know_3  <dbl> 4, 5, 5, 4, 4, 5, 4, 5, 5, 5, 2, 4, 5, 5, 5, 5…
## $ Development_sci_know_4  <dbl> 4, 4, 5, 4, 5, 5, 5, 5, 5, 5, 2, 4, 5, 5, 5, 5…
## $ Development_sci_know_5  <dbl> 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 2, 4, 5, 5, 5, 5…
## $ Development_sci_know_6  <dbl> 5, 4, 5, 4, 4, 5, 4, 5, 4, 4, 3, 2, 3, 5, 5, 5…
## $ confusion               <dbl> 1.833333, 2.666667, 1.666667, 3.333333, 4.0000…
## $ backlash                <dbl> 1.500000, 3.000000, 1.833333, 3.666667, 2.3333…
## $ mistrust                <dbl> 2.000000, 1.666667, 2.333333, 2.666667, 1.6666…
## $ certainty               <dbl> 3.666667, 4.666667, 4.500000, 3.666667, 3.6666…
## $ development             <dbl> 4.333333, 4.500000, 5.000000, 4.166667, 4.5000…

Demographics

Participant Age and Gender

The researchers’ demographic age and gender data for experiment 2 is as follows:

150 males, 248 females, 2 identify as neither male/female (sums to 400 = final sample size)
Minimum Age = 18, Maximum Age = 73
Mean Age = 33.5, SD Age = 12

# Age
study_2_age <- data_study_2 %>% 
  summarise(
    mean_age = round(mean(Age), 1),
    sd_age = round(sd(Age), 0),
    min_age = min(Age),
    max_age = max(Age)
  )

# Gender
# Assign names of gender to data
data_study_2 <- data_study_2 %>% 
  mutate(Gender = case_when(
   Gender == "1" ~ "Male", 
   Gender == "2" ~ "Female",
   Gender == "3" ~ "Other",
   Gender == "4" ~ "Prefer_not_say" # Gender coded as 1 to 4 in data
  ))

# Count gender 
study_2_gender <- data_study_2 %>%
  group_by(Gender) %>% 
  summarise(Count = n())

print(study_2_gender)

## # A tibble: 3 × 2
##   Gender Count
##   <chr>  <int>
## 1 Female   248
## 2 Male     150
## 3 Other      2

# There are no participants who responded "Prefer_not_say"
# But we still want to include a count of zero
study_2_gender <- study_2_gender %>% 
  add_row(
    Gender = "Prefer not to say", 
    Count = 0
  ) %>% 
  ungroup()

# Print demographics
print(study_2_age)

##   mean_age sd_age min_age max_age
## 1     33.5     12      18      73

print(study_2_gender)

## # A tibble: 4 × 2
##   Gender            Count
##   <chr>             <dbl>
## 1 Female              248
## 2 Male                150
## 3 Other                 2
## 4 Prefer not to say     0

# 150 + 248 + 2 = 400 = Final Sample Size

Matches researchers’ demographics.

Mean Completion Time

According to the article, mean completion time should be 10.89 minutes.

# Calculate mean completion time in seconds.
data_study_2 %>% 
  summarise(
    mean_completion_time_sec = mean(`Duration (in seconds)`)
  ) %>%
  mutate(
    mean_completion_time_min = mean_completion_time_sec/60
  )

##   mean_completion_time_sec mean_completion_time_min
## 1                   653.68                 10.89467

Matches.

Pirate plots / CI Descriptives

Pirate plot with Functions (non-pirate package/our version)

We already wrote the function so we just have to call it using the right values for each parameter. Looking at the researchers’ plots in the article tells us what we should put for these aesthetic parameters.

# Calling the function
# Confusion
confusion2_function_plot <- pirate.fun( # Distinct from confusion in exp1
  data = data_study_2,
  y_var = confusion,
  plot_title = "Nutritional Confusion",
  y_title = "Nutritional Confusion",
  lim1 = 1,
  lim2 = 5,
  break_size = 1,
  add.minor.break = TRUE,
  no.minor.breaks = 2,
  violin_width = 1
)
plot(confusion2_function_plot)

# Backlash
backlash_function_plot <- pirate.fun(
  data = data_study_2,
  y_var = backlash,
  plot_title = "Nutritional Backlash",
  y_title = "Nutritional Backlash",
  lim1 = 1,
  lim2 = 5,
  break_size = 1,
  add.minor.break = TRUE,
  no.minor.breaks = 2,
  violin_width = 1
)
plot(backlash_function_plot)

# Mistrust
mistrust_function_plot <- pirate.fun(
  data = data_study_2,
  y_var = mistrust,
  plot_title = "Mistrust of Expertise",
  y_title = "Mistrust of Expertise",
  lim1 = 1,
  lim2 = 5,
  break_size = 1,
  add.minor.break = TRUE,
  no.minor.breaks = 2,
  violin_width = 1
)
plot(mistrust_function_plot)

# Confidence
confidence_function_plot <- pirate.fun(
  data = data_study_2,
  y_var = confidence,
  plot_title = "Confidence in Scientific Community",
  y_title = "Confidence in Scientific Community",
  lim1 = 1,
  lim2 = 3,
  break_size = 1,
  add.minor.break = TRUE,
  no.minor.breaks = 5,
  violin_width = 1
)
plot(confidence_function_plot)

# Certainty
certainty_function_plot <- pirate.fun(
  data = data_study_2,
  y_var = certainty,
  plot_title = "Certainty of Knowledge",
  y_title = "Certainty of Knowledge",
  lim1 = 1,
  lim2 = 5,
  break_size = 1,
  add.minor.break = TRUE,
  no.minor.breaks = 2,
  violin_width = 1
)
plot(certainty_function_plot)

# Development
development_function_plot <- pirate.fun(
  data = data_study_2,
  y_var = development,
  plot_title = "Development of Knowledge",
  y_title = "Development of Knowledge",
  lim1 = 1,
  lim2 = 5,
  break_size = 1,
  add.minor.break = TRUE,
  no.minor.breaks = 2,
  violin_width = 1
)
plot(development_function_plot)

# Save image of plot
pirate_all_function <- grid.arrange(
confusion2_function_plot, backlash_function_plot, mistrust_function_plot, confidence_function_plot, certainty_function_plot, development_function_plot,
ncol = 2, nrow = 3
)

# Save as final resized .png using dimensions found in article 
png(filename="pirate_all_2_function.png", width=841, height=1168) 
pirate_all_function

## TableGrob (3 x 2) "arrange": 6 grobs
##   z     cells    name           grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
## 3 3 (2-2,1-1) arrange gtable[layout]
## 4 4 (2-2,2-2) arrange gtable[layout]
## 5 5 (3-3,1-1) arrange gtable[layout]
## 6 6 (3-3,2-2) arrange gtable[layout]

dev.off()

## quartz_off_screen 
##                 2

Pirate Plots (pirate package/researchers’ version)

We use the pirate package/researchers’ code first to make sure it is the same as the one found in the article (see above).

# Using dimensions found in original article 841x1168
png(filename="plot_pirate_study_2.png", width=841, height=1168) 

# will place plots in a grid with 3 rows and 2 columns
par(mfrow = c(3, 2)) 

pirateplot(formula = confusion ~ Conflict*Format, data = data_study_2, yaxt = "n", theme=1,main = "Nutritional Confusion", ylab = "Nutritional Confusion", cex.names = 0.75, cex.lab = 0.9, inf.method = 'ci')
axis(2, at = seq(from = 1, to = 5, by = 1)) 

pirateplot(formula = backlash ~ Conflict*Format, data = data_study_2, yaxt = "n", theme=1,main = "Nutritional Backlash", ylab = "Nutritional Backlash", cex.names = 0.75, cex.lab = 0.9, inf.method = 'ci')
axis(2, at = seq(from = 1, to = 5, by = 1)) 

pirateplot(formula = mistrust ~ Conflict*Format, data = data_study_2, yaxt = "n", theme=1,main = "Mistrust of Expertise", ylab = "Mistrust of expertise", cex.names = 0.75, cex.lab = 0.9, inf.method = 'ci')
axis(2, at = seq(from = 1, to = 5, by = 1)) 

pirateplot(formula = confidence ~ Conflict*Format, data = data_study_2, yaxt = "n", theme=1,main = "Confidence in Scientific Community", ylab = "Confidence in Scientific Community", cex.names = 0.75, cex.lab = 0.9, inf.method = 'ci')
axis(2, at = seq(from = 1, to = 3, by = 1)) 

pirateplot(formula = certainty ~ Conflict*Format, data = data_study_2, yaxt = "n", theme=1,main = "Certainty of Knowledge", ylab = "Certainty of knowledge", cex.names = 0.75, cex.lab = 0.9, inf.method = 'ci')
axis(2, at = seq(from = 1, to = 5, by = 1)) 

pirateplot(formula = development ~ Conflict*Format, data = data_study_2,yaxt = "n", ylim =c(1,5), theme=1,main = "Development of Knowledge", ylab = "Development of knowledge", cex.names = 0.75, cex.lab = 0.9, inf.method = 'ci')

axis(2, at = seq(from = 1, to = 5, by = 1))
# ylim =c(1,5) because the default scale went from 3-5 rather than 1-5 (presumably because there were no scores of 1 or 2)                 

# set grid back to default (just one plot)
par(mfrow = c(1, 1))

dev.off()

## quartz_off_screen 
##                 2

Matches the figure from the article, also showing very little differences between the 4 different groups across all measures.

Plot created by researchers’ code = Plot found in the article

We will now use the above image to compare it to our plot.

Comparison

Comparing our plot (left) to the researchers’ (right), we can see that they visually seem to match eacho ther in means and CIs (there is little difference in all measures). We’ve also shown in the experiment 1 that the mean_cl_normal method of calculating mean and 95 CIs creates trivially different values compared to the pirate package. Because of this, I won’t compare the values calculated from two methods here for the sake of brevity. We can also once again see that they are aesthetically matched, demonstrating the vast capabilities of ggplot2.

Summary Statistics

None of the marginal means of experiment 2 are mentioned in the paper likely because there was no significant difference between format or conflict groups. They have been omitted here for brevity but can be computed easily using the method used in the summary statistics section of experiment 1. It is also clear using the pirate plot above that the marginal means do not significantly differ anyways.

Recommendations

I think that the researchers actually did a pretty good job overall in terms of making their data open access, transparent and reproducible.

Specifically, the experiment’s raw data and survey questions were openly accessible on the Open Science Framework with registration of protocols for both experiments.

The researchers were transparent and honest about their data and analysis because we didn’t notice any falsification or fabrication in their script. Their second experiment’s manipulation did not work at all and they also owned up to this.

Finally, they also used a package called checkpoint that installs packages from a specific date (for version control) but I personally couldn’t get it to work.

Eventually, we were able to reproduce the same results and figures albeit with great difficulty. This following are three main issues that made reproducing their results difficult along with some recommendations and resources to mitigate these issues in the future:

Naming of files and columns

The authors included two CSV files in their OSF - titled Experiment 8.csv and Experiment 7.csv. Yet, there was no mention of Experiments 7/8 anywhere in the article or in any README files. We eventually worked out that they referred to only Experiments 1 and 2 respectively (weird that exp 1 is 8 and exp 2 is 7). The experimenters should have either renamed these files to so that it matches the article or provided a readme.txt file that explains why they are named this way (maybe they ran many experiments) and what each file is. This would greatly improve the reproducibility of their work.

Also, some of the variables of interest “condition”, “recall_score” and “confidence” had very strange names when we first saw the df. This meant that we had to really dig through the researchers’ script to contextualise these variables. While SC0 (recall score), FL_12_DO and FL_10_DO (condition) were renamed in the researchers’ script, GSS (confidence) definitely was not and we had to do that ourselves. This made us appreciate codebooks in experiments like this where there are hundreds of variables with strange names (beyond these select variables of interest). While they do include a word doc. containing the survey questions, it is quite an annoying way to scroll through and comb for variable meanings. I really think the inclusion of a code book would improve the reproducibility of the researchers’ work in the future.

Where should they go to learn this:

Danielle Navarro, our lecturer in the coding modules, had an entire week of videos dedicated to the importance of properly naming things and README files. This is a very basic intro this topic.

https://i-am-learning-r.netlify.app/week4

More detailed information in article form can be found in this OSF link. It contains a number of articles on managing project files. The articles that are most relevant here though are Files Naming and Organize Files.

https://help.osf.io/article/387-project-files

While quickly browsing for some resources on making code books, I found Cookbook for a Codebook by Kai T. Horstmann to be a very informative but concise read on the expectations of a codebook.

https://osf.io/72hrh/?action=download&version=2

Better documentation/formatting of code

Most of the researchers’ script was formatted as large unsegmented chunks of code that were not labelled very well. They should have tried to split them into smaller chunks with a clear, understandable structure. There also wasn’t really a logical flow that good scripts have which goes something like: loading the data, tidying and to descriptives + visualisation. This made reading their script very annoying so this disorganised structure is something that we aimed to fix in our verification report above. By adopting this much more streamlined structure, the researchers would make the replication of their analysis so much easier for others.

The researchers’ script also suffered from poor documentation. It relied heavily on inline comments that stretch across the screen so it is annoying to read and cluttered the actual code. These comments were also missing/lacked detail in areas that really needed them (e.g. pirate plot section). Explaining the functions/arguments that you use in a script, particularly if you are using a lot of external packages as they do, makes it so much easier for other people to understand what you are trying to accomplish in a script and importantly improves the reproducibility of your work. I personally think that the researchers’ should try to use Rmarkdown as you can form clear code chunks that you can collapse, have more commenting/documentation options and is much easier to look at overall. If not, they should improve their documentation by making their comments clearer and introducing any new functions/arguments.

Where should they go to learn this:

If the researcher’s are interested in making use of the many formatting/documentation boons that Rmarkdown provides, this very detailed guide has everything they ever need to know:

https://bookdown.org/yihui/rmarkdown/

Justification and explanation of packages

The researchers’ script also lacks justification and explanation of the external packages they used (besides small inline comments). This meant that we did not know what each package provided and how they worked in terms of functions/arguments. For our verification report, we found this issue particularly annoying when we tried to use the pirate package. There was no rationale for why it was used so readily over another plotting package like ggplot2 and they didn’t even provide a guide of what the different arguments in the pirate plot were doing. This is a big part of why we were driven to use ggplot2 as we didn’t want to blindly follow what the researchers’ were doing and see if we could reproduce their results with packages that we understood. There were also a lot of packages that they used that we didn’t end up using because we thought they were unessential or they were included in tidyverse. For future reproducibility, we recommend that the researchers justify/explain their external packages as well as keeping in mind the KISS principle when choosing packages.

Where should they go to learn this:

Short article about KISS and DRY principles in coding (both useful):

https://dev.to/albertbennett/what-is-the-dry-kiss-principal-1h3n

Stackoverflow thread about how to choose which external packages to use:

https://stackoverflow.com/questions/44832985/r-programming-too-many-packages

Reflections

When I first heard that the internship involved learning how to code in R I felt…

nervous but also very keen because coding was something I always wanted to learn and I also enjoy learning languages in any form. I was naturally a bit nervous at first because as with learning any new language, the initial learning curve is very steep. I thought that I would have to memorise a lot of functions/arguments as you would words/definitions when learning a spoken language. But I was pleasantly surprised later on that this wasn’t really the case because each function/argument was named quite intuitively at least for R (maybe not necessarily true for other coding languages).

Now that I have been learning R for 10 weeks, I feel….

much more confident and, after getting a lot of hands-on experience, I feel that it was more enjoyable than I initially thought it would be. I realised after a while that coding is more of a problem solving process than memorising and regurgitating functions. In a lot of cases actually, you have plenty of functions to choose from but you just need to work out the most efficient method of using them together. It also feels particularly good when you spend so long working on your script work to eventually get it to work perfectly.

The hardest thing I have encountered about learning R this term was…

not anything coding related (I found difficult issues fun to solve in anything) but mostly the logistical side of working on Rstudio. Early on, a lot of small silly issues like trying to write the correct file path on my computer and not understanding working directories. To add to this, I found it very hard to find any resources on the internet about how to fix issues like this and resorted to “trial and erroring” solutions. In the later weeks, it was more about things like making sure formatting of documents was correct (YAML can be painful) and that .rmd files could knit properly to a pdf or html. Another logistical issue (or maybe my issue) is when this assignment’s .rmd file didn’t save properly, lost a day’s amount of work and had to scrounge up code from the history panel. Not forgetting about small logistical things like this will save me lots of grief when working with R in the future.

I am most proud of myself for…

challenging myself to use ggplot2 rather than blindly following the researchers’ method of creating pirate plots. There was a point after I decided to do this where I thought that I bit off more than I could chew but it was actually very worth it in the end. I learnt a lot of things like further controlling the formatting/aesthetics of plots using ggplot2, outside of the box thinking/solutions (e.g. creating labels using annotate() function) and most importantly learning to write a custom function. This saved us so much time and allowed us to condense the complicated aesthetics/geoms of the pirate plot down in a way that is presentable. Even if it is a bit rudimentary, I think it is a good start before delving further into writing functions. I’m also quite proud that it introduced my group and the class (during presentations) to custom functions as I think that it’s a very useful tool in coding.

The next thing I want to learn how to do in R is…

learning to use statistical tests to collect inferential data. I felt that this course really only dealt with tidying data frames as well as creating plots and creating demographic/descriptives. However, I had a skim through the code for the researchers’ inferential tests and saw how useful R can be for inferential statistics especially when combined with the capabilities of visualisation/plotting in R. For this reason, I really want to learn how to apply these tests as this could be extremely useful as a skill in the future.

Exploratory

When considering what research questions to explore I needed to keep mind that some of the manipulations in the two experiments did not work. This means it wouldn’t make sense to factor in these manipulations when analysing the measures that they did not (significantly) affect.

Namely, only the conflict manipulation worked in exp 1. Because of this, I opted not to analyse format’s influence further for exp 1. However, I do choose to look at how the effect of conflicting headlines on the advancement measure might have sex-based differences. I also choose to look at how the time spent on the survey might affect the three measures.

While in exp 2, both manipulations did not result in significant differences. This is why I avoid looking at experiment 2’s manipulations entirely (they were too weak/didn’t work) and instead I look at the correlations between the six measures, something that should be unaffected by the strength of the manipulations.

1. Are there sex-based differences in how consistent/conflicting scientific headlines are viewed as advancement/regression in knowledge?

plot(advancement_plot)

When first reading the article, I was really quite fascinated in the conflict condition’s effect on the advancement measure. The effect being that conflicting headlines were viewed as a setback in scientific knowledge while consistent headlines an advancement (seen in plot above; +1 denotes advancement). As undergraduate science students, I was surprised to see this effect particularly participants viewing conflicting headlines as a setback in scientific knowledge. To look further into this effect, I wanted to see if this effect would vary between sexes. Maybe one sex is more conservative when it comes to this effect or maybe there is no sex-based difference at all.

Tidy Data

To do this, I first filter out all the “Qualified” Format rows because as mentioned before I am only considering the conflict manipulation and do this by only using the “Generic” format participants. I then mutate the Gender column to abbreviate Male to M and Female to F (easier to look at later).

data_exploratory_1 <- data_study_1 %>% 
  filter(Format == "Generic") %>% 
  mutate(Gender = case_when(
   Gender == "Male" ~ "M", 
   Gender == "Female" ~ "F"
  ))

Statistical Analysis

Before any statistical tests, I use aggregate() from base R to see the marginal means and cell means (sex x conflict). This will also give me the directionality of any effects before any plotting.

For my post-hoc analysis I want to perform a statistical test that will check if gender’s effect on advancement as well as interaction with conflict. This fits the description of a two way ANOVA. I use aov() from base R which will let me perform a two way ANOVA (alpha = .05) to analyse Gender and Conflict’s effect on the advancement measure.

Because I am unsure of what a large effect for the advancement measure is, I use cohens_f() from the “effectsize” package to obviously calculate cohen’s f value which obviously allows me to estimate the effect size. (the names of these functions are very intuitive)

N.B. I was originally going to conduct a scheffe test after the ANOVA (using ScheffeTest() from DescTools) but realised it was redundant for our analysis (didn’t need to look at more contrasts). I also was originally going to calculate cohen’s d. They provide very similar results but, as I understand it, cohen’s f is more appropriate in the context of ANOVAs.

I’ve shown how I was going to do the scheffe test and calculate Cohen’s d below (without running):

# 95 CI Scheffe
# install.packages("DescTools")
library(DescTools) # For Scheffe Tests

ScheffeTest(advancement_gender_aov) # From DescTools pacakge


# Calculate effect size with Cohen's D
# install.packages("rstatix")
library(rstatix) # For cohen's d test

data_exploratory_1 %>% 
   cohens_d(advancement ~ Gender)
# Small effect size (female is 0.445 deviations lower)
data_exploratory_1 %>%
  cohens_d(advancement ~ Conflict)
# Small effect size (Conflict is 0.412 deviations lower)

This is our actual statistical analysis:

# Cell means
plot_exploration_data <- aggregate(advancement ~ Gender + Conflict, data = data_exploratory_1, mean)

print(plot_exploration_data)

##   Gender  Conflict advancement
## 1      F     Conf. -0.28205128
## 2      M     Conf. -0.06250000
## 3      F Non-Conf. -0.07894737
## 4      M Non-Conf.  0.28571429

# Marginal means
aggregate(advancement ~ Gender, data = data_exploratory_1, mean)

##   Gender advancement
## 1      F  -0.1818182
## 2      M   0.1194030

aggregate(advancement ~ Conflict, data = data_exploratory_1, mean)

##    Conflict advancement
## 1     Conf. -0.18309859
## 2 Non-Conf.  0.09589041

# ANOVA F Overall Test
advancement_gender_aov <- aov(advancement ~ Gender * Conflict, data = data_exploratory_1)
summary(advancement_gender_aov)

##                  Df Sum Sq Mean Sq F value  Pr(>F)   
## Gender            1   3.25   3.251   7.379 0.00743 **
## Conflict          1   2.63   2.632   5.975 0.01575 * 
## Gender:Conflict   1   0.19   0.188   0.428 0.51424   
## Residuals       140  61.68   0.441                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Gender and Conflict main effect is sig. but not interaction. 
# To tell direction we check means but i will save interpretation for later

# Estimate effect size using Cohen's f

# install.packages("effectsize")
library(effectsize) # For cohen's f test

cohens_f(advancement_gender_aov, partial = TRUE, ci = 0.95, squared = FALSE, model2 = NULL)

## # Effect Size for ANOVA (Type I)
## 
## Parameter       | Cohen's f (partial) |      95% CI
## ---------------------------------------------------
## Gender          |                0.23 | [0.09, Inf]
## Conflict        |                0.21 | [0.07, Inf]
## Gender:Conflict |                0.06 | [0.00, Inf]
## 
## - One-sided CIs: upper bound fixed at [Inf].

# Gender - small effect size (0.23)

P value is <.05 for conflict but more importantly gender which has a cohen’s f of 0.23. If we look at the means, he we can see that the difference comes from females having a lower advancement score compared to males.

Visualisation

Now, I use ggplot2 from tidyverse to create a simple interaction plot that maps “Conflict” on the x axis, “advancement” on the y axis. I use a point and line geom which are split into two separate sets of lines/pints by grouping and colouring through the “Gender” variable. I then adjust the labels of the axes, title, legend, the theme/formatting of the plot and finally the limits (and their expansion value).

# Plot
plot_exploration_1 <- ggplot(
  data = plot_exploration_data, 
  mapping = aes(
    x = Conflict, 
    y = advancement,
    color = Gender,
    group = Gender)
) +
geom_point() +
geom_line() +
labs(
  title = "Advancement", 
  x = "Level of Conflict", 
  y = "Perceived Scientific Advancement"
) +
theme(
  plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
  axis.text.y = element_text(angle = 90, hjust=0.5),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = NA),
  panel.grid.major.x = element_blank(),
  panel.grid.major.y = element_line(
    linewidth = 0.3, 
    color="gray"
  ),
  panel.grid.minor.y = element_line(
    linewidth = 0.3, 
    color="gray"
  ),
  panel.border = element_rect(
    fill = NA, 
    colour = "black", 
    linewidth = 0.5,
  )
) +
scale_x_discrete(labels = c("Conflicting", "Consistent")) +
scale_y_continuous(limits = c(-1, 1),
                   breaks = seq(-1, 1, by = 1),
                   minor_breaks = seq(-1, 1, by = 0.2), # Adjust minor breaks
                   expand = expansion(add = c(0.1,0.1))
) +
scale_color_manual(
  values = c("lightpink1", "slategray2"), # Set colours of lines/points
  labels = c("Female", "Male")
) 


plot(plot_exploration_1)

Based on the participants that are looking at consistent headlines, this plot as well as the post-hoc ANOVA shows that women are more likely to view these scientific headlines as a setback in scientific advancement compared to men who view them as progress (small effect size, cohen’s f = 0.23). Conflicting headlines are viewed as setbacks in knowledge more than consistent headlines which are seen as advancements in knowledge (we already know this). These are the main effects and they are both significant. There was no significant interaction effect.

In general, this effect demonstrates a flawed way of thinking that contradicts scientific view that “new findings cannot generally reduce our knowledge” (from the article). Essentially, the advancement value the participants respond with really shouldn’t be <0 because conflicting findings don’t set back our scientific knowledge. When we compare sexes, this becomes even more interesting because we now see that this effect (conflicting -> setback) is even stronger in women than in men.

2. Duration Taken

While doing the verification report, I was surprised to see the “Duration (in seconds)” variable that qualtrics produces. I was curious about it and thought maybe those who spent longer in the survey (presumably) studying the headlines would maybe see greater scores across the measures. The variable ‘Duration (in seconds)’ is not perfect because we don’t really know what each participant is spending their time on but it’s the best way I have to model this.

N.B. I only do experiment 1 here because neither manipulation worked in experiment 2.

Tidy Data

First, I filter out all the rows with “Qualified” for the same reason as before. I then calculate three quantiles which i will use to separate the participants into a fast, middle and slow class based on the duration variable.

# Filter out "qualified" participants
data_exploratory_2 <- data_study_1 %>% 
  filter(Format == "Generic")

# Calculate three quantiles
quantile(
  data_exploratory_2$`Duration (in seconds)`, 
  probs = c(0,1/3,2/3,1))

##        0% 33.33333% 66.66667%      100% 
##  316.0000  414.6667  511.3333 1131.0000

# Assign duration classes
data_exploratory_2 <- data_study_1 %>%
  mutate(duration_class = case_when(
    `Duration (in seconds)` < (414 + 2/3) ~ "Fast",
    `Duration (in seconds)` >= (414 + 2/3) & `Duration (in seconds)` <= (511 + 1/3) ~ "Middle",
    `Duration (in seconds)` > (511 + 1/3) ~ "Slow"
  ))

Statistical Analysis

I follow a similar procedure as the previous exploratory question. I first use aggregate() from base R to see the marginal means and cell means (duration class x conflict). Then, I perform a two way ANOVA (alpha = .05) but since I don’t find any significant differences for the duration classes, I don’t calculate an effect size like I did before.

# Means
aggregate(contradiction ~ duration_class + Conflict, data = data_exploratory_2, mean)

##   duration_class  Conflict contradiction
## 1           Fast     Conf.      25.35417
## 2         Middle     Conf.      25.47059
## 3           Slow     Conf.      24.93750
## 4           Fast Non-Conf.      12.91304
## 5         Middle Non-Conf.      13.45833
## 6           Slow Non-Conf.      13.90566

aggregate(advancement ~ duration_class + Conflict, data = data_exploratory_2, mean)

##   duration_class  Conflict advancement
## 1           Fast     Conf. -0.12500000
## 2         Middle     Conf. -0.33333333
## 3           Slow     Conf. -0.27083333
## 4           Fast Non-Conf.  0.10869565
## 5         Middle Non-Conf.  0.02083333
## 6           Slow Non-Conf. -0.09433962

aggregate(confusion ~ duration_class + Conflict, data = data_exploratory_2, mean)

##   duration_class  Conflict confusion
## 1           Fast     Conf.  4.458333
## 2         Middle     Conf.  4.686275
## 3           Slow     Conf.  4.416667
## 4           Fast Non-Conf.  3.608696
## 5         Middle Non-Conf.  3.520833
## 6           Slow Non-Conf.  3.792453

# ANOVA F Overall Tests
contradiction_duration_aov <- aov(contradiction ~ duration_class*Conflict, data = data_exploratory_2)
summary(contradiction_duration_aov)

##                          Df Sum Sq Mean Sq F value Pr(>F)    
## duration_class            2     13       7   0.469  0.626    
## Conflict                  1  10244   10244 713.525 <2e-16 ***
## duration_class:Conflict   2     26      13   0.892  0.411    
## Residuals               288   4135      14                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

advancement_duration_aov <- aov(advancement ~ duration_class*Conflict, data = data_exploratory_2)
summary(advancement_duration_aov)

##                          Df Sum Sq Mean Sq F value Pr(>F)    
## duration_class            2   1.64   0.819   1.901  0.151    
## Conflict                  1   4.76   4.760  11.053  0.001 ***
## duration_class:Conflict   2   0.41   0.205   0.475  0.622    
## Residuals               288 124.03   0.431                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

confusion_duration_aov <- aov(confusion ~ duration_class*Conflict, data = data_exploratory_2)
summary(confusion_duration_aov)

##                          Df Sum Sq Mean Sq F value   Pr(>F)    
## duration_class            2   0.30    0.15   0.192   0.8258    
## Conflict                  1  56.67   56.67  72.151 1.08e-15 ***
## duration_class:Conflict   2   3.68    1.84   2.345   0.0977 .  
## Residuals               288 226.22    0.79                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

None of the duration classes show any significant difference across all three measures of exp 1.

Visualisation

For this, I wanted to reuse the pirate plot function I made in the verification section. I quickly realise that it doesn’t really work because there are a lot of hard coded arguments in the original custom function. There are also some things that I don’t want/no longer need such as the secondary labels, a different x variable, wrapping by conflict, other different labels/mappings. So I create a new custom function that slightly tweaks these things.

Trying to use old custom function

Here I try to use pirate.fun() that was written in the verification report:

# Call Function
# Contradiction
pirate_exploratory_1_test <- pirate.fun(
  data = data_exploratory_2,
  y_var = contradiction, 
  plot_title = "Contradiction", 
  y_title = "Perceived Contradiction", 
  lim1 = 0, 
  lim2 = 30, 
  break_size = 5, 
  add.minor.break = FALSE,
  violin_width = 0.4
)

# Advancement
pirate_exploratory_2_test <- pirate.fun(
  data = data_exploratory_2,
  y_var = advancement, 
  plot_title = "Advancement", 
  y_title = "Perceived Scientific Advancement", 
  lim1 = -1, 
  lim2 = 1, 
  break_size = 1, 
  add.minor.break = TRUE,
  no.minor.breaks = 5,
  violin_width = 1
)

# Confusion
pirate_exploratory_3_test <- pirate.fun(
  data = data_exploratory_2,
  y_var = confusion, 
  plot_title = "Confusion", 
  y_title = "Perceived Confusion", 
  lim1 = 1, 
  lim2 = 5, 
  break_size = 1, 
  add.minor.break = TRUE,
  no.minor.breaks = 2,
  violin_width = 1
)

# Combine
pirate_exploratory_test <- grid.arrange(
  pirate_exploratory_1_test, pirate_exploratory_2_test, 
  pirate_exploratory_3_test, whitegrob,
  ncol = 2, nrow = 2
)

png(filename="pirate_exploratory_test.png", width=1500, height=1000)
plot(pirate_exploratory_test) # Final plot is the png saved after this
dev.off()

## quartz_off_screen 
##                 2

Definitely not what I want. Need to get rid of the secondary labels, change the x variable to duration class, format label is no longer needed. I want three violins for the three duration classes next to eachother and I wanted this to be wrapped across the Conflict variable so that there will be 6 violins for each measure. To do this I make a new function that tweaks the old custom function. I remove everything I no longer need (secondary labels, format labels) and change the existing labels (conflict -> duration class). All the parameters are preserved and their input values shouldn’t change either. The only real thing that changes is that I know add facet_wrap(vars(Conflict)) to the end of the function.

Tweaked pirate plot function

Here I adjust the old function so that it fits better with our needs here:

pirate.fun.exploratory <- function(data, y_var, plot_title, y_title, lim1, lim2, break_size, add.minor.break, no.minor.breaks = 0, violin_width) {
  
# Boolean for minor breaks
minor_break <- if (add.minor.break) {
    element_line(
      linewidth = 0.3,
      color = "gray"
    )
  } else {
    element_blank()  # Otherwise no lines
  }

# Plot
ggplot(
  data = data,
  mapping = aes(
    x = factor(duration_class, levels = c('Slow', 'Middle', 'Fast')), # New variable
    y = {{y_var}}, # {{ }} indicates variable in dataset
    fill = duration_class
  )
) +
geom_violin(
  width = violin_width,
  linewidth = 0.9
) +
geom_jitter(
  stroke = 0,
  size = 1.5, 
  alpha = 0.3, 
  position = position_jitter(
    height = 0, 
    width = 0.04
  )
) +
stat_summary( 
    fun.data = "mean_cl_normal",  
    geom = "crossbar", 
    fill = "white",    
    alpha = .7,
    width = 0.5,
    linewidth = 1.1
  ) +
labs(
  title = plot_title, 
  x = "Survey Duration", # New label
  y = y_title
) +
theme(
  plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
  axis.text.y = element_text(angle = 90, hjust=0.5),
  panel.background = element_rect(fill = NA),
  panel.grid.major.x = element_blank(),
  panel.grid.major.y = element_line(
    linewidth = 0.3, 
    color="gray"
  ),
  panel.border = element_rect(
    fill = NA, 
    colour = "black", 
    linewidth = 0.5,
  ),
  legend.position = "none",
  panel.grid.minor.y = minor_break
) +
scale_y_continuous(limits = c(lim1, lim2),
                   breaks = seq(lim1, lim2, by = break_size),
                   minor_breaks = seq(lim1, lim2, by = (1/no.minor.breaks)),
                   # So that y axis/divided desired amount of minor breaks
                   expand = expansion(mult = c(0,0),
                                      add = c(  # standardises expansion
                                        (lim2-lim1)/30, 
                                        (lim2-lim1)/30
))) +
facet_wrap(vars(Conflict)) # Wrap by conflict
  
}  

# Call Function
# Contradiction
pirate_exploratory_1 <- pirate.fun.exploratory(
  data = data_exploratory_2,
  y_var = contradiction, 
  plot_title = "Contradiction", 
  y_title = "Perceived Contradiction", 
  lim1 = 0, 
  lim2 = 30, 
  break_size = 5, 
  add.minor.break = FALSE,
  violin_width = 0.4
)

# Advancement
pirate_exploratory_2 <- pirate.fun.exploratory(
  data = data_exploratory_2,
  y_var = advancement, 
  plot_title = "Advancement", 
  y_title = "Perceived Scientific Advancement", 
  lim1 = -1, 
  lim2 = 1, 
  break_size = 1, 
  add.minor.break = TRUE,
  no.minor.breaks = 5,
  violin_width = 1
)

# Confusion
pirate_exploratory_3 <- pirate.fun.exploratory(
  data = data_exploratory_2,
  y_var = confusion, 
  plot_title = "Confusion", 
  y_title = "Perceived Confusion", 
  lim1 = 1, 
  lim2 = 5, 
  break_size = 1, 
  add.minor.break = TRUE,
  no.minor.breaks = 2,
  violin_width = 1
)

# Combine
pirate_exploratory <- grid.arrange(
  pirate_exploratory_1, pirate_exploratory_2, 
  pirate_exploratory_3, whitegrob,
  ncol = 2, nrow = 2
)

png(filename="pirate_exploratory.png", width=1500, height=1000)
plot(pirate_exploratory) # Final plot is the png saved after this
dev.off()

## quartz_off_screen 
##                 2

After tweaking the old custom function, I got what I wanted. Bonus that the colours look good as well even though I didn’t intentionally do that. The plot, besides looking cool, visualises the lack of differences between duration classes (they are not significantly different). Hence, the time taken to finish the survey didn’t seem to affect the three measures in experiment 1.

3. Correlation between exp 2 measures

I’ve only done exploratory analysis for exp 1 but I still want to do some exp 2. Since neither manipulation in experiment 2 worked, I avoid looking too much into that. Instead, I remembered that while reading the paper that two of the measures in exp 2 seemed quite similar in terms of what they were measuring (E.g. “Mistrust of expertise” contrasts “Confidence in scientific community”). Consequently, I’ve decided to look into the correlation between the different measures. This is perfect because the strength of manipulations (or a lack thereof) should not affect the correlations between the measures. In other words, one measure will still be highly correlated with another measure regardless of whether the manipulation worked.

Statistical Analysis

I use cor() from the “corrplot” package to calculate Pearson’s correlation coefficient across the six selected measures in exp 2. These are displayed using head() rounded to two dp.

# install.packages("corrplot")
library(corrplot)

## corrplot 0.92 loaded

# Used to calculate correlation with cor() and plot correlogram with corrplot()

# Create correlation data for 6 measures of exp 2
data_correlation <- data_study_2 %>% 
  select(
    confusion,
    backlash,
    mistrust,
    confidence,
    certainty,
    development
  ) %>% 
  cor() 
head(round(data_correlation,2))

##             confusion backlash mistrust confidence certainty development
## confusion        1.00     0.47     0.26       0.41     -0.09       -0.09
## backlash         0.47     1.00     0.46       0.33     -0.05       -0.10
## mistrust         0.26     0.46     1.00       0.51     -0.29       -0.27
## confidence       0.41     0.33     0.51       1.00     -0.12       -0.12
## certainty       -0.09    -0.05    -0.29      -0.12      1.00        0.43
## development     -0.09    -0.10    -0.27      -0.12      0.43        1.00

Visualisation

Correlogram

Useful resource:

http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram#data-for-correlation-analysis

I then use the package’s eponymous corrplot() function to visualise all the correlations above. In the arguments, I specify “pie” in the method argument so that correlations are shown as pie charts and “upper” in the type argument so that the correlations aren’t diagonally mirrored.

# Correlogram plot
corrplot(data_correlation, method = "pie", type="upper")

Note that the first four measures all have weak (0.2-0.39) to moderate (0.4-0.59) correlation coefficients. The last two measures (Epistemic beliefs about Certainty and Development of knowledge) are also moderately correlated with each other. This makes a lot of sense because the first four measures relate to the headlines in the experiment while the last two relate to general beliefs about science. I am also pleasantly surprised to see that out of all the correlations, the Mistrust and Confidence measure are most strongest correlated (0.51).

Scatterplot between mistrust and confidence

I create a scatter plot with a linear regression line to visualise mistrust vs. confidence scores, the two most correlated measures in exp 2. I also reduce the alpha and slightly jitter the raw data points to make it easier to see the distribution of scores.

ggplot(
  data = data_study_2, 
  mapping = aes(
    x = mistrust, 
    y = confidence)
) +
geom_jitter(size = 1.5,
  alpha = .5,
  position = position_jitter(
    height = 0.1,
    width = 0.1)) +
geom_smooth(method = "lm", se = FALSE) +
labs(
  title = "Confidence vs. Mistrust",
  caption = "r = 0.51",
  x = "Mistrust of Expertise",
  y = "Confidence in the Scientific Community"
)

## `geom_smooth()` using formula = 'y ~ x'

We can see the correlation of these scores in greater detail using the raw score points and a regression line. Obviously, a moderate correlation between the scores is not the strongest but it is still worth noting.

Psyc 3361 Assignment

2024-06-13

Do Generic Attributions to ‘Scientists’ and ‘Experts’ Amplify Perceived Conflict?

Context

Verification

Set Up

Install Packages (if not already installed)

Load Packages

Set File Location using i_am()

Experiment 1

Read the Data

Count survey opens and consent

Tidy new data frame for experiment 1

1. Rename columns

2. Exclusions

Unplanned Exclusions

Planned Exclusions

3. separate() ‘condition’ column

4. Sum scores

Demographics

Participant Age and Gender

Mean Completion Time

Pirate plots / CI Descriptives

Our long coding journey

Issue 1: CI Calculation

Issue 2: Secondary Labels

Issue 3: x axis variable

Pirate Plots (non-pirate package/our version)

Contradiction:

Advancement:

Confusion:

Combine plots:

Functions

The Pirate Plot Function

Calling the function

Combine plots

Pirate Plots (pirate package/researchers’ version)

Comparison/Sanity Check

Our CI Descriptives

Researchers’ CI Descriptives

Summary Statistics

Histogram

Experiment 2

Read the Data

Count survey opens and consent

Tidy new data frame for experiment 2

1. Rename columns

2. Exclusions

Planned Exclusions

3. separate() ‘condition’ column

4. Sum scores

Demographics

Participant Age and Gender

Mean Completion Time

Pirate plots / CI Descriptives

Pirate plot with Functions (non-pirate package/our version)

Pirate Plots (pirate package/researchers’ version)

Comparison

Summary Statistics

Recommendations

Naming of files and columns

Better documentation/formatting of code

Justification and explanation of packages

Reflections

Exploratory

1. Are there sex-based differences in how consistent/conflicting scientific headlines are viewed as advancement/regression in knowledge?

Tidy Data

Statistical Analysis

Visualisation

2. Duration Taken

Tidy Data

Statistical Analysis

Visualisation

Trying to use old custom function

Tweaked pirate plot function

3. Correlation between exp 2 measures

Statistical Analysis

Visualisation

Correlogram

Scatterplot between mistrust and confidence