Exploratory analysis

Loading packages

#Loading relevant packages
library(qualtRics) #for reading data, filtering redundant rows and setting variables with numeric entries as 'numeric'
library(tidyverse) #for dplyr and ggplot

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggbeeswarm) #for a violin scatter plot
library(ggeasy) #for ggplot formatting shortcuts
library(patchwork) #for combining plots into a single output
library(gt) #for pretty tables
library(jmv) #for statistical analyses
library(ggpubr) #for ggplot in exploratory analysis 3a
library(corrplot) #for correlation matrix in exploratory analysis 3c

## corrplot 0.90 loaded

Reading data

mydata <- read.csv("MyDataFinalSubset.csv")
mydata2 <- read.csv("MyDataFinalSubset2.csv")

3a. Does gender impact perceived contradiction, advancement, and confusion?

Conflict manipulation visualisation

In experiment 1, there was a significant effect of Headline Conflict on perceived contradiction/advancement/confusion (those who saw conflicting headlines felt they were more contradictory and more confusing, and that they resulted in us knowing less about how to be healthy, than those who saw the non-conflicting headlines). I am interested to see whether this conflict manipulation produces a similar level of perceived conflict in males and females.

USING the recode_factor() function to RENAME “1” to “Male” and “2” to “Female” under the Gender variable
- We are assessing males and females, so no filtering is required for the mydata (Exp 1) dataset as the sample only consists of males and females
CHANGING the ‘Gender’ variable from character to factor using the as.factor() function.
CREATING our own function() in order to apply the same statements to all plots using a single chunk of code
- Using the ggboxplot() function from the ggpubr package to create “publication ready” boxplots! I decided to go with this style of plot to challenge myself, as I had not yet created a boxplot using Haigh & Birch’s (2020) data
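For comparison, the recoding step can also be sketched in base R. This is a minimal illustration on a made-up vector (`gender_codes` is hypothetical, not the real Gender column), and it swaps in `factor()` rather than `recode_factor()`: the point is only that the relabelled result is already a factor whose levels follow the order of the labels.

```r
# Hypothetical numeric gender codes (1 = male, 2 = female), NOT the real data
gender_codes <- c(1, 2, 2, 1, 2)

# Base R alternative to recode_factor(): map codes to labels in one step
gender_factor <- factor(gender_codes,
                        levels = c(1, 2),
                        labels = c("Male", "Female"))

# The result is already a factor, with levels in the order given above
is.factor(gender_factor)
levels(gender_factor)
table(gender_factor)
```

Because the level order is set explicitly, "Male" is plotted before "Female" in any facet that uses this variable.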

mydata$Gender <- recode_factor(mydata$Gender,
                                      "1" = "Male",       #old name = new name
                                      "2" = "Female")
mydata <- mydata %>%
  mutate(Gender=as.factor(Gender)) 

library(ggpubr)

gender_boxplot_fun <- function(y_var, plot_title, y_title) {
  ggboxplot(
    mydata, x = "Conflict", y = y_var, color = "Conflict", facet.by = "Gender", add =
      "jitter") + 
  facet_wrap(vars(Gender), strip.position = "bottom") + 
  ggtitle(label = plot_title) +
  easy_center_title() + 
  scale_x_discrete(name = "Headline Conflict") +
  scale_y_continuous(name = y_title) +
  easy_remove_legend() }

#Creating Contradiction, Advancement and Confusion boxplots using the function 
gender_contradiction_boxplot <- gender_boxplot_fun(y_var = "contradiction", plot_title = "Gender differences in Conflict manipulation: Contradiction", y_title = "Perceived Contradiction") 
gender_advancement_boxplot <- gender_boxplot_fun(y_var = "advancement", plot_title = "Gender differences in Conflict manipulation: Advancement", y_title = "Perceived Scientific Advancement")
gender_confusion_boxplot <- gender_boxplot_fun(y_var = "confusion", plot_title = "Gender differences in Conflict manipulation: Confusion", y_title = "Confusion")

#Printing boxplots
print(gender_contradiction_boxplot)

print(gender_advancement_boxplot)

print(gender_confusion_boxplot)

Judging by the data points on the boxplots, there does not appear to be a noticeable difference in the distribution of scores between males and females on the perceived contradiction scale or the confusion scale. For advancement scores, however, I can see that males have a greater density of points towards the top of the scale, while females have a higher density of points towards the bottom of the scale. I can also see that the interquartile range of advancement scores lies higher for males exposed to non-conflicting headlines relative to conflicting headlines, but this is not evident for females. This leads me to believe that the effect of the conflict manipulation is greater for males relative to females, however we won’t know for sure until we conduct a statistical analysis.

Assessment of group means directly allows for an alternative descriptive assessment of scores relative to each gender:

Conflicting vs non-conflicting group means, grouped by gender

I am calculating the mean, sd, n, and se of contradiction/advancement/confusion scores for each conflict condition and each gender using the group_by() and summarise() functions.
I will also pipe in the gt() function from the gt package to display these results in a pretty table, title it using the tab_header() function, and round decimals to 2 places using the fmt_number() function.

A REMINDER OF THE SCALE OF THE DV: In experiment 1, participants were tested on their perceived level of SCIENTIFIC ADVANCEMENT: participants were asked “When we take the results reported in these headlines together, do we now know more, less or the same as we did before about how to be healthy?”. They indicated their response on a 3-point scale (-1 = we know less, 0 = we know the same amount, 1 = we know more).

##Conflicting vs non. conflicting group means

#Perceived Contradiction: 
mydata %>%
  group_by(Gender, Conflict) %>%   # Specify data frame and group indicators
  summarise(Mean = mean(contradiction), # Specify column and functions:
            SD = sd(contradiction),             
            n = n(),                        
            SE = SD/sqrt(n)) %>% 
  gt() %>% 
  tab_header(title = "Effect of Headline Conflict on Perceived Contradiction (grouped by Gender)") %>% 
  fmt_number(c(Mean, SD, SE), decimals = 2)

## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.

Effect of Headline Conflict on Perceived Contradiction (grouped by Gender)
Gender	Conflict	Mean	SD	n	SE
Male	Conf.	24.97	3.97	60	0.51
Male	Non-Conf.	13.39	3.67	66	0.45
Female	Conf.	25.46	3.56	87	0.38
Female	Non-Conf.	13.49	3.98	81	0.44

#Perceived Scientific Advancement (PSA): 
mydata %>% 
  group_by(Gender, Conflict) %>%
  summarise(Mean = mean(advancement),        
            SD = sd(advancement),             
            n = n(),                        
            SE = SD/sqrt(n)) %>% 
  gt() %>% 
  tab_header(title = "Effect of Headline Conflict on PSA (grouped by Gender)") %>% 
  fmt_number(c(Mean, SD, SE), decimals = 2)

## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.

Effect of Headline Conflict on PSA (grouped by Gender)
Gender	Conflict	Mean	SD	n	SE
Male	Conf.	−0.10	0.71	60	0.09
Male	Non-Conf.	0.12	0.67	66	0.08
Female	Conf.	−0.34	0.59	87	0.06
Female	Non-Conf.	−0.09	0.66	81	0.07

#Confusion: 
mydata %>%        
  group_by(Gender, Conflict) %>%           
  summarise(Mean = mean(confusion),       
            SD = sd(confusion),             
            n = n(),                        
            SE = SD/sqrt(n))  %>%
  gt() %>% 
  tab_header(title = "Effect of Headline Conflict on Confusion (grouped by Gender)") %>% 
  fmt_number(c(Mean, SD, SE), decimals = 2)

## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.

Effect of Headline Conflict on Confusion (grouped by Gender)
Gender	Conflict	Mean	SD	n	SE
Male	Conf.	4.57	0.59	60	0.08
Male	Non-Conf.	3.58	1.10	66	0.13
Female	Conf.	4.49	0.70	87	0.07
Female	Non-Conf.	3.70	1.05	81	0.12
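The three summary chunks above repeat the same group_by() and summarise() steps, so they could be wrapped in a helper in the same spirit as the boxplot function. The sketch below is only an illustration: `summarise_dv()` and the `toy` data frame are hypothetical names and made-up values, not part of the original analysis. Setting `.groups = "drop"` is one way to avoid the "grouped output" message, although it also removes the grouping that gt() uses to build the Male/Female row groups.

```r
library(dplyr)

# Hypothetical helper: mean, SD, n and SE of one DV per Gender x Conflict cell
summarise_dv <- function(data, dv) {
  data %>%
    group_by(Gender, Conflict) %>%
    summarise(
      Mean = mean({{ dv }}),   # {{ }} passes the unquoted column name through
      SD   = sd({{ dv }}),
      n    = n(),
      SE   = SD / sqrt(n),
      .groups = "drop"         # return an ungrouped tibble, no message
    )
}

# Made-up toy data, used only to show that the helper runs
toy <- tibble(
  Gender   = rep(c("Male", "Female"), each = 4),
  Conflict = rep(c("Conf.", "Non-Conf."), times = 4),
  score    = c(5, 3, 4, 2, 5, 2, 5, 3)
)

toy_summary <- summarise_dv(toy, score)
toy_summary
```

With the real data the call would look like `summarise_dv(mydata, advancement)`, and the result can be piped into gt() as before.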

Statistics

Since the difference in advancement scores between males and females caught my attention, I initially thought to compare the average advancement scores of females and males exposed to EACH level of the Headline Conflict factor: Conflict (Conf.) and Consistent (Non-Conf.). I did so using INDEPENDENT SAMPLES T-TESTS (for each level) via the ttestIS() function in the jmv package.

conflict <- mydata %>%
  filter(Conflict == "Conf.")

consistent <- mydata %>%
  filter(Conflict == "Non-Conf.")

ttestIS(formula = advancement ~ Gender, data = conflict)

## 
##  INDEPENDENT SAMPLES T-TEST
## 
##  Independent Samples T-Test                                           
##  ──────────────────────────────────────────────────────────────────── 
##                                  Statistic    df          p           
##  ──────────────────────────────────────────────────────────────────── 
##    advancement    Student's t     2.286083    145.0000    0.0236983   
##  ────────────────────────────────────────────────────────────────────

ttestIS(formula = advancement ~ Gender, data = consistent)

## 
##  INDEPENDENT SAMPLES T-TEST
## 
##  Independent Samples T-Test                                           
##  ──────────────────────────────────────────────────────────────────── 
##                                  Statistic    df          p           
##  ──────────────────────────────────────────────────────────────────── 
##    advancement    Student's t     1.893226    145.0000    0.0603203   
##  ────────────────────────────────────────────────────────────────────

Results from the t-tests reveal that there is a statistically significant difference in perceived scientific advancement between males and females exposed to conflicting headlines (p = .024), however the difference between males and females exposed to the non-conflicting headlines is not significant (p = .060).

WHAT DOES THIS MEAN? According to the group means calculated earlier, females exposed to conflicting headlines perceived that scientific knowledge had advanced to a lesser extent (M = -0.34) than males exposed to the same headlines (M = -0.10).
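As a rough check, the t statistic for the conflicting-headline group can be rebuilt from the summary statistics in the group-means table. This is only a sketch: the inputs are the rounded table values, so the result (roughly 2.2) will not match the reported t of 2.29 exactly. Cohen's d is an addition of mine for scale and was not part of the original analysis.

```r
# Summary statistics copied from the PSA group-means table (conflicting headlines)
m_male   <- -0.10; sd_male   <- 0.71; n_male   <- 60
m_female <- -0.34; sd_female <- 0.59; n_female <- 87

# Degrees of freedom and pooled standard deviation used by Student's t-test
df <- n_male + n_female - 2
sd_pooled <- sqrt(((n_male - 1) * sd_male^2 + (n_female - 1) * sd_female^2) / df)

# Approximate t statistic (rounded inputs, so only close to the jmv output)
t_approx <- (m_male - m_female) / (sd_pooled * sqrt(1 / n_male + 1 / n_female))

# Standardised mean difference (Cohen's d), an added illustration
cohens_d <- (m_male - m_female) / sd_pooled

round(c(df = df, t = t_approx, d = cohens_d), 2)
```

The degrees of freedom come out at 145, matching the ttestIS() output.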

Although I arrived at a significant result, the t-tests I conducted cannot tell me whether there is a main effect of Gender or an interaction between Gender and Headline Conflict. An ANOVA on each perceived conflict measure is required to investigate the following questions:

Is there a main effect of Gender on contradiction, scientific advancement and/or confusion (averaged across Headline Conflict)?
Does the effect of Conflict manipulation (Conflict vs. Consistent) on the three measures differ as a function of Gender (Male vs Female) i.e. is there a significant interaction between Headline Conflict and Gender?

ANOVA(formula = advancement ~ Conflict * Gender, data = mydata)

## 
##  ANOVA
## 
##  ANOVA - advancement                                                                    
##  ────────────────────────────────────────────────────────────────────────────────────── 
##                       Sum of Squares    df     Mean Square    F             p           
##  ────────────────────────────────────────────────────────────────────────────────────── 
##    Conflict               4.13300569      1     4.13300569    9.78581314    0.0019375   
##    Gender                 3.67816208      1     3.67816208    8.70886941    0.0034251   
##    Conflict:Gender        0.02485749      1     0.02485749    0.05885565    0.8084853   
##    Residuals            122.48053717    290     0.42234668                              
##  ──────────────────────────────────────────────────────────────────────────────────────

ANOVA(formula = contradiction ~ Conflict * Gender, data = mydata)

## 
##  ANOVA
## 
##  ANOVA - contradiction                                                                    
##  ──────────────────────────────────────────────────────────────────────────────────────── 
##                       Sum of Squares    df     Mean Square    F              p            
##  ──────────────────────────────────────────────────────────────────────────────────────── 
##    Conflict              9954.864784      1    9954.864784    694.3783858    < .0000001   
##    Gender                   6.317831      1       6.317831      0.4406856     0.5073191   
##    Conflict:Gender          2.778006      1       2.778006      0.1937733     0.6601224   
##    Residuals             4157.547018    290      14.336369                                
##  ────────────────────────────────────────────────────────────────────────────────────────

ANOVA(formula = confusion ~ Conflict * Gender, data = mydata)

## 
##  ANOVA
## 
##  ANOVA - confusion                                                                        
##  ──────────────────────────────────────────────────────────────────────────────────────── 
##                       Sum of Squares    df     Mean Square    F              p            
##  ──────────────────────────────────────────────────────────────────────────────────────── 
##    Conflict              57.01942311      1    57.01942311    72.05365069    < .0000001   
##    Gender                 0.05540684      1     0.05540684     0.07001588     0.7915016   
##    Conflict:Gender        0.72126228      1     0.72126228     0.91143645     0.3405287   
##    Residuals            229.49056078    290     0.79134676                                
##  ────────────────────────────────────────────────────────────────────────────────────────

The results for each ANOVA replicate Haigh & Birch’s (2020) findings that the conflict manipulation was perceived as intended, indicated by significant main effects of Headline Conflict on perceived scientific advancement (p = .002), perceived contradiction (p < .001), and confusion (p < .001) - YAY!

For the confusion and contradiction measures, there was no significant main effect of Gender and no significant Gender x Conflict interaction. For the advancement measure, on the other hand, there is a significant main effect of Gender (p = .003)! However, there is no significant Gender x Conflict interaction (p = .808).

WHAT DOES THIS MEAN? (To interpret the direction of this significant main effect I drew reference from the table of group means I calculated earlier): Averaged across Conflict condition, females perceived that scientific knowledge had advanced to a lesser extent than males.
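To make the "averaged across Conflict condition" step concrete, the marginal means for each gender can be sketched from the cell means and sample sizes in the PSA table. The values below are the rounded table entries, so the results are approximate; `cell_means` is a name I am introducing for illustration.

```r
# Cell means and sample sizes copied from the PSA group-means table
cell_means <- data.frame(
  Gender   = c("Male", "Male", "Female", "Female"),
  Conflict = c("Conf.", "Non-Conf.", "Conf.", "Non-Conf."),
  Mean     = c(-0.10, 0.12, -0.34, -0.09),
  n        = c(60, 66, 87, 81)
)

# Unweighted marginal mean: simple average of the two cell means per gender
unweighted <- tapply(cell_means$Mean, cell_means$Gender, mean)

# Weighted marginal mean: each cell mean weighted by its sample size
weighted <- sapply(split(cell_means, cell_means$Gender),
                   function(d) weighted.mean(d$Mean, d$n))

round(rbind(unweighted, weighted), 3)
```

Either way, the male marginal mean sits just above zero and the female marginal mean is clearly negative, which is the direction described above.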

3b. Is there a correlation between age and confidence in the scientific community?

In the face of huge leaps in scientific findings surrounding nutrition and diet, it would be naive to say that all age groups hold the same beliefs and the same willingness to accept changing recommendations from experts. Throughout the course of my coding journey I have wondered why differences in age have not been explored alongside perceived conflict in scientific consensus.

While there are 6 variables in experiment 2 that can be considered, I am leaning most towards confidence in the scientific community. This is because of my personal experience observing the opinions of my older friends and relatives, who often display more scepticism towards nutritional and dietary advice given to them and towards developments in science more broadly. On the other hand, I tend to notice my younger friends and relatives being more open towards developments in science, and I personally am no stranger to the fact that disproving a theory is just as important as proving one, thanks to my psychology courses!

For this reason, I ask the question: is there a correlation between age and confidence in the scientific community? And more specifically, are older people less confident in the scientific community compared to their younger counterparts?

Descriptive statistics

I will CREATE a new variable using the mutate() function, which groups ages into 7 groups - WHY? I am interested in looking at the mean confidence scores for each age, and rather than looking at 73 rows of data, looking at groups of relative ages will help me get a gist of any trend that is present in the data
As per Jenny S’s advice, I will begin by looking at descriptive statistics, namely the mean, sd, n, and se of confidence scores for each age group using the group_by() and summarise() functions.
I will also implement the gt() function using the gt package to display these results in a pretty table.

A REMINDER OF THE SCALE OF THE DV: In exp 2, participants were tested on the degree to which they demonstrated a lack of CONFIDENCE IN THE SCIENTIFIC COMMUNITY: participants were asked “How much confidence would you say you have in the scientific community?”. They indicated their response on a 3-point scale (1 = a great deal of confidence; 2 = only some confidence; 3 = hardly any confidence at all), so a HIGHER score means LESS confidence.

mydata3b <- mutate(mydata2, age_group = cut_number(Age, n = 7))

age_confidence_means <- mydata3b %>%         # Specify data frame
  group_by(age_group) %>%                         # Specify group indicators
  summarise(Mean = mean(confidence),               # Specify column and functions:
            SD = sd(confidence),             
            n = n(),                        
            SE = SD/sqrt(n))

age_confidence_means %>%
  gt() %>% 
  tab_header(title = "Confidence in the Scientific Community (grouped by Age)")

Confidence in the Scientific Community (grouped by Age)
age_group	Mean	SD	n	SE
[18,21]	1.379310	0.5240727	58	0.06881411
(21,25]	1.375000	0.4879500	64	0.06099375
(25,29]	1.491803	0.5040817	61	0.06454105
(29,32]	1.392157	0.5321064	51	0.07450980
(32,37]	1.377358	0.4893644	53	0.06721937
(37,48]	1.568966	0.5335110	58	0.07005343
(48,73]	1.563636	0.5697185	55	0.07682082

It appears that with increasing age, the mean scores on the confidence scale increase slightly. It may be the case that age is moderately correlated with confidence, however we won’t know for sure until we conduct a statistical analysis.
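The uneven widths of the age groups in the table come from how cut_number() works: it picks break points so that each group holds roughly the same number of people, whereas cut_interval() makes groups of equal width. A small sketch on made-up ages (the `ages` vector is hypothetical and deliberately skewed towards younger people) shows the difference:

```r
library(ggplot2)  # provides cut_number() and cut_interval()

# Hypothetical, right-skewed ages: many people aged 18-30, few aged 31-73
ages <- c(rep(18:30, times = 6), 31:73)

# Equal-count groups: break points are quantiles of the data
equal_n <- cut_number(ages, n = 7)

# Equal-width groups: break points are evenly spaced between min and max
equal_width <- cut_interval(ages, n = 7)

table(equal_n)      # similar counts, uneven widths
table(equal_width)  # even widths, very uneven counts
```

Equal-count groups keep the standard errors of the group means comparable, at the cost of wide age bands for the older participants.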

Visualisation

geom_jitter() - I have chosen geom_jitter() over geom_point() because I had quite a bit of overplotting when using the latter. geom_jitter() adds a small amount of random noise to the data, spreading out points that would otherwise be overplotted. Since the confidence scale is non-continuous, jittering the data into the whitespace surrounding the discrete values allows the individual points to be seen.
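The amount of noise can also be controlled directly. The sketch below uses made-up data (`toy` is hypothetical) and restricts the jitter to the vertical direction, so ages stay exact while the three confidence values are spread by at most 0.15 either side. The specific `height` and `alpha` values are my own choices, not settings from the original plots.

```r
library(ggplot2)

# Made-up data standing in for Age and the 3-point confidence scale
set.seed(1)
toy <- data.frame(
  Age        = sample(18:73, 200, replace = TRUE),
  confidence = sample(1:3, 200, replace = TRUE)
)

# width = 0 leaves Age untouched; height = 0.15 spreads points vertically only
toy_plot <- ggplot(toy, aes(Age, confidence)) +
  geom_jitter(width = 0, height = 0.15, alpha = 0.4) +
  scale_y_continuous(breaks = 1:3)

print(toy_plot)
```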

#Assessing across all conditions
age_confidence_plot <- ggplot(mydata2, aes(Age, confidence)) +
  geom_jitter() +       
  geom_smooth(method = "lm") +
  scale_y_continuous(name = "Confidence in the Scientific Community") + 
  ggtitle(label = "Age and Confidence in the Scientific Community")

print(age_confidence_plot)

## `geom_smooth()` using formula 'y ~ x'

#Assessing by condition
age_confidence_plotbycondition <- ggplot(mydata2, aes(Age, confidence, fill = Conflict)) +
  geom_jitter() +
  geom_smooth(method = "lm") +
  facet_grid(vars(Format, Conflict)) + 
  scale_y_continuous(name = "Confidence in the Scientific Community") + 
  ggtitle(label = "Age and Confidence in the Scientific Community")

print(age_confidence_plotbycondition)

## `geom_smooth()` using formula 'y ~ x'

Again, it’s hard to tell whether there is an increase in scores as a function of age, but by the looks of the lines of best fit, they seem to slope which makes me think that there is a correlation to some degree!

I happened to notice that the two eldest participants, who were both 73 years of age, were both allocated to the Qualified-Conflict condition… perhaps ages were not distributed evenly across conditions?

Statistics

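I first tried cor.test() using the age_confidence_means data set, which produced a coefficient of 4.6. That approach does not take into account the variability within groups of people of the same age (and a Pearson's r cannot exceed 1, so that value cannot be a valid correlation), so I ran the test on the individual-level data instead: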

cor.test(mydata2$Age, mydata2$confidence)

## 
##  Pearson's product-moment correlation
## 
## data:  mydata2$Age and mydata2$confidence
## t = 2.714, df = 398, p-value = 0.006937
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03723952 0.22981325
## sample estimates:
##      cor 
## 0.134799

According to Pearson’s product-moment correlation test, there is a small but statistically significant positive correlation (r = .13, p = .007) between age and scores on the confidence scale. As a higher score on the scale reflects less confidence in the scientific community, the higher one’s age the lower their confidence in the scientific community, but only to a small degree.
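To see how the numbers in the cor.test() output fit together, Pearson's r can be recovered from the t statistic and degrees of freedom with r = t / sqrt(t^2 + df), and squaring it gives the proportion of variance shared between the two variables. The sketch below only uses the t and df values printed above.

```r
# Values copied from the cor.test() output
t_value <- 2.714
df      <- 398

# Pearson's r recovered from t and df
r_from_t <- t_value / sqrt(t_value^2 + df)

# Proportion of variance in confidence scores shared with age
r_squared <- r_from_t^2

round(c(r = r_from_t, r_squared = r_squared), 3)
```

An r of about .13 corresponds to under 2% shared variance, which is why the relationship is described as small even though it is statistically significant.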

3c. What is the relationship between the global beliefs measured in experiment 2?

The rationale behind the use of the 6 global measures seems to be underpinned by previous studies. For example, the nutritional confusion measure is a six-item scale previously established by Clark et al. (2019) to measure confusion about nutritional advice, and the mistrust of expertise measure is taken from Oliver and Rahn’s (2016) scale that assessed general skepticism of science and expert opinion.

While Haigh & Birch (2020) measured the internal reliability of the measures using Cronbach’s alpha (each indicated a good level of reliability (> 0.70)), the relationship between the five scales created by SEPARATE experimenters is yet to be established. Although neither Headline Conflict nor Headline Format had an effect on any of the six global measures, I feel that identifying correlations between scores on the different scales across all conditions will either justify or discredit the trend in results hypothesised by the experimenters.

Visualisation

Experimenters expected a common theme in scores of negative global beliefs, whereby conflicting headlines would induce a greater sense of general Nutritional Confusion, greater Nutritional Backlash, greater mistrust in expertise and lower confidence in the Scientific Community. I decided to plot Nutritional Confusion and Nutritional Backlash scores to get a sense of the relationship between these two negative beliefs. According to the experimenters, it should be a positive relationship.

confusion_backlash_plot <- ggplot(mydata2, aes(confusion, backlash)) +
  geom_jitter() +       
  geom_smooth(method = "lm")
print(confusion_backlash_plot)

## `geom_smooth()` using formula 'y ~ x'

I am also looking to visualise scores indicating sophisticated epistemic beliefs surrounding the certainty of scientific knowledge and the development of scientific knowledge. There should be a positive relationship between the two.

certainty_development_plot <- ggplot(mydata2, aes(certainty, development)) +
  geom_jitter() +       
  geom_smooth(method = "lm")
print(certainty_development_plot)

## `geom_smooth()` using formula 'y ~ x'

I am also interested to see how negative beliefs are related to positive beliefs. The experimenters proposed that those exposed to conflicting headlines would have stronger negative beliefs but also demonstrate more sophisticated beliefs that scientific knowledge is uncertain/constantly developing, and I struggle to grasp how both could hold. I alternatively hypothesise that those who display greater mistrust in expertise will LACK the awareness that scientific knowledge is uncertain.

mistrust_certainty_plot <- ggplot(mydata2, aes(mistrust, certainty)) +
  geom_jitter() +       
  geom_smooth(method = "lm")
print(mistrust_certainty_plot)

## `geom_smooth()` using formula 'y ~ x'

As I suspected, it appears that those who display greater mistrust in expertise lack the awareness that scientific knowledge is uncertain. The degree of the negative correlation seems to be weak/moderate.

Correlation matrix

To establish correlations between all 6 variables, I am creating a subset of the data to include the variables of interest using the select() function and piping the cor() function in to conduct the analysis
To display correlations in a visual manner, I am using the corrplot() function from the corrplot package and selecting the “color” method.

corr_data <- mydata2 %>%
  select(confusion, backlash, mistrust, confidence, certainty, development) %>% # variables of interest
  cor()    # piping cor() function in

corrplot(corr_data, method = "color", type = "lower")

A closer look: Pearson’s correlation coefficients and significance levels

using corrMatrix() function from the jmv package to examine linear relationships between all 6 continuous variables
- By default this function provides Pearson’s correlation coefficients (Pearson’s r values) and significance levels (p-value)
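Each cell of such a matrix is just a pairwise Pearson test. As a base R sketch of that idea, the example below uses the built-in mtcars data (a stand-in, not the experiment 2 data) to show that the coefficient from cor() for one pair of variables matches the estimate from cor.test(), which also supplies the p-value.

```r
# Stand-in variables from the built-in mtcars data, NOT the study data
vars_of_interest <- c("mpg", "hp", "wt")

# Correlation matrix of Pearson's r values (what corrplot() visualises)
r_matrix <- cor(mtcars[, vars_of_interest])

# The same coefficient for a single pair, plus its significance test
pair_test <- cor.test(mtcars$mpg, mtcars$wt)

round(r_matrix, 2)
pair_test$estimate   # Pearson's r for mpg and wt
pair_test$p.value    # p-value for that pair
```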

corrMatrix(mydata2, 
           vars(confusion, backlash, mistrust, confidence, certainty, development)) #a vector of strings naming the variables to correlate in data

## 
##  CORRELATION MATRIX
## 
##  Correlation Matrix                                                                                                  
##  ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
##                                  confusion     backlash      mistrust      confidence    certainty     development   
##  ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
##    confusion      Pearson's r             —                                                                          
##                   p-value                 —                                                                          
##                                                                                                                      
##    backlash       Pearson's r     0.4664511             —                                                            
##                   p-value        < .0000001             —                                                            
##                                                                                                                      
##    mistrust       Pearson's r     0.2594596     0.4602377             —                                              
##                   p-value         0.0000001    < .0000001             —                                              
##                                                                                                                      
##    confidence     Pearson's r     0.4092529     0.3327275     0.5122005             —                                
##                   p-value        < .0000001    < .0000001    < .0000001             —                                
##                                                                                                                      
##    certainty      Pearson's r    -0.0884529    -0.0514938    -0.2901025    -0.1170295             —                  
##                   p-value         0.0772306     0.3042629    < .0000001     0.0192159             —                  
##                                                                                                                      
##    development    Pearson's r    -0.0929888    -0.0987324    -0.2675103    -0.1186132     0.4253782              —   
##                   p-value         0.0631713     0.0484624    < .0000001     0.0176336    < .0000001              —   
##  ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

I am yet to interpret results, but I will be focusing on the strong correlations:
- Mistrust in Expertise and Confidence in the Scientific Community (r = .51)
- The certainty and development of scientific knowledge (r = .43)

I will also discuss the correlations between sophisticated epistemic beliefs about the certainty/development of scientific knowledge and Mistrust in Expertise. As I suspected, these are weak negative correlations (r = -.29 for certainty and r = -.27 for development).

Lauren’s Learning Log (9)

Lauren Abdallah

01/08/2021
