Coding goals for week 7

My challenging AND successful exploratory analysis attempts :D

Loading packages

#Loading relevant packages
library(qualtRics) #for reading data, filtering redundant rows and setting variables with numeric entries as 'numeric'
library(tidyverse) #for dplyr and ggplot
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggbeeswarm) #for a violin scatter plot
library(ggeasy) #for ggplot formatting shortcuts
library(patchwork) #for combining plots into a single output
library(gt) #for pretty tables
library(jmv) #for statistical analyses

Reading data

mydata <- read.csv("MyDataFinalSubset.csv")
mydata2 <- read.csv("MyDataFinalSubset2.csv")

Question 1. Does gender impact perceived contradiction, advancement, and/or confusion?

Conflict manipulation visualisation

In experiment 1, there was a significant effect of Headline Conflict on perceived contradiction, advancement and confusion: participants who saw conflicting headlines rated them as more contradictory and more confusing, and felt that we know less about how to be healthy, than those who saw the non-conflicting headlines. I am interested in how strongly Headline Conflict affects these perceptions for males and for females, and whether the effect is similar across genders.

  • USING the recode_factor() function to RENAME “1” to Male and “2” to Female in the Gender variable
    • We are assessing males and females, so no filtering is required for the mydata (Exp 1) dataset, as the sample consists only of males and females
  • CHANGING the ‘Gender’ variable to a factor using the as.factor() function (recode_factor() already returns a factor, so this step is only a safeguard)
  • CREATING our own function() in order to apply the same statements to all plots using a single chunk of code
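
For comparison, the recoding step in the first bullet can also be sketched in base R. This is only an illustration: the `gender_codes` vector below is made-up example data, not values from the survey file, and factor() stands in for recode_factor().

```r
# Hypothetical example codes standing in for mydata$Gender (1 = male, 2 = female)
gender_codes <- c(1, 2, 2, 1, 2)

# factor() maps each code in `levels` to the matching entry in `labels`
gender <- factor(gender_codes, levels = c(1, 2), labels = c("Male", "Female"))

levels(gender)     # level order follows the order of `labels`
is.factor(gender)  # the result is already a factor
table(gender)      # counts per level
```

Either way the result is a factor whose level order is set explicitly, which is what controls the left-to-right order of the facets in the plots.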
mydata$Gender <- recode_factor(mydata$Gender,
                                      "1" = "Male",       #old name = new name
                                      "2" = "Female")
mydata <- mydata %>%
  mutate(Gender=as.factor(Gender)) 

gender.fun <- function(y_var, plot_title, y_title, lim_1, lim_2)  {
  ggplot(mydata,aes(x = Conflict, y = y_var, fill = Conflict)) +
  geom_violin() +
  facet_wrap(vars(Gender), strip.position = "bottom") +
  stat_summary(fun.data = "mean_cl_normal", geom = "crossbar", fill = "white",
    alpha = .7) +
  geom_beeswarm(cex = 0.2) + 
  ggtitle(label = plot_title) +
  easy_center_title() +
  easy_remove_legend() +
  scale_x_discrete(name = NULL) +
  scale_y_continuous(name = y_title, limits = c(lim_1, lim_2)) +
  scale_fill_manual(values = c("slategray2", "lightpink1")) }

#Plotting Contradiction, Advancement and Confusion plots using function 
gender.contradiction.plot <- gender.fun(y_var = mydata$contradiction, plot_title = "Gender Differences in Perceived Conflict: Contradiction", y_title = "Perceived Contradiction", lim_1 = 1, lim_2= 30) 
gender.advancement.plot <- gender.fun(y_var = mydata$advancement, plot_title = "Gender Differences in Perceived Conflict: Advancement", y_title = "Perceived Scientific Advancement", lim_1 = -1, lim_2 = 1)
gender.confusion.plot <- gender.fun(y_var = mydata$confusion, plot_title = "Gender Differences in Perceived Conflict: Confusion", y_title = "Confusion", lim_1 = 1, lim_2 = 5)

print(gender.contradiction.plot)

print(gender.advancement.plot)

print(gender.confusion.plot)
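
One possible variation on the plotting function above, sketched under some assumptions: instead of passing a whole vector such as mydata$contradiction, the function takes the data frame and a bare column name, using ggplot2's support for the `{{ }}` operator inside aes(). The `toy` data frame and the name `gender_plot` are hypothetical, and the beeswarm and crossbar layers are left out so that only ggplot2 is needed.

```r
library(ggplot2)

# Hypothetical toy data standing in for mydata
toy <- data.frame(
  Gender   = rep(c("Male", "Female"), each = 20),
  Conflict = rep(c("Conf.", "Non-Conf."), times = 20),
  score    = rep(1:5, times = 8)
)

# {{ y_var }} lets the caller pass a bare column name, which ggplot
# then looks up inside `data` rather than taking an outside vector
gender_plot <- function(data, y_var, plot_title, y_title) {
  ggplot(data, aes(x = Conflict, y = {{ y_var }}, fill = Conflict)) +
    geom_violin() +
    facet_wrap(vars(Gender), strip.position = "bottom") +
    labs(title = plot_title, x = NULL, y = y_title) +
    scale_fill_manual(values = c("slategray2", "lightpink1")) +
    theme(legend.position = "none", plot.title = element_text(hjust = 0.5))
}

p <- gender_plot(toy, score, "Toy example", "Score")
print(p)
```

Because the column is looked up inside the data frame, the same function could be reused on a filtered or different dataset without the y values and the data getting out of step.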

From these plots, there does not appear to be a noticeable difference in scores between males and females on the perceived contradiction scale or on the confusion scale. In contrast, advancement scores appear to be higher for males than for females.

Looking at the group means directly gives an alternative, descriptive view of the scores for each gender.

Descriptives: Conflicting vs non-conflicting group means, according to gender

  • I am calculating the mean, sd, n, and se of contradiction/advancement/confusion scores for each conflict condition within each gender using the group_by() and summarise() functions.
  • I will also implement the gt() function using the gt package to display these results in a pretty table.

A REMINDER OF THE SCALE OF THE DV: In experiment 1, participants were tested on their perceived level of SCIENTIFIC ADVANCEMENT: participants were asked “When we take the results reported in these headlines together, do we now know more, less or the same as we did before about how to be healthy?”. They indicated their response on a 3-point scale (-1 = we know less, 0 = we know the same amount, 1 = we know more).

##Conflicting vs non-conflicting group means

#Perceived Contradiction: 
contradiction_means_gender <- mydata %>%         # Specify data frame
  group_by(Gender, Conflict) %>%            # Specify group indicators
  summarise(mean = mean(contradiction),         # Specify column and function
            sd = sd(contradiction),             
            n = n(),                        
            se = sd/sqrt(n))
## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.
contradiction_means_gender %>% 
  gt() %>% 
  tab_header(
    title = "Effect of Headline Conflict on Perceived Contradiction (grouped by Gender)")
Effect of Headline Conflict on Perceived Contradiction (grouped by Gender)
Gender   Conflict    mean       sd         n    se
Male     Conf.       24.96667   3.965814   60   0.5119844
Male     Non-Conf.   13.39394   3.666391   66   0.4513016
Female   Conf.       25.45977   3.556208   87   0.3812656
Female   Non-Conf.   13.49383   3.981593   81   0.4423993
#Perceived Scientific Advancement (PSA): 
advancement_means_gender <- mydata %>%         # Specify data frame
  group_by(Gender, Conflict) %>%            # Specify group indicators
  summarise(mean = mean(advancement),         # Specify column and function
            sd = sd(advancement),             
            n = n(),                        
            se = sd/sqrt(n))
## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.
advancement_means_gender %>% 
  gt() %>% 
  tab_header(
    title = "Effect of Headline Conflict on PSA (grouped by Gender)")
Effect of Headline Conflict on PSA (grouped by Gender)
Gender   Conflict    mean          sd          n    se
Male     Conf.       -0.10000000   0.7059073   60   0.09113224
Male     Non-Conf.    0.12121212   0.6682964   66   0.08226160
Female   Conf.       -0.34482759   0.5872202   87   0.06295662
Female   Non-Conf.   -0.08641975   0.6556968   81   0.07285520
#Confusion: 
confusion_means_gender <- mydata %>%         # Specify data frame
  group_by(Gender, Conflict) %>%            # Specify group indicators
  summarise(mean = mean(confusion),         # Specify column and function
            sd = sd(confusion),             
            n = n(),                        
            se = sd/sqrt(n)) 
## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.
confusion_means_gender %>%
  gt() %>% 
  tab_header(
    title = "Effect of Headline Conflict on Confusion (grouped by Gender)")
Effect of Headline Conflict on Confusion (grouped by Gender)
Gender   Conflict    mean       sd          n    se
Male     Conf.       4.566667   0.5928005   60   0.07653022
Male     Non-Conf.   3.575758   1.0962959   66   0.13494470
Female   Conf.       4.494253   0.6967293   87   0.07469722
Female   Non-Conf.   3.703704   1.0540926   81   0.11712139
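
The three summary chunks above differ only in the column being summarised, so they could be wrapped in one small helper. The sketch below is one way to do that, not the only one: `summarise_dv` and the `toy` data frame are hypothetical names, and `.groups = "drop_last"` simply spells out summarise()'s default behaviour, which silences the message while keeping the Gender grouping that gt() uses for its row groups.

```r
library(dplyr)

# Hypothetical toy data standing in for mydata
toy <- data.frame(
  Gender   = rep(c("Male", "Female"), each = 4),
  Conflict = rep(c("Conf.", "Non-Conf."), times = 4),
  score    = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Hypothetical helper: mean, sd, n and se of one DV per Gender x Conflict cell
summarise_dv <- function(data, dv) {
  data %>%
    group_by(Gender, Conflict) %>%
    summarise(mean = mean({{ dv }}),
              sd   = sd({{ dv }}),
              n    = n(),
              se   = sd / sqrt(n),
              .groups = "drop_last")  # explicit default: stay grouped by Gender
}

toy_summary <- summarise_dv(toy, score)
toy_summary
```

With the real data, the same call pattern would be summarise_dv(mydata, contradiction), summarise_dv(mydata, advancement) and summarise_dv(mydata, confusion), each of which can be piped into gt() as before.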

Statistics

Since the difference in advancement scores between males and females caught my attention, I want to compare the average advancement scores of females and males exposed to EACH level of the Headline Conflict factor: Conflict (Conf.) and Consistent (Non-Conf.).

  • I will do this using INDEPENDENT SAMPLES T-TESTS (one for each level) via the ttestIS() function in the jmv package

conflict <- mydata %>%
  filter(Conflict == "Conf.")

consistent <- mydata %>%
  filter(Conflict == "Non-Conf.")

ttestIS(formula = advancement ~ Gender, data = conflict)
## 
##  INDEPENDENT SAMPLES T-TEST
## 
##  Independent Samples T-Test                                           
##  ──────────────────────────────────────────────────────────────────── 
##                                  Statistic    df          p           
##  ──────────────────────────────────────────────────────────────────── 
##    advancement    Student's t     2.286083    145.0000    0.0236983   
##  ────────────────────────────────────────────────────────────────────
ttestIS(formula = advancement ~ Gender, data = consistent)
## 
##  INDEPENDENT SAMPLES T-TEST
## 
##  Independent Samples T-Test                                           
##  ──────────────────────────────────────────────────────────────────── 
##                                  Statistic    df          p           
##  ──────────────────────────────────────────────────────────────────── 
##    advancement    Student's t     1.893226    145.0000    0.0603203   
##  ────────────────────────────────────────────────────────────────────

Results from the t-tests reveal a statistically significant difference in perceived scientific advancement between males and females exposed to conflicting headlines (t(145) = 2.29, p = .024), but no significant difference between males and females exposed to the non-conflicting headlines (t(145) = 1.89, p = .060).
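
As a sanity check, the Student's t values in the output can be reproduced from the group means, SDs and ns in the PSA descriptives table. The helper name `pooled_t` is hypothetical; the formula is the standard equal-variance t statistic, and `d` is the mean difference divided by the pooled SD (one common definition of Cohen's d).

```r
# Student's t from summary statistics (equal variances assumed)
pooled_t <- function(m1, sd1, n1, m2, sd2, n2) {
  df <- n1 + n2 - 2
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)  # pooled SD
  se <- sp * sqrt(1 / n1 + 1 / n2)                         # SE of the difference
  t  <- (m1 - m2) / se
  list(t = t, df = df, p = 2 * pt(-abs(t), df), d = (m1 - m2) / sp)
}

# Values copied from the PSA descriptives table (males first, then females)
conf_t    <- pooled_t(-0.10000000, 0.7059073, 60, -0.34482759, 0.5872202, 87)
nonconf_t <- pooled_t( 0.12121212, 0.6682964, 66, -0.08641975, 0.6556968, 81)

round(conf_t$t, 3)     # 2.286, matching the ttestIS() output for Conf.
round(nonconf_t$t, 3)  # 1.893, matching the ttestIS() output for Non-Conf.
```

The `d` element gives a rough sense of how large each gender difference is, which the p-values alone do not show.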

WHAT DOES THIS MEAN? According to the group means calculated earlier, females exposed to conflicting headlines perceived that scientific knowledge had advanced to a lesser extent (M = -0.34) than males did (M = -0.10).
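
Two separate t-tests show the gender gap within each condition, but they do not directly test whether that gap differs between conditions. One possible follow-up is a Gender x Conflict interaction term. The sketch below uses base R's aov() on simulated data only; the `sim` data frame and its random responses are assumptions for illustration and say nothing about the real results.

```r
set.seed(1)  # simulated data only; not the study's responses

# Hypothetical data with the same factor labels as mydata
sim <- expand.grid(id = 1:30,
                   Gender = c("Male", "Female"),
                   Conflict = c("Conf.", "Non-Conf."))
sim$advancement <- sample(c(-1, 0, 1), nrow(sim), replace = TRUE)

# Gender * Conflict expands to both main effects plus their interaction
fit <- aov(advancement ~ Gender * Conflict, data = sim)
anova_table <- summary(fit)[[1]]
anova_table  # the "Gender:Conflict" row tests whether the gap differs by condition
```

With the real data the formula would stay the same and `data = mydata` would replace the simulated data frame.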

Question 2. Is there a correlation between age and confidence in the scientific community?

In the face of huge leaps in scientific findings surrounding nutrition and diet, it would be naive to say that all age groups hold the same beliefs and the same willingness to accept changing recommendations from experts. Throughout the course of my coding journey I have wondered why differences in age have not been explored alongside perceived conflict in scientific consensus.

While there are 6 variables in experiment 2 that can be considered, I am leaning most towards confidence in the scientific community. This is because of my personal experience observing the opinions of my older friends and relatives, who often display more skepticism towards nutritional and dietary advice given to them and towards developments in science more broadly. On the other hand, I tend to notice my younger friends and relatives being more open towards developments in science, and I personally am no stranger to the fact that disproving a theory is just as important as proving one, thanks to my psychology courses!

For this reason, I ask the question: is there a correlation between age and confidence in the scientific community? And more specifically, are older people less confident in the scientific community compared to their younger counterparts?

Descriptive statistics

  • As per Jenny S’s advice, I will begin by looking at descriptive statistics, namely the mean, sd, n, and se of confidence scores for each age in the sample, using the group_by() and summarise() functions.
  • I will also implement the gt() function using the gt package to display these results in a pretty table.

A REMINDER OF THE SCALE OF THE DV: In exp 2, participants were tested on the degree to which they demonstrated a lack of CONFIDENCE IN THE SCIENTIFIC COMMUNITY: participants were asked “How much confidence would you say you have in the scientific community?”. They indicated their response on a 3-point scale (1 = a great deal of confidence; 2 = only some confidence; 3 = hardly any confidence at all).

age_confidence_means <- mydata2 %>%         # Specify data frame
  group_by(Age) %>%                         # Specify group indicators
  summarise(mean = mean(GSS),               # Specify column and functions:
            sd = sd(GSS),             
            n = n(),                        
            se = sd/sqrt(n))

age_confidence_means %>%
  gt() %>% 
  tab_header(title = "Confidence in the Scientific Community (grouped by Age)")
Confidence in the Scientific Community (grouped by Age)
Age mean sd n se
18 1.250000 0.4522670 12 0.1305582
19 1.272727 0.4670994 11 0.1408358
20 1.526316 0.6117753 19 0.1403509
21 1.375000 0.5000000 16 0.1250000
22 1.263158 0.4524139 19 0.1037909
23 1.375000 0.5175492 8 0.1829813
24 1.380952 0.4976134 21 0.1085881
25 1.500000 0.5163978 16 0.1290994
26 1.428571 0.5070926 21 0.1106567
27 1.692308 0.4803845 13 0.1332347
28 1.375000 0.5175492 8 0.1829813
29 1.473684 0.5129892 19 0.1176878
30 1.529412 0.6242643 17 0.1514063
31 1.411765 0.5072997 17 0.1230382
32 1.235294 0.4372373 17 0.1060456
33 1.428571 0.5345225 7 0.2020305
34 1.545455 0.5222330 11 0.1574592
35 1.250000 0.4522670 12 0.1305582
36 1.300000 0.4830459 10 0.1527525
37 1.384615 0.5063697 13 0.1404417
38 1.250000 0.5000000 4 0.2500000
39 1.666667 0.5773503 3 0.3333333
40 1.333333 0.5773503 3 0.3333333
41 1.800000 0.4472136 5 0.2000000
42 1.818182 0.6030227 11 0.1818182
43 1.714286 0.4879500 7 0.1844278
44 1.500000 0.5773503 4 0.2886751
45 1.250000 0.5000000 4 0.2500000
46 1.428571 0.5345225 7 0.2020305
47 1.500000 0.5773503 4 0.2886751
48 1.500000 0.5477226 6 0.2236068
49 1.200000 0.4472136 5 0.2000000
50 1.166667 0.4082483 6 0.1666667
51 1.750000 0.5000000 4 0.2500000
52 1.250000 0.5000000 4 0.2500000
53 1.800000 0.4472136 5 0.2000000
54 2.000000 NA 1 NA
55 2.000000 NA 1 NA
56 1.750000 0.5000000 4 0.2500000
57 1.500000 0.7071068 2 0.5000000
58 1.666667 1.1547005 3 0.6666667
59 1.500000 0.7071068 2 0.5000000
60 1.500000 0.7071068 2 0.5000000
61 2.000000 0.0000000 2 0.0000000
62 2.000000 1.0000000 3 0.5773503
63 1.333333 0.5773503 3 0.3333333
64 1.500000 0.5773503 4 0.2886751
65 1.500000 0.7071068 2 0.5000000
72 2.000000 NA 1 NA
73 2.000000 NA 1 NA

With increasing age, the mean scores on the confidence scale appear to increase (remember that a higher score means LESS confidence). Age may be moderately or even strongly correlated with confidence scores; however, we won’t know for sure until we conduct a statistical analysis.

Visualisation

#Assessing across all conditions
age_confidence_plot <- ggplot(mydata2, aes(Age, GSS)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_y_continuous(name = "Confidence in the Scientific Community") + 
  ggtitle(label = "Age and Confidence in the Scientific Community")

print(age_confidence_plot)
## `geom_smooth()` using formula 'y ~ x'

#Assessing by condition
age_confidence_plotbycondition <- ggplot(mydata2, aes(Age, GSS, fill = Conflict)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(vars(Format, Conflict)) + 
  scale_y_continuous(name = "Confidence in the Scientific Community") + 
  ggtitle(label = "Age and Confidence in the Scientific Community")

print(age_confidence_plotbycondition)
## `geom_smooth()` using formula 'y ~ x'

Again, it’s hard to tell from the points alone whether scores change as a function of age, but the lines of best fit do not look flat, which makes me think that there is a correlation to some degree!

Statistics

cor.test(age_confidence_means$Age, age_confidence_means$mean)
## 
##  Pearson's product-moment correlation
## 
## data:  age_confidence_means$Age and age_confidence_means$mean
## t = 4.1579, df = 48, p-value = 0.0001318
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2757365 0.6935906
## sample estimates:
##       cor 
## 0.5145892

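According to Pearson’s product-moment correlation test, there is a moderate positive correlation (r = 0.51, p < .001) between age and scores on the confidence scale. Because a higher score on the scale reflects LESS confidence in the scientific community, this means that the higher one’s age, the lower their confidence in the scientific community. Note that this test was run on the mean score at each age (50 data points) rather than on individual participants, so the relationship at the level of individual responses is likely to be weaker.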
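
The cor.test() call above correlates the per-age means in age_confidence_means, not the individual responses. The sketch below uses simulated data (all values and the `sim` data frame are assumptions, not the study's responses) to illustrate why that matters: averaging within each age removes much of the person-to-person noise, so the correlation across means tends to be larger than the correlation across individuals.

```r
set.seed(42)  # simulated data only; not the study's responses

# Hypothetical individual-level data: a weak age trend plus a lot of noise
n   <- 2000
sim <- data.frame(age = sample(18:65, n, replace = TRUE))
sim$score <- 1 + 0.01 * (sim$age - 18) + rnorm(n, sd = 0.5)

# Correlation across individuals (one row per participant)
r_individual <- cor(sim$age, sim$score)

# Correlation across per-age means (one row per age, as in age_confidence_means)
age_means <- aggregate(score ~ age, data = sim, FUN = mean)
r_means   <- cor(age_means$age, age_means$score)

c(individual = r_individual, means = r_means)

# Rank-based alternative that may suit a 3-point ordinal scale better
rho <- cor(sim$age, sim$score, method = "spearman")

# With the real data, the individual-level test would be:
# cor.test(mydata2$Age, mydata2$GSS)
```

Running the individual-level test on mydata2 would show how much of the r = 0.51 survives once every participant counts as one data point.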

The next steps