My challenging AND successful exploratory analysis attempts :D

Loading packages

#Loading relevant packages
library(qualtRics) #for reading data, filtering redundant rows and setting variables with numeric entries as 'numeric'
library(tidyverse) #for dplyr and ggplot

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggbeeswarm) #for a violin scatter plot
library(ggeasy) #for ggplot formatting shortcuts
library(patchwork) #for combining plots into a single output
library(gt) #for pretty tables
library(jmv) #for statistical analyses

Reading data

mydata <- read.csv("MyDataFinalSubset.csv")
mydata2 <- read.csv("MyDataFinalSubset2.csv")

Question 1. Does gender impact perceived contradiction, advancement, and/or confusion?

Conflict manipulation visualisation

In experiment 1, there was a significant effect of Headline Conflict on perceived contradiction/advancement/confusion (those who saw conflicting headlines felt they were more contradictory and more confusing, and that they left us knowing less about how to be healthy, than those who saw the non-conflicting headlines). I am interested in the degree to which Headline Conflict impacts perceived conflict for males and females, and whether this effect is similar between the sexes.

Using the recode_factor() function to RENAME “1” to “Male” and “2” to “Female” in the Gender variable.
- We are assessing males and females, so no filtering is required for the mydata (Exp 1) dataset, as the sample only consists of males and females.
Making sure the ‘Gender’ variable is stored as a factor using the as.factor() function (recode_factor() already returns a factor, so this step is a safeguard).
CREATING our own function() so that the same plotting statements can be applied to all three plots using a single chunk of code.

mydata$Gender <- recode_factor(mydata$Gender,
                               "1" = "Male",       #old name = new name
                               "2" = "Female")
mydata <- mydata %>%
  mutate(Gender = as.factor(Gender))
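To double-check the claim that recode_factor() already hands back a factor, here is a tiny sketch on a made-up vector of codes (a toy stand-in for mydata$Gender as read.csv() would load it, not the real data):

```r
library(dplyr)

# Toy stand-in for mydata$Gender: integer-like codes, as read.csv() loads them
gender_codes <- c(1, 2, 2, 1)

# Named arguments are matched against the numeric values ("1" matches 1)
gender <- recode_factor(gender_codes, "1" = "Male", "2" = "Female")

is.factor(gender)                     # recode_factor() already returns a factor
levels(gender)                        # levels follow the order of the replacements
identical(as.factor(gender), gender)  # so as.factor() leaves it unchanged
```

Because the levels follow the order of the replacements, “Male” is the first level, which is also why the plots and tables list males before females.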

gender.fun <- function(y_var, plot_title, y_title, lim_1, lim_2) {
  # y_var is a column name (string); .data[[y_var]] looks it up inside mydata
  ggplot(mydata, aes(x = Conflict, y = .data[[y_var]], fill = Conflict)) +
    geom_violin() +
    facet_wrap(vars(Gender), strip.position = "bottom") +
    # mean + 95% CI crossbar; "mean_cl_normal" needs the Hmisc package installed
    stat_summary(fun.data = "mean_cl_normal", geom = "crossbar", fill = "white",
                 alpha = .7) +
    geom_beeswarm(cex = 0.2) +
    ggtitle(label = plot_title) +
    easy_center_title() +
    easy_remove_legend() +
    scale_x_discrete(name = NULL) +
    scale_y_continuous(name = y_title, limits = c(lim_1, lim_2)) +
    scale_fill_manual(values = c("slategray2", "lightpink1"))
}

#Plotting Contradiction, Advancement and Confusion plots using the function
gender.contradiction.plot <- gender.fun(y_var = "contradiction", plot_title = "Gender Differences in Perceived Conflict: Contradiction", y_title = "Perceived Contradiction", lim_1 = 1, lim_2 = 30)
gender.advancement.plot <- gender.fun(y_var = "advancement", plot_title = "Gender Differences in Perceived Conflict: Advancement", y_title = "Perceived Scientific Advancement", lim_1 = -1, lim_2 = 1)
gender.confusion.plot <- gender.fun(y_var = "confusion", plot_title = "Gender Differences in Perceived Conflict: Confusion", y_title = "Confusion", lim_1 = 1, lim_2 = 5)
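The white crossbars come from stat_summary(fun.data = "mean_cl_normal"), which (as far as I understand it) wraps Hmisc's smean.cl.normal() and draws the mean with a t-based 95% confidence interval. A minimal base-R sketch of that calculation; mean_ci_normal() is my own hypothetical name, not a ggplot2 function:

```r
# Hypothetical helper: mean +/- t-quantile * standard error (t-based CI)
mean_ci_normal <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                       # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                   # standard error of the mean
  half_width <- qt((1 + conf) / 2, df = n - 1) * se
  data.frame(y = mean(x),                 # middle line of the crossbar
             ymin = mean(x) - half_width, # bottom of the crossbar
             ymax = mean(x) + half_width) # top of the crossbar
}

mean_ci_normal(c(1, 2, 3, 4, 5))
```

This is the same mean and se that appear in the descriptive tables, with the se stretched by a t quantile to give the interval.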

print(gender.contradiction.plot)

print(gender.advancement.plot)

print(gender.confusion.plot)

From these plots, there does not appear to be a noticeable difference in scores between males and females on either the perceived contradiction scale or the confusion scale. In contrast, advancement scores appear to be higher for males than for females.

Looking at the group means directly offers a complementary, descriptive way to compare scores between genders.

Descriptives: Conflicting vs non. conflicting group means, according to gender

I am calculating the mean, sd, n, and se of contradiction/advancement/confusion scores for each conflict condition within each gender using the group_by() and summarise() functions.
I will also use the gt() function from the gt package to display these results in a pretty table.

A REMINDER OF THE SCALE OF THE DV: In experiment 1, participants were tested on their perceived level of SCIENTIFIC ADVANCEMENT: participants were asked “When we take the results reported in these headlines together, do we now know more, less or the same as we did before about how to be healthy?”. They indicated their response on a 3-point scale (-1 = we know less, 0 = we know the same amount, 1 = we know more).

##Conflicting vs non. conflicting group means

#Perceived Contradiction: 
contradiction_means_gender <- mydata %>%         # Specify data frame
  group_by(Gender, Conflict) %>%            # Specify group indicators
  summarise(mean = mean(contradiction),         # Specify column and function
            sd = sd(contradiction),             
            n = n(),                        
            se = sd/sqrt(n))

## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.

contradiction_means_gender %>% 
  gt() %>% 
  tab_header(
    title = "Effect of Headline Conflict on Perceived Contradiction (grouped by Gender)")

Effect of Headline Conflict on Perceived Contradiction (grouped by Gender)
Gender	Conflict	mean	sd	n	se
Male	Conf.	24.96667	3.965814	60	0.5119844
Male	Non-Conf.	13.39394	3.666391	66	0.4513016
Female	Conf.	25.45977	3.556208	87	0.3812656
Female	Non-Conf.	13.49383	3.981593	81	0.4423993

#Perceived Scientific Advancement (PSA): 
advancement_means_gender <- mydata %>%         # Specify data frame
  group_by(Gender, Conflict) %>%            # Specify group indicators
  summarise(mean = mean(advancement),         # Specify column and function
            sd = sd(advancement),             
            n = n(),                        
            se = sd/sqrt(n))

## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.

advancement_means_gender %>% 
  gt() %>% 
  tab_header(
    title = "Effect of Headline Conflict on PSA (grouped by Gender)")

Effect of Headline Conflict on PSA (grouped by Gender)
Gender	Conflict	mean	sd	n	se
Male	Conf.	-0.10000000	0.7059073	60	0.09113224
Male	Non-Conf.	0.12121212	0.6682964	66	0.08226160
Female	Conf.	-0.34482759	0.5872202	87	0.06295662
Female	Non-Conf.	-0.08641975	0.6556968	81	0.07285520

#Confusion: 
confusion_means_gender <- mydata %>%         # Specify data frame
  group_by(Gender, Conflict) %>%            # Specify group indicators
  summarise(mean = mean(confusion),         # Specify column and function
            sd = sd(confusion),             
            n = n(),                        
            se = sd/sqrt(n))

## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.

confusion_means_gender %>%
  gt() %>% 
  tab_header(
    title = "Effect of Headline Conflict on Confusion (grouped by Gender)")

Effect of Headline Conflict on Confusion (grouped by Gender)
Gender	Conflict	mean	sd	n	se
Male	Conf.	4.566667	0.5928005	60	0.07653022
Male	Non-Conf.	3.575758	1.0962959	66	0.13494470
Female	Conf.	4.494253	0.6967293	87	0.07469722
Female	Non-Conf.	3.703704	1.0540926	81	0.11712139
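The three summarise() chunks above differ only in the column being summarised. One possible refactor, using a hypothetical helper summarise_dv() of my own and shown on invented toy data rather than mydata, takes the column name as a string. It also adds .groups = "drop", which ungroups the result and stops the “`summarise()` has grouped output” message:

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis)
summarise_dv <- function(data, dv) {
  data %>%
    group_by(Gender, Conflict) %>%
    summarise(mean = mean(.data[[dv]]),   # dv is a column name, e.g. "advancement"
              sd = sd(.data[[dv]]),
              n = n(),
              se = sd / sqrt(n),          # uses the sd and n columns created above
              .groups = "drop")           # return an ungrouped tibble
}

# Toy stand-in for mydata (values invented for illustration)
toy <- tibble(
  Gender = factor(c("Male", "Male", "Female", "Female", "Male", "Female")),
  Conflict = c("Conf.", "Non-Conf.", "Conf.", "Non-Conf.", "Conf.", "Conf."),
  advancement = c(-1, 1, 0, -1, 0, -1)
)

toy_summary <- summarise_dv(toy, "advancement")
toy_summary
```

With the real data, the three tables would then come from summarise_dv(mydata, "contradiction"), summarise_dv(mydata, "advancement") and summarise_dv(mydata, "confusion"), each piped into gt() as before.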

Statistics

Since the difference in advancement scores between males and females caught my attention, I want to compare the average advancement scores of females and males at EACH level of the Headline Conflict factor: Conflict (Conf.) and Consistent (Non-Conf.). I will do this using an INDEPENDENT SAMPLES T-TEST for each level, via the ttestIS() function in the jmv package.

conflict <- mydata %>%
  filter(Conflict == "Conf.")

consistent <- mydata %>%
  filter(Conflict == "Non-Conf.")

ttestIS(formula = advancement ~ Gender, data = conflict)

## 
##  INDEPENDENT SAMPLES T-TEST
## 
##  Independent Samples T-Test                                           
##  ──────────────────────────────────────────────────────────────────── 
##                                  Statistic    df          p           
##  ──────────────────────────────────────────────────────────────────── 
##    advancement    Student's t     2.286083    145.0000    0.0236983   
##  ────────────────────────────────────────────────────────────────────

ttestIS(formula = advancement ~ Gender, data = consistent)

## 
##  INDEPENDENT SAMPLES T-TEST
## 
##  Independent Samples T-Test                                           
##  ──────────────────────────────────────────────────────────────────── 
##                                  Statistic    df          p           
##  ──────────────────────────────────────────────────────────────────── 
##    advancement    Student's t     1.893226    145.0000    0.0603203   
##  ────────────────────────────────────────────────────────────────────
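The jmv output labels its statistic “Student's t”, i.e. the pooled-variance version of the test. As a cross-check, here is a minimal sketch on invented toy data (not the real conflict subset) showing that base R's t.test() with var.equal = TRUE computes that pooled statistic. I would expect the commented-out calls on the real subsets to reproduce the jmv numbers, but I have not run them here:

```r
# Toy stand-in for the `conflict` subset (values invented for illustration)
toy <- data.frame(
  Gender = factor(c(rep("Male", 5), rep("Female", 5)),
                  levels = c("Male", "Female")),
  advancement = c(1, 0, 0, -1, 1, -1, -1, 0, -1, 0)
)

# var.equal = TRUE gives Student's (pooled-variance) t rather than Welch's t;
# the statistic is mean(Male) - mean(Female) divided by the pooled SE
res <- t.test(advancement ~ Gender, data = toy, var.equal = TRUE)
res

# With the real data (not run here):
# t.test(advancement ~ Gender, data = conflict, var.equal = TRUE)
# t.test(advancement ~ Gender, data = consistent, var.equal = TRUE)
```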

Results from the t-tests reveal a statistically significant difference in perceived scientific advancement between males and females exposed to conflicting headlines (t(145) = 2.29, p = .024), but no significant difference between males and females exposed to non-conflicting headlines (t(145) = 1.89, p = .060).

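WHAT DOES THIS MEAN? According to the group means calculated earlier, females exposed to conflicting headlines rated scientific knowledge as having advanced less (M = -0.34) than males did (M = -0.10). Both means are negative, so on average both groups leaned towards “we know less”, but females did so more strongly.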
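ttestIS() reports t, df and p, but no effect size. Since the group summaries are already in the PSA table, the Student's t-test can be recomputed from them, together with Cohen's d (difference in means divided by the pooled SD). A minimal sketch; the helper name pooled_t_and_d() is my own, and Cohen's d is an extra that was not part of the original analysis:

```r
# Summary values copied from the PSA table above
male_conf   <- list(mean = -0.10000000, sd = 0.7059073, n = 60)
female_conf <- list(mean = -0.34482759, sd = 0.5872202, n = 87)
male_nc     <- list(mean =  0.12121212, sd = 0.6682964, n = 66)
female_nc   <- list(mean = -0.08641975, sd = 0.6556968, n = 81)

# Hypothetical helper: Student's t and Cohen's d from summary statistics
pooled_t_and_d <- function(a, b) {
  df   <- a$n + b$n - 2
  sp   <- sqrt(((a$n - 1) * a$sd^2 + (b$n - 1) * b$sd^2) / df)  # pooled SD
  diff <- a$mean - b$mean
  t    <- diff / (sp * sqrt(1 / a$n + 1 / b$n))
  list(t = t, df = df, p = 2 * pt(-abs(t), df), d = diff / sp)
}

conf_result <- pooled_t_and_d(male_conf, female_conf)
nc_result   <- pooled_t_and_d(male_nc, female_nc)
conf_result
nc_result
```

For the conflicting headlines this recovers t ≈ 2.29 on 145 df, matching the jmv output, with d ≈ 0.38; for the non-conflicting headlines it recovers t ≈ 1.89.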

Question 2. Is there a correlation between age and confidence in the scientific community?

In the face of huge leaps in scientific findings surrounding nutrition and diet, it would be naive to say that all age groups hold the same beliefs and the same willingness to accept changing recommendations from experts. Throughout the course of my coding journey I have wondered why differences in age have not been explored alongside perceived conflict in scientific consensus.

While there are 6 variables in experiment 2 that could be considered, I am leaning most towards confidence in the scientific community. This is because of my personal experience observing the opinions of my older friends and relatives, who often display more skepticism towards the nutritional and dietary advice given to them, and towards developments in science more broadly. On the other hand, I tend to notice my younger friends and relatives being more open towards developments in science, and thanks to my psychology courses I am personally no stranger to the idea that disproving a theory is just as important as proving one!

This raises the question: is there a correlation between age and confidence in the scientific community? More specifically, are older people less confident in the scientific community than their younger counterparts?

Descriptive statistics

As per Jenny S’s advice, I will begin by looking at descriptive statistics, namely the mean, sd, n, and se of confidence scores for each age present in the sample, using the group_by() and summarise() functions.
I will also use the gt() function from the gt package to display these results in a pretty table.

A REMINDER OF THE SCALE OF THE DV: In experiment 2, participants’ CONFIDENCE IN THE SCIENTIFIC COMMUNITY was measured by asking “How much confidence would you say you have in the scientific community?”. They indicated their response on a 3-point scale (1 = a great deal of confidence; 2 = only some confidence; 3 = hardly any confidence at all), so a HIGHER score means LESS confidence.
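Because a higher GSS score means LESS confidence, a positive correlation with this variable is easy to misread. One option (my own suggestion, not part of the original analysis) is to reverse-score it so that higher means more confidence; on a 1 to 3 scale that is 4 - GSS, which flips the sign of any correlation without changing its size. A minimal sketch with invented toy values:

```r
# Toy GSS responses: 1 = a great deal, 2 = only some, 3 = hardly any confidence
gss <- c(1, 2, 3, 1, 2)
age <- c(20, 30, 40, 50, 60)   # invented ages, for illustration only

# Reverse-score: 1 -> 3, 2 -> 2, 3 -> 1, so higher = MORE confidence
confidence <- 4 - gss
confidence

# Reverse-scoring flips the sign of the correlation but not its magnitude
cor(age, gss)
cor(age, confidence)
```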

age_confidence_means <- mydata2 %>%         # Specify data frame
  group_by(Age) %>%                         # Specify group indicators
  summarise(mean = mean(GSS),               # Specify column and functions:
            sd = sd(GSS),             
            n = n(),                        
            se = sd/sqrt(n))

age_confidence_means %>%
  gt() %>% 
  tab_header(title = "Confidence in the Scientific Community (grouped by Age)")

Confidence in the Scientific Community (grouped by Age)
Age	mean	sd	n	se
18	1.250000	0.4522670	12	0.1305582
19	1.272727	0.4670994	11	0.1408358
20	1.526316	0.6117753	19	0.1403509
21	1.375000	0.5000000	16	0.1250000
22	1.263158	0.4524139	19	0.1037909
23	1.375000	0.5175492	8	0.1829813
24	1.380952	0.4976134	21	0.1085881
25	1.500000	0.5163978	16	0.1290994
26	1.428571	0.5070926	21	0.1106567
27	1.692308	0.4803845	13	0.1332347
28	1.375000	0.5175492	8	0.1829813
29	1.473684	0.5129892	19	0.1176878
30	1.529412	0.6242643	17	0.1514063
31	1.411765	0.5072997	17	0.1230382
32	1.235294	0.4372373	17	0.1060456
33	1.428571	0.5345225	7	0.2020305
34	1.545455	0.5222330	11	0.1574592
35	1.250000	0.4522670	12	0.1305582
36	1.300000	0.4830459	10	0.1527525
37	1.384615	0.5063697	13	0.1404417
38	1.250000	0.5000000	4	0.2500000
39	1.666667	0.5773503	3	0.3333333
40	1.333333	0.5773503	3	0.3333333
41	1.800000	0.4472136	5	0.2000000
42	1.818182	0.6030227	11	0.1818182
43	1.714286	0.4879500	7	0.1844278
44	1.500000	0.5773503	4	0.2886751
45	1.250000	0.5000000	4	0.2500000
46	1.428571	0.5345225	7	0.2020305
47	1.500000	0.5773503	4	0.2886751
48	1.500000	0.5477226	6	0.2236068
49	1.200000	0.4472136	5	0.2000000
50	1.166667	0.4082483	6	0.1666667
51	1.750000	0.5000000	4	0.2500000
52	1.250000	0.5000000	4	0.2500000
53	1.800000	0.4472136	5	0.2000000
54	2.000000	NA	1	NA
55	2.000000	NA	1	NA
56	1.750000	0.5000000	4	0.2500000
57	1.500000	0.7071068	2	0.5000000
58	1.666667	1.1547005	3	0.6666667
59	1.500000	0.7071068	2	0.5000000
60	1.500000	0.7071068	2	0.5000000
61	2.000000	0.0000000	2	0.0000000
62	2.000000	1.0000000	3	0.5773503
63	1.333333	0.5773503	3	0.3333333
64	1.500000	0.5773503	4	0.2886751
65	1.500000	0.7071068	2	0.5000000
72	2.000000	NA	1	NA
73	2.000000	NA	1	NA

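It appears that mean scores on the confidence scale increase with age (remembering that a higher score means LESS confidence). Age may be moderately or even strongly correlated with confidence, but we won’t know for sure until we conduct a statistical analysis. Many ages are represented by only a handful of participants (n = 1 for ages 54, 55, 72 and 73, which is why their sd and se are NA), so individual age means should be read with caution.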
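One way to steady those noisy per-age means is to pool ages into wider bands before summarising. A minimal sketch on invented toy data standing in for mydata2; the band break points are an arbitrary assumption of mine, not something from the original study:

```r
library(dplyr)

# Toy stand-in for mydata2 (ages and GSS scores invented for illustration)
toy2 <- tibble(
  Age = c(18, 22, 25, 31, 38, 44, 52, 58, 63, 72),
  GSS = c(1, 1, 2, 1, 1, 2, 2, 1, 2, 2)
)

# Hypothetical age bands; cut() uses right-closed intervals such as (17, 29]
band_means <- toy2 %>%
  mutate(AgeBand = cut(Age, breaks = c(17, 29, 39, 49, 59, Inf),
                       labels = c("18-29", "30-39", "40-49", "50-59", "60+"))) %>%
  group_by(AgeBand) %>%
  summarise(mean = mean(GSS), n = n(), .groups = "drop")

band_means
```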

Visualisation

#Assessing across all conditions
age_confidence_plot <- ggplot(mydata2, aes(Age, GSS)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_y_continuous(name = "Confidence in the Scientific Community") + 
  ggtitle(label = "Age and Confidence in the Scientific Community")

print(age_confidence_plot)

## `geom_smooth()` using formula 'y ~ x'

#Assessing by condition
age_confidence_plotbycondition <- ggplot(mydata2, aes(Age, GSS, fill = Conflict)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(vars(Format, Conflict)) + 
  scale_y_continuous(name = "Confidence in the Scientific Community") + 
  ggtitle(label = "Age and Confidence in the Scientific Community")

print(age_confidence_plotbycondition)

## `geom_smooth()` using formula 'y ~ x'

It’s hard to tell from the raw points whether scores increase with age, because GSS only takes the values 1, 2 and 3, so many points sit on top of each other. However, by the looks of the lines of best fit, they seem to slope, which makes me think that there is a correlation to some degree!
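A small vertical jitter is one common way to separate the stacked points on a 3-point scale. A minimal sketch on invented toy data (not mydata2); the jitter height of 0.1 is my own choice, and it only moves the drawn points, so the line of best fit is still fitted to the raw scores:

```r
library(ggplot2)

# Toy stand-in for mydata2 (values invented for illustration)
toy2 <- data.frame(
  Age = c(18, 22, 25, 31, 38, 44, 52, 58, 63, 72),
  GSS = c(1, 1, 2, 1, 1, 2, 2, 1, 2, 2)
)

p <- ggplot(toy2, aes(Age, GSS)) +
  # jitter vertically only, so ages stay exact and points at 1/2/3 separate
  geom_jitter(width = 0, height = 0.1, alpha = 0.5) +
  # formula given explicitly, which also silences the "using formula" message
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_y_continuous(name = "Confidence in the Scientific Community",
                     breaks = 1:3)

p
```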

Statistics

cor.test(age_confidence_means$Age, age_confidence_means$mean)

## 
##  Pearson's product-moment correlation
## 
## data:  age_confidence_means$Age and age_confidence_means$mean
## t = 4.1579, df = 48, p-value = 0.0001318
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2757365 0.6935906
## sample estimates:
##       cor 
## 0.5145892

According to Pearson’s product-moment correlation test, there is a moderate positive correlation (r = 0.51, 95% CI [0.28, 0.69], p < .001) between age and the mean confidence score at each age. Because a higher score on the scale reflects LESS confidence in the scientific community, this suggests that the older the age group, the lower its confidence in the scientific community. Note that the test was run on the 50 per-age means (hence df = 48), not on the scores of individual participants.
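Correlating per-age means treats an age with one participant the same as an age with 21, and averaging tends to smooth out individual variation. Testing the participant-level scores directly, e.g. cor.test(mydata2$Age, mydata2$GSS), is the more usual approach; since GSS is an ordinal 3-point scale, Spearman’s rank correlation may also be worth reporting. A minimal sketch on invented toy data, so it says nothing about what the real result would be:

```r
# Toy stand-in for mydata2 (values invented for illustration)
toy2 <- data.frame(
  Age = c(18, 22, 25, 31, 38, 44, 52, 58, 63, 72),
  GSS = c(1, 1, 2, 1, 1, 2, 2, 1, 2, 2)
)

# Pearson correlation on the raw, participant-level scores
pearson <- cor.test(toy2$Age, toy2$GSS)

# Spearman's rank correlation treats GSS as ordinal; exact = FALSE because
# a 3-point scale produces many ties
spearman <- cor.test(toy2$Age, toy2$GSS, method = "spearman", exact = FALSE)

pearson$estimate
spearman$estimate

# With the real data (not run here):
# cor.test(mydata2$Age, mydata2$GSS)
# cor.test(mydata2$Age, mydata2$GSS, method = "spearman", exact = FALSE)
```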

Lauren’s Learning Log (8)

Lauren Abdallah

25/07/2021

Coding goals for week 7
