This week’s coding goals

This week’s main goal was to complete the first exploratory analysis question for the verification report.

Achieving the goals

For my first exploratory analysis question, I was interested in whether there were any gender differences in the level of dogmatism. Compared to males, female political strategies are typically aimed at creating social relationships (Geary, 2021). Hence, females may display a lower level of dogmatism, since lower dogmatism could make it easier to build those relationships.

To do this, I first loaded all the appropriate libraries and formatted the data following the Harris and van Bavel (2021) article. Since the question looks at dogmatism, I wrote code to calculate the mean Dogmatism score for each participant. I then mean centered the Dogmatism scores following the code used in Harris and van Bavel (2021).

library(tidyverse)
library(dplyr)
library(ggplot2)
library(car)
library(ggeasy)
library(gt)
library(sjlabelled)
data <- read.csv("beliefsuperiority_all.csv")
data <- filter(data,Q62 == 1)

data_attn= filter(data,AC_a==3) %>% 
  filter(AC_b==5)

data_attn=dplyr::select(data_attn,-starts_with('AC'))

#reverse-score these dogmatism items (1-9 scale; 5 stays as 5)
#car::recode is called explicitly because dplyr also has a recode() function
reverse_items = paste0('Q37_', c(2, 4, 5, 7, 10, 11, 13, 16, 18, 19))
for (item in reverse_items) {
  data_attn[[item]] = car::recode(data_attn[[item]], '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
}

dogscale=dplyr::select(data_attn,starts_with('Q37'))
data_attn$meanDog=rowMeans(dogscale,na.rm = TRUE)

#mean center mean dogmatism scores
data_attn$meanD_c= data_attn$meanDog-mean(data_attn$meanDog,na.rm=TRUE)
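
The reverse-scoring and mean-centering steps above can be checked on a small made-up example. This is only a sketch: the data frame `toy` and its columns `item_a` and `item_b` are hypothetical, and it assumes a 1-9 response scale (as implied by the recode strings). On such a scale, reverse-scoring is the same as subtracting the response from 10.

```r
# Hypothetical toy responses on a 1-9 scale (not the study data)
toy <- data.frame(item_a = c(1, 5, 9, NA),
                  item_b = c(2, 5, 8, 4))

scale_min <- 1
scale_max <- 9

# Reverse-score item_b: 1 <-> 9, 2 <-> 8, ..., 5 stays 5
toy$item_b <- (scale_max + scale_min) - toy$item_b

# Mean score per participant, ignoring missing items
toy$mean_score <- rowMeans(toy[, c("item_a", "item_b")], na.rm = TRUE)

# Mean centering: subtract the sample mean so the centered scores average 0
toy$mean_c <- toy$mean_score - mean(toy$mean_score, na.rm = TRUE)

print(toy)
```

After centering, a positive value means a participant scored above the sample average and a negative value means below it.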

I then followed Jennifer’s advice from the week 8 QnA regarding the procedure for exploratory analyses, so I started off with descriptives. I used the group_by() function (dplyr) to group the rows of data_attn by the gender column. I then used the summarise() function (dplyr) to find the mean (mean()), sd (sd()), n (n()), and se (sd/sqrt(n)) for each gender. na.rm = TRUE was used to make sure R doesn’t include NA values in its calculations.

I then displayed the descriptive statistics in a table using the gt() function from the gt package. The fmt_number() function (gt) was used to format the mean, sd and se columns so that their values are rounded to 2 decimal places, and tab_header() (gt) was used to add a title.

Gender_dog <- data_attn %>% group_by(gender) %>% 
  summarise(mean = mean(meanD_c, na.rm = TRUE),
            sd = sd(meanD_c, na.rm = TRUE),
            n = n(),
            se = sd/sqrt(n))

#columns = c(...) is used because columns = vars(...) is deprecated since gt 0.3.0
Gender_dog %>% gt() %>% fmt_number(columns = c(mean, sd, se),
      decimals = 2) %>% tab_header(title = "Mean and SD of Dogmatism scores between males and females")

Mean and SD of Dogmatism scores between males and females
gender	mean	sd	n	se
1	0.16	1.10	319	0.06
2	−0.13	1.11	388	0.06
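
One detail worth checking in the descriptives: n() counts every row in a group, including rows where the score is missing, while sd() with na.rm = TRUE only uses the non-missing scores. If a group contains missing scores, the se is then divided by a slightly larger n than it should be. The sketch below illustrates the difference on made-up data (the data frame `toy` and the column `score` are hypothetical). It uses base R to stay dependency-free; inside summarise() the equivalent count would be sum(!is.na(meanD_c)).

```r
# Hypothetical toy data: group 1 has one missing score
toy <- data.frame(gender = c(1, 1, 1, 2, 2, 2),
                  score  = c(0.5, 1.5, NA, -1, 0, 1))

# Summarise one group: count all rows and valid (non-missing) rows separately
describe_group <- function(g) {
  n_rows  <- nrow(g)               # what n() would return
  n_valid <- sum(!is.na(g$score))  # scores actually used by mean() and sd()
  s       <- sd(g$score, na.rm = TRUE)
  data.frame(gender   = g$gender[1],
             mean     = mean(g$score, na.rm = TRUE),
             sd       = s,
             n_rows   = n_rows,
             n_valid  = n_valid,
             se_rows  = s / sqrt(n_rows),   # se using the row count
             se_valid = s / sqrt(n_valid))  # se using the valid count
}

desc <- do.call(rbind, lapply(split(toy, toy$gender), describe_group))
print(desc)
```

With only a handful of missing values in a large sample the two se values will be almost identical, so this mainly matters for reporting the correct n.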

I then plotted a graph of the mean centered dogmatism scores for males and females. Since this question involves a categorical variable (gender) and a continuous variable (dogmatism), I plotted the data as a boxplot with the individual data points overlaid, so that the data was visualized in an informative way. First, I changed the gender column into a categorical variable (as.factor()) since it was originally numeric. The group_by() function was used again to group the data by gender, although ggplot does not strictly need this step. I then used ggplot to create a basic graph with gender on the x-axis (independent variable) and mean centered Dogmatism scores on the y-axis (dependent variable). Fill was also used to colour the boxplots differently for each gender. geom_boxplot() was used to draw the boxplots and geom_jitter() was used to add the individual points with some random noise so they don’t overlap, with alpha setting the opacity of the points.

data_attn$gender = as.factor(data_attn$gender)

Gender_dog <- data_attn %>% 
  group_by(gender) %>% 
  ggplot(aes(x = gender, y = meanD_c, fill = gender)) +
  geom_boxplot() +
  geom_jitter(alpha=.4) + labs(y="Mean Dogmatism")
plot(Gender_dog)

## Warning: Removed 1 rows containing non-finite values (stat_boxplot).

## Warning: Removed 1 rows containing missing values (geom_point).

To see whether the slight difference in Dogmatism scores in the graph above was significant, I also conducted a t-test. For this, I followed the t-test Jennifer showed us in the week 8 QnA. Separate male and female dataframes were created by keeping only the rows for each gender (1 = male, 2 = female). t.test() (stats) was then used to compare the average mean centered Dogmatism score between males and females.

#t-test 
female <- data_attn %>% 
  filter(gender==2)

male = data_attn %>% 
  filter(gender==1)

t.test(female$meanD_c, male$meanD_c)

## 
##  Welch Two Sample t-test
## 
## data:  female$meanD_c and male$meanD_c
## t = -3.5612, df = 680.79, p-value = 0.0003949
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4609009 -0.1332927
## sample estimates:
##  mean of x  mean of y 
## -0.1338198  0.1632770

The resulting Welch two-sample t-test showed t(680.79) = -3.56, p < .001. Since p < .05, females had significantly lower Dogmatism scores than males (a mean difference of about 0.30 scale points), which is consistent with past work on gender differences in political strategies (Geary, 2021).
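
As a side note, t.test() also has a formula interface that avoids creating separate dataframes for each gender. The sketch below uses made-up data (the data frame `toy` is hypothetical; only the coding 1 = male, 2 = female is taken from this log) to show that both ways of calling t.test() give the same Welch test.

```r
# Hypothetical toy data (not the study data); gender coded 1 = male, 2 = female
toy <- data.frame(
  gender  = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  meanD_c = c(0.4, 0.9, -0.1, 0.6, -0.5, 0.1, -0.8, -0.2)
)

# Formula interface: outcome ~ grouping variable (must have exactly two levels)
res_formula <- t.test(meanD_c ~ gender, data = toy)

# Two-vector interface, with the groups in the same order (level 1, then level 2)
res_vectors <- t.test(toy$meanD_c[toy$gender == 1],
                      toy$meanD_c[toy$gender == 2])

print(res_formula)
```

With the formula interface the difference is taken as the first factor level minus the second (here male minus female), so the sign of t would be flipped relative to a call that lists the female scores first. The p-value is the same either way.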

Challenges and successes

Before following Jennifer’s advice, I thought that making new dataframes for each gender would make the descriptives easier, but R didn’t compute this properly (I’m not sure why). When referring back to my notes from the week 8 QandA, I realized that since there was already a “gender” column in the data_attn dataframe, I could just use that to group the data via the group_by() function.

When I first ran the code for the table, NaN and NA appeared for the mean, sd and se for males. To fix this I used the na.rm = TRUE argument to make sure R doesn’t use missing data in its calculations. At first I used the na.omit() function following the week 8 QandA, but I found that this changed the total number of male participants. I was worried that this would affect the comparison of mean dogmatism scores between genders, so I went with na.rm = TRUE instead.

When I first tried plotting the data using the boxplot code, I found that while the scatter points were separated by gender, the boxplot was not. However, I realized that since boxplots are appropriate for questions involving one categorical variable and one continuous variable, I had to make the gender variable categorical. When I hovered my mouse over the gender column in data_attn, R told me it was numeric. So I converted the numeric gender variable into a categorical variable via as.factor(), and this fixed the graph.

I also had major challenges finding a way to relabel 1 and 2 as “male” and “female”, respectively, in the table and the graph. Unfortunately, I haven’t been able to find an appropriate function yet. For example, I considered using gsub() to globally substitute the number codes for gender, but this would change the numeric values, which would affect the table. I also installed the sjlabelled package, but I couldn’t find a suitable function in its documentation on the RDocumentation website.
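
One possible approach, sketched here on made-up data and not yet tried on data_attn, is base R’s factor() function, which can attach text labels to the number codes in a single step. The vector `gender_codes` is hypothetical; the coding 1 = male, 2 = female is the one used in this dataset.

```r
# Hypothetical gender codes (1 = male, 2 = female), including one missing value
gender_codes <- c(1, 2, 2, 1, NA)

# levels = the codes as stored in the data, labels = the text to show instead
gender_f <- factor(gender_codes,
                   levels = c(1, 2),
                   labels = c("male", "female"))

print(levels(gender_f))
print(table(gender_f, useNA = "ifany"))
```

Because both the gt table and the ggplot axis display factor levels, using something like this in place of as.factor() should carry the labels through to both, although that is an assumption that still needs checking on the real data.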

Overall, however, I didn’t have many issues with coding this week (hopefully an indication that I’m getting better at coding).

The next stage

Next week, I will be working on my verification report and aiming to code the next exploratory research question/s. After finishing the exploratory analyses, I aim to tidy up the appearance of the graphs and tables.

Questions for Q and A Wk 9

Are there functions in ggplot2/ggeasy and gt that allow me to change the labels for each level of the independent variable?

Week 8 Learning Log

Fun Hui

25/07/2021
