Week 5

Goals

My goals for this week were:

To verify and understand the code for demographic and descriptive statistics of the main study
To code figure 1 of the study
To write rubber-duck comments for each code chunk

Coding Steps

Converting 4-point scales to 5-point scales

For this code, I used ‘$’ to select the vote_a, torture_a and affirmaction_a columns within the data_attn data frame, because these three questions used a 4-point scale. As all other questions used a 5-point scale, I converted these three to a 5-point scale so that every item could be measured and analysed on the same range. To do this, I used the ‘recode’ function to recode the stored values 2 to 1, 3 to 2.3333, 4 to 3.6667 and 5 to 5. These numbers come from the equation (5 - 1) * (x - 1) / (4 - 1) + 1, where x is the position of the response on the 4-point scale (1 to 4). Judging by the recode values, those positions appear to be stored in the data as 2 to 5.

data_attn$vote_a = recode(data_attn$vote_a, '2=1; 3=2.3333; 4=3.6667; 5=5')
data_attn$torture_a = recode(data_attn$torture_a, '2=1; 3=2.3333; 4=3.6667; 5=5')
data_attn$affirmaction_a = recode(data_attn$affirmaction_a, '2=1; 3=2.3333; 4=3.6667; 5=5')
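The recoded values can be checked by applying the rescaling equation directly. This is only a sketch: the helper name rescale_4_to_5 is hypothetical and not part of the authors' code, and it assumes (as the recode values suggest) that the stored codes 2 to 5 correspond to positions 1 to 4 on the 4-point scale.

```r
# Hypothetical helper: linearly stretch a 4-point scale (1-4) onto a 5-point scale (1-5).
rescale_4_to_5 <- function(x) {
  (5 - 1) * (x - 1) / (4 - 1) + 1
}

# Assumption: the stored codes 2..5 stand for scale positions 1..4.
stored_codes <- 2:5
positions <- stored_codes - 1

rescaled <- rescale_4_to_5(positions)
print(round(rescaled, 4))
# [1] 1.0000 2.3333 3.6667 5.0000
```

The end points stay at 1 and 5, and the two middle responses are spread evenly between them, which matches the values passed to ‘recode’.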

Means and standard deviations

To replicate the values of the study, I had to put the political affiliation table and all of the means and standard deviations in one code chunk.

First, I used the ‘table’ function to make a table of the number of participants who affiliate with the Democratic Party, the Republican Party or the Independents. Political affiliation was found in the question 16 column of the data_attn data frame.

From the table, there were 707 participants in total (291 + 225 + 154 + 11 + 26 = 707). Of these, 291 (41%) affiliated with the Democratic Party, 225 (32%) with the Republican Party and 154 (22%) with the Independents. This replicates the results reported in the article.

table(data_attn$Q16)

## 
##   1   2   3   4   5 
## 291 225 154  11  26

291/707

## [1] 0.4115983

225/707

## [1] 0.3182461

154/707

## [1] 0.2178218

#dogmatism mean and standard deviation
mean(data_attn$meanD,na.rm=TRUE)

## [1] 4.418632

sd(data_attn$meanD,na.rm=TRUE)

## [1] 1.113614

#attitude ratings mean and standard deviation
attitudes=dplyr::select(data_attn,ends_with('_a'))
data_attn$meanA=rowMeans(attitudes,na.rm = TRUE)
mean(data_attn$meanA,na.rm=TRUE)

## [1] 2.660533

sd(data_attn$meanA,na.rm=TRUE)

## [1] 0.603734

#belief superiority mean and standard deviation
beliefs=dplyr::select(data_attn,ends_with('_b'))
meanB=rowMeans(beliefs,na.rm = TRUE)
mean(meanB,na.rm=TRUE)

## [1] 2.582416

sd(meanB,na.rm=TRUE)

## [1] 1.053552

To calculate the means and standard deviations, I used the ‘mean’ and ‘sd’ functions, specified the data frame (data_attn) and selected the column (meanD, meanA or meanB) to retrieve the overall mean and standard deviation of each scale (D meaning dogmatism, A meaning attitudes and B meaning belief superiority). The argument na.rm=TRUE tells R to skip any NA values found in the columns.

In the attitude ratings and belief superiority code, two extra lines were included to create data frames relating only to attitudes or beliefs. For example, to calculate an average attitude score per participant, I first created a new data frame, ‘attitudes’, from data_attn. To do this, I used the ‘select()’ function from the dplyr package to keep only the columns that end with _a (as ’_a’ denotes attitude questions). I used the data_attn data frame because it only includes participants who passed the attention checks. Then I used the rowMeans function to calculate each participant's average attitude score and stored it as meanA within data_attn.

The same steps were repeated for the belief superiority variable, except that a ‘beliefs’ data frame was created from data_attn by selecting only the columns relating to belief superiority (denoted by a ’_b’ ending).
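The select-then-rowMeans pattern described above can be tried on a small made-up data frame. This is a sketch only: the toy data frame and its values are hypothetical, and only the column-name endings (_a and _b) mirror the real data.

```r
library(dplyr)

# Hypothetical toy data: two attitude items (_a) and two belief items (_b).
toy <- data.frame(
  vote_a    = c(1, 5, 3),
  torture_a = c(3, NA, 3),
  vote_b    = c(2, 4, 1),
  torture_b = c(4, 2, 1)
)

# Keep only the columns whose names end with "_a".
attitudes_toy <- dplyr::select(toy, ends_with("_a"))

# One mean per row (participant); na.rm = TRUE skips missing answers.
toy$meanA <- rowMeans(attitudes_toy, na.rm = TRUE)
print(toy$meanA)
# [1] 2 5 3

# Overall mean and standard deviation of the per-participant means.
print(mean(toy$meanA, na.rm = TRUE))
print(sd(toy$meanA, na.rm = TRUE))
```

The second participant has one missing answer, so their mean is based on the single answer they gave rather than becoming NA.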

Participants’ demographics

To obtain the participants’ demographics, I used code similar to the chunk above. This includes using the ‘mean’ and ‘sd’ functions to look at age in data_attn, and ignoring NA values with ‘na.rm=TRUE’. Gender is a category rather than a number, so it has no mean or standard deviation. Instead, I used the ‘table’ function on the ‘gender’ column within data_attn to count the participants in each group, then divided each count by 707 to replicate the percentages.

#participant's age mean and standard deviation
mean(data_attn$age, na.rm=TRUE)

## [1] 44.89109

sd(data_attn$age, na.rm=TRUE)

## [1] 16.23563

#participant's gender counts and proportions
table(data_attn$gender)

## 
##   1   2 
## 319 388

319/707

## [1] 0.4512023

388/707

## [1] 0.5487977
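The manual divisions above could also be done in one step with ‘prop.table’, which turns a table of counts into proportions. This is a sketch under an assumption: instead of the real data_attn, the gender vector below is rebuilt from the two counts reported in the output above (319 and 388).

```r
# Rebuild a stand-in gender column from the reported counts (assumption: codes 1 and 2 only).
gender <- rep(c(1, 2), times = c(319, 388))

counts <- table(gender)

# prop.table() divides each count by the total; multiply by 100 for percentages.
percentages <- round(100 * prop.table(counts), 1)
print(percentages)
```

The same call works for the political affiliation table, for example prop.table(table(data_attn$Q16)).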

Figure 1 from the authors

This is the code the authors provided for figure 1:

#dogmatism plot 1
data_attn$meanA_c= data_attn$meanA-mean(data_attn$meanA,na.rm=TRUE)

ggplot(data_attn, aes(x=meanA_c, y=meanD)) + geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 2.5, alpha = 0.5) +
  labs(x='Average Attitude', y='Dogmatism') +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 2) +
     theme_minimal()+
  theme(axis.title.y = element_text(size=16, face="bold"))+
  theme(axis.title.x = element_text(size=16, face="bold"))+
  theme(axis.text.y=element_text(color = "black", size = 14))+
  theme(axis.text.x=element_text(color = "black", size = 14))+
  theme(legend.text = element_text(color = "black", size = 14))+
  theme(legend.title = element_text(color = "black", size = 14))+
  theme(axis.line= element_line(color="black")) +
  theme(axis.ticks.y = element_line(color="black")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
  theme(strip.text.x=element_text(color = "black", size = 14, face="bold"))+
  xlim(c(-2,2))+
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9),lim=c(1,9))

## Warning: Removed 1 rows containing non-finite values (stat_smooth).

## Warning: Removed 1 rows containing missing values (geom_point).

When looking at this code, I was initially intimidated and confused. It was suggested, however, that I could cut down a lot of the formatting code using the ggeasy package. I also decided to try to write the code myself using functions I already know and understand.

Figure 1 replicated

To replicate the graph, I first had to mean-centre the attitude variable by creating a new column, meanA_c, within data_attn. This column holds each participant's mean attitude score (meanA) minus the grand mean of meanA. The plot uses this column for its x-axis, so the line of code is needed for the graph to be created; however, I am still unsure why the variable is centred.
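A small sketch of what mean-centring does to a column, using made-up scores rather than the real meanA values. Centring is often used before fitting a squared term because it usually reduces the overlap between x and x^2, but whether that was the authors' reason is an assumption here.

```r
# Hypothetical scores, including one missing value.
scores <- c(2.0, 2.5, 3.0, 3.5, NA)

# Subtract the grand mean from every score.
centered <- scores - mean(scores, na.rm = TRUE)
print(centered)
# [1] -0.75 -0.25  0.25  0.75    NA

# The centred scores average to zero, and their spread is unchanged.
print(mean(centered, na.rm = TRUE))
print(sd(centered, na.rm = TRUE) == sd(scores, na.rm = TRUE))
```

After centring, 0 on the x-axis stands for the average attitude, negative values for below-average scores and positive values for above-average scores.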

To create the plot, I used the ‘ggplot’ function from the ggplot2 package with the data_attn data frame. Inside it, I used the ‘aes’ function to put meanA_c on the x-axis and meanD on the y-axis, and to colour the points according to their meanA_c values. The function ‘geom_point’ was then used to create a scatterplot, where ‘aes’ was once again used to place meanD on the y-axis. Its ‘position’ argument takes the ‘position_jitter’ function, which nudges each point by a small random amount (controlled by ‘width’) so that overlapping points can be seen, while ‘size’ and ‘alpha’ set the size and transparency of the points. I then used the ‘labs’ function to label the x and y axes.

I first attempted ‘geom_smooth’, which did not work, so I referred back to the authors’ code for their quadratic regression line. Their call to ‘stat_smooth’ includes the equation they used for the line in the figure (formula = y ~ x + I(x^2)).
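To see what the formula y ~ x + I(x^2) asks for, the same formula can be passed to ‘lm’ directly. This sketch uses made-up, noise-free data (not the study data), so the fitted coefficients can be compared with the values used to build it.

```r
# Hypothetical data lying exactly on the curve y = 4 + 0.5 * x^2.
x <- seq(-2, 2, length.out = 50)
y <- 4 + 0.5 * x^2

# I(x^2) tells R to square x literally, rather than treat ^ as formula syntax.
fit <- lm(y ~ x + I(x^2))

# Coefficients: intercept, linear term, squared term.
print(round(coef(fit), 3))
```

The fit recovers an intercept of 4, a linear term of 0 and a squared term of 0.5. With method = "lm" and this formula, ‘stat_smooth’ fits the same kind of model and draws the resulting curve.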

I then included ‘theme_minimal()’ to have a blank/white background and used the ‘xlim’ function to set the limits of the x-axis to -2 and 2. As the y-axis has a continuous scale, I used the ‘breaks’ argument of ‘scale_y_continuous’ to tell R to label every whole number from 0 to 9; otherwise R chooses the intervals automatically, as it does for the x-axis. I then used the functions ‘easy_remove_legend’ and ‘easy_all_text_size’ from the ggeasy package to remove the legend and change the font size of all text. Finally, to create a colour gradient that matches the article’s graph, I googled and found the ‘scale_color_gradient2’ function and coded the mid-section of the plot to be black, the low-section blue and the high-section red.

data_attn$meanA_c= data_attn$meanA-mean(data_attn$meanA,na.rm=TRUE)

ggplot(data_attn, aes(x=meanA_c, y=meanD, color=meanA_c)) +
  geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 0.5, alpha = 1) +
  labs(x='Average Attitude', y='Dogmatism') +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) +
  theme_minimal() +
  xlim(c(-2,2)) +
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9)) +
  easy_all_text_size(12) +
  scale_color_gradient2(mid="black", low="blue", high="red") +
  easy_remove_legend()

## Warning: Removed 1 rows containing non-finite values (stat_smooth).

## Warning: Removed 1 rows containing missing values (geom_point).

This looks more similar to the article’s graph shown below:

Challenges

The main challenge I came across this week was the organisation of my code. As R code is very finicky, there were multiple occasions where I couldn’t divide the code into separate chunks, because the chunks would then not run or would not produce the output that I wanted. For example, my affiliations, means and standard deviations code all had to be included in the same chunk for the correct table to be produced.
Another challenge I came across this week was figuring out the code for figure 1. I particularly had a problem with understanding the parts that I was not familiar with, such as ‘position_jitter’ and ‘breaks’. However, after googling, I found that these control formatting that makes the figure easier to read: jitter spreads out overlapping points, and breaks sets which values are labelled on an axis.
I am, however, still confused about why the variables need to be mean-centred to plot the graphs.
I am also confused about why the graph from the authors’ code looks different to what they published in the article.

Successes

We did a lot this week! I am really happy with our progress, as we only need to figure out the code for figures 2 and 3 and the descriptive stats for the pilot study.
The major success of this week was being able to understand most of the unfamiliar R functions, including using ggeasy to simplify the code for figure 1.

Next Steps

My next steps for coding are:

To simplify and understand the code for figures 2 and 3
To find the descriptive statistics for the pilot study
To rename the variables in the code for clarity

Learning Log

Sasha Kew

04/07/2021