This week’s coding goals

This week I aimed to work with my teammates to replicate: 1) the mean attitude scores, 2) the demographic variables, 3) the descriptives of the scales in the original study and 4) the first scatterplot of the study. I also aimed to condense the code for the scatterplot, as most of the original code consisted of formatting functions.

Achieving the goals

Mean attitude scores

Load packages and recode data

First, the packages were loaded, and the data were read in and then filtered as last week.

library(tidyverse)
library(dplyr)
library(ggplot2)
library(ggeasy)
data <- read.csv("beliefsuperiority_all.csv")
data <- filter(data,Q62 == 1)

data_attn= filter(data,AC_a==3) %>% 
  filter(AC_b==5)

data_attn=dplyr::select(data_attn,-starts_with('AC'))

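Unlike last week, the 'car' package was not loaded, on the assumption that dplyr has the same recode() function. The lapply() function was also omitted from this code, as it did not affect the results of subsequent chunks. Note, however, that dplyr::recode() is not the same function as car::recode(): it takes named replacements such as recode(x, `1` = 9, `2` = 8) and does not understand car's '1=9; 2=8' recode strings used below, so those calls should be written as car::recode() (car must still be installed, though it need not be attached).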

As last week, I also had to recode the data with recode() so that items whose scale values run in descending order were flipped to run in ascending order. Finally, each participant's mean dogmatism score was calculated.

# car::recode() is called explicitly: dplyr::recode() (attached via tidyverse)
# does not understand these '1=9; 2=8' recode strings. car must be installed.
data_attn$Q37_2 = car::recode(data_attn$Q37_2, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
data_attn$Q37_4 = car::recode(data_attn$Q37_4, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
data_attn$Q37_5 = car::recode(data_attn$Q37_5, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
data_attn$Q37_7 = car::recode(data_attn$Q37_7, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
data_attn$Q37_10 = car::recode(data_attn$Q37_10, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
data_attn$Q37_11 = car::recode(data_attn$Q37_11, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
data_attn$Q37_13 = car::recode(data_attn$Q37_13, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
data_attn$Q37_16 = car::recode(data_attn$Q37_16, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
data_attn$Q37_18 = car::recode(data_attn$Q37_18, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')
data_attn$Q37_19 = car::recode(data_attn$Q37_19, '1=9; 2=8; 3=7; 4=6; 6=4; 7=3; 8=2; 9=1')

dogscale=dplyr::select(data_attn,starts_with('Q37'))
data_attn$meanDog=rowMeans(dogscale,na.rm = TRUE)
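The recode string used above maps every value x on the 1 to 9 response scale to 10 - x (the midpoint, 5, stays 5), so the same reverse coding can also be written as plain arithmetic without any extra package. This is a sketch under the assumption that the Q37 columns only ever hold the whole numbers 1 to 9 (or NA); the helper name reverse_9pt() is hypothetical, and the data frame is made up rather than taken from the study.

```r
# Hypothetical helper: reverse-code an item answered on a 1-9 scale
# (1 <-> 9, 2 <-> 8, ..., 5 stays 5). Assumes only the values 1-9 or NA occur.
reverse_9pt <- function(x) {
  10 - x
}

responses <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
reversed <- reverse_9pt(responses)
print(reversed)

# Applying it to several columns at once on a made-up data frame;
# in this log the reverse-keyed items are Q37_2, Q37_4, ..., Q37_19 (ten in total)
toy <- data.frame(Q37_1 = c(2, 8), Q37_2 = c(1, 9), Q37_4 = c(3, NA))
reverse_items <- c("Q37_2", "Q37_4")
toy[reverse_items] <- lapply(toy[reverse_items], reverse_9pt)
print(toy)
```

One design difference worth knowing: a car recode string leaves any value it does not mention unchanged, whereas 10 - x changes every value, so the arithmetic version silently gives wrong results if a column contains codes outside 1 to 9.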

Converting scales

Since the attitude questions on voting, torture and affirmative action had responses on a 4-point scale, I first had to convert them to the same 1 to 5 range as the other attitude questions, so that every item contributes to the average attitude score on the same scale. To do this, I used recode() to map the four response options, which are coded 2, 3, 4 and 5 in the data (as the recode call below assumes), onto 1, 2.3333, 3.6667 and 5.

# car::recode() is called explicitly: dplyr::recode() does not understand these strings
data_attn$vote_a = car::recode(data_attn$vote_a, '2=1; 3=2.3333; 4=3.6667; 5=5')
data_attn$torture_a = car::recode(data_attn$torture_a, '2=1; 3=2.3333; 4=3.6667; 5=5')
data_attn$affirmaction_a = car::recode(data_attn$affirmaction_a, '2=1; 3=2.3333; 4=3.6667; 5=5')

The new values were calculated with the formula (5 - 1) * (x - 1) / (4 - 1) + 1, where x is the response's position on the 4-point scale (1 to 4); for example, x = 2 gives 4/3 + 1 = 2.3333. The formula itself comes from this forum.
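As a check on the arithmetic, the formula can be written as a small R function. This is a sketch: rescale_4_to_5() is a hypothetical name, and the second part assumes, as the recode call above does, that the four options are stored as the codes 2 to 5, so that the position on the scale is the stored code minus 1.

```r
# Hypothetical helper: linearly map a position on a 4-point scale (1-4)
# onto the 1-5 range, using (5 - 1) * (x - 1) / (4 - 1) + 1
rescale_4_to_5 <- function(position) {
  (5 - 1) * (position - 1) / (4 - 1) + 1
}

positions <- 1:4
new_values <- rescale_4_to_5(positions)
print(round(new_values, 4))

# Assumption: the data store the four options as codes 2-5, so position = code - 1
stored_codes <- c(2, 3, 4, 5)
print(round(rescale_4_to_5(stored_codes - 1), 4))
```

The exact values are 1, 7/3, 11/3 and 5; the 2.3333 and 3.6667 in the recode string are these numbers rounded to four decimal places.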

I also created a new dataframe ('attitudes') and used the select() function so that it includes only the attitude questions (i.e. columns whose names end in '_a').

Mean attitude scores

I then made a new column, 'meanAtt', to hold each participant's average attitude score. The rowMeans() function was used to calculate these averages, with missing values skipped via the na.rm = TRUE argument.

attitudes=dplyr::select(data_attn,ends_with('_a'))
data_attn$meanAtt=rowMeans(attitudes,na.rm = TRUE)
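To see exactly what na.rm = TRUE does inside rowMeans(), here is a toy example with made-up responses (the column names are hypothetical). Each row's mean is taken over the items that were answered, and a row with no answers at all gets NaN rather than a number.

```r
# Toy attitude data: three items, some responses missing
toy_attitudes <- data.frame(
  item1_a = c(1, 4, NA),
  item2_a = c(3, NA, NA),
  item3_a = c(5, 2, NA)
)

# Default: any NA in a row makes that row's mean NA
means_default <- rowMeans(toy_attitudes)
# na.rm = TRUE: average only the items that were answered
means_skip_na <- rowMeans(toy_attitudes, na.rm = TRUE)

print(means_default)
print(means_skip_na)
```

The second row shows the trade-off: with na.rm = TRUE a participant who skipped a question still gets a score, but it is based on fewer items than everyone else's.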

Demographic variables

Political affiliation

To find the proportion of participants under each political affiliation, I used the table() function to count the participants identifying as Democrat and Republican (coded as 1 and 2, respectively). The $ operator was used so that R reads only the column 'Q16' from the 'data_attn' dataframe. This showed 291 Democrats and 225 Republicans out of the 707 participants, which replicates the data of the original study.

table(data_attn$Q16)

## 
##   1   2   3   4   5 
## 291 225 154  11  26
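table() gives counts; to turn them into proportions, prop.table() can be applied to the table. So that the sketch runs on its own, it rebuilds a vector from the counts printed above (291, 225, 154, 11 and 26 for codes 1 to 5) instead of reading the data file; with the real data the call would simply be prop.table(table(data_attn$Q16)).

```r
# Rebuild the Q16 codes from the counts printed above (codes 1-5)
q16 <- rep(1:5, times = c(291, 225, 154, 11, 26))

counts <- table(q16)
proportions <- prop.table(counts)

print(length(q16))           # total number of participants
print(round(proportions, 3)) # share of participants for each code
```

Multiplying by 100 gives percentages, and the same two calls work for the gender column below.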

Mean and standard deviation (SD) of age

I used the mean() and sd() functions to calculate the descriptive statistics for age. The $ operator was used so that R only uses values from the 'age' column, and na.rm = TRUE skipped missing values in the calculations. Again, I managed to replicate the descriptives from the original study.

mean(data_attn$age,na.rm=TRUE)

## [1] 44.89109

sd(data_attn$age,na.rm=TRUE)

## [1] 16.23563

Proportion of genders

Lastly, to replicate the gender ratio in the original study, I used table() again, this time on the 'gender' column. This also replicated successfully, with 319 males and 388 females out of the 707 participants.

table(data_attn$gender)

## 
##   1   2 
## 319 388

Mean and SD for the 3 scales

Like the descriptive statistics for age, the mean and SD of each of the three scales were calculated using mean() and sd(), with $ used to point R to the appropriate column (e.g. meanDog for the dogmatism scale) and na.rm = TRUE excluding missing values.

#Dogmatism scale
mean(data_attn$meanDog,na.rm=TRUE)
sd(data_attn$meanDog,na.rm=TRUE)
#Attitude scale
mean(data_attn$meanAtt,na.rm=TRUE)
sd(data_attn$meanAtt,na.rm=TRUE)
#Belief superiority
beliefs=dplyr::select(data_attn,ends_with('_b'))
meanBel=rowMeans(beliefs,na.rm = TRUE)
mean(meanBel,na.rm=TRUE)
sd(meanBel,na.rm=TRUE)

For the belief superiority scale, however, there were a couple of extra steps before applying the same functions as for the other two scales. First, the average belief superiority score for each participant had to be calculated, and for this I followed the code for the mean attitude scores. A dataframe called 'beliefs' was created, and select() was used to pick the columns coding responses to the belief questions (those ending in '_b') from the 'data_attn' dataframe. The mean belief score of each participant was then calculated with rowMeans() and stored in 'meanBel', which is a numeric vector rather than a dataframe.
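Since the same mean() and sd() pair is repeated for every scale, the descriptives can also be computed in one step. This is a minimal sketch using sapply() on a made-up data frame: the column names mirror meanDog, meanAtt and meanBel from this log, but the numbers are invented. With the real data, meanBel would first need to be stored as a column, for example data_attn$meanBel = rowMeans(beliefs, na.rm = TRUE).

```r
# Made-up scale scores, one row per participant
scores <- data.frame(
  meanDog = c(4, 6, 5, NA),
  meanAtt = c(2, 3, 4, 3),
  meanBel = c(1, 2, 3, 2)
)

# Mean and SD of every scale at once, skipping missing values;
# the result is a matrix with rows M and SD and one column per scale
descriptives <- sapply(scores, function(col) {
  c(M = mean(col, na.rm = TRUE), SD = sd(col, na.rm = TRUE))
})

print(round(descriptives, 2))
```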

First scatterplot of the study

For the scatterplot, I first ran the original code to see the graph I wanted to replicate. Its first line mean-centres the mean attitude scores.

I included this line because the scatterplot chunk uses the resulting variable (meanA_c), but I still wasn't sure what it was for (see Challenges and successes).

The original code then used ggplot() to plot the mean-centred attitude scores against the mean dogmatism scores as a scatterplot via geom_point(). position_jitter() was used to make the data easier to read by adding a small amount of random noise to each point's position (width = .15 sets the horizontal amount). The size and alpha arguments set the size and opacity of the dots. labs() set the labels 'Average Attitude' and 'Dogmatism' for the x and y axes, respectively. A quadratic regression line was drawn with stat_smooth() and the formula y ~ x + I(x^2). Then xlim() set the limits of the x axis to (-2, 2), and scale_y_continuous() set the y-axis tick marks with breaks = 0 to 9 and, through its lim argument (a partial match for limits), the y-axis limits of 1 and 9.
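The curve drawn by stat_smooth(method = "lm", formula = y ~ x + I(x^2)) is an ordinary least-squares regression of the y variable on x and x squared, so the coefficients behind it can be obtained directly with lm(). With this log's variables that would be lm(meanDog ~ meanA_c + I(meanA_c^2), data = data_attn); the sketch below instead uses made-up data lying exactly on a known quadratic, so the fit should recover those numbers.

```r
# Made-up data lying exactly on y = 5 + 0.5x + 1.2x^2
x <- seq(-2, 2, by = 0.5)
y <- 5 + 0.5 * x + 1.2 * x^2

# I() is needed so that ^ means squaring rather than the formula crossing operator
quad_fit <- lm(y ~ x + I(x^2))
print(coef(quad_fit))
```

Without I(), the term x^2 inside a formula is read as "x crossed with itself", which is just x, so the model would quietly become a straight line.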

data_attn$meanA_c= data_attn$meanAtt-mean(data_attn$meanAtt,na.rm=TRUE)

Dog_plot = ggplot(data_attn, aes(x=meanA_c, y=meanDog)) +
  geom_point(aes(y = meanDog), position = position_jitter(width = .15), size = 2.5, alpha = 0.5) +
  labs(x='Average Attitude', y='Dogmatism') +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 2) +
  theme_minimal() +
  theme(axis.title.y = element_text(size=16, face="bold")) +
  theme(axis.title.x = element_text(size=16, face="bold")) +
  theme(axis.text.y=element_text(color = "black", size = 14)) +
  theme(axis.text.x=element_text(color = "black", size = 14)) +
  theme(legend.text = element_text(color = "black", size = 14)) +
  theme(legend.title = element_text(color = "black", size = 14)) +
  theme(axis.line= element_line(color="black")) +
  theme(axis.ticks.y = element_line(color="black")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(strip.text.x=element_text(color = "black", size = 14, face="bold")) +
  xlim(c(-2,2)) +
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9),lim=c(1,9))
  
plot(Dog_plot)

The resulting graph looked like this

Since the rest of the chunk only adjusted more superficial parts of the graph, I could omit much of this code and use 'ggeasy' to get a very similar graph: easy_remove_legend() was used to remove the legend. However, I had to consult my teammates to find the function for creating a colour gradient in the graph, which turned out to be scale_color_gradient2().

data_attn$meanA_c= data_attn$meanAtt-mean(data_attn$meanAtt,na.rm=TRUE)

Dog_plot_simple = ggplot(data_attn, aes(x=meanA_c, y=meanDog)) +
  geom_point(aes(y = meanDog), position = position_jitter(width = .15), size = 2.5, alpha = 0.5) +
  labs(x='Average Attitude', y='Dogmatism') +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 2) +
  theme_minimal() +
  xlim(c(-2,2)) +
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9), lim=c(1,9)) +
  easy_remove_legend() +
  # note: this colour scale has no visible effect unless a variable is mapped to colour inside aes()
  scale_color_gradient2(mid = "black", low = "blue", high = "red")

plot(Dog_plot_simple)
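A colour scale such as scale_color_gradient2() only changes colours that come from a colour aesthetic mapped inside aes(); neither plot above maps one, so the scale has nothing to act on and the points stay black. The sketch below maps colour to the mean-centred attitude variable purely as an assumption, since which variable the original graph colours by is not stated here, and it uses made-up data. ggplot_build() is only used to check which colours were actually assigned to the points.

```r
library(ggplot2)

# Made-up data standing in for data_attn
toy <- data.frame(meanA_c = seq(-2, 2, length.out = 21))
toy$meanDog <- 5 + toy$meanA_c^2

# Without a colour mapping: the gradient scale has nothing to act on
plain_plot <- ggplot(toy, aes(x = meanA_c, y = meanDog)) +
  geom_point() +
  scale_color_gradient2(low = "blue", mid = "black", high = "red")

# With colour mapped in geom_point (assumption: colour by mean-centred attitude)
coloured_plot <- ggplot(toy, aes(x = meanA_c, y = meanDog)) +
  geom_point(aes(colour = meanA_c)) +
  scale_color_gradient2(low = "blue", mid = "black", high = "red")

# Colours ggplot2 actually assigned to the points in each plot
plain_colours <- unique(ggplot_build(plain_plot)$data[[1]]$colour)
mapped_colours <- unique(ggplot_build(coloured_plot)$data[[1]]$colour)
print(plain_colours)
print(length(mapped_colours))
```

Putting the mapping inside geom_point(aes(...)), rather than in the main ggplot() call, keeps it from also affecting the stat_smooth() line.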

Challenges and successes

For this week, most of the data replicated smoothly, and the error messages I got were mostly due to minor coding mistakes. I also find that it's getting a bit easier to read the chunks in the original code.

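There were a couple of issues that I did encounter. First, the scale_color_gradient2() function didn't work and my graph remained in black and white. This is most likely because neither plot maps a variable to colour inside aes(), and a colour scale only changes colours that come from such a mapping. I also couldn't properly knit the document. Specifically, there were error messages saying that x was not numeric in the code for calculating meanDog and meanAtt. This was odd, since I had no issues running the code without knitting. Unfortunately, I couldn't find a solution on Google aside from converting character values to numeric ones. I also tried to reinsert the lapply() chunk that I had omitted, but this didn't solve the problem. So, for this week's learning log, many chunks (M and SD for the scales etc.) will not show any output, since I had to use eval=FALSE to knit the document.

Looking back, the most likely cause is the decision not to load car. Knitting runs the code in a fresh R session, where recode() refers to dplyr::recode(); that function does not understand car's '1=9; 2=8' recode strings and, given a numeric column, returns a character column (mostly NA), so rowMeans() then stops with the error "'x' must be numeric". My interactive session presumably still had car attached from last week, so recode() there was car::recode(). Calling car::recode() explicitly in those chunks should let the document knit.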
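The error message fits a data frame that contains a character column: rowMeans() converts the data frame to a matrix, and a single character column turns the whole matrix into text. A minimal sketch of that failure, and of checking column types before averaging, using a made-up data frame rather than the study data:

```r
# Made-up scale items; Q37_2 has become text, as can happen when a recode
# call returns character values instead of numbers
items <- data.frame(
  Q37_1 = c(2, 8, 5),
  Q37_2 = c("9", "1", NA),
  stringsAsFactors = FALSE
)

# rowMeans() refuses non-numeric input; capture the error message
err <- tryCatch(rowMeans(items, na.rm = TRUE),
                error = function(e) conditionMessage(e))
print(err)

# Check which columns are not numeric before averaging
print(sapply(items, is.numeric))

# Converting the text column back to numbers makes rowMeans() work again
items$Q37_2 <- as.numeric(items$Q37_2)
print(rowMeans(items, na.rm = TRUE))
```

as.numeric() only helps when the text already holds the intended numbers. If the character columns came from dplyr::recode() being handed car-style recode strings, the values themselves are wrong, so recoding with car::recode() would be the real fix.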

As a group, we also didn't understand the purpose of the mean-centering chunk in the original code (below). Even after Googling and running the code, we only found that the mean-centred attitude variable was needed to draw the first scatterplot.

#mean center the variables
#(d_attn in the original study code is data_attn in this log)
#dogmatism
data_attn$meanD_c= data_attn$meanDog-mean(data_attn$meanDog,na.rm=TRUE)
#mean attitude 
data_attn$meanA_c= data_attn$meanAtt-mean(data_attn$meanAtt,na.rm=TRUE)
#individual attitude ratings
data_attn$immigration_a_c=data_attn$immigration_a-mean(data_attn$immigration_a,na.rm=TRUE)
data_attn$abortion_a_c=data_attn$abortion_a-mean(data_attn$abortion_a,na.rm=TRUE)
data_attn$vote_a_c=data_attn$vote_a-mean(data_attn$vote_a,na.rm=TRUE)
data_attn$tax_a_c=data_attn$tax_a-mean(data_attn$tax_a,na.rm=TRUE)
data_attn$torture_a_c=data_attn$torture_a-mean(data_attn$torture_a,na.rm=TRUE)
data_attn$affirmaction_a_c=data_attn$affirmaction_a-mean(data_attn$affirmaction_a,na.rm=TRUE)
data_attn$military_a_c=data_attn$military_a-mean(data_attn$military_a,na.rm=TRUE)
data_attn$covidgov_a_c=data_attn$covidgov_a-mean(data_attn$covidgov_a,na.rm=TRUE)

#political orientation
data_attn$PO_c= data_attn$Q12-mean(data_attn$Q12,na.rm=TRUE)
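Mean centering subtracts the sample mean from every score, so 0 on the new variable means "average" while the spread of the scores stays the same; that is why the scatterplot's x axis runs from -2 to 2 around the average attitude. In a quadratic regression like the one drawn by stat_smooth(), centering also reduces the correlation between x and x squared, and the linear coefficient then describes the slope at the mean. A toy sketch with made-up values (in real data the correlation usually shrinks rather than vanishing; it is exactly 0 here only because the toy values are symmetric):

```r
# Made-up scores on a 1-5 scale
x <- c(1, 2, 3, 4, 5)

# Mean centering: subtract the sample mean from every value
x_c <- x - mean(x, na.rm = TRUE)
print(x_c)

# The centred variable has mean 0 but the same standard deviation
print(c(mean = mean(x_c), sd_before = sd(x), sd_after = sd(x_c)))

# Correlation between the linear and squared terms, before and after centering
print(cor(x, x^2))
print(cor(x_c, x_c^2))
```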

The next stage

Next week, I hope to resolve all the issues I faced this week. I also want to try to condense the code we have written so far. Two more plots and their code still need to be replicated as well. Finally, I also hope to replicate the data from the pilot study.

Week 5 Learning Log

Fun Hui

04/07/2021