Week 7

Goals

My goals for this week were:

Simplify and understand the authors’ code for figure 3.
Obtain the descriptive statistics for the pilot study.
Finish verifying the code!

Coding Steps

1. Authors’ code for figure 3

This code from the authors was by far the largest chunk we had seen so far! At first, it was very daunting and confusing!

data_attn$immigration_a_c=data_attn$immigration_a-mean(data_attn$immigration_a,na.rm=TRUE)
data_attn$abortion_a_c=data_attn$abortion_a-mean(data_attn$abortion_a,na.rm=TRUE)
data_attn$vote_a_c=data_attn$vote_a-mean(data_attn$vote_a,na.rm=TRUE)
data_attn$tax_a_c=data_attn$tax_a-mean(data_attn$tax_a,na.rm=TRUE)
data_attn$torture_a_c=data_attn$torture_a-mean(data_attn$torture_a,na.rm=TRUE)
data_attn$affirmaction_a_c=data_attn$affirmaction_a-mean(data_attn$affirmaction_a,na.rm=TRUE)
data_attn$military_a_c=data_attn$military_a-mean(data_attn$military_a,na.rm=TRUE)
data_attn$covidgov_a_c=data_attn$covidgov_a-mean(data_attn$covidgov_a,na.rm=TRUE)

data_attn$subject=c(1:nrow(data_attn))

data_attnlong <- bind_cols(data_attn %>% dplyr::select(.  , ends_with("_a_c"), subject,38:67,76) %>%gather(. , topic, attitude_c, ends_with("_a_c")),data_attn %>% dplyr::select(. ,ends_with("_b"), subject) %>% gather(. , topic, beliefsup, ends_with("_b")), data_attn %>% dplyr::select(. ,ends_with("_a"), subject) %>% gather(. , topic, attitude, ends_with("_a")))

## New names:
## * subject -> subject...1
## * topic -> topic...32
## * subject -> subject...34
## * topic -> topic...35
## * subject -> subject...37
## * ...

data_attnlong$attitude=as.factor(data_attnlong$attitude)
forplot= data_attnlong[which(!is.na(data_attnlong$attitude)),]

raincloud_theme = theme(
  text = element_text(size = 10),
  axis.title.x = element_text(size = 16),
  axis.title.y = element_text(size = 16),
  axis.text = element_text(size = 14),
  axis.text.x = element_text(angle = 45, vjust = 0.5),
  legend.title=element_text(size=16),
  legend.text=element_text(size=16),
  legend.position = "right",
  plot.title = element_text(lineheight=.8, face="bold", size = 16),
  panel.border = element_blank(),
  panel.grid.minor = element_blank(),
  panel.grid.major = element_blank(),
  axis.line.x = element_line(colour = 'black', size=0.5, linetype='solid'),
  axis.line.y = element_line(colour = 'black', size=0.5, linetype='solid'))

ggplot(forplot, aes(x=attitude, y=beliefsup,color=attitude))+
  theme(legend.position= "none") +
  geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .8) +
  geom_point(aes(y = beliefsup, color = attitude), position = position_jitter(width = .15), 
             size = 1.5, alpha = 0.2) +
   stat_summary(fun.y=mean, size=2, color="black",geom="line", aes(group = 1)) +
  stat_summary(fun.y=mean, size=2, color="black",geom="point", aes(group = 1)) +
  stat_summary(fun.data = mean_cl_boot,geom='errorbar', fun.args=list(conf.int=.95), 
               size=1.5, aes(width=.3), color="black")+
  labs(x='Attitude', y='Belief Superiority') +
   theme_minimal()+
  theme(axis.title.y = element_text(size=16, face="bold"))+
  theme(axis.title.x = element_text(size=16, face="bold"))+
  theme(axis.text.y=element_text(color = "black", size = 14))+
  theme(axis.text.x=element_text(color = "black", size = 14))+
  theme(legend.text = element_text(color = "black", size = 14))+
  theme(legend.title = element_text(color = "black", size = 14))+
  theme(axis.line= element_line(color="black")) +
  theme(axis.ticks.y = element_line(color="black")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
  theme(strip.text.x=element_text(color = "black", size = 14, face="bold"))+
  theme(legend.position = "none")+
  scale_x_discrete(labels = c("1","2","2.3","3","3.7","4","5"))

Apart from the colour gradient, the figure produced by the authors’ code looks quite similar to the one they published.

We knew, however, that we could simplify and improve this code!

2. Figure 3 replication: what we kept, removed and added

We deleted all the code for the raincloud_theme object. It is defined but never actually added to the plot (there is no + raincloud_theme in the ggplot call), which is why deleting it made no difference to the figure.
As with the previous figures, we also deleted the code controlling the figure’s text and legend, replacing it with easy_remove_legend() and easy_all_text_size(12) from the ggeasy package.
To access the geom_flat_violin() function, we had to install and load the PupillometryR package.
I added a line of code that sets the figure’s colours manually. This was necessary because scale_colour_gradient2() only works with continuous scales, whereas attitude is a factor (a discrete scale). I therefore chose each colour by hand with scale_colour_manual(), using hexadecimal RGB colour codes.
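Rather than typing seven hex codes by hand, a discrete palette can also be interpolated from a few anchor colours with base R’s colorRampPalette(), which is roughly what scale_colour_gradient2() does on a continuous scale. This is a minimal sketch: the three anchor colours (red, black, blue) are assumptions chosen to resemble our hand-picked palette, not values taken from the article.

```r
# Build a function that interpolates between three anchor colours
# (assumed: low = red, mid = black, high = blue)
attitude_palette <- grDevices::colorRampPalette(c("#CC0000", "#000000", "#0000FF"))

# One colour per attitude level (figure 3 has seven levels)
attitude_colours <- attitude_palette(7)
print(attitude_colours)
```

The resulting vector could then be passed to scale_colour_manual(values = attitude_colours) in place of the hand-typed codes.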

This is the code we ended up with after deleting more than 20 lines!

data_attn$immigration_a_c=data_attn$immigration_a-mean(data_attn$immigration_a,na.rm=TRUE)
data_attn$abortion_a_c=data_attn$abortion_a-mean(data_attn$abortion_a,na.rm=TRUE)
data_attn$vote_a_c=data_attn$vote_a-mean(data_attn$vote_a,na.rm=TRUE)
data_attn$tax_a_c=data_attn$tax_a-mean(data_attn$tax_a,na.rm=TRUE)
data_attn$torture_a_c=data_attn$torture_a-mean(data_attn$torture_a,na.rm=TRUE)
data_attn$affirmaction_a_c=data_attn$affirmaction_a-mean(data_attn$affirmaction_a,na.rm=TRUE)
data_attn$military_a_c=data_attn$military_a-mean(data_attn$military_a,na.rm=TRUE)
data_attn$covidgov_a_c=data_attn$covidgov_a-mean(data_attn$covidgov_a,na.rm=TRUE)

data_attn$subject=c(1:nrow(data_attn))

data_attnlong <- bind_cols(
  data_attn %>%
    select(ends_with("_a_c"), subject, 38:67, 76) %>%
    gather(topic, attitude_c, ends_with("_a_c")),
  data_attn %>%
    select(ends_with("_b"), subject) %>%
    gather(topic, beliefsup, ends_with("_b")),
  data_attn %>%
    select(ends_with("_a"), subject) %>%
    gather(topic, attitude, ends_with("_a"))
)

## New names:
## * subject -> subject...1
## * topic -> topic...32
## * subject -> subject...34
## * topic -> topic...35
## * subject -> subject...37
## * ...

data_attnlong$attitude=as.factor(data_attnlong$attitude)

forplot= data_attnlong[which(!is.na(data_attnlong$attitude)),]

library(PupillometryR)  # provides geom_flat_violin()
library(ggeasy)         # provides easy_remove_legend() and easy_all_text_size()
ggplot(forplot, aes(x = attitude, y = beliefsup, color = attitude)) +
  geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .8) +
  geom_point(aes(y = beliefsup, color = attitude),
             position = position_jitter(width = .15), size = 1, alpha = 0.5) +
  # fun = replaces the deprecated fun.y = argument of stat_summary()
  stat_summary(fun = mean, size = 1, color = "black", geom = "line", aes(group = 1)) +
  stat_summary(fun = mean, size = 2, color = "black", geom = "point", aes(group = 1)) +
  stat_summary(fun.data = mean_cl_boot, geom = 'errorbar', fun.args = list(conf.int = .95),
               size = 1, aes(width = .3), color = "black") +
  labs(x = 'Attitude', y = 'Belief Superiority') +
  theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  scale_x_discrete(labels = c("1", "2", "2.3", "3", "3.7", "4", "5")) +
  easy_remove_legend() +
  theme(axis.line = element_line(color = "black")) +
  theme(axis.ticks.y = element_line(color = "black")) +
  scale_colour_manual(values = c("#CC0000", "#660000", "#331900", "#000000",
                                 "#3399FF", "#0000FF", "#000066")) +
  easy_all_text_size(12)

This is looking pretty close to the article!

3. Figure 3 Mirroring

As with the two previous figures, the authors mirrored (reversed) figure 3 in the article. We attempted to match this with our ‘-’ trick, negating the x variable inside aes().

data_attn$immigration_a_c=data_attn$immigration_a-mean(data_attn$immigration_a,na.rm=TRUE)
data_attn$abortion_a_c=data_attn$abortion_a-mean(data_attn$abortion_a,na.rm=TRUE)
data_attn$vote_a_c=data_attn$vote_a-mean(data_attn$vote_a,na.rm=TRUE)
data_attn$tax_a_c=data_attn$tax_a-mean(data_attn$tax_a,na.rm=TRUE)
data_attn$torture_a_c=data_attn$torture_a-mean(data_attn$torture_a,na.rm=TRUE)
data_attn$affirmaction_a_c=data_attn$affirmaction_a-mean(data_attn$affirmaction_a,na.rm=TRUE)
data_attn$military_a_c=data_attn$military_a-mean(data_attn$military_a,na.rm=TRUE)
data_attn$covidgov_a_c=data_attn$covidgov_a-mean(data_attn$covidgov_a,na.rm=TRUE)

data_attn$subject=c(1:nrow(data_attn))

data_attnlong <- bind_cols(
  data_attn %>%
    select(ends_with("_a_c"), subject, 38:67, 76) %>%
    gather(topic, attitude_c, ends_with("_a_c")),
  data_attn %>%
    select(ends_with("_b"), subject) %>%
    gather(topic, beliefsup, ends_with("_b")),
  data_attn %>%
    select(ends_with("_a"), subject) %>%
    gather(topic, attitude, ends_with("_a"))
)

## New names:
## * subject -> subject...1
## * topic -> topic...32
## * subject -> subject...34
## * topic -> topic...35
## * subject -> subject...37
## * ...

data_attnlong$attitude=as.factor(data_attnlong$attitude)

forplot= data_attnlong[which(!is.na(data_attnlong$attitude)),]

ggplot(forplot, aes(x=-attitude, y=beliefsup, color=attitude)) +
  geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .8) + geom_point(aes(y = beliefsup, color = attitude), position = position_jitter(width = .15), 
             size = 1, alpha = 0.5) +
   stat_summary(fun.y=mean, size=1, color="black",geom="line", aes(group = 1)) +
  stat_summary(fun.y=mean, size=2, color="black",geom="point", aes(group = 1)) +
  stat_summary(fun.data = mean_cl_boot,geom='errorbar', fun.args=list(conf.int=.95),size=1, aes(width=.3), color="black")+
  labs(x='Attitude', y='Belief Superiority') +
   theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
  scale_x_discrete(labels = c("1","2","2.3","3","3.7","4","5"))+ easy_remove_legend() + theme(axis.line= element_line(color="black")) +
  theme(axis.ticks.y = element_line(color="black")) + scale_colour_manual(values = c("#CC0000", "#660000","#331900", "#000000", "#3399FF", "#0000FF", "#000066")) + easy_all_text_size(12)

## Warning: `fun.y` is deprecated. Use `fun` instead.

## Warning: `fun.y` is deprecated. Use `fun` instead.

## Warning in Ops.factor(attitude): '-' not meaningful for factors

## Warning in Ops.factor(attitude): '-' not meaningful for factors

## Warning in Ops.factor(attitude): '-' not meaningful for factors

## Warning in Ops.factor(attitude): '-' not meaningful for factors

## Warning in Ops.factor(attitude): '-' not meaningful for factors

## Warning: Removed 8 rows containing non-finite values (stat_ydensity).

## Warning: Removed 8 rows containing non-finite values (stat_summary).

## Warning: Removed 8 rows containing non-finite values (stat_summary).

## Warning: Removed 8 rows containing non-finite values (stat_summary).

## Warning: Removed 8 rows containing missing values (geom_point).

## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

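This is not what we wanted! The warning “‘-’ not meaningful for factors” explains why: attitude is a factor (i.e. a discrete scale), and R cannot subtract from a factor, so -attitude returns NA for every row instead of a reversed value. As a result, the data appear to collapse onto a single x position, which is also why geom_path warns that each group consists of only one observation.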

Unfortunately, we are still unsure how to reverse this plot like we did for the other figures.
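One approach we have not yet tried in the figure itself, sketched here under the assumption that “mirroring” means running the x-axis from 5 down to 1: since a factor cannot be negated, reverse the order of its levels instead. The toy attitude vector below is hypothetical and simply uses the values shown on the axis labels.

```r
# Hypothetical stand-in for forplot$attitude, using the axis-label values
attitude <- factor(c(1, 2, 2.3, 3, 3.7, 4, 5))

# Negating a factor is not defined: R warns and returns NA for every value,
# which is why the '-' trick failed
negated <- suppressWarnings(-attitude)

# Reverse the level order instead; the values themselves are unchanged
attitude_rev <- factor(attitude, levels = rev(levels(attitude)))
print(levels(attitude_rev))

# Fixed axis labels are matched to the levels in order, so reverse them too
axis_labels <- rev(c("1", "2", "2.3", "3", "3.7", "4", "5"))
```

In the plot, attitude_rev would replace attitude in aes(). Because scale_x_discrete() and scale_colour_manual() match unnamed labels and values to the levels in order, both vectors would also need reversing (as done for axis_labels) to keep each label and colour attached to the same attitude value.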

4. Explaining figure 3 code

This first chunk of code mean-centres each individual attitude rating. It creates a new centred column (ending in _a_c) in data_attn (columns are accessed with $) for each issue: immigration, abortion, voting, tax, torture, affirmative action, the military and COVID. For each column, the column mean (calculated with na.rm=TRUE so that missing values are ignored) is subtracted from every participant’s rating, so each centred column has a mean of zero.

data_attn$immigration_a_c=data_attn$immigration_a-mean(data_attn$immigration_a,na.rm=TRUE)
data_attn$abortion_a_c=data_attn$abortion_a-mean(data_attn$abortion_a,na.rm=TRUE)
data_attn$vote_a_c=data_attn$vote_a-mean(data_attn$vote_a,na.rm=TRUE)
data_attn$tax_a_c=data_attn$tax_a-mean(data_attn$tax_a,na.rm=TRUE)
data_attn$torture_a_c=data_attn$torture_a-mean(data_attn$torture_a,na.rm=TRUE)
data_attn$affirmaction_a_c=data_attn$affirmaction_a-mean(data_attn$affirmaction_a,na.rm=TRUE)
data_attn$military_a_c=data_attn$military_a-mean(data_attn$military_a,na.rm=TRUE)
data_attn$covidgov_a_c=data_attn$covidgov_a-mean(data_attn$covidgov_a,na.rm=TRUE)
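The eight lines above repeat the same operation, so a loop over the topic names would do the same job. This minimal sketch uses a small, hypothetical data_attn with two topics and one missing value, to show that each centred column ends up with a mean of zero while missing values stay missing.

```r
# Hypothetical toy data: two attitude columns, one missing rating
data_attn <- data.frame(immigration_a = c(1, 4, 7, NA),
                        abortion_a    = c(2, 2, 5, 7))

topics <- c("immigration", "abortion")   # the real data has eight topics

for (topic in topics) {
  raw <- data_attn[[paste0(topic, "_a")]]
  # subtract the column mean (ignoring NAs) from every participant's rating
  data_attn[[paste0(topic, "_a_c")]] <- raw - mean(raw, na.rm = TRUE)
}

print(data_attn)
```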

This next line creates a new column, subject, in data_attn that numbers the participants (rows) from 1 to n, where n is nrow(data_attn). This gives every participant a unique ID, which is then carried into each of the long-format tables created below.

data_attn$subject=c(1:nrow(data_attn))

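This chunk creates a new data frame, data_attnlong, which holds the data in long format (one row per participant per topic) so that columns from data_attn can be bound together. The same pattern of selecting and gathering columns is repeated three times. First, the columns ending in _a_c (the mean-centred attitude scores) are selected, together with subject and columns 38 to 67 and 76, and then gathered (gather() is the older tidyr equivalent of pivot_longer()) into long format. The same is then done for the columns ending in _b (belief superiority) and _a (raw attitude scores). Finally, bind_cols() places the three long tables side by side. Because bind_cols() matches rows purely by position, this only lines up correctly if the topics appear in the same order in all three selections. Each table also carries its own subject and topic columns, which is why bind_cols() prints the “New names” message when it renames the duplicates (e.g. subject...1, subject...34).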

data_attnlong <- bind_cols(
  data_attn %>%
    select(ends_with("_a_c"), subject, 38:67, 76) %>%
    gather(topic, attitude_c, ends_with("_a_c")),
  data_attn %>%
    select(ends_with("_b"), subject) %>%
    gather(topic, beliefsup, ends_with("_b")),
  data_attn %>%
    select(ends_with("_a"), subject) %>%
    gather(topic, attitude, ends_with("_a"))
)
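To see what the select-gather-bind pattern produces, here is a base R sketch on a tiny hypothetical data set. to_long() is an illustrative helper (not part of any package) that stacks columns the way gather() does, and cbind() stands in for bind_cols(). It is only meant to show the resulting row order, not to replace the authors’ tidyverse code.

```r
# Hypothetical toy data: 3 participants, 2 topics, wide format
wide <- data.frame(immigration_b = c(2, 5, 3), abortion_b = c(4, 1, 2),
                   immigration_a = c(1, 7, 4), abortion_a = c(6, 2, 3))
wide$subject <- seq_len(nrow(wide))   # participant ID, 1 to n

# to_long() is a hypothetical helper that stacks columns like tidyr::gather():
# all participants for the first column, then all for the second, and so on
to_long <- function(df, cols, value_name) {
  out <- data.frame(subject = rep(df$subject, times = length(cols)),
                    topic   = rep(cols, each = nrow(df)),
                    stringsAsFactors = FALSE)
  out[[value_name]] <- unlist(df[cols], use.names = FALSE)
  out
}

b_long <- to_long(wide, c("immigration_b", "abortion_b"), "beliefsup")
a_long <- to_long(wide, c("immigration_a", "abortion_a"), "attitude")

# Like bind_cols(), cbind() pairs rows purely by position, which only works
# because both tables list the topics (and participants) in the same order
long <- cbind(b_long, attitude = a_long$attitude)
print(long)
```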

The first of these two lines converts the attitude column of data_attnlong from numeric to a factor. Each number denotes how extreme a participant’s attitude towards a particular issue is, and as a factor each value is treated as a separate category on the x-axis rather than as a point on a continuous scale. The second line creates a new data frame, forplot, that keeps only the rows of data_attnlong where attitude is not missing: !is.na() returns TRUE for each non-missing value, which() turns those TRUE values into row numbers, and those row numbers are used to select the rows.

data_attnlong$attitude=as.factor(data_attnlong$attitude)
forplot= data_attnlong[which(!is.na(data_attnlong$attitude)),]
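A small worked example of the which(!is.na()) subsetting step, using a hypothetical four-row data frame in place of data_attnlong:

```r
# Hypothetical toy long-format data with one missing attitude score
toy <- data.frame(attitude = c(1, NA, 3, 5), beliefsup = c(2, 4, 1, 3))
toy$attitude <- as.factor(toy$attitude)   # levels "1", "3", "5"; NA stays NA

keep <- which(!is.na(toy$attitude))       # row numbers of non-missing scores
forplot_toy <- toy[keep, ]                # keep those rows, all columns
print(keep)
print(forplot_toy)
```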

As I explained most of the functions used in this type of code for the two previous figures, I will only explain the new functions here:

The forplot data frame is used instead of data_attn because it holds the long-format data, with the rows that are missing an attitude score removed.
position = position_nudge(x = .2, y = 0) shifts each flat violin slightly to the right, so that it sits beside the jittered points instead of covering them.
scale_x_discrete() is used instead of scale_x_continuous() because the x-axis variable was converted to a factor. The x-axis labels were set manually with labels = c().
As mentioned above, scale_colour_manual() was used instead of scale_colour_gradient2() because the latter only works with continuous scales. The values inside scale_colour_manual() are hexadecimal RGB colour codes, one for each of the seven attitude levels, assigned in level order.

# x = attitude, not -attitude: a factor cannot be negated
ggplot(forplot, aes(x = attitude, y = beliefsup, color = attitude)) +
  geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .8) +
  geom_point(aes(y = beliefsup, color = attitude),
             position = position_jitter(width = .15), size = 1, alpha = 0.5) +
  stat_summary(fun = mean, size = 1, color = "black", geom = "line", aes(group = 1)) +
  stat_summary(fun = mean, size = 2, color = "black", geom = "point", aes(group = 1)) +
  stat_summary(fun.data = mean_cl_boot, geom = 'errorbar', fun.args = list(conf.int = .95),
               size = 1, aes(width = .3), color = "black") +
  labs(x = 'Attitude', y = 'Belief Superiority') +
  theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  scale_x_discrete(labels = c("1", "2", "2.3", "3", "3.7", "4", "5")) +
  easy_remove_legend() +
  theme(axis.line = element_line(color = "black")) +
  theme(axis.ticks.y = element_line(color = "black")) +
  scale_colour_manual(values = c("#CC0000", "#660000", "#331900", "#000000",
                                 "#3399FF", "#0000FF", "#000066")) +
  easy_all_text_size(12)
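Each code in scale_colour_manual() is a hexadecimal RGB value: two hex digits each for red, green and blue. Base R’s col2rgb() decodes them, which is a quick way to check what each hand-picked colour actually is and which attitude level it is assigned to.

```r
# The seven hand-picked codes from scale_colour_manual(), in level order
codes <- c("#CC0000", "#660000", "#331900", "#000000",
           "#3399FF", "#0000FF", "#000066")

# col2rgb() splits each hex code into red, green and blue (0 to 255)
rgb_values <- grDevices::col2rgb(codes)
print(rgb_values)

# Unnamed values are matched to the factor levels in order, so this is
# the colour each attitude level receives (labels as shown on the x-axis)
level_colours <- setNames(codes, c("1", "2", "2.3", "3", "3.7", "4", "5"))
print(level_colours)
```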

5. Pilot study descriptives

First, we read the pilot study’s data into the environment with read_csv() and store it in a data frame called pilot.

Next, we create three new data frames, each containing only the participants who identify with the Democrats (Ds), Republicans (Rs) or Independents (Is). This uses the which() function, which returns the row numbers where the political affiliation (PA) column of pilot equals 1, 2 or 3 (Democrat, Republican or Independent respectively); those rows are then extracted.

The code overallmeans=c(), Dmeans=c(), Rmeans=c() and Imeans=c() creates four empty vectors to hold the results. The for loop then goes through columns 9 to 30 of pilot (the issue ratings, Q6_2 to Q22_1) one at a time. For each column, it calculates the mean with colMeans() while ignoring NA values, for the whole sample and for each affiliation, and appends each mean to overallmeans, Dmeans, Rmeans or Imeans.

pilot <- read_csv("pilotdata_all.csv")

## Warning: Duplicated column names deduplicated: 'Q22_1' => 'Q22_1_1' [67],
## 'Q22_2' => 'Q22_2_1' [68], 'Q22_3' => 'Q22_3_1' [69], 'Q22_4' => 'Q22_4_1' [70],
## 'Q22_5' => 'Q22_5_1' [71]

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   Q35 = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

Ds=pilot[which(pilot$PA==1),]
Rs=pilot[which(pilot$PA==2),]
Is=pilot[which(pilot$PA==3),]

overallmeans=c()
Dmeans=c()
Rmeans=c()
Imeans=c()
for (x in 9:30){
  Mean= colMeans(pilot[x],na.rm = TRUE)
  overallmeans=c(overallmeans,Mean)
  Mean= colMeans(Ds[x],na.rm = TRUE)
  Dmeans=c(Dmeans,Mean)
  Mean= colMeans(Rs[x],na.rm = TRUE)
  Rmeans=c(Rmeans,Mean)
  Mean= colMeans(Is[x],na.rm = TRUE)
  Imeans=c(Imeans,Mean)
}
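The loop above calls colMeans() on one column at a time. Since colMeans() already works on many columns at once, the same means can be obtained without a loop, and split() plus sapply() can produce all three affiliations together. This is a minimal sketch on a hypothetical five-row stand-in for pilot, where columns 2:3 play the role of 9:30; it is an alternative, not the authors’ code.

```r
# Hypothetical toy stand-in for pilot: PA in column 1, two rating columns
pilot_toy <- data.frame(PA   = c(1, 2, 3, 1, 2),
                        Q6_2 = c(5, 6, 4, 7, NA),
                        Q6_3 = c(3, 5, 6, 4, 2))
rating_cols <- 2:3   # plays the role of 9:30 in the real data

# Loop version, following the structure of the authors' code
Ds <- pilot_toy[which(pilot_toy$PA == 1), ]
Dmeans <- c()
for (x in rating_cols) {
  Dmeans <- c(Dmeans, colMeans(Ds[x], na.rm = TRUE))
}

# Vectorised version: colMeans() on all rating columns at once
Dmeans_vec <- colMeans(Ds[rating_cols], na.rm = TRUE)

# All affiliations in one go: one column of means per PA code
group_means <- sapply(split(pilot_toy[rating_cols], pilot_toy$PA),
                      colMeans, na.rm = TRUE)
print(group_means)
```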

Finally, we print the overall means and the means for each affiliation. They are all above 4 except Q6_4 (government helping those in need, 3.35) and Q6_12 (Muslim religious rights, 3.74) among Independents (Imeans), which is exactly what the article reported! This means those two issues were not included in the main study, as they did not meet the threshold for controversiality.

overallmeans

##     Q6_2     Q6_3     Q6_4     Q6_5     Q6_6     Q6_7     Q6_8     Q6_9 
## 5.504762 5.761905 4.371429 4.685714 4.865385 4.923077 5.009524 4.904762 
##    Q6_10    Q6_11    Q6_12    Q6_13    Q6_14    Q6_15    Q6_16    Q6_17 
## 5.123810 5.257143 4.596154 5.211538 4.904762 5.352381 4.276190 4.567308 
##    Q6_18    Q6_19    Q6_20    Q6_21    Q6_22    Q22_1 
## 4.586538 4.952381 5.790476 4.942857 5.104762 4.137255

Dmeans

##     Q6_2     Q6_3     Q6_4     Q6_5     Q6_6     Q6_7     Q6_8     Q6_9 
## 5.760870 5.804348 4.782609 4.608696 4.755556 4.956522 4.891304 5.000000 
##    Q6_10    Q6_11    Q6_12    Q6_13    Q6_14    Q6_15    Q6_16    Q6_17 
## 5.065217 5.478261 4.711111 5.488889 5.065217 5.369565 4.152174 4.434783 
##    Q6_18    Q6_19    Q6_20    Q6_21    Q6_22    Q22_1 
## 4.565217 5.043478 5.739130 4.956522 5.108696 4.133333

Rmeans

##     Q6_2     Q6_3     Q6_4     Q6_5     Q6_6     Q6_7     Q6_8     Q6_9 
## 5.424242 5.969697 4.454545 4.909091 4.878788 5.030303 5.303030 4.757576 
##    Q6_10    Q6_11    Q6_12    Q6_13    Q6_14    Q6_15    Q6_16    Q6_17 
## 5.333333 5.424242 4.909091 5.030303 5.030303 5.363636 4.636364 4.843750 
##    Q6_18    Q6_19    Q6_20    Q6_21    Q6_22    Q22_1 
## 4.906250 4.818182 5.696970 5.272727 4.969697 4.064516

Imeans

##     Q6_2     Q6_3     Q6_4     Q6_5     Q6_6     Q6_7     Q6_8     Q6_9 
## 5.217391 5.304348 3.347826 4.521739 4.869565 4.727273 4.695652 4.826087 
##    Q6_10    Q6_11    Q6_12    Q6_13    Q6_14    Q6_15    Q6_16    Q6_17 
## 4.826087 4.739130 3.739130 4.956522 4.304348 5.217391 4.043478 4.347826 
##    Q6_18    Q6_19    Q6_20    Q6_21    Q6_22    Q22_1 
## 4.130435 4.869565 5.956522 4.304348 5.130435 4.434783
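Rather than scanning the printed output by eye, the threshold check can be done in code. This sketch re-enters the Imeans values printed above and assumes the cut-off is 4; in the real analysis, the same comparison could be run directly on overallmeans, Dmeans, Rmeans and Imeans.

```r
# Imeans as printed above (values rounded to 6 decimal places)
Imeans <- c(Q6_2 = 5.217391, Q6_3 = 5.304348, Q6_4 = 3.347826, Q6_5 = 4.521739,
            Q6_6 = 4.869565, Q6_7 = 4.727273, Q6_8 = 4.695652, Q6_9 = 4.826087,
            Q6_10 = 4.826087, Q6_11 = 4.739130, Q6_12 = 3.739130, Q6_13 = 4.956522,
            Q6_14 = 4.304348, Q6_15 = 5.217391, Q6_16 = 4.043478, Q6_17 = 4.347826,
            Q6_18 = 4.130435, Q6_19 = 4.869565, Q6_20 = 5.956522, Q6_21 = 4.304348,
            Q6_22 = 5.130435, Q22_1 = 4.434783)

threshold <- 4   # assumed controversiality cut-off
below <- names(Imeans)[Imeans < threshold]
print(below)
```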

Challenges

The main challenge this week was tackling figure 3. As our ‘-’ trick doesn’t work for this graph, we are still unsure how to reverse the figure so that it matches the one in the article. Another challenge was reproducing the article’s colour gradient: because the scale changed from continuous to discrete, the scale_colour_gradient2 function wouldn’t work. Eventually, I found that entering the colours manually with RGB colour codes and the scale_colour_manual function did the trick!
Another challenge this week was understanding functions the authors used in the pilot study that we hadn’t come across before, such as which(). Similarly, understanding the fun arguments of stat_summary() in figure 3 (fun.y and fun.data) was a challenge, but again, after much googling, we figured it out!
When trying to fix the error bars on figure 2, we went through the code line by line and found that the lim=c() argument in the scale_x_continuous line changed the error bars. This is probably because ggplot2 treats any data outside a scale’s limits as missing and drops it before calculating statistics, so points beyond the specified limits cannot be used to create the error bars. Removing lim=c() fixed the error bars; however, compared with the original plot in the article, the points then looked more different than they did with lim=c() included. This is not ideal, as it means our figure would deviate from the original. We therefore decided to keep lim=c() in our final code for figure 2 and not worry about fixing the error bars.

Successes

Despite being unable to mirror the graph, we were still able to closely reproduce the figure from the article. This means that we have finished all three figures!
After googling and using the help engine within RStudio, we were able to figure out the meaning of functions we had not seen before!
We finished verifying all the code! This is a huge success, as this once daunting task has now been completed!

Next Steps

As I have now finished verifying the descriptives and figures for this article, my next coding step is to begin the exploratory section of the verification report. Next week I need to come up with 3 questions to ask of the data that the authors did not ask in their study, and start exploring the data with my new R skills!
I also plan to familiarise myself with code for inferential statistics, so that I can use it in my final report.

Learning Log

Sasha Kew

18/07/2021