Week 6

Goals

My goals for this week were:

  1. Finish coding figure 1 so that it looks more similar to the figure in the published article.

  2. Simplify and understand the code for figure 2.

Although these goals may seem few, our group found that we really needed the time to understand the code and functions behind the figures, so that we could replicate the graphs as they appear in the published article rather than what the authors’ code produces.

Coding Steps

1. Revisiting figure 1

To understand where the authors’ code differed from the figure they presented in the published article, we revisited figure 1 to try to match it as closely to the article as possible. Our code from last week got pretty close to how it looks in the article. However, on closer inspection this week, we figured out that the authors had flipped, or mirrored, their graph!

Here is our code and figure output from last week:

library(ggplot2)
library(ggeasy)  # provides easy_all_text_size() and easy_remove_legend()

data_attn$meanA_c <- data_attn$meanA - mean(data_attn$meanA, na.rm = TRUE)

ggplot(data_attn, aes(x = meanA_c, y = meanD, color = meanA_c)) +
  geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 0.5, alpha = 1) +
  labs(x = 'Average Attitude', y = 'Dogmatism') + stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) + theme_minimal() +
  xlim(c(-2, 2)) +
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9)) + easy_all_text_size(12) + scale_color_gradient2(mid = "black", low = "blue", high = "red") + easy_remove_legend()

And here is the article’s graph:

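Looking closely, we can see that the stat_smooth() regression line is actually mirrored in the published article. We’re unsure whether this is acceptable. Mean centering only shifts the data so that the average sits at 0 (a z-score would also divide by the standard deviation), so the sign of each centered value still tells us which side of the average a participant falls on, and flipping it reverses the direction of the trend. It’s also possible that the authors manipulated the data or the figure in some way we can’t see. Regardless, we did a lot of googling to figure out how to mirror this figure.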
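To make the z-score point concrete, here is a small base-R sketch. The attitude vector is made up for illustration (it is not a column from data_attn). Centering, as in the meanA_c line, subtracts the mean so the values average to 0 but keeps their spread; scale() produces true z-scores by also dividing by the standard deviation.

```r
# Made-up attitude scores for illustration only (not the study data)
attitude <- c(2, 3, 3, 4, 5, 5, 6, NA)

# Mean centering, as in data_attn$meanA_c: subtract the mean, ignoring NAs
attitude_c <- attitude - mean(attitude, na.rm = TRUE)

# A z-score also divides by the standard deviation
attitude_z <- as.vector(scale(attitude))

mean(attitude_c, na.rm = TRUE)  # 0
sd(attitude_c, na.rm = TRUE)    # about 1.41, the same spread as the raw scores
sd(attitude_z, na.rm = TRUE)    # 1
```

Either way, a negative value still means "below average", so negating the values swaps the two ends of the scale.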

First, we tried adding the function scale_x_reverse() to the figure. This showed the trend we wanted, but it changed the x-axis so that it ran from 2 to -1 instead of from -2 to 2:

ggplot(data_attn, aes(x=meanA_c, y=meanD, color=meanA_c)) + geom_point(aes(y = meanD), position = position_jitter(width = .15),size = 0.5, alpha = 1) +
  labs(x='Average Attitude', y='Dogmatism') +  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) + theme_minimal()+
  xlim(c(-2,2))+
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9))+ easy_all_text_size(12) + scale_color_gradient2(mid="black", low="blue", high="red") + easy_remove_legend() + scale_x_reverse()
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

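This is not what we wanted, so we continued to google! (The message above appears because xlim() is itself a shortcut for setting the x scale’s limits, so scale_x_reverse() replaced it and our -2 to 2 limits were lost.) Finally, we found that adding ‘-’ before meanA_c in the aes() function mirrored the graph!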

Personally, I am still unsure why the authors felt the need to reverse the graph and why the ‘-’ does that.
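As far as we can tell, the ‘-’ works because aes(x = -meanA_c) plots every centered value multiplied by -1, so a participant at +1.5 is drawn at -1.5 and vice versa. Because xlim(c(-2, 2)) is symmetric, the axis range stays the same while the points and the fitted curve swap sides. The base-R sketch below uses made-up toy data (x and y are hypothetical stand-ins for meanA_c and meanD) and fits the same quadratic formula that stat_smooth() uses, once to x and once to -x.

```r
set.seed(1)
# Made-up toy data standing in for meanA_c (x) and meanD (y)
x <- runif(50, -2, 2)
y <- 5 + 0.8 * x + 0.6 * x^2 + rnorm(50, sd = 0.5)

# Same formula as stat_smooth(formula = y ~ x + I(x^2));
# I() makes ^ mean "square" instead of a formula operator
fit_orig <- lm(y ~ x + I(x^2))

# Mirroring: negate x before fitting, as aes(x = -meanA_c) does
x_neg <- -x
fit_neg <- lm(y ~ x_neg + I(x_neg^2))

coef(fit_orig)
coef(fit_neg)  # same intercept and squared term; the linear term flips sign
```

Only the sign of the linear coefficient changes, which is exactly a left-right mirror image of the curve. Note that color = meanA_c is not negated, so each point keeps the colour of its original score (the blue, below-average points move to the right).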

ggplot(data_attn, aes(x=-meanA_c, y=meanD, color=meanA_c)) + geom_point(aes(y = meanD), position = position_jitter(width = .15),size = 0.5, alpha = 1) +
  labs(x='Average Attitude', y='Dogmatism') +  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) + theme_minimal()+
  xlim(c(-2,2))+
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9))+ easy_all_text_size(12) + scale_color_gradient2(mid="black", low="blue", high="red") + easy_remove_legend()

Hooray!

We had a few more things to tidy up before moving on to figure 2. First, we noticed that we had forgotten to add axis lines. We fixed this with three extra theme() calls: axis.line = element_line(color="black") draws black axis lines, axis.ticks.y = element_line(color="black") adds black tick marks to the y-axis, and setting panel.grid.major and panel.grid.minor to element_blank() removes the background grid. We also increased the point size from 0.5 to 1.

ggplot(data_attn, aes(x=-meanA_c, y=meanD, color=meanA_c))+ geom_point(aes(y = meanD), position = position_jitter(width = .15),size = 1, alpha = 1) +
  labs(x='Average Attitude', y='Dogmatism') +  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) + theme_minimal()+
  xlim(c(-2,2)) +
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9))+ easy_all_text_size(12) + scale_color_gradient2(mid="black", low="blue", high="red") + easy_remove_legend() + theme(axis.line= element_line(color="black")) +
  theme(axis.ticks.y = element_line(color="black")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) 

This is now looking very similar to the article’s first figure!

2. Authors’ code for figure 2

We could finally move on to figure 2! Again, the authors’ code for this figure was not promising. It gave us a baseline understanding of what functions we needed but the output, again, looked nothing like what was published in the article.

data_attn$PO_c= data_attn$Q12-mean(data_attn$Q12,na.rm=TRUE)

ggplot(data_attn, aes(x=PO_c, y=meanD,color=PO_c))+
  geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 2.5, alpha = 0.6) +
  stat_summary(fun.y=mean, geom='point', size=2, color="black") +
  stat_summary(fun.data = mean_cl_boot,geom='errorbar', fun.args=list(conf.int=.95), 
               size=1.5, aes(width=.3), color="black")+
  labs(x='Political Orientation', y='Dogmatism') +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 2) +
   theme_minimal()+
  theme(axis.title.y = element_text(size=16, face="bold"))+
  theme(axis.title.x = element_text(size=16, face="bold"))+
  theme(axis.text.y=element_text(color = "black", size = 14))+
  theme(axis.text.x=element_text(color = "black", size = 14))+
  theme(legend.text = element_text(color = "black", size = 14))+
  theme(legend.title = element_text(color = "black", size = 14))+
  theme(axis.line= element_line(color="black")) +
  theme(axis.ticks.y = element_line(color="black")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
  theme(strip.text.x=element_text(color = "black", size = 14, face="bold")) +
  theme(legend.position = "none")+
  scale_x_continuous(breaks = c(-3,-2,-1,0,1,2,3),lim=c(-3.1,3.1))+
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9),lim=c(1,9))

This code could definitely be improved on!

3. Figure 2 replication: what we kept, removed and added

First, we removed the code relating to text, since the ggeasy function easy_all_text_size() formats all of the figure’s text in one line. We then removed any code relating to the legend, as we added easy_remove_legend() to remove the legend entirely. We also deleted the line relating to strip.text.x, as we had no idea what it meant and it did not affect the figure when removed (we’ve since learned that it formats the labels of facet panels, which this figure doesn’t have).

We ended up removing about 8 lines of code!

We left code relating to the axis (explained in figure 1 code above) so that the axis lines and ticks remained. We also kept the line of code concerning the panel grid to make the background blank.

data_attn$PO_c= data_attn$Q12-mean(data_attn$Q12,na.rm=TRUE)

ggplot(data_attn, aes(x=PO_c, y=meanD,color=PO_c)) +
  geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 1, alpha = 0.5) +
  stat_summary(fun.y=mean, geom='point', size=2, color="black") +
  stat_summary(fun.data = mean_cl_boot,geom='errorbar', fun.args=list(conf.int=0.95), 
               size=1.2, aes(width=.3), color="black")+
  labs(x='Political Orientation', y='Dogmatism') +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) +
  theme_minimal() +
  theme(axis.line = element_line(color = "black")) +
  theme(axis.ticks.y = element_line(color = "black")) +
  easy_remove_legend() +
  scale_x_continuous(breaks = c(-3,-2,-1,0,1,2,3), lim = c(-3.1,3.1)) +
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9), lim = c(1,9)) +
  easy_all_text_size(12) +
  scale_colour_gradient2(low = "red", mid = "black", high = "blue") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

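We then added the scale_colour_gradient2() function to make the colour gradient of the figure as similar to the article as possible. Because scale_colour_gradient2() places its mid colour at 0 by default and PO_c is mean centered, participants with an average political orientation are drawn in black. We also changed the size and alpha of the points from 2.5 to 1 and from 0.6 to 0.5 respectively, again to make the figure more similar to the article’s. For the same reason, we made the error bars thinner (size 1.5 to 1.2) and the regression line thinner (size 2 to 1).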

4. Mirrored figures again!

Like in figure 1, we found that the authors had mirrored the figure again! So we added ‘-’ in front of PO_c in the aes() function to achieve the same mirrored effect.

data_attn$PO_c= data_attn$Q12-mean(data_attn$Q12,na.rm=TRUE)

ggplot(data_attn, aes(x=-PO_c, y=meanD,color=PO_c)) +
  geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 1, alpha = 0.5) +
  stat_summary(fun.y=mean, geom='point', size=2, color="black") +
  stat_summary(fun.data = mean_cl_boot,geom='errorbar', fun.args=list(conf.int=0.95), 
               size=1.2, aes(width=.3), color="black")+
  labs(x='Political Orientation', y='Dogmatism') +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) +
  theme_minimal() +
  theme(axis.line = element_line(color = "black")) +
  theme(axis.ticks.y = element_line(color = "black")) +
  easy_remove_legend() +
  scale_x_continuous(breaks = c(-3,-2,-1,0,1,2,3), lim = c(-3.1,3.1)) +
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9), lim = c(1,9)) +
  easy_all_text_size(12) +
  scale_colour_gradient2(low = "red", mid = "black", high = "blue") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

This is now looking very similar to the article’s graph shown below!

5. Explaining figure 2 code

Here’s a further explanation of the functions we used to plot figure 2. Figure 2 graphs the relationship between participants’ political orientation (the extremity of one’s liberal or conservative beliefs) and their dogmatism.

The first line of the code, much like the first line of figure 1, mean centers the political orientation variable. Q12 is the column in the data_attn data frame that holds participants’ political orientation, so we took the mean of Q12 and subtracted it from each participant’s Q12 score. As usual, na.rm=TRUE tells mean() to ignore NA values in the Q12 column. The result was saved as the new PO_c column.

data_attn$PO_c= data_attn$Q12-mean(data_attn$Q12,na.rm=TRUE)

The next two lines of code create the graph itself. The ggplot() function reads data_attn, and aes() places -PO_c (the mirrored version of PO_c) on the x-axis and meanD on the y-axis, and colours the points by PO_c. Although colour is usually used to add a third variable, I think using PO_c as the colour just makes the figure prettier and easier to read.

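geom_point() then draws the scatter plot. Its aes(y = meanD) repeats the y mapping already set in ggplot(), so it isn’t strictly needed. The size and transparency of the points are set with size and alpha, while their colour comes from the color mapping above. Finally, position = position_jitter(width = .15) randomly shifts each point left or right by up to 0.15, so that points with the same value don’t sit on top of each other (overplotting), which makes the graph easier to read.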

ggplot(data_attn, aes(x=-PO_c, y=meanD,color=PO_c)) +
  geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 1, alpha = 0.5)
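To check our understanding of position_jitter(width = .15), the base-R sketch below imitates it with jitter(x, amount = 0.15), which adds uniform random noise between -0.15 and +0.15 to each value, so the total spread is 0.3. The po_c values are made up. One thing we hadn’t realised: because we only set width, the height argument falls back to its default (40% of the resolution of the data), so the meanD values may be nudged up or down slightly as well; position_jitter(width = .15, height = 0) would keep the y values exact.

```r
set.seed(2)
# Made-up centered political-orientation scores (hypothetical, not the study data)
po_c <- rep(c(-2, -1, 0, 1, 2), each = 4)

# Uniform noise in [-0.15, 0.15], similar in spirit to position_jitter(width = .15)
po_jittered <- jitter(po_c, amount = 0.15)

range(po_jittered - po_c)  # both ends lie between -0.15 and 0.15
```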

These next lines of code add the mean points, their error bars and the regression line. stat_summary() groups the meanD values at each x value (each political orientation score) and summarises them. With fun.y = mean, each group is reduced to a single mean, which geom='point' draws as a black point (formatted with size and color); these points mark the group means rather than points on the regression line. With fun.data = mean_cl_boot, each group’s mean is returned together with the lower and upper limits of a bootstrapped confidence interval, which geom='errorbar' draws as an error bar, and aes(width=.3) sets the width of its horizontal caps. The fun.args=list(conf.int=0.95) code specifies that the confidence limits on the error bars are set to 95%. labs() then sets the x- and y-axis labels and, as in figure 1, stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) draws the quadratic regression line using the same formula.

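It is important to note that these error bars did not work unless the Hmisc package was installed. This is because ggplot2’s mean_cl_boot() is a wrapper around the smean.cl.boot() function from Hmisc.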

stat_summary(fun.y=mean, geom='point', size=2, color="black") +
  stat_summary(fun.data = mean_cl_boot,geom='errorbar', fun.args=list(conf.int=0.95), 
               size=1.2, aes(width=.3), color="black")+
  labs(x='Political Orientation', y='Dogmatism') +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1)
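Here is a rough base-R sketch of what the two stat_summary() layers compute at each x value, using made-up data. The first layer reduces each group of meanD values to its mean (the black point). The second returns the mean plus lower and upper limits from a bootstrap: the group is resampled with replacement many times, the mean of each resample is taken, and the middle 95% of those means gives the interval. The sketch also shows our reading of fun.args: its entries are passed as extra arguments to the summary function. The boot_mean_ci() function is a hypothetical stand-in written for illustration; it follows the same percentile idea as Hmisc’s smean.cl.boot() but is not its actual code.

```r
set.seed(3)
# Made-up data: dogmatism scores at three political-orientation levels
toy <- data.frame(
  po_c  = rep(c(-1, 0, 1), each = 15),
  meanD = c(rnorm(15, 6, 1), rnorm(15, 4, 1), rnorm(15, 6, 1))
)

# Hypothetical stand-in for mean_cl_boot(): percentile bootstrap CI of the mean
boot_mean_ci <- function(y, conf.int = 0.95, B = 1000) {
  y <- y[!is.na(y)]
  boot_means <- replicate(B, mean(sample(y, replace = TRUE)))
  probs <- c((1 - conf.int) / 2, (1 + conf.int) / 2)
  limits <- quantile(boot_means, probs, names = FALSE)
  data.frame(y = mean(y), ymin = limits[1], ymax = limits[2])
}

# Like fun.args = list(conf.int = 0.95): extra arguments added to each call
fun_args <- list(conf.int = 0.95)
groups <- split(toy$meanD, toy$po_c)  # one group of meanD values per x value
summaries <- do.call(rbind, lapply(groups, function(y) {
  do.call(boot_mean_ci, c(list(y), fun_args))
}))
summaries$po_c <- as.numeric(names(groups))
summaries  # y is the mean point; ymin and ymax are the error bar ends
```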

As previously explained, these lines of code format the plot itself: theme functions format the axes and background of the figure, scale_colour_gradient2() specifies the colour gradient of the points, easy_remove_legend() removes the legend and easy_all_text_size() sets the size of all the text.

 theme_minimal() +
  theme(axis.line= element_line(color="black")) + theme(axis.ticks.y = element_line(color="black")) + easy_remove_legend() + easy_all_text_size(12) + scale_colour_gradient2(low = "red", mid = "black", high = "blue")+theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

These last two lines of code set the range of each axis and where its tick marks go. scale_x_continuous() and scale_y_continuous() were used because the data are continuous. Inside them, lim= (short for the limits argument) fixes the range of each axis, from -3.1 to 3.1 on the x-axis and from 1 to 9 on the y-axis, and breaks= sets the values at which tick marks and labels appear. Because the y-axis starts at 1, the 0 break is not shown.

scale_x_continuous(breaks = c(-3,-2,-1,0,1,2,3),lim=c(-3.1,3.1))+
scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9),lim=c(1,9))
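Something worth knowing about the limits argument is that ggplot2 treats values outside a scale’s limits as missing, rather than just hiding them. We haven’t confirmed that this explains our missing error-bar caps, but it seems a possible cause: each cap runs from x - 0.15 to x + 0.15 (because width=.3), so a cap at either edge of the data can poke past -3.1 or 3.1. The base-R sketch below imitates that censoring with made-up x positions; cap_ends() and censor_to() are hypothetical helpers, not ggplot2 functions.

```r
# Made-up x positions of the error bars (one per political-orientation level)
bar_x <- c(-3, -2, -1, 0, 1, 2, 3)
cap_width <- 0.3  # the width=.3 used in aes(width=.3)

# Hypothetical helper: left and right ends of each horizontal cap
cap_ends <- function(x, width) {
  data.frame(x = x, xmin = x - width / 2, xmax = x + width / 2)
}

# Hypothetical helper imitating censoring: values outside the limits become NA
censor_to <- function(v, limits) ifelse(v < limits[1] | v > limits[2], NA, v)

caps <- cap_ends(bar_x, cap_width)
x_limits <- c(-3.1, 3.1)
caps$xmin <- censor_to(caps$xmin, x_limits)
caps$xmax <- censor_to(caps$xmax, x_limits)

# In this sketch, a cap with a missing end is treated as not drawn
caps$cap_drawn <- !is.na(caps$xmin) & !is.na(caps$xmax)
caps

# A possible fix to try in the plot (untested on our data): keep the breaks but
# zoom with coord_cartesian(), which does not remove data outside the range:
#   scale_x_continuous(breaks = c(-3,-2,-1,0,1,2,3)) + coord_cartesian(xlim = c(-3.1, 3.1))
```

In this toy example only the first and last caps are lost, which matches what we saw, but it is only a guess until we test it on the real figure.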

Challenges

  • The biggest challenges our group faced this week were mirroring the graphs and getting error bars to appear on figure 2. We are still unsure why the authors decided to reverse the graphs and whether this was necessary. This was not referenced in the authors’ code at all, and their own output did not show the mirrored version of the figure either.

  • Everyone experienced the pickiness of R this week as we all tried to get error bars on figure 2. There were many times when some of us had error bars but others did not, and then those error bars disappeared when other code was added or when the document knitted. It took a while, but we finally figured out that we needed to install the Hmisc package. Weirdly, we didn’t have to load the package with library() for the error bars to work; we just needed to have it installed. We are still having a bit of trouble with our error bars, as we cannot get the horizontal lines (caps) to appear on the first and last error bars of the figure.

  • Our group is also still unsure of the purpose behind mean centering the variables. All we know is that the code for the figures won’t work without it!

  • I am also unsure of what the fun.args=list code means. I think it has to do with specifying the 95% confidence intervals but I’m not exactly sure how to explain it.
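On the purpose of mean centering: we can’t say why the authors did it, but one common reason when fitting a quadratic such as y ~ x + I(x^2) is that a variable and its square are usually highly correlated when all the values are positive, and centering greatly reduces that correlation. Centering does not change the fitted curve itself, only where 0 sits on the x-axis (and therefore the coefficients). The base-R sketch below checks both points on a made-up 1 to 7 scale; the numbers are illustrative, not from data_attn.

```r
set.seed(4)
# Made-up scores on a 1-7 scale (e.g. a political-orientation item)
score <- sample(1:7, 100, replace = TRUE)
dogma <- 5 + 0.1 * score + 0.2 * (score - 4)^2 + rnorm(100, sd = 0.5)

score_c <- score - mean(score)  # mean centering

# Correlation between each predictor and its square
cor(score, score^2)      # high, because all raw scores are positive
cor(score_c, score_c^2)  # much closer to 0

# The same quadratic fitted to raw vs centered scores
fit_raw <- lm(dogma ~ score + I(score^2))
fit_centered <- lm(dogma ~ score_c + I(score_c^2))

# Identical fitted values: centering moves the x-axis, not the curve
all.equal(unname(fitted(fit_raw)), unname(fitted(fit_centered)))
```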

Successes

  • After much googling, our group was able to figure out how to reverse the graph, which was a great achievement! This was very confusing for us, so finding out how to do it felt like we had come far in our coding journey!

  • Figuring out that we needed to install the Hmisc package was also a success, as it let us code the error bars!

  • Our group was able to finish figure 2 which was great! We are feeling really good with our progress and are keen to get started on figure 3!

Next Steps

  • Our group has made great progress with both generating and understanding the code for the data so far! Our next steps are therefore to finish and understand the code needed for the last figure of the paper and for the descriptive statistics of the pilot study. I am a bit apprehensive about figure 3, as the authors’ code for it is quite long, but I’m keen to figure it out with my group!

Teddy of the Week!

Finishing off the week with another picture of my favourite good boi!