My goals for this week were:
Finish coding figure 1 so that it looks more similar to the figure in the published article.
Simplify and understand the code for figure 2.
Although these goals seem few, our group found that we really needed the time to understand the code and functions behind the figures, so that we could best replicate the graph in the published article rather than what was given in the authors’ code.
To understand where the authors’ code differed from the figure they presented in the published article, we revisited figure 1 to try to match it to the article as closely as possible. From last week’s code, we can see that we got pretty close to how it looks in the article. However, upon closer inspection this week, we figured out that the authors had flipped, or mirrored, their graph!
Here is our code and figure output from last week:
data_attn$meanA_c= data_attn$meanA-mean(data_attn$meanA,na.rm=TRUE)
ggplot(data_attn, aes(x=meanA_c, y=meanD, color=meanA_c)) + geom_point(aes(y = meanD), position = position_jitter(width = .15),size = 0.5, alpha = 1) +
labs(x='Average Attitude', y='Dogmatism') + stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) + theme_minimal()+
xlim(c(-2,2))+
scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9))+ easy_all_text_size(12) + scale_color_gradient2(mid="black", low="blue", high="red") + easy_remove_legend()
And here is the article’s graph:
When looking closely, it can be seen that the regression line drawn by stat_smooth is actually mirrored in the published article. We’re unsure whether this is ok because mean centering places the data on a scale centered at 0 (a bit like a z-score, though without rescaling the spread), or whether the authors have completely manipulated the data. Regardless, we did a lot of googling to figure out how to get this figure mirrored.
First, we tried adding the function scale_x_reverse() to the figure. This showed the trend we wanted, but because it replaced the x scale set by xlim() (see the message under the code), the values on the x-axis changed from -2, 2 to 2, -1:
ggplot(data_attn, aes(x=meanA_c, y=meanD, color=meanA_c)) + geom_point(aes(y = meanD), position = position_jitter(width = .15),size = 0.5, alpha = 1) +
labs(x='Average Attitude', y='Dogmatism') + stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) + theme_minimal()+
xlim(c(-2,2))+
scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9))+ easy_all_text_size(12) + scale_color_gradient2(mid="black", low="blue", high="red") + easy_remove_legend() + scale_x_reverse()
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
This is not what we wanted, so we continued to google! Finally, we found that adding ‘-’ before meanA_c in the aes() function mirrored the graph!
Personally, I am still unsure why the authors felt the need to reverse the graph and why the ‘-’ does that.
ggplot(data_attn, aes(x=-meanA_c, y=meanD, color=meanA_c)) + geom_point(aes(y = meanD), position = position_jitter(width = .15),size = 0.5, alpha = 1) +
labs(x='Average Attitude', y='Dogmatism') + stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) + theme_minimal()+
xlim(c(-2,2))+
scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9))+ easy_all_text_size(12) + scale_color_gradient2(mid="black", low="blue", high="red") + easy_remove_legend()
Hooray!
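On why the ‘-’ works: negating x reflects every point across x = 0, so a curve fitted to the negated values is the mirror image of the original fit. The sketch below uses simulated data (not the study’s data; the variable names are made up for illustration) to show that refitting the same quadratic formula on -x leaves the intercept and the squared term unchanged and only flips the sign of the linear term.

```r
# Simulated data standing in for a centered predictor and an outcome
# (hypothetical values, not the study's data)
set.seed(1)
x <- runif(200, min = -2, max = 2)
y <- 4 + 0.5 * x + 0.8 * x^2 + rnorm(200, sd = 0.3)

# Same quadratic formula that stat_smooth() is given in the figure code
fit_original <- lm(y ~ x + I(x^2))

# Negate x, as aes(x = -meanA_c) does, and refit
x_neg <- -x
fit_mirrored <- lm(y ~ x_neg + I(x_neg^2))

# Compare the coefficients side by side
coefs <- cbind(original = unname(coef(fit_original)),
               mirrored = unname(coef(fit_mirrored)))
rownames(coefs) <- c("intercept", "linear", "quadratic")
print(round(coefs, 3))
```

One thing to keep in mind: after the flip, the numbers on the x-axis are the negated values, so a tick labelled 1 corresponds to an original value of -1.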
We had a few more things to tidy up before moving on to figure 2. First, we noticed that we forgot to add axis lines. This meant that we needed to add three lines of code, which used the theme function to add an axis line and element_line to format the line in black. Then we used axis.ticks.y to add ticks to the y-axis, and set panel.grid.major and panel.grid.minor to element_blank() to make the background of the figure blank (without the grid).
ggplot(data_attn, aes(x = -meanA_c, y = meanD, color = meanA_c)) +
  geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 1, alpha = 1) +
  labs(x = 'Average Attitude', y = 'Dogmatism') +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) +
  theme_minimal() + xlim(c(-2, 2)) +
  scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9)) +
  easy_all_text_size(12) +
  scale_color_gradient2(mid = "black", low = "blue", high = "red") +
  easy_remove_legend() +
  theme(axis.line = element_line(color = "black")) +
  theme(axis.ticks.y = element_line(color = "black")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
This is now looking very similar to the article’s first figure!
To simplify the authors’ code for figure 2, we first eliminated code relating to text, as we knew that we could use the ggeasy easy_all_text_size function to format the text of the figure. We then removed any code relating to the legend of the figure, as we added easy_remove_legend() to remove the legend entirely. We also deleted the code relating to strip.text.x, as we had no idea what this meant and it did not affect the figure when removed (it appears to style facet label text, and this figure has no facets).
We ended up removing about 8 lines of code!
We left code relating to the axis (explained in figure 1 code above) so that the axis lines and ticks remained. We also kept the line of code concerning the panel grid to make the background blank.
data_attn$PO_c= data_attn$Q12-mean(data_attn$Q12,na.rm=TRUE)
ggplot(data_attn, aes(x=PO_c, y=meanD,color=PO_c)) +
geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 1, alpha = 0.5) +
stat_summary(fun.y=mean, geom='point', size=2, color="black") +
stat_summary(fun.data = mean_cl_boot,geom='errorbar', fun.args=list(conf.int=0.95),
size=1.2, aes(width=.3), color="black")+
labs(x='Political Orientation', y='Dogmatism') +
stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) +
theme_minimal() +
theme(axis.line= element_line(color="black")) + theme(axis.ticks.y = element_line(color="black")) + easy_remove_legend() +
scale_x_continuous(breaks = c(-3,-2,-1,0,1,2,3),lim=c(-3.1,3.1))+
scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9),lim=c(1,9)) + easy_all_text_size(12) + scale_colour_gradient2(low = "red", mid = "black", high = "blue")+theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
We then added the scale_colour_gradient2 function to make the colour gradient of the figure as similar to the article as possible. We also changed the size of the graph’s points from 2.5 to 1 and their alpha from 0.6 to 0.5 to, again, make it more similar to the article’s figure.
Like in figure 1, we found that the authors had mirrored the figure again! So, we added ‘-’ in front of PO_c in the aes() function to achieve this mirrored effect.
data_attn$PO_c= data_attn$Q12-mean(data_attn$Q12,na.rm=TRUE)
ggplot(data_attn, aes(x=-PO_c, y=meanD,color=PO_c)) +
geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 1, alpha = 0.5) +
stat_summary(fun.y=mean, geom='point', size=2, color="black") +
stat_summary(fun.data = mean_cl_boot,geom='errorbar', fun.args=list(conf.int=0.95),
size=1.2, aes(width=.3), color="black")+
labs(x='Political Orientation', y='Dogmatism') +
stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) +
theme_minimal() +
theme(axis.line = element_line(color = "black")) + theme(axis.ticks.y = element_line(color = "black")) +
easy_remove_legend() +
scale_x_continuous(breaks = c(-3,-2,-1,0,1,2,3), lim = c(-3.1, 3.1)) +
scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9), lim = c(1, 9)) +
easy_all_text_size(12) + scale_colour_gradient2(low = "red", mid = "black", high = "blue") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
This is now looking very similar to the article’s graph shown below!
Here’s a further explanation of the functions we used to plot figure 2. Figure 2 graphs the relationship between the political orientation (the extremity of one’s liberal or conservative beliefs) and dogmatism of participants.
This first line of the code, much like the first line of figure 1, mean centers the political orientation variable. It selects the Q12 column within the data_attn data frame, as Q12 holds the political orientation of participants, and subtracts the mean of Q12 from each participant’s Q12 value. As usual, na.rm=TRUE was used to ignore NA values within the Q12 column when calculating the mean. This created the PO_c column.
data_attn$PO_c= data_attn$Q12-mean(data_attn$Q12,na.rm=TRUE)
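For anyone wondering what mean centering actually does: it shifts every value by the same amount so that the mean becomes 0, but it does not change the spread, so it is not the same as a z-score. One commonly given reason to center a variable before fitting x + x^2 is that it reduces the correlation between x and its square. The sketch below uses made-up responses on a 1 to 7 scale (an assumption for illustration; the real Q12 values may differ).

```r
# Hypothetical responses on a 1-7 scale, with one missing value
q12 <- c(1, 2, 3, 4, 5, 6, 7, NA)

# Mean centering: subtract the column mean from every value
q12_c <- q12 - mean(q12, na.rm = TRUE)
print(q12_c)  # values now sit around 0; the NA stays NA

# The spread is unchanged, so this is a shift, not a z-score
sd_raw <- sd(q12, na.rm = TRUE)
sd_centered <- sd(q12_c, na.rm = TRUE)
print(c(raw = sd_raw, centered = sd_centered))

# Correlation between x and x^2, before and after centering
cor_raw <- cor(q12, q12^2, use = "complete.obs")
cor_centered <- cor(q12_c, q12_c^2, use = "complete.obs")
print(c(raw = cor_raw, centered = cor_centered))
```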
These next two lines of code create the graph. The ggplot function was used to read data_attn and create a figure where aes() placed PO_c on the x-axis (negated, to mirror the figure) and meanD on the y-axis, and set the colours of the points on the scatter plot to follow the PO_c variable. Although this is usually done to add a third variable, using PO_c as color I think just makes the figure prettier and easier to read.
geom_point was then used to create a scatter plot where aes() asserts that meanD is on the y-axis. The size and transparency of the points were set using size and alpha. position=position_jitter adds a small amount of random noise to the position of each point (width sets how far points can move horizontally), which makes the graph easier to read by avoiding overplotting.
ggplot(data_attn, aes(x=-PO_c, y=meanD,color=PO_c)) +
geom_point(aes(y = meanD), position = position_jitter(width = .15), size = 1, alpha = 0.5)
These next lines of code add the group means, their error bars, the axis labels and the regression line. The stat_summary function summarises the y values at each value of x, using either fun.y (a function that returns a single number, here the mean) or fun.data (a function that returns a value together with its lower and upper limits). The black points showing the mean dogmatism score at each political orientation value were created using geom='point' and formatted using size and color. With fun.data, the mean_cl_boot function obtains bootstrapped confidence limits for each mean, and geom='errorbar' draws these limits as error bars. The fun.args=list(conf.int=0.95) argument specifies that the confidence limits are set to 95% on the error bars. labs was then used to change the x- and y-axis labels and, like figure 1, the line of code stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1) was used to draw the quadratic regression line using the same formula.
It is important to note that these error bars did not work unless the package Hmisc was installed (mean_cl_boot relies on a function from Hmisc behind the scenes).
stat_summary(fun.y=mean, geom='point', size=2, color="black") +
stat_summary(fun.data = mean_cl_boot,geom='errorbar', fun.args=list(conf.int=0.95),
size=1.2, aes(width=.3), color="black")+
labs(x='Political Orientation', y='Dogmatism') +
stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1)
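To make the error bars less of a black box, here is a rough base-R sketch of the idea behind a bootstrap confidence interval for a mean: resample the values with replacement many times, take the mean of each resample, and read off the middle 95% of those means. The function name boot_mean_ci and the scores are hypothetical, and this is a simplified illustration rather than the exact code that mean_cl_boot runs.

```r
# Hypothetical helper: percentile bootstrap interval for a mean
boot_mean_ci <- function(values, conf.int = 0.95, B = 1000) {
  values <- values[!is.na(values)]
  # Mean of each of B resamples drawn with replacement
  boot_means <- replicate(B, mean(sample(values, replace = TRUE)))
  alpha <- 1 - conf.int
  limits <- quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2),
                     names = FALSE)
  c(y = mean(values), ymin = limits[1], ymax = limits[2])
}

# Made-up dogmatism scores for one position on the x-axis
set.seed(42)
scores <- c(3.2, 4.1, 5.0, 4.4, 3.8, 4.9, 5.3, 4.0, NA)
ci <- boot_mean_ci(scores, conf.int = 0.95)
print(round(ci, 2))
```

stat_summary applies its summary separately at each x value, which is why each political orientation score gets its own point and error bar.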
As previously explained, these lines of code refer to the formatting of the plot itself. This includes using theme functions to format the axes and background of the figure, scale_colour_gradient2 to specify the colour gradient of the points on the figure, and easy_all_text_size() and easy_remove_legend() to format the text and remove the legend of the figure.
theme_minimal() +
theme(axis.line = element_line(color = "black")) + theme(axis.ticks.y = element_line(color = "black")) +
easy_remove_legend() + easy_all_text_size(12) + scale_colour_gradient2(low = "red", mid = "black", high = "blue") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
These last two lines of code set the range of each axis and where its tick marks go. The lim argument (which R matches to limits) sets the lowest and highest values shown on each axis, and the breaks argument sets the positions of the tick marks and their labels. The scale_x_continuous and scale_y_continuous functions were used because the data was continuous.
scale_x_continuous(breaks = c(-3,-2,-1,0,1,2,3),lim=c(-3.1,3.1))+
scale_y_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9),lim=c(1,9))
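A quick way to see the difference between the two arguments: the limits decide how much of the axis is drawn, and only breaks that fall inside the limits can appear as ticks. The helper below is a hypothetical base-R illustration of that logic using the numbers from the code above; it is not how ggplot2 implements it internally.

```r
# Hypothetical helper: which requested breaks fall inside the axis limits?
visible_breaks <- function(breaks, limits) {
  breaks[breaks >= limits[1] & breaks <= limits[2]]
}

# x-axis: every requested break sits inside the limits
x_ticks <- visible_breaks(breaks = -3:3, limits = c(-3.1, 3.1))
print(x_ticks)

# y-axis: 0 is requested as a break but lies below the lower limit of 1
y_ticks <- visible_breaks(breaks = 0:9, limits = c(1, 9))
print(y_ticks)
```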
The biggest challenges our group faced this week were trying to mirror the graph and obtaining error bars on figure 2. We are still unsure why the authors decided to reverse the graphs and whether this was necessary. This was not referenced in the authors’ code at all, and their output also did not show this mirrored version of the figure.
Everyone experienced the pickiness of R this week as we all tried to obtain error bars on figure 2. There were many times when some of us had error bars but others did not, and then those error bars disappeared with the addition of other code or when the document was knitted. It took a while, but we finally figured out that we needed to install the Hmisc package. Weirdly, we didn’t have to load the package with library() for the error bars to work; we just needed to have the package installed. We are still having a bit of trouble with our error bars, as we cannot add the horizontal lines for the first and last error bars on the figure.
Our group is also still unsure of the purpose behind mean centering the variables. All we know is that the code for the figures won’t work without it!
I am also unsure of what the fun.args=list code means. I think it has to do with specifying the 95% confidence intervals but I’m not exactly sure how to explain it.
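One way to think about fun.args: ggplot2 documents it as optional additional arguments passed on to the summary functions, so it is just a named list of extra settings. The same idea can be shown in base R with do.call, which calls a function using arguments taken from a list. The function trimmed_summary and the numbers below are made up for illustration.

```r
# Hypothetical summary function with an extra setting, 'trim'
trimmed_summary <- function(values, trim = 0) {
  mean(values, trim = trim, na.rm = TRUE)
}

scores <- c(1, 2, 3, 4, 100)

# Extra arguments kept in a named list, like fun.args = list(...)
extra_args <- list(trim = 0.2)

# do.call() builds the call trimmed_summary(scores, trim = 0.2)
via_list <- do.call(trimmed_summary, c(list(scores), extra_args))
direct <- trimmed_summary(scores, trim = 0.2)
print(c(via_list = via_list, direct = direct))
```

In the figure code, fun.args = list(conf.int = 0.95) plays the role of extra_args, so mean_cl_boot is called with conf.int = 0.95.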
After much googling, our group was able to figure out how to reverse the graph which was a great achievement! This was very confusing for us, so being able to find how to do it felt like we had come far in our coding journey!
Figuring out that we needed to download the Hmisc package was also a success, as we were able to code error bars!
Our group was able to finish figure 2 which was great! We are feeling really good with our progress and are keen to get started on figure 3!
Finishing off the week with another picture of my favourite good boi!