Goals for this week

Welcome back to another learning log by ya boy Thomas G. AKA Mr Histogram. This week my main goal was to replicate the Study 1 histogram in my group's paper using geom_histogram instead of the hist function. Other goals included meeting with my workshop group to touch base on our paper, and finishing the second half of the data wrangling module.

geom_histogram trials and tribulations for replication

So, last week I managed to replicate the histogram in my study using the function hist(). However, I was not happy with the aesthetics that it gave me in the output:

This is how the graph looks in the paper:

I wanted to replicate this graph as exactly as possible. Jenny suggested I use geom_histogram instead, so that’s what I tried to do this week.

The first issue I ran into was that I didn’t understand how to alter the ‘binwidth’ of my geom_histogram to match the paper’s histogram. The binwidth sets the width of each ‘bin’, which is just another name for the bars of the histogram. But no matter what binwidth value I tried, I could not get the bars in my graph to the correct heights. So binwidth on its own was clearly not the problem with the replication…

picture <- ggplot(Data, (aes(SCdiff))) + 
  geom_histogram(binwidth = 0.5, colour = 'black', fill = 'darkgrey') +
  labs(x = "Social Connectedness Difference Score (T2 - T1)", y = "Frequency")
  

print(picture)

Plot themes!!

While searching through forum posts and guides online, I stumbled across ggplot themes and realised that the graph in my paper used theme_classic() to remove the grid background and clean up the axes, so I incorporated it into my code:

picture <- ggplot(Data, (aes(SCdiff))) + 
  geom_histogram(binwidth = 0.5, colour = 'black', fill = 'darkgrey') +
  labs(x = "Social Connectedness Difference Score (T2 - T1)", y = "Frequency") +
  theme_classic()
  
print(picture)

Axis intervals and limits

I moved on to replicating the intervals displayed on the x-axis, as my plots were only displaying intervals of 2. I fiddled with different bits of code that people had posted online and managed to correctly replicate the intervals with this code:

picture <- ggplot(Data, (aes(SCdiff))) + 
  geom_histogram(binwidth = 0.5, colour = 'black', fill = 'darkgrey') +
  labs(x = "Social Connectedness Difference Score (T2 - T1)", y = "Frequency") +
  scale_x_continuous(limits = c(-3.5, 3), # sets the lower and upper limits of the x axis
                     breaks = c(-3, -2, -1, 0, 1, 2, 3)) + # puts an axis label at every whole number
  theme_classic()

print(picture)
## Warning: Removed 2 rows containing missing values (geom_bar).
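
A likely explanation for that warning (an assumption, not something checked against the ggplot2 source) is that the two removed rows are the two outermost bars. With bins centred on multiples of 0.5, the first and last bars poke out past the limits of -3.5 and 3, and bars that fall outside the scale limits get dropped. Here is a quick base R sketch of that arithmetic; the variable names are made up for illustration:

```r
# Sketch only: assumes bins are centred on multiples of the binwidth
limits   <- c(-3.5, 3)
binwidth <- 0.5

centres <- seq(limits[1], limits[2], by = binwidth)  # -3.5, -3.0, ..., 3.0
xmin    <- centres - binwidth / 2                     # left edge of each bar
xmax    <- centres + binwidth / 2                     # right edge of each bar

# Bars with an edge beyond the limits are the ones that would be dropped
outside   <- xmin < limits[1] | xmax > limits[2]
n_dropped <- sum(outside)

print(centres[outside])
print(n_dropped)
```

Under these assumptions, only the bars centred on -3.5 and 3 stick out, which matches the count in the warning. If the limits are only there to zoom the view, coord_cartesian(xlim = c(-3.5, 3)) is the usual alternative, since it zooms without discarding anything.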

Fixing my plot with boundaries

I was reading up on geom_histogram and its arguments on this website: https://ggplot2.tidyverse.org/reference/geom_histogram.html#examples, and came across 2 arguments that sounded like the solution to my centring issue. As you can see in the plots above, the centre of each bar lines up with an x-axis interval, whilst in the paper’s plot (the one to be replicated) the bins sit in between the intervals. I tried the center argument first, which I think does what my plot was already doing (centring bins on the intervals), but then I used boundary and it worked!

picture <- ggplot(Data, (aes(SCdiff))) + 
  geom_histogram(binwidth = 0.5,
                 boundary = 0, # lines bin edges up with the axis breaks, so 2 bins fit in each interval
                 colour = 'black', fill = 'darkgrey') +
  labs(x = "Social Connectedness Difference Score (T2 - T1)", y = "Frequency") +
  scale_x_continuous(limits = c(-3.5, 3), 
                     breaks = c(-3, -2, -1, 0, 1, 2, 3)) +
  theme_classic()

print(picture)
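
To see why boundary does the trick, it helps to write out where the bin edges land in each case. This is a base R sketch of the arithmetic only, not how ggplot2 computes it internally, and the variable names are made up:

```r
binwidth <- 0.5
ticks    <- -3:3   # the x-axis breaks used in the plot

# boundary = 0: a bin edge sits at 0, so every edge is a multiple of the binwidth
edges_boundary <- binwidth * (-7:6)                 # -3.5, -3.0, ..., 3.0

# center = 0: a bin centre sits at 0, so every edge is shifted by half a binwidth
edges_center <- binwidth * (-7:6) + binwidth / 2    # -3.25, -2.75, ..., 3.25

# Do the axis breaks land on bin edges?
ticks_on_edges_boundary <- all(ticks %in% edges_boundary)
ticks_on_edges_center   <- any(ticks %in% edges_center)

print(ticks_on_edges_boundary)
print(ticks_on_edges_center)
```

With boundary = 0 every whole number is a bin edge, so exactly two 0.5-wide bins fit between neighbouring axis labels. With center = 0 no whole number is an edge, so the bars straddle the labels instead.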

Spacing between the plot and the axes

The only thing left for an exact replication was to remove the spacing between the plot and the x and y axes. After some Google searching, I found that by default each scale is expanded by 5% on either side, so code was needed to remove this.

picture <- ggplot(Data, (aes(SCdiff))) + 
  geom_histogram(binwidth = 0.5, boundary = 0, colour = 'black', fill = 'darkgrey') +
  labs(x = "Social Connectedness Difference Score (T2 - T1)", y = "Frequency") +
  scale_x_continuous(limits = c(-3.5, 3), 
                     breaks = c(-3, -2, -1, 0, 1, 2, 3), 
                     expand = c(0, 0)) + # Sets expansion to 0 for x axis.
  scale_y_continuous(expand = c(0, 0)) + # Sets expansion to 0 for y axis.
  theme_classic()

print(picture)
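
For a sense of how much padding that 5% adds, here is the arithmetic for the x axis. It is a sketch that assumes the default is 5% of the axis range added on each side; the variable names are illustrative:

```r
limits  <- c(-3.5, 3)
padding <- 0.05 * diff(limits)   # 5% of the 6.5-unit range

# Range drawn with the default expansion versus expand = c(0, 0)
default_range   <- limits + c(-padding, padding)
no_expand_range <- limits + c(0, 0)

print(default_range)
print(no_expand_range)
```

Newer versions of ggplot2 also provide an expansion() helper, so expand = expansion(mult = 0) should be an equivalent and slightly more explicit way of writing the same thing.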

And there we go, the plot in the paper has been replicated perfectly.

Next steps

The next steps for me will be to replicate the other histogram(s) in my paper, which should be much easier now that I’ve done one:

I also plan to attend the Tuesday Q&A if I have any questions, and aim to continue practising with different R functions and arguments.
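
One way to make the other histograms quicker would be to wrap the final code in a small function. The sketch below is only a starting point: replicate_histogram and its arguments are hypothetical names, the example data is simulated rather than the study data, and the other histograms may well need a different binwidth or limits.

```r
library(ggplot2)

# Hypothetical helper: same settings as the final plot above, with the
# column name, axis label, limits and breaks passed in as arguments
replicate_histogram <- function(data, column, x_label, x_limits, x_breaks) {
  ggplot(data, aes(x = .data[[column]])) +
    geom_histogram(binwidth = 0.5, boundary = 0,
                   colour = "black", fill = "darkgrey") +
    labs(x = x_label, y = "Frequency") +
    scale_x_continuous(limits = x_limits, breaks = x_breaks,
                       expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0)) +
    theme_classic()
}

# Example call with simulated data (NOT the study data)
set.seed(1)
fake_data <- data.frame(SCdiff = runif(50, min = -3, max = 2.5))

picture <- replicate_histogram(
  fake_data, "SCdiff",
  x_label  = "Social Connectedness Difference Score (T2 - T1)",
  x_limits = c(-3.5, 3),
  x_breaks = -3:3
)

print(picture)
```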