Goals

  • Add error bars to Fig 1
  • Create Fig 1 Data by Site plot
  • Add aesthetics to make Fig 1 look like the original

So far, my code for Fig 1 looks like this:

loading libraries

library(tidyverse)
library(janitor) #used to read csv
library(ggplot2) #used to plot column
library(ggdist) #used for raincloud plot
library(gridExtra) #combines both plots

reading nichols csv

data1 <- read_csv("~/RFiles/nichols/group4_nicols/data/Nichols_et_al_data.csv")

cleaning up variables

We used the dplyr package to make the data much neater. Rename was used to clean up the variables to look more like the original. We also used filter and select to narrow down the variables we will be using for Fig 1.

Claimpercent had to be mutated to become a percentage value, since they were in decimal place beforehand. Then, we used as_tibble to make the data easier to look at.

data1 <- data1 %>%  
  filter(include == 0) %>% 
  rename(cond = con,
         claimpercent = claim,
         claimmoney = moneyclaim,
         CT_practice = `completion time (practice included)`,
         CT_payments = `completion time (payments only)`,
         religiosity = relig,
         religion = Religion) %>% 
  select(site, claimpercent, id, cond) %>% 
  mutate(claimpercent = claimpercent * 100) %>% 
  as_tibble()

rename each condition for “data by condition” and “data by site”

We had to create a new claimpercent variable so each condition had a mean value instead. As such, we had to rename each condition because the values in the original plot and the variables in the code did not align.

For example, the code said that 1 was USA, 2 was Czech Republic and 3 was Japan but the order in the original graph went: 1 = USA, 2 = Japan, 3 = Czech Republic. I also had to change the scaling from 1-3 to 0-2 so the graph could start at zero instead.

# data by condition

data1$cond[data1$cond==4] <- 0 # Make religious prime the reference category
data1$cond[data1$cond==1] <- 4 # This is in a weird order as R reads the code line by line, so if we go from top to bottom, 
data1$cond[data1$cond==3] <- 1 # we're changing the number twice which screws up our dataframe
data1$cond[data1$cond==4] <- 3

# data by site

data1$site[data1$site==1] <- 0 
data1$site[data1$site==3] <- 1 

mutating claimpercent

We used mutate from the dplyr package to create “claimpercent2” for data by condition and “claimpercent3” for data by site, which would be the average value of each condition/site rather than the sum all of the values.

# data by condition

data1 <- data1 %>% 
  mutate(numberOf = (cond == 0) * 100 + (cond == 1 | cond == 2) * 103 + (cond == 3) * 102) %>% 
  mutate(claimpercent2 = claimpercent / numberOf)

# data by site

data1 <- data1 %>% 
  mutate(numberOf2 = (site == 0) * 123 + (site == 2) * 127 + (site == 1) * 156) %>% 
  mutate(claimpercent3 = claimpercent / numberOf2)

calculating error bars

Created group_cond so geom_error (in the next chunk) can plot the error bar. group_by() from dplyr creates a tibble so we can summarise() (also from dplyr) to make a new data frame.

Calculated the mean with “mean = mean(claimpercent)” and na.rm = TRUE skips all NA values.

n = n() assigns n the number of rows in the dataframe.

Like na.rm = TRUE, drop_na() from tidyverse “drops” rows that contains missing values.

Then we created two new variables, lowerCI and upperCI, by calculating se and then calculating lowerCI and upperCI.

Then, we ungrouped the data frame.

The same was done for Data by Site, except we created group_site.

# data by condition

group_cond <- data1 %>% 
  group_by(cond) %>% 
  summarise(mean = mean(claimpercent, na.rm = TRUE),
            sd = sd(claimpercent, na.rm = TRUE),
            n = n()) %>% 
  drop_na() %>% 
  mutate(se = sd / sqrt(n),
         lowerCI = mean - qt(1 - (0.05/2), n - 1) * se,
         upperCI = mean + qt(1 - (0.05/2), n - 1) * se) %>% 
  ungroup()

# data by site

group_site <- data1 %>% 
  group_by(site) %>% 
  summarise(mean = mean(claimpercent, na.rm = TRUE),
            sd = sd(claimpercent, na.rm = TRUE),
            n = n()) %>% 
  drop_na() %>% 
  mutate(se = sd / sqrt(n),
         lowerCI = mean - qt(1 - (0.05/2), n - 1) * se,
         upperCI = mean + qt(1 - (0.05/2), n - 1) * se) %>% 
  ungroup()

plotting fig 1 - data by condition and data by site

We used the ggdist package for “stat_halfeye” to create the cloud parts of the graph, and we combined it with geom_col from ggplot2 to get the bar underneath. We had to mess with the width so both could fit in the respective x axis.

In the ggdist::stat_halfeye section, adjust changed the height so we could leave room for the column. “.width” and “point_colour” were a part of the original cloud plot but we removed them since they are not in Fig 1.

We used “coord_flip()” from ggplot2 to flip the graph to look like the original Figure 1.

Used “geom_errorbar()” from ggplot2 to plot the error bars, using the SE and assigning the lower and upper CI from the previous chunk. I adjusted the width to fit the columns.

“Expand limits” from ggplot2 ensured that limits included x = 0 and y = 0 to better fit with the original graph.

Expand made sure the x and y axis so it included zero. Used label to name each value, and used break so labels and breaks were the same length. “name = NULL” removed the x axis label, like the original figure, and “name =”Percent Claimed"" relabeled the y axis.

“theme_classic()” created a white background for the graph (instead of grids).

ggtitle() from ggplot2 was used to create the “A.” title in the corner of the graph, and facet_grid() allowed us to put the title “Data by _____” in its own box, like in the original.

Similarly, all these functions were used to make Data by Site, just with labels and variables swapped out.

# data by condition

fig1_cond <- ggplot(data1, aes(x = cond)) +
  geom_col(
    aes(y = claimpercent2, fill = cond),
    width = .3
  ) +
  ggdist::stat_halfeye(
    aes(y = claimpercent, fill = cond),
    adjust = .5,
    width = .4,
    .width = 0,
    point_colour = NA,
    position = position_nudge(x = 0.17, y = 0)
  ) +
  geom_errorbar(data = group_cond,
                aes(ymin = lowerCI, ymax = upperCI),
                width = .3) +
  coord_flip() +
  expand_limits(x = 0, y = 0) +
  scale_x_continuous(name = NULL,
                     expand = c(0,0),
                     breaks = c(0, 1, 2, 3),
                     labels = c("Control","Noise","Secular","Religious")
  ) +
  scale_y_continuous(name = "Percent Claimed",
                     expand = c(0,0),
                     breaks = c(0, 20, 40, 60, 80, 100)
                     ) +
  ggtitle(
    label = "Data by condition"
  ) +
  theme_classic() +
  ggtitle(label = "A.") +
  facet_grid(. ~ "Data by condition") 
# data by site

fig1_site <- ggplot(data1, aes(x = site)) +
  geom_col(
    aes(y = claimpercent3),
    width = .3
  ) +
  scale_colour_manual(values = c("#adadad","#ecb234")
                      ) +
  geom_errorbar(data = group_site,
                aes(ymin = lowerCI, ymax = upperCI),
                width = .3
  ) +
  ggdist::stat_halfeye(
    aes(y = claimpercent),
    adjust = .5,
    width = .3,
    .width = 0,
    point_colour = NA,
    position = position_nudge(x = 0.2, y = 0)
  ) +
  coord_flip() +
  expand_limits(x = 0, y = 0) +
  scale_x_continuous(name = NULL,
                     expand = c(0,0), 
                     breaks = c(0, 1, 2),
                     labels = c("USA","Japan","Czech Rep.")
                     ) +
  scale_y_continuous(name = "Percent Claimed",
                     expand = c(0,0),
                     breaks = c(0, 20, 40, 60, 80, 100)
                     ) +
  theme_classic() +
  ggtitle(
    label = "B.") +
  facet_grid(. ~ "Data by site") 

combining fig1_cond and fig1_site

Used “grid.arrange” from the gridExtra package to combine the two plots into one! ncol = 2 arranged it in 2 columns instead of rows.

grid.arrange(fig1_cond, fig1_site, ncol = 2)

The results look like this:

I am aware that the colours and legend for Data by Condition are wrong, I was just messing around with colours to see what I could get, and planning to remove the legend once I know.

Challenges

When trying to change the colours for the aesthetics, it was extremely difficult to do. I believe this is because our plots use continuous values so we cannot use scale_fill_colour or any other functions that change colours of discrete values.

So, we tried changing cond to a discrete value to see if that would work but that screwed up the renaming section by changing all 0 values to NA, thus screwing up our graph.

Using any other function that works for continuous creates gradients, so I am not sure what else I can do about the colours. I could use scale_fill_viridis which does change the colours, but I can’t get the exact colours the original plot does.

As previously mentioned, I also cannot knit my markdown files because rstudio cannot find the ggplot function even though I can run the ggplot function just fine in each chunk. Results from Google haven’t worked for me yet :(

Next Steps

  • See if I can change the colours of the graphs
  • Find out why ggplot can’t be knitted in R Markdown