We begin by loading the required packages and reading in the data. Our first step was to decide which plotting approach to use, and we initially thought that ggplot2's facet_wrap() function would be suitable. However, after loading the data, we realised that facet_wrap() is not appropriate here because the y axes of the two plots describe different variables. We therefore also loaded the cowplot package, which lets us build two separate ggplot objects and combine them into one figure.
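To illustrate the difference, here is a minimal sketch with made-up numbers (not the study data). `facet_wrap(scales = "free_y")` could give each panel its own y range, but facets still share a single y-axis title, which does not suit two different measures. Building each panel as its own ggplot and combining them with cowplot's `plot_grid()` keeps a separate axis title for each. The data frames and object names below are hypothetical.

```r
library(ggplot2)
library(cowplot)

# Hypothetical toy summaries standing in for the two studies (made-up values)
toy_connect <- data.frame(Time = c("Before", "During"), Mean = c(4.2, 4.0))
toy_lonely  <- data.frame(Time = c("Before", "During"), Mean = c(2.1, 2.3))

# Each panel is an independent ggplot, so each keeps its own y scale and title.
# group = 1 tells geom_line() to join the two points on the discrete x axis.
p_left <- ggplot(toy_connect, aes(x = Time, y = Mean, group = 1)) +
  geom_line() +
  labs(y = "Mean Social Connectedness")
p_right <- ggplot(toy_lonely, aes(x = Time, y = Mean, group = 1)) +
  geom_line() +
  labs(y = "Mean Loneliness")

# plot_grid() places the finished plots side by side in one figure
combined <- plot_grid(p_left, p_right, ncol = 2)
```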
library(cowplot)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ lubridate::stamp() masks cowplot::stamp()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the data
data1 <- read.csv("./Study 1 data.csv")
data2 <- read.csv("./Study 2.csv")
We first want to create the basic line graph without the error bars. In the original study, the researchers compared two subgroups of participants: the most extraverted and the most introverted. These groups are defined by percentiles of the extraversion scores: participants at or below the 25th percentile are classed as most introverted, and those at or above the 75th percentile as most extraverted. We therefore transform our datasets with mutate() and pivot_longer() so that we work with the correct data. This was the most challenging part of reproducing the plot, but with base R's quantile() function the participants were successfully separated into the two groups.
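Because the same split is applied to both studies, it could be wrapped in a small function. The sketch below is a hypothetical helper, `split_extremes()`, that is not part of the original study's code. It relies on R's default `quantile()` definition (type 7) and is shown on toy scores 1 to 8, whose 25th and 75th percentiles are 2.75 and 6.25.

```r
library(dplyr)

# Hypothetical helper (not from the original study): label the bottom and top
# quartiles of a score column and drop everyone in between.
split_extremes <- function(df, score_col) {
  # Default quantile() definition (type 7)
  q <- quantile(df[[score_col]], c(0.25, 0.75))
  df %>%
    mutate(ExtraversionGroup = case_when(
      .data[[score_col]] <= q[[1]] ~ "Most Introverted",
      .data[[score_col]] >= q[[2]] ~ "Most Extraverted"
      # rows matching neither condition get NA
    )) %>%
    filter(!is.na(ExtraversionGroup))
}

# Toy data: scores 1 to 8, so the cut-offs are 2.75 and 6.25
toy <- data.frame(id = 1:8, EXTRAVERSION = 1:8)
extremes <- split_extremes(toy, "EXTRAVERSION")
extremes$id  # 1 2 7 8
```

With the real data it might be called as `split_extremes(data1, "EXTRAVERSION")` or `split_extremes(data2, "T1Extraversion")`. Note that it uses the computed quantiles directly, so it mirrors the first grouping approach in this document, not the hand-entered thresholds used later.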
# Calculate percentiles for Study 1
study1_extraversion_25 <- quantile(data1$EXTRAVERSION, 0.25)
study1_extraversion_75 <- quantile(data1$EXTRAVERSION, 0.75)
# Create "ExtraversionGroup" for Study 1
study1_data <- data1 %>%
  mutate(ExtraversionGroup = case_when(
    EXTRAVERSION <= study1_extraversion_25 ~ "Most Introverted",
    EXTRAVERSION >= study1_extraversion_75 ~ "Most Extraverted",
    TRUE ~ "Moderate"
  )) %>%
  filter(ExtraversionGroup != "Moderate")
# Reshape the Study 1 data to long format
study1_long <- study1_data %>%
  pivot_longer(cols = starts_with("SCAVERAGE.T"),
               names_to = "Time",
               values_to = "SocialConnectedness") %>%
  mutate(Time = recode(Time, "SCAVERAGE.T1" = "Before Pandemic", "SCAVERAGE.T2" = "During Pandemic"))
# Summarize Study 1 data
study1_summary <- study1_long %>%
  group_by(ExtraversionGroup, Time) %>%
  summarise(MeanConnectedness = mean(SocialConnectedness),
            .groups = 'drop')
# Calculate percentiles for Study 2
study2_extraversion_25 <- quantile(data2$T1Extraversion, 0.25)
study2_extraversion_75 <- quantile(data2$T1Extraversion, 0.75)
# Create "ExtraversionGroup" for Study 2
study2_data <- data2 %>%
  mutate(ExtraversionGroup = case_when(
    T1Extraversion <= study2_extraversion_25 ~ "Most Introverted",
    T1Extraversion >= study2_extraversion_75 ~ "Most Extraverted",
    TRUE ~ "Moderate"
  )) %>%
  filter(ExtraversionGroup != "Moderate")
# Reshape the Study 2 data to long format
study2_long <- study2_data %>%
  pivot_longer(cols = c("T1Lonely", "T2Lonely"),
               names_to = "Time",
               values_to = "Loneliness") %>%
  mutate(Time = recode(Time, "T1Lonely" = "Before Pandemic", "T2Lonely" = "During Pandemic"))
# Summarize Study 2 data
study2_summary <- study2_long %>%
  group_by(ExtraversionGroup, Time) %>%
  summarise(MeanLoneliness = mean(Loneliness),
            .groups = 'drop')
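The reshaping and summarising steps can be seen on a tiny made-up dataset. This sketch, with hypothetical object names and values, shows how `pivot_longer()` turns one row per participant into one row per participant and time point, and how `summarise()` then reduces the data to one mean per group and time point, which becomes one point on each line of the plot.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: one row per participant, one column per time point
toy_wide <- data.frame(
  id = 1:4,
  ExtraversionGroup = c("Most Introverted", "Most Introverted",
                        "Most Extraverted", "Most Extraverted"),
  T1Lonely = c(2, 3, 1, 2),
  T2Lonely = c(3, 3, 2, 2)
)

# Wide -> long: each participant now contributes one row per time point
toy_long <- toy_wide %>%
  pivot_longer(cols = c(T1Lonely, T2Lonely),
               names_to = "Time", values_to = "Loneliness") %>%
  mutate(Time = recode(Time, T1Lonely = "Before Pandemic",
                       T2Lonely = "During Pandemic"))

# One mean per group x time cell, i.e. one point per line in the plot
toy_summary <- toy_long %>%
  group_by(ExtraversionGroup, Time) %>%
  summarise(MeanLoneliness = mean(Loneliness), .groups = "drop")
```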
Because the transformed data use the same variable names as the original study, and the values in study1_summary and study2_summary looked correct when we checked them, we could proceed to create our initial plot with ggplot(). With the data already prepared, this step was straightforward. We deliberately added no stylisation until we had confirmed that the plotted data were correct.
# Create the left plot
p1 <- ggplot(study1_summary, aes(x = Time, y = MeanConnectedness, group = ExtraversionGroup, linetype = ExtraversionGroup)) +
  geom_line() +
  geom_point()
# Create the right plot
p2 <- ggplot(study2_summary, aes(x = Time, y = MeanLoneliness, group = ExtraversionGroup, linetype = ExtraversionGroup)) +
  geom_line() +
  geom_point()
# Arrange the plots side by side
final_plot <- plot_grid(p1, p2)
# Display the final combined plot
print(final_plot)
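One detail in these calls is easy to miss: `group = ExtraversionGroup`. By default ggplot2 groups observations by the combination of all discrete variables in the layer, which here includes the discrete x axis `Time`, so without an explicit `group` each group would hold a single point and `geom_line()` would have nothing to connect. The sketch below uses a made-up summary table and a hypothetical helper, `n_groups()`, to compare how many groups ggplot2 builds with and without the mapping.

```r
library(ggplot2)

# Made-up 2 x 2 summary with the same shape as study1_summary
toy_summary <- data.frame(
  ExtraversionGroup = rep(c("Most Introverted", "Most Extraverted"), each = 2),
  Time = rep(c("Before Pandemic", "During Pandemic"), times = 2),
  MeanConnectedness = c(3.9, 3.8, 4.6, 4.3)
)

# Explicit grouping: one group (and one line) per extraversion group
with_group <- ggplot(toy_summary, aes(x = Time, y = MeanConnectedness,
                                      group = ExtraversionGroup,
                                      linetype = ExtraversionGroup)) +
  geom_line()

# Default grouping: every discrete variable, including x, splits the data
without_group <- ggplot(toy_summary, aes(x = Time, y = MeanConnectedness,
                                         linetype = ExtraversionGroup)) +
  geom_line()

# Hypothetical helper: count the groups ggplot2 computes for the first layer
n_groups <- function(p) length(unique(ggplot_build(p)$data[[1]]$group))
n_groups(with_group)     # 2
n_groups(without_group)  # 4
```

`ggplot_build()` is used here only to inspect the computed groups; it is not needed for ordinary plotting.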
The plot above broadly matches the pattern in the original study, except for the lower part of the right-hand graph. We double-checked the original and the transformed data, and only this section of the graph showed a discrepancy. We therefore examined the original study's code to see how the authors grouped the data, and found that the difference came from how the percentile thresholds were coded. The original authors computed the percentiles and then entered the corresponding extraversion scores into their code by hand as fixed cut-offs, rather than using the computed values directly. After adopting the same approach, the discrepancy disappeared.
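One plausible mechanism for why hand-entered cut-offs change the Study 2 groups but not the Study 1 groups is rounding at the boundary. The extraversion scores appear to be multiples of 1/12 (for example 2.083333 = 25/12), so the printed quartiles may be exact twelfths, and some participants may score exactly on them. This sketch assumes that; it does not inspect the real data. A rounded cut-off that sits slightly below or above the exact quartile silently moves participants who score exactly on it.

```r
# Assumption: the exact quartiles are twelfths (scores look like multiples of 1/12).
# Study 2: exact quartiles vs the rounded values entered by hand
q25_study2 <- 40 / 12   # prints as 3.333333
q75_study2 <- 53 / 12   # prints as 4.416667
q25_study2 <= 3.333333  # FALSE: the rounded cut-off sits just below 40/12
q75_study2 >= 4.416667  # FALSE: the rounded cut-off sits just above 53/12

# Study 1: the rounded values happen to err on the inclusive side
q25_study1 <- 41 / 12   # prints as 3.416667
q75_study1 <- 58 / 12   # prints as 4.833333
q25_study1 <= 3.41667   # TRUE
q75_study1 >= 4.83333   # TRUE
```

If this assumption holds, the hand-entered Study 2 cut-offs leave participants scoring exactly on a quartile out of both groups, while the Study 1 cut-offs keep them, which would be consistent with only the right-hand (Study 2) panel changing.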
# Calculate quantiles to understand the threshold values
quantile(data1$EXTRAVERSION)
## 0% 25% 50% 75% 100%
## 1.50000 3.41667 4.16667 4.83333 6.75000
# Create "ExtraversionGroup" for Study 1 using the exact thresholds from the original study
study1_data <- data1 %>%
  mutate(ExtraversionGroup = case_when(
    EXTRAVERSION <= 3.41667 ~ "Most Introverted", # 25th percentile (first quartile) threshold
    EXTRAVERSION >= 4.83333 ~ "Most Extraverted"  # 75th percentile (third quartile) threshold
  )) %>%
  filter(!is.na(ExtraversionGroup)) # Remove rows where ExtraversionGroup is NA (the middle 50%)
# Reshape the Study 1 data to long format
study1_long <- study1_data %>%
  pivot_longer(cols = starts_with("SCAVERAGE.T"),
               names_to = "Time",
               values_to = "SocialConnectedness") %>%
  mutate(Time = recode(Time, "SCAVERAGE.T1" = "Before Pandemic", "SCAVERAGE.T2" = "During Pandemic"))
# Calculate quantiles to understand the threshold values
quantile(data2$T1Extraversion)
## 0% 25% 50% 75% 100%
## 2.083333 3.333333 3.833333 4.416667 6.000000
# Create "ExtraversionGroup" for Study 2 using the exact thresholds from the original study
study2_data <- data2 %>%
  mutate(ExtraversionGroup = case_when(
    T1Extraversion <= 3.333333 ~ "Most Introverted", # 25th percentile (first quartile) threshold
    T1Extraversion >= 4.416667 ~ "Most Extraverted"  # 75th percentile (third quartile) threshold
  )) %>%
  filter(!is.na(ExtraversionGroup)) # Remove rows where ExtraversionGroup is NA (the middle 50%)
# Reshape the Study 2 data to long format
study2_long <- study2_data %>%
  pivot_longer(cols = c("T1Lonely", "T2Lonely"),
               names_to = "Time",
               values_to = "Loneliness") %>%
  mutate(Time = recode(Time, "T1Lonely" = "Before Pandemic", "T2Lonely" = "During Pandemic"))
The resulting graph still has many stylisation issues, but first, a critical element is missing: the error bars. The original study drew its error bars as 95% confidence intervals. We therefore used the same method: inside summarise() we computed the standard error (SE) of each group mean, then the lower and upper bounds (CI_Lower and CI_Upper) as the mean minus or plus qt(0.975, n - 1) × SE. We then added geom_errorbar() to each ggplot call to draw the error bars.
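The interval computed inside `summarise()` is the usual t-based 95% confidence interval for a mean. As a standalone sketch, the hypothetical helper `mean_ci95()` (not part of the original code) computes the same bounds and can be cross-checked against base R's `t.test()`. Like the summarise() code, it assumes the column contains no missing values.

```r
# Hypothetical helper: t-based 95% CI for a mean,
# mean +/- qt(0.975, n - 1) * sd / sqrt(n)
mean_ci95 <- function(x) {
  n  <- length(x)
  m  <- mean(x)
  se <- sd(x) / sqrt(n)                   # standard error of the mean
  half_width <- qt(0.975, df = n - 1) * se
  c(lower = m - half_width, upper = m + half_width)
}

x <- c(1, 2, 3, 4, 5)
ci <- mean_ci95(x)
# Cross-check against the interval reported by t.test()
all.equal(unname(ci), as.numeric(t.test(x)$conf.int))  # TRUE
```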
# Summarize Study 2 data with CI
study2_summary <- study2_long %>%
  group_by(ExtraversionGroup, Time) %>%
  summarise(MeanLoneliness = mean(Loneliness),
            SE = sd(Loneliness) / sqrt(n()),
            CI_Lower = MeanLoneliness - qt(0.975, df = n() - 1) * SE,
            CI_Upper = MeanLoneliness + qt(0.975, df = n() - 1) * SE,
            .groups = 'drop')
# Summarize Study 1 data with CI
study1_summary <- study1_long %>%
  group_by(ExtraversionGroup, Time) %>%
  summarise(MeanConnectedness = mean(SocialConnectedness),
            SE = sd(SocialConnectedness) / sqrt(n()),
            CI_Lower = MeanConnectedness - qt(0.975, df = n() - 1) * SE,
            CI_Upper = MeanConnectedness + qt(0.975, df = n() - 1) * SE,
            .groups = 'drop')
# Adding error bars for the left plot
p1 <- ggplot(study1_summary, aes(x = Time, y = MeanConnectedness, group = ExtraversionGroup, linetype = ExtraversionGroup)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = CI_Lower, ymax = CI_Upper))
# Adding error bars for the right plot
p2 <- ggplot(study2_summary, aes(x = Time, y = MeanLoneliness, group = ExtraversionGroup, linetype = ExtraversionGroup)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = CI_Lower, ymax = CI_Upper))
# Arrange the plots side by side
final_plot <- plot_grid(p1, p2)
# Display the final combined plot
print(final_plot)
Many elements of the graph are still not displayed as they are in the original. Here is the list of stylisation elements that we need to implement to replicate the graph:
- Size of the lines, points, and error bars
- Missing titles for both plots
- Overlapping x-axis labels for “Before Pandemic” and “During Pandemic”
- Missing axis lines
- Missing tick marks
- Incorrect scaling of the y axis
- Incorrect axis titles
- Unnecessary legend in the middle of the figure
- Unnecessary legend title
- Unnecessary major and minor grid lines
- Grey background that should be white

The following chunk fixes these stylisation issues in the initial draft.
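Since both panels need the same layout fixes, one option is to collect them in a reusable theme. The sketch below defines a hypothetical `theme_replication()` helper (not from the original code) that bundles the white background, axis lines and ticks, legend-title removal, and grid-line removal, with the legend position as its only argument. It is demonstrated on the built-in `mtcars` data rather than the study data.

```r
library(ggplot2)

# Hypothetical helper bundling the shared layout fixes from the list above
theme_replication <- function(legend_position = "none") {
  theme_minimal() +                        # white background
    theme(
      legend.position  = legend_position,  # "none" hides the legend
      legend.title     = element_blank(),  # drop the legend title
      axis.line        = element_line(),   # draw axis lines
      axis.ticks       = element_line(),   # draw tick marks
      panel.grid.major = element_blank(),  # remove major grid lines
      panel.grid.minor = element_blank()   # remove minor grid lines
    )
}

# Demonstration on built-in data
demo_plot <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme_replication("right")
```

With the study plots this would be `p1 + theme_replication()` and `p2 + theme_replication("right")`. Because it starts from `theme_minimal()`, it replaces the whole theme, so it should be added before any further `theme()` tweaks.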
# Create the left plot with stylisation adjustments
p1 <- ggplot(study1_summary, aes(x = Time, y = MeanConnectedness, group = ExtraversionGroup, linetype = ExtraversionGroup)) +
  geom_line(linewidth = 1) +  # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = CI_Lower, ymax = CI_Upper), width = 0.2) + # Set the width of the error-bar caps
  labs(title = "Social Connectedness Changes\nBased on Extraversion",
       x = "Time", y = "Mean Social Connectedness") +
  scale_y_continuous(
    limits = c(3.17, 5.1),            # Set y-axis limits
    breaks = seq(3.5, 5, by = 0.5)) + # Set tick marks
  theme_minimal()
# Create the right plot with stylisation adjustments
p2 <- ggplot(study2_summary, aes(x = Time, y = MeanLoneliness, group = ExtraversionGroup, linetype = ExtraversionGroup)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = CI_Lower, ymax = CI_Upper), width = 0.2) + # Set the width of the error-bar caps
  labs(title = "Loneliness Changes\nBased on Extraversion",
       x = "Time", y = "Mean Loneliness") +
  scale_y_continuous(
    limits = c(1.17, 3.1),            # Set y-axis limits
    breaks = seq(1.5, 3, by = 0.5)) + # Set tick marks
  theme_minimal()
p1 <- p1 +
  theme(
    legend.position = "none",            # Hide legend
    axis.line = element_line(),          # Add axis lines
    axis.ticks = element_line(),         # Add tick marks
    panel.grid.major = element_blank(),  # Remove major grid lines
    panel.grid.minor = element_blank()   # Remove minor grid lines
  )
p2 <- p2 +
  theme(
    legend.position = "right",           # Place legend on the right
    legend.title = element_blank(),      # Remove legend title
    axis.line = element_line(),          # Add axis lines
    axis.ticks = element_line(),         # Add tick marks
    panel.grid.major = element_blank(),  # Remove major grid lines
    panel.grid.minor = element_blank()   # Remove minor grid lines
  )
# Arrange the plots side by side
final_plot <- plot_grid(p1, p2)
# Display the final combined plot
print(final_plot)
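The y-axis limits in this chunk are set with `scale_y_continuous(limits = ...)`. One caveat worth checking: scale limits censor data, so any value outside them, including an error-bar bound, becomes `NA` and that error bar is dropped with a warning about removed rows. `coord_cartesian(ylim = ...)` instead zooms the view and keeps all data. We have not checked whether any confidence bound in the study data falls outside these limits; the sketch below uses made-up values to show the difference.

```r
library(ggplot2)

# Made-up data: the second error bar's lower bound (2.9) falls below 3.17
toy <- data.frame(x = c("A", "B"), y = c(4, 3.5),
                  lo = c(3.8, 2.9), hi = c(4.2, 4.1))

base <- ggplot(toy, aes(x = x, y = y, ymin = lo, ymax = hi)) +
  geom_errorbar()

# Scale limits: out-of-range values are replaced by NA before drawing
censored <- ggplot_build(base + scale_y_continuous(limits = c(3.17, 5.1)))
# Coordinate limits: the data are kept and the view is simply zoomed
zoomed <- ggplot_build(base + coord_cartesian(ylim = c(3.17, 5.1)))

anyNA(censored$data[[1]]$ymin)  # TRUE
anyNA(zoomed$data[[1]]$ymin)    # FALSE
```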
The stylised graph already looks broadly similar to the original graph. However, several details are still off:
- Axis titles are not bold
- The two plots take up uneven amounts of space
- Axis lines and ticks are black rather than grey
- Plot titles are too large and not centred
- The font is incorrect

The following chunk fixes these issues:
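The finishing touches that follow are applied identically to both panels, so they could also be stored once as a theme object and added with `+`. This sketch uses a hypothetical name, `finishing_theme`, and writes the line widths with `linewidth`, which replaced `size` for line elements in ggplot2 3.4.0. It is demonstrated on `mtcars` rather than the study data.

```r
library(ggplot2)

# Hypothetical shared theme object holding the finishing touches
finishing_theme <- theme(
  plot.title = element_text(size = 12, hjust = 0.5),            # smaller, centred title
  axis.line  = element_line(colour = "grey", linewidth = 1.1),  # grey axis lines
  axis.ticks = element_line(colour = "grey", linewidth = 1.1),  # grey tick marks
  axis.title = element_text(size = 10, face = "bold",
                            colour = alpha("black", 0.7)),      # bold, semi-transparent
  text       = element_text(family = "serif")                   # serif font throughout
)

# Demonstration on built-in data
demo <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = "Demo") +
  finishing_theme
```

In this document it would be used as `p1 + finishing_theme` and `p2 + finishing_theme`; the `rel_widths` argument of `plot_grid()` would still be needed to balance the two panels, since only the right-hand one carries a legend.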
p1 <- p1 +
  theme(
    plot.title = element_text(size = 12,    # Smaller title
                              hjust = 0.5), # Centre the title
    axis.line = element_line(colour = "grey", linewidth = 1.1),  # Grey, thicker axis lines
    axis.ticks = element_line(colour = "grey", linewidth = 1.1), # Grey, thicker tick marks
    axis.title = element_text(size = 10, face = "bold", colour = alpha("black", 0.7)), # Smaller, bold axis titles in semi-transparent black
    text = element_text(family = "serif")   # Change font
  )
p2 <- p2 +
  theme(
    plot.title = element_text(size = 12,    # Smaller title
                              hjust = 0.5), # Centre the title
    axis.line = element_line(colour = "grey", linewidth = 1.1),  # Grey, thicker axis lines
    axis.ticks = element_line(colour = "grey", linewidth = 1.1), # Grey, thicker tick marks
    axis.title = element_text(size = 10, face = "bold", colour = alpha("black", 0.7)), # Smaller, bold axis titles in semi-transparent black
    text = element_text(family = "serif")   # Change font
  )
# Arrange the plots side by side with adjusted widths
# (the right panel is given more width because it also holds the legend)
final_plot <- plot_grid(p1, p2, labels = NULL, ncol = 2, rel_widths = c(1, 1.61))
# Display the final combined plot
print(final_plot)