Documentation

Reproducing 1st Table (Study 1 Table 1)

Original Image

The image below shows the original table from the PDF version of the paper; this is the first table we attempted to recreate. Note that the original table also contains rows reporting correlations between the variables. Our focus is the last row of the table, which reports the descriptive statistics (means and standard deviations).

Preparation and Ideation (Loading initial packages and data frames)

When we first set out to recreate this table, our immediate idea was to use the summarise() function from dplyr (part of the tidyverse) to compute each variable's mean and standard deviation. We first wanted to check whether this approach reproduces the reported results, and then to build on it with stylisation.

Hence, we first loaded the necessary package (tidyverse) and then read in the corresponding data frame.

library(tidyverse) 
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
study1 <- read_csv(file = "Study 1 data.csv")
## Rows: 467 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): Gender, Age
## dbl (13): Participant ID, LETHAVERAGE.T1, LETHAVERAGE.T2, LethDiff, SCAVERAG...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
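As the message above suggests, readr's column-type report can be silenced with `show_col_types = FALSE`. The specification also shows that one column, `Participant ID`, contains a space, so it has to be wrapped in backticks whenever it is referenced in dplyr code. A minimal sketch, assuming a tiny inline CSV with made-up values in place of "Study 1 data.csv":

```r
library(readr)
library(dplyr)

# Toy inline CSV standing in for "Study 1 data.csv" (values are made up);
# I() tells readr to treat the string as literal data rather than a file path
toy <- read_csv(I("Participant ID,Gender\n1,Female\n2,Male"),
                show_col_types = FALSE)

# Column names that contain spaces must be quoted with backticks
ids <- toy %>% pull(`Participant ID`)
ids
```

With the real file, the same argument would simply be added to the existing call: `read_csv(file = "Study 1 data.csv", show_col_types = FALSE)`.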

Producing the Mean and SD

1st Attempt (Draft)

The following code demonstrates the use of the summarise() function to produce the corresponding values. I want to highlight that the original CSV file comes with no documentation of what each variable means. Although we, as a group, successfully guessed each variable's meaning from the study (e.g., SC stands for social connectedness), this created entirely avoidable obstacles in our replication process. The same issue arose again when we replicated Study 2, where the authors also used inconsistent variable names; I will elaborate on this later in my discussion of reproducing Figure 3.

# Using mean() and sd() function in summarise() to obtain the values
study1meansdraw <- study1 %>% 
  summarise(
    T1Lethargy = mean(LETHAVERAGE.T1), # An example of using mean()
    T2Lethargy = mean(LETHAVERAGE.T2),
    LethargyDiff = mean(LethDiff),
    T1SocialConnectedness = mean(SCAVERAGE.T1),
    T2SocialConnectedness = mean(SCAVERAGE.T2),
    ConnectednessDiff = mean(SCdiff),
    Extraversion = mean(EXTRAVERSION),
    T1Lethargy_sd = sd(LETHAVERAGE.T1), # An example of using sd()
    T2Lethargy_sd = sd(LETHAVERAGE.T2),
    LethargyDiff_sd = sd(LethDiff),
    T1SocialConnectedness_sd = sd(SCAVERAGE.T1),
    T2SocialConnectedness_sd = sd(SCAVERAGE.T2),
    ConnectednessDiff_sd = sd(SCdiff),
    Extraversion_sd = sd(EXTRAVERSION)
)
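Because the CSV file has no codebook, a guessed meaning can at least be checked against the data. The reported means are consistent with the difference scores being Time 2 minus Time 1 (3.16 - 2.60 = 0.56 for lethargy and 3.97 - 4.11 = -0.14 for social connectedness). A minimal sketch of that check, on a toy tibble with invented values that reuses the original column names; "Diff = T2 - T1" is the assumption being tested, not something the authors documented:

```r
library(dplyr)

# Toy data with the original column names; the values are invented
toy <- tibble(
  LETHAVERAGE.T1 = c(2.0, 3.5, 1.5),
  LETHAVERAGE.T2 = c(3.0, 3.0, 2.5),
  LethDiff       = c(1.0, -0.5, 1.0)
)

# TRUE if every difference score equals T2 minus T1 (within floating-point tolerance)
diff_is_t2_minus_t1 <- isTRUE(all.equal(
  toy$LethDiff,
  toy$LETHAVERAGE.T2 - toy$LETHAVERAGE.T1
))
diff_is_t2_minus_t1
```

Replacing `toy` with `study1` (and repeating the check for SCdiff, SCAVERAGE.T1 and SCAVERAGE.T2) would test the guess on the actual data.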

By checking the “study1meansdraw” object in the environment, we were pleasantly surprised to find that the values match the reported data. However, the number of decimal places is inconsistent with the original table, and we wanted to match it using some kind of rounding function. Additionally, we noticed that both the code and the output table are long and repetitive, so our group brainstormed whether there was a more concise way to write this.
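One more concise option, which is not the approach used in this replication, is dplyr's across(): it applies several summary functions to a set of columns in a single summarise() call, naming each output column `{column}_{function}` by default. A minimal sketch on a toy tibble with invented values and two of the original column names:

```r
library(dplyr)

# Toy data reusing two original column names; the values are invented
toy <- tibble(
  LETHAVERAGE.T1 = c(1, 2, 3),
  EXTRAVERSION   = c(2, 4, 6)
)

# One call computes the mean and SD of every listed column
toy_summary <- toy %>%
  summarise(across(c(LETHAVERAGE.T1, EXTRAVERSION),
                   list(mean = mean, sd = sd)))

toy_summary
# Columns: LETHAVERAGE.T1_mean, LETHAVERAGE.T1_sd, EXTRAVERSION_mean, EXTRAVERSION_sd
```

For the real data, the column list would be extended to LETHAVERAGE.T2, LethDiff, SCAVERAGE.T1, SCAVERAGE.T2 and SCdiff, replacing the fourteen hand-written lines with one call.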

2nd Attempt (Rounding)

To address the first issue mentioned previously, we wanted every result to show two decimal places. Wrapping each mean() or sd() call in round() and passing the desired number of decimal places as the second argument (e.g., round(mean(x), 2)) conveniently rounds the values to the desired format.

# Using round() function to keep the values as two decimal places
study1meansdround <- study1 %>% 
  summarise(
    T1Lethargy = round(mean(LETHAVERAGE.T1),2),
    T2Lethargy = round(mean(LETHAVERAGE.T2),2),
    LethargyDiff = round(mean(LethDiff),2),
    T1SocialConnectedness = round(mean(SCAVERAGE.T1),2),
    T2SocialConnectedness = round(mean(SCAVERAGE.T2),2),
    ConnectednessDiff = round(mean(SCdiff),2),
    Extraversion = round(mean(EXTRAVERSION),2),
    T1Lethargy_sd = round(sd(LETHAVERAGE.T1),2),
    T2Lethargy_sd = round(sd(LETHAVERAGE.T2),2),
    LethargyDiff_sd = round(sd(LethDiff),2),
    T1SocialConnectedness_sd = round(sd(SCAVERAGE.T1),2),
    T2SocialConnectedness_sd = round(sd(SCAVERAGE.T2),2),
    ConnectednessDiff_sd = round(sd(SCdiff),2),
    Extraversion_sd = round(sd(EXTRAVERSION),2)
)

However, upon checking the variable ‘study1meansdround’, the first column still appears as 2.6 rather than 2.60. This is because round() changes the stored value, not how it is displayed: as a number, 2.60 is identical to 2.6, and R prints numbers without trailing zeros. The issue therefore persists with any method that keeps the values numeric.
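A minimal base R sketch of the behaviour described above, using an arbitrary example number: the rounded value is still a number and prints without its trailing zero, while converting it to text with sprintf() or format() keeps two decimal places.

```r
x <- round(2.6049, 2)   # the stored numeric value is 2.6; no trailing zero is kept
print(x)                # prints 2.6

# Converting the number to text preserves the second decimal place
txt_sprintf <- sprintf("%.2f", x)      # "2.60"
txt_format  <- format(x, nsmall = 2)   # "2.60"
txt_sprintf
txt_format
```

The trade-off is that both results are character strings, so they are suitable for display but no longer for further arithmetic.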

3rd Attempt (Formatting + Rounding with sprintf())

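Hence, rather than rounding the numeric values, we converted them into formatted text. Specifically, we used the sprintf() function with the format string "%.2f", which turns a number into a character string with exactly two digits after the decimal point, keeping any trailing zeros. (This fixes the number of decimal places, not the number of significant figures.)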

To address the second issue, that the code and the output were not concise or readable, we used the paste() function to combine each mean and SD into a single string (separated by a space, paste()'s default), so that each variable occupies one column instead of two. We also gave each output column a quoted name (e.g., "T1 Lethargy") so that the column headers match those in the original study.

study1meansd <- study1 %>%  
  summarise(
    "T1 Lethargy" = paste(
      sprintf("%.2f", mean(LETHAVERAGE.T1)),
      sprintf("%.2f", sd(LETHAVERAGE.T1))),
    "T2 Lethargy" = paste(
      sprintf("%.2f", mean(LETHAVERAGE.T2)),
      sprintf("%.2f", sd(LETHAVERAGE.T2))),
    "Lethargy Diff" = paste(
      sprintf("%.2f", mean(LethDiff)),
      sprintf("%.2f", sd(LethDiff))),
    "T1 Social Connectedness" = paste(
      sprintf("%.2f", mean(SCAVERAGE.T1)),
      sprintf("%.2f", sd(SCAVERAGE.T1))),
    "T2 Social Connectedness" = paste(
      sprintf("%.2f", mean(SCAVERAGE.T2)),
      sprintf("%.2f", sd(SCAVERAGE.T2))),
    "Connectedness Diff" = paste(
      sprintf("%.2f", mean(SCdiff)),
      sprintf("%.2f", sd(SCdiff))),
    "Extraversion" = paste(
      sprintf("%.2f", mean(EXTRAVERSION)),
      sprintf("%.2f", sd(EXTRAVERSION)))
  )
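The seven paste() calls above differ only in the column they summarise, so the repetition could be factored into a small helper applied with across(). A minimal sketch, assuming a hypothetical helper called `mean_sd_text` and a toy tibble with invented values; with a single function, across() keeps the original column names, so rename() is still needed for the display names:

```r
library(dplyr)

# Hypothetical helper: "mean sd" as text, both to two decimal places
mean_sd_text <- function(x) {
  paste(sprintf("%.2f", mean(x)), sprintf("%.2f", sd(x)))
}

# Toy data reusing two original column names; the values are invented
toy <- tibble(
  LETHAVERAGE.T1 = c(1, 2, 3),
  EXTRAVERSION   = c(2, 4, 6)
)

toy_text <- toy %>%
  summarise(across(c(LETHAVERAGE.T1, EXTRAVERSION), mean_sd_text)) %>%
  rename("T1 Lethargy" = LETHAVERAGE.T1, "Extraversion" = EXTRAVERSION)

toy_text
```

This keeps the formatting rule in one place, so changing the number of decimal places means editing a single line.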

Printing and Stylisation of the Table

1st Attempt (Draft)

The initial approach is very simple: printing the values directly with print(). Because the values were already formatted in the previous section, this produces a decent table that somewhat resembles the original study.

study1tableprint <- print(study1meansd) 
## # A tibble: 1 × 7
##   `T1 Lethargy` `T2 Lethargy` `Lethargy Diff` `T1 Social Connectedness`
##   <chr>         <chr>         <chr>           <chr>                    
## 1 2.60 1.16     3.16 1.27     0.56 1.33       4.11 0.88                
## # ℹ 3 more variables: `T2 Social Connectedness` <chr>,
## #   `Connectedness Diff` <chr>, Extraversion <chr>

However, this comes with two critical issues. Firstly, the table lacks a row name, and a tibble cannot be given one with the generic rownames() function (tibbles do not support row names). Secondly, there are no brackets around the SDs.
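Since tibbles do not support row names, one alternative, which is not the route taken in this replication, is to store the row label as an ordinary first column. A minimal sketch using dplyr's mutate() with `.before = 1`, on a two-column tibble holding values taken from the printed output above; the column name `Statistic` is an illustrative choice:

```r
library(dplyr)

# Two of the formatted columns from the output above
toy_text <- tibble(`T1 Lethargy` = "2.60 1.16", Extraversion = "4.17 1.01")

# Add the row label as a regular column placed first
labelled <- toy_text %>%
  mutate(Statistic = "Mean (SD)", .before = 1)

labelled
```

The label then survives any later conversion or export, at the cost of appearing as a data column rather than as a true row name.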

2nd Attempt (Using Datatable)

This attempt used datatable(), which ChatGPT suggested when I encountered this issue. I installed the corresponding package, DT, and ran datatable() to see the result. Unfortunately, although this approach cleans up the format (unlike print(), which also shows the tibble dimensions and column types), it produces an interactive table with sorting controls, which is not desirable for our replication. Furthermore, this “solution” fixed neither of the issues stated previously.

library(DT) # Load in DT package for datatable() function
datatable(study1meansd) 

3rd Attempt (Using kable in knitr Package)

The other solution that we attempted is the kable() function from the knitr package. With very simple code (shown below), the row name is assigned correctly using the rownames() function once the tibble has been converted to a data frame.

# Using kable to create the table
library(knitr) # Loading the package for kable() function

table1 <- as.data.frame(study1meansd) # Convert the tibble from the previous section to a data frame (data frames support row names)
rownames(table1) <- "Mean (SD)"  # Assigning the correct row name
  
kable(table1)
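|          |T1 Lethargy |T2 Lethargy |Lethargy Diff |T1 Social Connectedness |T2 Social Connectedness |Connectedness Diff |Extraversion |
|:---------|:-----------|:-----------|:-------------|:-----------------------|:-----------------------|:------------------|:------------|
|Mean (SD) |2.60 1.16   |3.16 1.27   |0.56 1.33     |4.11 0.88               |3.97 0.85               |-0.14 0.71         |4.17 1.01    |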

Notably, the SDs are still missing their brackets, so the bracket characters have to be added manually as strings to format the table properly.

4th/Final Attempt (Formatting the brackets)

# Same as the 3rd attempt, but with "(" and ")" pasted around each SD
study1meansd2 <- study1 %>%  
  summarise(
    "T1 Lethargy" = paste(sprintf("%.2f", mean(LETHAVERAGE.T1)), "(", sprintf("%.2f", sd(LETHAVERAGE.T1)), ")"),
    "T2 Lethargy" = paste(sprintf("%.2f", mean(LETHAVERAGE.T2)), "(", sprintf("%.2f", sd(LETHAVERAGE.T2)), ")"),
    "Lethargy Diff" = paste(sprintf("%.2f", mean(LethDiff)), "(", sprintf("%.2f", sd(LethDiff)), ")"),
    "T1 Social Connectedness" = paste(sprintf("%.2f", mean(SCAVERAGE.T1)), "(", sprintf("%.2f", sd(SCAVERAGE.T1)), ")"),
    "T2 Social Connectedness" = paste(sprintf("%.2f", mean(SCAVERAGE.T2)), "(", sprintf("%.2f", sd(SCAVERAGE.T2)), ")"),
    "Connectedness Diff" = paste(sprintf("%.2f", mean(SCdiff)), "(", sprintf("%.2f", sd(SCdiff)), ")"),
    "Extraversion" = paste(sprintf("%.2f", mean(EXTRAVERSION)), "(", sprintf("%.2f", sd(EXTRAVERSION)), ")")
  )

table2 <- as.data.frame(study1meansd2)
rownames(table2) <- "Mean (SD)" 
  
kable(table2)
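|          |T1 Lethargy   |T2 Lethargy   |Lethargy Diff |T1 Social Connectedness |T2 Social Connectedness |Connectedness Diff |Extraversion  |
|:---------|:-------------|:-------------|:-------------|:-----------------------|:-----------------------|:------------------|:-------------|
|Mean (SD) |2.60 ( 1.16 ) |3.16 ( 1.27 ) |0.56 ( 1.33 ) |4.11 ( 0.88 )           |3.97 ( 0.85 )           |-0.14 ( 0.71 )     |4.17 ( 1.01 ) |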
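The final table has a space inside each bracket ("2.60 ( 1.16 )") because paste() separates its arguments with `sep = " "` by default. If the tighter "2.60 (1.16)" style is wanted, a single sprintf() format string produces it directly. A minimal sketch, assuming a hypothetical helper named `mean_sd_brackets` and toy input values:

```r
# Hypothetical helper: "mean (sd)" with no spaces inside the brackets
mean_sd_brackets <- function(x) {
  sprintf("%.2f (%.2f)", mean(x), sd(x))
}

tight <- mean_sd_brackets(c(1, 2, 3))
tight   # "2.00 (1.00)"

# For comparison, paste() with its default separator reproduces the spacing above
spaced <- paste(sprintf("%.2f", 2), "(", sprintf("%.2f", 1), ")")
spaced  # "2.00 ( 1.00 )"
```

Inside the existing summarise() call, each entry could then become, for example, "T1 Lethargy" = mean_sd_brackets(LETHAVERAGE.T1); paste0(), which uses no separator, would work equally well.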