Part 1: Summary and Reaction

Kasumovic, M. M., Hatcher, E., Blake, K. R., & Denson, T. F. (2021). Performance in video games affects self-perceived mate value and mate preferences. Evolutionary Behavioral Sciences, 15(2), 191–207. https://doi.org/10.1037/ebs0000231

Summary

Even in a rapidly changing world, mate value, or a person’s desirability as a romantic partner, is often assumed to be fixed. For example, mate value is associated with attractiveness in females and social status in males. This study investigated whether self-perceived mate value could also be influenced by individual social experiences, such as a person’s self-perceived performance in competitions. Given the surge in digital gaming, the authors used video games to manipulate self-perceived performance and measured its effect on self-perceived mate value. Because people tend to select partners based on similarity, the second main aim of the study was to determine whether self-perceived performance in video games can shift facial preferences for short- and long-term mates.

Across three studies, the experimenters hypothesised that participants’ self-perceived performance after playing a video game would affect their perception of their own desirability as a mate in males but not females (Experiment 1). They further predicted that objective performance, based on actual game scores and rank, would be a stronger moderator of self-perceived mate value than self-perceived performance (Experiment 2). Lastly, self-perceived performance was expected to influence facial preferences for short- and long-term romantic partners, with differences between the sexes (Experiment 3).

In Experiment 1, participants provided demographic information on their relationship status and exposure to violent video games in a survey, then played a video game and rated their own performance on a scale of 1 to 7. They were also measured on three mating-related variables: self-perceived mate value, sex goal activation (SGA) and sociosexual tendencies (SOI; the tendency to engage in sexual relationships without emotional commitment). Experiment 2 followed the same procedure, except that participants were also given a game ranking to represent their objective performance. The effect of objective performance on self-perceived mate value was compared to the influence of self-perceived performance to determine the stronger moderator. Experiment 3 followed the same sequence as Experiment 1, but included a facial preference test after the gaming manipulation.

Across the three studies, previous exposure to violent video games increased sex goal activation, and sociosexual behaviour was related to age. Within experiments, the authors found that people who rated their video game performance more highly viewed themselves as more desirable as a mate. Self-perceived performance was also shown to influence mate value 3 times more than objective measures of performance, and it affected people’s preferences for short-term but not long-term partners. Males who perceived themselves as higher performing preferred masculine faces in future partners while females preferred feminine faces. These findings suggest that mate value is malleable and varies with everyday interactions.

Reactions

I’m not sure I understood why the authors chose to add a violent vs non-violent video game manipulation in this study. When I first read the introduction of this paper, there wasn’t a strong rationale to support the addition of this variable, and certainly no mention of how it related to the real world in the discussion section. The authors even went on to attribute differences in self-perceived performance across violent and non-violent conditions to game difficulty, instead of the violent nature of the games. This made the idea of violent game exposure seem redundant, which could have been avoided if they had framed the manipulation as a comparison of different game difficulties on self-perceived performance, and used violent and non-violent games to control difficulty instead. I can see that violent video games could be related to topics such as self-esteem or interpersonal violence in romantic relationships, but I couldn’t see how this manipulation was relevant in this study. Similarly, the authors also collected demographics on relationship status, which weren’t mentioned in any analyses or conclusions. I think that these variables unnecessarily complicated the study without receiving adequate focus throughout the paper.

I wonder whether self-perceived performance would have such a large impact on self-perceived mate value (3 times stronger than objective performance) in cultures where objective performance is more valued when considering a romantic partner. In my upbringing as a Chinese-Australian, I have noticed a difference in how self-performance is viewed between Western and Asian cultures. In my household, individual performance is typically defined by social status and wealth, which would be considered objective measures of performance in the context of this paper. In cultures where subjective self-performance is hardly considered in self-evaluation, I wonder whether the authors would still find that self-perceived performance is a stronger moderator of self-perceived mate value than objective performance in people like my immigrant parents, who value their outward appearance and social hierarchy more than anything. I think it would be interesting to compare the results of studies in people of different cultural backgrounds or generations.

The take-home message that I am left with after reading this paper is that people are capable of adapting perceptions of their own mate value with moment-to-moment feedback from their changing environment. Though I am sceptical that a 5-minute exposure to digital competition in video games is able to shift mate preferences, I can see how this study is relevant to the modern world, where the internet and social media provide opportunities for people to make rapid comparisons of themselves to others. I definitely agree that this study requires replication, but I think that it may be interesting to investigate how this flexibility in self-perceived mate value would translate to mating-related behaviour, for example sociosexual behaviour or sexual arousal, which were also measured in the paper.

Part 2: Verification

From this paper, we had to reproduce Table 1, which contains demographic statistics, and Figures 1-6, as listed below. No means or SDs were reported for any mating-related variables in this paper.

  • Table 1 (demographic statistics)

  • Figure 1

  • Figure 2

  • Figure 3

  • Figure 4

  • Figure 5

  • Figure 6

Load packages

library(tidyverse) #includes dplyr for data manipulation and ggplot to create figures
library(gt) # used to create tables
library(ggstance) # used to dodge points
library(gridExtra) # used to combine 2 plots

Reading the original data file

all_data <- read_csv("All_Experiments.csv")

Organising data

Relevant packages:

  • tidyverse

We started our reproducibility journey by organising the original data file received from Prof Tom Denson according to the exclusion criteria in the paper.

Using filter(), we removed 144 participants who failed the attention checks (flagged in the ‘Removedata’ column), 2 based on the ‘Sex’ variable and 13 based on the ‘Relationshipstatus’ variable. This left a total of 1616 participants across all 3 experiments.

We also decided to rearrange the data to show the studies in chronological order. To do this, we first used mutate() to modify the existing ‘study’ column, using factor() to convert the values to a categorical variable with levels in the order we wanted. This was then piped into arrange() to reorder the data so that Gaming_manipulation data came first, followed by Rank_manipulation and Face_choice.

clean_denson <- all_data %>%
  
  # Remove data according to exclusion criteria
  filter(Removedata != "Yes",   # remove all rows containing ‘Yes’ in 'Removedata' variable
         Sex != "Other",  # remove all rows containing ‘Other’ in 'Sex'
         Relationshipstatus != "Other") %>% #remove all rows containing ‘Other’ in 'Relationshipstatus' variable
  
  # change 'study' into a categorical variable with 3 levels and rearrange
  mutate(study = factor(study,
                        levels = c("Gaming_manipulation", "Rank_manipulation", "Face_choice" ))) %>% 
  arrange(study) # order the levels of study variable in chronological order
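
To illustrate the two ideas in this chunk, here is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not the study data). It shows that arrange() follows the factor level order rather than alphabetical order, and that a `!=` condition inside filter() also drops rows where the value is NA, which is worth keeping in mind when counting exclusions.

```r
library(dplyr)

# Hypothetical toy data, not the study data
toy <- tibble(
  id = 1:5,
  Removedata = c("No", "Yes", "No", NA, "No"),
  study = c("Face_choice", "Gaming_manipulation", "Rank_manipulation",
            "Gaming_manipulation", "Gaming_manipulation")
)

toy_clean <- toy %>%
  # keeps only rows where the condition is TRUE, so "Yes" and NA are both dropped
  filter(Removedata != "Yes") %>%
  # set the level order explicitly so arrange() sorts chronologically
  mutate(study = factor(study,
                        levels = c("Gaming_manipulation",
                                   "Rank_manipulation",
                                   "Face_choice"))) %>%
  arrange(study)

toy_clean
```

Without the factor() step, arrange() would sort the character values alphabetically and put Face_choice first.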

Table 1

Relevant packages:

  • tidyverse
  • gt

In this chunk, we calculated the mean and standard deviation for age, and computed the demographic data on relationship status for males and females in each experiment. To start, the group_by() function was used to create subsets of data by experiment. We then used reframe() to calculate descriptive statistics for Age with mean(), min(), max() and sd(). sum() was used to count the number of males and females and the number of people in each relationship status category. We initially used summarise() instead of reframe(), but RStudio suggested that we use reframe() instead. Upon investigation, we learned that reframe() is intended for summaries that can return any number of rows per group, whereas summarise() expects exactly one row per group; because each statistic in the code below is a single value per experiment, either function gives the same table here. To finish off, we used mutate() to round mean_age and sd to 2 decimal places, as shown in the paper.

descriptives <- clean_denson %>% 
  
  # create subsets by study
  group_by(study) %>% 
  
  # calculating demographics descriptives statistics
  reframe(mean_age = mean(Age), # calculate mean age
          min_age = min(Age), # find minimum age
          max_age = max(Age), # find maximum age
          sd = sd(Age), # calculate the standard deviation for age
          sample_size = n(), # sample size for each experiment using n()
         
          # create a variable which count males and females 
          male = sum(Sex == "Male"), 
          female = sum(Sex == "Female"),
          
          # create a variable which counts the number of people in each relationshipstatus category
          longterm = sum(Relationshipstatus == "In a long-term monogamous relationship eg. married, partnered"), 
          single = sum(Relationshipstatus == "Single"),  
          casualopen = sum(Relationshipstatus == "In an open relationship/casually dating"), 
          recentlysingle = sum(Relationshipstatus == "Recently single/divorced/separated")
          ) %>% 
  
  mutate(across(c(mean_age, sd), ~ round(.x, 2))) %>% # round mean_age and sd to 2 d.p.
  ungroup()

descriptives
## # A tibble: 3 × 12
##   study  mean_age min_age max_age    sd sample_size  male female longterm single
##   <fct>     <dbl>   <dbl>   <dbl> <dbl>       <int> <int>  <int>    <int>  <int>
## 1 Gamin…     31.8      18      68  9.76         517   269    248      293    182
## 2 Rank_…     35.3      18      74 10.4          678   368    310      369    263
## 3 Face_…     33.8      18      71  9.42         421   245    176      258    127
## # ℹ 2 more variables: casualopen <int>, recentlysingle <int>

We found that our standard deviation for the Face Choice experiment (Experiment 3) did not match the value in the published manuscript. Our data produced sd=9.42 for the Face Choice experiment, while the original paper reported sd=10.41. We suspect a typo in the original manuscript, because the standard deviation reported for the Face Choice experiment was identical to that of the Rank Manipulation experiment (Experiment 2), which is very unlikely given that a different sample of participants was used for each experiment.
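
As a side note on the summarise() versus reframe() choice in the summary step, the following is a minimal sketch on made-up ages (the `toy_ages` data frame is hypothetical, and reframe() assumes dplyr 1.1.0 or later). It shows the difference in what each function is designed to return.

```r
library(dplyr)

# Hypothetical toy data, not the study data
toy_ages <- tibble(
  study = c("A", "A", "A", "B", "B"),
  Age   = c(20, 30, 40, 25, 35)
)

# summarise(): each expression returns one value, so one row per group
one_row_per_group <- toy_ages %>%
  group_by(study) %>%
  summarise(mean_age = mean(Age), sample_size = n())

# reframe(): an expression may return several values per group;
# range() returns two (min and max), so each group contributes two rows
many_rows_per_group <- toy_ages %>%
  group_by(study) %>%
  reframe(age_limit = range(Age))

one_row_per_group
many_rows_per_group
```

When every statistic is a single number per group, as with a mean, sd or count, both functions produce one row per group.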

Displaying demographic data in an APA table

To display all of the demographic data in table format, we first needed to change our descriptives data frame from wide format to long format so that the different relationship status categories appear as column values and not variable names. To do this, we created a new object called descriptives_long, using pivot_longer() to pivot the male and female columns into a ‘Sex’ column with the corresponding count in a ‘Sex_Count’ column. We repeated this a second time to move the 4 relationship status categories into new ‘Relationship_Status’ and ‘Relationship_Count’ columns.

descriptives_long <- descriptives %>% 
  
  # changing descriptives dataframe to long form
  pivot_longer(cols = c("female", "male"), 
               names_to = "Sex", # move variable names into new column ‘Sex’
               values_to = "Sex_Count") %>%  # move values into new column ‘Sex_Count’
  
  pivot_longer(cols = c("longterm", "single", "casualopen", "recentlysingle"), 
               names_to = "Relationship_Status", # move variable names into ‘Relationship_Status’
               values_to ="Relationship_Count") # move values into ‘Relationship_Count’

descriptives_long
## # A tibble: 24 × 10
##    study              mean_age min_age max_age    sd sample_size Sex   Sex_Count
##    <fct>                 <dbl>   <dbl>   <dbl> <dbl>       <int> <chr>     <int>
##  1 Gaming_manipulati…     31.8      18      68  9.76         517 fema…       248
##  2 Gaming_manipulati…     31.8      18      68  9.76         517 fema…       248
##  3 Gaming_manipulati…     31.8      18      68  9.76         517 fema…       248
##  4 Gaming_manipulati…     31.8      18      68  9.76         517 fema…       248
##  5 Gaming_manipulati…     31.8      18      68  9.76         517 male        269
##  6 Gaming_manipulati…     31.8      18      68  9.76         517 male        269
##  7 Gaming_manipulati…     31.8      18      68  9.76         517 male        269
##  8 Gaming_manipulati…     31.8      18      68  9.76         517 male        269
##  9 Rank_manipulation      35.3      18      74 10.4          678 fema…       310
## 10 Rank_manipulation      35.3      18      74 10.4          678 fema…       310
## # ℹ 14 more rows
## # ℹ 2 more variables: Relationship_Status <chr>, Relationship_Count <int>
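
The row count in this output follows from pivoting twice: every sex row is paired with every relationship status row, so 3 studies x 2 sexes x 4 statuses gives 24 rows. A minimal sketch of the same pattern on made-up counts (the `toy_wide` data frame is hypothetical and uses only two status columns):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy counts, not the study data
toy_wide <- tibble(
  study    = c("A", "B"),
  female   = c(10L, 20L),
  male     = c(12L, 18L),
  single   = c(15L, 30L),
  longterm = c(7L, 8L)
)

toy_long <- toy_wide %>%
  pivot_longer(cols = c(female, male),
               names_to = "Sex", values_to = "Sex_Count") %>%
  pivot_longer(cols = c(single, longterm),
               names_to = "Relationship_Status",
               values_to = "Relationship_Count")

nrow(toy_long)               # 2 studies x 2 sexes x 2 statuses
count(toy_long, study, Sex)  # each sex count is repeated once per status
```

This repetition is why the sex counts appear duplicated in the long data frame and need to be collapsed again before building the table.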

Next, we wanted to create new columns that displayed the descriptive statistics as single strings of text. For age, this meant combining the mean age for each experiment with its corresponding standard deviation in the format “mean_age±sd”. To do this, we used mutate() to create a new column labelled “Mean Age”, with paste() combining the pair of values from the ‘mean_age’ and ‘sd’ variables into a single string separated by ‘±’ using the argument sep = “±”. The same process was carried out to create the following columns.

  • ‘Age range’ with the format ‘min_age-max_age’ e.g., ‘18-64’
  • ‘Sex Count’ with the format ‘sex: count’ e.g., ‘female: 248’
  • ‘Relationship Status’ with the format ‘relationship status: count’ e.g., ‘casualopen: 26’

string_descriptives <- descriptives_long %>%

  # create new variables by stringing values together
  # (columns are referred to by name inside mutate())
  mutate(
    # combine mean_age and sd, separated by a plus-minus sign
    "Mean Age" = paste(mean_age, sd, sep = "±"),

    # combine min_age and max_age, separated by a hyphen
    "Age range" = paste(min_age, max_age, sep = "-"),

    # combine Sex and Sex_Count, separated by a colon
    "Sex Count" = paste(Sex, Sex_Count, sep = ": "),

    # combine Relationship_Status and Relationship_Count, separated by a colon
    "Relationship Status" = paste(Relationship_Status, Relationship_Count, sep = ": "))

string_descriptives 
## # A tibble: 24 × 14
##    study              mean_age min_age max_age    sd sample_size Sex   Sex_Count
##    <fct>                 <dbl>   <dbl>   <dbl> <dbl>       <int> <chr>     <int>
##  1 Gaming_manipulati…     31.8      18      68  9.76         517 fema…       248
##  2 Gaming_manipulati…     31.8      18      68  9.76         517 fema…       248
##  3 Gaming_manipulati…     31.8      18      68  9.76         517 fema…       248
##  4 Gaming_manipulati…     31.8      18      68  9.76         517 fema…       248
##  5 Gaming_manipulati…     31.8      18      68  9.76         517 male        269
##  6 Gaming_manipulati…     31.8      18      68  9.76         517 male        269
##  7 Gaming_manipulati…     31.8      18      68  9.76         517 male        269
##  8 Gaming_manipulati…     31.8      18      68  9.76         517 male        269
##  9 Rank_manipulation      35.3      18      74 10.4          678 fema…       310
## 10 Rank_manipulation      35.3      18      74 10.4          678 fema…       310
## # ℹ 14 more rows
## # ℹ 6 more variables: Relationship_Status <chr>, Relationship_Count <int>,
## #   `Mean Age` <chr>, `Age range` <chr>, `Sex Count` <chr>,
## #   `Relationship Status` <chr>

At this point, there was a lot of repetition of the study name, sample size and age range, so our next step was to remove the repeated information and combine the relationship status counts for each study into a single cell, as displayed in the paper.

The group_by() function was first used here to group rows with the same study, sample size, Mean Age, Age range and Relationship Status together. This was piped into summarise(), which used paste() with the collapse argument to combine all values in the ‘Sex Count’ column into a single cell for each of those groups. This produced a column named ‘Sex’ which displays both the male and female counts in a single cell and removed the duplication.

This was then piped into the same coding structure, which grouped rows with the same study, sample size, Mean Age, Age range and Sex values together. Again, summarise() was used with paste() to collapse the ‘Relationship Status’ values into a single cell for each of those groups.

descriptives_reformatted <- string_descriptives %>% 
  
  # group rows where study, sample size, mean age, age range and relationship status are the same
  group_by(study, sample_size, `Mean Age`, `Age range`, `Relationship Status`) %>% 
  
  summarise(
    "Sex" = paste(`Sex Count`, collapse = "\n"),  # collapse 'Sex Count' values into one string, separated by a newline
    .groups = "drop" # remove grouping
    ) %>%
  
  # group rows where study, sample size, mean age, age range and sex are the same
  group_by(study, sample_size, `Mean Age`, `Age range`, Sex) %>% 
  summarise(
    
    # collapse 'Relationship Status' values into one string, separated by a newline
    "Relationship Status" = paste(`Relationship Status`, collapse = "\n"),
    .groups = "drop" # remove grouping
    )  %>% 
  
  # change column names to reflect the original paper
  rename("Experiment" = "study",
         "Sample size" = "sample_size") 
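
The collapsing step can be checked in isolation. Below is a minimal sketch on made-up labels (the `toy_labels` data frame is hypothetical) showing that paste() with collapse = "\n" does store a newline inside each combined string. Whether a table renderer then displays that newline as a visible line break is a separate question that this sketch does not address.

```r
library(dplyr)

# Hypothetical toy labels, not the study data
toy_labels <- tibble(
  study = c("A", "A", "B", "B"),
  label = c("female: 10", "male: 12", "female: 20", "male: 18")
)

collapsed <- toy_labels %>%
  group_by(study) %>%
  # one string per study, with the labels joined by a newline character
  summarise(Sex = paste(label, collapse = "\n"), .groups = "drop")

collapsed$Sex          # the "\n" is visible as an escape in the printed string
cat(collapsed$Sex[1])  # cat() interprets it as an actual line break
```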

In our initial code, we used collapse = “\n” as an argument in the paste() function so that each value would start on a new line, like a list. While the code ran and no errors came up, the argument had no visible effect in the rendered table. We tried using Google and ChatGPT to detect any error in our code, but neither gave us useful feedback to resolve the problem. We had to go ahead and create Table 1 with the descriptives_reformatted data frame as it was.

Next, we used the gt() function to set up a table using values from the descriptives_reformatted data frame. The tab_style() and tab_options() functions were used to change aesthetic features, replicating the APA format shown in the original manuscript. tab_style() applies a style to targeted locations, here the body cells selected with cells_body(). We passed cell_borders() as its style argument to set the top and bottom borders, their colour and their width. In the original manuscript, there are no side borders or cell borders, so we made the body cell borders white first. The tab_options() function, which sets table-wide options, was then used to change the colour of specific borders to black. To finish, we used cols_align() to centre the text.

#create Table 1 in APA format

descriptives_table <- descriptives_reformatted %>% 
  # add table format
  gt() %>% 
  
  #style the borders of the body cells
  tab_style(
    style = cell_borders(sides = c("top", "bottom"), # add borders to top and bottom sides
                         color = "white", # change all cell borders to white
                         weight = px(1)), # change border width to 1 pixel
    locations = cells_body()) %>%  
  
  #change border features
  tab_options(column_labels.border.top.color = "black", #change top column border to black
              column_labels.border.bottom.color = "black", # change column bottom border to black
              table_body.border.bottom.color = "black") %>%  # change bottom border to black 
 
  # align text in column to center 
   cols_align(align="center") 

descriptives_table
Experiment | Sample size | Mean Age | Age range | Sex | Relationship Status
Gaming_manipulation | 517 | 31.84±9.76 | 18-68 | female: 248, male: 269 | casualopen: 26, longterm: 293, recentlysingle: 16, single: 182
Rank_manipulation | 678 | 35.28±10.41 | 18-74 | female: 310, male: 368 | casualopen: 25, longterm: 369, recentlysingle: 21, single: 263
Face_choice | 421 | 33.78±9.42 | 18-71 | female: 176, male: 245 | casualopen: 23, longterm: 258, recentlysingle: 13, single: 127

Figure 1

The aim of this section was to create a scatterplot of self-perceived performance against age, as depicted on the y- and x-axes of Figure 1. We quickly noticed, however, that this did not match the figure description, which describes Figure 1 as “the relationships between the self perception of participants’ performance and their actual performance using normalized (z-scores)…”. We ended up creating two plots for Figure 1, one to represent the figure as illustrated and another for the figure description.

Figure 1 (self-perceived performance vs. age)

Here, we created a plot which shows age on the x-axis and self-perceived performance on the y-axis.

We started by creating a general framework for our plot by using ggplot() with the aes() function to add a plane to our plot, with ‘Age’ on the x-axis and ‘Selfscore’, representing self-perceived performance, on the y-axis. Then, geom_point() and geom_smooth() were used to create the scatterplot layer and the linear trendline using method = "lm". Since Figure 1 contained 3 subplots, we also used facet_wrap() to split the data points into subplots according to the appropriate study. The coord_cartesian() and scale_y_discrete() functions were required to set the axis limits and specific axis labels of the plot. From here, we were ready to code the aesthetic details.

First, we separated male and female data points by colour and shape using the colour= and shape= arguments, and created corresponding trendlines using linetype= inside aes(). The position_dodge() function was then added to geom_point() to offset the male and female points from one another so that they overlap less. To change the labels of our subplots, as_labeller() was used inside facet_wrap(), as well as scales = "free_y" so that the y-axis repeated for each subplot. We also renamed the y-axis from “Selfscore” to “Self-perceived performance” according to the original graph. Other functions such as scale_colour_hue(), theme_classic() and theme() were included to customise the colour shade of the data points, the overall theme appearance, and the font and axis features of our plots.

In this section, we struggled to make the ‘Age’ label appear for each subplot. We tried coding with packages like lemon and ggforce, but none of these were successful in repeating the label. I later confirmed that the authors created a combined plot from 3 separate subplots in their code.

Figure_1 <- clean_denson %>%
 
  # Create a scatter plot with lines of best fit, grouped and colored by 'Sex'.
   ggplot(mapping =  
            # use different shapes and line types for each 'Sex'.
           aes(x = Age, 
               y = Selfscore, 
               colour = Sex, 
               shape = Sex, 
               linetype = Sex)) +
  
  # Add the points to the plot with dodge positioning for better visibility.
  geom_point(position = position_dodge(width = 0.5)) +
  
  # Add a line of best fit to each group and fill the area between the confidence intervals in grey.
  geom_smooth(method = "lm", se = TRUE, fill = "grey") +
  
  # Create multiple subplots (facets) based on the 'study' variable.
  facet_wrap(~ study, scales = "free_y",
             labeller = as_labeller(
               # Rename the facet labels for better presentation.
               c('Gaming_manipulation' = "Experiment 1",
                 'Rank_manipulation' = "Experiment 2",
                 'Face_choice' = "Experiment 3"))) +
  
  # Set the limits of the x and y axes to control the visible range.
  coord_cartesian(xlim = c(20, 70), ylim = c(1, 7)) +
  
  # Set the y-axis labels to a discrete scale with specific limits.
  scale_y_discrete(limits = c("1", "2", "3", "4", "5", "6", "7")) +
  
  # Label the y-axis with a custom text.
  labs(y = "Self-perceived performance") +
  
  # Set the luminance (lightness) of the colour scale to 55.
  scale_colour_hue(l = 55) +
    
  # Use the classic theme for the plot.
  theme_classic() +
  
  
  # Customize the appearance of plots.
  theme(strip.background = element_blank(), # remove background
        strip.text = element_text(size = rel(0.8), face = "bold"), # change font size and make text bold
        
        # Customize the appearance of axis titles, axis lines, axis text, and axis ticks.
        axis.title = element_text(face = "bold"), # axis title bold
        axis.line = element_line(color = "grey50"), # axis line colour grey
        axis.text = element_text(color = "grey50"), #axis text grey
        axis.ticks = element_line(color = "grey50")) #axis ticks grey

print(Figure_1)

Upon comparison of our plot to Figure 1 from the original paper, we noticed that their plot was missing values for males who scored 7 in Experiment 1 and Experiment 2. We believe that the original authors may have made a mistake when setting the limits of the y-axis, causing those uppermost data points to be cut off from their graph.
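
For the repeated ‘Age’ label, one option is to build one plot per experiment and combine them, which is the general approach we understand the authors took. The following is only a minimal sketch of that idea on made-up data (the `toy_scores` data frame is hypothetical, and this is not the authors’ code). It uses arrangeGrob() from gridExtra, which we had already loaded for combining plots.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical toy data, not the study data
toy_scores <- data.frame(
  study = rep(c("Experiment 1", "Experiment 2", "Experiment 3"), each = 4),
  Age = rep(c(20, 30, 40, 50), times = 3),
  Selfscore = c(2, 3, 5, 6, 1, 4, 4, 7, 3, 3, 5, 6)
)

# Build one complete plot per experiment, so each keeps its own x-axis title
panels <- lapply(split(toy_scores, toy_scores$study), function(d) {
  ggplot(d, aes(x = Age, y = Selfscore)) +
    geom_point() +
    labs(title = unique(d$study),
         x = "Age",
         y = "Self-perceived performance") +
    theme_classic()
})

# Arrange the three plots side by side in a single layout object
combined <- arrangeGrob(grobs = unname(panels), ncol = 3)
```

Calling grid.arrange(grobs = unname(panels), ncol = 3) instead draws the combined figure directly. The trade-off compared with facet_wrap() is that shared elements such as the legend have to be handled for each panel separately.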

Figure 1 (self-perceived performance vs normalised actual performance)

In this next section, we created a new plot which corresponds to the Figure 1 description from the original manuscript. We graphed the relationship between self-perceived performance and normalised actual performance on video games. This required two attempts.

First attempt - failed

In our first attempt, we created an object called z_score to calculate the z scores for participants’ actual performance on each video game. z scores were required because the experimenters used 6 different games in their study, each with a different scoring system, so standardisation was necessary for performance to be compared across participants.

We started by using group_by() to group the data into subsets according to the game played during the experiment. The mutate() function was used to create a new variable called z_actualscore, which represented the normalised scores of actual performance using the z score formula. We took values from ‘Score’, which represented participants’ actual scores, subtracted the mean for that game, then divided the result by the standard deviation for that game. This was piped into filter() to keep only participants who scored within 3 standard deviations of the mean.

#creating a new z score variable for normalised actual scores
z_score <- clean_denson %>%
  
  #create subsets of data based on game played
  group_by(Gameplayed) %>% 
  
  #create new variable using z score formula
  mutate(z_actualscore = (Score - mean(Score)) / sd(Score)) %>%  
  
  # keep rows where z score is between -3 and 3
  filter(z_actualscore >= -3 & z_actualscore <= 3)

Using this ‘z_actualscore’ variable, we created a new plot using ggplot(), with normalised actual scores on the x-axis and self-perceived performance on the y-axis, and with colour and shape aesthetics to distinguish between males and females. Again, geom_point() and facet_wrap() were added to create the scatterplot layer and to create 3 subplots by experiment. The scale_x_continuous() function changed the x-axis limits and intervals. We chose to start the x-axis at -0.5 because no participants scored less than -0.5 standard deviations from the mean, and ended the x-axis at 3 to remove participants with absurdly high z scores (some participants had a z score of 10) and prevent the graph from being skewed. geom_smooth() was also used in our code with the argument method = lm to impose a linear trendline onto the scatterplot.

Figure_1_corrected <- z_score %>%
  ggplot(aes(x = z_actualscore, # normalised actual scores on x axis
             y = Selfscore, # self perceived performance on y axis
             colour = Sex, # add colour according to sex
             shape = Sex, # change shape according to sex
             linetype = Sex)) + # change trendline type according to sex
  
  # add scatterplot layer
  geom_point(position=ggstance::position_dodgev(height = 0.3), size = 2) + # separate male and female points
  
  # change colour shade
  scale_colour_hue(l=55) +
  
  # split plot into subplots by study
  facet_wrap(vars(study)) + 
  
  # change limits and x axis intervals
  scale_x_continuous(limits = c(-0.5, 3), # start x axis at -0.5, end at 3
                     breaks = seq(-3, 3, by = 1)) +  # between -3 and 3, let axis interval be 1
  
  # add linear trendline
  geom_smooth(method = lm) 

print(Figure_1_corrected)

We quickly noticed that the graph didn’t look right because the distribution was not normal and the data points were heavily clustered around zero (most points didn’t even reach a z score of 1 on the x-axis). We realised that this may be because the data contained huge outliers (some participants had normalised actual scores of around 10) that were skewing the calculation of the mean and sd, resulting in very small z scores for everyone else.

To deal with this, in a second attempt we created a new data frame which removed participants who scored 3 or more standard deviations above the mean, and recalculated the z scores using the remaining participants.

Second attempt - success

To start, we took the data and piped it into the group_by() function to organise the data into subsets by the game that participants completed. We then removed all rows where the ‘Score’ variable had a value of ‘NA’. A new variable called ‘Zscore’ was then created using mutate() with the z score formula.

Then, outliers with z scores equal to or greater than 3 were removed using filter(). Once the outliers were removed, z scores were recalculated on this new subset of the data using the same code as in the initial round of z score calculations.

# calculate z scores
zscores <- clean_denson %>% 
  group_by(Gameplayed) %>% 
  filter(!is.na(Score)) %>%  # remove rows where 'Score' is NA
  mutate(Zscore = (Score - mean(Score))/sd(Score)) %>%  # calculate z score
  ungroup()
  
# remove outliers
zscores_outliersremoved <- zscores %>% 
  filter(Zscore < 3) # keep rows where z score is less than 3

# recalculate z scores
zscores2 <- zscores_outliersremoved %>% 
  filter(!is.na(Score)) %>% 
  group_by(Gameplayed) %>% 
  mutate(Zscore = (Score - mean(Score))/sd(Score)) %>% # calculate z scores again with new data
  arrange(Zscore) %>%  # order data in ascending order of z score
  ungroup()
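
To see why the second pass matters, here is a minimal sketch of the same two-pass calculation on made-up scores (the `toy_game` data frame and its values are hypothetical). A single extreme score inflates both the mean and the sd, which squeezes every other z score towards zero; once it is removed and the z scores are recalculated, the remaining scores spread out again.

```r
library(dplyr)

# Hypothetical toy scores for one game: twenty ordinary scores and one extreme score
toy_game <- tibble(
  Gameplayed = "game_a",
  Score = c(rep(10, 10), rep(20, 10), 1000)
)

# First pass: z scores calculated with the extreme score included
first_pass <- toy_game %>%
  group_by(Gameplayed) %>%
  mutate(Zscore = (Score - mean(Score)) / sd(Score)) %>%
  ungroup()

# Second pass: drop z scores of 3 or more, then recalculate on what is left
second_pass <- first_pass %>%
  filter(Zscore < 3) %>%
  group_by(Gameplayed) %>%
  mutate(Zscore = (Score - mean(Score)) / sd(Score)) %>%
  ungroup()

# Ordinary scores are compressed close to zero in the first pass ...
range(first_pass$Zscore[first_pass$Score < 1000])
# ... and spread out to roughly -1 and 1 after recalculation
range(second_pass$Zscore)
```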

After removing the outliers from the data, we created a plot using ggplot() with normalised actual scores on the x-axis and self-perceived performance on the y-axis. As in our previous plots, we used colour and shape aesthetics to distinguish between male and female data. We then used geom_point() to add the scatterplot layer with data points vertically separated. coord_cartesian() and scale_y_discrete() changed the limits of the axes and added appropriate axis labels to the y-axis. We chose to start the x-axis at -2.4 since our data set started from a z score of -2.38. Changes were also made to the axis names using labs(). Other aesthetic features were edited using the theme_classic() and theme() functions.

Figure_1_corrected2 <- zscores2 %>% 
  ggplot(mapping = 
           aes(x = Zscore, # add Zscore to x axis
             y = Selfscore, # add Selfscore to y axis
             shape = Sex,
             colour = Sex)) +
  
  # add scatter plot layer and dodge male and female points
  geom_point(position=ggstance::position_dodgev(height = 0.3), size = 2)+
  
  # add linear trendline, and confidence intervals
  geom_smooth(method = "lm", 
              se = TRUE,
              fullrange = TRUE) +
  scale_colour_hue(l=55) + # change shade of colours
  
  # set limits of axes
  coord_cartesian(xlim = c(-2.4, 3), ylim = c(1, 7)) + 
  
  # add labels to y axis using ordinal scale
  scale_y_discrete(limits = c("1", "2", "3", "4", "5", "6", "7")) +

  # change names of x and y axes
  labs(x = "Normalised scores of actual performance", 
       y = "Self-perceived performance") +
  
  # split graph into 3 subplots, one for each experiment
  facet_wrap("study",
             scales = "free_y", # y axis to repeat for each subplot
             labeller = as_labeller( # change labels of subplots
               c('Gaming_manipulation' = "Experiment 1", 
                 'Rank_manipulation' = "Experiment 2",
                 'Face_choice' = "Experiment 3"))) +
  
  # add theme
  theme_classic() + 
  
  # edit font and axis features 
  theme(strip.background = element_blank(),
        strip.text = element_text(size = rel(0.8), 
                                  face = "bold"),
        axis.title = element_text(face = "bold"),
        axis.line = element_line(color = "grey50"),
        axis.ticks = element_line(color = "grey50")
        )

print(Figure_1_corrected2)

Figure 2

Figure 2 required us to plot self-perceived performance against the type of game participants played on a violin plot. This time, we used geom_violin() and geom_boxplot() to create the violin and box plot layers. Since there weren’t many aesthetic features for this graph, we only needed to change the width of the plots using “width =” for both geoms, and the size of the points that represent outliers in the box plot using “outlier.size =”. Axis labels and limits, fonts and axis features were edited using the same functions as in previous sections.

Figure_2 <- clean_denson %>% 
  ggplot(mapping = aes(
    x = VorNV, # Add 'VorNV' as the x-axis variable
    y = Selfscore)) + # Add 'Selfscore' as the y-axis variable.
  
  # Adds a violin plot to the ggplot object with a width of 1 unit.
  geom_violin(width = 1) + 
  
  # Adds a boxplot to the ggplot object with a width of 0.1 unit and outlier points with a size of 0.5.
  geom_boxplot(width = 0.1, outlier.size = 0.5) + 

  # Add 'classic' theme to the ggplot
  theme_classic() + 

  # Facets the plot into multiple panels based on the 'study' variable.
    facet_wrap(~ study, 
            scales = "free_y", # Each panel has a separate y-axis scale
            labeller = as_labeller( 
               # customize the labels of each panel, mapping the values of 'study' to the specified labels of Experiment 1, Experiment 2 and Experiment 3.
                 c('Gaming_manipulation' = "Experiment 1",
                   'Rank_manipulation' = "Experiment 2",
                  'Face_choice' = "Experiment 3"))) +
  
  # x-axis label to "Game Treatment" and the y-axis to "Self-perceived performance".
  labs(x = "Game Treatment",
       y = "Self-perceived performance") + 
  
  # replacing "NV" label with "Non-Violent" and "V" with "Violent".
  scale_x_discrete(labels = c("NV" = "Non-Violent", "V" = "Violent")) + 
  
  # Modifies the y-axis discrete scale limits, show the values 1-7.
  scale_y_discrete(limits = c("1", "2", "3", "4", "5", "6", "7")) +
  
  theme(strip.background = element_blank(), # Modifies the appearance of the plot's background making it blank,
       axis.title = element_text(face = "bold"), # making the axis titles bold
       strip.text = element_text(size = rel(0.8), face = "bold")) # making the text on the graph with 0.8 size and bolded. 


print(Figure_2)

Figure 3

Figure 3 required us to plot self-perceived performance against exposure to violent video games. Since this plot was very similar to Figure 1, we used the same code, putting “Averagefrequencyviolence” on the x axis to represent exposure to violent video games and “Selfscore” on the y axis to represent self-perceived performance.

Figure_3 <- clean_denson %>%
  ggplot(mapping = aes(x = Averagefrequencyviolence, # add "Averagefrequencyviolence" as x axis
                       y = Selfscore, # "Selfscore" as y axis
                       colour = Sex, # add colour according to sex
                       shape = Sex, # change shape of data points according to sex
                       linetype = Sex)) + # add trendlines according to sex
  
  # Add points to the plot with dodge positioning for better visibility.
  geom_point(position = position_dodge(width = 0.5)) + 
  
  # Add a linear regression line with confidence intervals, filled with grey.
  geom_smooth(method = "lm", se = TRUE, fill = "grey",
              fullrange = TRUE) +  #extends the regression lines to cover the entire plot area.
  
  # Use the classic theme for the plot.
  theme_classic() +  
  
   # Create multiple subplots (facets) based on the 'study' variable
  facet_wrap(~ study, scales = "free_y",  # y axis to repeat for each subplot
             labeller = as_labeller(   # renames the subplot labels.
               c('Gaming_manipulation' = "Experiment 1",
                 'Rank_manipulation' = "Experiment 2",
                 'Face_choice' = "Experiment 3"))) +
  
  # Set the limits of the x and y axes to control range.
  coord_cartesian(xlim = c(0, 50), ylim = c(1, 7)) +
  
  # Set the y-axis labels to a discrete scale with specific limits.
  scale_y_discrete(limits = c("1", "2", "3", "4", "5", "6", "7")) +
  
  # Label the y and x axes with custom text.
  labs(y = "Self-perceived performance",
       x = "Exposure to violent video games") +
  
  # Change colour shade
  scale_colour_hue(l = 55) +
  
  # Customize the appearance of plot.
  theme(strip.background = element_blank(),
        strip.text = element_text(size = rel(0.8), face = "bold"),
        # Customize the appearance of axis titles, axis lines, axis text, and axis ticks.
        axis.title = element_text(face = "bold"),
        axis.line = element_line(color = "grey50"),
        axis.text = element_text(color = "grey50"),
        axis.ticks = element_line(color = "grey50"))

print(Figure_3) 

Figure 4

Figure 4 plotted self-perceived mate value against self-perceived performance. Again, the geoms and aesthetics were very similar to the previous scatter plots in this paper. We used the same code, except that we used “width =” instead of “height =” in the dodge position to separate male and female points horizontally this time, instead of vertically.

Figure_4 <- clean_denson %>%
  
  # Mapping aesthetics to variables
  ggplot(mapping = aes( 
    x = Selfscore, 
    y = MV1Total, 
    colour = Sex, 
    shape = Sex, 
    linetype = Sex)) +  
 
   # Adding scatter points with dodge position and size
  geom_point(position = position_dodge(width = 0.8),  # position males points next to females points horizontally
             size = 2) + 
 
   # Customizing x-axis limits and intervals
  scale_x_continuous(breaks = seq(1, 7, by = 1)) +  
  
  # Customizing y-axis limits and intervals
  scale_y_continuous(breaks = seq(0, 30, by = 5), limits = c(0, 30)) +  
  
  # Adding linear trendlines with shaded confidence intervals
  geom_smooth(method = lm, se = TRUE, fullrange = TRUE, fill = "grey") + 
  
  # Customizing the colour shade
  scale_colour_hue(l = 50) +  
  
  # Applying a classic theme to the plot
  theme_classic() +  
  
  # Faceting the plot by the 'study' variable with custom labels
  facet_wrap(~study, scales = "free_y", 
             labeller = as_labeller(c('Gaming_manipulation' = "Experiment 1",
                                      'Rank_manipulation' = "Experiment 2",
                                      'Face_choice' = "Experiment 3"))) +
  
  # Adding axis labels
  labs(x = "Self-perceived performance", 
       y = "Self-perceived mate value") + 
  
  # changing plot theme
  theme(
    strip.background = element_blank(),  # Removing background from facet labels
    strip.text = element_text(size = rel(0.8), face = "bold"),  # Customizing facet label text
    axis.title = element_text(face = "bold"),  # Customizing axis title text
    axis.line = element_line(color = "grey50"),  # Customizing axis line color
    axis.text = element_text(color = "grey50"),  # Customizing axis text color
    axis.ticks = element_line(color = "grey50")  # Customizing axis tick color
  )


print(Figure_4)

Figure 5

Again, there seemed to be an issue in Figure 5, this time with the labelling of the axes. The x axis of the graph was labelled “Self-perceived performance”, which was measured on an ordinal scale from 1 to 7. However, the x axis displayed a continuous variable which ranged from 20 to 80. It appears that the authors had put the ‘Age’ variable on the x axis and incorrectly labelled it as self-perceived performance. As a result, we created two figures for this section: one to reproduce the figure from the original manuscript and another with self-perceived performance correctly plotted on the x axis.

Figure 5 (age vs sociosexual inventory)

Given that Figure 5 used the same geoms as our previous scatter plots, we used the same coding structure with the appropriate variables and a few minor changes. To replicate the mistake that the authors made, we put Age on the x axis and SOIRTOTAL on the y axis. We also intentionally labelled the x axis ‘Self-perceived performance’, as shown in the original manuscript. Since we had two continuous variables this time, we used scale_x_continuous() and scale_y_continuous() to change the axis limits and intervals.

Figure_5 <- clean_denson %>%
  ggplot(mapping = aes(
    x = Age,  # adding Age onto x axis to replicate mistake
    y = SOIRTOTAL, 
    colour = Sex, 
    shape = Sex, 
    linetype = Sex))+ 
  
  # adding scatterplot 
  geom_point(size = 2) + #changing size of points
  
  
  # changing x axis range and intervals
  scale_x_continuous(breaks = seq(20, 80, by = 10), # breaks from 20 to 80 in steps of 10
                     limits = c(20,80)) + # axis limits between 20 and 80
  
  # changing y axis intervals
  scale_y_continuous(breaks = seq(10, 80, by = 10)) + # breaks from 10 to 80 in steps of 10
  
  # linear trendline
  geom_smooth(method = lm, se = TRUE, 
              fullrange = TRUE, 
              fill = "grey") + 
  
  # changing colour shade
  scale_colour_hue(l=50) + 
  
  # adding plot theme
  theme_classic() +
  
  # creating subplots by study
  facet_wrap(~study, scales = "free_y", 
             # changing subplot labels
             labeller = as_labeller(
               c('Gaming_manipulation' = "Experiment 1",
                 'Rank_manipulation' = "Experiment 2",
                 'Face_choice' = "Experiment 3"))) +
  
  # changing the titles of x and y axis according to what was displayed in the paper’s graph
  labs(x = "Self-perceived performance", 
       y = "Sociosexual Inventory") + 
  
  # changing plot features
  theme(strip.background = element_blank(),
        strip.text = element_text(size = rel(0.8), face = "bold"),
        axis.title = element_text(face = "bold"),
        axis.line = element_line(color = "grey50"),
        axis.text = element_text(color = "grey50"),
         axis.ticks = element_line(color = "grey50"))

print(Figure_5)

Figure 5 (self-perceived performance vs sociosexual inventory)

Here, we plotted Figure 5 again according to the figure description, to compare the relationship between self-perceived performance and sociosexual inventory scores. This time, we correctly added the ‘Selfscore’ variable to the x axis to represent self-perceived performance, with SOIRTOTAL on the y axis. All other functions remain the same.

New_Figure_5 <- clean_denson %>%
  ggplot(mapping = aes(x = Selfscore, # adding Selfscore correctly to x axis as per figure description
                       y = SOIRTOTAL,
                       colour = Sex, 
                       shape = Sex, 
                       linetype = Sex)) + 
  
  # adding scatterplot 
  geom_point(position=position_dodge(width = 0.8), size = 2) +
  
  # editing axis limits and intervals
  scale_x_continuous(breaks = seq(1, 7, by = 1), limits = c(1, 7)) +
  scale_y_continuous(breaks = seq(10, 80, by = 10), limits = c(20, 80)) +
  
  # linear trendline
  geom_smooth(method = lm, se = TRUE, fullrange = TRUE, fill = "grey") + 
  scale_colour_hue(l = 50) +
  theme_classic() +
  
  # creating subplots by study
  facet_wrap(
    ~study, scales = "free_y", labeller = as_labeller(
      c(
        'Gaming_manipulation' = "Experiment 1",
        'Rank_manipulation' = "Experiment 2",
        'Face_choice' = "Experiment 3"))) +
  
  # changing the titles of x and y axis according to what was displayed in the paper’s graph
  labs(x = "Self-perceived performance", 
       y = "Sociosexual Inventory")+
  
  # changing plot features
  theme(strip.background = element_blank(),
        strip.text = element_text(size = rel(0.8), face = "bold"),
        axis.title = element_text(face = "bold"),
        axis.line = element_line(color = "grey50"),
        axis.text = element_text(color = "grey50"),
         axis.ticks = element_line(color = "grey50"))

print(New_Figure_5)

Figure 6

Figure 6 required us to create two subplots which differed in the variables on the horizontal axes. The first subplot compared self-perceived performance against preference for masculine faces, while the second compared age to facial preference scores. We knew that we couldn’t use facet_wrap() this time because the subplots had different axes, so we did a Google search and found that grid.arrange() from the gridExtra package would allow us to code 2 plots separately and then combine them side by side. Upon consulting the available open code, we realised that the original authors used a similar approach of combining two separate plots into one.

Again, all the code for the subplots was the same as used previously, except facet_wrap() was removed since it was no longer required.

# first scatter plot with Self-perceived performance on the x axis.
self <- clean_denson %>%
  ggplot(mapping = aes (
    x = Selfscore, 
    y = TotalSTScore, 
    colour = Sex, 
    shape = Sex, 
    linetype = Sex)) +
  
  geom_point(position=ggstance::position_dodgev(height = 0.3), size = 2) + # separating the shapes vertically
  
  # changing x axis intervals
  scale_x_continuous(breaks = seq(0, 7, by = 1)) +  # breaks from 0 to 7 in steps of 1
  
  # changing y axis intervals
  scale_y_continuous(breaks = seq(0, 7, by = 1)) + # breaks from 0 to 7 in steps of 1
  
  # adding linear trendline
  geom_smooth(method = lm, se = TRUE, 
              fullrange = TRUE, 
              fill = "grey") +
  
  #changing colour shade
  scale_colour_hue(l=50) +
  
  # adding theme
  theme_classic() +
  
  # adding plot title and changing labels
  labs(title = "Short-term", # add "Short-term" as plot title
        x = "Self-perceived performance", 
        y = "Preference for masculine faces ") +
  
  # changing plot features
  theme(strip.background = element_blank(),
        strip.text = element_text(size = rel(0.8), face = "bold"),
        plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.title = element_text(face = "bold"),
        axis.line = element_line(color = "grey50"),
        axis.text = element_text(color = "grey50"),
        axis.ticks = element_line(color = "grey50"))


# creating a second scatter plot with age on the x axis.
age <- clean_denson %>%
  ggplot(mapping = aes (
    x = Age, # age on x axis
    y = TotalLTScore, 
    colour = Sex, 
    shape = Sex, 
    linetype = Sex)) +
  
  geom_point(position=ggstance::position_dodgev(height = 0.3), size = 2) +
  
  scale_x_continuous(breaks = seq(20, 80, by = 10)) + 

  scale_y_continuous(breaks = seq(0, 7, by = 1)) + 
  
  geom_smooth(method = lm, se = TRUE, fullrange = TRUE, fill = "grey") +
  
  scale_colour_hue(l=50) +
  
  theme_classic() +
  
  
  labs(title = "Long-term", # add "Long-term" as plot title
       x = "Age", 
       y = "Preference for masculine faces ") +
  
  theme(strip.background = element_blank(),
        strip.text = element_text(size = rel(0.8), face = "bold"),
        plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.title = element_text(face = "bold"),
        axis.line = element_line(color = "grey50"),
        axis.text = element_text(color = "grey50"),
         axis.ticks = element_line(color = "grey50"))

#combining the two plots to appear next to each other
combined_plot <- grid.arrange(self, age, nrow = 1) 

combined_plot
## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name           grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
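
The TableGrob text above appears because grid.arrange() draws the combined plot as a side effect and returns a table of graphical objects, and it is that table's summary that gets printed when combined_plot is called on its own line. A minimal sketch, using made-up data and hypothetical object names, of building the layout with arrangeGrob() (also from gridExtra) and then drawing it explicitly:

```r
library(ggplot2)
library(gridExtra)

# made-up data purely for illustration
toy <- data.frame(x = 1:10, y = c(2, 4, 3, 6, 5, 7, 8, 7, 9, 10))

left  <- ggplot(toy, aes(x = x, y = y)) + geom_point() + labs(title = "Left")
right <- ggplot(toy, aes(x = y, y = x)) + geom_point() + labs(title = "Right")

# arrangeGrob() builds the side-by-side layout without drawing it
combined <- arrangeGrob(left, right, nrow = 1)

# draw the layout explicitly; no TableGrob summary is printed
grid::grid.newpage()
grid::grid.draw(combined)
```

Keeping the layout in an object this way also makes it possible to pass it to ggsave() later.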

Part 3: Exploration

1. Does playing violent games influence short-term facial preferences in males and females?

In Experiment 3 (Face choice), the authors found that self-perceived performance influenced facial preferences in males and females, such that females who perceived themselves as higher performing had a preference for feminised faces and males preferred masculinised faces. It was also found that, overall, people who played violent video games had lower self-perceived performance. Hence, this section of my exploratory analyses examines the effect of game type on short-term facial preferences in males and females, in an attempt to view mate choice in the context of violent and non-violent video games.

In this section I will compute summary descriptives for each group, conduct a two-way ANOVA and display the pattern of results in a figure.

Relevant packages:

  • tidyverse
  • gt

To start off, I created a new data frame which only contains scores from Experiment 3, where participants had to complete a facial preferences task. I used filter() to keep all relevant rows.

face_choice<- clean_denson %>% 
  filter(study == "Face_choice")

Descriptives

I first used group_by() to group the participants according to the type of game they played (violent or non-violent) and Sex, creating four sets of sample sizes, means and standard deviations for participants’ short-term face preference ratings. I then tried to use summarise() to calculate the sample size, mean and sd, but when I ran it, R returned a message suggesting that I add the .groups argument. I tried Googling what this meant but didn’t quite understand it, so I used reframe() instead, which we had used in our reproducibility section.

I then displayed all descriptives in a table using gt().

face_choice_descriptives <- face_choice %>% 
  
  # create subgroups according to violent or nonviolent game type and Sex
  group_by(VorNV, Sex) %>% 
  
  # calculate sample size, mean and sd for each group
  reframe(
    sample_size = n(),
    mean = mean(TotalSTScore),
    sd = sd(TotalSTScore)
  ) %>% 
  
  # round mean and sd to 2 d.p.
  mutate(across(c(mean, sd), ~ round(.x, 2))) %>%
  ungroup()

# create a table for the descriptives
face_choice_descriptives_table <- gt(face_choice_descriptives)

face_choice_descriptives_table
VorNV  Sex     sample_size  mean  sd
NV     Female  92           2.10  1.33
NV     Male    117          1.56  1.50
V      Female  84           2.06  1.51
V      Male    128          1.55  1.41
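
For reference, the .groups message comes from summarise() when the data are grouped by more than one variable; it is informational rather than an error. A minimal sketch with made-up ratings (the values below are hypothetical) showing how .groups = "drop" silences it and returns an ungrouped result comparable to the one from reframe():

```r
library(dplyr)

# made-up ratings purely for illustration
toy <- tibble(
  VorNV = rep(c("NV", "V"), each = 4),
  Sex = rep(c("Female", "Male"), times = 4),
  TotalSTScore = c(2, 1, 3, 2, 1, 0, 3, 2)
)

toy_descriptives <- toy %>%
  group_by(VorNV, Sex) %>%
  summarise(
    sample_size = n(),
    mean = mean(TotalSTScore),
    sd = sd(TotalSTScore),
    .groups = "drop" # return an ungrouped tibble, so no ungroup() is needed
  ) %>%
  mutate(across(c(mean, sd), ~ round(.x, 2))) # round mean and sd to 2 d.p.

toy_descriptives
```

Either approach gives one row per combination of game type and Sex; the difference is only in how the grouping of the result is handled.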

Conducting a two-way ANOVA

To see if there are any sex differences in the effect of violent or non-violent video games on short-term facial preferences, I chose to run a two-way ANOVA. I used Google to learn that running an ANOVA in R is fairly straightforward and only requires a single line of code using the aov() function. I also wanted to present a summary of the analysis, so I used summary().

# conduct a two way ANOVA with violent or nonviolent variable and Sex with interactions
VorNV_anova <- aov(TotalSTScore ~ VorNV * Sex, data = face_choice)

# create a summary of analysis
summary(VorNV_anova)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## VorNV         1    0.3   0.253   0.122 0.726761    
## Sex           1   28.0  27.989  13.519 0.000267 ***
## VorNV:Sex     1    0.0   0.011   0.005 0.941014    
## Residuals   417  863.3   2.070                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall, there is no evidence to suggest that game type (violent or non-violent) influences short-term ratings of masculine faces, F(1, 417) = 0.12, p = .727. There is, however, a main effect of sex on facial ratings, F(1, 417) = 13.52, p < .001. From the pattern of means, males gave lower ratings than females (roughly 1.6 versus 2.1), indicating that males prefer more feminised faces than females when averaged over game type. There is no interaction between game type and Sex, F(1, 417) = 0.005, p = .941.
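
As a check on how to read the ANOVA table, the F value for Sex can be recovered from the printed values: it is the mean square for the effect divided by the residual mean square. The sketch below uses only the numbers printed in the summary; partial eta squared is added as a supplementary effect size that was not part of the analysis reported here.

```r
# values copied from the ANOVA summary above
ms_sex <- 27.989      # Mean Sq for Sex
ms_resid <- 2.070     # Mean Sq for Residuals
ss_sex <- 27.989      # Sum Sq for Sex (1 df, so equal to its Mean Sq)
ss_resid <- 863.3     # Sum Sq for Residuals

# F = effect mean square / residual mean square
f_sex <- ms_sex / ms_resid

# p value from the F distribution with 1 and 417 degrees of freedom
p_sex <- pf(f_sex, df1 = 1, df2 = 417, lower.tail = FALSE)

# partial eta squared = SS effect / (SS effect + SS residual)
eta_sq_sex <- ss_sex / (ss_sex + ss_resid)
```

The recomputed F value is about 13.52, matching the printed value up to rounding, and partial eta squared comes out at about 0.03.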

Creating a plot

My goal was to create a plot that would show how the facial preference ratings were distributed. The initial idea was to use a raincloud plot, but I didn’t like that it would make my dependent variable look like a continuous variable. Hence, I decided to use a boxplot with geom_boxplot() and add a layer to show the data points at each value on the y axis. To do this, I used geom_point() followed by geom_jitter() to spread the data points out a bit. I made sure to keep the jitter height to a minimum so that the reader is able to tell that the dependent variable was ordinal. I also added facet_wrap() to show male and female data separately. Other arguments were used to change the aesthetic features of the plot to make it more easily interpretable.

gametype_on_facepref <- face_choice %>% 
  
  # add violent or non-violent game type on x-axis and facial preference rating on y-axis
  ggplot(aes(x = VorNV,
             y = TotalSTScore,
             fill = VorNV, # fill boxplot by VorNV
             colour = VorNV)) + # colour data points by Vor NV
  
  # add boxplot
  geom_boxplot(colour = "black", # keep black border outlines
               width = 0.6, # change boxplot width
               alpha = 0.4)+ # change transparency of boxplot
  geom_point(alpha = 1)  + # draw the data points at their exact values
  geom_jitter(width = 0.2, # horizontal spread of jittered points
              height = 0.1, # keep vertical jitter small
              alpha = 0.5) + # make jittered points semi-transparent
  
  # create subplots for Female and Male
  facet_wrap(vars(Sex), scales = "free_y") +
  
  # change axis labels and legend name
  labs(x = "Game type",
       y = "Preference for masculinised faces",
       fill = "Game type") +
  
  #change plot themes and aesthetics
  theme_classic() +
  theme(strip.background = element_blank()) +
  
  # remove duplicated colour legend
  guides(colour = "none")
  
gametype_on_facepref
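
One thing worth knowing about this layering is that geom_jitter() is itself a point layer, so it can show the individual ratings without a separate geom_point(). A minimal sketch with made-up ratings (the data and object names are hypothetical) of a boxplot with jittered points only:

```r
library(ggplot2)

# made-up ordinal ratings purely for illustration
toy <- data.frame(
  VorNV = rep(c("NV", "V"), each = 6),
  rating = c(0, 1, 1, 2, 3, 4, 0, 0, 1, 2, 2, 4)
)

jitter_only <- ggplot(toy, aes(x = VorNV, y = rating)) +
  # hide the boxplot's own outlier points so no observation is drawn twice
  geom_boxplot(width = 0.6, alpha = 0.4, outlier.shape = NA) +
  # jittered points only: small height keeps the ordinal levels visible
  geom_jitter(width = 0.2, height = 0.1, alpha = 0.5)

jitter_only
```

Whether to keep the un-jittered points as well is a design choice; this version simply avoids plotting each observation more than once.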

3. How do objective measures compare to self-perceived performance in moderating self-perceived mate value in people with different relationship statuses?

In this last section, I wanted to make use of the relationship status data that was collected by the authors but not used in any analyses. I thought it would be interesting to see whether there were different patterns in how self-perceived performance and actual performance influenced self-perceived mate value. I would imagine that being in a long-term relationship may lead people to rely more on self-perceived performance when evaluating themselves, because there is less need to be viewed as objectively attractive or wealthy in a long-term relationship. For this reason, I am comparing people in a monogamous relationship, which I have classified as people who reported being in a “long-term monogamous relationship eg. married, partnered”, to people in a “nonmonogamous” relationship, that is, people who reported “In an open relationship/casually dating”. This will be displayed in the form of a combined scatterplot with linear trendlines superimposed.

Relevant packages:

  • tidyverse
  • ggpubr

I started by using filter() to remove participants that I wasn’t interested in. These participants included people who were single or recently separated. I then created a simpler label for people in a “long-term monogamous relationship eg. married, partnered” so that it would be more appropriate for a figure. To do this, I used mutate() and case_when() to create a new variable called RelationshipType, with grepl() identifying every value of Relationshipstatus that contained the word “monogamous” and renaming it to “monogamous”. All remaining values were labelled “nonmonogamous”.

From observation, it seemed like there were a lot more data points in the “monogamous” group compared to the “nonmonogamous” group, so I created a new tibble to show the sample size for each group, which confirmed my suspicions.

relostatus_regrouped <- clean_denson %>% 
  filter(!is.na(Relationshipstatus),
         Relationshipstatus != "Single") %>% 
  mutate(RelationshipType = case_when(
    grepl("monogamous", Relationshipstatus) ~ "monogamous", 
      TRUE ~ "nonmonogamous"))

count <- relostatus_regrouped %>% 
  group_by(RelationshipType) %>% 
  reframe(sample_size = n())

count
## # A tibble: 2 × 2
##   RelationshipType sample_size
##   <chr>                  <int>
## 1 monogamous               920
## 2 nonmonogamous            124
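
The recoding step relies on grepl() returning TRUE wherever the pattern appears anywhere in a string. A minimal base R sketch of the same rule, using the two status labels as they are described in this section (the vector itself is made up, and the exact strings in the data file may differ):

```r
# made-up vector using the two relationship status labels described earlier
status <- c(
  "long-term monogamous relationship eg. married, partnered",
  "In an open relationship/casually dating",
  "long-term monogamous relationship eg. married, partnered"
)

# TRUE where the label contains "monogamous", FALSE otherwise
is_monogamous <- grepl("monogamous", status)

# same rule as the case_when() call: matches become "monogamous",
# everything else becomes "nonmonogamous"
relationship_type <- ifelse(is_monogamous, "monogamous", "nonmonogamous")

table(relationship_type)
```

Because the match is on a substring, a label such as "non-monogamous" would also be classed as "monogamous", so this rule only works when, as here, the other labels do not contain that word.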

In the next few chunks, I created two subplots: one comparing self-perceived performance with self-perceived mate value, and a second which compares rank (objective performance) to self-perceived mate value. I also applied what I had learnt about the grid.arrange() function to create a combined scatterplot similar to Figure 6 from Part 2, because it was the most appropriate way of showing the data.

This time, I also wanted to add the Pearson correlations (R) and p values to my plots to see the strength of the relationships. To do this, I installed the ggpubr package and added the stat_cor() function to produce these values.

library(ggpubr) # used to add Pearson correlations and p values to plots

Creating the first scatterplot (Self-perceived performance vs. Self-perceived mate value)

In this subplot, I added ‘Selfscore’ to the x axis to represent self-perceived performance and ‘MV1Total’ to the y axis to represent self-perceived mate value. I separated data points according to relationship type using shape =, colour = and linetype = so that the monogamous and nonmonogamous groups could be differentiated. Like previous scatterplots, geom_point() and geom_smooth() were used to add the scatterplot and linear trendline layers. As another key feature of the graph, Pearson correlations were added with stat_cor().

selfscore_matevalue <- relostatus_regrouped %>% 
  
  # add Selfscore to x axis and MV1Total to y axis
  ggplot(aes(x = Selfscore,
         y = MV1Total,
         colour = RelationshipType, # separate relationship type by colour
         shape = RelationshipType, # separate relationship type by shape
         linetype = RelationshipType)) + # use different trendlines for each relationship type
  
  # add scatterplot
  geom_point(position = position_dodge(width = .7), size = 2) +  # dodge points horizontally
  
  # customise axis limits and intervals
  scale_x_continuous(breaks = seq(0, 7, by= 1)) +
  scale_y_continuous(breaks = seq(0, 49, by= 5)) +
  
  # add linear trendline
  geom_smooth(method = lm) +
  scale_colour_hue(l = 50) +
  
  # change labels
  labs(x = "Self-perceived performance", y = "Self-perceived mate value") +
  
  # add theme
  theme_classic() +
  
  # remove legend
  theme(legend.position = "none") +
  
  # add Pearson R and p values
  stat_cor()

selfscore_matevalue

Creating the second scatterplot (Rank vs. Self-perceived mate value)

In this subplot, I added ‘Rank’ to the x axis to represent objective performance and ‘MV1Total’ to the y axis to represent self-perceived mate value. I used the same code structure as the first subplot.

rank_matevalue <- relostatus_regrouped %>% 
  
  # add Rank to x axis and MV1Total to y axis
  ggplot(aes(x = Rank,
         y = MV1Total,
         colour = RelationshipType,
         shape = RelationshipType, # separate relationship type by shape
         linetype = RelationshipType)) + # use different trendlines for each relationship type
  
  # add scatterplot
  geom_point(position = position_dodge(width = .7), size = 2) + # dodge points horizontally
  
  # customise axis limits and intervals
  scale_x_continuous(breaks = seq(0, 10, by= 1)) +
  scale_y_continuous(breaks = seq(0, 30, by = 5)) + 
  
   # add linear trendline
  geom_smooth(method = lm) +
  scale_colour_hue(l = 50) + #change colour shade
  
  # change labels
  labs(x = "Rank", y = "Self-perceived mate value") +
  
  #add theme
  theme_classic() +

  # add Pearson R and p values
  stat_cor()

rank_matevalue
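
The R and p values that stat_cor() prints are, by default, a Pearson correlation and its significance test, calculated separately for each group in the plot. A minimal sketch with made-up numbers showing the equivalent base R call, cor.test(), which can be used to check the values shown on a plot:

```r
# made-up scores purely for illustration
selfscore <- c(1, 2, 3, 4, 5, 6, 7, 4, 3, 5)
mate_value <- c(10, 12, 15, 14, 18, 20, 22, 13, 16, 17)

# Pearson correlation with its t test
result <- cor.test(selfscore, mate_value, method = "pearson")

r_value <- unname(result$estimate) # correlation coefficient (R)
p_value <- result$p.value          # p value for the test that R = 0
```

To check one group from the real plot, the same call could be run on the rows of relostatus_regrouped for a single RelationshipType.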

Combining plots

Here, I combined the two subplots using the grid.arrange() function.

I realised that when the plots were combined, the legend was duplicated because it was created once for each subplot. To remove one, I went back into my code for the selfscore_matevalue plot and used the argument legend.position = “none” inside the theme() function.

#combine 2 subplots
combined_plot2 <- grid.arrange(selfscore_matevalue, rank_matevalue, nrow = 1, 
                               widths = c(1, 1.5)) # manually change width of plots to match 

Findings: From this graph, there is a weak but statistically significant relationship between self-perceived performance and self-perceived mate value for people in a monogamous relationship. There is no evidence for any of the other relationships, including between rank and self-perceived mate value, for people in monogamous or nonmonogamous relationships.

Part 4: Recommendations

My first recommendation for the authors of the paper is to include documentation in the open code. One of the things that I struggled with the most in my reproducibility journey was interpreting the code used for inferential analyses, because documentation was almost absent. According to the manuscript, linear regression models were included in the analyses, but these were not documented anywhere in the code. I had to rely on my intuition to find the chunks of code which corresponded to the analyses reported in the paper. Given that this study contained a total of 17 variables, trying to decode the code was very frustrating. I also noticed that standard error was calculated a few times because I could recognise the “se” abbreviation and formula, but this wouldn’t be understood by someone who is unfamiliar with the standard error formula. My suggestion would be to include a summary of the main purpose of each chunk of code which mentions the aim, the variables involved and the key functions used. Embedding comments between subsections of the code, wherever new objects or formulas are created, would make the code much less intimidating and easier to digest. The authors may want to consult the tidyverse style guide to read more about how to create documentation that is consistent and clear: https://style.tidyverse.org/documentation.html.
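
As a hedged illustration of the kind of documentation recommended here, a chunk that calculates standard error could state its purpose, input and formula. The function name and comments below are hypothetical and are not taken from the authors' code:

```r
# Purpose: calculate the standard error of the mean for one variable.
# Input:   x, a numeric vector of scores (missing values are dropped)
# Formula: se = sd(x) / sqrt(n), where n is the number of non-missing scores
standard_error <- function(x) {
  x <- x[!is.na(x)]      # drop missing values before counting
  sd(x) / sqrt(length(x))
}

# example with made-up scores
standard_error(c(1, 2, 3, 4, 5, NA))
```

A header of this kind takes a few lines to write but lets a reader who does not recognise the “se” abbreviation follow what the chunk does.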

I would also recommend providing a ReadMe file along with the open data to make it easier for other researchers to understand how the variable names correspond to the data. While variable naming was mostly well done in the open data file, some names were still unclear or didn’t reflect what was described in the paper. For example, the open data file contained a variable named “MV2Average”. I expected this to be the “MV1Total” value divided by 4, because the paper mentioned that total scores on the Mate Value survey were averaged across 4 items. Upon close examination, however, I realised that the “MV2Average” values were not an average of “MV1Total”. Providing a ReadMe file that contains the unabbreviated version of variable names and a description of how scores are aggregated would help link the data directly to the paper. The authors may want to practise writing a clear ReadMe file using this link, which contains some exercises with R project examples: https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/documenting-analyses.html

Lastly, I would recommend including all the code used to construct the tables, in addition to the code used for the figures. When we calculated the descriptive data displayed in Table 1, we found a discrepancy between our standard deviation values for Experiment 3 and those shown in the paper, but we couldn’t confirm what went wrong because there was no code showing how the data were put into the table. While the authors also calculated the standard deviation using the sd() function, we weren’t sure whether they entered the values into the table manually or whether sd() had produced different values for them. We had to assume that the calculations were done correctly and that a mistake occurred when the descriptive data were put into table format, because the reported standard deviation values were the same for 2 experiments. Including this code is crucial to allowing other researchers to pinpoint where reproducibility problems occur. The authors should consider following the Open Science Framework’s guide on sharing research outputs at the following link: https://help.osf.io/article/219-sharing-research-outputs
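To show what I mean by building a table from code, here is a minimal sketch in base R. It uses made-up scores rather than the study data, and the names `scores`, `experiment`, `mate_value` and `table1` are hypothetical. The point is only that every value in the table is computed from the data, so nothing has to be typed in by hand.

```r
# Made-up scores; `experiment` and `mate_value` are hypothetical column names
scores <- data.frame(
  experiment = rep(c("Exp1", "Exp2", "Exp3"), each = 4),
  mate_value = c(4, 5, 6, 5,  3, 5, 7, 5,  2, 4, 6, 8)
)

# Compute the mean and standard deviation for each experiment.
# tapply() returns groups in sorted order, which matches sort(unique(...)).
table1 <- data.frame(
  Experiment = sort(unique(scores$experiment)),
  M  = round(as.numeric(tapply(scores$mate_value, scores$experiment, mean)), 2),
  SD = round(as.numeric(tapply(scores$mate_value, scores$experiment, sd)), 2)
)

# Simple sanity check: flag identical SDs across experiments, which could
# signal a copy-and-paste slip (or could be a genuine coincidence)
duplicated_sd <- anyDuplicated(table1$SD) > 0

print(table1)
```

If the table in a manuscript is generated this way, a reader who finds a mismatch can rerun the chunk and see whether the problem lies in the calculation or in how the values were transferred to the table.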

Part 5: Reflections

  1. When I first heard that the internship involved learning how to code in R, I felt unsure of what to expect but excited to learn something completely new. My previous understanding of coding was mainly informed by media representations and was made up of two things: 1’s and 0’s, and maths. While it seemed like an intriguing topic, I couldn’t grasp how coding could be related to psychological research, given that I had only been exposed to statistical packages like SPSS to perform analyses, and tools such as Excel and Word to create figures. Closer to the beginning of term, some of this excitement turned into fear and anxiety. I had a fixed mindset of being terrible at coding from a 2-week experience of coding in Python during high school. This, combined with an email from Jenny which warned “WARNING: the learning curve WILL BE STEEP”, gave me a big scare and made coding seem like an unattainable skill for me. I’m lucky that I had some people around to reassure me that it wouldn’t be too difficult.

  2. Now that I have been learning R for 10 weeks, I feel confident in my ability to wrangle and reshape data to achieve a goal in R. Through practice, I have become much better at setting out a plan with the appropriate steps and tools to achieve that goal. After working with real data from a study, I have learned that some steps tend to come before others, and that some functions work better together than others. For example, rows containing “NA” values need to be dealt with, for instance by removing them with filter(), before functions like summarise() or mutate() can compute meaningful results. I am now able to use this knowledge to build a framework for my code.

  3. The hardest thing I encountered while learning R this term was figuring out how to use resources like Google and ChatGPT effectively, in a way that would help me simplify my code and understand it better. For example, when I first started using ChatGPT, I relied on it to generate code that I could use as a template for my data. I found that this wasn’t helpful at all, because the code was often far too complicated and many of the responses didn’t solve my problem because my questions were missing too much context. A lot of my conversations with ChatGPT ended up starting with “No, I mean…”, which was met with the reply, “Sorry about the confusion.” After some practice, I started using ChatGPT and Google to suggest a list of functions, which I would then investigate myself to select the most appropriate one. My questions changed from “I want to create an APA table which combines the columnX and columnY so that….” to something more like “give me a list of functions that I can use to combine columns in R”. This not only made my coding much simpler, but also increased both the depth and breadth of my learning.

  4. I am most proud of being patient with myself when the code didn’t work out or I couldn’t find the answer to my questions immediately. Coding was especially frustrating for me because I didn’t know the terminology needed to ask the right questions or explain my situation clearly. This meant that I saw a lot of red error messages and was often unable to progress, which was frustrating and intimidating. I realised that focusing on the small wins, instead of the errors and the big goal, was a much better way of approaching my coding journey because it helped to sustain my motivation to keep looking for solutions.

  5. The next thing I want to learn how to do in R is create diagrams and flowcharts. Now that I’ve seen how powerful and efficient R Markdown is for creating figures and running analyses, I think learning how to create diagrams would be a fun way of rounding off our journey with RStudio, given that they also form a part of many research manuscripts. Though I’m not sure if it would be the most appropriate tool available to do so…