My goals for this week

  1. To finish commenting on my code for question 3!

  2. Finish the minor things (ie. check spelling etcetera.)

  3. To successfully knit my vr report into pdf format.

How did I go?

My third exploratory question is focused on examining whether there’s a relationship between preferences for physical attractiveness and relationship length?

It wasn’t apparent what the DV is for relationship length (ie. days? years?). I looked at the supplemental material and found some indication that it referred to months.

“Participants in this sample were M = 49.86 years old on average (SD = 14.48) and were in their relationships for Mdn = 156 months” (line 199-200 from Walter et al’s (2020) supplemental materials)

Load the relevant packages needed

library(tidyverse)
library(ggplot2)
library(ggpubr)
library(tidyr)
library(ggeasy)
library(gt)
library(jmv)
library(rstatix)
library(dplyr)
library(nortest)
library(kableExtra)
library(emo)

Loading in the data and viewing it

The function read_csv() from the ‘readr package’ is being used to read the data, which is being saved as ‘sdmp’ to represent the title of the paper (“Sex Differences in Mate Preferences”)

sdmp <- read_csv("ReplicationProcessedfinaldata04202018.csv")  

glimpse(sdmp)
## Rows: 14,399
## Columns: 35
## $ PIN                <dbl> 12506, 1997, 1448, 10625, 6106, 4078, 3034, 5281, 1…
## $ CIN                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ continent          <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "…
## $ country            <chr> "Algeria", "Algeria", "Algeria", "Algeria", "Algeri…
## $ city               <chr> "Algiers", "Algiers", "Setif", "Setif", "Algiers", …
## $ countrycode        <dbl> 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,…
## $ partnum            <chr> "85", "72", "277", "229", "23", "82", "86", "135", …
## $ partcode           <chr> "A85", "A72", "SB277", "S229", "A23", "A82", "A86",…
## $ sample             <dbl> 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, …
## $ sex                <dbl> 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ age                <dbl> 21, 22, 47, 20, 23, 20, 22, 27, 19, 19, 19, 28, 22,…
## $ religious          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ religion           <chr> "islam", "islam", "Islam", "Islam", "islam", "islam…
## $ relstat            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ relstat2           <dbl> 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 3, …
## $ rellength          <dbl> 28, NA, 307, NA, NA, 14, 14, 9, NA, NA, NA, 14, NA,…
## $ ideal_intelligence <dbl> 3, 7, 5, 4, 7, 6, 6, 7, 5, 5, 4, 6, 6, 7, 6, 7, 4, …
## $ ideal_kindness     <dbl> 7, 7, 5, 7, 7, 7, 7, 7, 7, 5, 7, 6, 7, 7, 7, 7, 5, …
## $ ideal_health       <dbl> 6, 7, 5, 7, 7, 7, 7, 7, 5, 7, 7, 6, 7, 7, 6, 4, 5, …
## $ ideal_physatt      <dbl> 4, 7, 5, 7, 7, 7, 7, 6, 7, 5, 5, 6, 6, 7, 6, 2, 6, …
## $ ideal_resources    <dbl> 1, 6, 5, 4, 7, 7, 7, 6, 4, 5, 4, 6, 5, 7, 7, 1, 4, …
## $ mate_age           <dbl> 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 19, 19,…
## $ popsize            <dbl> 39667, 39667, 39667, 39667, 39667, 39667, 39667, 39…
## $ country_religion   <chr> "Muslim", "Muslim", "Muslim", "Muslim", "Muslim", "…
## $ lattitude          <dbl> 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,…
## $ gem1995            <dbl> 0.266, 0.266, 0.266, 0.266, 0.266, 0.266, 0.266, 0.…
## $ gdi1995            <dbl> 0.508, 0.508, 0.508, 0.508, 0.508, 0.508, 0.508, 0.…
## $ gii                <dbl> 0.429, 0.429, 0.429, 0.429, 0.429, 0.429, 0.429, 0.…
## $ gdi2015            <dbl> 0.854, 0.854, 0.854, 0.854, 0.854, 0.854, 0.854, 0.…
## $ gggi               <dbl> 0.642, 0.642, 0.642, 0.642, 0.642, 0.642, 0.642, 0.…
## $ gdp_percap         <dbl> 15100, 15100, 15100, 15100, 15100, 15100, 15100, 15…
## $ infect_death       <dbl> 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7…
## $ infect_yll         <dbl> 406.6, 406.6, 406.6, 406.6, 406.6, 406.6, 406.6, 40…
## $ cmc_yll            <dbl> 2039.5, 2039.5, 2039.5, 2039.5, 2039.5, 2039.5, 203…
## $ gb_path            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Creating a new, “clean” data frame with the variables of interest

The select() function is being used to select the variables we want to use. Additionally, the na.omit() function is being used to exclude all incomplete cases/missing values from the dataframe.

clean_data3 <- sdmp %>% 
  select(ideal_physatt, rellength, sex) %>% 
  na.omit()

Viewing the new dataframe created

Glimpse() is being used to present a somewhat more compact view of the data, allowing us to see each of the columns (ie. variables) in the data frame and their data type. This is helpful as we can then check to see whether our code above worked and has only included the select variables.

glimpse(clean_data3)
## Rows: 7,135
## Columns: 3
## $ ideal_physatt <dbl> 4, 5, 7, 7, 6, 6, 6, 7, 7, 7, 7, 7, 7, 5, 5, 7, 7, 7, 7,…
## $ rellength     <dbl> 28, 307, 14, 14, 9, 14, 4, 51, 64, 36, 27, 41, 50, 34, 3…
## $ sex           <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,…

Okay so just glancing at the data, I notice that some individuals have reported ‘0’ for their relationship length. I’m not sure whether this reflects a mistake on the participant’s part, or whether it means that they’re not in a relationship- in any case, I think it would be wise to remove all cases of ‘0’.

Filtering the data to exclude all ‘0’ values for relationship length

The filter() function from the dplyr package is being used to remove all ‘0’ values within the relationship length variable. The glimpse() function is being used after in order to view the data and double check if it’s worked.

clean_data3 <- clean_data3 %>%
  dplyr::filter(rellength != "0")       # "!x' is used to denote 'not x'

glimpse(clean_data3)
## Rows: 7,040
## Columns: 3
## $ ideal_physatt <dbl> 4, 5, 7, 7, 6, 6, 6, 7, 7, 7, 7, 7, 7, 5, 5, 7, 7, 7, 7,…
## $ rellength     <dbl> 28, 307, 14, 14, 9, 14, 4, 51, 64, 36, 27, 41, 50, 34, 3…
## $ sex           <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,…

Creating high and low preference groups

The mutate() function is being used to create a new variable- ‘pref_group’, with the case_when() function from the ‘dplyr’ package specifying vectors of ratings that will then form the preference groups. The as.factor() function is then being used to change the ‘pref_group’ variable into a factor.

group_pref <- clean_data3 %>% 
  mutate(
    pref_group = dplyr::case_when(
      ideal_physatt <= 4 ~ "Low",
      ideal_physatt >= 5 ~ "High"))

group_pref$pref_group <- as.factor(group_pref$pref_group)

Viewing the new dataframe created

Glimpse() is being used to present a somewhat more compact view of the data, allowing us to see each of the columns (ie. variables) in the data frame and their data type. This is helpful as we can then check to see whether our code above worked and the new ‘pref_group’ variable is there, and whether its been converted into a factor.

glimpse(group_pref)
## Rows: 7,040
## Columns: 4
## $ ideal_physatt <dbl> 4, 5, 7, 7, 6, 6, 6, 7, 7, 7, 7, 7, 7, 5, 5, 7, 7, 7, 7,…
## $ rellength     <dbl> 28, 307, 14, 14, 9, 14, 4, 51, 64, 36, 27, 41, 50, 34, 3…
## $ sex           <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,…
## $ pref_group    <fct> Low, High, High, High, High, High, High, High, High, Hig…

Calculating the mean relationship length (in months) for high vs low pref group

This code calculates the mean preference rating for ‘rellength’ (ie. relationship length) for individuals who had high or low preference ratings for physical attractiveness. The group_by() function was used to define the variables of interest as ‘pref_group’, and summarise() was used to calculate summary statistics (ie. n, mean, SD, min, and max) for relationship length. The round() function from base r was also used to specify the number of decimal places desired (ie. 2).

table_grouppref <- group_pref %>% 
  group_by(pref_group) %>% 
  summarise(
    n = n(),
    Mean = round(mean(rellength), 2),
    SD = round(sd(rellength), 2),
    Min = min(rellength),
    Max = max(rellength))

table_grouppref %>% 
  kbl() %>% 
  kable_material("striped", "hover")
pref_group n Mean SD Min Max
High 6132 88.08 103.28 0.50 725
Low 908 99.15 110.84 0.25 544

Running a t-test

I wanted to use a t-test in order to determine whether there was a statistically significant difference between the relationship lengths of those who had high or low preference ratings for physical attractiveness. The null hypothesis is that the average relationship lengths are equal, and the alternative hypothesis is that they’re not.

The filter function is being used to create seperate dataframes for the high and low preference rating groups. Then I’m using the t.test() function to run a Welch Two Sample t-test, looking at the differences between the preference groups in regards to their relationship length.

highpref <- group_pref %>%
  filter(pref_group == "High")

lowpref <- group_pref %>%
  filter(pref_group == "Low")

t.test(highpref$rellength, lowpref$rellength)
## 
##  Welch Two Sample t-test
## 
## data:  highpref$rellength and lowpref$rellength
## t = -2.8314, df = 1152.4, p-value = 0.004715
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.731177  -3.397329
## sample estimates:
## mean of x mean of y 
##  88.08195  99.14620

Interpreting the findings

It was hypothesised that individuals who prioritise physical attractiveness more (ie. have higher ratings) are more likely to hold positive illusions of their partner and therefore, are likely to have longer relationship lengths relative to individuals who have lower ratings for physical attractiveness.

Relationship lengths differed significantly according to Welch’s t-test, t(1152.40) = -2.8314, p < 0.05. On average, individuals who had high preference ratings for physical attractiveness had a relationship length of 88.08 months (SD = 103.28) whereas individuals who had low preference ratings had a relationship length of 99.15 months (SD = 110.84). The 95% confidence interval for effect of preference ratings for physical attractiveness on relationship length is between -18.73 and -3.40 percent. These results support the hypothesis.

I noticed the negative numbers above so it made me wonder if there’s a negative correlation perhaps. In order to see if a linear relationship exists, I’m now creating a scatterplot so I can inspect the distribution of data points.

Creating a scatterplot

The ‘group_pref’ dataframe is being used, with the aesthetic mappings specifying the x axis variable as ‘ideal_physatt’ (ie. physical attractiveness preferences) and the y axis variable as ‘rellength’ (ie. relationship length). Colour is also used to specify that the preferences should be differentiated based on preference group (ie. high or low), enabling us to better see the differences in relationship length between the groups.

Geom_point() is used to create a scatterplot, with the argument ‘positions = jitter’ being added so that the individual data points can be seen a bit more. Geom_smooth() is also adding a smoothing line, which enables us to more clearly see patterns within the plot. Method = “lm” is being used to fit a linear model (ie. a straight line), with size specifying the weighting of the smoothing line and colour specifying that it should be black.

The theme is specified with theme_classic(), with the scale_colour_brewer() function from the ‘ggpplot2’ package specifying the colour. The labs() function is being used to rename the x and y axis labels and a legend title is being added with the easy_add_legend_title() function from ggeasy.

p3 <- ggplot(data = group_pref, mapping = aes(
  x = ideal_physatt,
  y = rellength,
  colour = pref_group))+
  geom_point(position = "jitter", alpha=0.7)+
  geom_smooth(method = "lm", color = "black", alpha = 0.3, size = 2)+
  theme_classic()+
  scale_colour_brewer(palette = "Paired")+
  easy_add_legend_title("Preference Group")+
  labs(x = "Preference Ratings for Physical Attractiveness", y = "Relationship Length (months)", title = "Correlation between relationship length and attractiveness preferences")



print(p3)
## `geom_smooth()` using formula 'y ~ x'

Just looking at it from face value, there appears to be a very small negative slope- this could indicate a potential (weak) negative correlation.

Running a Pearson’s product-moment correlation

cor.test(clean_data3$rellength, clean_data3$ideal_physatt)
## 
##  Pearson's product-moment correlation
## 
## data:  clean_data3$rellength and clean_data3$ideal_physatt
## t = -2.7878, df = 7038, p-value = 0.005321
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05652850 -0.00985974
## sample estimates:
##         cor 
## -0.03321222

Interpreting the results

There was a very weak, negative correlation between relationship length and preferences for physical attractiveness (r(7038) = -.03; p < 0.05).

My answer to the research question

The research question was whether the biases that are associated with physical attraction, those biases that drive the power of physical appearances, last? Does a relationship between preferences for physical attractiveness and relationship length exist?

The results provide mixed evidence.

I hypothesised that individuals who prioritised physical attractiveness more (ie. have higher ratings) would be more likely to hold positive illusions of their partner and therefore, would likely have longer relationship lengths relative to individuals with lower ratings for physical attractiveness.

The independent samples t-test found that on average, individuals with higher preference ratings for physical attractiveness had shorter relationship lengths than individuals with low preference ratings. However, the Pearson’s product-moment correlation found a very weak, negative correlation between relationship length and preferences for physical attractiveness.

Quite the opposite of my hypothesis, really. I’m not too sure what conclusions to make based on the findings- it’s very interesting that there is a statistically significant difference but a weak correlationship!

Strengths

  • I’ve been able to finish my entire vr report!! In saying that, I have been neglecting my other assignments (cough cough psyc3311 cough) so I have to catch up with that. I’ve finished approximately 10% of my report and it’s due on Monday so…doing great!

Challenges

  • A major challenge I’m facing is fixing the format of the pdf version of my vr report. I’m slowly figuring it out bit by bit though, which is good!

Questions for Jenny

  • I’ve been able to knit my entire vr report to pdf however, the formatting is a bit screwed up (ie. code running off the page, weird page breaks). I’ve looked online and I found a solution for that issue with the code running off the page but I haven’t been able to find a way to add page breaks manually. Do you have any ideas?

Also, I just wanted to say thank you for all of your help so far! I really appreciate it

emo::ji("smile")
## 😄