To finish commenting on my code for question 3!
Finish the minor things (ie. check spelling etcetera.)
To successfully knit my vr report into pdf format.
My third exploratory question is focused on examining whether there’s a relationship between preferences for physical attractiveness and relationship length?
It wasn’t apparent what the DV is for relationship length (ie. days? years?). I looked at the supplemental material and found some indication that it referred to months.
“Participants in this sample were M = 49.86 years old on average (SD = 14.48) and were in their relationships for Mdn = 156 months” (line 199-200 from Walter et al’s (2020) supplemental materials)
Load the relevant packages needed
library(tidyverse)
library(ggplot2)
library(ggpubr)
library(tidyr)
library(ggeasy)
library(gt)
library(jmv)
library(rstatix)
library(dplyr)
library(nortest)
library(kableExtra)
library(emo)
Loading in the data and viewing it
The function read_csv() from the ‘readr package’ is being used to read the data, which is being saved as ‘sdmp’ to represent the title of the paper (“Sex Differences in Mate Preferences”)
sdmp <- read_csv("ReplicationProcessedfinaldata04202018.csv")
glimpse(sdmp)
## Rows: 14,399
## Columns: 35
## $ PIN <dbl> 12506, 1997, 1448, 10625, 6106, 4078, 3034, 5281, 1…
## $ CIN <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ continent <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "…
## $ country <chr> "Algeria", "Algeria", "Algeria", "Algeria", "Algeri…
## $ city <chr> "Algiers", "Algiers", "Setif", "Setif", "Algiers", …
## $ countrycode <dbl> 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,…
## $ partnum <chr> "85", "72", "277", "229", "23", "82", "86", "135", …
## $ partcode <chr> "A85", "A72", "SB277", "S229", "A23", "A82", "A86",…
## $ sample <dbl> 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, …
## $ sex <dbl> 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ age <dbl> 21, 22, 47, 20, 23, 20, 22, 27, 19, 19, 19, 28, 22,…
## $ religious <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ religion <chr> "islam", "islam", "Islam", "Islam", "islam", "islam…
## $ relstat <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ relstat2 <dbl> 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 3, …
## $ rellength <dbl> 28, NA, 307, NA, NA, 14, 14, 9, NA, NA, NA, 14, NA,…
## $ ideal_intelligence <dbl> 3, 7, 5, 4, 7, 6, 6, 7, 5, 5, 4, 6, 6, 7, 6, 7, 4, …
## $ ideal_kindness <dbl> 7, 7, 5, 7, 7, 7, 7, 7, 7, 5, 7, 6, 7, 7, 7, 7, 5, …
## $ ideal_health <dbl> 6, 7, 5, 7, 7, 7, 7, 7, 5, 7, 7, 6, 7, 7, 6, 4, 5, …
## $ ideal_physatt <dbl> 4, 7, 5, 7, 7, 7, 7, 6, 7, 5, 5, 6, 6, 7, 6, 2, 6, …
## $ ideal_resources <dbl> 1, 6, 5, 4, 7, 7, 7, 6, 4, 5, 4, 6, 5, 7, 7, 1, 4, …
## $ mate_age <dbl> 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 19, 19,…
## $ popsize <dbl> 39667, 39667, 39667, 39667, 39667, 39667, 39667, 39…
## $ country_religion <chr> "Muslim", "Muslim", "Muslim", "Muslim", "Muslim", "…
## $ lattitude <dbl> 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,…
## $ gem1995 <dbl> 0.266, 0.266, 0.266, 0.266, 0.266, 0.266, 0.266, 0.…
## $ gdi1995 <dbl> 0.508, 0.508, 0.508, 0.508, 0.508, 0.508, 0.508, 0.…
## $ gii <dbl> 0.429, 0.429, 0.429, 0.429, 0.429, 0.429, 0.429, 0.…
## $ gdi2015 <dbl> 0.854, 0.854, 0.854, 0.854, 0.854, 0.854, 0.854, 0.…
## $ gggi <dbl> 0.642, 0.642, 0.642, 0.642, 0.642, 0.642, 0.642, 0.…
## $ gdp_percap <dbl> 15100, 15100, 15100, 15100, 15100, 15100, 15100, 15…
## $ infect_death <dbl> 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7…
## $ infect_yll <dbl> 406.6, 406.6, 406.6, 406.6, 406.6, 406.6, 406.6, 40…
## $ cmc_yll <dbl> 2039.5, 2039.5, 2039.5, 2039.5, 2039.5, 2039.5, 203…
## $ gb_path <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
Creating a new, “clean” data frame with the variables of interest
The select() function is being used to select the variables we want to use. Additionally, the na.omit() function is being used to exclude all incomplete cases/missing values from the dataframe.
clean_data3 <- sdmp %>%
select(ideal_physatt, rellength, sex) %>%
na.omit()
Viewing the new dataframe created
Glimpse() is being used to present a somewhat more compact view of the data, allowing us to see each of the columns (ie. variables) in the data frame and their data type. This is helpful as we can then check to see whether our code above worked and has only included the select variables.
glimpse(clean_data3)
## Rows: 7,135
## Columns: 3
## $ ideal_physatt <dbl> 4, 5, 7, 7, 6, 6, 6, 7, 7, 7, 7, 7, 7, 5, 5, 7, 7, 7, 7,…
## $ rellength <dbl> 28, 307, 14, 14, 9, 14, 4, 51, 64, 36, 27, 41, 50, 34, 3…
## $ sex <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,…
Okay so just glancing at the data, I notice that some individuals have reported ‘0’ for their relationship length. I’m not sure whether this reflects a mistake on the participant’s part, or whether it means that they’re not in a relationship- in any case, I think it would be wise to remove all cases of ‘0’.
Filtering the data to exclude all ‘0’ values for relationship length
The filter() function from the dplyr package is being used to remove all ‘0’ values within the relationship length variable. The glimpse() function is being used after in order to view the data and double check if it’s worked.
clean_data3 <- clean_data3 %>%
dplyr::filter(rellength != "0") # "!x' is used to denote 'not x'
glimpse(clean_data3)
## Rows: 7,040
## Columns: 3
## $ ideal_physatt <dbl> 4, 5, 7, 7, 6, 6, 6, 7, 7, 7, 7, 7, 7, 5, 5, 7, 7, 7, 7,…
## $ rellength <dbl> 28, 307, 14, 14, 9, 14, 4, 51, 64, 36, 27, 41, 50, 34, 3…
## $ sex <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,…
Creating high and low preference groups
The mutate() function is being used to create a new variable- ‘pref_group’, with the case_when() function from the ‘dplyr’ package specifying vectors of ratings that will then form the preference groups. The as.factor() function is then being used to change the ‘pref_group’ variable into a factor.
group_pref <- clean_data3 %>%
mutate(
pref_group = dplyr::case_when(
ideal_physatt <= 4 ~ "Low",
ideal_physatt >= 5 ~ "High"))
group_pref$pref_group <- as.factor(group_pref$pref_group)
Viewing the new dataframe created
Glimpse() is being used to present a somewhat more compact view of the data, allowing us to see each of the columns (ie. variables) in the data frame and their data type. This is helpful as we can then check to see whether our code above worked and the new ‘pref_group’ variable is there, and whether its been converted into a factor.
glimpse(group_pref)
## Rows: 7,040
## Columns: 4
## $ ideal_physatt <dbl> 4, 5, 7, 7, 6, 6, 6, 7, 7, 7, 7, 7, 7, 5, 5, 7, 7, 7, 7,…
## $ rellength <dbl> 28, 307, 14, 14, 9, 14, 4, 51, 64, 36, 27, 41, 50, 34, 3…
## $ sex <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,…
## $ pref_group <fct> Low, High, High, High, High, High, High, High, High, Hig…
Calculating the mean relationship length (in months) for high vs low pref group
This code calculates the mean preference rating for ‘rellength’ (ie. relationship length) for individuals who had high or low preference ratings for physical attractiveness. The group_by() function was used to define the variables of interest as ‘pref_group’, and summarise() was used to calculate summary statistics (ie. n, mean, SD, min, and max) for relationship length. The round() function from base r was also used to specify the number of decimal places desired (ie. 2).
table_grouppref <- group_pref %>%
group_by(pref_group) %>%
summarise(
n = n(),
Mean = round(mean(rellength), 2),
SD = round(sd(rellength), 2),
Min = min(rellength),
Max = max(rellength))
table_grouppref %>%
kbl() %>%
kable_material("striped", "hover")
| pref_group | n | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| High | 6132 | 88.08 | 103.28 | 0.50 | 725 |
| Low | 908 | 99.15 | 110.84 | 0.25 | 544 |
Running a t-test
I wanted to use a t-test in order to determine whether there was a statistically significant difference between the relationship lengths of those who had high or low preference ratings for physical attractiveness. The null hypothesis is that the average relationship lengths are equal, and the alternative hypothesis is that they’re not.
The filter function is being used to create seperate dataframes for the high and low preference rating groups. Then I’m using the t.test() function to run a Welch Two Sample t-test, looking at the differences between the preference groups in regards to their relationship length.
highpref <- group_pref %>%
filter(pref_group == "High")
lowpref <- group_pref %>%
filter(pref_group == "Low")
t.test(highpref$rellength, lowpref$rellength)
##
## Welch Two Sample t-test
##
## data: highpref$rellength and lowpref$rellength
## t = -2.8314, df = 1152.4, p-value = 0.004715
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.731177 -3.397329
## sample estimates:
## mean of x mean of y
## 88.08195 99.14620
Interpreting the findings
It was hypothesised that individuals who prioritise physical attractiveness more (ie. have higher ratings) are more likely to hold positive illusions of their partner and therefore, are likely to have longer relationship lengths relative to individuals who have lower ratings for physical attractiveness.
Relationship lengths differed significantly according to Welch’s t-test, t(1152.40) = -2.8314, p < 0.05. On average, individuals who had high preference ratings for physical attractiveness had a relationship length of 88.08 months (SD = 103.28) whereas individuals who had low preference ratings had a relationship length of 99.15 months (SD = 110.84). The 95% confidence interval for effect of preference ratings for physical attractiveness on relationship length is between -18.73 and -3.40 percent. These results support the hypothesis.
I noticed the negative numbers above so it made me wonder if there’s a negative correlation perhaps. In order to see if a linear relationship exists, I’m now creating a scatterplot so I can inspect the distribution of data points.
Creating a scatterplot
The ‘group_pref’ dataframe is being used, with the aesthetic mappings specifying the x axis variable as ‘ideal_physatt’ (ie. physical attractiveness preferences) and the y axis variable as ‘rellength’ (ie. relationship length). Colour is also used to specify that the preferences should be differentiated based on preference group (ie. high or low), enabling us to better see the differences in relationship length between the groups.
Geom_point() is used to create a scatterplot, with the argument ‘positions = jitter’ being added so that the individual data points can be seen a bit more. Geom_smooth() is also adding a smoothing line, which enables us to more clearly see patterns within the plot. Method = “lm” is being used to fit a linear model (ie. a straight line), with size specifying the weighting of the smoothing line and colour specifying that it should be black.
The theme is specified with theme_classic(), with the scale_colour_brewer() function from the ‘ggpplot2’ package specifying the colour. The labs() function is being used to rename the x and y axis labels and a legend title is being added with the easy_add_legend_title() function from ggeasy.
p3 <- ggplot(data = group_pref, mapping = aes(
x = ideal_physatt,
y = rellength,
colour = pref_group))+
geom_point(position = "jitter", alpha=0.7)+
geom_smooth(method = "lm", color = "black", alpha = 0.3, size = 2)+
theme_classic()+
scale_colour_brewer(palette = "Paired")+
easy_add_legend_title("Preference Group")+
labs(x = "Preference Ratings for Physical Attractiveness", y = "Relationship Length (months)", title = "Correlation between relationship length and attractiveness preferences")
print(p3)
## `geom_smooth()` using formula 'y ~ x'
Just looking at it from face value, there appears to be a very small negative slope- this could indicate a potential (weak) negative correlation.
Running a Pearson’s product-moment correlation
cor.test(clean_data3$rellength, clean_data3$ideal_physatt)
##
## Pearson's product-moment correlation
##
## data: clean_data3$rellength and clean_data3$ideal_physatt
## t = -2.7878, df = 7038, p-value = 0.005321
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05652850 -0.00985974
## sample estimates:
## cor
## -0.03321222
Interpreting the results
There was a very weak, negative correlation between relationship length and preferences for physical attractiveness (r(7038) = -.03; p < 0.05).
My answer to the research question
The research question was whether the biases that are associated with physical attraction, those biases that drive the power of physical appearances, last? Does a relationship between preferences for physical attractiveness and relationship length exist?
The results provide mixed evidence.
I hypothesised that individuals who prioritised physical attractiveness more (ie. have higher ratings) would be more likely to hold positive illusions of their partner and therefore, would likely have longer relationship lengths relative to individuals with lower ratings for physical attractiveness.
The independent samples t-test found that on average, individuals with higher preference ratings for physical attractiveness had shorter relationship lengths than individuals with low preference ratings. However, the Pearson’s product-moment correlation found a very weak, negative correlation between relationship length and preferences for physical attractiveness.
Quite the opposite of my hypothesis, really. I’m not too sure what conclusions to make based on the findings- it’s very interesting that there is a statistically significant difference but a weak correlationship!
Also, I just wanted to say thank you for all of your help so far! I really appreciate it
emo::ji("smile")
## 😄