To finish the mean and SD’s table for mate preferences (continued from last week).
To finish my first question of the exploratory analysis section.
Loading the relevant packages and the data set
library(knitr)
library(ggplot2)
library(tidyr)
library(ggeasy)
library(qwraps2)
library(tidyverse)
library(janitor)
library(kableExtra)
library(Hmisc)
sdmp <- read_csv("ReplicationProcessedfinaldata04202018.csv")
Viewing the data
glimpse(sdmp)
## Rows: 14,399
## Columns: 35
## $ PIN <dbl> 12506, 1997, 1448, 10625, 6106, 4078, 3034, 5281, 1…
## $ CIN <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ continent <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "…
## $ country <chr> "Algeria", "Algeria", "Algeria", "Algeria", "Algeri…
## $ city <chr> "Algiers", "Algiers", "Setif", "Setif", "Algiers", …
## $ countrycode <dbl> 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,…
## $ partnum <chr> "85", "72", "277", "229", "23", "82", "86", "135", …
## $ partcode <chr> "A85", "A72", "SB277", "S229", "A23", "A82", "A86",…
## $ sample <dbl> 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, …
## $ sex <dbl> 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ age <dbl> 21, 22, 47, 20, 23, 20, 22, 27, 19, 19, 19, 28, 22,…
## $ religious <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ religion <chr> "islam", "islam", "Islam", "Islam", "islam", "islam…
## $ relstat <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ relstat2 <dbl> 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 3, …
## $ rellength <dbl> 28, NA, 307, NA, NA, 14, 14, 9, NA, NA, NA, 14, NA,…
## $ ideal_intelligence <dbl> 3, 7, 5, 4, 7, 6, 6, 7, 5, 5, 4, 6, 6, 7, 6, 7, 4, …
## $ ideal_kindness <dbl> 7, 7, 5, 7, 7, 7, 7, 7, 7, 5, 7, 6, 7, 7, 7, 7, 5, …
## $ ideal_health <dbl> 6, 7, 5, 7, 7, 7, 7, 7, 5, 7, 7, 6, 7, 7, 6, 4, 5, …
## $ ideal_physatt <dbl> 4, 7, 5, 7, 7, 7, 7, 6, 7, 5, 5, 6, 6, 7, 6, 2, 6, …
## $ ideal_resources <dbl> 1, 6, 5, 4, 7, 7, 7, 6, 4, 5, 4, 6, 5, 7, 7, 1, 4, …
## $ mate_age <dbl> 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 19, 19,…
## $ popsize <dbl> 39667, 39667, 39667, 39667, 39667, 39667, 39667, 39…
## $ country_religion <chr> "Muslim", "Muslim", "Muslim", "Muslim", "Muslim", "…
## $ lattitude <dbl> 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,…
## $ gem1995 <dbl> 0.266, 0.266, 0.266, 0.266, 0.266, 0.266, 0.266, 0.…
## $ gdi1995 <dbl> 0.508, 0.508, 0.508, 0.508, 0.508, 0.508, 0.508, 0.…
## $ gii <dbl> 0.429, 0.429, 0.429, 0.429, 0.429, 0.429, 0.429, 0.…
## $ gdi2015 <dbl> 0.854, 0.854, 0.854, 0.854, 0.854, 0.854, 0.854, 0.…
## $ gggi <dbl> 0.642, 0.642, 0.642, 0.642, 0.642, 0.642, 0.642, 0.…
## $ gdp_percap <dbl> 15100, 15100, 15100, 15100, 15100, 15100, 15100, 15…
## $ infect_death <dbl> 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7…
## $ infect_yll <dbl> 406.6, 406.6, 406.6, 406.6, 406.6, 406.6, 406.6, 40…
## $ cmc_yll <dbl> 2039.5, 2039.5, 2039.5, 2039.5, 2039.5, 2039.5, 203…
## $ gb_path <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
Recoding the sex variable
recode_sdmp <- sdmp %>% mutate(sex=recode(sex,
`0`="female", # old name = new name
`1`="male"))
Converting sex into a factor
recode_sdmp$sex <- as.factor(recode_sdmp$sex)
Creating the table
one_summary <-
list("Ideal Intelligence" =
list("mean" = ~ mean(ideal_intelligence, na.rm = TRUE),
"SD" = ~ sd(ideal_intelligence, na.rm = TRUE)),
"Ideal Kindness" =
list("mean" = ~ mean(ideal_kindness, na.rm = TRUE),
"SD" = ~ sd(ideal_kindness, na.rm = TRUE)),
"Ideal Resources" =
list("mean" = ~ mean(ideal_resources, na.rm = TRUE),
"SD" = ~ sd(ideal_resources, na.rm = TRUE)),
"Ideal Health" =
list("mean" = ~ mean(ideal_health, na.rm = TRUE),
"SD" = ~ sd(ideal_health, na.rm = TRUE)),
"Ideal Physical Attractiveness" =
list("mean" = ~ mean(ideal_physatt, na.rm = TRUE),
"SD" = ~ sd(ideal_physatt, na.rm = TRUE)))
orig_opt <- options()$qwraps2_markup
options(qwraps2_markup = "markdown")
grouped_by_table <-
summary_table(recode_sdmp, one_summary, by = "sex")
grouped_by_table
| female (N = 7909) | male (N = 6490) | |
|---|---|---|
| Ideal Intelligence | ||
| mean | 6.0282815472416 | 5.91725841309046 |
| SD | 0.932439587231402 | 1.00927079103205 |
| Ideal Kindness | ||
| mean | 6.23471975653056 | 6.12322201607916 |
| SD | 0.97985632548865 | 1.02049449178335 |
| Ideal Resources | ||
| mean | 5.48070866141732 | 5.10598933835058 |
| SD | 1.13956885008914 | 1.25039439958138 |
| Ideal Health | ||
| mean | 6.09952968094572 | 6.00201082753287 |
| SD | 1.03037199371806 | 1.09602213495145 |
| Ideal Physical Attractiveness | ||
| mean | 5.5562409765061 | 5.85487634157723 |
| SD | 1.11345448030067 | 1.10209957082629 |
Good news- I was able to make it much better looking, in terms of having seperate columns and headings.
Bad news- I’m unable to round it to two decimal places! I’ve tried every trick in the book and at this point, I’m not so sure that the round function (or any alternative method) is compatible with the summary_table() from the qwraps2 package.
Okay now on to my exploratory stuff.
Question 1: As you get older, do you prioritise qualities that reflect similarity and companionship (ie. kindness, intelligence), as opposed to more superficial things (ie. physical attractiveness)?
In the paper, the authors noted that participants completed a 5-item questionnaire on ideal mate preferences for a long-term romantic partner.
It might be a silly question but I am curious if differences exist between young people and older people in terms of the relative weight and importance they place on qualities they seek in a long-term romantic partner. I’ve heard my parents say variations of the phrase “beauty fades” etcetera, and it’s always led to me to believe that as you get older, you develop greater maturity and start to value less superficial qualities in place of more redeeming and character-based qualities.
However, the findings of the study somewhat contradict that notion, as it was found that men have higher preferences for physical attractiveness relative to women, and they increasingly reported younger mate age preferences relative to their own age. This somewhat presents the idea that men tend to value more superficial qualities throughout their lifetime.
I just really quickly want to say that I don’t believe all men are superficial or incapable of valuing traits such as kindness etc. I’m referring to “men” in a similar sense that the authors did, in terms of them as a collective- I hope that makes sense!
In terms of how I intend to examine this, I think grouping men and women by their age (ie. young, middle-aged, or old) is a good starting point. From there, I can see the differences between rated preferences and how it may differ based on the participants age.
I’ll mainly be looking at preferences for kindness and intelligence vs physical attractiveness, as the former two delve a bit more deeply into one’s character. Preferences for ideal health is kind of a baseline, I think, as everyone would hope to live a long life with their partner. I also won’t include financial prospects because it’s kind of irrelevant to my research question.
Let’s get cracking!
I can continue to work off of ‘the recode_sdmp’ data set, as I’ve already used the recode() function to relabel the variable values. Now I just have to create seperate age groups.
Now I’m going on to “clean” the data set and only select those variables of interest.
clean_data <- recode_sdmp %>%
select(age, sex, starts_with("ideal_")) %>%
na.omit()
Now seeing the new dataframe I created
glimpse(clean_data)
## Rows: 13,898
## Columns: 7
## $ age <dbl> 21, 22, 47, 20, 23, 20, 22, 27, 19, 19, 19, 28, 22,…
## $ sex <fct> male, male, female, male, male, male, male, female,…
## $ ideal_intelligence <dbl> 3, 7, 5, 4, 7, 6, 6, 7, 5, 5, 4, 6, 6, 7, 6, 7, 4, …
## $ ideal_kindness <dbl> 7, 7, 5, 7, 7, 7, 7, 7, 7, 5, 7, 6, 7, 7, 7, 7, 5, …
## $ ideal_health <dbl> 6, 7, 5, 7, 7, 7, 7, 7, 5, 7, 7, 6, 7, 7, 6, 4, 5, …
## $ ideal_physatt <dbl> 4, 7, 5, 7, 7, 7, 7, 6, 7, 5, 5, 6, 6, 7, 6, 2, 6, …
## $ ideal_resources <dbl> 1, 6, 5, 4, 7, 7, 7, 6, 4, 5, 4, 6, 5, 7, 7, 1, 4, …
Now creating new groups based on participant age.
data <- clean_data %>%
mutate(
age_group = dplyr::case_when(
age >= 18 & age <= 34 ~ "18-34",
age >= 35 & age <= 54 ~ "35-54",
age >= 55 ~ "> 55"
),
age_group = factor(
age_group,
level = c("18-34","35-54", "> 55")))
Viewing the data
glimpse(data)
## Rows: 13,898
## Columns: 8
## $ age <dbl> 21, 22, 47, 20, 23, 20, 22, 27, 19, 19, 19, 28, 22,…
## $ sex <fct> male, male, female, male, male, male, male, female,…
## $ ideal_intelligence <dbl> 3, 7, 5, 4, 7, 6, 6, 7, 5, 5, 4, 6, 6, 7, 6, 7, 4, …
## $ ideal_kindness <dbl> 7, 7, 5, 7, 7, 7, 7, 7, 7, 5, 7, 6, 7, 7, 7, 7, 5, …
## $ ideal_health <dbl> 6, 7, 5, 7, 7, 7, 7, 7, 5, 7, 7, 6, 7, 7, 6, 4, 5, …
## $ ideal_physatt <dbl> 4, 7, 5, 7, 7, 7, 7, 6, 7, 5, 5, 6, 6, 7, 6, 2, 6, …
## $ ideal_resources <dbl> 1, 6, 5, 4, 7, 7, 7, 6, 4, 5, 4, 6, 5, 7, 7, 1, 4, …
## $ age_group <fct> 18-34, 18-34, 35-54, 18-34, 18-34, 18-34, 18-34, 18…
Just looking at the tibble produced, I notice that a lot of participants seem to belong to the ‘young’ age group. In order to make it fair, I think it’s worth seeing how many participants belong to the young, middle-aged and old groups.
data %>%
group_by(age_group) %>%
summarise(count = n())
## # A tibble: 3 x 2
## age_group count
## <fct> <int>
## 1 18-34 10568
## 2 35-54 2999
## 3 > 55 331
I was right, there is significantly more young respondents than middle aged and old combined. I guess this is an issue however, it’s good that I’m aware of the differences so I know which age groups are getting more representation and I can factor that in when analysing the results.
Calculating the mean, SD, and SE for physical attractiveness
table_physatt <- data %>%
group_by(sex, age_group) %>%
summarise(
n = n(),
mean = round(mean(ideal_physatt), 2),
sd = round(sd(ideal_physatt), 2),
se = round(sd/sqrt(n), 2))
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
table_physatt %>%
kbl() %>%
kable_material("hover")
| sex | age_group | n | mean | sd | se |
|---|---|---|---|---|---|
| female | 18-34 | 5772 | 5.56 | 1.10 | 0.01 |
| female | 35-54 | 1628 | 5.57 | 1.16 | 0.03 |
| female | > 55 | 156 | 5.36 | 1.12 | 0.09 |
| male | 18-34 | 4796 | 5.86 | 1.08 | 0.02 |
| male | 35-54 | 1371 | 5.84 | 1.09 | 0.03 |
| male | > 55 | 175 | 5.91 | 1.21 | 0.09 |
Middle aged females have the highest mean preferences for physical attractiveness, however they also have the largest deviation. Young females have the second highest mean preferences, followed by older females.
In contrast, older males have the highest mean preferences for physical attractiveness and the greatest deviation. Young males have the second highest mean preferences, followed by middle aged males.
Now onto the more character-based qualities…
Calculating average rating and standard deviation for kindness
table_physatt <- data %>%
group_by(sex, age_group) %>%
summarise(
mean = round(mean(ideal_kindness), 2),
sd = round(sd(ideal_physatt), 2)
)
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
table_physatt %>%
kbl() %>%
kable_material("hover")
| sex | age_group | mean | sd |
|---|---|---|---|
| female | 18-34 | 6.21 | 1.10 |
| female | 35-54 | 6.31 | 1.16 |
| female | > 55 | 6.35 | 1.12 |
| male | 18-34 | 6.09 | 1.08 |
| male | 35-54 | 6.23 | 1.09 |
| male | > 55 | 6.27 | 1.21 |
Older females had the highest mean preference for kindness, followed by middle-aged females. Young females have the lowest preference for kindness and the lowest deviation.
Older males have the highest mean preference for kindness and the greatest deviation. Middle aged males have the second highest mean preferences, followed by young males.
In general, there isn’t much a difference between the mean scores for males and females of all age groups. Mean preferences for females tend to cluster around 6.29, whereas for males, it tends to cluster around 6.20.
Calculating average rating and standard deviation for intelligence
table_physatt <- data %>%
group_by(sex, age_group) %>%
summarise(
mean = round(mean(ideal_intelligence), 2),
sd = round(sd(ideal_physatt), 2)
)
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
table_physatt %>%
kbl() %>%
kable_material("hover")
| sex | age_group | mean | sd |
|---|---|---|---|
| female | 18-34 | 6.01 | 1.10 |
| female | 35-54 | 6.11 | 1.16 |
| female | > 55 | 6.13 | 1.12 |
| male | 18-34 | 5.89 | 1.08 |
| male | 35-54 | 6.00 | 1.09 |
| male | > 55 | 5.95 | 1.21 |
Older females have the highest mean preference for intelligence, followed by middle aged females, and then young females.
Middle aged males have the highest mean preference for intelligence, followed by older males, and then young males.
I’m now creating a larger table of this so it’s easier to view and compare the results!
one_summary <-
list("Ideal Intelligence" =
list("mean" = ~ mean(ideal_intelligence),
"SD" = ~ sd(ideal_intelligence)),
"Ideal Kindness" =
list("mean" = ~ mean(ideal_kindness),
"SD" = ~ sd(ideal_kindness)),
"Ideal Physical Attractiveness" =
list("mean" = ~ mean(ideal_physatt),
"SD" = ~ sd(ideal_physatt)))
orig_opt <- options()$qwraps2_markup
options(qwraps2_markup = "markdown")
grouped_by_table <-
summary_table(data, summaries = one_summary, by = c("sex", "age_group"))
grouped_by_table
| female.18-34 (N = 5772) | male.18-34 (N = 4796) | female.35-54 (N = 1628) | male.35-54 (N = 1371) | female.> 55 (N = 156) | male.> 55 (N = 175) | |
|---|---|---|---|---|---|---|
| Ideal Intelligence | ||||||
| mean | 6.00710325710326 | 5.8942869057548 | 6.11179361179361 | 6.00145878920496 | 6.13461538461539 | 5.95428571428571 |
| SD | 0.926394266089267 | 1.00108423611776 | 0.940665512182957 | 0.997074949200073 | 0.937302501943147 | 1.02731176286855 |
| Ideal Kindness | ||||||
| mean | 6.20963270963271 | 6.09007506255213 | 6.30528255528256 | 6.23121808898614 | 6.34615384615385 | 6.27428571428571 |
| SD | 0.981755631934942 | 1.02309999113568 | 0.980220667435759 | 0.965727994298508 | 0.968229819118515 | 0.985044154778714 |
| Ideal Physical Attractiveness | ||||||
| mean | 5.55751905751906 | 5.86447039199333 | 5.57371007371007 | 5.83807439824945 | 5.35897435897436 | 5.91428571428571 |
| SD | 1.09597518846996 | 1.08395307845261 | 1.15863843304716 | 1.08944924021985 | 1.11845929713079 | 1.21227980750786 |
Now let’s try plotting (basic) boxplots! I want to see if I can get the skeleton right first, before I add all the pretty aesthetic features.
Preferences for Kindness
p1 <- ggplot(data = data, mapping = aes(
x = age_group,
y = ideal_kindness,
fill = sex)) +
geom_boxplot(ylim = range(0:10)) +
stat_summary(fun = mean,
geom = "point",
size = 2,
color = "steelblue") +
theme_classic()+
facet_wrap(~sex)+
labs(title = "Preferences for Ideal Kindness", x = "Age Group", y = "")+
coord_cartesian(ylim = c(0, 10))
## Warning: Ignoring unknown parameters: ylim
print(p1)
That doesn’t look right at all. Granted, it’s fairly simple code but still. I’ll try with another ideal preference variable.
Preferences for Physical Attractiveness
p1 <- ggplot(data = data, mapping = aes(
x = age_group,
y = ideal_physatt,
fill = sex)) +
geom_boxplot(ylim = range(0:10)) +
stat_summary(fun = mean,
geom = "point",
size = 2,
color = "steelblue") +
theme_classic()+
facet_wrap(~sex)+
labs(title = "Preferences for Ideal Physical Attractiveness", x = "Age Group", y = "")+
coord_cartesian(ylim = c(0, 10))
## Warning: Ignoring unknown parameters: ylim
print(p1)
Okay, it happened again. Maybe there’s something wrong with the data? or the plot doesn’t fit with the data? Not sure.
Just in case, I’ll try another another form of plot. From what I know, there are a few options when you have a categorical and a numeric variable. One of the ones that was mentioned online was a violin plot. I’ve never created one before but I might try.
I’m not quite sure how to interpret them. After searching online, I found this helpful resource online that compactly summarises violin plots.
“Violin plots are used when you want to observe the distribution of numeric data, and are especially useful when you want to make a comparison of distributions between multiple groups. The peaks, valleys, and tails of each group’s density curve can be compared to see where groups are similar or different.”
Now I’m working on a basic plot just to see if it works.
Ideal Physical Attractiveness
p2 <- ggplot(data, aes(x = age_group, y = ideal_physatt, fill = sex)) +
geom_violin() +
facet_wrap(~sex)+
theme_classic()+
labs(title = "Preferences for Ideal Physical Attractiveness", x = "Age Group", y = "")
print(p2)
Ideal Kindness
p1 <- ggplot(data, mapping = aes(x = age_group, y = ideal_kindness, fill = sex)) +
geom_violin()+
facet_wrap(~sex)+
theme_classic()+
labs(title = "Preferences for Ideal Kindness", x = "Age Group", y = "")
print(p1)
Ideal Intelligence
p3 <- ggplot(data, aes(x = age_group, y = ideal_intelligence, fill = sex)) +
geom_violin() +
facet_wrap(~sex)+
theme_classic()+
labs(title = "Preferences for Ideal Intelligence", x = "Age Group", y = "")
print(p3)
Is there a way to use the round() function within the table?
Do our answers to the exploratory questions have to be straight forward? ie. yes/no.
I was able to get a lot done so far and I think it comes from excitement to work with the data and try and find answers to questions I have. So I’m hoping I can keep that up :)
I’ve been quite good with searching for answers and documenting my progress. I’m grateful that the earlier part of the verification task was group work because a lot of what I’ve learnt has come from working with others- so I’m grateful for that.
Finish up question 1.
Work on my second question: Are female preference ratings for their ideal mate’s financial prospects influenced/mediated by gender equality?