My goals for this week

  1. To finish the mean and SD’s table for mate preferences (continued from last week).

  2. To finish my first question of the exploratory analysis section.

How did I go?

Goal 1: creating a table

Loading the relevant packages and the data set

library(knitr)
library(ggplot2)
library(tidyr)
library(ggeasy)
library(qwraps2)
library(tidyverse)
library(janitor)
library(kableExtra)
library(Hmisc)

sdmp <- read_csv("ReplicationProcessedfinaldata04202018.csv")  

Viewing the data

glimpse(sdmp)
## Rows: 14,399
## Columns: 35
## $ PIN                <dbl> 12506, 1997, 1448, 10625, 6106, 4078, 3034, 5281, 1…
## $ CIN                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ continent          <chr> "Africa", "Africa", "Africa", "Africa", "Africa", "…
## $ country            <chr> "Algeria", "Algeria", "Algeria", "Algeria", "Algeri…
## $ city               <chr> "Algiers", "Algiers", "Setif", "Setif", "Algiers", …
## $ countrycode        <dbl> 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,…
## $ partnum            <chr> "85", "72", "277", "229", "23", "82", "86", "135", …
## $ partcode           <chr> "A85", "A72", "SB277", "S229", "A23", "A82", "A86",…
## $ sample             <dbl> 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, …
## $ sex                <dbl> 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ age                <dbl> 21, 22, 47, 20, 23, 20, 22, 27, 19, 19, 19, 28, 22,…
## $ religious          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ religion           <chr> "islam", "islam", "Islam", "Islam", "islam", "islam…
## $ relstat            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ relstat2           <dbl> 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 3, …
## $ rellength          <dbl> 28, NA, 307, NA, NA, 14, 14, 9, NA, NA, NA, 14, NA,…
## $ ideal_intelligence <dbl> 3, 7, 5, 4, 7, 6, 6, 7, 5, 5, 4, 6, 6, 7, 6, 7, 4, …
## $ ideal_kindness     <dbl> 7, 7, 5, 7, 7, 7, 7, 7, 7, 5, 7, 6, 7, 7, 7, 7, 5, …
## $ ideal_health       <dbl> 6, 7, 5, 7, 7, 7, 7, 7, 5, 7, 7, 6, 7, 7, 6, 4, 5, …
## $ ideal_physatt      <dbl> 4, 7, 5, 7, 7, 7, 7, 6, 7, 5, 5, 6, 6, 7, 6, 2, 6, …
## $ ideal_resources    <dbl> 1, 6, 5, 4, 7, 7, 7, 6, 4, 5, 4, 6, 5, 7, 7, 1, 4, …
## $ mate_age           <dbl> 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 19, 19,…
## $ popsize            <dbl> 39667, 39667, 39667, 39667, 39667, 39667, 39667, 39…
## $ country_religion   <chr> "Muslim", "Muslim", "Muslim", "Muslim", "Muslim", "…
## $ lattitude          <dbl> 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,…
## $ gem1995            <dbl> 0.266, 0.266, 0.266, 0.266, 0.266, 0.266, 0.266, 0.…
## $ gdi1995            <dbl> 0.508, 0.508, 0.508, 0.508, 0.508, 0.508, 0.508, 0.…
## $ gii                <dbl> 0.429, 0.429, 0.429, 0.429, 0.429, 0.429, 0.429, 0.…
## $ gdi2015            <dbl> 0.854, 0.854, 0.854, 0.854, 0.854, 0.854, 0.854, 0.…
## $ gggi               <dbl> 0.642, 0.642, 0.642, 0.642, 0.642, 0.642, 0.642, 0.…
## $ gdp_percap         <dbl> 15100, 15100, 15100, 15100, 15100, 15100, 15100, 15…
## $ infect_death       <dbl> 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7.8, 7…
## $ infect_yll         <dbl> 406.6, 406.6, 406.6, 406.6, 406.6, 406.6, 406.6, 40…
## $ cmc_yll            <dbl> 2039.5, 2039.5, 2039.5, 2039.5, 2039.5, 2039.5, 203…
## $ gb_path            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Recoding the sex variable

recode_sdmp <- sdmp %>% mutate(sex=recode(sex, 
                         `0`="female",                   # old name = new name
                         `1`="male")) 

Converting sex into a factor

recode_sdmp$sex <- as.factor(recode_sdmp$sex)

Creating the table

one_summary <-
   list("Ideal Intelligence" =
         list("mean" = ~ mean(ideal_intelligence, na.rm = TRUE),
               "SD" = ~ sd(ideal_intelligence, na.rm = TRUE)),
        "Ideal Kindness" =
         list("mean" = ~ mean(ideal_kindness, na.rm = TRUE),
              "SD" = ~ sd(ideal_kindness, na.rm = TRUE)),
        "Ideal Resources" =
          list("mean" = ~ mean(ideal_resources, na.rm = TRUE),
               "SD" = ~ sd(ideal_resources, na.rm = TRUE)),
        "Ideal Health" =
          list("mean" = ~ mean(ideal_health, na.rm = TRUE),
               "SD" = ~ sd(ideal_health, na.rm = TRUE)),
        "Ideal Physical Attractiveness" =
          list("mean" = ~ mean(ideal_physatt, na.rm = TRUE),
               "SD" = ~ sd(ideal_physatt, na.rm = TRUE))) 
 
 orig_opt <- options()$qwraps2_markup
 options(qwraps2_markup = "markdown")
 
 grouped_by_table <-
   summary_table(recode_sdmp, one_summary, by = "sex")
 grouped_by_table
female (N = 7909) male (N = 6490)
Ideal Intelligence      
   mean 6.0282815472416 5.91725841309046
   SD 0.932439587231402 1.00927079103205
Ideal Kindness      
   mean 6.23471975653056 6.12322201607916
   SD 0.97985632548865 1.02049449178335
Ideal Resources      
   mean 5.48070866141732 5.10598933835058
   SD 1.13956885008914 1.25039439958138
Ideal Health      
   mean 6.09952968094572 6.00201082753287
   SD 1.03037199371806 1.09602213495145
Ideal Physical Attractiveness      
   mean 5.5562409765061 5.85487634157723
   SD 1.11345448030067 1.10209957082629

Good news- I was able to make it much better looking, in terms of having seperate columns and headings.

Bad news- I’m unable to round it to two decimal places! I’ve tried every trick in the book and at this point, I’m not so sure that the round function (or any alternative method) is compatible with the summary_table() from the qwraps2 package.

Goal 2: exploratory question 1

Okay now on to my exploratory stuff.

Question 1: As you get older, do you prioritise qualities that reflect similarity and companionship (ie. kindness, intelligence), as opposed to more superficial things (ie. physical attractiveness)?

In the paper, the authors noted that participants completed a 5-item questionnaire on ideal mate preferences for a long-term romantic partner.

It might be a silly question but I am curious if differences exist between young people and older people in terms of the relative weight and importance they place on qualities they seek in a long-term romantic partner. I’ve heard my parents say variations of the phrase “beauty fades” etcetera, and it’s always led to me to believe that as you get older, you develop greater maturity and start to value less superficial qualities in place of more redeeming and character-based qualities.

However, the findings of the study somewhat contradict that notion, as it was found that men have higher preferences for physical attractiveness relative to women, and they increasingly reported younger mate age preferences relative to their own age. This somewhat presents the idea that men tend to value more superficial qualities throughout their lifetime.

I just really quickly want to say that I don’t believe all men are superficial or incapable of valuing traits such as kindness etc. I’m referring to “men” in a similar sense that the authors did, in terms of them as a collective- I hope that makes sense!

In terms of how I intend to examine this, I think grouping men and women by their age (ie. young, middle-aged, or old) is a good starting point. From there, I can see the differences between rated preferences and how it may differ based on the participants age.

I’ll mainly be looking at preferences for kindness and intelligence vs physical attractiveness, as the former two delve a bit more deeply into one’s character. Preferences for ideal health is kind of a baseline, I think, as everyone would hope to live a long life with their partner. I also won’t include financial prospects because it’s kind of irrelevant to my research question.

Let’s get cracking!

I can continue to work off of ‘the recode_sdmp’ data set, as I’ve already used the recode() function to relabel the variable values. Now I just have to create seperate age groups.

Now I’m going on to “clean” the data set and only select those variables of interest.

clean_data <- recode_sdmp %>% 
  select(age, sex, starts_with("ideal_")) %>% 
  na.omit()

Now seeing the new dataframe I created

glimpse(clean_data)
## Rows: 13,898
## Columns: 7
## $ age                <dbl> 21, 22, 47, 20, 23, 20, 22, 27, 19, 19, 19, 28, 22,…
## $ sex                <fct> male, male, female, male, male, male, male, female,…
## $ ideal_intelligence <dbl> 3, 7, 5, 4, 7, 6, 6, 7, 5, 5, 4, 6, 6, 7, 6, 7, 4, …
## $ ideal_kindness     <dbl> 7, 7, 5, 7, 7, 7, 7, 7, 7, 5, 7, 6, 7, 7, 7, 7, 5, …
## $ ideal_health       <dbl> 6, 7, 5, 7, 7, 7, 7, 7, 5, 7, 7, 6, 7, 7, 6, 4, 5, …
## $ ideal_physatt      <dbl> 4, 7, 5, 7, 7, 7, 7, 6, 7, 5, 5, 6, 6, 7, 6, 2, 6, …
## $ ideal_resources    <dbl> 1, 6, 5, 4, 7, 7, 7, 6, 4, 5, 4, 6, 5, 7, 7, 1, 4, …

Now creating new groups based on participant age.

data <- clean_data %>% 
  mutate(
    age_group = dplyr::case_when(
      age >= 18 & age <= 34 ~ "18-34",
      age >= 35 & age <= 54 ~ "35-54",
      age >= 55             ~ "> 55"
    ),
    age_group = factor(
      age_group,
      level = c("18-34","35-54", "> 55")))

Viewing the data

glimpse(data)
## Rows: 13,898
## Columns: 8
## $ age                <dbl> 21, 22, 47, 20, 23, 20, 22, 27, 19, 19, 19, 28, 22,…
## $ sex                <fct> male, male, female, male, male, male, male, female,…
## $ ideal_intelligence <dbl> 3, 7, 5, 4, 7, 6, 6, 7, 5, 5, 4, 6, 6, 7, 6, 7, 4, …
## $ ideal_kindness     <dbl> 7, 7, 5, 7, 7, 7, 7, 7, 7, 5, 7, 6, 7, 7, 7, 7, 5, …
## $ ideal_health       <dbl> 6, 7, 5, 7, 7, 7, 7, 7, 5, 7, 7, 6, 7, 7, 6, 4, 5, …
## $ ideal_physatt      <dbl> 4, 7, 5, 7, 7, 7, 7, 6, 7, 5, 5, 6, 6, 7, 6, 2, 6, …
## $ ideal_resources    <dbl> 1, 6, 5, 4, 7, 7, 7, 6, 4, 5, 4, 6, 5, 7, 7, 1, 4, …
## $ age_group          <fct> 18-34, 18-34, 35-54, 18-34, 18-34, 18-34, 18-34, 18…

Just looking at the tibble produced, I notice that a lot of participants seem to belong to the ‘young’ age group. In order to make it fair, I think it’s worth seeing how many participants belong to the young, middle-aged and old groups.

data %>% 
  group_by(age_group) %>% 
  summarise(count = n())
## # A tibble: 3 x 2
##   age_group count
##   <fct>     <int>
## 1 18-34     10568
## 2 35-54      2999
## 3 > 55        331

I was right, there is significantly more young respondents than middle aged and old combined. I guess this is an issue however, it’s good that I’m aware of the differences so I know which age groups are getting more representation and I can factor that in when analysing the results.

Calculating the mean, SD, and SE for physical attractiveness

table_physatt <- data %>% 
  group_by(sex, age_group) %>% 
  summarise(
    n = n(),
    mean = round(mean(ideal_physatt), 2),
    sd = round(sd(ideal_physatt), 2),
    se = round(sd/sqrt(n), 2))
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
table_physatt %>% 
  kbl() %>% 
  kable_material("hover")
sex age_group n mean sd se
female 18-34 5772 5.56 1.10 0.01
female 35-54 1628 5.57 1.16 0.03
female > 55 156 5.36 1.12 0.09
male 18-34 4796 5.86 1.08 0.02
male 35-54 1371 5.84 1.09 0.03
male > 55 175 5.91 1.21 0.09

Middle aged females have the highest mean preferences for physical attractiveness, however they also have the largest deviation. Young females have the second highest mean preferences, followed by older females.

In contrast, older males have the highest mean preferences for physical attractiveness and the greatest deviation. Young males have the second highest mean preferences, followed by middle aged males.

Now onto the more character-based qualities…

Calculating average rating and standard deviation for kindness

table_physatt <- data %>% 
  group_by(sex, age_group) %>% 
  summarise(
    mean = round(mean(ideal_kindness), 2),
    sd = round(sd(ideal_physatt), 2)
  )
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
table_physatt %>% 
  kbl() %>% 
  kable_material("hover")
sex age_group mean sd
female 18-34 6.21 1.10
female 35-54 6.31 1.16
female > 55 6.35 1.12
male 18-34 6.09 1.08
male 35-54 6.23 1.09
male > 55 6.27 1.21

Older females had the highest mean preference for kindness, followed by middle-aged females. Young females have the lowest preference for kindness and the lowest deviation.

Older males have the highest mean preference for kindness and the greatest deviation. Middle aged males have the second highest mean preferences, followed by young males.

In general, there isn’t much a difference between the mean scores for males and females of all age groups. Mean preferences for females tend to cluster around 6.29, whereas for males, it tends to cluster around 6.20.

Calculating average rating and standard deviation for intelligence

table_physatt <- data %>% 
  group_by(sex, age_group) %>% 
  summarise(
    mean = round(mean(ideal_intelligence), 2),
    sd = round(sd(ideal_physatt), 2)
  )
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
table_physatt %>% 
  kbl() %>% 
  kable_material("hover")
sex age_group mean sd
female 18-34 6.01 1.10
female 35-54 6.11 1.16
female > 55 6.13 1.12
male 18-34 5.89 1.08
male 35-54 6.00 1.09
male > 55 5.95 1.21

Older females have the highest mean preference for intelligence, followed by middle aged females, and then young females.

Middle aged males have the highest mean preference for intelligence, followed by older males, and then young males.

I’m now creating a larger table of this so it’s easier to view and compare the results!

one_summary <-
  list("Ideal Intelligence" =
         list("mean" = ~ mean(ideal_intelligence),
               "SD" = ~ sd(ideal_intelligence)),
        "Ideal Kindness" =
         list("mean" = ~ mean(ideal_kindness),
              "SD" = ~ sd(ideal_kindness)),
        "Ideal Physical Attractiveness" =
          list("mean" = ~ mean(ideal_physatt),
               "SD" = ~ sd(ideal_physatt))) 
 
 orig_opt <- options()$qwraps2_markup
 options(qwraps2_markup = "markdown")
 
 grouped_by_table <-
   summary_table(data, summaries = one_summary, by = c("sex", "age_group"))
 grouped_by_table
female.18-34 (N = 5772) male.18-34 (N = 4796) female.35-54 (N = 1628) male.35-54 (N = 1371) female.> 55 (N = 156) male.> 55 (N = 175)
Ideal Intelligence                  
   mean 6.00710325710326 5.8942869057548 6.11179361179361 6.00145878920496 6.13461538461539 5.95428571428571
   SD 0.926394266089267 1.00108423611776 0.940665512182957 0.997074949200073 0.937302501943147 1.02731176286855
Ideal Kindness                  
   mean 6.20963270963271 6.09007506255213 6.30528255528256 6.23121808898614 6.34615384615385 6.27428571428571
   SD 0.981755631934942 1.02309999113568 0.980220667435759 0.965727994298508 0.968229819118515 0.985044154778714
Ideal Physical Attractiveness                  
   mean 5.55751905751906 5.86447039199333 5.57371007371007 5.83807439824945 5.35897435897436 5.91428571428571
   SD 1.09597518846996 1.08395307845261 1.15863843304716 1.08944924021985 1.11845929713079 1.21227980750786

Now let’s try plotting (basic) boxplots! I want to see if I can get the skeleton right first, before I add all the pretty aesthetic features.

Preferences for Kindness

p1 <- ggplot(data = data, mapping = aes(
  x = age_group, 
  y = ideal_kindness, 
  fill = sex)) + 
    geom_boxplot(ylim = range(0:10)) +
  stat_summary(fun = mean,
        geom = "point",
        size = 2,
        color = "steelblue") +
    theme_classic()+
    facet_wrap(~sex)+
  labs(title = "Preferences for Ideal Kindness", x = "Age Group", y = "")+
  coord_cartesian(ylim = c(0, 10))
## Warning: Ignoring unknown parameters: ylim
print(p1)

That doesn’t look right at all. Granted, it’s fairly simple code but still. I’ll try with another ideal preference variable.

Preferences for Physical Attractiveness

p1 <- ggplot(data = data, mapping = aes(
  x = age_group, 
  y = ideal_physatt, 
  fill = sex)) + 
    geom_boxplot(ylim = range(0:10)) +
  stat_summary(fun = mean,
        geom = "point",
        size = 2,
        color = "steelblue") +
    theme_classic()+
    facet_wrap(~sex)+
  labs(title = "Preferences for Ideal Physical Attractiveness", x = "Age Group", y = "")+
  coord_cartesian(ylim = c(0, 10))
## Warning: Ignoring unknown parameters: ylim
print(p1)

Okay, it happened again. Maybe there’s something wrong with the data? or the plot doesn’t fit with the data? Not sure.

Just in case, I’ll try another another form of plot. From what I know, there are a few options when you have a categorical and a numeric variable. One of the ones that was mentioned online was a violin plot. I’ve never created one before but I might try.

I’m not quite sure how to interpret them. After searching online, I found this helpful resource online that compactly summarises violin plots.

“Violin plots are used when you want to observe the distribution of numeric data, and are especially useful when you want to make a comparison of distributions between multiple groups. The peaks, valleys, and tails of each group’s density curve can be compared to see where groups are similar or different.”

Now I’m working on a basic plot just to see if it works.

Ideal Physical Attractiveness

p2 <- ggplot(data, aes(x = age_group, y = ideal_physatt, fill = sex)) + 
    geom_violin() +
    facet_wrap(~sex)+
  theme_classic()+
  labs(title = "Preferences for Ideal Physical Attractiveness", x = "Age Group", y = "")

print(p2)

Ideal Kindness

p1 <- ggplot(data, mapping = aes(x = age_group, y = ideal_kindness, fill = sex)) + 
    geom_violin()+
    facet_wrap(~sex)+
  theme_classic()+
  labs(title = "Preferences for Ideal Kindness", x = "Age Group", y = "")

print(p1)

Ideal Intelligence

p3 <- ggplot(data, aes(x = age_group, y = ideal_intelligence, fill = sex)) + 
    geom_violin() +
    facet_wrap(~sex)+
  theme_classic()+
  labs(title = "Preferences for Ideal Intelligence", x = "Age Group", y = "")

print(p3)

Challenges/Questions I have

  • Is there a way to use the round() function within the table?

  • Do our answers to the exploratory questions have to be straight forward? ie. yes/no.

Strengths

  • I was able to get a lot done so far and I think it comes from excitement to work with the data and try and find answers to questions I have. So I’m hoping I can keep that up :)

  • I’ve been quite good with searching for answers and documenting my progress. I’m grateful that the earlier part of the verification task was group work because a lot of what I’ve learnt has come from working with others- so I’m grateful for that.

Next Steps

  • Finish up question 1.

  • Work on my second question: Are female preference ratings for their ideal mate’s financial prospects influenced/mediated by gender equality?