Gender and Dating

Gender Difference in Dating Frequencies

GUAN LE S3655443

Last updated: 28 October, 2018

Introduction

Shakespeare wrote: "Journeys end in lovers meeting."

So are you ready for a romantic date? (image[1])

Do you think men and women differ in how often they go on dates?

Introduction cont.

This demonstration covers 6 sections:

Problem Statement

Data

Data Cont.

Descriptive Statistics and Visualisation

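# load the packages assumed from the functions used in these slides
library(readr); library(dplyr); library(Hmisc)
# read the data sample from the working directory
speeddate <- read_csv("Speed Dating Data.csv")
speeddate %>% head()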

Descriptive Statistics Cont.

# convert gender variable type to factor
# values are based on the given Word document
speeddate$gender <- factor(speeddate$gender, levels = c(0, 1),
                           labels = c("female", "male"))

speeddate$gender %>% head()
## [1] female female female female female female
## Levels: female male
# convert date variable type to an ordered factor
# values are based on the given Word document
speeddate$date <- factor(speeddate$date, levels = c(1, 2, 3, 4, 5, 6, 7),
                         labels = c("Several times a week", "Twice a week",
                                    "Once a week", "Twice a month",
                                    "Once a month", "Several times a year",
                                    "Almost never"), ordered = TRUE) 
speeddate$date %>% head()
## [1] Almost never Almost never Almost never Almost never Almost never
## [6] Almost never
## 7 Levels: Several times a week < Twice a week < ... < Almost never
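
The ordered-factor conversion above can be tried on a small vector first. The sketch below uses hypothetical toy responses (`codes` and `toy_date` are illustrative names, not part of the dataset) with the same 1 to 7 coding, and shows that the integer codes survive the conversion and that the levels can be compared.

```r
# Hypothetical toy responses using the same 1-7 coding as the survey
codes <- c(7, 1, 3, NA, 5)

freq_labels <- c("Several times a week", "Twice a week", "Once a week",
                 "Twice a month", "Once a month", "Several times a year",
                 "Almost never")

toy_date <- factor(codes, levels = 1:7, labels = freq_labels, ordered = TRUE)

# The underlying integer codes are preserved; NA stays NA
as.integer(toy_date)

# Ordered factors support comparisons: level 1 sorts before level 7
toy_date[2] < toy_date[1]

# table() counts every level, including those with no responses
table(toy_date, useNA = "ifany")
```

Note that level 1 is the most frequent dating category, so a "larger" level here means dating less often.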

Descriptive Statistics Cont.

levels(speeddate$date)
## [1] "Several times a week" "Twice a week"         "Once a week"         
## [4] "Twice a month"        "Once a month"         "Several times a year"
## [7] "Almost never"
# select gender and date variables for this investigation
speeddate1 <- speeddate %>% dplyr::select(gender, date)

speeddate1 %>% head()

Descriptive Statistics Cont.

# identify missing values
colSums(is.na(speeddate1))
## gender   date 
##      0     97
# handle missing values with impute(); for a factor this should fill NAs
# with the most frequent level
speeddate1$date <- impute(speeddate1$date, fun = mode)

# verify if missing values have been successfully imputed
colSums(is.na(speeddate1))
## gender   date 
##      0      0
# assign table to tb1 and get ready for barplot in the next slide
tb1 <- table(speeddate1$date, speeddate1$gender) %>% prop.table(margin = 2) * 100
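
Base R's `mode()` reports an object's storage mode rather than its most frequent value, so it can help to spell out what mode imputation means for a factor. The following is a minimal base R sketch on hypothetical toy data (`toy`, `modal_level` and `imputed` are illustrative names), assuming the intent of the imputation step is to replace each NA with the most frequent category.

```r
# Hypothetical toy factor with two missing answers
toy <- factor(c("Once a week", NA, "Almost never", "Almost never", NA),
              levels = c("Once a week", "Once a month", "Almost never"))

# Most frequent non-missing level; table() drops NA by default,
# and which.max() takes the first level if there is a tie
counts <- table(toy)
modal_level <- names(counts)[which.max(counts)]

# Replace the missing entries with the modal level
imputed <- toy
imputed[is.na(imputed)] <- modal_level

# Verify that no missing values remain
sum(is.na(imputed))
table(imputed)
```

Imputing the mode inflates the largest category slightly, which is worth keeping in mind when only a small share of the values is missing, as with the 97 missing dates here.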

Descriptive Statistics Cont.

# grouped barplot: one group of bars per gender, one bar per frequency category
barplot(tb1, ylab = "Percentage within gender (%)", ylim = c(0, 70),
        xlab = "Gender",
        legend.text = rownames(tb1), beside = TRUE,
        args.legend = list(x = "topright", horiz = FALSE, title = "Date Frequency Category"),
        col = c("black","grey","darkgoldenrod2","pink","purple","light blue","light green"))

Descriptive Statistics Cont.

knitr::kable(tb1)
|                     |    female|      male|
|:--------------------|---------:|---------:|
|Several times a week |  1.099426|  1.144492|
|Twice a week         |  3.130975|  4.220315|
|Once a week          |  7.552581| 11.134955|
|Twice a month        | 23.900574| 24.797330|
|Once a month         | 15.439771| 21.030043|
|Several times a year | 28.441683| 23.867430|
|Almost never         | 20.434990| 13.805436|
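
The column percentages can be reproduced without the raw file. This sketch rebuilds the table from the observed counts reported in the Hypothesis Testing section and applies `prop.table(margin = 2)`, so that each gender column sums to 100. The object names `counts` and `pct` are illustrative.

```r
# Observed counts copied from the chi$observed output
# (rows: date frequency, columns: gender)
counts <- matrix(
  c(46, 131, 316, 1000, 646, 1190, 855,    # female
    48, 177, 467, 1040, 882, 1001, 579),   # male
  ncol = 2,
  dimnames = list(
    c("Several times a week", "Twice a week", "Once a week",
      "Twice a month", "Once a month", "Several times a year",
      "Almost never"),
    c("female", "male")
  )
)

# margin = 2 divides each cell by its column (gender) total
pct <- prop.table(counts, margin = 2) * 100

colSums(pct)    # each gender column should sum to 100
round(pct, 1)
```

Using `margin = 2` makes the two genders directly comparable even though the group sizes differ slightly.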

Hypothesis Testing

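Hypotheses for the Chi-square test of association:

H0: There is no association between gender and dating frequency.

HA: There is an association between gender and dating frequency.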

Assumptions:

# use Chi-sq test method
chi <- chisq.test(table(speeddate1$gender, speeddate1$date))
chi
## 
##  Pearson's Chi-squared test
## 
## data:  table(speeddate1$gender, speeddate1$date)
## X-squared = 142.68, df = 6, p-value < 2.2e-16
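
A commonly cited rule of thumb for the chi-square test is that expected cell counts should not be too small, often stated as at least 5. That threshold is an assumption here, not necessarily the exact wording of the assumptions on this slide. The sketch below rebuilds the contingency table from the observed counts reported on the next slide (`observed` and `chi_check` are illustrative names), checks the expected counts, and re-runs the test.

```r
# Observed counts copied from the chi$observed output
observed <- matrix(
  c(46, 131, 316, 1000, 646, 1190, 855,    # female
    48, 177, 467, 1040, 882, 1001, 579),   # male
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("female", "male"),
    c("Several times a week", "Twice a week", "Once a week",
      "Twice a month", "Once a month", "Several times a year",
      "Almost never")
  )
)

chi_check <- chisq.test(observed)

# Assumed rule of thumb: every expected count is at least 5
min(chi_check$expected)
all(chi_check$expected >= 5)

# Test statistic and degrees of freedom: (2 - 1) * (7 - 1) = 6
chi_check$statistic
chi_check$parameter
```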

Hypothesis Testing Cont.

# The observed values
chi$observed
##         
##          Several times a week Twice a week Once a week Twice a month
##   female                   46          131         316          1000
##   male                     48          177         467          1040
##         
##          Once a month Several times a year Almost never
##   female          646                 1190          855
##   male            882                 1001          579
# the expected values
chi$expected
##         
##          Several times a week Twice a week Once a week Twice a month
##   female              46.9439     153.8162    391.0327      1018.783
##   male                47.0561     154.1838    391.9673      1021.217
##         
##          Once a month Several times a year Almost never
##   female     763.0881             1094.192     716.1442
##   male       764.9119             1096.808     717.8558
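
Comparing the observed and expected tables by eye is easier with Pearson residuals, (observed - expected) / sqrt(expected), which show which cells depart most from independence. This is a minimal sketch using the counts printed above; `observed`, `fit` and `pearson` are illustrative names.

```r
# Observed counts copied from the chi$observed output
observed <- matrix(
  c(46, 131, 316, 1000, 646, 1190, 855,    # female
    48, 177, 467, 1040, 882, 1001, 579),   # male
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("female", "male"),
    c("Several times a week", "Twice a week", "Once a week",
      "Twice a month", "Once a month", "Several times a year",
      "Almost never")
  )
)

fit <- chisq.test(observed)

# Pearson residuals computed by hand
pearson <- (fit$observed - fit$expected) / sqrt(fit$expected)
round(pearson, 2)

# chisq.test() stores the same quantity in its residuals component
max(abs(pearson - fit$residuals))

# Squared residuals add up to the chi-square statistic
sum(pearson^2)
```

A positive residual means more responses were observed in that cell than independence would predict; a negative one means fewer.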

Hypothesis Testing Cont.

Chi-square test of association summary:

Decision: Reject H0 (p-value < 2.2e-16)
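
The decision can be checked either against a critical value or through the p-value. The sketch below uses the reported statistic and degrees of freedom; the significance level `alpha = 0.05` is an assumption, since the slides do not state one, and the variable names are illustrative.

```r
x_squared <- 142.68   # statistic reported by chisq.test()
dof <- 6              # (2 - 1) * (7 - 1) degrees of freedom
alpha <- 0.05         # assumed significance level, not stated in the slides

# Critical value method: reject H0 if the statistic exceeds the critical value
critical_value <- qchisq(1 - alpha, df = dof)
x_squared > critical_value

# p-value method: upper-tail probability of the chi-square distribution
p_value <- pchisq(x_squared, df = dof, lower.tail = FALSE)
p_value < alpha
```

Both rules lead to the same decision; they are two views of the same comparison.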

Discussion

According to the Chi-square test, there is a statistically significant association between gender and dating frequency (X-squared = 142.68, df = 6, p-value < 2.2e-16).

Based on the barplot, men generally go on dates more often than women: a larger share of men report dating once a month or more frequently, whereas a larger share of women report dating only several times a year (28.4% versus 23.9%).

Furthermore, women are more likely than men to report that they almost never go on a date (20.4% versus 13.8%).

It’s time for all the ladies to stand up and get ready for their next romantic encounter!

References

[1] Image: https://mysevendevils.wordpress.com/2014/06/04/5-types-of-dates-people-go-on/

[2] Inserting an image in R Markdown: https://stackoverflow.com/questions/25166624/insert-picture-table-in-r-markdown

[3] Data Preprocessing lecture notes

[4] Dataset from Kaggle: https://www.kaggle.com/annavictoria/speed-dating-experiment