library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(knitr)
library(DT)
data <- read.csv("Fair.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
Getting to know the data
sex is the respondent's gender, age is the respondent's age in years, ym is the number of years married, nbaffairs is the number of affairs, child indicates whether the respondent has children, and education is the number of years of education.
dim(data)
## [1] 601 10
str(data)
## 'data.frame': 601 obs. of 10 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ sex : Factor w/ 2 levels "female","male": 2 1 1 2 2 1 1 2 1 2 ...
## $ age : num 37 27 32 57 22 32 22 57 32 22 ...
## $ ym : num 10 4 15 15 0.75 1.5 0.75 15 15 1.5 ...
## $ child : Factor w/ 2 levels "no","yes": 1 1 2 2 1 1 1 2 2 1 ...
## $ religious : int 3 4 1 5 2 2 2 2 4 4 ...
## $ education : int 18 14 12 18 17 17 12 14 16 14 ...
## $ occupation: int 7 6 1 6 6 5 1 4 1 4 ...
## $ rate : int 4 4 4 5 3 5 3 4 2 5 ...
## $ nbaffairs : int 0 0 0 0 0 0 0 0 0 0 ...
summary(data)
## X sex age ym child
## Min. : 1 female:315 Min. :17.50 Min. : 0.125 no :171
## 1st Qu.:151 male :286 1st Qu.:27.00 1st Qu.: 4.000 yes:430
## Median :301 Median :32.00 Median : 7.000
## Mean :301 Mean :32.49 Mean : 8.178
## 3rd Qu.:451 3rd Qu.:37.00 3rd Qu.:15.000
## Max. :601 Max. :57.00 Max. :15.000
## religious education occupation rate
## Min. :1.000 Min. : 9.00 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:14.00 1st Qu.:3.000 1st Qu.:3.000
## Median :3.000 Median :16.00 Median :5.000 Median :4.000
## Mean :3.116 Mean :16.17 Mean :4.195 Mean :3.932
## 3rd Qu.:4.000 3rd Qu.:18.00 3rd Qu.:6.000 3rd Qu.:5.000
## Max. :5.000 Max. :20.00 Max. :7.000 Max. :5.000
## nbaffairs
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 1.456
## 3rd Qu.: 0.000
## Max. :12.000
head(data)
## X sex age ym child religious education occupation rate nbaffairs
## 1 1 male 37 10.00 no 3 18 7 4 0
## 2 2 female 27 4.00 no 4 14 6 4 0
## 3 3 female 32 15.00 yes 1 12 1 4 0
## 4 4 male 57 15.00 yes 5 18 6 5 0
## 5 5 male 22 0.75 no 2 17 6 3 0
## 6 6 female 32 1.50 no 2 17 5 5 0
datatable(data)
# Respondents with number of affairs > 0, most affairs first
t1 <- data %>% select(sex, age, nbaffairs) %>% filter(nbaffairs > 0) %>% arrange(desc(nbaffairs))
head(t1)
## sex age nbaffairs
## 1 female 32 12
## 2 male 37 12
## 3 female 42 12
## 4 male 37 12
## 5 female 32 12
## 6 male 27 12
The relationship between age and number of affairs
qplot(data = t1, age, geom = "histogram", color = sex, ylab = "number of respondents with affairs")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
At first glance, men appear to lead in infidelity overall. We also notice that infidelity is highest in the 20-35 age group, where women are more strongly represented than men.
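To check an age-group reading like this against actual counts rather than bar heights, the ages can be binned explicitly. The sketch below uses a small made-up data frame, toy, standing in for t1 (respondents with at least one affair). Its values and the age breaks are illustrative assumptions, not taken from Fair.csv.

```r
# Hypothetical toy data standing in for t1; NOT taken from Fair.csv
toy <- data.frame(
  sex = c("female", "male", "female", "female", "male", "male"),
  age = c(17.5, 22, 27, 32, 37, 57)
)

# Bin ages into three groups; the break points are an assumption for illustration
toy$age_group <- cut(
  toy$age,
  breaks = c(0, 20, 35, Inf),
  labels = c("under 20", "20-35", "over 35")
)

# Cross-tabulate age group against sex to get counts per cell
counts <- table(toy$age_group, toy$sex)
print(counts)
```

Running the same two steps on t1 would give the number of men and women with affairs in each age band.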
qplot(data =t1, nbaffairs, geom="density", color=sex)
It is an interesting plot. Let's investigate further.
t2<-data %>% select(sex, ym, nbaffairs) %>% filter(nbaffairs>0)
datatable(t2)
qplot(data = data, ym, nbaffairs, geom = "jitter", color = sex, size = I(1), alpha = I(0.6), xlab = "Years of marriage", ylab = "Number of affairs")
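The scatter plot indicates that at around 15 years of marriage there is a spike in respondents reporting more than 5 affairs. Note that ym has both a maximum and a third quartile of 15 in the summary above, so a large share of respondents sit at that value, which may partly explain the spike.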
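A jittered scatter plot can make dense groups look more extreme than they are, so it may help to summarise by years of marriage as well. The following is a minimal sketch on a made-up data frame, toy, with the same ym and nbaffairs columns. The values are illustrative only, and the column names n, mean_affairs and share_over_5 are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; NOT taken from Fair.csv
toy <- data.frame(
  ym = c(1.5, 1.5, 4, 4, 15, 15, 15, 15),
  nbaffairs = c(0, 1, 0, 2, 0, 7, 12, 0)
)

# For each value of ym: group size, mean number of affairs,
# and the share of respondents reporting more than 5 affairs
by_ym <- toy %>%
  group_by(ym) %>%
  summarise(
    n = n(),
    mean_affairs = mean(nbaffairs),
    share_over_5 = mean(nbaffairs > 5)
  )

print(by_ym)
```

Reporting n alongside the share shows whether a tall cluster of points reflects a higher rate or simply a larger group.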
First, the median number of affairs among women who cheat:
data %>% select(sex, nbaffairs) %>% filter(sex=="female" & nbaffairs>0) %>% summarise(female_median=median(nbaffairs))
## female_median
## 1 7
Then the median number of affairs among men who cheat:
data %>% select(sex, nbaffairs) %>% filter(sex=="male" & nbaffairs>0) %>% summarise(male_median=median(nbaffairs))
## male_median
## 1 3
We notice that the median number of affairs among men who cheat is 3, while among women who cheat it is 7. So, in this data, women who cheat tend to have more affairs than men who cheat.
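The two medians can also be computed in a single grouped call instead of one filter per sex. The sketch below runs on a small made-up data frame, toy, with the same sex and nbaffairs columns. Its values are illustrative only and are not taken from Fair.csv.

```r
library(dplyr)

# Hypothetical toy data; NOT taken from Fair.csv
toy <- data.frame(
  sex = c("female", "female", "female", "male", "male", "male", "male"),
  nbaffairs = c(0, 1, 3, 0, 2, 7, 12)
)

# Keep respondents with at least one affair, then take the median per sex
medians <- toy %>%
  filter(nbaffairs > 0) %>%
  group_by(sex) %>%
  summarise(median_affairs = median(nbaffairs))

print(medians)
```

Replacing toy with data should return both medians in one table.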
Percentage of men who cheat
data %>% select(sex, nbaffairs) %>% filter(sex=="male" & nbaffairs>0) %>%summarise(men =n()/286*100)
## men
## 1 27.27273
Percentage of women who cheat
data %>% select(sex, nbaffairs) %>% filter(sex=="female" & nbaffairs>0) %>%summarise(women =n()/315*100)
## women
## 1 22.85714
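The percentages above divide by group sizes typed in by hand (286 and 315). A sketch that derives the denominators from the data itself is shown below, again on a made-up data frame, toy, whose values are illustrative only. The column name pct_with_affairs is hypothetical.

```r
library(dplyr)

# Hypothetical toy data; NOT taken from Fair.csv
toy <- data.frame(
  sex = c("female", "female", "female", "female",
          "male", "male", "male", "male", "male"),
  nbaffairs = c(0, 0, 0, 3, 0, 0, 0, 1, 12)
)

# mean() of a logical vector is the proportion of TRUE values,
# so this gives the share of each sex with at least one affair
shares <- toy %>%
  group_by(sex) %>%
  summarise(n = n(), pct_with_affairs = mean(nbaffairs > 0) * 100)

print(shares)
```

Because the denominator comes from n() within each group, the calculation stays correct if rows are added or filtered out.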
qplot(data = data, education, nbaffairs, geom = "jitter", alpha = I(0.6), color = sex)
This scatter plot suggests that both men and women tend to cheat less as education increases.
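A visual trend like this can be backed by a rank correlation, which suits count data with many ties better than a straight-line fit. The sketch below uses base R's cor() with method = "spearman" on a made-up data frame, toy. The values are illustrative only, and the result says nothing about the real data.

```r
# Hypothetical toy data; NOT taken from Fair.csv
toy <- data.frame(
  education = c(9, 12, 14, 16, 18, 20),
  nbaffairs = c(7, 3, 2, 1, 0, 0)
)

# Spearman's rho measures how consistently one variable falls as the other rises
rho <- cor(toy$education, toy$nbaffairs, method = "spearman")
print(rho)
```

A value near -1 would support the reading of the plot, while a value near 0 would suggest that the apparent trend is weak.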
t3<-data %>% select(child, nbaffairs)
glimpse(t3)
## Observations: 601
## Variables: 2
## $ child (fctr) no, no, yes, yes, no, no, no, yes, yes, no, yes, ye...
## $ nbaffairs (int) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
qplot(data=t3, nbaffairs, geom="density", color=child)
qplot(data=t3, nbaffairs, color=child)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
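The plots above would indicate that infidelity is higher when there are children in the marriage. Note that respondents with children outnumber those without (430 versus 171 in the summary above), so the raw counts in the second plot partly reflect group size.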
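One way to compare groups of different sizes is to look at row proportions rather than counts. The sketch below builds a cross-table on a made-up data frame, toy, and normalises each row with prop.table(). The values are illustrative only and are not taken from Fair.csv.

```r
# Hypothetical toy data; NOT taken from Fair.csv
toy <- data.frame(
  child = c(rep("no", 4), rep("yes", 8)),
  nbaffairs = c(0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 7, 0)
)

# Counts of respondents with and without affairs, by child status
tab <- table(child = toy$child, has_affair = toy$nbaffairs > 0)

# margin = 1 divides each row by its total, giving within-group shares
props <- prop.table(tab, margin = 1)

print(tab)
print(props)
```

In this toy example the group with children has twice as many respondents with affairs, yet the share is the same in both groups, which is the effect of group size that raw counts cannot show.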
Our analysis suggests that 27% of men cheat, compared with 23% of women. However, women who cheat tend to have a higher median number of affairs: 7, compared with the men's median of 3.
As the years of marriage increase, the number of affairs increases, especially beyond 5 years of marriage.
Lastly, the number of affairs is highest in the 20-35 age group and when there are children in the marriage.