Loading the necessary libraries

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(knitr)
library(DT)
data <- read.csv("Fair.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)

Getting to know the data

sex is the respondent's gender, age is the respondent's age, ym is years married, nbaffairs is the number of affairs, child indicates whether there are children in the marriage, and education is years of education.

dim(data)
## [1] 601  10
str(data)
## 'data.frame':    601 obs. of  10 variables:
##  $ X         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sex       : Factor w/ 2 levels "female","male": 2 1 1 2 2 1 1 2 1 2 ...
##  $ age       : num  37 27 32 57 22 32 22 57 32 22 ...
##  $ ym        : num  10 4 15 15 0.75 1.5 0.75 15 15 1.5 ...
##  $ child     : Factor w/ 2 levels "no","yes": 1 1 2 2 1 1 1 2 2 1 ...
##  $ religious : int  3 4 1 5 2 2 2 2 4 4 ...
##  $ education : int  18 14 12 18 17 17 12 14 16 14 ...
##  $ occupation: int  7 6 1 6 6 5 1 4 1 4 ...
##  $ rate      : int  4 4 4 5 3 5 3 4 2 5 ...
##  $ nbaffairs : int  0 0 0 0 0 0 0 0 0 0 ...
summary(data)
##        X           sex           age              ym         child    
##  Min.   :  1   female:315   Min.   :17.50   Min.   : 0.125   no :171  
##  1st Qu.:151   male  :286   1st Qu.:27.00   1st Qu.: 4.000   yes:430  
##  Median :301                Median :32.00   Median : 7.000            
##  Mean   :301                Mean   :32.49   Mean   : 8.178            
##  3rd Qu.:451                3rd Qu.:37.00   3rd Qu.:15.000            
##  Max.   :601                Max.   :57.00   Max.   :15.000            
##    religious       education       occupation         rate      
##  Min.   :1.000   Min.   : 9.00   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:14.00   1st Qu.:3.000   1st Qu.:3.000  
##  Median :3.000   Median :16.00   Median :5.000   Median :4.000  
##  Mean   :3.116   Mean   :16.17   Mean   :4.195   Mean   :3.932  
##  3rd Qu.:4.000   3rd Qu.:18.00   3rd Qu.:6.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :20.00   Max.   :7.000   Max.   :5.000  
##    nbaffairs     
##  Min.   : 0.000  
##  1st Qu.: 0.000  
##  Median : 0.000  
##  Mean   : 1.456  
##  3rd Qu.: 0.000  
##  Max.   :12.000
head(data)
##   X    sex age    ym child religious education occupation rate nbaffairs
## 1 1   male  37 10.00    no         3        18          7    4         0
## 2 2 female  27  4.00    no         4        14          6    4         0
## 3 3 female  32 15.00   yes         1        12          1    4         0
## 4 4   male  57 15.00   yes         5        18          6    5         0
## 5 5   male  22  0.75    no         2        17          6    3         0
## 6 6 female  32  1.50    no         2        17          5    5         0
datatable(data)

Number of affairs > 0

t1<-data %>% select(sex, age, nbaffairs) %>% filter(nbaffairs>0) %>% arrange(desc(nbaffairs))
head(t1)
##      sex age nbaffairs
## 1 female  32        12
## 2   male  37        12
## 3 female  42        12
## 4   male  37        12
## 5 female  32        12
## 6   male  27        12

The relationship between age and number of affairs

qplot(data=t1, age, geom = "histogram", color=sex, ylab = "number of respondents")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

At first glance, it would seem that men lead in infidelity overall. We also notice that infidelity is highest in the 20-35 age group, where women are more strongly represented than men.
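The histogram above uses qplot() with the default 30 bins, which is what triggers the `stat_bin()` message. A minimal sketch of the same idea with ggplot() and an explicit bin width is shown here. The data frame `toy` and the 5-year bin width are assumptions made for illustration; they are not values from this analysis.

```r
library(ggplot2)

# Toy stand-in for t1 (made-up values, NOT the Fair data):
# one row per respondent with at least one affair.
toy <- data.frame(
  sex = c("female", "male", "female", "male", "female", "male"),
  age = c(22, 27, 32, 32, 37, 42)
)

# Each bar counts respondents, so the y-axis is labelled accordingly.
# binwidth = 5 is an assumed choice; pick a width that suits the real data.
p <- ggplot(toy, aes(x = age, fill = sex)) +
  geom_histogram(binwidth = 5, position = "dodge") +
  labs(x = "Age", y = "Number of respondents")

p
```

Using fill rather than color shades the whole bar by sex, and position = "dodge" places the bars side by side so the two groups can be compared within each age bin.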

Density plot of number of affairs

qplot(data =t1, nbaffairs, geom="density", color=sex)

It is an interesting plot. Let’s investigate further.

Let’s see how years of marriage affects the number of affairs

t2<-data %>% select(sex, ym, nbaffairs) %>% filter(nbaffairs>0)
datatable(t2)
qplot(data=data, ym, nbaffairs, geom="jitter", color=sex, size=I(1), alpha=I(0.6), xlab="Years of marriage", ylab="Number of affairs")

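The scatter plot indicates that at around 15 years of marriage there is a cluster of respondents with more than 5 affairs. Note that 15 is both the maximum and the third quartile of ym in the summary above, so at least a quarter of all respondents sit at this value.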
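A jittered scatter plot makes it hard to judge how many points sit at each value of ym. One way to check the impression is to tabulate, for each value of ym, how many respondents report more than 5 affairs. The sketch below does this on a made-up data frame `toy`; the column names match the data, but the values are invented for illustration.

```r
library(dplyr)

# Toy stand-in for data (made-up values, NOT the Fair data).
toy <- data.frame(
  ym        = c(1.5, 4, 7, 10, 15, 15, 15, 15),
  nbaffairs = c(0,   7, 0, 12, 7,  12, 0,  3)
)

# For each value of ym: number of respondents, how many report
# more than 5 affairs, and that count as a share of the group.
by_ym <- toy %>%
  group_by(ym) %>%
  summarise(
    respondents = n(),
    over_five   = sum(nbaffairs > 5),
    share       = over_five / respondents
  )

by_ym
```

Reporting the share alongside the raw count guards against reading a large group as a large effect.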

Average number of affairs by gender

First, the median number of affairs among women who cheat

data %>% select(sex, nbaffairs) %>% filter(sex=="female" & nbaffairs>0) %>% summarise(female_median=median(nbaffairs))
##   female_median
## 1             7

Then the median number of affairs among men who cheat

data %>% select(sex, nbaffairs) %>% filter(sex=="male" & nbaffairs>0) %>% summarise(male_median=median(nbaffairs))
##   male_median
## 1           3

We notice that the median number of affairs among men who cheat is 3, while among women who cheat it is 7. So, among respondents with at least one affair, women in this sample tend to have more affairs than men.
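The two medians above come from two near-identical pipelines. As a sketch, the same comparison can be written once with group_by(), which also returns the number of respondents behind each median. The data frame `toy` holds made-up values, so its medians are not the ones reported above.

```r
library(dplyr)

# Toy stand-in for data (made-up values, NOT the Fair data).
toy <- data.frame(
  sex       = c("female", "female", "female", "male", "male", "male", "male"),
  nbaffairs = c(0, 3, 7, 0, 1, 2, 12)
)

# Keep only respondents with at least one affair, then summarise by sex.
medians <- toy %>%
  filter(nbaffairs > 0) %>%
  group_by(sex) %>%
  summarise(
    cheaters       = n(),
    median_affairs = median(nbaffairs)
  )

medians
```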

Percentage of cheaters

Percentage of men who cheat

data %>% select(sex, nbaffairs) %>% filter(sex=="male" & nbaffairs>0) %>%summarise(men =n()/286*100)
##        men
## 1 27.27273

Percentage of women who cheat

data %>% select(sex, nbaffairs) %>% filter(sex=="female" & nbaffairs>0) %>%summarise(women =n()/315*100)
##      women
## 1 22.85714
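The group sizes 286 and 315 are typed in by hand in the two pipelines above. A hedged alternative is to let dplyr compute the denominators with group_by(). The sketch below uses a stand-in data frame `toy` whose group sizes come from summary(data) and whose cheater counts (78 men, 72 women) are inferred from the printed percentages, since 78/286 ≈ 27.27% and 72/315 ≈ 22.86%. The individual nbaffairs values are placeholders.

```r
library(dplyr)

# Stand-in data: group sizes (286 men, 315 women) come from summary(data);
# the cheater counts (78 and 72) are inferred from the printed percentages.
# nbaffairs is a placeholder 1 for any respondent with at least one affair.
toy <- data.frame(
  sex       = rep(c("male", "female"), times = c(286, 315)),
  nbaffairs = c(rep(c(1, 0), times = c(78, 208)),
                rep(c(1, 0), times = c(72, 243)))
)

# mean() of a logical vector is the proportion of TRUE values,
# so no group size needs to be hard-coded.
pct <- toy %>%
  group_by(sex) %>%
  summarise(
    respondents  = n(),
    cheaters     = sum(nbaffairs > 0),
    pct_cheaters = 100 * mean(nbaffairs > 0)
  )

pct
```

Because the denominator is computed from the data, the result stays correct if rows are filtered out or added later.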

Let’s now investigate how level of education affects number of affairs

qplot(data=data, education, nbaffairs, geom="jitter", alpha=I(0.6), color=sex)

This scatter plot suggests that, as education increases, both men and women tend not to cheat.

Let’s investigate how children affect the number of affairs.

t3<-data %>% select(child, nbaffairs)
glimpse(t3)
## Observations: 601
## Variables: 2
## $ child     (fctr) no, no, yes, yes, no, no, no, yes, yes, no, yes, ye...
## $ nbaffairs (int) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
qplot(data=t3, nbaffairs, geom="density", color=child)

qplot(data=t3, nbaffairs, color=child)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The plots above would indicate that infidelity is higher when there are children in the marriage.
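The summary above shows 430 respondents with children and 171 without, so raw counts in a histogram will favour the larger group. A simple check is to compare the share of respondents with at least one affair within each group. The sketch below does this with base R's table() and prop.table() on a made-up data frame `toy`; the values are invented for illustration.

```r
# Toy stand-in for t3 (made-up values, NOT the Fair data).
toy <- data.frame(
  child     = c("no", "no", "no", "no", "yes", "yes", "yes", "yes", "yes", "yes"),
  nbaffairs = c(0, 0, 0, 2, 0, 0, 0, 1, 7, 12)
)

# Cross-tabulate child status against "any affair" (TRUE/FALSE),
# then convert each row to proportions so group size drops out.
counts <- table(child = toy$child, cheated = toy$nbaffairs > 0)
shares <- prop.table(counts, margin = 1)

counts
shares
```

With margin = 1 each row sums to 1, so the TRUE column can be compared directly across the two groups.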

Conclusion

Our analysis suggests that 27% of men cheat, compared with 23% of women. However, among those who cheat, women have a higher median number of affairs: 7, compared with the men’s median of 3.

As the years of marriage increase, the number of affairs increases, especially beyond 5 years of marriage.

Lastly, the number of affairs is highest in the 20-35 age group and when there are children in the marriage.