It’s wedding season. Again. Even though approximately 45% of married couples eventually divorce (it was closer to 50% in the 1980s), couples are still tying the knot, secretly hoping that their marriage will be the one that endures/beats the odds, thus debunking the stats. There have been more studies than not on this topic, with each one offering their opinion on what it takes to remain a happily married couple, yet none has been able to pinpoint the exact reason for the demise of the marital union. There is a myriad of reasons why marriages fail - financial struggles, infidelity, abuse(emotional, physical, mental), illness, sexual incompatibility and/or abandonment, married too young, contentiousness et al. Too many to list. For my final project, I chose the “Affairs” dataset entitled “Fair’s Extramarital Affairs Data” located at http://vincentarelbundock.github.io/Rdatasets). Described as “Infidelity data, known as Fair’s Affairs, this dataset is a cross-section data from a survey conducted by Psychology Today in 1969.”
I am curious to know whether one’s educational/occupational status leads to a greater likelihood of having affairs, or if it’s the opposite, the lack thereof that is the main cause of having affairs.
# import file, set header = FALSE to get row name header
affairs <- read.csv('https://raw.githubusercontent.com/carolc57/RBridgeFinal/main/Affairs.csv', header = TRUE, sep = ",")
head (affairs) #get a glimpse of the data
## affairs gender age yearsmarried children religiousness education occupation
## 1 0 male 37 10.00 no 3 18 7
## 2 0 female 27 4.00 no 4 14 6
## 3 0 female 32 15.00 yes 1 12 1
## 4 0 male 57 15.00 yes 5 18 6
## 5 0 male 22 0.75 no 2 17 6
## 6 0 female 32 1.50 no 2 17 5
## rating
## 1 4
## 2 4
## 3 4
## 4 5
## 5 3
## 6 5
summary (affairs)
## affairs gender age yearsmarried
## Min. : 0.000 Length:601 Min. :17.50 Min. : 0.125
## 1st Qu.: 0.000 Class :character 1st Qu.:27.00 1st Qu.: 4.000
## Median : 0.000 Mode :character Median :32.00 Median : 7.000
## Mean : 1.456 Mean :32.49 Mean : 8.178
## 3rd Qu.: 0.000 3rd Qu.:37.00 3rd Qu.:15.000
## Max. :12.000 Max. :57.00 Max. :15.000
## children religiousness education occupation
## Length:601 Min. :1.000 Min. : 9.00 Min. :1.000
## Class :character 1st Qu.:2.000 1st Qu.:14.00 1st Qu.:3.000
## Mode :character Median :3.000 Median :16.00 Median :5.000
## Mean :3.116 Mean :16.17 Mean :4.195
## 3rd Qu.:4.000 3rd Qu.:18.00 3rd Qu.:6.000
## Max. :5.000 Max. :20.00 Max. :7.000
## rating
## Min. :1.000
## 1st Qu.:3.000
## Median :4.000
## Mean :3.932
## 3rd Qu.:5.000
## Max. :5.000
affairs_mean <- RoundTo(mean (affairs$age),0.2, FUN=round)
sprintf(paste("The mean of age at marriage is", affairs_mean, "years"))
## [1] "The mean of age at marriage is 32.4 years"
affairs_median <- median(affairs$yearsmarried)
sprintf(paste("The median years married is", affairs_median, "years"))
## [1] "The median years married is 7 years"
#resort data in age column in ascending order; prep for quartile analysis
affairs_agesort <- affairs %>% arrange(age) %>% head (601)
print (head(affairs_agesort, 15)) #print 1st ten rows
## affairs gender age yearsmarried children religiousness education occupation
## 1 0 male 17.5 1.50 yes 3 18 6
## 2 0 female 17.5 0.75 no 2 18 5
## 3 0 female 17.5 10.00 no 4 14 4
## 4 1 female 17.5 0.75 no 5 14 4
## 5 12 female 17.5 0.75 yes 2 12 1
## 6 12 female 17.5 0.75 yes 2 12 3
## 7 0 male 22.0 0.75 no 2 17 6
## 8 0 female 22.0 0.75 no 2 12 1
## 9 0 male 22.0 1.50 no 4 14 4
## 10 0 female 22.0 1.50 no 2 17 5
## 11 0 female 22.0 0.75 no 3 16 5
## 12 0 female 22.0 1.50 no 2 16 5
## 13 0 female 22.0 1.50 no 2 16 5
## 14 0 female 22.0 1.50 no 2 16 5
## 15 0 female 22.0 1.50 no 2 18 5
## rating
## 1 5
## 2 4
## 3 5
## 4 5
## 5 3
## 6 5
## 7 3
## 8 3
## 9 5
## 10 4
## 11 4
## 12 5
## 13 5
## 14 5
## 15 5
print (tail(affairs_agesort, 15)) #print last 15 rows
## affairs gender age yearsmarried children religiousness education occupation
## 587 0 male 57 15 yes 4 20 5
## 588 0 female 57 15 yes 4 16 6
## 589 0 female 57 15 yes 2 18 5
## 590 0 male 57 15 yes 4 9 2
## 591 0 male 57 15 yes 4 20 6
## 592 0 male 57 15 yes 2 20 6
## 593 0 male 57 15 yes 4 9 2
## 594 0 male 57 15 yes 4 17 5
## 595 0 male 57 15 yes 5 18 5
## 596 0 female 57 15 yes 3 18 5
## 597 0 male 57 15 no 4 9 3
## 598 0 female 57 15 no 4 20 6
## 599 2 female 57 15 yes 1 18 5
## 600 2 male 57 15 yes 1 17 4
## 601 7 male 57 15 yes 5 20 4
## rating
## 587 4
## 588 4
## 589 2
## 590 2
## 591 5
## 592 4
## 593 4
## 594 5
## 595 2
## 596 2
## 597 1
## 598 5
## 599 4
## 600 4
## 601 5
affairs_quantile <- quantile(affairs$age, probs = c(0.25,0.5,0.75,1))
print (affairs_quantile)
## 25% 50% 75% 100%
## 27 32 37 57
print("The affairs_data_quantile shows that the first, second, and third quartiles are 27, 32 and 37 respectively")
## [1] "The affairs_data_quantile shows that the first, second, and third quartiles are 27, 32 and 37 respectively"
print("The first quartile, or lower quartile, is the value that cuts off the first 25% of the data when it is sorted in ascending order. The second quartile, or median, is the value that cuts off the first 50%. The third quartile, or upper quartile, is the value that cuts off the first 75%.")
## [1] "The first quartile, or lower quartile, is the value that cuts off the first 25% of the data when it is sorted in ascending order. The second quartile, or median, is the value that cuts off the first 50%. The third quartile, or upper quartile, is the value that cuts off the first 75%."
#Create a new data frame with a subset of the columns and rows. Make sure to name it
#rename a column, ensure use of rename () function in dplyer due to "Error in `chr_as_locations()`:"
affairs_colname <- plyr::rename(affairs, c(
"rating" = "marital_satisfaction",
"children" = "number_children"))
head(affairs_colname)
## affairs gender age yearsmarried number_children religiousness education
## 1 0 male 37 10.00 no 3 18
## 2 0 female 27 4.00 no 4 14
## 3 0 female 32 15.00 yes 1 12
## 4 0 male 57 15.00 yes 5 18
## 5 0 male 22 0.75 no 2 17
## 6 0 female 32 1.50 no 2 17
## occupation marital_satisfaction
## 1 7 4
## 2 6 4
## 3 1 4
## 4 6 5
## 5 6 3
## 6 5 5
#Change values in education columns to text meanings; store in new column
affairs_change <- affairs
for (i in 1:nrow(affairs_change)){
if (affairs_change$education [i] == "9"){
affairs_change$edu_descr[i] <- "< Grade school"
}else if (affairs_change$education [i] == "12"){
affairs_change$edu_descr[i] <- "HS Grad"
}else if (affairs_change$education [i] == "14"){
affairs_change$edu_descr[i] <- "Some College"
}else if (affairs_change$education [i] == "16"){
affairs_change$edu_descr[i] <- "College Grad"
}else if (affairs_change$education [i] == "17"){
affairs_change$edu_descr[i] <- "Some Grad work"
}else if (affairs_change$education [i] == "18"){
affairs_change$edu_descr[i] <- " Master's"
}else {(affairs_change$education [i] == "")
affairs_change$edu_descr[i] <- "PhD, MD, other_adv"
}
}
#reorder columns so that education and edu_descr are together, give new name
affairs_change1 <- affairs_change %>% relocate(edu_descr, .after = education)
print (head(affairs_change1, 15)) #print 1st 15 rows
## affairs gender age yearsmarried children religiousness education
## 1 0 male 37 10.00 no 3 18
## 2 0 female 27 4.00 no 4 14
## 3 0 female 32 15.00 yes 1 12
## 4 0 male 57 15.00 yes 5 18
## 5 0 male 22 0.75 no 2 17
## 6 0 female 32 1.50 no 2 17
## 7 0 female 22 0.75 no 2 12
## 8 0 male 57 15.00 yes 2 14
## 9 0 female 32 15.00 yes 4 16
## 10 0 male 22 1.50 no 4 14
## 11 0 male 37 15.00 yes 2 20
## 12 0 male 27 4.00 yes 4 18
## 13 0 male 47 15.00 yes 5 17
## 14 0 female 22 1.50 no 2 17
## 15 0 female 27 4.00 no 4 14
## edu_descr occupation rating
## 1 Master's 7 4
## 2 Some College 6 4
## 3 HS Grad 1 4
## 4 Master's 6 5
## 5 Some Grad work 6 3
## 6 Some Grad work 5 5
## 7 HS Grad 1 3
## 8 Some College 4 4
## 9 College Grad 1 2
## 10 Some College 4 5
## 11 PhD, MD, other_adv 7 2
## 12 Master's 6 4
## 13 Some Grad work 6 4
## 14 Some Grad work 5 4
## 15 Some College 5 4
print (tail(affairs_change1, 15)) #print last 15 rows
## affairs gender age yearsmarried children religiousness education
## 587 12 female 52 15.0 yes 1 18
## 588 12 male 47 15.0 no 1 18
## 589 3 female 32 15.0 yes 4 16
## 590 7 female 32 15.0 yes 3 14
## 591 7 female 27 7.0 yes 4 16
## 592 12 male 42 15.0 yes 3 18
## 593 7 female 42 15.0 yes 2 14
## 594 12 male 27 7.0 yes 2 17
## 595 3 male 32 10.0 yes 4 14
## 596 7 male 47 15.0 yes 3 16
## 597 1 male 22 1.5 yes 1 12
## 598 7 female 32 10.0 yes 2 18
## 599 2 male 32 10.0 yes 2 17
## 600 2 male 22 7.0 yes 3 18
## 601 1 female 32 15.0 yes 3 14
## edu_descr occupation rating
## 587 Master's 5 5
## 588 Master's 6 5
## 589 College Grad 4 4
## 590 Some College 3 2
## 591 College Grad 1 2
## 592 Master's 6 2
## 593 Some College 3 2
## 594 Some Grad work 5 4
## 595 Some College 4 3
## 596 College Grad 4 2
## 597 HS Grad 2 5
## 598 Master's 5 4
## 599 Some Grad work 6 5
## 600 Master's 6 2
## 601 Some College 1 5
#scatterplot
affairs_change1 %>%
filter (yearsmarried > 5) %>%
ggplot(aes (x= yearsmarried,
y= affairs))+
geom_point(aes(color = education))+
geom_smooth (method = lm,
se = T)+
labs(x = "# Years Married",
y = "# of Affairs",
title = "Affairs vs Yrs Married")+
theme_minimal()
## `geom_smooth()` using formula 'y ~ x'
ggsave("scatterplot.png", width = 4, height = 3, dpi = 1000)
## `geom_smooth()` using formula 'y ~ x'
#histogram of educational level
affairs_change1 %>%
filter(education >= 12 & affairs >= 1) %>%
ggplot(aes(x = education))+
geom_histogram(color = "black",fill = "maroon", binwidth = 2)+
theme_bw()+
labs(x = "Education",
y = NULL,
title = "Affairs by educational level, HS or higher")
ggsave ("histogram.png", width = 4, height = 3, dpi = 1000)
#boxplot
library(dplyr)
affairs_1ormore <- filter(affairs_change1, affairs >= 1)
head(affairs_1ormore)
## affairs gender age yearsmarried children religiousness education
## 1 3 male 27 1.500 no 3 18
## 2 3 female 27 4.000 yes 3 17
## 3 7 male 37 15.000 yes 5 18
## 4 12 female 32 10.000 yes 3 17
## 5 1 male 22 0.125 no 4 16
## 6 1 female 22 1.500 yes 2 14
## edu_descr occupation rating
## 1 Master's 4 4
## 2 Some Grad work 1 5
## 3 Master's 6 2
## 4 Some Grad work 5 2
## 5 College Grad 5 5
## 6 Some College 1 5
ggplot(affairs_1ormore, aes(edu_descr, affairs))+
geom_boxplot(fill = "pink", color = "darkblue" )+
labs(x = "Education level" , y = "# of Affairs had")+
coord_flip()+
theme_classic()+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
# saving the final figure as .png fle 4" wide x 3" high
ggsave ("boxplot.png", width = 4, height = 3, dpi = 1000)
After studying and analyzing this dataset in R, I conclude that the educational group(s) that had the highest incidents of cheating/affairs were those who married 5 years or more and had either HS Diploma, some college or some Grad work had the highest probability of cheating. Interestingly enough, those who graduated college, or had Master’s or advanced level degrees had a lower probability of cheating.
##import from github directory
affairs <- read.csv('https://raw.githubusercontent.com/carolc57/RBridgeFinal/main/Affairs.csv', header = TRUE, sep = ",")