I switched to a different dataset for this homework because the previous one did not have enough variables to test. The Affairs dataset contains 601 cases and 9 variables: affairs, gender, age, yearsmarried, children, religiousness, education, occupation, and rating. Affairs is the number of affairs reported per case. Gender is recorded as female or male, which I later recoded as 1 = female and 0 = male. Children records whether each case has children (yes or no), which I recoded as 1 = yes and 0 = no. Religiousness measures how religious the person is, from 1 (not religious) to 5 (extremely religious). Education counts each year of education received. The descriptions of occupation and rating were not stated in the source I used.
Focusing on the variables affairs, children, and yearsmarried, I wanted to see whether the number of years married and having children are related to the number of affairs someone has.
Here I read in my dataset and loaded all the packages I will be using. The dependent variable is affairs.
As stated earlier, I recoded the gender variable so that female = 1 and male = 0. I also recoded children so that yes = 1 and no = 0.
library(readr)
library(dplyr)
library(tibble)
library(ggplot2)
library(texreg)
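# Read the data; adjust the path to wherever affairs.csv is saved
affairs_csv <- read.csv("C:/Users/Jessica/Desktop/712/affairs.csv")
# Recode gender and children as 0/1 indicator variables
affairs2 <- affairs_csv %>%
  mutate(female = ifelse(gender == "female", 1, 0),
         children = ifelse(children == "yes", 1, 0))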
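A quick way to confirm a recode like this worked is to cross-tabulate the original labels against the new 0/1 values. The sketch below uses a small made-up data frame (not the real affairs.csv) and base R only, so the names toy and has_children are placeholders for illustration.

```r
# Toy data standing in for affairs.csv (values invented for illustration)
toy <- data.frame(
  gender   = c("male", "female", "female", "male"),
  children = c("no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Same ifelse() recoding used above, written in base R
toy$female       <- ifelse(toy$gender == "female", 1, 0)
toy$has_children <- ifelse(toy$children == "yes", 1, 0)

# Each original label should line up with exactly one recoded value
table(toy$gender, toy$female)
table(toy$children, toy$has_children)
```

If the recode is correct, each row of these tables has all of its counts in a single column.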
Here I took a look at the distribution of my variables.
head(affairs2)
##    X affairs gender age yearsmarried children religiousness education
## 1  4       0   male  37        10.00        0             3        18
## 2  5       0 female  27         4.00        0             4        14
## 3 11       0 female  32        15.00        1             1        12
## 4 16       0   male  57        15.00        1             5        18
## 5 23       0   male  22         0.75        0             2        17
## 6 29       0 female  32         1.50        0             2        17
##   occupation rating female
## 1          7      4      0
## 2          6      4      1
## 3          1      4      1
## 4          6      5      0
## 5          6      3      0
## 6          5      5      1
names(affairs2)
## [1] "X" "affairs" "gender" "age"
## [5] "yearsmarried" "children" "religiousness" "education"
## [9] "occupation" "rating" "female"
Children: 171 of the cases had no children (coded 0) and 430 had children (coded 1).
ggplot(affairs2, aes(x = children)) + geom_histogram()
table(affairs2$children)
##
## 0 1
## 171 430
Affairs: 451 cases had 0 affairs, 34 cases had 1, 17 cases had 2, 19 cases had 3, 42 cases had 7, and 38 cases had 12.
ggplot(affairs2, aes(x = affairs)) + geom_histogram()
table(affairs2$affairs)
##
## 0 1 2 3 7 12
## 451 34 17 19 42 38
Gender: 315 females and 286 males.
ggplot(affairs2, aes(x = gender)) + geom_bar()
table(affairs2$gender)
##
## female male
## 315 286
In the regression analysis I used affairs as my dependent variable, years married as my first independent variable, and children as my second independent variable as well as my interaction variable.
My first model shows a statistically significant relationship between years married and affairs (p < 0.001). The intercept of 0.55 is the predicted number of affairs for someone with zero years of marriage, and each additional year of marriage is associated with an increase of 0.11 affairs on average.
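To make the first model's slope concrete, the sketch below plugs a few marriage lengths into the fitted line, using the coefficients copied from the summary(m1) output in this report. The chosen years and the variable names are only for illustration.

```r
# Coefficients copied from the summary(m1) output in this report
b0 <- 0.55122  # intercept
b1 <- 0.11063  # slope for yearsmarried

# Hypothetical marriage lengths to evaluate (chosen for illustration)
years <- c(0, 1, 5, 10, 15)

# Predicted affairs = intercept + slope * years married
predicted <- b0 + b1 * years
data.frame(yearsmarried = years, predicted_affairs = round(predicted, 3))
```

For example, at 10 years of marriage the predicted value is 0.55122 + 0.11063 × 10, or about 1.66 affairs.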
In my second model, I added children as my second independent variable. The children coefficient is the difference in affairs between those who have children and those who do not, holding years married constant. On average, those with children have 0.03 fewer affairs than those without, but this difference is not statistically significant (p = 0.93). R² stays at about 0.035, so the second model does not improve on the first.
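In my third model, I added the interaction between years married and children. The interaction coefficient of -0.23 is statistically significant (p = 0.003), which means the effect of years married depends on whether someone has children. For those with no children, each additional year of marriage is associated with 0.30 more affairs on average, while for those with children the increase is only about 0.07 affairs per year.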
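Written out with the coefficients from the summary(m3) output, the third model gives one fitted line for each group. This is just the reported estimates rearranged, not a new result:

```latex
\begin{align*}
\widehat{\text{affairs}} &= -0.037 + 0.304\,\text{yearsmarried} + 0.964\,\text{children}
  - 0.231\,(\text{yearsmarried} \times \text{children}) \\
\text{children} = 0:\quad \widehat{\text{affairs}} &= -0.037 + 0.304\,\text{yearsmarried} \\
\text{children} = 1:\quad \widehat{\text{affairs}} &= (-0.037 + 0.964) + (0.304 - 0.231)\,\text{yearsmarried} \\
  &= 0.927 + 0.073\,\text{yearsmarried}
\end{align*}
```

The children coefficient of 0.96 on its own is therefore the gap between the two groups only at zero years of marriage.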
m1 <- lm(affairs ~ yearsmarried,
data = affairs2)
summary(m1)
##
## Call:
## lm(formula = affairs ~ yearsmarried, data = affairs2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2106 -1.6575 -0.9937 -0.5974 11.3658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.55122 0.23511 2.345 0.0194 *
## yearsmarried 0.11063 0.02377 4.655 4e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.243 on 599 degrees of freedom
## Multiple R-squared: 0.03491, Adjusted R-squared: 0.0333
## F-statistic: 21.67 on 1 and 599 DF, p-value: 3.996e-06
m2 <- lm(affairs ~ yearsmarried + children,
data = affairs2)
summary(m2)
##
## Call:
## lm(formula = affairs ~ yearsmarried + children, data = affairs2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2446 -1.6509 -0.9780 -0.5763 11.3865
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.56226 0.26424 2.128 0.033758 *
## yearsmarried 0.11216 0.02902 3.865 0.000123 ***
## children -0.03288 0.35804 -0.092 0.926868
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.246 on 598 degrees of freedom
## Multiple R-squared: 0.03492, Adjusted R-squared: 0.0317
## F-statistic: 10.82 on 2 and 598 DF, p-value: 2.421e-05
m3 <- lm(affairs ~ yearsmarried*children, data = affairs2)
summary(m3)
##
## Call:
## lm(formula = affairs ~ yearsmarried * children, data = affairs2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5256 -2.0239 -1.2196 -0.1911 11.0180
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03701 0.32971 -0.112 0.91067
## yearsmarried 0.30417 0.07013 4.337 1.69e-05 ***
## children 0.96414 0.48651 1.982 0.04797 *
## yearsmarried:children -0.23106 0.07693 -3.003 0.00278 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.225 on 597 degrees of freedom
## Multiple R-squared: 0.04929, Adjusted R-squared: 0.04451
## F-statistic: 10.32 on 3 and 597 DF, p-value: 1.252e-06
htmlreg(list(m1, m2, m3), doctype = FALSE)
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | 0.55* | 0.56* | -0.04 |
| | (0.24) | (0.26) | (0.33) |
| yearsmarried | 0.11*** | 0.11*** | 0.30*** |
| | (0.02) | (0.03) | (0.07) |
| children | | -0.03 | 0.96* |
| | | (0.36) | (0.49) |
| yearsmarried:children | | | -0.23** |
| | | | (0.08) |
| R² | 0.03 | 0.03 | 0.05 |
| Adj. R² | 0.03 | 0.03 | 0.04 |
| Num. obs. | 601 | 601 | 601 |
| RMSE | 3.24 | 3.25 | 3.22 |
| *** p < 0.001, ** p < 0.01, * p < 0.05 | | | |
The first graph shows that affairs increase as years married increase. As the visreg output notes, this line is drawn for cases with children (children = 1), so it shows only one side of the interaction.
The second graph compares those without and with children at three lengths of marriage. At 1.5 years of marriage, those with children have slightly more affairs on average than those without. At 7 years of marriage, those with children have slightly fewer affairs. At 15 years of marriage, the gap is larger, with those with children having clearly fewer affairs.
The third graph shows that for those with no children, affairs increase at a higher rate as the number of years married increases, while for those with children affairs also increase but at a slower rate.
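As a rough check on the second graph, the gap between those with and without children at a given length of marriage can be computed from the Model 3 coefficients alone: it is the children coefficient plus the interaction coefficient times years married. The sketch below uses the estimates from the summary(m3) output and the three marriage lengths described for the second graph; the variable names are only for illustration.

```r
# Coefficients copied from the summary(m3) output in this report
b_children    <- 0.96414   # children
b_interaction <- -0.23106  # yearsmarried:children

# Marriage lengths described for the second graph
years <- c(1.5, 7, 15)

# Predicted difference in affairs (with children minus without)
gap <- b_children + b_interaction * years
data.frame(yearsmarried = years, gap_with_vs_without = round(gap, 2))

# Years of marriage at which the predicted gap is zero
crossover <- -b_children / b_interaction
round(crossover, 2)
```

Under these estimates the gap is positive at 1.5 years, negative at 7 years, and more strongly negative at 15 years, switching sign at a little over 4 years of marriage.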
library(visreg)
visreg(m3, "yearsmarried", scale = "response")
## Warning: Note that you are attempting to plot a 'main effect' in a model that contains an
## interaction. This is potentially misleading; you may wish to consider using the 'by'
## argument.
## Conditions used in construction of plot
## children: 1
visreg(m3,"children", by = "yearsmarried", scale = "response")
visreg(m3, "yearsmarried", by = "children", scale = "response")