Week Six: Interaction Effects.

I had to change the dataset for this homework from the previous one because the previous one did not have enough variables to test. The Affairs dataset contains 601 cases and 9 variables: affairs, gender, age, yearsmarried, children, religiousness, education, occupation, and rating. Affairs is the number of affairs reported for each case. Gender is female or male, which I later recoded as 1 = female and 0 = male. Children records whether the case has children (yes or no), which I later recoded as 1 = yes and 0 = no. Religiousness measures how religious the person is, from 1 (not religious) to 5 (extremely religious). Education is the number of years of education received. The descriptions of occupation and rating were not stated.

Focusing on the variables affairs, children, and yearsmarried, I wanted to see whether the number of years married and having children are related to the number of affairs someone has, and whether the effect of years married differs by whether someone has children.

Reading in and organizing data.

Here I loaded all the packages I will be using and read in my dataset. The dependent variable I am using will be affairs.

As stated earlier, I had to recode the gender variable so that female = 1 and male = 0. I also recoded children so that yes = 1 and no = 0.

library(readr)
library(dplyr)
library(tibble)
library(ggplot2)
library(texreg)
affairs_csv <- read.csv("C:/Users/Jessica/Desktop/712/affairs.csv")

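# Recode gender and children as 0/1 indicator variables
affairs2 <- affairs_csv %>%
  mutate(female = ifelse(gender == "female", 1, 0),    # 1 = female, 0 = male
         children = ifelse(children == "yes", 1, 0))   # 1 = has children, 0 = no children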

Distribution of the outcome variables and explanatory variables.

Here I took a look at the distribution of my variables.

head(affairs2)
##    X affairs gender age yearsmarried children religiousness education
## 1  4       0   male  37        10.00        0             3        18
## 2  5       0 female  27         4.00        0             4        14
## 3 11       0 female  32        15.00        1             1        12
## 4 16       0   male  57        15.00        1             5        18
## 5 23       0   male  22         0.75        0             2        17
## 6 29       0 female  32         1.50        0             2        17
##   occupation rating female
## 1          7      4      0
## 2          6      4      1
## 3          1      4      1
## 4          6      5      0
## 5          6      3      0
## 6          5      5      1
names(affairs2)
##  [1] "X"             "affairs"       "gender"        "age"          
##  [5] "yearsmarried"  "children"      "religiousness" "education"    
##  [9] "occupation"    "rating"        "female"

Children: 171 of the cases had no children (coded 0) and 430 of the cases had children (coded 1).

ggplot(affairs2, aes(x = children)) + geom_histogram()

table(affairs2$children)
## 
##   0   1 
## 171 430

Affairs: 451 cases had 0 affairs, 34 cases had 1, 17 cases had 2, 19 cases had 3, 42 cases had 7, and 38 cases had 12.

ggplot(affairs2, aes(x = affairs)) + geom_histogram()

table(affairs2$affairs)
## 
##   0   1   2   3   7  12 
## 451  34  17  19  42  38

Gender: 315 females and 286 males.

# gender is categorical, so a bar chart of counts is used instead of a histogram
ggplot(affairs2, aes(x = gender)) + geom_bar()

table(affairs2$gender)
## 
## female   male 
##    315    286

Regression Analysis

In the regression analysis I used affairs as my dependent variable, years married as my first independent variable, and children as my second independent variable as well as my interaction variable.

My first model shows a statistically significant positive relationship between years married and affairs. People with zero years of marriage have, on average, a predicted 0.55122 affairs. Each additional year of marriage is associated with an increase of 0.11063 affairs.

In my second model, I added children as my second independent variable. The regression shows the difference in the number of affairs between those who have no children and those who have children, holding years married constant. On average, those who have children have 0.03 fewer affairs than those who have no children, but this difference is not statistically significant (p = 0.93). The second model does not improve on the first: R-squared stays at about 0.035.

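In my third model, years married and children were interacted to see whether the effect of years married differs by whether someone has children. The interaction is statistically significant (p = 0.003). For those without children, each additional year of marriage is associated with 0.30 more affairs on average. For those with children, that slope is 0.23 lower, which leaves an increase of about 0.07 affairs per year.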

m1 <- lm(affairs ~ yearsmarried,
         data = affairs2)
summary(m1)
## 
## Call:
## lm(formula = affairs ~ yearsmarried, data = affairs2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2106 -1.6575 -0.9937 -0.5974 11.3658 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.55122    0.23511   2.345   0.0194 *  
## yearsmarried  0.11063    0.02377   4.655    4e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.243 on 599 degrees of freedom
## Multiple R-squared:  0.03491,    Adjusted R-squared:  0.0333 
## F-statistic: 21.67 on 1 and 599 DF,  p-value: 3.996e-06
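
To make the Model 1 interpretation concrete, the fitted line can be evaluated by hand. This is only a minimal sketch: it hard-codes the two coefficients from the summary(m1) output above instead of refitting the model, and the names b0, b1, and predict_affairs are hypothetical helpers introduced for illustration.

```r
# Coefficients copied from the summary(m1) output above
b0 <- 0.55122   # intercept: predicted affairs at 0 years married
b1 <- 0.11063   # change in predicted affairs per additional year married

# Hypothetical helper: predicted number of affairs for given years married
predict_affairs <- function(years) b0 + b1 * years

# Predictions at 0, 1, 10, and 15 years of marriage
predict_affairs(c(0, 1, 10, 15))
```

With the fitted model object available, predict(m1, newdata = data.frame(yearsmarried = c(0, 1, 10, 15))) should give the same values up to rounding.
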
m2 <- lm(affairs ~ yearsmarried + children,
         data = affairs2)
summary(m2)
## 
## Call:
## lm(formula = affairs ~ yearsmarried + children, data = affairs2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2446 -1.6509 -0.9780 -0.5763 11.3865 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.56226    0.26424   2.128 0.033758 *  
## yearsmarried  0.11216    0.02902   3.865 0.000123 ***
## children     -0.03288    0.35804  -0.092 0.926868    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.246 on 598 degrees of freedom
## Multiple R-squared:  0.03492,    Adjusted R-squared:  0.0317 
## F-statistic: 10.82 on 2 and 598 DF,  p-value: 2.421e-05
m3 <- lm(affairs ~ yearsmarried*children, data = affairs2)
summary(m3)
## 
## Call:
## lm(formula = affairs ~ yearsmarried * children, data = affairs2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5256 -2.0239 -1.2196 -0.1911 11.0180 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -0.03701    0.32971  -0.112  0.91067    
## yearsmarried           0.30417    0.07013   4.337 1.69e-05 ***
## children               0.96414    0.48651   1.982  0.04797 *  
## yearsmarried:children -0.23106    0.07693  -3.003  0.00278 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.225 on 597 degrees of freedom
## Multiple R-squared:  0.04929,    Adjusted R-squared:  0.04451 
## F-statistic: 10.32 on 3 and 597 DF,  p-value: 1.252e-06
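
In a model with an interaction, the yearsmarried coefficient is the slope only for the group coded 0 on children. The slope for the group coded 1 is the main effect plus the interaction term. The sketch below works this out from the summary(m3) output above; the coefficients are hard-coded rather than pulled from the model, and the vector b and its element names are assumptions made for illustration.

```r
# Coefficients copied from the summary(m3) output above
b <- c(intercept    = -0.03701,
       yearsmarried =  0.30417,
       children     =  0.96414,
       interaction  = -0.23106)

# Slope of yearsmarried when children = 0: the main effect alone
slope_no_children <- b[["yearsmarried"]]

# Slope of yearsmarried when children = 1: main effect plus interaction
slope_with_children <- b[["yearsmarried"]] + b[["interaction"]]

round(c(no_children = slope_no_children,
        with_children = slope_with_children), 3)
```

With the fitted object, the same values could be read from coef(m3) instead of being typed in.
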
htmlreg(list(m1, m2, m3), doctype = FALSE)
Statistical models
                        Model 1    Model 2    Model 3
(Intercept)             0.55*      0.56*      -0.04
                        (0.24)     (0.26)     (0.33)
yearsmarried            0.11***    0.11***    0.30***
                        (0.02)     (0.03)     (0.07)
children                           -0.03      0.96*
                                   (0.36)     (0.49)
yearsmarried:children                         -0.23**
                                              (0.08)
R2                      0.03       0.03       0.05
Adj. R2                 0.03       0.03       0.04
Num. obs.               601        601        601
RMSE                    3.24       3.25       3.22
*** p < 0.001, ** p < 0.01, * p < 0.05
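
One way to check whether the interaction improves the fit is a nested-model F test between Model 2 and Model 3. The sketch below computes it from the R-squared values printed in the model summaries, so the result is approximate because those values are rounded. The variable names are hypothetical.

```r
# Values copied from the model summaries above (rounded, so results are approximate)
r2_m2  <- 0.03492   # R-squared of Model 2 (no interaction)
r2_m3  <- 0.04929   # R-squared of Model 3 (with interaction)
df_res <- 597       # residual degrees of freedom of Model 3
q      <- 1         # number of terms added going from Model 2 to Model 3

# Nested-model F statistic: gain in R-squared per added term,
# relative to the unexplained share per residual degree of freedom
f_stat <- ((r2_m3 - r2_m2) / q) / ((1 - r2_m3) / df_res)
p_val  <- pf(f_stat, q, df_res, lower.tail = FALSE)

c(F = f_stat, p = p_val)
```

With the fitted objects, anova(m2, m3) runs the same comparison directly. Because only one term is added, the F statistic should be close to the square of the interaction's t value (-3.003 squared is about 9.02).
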

Graphs.

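The first graph shows that the number of affairs increases as years married increase. As the visreg warning notes, this plot holds children at 1, so it shows the relationship only for people with children.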

The second graph shows the effect of having children at three values of years married. At 1.5 years of marriage, people with children are predicted to have slightly more affairs than people without children. At 7 years of marriage, people with children are predicted to have somewhat fewer affairs. At 15 years of marriage, the gap is larger, with people with children predicted to have noticeably fewer affairs.

The third graph shows that affairs increase with years married in both groups, but the increase is steeper for people with no children than for people with children.

library(visreg)
visreg(m3, "yearsmarried", scale = "response")
## Warning:   Note that you are attempting to plot a 'main effect' in a model that contains an
##   interaction.  This is potentially misleading; you may wish to consider using the 'by'
##   argument.
## Conditions used in construction of plot
## children: 1

visreg(m3,"children", by = "yearsmarried", scale = "response")

visreg(m3, "yearsmarried", by = "children", scale = "response")
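
The pattern in the second graph can also be checked by hand. Under Model 3, the predicted gap in affairs between people with and without children changes with years married: it is the children coefficient plus the interaction coefficient times years married. The sketch below evaluates that gap at the three values of years married discussed above, using coefficients hard-coded from the summary(m3) output; children_gap and crossover are hypothetical names.

```r
# Coefficients copied from the summary(m3) output above
b_children    <-  0.96414   # gap (children vs. none) at 0 years married
b_interaction <- -0.23106   # change in that gap per additional year married

# Hypothetical helper: predicted gap in affairs, children minus no children
children_gap <- function(years) b_children + b_interaction * years

# Gap at the three yearsmarried values described for the second graph
round(children_gap(c(1.5, 7, 15)), 2)

# Years married at which the predicted gap is zero
crossover <- -b_children / b_interaction
round(crossover, 1)
```

The gap is positive at 1.5 years and negative at 7 and 15 years, and the crossover falls between 1.5 and 7 years, which matches the change of direction described for the second graph.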