The data is a longitudinal record of divorce rates in the USA from 1920 to 1996. It comes from the divusa data set in the R package faraway. I have chosen this data set to begin my research on which variables influence the rate of divorce. The data set consists of 77 observations on 7 variables, as follows:
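library(faraway)   # package that provides the divusa data set
data(divusa)       # load the data into the session
str(divusa)        # show the structure: 77 observations of 7 variables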
## 'data.frame': 77 obs. of 7 variables:
## $ year : int 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 ...
## $ divorce : num 8 7.2 6.6 7.1 7.2 7.2 7.5 7.8 7.8 8 ...
## $ unemployed: num 5.2 11.7 6.7 2.4 5 3.2 1.8 3.3 4.2 3.2 ...
## $ femlab : num 22.7 22.8 22.9 23 23.1 ...
## $ marriage : num 92 83 79.7 85.2 80.3 79.2 78.7 77 74.1 75.5 ...
## $ birth : num 118 120 111 110 111 ...
## $ military : num 3.22 3.56 2.46 2.21 2.29 ...
The data set includes variables that characterize the social environment and how it changed over time. The variable year in particular stands in for many underlying factors that changed both the social environment and personal unions.
I hypothesize that year and the participation of women in the labor force (femlab) will have a strong influence on divorce, with higher values associated with more divorces. My reasoning is that women entered the workforce in large numbers during World War 2, a drastic change that affected the family unit and changed ideas about gender roles in society. I also hypothesize that as births increase, divorce will decline, since a growing family may strengthen unions and togetherness.
Before investigating each variable's relationship with divorce in more detail, I constructed five scatter plots to see whether there is a visual correlation between each independent variable and divorce.
Scatter plot 1 illustrates the relationship between year and divorce. The plot shows a positive correlation, meaning that divorce rates have generally increased over time. The visual also suggests that historical events may have contributed to this trend, because the slope increases sharply in 1940-1950, around the end of World War 2, and in 1970-1980, during the second wave of feminism.
Scatter plot 2 illustrates the relationship between the percentage of women participating in the labor force and divorces per 1000 women aged 15 or older. There is an obvious positive correlation between divorce and women entering the labor force. What I find interesting is how closely scatter plot 2 resembles scatter plot 1: both show a similar wave-like pattern with divorce.
Scatter plot 3 shows the relationship between births and divorce. This relationship does not look as direct as in the previous two plots, but there still appears to be a negative correlation between births and divorces. The relationship may be stronger than it looks, and fitting a regression is the way to quantify it.
Scatter plot 4 examines whether there is a relationship between the unemployment rate and divorce. At first glance, the points at the beginning of the plot look erratic, but a negative correlation starts to appear towards the end. From this brief overview, I would like to fit a regression of divorce on unemployment to better understand what the scatter plot shows, and to see whether the relationship is statistically significant.
Scatter plot 5 illustrates the relationship between military personnel per 1000 population and divorces per 1000 women aged 15 or older. Most years fall within 0-25 military personnel per 1000, and within this range divorce appears to decrease as the number of military personnel increases. However, I suspect this may not be statistically significant, because the points seem to cluster between 0 and 10 on the y axis.
library(tidyverse)  # loads ggplot2 for plotting

# Scatter plots 1-5: divorce against each candidate predictor
ggplot(divusa, aes(x = year, y = divorce)) + geom_point()        # 1: year
ggplot(divusa, aes(x = femlab, y = divorce)) + geom_point()      # 2: female labor force participation
ggplot(divusa, aes(x = birth, y = divorce)) + geom_point()       # 3: births
ggplot(divusa, aes(x = unemployed, y = divorce)) + geom_point()  # 4: unemployment
ggplot(divusa, aes(x = military, y = divorce)) + geom_point()    # 5: military personnel
To follow up on the descriptive analysis, I fitted several regressions to determine which independent variables play a greater role in affecting divorce.
d1 <- lm(divorce ~ year, data = divusa)  # Model 1: divorce on year
summary(d1)
##
## Call:
## lm(formula = divorce ~ year, data = divusa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7828 -1.8092 0.1592 1.6292 7.3048
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -422.97530 27.29465 -15.50 <2e-16 ***
## year 0.22280 0.01394 15.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.719 on 75 degrees of freedom
## Multiple R-squared: 0.7731, Adjusted R-squared: 0.77
## F-statistic: 255.5 on 1 and 75 DF, p-value: < 2.2e-16
The regression above shows that each additional year is associated with an increase of about 0.223 divorces per 1000 women (p < 2e-16). This is statistically significant, but again, I believe other factors that change over time interact with year to produce this trend.
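As a worked check on this interpretation, the fitted line from d1 can be evaluated by hand. This sketch hard-codes the rounded coefficients printed in the summary above (it does not refit the model) and computes the predicted divorce rate at the start and end of the series. The intercept itself is the prediction for year 0, far outside the data, so it has no substantive meaning.

```r
# Coefficients copied from summary(d1) above (rounded as printed)
b0 <- -422.97530  # intercept: predicted divorce rate in year 0 (pure extrapolation)
b1 <- 0.22280     # slope: change in divorces per 1000 women per year

# Predicted divorces per 1000 women from the simple linear model
predict_divorce <- function(year) b0 + b1 * year

fit_1920 <- predict_divorce(1920)
fit_1996 <- predict_divorce(1996)

# The predicted change over a span of years is just slope * span
change_per_decade <- b1 * 10
change_1920_1996 <- fit_1996 - fit_1920

print(c(fit_1920 = fit_1920, fit_1996 = fit_1996,
        per_decade = change_per_decade, total = change_1920_1996))
```

The model therefore implies roughly 2.2 more divorces per 1000 women per decade, or about 16.9 over 1920-1996.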
d2 <- lm(divorce ~ femlab, data = divusa)  # Model 2: divorce on femlab
summary(d2)
##
## Call:
## lm(formula = divorce ~ femlab, data = divusa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7264 -1.6385 0.1595 1.2211 8.0442
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.65527 0.92798 -3.939 0.000182 ***
## femlab 0.43867 0.02302 19.056 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.361 on 75 degrees of freedom
## Multiple R-squared: 0.8288, Adjusted R-squared: 0.8265
## F-statistic: 363.1 on 1 and 75 DF, p-value: < 2.2e-16
This regression truly alarmed me. It suggests that for every additional percentage point of women participating in the labor force, the divorce rate increases by about 0.439 per 1000 women.
This alarmed me because women in the workplace are becoming more accepted, but it also raises new research questions that I would like to explore. I wonder whether, if we brought this data up to date (1920 to 2017), this variable would still have such a statistically significant relationship with divorce.
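To make the 0.439 slope concrete, the sketch below plugs the first observation shown by str(divusa) (1920: femlab = 22.7, divorce = 8; note that str() rounds to three significant digits) into the fitted line from d2, using the printed coefficients rather than refitting, and computes the residual. It also shows what a 10-point rise in femlab implies.

```r
# Coefficients copied from summary(d2) above
b0 <- -3.65527  # intercept
b1 <- 0.43867   # change in divorces per 1000 women per percentage point of femlab

fitted_divorce <- function(femlab) b0 + b1 * femlab

# First row of divusa as displayed by str(): year 1920
femlab_1920 <- 22.7
divorce_1920 <- 8

fit_1920 <- fitted_divorce(femlab_1920)
resid_1920 <- divorce_1920 - fit_1920   # observed minus fitted

# Predicted change in divorce for a 10-percentage-point increase in femlab
change_10pts <- fitted_divorce(femlab_1920 + 10) - fit_1920

print(c(fitted = fit_1920, residual = resid_1920, change_10pts = change_10pts))
```

For 1920 the model predicts about 6.3 divorces per 1000 women against the observed 8, a residual of about 1.7, and a 10-point rise in femlab corresponds to about 4.4 more divorces per 1000 women.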
d3 <- lm(divorce ~ birth, data = divusa)  # Model 3: divorce on birth
summary(d3)
##
## Call:
## lm(formula = divorce ~ birth, data = divusa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.808 -1.991 1.150 2.884 7.359
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.90577 2.11124 15.112 < 2e-16 ***
## birth -0.20967 0.02321 -9.035 1.28e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.949 on 75 degrees of freedom
## Multiple R-squared: 0.5212, Adjusted R-squared: 0.5148
## F-statistic: 81.63 on 1 and 75 DF, p-value: 1.277e-13
The regression of divorce on birth shows that for every additional birth per 1000 women, divorce decreases by about 0.210 per 1000 women. This relationship is highly significant (p = 1.28e-13).
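One way to see how precisely the birth slope is estimated is a 95% confidence interval, estimate ± t(0.975, df) × standard error. The sketch below computes it from the printed estimate and standard error of d3 (75 residual degrees of freedom) and recomputes the two-sided p-value from the t statistic; with the fitted model object available, confint(d3) gives the same interval directly.

```r
# Estimate and standard error for birth, copied from summary(d3)
est <- -0.20967
se  <- 0.02321
df  <- 75          # residual degrees of freedom reported in the summary

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se

# Two-sided p-value recomputed from the t statistic
t_stat <- est / se
p_value <- 2 * pt(-abs(t_stat), df)

print(ci)
print(p_value)
```

The interval lies entirely below zero (roughly -0.256 to -0.163), which is consistent with the very small p-value.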
d4 <- lm(divorce ~ unemployed, data = divusa)  # Model 4: divorce on unemployed
summary(d4)
##
## Call:
## lm(formula = divorce ~ unemployed, data = divusa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.290 -4.262 -2.724 6.691 9.432
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.9542 1.1046 13.538 <2e-16 ***
## unemployed -0.2350 0.1259 -1.866 0.066 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.579 on 75 degrees of freedom
## Multiple R-squared: 0.04435, Adjusted R-squared: 0.03161
## F-statistic: 3.481 on 1 and 75 DF, p-value: 0.06599
According to the linear regression above, for every one-point increase in the unemployment rate, divorce decreases by about 0.235 per 1000 women. It would be interesting to look for underlying factors that explain why divorce falls as unemployment rises. However, this relationship is not statistically significant at the 0.05 level: the p-value for the t-test is 0.066.
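The significance decision can be reproduced from the printed estimate and standard error of d4. This sketch assumes the conventional 0.05 level and recomputes the t statistic, the two-sided p-value, and the matching 95% confidence interval.

```r
# Estimate and standard error for unemployed, copied from summary(d4)
est <- -0.2350
se  <- 0.1259
df  <- 75
alpha <- 0.05      # conventional significance level

t_stat  <- est / se                      # about -1.87
p_value <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
ci <- est + c(-1, 1) * qt(1 - alpha / 2, df) * se

significant <- p_value < alpha
print(c(t = t_stat, p = p_value))
print(ci)
print(significant)
```

The confidence interval (roughly -0.486 to 0.016) includes zero, which is the same conclusion as p > 0.05: the data cannot rule out no relationship between unemployment and divorce.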
d5 <- lm(divorce ~ femlab + birth, data = divusa)  # Model 5: femlab and birth together
summary(d5)
##
## Call:
## lm(formula = divorce ~ femlab + birth, data = divusa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2322 -1.3951 -0.3535 0.9792 8.4542
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.37560 2.06749 3.084 0.00287 **
## femlab 0.35985 0.02481 14.503 < 2e-16 ***
## birth -0.07864 0.01496 -5.258 1.36e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.028 on 74 degrees of freedom
## Multiple R-squared: 0.8754, Adjusted R-squared: 0.872
## F-statistic: 259.9 on 2 and 74 DF, p-value: < 2.2e-16
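Holding births constant, each additional percentage point of women in the labor force is associated with an increase of about 0.360 divorces per 1000 women. Holding female labor force participation constant, each additional birth per 1000 women is associated with a decrease of about 0.079 divorces per 1000 women. Both coefficients are statistically significant.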
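The phrase "holding births constant" can be checked directly on the fitted equation. The sketch below copies the coefficients from summary(d5); the femlab value 30 and the birth values 80 and 120 are hypothetical choices for the demonstration, not values taken from the data. Because Model 5 has no interaction term, raising femlab by one point changes the prediction by the femlab coefficient no matter what birth is fixed at.

```r
# Coefficients copied from summary(d5)
b0 <- 6.37560
b_femlab <- 0.35985
b_birth <- -0.07864

predict_d5 <- function(femlab, birth) b0 + b_femlab * femlab + b_birth * birth

# Illustrative values (hypothetical, chosen only for the demonstration)
femlab0 <- 30
births <- c(80, 120)

# Effect of a one-point increase in femlab, at each fixed birth rate
femlab_effect <- predict_d5(femlab0 + 1, births) - predict_d5(femlab0, births)

# Effect of one more birth per 1000 women, holding femlab fixed
birth_effect <- predict_d5(femlab0, 101) - predict_d5(femlab0, 100)

print(femlab_effect)  # same value at both birth rates
print(birth_effect)
```

Note also that the femlab coefficient shrinks from 0.439 in Model 2 to 0.360 here once births are taken into account.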
library(texreg)
screenreg(list(d1, d2, d3, d4, d5))
##
## ====================================================================
## Model 1 Model 2 Model 3 Model 4 Model 5
## --------------------------------------------------------------------
## (Intercept) -422.98 *** -3.66 *** 31.91 *** 14.95 *** 6.38 **
## (27.29) (0.93) (2.11) (1.10) (2.07)
## year 0.22 ***
## (0.01)
## femlab 0.44 *** 0.36 ***
## (0.02) (0.02)
## birth -0.21 *** -0.08 ***
## (0.02) (0.01)
## unemployed -0.23
## (0.13)
## --------------------------------------------------------------------
## R^2 0.77 0.83 0.52 0.04 0.88
## Adj. R^2 0.77 0.83 0.51 0.03 0.87
## Num. obs. 77 77 77 77 77
## RMSE 2.72 2.36 3.95 5.58 2.03
## ====================================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05
After evaluating the models individually, I created a summary table of all the models to help evaluate which variables have a stronger relationship with the dependent variable, divorce. The table also reports R^2, which ranges from 0 to 1 and measures the proportion of the variation in divorce explained by the model: the higher the R^2, the better the model fits. Among the single-variable models, femlab has the strongest relationship with divorce (R^2 = 0.83), and adding birth in Model 5 raises R^2 to 0.88.
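Because Model 5 has two predictors while the others have one, adjusted R^2 is a fairer basis for comparison: adj R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below recomputes it from the R^2 values printed in the individual summaries and ranks the models; it reproduces the adjusted values R reported, up to rounding.

```r
# Multiple R-squared from each summary() above, and number of predictors
r2 <- c(d1 = 0.7731, d2 = 0.8288, d3 = 0.5212, d4 = 0.04435, d5 = 0.8754)
p  <- c(d1 = 1, d2 = 1, d3 = 1, d4 = 1, d5 = 2)
n  <- 77

# Adjusted R^2 penalizes each additional predictor
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Rank models from best to worst fit
print(round(sort(adj_r2, decreasing = TRUE), 4))
```

Model 5 still ranks first after the penalty for its extra predictor, followed by the femlab-only model.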
d8 <- lm(divorce ~ year * femlab, data = divusa)  # main effects plus year:femlab interaction
summary(d8)
##
## Call:
## lm(formula = divorce ~ year * femlab, data = divusa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2405 -1.6008 -0.0959 0.9140 8.4805
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.125e+02 1.338e+02 2.336 0.0222 *
## year -1.675e-01 7.088e-02 -2.363 0.0208 *
## femlab 5.548e-01 2.985e+00 0.186 0.8531
## year:femlab 9.726e-05 1.466e-03 0.066 0.9473
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.307 on 73 degrees of freedom
## Multiple R-squared: 0.841, Adjusted R-squared: 0.8345
## F-statistic: 128.7 on 3 and 73 DF, p-value: < 2.2e-16
From my earlier analysis, the relationships of year and femlab with divorce looked very similar. Before fitting the interaction, I already suspected that year does not change the effect of femlab on divorce, and that there is an underlying factor not included in this data set. Still, I tested for an interaction between year and femlab. The interaction coefficient indicates that the slope of femlab increases by only 0.0000973 for each additional year, and this is not statistically significant (p = 0.9473). The femlab main effect is also no longer significant (p = 0.8531), and its standard error jumps from 0.023 in Model 2 to 2.985 here, which is consistent with year, femlab, and their product being strongly correlated with one another.
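With an interaction term, the slope of femlab is no longer a single number: it equals the femlab coefficient plus the interaction coefficient times year. The sketch below evaluates that slope at the two ends of the data using the coefficients printed by summary(d8), which are rounded to four significant digits, so the results are approximate.

```r
# Coefficients copied from summary(d8) (rounded to 4 significant digits)
b_femlab <- 5.548e-01
b_inter  <- 9.726e-05   # year:femlab

# Slope of divorce with respect to femlab at a given year
femlab_slope <- function(year) b_femlab + b_inter * year

slopes <- femlab_slope(c(1920, 1996))
names(slopes) <- c("1920", "1996")
print(round(slopes, 4))

# How much the femlab slope changes over the whole 1920-1996 period
print(diff(slopes))
```

The slope barely moves (by about 0.007 over 76 years), and the interaction's standard error (1.466e-03) is roughly 15 times the estimate, which is why it is far from significant.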
d7 <- lm(divorce ~ birth * femlab, data = divusa)  # main effects plus birth:femlab interaction
summary(d7)
##
## Call:
## lm(formula = divorce ~ birth * femlab, data = divusa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0208 -0.9859 -0.3185 0.4497 8.3239
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.914155 4.781684 -2.073 0.041668 *
## birth 0.121133 0.055510 2.182 0.032316 *
## femlab 0.784409 0.116536 6.731 3.27e-09 ***
## birth:femlab -0.005421 0.001459 -3.716 0.000394 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.873 on 73 degrees of freedom
## Multiple R-squared: 0.8952, Adjusted R-squared: 0.8909
## F-statistic: 207.9 on 3 and 73 DF, p-value: < 2.2e-16
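For my final attempt at finding an interaction, I tested whether births per 1000 women change the relationship between femlab and divorce. In this model the femlab coefficient (0.784) is the slope of femlab when births equal zero, a value far outside the data, and the interaction coefficient means that each additional birth per 1000 women reduces that slope by about 0.0054. In other words, female labor force participation is associated with a smaller increase in divorce in years with higher birth rates. Because these are yearly national rates, this describes years rather than individual women. The interaction is statistically significant (p = 0.000394), but the main effects are much less significant than in the single-variable models (for example, birth has p = 0.032 here versus 1.28e-13 in Model 3), and the birth coefficient even changes sign from negative to positive, so the main effects in an interaction model should not be read in isolation.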
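The interaction is easier to read as a femlab slope that depends on the birth rate: slope = 0.784409 - 0.005421 × birth. The sketch below evaluates it at a few birth values (hypothetical choices for the demonstration, not values taken from the data) and finds the birth rate at which the slope would reach zero, using the coefficients printed by summary(d7).

```r
# Coefficients copied from summary(d7)
b_femlab <- 0.784409
b_inter  <- -0.005421   # birth:femlab

# Slope of divorce with respect to femlab at a given birth rate
femlab_slope <- function(birth) b_femlab + b_inter * birth

# Illustrative birth rates (births per 1000 women); hypothetical values
births <- c(80, 100, 120)
print(round(femlab_slope(births), 4))

# Birth rate at which the femlab slope would be zero
zero_point <- -b_femlab / b_inter
print(zero_point)
```

The femlab slope falls from about 0.35 at 80 births to about 0.13 at 120 births, and would only reach zero at roughly 145 births per 1000 women.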