For this assignment, I will be using data from the 2014 General Social Survey. To see whether I can find variables that are actually related this time, I will be using somewhat different data from the abortion variables I used in previous assignments. This time, I will be looking at how gender relates to various work characteristics. Hopefully I will find some interesting relationships specific to the 2014 data! Read on…
library(Zelig)
library(foreign)
library(DescTools)
library(ggplot2)
library(Hmisc)
d <- read.dta("/Users/laurenberkowitz/Downloads/GSS2014.DTA", convert.factors = FALSE)
names(d)
library(dplyr)
library(tidyr)
library(pander)
library(car)
ExamWork <- select(d, age, sex, marital, educ, race, yearsjob, wrkhome, famwkoff, famvswk, hrsrelax, satjob)
names(ExamWork)
Variables include:
AGE Respondent’s Age
SEX Respondent’s Sex
RACE Respondent’s Race
MARITAL Marital Status
EDUC Highest year of school completed
YEARSJOB Years at present job
WRKHOME Frequency of working from home
FAMWKOFF Difficulty in taking off work for family
FAMVSWK Reverse Frequency of family interfering with work
HRSRELAX Hours per week to relax
SATJOB Reverse Job Satisfaction
ExamWork <- na.omit(ExamWork)
ExamWork$educ <- as.numeric(ExamWork$educ)
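My original question was whether sex influences the relationship between working from home and the number of hours per week available to relax. To set up the regressions, though, I first want to see how different variables relate to marital status. I examine age, sex, years at the present job, frequency of working from home, hours to relax, and difficulty taking time off work for family. First I converted marital to a binary variable. One caution about the coding: the recode below assigns 1 to marital code 5 and 0 to codes 1 through 4. In the GSS codebook, code 1 is married and code 5 is never married, so BinaryMarital = 1 identifies respondents who have never married, and BinaryMarital = 0 everyone who is or has been married. I also created WhiteRace, which is 1 for white respondents (race code 1) and 0 otherwise.
# Codes 1-4 (married, widowed, divorced, separated) -> 0; code 5 (never married) -> 1
ExamWork$BinaryMarital <- car::recode(ExamWork$marital, "c(1,2,3,4)=0; 5=1")
# Code 1 (white) -> 1; codes 2 and 3 (black, other) -> 0
ExamWork$WhiteRace <- car::recode(ExamWork$race, "1=1; c(2,3)=0")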
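A quick cross-tabulation is a cheap way to confirm which category ends up coded 1 after a recode like the ones above. The sketch below uses a made-up vector of marital codes (not GSS data) and base R only. The code labels in the comments follow the usual GSS coding, which is an assumption worth checking against the codebook.

```r
# Toy marital codes, invented for illustration (not GSS data).
# Assumed labels: 1 married, 2 widowed, 3 divorced, 4 separated, 5 never married
marital <- c(1, 5, 3, 1, 2, 5, 4, 1)

# Same mapping as the recode above: codes 1-4 -> 0, code 5 -> 1
binary_marital <- ifelse(marital == 5, 1, 0)

# Cross-tabulate the original codes against the new indicator
check <- table(marital, binary_marital)
print(check)
```

Every row of the table should have all of its counts in a single column; the row for code 5 is the only one that lands in the column for 1.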
Then I ran the regressions:
wkeffect1 <- glm(BinaryMarital ~ age + sex + yearsjob + wrkhome, family = binomial, data=ExamWork)
wkeffect2 <- glm(BinaryMarital ~ age + sex + yearsjob + wrkhome + hrsrelax, family = binomial, data=ExamWork)
wkeffect3 <- glm(BinaryMarital ~ age + sex + yearsjob + wrkhome + hrsrelax + famwkoff, family = binomial, data=ExamWork)
In ggplot2 I’d like to show the distribution of variables:
G <- ggplot(ExamWork, mapping = aes(x = age, y = BinaryMarital))
g1 <- G + geom_smooth()
g1
## geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
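The smoothed curve falls with age: the older the respondent, the lower the share with BinaryMarital = 1. Because the recode assigns 1 to marital code 5, that is the share who have never married.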
G2 <- ggplot(ExamWork, mapping = aes(x=wrkhome, y = BinaryMarital))
G3 <- G2 + geom_point() + stat_summary(fun.data = "mean_cl_boot", colour = "red")
G3
You can see here there is no conclusive relationship between working from home and marital status.
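With a 0/1 outcome, a table of group proportions can be easier to read than a scatter of points stacked at 0 and 1. The sketch below is a minimal base R version on invented data (the data frame `toy` and its values are hypothetical, not GSS data); it reports the share of 1s and the number of cases in each category.

```r
# Toy data, invented for illustration (not GSS data)
toy <- data.frame(
  wrkhome = c(1, 1, 2, 2, 3, 3, 3, 4),
  y       = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# Share of y == 1 and number of cases within each wrkhome category
share <- tapply(toy$y, toy$wrkhome, mean)
n     <- tapply(toy$y, toy$wrkhome, length)
print(cbind(share, n))
```

Reporting `n` next to each share matters because a proportion based on a handful of respondents can swing widely.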
library(stargazer)
##
## Please cite as:
##
## Hlavac, Marek (2014). stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables.
## R package version 5.1. http://CRAN.R-project.org/package=stargazer
stargazer(wkeffect1, wkeffect2, wkeffect3, type="html")
Dependent variable: BinaryMarital
| | (1) | (2) | (3) |
|---|---|---|---|
| age | -0.084*** | -0.085*** | -0.085*** |
| | (0.007) | (0.007) | (0.007) |
| sex | 0.071 | 0.112 | 0.116 |
| | (0.144) | (0.145) | (0.145) |
| yearsjob | -0.013 | -0.013 | -0.013 |
| | (0.012) | (0.012) | (0.012) |
| wrkhome | -0.138*** | -0.130*** | -0.130*** |
| | (0.044) | (0.045) | (0.045) |
| hrsrelax | | 0.063** | 0.067** |
| | | (0.028) | (0.028) |
| famwkoff | | | 0.060 |
| | | | (0.071) |
| Constant | 2.786*** | 2.536*** | 2.375*** |
| | (0.351) | (0.366) | (0.413) |
| Observations | 1,218 | 1,218 | 1,218 |
| Log Likelihood | -592.448 | -590.026 | -589.675 |
| Akaike Inf. Crit. | 1,194.895 | 1,192.053 | 1,193.351 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
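The Akaike values in the table above can be reproduced from the reported log likelihoods using AIC = 2k - 2 log L, where k is the number of estimated coefficients including the constant. The short check below uses only the numbers printed in the table; the names `m1` to `m3` are just labels for columns (1) to (3).

```r
# Log likelihoods reported in the table above
log_lik <- c(m1 = -592.448, m2 = -590.026, m3 = -589.675)

# Number of estimated coefficients in each model, including the constant
k <- c(m1 = 5, m2 = 6, m3 = 7)

# AIC = 2k - 2 * log-likelihood; lower is better
aic <- 2 * k - 2 * log_lik
print(round(aic, 3))

# Label of the model with the lowest AIC
best <- names(which.min(aic))
print(best)
```

Small differences in the last digit relative to the table come from the log likelihoods being rounded to three decimals.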
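Some of these predictors are statistically significant and others are not. With that in mind, and comparing model fit using the Akaike information criterion (AIC, where a lower value indicates a better balance of fit and complexity), I am going to rerun the regressions with a reduced set of predictors: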
rewkeffect1 <- glm(BinaryMarital ~ age + wrkhome, family = binomial, data=ExamWork)
rewkeffect2 <- glm(BinaryMarital ~ age + wrkhome + hrsrelax, family = binomial, data=ExamWork)
rewkeffect3 <- glm(BinaryMarital ~ age + wrkhome + hrsrelax + WhiteRace, family = binomial, data=ExamWork)
stargazer(rewkeffect1, rewkeffect2, rewkeffect3, type="html")
Dependent variable: BinaryMarital
| | (1) | (2) | (3) |
|---|---|---|---|
| age | -0.088*** | -0.089*** | -0.088*** |
| | (0.006) | (0.007) | (0.007) |
| wrkhome | -0.141*** | -0.135*** | -0.128*** |
| | (0.044) | (0.045) | (0.045) |
| hrsrelax | | 0.059** | 0.057** |
| | | (0.028) | (0.028) |
| WhiteRace | | | -0.615*** |
| | | | (0.156) |
| Constant | 2.972*** | 2.796*** | 3.178*** |
| | (0.268) | (0.279) | (0.301) |
| Observations | 1,218 | 1,218 | 1,218 |
| Log Likelihood | -593.181 | -590.983 | -583.252 |
| Akaike Inf. Crit. | 1,192.361 | 1,189.966 | 1,176.505 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
These results show every retained variable to be statistically significant, and the third model (rewkeffect3) fits best, with the lowest Akaike information criterion at 1,176.505. The coefficients are log odds of BinaryMarital = 1, which the recode assigns to marital code 5 (never married), so they should not be read as direct changes in probability. Holding the other variables constant, each additional year of age lowers the log odds by 0.088, each step up the wrkhome scale (working from home more often) lowers them by 0.128, each additional hour per week to relax raises them by 0.057, and being white lowers them by 0.615. It's interesting to note that every variable other than relaxation time has a negative coefficient.
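Log odds are easier to talk about once they are exponentiated into odds ratios. The sketch below does that for the model (3) coefficients printed in the table above; nothing is re-estimated, and `pct_change` is just a convenience name for the percentage change in the odds per one-unit increase.

```r
# Coefficients (log odds) from model (3) in the table above
log_odds <- c(age = -0.088, wrkhome = -0.128, hrsrelax = 0.057, WhiteRace = -0.615)

# exp() turns a log-odds coefficient into an odds ratio
odds_ratio <- exp(log_odds)

# Percentage change in the odds of BinaryMarital = 1 per one-unit increase
pct_change <- 100 * (odds_ratio - 1)

print(round(cbind(log_odds, odds_ratio, pct_change), 3))
```

An odds ratio below 1 means the odds of BinaryMarital = 1 shrink as the predictor increases; an odds ratio above 1 means they grow.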
Continuing on with what I learned from the generalized linear models, I want to see whether the relationship between education and marital status differs by sex.
D1 <- zelig(BinaryMarital ~ educ + sex + educ:sex, data= ExamWork, model = "logit")
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
xh1 <- setx(D1, educ= mean(ExamWork$educ)+sd(ExamWork$educ), sex = 1)
xl1 <- setx(D1, educ = mean(ExamWork$educ), sex = 1)
xh0 <- setx(D1, educ = mean(ExamWork$educ)+sd(ExamWork$educ), sex = 2)
xl0 <- setx(D1, educ = mean(ExamWork$educ), sex = 2)
zh1 <- sim(D1, x= xh1)
zl1 <- sim(D1, x=xl1)
zh0 <- sim(D1, x=xh0)
zl0 <- sim(D1, x=xl0)
eff <- (zh1$qi$ev - zl1$qi$ev) - (zh0$qi$ev - zl0$qi$ev)
quantile(eff, c(.025,.975))
## 2.5% 97.5%
## -0.10744063 0.06750388
hist(eff)
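I did try to run the plot below, but it didn't go through, and I'm not sure what I missed. The theme_tufte function wasn't working, and I couldn't decide on an appropriate variable. I decided to include the code so I can receive feedback and learn from this. Two likely causes: theme_tufte() comes from the ggthemes package and melt() from reshape2, neither of which was loaded, and the simulated quantities are stored on the sim() objects (zh1$qi, as used above) rather than on the fitted model D1.
library(ggthemes)  # provides theme_tufte()
library(reshape2)  # provides melt()
# Simulated expected values for the high-education, sex = 1 profile
AppMod1 <- data.frame(zh1$qi$ev)
# Stack every column into long format (variable, value)
AppMod1 <- melt(AppMod1, measure.vars = seq_len(ncol(AppMod1)))
AppMod1 <- ggplot(AppMod1, aes(variable, value)) +
  geom_point() +
  geom_smooth(colour = "blue") +
  theme_tufte()
AppMod1
## Continued…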
Based on these results, the effect of education on marital status does not appear to differ by sex: the 95% interval for the difference between the two first differences runs from about -0.107 to 0.068 and so includes zero. I fail to reject the null hypothesis of no difference, and the result is not significant.
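The logic behind the setx() and sim() steps can be approximated in base R plus MASS, which may help when checking results by hand. The sketch below is not Zelig's implementation. It fits the same kind of model to simulated stand-in data (all values invented), draws coefficient vectors from a multivariate normal centred on the estimates, and computes the difference between two first differences in predicted probability. The helper `ev()` and the data frame `toy` are hypothetical names.

```r
library(MASS)  # for mvrnorm(); MASS ships with standard R installations

set.seed(1)
# Simulated stand-in data (not GSS): educ in years, sex coded 1/2
n    <- 500
educ <- round(rnorm(n, mean = 13, sd = 3))
sex  <- sample(1:2, n, replace = TRUE)
y    <- rbinom(n, 1, plogis(-0.5 + 0.05 * educ - 0.1 * sex))
toy  <- data.frame(y, educ, sex)

fit <- glm(y ~ educ + sex + educ:sex, family = binomial, data = toy)

# Draw 1000 coefficient vectors from their approximate sampling distribution
draws <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# Predicted probability for one covariate profile, for every draw
ev <- function(educ, sex) {
  x <- c(1, educ, sex, educ * sex)  # same order as coef(fit)
  as.numeric(plogis(draws %*% x))
}

hi <- mean(toy$educ) + sd(toy$educ)
lo <- mean(toy$educ)

# Effect of one SD more education for sex = 1, minus the same effect for sex = 2
did <- (ev(hi, 1) - ev(lo, 1)) - (ev(hi, 2) - ev(lo, 2))
ci  <- quantile(did, c(0.025, 0.975))
print(ci)
```

If the interval in `ci` includes zero, the simulated data give no evidence that the education effect differs between the two groups, which is the same reading applied to the interval for eff.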
Turning to a count outcome, I want to estimate how the expected number of hours to relax varies with race and marital status, controlling for age and working from home.
ExamWork$wrkhome <- as.numeric(ExamWork$wrkhome)
D2 <- zelig(hrsrelax~ WhiteRace + BinaryMarital + age + wrkhome, data = ExamWork, model="poisson")
stargazer(D2, type="html")
Dependent variable: hrsrelax
| | hrsrelax |
|---|---|
| WhiteRace | -0.016 |
| | (0.035) |
| BinaryMarital | 0.110*** |
| | (0.037) |
| age | 0.007*** |
| | (0.001) |
| wrkhome | -0.035*** |
| | (0.009) |
| Constant | 1.033*** |
| | (0.069) |
| Observations | 1,218 |
| Log Likelihood | -2,782.452 |
| Akaike Inf. Crit. | 5,574.904 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
Based on these results, there is no significant relationship between race and hours relaxing, but BinaryMarital and age have positive coefficients and working from home has a negative one. Because this is a Poisson model, the coefficients are changes in the log of the expected number of hours rather than log odds: 0.110 for BinaryMarital = 1 (which the recode assigns to marital code 5, never married), 0.007 for each additional year of age, and -0.035 for each step up the wrkhome scale. So expected relaxation time is higher for respondents coded BinaryMarital = 1 and for older respondents, and it is lower the more often a respondent works from home.
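Poisson coefficients become multiplicative effects on the expected count once exponentiated. The sketch below applies that to the coefficients printed in the table above; nothing is re-estimated, and the ten-year comparison is just an illustrative choice.

```r
# Coefficients (log expected count) from the Poisson table above
coefs <- c(WhiteRace = -0.016, BinaryMarital = 0.110, age = 0.007, wrkhome = -0.035)

# exp() gives the factor by which expected hours change per one-unit increase
rate_ratio <- exp(coefs)
print(round(rate_ratio, 3))

# Ten additional years of age multiply the expected hours by exp(10 * 0.007)
ten_years <- unname(exp(10 * coefs["age"]))
print(round(ten_years, 3))
```

A rate ratio of 1.116 for BinaryMarital, for instance, corresponds to roughly 11.6% more expected hours to relax for respondents coded 1, holding the other variables constant.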
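I’d like to examine the probability distribution of the difference in the age effect on hours of relaxation between the two marital-status groups, to see if there is an interaction effect. D2 has no interaction term, so any difference in expected hours here arises only from the log link.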
xh2 <- setx(D2, age = mean(ExamWork$age) + sd(ExamWork$age), BinaryMarital = 1)
xl2 <- setx(D2, age = mean(ExamWork$age), BinaryMarital = 1)
xh3 <- setx(D2, age = mean(ExamWork$age) + sd(ExamWork$age), BinaryMarital = 0)
xl3 <- setx(D2, age = mean(ExamWork$age), BinaryMarital = 0)
zh2 <- sim(D2, x = xh2)
zl2 <- sim(D2, x = xl2)
zh3 <- sim(D2, x = xh3)
zl3 <- sim(D2, x = xl3)
gee <- (zh2$qi$ev - zl2$qi$ev) - (zh3$qi$ev - zl3$qi$ev)
quantile(gee, c(.025,.975))
## 2.5% 97.5%
## -0.07527261 0.84328562
hist(gee)
The interval printed above runs from about -0.075 to 0.843 and includes zero, so it does not indicate a significant difference in the age effect between the two marital-status groups. That interval also came from a run in which three of the four setx() calls passed educ, which is not a term in D2, instead of age. The calls above now set age, and the interval should be regenerated from them before drawing a firm conclusion.
# Interaction 2
I'd also like to examine how the effect of working from home on the amount of time to relax differs by marital status.
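xh4 <- setx(D2, wrkhome = mean(ExamWork$wrkhome) + sd(ExamWork$wrkhome), BinaryMarital = 1)
xl4 <- setx(D2, wrkhome = mean(ExamWork$wrkhome), BinaryMarital = 1)
xh5 <- setx(D2, wrkhome = mean(ExamWork$wrkhome) + sd(ExamWork$wrkhome), BinaryMarital = 0)
xl5 <- setx(D2, wrkhome = mean(ExamWork$wrkhome), BinaryMarital = 0)
zh4 <- sim(D2, x = xh4)
zl4 <- sim(D2, x = xl4)
zh5 <- sim(D2, x = xh5)
zl5 <- sim(D2, x = xl5)
jay <- (zh4$qi$ev - zl4$qi$ev) - (zh5$qi$ev - zl5$qi$ev)
quantile(jay, c(.025,.975))
## 2.5% 97.5%
## -1.0766185 -0.3412463
hist(jay)
As printed, the interval (about -1.077 to -0.341) excludes zero. However, it was produced by calls that set age to mean(wrkhome) + sd(age) in one profile and passed educ (not a term in D2) or the misspelled ExamWork$workhome in the others, so it likely reflects a change in age rather than in working from home. The calls above now vary wrkhome, and the interval needs to be regenerated from them before I can say whether the effect of working from home on hours relaxed differs by marital status.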
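Because the Poisson model above has no interaction term, a difference between two first differences on the expected-count scale comes entirely from the log link. The sketch below illustrates this with the coefficients reported for D2. The covariate values (ages 45 and 58, WhiteRace = 1, wrkhome = 2) are hypothetical choices for illustration, not sample statistics, and `expected_hours()` is a made-up helper.

```r
# Coefficients reported for the Poisson model above
b <- c(const = 1.033, WhiteRace = -0.016, BinaryMarital = 0.110,
       age = 0.007, wrkhome = -0.035)

# Expected hours for one profile; white = 1 and wrkhome = 2 are hypothetical values
expected_hours <- function(age, marital, white = 1, wrkhome = 2) {
  unname(exp(b["const"] + b["WhiteRace"] * white + b["BinaryMarital"] * marital +
             b["age"] * age + b["wrkhome"] * wrkhome))
}

# Hypothetical ages: 45 as a typical value, 58 as roughly one SD older
diff_1 <- expected_hours(58, 1) - expected_hours(45, 1)
diff_0 <- expected_hours(58, 0) - expected_hours(45, 0)

# The two first differences differ only by the constant factor exp(0.110)
print(diff_1 / diff_0)
print(diff_1 - diff_0)
```

The ratio of the two differences equals exp(0.110) regardless of the ages chosen, so a nonzero gap here reflects the multiplicative form of the model rather than a separately estimated interaction. Testing a true interaction would require adding a product term such as age:BinaryMarital to the formula.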