library(AER)
data(SmokeBan)
head(SmokeBan)
unique(SmokeBan$ban)
[1] yes no
Levels: no yes
library(dplyr)
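# Recode smoker ("yes"/"no") as a 0/1 factor so that glm() models the probability of smoking
SmokeBan1 <- SmokeBan %>%
  mutate(smoker1 = factor(ifelse(smoker == "yes", 1, 0)))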
head(SmokeBan1)
m0<-glm(smoker1 ~ ban, family = binomial, data=SmokeBan1)
summary(m0)
Call:
glm(formula = smoker1 ~ ban, family = binomial, data = SmokeBan1)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8269 -0.8269 -0.6904 -0.6904 1.7612
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.89735 0.03529 -25.425 <2e-16 ***
banyes -0.41534 0.04719 -8.801 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11074 on 9999 degrees of freedom
Residual deviance: 10997 on 9998 degrees of freedom
AIC: 11001
Number of Fisher Scoring iterations: 4
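The first model estimates the relationship between a workplace smoking ban and whether a worker smokes. The coefficient on banyes is on the log-odds scale: workers covered by a ban have log-odds of smoking 0.42 lower than workers without a ban. This corresponds to an odds ratio of exp(-0.415) ≈ 0.66, or about 34% lower odds of smoking. The intercept is the log-odds of smoking in the reference group, which is workers with no workplace smoking ban. Both coefficients are statistically significant (p<.05).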
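To make the log-odds scale concrete, the sketch below converts the Model 1 estimates printed above into an odds ratio and predicted probabilities. It hard-codes the reported values so that it runs without the data; the names `b0`, `b_ban`, `se_ban` and the derived objects are introduced here for illustration only. With the fitted object available, `exp(coef(m0))` should give the same odds ratios.

```r
# Estimates copied from the summary(m0) output above (log-odds scale)
b0     <- -0.89735  # intercept: log-odds of smoking with no ban
b_ban  <- -0.41534  # change in log-odds when a ban is in place
se_ban <-  0.04719  # standard error of the ban coefficient

# Odds ratio and an approximate 95% Wald interval
or_ban <- exp(b_ban)
ci_ban <- exp(b_ban + c(-1, 1) * qnorm(0.975) * se_ban)

# Predicted probabilities via the inverse logit
p_no_ban <- plogis(b0)
p_ban    <- plogis(b0 + b_ban)

round(c(odds_ratio = or_ban, ci_low = ci_ban[1], ci_high = ci_ban[2]), 3)
round(c(p_no_ban = p_no_ban, p_ban = p_ban), 3)
```

The odds ratio comes out near 0.66, and the predicted probability of smoking is roughly 0.29 without a ban and 0.21 with one.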
m1<-glm(smoker1 ~ ban+gender, family = binomial, data=SmokeBan1)
summary(m1)
Call:
glm(formula = smoker1 ~ ban + gender, family = binomial, data = SmokeBan1)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8444 -0.8088 -0.6780 -0.6780 1.7793
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.84779 0.04203 -20.169 <2e-16 ***
banyes -0.40359 0.04751 -8.495 <2e-16 ***
genderfemale -0.10178 0.04737 -2.149 0.0317 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11074 on 9999 degrees of freedom
Residual deviance: 10993 on 9997 degrees of freedom
AIC: 10999
Number of Fisher Scoring iterations: 4
Model 2 includes two binary independent variables, ban and gender. Holding ban status constant, females have log-odds of smoking 0.10 lower than males (the reference group), an odds ratio of exp(-0.102) ≈ 0.90, or about 10% lower odds. The coefficient for females is statistically significant (p<.05). Because the model is additive, this gender difference is assumed to be the same whether or not a ban is in place. The ban coefficient changes very little (-0.40, compared with -0.42 in Model 1), so both a workplace smoking ban and gender are associated with smoking among workers.
m2<-glm(smoker1 ~ ban+gender+education, family = binomial, data=SmokeBan1)
summary(m2)
Call:
glm(formula = smoker1 ~ ban + gender + education, family = binomial,
data = SmokeBan1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0513 -0.8013 -0.6119 -0.4035 2.2578
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.30403 0.07442 -4.085 4.40e-05 ***
banyes -0.25975 0.04895 -5.307 1.12e-07 ***
genderfemale -0.18552 0.04883 -3.799 0.000145 ***
educationhs -0.22220 0.07875 -2.822 0.004778 **
educationsome college -0.55194 0.08193 -6.737 1.62e-11 ***
educationcollege -1.27630 0.09559 -13.352 < 2e-16 ***
educationmaster -1.71818 0.12735 -13.492 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11074 on 9999 degrees of freedom
Residual deviance: 10565 on 9993 degrees of freedom
AIC: 10579
Number of Fisher Scoring iterations: 4
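Model 3 adds education. Each education coefficient compares that level with high school dropouts (the reference category and the lowest level of attainment), holding ban status and gender constant. Workers with a master's degree have log-odds of smoking 1.72 lower than dropouts, an odds ratio of exp(-1.718) ≈ 0.18. All coefficients are statistically significant (p<0.05). The odds of smoking fall steadily as educational attainment increases, and because the model is additive this gradient applies to men and women alike; females have lower odds than males at every education level (odds ratio exp(-0.186) ≈ 0.83). The ban coefficient also shrinks from -0.40 to -0.26 once education is controlled for. Reasons such as pregnancy while on the job or compliance with job rules and regulations could be suggested for these patterns, but the model does not test them.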
m3<-glm(smoker1 ~ ban+gender*education, family = binomial, data=SmokeBan1)
summary(m3)
Call:
glm(formula = smoker1 ~ ban + gender * education, family = binomial,
data = SmokeBan1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0370 -0.7942 -0.6310 -0.4176 2.2283
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.33954 0.09454 -3.591 0.000329 ***
banyes -0.25996 0.04896 -5.310 1.10e-07 ***
genderfemale -0.10659 0.13784 -0.773 0.439366
educationhs -0.15571 0.11037 -1.411 0.158291
educationsome college -0.54738 0.11478 -4.769 1.85e-06 ***
educationcollege -1.17327 0.12987 -9.034 < 2e-16 ***
educationmaster -1.79603 0.17295 -10.385 < 2e-16 ***
genderfemale:educationhs -0.13024 0.15783 -0.825 0.409252
genderfemale:educationsome college -0.02438 0.16386 -0.149 0.881708
genderfemale:educationcollege -0.21928 0.19078 -1.149 0.250386
genderfemale:educationmaster 0.18095 0.25434 0.711 0.476812
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11074 on 9999 degrees of freedom
Residual deviance: 10561 on 9989 degrees of freedom
AIC: 10583
Number of Fisher Scoring iterations: 4
Model 4 adds an interaction between gender and education. None of the interaction terms is significant (p>.05), so there is no evidence that the association between education and smoking differs by gender. I therefore decided to replace the variable education with hispanic in the next model.
m4<-glm(smoker1 ~ ban+gender+hispanic, family = binomial, data=SmokeBan1)
summary(m4)
Call:
glm(formula = smoker1 ~ ban + gender + hispanic, family = binomial,
data = SmokeBan1)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8532 -0.7888 -0.6832 -0.6286 1.8540
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.82319 0.04318 -19.064 <2e-16 ***
banyes -0.40742 0.04755 -8.568 <2e-16 ***
genderfemale -0.10564 0.04741 -2.228 0.0259 *
hispanicyes -0.18486 0.07646 -2.418 0.0156 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11074 on 9999 degrees of freedom
Residual deviance: 10987 on 9996 degrees of freedom
AIC: 10995
Number of Fisher Scoring iterations: 4
Model 5 shows that, holding ban status and gender constant, Hispanic workers have log-odds of smoking 0.18 lower than non-Hispanic workers, an odds ratio of exp(-0.185) ≈ 0.83, or about 17% lower odds. The result is significant (p<.05).
m5<-glm(smoker1 ~ ban+gender*hispanic, family = binomial, data=SmokeBan1)
summary(m5)
Call:
glm(formula = smoker1 ~ ban + gender * hispanic, family = binomial,
data = SmokeBan1)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8512 -0.8230 -0.6894 -0.5720 1.9448
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.84844 0.04428 -19.159 < 2e-16 ***
banyes -0.40733 0.04758 -8.562 < 2e-16 ***
genderfemale -0.06020 0.05017 -1.200 0.23016
hispanicyes 0.01958 0.10383 0.189 0.85045
genderfemale:hispanicyes -0.43114 0.15468 -2.787 0.00532 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11074 on 9999 degrees of freedom
Residual deviance: 10979 on 9995 degrees of freedom
AIC: 10989
Number of Fisher Scoring iterations: 4
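Model 6 adds an interaction between gender and hispanic, controlling for ban status. The interaction coefficient (-0.43) is the additional difference in log-odds for Hispanic females beyond the separate gender and hispanic terms; it is not by itself the comparison with non-Hispanic males. Summing the three terms (-0.060 + 0.020 - 0.431 = -0.472) gives an odds ratio of exp(-0.472) ≈ 0.62, so Hispanic females have about 38% lower odds of smoking than non-Hispanic males with the same ban status. The interaction is significant (p<.01), indicating that the gender difference in smoking depends on Hispanic ethnicity: the main effects of gender and hispanic are small and not significant, so the lower odds are concentrated among Hispanic females.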
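Because coefficients in an interaction model have to be combined before they can be read as group comparisons, a short sketch may help. It uses the estimates printed in the summary(m5) output above and compares each gender by ethnicity group with non-Hispanic males at the same ban status. The vector names (`b`, `log_odds`, `odds_ratios`) are illustrative and not part of the original analysis.

```r
# Estimates copied from the summary(m5) output above (log-odds scale)
b <- c(female = -0.06020, hispanic = 0.01958, female_hispanic = -0.43114)

# Log-odds of smoking relative to non-Hispanic males, at the same ban status
log_odds <- c(
  male_nonhispanic   = 0,
  female_nonhispanic = b[["female"]],
  male_hispanic      = b[["hispanic"]],
  female_hispanic    = b[["female"]] + b[["hispanic"]] + b[["female_hispanic"]]
)

# Exponentiate to obtain odds ratios against the reference group
odds_ratios <- exp(log_odds)
round(odds_ratios, 2)
```

Only the Hispanic female group departs noticeably from an odds ratio of 1, which is what the significant interaction term reflects.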
library(texreg)
htmlreg(list(m0,m1,m2,m3,m4,m5))
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| (Intercept) | -0.90*** | -0.85*** | -0.30*** | -0.34*** | -0.82*** | -0.85*** |
| | (0.04) | (0.04) | (0.07) | (0.09) | (0.04) | (0.04) |
| banyes | -0.42*** | -0.40*** | -0.26*** | -0.26*** | -0.41*** | -0.41*** |
| | (0.05) | (0.05) | (0.05) | (0.05) | (0.05) | (0.05) |
| genderfemale | | -0.10* | -0.19*** | -0.11 | -0.11* | -0.06 |
| | | (0.05) | (0.05) | (0.14) | (0.05) | (0.05) |
| educationhs | | | -0.22** | -0.16 | | |
| | | | (0.08) | (0.11) | | |
| educationsome college | | | -0.55*** | -0.55*** | | |
| | | | (0.08) | (0.11) | | |
| educationcollege | | | -1.28*** | -1.17*** | | |
| | | | (0.10) | (0.13) | | |
| educationmaster | | | -1.72*** | -1.80*** | | |
| | | | (0.13) | (0.17) | | |
| genderfemale:educationhs | | | | -0.13 | | |
| | | | | (0.16) | | |
| genderfemale:educationsome college | | | | -0.02 | | |
| | | | | (0.16) | | |
| genderfemale:educationcollege | | | | -0.22 | | |
| | | | | (0.19) | | |
| genderfemale:educationmaster | | | | 0.18 | | |
| | | | | (0.25) | | |
| hispanicyes | | | | | -0.18* | 0.02 |
| | | | | | (0.08) | (0.10) |
| genderfemale:hispanicyes | | | | | | -0.43** |
| | | | | | | (0.15) |
| AIC | 11001.33 | 10998.72 | 10578.64 | 10582.89 | 10994.72 | 10988.87 |
| BIC | 11015.75 | 11020.36 | 10629.11 | 10662.21 | 11023.56 | 11024.92 |
| Log Likelihood | -5498.67 | -5496.36 | -5282.32 | -5280.45 | -5493.36 | -5489.44 |
| Deviance | 10997.33 | 10992.72 | 10564.64 | 10560.89 | 10986.72 | 10978.87 |
| Num. obs. | 10000 | 10000 | 10000 | 10000 | 10000 | 10000 |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | | | | | |
Based on the AIC and BIC values, Model 3 is the best-fitting model for this analysis. Model 4 has the lowest deviance (10560.89), but deviance can only decrease when parameters are added. AIC and BIC penalize the four extra interaction terms, and Model 3 has the lowest AIC (10578.64) and BIC (10629.11) of the six models. Model 4 does not fit significantly better than the preceding model (Model 3).
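The penalty can be checked by hand. The sketch below recomputes AIC and BIC from the log-likelihoods in the table above, using AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n). The parameter counts `k` are counted from the coefficient rows of each model, and the labels `model1` to `model6` follow the table's numbering (not the R object names); both are assumptions made for this illustration. Small differences in the second decimal are due to rounding in the table.

```r
# Log-likelihoods copied from the table above
log_lik <- c(model1 = -5498.67, model2 = -5496.36, model3 = -5282.32,
             model4 = -5280.45, model5 = -5493.36, model6 = -5489.44)

# Number of estimated coefficients per model (including the intercept)
k <- c(model1 = 2, model2 = 3, model3 = 7, model4 = 11, model5 = 4, model6 = 5)
n <- 10000  # number of observations

# Information criteria: fit term plus a penalty that grows with k
aic <- -2 * log_lik + 2 * k
bic <- -2 * log_lik + log(n) * k

round(rbind(AIC = aic, BIC = bic), 2)
names(which.min(aic))  # model with the lowest AIC
names(which.min(bic))  # model with the lowest BIC
```

BIC applies a heavier penalty per parameter than AIC at this sample size (log(10000) ≈ 9.2 versus 2), which is why the gap between Model 3 and Model 4 is wider on BIC.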
anova(m0,m1,m2,m3,m4,m5, test= "Chisq")
Analysis of Deviance Table
Model 1: smoker1 ~ ban
Model 2: smoker1 ~ ban + gender
Model 3: smoker1 ~ ban + gender + education
Model 4: smoker1 ~ ban + gender * education
Model 5: smoker1 ~ ban + gender + hispanic
Model 6: smoker1 ~ ban + gender * hispanic
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 9998 10997
2 9997 10993 1 4.61 0.031805 *
3 9993 10565 4 428.09 < 2.2e-16 ***
4 9989 10561 4 3.74 0.441640
5 9996 10987 -7 -425.83 < 2.2e-16 ***
6 9995 10979 1 7.85 0.005076 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
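Model 4 has the lowest deviance (10561), but its improvement over Model 3 is not statistically significant (p = .44). Model 3 has the second-lowest deviance (10565) and fits significantly better than Model 2 (p<.05), indicating that Model 3 is the best-fit model. Note that the table compares each model with the one listed before it, so the row for Model 5 compares it with Model 4. Those two models are not nested (hence the negative degrees of freedom), and that p-value should not be read as a valid test. The comparison of Model 6 with Model 5 is valid and shows that the gender by hispanic interaction improves fit (p = .005).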
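The two interaction tests can be reproduced from the reported numbers alone. This sketch takes the deviances from the regression table and the residual degrees of freedom from the analysis of deviance table, and computes a likelihood ratio test for each properly nested pair. The helper `lr_test()` and the `model3` to `model6` labels are hypothetical names introduced here; with the fitted objects, `anova(m2, m3, test = "Chisq")` and `anova(m4, m5, test = "Chisq")` should give the same comparisons.

```r
# Deviances and residual degrees of freedom copied from the output above
dev      <- c(model3 = 10564.64, model4 = 10560.89,
              model5 = 10986.72, model6 = 10978.87)
df_resid <- c(model3 = 9993, model4 = 9989, model5 = 9996, model6 = 9995)

# Likelihood ratio test for a nested pair (smaller model first):
# the drop in deviance is compared with a chi-squared distribution
lr_test <- function(small, large) {
  stat <- dev[[small]] - dev[[large]]
  df   <- df_resid[[small]] - df_resid[[large]]
  c(statistic = stat, df = df,
    p_value = pchisq(stat, df, lower.tail = FALSE))
}

rbind(
  education_interaction = lr_test("model3", "model4"),
  hispanic_interaction  = lr_test("model5", "model6")
)
```

Restricting the comparison to nested pairs avoids the uninterpretable Model 4 versus Model 5 row.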
library(visreg)
visreg(m2, "ban", by="gender", scale="response")
This plot shows the predicted probability of smoking for male and female workers, with and without a workplace smoking ban, based on Model 3.
visreg(m5, "hispanic", by="gender", scale="response")
This plot shows the predicted probability of smoking for Hispanic and non-Hispanic workers, separately by gender, based on Model 6.
For this assignment, I chose the cross-sectional data set “SmokeBan”, which contains 10,000 observations and seven variables. It is a subset of an 18,090-observation data set collected as part of the National Health Interview Survey in 1991 and then again in 1993 with different respondents. The data set is used to estimate the effect of workplace smoking bans on smoking among indoor workers.

I fitted a series of logistic regression models to see how the independent variables relate to the dependent variable. The dependent variable in all models is “smoker1”, coded 1 if a person is a current smoker and 0 otherwise. I started with a simple model using one independent variable, ban, to estimate the odds of smoking when a workplace smoking ban is in place compared with when it is not. The second model adds another categorical independent variable, gender, and shows the difference in smoking between male and female workers with ban status held constant. The third model adds education alongside ban and gender.

The fourth model includes an interaction between gender and education to test whether the association between gender and smoking differs by education level. Since the interaction was not significant, I decided to replace education with another variable. The fifth model assesses the odds of smoking based on ban, gender, and hispanic. The sixth and last model adds an interaction between gender and hispanic, controlling for ban. The results are discussed below each model.
Online complements to Stock and Watson (2007)
Evans, W. N., Farrelly, M.C., and Montgomery, E. (1999). Do Workplace Smoking Bans Reduce Smoking? American Economic Review, 89, 728-747.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.