HW2 Dem 7283

For this HW assignment I will utilize the 2017 BRFSS data set. My dependent variable will be whether or not someone is hispanic or not (1 for Hispanic and 0 for not Hispanic). In order to determine if a person is Hispanic or not I will utilize the folloinfg independetn variables:

Education (caetgorical) 3 levels: Less than High School, High school and College +

Research Question: Can we determine whether or not someone is Hispanic with knowledge of their education, income and age?

First we will determine if the independent vaariables are significant with repsect to the dependent variable:

For education:

svychisq(~hisp + edu, design = des)

## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~hisp + edu, design = des)
## F = 192.61, ndf = 1.9379, ddf = 19825.0000, p-value < 2.2e-16

For Income:

svychisq(~hisp + inc, design = des)

## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~hisp + inc, design = des)
## F = 124.42, ndf = 1.9894, ddf = 20352.0000, p-value < 2.2e-16

For Age:

svychisq(~hisp + agec, design = des)

## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~hisp + agec, design = des)
## F = 40.36, ndf = 1.9949, ddf = 20408.0000, p-value < 2.2e-16

All three p-values are significant. We will continue:

Next we will look at the percentage of adults who are Hispanic by education:

qplot(x = tabe$edu, y = tabe$hisp, data = tabe, xlab = "Education", ylab = "% Hispanic") +
  geom_errorbar(aes(x=edu, ymin=hisp - 1.96*se, ymax= hisp+1.96*se), width=.25)+
  ggtitle(label = "% of adults who are Hispanic by Education")

Next by Income:

qplot(x = tabi$inc, y = tabi$hisp, data = tabi, xlab = "Income", ylab = "% Hispanic") +
  geom_errorbar(aes(x=inc, ymin=hisp - 1.96*se, ymax= hisp+1.96*se), width=.25)+
  ggtitle(label = "% of adults who are Hispanic by Income")

And finally by Age

qplot(x = taba$agec, y = taba$hisp, data = taba, xlab = "Age", ylab = "% Hispanic") +
  geom_errorbar(aes(x=agec, ymin=hisp - 1.96*se, ymax= hisp+1.96*se), width=.25)+
  ggtitle(label = "% of adults who are Hispanic by Age")

From these graphs we can see that the greatest proportion of the Hispanic populatin in our survey tend to be have less educating, are younger and in the lower third of the income categories.

Here are the results of the model:

fit.logit <- svyglm(hisp~edu+inc+agec,
                    design=des,
                    family=binomial,
                    data = hw2)

## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!

summary(fit.logit)

Call: svyglm(formula = hisp ~ edu + inc + agec, design = des, family = binomial, data = hw2)

Survey design: svydesign(ids = ~1, weights = ~wt, data = hw2)

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3001 0.1143 2.625 0.00867 ** educollege+ -0.6490 0.1120 -5.797 6.97e-09 edult hs 1.7032 0.1873 9.091 < 2e-16 inc>$75k -1.1021 0.1394 -7.908 2.89e-15 inc$25k to < $75k -0.3893 0.1223 -3.183 0.00146 agec45-59 -0.6125 0.1233 -4.968 6.87e-07 agec60+ -1.3088 0.1324 -9.883 < 2e-16 * — Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ‘’ 1

(Dispersion parameter for binomial family taken to be 1.001847)

Number of Fisher Scoring iterations: 5

stargazer(fit.logit,
          type = "html",
          tyle = "demography",
          covariate.labels = c( "High School", "Less Than High School", "Greater than $75k", "$25k to $75k", "age 45-59", "age 60+", "constant"),
          ci = T,
          column.sep.width = "10pt")


	Dependent variable:

	hisp

High School	-0.649^***
	(-0.868, -0.430)

Less Than High School	1.703^***
	(1.336, 2.070)

Greater than 75k	-1.102^***
	(-1.375, -0.829)

25k to 75k	-0.389^***
	(-0.629, -0.150)

age 45-59	-0.612^***
	(-0.854, -0.371)

age 60+	-1.309^***
	(-1.568, -1.049)

constant	0.300^***
	(0.076, 0.524)


Observations	10,231
Log Likelihood	-5,078.742
Akaike Inf. Crit.	10,171.480

Note:	p<0.1; p<0.05; p<0.01

demography

#print(oddsratio)

And here we have the odds ratio and respective 95% CI’s

oddsratio <- exp(cbind("Odds ratio" = coef(fit.logit),
                        confint.default(fit.logit, level = 0.95)))
stargazer(oddsratio,
          type = "html",
          style = "demography",
          ci = T,
          column.sep.width = "10pt")


	Odds ratio	2.5 %	97.5 %

(Intercept)	1.350	1.079	1.689
educollege+	0.523	0.420	0.651
edult hs	5.491	3.804	7.927
inc>	0.332	0.253	0.436
incto <	0.678	0.533	0.861
agec45-59	0.542	0.426	0.690
agec60+	0.270	0.208	0.350

Here are all of the respective groupings of our three different variables:

dat <- expand.grid(inc=levels(hw2$inc),edu=levels(hw2$edu), agec=levels(hw2$agec))
#str(dat)

fit <- predict(fit.logit, newdat=dat, type="response")
dat$fitted.prob.lrm <- round(fit, 3)
knitr::kable(head(dat, n=27))

inc	edu	agec	fitted.prob.lrm
<$25k	hs	18-44	0.574
>$75k	hs	18-44	0.310
$25k to < $75k	hs	18-44	0.478
<$25k	college+	18-44	0.414
>$75k	college+	18-44	0.190
$25k to < $75k	college+	18-44	0.323
<$25k	lt hs	18-44	0.881
>$75k	lt hs	18-44	0.711
$25k to < $75k	lt hs	18-44	0.834
<$25k	hs	45-59	0.423
>$75k	hs	45-59	0.196
$25k to < $75k	hs	45-59	0.331
<$25k	college+	45-59	0.277
>$75k	college+	45-59	0.113
$25k to < $75k	college+	45-59	0.206
<$25k	lt hs	45-59	0.801
>$75k	lt hs	45-59	0.572
$25k to < $75k	lt hs	45-59	0.731
<$25k	hs	60+	0.267
>$75k	hs	60+	0.108
$25k to < $75k	hs	60+	0.198
<$25k	college+	60+	0.160
>$75k	college+	60+	0.060
$25k to < $75k	college+	60+	0.114
<$25k	lt hs	60+	0.667
>$75k	lt hs	60+	0.399
$25k to < $75k	lt hs	60+	0.576

Now that we have the preliminaries taken care of we’ll take a look at a couple of interesting cases.

First, let’s look at a person in the 18-44 age category, who makes less than $25k a year without a high school education:

dat[which(dat$inc=="<$25k"&dat$edu=="lt hs"&dat$agec=="18-44"),]

##     inc   edu  agec fitted.prob.lrm
## 7 <$25k lt hs 18-44           0.881

And compare that with someone that is of the same categories, but has some colle education

dat[which(dat$inc=="<$25k"&dat$edu=="college+"&dat$agec=="18-44"),]

##     inc      edu  agec fitted.prob.lrm
## 4 <$25k college+ 18-44           0.414

And we can see that the percent chance that that person is Hispanic decreases by almost 50% from 88.1% to 41.4%.

From the research question, it appears that we are able to take an “educated guess” on whether someone were hispanic or not based on the knowledge of their age, income and education level.

HW2 Dem 7283

Craig, Scott

2/8/2019

For education:

For Income:

For Age: