Use a different outcome than the one we used today.
Use a different binary concept than one that we used today.
Interpret the odds ratios for each variable in your model.
Estimate and interpret the marginal effects for a focal binary variable:
MEMs (marginal effects at the means): by R and by hand
MERs (marginal effects at representative values): by R and by hand
AMEs (average marginal effects): by R
I explore the relationship between low (<= $50,000) and high income (> $50,000) and whether the respondent owns or rents. I also include gender. Our IV will be high or low income set to a binomial dummy variable. Our DV will be whether the respondent rents or owns their home. I make a working dataset with the variables INCOME3, SEXVAR, RENTHOM1, and X_AGE80.
library(foreign)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(margins)
library(emmeans)
library(ggplot2)
library(writexl)
library(reticulate)
brfss21 = read.xport("/Users/briansurratt/Library/CloudStorage/OneDrive-UniversityofTexasatSanAntonio/DEM 7283 Stats II/Assignment 4/LLCP2021.XPT_")
#codebook: https://www.cdc.gov/brfss/annual_data/annual_2021.html
head(brfss21)
## X_STATE FMONTH IDATE IMONTH IDAY IYEAR DISPCODE SEQNO X_PSU
## 1 1 1 01192021 01 19 2021 1100 2021000001 2.021e+09
## 2 1 1 01212021 01 21 2021 1100 2021000002 2.021e+09
## 3 1 1 01212021 01 21 2021 1100 2021000003 2.021e+09
## 4 1 1 01172021 01 17 2021 1100 2021000004 2.021e+09
## 5 1 1 01152021 01 15 2021 1100 2021000005 2.021e+09
## 6 1 1 01142021 01 14 2021 1100 2021000006 2.021e+09
## CTELENM1 PVTRESD1 COLGHOUS STATERE1 CELPHON1 LADULT1 COLGSEX NUMADULT LANDSEX
## 1 1 1 NA 1 2 1 NA 2 NA
## 2 1 1 NA 1 2 1 NA 2 NA
## 3 1 1 NA 1 2 1 NA 2 NA
## 4 1 1 NA 1 2 1 NA 2 NA
## 5 1 1 NA 1 2 1 NA 2 NA
## 6 1 1 NA 1 2 1 NA 1 1
## NUMMEN NUMWOMEN RESPSLCT SAFETIME CTELNUM1 CELLFON5 CADULT1 CELLSEX PVTRESD3
## 1 1 1 2 NA NA NA NA NA NA
## 2 1 1 2 NA NA NA NA NA NA
## 3 1 1 2 NA NA NA NA NA NA
## 4 1 1 2 NA NA NA NA NA NA
## 5 1 1 1 NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA NA
## CCLGHOUS CSTATE1 LANDLINE HHADULT SEXVAR GENHLTH PHYSHLTH MENTHLTH POORHLTH
## 1 NA NA NA NA 2 5 20 10 88
## 2 NA NA NA NA 2 3 88 88 NA
## 3 NA NA NA NA 2 2 88 88 NA
## 4 NA NA NA NA 2 2 88 10 88
## 5 NA NA NA NA 1 5 30 88 30
## 6 NA NA NA NA 1 3 88 88 NA
## PRIMINSR PERSDOC3 MEDCOST1 CHECKUP1 EXERANY2 BPHIGH6 BPMEDS CHOLCHK3 TOLDHI3
## 1 3 1 2 2 2 3 NA 2 1
## 2 1 2 2 1 1 1 1 2 1
## 3 2 2 2 1 2 1 1 2 2
## 4 2 1 2 1 1 1 1 2 1
## 5 3 1 2 1 1 4 NA 2 1
## 6 3 1 2 1 2 3 NA 2 2
## CHOLMED3 CVDINFR4 CVDCRHD4 CVDSTRK3 ASTHMA3 ASTHNOW CHCSCNCR CHCOCNCR
## 1 1 2 2 2 1 1 2 2
## 2 1 2 1 2 2 NA 2 2
## 3 2 2 1 2 2 NA 2 2
## 4 2 2 2 2 2 NA 2 2
## 5 1 1 7 1 2 NA 2 2
## 6 2 2 2 2 2 NA 2 2
## CHCCOPD3 ADDEPEV3 CHCKDNY2 DIABETE4 DIABAGE3 HAVARTH5 ARTHEXER ARTHEDU
## 1 1 2 2 3 NA 1 2 2
## 2 2 2 1 1 98 1 1 2
## 3 2 2 2 1 98 2 NA NA
## 4 2 2 2 1 56 2 NA NA
## 5 2 2 2 1 65 2 NA NA
## 6 1 2 2 3 NA 2 NA NA
## LMTJOIN3 ARTHDIS2 JOINPAI2 MARITAL EDUCA RENTHOM1 NUMHHOL3 NUMPHON3 CPDEMO1B
## 1 2 1 8 1 4 1 1 1 1
## 2 1 1 10 9 6 1 2 NA 1
## 3 NA NA NA 3 4 1 2 NA 1
## 4 NA NA NA 1 4 1 2 NA 1
## 5 NA NA NA 1 3 1 2 NA 8
## 6 NA NA NA 1 5 1 2 NA 1
## VETERAN3 EMPLOY1 CHILDREN INCOME3 PREGNANT WEIGHT2 HEIGHT3 DEAF BLIND DECIDE
## 1 2 7 88 5 NA 72 411 2 2 2
## 2 2 8 88 77 NA 7777 506 2 1 1
## 3 2 7 88 3 NA 170 505 2 2 2
## 4 2 7 88 7 NA 195 504 2 2 2
## 5 2 8 88 4 NA 206 511 1 2 2
## 6 2 7 88 6 NA 195 603 2 2 2
## DIFFWALK DIFFDRES DIFFALON SMOKE100 SMOKDAY2 USENOW3 ECIGNOW1 ALCDAY5
## 1 2 2 1 1 3 3 3 888
## 2 1 2 1 2 NA 3 3 888
## 3 2 2 2 2 NA 3 3 888
## 4 2 2 2 2 NA 3 3 101
## 5 1 2 2 2 NA 2 3 888
## 6 1 2 2 1 3 2 3 888
## AVEDRNK3 DRNK3GE5 MAXDRNKS FLUSHOT7 FLSHTMY3 IMFVPLA2 PNEUVAC4 HIVTST7
## 1 NA NA NA 1 92020 1 1 2
## 2 NA NA NA 2 NA NA 2 2
## 3 NA NA NA 2 NA NA 2 2
## 4 3 1 6 1 102020 1 2 2
## 5 NA NA NA 1 92020 1 1 1
## 6 NA NA NA 1 102020 5 1 2
## HIVTSTD3 FRUIT2 FRUITJU2 FVGREEN1 FRENCHF1 POTATOE1 VEGETAB2 PDIABTST
## 1 NA 101 555 204 203 201 101 2
## 2 NA 101 555 201 555 201 207 NA
## 3 NA 101 555 555 201 201 203 NA
## 4 NA 203 205 303 204 308 205 NA
## 5 777777 101 555 101 202 202 101 NA
## 6 NA 202 555 201 555 201 201 2
## PREDIAB1 INSULIN1 BLDSUGAR FEETCHK3 DOCTDIAB CHKHEMO3 FEETCHK EYEEXAM1
## 1 3 NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA
## 6 3 NA NA NA NA NA NA NA
## DIABEYE DIABEDU TOLDCFS HAVECFS WORKCFS TOLDHEPC TRETHEPC PRIRHEPC HAVEHEPC
## 1 NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA NA
## HAVEHEPB MEDSHEPB HPVADVC4 HPVADSHT TETANUS1 SHINGLE2 LCSFIRST LCSLAST
## 1 NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA
## LCSNUMCG LCSCTSCN HADMAM HOWLONG CERVSCRN CRVCLCNC CRVCLPAP CRVCLHPV HADHYST2
## 1 NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA NA
## PSATEST1 PSATIME1 PCPSARS2 PCSTALK HADSIGM4 COLNSIGM COLNTES1 SIGMTES1
## 1 NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA
## LASTSIG4 COLNCNCR VIRCOLO1 VCLNTES1 SMALSTOL STOLTEST STOOLDN1 BLDSTFIT
## 1 NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA
## SDNATES1 CNCRDIFF CNCRAGE CNCRTYP1 CSRVTRT3 CSRVDOC1 CSRVSUM CSRVRTRN
## 1 NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA
## CSRVINST CSRVINSR CSRVDEIN CSRVCLIN CSRVPAIN CSRVCTL2 HOMBPCHK HOMRGCHK
## 1 NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA
## WHEREBP SHAREBP WTCHSALT DRADVISE CIMEMLOS CDHOUSE CDASSIST CDHELP CDSOCIAL
## 1 NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA NA
## CDDISCUS CAREGIV1 CRGVREL4 CRGVLNG1 CRGVHRS1 CRGVPRB3 CRGVALZD CRGVPER1
## 1 NA 2 NA NA NA NA NA NA
## 2 NA 2 NA NA NA NA NA NA
## 3 NA 2 NA NA NA NA NA NA
## 4 NA 2 NA NA NA NA NA NA
## 5 NA 2 NA NA NA NA NA NA
## 6 NA 2 NA NA NA NA NA NA
## CRGVHOU1 CRGVEXPT ACEDEPRS ACEDRINK ACEDRUGS ACEPRISN ACEDIVRC ACEPUNCH
## 1 NA 2 2 2 2 2 1 1
## 2 NA 2 2 2 2 2 2 1
## 3 NA 2 2 2 2 2 2 1
## 4 NA 7 1 2 2 2 2 1
## 5 NA 2 2 2 2 2 2 1
## 6 NA 2 2 2 2 2 2 1
## ACEHURT1 ACESWEAR ACETOUCH ACETTHEM ACEHVSEX ACEADSAF ACEADNED MARIJAN1
## 1 1 1 1 1 1 5 3 NA
## 2 1 1 1 1 1 5 5 NA
## 3 1 1 1 1 1 5 5 NA
## 4 3 1 1 1 1 5 5 NA
## 5 1 1 1 1 1 5 5 NA
## 6 1 1 1 1 1 5 5 NA
## USEMRJN3 RSNMRJN2 LASTSMK2 STOPSMK2 FIREARM5 GUNLOAD LOADULK2 RCSGENDR
## 1 NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA
## RCSRLTN2 CASTHDX2 CASTHNO2 BIRTHSEX SOMALE SOFEMALE TRNSGNDR QSTVER QSTLANG
## 1 NA NA NA NA NA NA NA 10 1
## 2 NA NA NA NA NA NA NA 10 1
## 3 NA NA NA NA NA NA NA 10 1
## 4 NA NA NA NA NA NA NA 10 1
## 5 NA NA NA NA NA NA NA 10 1
## 6 NA NA NA NA NA NA NA 10 1
## X_METSTAT X_URBSTAT MSCODE X_STSTR X_STRWT X_RAWRAKE X_WT2RAKE X_IMPRACE
## 1 1 1 1 11011 39.76616 2 79.53232 1
## 2 1 1 2 11011 39.76616 2 79.53232 2
## 3 1 1 1 11011 39.76616 2 79.53232 2
## 4 1 1 3 11011 39.76616 2 79.53232 1
## 5 2 1 2 11011 39.76616 2 79.53232 6
## 6 2 2 5 11011 39.76616 1 39.76616 1
## X_CHISPNC X_CRACE1 X_CPRACE1 CAGEG X_CLLCPWT X_DUALUSE X_DUALCOR X_LLCPWT2
## 1 9 NA NA NA NA 1 0.5190191 874.2429
## 2 9 NA NA NA NA 1 0.5190191 874.2429
## 3 9 NA NA NA NA 1 0.5190191 874.2429
## 4 9 NA NA NA NA 1 0.5190191 874.2429
## 5 9 NA NA NA NA 9 NA 2144.7690
## 6 9 NA NA NA NA 1 0.5190191 437.1215
## X_LLCPWT X_RFHLTH X_PHYS14D X_MENT14D X_HLTHPLN X_HCVU652 X_TOTINDA
## 1 744.7455 2 3 2 1 9 2
## 2 299.1374 1 1 1 1 9 1
## 3 587.8630 1 1 1 1 9 2
## 4 1099.6216 1 1 2 1 1 1
## 5 1711.8259 2 3 1 1 9 1
## 6 499.3652 1 1 1 1 9 2
## X_RFHYPE6 X_CHOLCH3 X_RFCHOL3 X_MICHD X_LTASTH1 X_CASTHM1 X_ASTHMS1 X_DRDXAR3
## 1 1 1 2 2 2 2 1 1
## 2 2 1 2 1 1 1 3 1
## 3 2 1 1 1 1 1 3 2
## 4 2 1 2 2 1 1 3 2
## 5 1 1 2 1 1 1 3 2
## 6 1 1 1 2 1 1 3 2
## X_LMTACT3 X_LMTWRK3 X_PRACE1 X_MRACE1 X_HISPANC X_RACE X_RACEG21 X_RACEGR3
## 1 2 1 1 1 2 1 1 1
## 2 1 1 2 2 2 2 2 2
## 3 3 3 2 2 2 2 2 2
## 4 3 3 1 1 2 1 1 1
## 5 3 3 1 7 2 7 2 4
## 6 3 3 1 1 2 1 1 1
## X_RACEPRV X_SEX X_AGEG5YR X_AGE65YR X_AGE80 X_AGE_G HTIN4 HTM4 WTKG3 X_BMI5
## 1 1 2 11 2 70 6 59 150 3266 1454
## 2 2 2 10 2 67 6 66 168 NA NA
## 3 2 2 11 2 72 6 65 165 7711 2829
## 4 1 2 9 1 62 5 64 163 8845 3347
## 5 7 1 12 2 76 6 71 180 9344 2873
## 6 1 1 13 2 80 6 75 191 8845 2437
## X_BMI5CAT X_RFBMI5 X_CHLDCNT X_EDUCAG X_INCOMG1 X_SMOKER3 X_RFSMOK3 X_CURECI1
## 1 1 1 1 2 3 3 1 1
## 2 NA 9 1 4 9 4 1 1
## 3 3 2 1 2 2 4 1 1
## 4 4 2 1 2 5 4 1 1
## 5 3 2 1 1 2 4 1 1
## 6 2 1 1 3 4 3 1 1
## DRNKANY5 DROCDY3_ X_RFBING5 X_DRNKWK1 X_RFDRHV7 X_FLSHOT7 X_PNEUMO3 X_AIDTST4
## 1 2 0 1 0 1 1 1 2
## 2 2 0 1 0 1 2 2 2
## 3 2 0 1 0 1 2 2 2
## 4 1 14 2 300 1 NA NA 2
## 5 2 0 1 0 1 1 1 1
## 6 2 0 1 0 1 1 1 2
## FTJUDA2_ FRUTDA2_ GRENDA1_ FRNCHDA_ POTADA1_ VEGEDA2_ X_MISFRT1 X_MISVEG1
## 1 0 100 57 43 14 100 0 0
## 2 0 100 14 0 14 100 0 0
## 3 0 100 0 14 14 43 0 0
## 4 71 43 10 57 27 71 0 0
## 5 0 100 100 29 29 100 0 0
## 6 0 29 14 0 14 14 0 0
## X_FRTRES1 X_VEGRES1 X_FRUTSU1 X_VEGESU1 X_FRTLT1A X_VEGLT1A X_FRT16A X_VEG23A
## 1 1 1 100 214 1 1 1 1
## 2 1 1 100 128 1 1 1 1
## 3 1 1 100 71 1 2 1 1
## 4 1 1 114 165 1 1 1 1
## 5 1 1 100 258 1 1 1 1
## 6 1 1 29 42 2 2 1 1
## X_FRUITE1 X_VEGETE1
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
#checking sample size
nrow(brfss21)
## [1] 438693
dat <- brfss21 %>%
select(INCOME3, SEXVAR, RENTHOM1, X_AGE80)
nrow(dat)
## [1] 438693
First, let’s clean INCOME3 and make a dummy variable.
tabyl(dat$INCOME3)
## dat$INCOME3 n percent valid_percent
## 1 10878 0.02479638 0.02530674
## 2 11530 0.02628262 0.02682356
## 3 14960 0.03410130 0.03480316
## 4 21071 0.04803131 0.04901988
## 5 43893 0.10005402 0.10211332
## 6 48339 0.11018867 0.11245655
## 7 59408 0.13542044 0.13820764
## 8 47838 0.10904665 0.11129102
## 9 47642 0.10859986 0.11083504
## 10 19769 0.04506340 0.04599089
## 11 18952 0.04320105 0.04409021
## 77 36138 0.08237651 0.08407197
## 99 49428 0.11267105 0.11499002
## NA 8847 0.02016672 NA
Filter out 77 (Don’t know/Not sure), 99 (Refused), NA.
dat <- dat %>%
filter(INCOME3 != "77" & INCOME3 != "99" & INCOME3 != "NA")
nrow(dat)
## [1] 344280
tabyl(dat$INCOME3)
## dat$INCOME3 n percent
## 1 10878 0.03159638
## 2 11530 0.03349018
## 3 14960 0.04345300
## 4 21071 0.06120309
## 5 43893 0.12749216
## 6 48339 0.14040606
## 7 59408 0.17255722
## 8 47838 0.13895085
## 9 47642 0.13838155
## 10 19769 0.05742129
## 11 18952 0.05504822
Recode income with a binary dummy variable.
summary(dat$INCOME3)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 5.000 7.000 6.698 9.000 11.000
$50,000 seems like a reasonable cutoff, which is the 6th bracket, so households below $50,000 will be coded as 0 (low income), and households equal to or above $50,000 will be coded as 1 (high income).
dat <- dat %>%
mutate(HIINCOME = case_when(INCOME3 <= 6 ~ 0,
INCOME3 > 6 ~ 1
)
)
tabyl(dat$HIINCOME)
## dat$HIINCOME n percent
## 0 150671 0.4376409
## 1 193609 0.5623591
Next, let’s recode SEXVAR so male = 1.
dat <- dat %>%
mutate(male = case_when(SEXVAR == 1 ~ 1,
TRUE ~ 0
)
)
nrow(dat)
## [1] 344280
tabyl(dat$male)
## dat$male n percent
## 0 179777 0.5221825
## 1 164503 0.4778175
Next, let’s clean the RENTHOM1 variable. Checking the frequency distribution of RENTHOM1.
tabyl(dat$RENTHOM1)
## dat$RENTHOM1 n percent valid_percent
## 1 246493 7.159667e-01 0.715970814
## 2 83357 2.421198e-01 0.242121193
## 3 13340 3.874753e-02 0.038747756
## 7 464 1.347740e-03 0.001347748
## 9 624 1.812478e-03 0.001812489
## NA 2 5.809225e-06 NA
We need to filter out 3 (Other arrangement), 7 (Don’t know/Not Sure), 9 (Refused), and NAs.
dat <- dat %>%
filter(RENTHOM1 %in% c("1", "2"))
nrow(dat)
## [1] 329850
tabyl(dat$RENTHOM1)
## dat$RENTHOM1 n percent
## 1 246493 0.7472882
## 2 83357 0.2527118
Code a dummy variable, RENTOWN, for renting or owning. 1 = OWN and 0 = RENT.
dat <- dat %>%
mutate(RENTOWN = case_when(RENTHOM1 == 1 ~ 1,
RENTHOM1 == 2 ~ 0
)
)
nrow(dat)
## [1] 329850
tabyl(dat$RENTOWN)
## dat$RENTOWN n percent
## 0 83357 0.2527118
## 1 246493 0.7472882
Check the frequency distribtuion of X_AGE80.
tabyl(dat$X_AGE80)
## dat$X_AGE80 n percent
## 18 969 0.002937699
## 19 1267 0.003841140
## 20 1568 0.004753676
## 21 2014 0.006105806
## 22 2164 0.006560558
## 23 2632 0.007979385
## 24 2670 0.008094588
## 25 3045 0.009231469
## 26 2956 0.008961649
## 27 3198 0.009695316
## 28 3566 0.010810975
## 29 3583 0.010862513
## 30 4289 0.013002880
## 31 3745 0.011353646
## 32 4318 0.013090799
## 33 4161 0.012614825
## 34 4307 0.013057450
## 35 4708 0.014273154
## 36 4599 0.013942701
## 37 4655 0.014112475
## 38 4955 0.015021980
## 39 4774 0.014473245
## 40 5625 0.017053206
## 41 4623 0.014015462
## 42 5170 0.015673791
## 43 4709 0.014276186
## 44 4448 0.013484917
## 45 4966 0.015055328
## 46 4495 0.013627406
## 47 4808 0.014576323
## 48 4826 0.014630893
## 49 5027 0.015240261
## 50 6359 0.019278460
## 51 5533 0.016774291
## 52 6001 0.018193118
## 53 5643 0.017107776
## 54 5440 0.016492345
## 55 6033 0.018290132
## 56 5862 0.017771714
## 57 6088 0.018456874
## 58 6456 0.019572533
## 59 6279 0.019035925
## 60 7325 0.022207064
## 61 6273 0.019017735
## 62 7237 0.021940276
## 63 6943 0.021048962
## 64 7107 0.021546157
## 65 7686 0.023301501
## 66 7041 0.021346066
## 67 7158 0.021700773
## 68 7052 0.021379415
## 69 6864 0.020809459
## 70 7259 0.022006973
## 71 6101 0.018496286
## 72 6716 0.020360770
## 73 6299 0.019096559
## 74 6183 0.018744884
## 75 5139 0.015579809
## 76 4215 0.012778536
## 77 4240 0.012854328
## 78 4314 0.013078672
## 79 3530 0.010701834
## 80 22634 0.068619069
dat$X_AGE80SQ<- dat$X_AGE80*dat$X_AGE80
dat$X_AGE80CU<- dat$X_AGE80*dat$X_AGE80SQ
mean(dat$RENTOWN)
## [1] 0.7472882
mean(dat$X_AGE80)
## [1] 54.74008
mean(dat$X_AGE80SQ)
## [1] 3279.891
mean(dat$X_AGE80CU)
## [1] 209263.7
mean(dat$male)
## [1] 0.4771108
mean(dat$HIINCOME)
## [1] 0.5699591
# Calculating statistics for specific ages so that I can compare actual values to model-based values.
ageprobs<- dat %>%
group_by(X_AGE80) %>%
summarize(p=mean(RENTOWN),n=n())
ageprobs$num <- ageprobs$p*(1-ageprobs$p)
ageprobs$sep <- sqrt(ageprobs$num/ageprobs$n)
ageprobs$me <- 2*ageprobs$sep
ggplot(data =ageprobs, aes(x = X_AGE80, y = p, ymin=p-me, ymax=p+me)) +
geom_line() + ylim(0,1) + geom_ribbon(alpha=0.3,aes(color=NULL)) +
labs(title="Home Ownership by Age",
subtitle="Respondents in Sample",
caption="Source: 2021 BRFSS Data") +
xlab(label="Age at Survey") +
ylab(label="Proportion of Respondents who Own their Residence")
# Splitting out results into an Excel file so that we can compare actual values with predicted values with different specifications of age
write_xlsx(ageprobs, "/Users/briansurratt/Library/CloudStorage/OneDrive-UniversityofTexasatSanAntonio/DEM 7283 Stats II/Assignment 5/ageprobsincome.xlsx")
model <- glm(RENTOWN ~ X_AGE80, family = "binomial", data = dat)
summary(model)
##
## Call:
## glm(formula = RENTOWN ~ X_AGE80, family = "binomial", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2637 -0.9686 0.5421 0.7558 1.4237
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.4462900 0.0136237 -106.2 <2e-16 ***
## X_AGE80 0.0491021 0.0002657 184.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 334137 on 329848 degrees of freedom
## AIC: 334141
##
## Number of Fisher Scoring iterations: 4
# Exponentiate
modelExp = model
modelExp$coefficients = exp(modelExp$coefficients)
summary(modelExp)
##
## Call:
## glm(formula = RENTOWN ~ X_AGE80, family = "binomial", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2637 -0.9686 0.5421 0.7558 1.4237
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2354422 0.0136237 17.28 <2e-16 ***
## X_AGE80 1.0503276 0.0002657 3952.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 334137 on 329848 degrees of freedom
## AIC: 334141
##
## Number of Fisher Scoring iterations: 4
# Showing predicted values in a spreadsheet
# MERs
# AME = Average Marginal Effects
MERs<-margins(model, variables="X_AGE80", at=list(X_AGE80=c(18,20,30,40,50,60,70,80)))
summary(MERs)
## factor X_AGE80 AME SE z p lower upper
## X_AGE80 18.0000 0.0114 0.0000 295.5095 0.0000 0.0113 0.0114
## X_AGE80 20.0000 0.0116 0.0000 263.2208 0.0000 0.0116 0.0117
## X_AGE80 30.0000 0.0123 0.0001 182.5560 0.0000 0.0121 0.0124
## X_AGE80 40.0000 0.0115 0.0001 164.1054 0.0000 0.0114 0.0116
## X_AGE80 50.0000 0.0096 0.0001 178.2028 0.0000 0.0095 0.0097
## X_AGE80 60.0000 0.0073 0.0000 225.7351 0.0000 0.0073 0.0074
## X_AGE80 70.0000 0.0052 0.0000 291.8333 0.0000 0.0052 0.0052
## X_AGE80 80.0000 0.0035 0.0000 252.6118 0.0000 0.0035 0.0035
MERs<-margins(model, variables="X_AGE80", at=list(X_AGE80=c(39,40,41)))
summary(MERs)
## factor X_AGE80 AME SE z p lower upper
## X_AGE80 39.0000 0.0116 0.0001 164.3223 0.0000 0.0115 0.0118
## X_AGE80 40.0000 0.0115 0.0001 164.1054 0.0000 0.0114 0.0116
## X_AGE80 41.0000 0.0113 0.0001 164.2188 0.0000 0.0112 0.0115
# This command is computing these for each one-unit change in x
#AMEs
AMEs<-margins(model,variables=c("X_AGE80"))
summary(AMEs)
## factor AME SE z p lower upper
## X_AGE80 0.0082 0.0000 224.1707 0.0000 0.0081 0.0082
# But what if the relationship is not linear?
model <- glm(RENTOWN ~ X_AGE80 + X_AGE80SQ, family = "binomial", data = dat)
summary(model)
##
## Call:
## glm(formula = RENTOWN ~ X_AGE80 + X_AGE80SQ, family = "binomial",
## data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0287 -0.7311 0.5434 0.6757 1.7521
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.579e+00 3.936e-02 -90.94 <2e-16 ***
## X_AGE80 1.440e-01 1.648e-03 87.41 <2e-16 ***
## X_AGE80SQ -9.429e-04 1.596e-05 -59.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 330670 on 329847 degrees of freedom
## AIC: 330676
##
## Number of Fisher Scoring iterations: 4
modelExp = model
modelExp$coefficients = exp(modelExp$coefficients)
summary(modelExp)
##
## Call:
## glm(formula = RENTOWN ~ X_AGE80 + X_AGE80SQ, family = "binomial",
## data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0287 -0.7311 0.5434 0.6757 1.7521
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.789e-02 3.936e-02 0.709 0.478
## X_AGE80 1.155e+00 1.648e-03 700.845 <2e-16 ***
## X_AGE80SQ 9.991e-01 1.596e-05 62597.613 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 330670 on 329847 degrees of freedom
## AIC: 330676
##
## Number of Fisher Scoring iterations: 4
model <- glm(RENTOWN ~ X_AGE80 + X_AGE80SQ + X_AGE80CU, family = "binomial", data = dat)
summary(model)
##
## Call:
## glm(formula = RENTOWN ~ X_AGE80 + X_AGE80SQ + X_AGE80CU, family = "binomial",
## data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0782 -0.6625 0.5609 0.6797 1.8674
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.080e+00 1.108e-01 -45.86 <2e-16 ***
## X_AGE80 2.479e-01 7.327e-03 33.84 <2e-16 ***
## X_AGE80SQ -3.139e-03 1.515e-04 -20.72 <2e-16 ***
## X_AGE80CU 1.441e-05 9.879e-07 14.59 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 330455 on 329846 degrees of freedom
## AIC: 330463
##
## Number of Fisher Scoring iterations: 4
modelExp = model
modelExp$coefficients = exp(modelExp$coefficients)
summary(modelExp)
##
## Call:
## glm(formula = RENTOWN ~ X_AGE80 + X_AGE80SQ + X_AGE80CU, family = "binomial",
## data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0782 -0.6625 0.5609 0.6797 1.8674
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.217e-03 1.108e-01 5.600e-02 0.955
## X_AGE80 1.281e+00 7.327e-03 1.749e+02 <2e-16 ***
## X_AGE80SQ 9.969e-01 1.515e-04 6.579e+03 <2e-16 ***
## X_AGE80CU 1.000e+00 9.879e-07 1.012e+06 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 330455 on 329846 degrees of freedom
## AIC: 330463
##
## Number of Fisher Scoring iterations: 4
# We really should be using this last model but it is not clear how we can capture marginal effects for nonlinear specifications in R.
# Adding in the binomial income variable.
ageprobs <- dat %>%
group_by(HIINCOME, male, X_AGE80) %>%
summarize(pown=mean(RENTOWN),n=n())
## `summarise()` has grouped output by 'HIINCOME', 'male'. You can override using
## the `.groups` argument.
ggplot(data = ageprobs, aes(x = X_AGE80, y = pown, color = HIINCOME, group = HIINCOME)) +
geom_line() +
facet_wrap(~ male)
write_xlsx(ageprobs,"/Users/briansurratt/Library/CloudStorage/OneDrive-UniversityofTexasatSanAntonio/DEM 7283 Stats II/Assignment 5/probsown.xlsx")
model <- glm(RENTOWN ~ HIINCOME, family = "binomial", data = dat)
summary(model)
##
## Call:
## glm(formula = RENTOWN ~ HIINCOME, family = "binomial", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9684 -1.3599 0.5579 1.0053 1.0053
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.419396 0.005427 77.27 <2e-16 ***
## HIINCOME 1.362247 0.008520 159.90 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 345581 on 329848 degrees of freedom
## AIC: 345585
##
## Number of Fisher Scoring iterations: 4
modelExp = model
modelExp$coefficients = exp(modelExp$coefficients)
summary(modelExp)
##
## Call:
## glm(formula = RENTOWN ~ HIINCOME, family = "binomial", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9684 -1.3599 0.5579 1.0053 1.0053
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.521043 0.005427 280.2 <2e-16 ***
## HIINCOME 3.904960 0.008520 458.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 345581 on 329848 degrees of freedom
## AIC: 345585
##
## Number of Fisher Scoring iterations: 4
If you make more than $50,000, the odds of owning your own residence are 3.9 and the difference is statistically significant. You are almost 4 times as likely to own your own home as a person who makes less than $50,000.
sapply(lapply(dat, unique), length)
## INCOME3 SEXVAR RENTHOM1 X_AGE80 HIINCOME male RENTOWN X_AGE80SQ
## 11 2 2 63 2 2 2 63
## X_AGE80CU
## 63
model <- glm(RENTOWN ~ male, family = "binomial", data = dat)
summary(model)
##
## Call:
## glm(formula = RENTOWN ~ male, family = "binomial", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6595 -1.6576 0.7626 0.7640 0.7640
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.086212 0.005544 195.938 <2e-16 ***
## male -0.004213 0.008021 -0.525 0.599
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 372925 on 329848 degrees of freedom
## AIC: 372929
##
## Number of Fisher Scoring iterations: 4
modelExp = model
modelExp$coefficients = exp(modelExp$coefficients)
summary(modelExp)
##
## Call:
## glm(formula = RENTOWN ~ male, family = "binomial", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6595 -1.6576 0.7626 0.7640 0.7640
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.963029 0.005544 534.5 <2e-16 ***
## male 0.995796 0.008021 124.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 372925 on 329848 degrees of freedom
## AIC: 372929
##
## Number of Fisher Scoring iterations: 4
If you are male, the odds of your owning your own home (compared to non-male) is .996 and the difference is statistically significant. In this case, the magnitude of the effect is marginal although the results is statistically significant. The odds of a male owning a home are almost even to a non-male owning a home.
model <- glm(RENTOWN ~ X_AGE80 + HIINCOME + male, family = "binomial", data = dat)
summary(model)
##
## Call:
## glm(formula = RENTOWN ~ X_AGE80 + HIINCOME + male, family = "binomial",
## data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7119 -0.5857 0.4504 0.6927 1.9731
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.7667854 0.0176591 -156.678 < 2e-16 ***
## X_AGE80 0.0582606 0.0002924 199.231 < 2e-16 ***
## HIINCOME 1.7575253 0.0097446 180.360 < 2e-16 ***
## male -0.0743643 0.0092136 -8.071 6.96e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 297127 on 329846 degrees of freedom
## AIC: 297135
##
## Number of Fisher Scoring iterations: 5
modelExp = model
modelExp$coefficients = exp(modelExp$coefficients)
summary(modelExp)
##
## Call:
## glm(formula = RENTOWN ~ X_AGE80 + HIINCOME + male, family = "binomial",
## data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7119 -0.5857 0.4504 0.6927 1.9731
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.0628638 0.0176591 3.56 0.000371 ***
## X_AGE80 1.0599912 0.0002924 3624.80 < 2e-16 ***
## HIINCOME 5.7980709 0.0097446 595.01 < 2e-16 ***
## male 0.9283334 0.0092136 100.76 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 372925 on 329849 degrees of freedom
## Residual deviance: 297127 on 329846 degrees of freedom
## AIC: 297135
##
## Number of Fisher Scoring iterations: 5
# MEMs: marginal effects at the mean
emmeans(model, specs = "HIINCOME",
regrid = "response")
## HIINCOME prob SE df asymp.LCL asymp.UCL
## 0 0.5951 0.0014813 Inf 0.5922 0.5980
## 1 0.8949 0.0007104 Inf 0.8935 0.8963
##
## Results are averaged over the levels of: male
## Confidence level used: 0.95
The marginal effect at the mean age and sex is that a high income person has a 89.5% probability of owning their residence. A low income person has a 60% chance of owning their home.
emmeans(model, specs = "HIINCOME",
regrid = "response") |>
contrast(method = "revpairwise") |>
confint()
## contrast estimate SE df asymp.LCL asymp.UCL
## HIINCOME1 - HIINCOME0 0.3 0.00164 Inf 0.297 0.303
##
## Results are averaged over the levels of: male
## Confidence level used: 0.95
#MERs: marginal effects at representative values
emmeans(model, specs = "HIINCOME", by = "male", at = list(X_AGE80=20),
regrid = "response") |>
contrast(method = "revpairwise") |>
confint()
## male = 0:
## contrast estimate SE df asymp.LCL asymp.UCL
## HIINCOME1 - HIINCOME0 0.371 0.00206 Inf 0.367 0.375
##
## male = 1:
## contrast estimate SE df asymp.LCL asymp.UCL
## HIINCOME1 - HIINCOME0 0.363 0.00199 Inf 0.359 0.367
##
## Confidence level used: 0.95
At the age of 20, the difference in probability of ownership for a high income male over a low income male is 36.3%. The difference in probability of ownership for a high income non-male over a low income non-male is 37.1%.
emmeans(model, specs = "HIINCOME", by = "male", at = list(X_AGE80=40),
regrid = "response") |>
contrast(method = "revpairwise") |>
confint()
## male = 0:
## contrast estimate SE df asymp.LCL asymp.UCL
## HIINCOME1 - HIINCOME0 0.397 0.00203 Inf 0.393 0.401
##
## male = 1:
## contrast estimate SE df asymp.LCL asymp.UCL
## HIINCOME1 - HIINCOME0 0.402 0.00208 Inf 0.398 0.406
##
## Confidence level used: 0.95
At the age of 40, the difference in probability of ownership for a high income male over a low income male is 40.2%. The difference in probability of ownership for a high income non-male over a low income non-male is 39.7%.
emmeans(model, specs = "HIINCOME", by = "male", at = list(X_AGE80=60),
regrid = "response") |>
contrast(method = "revpairwise") |>
confint()
## male = 0:
## contrast estimate SE df asymp.LCL asymp.UCL
## HIINCOME1 - HIINCOME0 0.249 0.00152 Inf 0.246 0.252
##
## male = 1:
## contrast estimate SE df asymp.LCL asymp.UCL
## HIINCOME1 - HIINCOME0 0.260 0.00170 Inf 0.256 0.263
##
## Confidence level used: 0.95
At the age of 60, the difference in probability of ownership for a high income male over a low income male is 26.0%. The difference in probability of ownership for a high income non-male over a low income non-male is 24.9%.
emmeans(model, specs = "HIINCOME", by = "male", at = list(X_AGE80=80),
regrid = "response") |>
contrast(method = "revpairwise") |>
confint()
## male = 0:
## contrast estimate SE df asymp.LCL asymp.UCL
## HIINCOME1 - HIINCOME0 0.105 0.000978 Inf 0.104 0.107
##
## male = 1:
## contrast estimate SE df asymp.LCL asymp.UCL
## HIINCOME1 - HIINCOME0 0.112 0.001108 Inf 0.110 0.114
##
## Confidence level used: 0.95
At the age of 80, the difference in probability of ownership for a high income male over a low income male is 11.2%. The difference in probability of ownership for a high income non-male over a low income non-male is 10.5%.
#AMEs: average marginal effects
hi_eff <- predict(model, newdata = data.frame(HIINCOME=1, male=dat$male, X_AGE80=dat$X_AGE80),
type = "response")
low_eff <- predict(model, newdata = data.frame(HIINCOME=0, male=dat$male, X_AGE80=dat$X_AGE80),
type = "response")
mean(hi_eff - low_eff)
## [1] 0.2778078
The average marginal effect of high income on owning your residence among all ages and sexes is a 27.8% increase in probability.
If I were to do this exercise again, I would choose a different binomial variable than income level. It is a ordinal numerical variable, so there are other statistical models better suited to analyzing this variable.
I initially misinterpreted the instructions for this exercise, and I thought I was supposed to find a numerical continuous variable to replace age. I experimented with number of children in the house and income as a replacement for age but both were problematic. I had to reread the instructions a few times and go through the example to understand I was supposed to use two binary IV variables as well as age.