library(haven)
library(car)
library(survey)
library(questionr)
library(tableone)
NSDUH_2019 <- read_sav("NSDUH_2019.SAV")
View(NSDUH_2019)
1. Define a binary outcome variable of your choosing and define how you recode the original variable.
## The binary outcome is based on ADWRSATP, which indicates whether an adult has attempted suicide. The variable was recoded so that 1 = yes and the original 2 becomes 0 = no. Values 94, 97, 98, and 99 were recoded as missing. Additional attempts were made to get rid of missing values after line 41 failed to do so.
#print(ADWRSATP)
NSDUH_2019$attempt_suicide<-Recode(NSDUH_2019$ADWRSATP, recodes="1=1; 2=0;else=NA")
summary(NSDUH_2019$attempt_suicide, na.rm = TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##    0.00    0.00    0.00    0.25    1.00    1.00   52599
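The recode can be sketched in base R on a made-up vector. The codes below are illustrative values, not NSDUH records, and `ifelse()` stands in for `Recode()`. The point is that the mean of a 0/1 variable is the share of 1s among the non-missing cases, which is how the 0.25 in the summary can be read.

```r
# Hypothetical raw codes: 1 = yes, 2 = no, 94/97/98/99 = nonresponse or skip
raw <- c(1, 2, 2, 99, 2, 1, 98, 2, 94, 2)

# Map 1 -> 1, 2 -> 0, everything else -> NA (same rule as "1=1; 2=0; else=NA")
attempt <- ifelse(raw == 1, 1, ifelse(raw == 2, 0, NA))

table(attempt, useNA = "ifany")   # counts of 0, 1, and NA
mean(attempt, na.rm = TRUE)       # share of 1s among non-missing values
```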
2. State a research question about what factors you believe will affect your outcome variable.
## Do demographic factors such as marital status and education affect people's likelihood of attempting suicide?
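3. Define at least 2 predictor variables, based on your research question. For this assignment, it’s best if these are categorical variables.
## To test the research question, marital status (IRMARIT) was recoded into four categories: 1 = married, 2 = widowed, 3 = separated, 4 = never married. Education (IREDUHIGHST2) was recoded as 1-7 = less than high school, 8 = high school diploma, 9 = some college, 10 = associate's degree, 11 = college degree. Note that the labels assigned in the recode that follows (2 = 'divorced', 3 = 'widowed', 4 = 'separated') differ from this mapping, so the marital status category names in the tables should be read with that in mind.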
NSDUH_2019$marst <- Recode(NSDUH_2019$IRMARIT, recodes = "1='married'; 2='divorced'; 3='widowed'; 4='separated'; else=NA", as.factor = TRUE)
NSDUH_2019$marst <- relevel(NSDUH_2019$marst, ref = 'married')
NSDUH_2019$educ <- Recode(NSDUH_2019$IREDUHIGHST2, recodes = "1:7='LssThnHgh'; 8='Hs'; 9='SomeCollege'; 10='Associates'; 11='Colgrad'; else=NA", as.factor = TRUE)
NSDUH_2019$educ <- relevel(NSDUH_2019$educ, ref = 'Colgrad')
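A base R version of the same idea, collapsing a numeric code into labeled categories and then choosing the reference level, might look like the sketch below. The input vector is invented, and `cut()` is used here in place of `Recode()`; values outside the listed range become `NA`, mirroring `else=NA`.

```r
# Hypothetical education codes on the 1-11 scale described above (12 is out of range)
educ_code <- c(3, 8, 9, 10, 11, 11, 7, 9, 12)

# cut() uses right-closed intervals here: (0,7], (7,8], (8,9], (9,10], (10,11]
educ <- cut(educ_code,
            breaks = c(0, 7, 8, 9, 10, 11),
            labels = c("LssThnHgh", "Hs", "SomeCollege", "Associates", "Colgrad"))

# Make college graduates the reference (first) level
educ <- relevel(educ, ref = "Colgrad")

levels(educ)
table(educ, useNA = "ifany")
```

The reference level matters later: in a regression, every other category is compared against it.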
4.1 Calculate descriptive statistics (mean or percentages) for each variable using no weights or survey design, as well as with full survey design and weights.
summary(NSDUH_2019$ANALWT_C)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
##     3.58  1262.48  2855.37  4902.76  6076.50 77284.48
des<-svydesign(ids=~1, strata=~VESTR, weights=~ANALWT_C, data = NSDUH_2019 )
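The design above treats each respondent as its own sampling unit (`ids=~1`). NSDUH public-use files also carry a variance replicate variable, `VEREP`, nested within `VESTR`. A design that uses it as the cluster id might look like the following sketch. The data frame is simulated, and whether the assignment intends the clustered design is an assumption.

```r
library(survey)

set.seed(1)
# Simulated stand-in for the survey file: 4 strata, 2 replicates per stratum
toy <- data.frame(
  VESTR    = rep(1:4, each = 10),
  VEREP    = rep(rep(1:2, each = 5), times = 4),
  ANALWT_C = runif(40, 100, 5000),
  y        = rbinom(40, 1, 0.25)
)

# nest = TRUE because the replicate ids (1, 2) repeat across strata
des_clustered <- svydesign(ids = ~VEREP, strata = ~VESTR,
                           weights = ~ANALWT_C, data = toy, nest = TRUE)

# The point estimate is the weighted mean; the design affects the standard error
svymean(~y, des_clustered)
weighted.mean(toy$y, toy$ANALWT_C)
```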
educ_tab <- wtd.table(NSDUH_2019$attempt_suicide, NSDUH_2019$educ, weights = NSDUH_2019$ANALWT_C)
prop.table(educ_tab, margin = 2)
##       Colgrad Associates        Hs LssThnHgh SomeCollege
##   0 0.8285310  0.7470445 0.7258592 0.5539427   0.7389289
##   1 0.1714690  0.2529555 0.2741408 0.4460573   0.2610711
marst_tab <- wtd.table(NSDUH_2019$attempt_suicide, NSDUH_2019$marst, weights = NSDUH_2019$ANALWT_C)
prop.table(marst_tab, margin = 2)
##       married  divorced separated   widowed
##   0 0.7756676 0.7181744 0.7617474 0.6828752
##   1 0.2243324 0.2818256 0.2382526 0.3171248
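The weighted tables above sum the sampling weights in each cell instead of counting rows, then divide by the column total. A minimal base R sketch of that calculation, using `xtabs()` on invented data rather than `wtd.table()`:

```r
# Made-up data: outcome (0/1), a grouping factor, and a sampling weight
toy <- data.frame(
  y = c(0, 1, 0, 0, 1, 1),
  g = factor(c("a", "a", "a", "b", "b", "b")),
  w = c(10, 30, 60, 50, 25, 25)
)

# Sum the weights (not the rows) within each y-by-g cell
wt_tab <- xtabs(w ~ y + g, data = toy)

# margin = 2 divides each column by its total, so each group sums to 1
wt_prop <- prop.table(wt_tab, margin = 2)
wt_prop
```

Row `1` of the result is the weighted proportion with the outcome in each group, which is how the second row of the tables above is read.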
nodes <- svydesign(ids = ~1, weights = ~1, data = NSDUH_2019)
sv.table <- svyby(formula = ~attempt_suicide, by = ~educ, design = nodes, FUN = svymean, na.rm = TRUE)
knitr::kable(sv.table,
             caption = "Estimates of Suicide Attempts by Education - No survey design",
             align = 'c',
             format = "html")
Estimates of Suicide Attempts by Education - No survey design

|             | educ        | attempt_suicide | se        |
|-------------|-------------|-----------------|-----------|
| Colgrad     | Colgrad     | 0.1643836       | 0.0125223 |
| Associates  | Associates  | 0.2366864       | 0.0231198 |
| Hs          | Hs          | 0.2885057       | 0.0153606 |
| LssThnHgh   | LssThnHgh   | 0.3883162       | 0.0285702 |
| SomeCollege | SomeCollege | 0.2573150       | 0.0128244 |
nodes <- svydesign(ids = ~1, weights = ~1, data = NSDUH_2019)
sv.table <- svyby(formula = ~attempt_suicide, by = ~marst, design = nodes, FUN = svymean, na.rm = TRUE)
knitr::kable(sv.table,
             caption = "Estimates of Suicide Attempts by Marital Status - No survey design",
             align = 'c',
             format = "html")
Estimates of Suicide Attempts by Marital Status - No survey design

|           | marst     | attempt_suicide | se        |
|-----------|-----------|-----------------|-----------|
| married   | married   | 0.2455357       | 0.0143789 |
| divorced  | divorced  | 0.3023256       | 0.0700380 |
| separated | separated | 0.2432432       | 0.0091060 |
| widowed   | widowed   | 0.3015873       | 0.0236059 |
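With `weights = ~1` and no strata or clusters, `svyby()` with `svymean` should reproduce the ordinary group means, which is why these tables serve as the "no survey design" baseline. A quick check on simulated data (all names below are made up for illustration):

```r
library(survey)

set.seed(2)
toy <- data.frame(
  y = rbinom(200, 1, 0.3),
  g = factor(sample(c("a", "b", "c"), 200, replace = TRUE))
)

# Equal weights and no design features
flat <- svydesign(ids = ~1, weights = ~1, data = toy)

by_design <- svyby(~y, ~g, design = flat, FUN = svymean, na.rm = TRUE)
by_hand   <- tapply(toy$y, toy$g, mean)

# The two sets of group means should agree
cbind(svyby = coef(by_design), tapply = by_hand)
```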
4.2 Calculate percentages, or means, for each of your independent variables for each level of your outcome variable and present this in a table, with appropriate survey-corrected test statistics. (tableone package helps)
t1 <- CreateTableOne(vars = c("educ", "marst"), strata = "attempt_suicide", test = TRUE, data = NSDUH_2019)
print(t1, format = "p")
##                 Stratified by attempt_suicide
##                  0      1      p      test
##   n              2650    887
##   educ (%)                     <0.001
##      Colgrad     27.6   16.2
##      Associates   9.7    9.0
##      Hs          23.4   28.3
##      LssThnHgh    6.7   12.7
##      SomeCollege 32.6   33.7
##   marst (%)                     0.086
##      married     25.5   24.8
##      divorced     1.1    1.5
##      separated   63.4   60.9
##      widowed     10.0   12.9
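For categorical variables, the unweighted table reports a p-value from a test of independence between each predictor and the outcome. The same kind of test can be run directly in base R with `chisq.test()`. The counts below are invented for illustration, and whether `CreateTableOne()` uses exactly these settings is an assumption.

```r
# Invented counts: rows = education level, columns = attempted (0 = no, 1 = yes)
counts <- matrix(c(300, 60,
                   250, 90,
                   120, 80),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(educ = c("Colgrad", "Hs", "LssThnHgh"),
                                 attempt = c("0", "1")))

# Column percentages, as in the stratified table
round(100 * prop.table(counts, margin = 2), 1)

# Pearson chi-squared test of independence
res <- chisq.test(counts)
res$p.value
```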
st1 <- svyCreateTableOne(vars = c("educ", "marst"), strata = "attempt_suicide", test = TRUE, data = des)
print(st1, format = "p")
##                 Stratified by attempt_suicide
##                  0           1          p      test
##   n              11160848.0  3663213.6
##   educ (%)                              <0.001
##      Colgrad     35.8        22.6
##      Associates  10.4        10.7
##      Hs          18.7        21.6
##      LssThnHgh    5.4        13.2
##      SomeCollege 29.6        31.9
##   marst (%)                              0.091
##      married     34.8        30.7
##      divorced     2.5         3.0
##      separated   48.4        46.1
##      widowed     14.3        20.2
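The p-values in the weighted table are design-based. One such test available in the survey package is `svychisq()`, which applies a Rao-Scott correction. Whether `svyCreateTableOne()` calls it with these exact defaults is an assumption here, and the data are simulated.

```r
library(survey)

set.seed(3)
n <- 300
toy <- data.frame(
  g = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  w = runif(n, 100, 5000)
)
# Make the outcome depend on group so there is an association to detect
toy$y <- factor(rbinom(n, 1, ifelse(toy$g == "c", 0.6, 0.2)))

wdes <- svydesign(ids = ~1, weights = ~w, data = toy)

# Weighted column percentages of g within each level of y
round(100 * prop.table(svytable(~g + y, wdes), margin = 2), 1)

# Design-based test of independence (Rao-Scott adjusted)
test <- svychisq(~g + y, design = wdes)
test$p.value
```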
4.3 Are there substantive differences in the descriptive results between the analysis using survey design and that not using survey design?
## Comparing the descriptive results, the differences between the analysis using the survey weights and the one not using them are mostly small. The proportion reporting a suicide attempt differs by roughly 0.01 to 0.02 (1 to 2 percentage points) across most education and marital status categories. The exception is less than high school, where the weighted proportion (0.446) is about 0.06 higher than the unweighted one (0.388).
## The direction of the difference is not uniform. For education, the weighted proportions are higher in every category except high school; for marital status, they are lower in every category except widowed. Of the two predictors, only education has a statistically significant association with having attempted suicide (p < 0.001 in both t1 and st1). Marital status does not (p = 0.086 unweighted, p = 0.091 weighted).
## Without the survey design (t1), respondents with a high school diploma or some college make up the largest shares of those who attempted, 62% combined (28.3% + 33.7%), while those with an associate's degree or less than high school make up the smallest shares (9.0% and 12.7%). Married respondents account for about a quarter of those who attempted (24.8%), and the marital status distribution is largely the same among those who attempted and those who did not.
## With the survey design (st1), high school and some college together account for about 53% of those who attempted (21.6% + 31.9%). Married and separated respondents together make up about 77% of those who attempted (30.7% + 46.1%). This might seem to run counter to the notion that married people attempt suicide less often, but these shares mostly reflect how large the two groups are: the weighted proportion who attempted is lowest among married respondents (0.224). Divorced respondents are the smallest share of those who attempted in both analyses (1.5% unweighted, 3.0% weighted). Again, the association with marital status is not statistically significant.
## The percentages in t1 and st1 differ more noticeably between the weighted and unweighted analyses than the attempt proportions do, for both marital status and education, and not in a single direction. Among those who attempted, for example, the share of college graduates rises from 16.2% to 22.6% with weighting, while the share labeled separated falls from 60.9% to 46.1%.
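The size and direction of the gaps can be checked directly from the two sets of proportions reported earlier (the weighted `prop.table()` output and the unweighted `svyby()` table). A small sketch for education, with the values typed in by hand from those outputs:

```r
# Proportion who attempted, by education, copied from the tables above
unweighted <- c(Colgrad = 0.1643836, Associates = 0.2366864, Hs = 0.2885057,
                LssThnHgh = 0.3883162, SomeCollege = 0.2573150)
weighted   <- c(Colgrad = 0.1714690, Associates = 0.2529555, Hs = 0.2741408,
                LssThnHgh = 0.4460573, SomeCollege = 0.2610711)

# Positive values mean the weighted estimate is higher
diff_educ <- round(weighted - unweighted, 3)
diff_educ

# Category with the largest absolute gap
names(which.max(abs(diff_educ)))
```

The same two-vector comparison works for marital status by swapping in the four proportions from the corresponding tables.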