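library(haven)      # read_sav()
library(car)        # Recode()
library(survey)     # svydesign(), svyby(), svymean()
library(questionr)  # wtd.table()
library(tableone)   # CreateTableOne(), svyCreateTableOne()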
NSDUH_2019 <- read_sav("NSDUH_2019.SAV")
View(NSDUH_2019)

1. Define a binary outcome variable of your choosing and define how you recode the original variable.

## The binary outcome variable ADWRSATP was used because it indicates adults who have attempted suicide. The variable was recoded so that 1 (yes) stays 1 and 2 (no) becomes 0. Values 94, 97, 98, and 99 were recoded as missing. Additional attempts were made to get rid of missing values after line 41 failed to do so.
#print(ADWRSATP)
NSDUH_2019$attempt_suicide<-Recode(NSDUH_2019$ADWRSATP, recodes="1=1; 2=0;else=NA")
summary(NSDUH_2019$attempt_suicide, na.rm = TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    0.00    0.00    0.25    1.00    1.00   52599
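
The recode rule can be checked on a handful of values before it is applied to the full file. The following is a minimal base-R sketch, assuming a made-up vector of raw codes (`raw` is hypothetical and not taken from the NSDUH file). It mirrors the rule 1 = 1, 2 = 0, everything else missing, without relying on `Recode()`:

```r
# Hypothetical raw codes: 1 = yes, 2 = no, 94/97/98/99 = nonresponse or skip
raw <- c(1, 2, 2, 94, 97, 98, 99, 1, NA)

# 1 stays 1, 2 becomes 0, every other code (and NA) becomes NA
attempt <- ifelse(raw == 1, 1, ifelse(raw == 2, 0, NA))

# Cross-tabulate raw codes against the recoded values to confirm the mapping
print(table(raw, attempt, useNA = "ifany"))

# Share of "yes" among valid answers (missing values dropped)
mean(attempt, na.rm = TRUE)
```

The cross-tabulation makes it easy to see that only codes 1 and 2 survive and that every other code ends up in the missing column.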

2. State a research question about what factors you believe will affect your outcome variable.

## Do demographic factors such as marital status and education affect people's likelihood of having attempted suicide?

3. Define at least 2 predictor variables, based on your research question. For this assignment, it’s best if these are categorical variables.

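## To test the research question, marital status (IRMARIT) was recoded into four categories: 1 = married, 2 = widowed, 3 = separated, 4 = never married. Note that the recode call below assigns the labels 'divorced', 'widowed', and 'separated' to codes 2, 3, and 4, so the tables that follow carry those labels. Education (IREDUHIGHST2) was recoded as 1-7 = less than high school, 8 = high school diploma, 9 = some college, 10 = associate's degree, 11 = college degree.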

NSDUH_2019$marst<-Recode(NSDUH_2019$IRMARIT, recodes="1='married'; 2='divorced'; 3='widowed'; 4='separated'; else=NA", as.factor=T)
NSDUH_2019$marst<-relevel(NSDUH_2019$marst, ref='married')

NSDUH_2019$educ<-Recode(NSDUH_2019$IREDUHIGHST2, recodes="1:7='LssThnHgh'; 8='Hs'; 9='SomeCollege'; 10='Associates'; 11='Colgrad';else=NA", as.factor=T)
NSDUH_2019$educ<-relevel(NSDUH_2019$educ, ref='Colgrad')
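
Because the labels are typed by hand, it helps to tabulate the raw codes against the new factor and to confirm the reference level. The following is a small base-R sketch on invented codes. The vector `code` and the labels are illustrative only; the actual meaning of each IRMARIT code should be taken from the NSDUH codebook.

```r
# Hypothetical raw codes; 99 stands in for a skip code
code <- c(1, 4, 4, 2, 3, 99, 1, 4)

# Map codes to labels with factor(); codes not listed in `levels` become NA
lab <- factor(code, levels = 1:4,
              labels = c("married", "widowed", "separated", "never_married"))

# Make "married" the reference category, as relevel() does in the text
lab <- relevel(lab, ref = "married")

# Each raw code should land in exactly one label column
print(table(code, lab, useNA = "ifany"))

# The first level is the reference category used by regression models
levels(lab)[1]
```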

4. Perform a descriptive analysis of the outcome variable by each of the variables you defined in part b. (e.g., 2 x 2 table, 2 x k table). Follow a similar approach to presenting your statistics as presented in Sparks 2009 (in the Google drive). This can be done easily using the tableone package!

prop.table(table(NSDUH_2019$attempt_suicide, NSDUH_2019$educ), margin=2)
##    
##       Colgrad Associates        Hs LssThnHgh SomeCollege
##   0 0.8356164  0.7633136 0.7114943 0.6116838   0.7426850
##   1 0.1643836  0.2366864 0.2885057 0.3883162   0.2573150
prop.table(table(NSDUH_2019$attempt_suicide, NSDUH_2019$marst), margin=2)
##    
##       married  divorced separated   widowed
##   0 0.7544643 0.6976744 0.7567568 0.6984127
##   1 0.2455357 0.3023256 0.2432432 0.3015873
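
With `margin = 2`, `prop.table()` divides each cell by its column total, so every column sums to 1 and row `1` reads as the attempt share within each category. A toy illustration with invented counts (the matrix `counts` is hypothetical):

```r
# Hypothetical 2 x 3 table of counts: rows = outcome (0/1), columns = groups
counts <- matrix(c(80, 20,
                   60, 40,
                   90, 10),
                 nrow = 2,
                 dimnames = list(outcome = c("0", "1"),
                                 group = c("A", "B", "C")))

# margin = 2 -> proportions within each column (group)
col_props <- prop.table(counts, margin = 2)
print(col_props)

# Every column sums to 1; row "1" is the within-group share with outcome 1
colSums(col_props)
```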

4.1 Calculate descriptive statistics (mean or percentages) for each variable using no weights or survey design, as well as with full survey design and weights.

summary(NSDUH_2019$ANALWT_C)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     3.58  1262.48  2855.37  4902.76  6076.50 77284.48
des<-svydesign(ids=~1, strata=~VESTR, weights=~ANALWT_C, data = NSDUH_2019 )

# Weighted cross-tabulation; stored under a name that does not shadow base::cat()
wt_educ<-wtd.table(NSDUH_2019$attempt_suicide, NSDUH_2019$educ, weights = NSDUH_2019$ANALWT_C)
prop.table(wt_educ, margin=2)
##     Colgrad Associates        Hs LssThnHgh SomeCollege
## 0 0.8285310  0.7470445 0.7258592 0.5539427   0.7389289
## 1 0.1714690  0.2529555 0.2741408 0.4460573   0.2610711
# Weighted cross-tabulation; stored under a name that does not shadow base::cat()
wt_marst<-wtd.table(NSDUH_2019$attempt_suicide, NSDUH_2019$marst, weights = NSDUH_2019$ANALWT_C)
prop.table(wt_marst, margin=2)
##     married  divorced separated   widowed
## 0 0.7756676 0.7181744 0.7617474 0.6828752
## 1 0.2243324 0.2818256 0.2382526 0.3171248
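
A weighted cross-tabulation sums the weights in each cell instead of counting rows. The sketch below reproduces that idea in base R with `xtabs()`, using an invented data frame (`d`, `y`, `g`, and `w` are hypothetical names). It is meant only to show what the weighted proportions represent, not to replace `wtd.table()`.

```r
# Hypothetical data: binary outcome y, group g, sampling weight w
d <- data.frame(
  y = c(1, 0, 0, 1, 0, 1),
  g = c("a", "a", "a", "b", "b", "b"),
  w = c(10, 1, 1, 2, 2, 4)
)

# Unweighted: each row counts once
unw <- prop.table(table(d$y, d$g), margin = 2)

# Weighted: each cell is the sum of the weights of the rows in it
wtd <- prop.table(xtabs(w ~ y + g, data = d), margin = 2)

print(round(unw, 3))
print(round(wtd, 3))
```

In group `a` the single respondent with `y = 1` carries most of the weight, so the weighted share is much higher than the unweighted one. The same mechanism drives the gaps between the weighted and unweighted tables in this section.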
nodes<-svydesign(ids = ~1,  weights = ~1, data = NSDUH_2019)

sv.table<-svyby(formula = ~attempt_suicide, by = ~educ, design = nodes, FUN = svymean, na.rm=T)


knitr::kable(sv.table,
      caption = "Estimates of Suicide Attempts by Education - No survey design",
      align = 'c',  
      format = "html")
Estimates of Suicide Attempts by Education - No survey design
educ          attempt_suicide   se
Colgrad       0.1643836         0.0125223
Associates    0.2366864         0.0231198
Hs            0.2885057         0.0153606
LssThnHgh     0.3883162         0.0285702
SomeCollege   0.2573150         0.0128244
nodes<-svydesign(ids = ~1,  weights = ~1, data = NSDUH_2019)

sv.table<-svyby(formula = ~attempt_suicide, by = ~marst, design = nodes, FUN = svymean, na.rm=T)


knitr::kable(sv.table,
      caption = "Estimates of Suicide Attempts by Marital Status - No survey design",
      align = 'c',  
      format = "html")
Estimates of Suicide Attempts by Marital Status - No survey design
marst         attempt_suicide   se
married       0.2455357         0.0143789
divorced      0.3023256         0.0700380
separated     0.2432432         0.0091060
widowed       0.3015873         0.0236059

4.2 Calculate percentages, or means, for each of your independent variables for each level of your outcome variable and present this in a table, with appropriate survey-corrected test statistics. (tableone package helps)

t1<-CreateTableOne(vars = c("educ", "marst"), strata = "attempt_suicide", test = T, data = NSDUH_2019)
print(t1,format="p")
##                 Stratified by attempt_suicide
##                  0    1    p      test
##   n              2650  887            
##   educ (%)                 <0.001     
##      Colgrad     27.6 16.2            
##      Associates   9.7  9.0            
##      Hs          23.4 28.3            
##      LssThnHgh    6.7 12.7            
##      SomeCollege 32.6 33.7            
##   marst (%)                 0.086     
##      married     25.5 24.8            
##      divorced     1.1  1.5            
##      separated   63.4 60.9            
##      widowed     10.0 12.9
st1<-svyCreateTableOne(vars = c("educ", "marst"), strata = "attempt_suicide", test = T, data = des)
print(st1, format="p")
##                 Stratified by attempt_suicide
##                  0          1         p      test
##   n              11160848.0 3663213.6            
##   educ (%)                            <0.001     
##      Colgrad           35.8      22.6            
##      Associates        10.4      10.7            
##      Hs                18.7      21.6            
##      LssThnHgh          5.4      13.2            
##      SomeCollege       29.6      31.9            
##   marst (%)                            0.091     
##      married           34.8      30.7            
##      divorced           2.5       3.0            
##      separated         48.4      46.1            
##      widowed           14.3      20.2
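
For categorical variables, the `p` column in the unweighted table comes, by default as far as the tableone documentation describes, from a chi-square test of the cross-tabulation, while the survey version uses a design-adjusted analogue. The following is a base-R sketch of the unweighted test on invented counts (the numbers in `tab` are hypothetical, not the NSDUH counts):

```r
# Hypothetical counts: rows = outcome (0/1), columns = education groups
tab <- matrix(c(300, 50,
                250, 90,
                120, 80),
              nrow = 2,
              dimnames = list(attempt = c("0", "1"),
                              educ = c("Colgrad", "Hs", "LssThnHgh")))

# Pearson chi-square test of independence
res <- chisq.test(tab)

res$statistic  # test statistic
res$parameter  # degrees of freedom: (rows - 1) * (cols - 1)
res$p.value    # a small value suggests the outcome differs across groups
```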

4.3 Are there substantive differences in the descriptive results between the analysis using survey design and that not using survey design?

## Comparing the descriptive results, the differences between the analysis using the survey weights and the one not using them are mostly small. The proportion reporting a suicide attempt differs by roughly 0.02 or less (about 2 percentage points) across most education and marital status categories. The exception is respondents with less than a high school education, whose attempt proportion is about 0.06 higher in the weighted analysis (0.446 versus 0.388).
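
The size of the gap can be read directly off the two sets of proportions reported earlier. The following is a short base-R check, using the attempt proportions by education copied (rounded to three decimals) from the unweighted and weighted tables in part 4 and 4.1:

```r
# Proportion with an attempt, by education, rounded from the tables in the text
unweighted <- c(Colgrad = 0.164, Associates = 0.237, Hs = 0.289,
                LssThnHgh = 0.388, SomeCollege = 0.257)
weighted   <- c(Colgrad = 0.171, Associates = 0.253, Hs = 0.274,
                LssThnHgh = 0.446, SomeCollege = 0.261)

# Positive values mean the weighted estimate is higher
diff_educ <- weighted - unweighted
print(round(diff_educ, 3))

# Category with the largest absolute gap
names(which.max(abs(diff_educ)))
```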

## Overall, the share of respondents reporting an attempt is slightly lower when the survey design is used (about 24.7% weighted versus 25.1% unweighted). Of the two predictors, only education has a statistically significant association with having attempted suicide (p < 0.001 in both tables); marital status does not (p = 0.086 unweighted, p = 0.091 weighted).

## Without the survey design (table t1), respondents with a high school diploma or some college make up the largest share of those who attempted, about 62% combined (28.3% + 33.7%), while those with an associate's degree or less than high school make up the smallest shares (9.0% and 12.7%). Married respondents account for about a quarter of those who attempted (24.8%), and the marital status distribution is largely the same among those who attempted and those who did not.

## With the survey design (table st1), the high school and some college groups account for a smaller combined share of those who attempted, about 53% (21.6% + 31.9%). Married and separated respondents together make up about 77% of those who attempted (30.7% + 46.1%), which runs counter to the notion that married people attempt suicide at lower rates. Divorced respondents make up the smallest share of attempters in both tables (1.5% and 3.0%). Again, the marital status association is not statistically significant.

## The differences between t1 and st1 appear larger than those in the earlier proportion tables, for both education and marital status. Weighting raises the share of some categories (for example, college graduates and married respondents) and lowers the share of others (for example, high school and separated respondents).