Homework 3 STATS2

The binary outcome that I will use is the respondent's educational attainment. To make it binary, I will dummy code the variable "educ", which measures the respondent's highest year of education completed. This variable will be transformed into two different variables, one for college and one for high school. For college, all grade levels from 12th grade and under (0-12) will be recoded as 0 to represent not going to college, while all levels of education from the first year of college onward (13-20) will be recoded as 1 to represent the respondent having some degree of college education. For the high school variable, all levels of education from 0-11 will be recoded as 0 to represent not finishing high school, while those who completed exactly 12 years will be recoded as 1. Those who have more education than a high school equivalent will be recoded as missing to isolate this part of the research. So two true binary outcome variables will be present in the study: one observing those whose highest attainment is a high school diploma, and one observing those who have some college.
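The two recodes described above can be sketched in base R on a toy vector. This is only an illustration under stated assumptions: the values in educ below are made up rather than taken from the GSS, and the object names college and highschool are hypothetical stand-ins for the variables created later with Recode().

```r
# Toy vector of years of schooling (hypothetical values, not GSS data).
educ <- c(8, 11, 12, 13, 16, 20, NA)

# College indicator: 0 for 0-12 years, 1 for 13-20 years, NA otherwise.
college <- ifelse(educ >= 13 & educ <= 20, 1,
                  ifelse(educ >= 0 & educ <= 12, 0, NA))

# High school indicator: 0 for 0-11 years, 1 for exactly 12 years,
# and NA for anyone with more than 12 years of schooling.
highschool <- ifelse(educ == 12, 1,
                     ifelse(educ >= 0 & educ <= 11, 0, NA))

print(data.frame(educ, college, highschool))
```

Because everyone with more than 12 years is set to missing on the high school variable, that outcome compares high school graduates only with respondents who did not finish high school.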

The variable representing high school attainment only is named subgroupeducHS (labeled version: subgroupEduHS_1), and the college variable is named subgroupeducCollege (labeled version: subgroupEduCollege_1).

My research question is whether educational attainment differs between Hispanics and non-Hispanics, and how parental capital, the effects of racialization, and sex relate to that difference.

The factors that I believe will affect my outcome variable are Hispanic origin (HISPANIC), sex (SEX), family income when the respondent was 16 years old (INCOM16), whether the respondent was born in the US (BORN), whether the respondent has ever been treated with less respect than others (DISRSPCT), and whether the respondent has ever been seen by others as not being smart (NOTSMART).

Hispanic origin (HISPANIC) will be recoded so that Hispanic is 1 and non-Hispanic is 0. The recoded variable is named subgrouphis, with a labeled version subgrouphis_1.

For sex, men will be recoded as 0 and women as 1. The recoded variable is named subgroupsex, with a labeled version subgroupsex_1.

Family income when the respondent was 16 (INCOM16) will be recoded so that options 3 (Average), 4 (Above Average), and 5 (Far Above Average) become 1, while 1 (Far Below Average) and 2 (Below Average) become 0. The recoded variable is named subgroupincom16, with a labeled version subgroupincom16_1.

The variable BORN will be recoded so that 1 (Yes, born in the US) stays 1 and 2 (No) becomes 0. The recoded variable is named subgroupBorn.

The variable NOTSMART will be recoded so that categories 1 (almost every day) through 5 (less than once a year) become 1 and category 6 (never) becomes 0. The recoded variable is named subgroupnotsmart, with a labeled version subgroupnotsmart_1.

The variable DISRSPCT will be recoded so that categories 1 (almost every day) through 5 (less than once a year) become 1 and category 6 (never) becomes 0. The recoded variable is named subgroupdisrspct, with a labeled version subgroupdisrspct_1.

After recoding, survey weights were applied to the analysis so that the estimates reflect the sampling design (stratified by vstrat and weighted by wtssnrps).
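To see why weighting matters, note that the share of Hispanic respondents in the output further down moves from about 11.4% unweighted to about 15.2% weighted. The arithmetic behind a weighted proportion can be sketched as follows; the indicator and weight values here are hypothetical toy numbers, not GSS records, and the sketch ignores the strata that the real design also uses.

```r
# Hypothetical toy data: 1 = Hispanic, 0 = non-Hispanic, with made-up weights.
hispanic <- c(1, 0, 0, 1, 0)
weight   <- c(2.0, 0.5, 1.0, 1.5, 1.0)

# Unweighted share: every respondent counts once.
unweighted_share <- mean(hispanic)

# Weighted share: each respondent counts in proportion to their weight.
weighted_share <- sum(weight * hispanic) / sum(weight)

print(c(unweighted = unweighted_share, weighted = weighted_share))
```

When a group carries larger weights than the rest of the sample, its weighted share is larger than its raw share, which is the pattern seen for Hispanic respondents in this dataset.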

For this research, the two dependent variables measuring educational attainment are analyzed in two separate phases, with three models in each phase. Across the models within a phase, blocks of variables are removed to see how the association between being Hispanic and educational attainment changes. The aim is to observe whether any variables lose or gain statistical significance from model to model and, more importantly, what happens to the coefficient on the key predictor (Hispanic) for each dependent variable, whether high school or college.
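The strategy of fitting a full model and then dropping a block of predictors can be sketched as follows. This is a minimal illustration on simulated data with an unweighted glm(), not the survey-weighted svyglm() models fit later in this report; the data frame toy, its column names, and the coefficients used to simulate the outcome are all hypothetical.

```r
# Simulated toy data (hypothetical; not the GSS).
set.seed(1)
n <- 200
toy <- data.frame(
  hispanic = rbinom(n, 1, 0.3),
  sex      = rbinom(n, 1, 0.5),
  born     = rbinom(n, 1, 0.8),
  incom16  = rbinom(n, 1, 0.6)
)
toy$hs <- rbinom(n, 1, plogis(0.5 - 0.7 * toy$hispanic + 0.3 * toy$incom16))

# Full model, then the same model without the "parental" block.
full    <- glm(hs ~ hispanic + sex + born + incom16,
               data = toy, family = binomial)
reduced <- update(full, . ~ . - born - incom16)

# Compare the odds ratio on the key predictor across the two models.
or_full    <- exp(coef(full))[["hispanic"]]
or_reduced <- exp(coef(reduced))[["hispanic"]]
print(c(full = or_full, reduced = or_reduced))
```

If the odds ratio on the key predictor barely moves when a block is dropped, that block is unlikely to account for the association.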

The first logit model compares respondents whose highest educational attainment is a high school degree with those who have not achieved it. The predictors are Hispanic, sex, whether the respondent was born in the US, family income when the respondent was 16, whether the respondent was ever disrespected, and whether the respondent was ever called or seen as not smart.

In this model, Hispanic is statistically significant at the .05 level (OR = 0.48, p = 0.026): the odds of non-Hispanics having a high school diploma are about 2.08 times the odds for Hispanics. Sex (p = 0.5), family income at 16 (p = 0.5), and not smart (p = 0.2) are not statistically significant at any conventional level.
For sex, the estimated odds of women having a high school education are 1.17 times those of men. For income at 16, the estimated odds of having a high school diploma for respondents with average or higher family income are 1.19 times those of respondents with below-average family income. Both confidence intervals include 1.

For not smart, the estimated odds of having a high school diploma for those who were never treated as not smart are about 1.43 times those of respondents who were perceived to be not smart (OR = 0.70), but this difference is also not statistically significant.
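The "times more likely" figures in this section come from inverting the odds ratios in the regression tables, which report Hispanic relative to non-Hispanic. A short sketch of that arithmetic, using the Hispanic odds ratio and confidence interval from the first high school model; the object names are hypothetical.

```r
# Odds ratio and 95% CI for subgrouphis = 1 (Hispanic), copied from the
# first high school model's regression table.
or_hispanic <- 0.48
ci_hispanic <- c(lower = 0.26, upper = 0.92)

# Inverting swaps the comparison to non-Hispanic versus Hispanic.
# The CI bounds are inverted too, and they trade places.
or_non_hispanic <- 1 / or_hispanic
ci_non_hispanic <- c(lower = 1 / ci_hispanic[["upper"]],
                     upper = 1 / ci_hispanic[["lower"]])

print(round(or_non_hispanic, 2))
print(round(ci_non_hispanic, 2))
```

The table values are already rounded to two decimals, so inverted figures computed this way can differ slightly in the last digit from ones computed on the unrounded estimates.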

When the parental factors (born in the US, income at 16) are removed, Hispanic remains statistically significant at the .05 level, although the p-value rises from 0.026 to 0.048 (OR = 0.55): the odds of non-Hispanics having a high school education are about 1.82 times those of Hispanics. Not smart is still not statistically significant (OR = 0.72, p = 0.3); the estimated odds of having a HS diploma for those never told they were not smart are about 1.39 times those of respondents perceived as not smart.

When the perceived discrimination factors (Were you ever disrespected? Are you perceived to be not smart?) are also removed, leaving only Hispanic and sex, Hispanic is statistically significant at the .01 level (OR = 0.51, p = 0.004): the odds of non-Hispanics having a HS diploma are about 1.96 times those of Hispanics. Sex is not statistically significant (OR = 0.85, p = 0.4); the estimated odds for men are about 1.18 times those for women.

In this phase for high school education, it is notable that Hispanic retained its statistical significance whether or not the parental factors or the perceived discrimination factors were in the model. In each instance non-Hispanics were more likely than Hispanics to have a HS diploma. The point estimate for sex changes direction across the models (OR = 1.17 in the full model, 1.00 without the parental factors, and 0.85 with only Hispanic and sex), but sex is not statistically significant in any of them, so the data do not support a conclusion about how the parental or perceived discrimination factors affect the sex difference.
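One quick way to read significance off the regression tables is to check whether each 95% confidence interval excludes 1. The sketch below does this for the full high school model, with the odds ratios and intervals copied from its output table; the data frame name results and the column excludes_one are hypothetical.

```r
# Odds ratios and 95% CIs copied from the first high school model's table.
results <- data.frame(
  term  = c("subgrouphis", "subgroupsex", "subgroupBorn",
            "subgroupincom16", "subgroupdisrspct", "subgroupnotsmart"),
  or    = c(0.48, 1.17, 0.87, 1.19, 0.84, 0.70),
  lower = c(0.26, 0.73, 0.41, 0.75, 0.44, 0.39),
  upper = c(0.92, 1.85, 1.83, 1.91, 1.59, 1.27),
  stringsAsFactors = FALSE
)

# A 95% CI that lies entirely above or below 1 corresponds to p < .05.
results$excludes_one <- results$lower > 1 | results$upper < 1

print(results)
```

In this table only subgrouphis has an interval that excludes 1, which agrees with its p-value of 0.026 being the only one below .05.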

The college education phase follows the same sequence of models as the HS education phase described above.

In the first model, with all variables included, Hispanic and income at 16 are statistically significant (both p < 0.001). The odds of non-Hispanics having a college education are 2.22 times those of Hispanics, and the odds for those with average or higher family income at 16 are 1.52 times those of respondents with below-average family income. Sex (p = 0.4) and born in the US (p = 0.12) are not statistically significant: the estimated odds for men are about 1.09 times those for women, and the estimated odds for those not born in the US are about 1.39 times those for the US-born. Perceived disrespect is marginal (OR = 1.30, p = 0.086).

In the second model, which removes the parental factors, Hispanic (p < 0.001) and perceived disrespect (p = 0.024) are statistically significant. The odds of non-Hispanics having a college education are 2.17 times those of Hispanics. The odds for those who report being disrespected are 1.40 times those of respondents who report never being disrespected. Sex is not statistically significant (OR = 0.92, p = 0.5); the estimated odds for men are about 1.09 times those for women.

In model 3, Hispanic is statistically significant (p < 0.001): the odds of non-Hispanics having a college education are 1.92 times those of Hispanics. Sex is not statistically significant (OR = 0.94, p = 0.5); the estimated odds for men are about 1.06 times those for women.

For all models in both phases, each interpretation holds the other variables in the model constant. The key finding is that, for both high school and college education, non-Hispanics have higher odds of attainment than Hispanics regardless of whether the parental factors or the perceived discrimination factors are in the model. The Hispanic odds ratio changes little when those variables are added or removed, which suggests that they do not mediate this relationship. Most of the other predictors are not statistically significant; the exceptions are family income at 16 in the full college model and perceived disrespect in the college model without the parental factors.
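The stability of the Hispanic coefficient can be checked by lining up its odds ratio across the three models in each phase. The sketch below copies the six odds ratios from the regression tables and inverts them to the non-Hispanic versus Hispanic scale; the data frame name hispanic_or and the model labels are hypothetical.

```r
# Hispanic odds ratios copied from the six regression tables.
hispanic_or <- data.frame(
  model   = c("Full", "No parental factors", "Hispanic and sex only"),
  hs      = c(0.48, 0.55, 0.51),
  college = c(0.45, 0.46, 0.52),
  stringsAsFactors = FALSE
)

# Invert so each figure reads as non-Hispanic odds relative to Hispanic odds.
hispanic_or$hs_inverse      <- round(1 / hispanic_or$hs, 2)
hispanic_or$college_inverse <- round(1 / hispanic_or$college, 2)

print(hispanic_or)
```

Across both outcomes the inverted odds ratio stays roughly between 1.8 and 2.2, whichever block of covariates is in the model.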

library(haven)
library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(scales)
library(sur)
library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

library(summarytools)
library(Rmisc)

## Loading required package: lattice

library(car)

## Loading required package: carData

## 
## Attaching package: 'carData'

## The following objects are masked from 'package:sur':
## 
##     Anscombe, States

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

library(forcats)
library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v tibble  3.1.6     v purrr   0.3.4
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x plyr::arrange()     masks dplyr::arrange()
## x readr::col_factor() masks scales::col_factor()
## x purrr::compact()    masks plyr::compact()
## x plyr::count()       masks dplyr::count()
## x purrr::discard()    masks scales::discard()
## x plyr::failwith()    masks dplyr::failwith()
## x dplyr::filter()     masks stats::filter()
## x plyr::id()          masks dplyr::id()
## x dplyr::lag()        masks stats::lag()
## x plyr::mutate()      masks dplyr::mutate()
## x car::recode()       masks dplyr::recode()
## x plyr::rename()      masks dplyr::rename()
## x purrr::some()       masks car::some()
## x plyr::summarise()   masks dplyr::summarise()
## x plyr::summarize()   masks dplyr::summarize()
## x tibble::view()      masks summarytools::view()

library(survey)

## Loading required package: grid

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

## Loading required package: survival

## 
## Attaching package: 'survey'

## The following object is masked from 'package:graphics':
## 
##     dotchart

library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

library(grid)
library(Matrix)

gss2021_ZERODraft<-read_dta("C:\\Users\\BTP\\Desktop\\STATS 2 FOLDER\\2021_sas\\gss2021.dta")

gss2021_ZERODraft %>%
tabyl(hispanic)

##  hispanic    n      percent valid_percent
##         1 3544 0.8789682540  0.8864432216
##         2  245 0.0607638889  0.0612806403
##         3   51 0.0126488095  0.0127563782
##         4   21 0.0052083333  0.0052526263
##         5    9 0.0022321429  0.0022511256
##         6    2 0.0004960317  0.0005002501
##         7    4 0.0009920635  0.0010005003
##         8    2 0.0004960317  0.0005002501
##         9    4 0.0009920635  0.0010005003
##        10    5 0.0012400794  0.0012506253
##        11    3 0.0007440476  0.0007503752
##        15   17 0.0042162698  0.0042521261
##        20    4 0.0009920635  0.0010005003
##        21    8 0.0019841270  0.0020010005
##        22   10 0.0024801587  0.0025012506
##        23    5 0.0012400794  0.0012506253
##        24    1 0.0002480159  0.0002501251
##        25    1 0.0002480159  0.0002501251
##        30   46 0.0114087302  0.0115057529
##        35    1 0.0002480159  0.0002501251
##        41    3 0.0007440476  0.0007503752
##        46    2 0.0004960317  0.0005002501
##        47    7 0.0017361111  0.0017508754
##        50    3 0.0007440476  0.0007503752
##        NA   34 0.0084325397            NA

gss2021_ZERODraft %>%
tabyl(educ)

##  educ   n      percent valid_percent
##     0   9 0.0022321429  0.0022692890
##     1   1 0.0002480159  0.0002521432
##     2   2 0.0004960317  0.0005042864
##     3   3 0.0007440476  0.0007564297
##     4   1 0.0002480159  0.0002521432
##     5   2 0.0004960317  0.0005042864
##     6  15 0.0037202381  0.0037821483
##     7   5 0.0012400794  0.0012607161
##     8  25 0.0062003968  0.0063035804
##     9  32 0.0079365079  0.0080685830
##    10  52 0.0128968254  0.0131114473
##    11  83 0.0205853175  0.0209278870
##    12 829 0.2056051587  0.2090267272
##    13 277 0.0687003968  0.0698436712
##    14 542 0.1344246032  0.1366616238
##    15 208 0.0515873016  0.0524457892
##    16 942 0.2336309524  0.2375189107
##    17 258 0.0639880952  0.0650529501
##    18 351 0.0870535714  0.0885022693
##    19 113 0.0280257937  0.0284921836
##    20 216 0.0535714286  0.0544629349
##    NA  66 0.0163690476            NA

gss2021_ZERODraft %>%
tabyl(sex)

##  sex    n    percent valid_percent
##    1 1736 0.43055556     0.4406091
##    2 2204 0.54662698     0.5593909
##   NA   92 0.02281746            NA

gss2021_ZERODraft %>%
  tabyl(incom16)

##  incom16    n    percent valid_percent
##        1  421 0.10441468    0.11003659
##        2 1013 0.25124008    0.26476738
##        3 1625 0.40302579    0.42472556
##        4  679 0.16840278    0.17746994
##        5   88 0.02182540    0.02300052
##       NA  206 0.05109127            NA

gss2021_ZERODraft %>%
  tabyl(born)

##  born    n    percent valid_percent
##     1 3516 0.87202381     0.8878788
##     2  444 0.11011905     0.1121212
##    NA   72 0.01785714            NA

gss2021_ZERODraft %>%
  tabyl(disrspct)

##  disrspct    n    percent valid_percent
##         1  136 0.03373016    0.05228758
##         2  231 0.05729167    0.08881200
##         3  327 0.08110119    0.12572088
##         4  801 0.19866071    0.30795848
##         5  552 0.13690476    0.21222607
##         6  554 0.13740079    0.21299500
##        NA 1431 0.35491071            NA

gss2021_ZERODraft %>%
  tabyl(notsmart)

##  notsmart    n    percent valid_percent
##         1  101 0.02504960    0.03881630
##         2  129 0.03199405    0.04957725
##         3  202 0.05009921    0.07763259
##         4  684 0.16964286    0.26287471
##         5  619 0.15352183    0.23789393
##         6  867 0.21502976    0.33320523
##        NA 1430 0.35466270            NA

recode hispanic

gss2021_ZERODraft %>%
  tabyl(hispanic)

##  hispanic    n      percent valid_percent
##         1 3544 0.8789682540  0.8864432216
##         2  245 0.0607638889  0.0612806403
##         3   51 0.0126488095  0.0127563782
##         4   21 0.0052083333  0.0052526263
##         5    9 0.0022321429  0.0022511256
##         6    2 0.0004960317  0.0005002501
##         7    4 0.0009920635  0.0010005003
##         8    2 0.0004960317  0.0005002501
##         9    4 0.0009920635  0.0010005003
##        10    5 0.0012400794  0.0012506253
##        11    3 0.0007440476  0.0007503752
##        15   17 0.0042162698  0.0042521261
##        20    4 0.0009920635  0.0010005003
##        21    8 0.0019841270  0.0020010005
##        22   10 0.0024801587  0.0025012506
##        23    5 0.0012400794  0.0012506253
##        24    1 0.0002480159  0.0002501251
##        25    1 0.0002480159  0.0002501251
##        30   46 0.0114087302  0.0115057529
##        35    1 0.0002480159  0.0002501251
##        41    3 0.0007440476  0.0007503752
##        46    2 0.0004960317  0.0005002501
##        47    7 0.0017361111  0.0017508754
##        50    3 0.0007440476  0.0007503752
##        NA   34 0.0084325397            NA

gss2021_ZERODraft$subgrouphis <-Recode(gss2021_ZERODraft$hispanic, recodes="1 = 0; 2:50 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgrouphis)

##  subgrouphis    n    percent valid_percent
##            0 3544 0.87896825     0.8864432
##            1  454 0.11259921     0.1135568
##         <NA>   34 0.00843254            NA

subgrouphis_1<-as.factor(ifelse(gss2021_ZERODraft$subgrouphis==1, "Hispanic", "Non Hispanic"))

tabyl(subgrouphis_1)

##  subgrouphis_1    n    percent valid_percent
##       Hispanic  454 0.11259921     0.1135568
##   Non Hispanic 3544 0.87896825     0.8864432
##           <NA>   34 0.00843254            NA

recode education college

gss2021_ZERODraft %>%
  tabyl(educ)

##  educ   n      percent valid_percent
##     0   9 0.0022321429  0.0022692890
##     1   1 0.0002480159  0.0002521432
##     2   2 0.0004960317  0.0005042864
##     3   3 0.0007440476  0.0007564297
##     4   1 0.0002480159  0.0002521432
##     5   2 0.0004960317  0.0005042864
##     6  15 0.0037202381  0.0037821483
##     7   5 0.0012400794  0.0012607161
##     8  25 0.0062003968  0.0063035804
##     9  32 0.0079365079  0.0080685830
##    10  52 0.0128968254  0.0131114473
##    11  83 0.0205853175  0.0209278870
##    12 829 0.2056051587  0.2090267272
##    13 277 0.0687003968  0.0698436712
##    14 542 0.1344246032  0.1366616238
##    15 208 0.0515873016  0.0524457892
##    16 942 0.2336309524  0.2375189107
##    17 258 0.0639880952  0.0650529501
##    18 351 0.0870535714  0.0885022693
##    19 113 0.0280257937  0.0284921836
##    20 216 0.0535714286  0.0544629349
##    NA  66 0.0163690476            NA

gss2021_ZERODraft$subgroupeducCollege <-Recode(gss2021_ZERODraft$educ, recodes="0:12 = 0; 13:20 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupeducCollege)

##  subgroupeducCollege    n    percent valid_percent
##                    0 1059 0.26264881     0.2670197
##                    1 2907 0.72098214     0.7329803
##                 <NA>   66 0.01636905            NA

subgroupEduCollege_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupeducCollege==1, "College", "No College"))

tabyl(subgroupEduCollege_1)

##  subgroupEduCollege_1    n    percent valid_percent
##               College 2907 0.72098214     0.7329803
##            No College 1059 0.26264881     0.2670197
##                  <NA>   66 0.01636905            NA

education highschool only

gss2021_ZERODraft %>%
  tabyl(educ)

##  educ   n      percent valid_percent
##     0   9 0.0022321429  0.0022692890
##     1   1 0.0002480159  0.0002521432
##     2   2 0.0004960317  0.0005042864
##     3   3 0.0007440476  0.0007564297
##     4   1 0.0002480159  0.0002521432
##     5   2 0.0004960317  0.0005042864
##     6  15 0.0037202381  0.0037821483
##     7   5 0.0012400794  0.0012607161
##     8  25 0.0062003968  0.0063035804
##     9  32 0.0079365079  0.0080685830
##    10  52 0.0128968254  0.0131114473
##    11  83 0.0205853175  0.0209278870
##    12 829 0.2056051587  0.2090267272
##    13 277 0.0687003968  0.0698436712
##    14 542 0.1344246032  0.1366616238
##    15 208 0.0515873016  0.0524457892
##    16 942 0.2336309524  0.2375189107
##    17 258 0.0639880952  0.0650529501
##    18 351 0.0870535714  0.0885022693
##    19 113 0.0280257937  0.0284921836
##    20 216 0.0535714286  0.0544629349
##    NA  66 0.0163690476            NA

gss2021_ZERODraft$subgroupeducHS <-Recode(gss2021_ZERODraft$educ, recodes="0:11 = 0; 12:12 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupeducHS)

##  subgroupeducHS    n    percent valid_percent
##               0  230 0.05704365      0.217186
##               1  829 0.20560516      0.782814
##            <NA> 2973 0.73735119            NA

subgroupEduHS_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupeducHS==1, "Highschool", "No Highschool"))

tabyl(subgroupEduHS_1)

##  subgroupEduHS_1    n    percent valid_percent
##       Highschool  829 0.20560516      0.782814
##    No Highschool  230 0.05704365      0.217186
##             <NA> 2973 0.73735119            NA

sex

gss2021_ZERODraft %>%
  tabyl(sex)

##  sex    n    percent valid_percent
##    1 1736 0.43055556     0.4406091
##    2 2204 0.54662698     0.5593909
##   NA   92 0.02281746            NA

gss2021_ZERODraft$subgroupsex <-Recode(gss2021_ZERODraft$sex, recodes="1:1 = 0; 2:2 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupsex)

##  subgroupsex    n    percent valid_percent
##            0 1736 0.43055556     0.4406091
##            1 2204 0.54662698     0.5593909
##         <NA>   92 0.02281746            NA

subgroupsex_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupsex==1, "Women", "Men"))

tabyl(subgroupsex_1)

##  subgroupsex_1    n    percent valid_percent
##            Men 1736 0.43055556     0.4406091
##          Women 2204 0.54662698     0.5593909
##           <NA>   92 0.02281746            NA

income of household of respondent when age 16

gss2021_ZERODraft %>%
  tabyl(incom16)

##  incom16    n    percent valid_percent
##        1  421 0.10441468    0.11003659
##        2 1013 0.25124008    0.26476738
##        3 1625 0.40302579    0.42472556
##        4  679 0.16840278    0.17746994
##        5   88 0.02182540    0.02300052
##       NA  206 0.05109127            NA

gss2021_ZERODraft$subgroupincom16 <-Recode(gss2021_ZERODraft$incom16, recodes="1:2 = 0; 3:5 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupincom16)

##  subgroupincom16    n    percent valid_percent
##                0 1434 0.35565476      0.374804
##                1 2392 0.59325397      0.625196
##             <NA>  206 0.05109127            NA

subgroupincom16_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupincom16==1, "secure economic resources at 16", "insecure economic resources"))

tabyl(subgroupincom16_1)

##                subgroupincom16_1    n    percent valid_percent
##      insecure economic resources 1434 0.35565476      0.374804
##  secure economic resources at 16 2392 0.59325397      0.625196
##                             <NA>  206 0.05109127            NA

if respondent was born in the US

gss2021_ZERODraft %>%
  tabyl(born)

##  born    n    percent valid_percent
##     1 3516 0.87202381     0.8878788
##     2  444 0.11011905     0.1121212
##    NA   72 0.01785714            NA

gss2021_ZERODraft$subgroupBorn <-Recode(gss2021_ZERODraft$born, recodes="1:1 = 1; 2:2 = 0; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupBorn)

##  subgroupBorn    n    percent valid_percent
##             0  444 0.11011905     0.1121212
##             1 3516 0.87202381     0.8878788
##          <NA>   72 0.01785714            NA

subgroupborn_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupBorn==1, "Born in US", "Not born in US"))

tabyl(subgroupborn_1)

##  subgroupborn_1    n    percent valid_percent
##      Born in US 3516 0.87202381     0.8878788
##  Not born in US  444 0.11011905     0.1121212
##            <NA>   72 0.01785714            NA

if respondents ever been disrespected

gss2021_ZERODraft %>%
  tabyl(disrspct)

##  disrspct    n    percent valid_percent
##         1  136 0.03373016    0.05228758
##         2  231 0.05729167    0.08881200
##         3  327 0.08110119    0.12572088
##         4  801 0.19866071    0.30795848
##         5  552 0.13690476    0.21222607
##         6  554 0.13740079    0.21299500
##        NA 1431 0.35491071            NA

gss2021_ZERODraft$subgroupdisrspct <-Recode(gss2021_ZERODraft$disrspct, recodes="1:5 = 1; 6:6 = 0; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupdisrspct)

##  subgroupdisrspct    n   percent valid_percent
##                 0  554 0.1374008      0.212995
##                 1 2047 0.5076885      0.787005
##              <NA> 1431 0.3549107            NA

subgroupdisrspct_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupdisrspct==1, "respondent has been disrespected", "Not being disrespected"))

tabyl(subgroupdisrspct_1)

##                subgroupdisrspct_1    n   percent valid_percent
##            Not being disrespected  554 0.1374008      0.212995
##  respondent has been disrespected 2047 0.5076885      0.787005
##                              <NA> 1431 0.3549107            NA

if respondents ever been called or treated like they were not smart.

gss2021_ZERODraft %>%
  tabyl(notsmart)

##  notsmart    n    percent valid_percent
##         1  101 0.02504960    0.03881630
##         2  129 0.03199405    0.04957725
##         3  202 0.05009921    0.07763259
##         4  684 0.16964286    0.26287471
##         5  619 0.15352183    0.23789393
##         6  867 0.21502976    0.33320523
##        NA 1430 0.35466270            NA

gss2021_ZERODraft$subgroupnotsmart <-Recode(gss2021_ZERODraft$notsmart, recodes="1:5 = 1; 6:6 = 0; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupnotsmart)

##  subgroupnotsmart    n   percent valid_percent
##                 0  867 0.2150298     0.3332052
##                 1 1735 0.4303075     0.6667948
##              <NA> 1430 0.3546627            NA

subgroupnotsmart_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupnotsmart==1, "respondent was told or treated as if they are not smart", "Never experienced that sort of treatment of not being smart"))

tabyl(subgroupnotsmart_1)

##                                           subgroupnotsmart_1    n   percent
##  Never experienced that sort of treatment of not being smart  867 0.2150298
##      respondent was told or treated as if they are not smart 1735 0.4303075
##                                                         <NA> 1430 0.3546627
##  valid_percent
##      0.3332052
##      0.6667948
##             NA

weighted dataset introduced

library(srvyr)

## 
## Attaching package: 'srvyr'

## The following objects are masked from 'package:plyr':
## 
##     mutate, rename, summarise, summarize

## The following object is masked from 'package:stats':
## 
##     filter

options(survey.lonely.psu = "adjust")

des<-svydesign(ids=~1, strata=~vstrat, weights=~wtssnrps, data=gss2021_ZERODraft)

prop.table(svytable(~subgrouphis_1, design = des))

## subgrouphis_1
##     Hispanic Non Hispanic 
##      0.15227      0.84773

prop.table(svytable(~subgroupsex_1, design = des))

## subgroupsex_1
##       Men     Women 
## 0.4885876 0.5114124

prop.table(svytable(~subgroupdisrspct_1, design = des))

## subgroupdisrspct_1
##           Not being disrespected respondent has been disrespected 
##                         0.207361                         0.792639

prop.table(svytable(~subgroupEduCollege_1, design = des))

## subgroupEduCollege_1
##    College No College 
##  0.6458093  0.3541907

prop.table(svytable(~subgroupEduHS_1, design = des))

## subgroupEduHS_1
##    Highschool No Highschool 
##     0.7174914     0.2825086

prop.table(svytable(~subgroupincom16_1, design = des))

## subgroupincom16_1
##     insecure economic resources secure economic resources at 16 
##                       0.4039809                       0.5960191

prop.table(svytable(~subgroupnotsmart_1, design = des))

## subgroupnotsmart_1
## Never experienced that sort of treatment of not being smart 
##                                                   0.3178649 
##     respondent was told or treated as if they are not smart 
##                                                   0.6821351

prop.table(svytable(~subgroupBorn, design = des))

## subgroupBorn
##         0         1 
## 0.1631772 0.8368228

the introduction of the logistic regression model

library(car)
library(stargazer)
library(survey)
library(questionr)

## 
## Attaching package: 'questionr'

## The following object is masked from 'package:summarytools':
## 
##     freq

library(dplyr)
library(tidyverse)

Logit Model HS Educ

fit.logitHS1<-svyglm(subgroupeducHS ~ subgrouphis + subgroupsex + subgroupBorn + subgroupincom16 + subgroupdisrspct + subgroupnotsmart, 
                  design = des,
                  family = binomial)

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

library(gtsummary)

## 
## Attaching package: 'gtsummary'

## The following object is masked from 'package:plyr':
## 
##     mutate

fit.logitHS1 %>%
  tbl_regression(exponentiate=TRUE)

Characteristic	OR¹	95% CI¹	p-value
subgrouphis
0	—	—
1	0.48	0.26, 0.92	0.026
subgroupsex
0	—	—
1	1.17	0.73, 1.85	0.5
subgroupBorn
0	—	—
1	0.87	0.41, 1.83	0.7
subgroupincom16
0	—	—
1	1.19	0.75, 1.91	0.5
subgroupdisrspct
0	—	—
1	0.84	0.44, 1.59	0.6
subgroupnotsmart
0	—	—
1	0.70	0.39, 1.27	0.2
¹ OR = Odds Ratio, CI = Confidence Interval

Without Parental Factors

fit.logitHS1NAPF<-svyglm(subgroupeducHS ~ subgrouphis + subgroupsex +  subgroupdisrspct + subgroupnotsmart, 
                  design = des,
                  family = binomial)

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

library(gtsummary)
fit.logitHS1NAPF %>%
  tbl_regression(exponentiate=TRUE)

Characteristic	OR¹	95% CI¹	p-value
subgrouphis
0	—	—
1	0.55	0.31, 0.99	0.048
subgroupsex
0	—	—
1	1.00	0.63, 1.58	>0.9
subgroupdisrspct
0	—	—
1	0.96	0.52, 1.79	>0.9
subgroupnotsmart
0	—	—
1	0.72	0.41, 1.26	0.3
¹ OR = Odds Ratio, CI = Confidence Interval

Without Factors of Perceived Discrimination

fit.logitHS1NAPFPD<-svyglm(subgroupeducHS ~ subgrouphis + subgroupsex, 
                  design = des,
                  family = binomial)

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

library(gtsummary)
fit.logitHS1NAPFPD %>%
  tbl_regression(exponentiate=TRUE)

Characteristic	OR¹	95% CI¹	p-value
subgrouphis
0	—	—
1	0.51	0.32, 0.80	0.004
subgroupsex
0	—	—
1	0.85	0.59, 1.23	0.4
¹ OR = Odds Ratio, CI = Confidence Interval

Logit Model College Educ

fit.logitCollege1<-svyglm(subgroupeducCollege ~ subgrouphis + subgroupsex + subgroupBorn + subgroupincom16 + subgroupdisrspct + subgroupnotsmart, 
                  design = des,
                  family = binomial)

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

library(gtsummary)
fit.logitCollege1 %>%
  tbl_regression(exponentiate=TRUE)

Characteristic	OR¹	95% CI¹	p-value
subgrouphis
0	—	—
1	0.45	0.31, 0.65	<0.001
subgroupsex
0	—	—
1	0.92	0.73, 1.15	0.4
subgroupBorn
0	—	—
1	0.72	0.48, 1.09	0.12
subgroupincom16
0	—	—
1	1.52	1.21, 1.92	<0.001
subgroupdisrspct
0	—	—
1	1.30	0.96, 1.77	0.086
subgroupnotsmart
0	—	—
1	1.02	0.77, 1.34	>0.9
¹ OR = Odds Ratio, CI = Confidence Interval

Without Parental Factors

fit.logitCollege1NAPF<-svyglm(subgroupeducCollege ~ subgrouphis + subgroupsex +  subgroupdisrspct + subgroupnotsmart, 
                  design = des,
                  family = binomial)

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

library(gtsummary)
fit.logitCollege1NAPF %>%
  tbl_regression(exponentiate=TRUE)

Characteristic	OR¹	95% CI¹	p-value
subgrouphis
0	—	—
1	0.46	0.33, 0.64	<0.001
subgroupsex
0	—	—
1	0.92	0.74, 1.14	0.5
subgroupdisrspct
0	—	—
1	1.40	1.04, 1.87	0.024
subgroupnotsmart
0	—	—
1	0.97	0.75, 1.27	0.8
¹ OR = Odds Ratio, CI = Confidence Interval

Without Factors of Perceived Discrimination

fit.logitCollege1NAPFPD<-svyglm(subgroupeducCollege ~ subgrouphis + subgroupsex, 
                  design = des,
                  family = binomial)

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

library(gtsummary)
fit.logitCollege1NAPFPD %>%
  tbl_regression(exponentiate=TRUE)

Characteristic	OR¹	95% CI¹	p-value
subgrouphis
0	—	—
1	0.52	0.40, 0.68	<0.001
subgroupsex
0	—	—
1	0.94	0.79, 1.12	0.5
¹ OR = Odds Ratio, CI = Confidence Interval

Brandon Flores

February 14, 2022
