The binary outcome variable will be the respondent's educational attainment. To make it binary, I will dummy code the variable "educ", which measures the respondent's highest year of school completed, into two separate variables: one for college and one for high school. For the college variable, 0 will represent all grade levels from 12th grade and under (0-12), meaning the respondent did not attend college, while 1 will represent all levels from the first year of college onward (13-20), meaning the respondent has at least some college education. For the high school variable, all levels from 0-11 will be recoded as 0 to represent not finishing high school, while respondents who completed high school (12) will be recoded as 1. Respondents with more than a high school education (13-20) will be recoded as missing, which limits this variable to the high school threshold. The study therefore has two binary outcome variables: one capturing whether the respondent completed high school, among those with no more than a high school education, and one capturing whether the respondent has any college education.
The variable representing high school attainment only will be named subgroupEduHS_1, and the variable for college education will be named subgroupEduCollege_1.
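The two outcome codings can be sketched in base R as follows. This is a minimal illustration on a hypothetical toy vector `educ_toy`, not the GSS data, and it uses `ifelse()` with `%in%` in place of the `car::Recode()` calls used in the analysis; the cut points are the ones stated above.

```r
# Hypothetical toy vector of years of schooling (0-20); not GSS data
educ_toy <- c(8, 11, 12, 12, 13, 16, 20, NA)

# College: 0-12 -> 0 (no college), 13-20 -> 1 (some college), anything else -> NA
college <- ifelse(educ_toy %in% 0:12, 0,
                  ifelse(educ_toy %in% 13:20, 1, NA))

# High school only: 0-11 -> 0, 12 -> 1, 13+ and missing -> NA
hs_only <- ifelse(educ_toy %in% 0:11, 0,
                  ifelse(educ_toy == 12, 1, NA))

data.frame(educ_toy, college, hs_only)
```

The toy output shows the asymmetry between the two variables: every value above 12 is 1 on the college variable but NA on the high-school-only variable, so the high-school-only variable will carry many more missing values.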
My research question examines potential differences in educational attainment between Hispanics and non-Hispanics that are due to parental capital and to the effects of racialization, by sex.
The factors that I believe will affect my outcome variables are Hispanic ethnicity (HISPANIC), sex (SEX), family income when the respondent was 16 (INCOM16), whether the respondent was born in the US (BORN), whether the respondent has ever been treated with less respect than others (DISRSPCT), and whether the respondent has ever been treated by others as if they were not smart (NOTSMART).
HISPANIC will be recoded so that Hispanic is 1 and non-Hispanic is 0, producing subgrouphis_1.
For SEX, women will be recoded as 1 and men as 0, producing subgroupsex_1.
For family income when the respondent was 16 (INCOM16), options 3 (Average), 4 (Above Average), and 5 (Far Above Average) will be recoded as 1, while 2 (Below Average) and 1 (Far Below Average) will be recoded as 0; the recoded variable is named subgroupincom16_1.
BORN will be recoded so that 1 (Yes) remains 1 and 2 (No) becomes 0, producing subgroupborn_1.
NOTSMART will be recoded so that categories 1 (almost every day) through 5 (less than once a year) become 1 and category 6 (never) becomes 0, producing subgroupnotsmart_1.
DISRSPCT will be recoded the same way: categories 1 (almost every day) through 5 (less than once a year) become 1 and category 6 (never) becomes 0, producing subgroupdisrspct_1.
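The HISPANIC, INCOM16, BORN, and frequency-item rules share one pattern: a set of codes that become 1, a set that become 0, and everything else missing. A minimal base-R sketch of that pattern, using a hypothetical helper `recode_binary()` (not a function from car or dplyr) and toy input vectors; the code ranges match the `Recode()` calls in this notebook.

```r
# recode_binary() is a hypothetical helper: codes in `ones` -> 1,
# codes in `zeros` -> 0, any other value (including NA) -> NA.
recode_binary <- function(x, ones, zeros) {
  out <- rep(NA_real_, length(x))
  out[x %in% ones]  <- 1
  out[x %in% zeros] <- 0
  out
}

# Toy inputs (not GSS data), one per rule
hispanic_bin <- recode_binary(c(1, 2, 30, NA), ones = 2:50, zeros = 1)
incom16_bin  <- recode_binary(c(1, 2, 3, 5),   ones = 3:5,  zeros = 1:2)
born_bin     <- recode_binary(c(1, 2, NA),     ones = 1,    zeros = 2)
freq_bin     <- recode_binary(c(1, 5, 6, NA),  ones = 1:5,  zeros = 6)  # NOTSMART / DISRSPCT

list(hispanic = hispanic_bin, incom16 = incom16_bin,
     born = born_bin, frequency = freq_bin)
```

Listing both the 1-codes and the 0-codes means any unlisted value falls through to NA, mirroring the `else=NA` clause in the `Recode()` calls.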
library(haven)         # read_dta() for the Stata file
library(janitor)       # tabyl() frequency tables
library(plyr)          # load plyr BEFORE dplyr so dplyr's mutate(), summarise(), rename(), etc. are not masked
library(Rmisc)         # depends on plyr and lattice
library(dplyr)
library(ggplot2)
library(scales)
library(sur)
library(summarytools)
library(car)           # provides Recode(); note that car::recode() masks dplyr::recode()
library(forcats)
library(tidyverse)
library(survey)
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
library(grid)
library(Matrix)
gss2021_ZERODraft<-read_dta("C:\\Users\\BTP\\Desktop\\STATS 2 FOLDER\\2021_sas\\gss2021.dta")
gss2021_ZERODraft %>%
tabyl(hispanic)
## hispanic n percent valid_percent
## 1 3544 0.8789682540 0.8864432216
## 2 245 0.0607638889 0.0612806403
## 3 51 0.0126488095 0.0127563782
## 4 21 0.0052083333 0.0052526263
## 5 9 0.0022321429 0.0022511256
## 6 2 0.0004960317 0.0005002501
## 7 4 0.0009920635 0.0010005003
## 8 2 0.0004960317 0.0005002501
## 9 4 0.0009920635 0.0010005003
## 10 5 0.0012400794 0.0012506253
## 11 3 0.0007440476 0.0007503752
## 15 17 0.0042162698 0.0042521261
## 20 4 0.0009920635 0.0010005003
## 21 8 0.0019841270 0.0020010005
## 22 10 0.0024801587 0.0025012506
## 23 5 0.0012400794 0.0012506253
## 24 1 0.0002480159 0.0002501251
## 25 1 0.0002480159 0.0002501251
## 30 46 0.0114087302 0.0115057529
## 35 1 0.0002480159 0.0002501251
## 41 3 0.0007440476 0.0007503752
## 46 2 0.0004960317 0.0005002501
## 47 7 0.0017361111 0.0017508754
## 50 3 0.0007440476 0.0007503752
## NA 34 0.0084325397 NA
gss2021_ZERODraft %>%
tabyl(educ)
## educ n percent valid_percent
## 0 9 0.0022321429 0.0022692890
## 1 1 0.0002480159 0.0002521432
## 2 2 0.0004960317 0.0005042864
## 3 3 0.0007440476 0.0007564297
## 4 1 0.0002480159 0.0002521432
## 5 2 0.0004960317 0.0005042864
## 6 15 0.0037202381 0.0037821483
## 7 5 0.0012400794 0.0012607161
## 8 25 0.0062003968 0.0063035804
## 9 32 0.0079365079 0.0080685830
## 10 52 0.0128968254 0.0131114473
## 11 83 0.0205853175 0.0209278870
## 12 829 0.2056051587 0.2090267272
## 13 277 0.0687003968 0.0698436712
## 14 542 0.1344246032 0.1366616238
## 15 208 0.0515873016 0.0524457892
## 16 942 0.2336309524 0.2375189107
## 17 258 0.0639880952 0.0650529501
## 18 351 0.0870535714 0.0885022693
## 19 113 0.0280257937 0.0284921836
## 20 216 0.0535714286 0.0544629349
## NA 66 0.0163690476 NA
gss2021_ZERODraft %>%
tabyl(sex)
## sex n percent valid_percent
## 1 1736 0.43055556 0.4406091
## 2 2204 0.54662698 0.5593909
## NA 92 0.02281746 NA
gss2021_ZERODraft %>%
tabyl(incom16)
## incom16 n percent valid_percent
## 1 421 0.10441468 0.11003659
## 2 1013 0.25124008 0.26476738
## 3 1625 0.40302579 0.42472556
## 4 679 0.16840278 0.17746994
## 5 88 0.02182540 0.02300052
## NA 206 0.05109127 NA
gss2021_ZERODraft %>%
tabyl(born)
## born n percent valid_percent
## 1 3516 0.87202381 0.8878788
## 2 444 0.11011905 0.1121212
## NA 72 0.01785714 NA
gss2021_ZERODraft %>%
tabyl(disrspct)
## disrspct n percent valid_percent
## 1 136 0.03373016 0.05228758
## 2 231 0.05729167 0.08881200
## 3 327 0.08110119 0.12572088
## 4 801 0.19866071 0.30795848
## 5 552 0.13690476 0.21222607
## 6 554 0.13740079 0.21299500
## NA 1431 0.35491071 NA
gss2021_ZERODraft %>%
tabyl(notsmart)
## notsmart n percent valid_percent
## 1 101 0.02504960 0.03881630
## 2 129 0.03199405 0.04957725
## 3 202 0.05009921 0.07763259
## 4 684 0.16964286 0.26287471
## 5 619 0.15352183 0.23789393
## 6 867 0.21502976 0.33320523
## NA 1430 0.35466270 NA
Recode Hispanic
gss2021_ZERODraft$subgrouphis <-Recode(gss2021_ZERODraft$hispanic, recodes="1 = 0; 2:50 = 1; else=NA", as.factor=T)
gss2021_ZERODraft %>%
tabyl(subgrouphis)
## subgrouphis n percent valid_percent
## 0 3544 0.87896825 0.8864432
## 1 454 0.11259921 0.1135568
## <NA> 34 0.00843254 NA
subgrouphis_1<-as.factor(ifelse(gss2021_ZERODraft$subgrouphis==1, "Hispanic", "Non Hispanic"))
tabyl(subgrouphis_1)
## subgrouphis_1 n percent valid_percent
## Hispanic 454 0.11259921 0.1135568
## Non Hispanic 3544 0.87896825 0.8864432
## <NA> 34 0.00843254 NA
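Because `as.factor()` sorts labels alphabetically, "Hispanic" becomes the first level of subgrouphis_1, and under R's default treatment contrasts the first level is the reference category in model functions. Assuming later models should compare Hispanics against non-Hispanics, the reference can be set explicitly with base R's `relevel()`. A sketch on a hypothetical toy vector:

```r
# Toy stand-in for subgrouphis (hypothetical values): 1 = Hispanic, 0 = non-Hispanic
his_toy <- c(0, 1, 0, NA)

# as.factor() sorts labels alphabetically, so "Hispanic" is the first level
his_factor <- as.factor(ifelse(his_toy == 1, "Hispanic", "Non Hispanic"))
levels(his_factor)

# relevel() moves "Non Hispanic" to the front so it becomes the reference category
his_ref <- relevel(his_factor, ref = "Non Hispanic")
levels(his_ref)
```

Only the level order changes; the counts and the NA values stay the same.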
Recode education: college
gss2021_ZERODraft$subgroupeducCollege <-Recode(gss2021_ZERODraft$educ, recodes="0:12 = 0; 13:20 = 1; else=NA", as.factor=T)
gss2021_ZERODraft %>%
tabyl(subgroupeducCollege)
## subgroupeducCollege n percent valid_percent
## 0 1059 0.26264881 0.2670197
## 1 2907 0.72098214 0.7329803
## <NA> 66 0.01636905 NA
subgroupEduCollege_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupeducCollege==1, "College", "No College"))
tabyl(subgroupEduCollege_1)
## subgroupEduCollege_1 n percent valid_percent
## College 2907 0.72098214 0.7329803
## No College 1059 0.26264881 0.2670197
## <NA> 66 0.01636905 NA
Recode education: high school only
gss2021_ZERODraft$subgroupeducHS <-Recode(gss2021_ZERODraft$educ, recodes="0:11 = 0; 12:12 = 1; else=NA", as.factor=T)
gss2021_ZERODraft %>%
tabyl(subgroupeducHS)
## subgroupeducHS n percent valid_percent
## 0 230 0.05704365 0.217186
## 1 829 0.20560516 0.782814
## <NA> 2973 0.73735119 NA
subgroupEduHS_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupeducHS==1, "Highschool", "No Highschool"))
tabyl(subgroupEduHS_1)
## subgroupEduHS_1 n percent valid_percent
## Highschool 829 0.20560516 0.782814
## No Highschool 230 0.05704365 0.217186
## <NA> 2973 0.73735119 NA
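Since every respondent with 13 or more years of schooling is set to missing on the high-school-only variable, its NA count should equal the number coded 1 on the college variable plus the number missing on educ, and its non-missing rows should equal those coded 0 on the college variable. A quick arithmetic check using the counts from the tables above:

```r
# Counts copied from the tabyl() output above
college_no  <- 1059  # subgroupeducCollege == 0
college_yes <- 2907  # subgroupeducCollege == 1
educ_na     <- 66    # educ missing
hs_no       <- 230   # subgroupeducHS == 0
hs_yes      <- 829   # subgroupeducHS == 1
hs_na       <- 2973  # subgroupeducHS missing

# Non-missing high-school-only rows are exactly the no-college rows
stopifnot(hs_no + hs_yes == college_no)
# Missing high-school-only rows are the college rows plus the rows missing educ
stopifnot(hs_na == college_yes + educ_na)

# With the data loaded, the same check can be run directly, e.g.:
# sum(is.na(gss2021_ZERODraft$subgroupeducHS)) ==
#   sum(gss2021_ZERODraft$subgroupeducCollege == 1, na.rm = TRUE) +
#   sum(is.na(gss2021_ZERODraft$educ))
```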
Recode sex
gss2021_ZERODraft$subgroupsex <-Recode(gss2021_ZERODraft$sex, recodes="1:1 = 0; 2:2 = 1; else=NA", as.factor=T)
gss2021_ZERODraft %>%
tabyl(subgroupsex)
## subgroupsex n percent valid_percent
## 0 1736 0.43055556 0.4406091
## 1 2204 0.54662698 0.5593909
## <NA> 92 0.02281746 NA
subgroupsex_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupsex==1, "Women", "Men"))
tabyl(subgroupsex_1)
## subgroupsex_1 n percent valid_percent
## Men 1736 0.43055556 0.4406091
## Women 2204 0.54662698 0.5593909
## <NA> 92 0.02281746 NA
Recode family income when the respondent was 16
gss2021_ZERODraft$subgroupincom16 <-Recode(gss2021_ZERODraft$incom16, recodes="1:2 = 0; 3:5 = 1; else=NA", as.factor=T)
gss2021_ZERODraft %>%
tabyl(subgroupincom16)
## subgroupincom16 n percent valid_percent
## 0 1434 0.35565476 0.374804
## 1 2392 0.59325397 0.625196
## <NA> 206 0.05109127 NA
subgroupincom16_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupincom16==1, "secure economic resources at 16", "insecure economic resources"))
tabyl(subgroupincom16_1)
## subgroupincom16_1 n percent valid_percent
## insecure economic resources 1434 0.35565476 0.374804
## secure economic resources at 16 2392 0.59325397 0.625196
## <NA> 206 0.05109127 NA
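The same kind of check works for INCOM16: summing the raw category counts within each collapsed group should reproduce the recoded counts. A minimal sketch using base R's `tapply()` on the counts from the tables above:

```r
# Counts of INCOM16 categories 1-5, copied from the tabyl() output above
incom16_n <- c("1" = 421, "2" = 1013, "3" = 1625, "4" = 679, "5" = 88)

# Same rule as the Recode() call: 1-2 -> 0, 3-5 -> 1
group <- ifelse(as.integer(names(incom16_n)) <= 2, 0, 1)
collapsed <- tapply(incom16_n, group, sum)
collapsed
```

The two sums should match the 0 and 1 rows of the subgroupincom16 table.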
Recode whether the respondent was born in the US
gss2021_ZERODraft$subgroupBorn <-Recode(gss2021_ZERODraft$born, recodes="1:1 = 1; 2:2 = 0; else=NA", as.factor=T)
gss2021_ZERODraft %>%
tabyl(subgroupBorn)
## subgroupBorn n percent valid_percent
## 0 444 0.11011905 0.1121212
## 1 3516 0.87202381 0.8878788
## <NA> 72 0.01785714 NA
# The column created above is subgroupBorn (capital B); R column names are case-sensitive
subgroupborn_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupBorn==1, "Born in US", "Not born in US"))
tabyl(subgroupborn_1)
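R column names are case-sensitive, and `$` on a tibble returns NULL with only a warning for a name that does not exist, so a mistyped name such as `subgroupborn` for `subgroupBorn` produces an empty vector and a 0-row table rather than an error. A minimal sketch of a guard, using a hypothetical helper `get_col()` and a toy data frame:

```r
# get_col() is a hypothetical helper: it returns a column by exact name and
# stops with an error if the name is absent, instead of returning NULL.
get_col <- function(df, name) {
  if (!name %in% names(df)) {
    stop("column '", name, "' not found (column names are case-sensitive)")
  }
  df[[name]]
}

# Toy data frame using the capitalised column name from this section
toy <- data.frame(subgroupBorn = factor(c(1, 0, 1)))

ok  <- get_col(toy, "subgroupBorn")   # returns the factor column
msg <- tryCatch(get_col(toy, "subgroupborn"),
                error = function(e) conditionMessage(e))
msg
```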
Recode whether the respondent has ever been treated with less respect than others
gss2021_ZERODraft$subgroupdisrspct <-Recode(gss2021_ZERODraft$disrspct, recodes="1:5 = 1; 6:6 = 0; else=NA", as.factor=T)
gss2021_ZERODraft %>%
tabyl(subgroupdisrspct)
## subgroupdisrspct n percent valid_percent
## 0 554 0.1374008 0.212995
## 1 2047 0.5076885 0.787005
## <NA> 1431 0.3549107 NA
subgroupdisrspct_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupdisrspct==1, "respondent has been disrespected", "Not being disrespected"))
tabyl(subgroupdisrspct_1)
## subgroupdisrspct_1 n percent valid_percent
## Not being disrespected 554 0.1374008 0.212995
## respondent has been disrespected 2047 0.5076885 0.787005
## <NA> 1431 0.3549107 NA
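In `tabyl()` output, `percent` divides each count by all rows, including NA, while `valid_percent` divides by the non-missing rows only. With about 35% of respondents missing on DISRSPCT, the two columns diverge noticeably. The arithmetic, sketched with the counts from the table above:

```r
# Counts for subgroupdisrspct, copied from the tabyl() output above
n <- c("0" = 554, "1" = 2047, missing = 1431)

pct_all   <- n / sum(n)            # denominator: all 4,032 rows
valid     <- n[c("0", "1")]
pct_valid <- valid / sum(valid)    # denominator: the 2,601 non-missing rows

round(pct_all, 7)
round(pct_valid, 6)
```

These reproduce the percent and valid_percent columns shown for subgroupdisrspct.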
Recode whether the respondent has ever been called or treated as if they were not smart
gss2021_ZERODraft$subgroupnotsmart <-Recode(gss2021_ZERODraft$notsmart, recodes="1:5 = 1; 6:6 = 0; else=NA", as.factor=T)
gss2021_ZERODraft %>%
tabyl(subgroupnotsmart)
## subgroupnotsmart n percent valid_percent
## 0 867 0.2150298 0.3332052
## 1 1735 0.4303075 0.6667948
## <NA> 1430 0.3546627 NA
subgroupnotsmart_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupnotsmart==1, "respondent was told or treated as if they are not smart", "Never experienced that sort of treatment of not being smart"))
tabyl(subgroupnotsmart_1)
## subgroupnotsmart_1 n percent valid_percent
## Never experienced that sort of treatment of not being smart 867 0.2150298 0.3332052
## respondent was told or treated as if they are not smart 1735 0.4303075 0.6667948
## <NA> 1430 0.3546627 NA