Homework 2 STATS2

The binary outcome variable that I will use is the respondent's educational attainment. To make it binary, I will dummy code the variable "educ", which measures the respondent's highest level of educational attainment in years (0-20). It will be transformed into two separate variables, one for college and one for high school.

For the college variable, 0 covers every grade level from 12th grade down (0-12) and represents not going to college, while 1 covers every level from the first year of college onward (13-20) and represents having at least some college education.

For the high school variable, levels 0-11 are recoded as 0 to represent not finishing high school, and level 12 is recoded as 1 to represent finishing high school. Respondents with more than a high school education are recoded as missing to isolate this part of the research. The study therefore has two true binary outcome variables: one for respondents whose schooling ended with a high school diploma versus those who did not finish high school, and one for respondents with at least some college versus those with none.

The variable for high school attainment only will be named subgroupEduHS_1, and the variable for college education will be named subgroupEduCollege_1.
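The two outcome codings described above can be checked on a handful of made-up values before touching the survey file. This is a minimal base R sketch, not the recode used later in this homework (which uses `Recode()` from car); the vectors `educ`, `college`, and `highschool` are hypothetical stand-ins.

```r
# Made-up years of schooling standing in for the GSS `educ` column.
educ <- c(8, 11, 12, 13, 16, 20, NA)

# College outcome: 0 = 12 years or fewer, 1 = 13 years or more.
college <- ifelse(educ >= 13, 1, 0)

# High school outcome: 0 = 0-11 years, 1 = exactly 12 years,
# NA for anyone with more than 12 years (outside this comparison).
highschool <- ifelse(educ > 12, NA, ifelse(educ == 12, 1, 0))

# Side-by-side view of the original value and both dummy codes.
data.frame(educ, college, highschool)
```

Missing values in `educ` stay missing in both outcomes because `ifelse()` returns NA whenever its test is NA.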

My research question is whether educational attainment differs between Hispanics and non-Hispanics because of parental capital and the effects of racialization, and whether this varies by sex.

The factors that I believe will affect my outcome variable are Hispanic origin (HISPANIC), sex (SEX), family income when the respondent was 16 years old (INCOM16), whether the respondent was born in the US (BORN), whether the respondent has ever been treated with less respect than others (DISRSPCT), and whether the respondent has ever been seen by others as not being smart (NOTSMART).
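One way these factors could eventually enter an analysis is a logistic regression with a Hispanic-by-sex interaction. The text does not name a model, so this is only a sketch under that assumption, fitted to simulated data; the data frame `toy` and its column names are hypothetical and the coefficients mean nothing substantively.

```r
set.seed(1)
n <- 200

# Simulated 0/1 variables; these are NOT GSS values.
toy <- data.frame(
  college  = rbinom(n, 1, 0.6),
  hispanic = rbinom(n, 1, 0.2),
  female   = rbinom(n, 1, 0.5),
  incom16  = rbinom(n, 1, 0.6),
  born_us  = rbinom(n, 1, 0.9)
)

# `hispanic * female` expands to both main effects plus their interaction,
# which is what lets the Hispanic gap differ by sex.
fit <- glm(college ~ hispanic * female + incom16 + born_us,
           data = toy, family = binomial)

# Exponentiated coefficients are odds ratios.
round(exp(coef(fit)), 2)
```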

Hispanic origin will be recoded so that Hispanic is 1 and non-Hispanic is 0, producing subgrouphis_1.

For sex, men (category 1) will be recoded into 0 and women (category 2) will be recoded into 1, producing subgroupsex_1.

Family income when the respondent was 16 will be recoded so that 3 (Average), 4 (Above Average), and 5 (Far Above Average) become 1, while 1 (Far Below Average) and 2 (Below Average) become 0. The recoded variable is named subgroupincom16_1.

The variable BORN will be recoded so that 1 (Yes) stays 1 and 2 (No) becomes 0, producing subgroupborn_1.

The variable NOTSMART will be recoded so that categories 1 (almost every day) through 5 (less than once a year) become 1 and category 6 (never) becomes 0, producing subgroupnotsmart_1.

The variable DISRSPCT will be recoded so that categories 1 (almost every day) through 5 (less than once a year) become 1 and category 6 (never) becomes 0, producing subgroupdisrspct_1.
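All of the predictor recodes above follow the same pattern: one set of categories becomes 1, another becomes 0, and everything else is missing. A minimal base R sketch of that pattern, assuming a small helper I am calling `dichotomize()` (a hypothetical name, not a function from any package) and made-up input values:

```r
# Hypothetical helper: values in `ones` become 1, values in `zeros` become 0,
# and anything else (including NA) becomes NA.
dichotomize <- function(x, ones, zeros) {
  out <- rep(NA_integer_, length(x))
  out[x %in% ones] <- 1L
  out[x %in% zeros] <- 0L
  out
}

# Made-up category codes standing in for the GSS columns.
incom16  <- c(1, 2, 3, 4, 5, NA)
disrspct <- c(1, 3, 5, 6, NA, 6)

incom16_bin  <- dichotomize(incom16,  ones = 3:5, zeros = 1:2)
disrspct_bin <- dichotomize(disrspct, ones = 1:5, zeros = 6)

data.frame(incom16, incom16_bin, disrspct, disrspct_bin)
```

Writing the rule once makes it harder for one variable to end up coded in the opposite direction from what the text says.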

library(haven)
library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(scales)
library(sur)
library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

library(summarytools)
library(Rmisc)

## Loading required package: lattice

library(car)

## Loading required package: carData

## 
## Attaching package: 'carData'

## The following objects are masked from 'package:sur':
## 
##     Anscombe, States

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

library(forcats)
library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v tibble  3.1.6     v purrr   0.3.4
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x plyr::arrange()     masks dplyr::arrange()
## x readr::col_factor() masks scales::col_factor()
## x purrr::compact()    masks plyr::compact()
## x plyr::count()       masks dplyr::count()
## x purrr::discard()    masks scales::discard()
## x plyr::failwith()    masks dplyr::failwith()
## x dplyr::filter()     masks stats::filter()
## x plyr::id()          masks dplyr::id()
## x dplyr::lag()        masks stats::lag()
## x plyr::mutate()      masks dplyr::mutate()
## x car::recode()       masks dplyr::recode()
## x plyr::rename()      masks dplyr::rename()
## x purrr::some()       masks car::some()
## x plyr::summarise()   masks dplyr::summarise()
## x plyr::summarize()   masks dplyr::summarize()
## x tibble::view()      masks summarytools::view()

library(survey)

## Loading required package: grid

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

## Loading required package: survival

## 
## Attaching package: 'survey'

## The following object is masked from 'package:graphics':
## 
##     dotchart

library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

library(grid)
library(Matrix)

gss2021_ZERODraft<-read_dta("C:\\Users\\BTP\\Desktop\\STATS 2 FOLDER\\2021_sas\\gss2021.dta")

gss2021_ZERODraft %>%
tabyl(hispanic)

##  hispanic    n      percent valid_percent
##         1 3544 0.8789682540  0.8864432216
##         2  245 0.0607638889  0.0612806403
##         3   51 0.0126488095  0.0127563782
##         4   21 0.0052083333  0.0052526263
##         5    9 0.0022321429  0.0022511256
##         6    2 0.0004960317  0.0005002501
##         7    4 0.0009920635  0.0010005003
##         8    2 0.0004960317  0.0005002501
##         9    4 0.0009920635  0.0010005003
##        10    5 0.0012400794  0.0012506253
##        11    3 0.0007440476  0.0007503752
##        15   17 0.0042162698  0.0042521261
##        20    4 0.0009920635  0.0010005003
##        21    8 0.0019841270  0.0020010005
##        22   10 0.0024801587  0.0025012506
##        23    5 0.0012400794  0.0012506253
##        24    1 0.0002480159  0.0002501251
##        25    1 0.0002480159  0.0002501251
##        30   46 0.0114087302  0.0115057529
##        35    1 0.0002480159  0.0002501251
##        41    3 0.0007440476  0.0007503752
##        46    2 0.0004960317  0.0005002501
##        47    7 0.0017361111  0.0017508754
##        50    3 0.0007440476  0.0007503752
##        NA   34 0.0084325397            NA

gss2021_ZERODraft %>%
tabyl(educ)

##  educ   n      percent valid_percent
##     0   9 0.0022321429  0.0022692890
##     1   1 0.0002480159  0.0002521432
##     2   2 0.0004960317  0.0005042864
##     3   3 0.0007440476  0.0007564297
##     4   1 0.0002480159  0.0002521432
##     5   2 0.0004960317  0.0005042864
##     6  15 0.0037202381  0.0037821483
##     7   5 0.0012400794  0.0012607161
##     8  25 0.0062003968  0.0063035804
##     9  32 0.0079365079  0.0080685830
##    10  52 0.0128968254  0.0131114473
##    11  83 0.0205853175  0.0209278870
##    12 829 0.2056051587  0.2090267272
##    13 277 0.0687003968  0.0698436712
##    14 542 0.1344246032  0.1366616238
##    15 208 0.0515873016  0.0524457892
##    16 942 0.2336309524  0.2375189107
##    17 258 0.0639880952  0.0650529501
##    18 351 0.0870535714  0.0885022693
##    19 113 0.0280257937  0.0284921836
##    20 216 0.0535714286  0.0544629349
##    NA  66 0.0163690476            NA

gss2021_ZERODraft %>%
tabyl(sex)

##  sex    n    percent valid_percent
##    1 1736 0.43055556     0.4406091
##    2 2204 0.54662698     0.5593909
##   NA   92 0.02281746            NA

gss2021_ZERODraft %>%
  tabyl(incom16)

##  incom16    n    percent valid_percent
##        1  421 0.10441468    0.11003659
##        2 1013 0.25124008    0.26476738
##        3 1625 0.40302579    0.42472556
##        4  679 0.16840278    0.17746994
##        5   88 0.02182540    0.02300052
##       NA  206 0.05109127            NA

gss2021_ZERODraft %>%
  tabyl(born)

##  born    n    percent valid_percent
##     1 3516 0.87202381     0.8878788
##     2  444 0.11011905     0.1121212
##    NA   72 0.01785714            NA

gss2021_ZERODraft %>%
  tabyl(disrspct)

##  disrspct    n    percent valid_percent
##         1  136 0.03373016    0.05228758
##         2  231 0.05729167    0.08881200
##         3  327 0.08110119    0.12572088
##         4  801 0.19866071    0.30795848
##         5  552 0.13690476    0.21222607
##         6  554 0.13740079    0.21299500
##        NA 1431 0.35491071            NA

gss2021_ZERODraft %>%
  tabyl(notsmart)

##  notsmart    n    percent valid_percent
##         1  101 0.02504960    0.03881630
##         2  129 0.03199405    0.04957725
##         3  202 0.05009921    0.07763259
##         4  684 0.16964286    0.26287471
##         5  619 0.15352183    0.23789393
##         6  867 0.21502976    0.33320523
##        NA 1430 0.35466270            NA

recode hispanic

gss2021_ZERODraft %>%
  tabyl(hispanic)

##  hispanic    n      percent valid_percent
##         1 3544 0.8789682540  0.8864432216
##         2  245 0.0607638889  0.0612806403
##         3   51 0.0126488095  0.0127563782
##         4   21 0.0052083333  0.0052526263
##         5    9 0.0022321429  0.0022511256
##         6    2 0.0004960317  0.0005002501
##         7    4 0.0009920635  0.0010005003
##         8    2 0.0004960317  0.0005002501
##         9    4 0.0009920635  0.0010005003
##        10    5 0.0012400794  0.0012506253
##        11    3 0.0007440476  0.0007503752
##        15   17 0.0042162698  0.0042521261
##        20    4 0.0009920635  0.0010005003
##        21    8 0.0019841270  0.0020010005
##        22   10 0.0024801587  0.0025012506
##        23    5 0.0012400794  0.0012506253
##        24    1 0.0002480159  0.0002501251
##        25    1 0.0002480159  0.0002501251
##        30   46 0.0114087302  0.0115057529
##        35    1 0.0002480159  0.0002501251
##        41    3 0.0007440476  0.0007503752
##        46    2 0.0004960317  0.0005002501
##        47    7 0.0017361111  0.0017508754
##        50    3 0.0007440476  0.0007503752
##        NA   34 0.0084325397            NA

gss2021_ZERODraft$subgrouphis <-Recode(gss2021_ZERODraft$hispanic, recodes="1 = 0; 2:50 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgrouphis)

##  subgrouphis    n    percent valid_percent
##            0 3544 0.87896825     0.8864432
##            1  454 0.11259921     0.1135568
##         <NA>   34 0.00843254            NA

subgrouphis_1<-as.factor(ifelse(gss2021_ZERODraft$subgrouphis==1, "Hispanic", "Non Hispanic"))

tabyl(subgrouphis_1)

##  subgrouphis_1    n    percent valid_percent
##       Hispanic  454 0.11259921     0.1135568
##   Non Hispanic 3544 0.87896825     0.8864432
##           <NA>   34 0.00843254            NA
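Because `as.factor()` sorts levels alphabetically, "Hispanic" is the first level of the labelled variable above, and the labelled vector sits outside the data frame. A minimal sketch of an alternative, assuming one would rather keep the labels as a column and have "Non Hispanic" as the first level; the data frame `df` is a made-up stand-in for the survey data.

```r
# Made-up stand-in for the 0/1 factor produced by the recode step.
df <- data.frame(subgrouphis = factor(c(0, 1, 0, NA, 1, 0)))

# factor() with explicit levels and labels fixes the level order
# and stores the result alongside the other variables.
df$subgrouphis_1 <- factor(df$subgrouphis,
                           levels = c("0", "1"),
                           labels = c("Non Hispanic", "Hispanic"))

table(df$subgrouphis_1, useNA = "ifany")
```

Level order matters later if the variable is used as a predictor, since R treats the first level of a factor as the reference category by default.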

recode education college

gss2021_ZERODraft %>%
  tabyl(educ)

##  educ   n      percent valid_percent
##     0   9 0.0022321429  0.0022692890
##     1   1 0.0002480159  0.0002521432
##     2   2 0.0004960317  0.0005042864
##     3   3 0.0007440476  0.0007564297
##     4   1 0.0002480159  0.0002521432
##     5   2 0.0004960317  0.0005042864
##     6  15 0.0037202381  0.0037821483
##     7   5 0.0012400794  0.0012607161
##     8  25 0.0062003968  0.0063035804
##     9  32 0.0079365079  0.0080685830
##    10  52 0.0128968254  0.0131114473
##    11  83 0.0205853175  0.0209278870
##    12 829 0.2056051587  0.2090267272
##    13 277 0.0687003968  0.0698436712
##    14 542 0.1344246032  0.1366616238
##    15 208 0.0515873016  0.0524457892
##    16 942 0.2336309524  0.2375189107
##    17 258 0.0639880952  0.0650529501
##    18 351 0.0870535714  0.0885022693
##    19 113 0.0280257937  0.0284921836
##    20 216 0.0535714286  0.0544629349
##    NA  66 0.0163690476            NA

gss2021_ZERODraft$subgroupeducCollege <-Recode(gss2021_ZERODraft$educ, recodes="0:12 = 0; 13:20 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupeducCollege)

##  subgroupeducCollege    n    percent valid_percent
##                    0 1059 0.26264881     0.2670197
##                    1 2907 0.72098214     0.7329803
##                 <NA>   66 0.01636905            NA

subgroupEduCollege_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupeducCollege==1, "College", "No College"))

tabyl(subgroupEduCollege_1)

##  subgroupEduCollege_1    n    percent valid_percent
##               College 2907 0.72098214     0.7329803
##            No College 1059 0.26264881     0.2670197
##                  <NA>   66 0.01636905            NA

education highschool only

gss2021_ZERODraft %>%
  tabyl(educ)

##  educ   n      percent valid_percent
##     0   9 0.0022321429  0.0022692890
##     1   1 0.0002480159  0.0002521432
##     2   2 0.0004960317  0.0005042864
##     3   3 0.0007440476  0.0007564297
##     4   1 0.0002480159  0.0002521432
##     5   2 0.0004960317  0.0005042864
##     6  15 0.0037202381  0.0037821483
##     7   5 0.0012400794  0.0012607161
##     8  25 0.0062003968  0.0063035804
##     9  32 0.0079365079  0.0080685830
##    10  52 0.0128968254  0.0131114473
##    11  83 0.0205853175  0.0209278870
##    12 829 0.2056051587  0.2090267272
##    13 277 0.0687003968  0.0698436712
##    14 542 0.1344246032  0.1366616238
##    15 208 0.0515873016  0.0524457892
##    16 942 0.2336309524  0.2375189107
##    17 258 0.0639880952  0.0650529501
##    18 351 0.0870535714  0.0885022693
##    19 113 0.0280257937  0.0284921836
##    20 216 0.0535714286  0.0544629349
##    NA  66 0.0163690476            NA

gss2021_ZERODraft$subgroupeducHS <-Recode(gss2021_ZERODraft$educ, recodes="0:11 = 0; 12:12 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupeducHS)

##  subgroupeducHS    n    percent valid_percent
##               0  230 0.05704365      0.217186
##               1  829 0.20560516      0.782814
##            <NA> 2973 0.73735119            NA

subgroupEduHS_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupeducHS==1, "Highschool", "No Highschool"))

tabyl(subgroupEduHS_1)

##  subgroupEduHS_1    n    percent valid_percent
##       Highschool  829 0.20560516      0.782814
##    No Highschool  230 0.05704365      0.217186
##             <NA> 2973 0.73735119            NA
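The 2973 missing cases in this table combine two different groups: the 2907 respondents with some college, who are set to missing by design, and the 66 respondents with no recorded educ value (2907 + 66 = 2973). A minimal base R sketch on made-up values shows one way to count the two groups separately; all object names here are hypothetical.

```r
# Made-up years of schooling, including one missing value.
educ <- c(10, 12, 12, 14, 16, NA)

# Same rule as the high school recode: 0-11 -> 0, 12 -> 1, otherwise NA.
hs <- ifelse(educ > 12, NA, ifelse(educ == 12, 1, 0))

# NA because the respondent went beyond high school (excluded on purpose).
excluded_by_design <- sum(is.na(hs) & !is.na(educ))

# NA because educ itself was never recorded.
missing_in_source <- sum(is.na(educ))

c(excluded_by_design = excluded_by_design,
  missing_in_source  = missing_in_source)
```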

sex

gss2021_ZERODraft %>%
  tabyl(sex)

##  sex    n    percent valid_percent
##    1 1736 0.43055556     0.4406091
##    2 2204 0.54662698     0.5593909
##   NA   92 0.02281746            NA

gss2021_ZERODraft$subgroupsex <-Recode(gss2021_ZERODraft$sex, recodes="1:1 = 0; 2:2 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupsex)

##  subgroupsex    n    percent valid_percent
##            0 1736 0.43055556     0.4406091
##            1 2204 0.54662698     0.5593909
##         <NA>   92 0.02281746            NA

subgroupsex_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupsex==1, "Women", "Men"))

tabyl(subgroupsex_1)

##  subgroupsex_1    n    percent valid_percent
##            Men 1736 0.43055556     0.4406091
##          Women 2204 0.54662698     0.5593909
##           <NA>   92 0.02281746            NA

income of household of respondent when age 16

gss2021_ZERODraft %>%
  tabyl(incom16)

##  incom16    n    percent valid_percent
##        1  421 0.10441468    0.11003659
##        2 1013 0.25124008    0.26476738
##        3 1625 0.40302579    0.42472556
##        4  679 0.16840278    0.17746994
##        5   88 0.02182540    0.02300052
##       NA  206 0.05109127            NA

gss2021_ZERODraft$subgroupincom16 <-Recode(gss2021_ZERODraft$incom16, recodes="1:2 = 0; 3:5 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupincom16)

##  subgroupincom16    n    percent valid_percent
##                0 1434 0.35565476      0.374804
##                1 2392 0.59325397      0.625196
##             <NA>  206 0.05109127            NA

subgroupincom16_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupincom16==1, "secure economic resources at 16", "insecure economic resources"))

tabyl(subgroupincom16_1)

##                subgroupincom16_1    n    percent valid_percent
##      insecure economic resources 1434 0.35565476      0.374804
##  secure economic resources at 16 2392 0.59325397      0.625196
##                             <NA>  206 0.05109127            NA

if respondent was born in the US

gss2021_ZERODraft %>%
  tabyl(born)

##  born    n    percent valid_percent
##     1 3516 0.87202381     0.8878788
##     2  444 0.11011905     0.1121212
##    NA   72 0.01785714            NA

gss2021_ZERODraft$subgroupBorn <-Recode(gss2021_ZERODraft$born, recodes="1:1 = 1; 2:2 = 0; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupBorn)

##  subgroupBorn    n    percent valid_percent
##             0  444 0.11011905     0.1121212
##             1 3516 0.87202381     0.8878788
##          <NA>   72 0.01785714            NA

subgroupborn_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupBorn==1, "Born in US", "Not born in US"))

tabyl(subgroupborn_1)

##  subgroupborn_1    n    percent valid_percent
##      Born in US 3516 0.87202381     0.8878788
##  Not born in US  444 0.11011905     0.1121212
##            <NA>   72 0.01785714            NA
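Column names in R are case-sensitive, so a column created as subgroupBorn cannot be reached as subgroupborn. A minimal sketch of a guard that stops with an error when a name is misspelled, rather than continuing with an empty result; the data frame `df` and the variable `wanted` are hypothetical.

```r
# Made-up stand-in for the recoded column in the survey data.
df <- data.frame(subgroupBorn = factor(c(1, 0, 1, NA)))

# Check the exact spelling before using the column.
wanted <- "subgroupBorn"
stopifnot(wanted %in% names(df))

# Label the 0/1 codes once the name is known to exist.
born_label <- factor(df[[wanted]],
                     levels = c("0", "1"),
                     labels = c("Not born in US", "Born in US"))

table(born_label, useNA = "ifany")
```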

whether the respondent has ever been disrespected

gss2021_ZERODraft %>%
  tabyl(disrspct)

##  disrspct    n    percent valid_percent
##         1  136 0.03373016    0.05228758
##         2  231 0.05729167    0.08881200
##         3  327 0.08110119    0.12572088
##         4  801 0.19866071    0.30795848
##         5  552 0.13690476    0.21222607
##         6  554 0.13740079    0.21299500
##        NA 1431 0.35491071            NA

gss2021_ZERODraft$subgroupdisrspct <-Recode(gss2021_ZERODraft$disrspct, recodes="1:5 = 1; 6:6 = 0; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupdisrspct)

##  subgroupdisrspct    n   percent valid_percent
##                 0  554 0.1374008      0.212995
##                 1 2047 0.5076885      0.787005
##              <NA> 1431 0.3549107            NA

subgroupdisrspct_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupdisrspct==1, "respondent has been disrespected", "Not being disrespected"))

tabyl(subgroupdisrspct_1)

##                subgroupdisrspct_1    n   percent valid_percent
##            Not being disrespected  554 0.1374008      0.212995
##  respondent has been disrespected 2047 0.5076885      0.787005
##                              <NA> 1431 0.3549107            NA

whether the respondent has ever been called or treated as not smart

gss2021_ZERODraft %>%
  tabyl(notsmart)

##  notsmart    n    percent valid_percent
##         1  101 0.02504960    0.03881630
##         2  129 0.03199405    0.04957725
##         3  202 0.05009921    0.07763259
##         4  684 0.16964286    0.26287471
##         5  619 0.15352183    0.23789393
##         6  867 0.21502976    0.33320523
##        NA 1430 0.35466270            NA

gss2021_ZERODraft$subgroupnotsmart <-Recode(gss2021_ZERODraft$notsmart, recodes="1:5 = 1; 6:6 = 0; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  
tabyl(subgroupnotsmart)

##  subgroupnotsmart    n   percent valid_percent
##                 0  867 0.2150298     0.3332052
##                 1 1735 0.4303075     0.6667948
##              <NA> 1430 0.3546627            NA

subgroupnotsmart_1<-as.factor(ifelse(gss2021_ZERODraft$subgroupnotsmart==1, "respondent was told or treated as if they are not smart", "Never experienced that sort of treatment of not being smart"))

tabyl(subgroupnotsmart_1)

##                                           subgroupnotsmart_1    n   percent
##  Never experienced that sort of treatment of not being smart  867 0.2150298
##      respondent was told or treated as if they are not smart 1735 0.4303075
##                                                         <NA> 1430 0.3546627
##  valid_percent
##      0.3332052
##      0.6667948
##             NA


Brandon Flores

February 7, 2022
