Blog Post Descriptive Stats

For the descriptive statistics of this study, each variable in the model is examined. For every variable, descriptive tables, box plots, and density plots were created to observe the distribution of the data. In every box plot, the dependent variable on the y-axis is education, measured as years of schooling completed (`educ`).

For Hispanic ethnicity, the descriptive table shows that the sample is overwhelmingly non-Hispanic (about 89% of valid responses). The box plot shows that the spread of education for Hispanics is roughly on par with that of non-Hispanics, except that the Hispanic median is noticeably lower. The density plot further reflects the descriptive statistics.
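The descriptive tables are produced with `janitor::tabyl()`, which reports two shares: `percent` divides by every row, while `valid_percent` first drops the missing (`<NA>`) rows. The sketch below reproduces both columns in base R from the Hispanic counts reported in the tabyl output (3,544 non-Hispanic, 454 Hispanic, 34 missing). `freq_table` is a hypothetical helper written for illustration, not part of janitor.

```r
# Hypothetical helper that mimics the percent / valid_percent columns of janitor::tabyl()
freq_table <- function(x) {
  counts <- table(x, useNA = "ifany")   # keep missing values as their own row
  values <- names(counts)               # the missing row has the name NA
  n <- as.integer(counts)
  data.frame(
    value = values,
    n = n,
    percent = n / length(x),                                        # share of all rows
    valid_percent = ifelse(is.na(values), NA, n / sum(!is.na(x)))   # share of non-missing rows
  )
}

# Rebuild the Hispanic recode from its reported counts: 0 = non-Hispanic, 1 = Hispanic
subgrouphis <- c(rep(0, 3544), rep(1, 454), rep(NA, 34))
freq_table(subgrouphis)
```

The gap between the two columns is small here because only 34 rows are missing; for the perceived-discrimination items, where about 35% of rows are missing, the two columns diverge sharply.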

For sex, the majority of respondents are women (about 56% of valid responses), and men appear slightly more educated than women.

For household income at age 16, the majority of respondents (about 63% of valid responses) reported secure economic resources at 16, and those who reported secure resources have much higher education than those who did not.
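The box-plot comparisons in these paragraphs rest on the middle line of each box, the group median. When a claim such as "much higher education" needs a number behind it, the medians can be printed directly with base R's `tapply()`. A minimal sketch on a small made-up data frame (the values are illustrative only, not GSS data); with the real data, `toy$educ` and `toy$group` would be replaced by `gss2021_ZERODraft$educ` and one of the recoded grouping variables.

```r
# Illustrative toy data: years of schooling (educ) and a two-level grouping variable
toy <- data.frame(
  educ  = c(12, 14, 16, 16, 18, 10, 12, 12, 13, NA),
  group = c(rep("secure", 5), rep("insecure", 5))
)

# Median years of education per group, ignoring missing educ values;
# this is the statistic drawn as the middle line of each box
group_medians <- tapply(toy$educ, toy$group, median, na.rm = TRUE)
group_medians

# Gap between the two medians, in years of schooling
group_medians[["secure"]] - group_medians[["insecure"]]
```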

For nativity, the large majority of respondents (about 89% of valid responses) were born in the US, and those not born in the US have higher education than those born in the US. The density plot reflects this imbalance, with most of its mass at the code for respondents born in the US.

For perceived disrespect, the majority of valid responses (about 79%) report having been disrespected at least some of the time, while about 21% report never having experienced it. This variable also has a large share of missing values, about 35%. The box plots show that those who have been disrespected actually have higher education than those who have not.

For being treated as not smart, the pattern is similar: most valid responses (about 67%) report having been treated as if they were not smart, and about 35% of the data are missing, the same as the previous variable. Those who never experienced this have just slightly higher educational attainment, but the difference is not notable.
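Both perceived-discrimination items are collapsed the same way in the code: responses 1 through 5 (the experience happened at some frequency) become 1, response 6 becomes 0, and anything else becomes missing. Reading 6 as "never" is an assumption based on how the labels are assigned later in the code. A base-R sketch of that rule, equivalent to the `car::Recode()` spec `"1:5 = 1; 6:6 = 0; else=NA"` for whole-number codes; `collapse_frequency` is a hypothetical name.

```r
# Hypothetical base-R equivalent of Recode(x, recodes = "1:5 = 1; 6:6 = 0; else=NA")
collapse_frequency <- function(x) {
  # %in% returns FALSE for NA, so missing and out-of-range codes both fall through to NA
  ifelse(x %in% 1:5, 1, ifelse(x %in% 6, 0, NA))
}

collapse_frequency(c(1, 3, 5, 6, NA, 8))
```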

For the key control variable, age, the density plot shows two peaks, one around 35 and one around 60 years old. When age is cut into intervals for the box plots, respondents aged 24 to 39 have higher educational attainment than all other groups, the 0 to 24 group has the lowest, and the groups from 39 to 99 all have about the same educational attainment.
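The age groups come from `cut()` with breaks at 0, 24, 39, 59, 79, and 99, the same breaks used for `age1`. By default `cut()` builds right-closed intervals, so a respondent aged exactly 24 falls in the youngest group and one aged exactly 39 falls in the 24 to 39 group. A short check with a few example ages (the ages are illustrative only):

```r
# Same breaks as age1; right = TRUE (right-closed intervals) is cut()'s default
age_breaks <- c(0, 24, 39, 59, 79, 99)

example_ages <- c(18, 24, 25, 39, 40, 89)
age_groups <- cut(example_ages, breaks = age_breaks)

levels(age_groups)        # "(0,24]" "(24,39]" "(39,59]" "(59,79]" "(79,99]"
as.character(age_groups)  # the interval each example age lands in
```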

Looking at the variables overall, the main concern is the two perceived-discrimination variables, each of which is missing about 35% of cases. None of the other variables in the model has a missing-data problem of this size.
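The missing-data shares discussed above can be computed for every variable at once: `colMeans(is.na(df))` gives the proportion of missing values in each column. In this minimal base-R sketch, the stand-in data frame is rebuilt only from the missing counts reported in the tabyl output (1,431, 1,430, and 206 missing out of 4,032 rows); with the real data, `df` would be `gss2021_ZERODraft` restricted to the model variables.

```r
# Percentage of missing values per column, largest first
missing_share <- function(df) {
  sort(round(100 * colMeans(is.na(df)), 1), decreasing = TRUE)
}

# Stand-in data frame that reproduces the reported missing counts (4,032 rows in total)
n_rows <- 4032
counts_df <- data.frame(
  disrspct = c(rep(1, n_rows - 1431), rep(NA, 1431)),
  notsmart = c(rep(1, n_rows - 1430), rep(NA, 1430)),
  incom16  = c(rep(1, n_rows - 206),  rep(NA, 206))
)

missing_share(counts_df)
```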

That said, the statistical regression techniques to be used, the survey weights added to adjust for the survey design, and the theoretically driven, conceptually based nesting of the models should help limit potential error in the analysis.
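Survey weights change descriptive shares as well as regression estimates, because each respondent counts in proportion to their weight. The analysis loads the `survey` package, where this would normally be done through a design object (for example `svydesign(ids = ~1, weights = ~wt, data = df)` followed by `svymean()`) using the actual GSS 2021 weight variable. As a dependency-free illustration, here is a weighted proportion in base R with made-up weights; `wt` is a hypothetical column name, not the GSS weight variable.

```r
# Toy data: 1 = Hispanic, 0 = non-Hispanic, with hypothetical survey weights (wt)
toy <- data.frame(
  hispanic = c(1, 0, 0, 0, 1, NA),
  wt       = c(2.0, 0.5, 1.0, 1.5, 1.0, 1.0)
)

# Unweighted share: every non-missing respondent counts once
unweighted <- mean(toy$hispanic, na.rm = TRUE)

# Weighted share: each respondent counts in proportion to wt;
# na.rm = TRUE drops the missing response together with its weight
weighted <- weighted.mean(toy$hispanic, w = toy$wt, na.rm = TRUE)

c(unweighted = unweighted, weighted = weighted)
```

Here the two heavily weighted Hispanic respondents pull the share from 0.4 to 0.5, which is the kind of shift the weights are meant to correct for in the real sample.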

# Load plyr before dplyr so that dplyr's verbs (arrange, count, mutate, summarise, ...)
# are not masked by plyr; plyr itself warns when it is loaded after dplyr
library(plyr)
library(dplyr)
library(haven)        # read_dta() for the Stata file
library(janitor)      # tabyl() frequency tables
library(ggplot2)
library(scales)
library(sur)
library(summarytools)
library(Rmisc)
library(car)          # Recode(); masks dplyr::recode()
library(forcats)
library(tidyverse)
library(survey)
library(stargazer)
library(grid)
library(Matrix)
library(caret)

gss2021_ZERODraft<-read_dta("C:\\Users\\BTP\\Desktop\\STATS 2 FOLDER\\2021_sas\\gss2021.dta")

recode hispanic

gss2021_ZERODraft$subgrouphis <-Recode(gss2021_ZERODraft$hispanic, recodes="1 = 0; 2:50 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  tabyl(subgrouphis)

##  subgrouphis    n    percent valid_percent
##            0 3544 0.87896825     0.8864432
##            1  454 0.11259921     0.1135568
##         <NA>   34 0.00843254            NA

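# Store the labelled version as a column so the box plots find it inside the data frame
gss2021_ZERODraft$subgrouphis_1 <- as.factor(ifelse(gss2021_ZERODraft$subgrouphis == 1, "Hispanic", "Non Hispanic"))

gss2021_ZERODraft %>%
  tabyl(subgrouphis_1)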

##  subgrouphis_1    n    percent valid_percent
##       Hispanic  454 0.11259921     0.1135568
##   Non Hispanic 3544 0.87896825     0.8864432
##           <NA>   34 0.00843254            NA

recode education outcome variable for the multinomial model: less than high school (0-11 years), high school (12 years), college (13-16 years), and graduate school (17-20 years)

gss2021_ZERODraft$AllEducLevels <-Recode(gss2021_ZERODraft$educ, recodes="0:11 = 1; 12 = 2; 13:16 = 3; 17:20 = 4; else=NA", as.factor=T)

gss2021_ZERODraft$AllEducLevels<-relevel(gss2021_ZERODraft$AllEducLevels, ref = "1")

gss2021_ZERODraft %>%
  tabyl(AllEducLevels)

##  AllEducLevels    n    percent valid_percent
##              1  230 0.05704365    0.05799294
##              2  829 0.20560516    0.20902673
##              3 1969 0.48834325    0.49646999
##              4  938 0.23263889    0.23651034
##           <NA>   66 0.01636905            NA
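The `Recode()` call for `AllEducLevels` groups years of schooling into four ordered levels. Because `educ` takes whole-number values, the same grouping can be sketched with base R's `cut()` and right-closed intervals. The values below are illustrative only; a value outside 0 to 20 becomes missing, as with `else=NA`.

```r
# Illustrative values of educ (years of schooling); 25 is outside the 0-20 range
educ <- c(0, 11, 12, 13, 16, 17, 20, 25, NA)

# Right-closed intervals: 0-11 -> 1, 12 -> 2, 13-16 -> 3, 17-20 -> 4; anything else -> NA
all_educ_levels <- cut(educ,
                       breaks = c(-1, 11, 12, 16, 20),
                       labels = c("1", "2", "3", "4"))

as.character(all_educ_levels)
```

Level "1" (less than high school) is already the first level of the resulting factor, so it serves as the reference category without a separate `relevel()` call.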

sex

gss2021_ZERODraft$subgroupsex <-Recode(gss2021_ZERODraft$sex, recodes="1:1 = 0; 2:2 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  tabyl(subgroupsex)

##  subgroupsex    n    percent valid_percent
##            0 1736 0.43055556     0.4406091
##            1 2204 0.54662698     0.5593909
##         <NA>   92 0.02281746            NA

gss2021_ZERODraft$subgroupsex_1 <- as.factor(ifelse(gss2021_ZERODraft$subgroupsex == 1, "Women", "Men"))

gss2021_ZERODraft %>%
  tabyl(subgroupsex_1)

##  subgroupsex_1    n    percent valid_percent
##            Men 1736 0.43055556     0.4406091
##          Women 2204 0.54662698     0.5593909
##           <NA>   92 0.02281746            NA

income of the respondent's household at age 16

gss2021_ZERODraft$subgroupincom16 <-Recode(gss2021_ZERODraft$incom16, recodes="1:2 = 0; 3:5 = 1; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  tabyl(subgroupincom16)

##  subgroupincom16    n    percent valid_percent
##                0 1434 0.35565476      0.374804
##                1 2392 0.59325397      0.625196
##             <NA>  206 0.05109127            NA

gss2021_ZERODraft$subgroupincom16_1 <- as.factor(ifelse(gss2021_ZERODraft$subgroupincom16 == 1, "secure economic resources at 16", "insecure economic resources"))

gss2021_ZERODraft %>%
  tabyl(subgroupincom16_1)

##                subgroupincom16_1    n    percent valid_percent
##      insecure economic resources 1434 0.35565476      0.374804
##  secure economic resources at 16 2392 0.59325397      0.625196
##                             <NA>  206 0.05109127            NA

whether the respondent was born in the US

gss2021_ZERODraft$subgroupBorn <-Recode(gss2021_ZERODraft$born, recodes="1:1 = 1; 2:2 = 0; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  tabyl(subgroupBorn)

##  subgroupBorn    n    percent valid_percent
##             0  444 0.11011905     0.1121212
##             1 3516 0.87202381     0.8878788
##          <NA>   72 0.01785714            NA

gss2021_ZERODraft$subgroupborn_1 <- as.factor(ifelse(gss2021_ZERODraft$subgroupBorn == 1, "Born in US", "Not born in US"))

gss2021_ZERODraft %>%
  tabyl(subgroupborn_1)

##  subgroupborn_1    n    percent valid_percent
##      Born in US 3516 0.87202381     0.8878788
##  Not born in US  444 0.11011905     0.1121212
##            <NA>   72 0.01785714            NA

whether the respondent has ever been disrespected

gss2021_ZERODraft$subgroupdisrspct <-Recode(gss2021_ZERODraft$disrspct, recodes="1:5 = 1; 6:6 = 0; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  tabyl(subgroupdisrspct)

##  subgroupdisrspct    n   percent valid_percent
##                 0  554 0.1374008      0.212995
##                 1 2047 0.5076885      0.787005
##              <NA> 1431 0.3549107            NA

gss2021_ZERODraft$subgroupdisrspct_1 <- as.factor(ifelse(gss2021_ZERODraft$subgroupdisrspct == 1, "respondent has been disrespected", "Not being disrespected"))

gss2021_ZERODraft %>%
  tabyl(subgroupdisrspct_1)

##                subgroupdisrspct_1    n   percent valid_percent
##            Not being disrespected  554 0.1374008      0.212995
##  respondent has been disrespected 2047 0.5076885      0.787005
##                              <NA> 1431 0.3549107            NA

whether the respondent has ever been called or treated as if they were not smart

gss2021_ZERODraft$subgroupnotsmart <-Recode(gss2021_ZERODraft$notsmart, recodes="1:5 = 1; 6:6 = 0; else=NA", as.factor=T)

gss2021_ZERODraft %>%
  tabyl(subgroupnotsmart)

##  subgroupnotsmart    n   percent valid_percent
##                 0  867 0.2150298     0.3332052
##                 1 1735 0.4303075     0.6667948
##              <NA> 1430 0.3546627            NA

gss2021_ZERODraft$subgroupnotsmart_1 <- as.factor(ifelse(gss2021_ZERODraft$subgroupnotsmart == 1, "respondent was told or treated as if they are not smart", "Never experienced that sort of treatment of not being smart"))

gss2021_ZERODraft %>%
  tabyl(subgroupnotsmart_1)

##                                           subgroupnotsmart_1    n   percent valid_percent
##  Never experienced that sort of treatment of not being smart  867 0.2150298     0.3332052
##      respondent was told or treated as if they are not smart 1735 0.4303075     0.6667948
##                                                         <NA> 1430 0.3546627            NA

age cut into intervals

# Right-closed age intervals: (0,24], (24,39], (39,59], (59,79], (79,99]
gss2021_ZERODraft$age1 <- cut(gss2021_ZERODraft$age,
                              breaks = c(0, 24, 39, 59, 79, 99))

#boxplots

gss2021_ZERODraft %>% 
  ggplot(mapping=aes(y=educ, x=factor(subgrouphis_1)))+
  geom_boxplot()+ 
  ggtitle(label="Distribution of education by Hispanics") +
  xlab(label="Hispanics")

## Warning: Removed 66 rows containing non-finite values (stat_boxplot).

gss2021_ZERODraft %>% 
  ggplot(mapping=aes(y=educ, x=factor(subgroupsex_1)))+
  geom_boxplot()+ 
  ggtitle(label="Distribution of education by sex") +
  xlab(label="sex")

## Warning: Removed 66 rows containing non-finite values (stat_boxplot).

gss2021_ZERODraft %>% 
  ggplot(mapping=aes(y=educ, x=factor(subgroupincom16_1)))+
  geom_boxplot()+ 
  ggtitle(label="Distribution of education by perceived income at 16") +
  xlab(label="Perceived income at 16")

## Warning: Removed 66 rows containing non-finite values (stat_boxplot).

gss2021_ZERODraft %>% 
  ggplot(mapping=aes(y=educ, x=factor(subgroupborn_1)))+
  geom_boxplot()+ 
  ggtitle(label="Distribution of education by foreign-born status") +
  xlab(label="Foreign born")

## Warning: Removed 66 rows containing non-finite values (stat_boxplot).

gss2021_ZERODraft %>% 
  ggplot(mapping=aes(y=educ, x=factor(subgroupdisrspct_1)))+
  geom_boxplot()+ 
  ggtitle(label="Distribution of education by perceived disrespect") +
  xlab(label="Perceived disrespect")

## Warning: Removed 66 rows containing non-finite values (stat_boxplot).

gss2021_ZERODraft %>% 
  ggplot(mapping=aes(y=educ, x=factor(subgroupnotsmart_1)))+
  geom_boxplot()+ 
  ggtitle(label="Distribution of education by perceived treatment as not smart") +
  xlab(label="Perceived as not smart")

## Warning: Removed 66 rows containing non-finite values (stat_boxplot).

gss2021_ZERODraft %>% 
  ggplot(mapping=aes(y=educ, x=factor(age1)))+
  geom_boxplot()+ 
  ggtitle(label="Distribution of education by age intervals") +
  xlab(label="age intervals")

## Warning: Removed 66 rows containing non-finite values (stat_boxplot).

#Density Plots

# geom_density() plots a kernel density estimate by default, so no stat aesthetic
# or extra pipe into ggplot() is needed
ggplot(gss2021_ZERODraft, aes(x = educ)) +
  geom_density() +
  ggtitle(label = "Distribution of Education") +
  xlab(label = "Educational Attainment")

## Warning: Removed 66 rows containing non-finite values (stat_density).

ggplot(gss2021_ZERODraft, aes(x = hispanic)) +
  geom_density() +
  ggtitle(label = "Distribution of Hispanics") +
  xlab(label = "Hispanic ethnicity")

## Warning: Removed 34 rows containing non-finite values (stat_density).

ggplot(gss2021_ZERODraft, aes(x = sex)) +
  geom_density() +
  ggtitle(label = "Distribution of Sex") +
  xlab(label = "Sex")

## Warning: Removed 92 rows containing non-finite values (stat_density).

ggplot(gss2021_ZERODraft, aes(x = notsmart)) +
  geom_density() +
  ggtitle(label = "Distribution of Perceived Being Not Smart") +
  xlab(label = "Not Smart")

## Warning: Removed 1430 rows containing non-finite values (stat_density).

ggplot(gss2021_ZERODraft, aes(x = disrspct)) +
  geom_density() +
  ggtitle(label = "Distribution of Perceived Discrimination") +
  xlab(label = "Perceived Discrimination")

## Warning: Removed 1431 rows containing non-finite values (stat_density).

ggplot(gss2021_ZERODraft, aes(x = born)) +
  geom_density() +
  ggtitle(label = "Distribution of Foreign Born") +
  xlab(label = "Foreign Born")

## Warning: Removed 72 rows containing non-finite values (stat_density).

ggplot(gss2021_ZERODraft, aes(x = incom16)) +
  geom_density() +
  ggtitle(label = "Distribution of Perceived Wealth at 16") +
  xlab(label = "Perceived Wealth at 16")

## Warning: Removed 206 rows containing non-finite values (stat_density).

ggplot(gss2021_ZERODraft, aes(x = age)) +
  geom_density() +
  ggtitle(label = "Distribution of Age") +
  xlab(label = "Age")

## Warning: Removed 333 rows containing non-finite values (stat_density).

Blog Post Descriptive Stats

Brandon Flores

March 21, 2022
