Title: Homework 9
Author: Brandon Flores
Date: Nov. 10th, 2021
When observing the two variables of sex at birth and sexual orientation within the crosstab you can see that there is a difference between those who are men or women at birth and thier sexual orientation later in life. This effect can be seen more so when those missing values are included into the data rather than the table that does not include the missing data. When missing data is accounted for the difference is slight for within each variable differing around .2% to .3%. Because of the notable differences seen within the crosstabs, it gives reason to run a Chi-Square test of statistical significance. 

When conducting the Chi-Square test between the two varibales there was a  statisitcally significance relatinship found at the p < .001 level. When observing the Cramer's V; it observed a relationship strength of 0.096. This suggests that although a statisically significant relationship is found; it is a very weak relationship. 

The figure provided of the Chi-Square test shows exactly where between the two variables a statstical signficance was found. For those with a sex of male at birth there was a larger than expected number of those who identified as gay and a smaller than expected number of those who identified as bisexual within this group. For those who had a female sex at birth found a larger than expected number of those who identified as bisexual and a smaller than expected observation of those who identified as lesbian. Because of these findings between the variables gives reason for statsitical significance from the Chi-Square test. This being true, when observing the Cramer's V this statistical relationship between sex at birth and sexual orientation is extreamly weak. 
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dplyr)
library(rvest)
## 
## Attaching package: 'rvest'
## The following object is masked from 'package:readr':
## 
##     guess_encoding
library(httr)
library(purrr)
library(stringr)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
## 
##     compact
library(ggplot2)
library(Rmisc)
## Loading required package: lattice
library(dat)
## To use dplyr as backend set 'options(dat.use.dplyr = TRUE)'.
## 
## Attaching package: 'dat'
## The following object is masked from 'package:purrr':
## 
##     map
## The following object is masked from 'package:tidyr':
## 
##     extract
## The following object is masked from 'package:base':
## 
##     replace
pulse39 <-read.csv("C:\\Users\\BTP\\Downloads\\pulse2021_puf_39.csv")
pulse39$subgroup <-paste(pulse39 $EGENID_BIRTH,
pulse39$SEXUAL_ORIENTATION, sep = "")
pulse39 %>%
tabyl(subgroup)
##  subgroup     n     percent
##      1-99   520 0.009112575
##        11  1121 0.019644610
##        12 20711 0.362943362
##        13   451 0.007903407
##        14   297 0.005204682
##        15   358 0.006273658
##      2-99   744 0.013037992
##        21   722 0.012652460
##        22 29768 0.521659891
##        23  1409 0.024691574
##        24   508 0.008902285
##        25   455 0.007973503
pulse39$subgroupcat <-car::Recode(pulse39$ subgroup,
recodes=" '11' = 'Born male and is gay'; '12' = 'Born male and is straight'; '21' = 'Born female and lesbian'; '22' = 'Born female and straight'; '23' = 'Born female and identifies as bisexual'; '13' = 'Born male and is bisexual'; else=NA",
as.factor=T)
pulse39 %>%
tabyl(subgroupcat)
##                             subgroupcat     n     percent valid_percent
##  Born female and identifies as bisexual  1409 0.024691574   0.026004946
##                 Born female and lesbian   722 0.012652460   0.013325459
##                Born female and straight 29768 0.521659891   0.549407552
##               Born male and is bisexual   451 0.007903407   0.008323798
##                    Born male and is gay  1121 0.019644610   0.020689528
##               Born male and is straight 20711 0.362943362   0.382248717
##                                    <NA>  2882 0.050504696            NA
pulse39 %>%
tabyl(EGENID_BIRTH,subgroupcat,show_missing_levels=F,show_na = FALSE) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits=2) %>%
adorn_ns() %>%
knitr::kable()
EGENID_BIRTH Born female and identifies as bisexual Born female and lesbian Born female and straight Born male and is bisexual Born male and is gay Born male and is straight
1 0.00% (0) 0.00% (0) 0.00% (0) 2.02% (451) 5.03% (1121) 92.95% (20711)
2 4.42% (1409) 2.26% (722) 93.32% (29768) 0.00% (0) 0.00% (0) 0.00% (0)
pulse39 %>%
tabyl(EGENID_BIRTH,subgroupcat) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits=2) %>%
adorn_ns() %>%
knitr::kable()
EGENID_BIRTH Born female and identifies as bisexual Born female and lesbian Born female and straight Born male and is bisexual Born male and is gay Born male and is straight NA_
1 0.00% (0) 0.00% (0) 0.00% (0) 1.92% (451) 4.78% (1121) 88.29% (20711) 5.01% (1175)
2 4.19% (1409) 2.15% (722) 88.58% (29768) 0.00% (0) 0.00% (0) 0.00% (0) 5.08% (1707)
library(vcd)
## Loading required package: grid
assocstats(table(pulse39$EGENID_BIRTH, pulse39$SEXUAL_ORIENTATION))
##                     X^2 df P(> X^2)
## Likelihood Ratio 532.02  5        0
## Pearson          523.27  5        0
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.095 
## Cramer's V        : 0.096
library(vcd)

mosaic(~ EGENID_BIRTH + SEXUAL_ORIENTATION,
  direction = c("v", "h"),
  data = pulse39,
  shade = TRUE
)