STATS-HW-9.knit

Title: Homework 9

Author: Brandon Flores

Date: Nov. 10th, 2021

When observing the two variables of sex at birth and sexual orientation within the crosstab you can see that there is a difference between those who are men or women at birth and thier sexual orientation later in life. This effect can be seen more so when those missing values are included into the data rather than the table that does not include the missing data. When missing data is accounted for the difference is slight for within each variable differing around .2% to .3%. Because of the notable differences seen within the crosstabs, it gives reason to run a Chi-Square test of statistical significance. 

When conducting the Chi-Square test between the two varibales there was a  statisitcally significance relatinship found at the p < .001 level. When observing the Cramer's V; it observed a relationship strength of 0.096. This suggests that although a statisically significant relationship is found; it is a very weak relationship. 

The figure provided of the Chi-Square test shows exactly where between the two variables a statstical signficance was found. For those with a sex of male at birth there was a larger than expected number of those who identified as gay and a smaller than expected number of those who identified as bisexual within this group. For those who had a female sex at birth found a larger than expected number of those who identified as bisexual and a smaller than expected observation of those who identified as lesbian. Because of these findings between the variables gives reason for statsitical significance from the Chi-Square test. This being true, when observing the Cramer's V this statistical relationship between sex at birth and sexual orientation is extreamly weak.

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(dplyr)
library(rvest)

## 
## Attaching package: 'rvest'

## The following object is masked from 'package:readr':
## 
##     guess_encoding

library(httr)
library(purrr)
library(stringr)
library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

## The following object is masked from 'package:purrr':
## 
##     compact

library(ggplot2)
library(Rmisc)

## Loading required package: lattice

library(dat)

## To use dplyr as backend set 'options(dat.use.dplyr = TRUE)'.

## 
## Attaching package: 'dat'

## The following object is masked from 'package:purrr':
## 
##     map

## The following object is masked from 'package:tidyr':
## 
##     extract

## The following object is masked from 'package:base':
## 
##     replace

pulse39 <-read.csv("C:\\Users\\BTP\\Downloads\\pulse2021_puf_39.csv")

pulse39$subgroup <-paste(pulse39 $EGENID_BIRTH,
pulse39$SEXUAL_ORIENTATION, sep = "")
pulse39 %>%
tabyl(subgroup)

##  subgroup     n     percent
##      1-99   520 0.009112575
##        11  1121 0.019644610
##        12 20711 0.362943362
##        13   451 0.007903407
##        14   297 0.005204682
##        15   358 0.006273658
##      2-99   744 0.013037992
##        21   722 0.012652460
##        22 29768 0.521659891
##        23  1409 0.024691574
##        24   508 0.008902285
##        25   455 0.007973503

pulse39$subgroupcat <-car::Recode(pulse39$ subgroup,
recodes=" '11' = 'Born male and is gay'; '12' = 'Born male and is straight'; '21' = 'Born female and lesbian'; '22' = 'Born female and straight'; '23' = 'Born female and identifies as bisexual'; '13' = 'Born male and is bisexual'; else=NA",
as.factor=T)
pulse39 %>%
tabyl(subgroupcat)

##                             subgroupcat     n     percent valid_percent
##  Born female and identifies as bisexual  1409 0.024691574   0.026004946
##                 Born female and lesbian   722 0.012652460   0.013325459
##                Born female and straight 29768 0.521659891   0.549407552
##               Born male and is bisexual   451 0.007903407   0.008323798
##                    Born male and is gay  1121 0.019644610   0.020689528
##               Born male and is straight 20711 0.362943362   0.382248717
##                                    <NA>  2882 0.050504696            NA

pulse39 %>%
tabyl(EGENID_BIRTH,subgroupcat,show_missing_levels=F,show_na = FALSE) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits=2) %>%
adorn_ns() %>%
knitr::kable()

EGENID_BIRTH	Born female and identifies as bisexual	Born female and lesbian	Born female and straight	Born male and is bisexual	Born male and is gay	Born male and is straight
1	0.00% (0)	0.00% (0)	0.00% (0)	2.02% (451)	5.03% (1121)	92.95% (20711)
2	4.42% (1409)	2.26% (722)	93.32% (29768)	0.00% (0)	0.00% (0)	0.00% (0)

pulse39 %>%
tabyl(EGENID_BIRTH,subgroupcat) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits=2) %>%
adorn_ns() %>%
knitr::kable()

EGENID_BIRTH	Born female and identifies as bisexual	Born female and lesbian	Born female and straight	Born male and is bisexual	Born male and is gay	Born male and is straight	NA_
1	0.00% (0)	0.00% (0)	0.00% (0)	1.92% (451)	4.78% (1121)	88.29% (20711)	5.01% (1175)
2	4.19% (1409)	2.15% (722)	88.58% (29768)	0.00% (0)	0.00% (0)	0.00% (0)	5.08% (1707)

library(vcd)

## Loading required package: grid

assocstats(table(pulse39$EGENID_BIRTH, pulse39$SEXUAL_ORIENTATION))

##                     X^2 df P(> X^2)
## Likelihood Ratio 532.02  5        0
## Pearson          523.27  5        0
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.095 
## Cramer's V        : 0.096

library(vcd)

mosaic(~ EGENID_BIRTH + SEXUAL_ORIENTATION,
  direction = c("v", "h"),
  data = pulse39,
  shade = TRUE
)