library(ggplot2)
library(dplyr)
library(statsr)
library(tidyr)
library(readr)
library(vcd)
library(knitr)
library(rmarkdown)
load("gss.Rdata")
dim(gss)
## [1] 57061 114
The data for this report come from the General Social Survey (GSS), which since 1972 has been producing quality reports on social, economic, attitudinal and religious matters in the United States and other countries. The information is gathered by questionnaire, so the study is observational. Because the survey relies on random sampling, we can use this report to draw generalizable, but not causal, conclusions. As indicated above, the data set has 57,061 rows and 114 columns.
My research question deals with a very old topic: premarital sex. Premarital sex is one of many forms of commodification of the female body. Along with contraception and abortion, it forms a hard triad around women's right to be sexually autonomous beings. This is coupled with a variety of social controls such as shunning, shaming, stoning, expulsion and, in some countries, death. Premarital sex is the leverage point to break many of these restrictions. Once attitudes about it change and become normalized, the other principles can fall into place or by the wayside accordingly. Since we are in 21st-century America, we can examine whether there has indeed been any significant progress. So the question is: are attitudes about premarital sex independent of gender?
It would be a good idea to build a few visual tools to help see the relationships in question. First, the GSS data set must be subset to the desired variables.
psx <- gss %>% select(sex, premarsx)
Then any rows with NA values are removed.
psx <- na.omit(psx)
any(is.na(psx))
## [1] FALSE
With the data set formed and cleaned, we can take a look at its contents and size.
head(psx)
## sex premarsx
## 1 Female Not Wrong At All
## 2 Male Always Wrong
## 3 Female Always Wrong
## 4 Female Always Wrong
## 5 Female Sometimes Wrong
## 6 Male Sometimes Wrong
dim(psx)
## [1] 33548 2
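This subset keeps about 59% of the original rows (33,548 of 57,061) but, with only 2 of the 114 columns, amounts to roughly 1% of the original data. It consists of the two genders and four categories reflecting attitudes about premarital sex. First, let's examine gender as a factor of attitude.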
ggplot(data = psx, aes(x = premarsx, fill=sex)) +
geom_bar()+ coord_flip()
What is interesting is that the “not wrong” category seems to be an almost even split, while the “always wrong” category seems to have more women than men. Let's examine the data again, this time with attitude as a part of gender.
ggplot(data = psx, aes(x = sex, fill= premarsx)) +
geom_bar()+
coord_flip()
Here we can see that there are approximately 3,900 more women than men in the subset (18,748 versus 14,800), and this could be skewing the counts in certain categories. Viewed this way, it is clearer that the size difference in the “always wrong” category is substantial. Is this because of raw numbers or proportion? Let's look at absolute counts first:
psx%>%
group_by(sex, premarsx)%>%
summarise(count = n())
## # A tibble: 8 x 3
## # Groups: sex [?]
## sex premarsx count
## <fctr> <fctr> <int>
## 1 Male Always Wrong 3349
## 2 Male Almst Always Wrg 1224
## 3 Male Sometimes Wrong 3105
## 4 Male Not Wrong At All 7122
## 5 Female Always Wrong 5895
## 6 Female Almst Always Wrg 1976
## 7 Female Sometimes Wrong 3939
## 8 Female Not Wrong At All 6938
The counts for males and females are fairly close in every category except “always wrong”, where women outnumber men by about 2,500 (5,895 versus 3,349).
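Since the question above is whether the gap comes from raw numbers or from proportion, one way to check is to convert the counts to within-gender proportions. The following is a minimal sketch, not part of the original analysis; the object names `counts` and `row_props` are illustrative, and the counts are copied by hand from the summary table above.

```r
# Observed counts copied from the summary table above
counts <- matrix(
  c(3349, 1224, 3105, 7122,
    5895, 1976, 3939, 6938),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex = c("Male", "Female"),
    premarsx = c("Always Wrong", "Almst Always Wrg",
                 "Sometimes Wrong", "Not Wrong At All")
  )
)

# margin = 1 divides each cell by its row total,
# so each gender's proportions sum to 1
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

Read row by row, this puts the two genders on the same scale regardless of sample size: roughly 23% of men versus 31% of women answer “always wrong”, which suggests the gap is not only a matter of there being more women in the sample.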
Another way to get a handle on this is with a mosaic plot, which shows the relative proportional sizes of the categories.
mosaicplot(~ premarsx + sex, data = psx, col = c("gold", "blue"))
We can see from the plot above that the “not wrong at all” category is larger than any other single category, and larger than two of the three possible pairings of the other categories. This would imply some degree of parity. This leads us to our need for a hypothesis test.
The hypothesis in question deals with the association between attitude and gender. Our null and alternative hypotheses are as follows:
H0: Gender and attitude about premarital sex are independent.
HA: Gender and attitude about premarital sex are dependent.
To answer this, we will need a contingency table and a chi-square test of independence.
obs_tbl<-table(psx$sex, psx$premarsx)
obs_tbl
##
## Always Wrong Almst Always Wrg Sometimes Wrong Not Wrong At All
## Male 3349 1224 3105 7122
## Female 5895 1976 3939 6938
##
## Other
## Male 0
## Female 0
# Re-enter the observed counts without the empty "Other" column
x <- matrix(c(3349, 5895, 1224, 1976, 3105, 3939, 7122, 6938), nrow = 2)
x
## [,1] [,2] [,3] [,4]
## [1,] 3349 1224 3105 7122
## [2,] 5895 1976 3939 6938
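Retyping the counts by hand works but is easy to get wrong. As a hedged alternative, the empty “Other” column could be dropped programmatically. The sketch below rebuilds a stand-in for `obs_tbl` from the printed counts so that it runs on its own; `x_named` is an illustrative name, not something from the original report.

```r
# Stand-in for table(psx$sex, psx$premarsx), rebuilt from the printed
# output above, including the empty "Other" column
obs_tbl <- as.table(matrix(
  c(3349, 1224, 3105, 7122, 0,
    5895, 1976, 3939, 6938, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Male", "Female"),
    c("Always Wrong", "Almst Always Wrg", "Sometimes Wrong",
      "Not Wrong At All", "Other")
  )
))

# Keep only the columns that contain at least one observation
x_named <- obs_tbl[, colSums(obs_tbl) > 0]
x_named
```

This keeps the row and column labels attached to the counts. With the real data, calling `droplevels(psx)` before `table()` should have a similar effect, since it removes unused factor levels.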
Now that we have a contingency table, we can run the chi-square test.
chisq.test(x)
##
## Pearson's Chi-squared test
##
## data: x
## X-squared = 521.71, df = 3, p-value < 2.2e-16
Our p-value is very small, so we reject the null hypothesis and conclude that the data provide convincing evidence, at the 5% significance level, that gender and attitude about premarital sex are associated.
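To make the test less of a black box, the statistic can be reproduced by hand. This is a sketch under the usual definition of Pearson's chi-square statistic (sum of squared differences between observed and expected counts, each divided by the expected count); the names `expected`, `chi_sq`, `df` and `p_value` are illustrative.

```r
# Observed counts, entered column by column as in the report
x <- matrix(c(3349, 5895, 1224, 1976, 3105, 3939, 7122, 6938), nrow = 2)

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(x), colSums(x)) / sum(x)

# Pearson chi-square statistic
chi_sq <- sum((x - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(x) - 1) * (ncol(x) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

c(statistic = chi_sq, df = df, p_value = p_value)
```

The statistic and degrees of freedom should agree with the `chisq.test(x)` output above, since no continuity correction is applied to tables larger than 2 x 2.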
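An outstanding question is what the expected values would look like. The table below shows them. Because there are more women in the sample, the expected counts for females are higher than those for males in every category. Comparing observed with expected counts, “sometimes wrong” is very close in both tables, while women are over-represented in “always wrong” and “almost always wrong” and under-represented in “not wrong at all”.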
chisq.test(x)$expected
## [,1] [,2] [,3] [,4]
## [1,] 4078.073 1411.709 3107.524 6202.695
## [2,] 5165.927 1788.291 3936.476 7857.305
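One hedged way to see which cells drive the association is to look at the Pearson residuals, (observed - expected) / sqrt(expected), which `chisq.test()` returns in its `residuals` component. This goes slightly beyond the original analysis; the dimension names and the object names `test` and `res` are added here for readability.

```r
# Observed counts with labels added for readability
x <- matrix(
  c(3349, 5895, 1224, 1976, 3105, 3939, 7122, 6938),
  nrow = 2,
  dimnames = list(
    c("Male", "Female"),
    c("Always Wrong", "Almst Always Wrg",
      "Sometimes Wrong", "Not Wrong At All")
  )
)

test <- chisq.test(x)

# Pearson residuals: positive means more observations than expected
# under independence, negative means fewer
res <- test$residuals
round(res, 2)
```

Cells with large absolute residuals contribute most to the statistic. Here that is the “Always Wrong” and “Not Wrong At All” columns, while “Sometimes Wrong” contributes almost nothing.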
This raises the question of what the nature of the association is and what may be causing it. Because the GSS is observational, such an examination would require an experiment to establish causality.