Setup

[Data link](https://d3c33hcgiwev3.cloudfront.net/_5db435f06000e694f6050a2d43fc7be3_gss.Rdata?Expires=1531612800&Signature=B5YCW4bDA1pQanV-ZrN5UgBZ1GKoU7OfS-e4CDerratjY~DLXRBy2j2nZ~nNE-p6EJcof5qVk95Xw8g4hgbPqRSr6UU~vNh-IvAvulM5cuMJ75i0wfwqpLY8Al-coHZXlJUwCGeQ162XcrZT4l7s3gcs3hFnEXT5urpyWeqEzoI_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A)

Load packages

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
library(statsr)
library(data.table)
library(janitor)
## Warning: package 'janitor' was built under R version 3.4.4

Load data

load("gss.Rdata")

Part 1: Data

The data appear to be randomly sampled. According to Wikipedia, “The target population of the GSS is adults (18+) living in households in the United States. The GSS sample is drawn using an area probability design that randomly selects respondents in households across the United States to take part in the survey”. It also says that respondents who become part of the GSS sample are from a mix of urban, suburban, and rural geographic areas, and that participation in the study is strictly voluntary. Voluntary participation weakens generalizability, because non-response can leave some parts of society under-represented. For example, an introverted person may be less willing to participate and share personal details.

Although respondents are randomly sampled, there is no random assignment, so causality cannot be established. The study is observational and can only show association; as the saying goes, “correlation does not imply causation”.

Reference: Wikipedia


Part 2: Research question

How is sex related to views on premarital sex? Is there convincing evidence that views on “sex before marriage” differ by sex?


Part 3: Exploratory data analysis

#Selecting required columns from the gss data
data1<-gss%>%select(sex,premarsx)
#removing NAs
data1<-data1[!is.na(data1$premarsx),]
#plotting data
ggplot(data1,aes(x=premarsx))+geom_bar(aes(fill=sex))+labs(x="Thoughts on sex before marriage",y="Number")

table(data1)
##         premarsx
## sex      Always Wrong Almst Always Wrg Sometimes Wrong Not Wrong At All
##   Male           3349             1224            3105             7122
##   Female         5895             1976            3939             6938
##         premarsx
## sex      Other
##   Male       0
##   Female     0

The graph shows some variation in the counts of each view between the sexes. We will now use inference to check whether there is convincing evidence of this difference.
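Because the two groups differ in size (14800 men and 18748 women), raw counts are hard to compare directly. The following is a minimal sketch, assuming the counts printed in the table above, that converts them to within-sex proportions; the names `observed` and `row_props` are introduced here for illustration and are not part of the original analysis.

```r
# Observed counts copied from the table above (the empty "Other" column is omitted).
observed <- matrix(
  c(3349, 1224, 3105, 7122,
    5895, 1976, 3939, 6938),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex = c("Male", "Female"),
    premarsx = c("Always Wrong", "Almst Always Wrg",
                 "Sometimes Wrong", "Not Wrong At All")
  )
)

# Share of each answer within each sex; every row sums to 1.
row_props <- prop.table(observed, margin = 1)
round(row_props, 3)
```

Comparing the two rows of `row_props` shows how the distribution of answers differs between men and women, independent of group size.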

Summary statistics

summary(data1)
##      sex                    premarsx    
##  Male  :14800   Always Wrong    : 9244  
##  Female:18748   Almst Always Wrg: 3200  
##                 Sometimes Wrong : 7044  
##                 Not Wrong At All:14060  
##                 Other           :    0

The summary statistics show five response levels, although “Other” has no responses, and the most common answer is “Not Wrong At All”.


Part 4: Inference

Here we want to know whether there is a relationship between sex and views on sex before marriage.

Conditions:

Independence: the sample is randomly selected, so the observations are independent; n is also less than 10% of the population, and each case contributes to only one cell in the table.

Sample size: each particular scenario (i.e. cell) must have at least 5 expected cases. The counts in the table are in the thousands, so every cell's expected count is well above 5.
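The sample-size condition is about expected counts rather than observed counts, so it can be checked directly. This is a minimal sketch, assuming the observed counts from the table in Part 3; `observed` and `expected` are illustrative names, and each expected count is computed as row total x column total / table total.

```r
# Observed counts from the Part 3 table (rows: Male, Female).
observed <- matrix(
  c(3349, 1224, 3105, 7122,
    5895, 1976, 3939, 6938),
  nrow = 2, byrow = TRUE
)

# Expected count under independence for every cell:
# (row total * column total) / table total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Smallest expected count and the condition check.
min(expected)
all(expected >= 5)
```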

Here we have two hypotheses:
H0 (null): there is no relationship between sex and views on sex before marriage.
Ha (alternative): there is a relationship between sex and views on sex before marriage.

Here we will use the chi-square test of independence, since we are comparing two categorical variables and one of them has more than two levels.

First, we make a table which contains the observed values.

Here we build a data frame from the table that also shows the totals of every row and column.

data4<-data.frame("sex"=c("Male","Female"),"Always Wrong"=c(3349,5895),"Almst Always Wrg"=c(1224,1976 ),"Sometimes Wrong "=c(3105,3939),"Not Wrong At All"=c(7122, 6938))
data4<-data4%>%mutate("Total"=rowSums(data4[,2:5]))
## Warning: package 'bindrcpp' was built under R version 3.4.4
data4<-data4%>%adorn_totals("row")
data4
##     sex Always.Wrong Almst.Always.Wrg Sometimes.Wrong. Not.Wrong.At.All
##    Male         3349             1224             3105             7122
##  Female         5895             1976             3939             6938
##   Total         9244             3200             7044            14060
##  Total
##  14800
##  18748
##  33548

To calculate the expected count for a cell, we take the ratio of its row total to the table total and multiply it by the cell's column total. The row ratios are Row1 (Male) = 14800/33548 and Row2 (Female) = 18748/33548.

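#Row ratios: row total divided by table total
Row1=14800/33548
Row2=18748/33548
#Expected count = row ratio x column total (column totals are in row 3 of data4)
data5<-data4[3,2:5]*Row1
data6<-data4[3,2:5]*Row2
data5<-rbind(data5,data6)
rownames(data5)<-c("Male","Female")
data5
##        Always.Wrong Almst.Always.Wrg Sometimes.Wrong. Not.Wrong.At.All
## Male       4078.073         1411.709         3107.524         6202.695
## Female     5165.927         1788.291         3936.476         7857.305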

Calculating the chi-square statistic and the degrees of freedom gives the p-value below.
We got a chi-square value of 103751105.7 (hand calculation) and degrees of freedom = (rows - 1) x (columns - 1) = (2 - 1) x (4 - 1) = 3 (hand calculation).

Reference: chi-square

pchisq(103751105.7,3,lower.tail = FALSE)
## [1] 0

Since the p-value is effectively 0, we reject the null hypothesis: the data provide convincing evidence of a relationship between sex and views on sex before marriage.
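As a cross-check on the hand calculation, the same test can be run on the observed counts with base R's `chisq.test()`. This is a minimal sketch, assuming the counts from the Part 3 table; `observed` and `test` are names introduced here for illustration.

```r
# Observed counts from the Part 3 table (the empty "Other" column is omitted).
observed <- matrix(
  c(3349, 1224, 3105, 7122,
    5895, 1976, 3939, 6938),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex = c("Male", "Female"),
    premarsx = c("Always Wrong", "Almst Always Wrg",
                 "Sometimes Wrong", "Not Wrong At All")
  )
)

# Pearson chi-square test of independence on the 2 x 4 table.
test <- chisq.test(observed)

test$statistic  # chi-square statistic
test$parameter  # degrees of freedom: (2 - 1) * (4 - 1)
test$p.value    # p-value
test$expected   # expected counts under independence
```

Using the built-in function avoids arithmetic slips in the hand calculation and also returns the expected counts, which can be compared against the sample-size condition.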