Do you believe in the Afterlife?
https://nationalpost.com/news/canada/millennials-do-you-believe-in-life-after-life
A survey was conducted, and the answers from a random sample of 1091 questionnaires are summarized in the following contingency table:
##         Believe
## Gender   Yes  No
##   Female 435 375
##   Male   147 134
Our task is to check whether there is a significant relationship between belief in the afterlife and gender. For a two-way 2x2 table like this one, we can use the chi-square test of independence together with a coefficient of association for qualitative variables.
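For reference, the quantities used in this section follow the standard textbook definitions (the notation below is ours, not the source's). With observed counts $O_{ij}$, row totals $R_i$, column totals $C_j$ and sample size $n$, the expected counts under independence, the Pearson statistic, its Yates-corrected version (which R's `chisq.test()` reports by default for 2x2 tables) and the phi coefficient are:

$$
E_{ij} = \frac{R_i C_j}{n}, \qquad
\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\chi^2_{\text{Yates}} = \sum_{i,j} \frac{\left(|O_{ij} - E_{ij}| - \tfrac{1}{2}\right)^2}{E_{ij}}, \qquad
\varphi = \sqrt{\frac{\chi^2}{n}}
$$

For the Female/Yes cell, for example, $E_{11} = 810 \cdot 582 / 1091 \approx 432.10$, against an observed count of 435, so the observed table is very close to what independence predicts.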
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: dane
## X-squared = 0.11103, df = 1, p-value = 0.739
##         Believe
## Gender         Yes        No
##   Female 0.3987168 0.3437214
##   Male   0.1347388 0.1228231
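As you can see, `chisq.test()` computes the chi-square statistic quickly for any two-way table, 2x2 or larger. Here the p-value of 0.739 is far above 0.05, so we cannot reject the hypothesis that belief in the afterlife is independent of gender. Because the chi-square statistic grows with the sample size, it does not tell us how strong a relationship is; for that we standardize it into a coefficient between 0 and 1. The value printed below is the phi coefficient (equal to Cramér's V for a 2x2 table), φ = √(χ²/n), computed from the statistic without the continuity correction. At about 0.012 it indicates practically no association.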
## [1] 0.01218871
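As a quick check, the sketch below rebuilds the survey table from the counts shown above (we assume `dane` holds exactly these counts, since its construction is not shown) and standardizes the uncorrected chi-square statistic by hand. It relies only on base R's `chisq.test()`, so it should reproduce both the Yates-corrected statistic and the coefficient printed above.

```r
# Assumed contents of `dane`: the afterlife survey counts shown above
dane <- matrix(c(435, 375,
                 147, 134),
               nrow = 2, byrow = TRUE,
               dimnames = list(Gender = c("Female", "Male"),
                               Believe = c("Yes", "No")))

n <- sum(dane)  # 1091 questionnaires

# Yates-corrected test: the default for 2x2 tables, as reported above
yates <- chisq.test(dane)

# Plain Pearson statistic, which is the one used to standardize into phi
pearson <- chisq.test(dane, correct = FALSE)

# phi = sqrt(chi^2 / n); for a 2x2 table this equals Cramer's V
phi <- sqrt(unname(pearson$statistic) / n)

unname(yates$statistic)
phi
```

Note that `correct = FALSE` matters here: standardizing the Yates-corrected statistic instead would give a smaller value than the one printed above.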
Let’s consider the titanic dataset, which contains a complete list of the passengers and crew members on the RMS Titanic. It includes a variable indicating whether each person survived the sinking of the RMS Titanic on April 15, 1912. The data frame contains 2456 observations on 14 variables.
The website http://www.encyclopedia-titanica.org/ offers detailed information about the passengers and crew members on the RMS Titanic. According to the website, 1317 passengers and 890 crew members were aboard.
Eight musicians and nine employees of the shipyard company are listed as passengers but travelled with a free ticket, which is why their fare is NA. In addition, the fare is truly missing for a few regular passengers.
In the following chunk, please find a few significant correlations between nominal variables and present their distributions both on a plot and as a contingency table.
How to visualize cross-tabulations? Please find some hints here and here.
library(ggplot2)
library(psych)
library(DescTools)  # provides Phi(), ContCoef(), CramerV() and TschuprowT() used below
# Count survivors and victims in each passenger class
first_class_survivors  <- sum(titanic$Status == "Survivor" & titanic$Class...Department == "1st Class")
second_class_survivors <- sum(titanic$Status == "Survivor" & titanic$Class...Department == "2nd Class")
third_class_survivors  <- sum(titanic$Status == "Survivor" & titanic$Class...Department == "3rd Class")
first_class_victims    <- sum(titanic$Status == "Victim" & titanic$Class...Department == "1st Class")
second_class_victims   <- sum(titanic$Status == "Victim" & titanic$Class...Department == "2nd Class")
third_class_victims    <- sum(titanic$Status == "Victim" & titanic$Class...Department == "3rd Class")
# R fills a matrix column by column: the first three values form the
# Survivor column, the last three the Victim column
x <- c(first_class_survivors, second_class_survivors, third_class_survivors,
       first_class_victims, second_class_victims, third_class_victims)
dim(x) <- c(3, 2)
dimnames(x) <- list(Class = c('1st Class', '2nd Class', '3rd Class'), Status = c('Survivor', 'Victim'))
Data <- as.table(x)
Data
##            Status
## Class       Survivor Victim
##   1st Class      201    123
##   2nd Class      119    166
##   3rd Class      180    528
df <- data.frame(Class=c('1st Class','2nd Class','3rd Class'), Survivors=c(first_class_survivors, second_class_survivors, third_class_survivors))
ggplot(df, aes(x=Class, y=Survivors)) +
geom_bar(stat = "identity")
chisq.test(Data)
##
## Pearson's Chi-squared test
##
## data: Data
## X-squared = 128.74, df = 2, p-value < 2.2e-16
prop.table(Data)
##            Status
## Class         Survivor     Victim
##   1st Class 0.15261959 0.09339408
##   2nd Class 0.09035687 0.12604404
##   3rd Class 0.13667426 0.40091116
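`prop.table(Data)` divides every cell by the grand total, which makes the classes hard to compare because they differ in size. A sketch of the row-wise alternative, with the counts printed above hard-coded (an assumption made only so the block runs without the `titanic` data frame):

```r
# Class x Status counts from the table above, hard-coded for illustration
Data <- as.table(matrix(c(201, 123,
                          119, 166,
                          180, 528),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(Class = c("1st Class", "2nd Class", "3rd Class"),
                                        Status = c("Survivor", "Victim"))))

# margin = 1 divides each cell by its row (class) total, so every row sums to 1
row_props <- prop.table(Data, margin = 1)
round(row_props, 3)
```

Read row by row, this gives the share of victims within each class directly, which is the comparison the interpretation at the end of this section is about.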
Phi(Data)
## [1] 0.3126497
ContCoef(Data)
## [1] 0.2984051
CramerV(Data)
## [1] 0.3126497
TschuprowT(Data)
## [1] 0.262906
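The four coefficients above are all rescalings of the same chi-square statistic. The sketch below recomputes them by hand from the class table using the standard textbook formulas; that DescTools implements exactly these default formulas is our assumption, supported by the fact that the values should match the output above.

```r
# Class x Status counts from the table above, hard-coded for illustration
Data <- as.table(matrix(c(201, 123,
                          119, 166,
                          180, 528),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(Class = c("1st Class", "2nd Class", "3rd Class"),
                                        Status = c("Survivor", "Victim"))))

chi2 <- unname(chisq.test(Data)$statistic)  # no continuity correction: the table is not 2x2
n <- sum(Data)                              # 1317 passengers
r <- nrow(Data)
k <- ncol(Data)

phi         <- sqrt(chi2 / n)                              # Phi
cont_coef   <- sqrt(chi2 / (chi2 + n))                     # Pearson's contingency coefficient C
cramer_v    <- sqrt(chi2 / (n * (min(r, k) - 1)))          # Cramer's V
tschuprow_t <- sqrt(chi2 / (n * sqrt((r - 1) * (k - 1))))  # Tschuprow's T

round(c(Phi = phi, C = cont_coef, V = cramer_v, T = tschuprow_t), 4)
```

For a 3x2 table min(r, k) - 1 = 1, which is why Phi and Cramér's V coincide here, while Tschuprow's T is smaller because it divides by √2.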
mosaicplot(Data)
barplot(Data)
# Count survivors and victims among crew members and passengers
crew_survivors      <- sum(titanic$Status == "Survivor" & titanic$Crew.or.Passenger. == "Crew")
passenger_survivors <- sum(titanic$Status == "Survivor" & titanic$Crew.or.Passenger. == "Passenger")
crew_victims        <- sum(titanic$Status == "Victim" & titanic$Crew.or.Passenger. == "Crew")
passenger_victims   <- sum(titanic$Status == "Victim" & titanic$Crew.or.Passenger. == "Passenger")
# R fills a matrix column by column, so list the Crew column
# (Survivor, Victim) first and then the Passenger column
x <- c(crew_survivors, crew_victims, passenger_survivors, passenger_victims)
dim(x) <- c(2, 2)
dimnames(x) <- list(Status = c('Survivor', 'Victim'), Crew.or.Passenger. = c('Crew', 'Passenger'))
Data <- as.table(x)
chisq.test(Data)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: Data
## X-squared = 48.786, df = 1, p-value = 2.855e-12
prop.table(Data)
##           Crew.or.Passenger.
## Status           Crew  Passenger
##   Survivor 0.09560489 0.22655188
##   Victim   0.30765745 0.37018577
Phi(Data)
## [1] 0.1496655
ContCoef(Data)
## [1] 0.1480169
CramerV(Data)
## [1] 0.1496655
TschuprowT(Data)
## [1] 0.1496655
mosaicplot(Data)
barplot(Data)
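Filling a matrix through `dim()` is easy to get wrong, because R fills it column by column. A sketch that builds the crew/passenger table with `matrix(..., byrow = TRUE)` instead, so the values can be typed in the same order as the printed table. The counts are hard-coded assumptions, back-calculated from the proportions above with n = 890 + 1317 = 2207; they agree with the 890 crew members and 1317 passengers quoted from encyclopedia-titanica.org and with the 500 survivors and 817 victims in the class table.

```r
# Assumed counts, back-calculated from the printed proportions (n = 2207)
crew_survivors      <- 211
crew_victims        <- 679
passenger_survivors <- 500
passenger_victims   <- 817

# byrow = TRUE fills row by row, so the vector reads like the printed
# table: the Survivor row first, then the Victim row
Data <- as.table(matrix(c(crew_survivors, passenger_survivors,
                          crew_victims,   passenger_victims),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(Status = c("Survivor", "Victim"),
                                        Crew.or.Passenger. = c("Crew", "Passenger"))))

# Column totals: 890 crew members and 1317 passengers
colSums(Data)

# Share of each group that survived (column-wise proportions)
survival_rate <- prop.table(Data, margin = 2)["Survivor", ]
round(survival_rate, 3)

# Signed phi for a 2x2 table: (n11 * n22 - n12 * n21) / sqrt(product of all margins).
# A negative value means Survivor goes with Passenger, i.e. crew members survived less often
phi_signed <- (Data[1, 1] * Data[2, 2] - Data[1, 2] * Data[2, 1]) /
  sqrt(prod(rowSums(Data)) * prod(colSums(Data)))
phi_signed
```

The absolute value of `phi_signed` matches the `Phi(Data)` output above (about 0.15). That output is positive, so it does not show the direction of the relationship; the sign of the 2x2 formula does.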
Here, please interpret your findings.
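The share of victims increases with the class number: 38% of first-class passengers died (123 of 324), 58% of second-class passengers (166 of 285) and 75% of third-class passengers (528 of 708). The relationship between class and survival is highly significant (X-squared = 128.74, df = 2, p < 2.2e-16) and of moderate strength (Cramér's V ≈ 0.31).
Crew members were less likely to survive than passengers: 211 of the 890 crew members survived (about 24%), compared with 500 of the 1317 passengers (about 38%). This relationship is statistically significant (p = 2.855e-12), but it is weak (phi ≈ 0.15) rather than negligible.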