Do you believe in the afterlife? (Source: https://nationalpost.com/news/canada/millennials-do-you-believe-in-life-after-life) A survey was conducted, and the results from a random sample of 1091 questionnaires are summarized in the following contingency table:
## Believe
## Gender Yes No
## Female 435 375
## Male 147 134
Our task is to check whether there is a significant relationship between belief in the afterlife and gender. For a two-way 2x2 table, we can do this with a chi-square test of independence and then measure the strength of the association with a correlation coefficient for qualitative variables.
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: dane
## X-squared = 0.11103, df = 1, p-value = 0.739
## Believe
## Gender Yes No
## Female 0.3987168 0.3437214
## Male 0.1347388 0.1228231
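The code that produced the table and test output above is not shown. A minimal sketch that would reproduce it, assuming the counts are entered by hand from the printed table and stored under the name `dane` (the name that appears in the test output):

```r
# Counts copied by hand from the printed contingency table (an assumption;
# the original report may have built the table differently)
dane <- as.table(matrix(c(435, 147, 375, 134), nrow = 2))
dimnames(dane) <- list(Gender = c("Female", "Male"), Believe = c("Yes", "No"))

# Chi-square test of independence; for 2x2 tables R applies
# Yates' continuity correction by default
test <- chisq.test(dane)
test

# Expected counts under independence (all should be at least 5
# for the chi-square approximation to be reliable)
test$expected

# Cell proportions relative to the grand total
prop.table(dane)
```

The expected counts are worth a glance before trusting the p-value, since the chi-square approximation becomes unreliable when they are small.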
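As you can see, the chi-square statistic is quick to calculate for 2x2 or larger two-way tables. Here the p-value (0.739) is far above 0.05, so we cannot reject the hypothesis that belief and gender are independent. Significance comes from the test itself; standardizing the chi-square statistic tells us something different, namely how strong the association is on a scale from 0 to 1.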
##
## Attaching package: 'DescTools'
## The following objects are masked from 'package:psych':
##
## AUC, ICC, SD
## [1] 0.01218871
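The value above is consistent with the Phi coefficient, which standardizes the uncorrected chi-square statistic by the sample size: phi = sqrt(chi-square / n). A base R sketch of that calculation, assuming the counts from the afterlife table are re-entered by hand (the variable names are illustrative):

```r
# Afterlife counts, re-entered by hand from the printed table (an assumption)
tab <- matrix(c(435, 147, 375, 134), nrow = 2,
              dimnames = list(Gender = c("Female", "Male"),
                              Believe = c("Yes", "No")))

# Uncorrected chi-square statistic (no Yates' continuity correction)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
n <- sum(tab)

# Phi coefficient: chi-square standardized by the sample size
phi <- sqrt(chi2 / n)
phi
```

A value this close to 0 indicates practically no association, which agrees with the large p-value of the test.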
Let’s consider the titanic dataset, which contains a complete list of passengers and crew members on the RMS Titanic. It includes a variable indicating whether a person survived the sinking of the RMS Titanic on April 15, 1912. The data frame contains 2456 observations on 14 variables.
The website http://www.encyclopedia-titanica.org/ offers detailed information about passengers and crew members on the RMS Titanic. According to the website, 1317 passengers and 890 crew members were aboard.
8 musicians and 9 employees of the shipyard company are listed as passengers but travelled with a free ticket, which is why they have NA values in fare. In addition to that, fare is truly missing for a few regular passengers.
In the following chunk, please find a few significant correlations between nominal variables, and present their distributions in a plot and in the form of a contingency table.
How to visualize cross-tabulations? Please find some hints here and here.
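As one possible starting point, here is a minimal sketch using only base R graphics and the afterlife counts from the first table (re-entered by hand, so the object name `tab` and the plot titles are assumptions):

```r
# Afterlife counts, re-entered by hand from the first contingency table
tab <- as.table(matrix(c(435, 147, 375, 134), nrow = 2,
                       dimnames = list(Gender = c("Female", "Male"),
                                       Believe = c("Yes", "No"))))

# Mosaic plot: tile areas are proportional to the cell counts
mosaicplot(tab, main = "Belief in the afterlife by gender", color = TRUE)

# Grouped bar plot: one group of bars per gender, one bar per answer
mids <- barplot(t(tab), beside = TRUE, legend.text = TRUE,
                xlab = "Gender", ylab = "Count")
```

The table is transposed for `barplot()` because it draws one group per column; with `t(tab)` the groups are the genders and the bars within a group are the answers.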
This is the data for the association between gender and surviving the sinking of the Titanic:
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(psych)
# Count every Gender x Status combination (columns are referenced via titanic$, so attach() is not needed)
male_survivors <- sum(titanic$Status == "Survivor" & titanic$Gender == "Male")
female_survivors <- sum(titanic$Status == "Survivor" & titanic$Gender == "Female")
male_victims <- sum(titanic$Status == "Victim" & titanic$Gender == "Male")
female_victims <- sum(titanic$Status == "Victim" & titanic$Gender == "Female")
# Fill a 2x2 table column by column: survivors first, then victims
x <- c(female_survivors, male_survivors, female_victims, male_victims)
dim(x) <- c(2, 2)
dane <- as.table(x)
dimnames(dane) <- list(Gender = c("Female", "Male"), Status = c("Survivor", "Victim"))
dane
## Status
## Gender Survivor Victim
## Female 359 130
## Male 352 1366
chisq.test(dane)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: dane
## X-squared = 485.87, df = 1, p-value < 2.2e-16
prop.table(dane)
## Status
## Gender Survivor Victim
## Female 0.16266425 0.05890349
## Male 0.15949252 0.61893974
Phi(dane)
## [1] 0.4703662
ContCoef(dane)
## [1] 0.4256325
CramerV(dane)
## [1] 0.4703662
TschuprowT(dane)
## [1] 0.4703662
mosaicplot(dane)
barplot(dane)
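The four coefficients above are all functions of the uncorrected chi-square statistic, which is why Phi, Cramér's V and Tschuprow's T coincide for a 2x2 table. A base R sketch of the textbook formulas, assuming the counts are re-entered from the printed gender table (variable names are illustrative, and this is not necessarily how DescTools computes the values internally):

```r
# Gender x Status counts, re-entered by hand from the printed table
tab <- matrix(c(359, 352, 130, 1366), nrow = 2,
              dimnames = list(Gender = c("Female", "Male"),
                              Status = c("Survivor", "Victim")))

# Uncorrected chi-square statistic and table dimensions
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
n <- sum(tab)
r <- nrow(tab)
k <- ncol(tab)

# Textbook definitions of the four association measures
phi       <- sqrt(chi2 / n)
cont_coef <- sqrt(chi2 / (chi2 + n))
cramer_v  <- sqrt(chi2 / (n * min(r - 1, k - 1)))
tschuprow <- sqrt(chi2 / (n * sqrt((r - 1) * (k - 1))))

round(c(Phi = phi, ContCoef = cont_coef,
        CramerV = cramer_v, TschuprowT = tschuprow), 4)
```

For a 2x2 table both `min(r - 1, k - 1)` and `sqrt((r - 1) * (k - 1))` equal 1, so V and T reduce to Phi. The contingency coefficient is always smaller because of the extra chi-square term in its denominator.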
Here is the data for the association between crew or passenger status and surviving the sinking of the Titanic:
# Count every Crew.or.Passenger. x Status combination
crew_survivors <- sum(titanic$Status == "Survivor" & titanic$Crew.or.Passenger. == "Crew")
passenger_survivors <- sum(titanic$Status == "Survivor" & titanic$Crew.or.Passenger. == "Passenger")
crew_victims <- sum(titanic$Status == "Victim" & titanic$Crew.or.Passenger. == "Crew")
passenger_victims <- sum(titanic$Status == "Victim" & titanic$Crew.or.Passenger. == "Passenger")
# Fill a 2x2 table column by column: survivors first, then victims
x <- c(crew_survivors, passenger_survivors, crew_victims, passenger_victims)
dim(x) <- c(2, 2)
dane <- as.table(x)
# Rows follow the fill order (Crew, Passenger); columns are Survivor, Victim
dimnames(dane) <- list(Crew.or.Passenger. = c("Crew", "Passenger"), Status = c("Survivor", "Victim"))
dane
## Status
## Crew.or.Passenger. Survivor Victim
## Crew 211 679
## Passenger 500 817
chisq.test(dane)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: dane
## X-squared = 48.786, df = 1, p-value = 2.855e-12
prop.table(dane)
## Status
## Crew.or.Passenger. Survivor Victim
## Crew 0.09560489 0.30765745
## Passenger 0.22655188 0.37018577
Phi(dane)
## [1] 0.1496655
ContCoef(dane)
## [1] 0.1480169
CramerV(dane)
## [1] 0.1496655
TschuprowT(dane)
## [1] 0.1496655
mosaicplot(dane)
barplot(dane)
Here, please interpret your findings.
According to the correlation coefficients, there is a moderate association between gender and surviving the Titanic (Phi = 0.47). The association between being a crew member or a passenger and surviving is statistically significant (p = 2.855e-12) but weak (Phi = 0.15). The data is dominated by men: the ratio of males to females is about 78:22. The rule “women and children first” appears to have been followed, at least for women, since the ratio of survivors to victims is much higher among women (359:130) than among men (352:1366).
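The gender comparisons in this interpretation can be checked with row-wise proportions. A minimal sketch, assuming the counts are re-entered from the gender table printed earlier (the object names are illustrative):

```r
# Gender x Status counts, re-entered by hand from the printed table
gender <- matrix(c(359, 352, 130, 1366), nrow = 2,
                 dimnames = list(Gender = c("Female", "Male"),
                                 Status = c("Survivor", "Victim")))

# Share of each gender among all people in the table
shares <- prop.table(rowSums(gender))
round(shares, 3)

# Survival rate within each gender (each row sums to 1)
rates <- prop.table(gender, margin = 1)
round(rates, 3)
```

Under these counts roughly 73% of the women survived, compared with about 20% of the men, which is the pattern the coefficients summarize in a single number.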