Do you believe in the afterlife? (https://nationalpost.com/news/canada/millennials-do-you-believe-in-life-after-life) A survey was conducted, and the answers from a random sample of 1091 questionnaires are summarised in the following contingency table:
## Believe
## Gender Yes No
## Female 435 375
## Male 147 134
Our task is to check whether there is a significant relationship between belief in the afterlife and gender. For a two-way 2x2 table like this one, we can use the chi-square test of independence together with a correlation coefficient for qualitative variables.
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: dane
## X-squared = 0.11103, df = 1, p-value = 0.739
## Believe
## Gender Yes No
## Female 0.3987168 0.3437214
## Male 0.1347388 0.1228231
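The code behind the output above is not shown. A minimal sketch that should reproduce it is given below, assuming the table is stored as a matrix called `dane` (the name that appears in the test output); the way the matrix is built here is an assumption, not the original code.

```r
# Rebuild the 2x2 table from the counts printed above.
# `dane` is the object name shown in the test output; its construction is assumed.
dane <- matrix(
  c(435, 375,
    147, 134),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Believe = c("Yes", "No"))
)

# Pearson's chi-square test of independence. For 2x2 tables chisq.test()
# applies Yates' continuity correction by default (correct = TRUE).
test <- chisq.test(dane)
test

# Joint proportions: each cell divided by the grand total (1091).
prop.table(dane)
```

Passing `margin = 1` to `prop.table()` would instead give the share of "Yes" and "No" answers within each gender, which is often easier to read than the joint proportions.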
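As you can see, the chi-square statistic is quick to calculate for two-way tables of any size. Here the p-value of 0.739 is far above the usual 0.05 level, so there is no evidence of a relationship between gender and belief. We can also standardize the chi-square statistic into a coefficient that measures how strong the relationship is.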
##
## Attaching package: 'DescTools'
## The following objects are masked from 'package:psych':
##
## AUC, ICC, SD
## [1] 0.01218871
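The call that produced this coefficient is not shown. Since DescTools is attached, it was presumably a function such as `Phi()` or `CramerV()`, but that is an assumption. The sketch below computes the phi coefficient in base R as sqrt(X² / n), using the chi-square statistic without the continuity correction, and should come out near the value printed above.

```r
# Same 2x2 table as before (counts taken from the printed contingency table).
dane <- matrix(c(435, 375, 147, 134), nrow = 2, byrow = TRUE,
               dimnames = list(Gender = c("Female", "Male"),
                               Believe = c("Yes", "No")))

# Chi-square statistic without Yates' continuity correction.
x2 <- unname(chisq.test(dane, correct = FALSE)$statistic)
n <- sum(dane)

# Phi coefficient: sqrt(X^2 / n). For a 2x2 table it equals Cramer's V.
phi <- sqrt(x2 / n)
phi
```

A value this close to 0 points to a negligible association, in line with the non-significant test result.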
Let’s consider the titanic dataset, which contains a complete list of passengers and crew members on the RMS Titanic. It includes a variable indicating whether a person survived the sinking of the RMS Titanic on April 15, 1912. The data frame contains 2456 observations on 14 variables.
The website http://www.encyclopedia-titanica.org/ offers detailed information about passengers and crew members on the RMS Titanic. According to the website, 1317 passengers and 890 crew members were aboard.
8 musicians and 9 employees of the shipyard company are listed as passengers but travelled on a free ticket, which is why they have NA values in fare. In addition, fare is truly missing for a few regular passengers.
In the following chunk, please find a few significant associations between nominal variables, and present their distributions both in a plot and in the form of a contingency table.
How to visualize cross-tabulations? Please find some hints here and here.
# Survival status by gender (rows flagged 'Not Disembarked' only)
ND <- titanic[which(titanic$Disembarked.at == 'Not Disembarked'), ]
tb1 <- table(ND$Status, ND$Gender)
tb1
##
## Female Male
## Survivor 359 352
## Victim 130 1366
fourfoldplot(tb1)
chisq.test(tb1)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tb1
## X-squared = 485.87, df = 1, p-value < 2.2e-16
prop.table(tb1)
##
## Female Male
## Survivor 0.16266425 0.15949252
## Victim 0.05890349 0.61893974
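# Survival status by age group
labels <- c("< 15", "15 - 30", "30 - 45", "45 - 60", "60 - 75")
# Bins are (1,15], (15,30], ...; ages of 1 or below fall outside the first bin and become NA
breaks <- c(1, 15, 30, 45, 60, 75)
agetable <- cut(ND$Age, breaks = breaks, labels = labels, right = TRUE)
tb2 <- table(ND$Status, Age = agetable)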
tb2
## Age
## < 15 15 - 30 30 - 45 45 - 60 60 - 75
## Survivor 52 334 233 69 7
## Victim 60 774 504 118 31
tb2.df <- as.data.frame(tb2)
names(tb2.df) <- c("Status", "Age", "Frequency")
ggplot(tb2.df, aes(x=Status, y=Frequency, fill=Age)) + geom_col()
chisq.test(tb2)
##
## Pearson's Chi-squared test
##
## data: tb2
## X-squared = 17.823, df = 4, p-value = 0.001337
prop.table(tb2)
## Age
## < 15 15 - 30 30 - 45 45 - 60 60 - 75
## Survivor 0.023831347 0.153070577 0.106782768 0.031622365 0.003208066
## Victim 0.027497709 0.354720440 0.230980752 0.054078827 0.014207149
# Survival status by crew/passenger group
tb3 <- table(ND$Status, ND$Crew.or.Passenger.)
tb3
##
## Crew Passenger
## Survivor 211 500
## Victim 679 817
mosaicplot(tb3,shade=TRUE)
chisq.test(tb3)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tb3
## X-squared = 48.786, df = 1, p-value = 2.855e-12
prop.table(tb3)
##
## Crew Passenger
## Survivor 0.09560489 0.22655188
## Victim 0.30765745 0.37018577
# Mosaic plot of several variables at once
#install.packages("vcd")
library(vcd)
## Warning: package 'vcd' was built under R version 4.0.5
## Loading required package: grid
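# Note: `Titanic` (capital T) is R's built-in 4-way table (Class, Sex, Age, Survived), not the `titanic` data frame used above
mosaic(Titanic, shade = TRUE, legend = TRUE)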
Here, please interpret your findings.
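From plot 1 (the fourfold plot), about 73% of the women survived (359 of 489) compared with only about 20% of the men (352 of 1718), and men account for most of the victims (1366 of 1496). This suggests that women were given priority over men in the evacuation.
From plot 2, survival rates differ much less across age groups than between genders: roughly 30% to 37% for ages 15 to 60, with a higher rate for those under 15 (about 46%) and a lower one for those over 60 (about 18%). Gender therefore appears to have mattered more than age.
The chi-square tests support this: all three p-values are below 0.01, so independence is rejected in each case, but the statistic for gender (485.87) is far larger than the one for age (17.823).
From plot 3 and its test statistic, crew members fared worse than passengers: about 76% of the crew died (679 of 890) compared with about 62% of the passengers (817 of 1317).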
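The interpretation above can be checked with conditional survival rates and a standardized measure of association. The sketch below rebuilds the three tables from the counts printed earlier so that it runs without the `titanic` data frame. The helper `cramers_v()` and the object `survival_rates` are hypothetical names introduced here for illustration; Cramér's V is computed from the uncorrected chi-square statistic.

```r
# Contingency tables rebuilt from the counts printed above
# (rows: Status, columns: grouping variable).
status <- c("Survivor", "Victim")
tb1 <- matrix(c(359, 352, 130, 1366), nrow = 2, byrow = TRUE,
              dimnames = list(Status = status, Gender = c("Female", "Male")))
tb2 <- matrix(c(52, 334, 233, 69, 7,
                60, 774, 504, 118, 31), nrow = 2, byrow = TRUE,
              dimnames = list(Status = status,
                              Age = c("< 15", "15 - 30", "30 - 45",
                                      "45 - 60", "60 - 75")))
tb3 <- matrix(c(211, 500, 679, 817), nrow = 2, byrow = TRUE,
              dimnames = list(Status = status, Group = c("Crew", "Passenger")))
tables <- list(gender = tb1, age = tb2, crew = tb3)

# Column-wise proportions: share of survivors and victims within each group.
survival_rates <- lapply(tables, function(tb) round(prop.table(tb, margin = 2), 3))
survival_rates

# Hypothetical helper: Cramer's V = sqrt(X^2 / (n * (min(rows, cols) - 1))),
# based on the chi-square statistic without continuity correction.
cramers_v <- function(tb) {
  x2 <- unname(chisq.test(tb, correct = FALSE)$statistic)
  sqrt(x2 / (sum(tb) * (min(dim(tb)) - 1)))
}
v <- sapply(tables, cramers_v)
v
```

Unlike the joint proportions from `prop.table(tb)`, the column-wise version answers the question "what share of each group survived?" directly. Comparing the three values of V gives a rough ranking of how strongly each variable is associated with survival, with gender expected to come out well ahead of crew status and age.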