Contingency tables - qualitative data

Introductory exercise

Do you believe in the afterlife? (Source: https://nationalpost.com/news/canada/millennials-do-you-believe-in-life-after-life) A survey was conducted, and a random sample of 1091 questionnaires is summarized in the following contingency table:

##         Believe
## Gender   Yes  No
##   Female 435 375
##   Male   147 134

Our task is to check whether there is a significant relationship between belief in the afterlife and gender. For a two-way 2x2 table we can do this with the chi-square statistic and a chosen association coefficient for qualitative variables.

##         Believe
## Gender         Yes        No
##   Female 0.3987168 0.3437214
##   Male   0.1347388 0.1228231
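
The two tables above can be reproduced with a few lines of base R. This is a minimal sketch that assumes the counts are typed in by hand; the object names `afterlife`, `joint` and `by_gender` are hypothetical and do not come from the original chunk.

```r
# Counts copied from the contingency table above
afterlife <- matrix(c(435, 375,
                      147, 134),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(Gender = c("Female", "Male"),
                                    Believe = c("Yes", "No")))
afterlife <- as.table(afterlife)

# Joint proportions: every cell divided by the grand total
joint <- prop.table(afterlife)
print(round(joint, 4))

# Row proportions: share of "Yes" and "No" within each gender
by_gender <- prop.table(afterlife, margin = 1)
print(round(by_gender, 4))
```

The row proportions are often easier to read than the joint ones, because they compare the share of believers among women directly with the share among men.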

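As you can see, the chi-square statistic can be computed quickly for two-way tables of size 2x2 or larger. We can then standardize it into an association coefficient to judge the strength of the relationship. The standardized value below (about 0.012) is close to zero, which indicates practically no association between gender and belief.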

## [1] 0.01218871
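
The value above agrees with the Phi coefficient, the square root of the chi-square statistic divided by the sample size. The sketch below assumes that the hidden chunk used the Pearson chi-square statistic without Yates' continuity correction; the object names are hypothetical.

```r
# Counts copied from the contingency table above
afterlife <- matrix(c(435, 375,
                      147, 134),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(Gender = c("Female", "Male"),
                                    Believe = c("Yes", "No")))

# Pearson chi-square test of independence, without continuity correction
chi_test <- chisq.test(afterlife, correct = FALSE)
chi2 <- unname(chi_test$statistic)
n <- sum(afterlife)

# Phi coefficient: chi-square standardized by the sample size
phi <- sqrt(chi2 / n)
print(phi)

# The p-value of the test answers the significance question
print(chi_test$p.value)
```

The coefficient describes the strength of the association, while significance is read from the p-value. With 1 degree of freedom, the statistic here lies far below the 5% critical value of 3.84, so the sample gives no evidence of a relationship between gender and belief.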

Laboratory - 21/04/2021. Bivariate analysis for the ‘Titanic’ data.

Let’s consider the titanic dataset, which contains a complete list of passengers and crew members on the RMS Titanic. It includes a variable indicating whether a person survived the sinking of the RMS Titanic on April 15, 1912. The data frame contains 2456 observations on 14 variables.

The website http://www.encyclopedia-titanica.org/ offers detailed information about passengers and crew members on the RMS Titanic. According to the website, 1317 passengers and 890 crew members were aboard.

Eight musicians and nine employees of the shipyard company are listed as passengers but travelled on free tickets, which is why they have NA values in fare. In addition, fare is truly missing for a few regular passengers.

In the following chunk, please find a few significant associations between nominal variables and present their distributions in a plot and in a contingency table.

How to visualize cross-tabulations? Please find some hints here and here.

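# Packages used below: dplyr for data wrangling, DescTools for association coefficients
library(dplyr)
library(DescTools)
# Keep only rows with a known Status (victim or survivor)
ship <- filter(titanic, Status != "")
head(ship)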
##                                Status  Disembarked.at    Home.Country Age
## COLERIDGE, Mr Reginald Charles Victim Not Disembarked         England  29
## STOKES, Mr Philip Joseph       Victim Not Disembarked         England  25
## REEVES, Mr David               Victim Not Disembarked         England  36
## PARKER, Mr Clifford Richard    Victim Not Disembarked Channel Islands  28
## MITCHELL, Mr Henry Michael     Victim Not Disembarked Channel Islands  71
## PAIN, Dr Alfred                Victim Not Disembarked          Canada  23
##                                Year.of.Birth Crew.or.Passenger. Gender
## COLERIDGE, Mr Reginald Charles          1883          Passenger   Male
## STOKES, Mr Philip Joseph                1887          Passenger   Male
## REEVES, Mr David                        1876          Passenger   Male
## PARKER, Mr Clifford Richard             1884          Passenger   Male
## MITCHELL, Mr Henry Michael              1841          Passenger   Male
## PAIN, Dr Alfred                         1888          Passenger   Male
##                                Class...Department    Embarked
## COLERIDGE, Mr Reginald Charles          2nd Class Southampton
## STOKES, Mr Philip Joseph                2nd Class Southampton
## REEVES, Mr David                        2nd Class Southampton
## PARKER, Mr Clifford Richard             2nd Class Southampton
## MITCHELL, Mr Henry Michael              2nd Class Southampton
## PAIN, Dr Alfred                         2nd Class Southampton
##                                                   Job            Job.details
## COLERIDGE, Mr Reginald Charles Advertising Consultant Advertising Consultant
## STOKES, Mr Philip Joseph                   Bricklayer             Bricklayer
## REEVES, Mr David                    Carpenter, Joiner     Carpenter / Joiner
## PARKER, Mr Clifford Richard                     Clerk                  Clerk
## MITCHELL, Mr Henry Michael              Coach Painter          Coach Painter
## PAIN, Dr Alfred                                Doctor                 Doctor
##                                Ticket.Number Fare.Price Fare_GBP Fare_today
## COLERIDGE, Mr Reginald Charles         14263    P10 10s     10.5    862.155
## STOKES, Mr Philip Joseph               13540    P10 10s     10.5    862.155
## REEVES, Mr David                       17248    P10 10s     10.5    862.155
## PARKER, Mr Clifford Richard            14888    P10 10s     10.5    862.155
## MITCHELL, Mr Henry Michael             24580    P10 10s     10.5    862.155
## PAIN, Dr Alfred                       244278    P10 10s     10.5    862.155
##                                                                                   Profile.on.Encyclopedia.Titanica
## COLERIDGE, Mr Reginald Charles http://www.encyclopedia-titanica.org/titanic-victim/reginald-charles-coleridge.html
## STOKES, Mr Philip Joseph             http://www.encyclopedia-titanica.org/titanic-victim/philip-joseph-stokes.html
## REEVES, Mr David                             http://www.encyclopedia-titanica.org/titanic-victim/david-reeves.html
## PARKER, Mr Clifford Richard       http://www.encyclopedia-titanica.org/titanic-victim/clifford-richard-parker.html
## MITCHELL, Mr Henry Michael         http://www.encyclopedia-titanica.org/titanic-victim/henry-michael-mitchell.html
## PAIN, Dr Alfred                               http://www.encyclopedia-titanica.org/titanic-victim/alfred-pain.html
# Count survivors and victims
test <- group_by(ship, Status)
test <- summarise(test, n())
test
## # A tibble: 2 x 2
##   Status   `n()`
## * <chr>    <int>
## 1 Survivor   711
## 2 Victim    1496
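# Count victims and survivors in each passenger class and crew department
d <- ship %>%
  group_by(Class...Department) %>%
  summarise(Dead = sum(Status == "Victim"), Alive = sum(Status == "Survivor"))
deaths <- as.data.frame(d)
# Short row labels; this assumes the groups are sorted in exactly this order
row.names(deaths) <- c('1st', '2nd', '3rd', 'Deck', 'Engineer', 'Restaurant', 'Victualling')
# Drop the grouping column so that only the two count columns remain
deaths <- deaths[, -1]
print(deaths)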
##             Dead Alive
## 1st          123   201
## 2nd          166   119
## 3rd          528   180
## Deck          23    43
## Engineer     253    71
## Restaurant    66     3
## Victualling  337    94
prop.table(deaths)
##                   Dead       Alive
## 1st         0.05573176 0.091073856
## 2nd         0.07521522 0.053919348
## 3rd         0.23923879 0.081558677
## Deck        0.01042139 0.019483462
## Engineer    0.11463525 0.032170367
## Restaurant  0.02990485 0.001359311
## Victualling 0.15269597 0.042591754
Phi(deaths)
## [1] 0.3387276
ContCoef(deaths)
## [1] 0.3208222
CramerV(deaths)
## [1] 0.3387276
TschuprowT(deaths)
## [1] 0.2164277
mosaicplot(deaths)
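
All four coefficients reported above are rescalings of the same chi-square statistic. The sketch below recomputes them in base R from the printed counts, using the standard textbook formulas; it is an illustration, not necessarily what the DescTools functions do internally. The names `cont_coef`, `cramer_v` and `tschuprow_t` are hypothetical.

```r
# Counts copied from the printed `deaths` table
deaths <- matrix(c(123, 201,
                   166, 119,
                   528, 180,
                    23,  43,
                   253,  71,
                    66,   3,
                   337,  94),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("1st", "2nd", "3rd", "Deck", "Engineer",
                                   "Restaurant", "Victualling"),
                                 c("Dead", "Alive")))

# Pearson chi-square statistic and table dimensions
chi2 <- unname(chisq.test(deaths, correct = FALSE)$statistic)
n <- sum(deaths)
r <- nrow(deaths)
k <- ncol(deaths)

phi         <- sqrt(chi2 / n)                               # Phi
cont_coef   <- sqrt(chi2 / (chi2 + n))                      # Pearson's contingency coefficient
cramer_v    <- sqrt(chi2 / (n * min(r - 1, k - 1)))         # Cramer's V
tschuprow_t <- sqrt(chi2 / (n * sqrt((r - 1) * (k - 1))))   # Tschuprow's T

print(round(c(Phi = phi, C = cont_coef, V = cramer_v, T = tschuprow_t), 4))
```

Phi and Cramer's V coincide here because the table has only two columns, so min(r - 1, k - 1) equals 1. For tables larger than 2x2, Cramer's V is usually the easier coefficient to interpret, since it is bounded by 1.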

Here, please interpret your findings.

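Almost the entire restaurant crew died (66 of 69). Among passengers, the higher the class, the higher the chance of surviving: 201 of 324 survived in 1st class, compared with 119 of 285 in 2nd class and 180 of 708 in 3rd class.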
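
The interpretation is easier to check against row proportions, which give the survival rate within each class or department. This is a minimal sketch that assumes the counts are copied by hand from the printed `deaths` table; `survival_rate` is a hypothetical name.

```r
# Counts copied from the printed `deaths` table
deaths <- matrix(c(123, 201,
                   166, 119,
                   528, 180,
                    23,  43,
                   253,  71,
                    66,   3,
                   337,  94),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("1st", "2nd", "3rd", "Deck", "Engineer",
                                   "Restaurant", "Victualling"),
                                 c("Dead", "Alive")))

# Share of survivors within each row (class or department)
survival_rate <- prop.table(deaths, margin = 1)[, "Alive"]

# Survival rates in percent, highest first
print(round(100 * sort(survival_rate, decreasing = TRUE), 1))
```

Unlike the joint proportions from `prop.table(deaths)`, these rates are not affected by the very different group sizes, so they can be compared directly across rows.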