Learning objectives

  1. Construct contingency tables using categorical data and use these tables to calculate row and column proportions of the data
  2. Construct bar plots and segmented bar plots to identify differences in proportions between groups of objects

The Titanic case study

The data set provided on Google Classroom contains real data on passengers from the Titanic disaster. It covers 1,313 passengers and records each passenger’s name, sex, passenger class, and survival status. During this case study we will use a contingency table and a bar plot to help answer the question: Were the survival rates different for different passenger classes aboard the Titanic?

  1. Use the COUNTIFS() function in Google Sheets to construct a contingency table with the passenger class as the rows and the survival status as the columns. Then use the SUM() function to calculate the row sums and column sums.

  2. For each class, calculate the proportion of that class that survived the Titanic disaster.

  3. What proportion of survivors were designated 1st class? What proportion of survivors were designated 2nd class? What proportion of survivors were designated 3rd class?

  4. Use the contingency table you constructed to create a segmented bar plot that shows the proportion of each class that survived and did not survive.

  5. Based on the evidence you have gathered, does it seem that the survival rate differs between passenger classes? If so, form a hypothesis as to why this might be.
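
As a starting point for steps 1 to 3, here is one possible spreadsheet layout. It is only a sketch under assumptions that may not match your copy of the data: row 1 is a header and the 1,313 passengers sit in rows 2 to 1314, passenger class is in column C, survival status is in column D, the three class labels are typed in G2:G4, and the two survival labels are typed in H1:I1, all spelled exactly as they appear in the data. Adjust the ranges and labels to fit your sheet.

```
Cell H2 (count for the class in G2 with the status in H1; fill right to I2, then down to row 4):
=COUNTIFS($C$2:$C$1314, $G2, $D$2:$D$1314, H$1)

Cell J2 (row sum; fill down to J4):
=SUM(H2:I2)

Cell H5 (column sum; fill right to J5):
=SUM(H2:H4)

Cell K2 (row proportion: share of the class in G2 that has the status in H1; fill down to K4):
=H2/J2

Cell L2 (column proportion: share of the status in H1 that belongs to the class in G2; fill down to L4):
=H2/H$5
```

The $ signs hold a column or row fixed so that one formula can be filled across the whole table. Step 2 asks for row proportions and step 3 asks for column proportions: both use the same count but divide it by a different total.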

Extension: Titanic survival rates and sex

Our next task is to investigate whether survival rates differed between male and female passengers on the Titanic.

  1. Construct a contingency table that has passenger sex as the rows and survival status as the columns.

  2. For each sex, calculate the proportion of passengers of that sex who survived the Titanic disaster.

  3. Calculate the proportion of survivors who were male and the proportion of survivors who were female.

  4. Create a segmented bar plot that displays the proportion of each sex that survived the disaster.

  5. Based on the evidence we have gathered, does it seem that there is a large difference between the proportion of male passengers who survived and the proportion of female passengers who survived? If so, form a hypothesis as to why this might be.