Introduction

The sinking of the RMS Titanic is one of the most significant maritime disasters in history, and its passenger records form a dataset well suited to sociological and statistical analysis. Rather than presenting a simple casualty tally, this report looks for the patterns behind the tragedy. It treats socio-economic status, demographic profile, and family dynamics as indicators of the probability of survival and examines how these factors intersect. Five visualizations trace a path from basic passenger demographics to the intersection of financial status and social capital, illustrating how these two variables acted as gatekeepers to survival on the Titanic.

Data Summary and Key Findings

To gain an initial understanding of the dataset, we compute summary statistics for the numeric variables: Age, Fare, the two family-related counts, and Ticket_Class. These statistics describe the central tendency and spread of the passenger demographics that the later sections relate to survival outcomes.

# Defining numeric columns
numeric_summary_cols = ['Age', 'Fare', 'Siblings_Spouses_Aboard', 'Parents_Children_Aboard', 'Ticket_Class']

# Creating Summary for Min, Mean, Median, and Max
numeric_summary = df[numeric_summary_cols].agg(['min', 'mean', 'median', 'max']).transpose()

# Formatting and Rounding
numeric_summary.columns = ['Min', 'Mean', 'Median', 'Max']
numeric_summary['Mean'] = numeric_summary['Mean'].round(2)

numeric_summary
##                           Min   Mean  Median       Max
## Age                      0.17  30.18    27.0   76.0000
## Fare                     0.00  40.98    16.0  512.3292
## Siblings_Spouses_Aboard  0.00   0.48     0.0    8.0000
## Parents_Children_Aboard  0.00   0.40     0.0    6.0000
## Ticket_Class             1.00   2.14     2.0    3.0000

Fare

The mean fare (£40.98) is far higher than the median fare (£16.00), which indicates a right-skewed distribution. A small group of very wealthy passengers, represented by the £512.33 maximum, pulls the mean up, while the typical passenger paid much less.
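The mean-versus-median comparison can be backed by a direct skewness measure. The following is a minimal sketch on a small invented fare list (the `fares` values are hypothetical and are not the report's data); it uses pandas' `Series.skew`, where a positive result indicates a longer right tail.

```python
import pandas as pd

# Hypothetical toy fares (NOT the report's data): many cheap tickets, one very expensive one.
fares = pd.Series([7.25, 7.90, 8.05, 13.00, 16.00, 26.00, 30.00, 512.33], name="Fare")

mean_fare = fares.mean()
median_fare = fares.median()
skewness = fares.skew()  # sample skewness; positive means a longer right tail

print(f"mean={mean_fare:.2f}, median={median_fare:.2f}, skew={skewness:.2f}")
```

On the real dataframe the same three calls could be applied to `df['Fare']`, assuming the column contains no non-numeric entries.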

Travel Group

Even though the Titanic is often remembered for family tragedies, the zero medians for both Siblings_Spouses_Aboard and Parents_Children_Aboard show that at least half of the passengers had no sibling or spouse aboard, and at least half had no parent or child aboard.
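Two zero medians do not, on their own, give the share of passengers who had no relatives of either kind aboard, so that share is worth computing directly. The sketch below uses a small invented table (the `toy` frame is hypothetical; only the column names come from the report) in which both medians are zero yet fewer than half of the rows are fully solo.

```python
import pandas as pd

# Hypothetical toy manifest (NOT the report's data); column names follow the report.
toy = pd.DataFrame({
    "Siblings_Spouses_Aboard": [0, 0, 0, 1, 1, 0, 0, 0, 0, 2],
    "Parents_Children_Aboard": [0, 0, 0, 0, 0, 1, 1, 2, 0, 0],
})

# Median of each family column taken separately
medians = toy.median()

# Share of passengers with no relatives of either kind aboard
family_size = toy["Siblings_Spouses_Aboard"] + toy["Parents_Children_Aboard"]
solo_share = (family_size == 0).mean()

print(medians)
print(f"Share travelling without any family: {solo_share:.0%}")
```

Applying the last two statements to the report's `df` would give the actual solo share, which is the figure the claim about solo travel rests on.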

Ticket Class

The mean Ticket Class is 2.14 and the median is 2.0, so the typical passenger held a 2nd or 3rd Class ticket. The ship was not just a luxury liner for the 1st Class; for the most part it was transport for 2nd and 3rd Class passengers.

Plots

Graphs and plots provide a clearer view of the Titanic’s passenger data, highlighting trends in socio-economic distribution, age demographics, family structures, and survival outcomes. These visualizations help identify patterns and compare the distinct passenger experiences across different variables and columns.

Average Demographic per Ticket Class

The Titanic’s passenger demographics varied dramatically with the socio-economic status associated with each ticket class. This dual-axis chart displays the mean fare and mean age for 1st, 2nd, and 3rd Class. It serves as a visual guide to the rest of the report by showing how the ship’s hierarchy was ordered by both age and wealth.

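import matplotlib.pyplot as plt
import seaborn as sns

# Grouping by Class and Calculating the Mean for Age and Fare
class_stats = df.groupby('Ticket_Class')[['Fare', 'Age']].mean().reset_index()

# Setting the theme before the figure is created so it applies to this plot
sns.set_theme(style="white")

# Creating the First Axis
fig, ax1 = plt.subplots(figsize=(12, 7))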

# Creating the Bar Chart for Fare
color_fare = 'steelblue'
sns.barplot(data=class_stats, x='Ticket_Class', y='Fare', ax=ax1, color=color_fare)
ax1.set_ylabel('Average Fare Paid (£)', fontsize=12, color=color_fare)
ax1.set_xlabel('Ticket Class', fontsize=12)
ax1.tick_params(axis='y', labelcolor=color_fare)

# Creating the second axis
ax2 = ax1.twinx()
color_age = 'darkred'

# Creating a Line for Age Visibility
sns.lineplot(data=class_stats, x=range(len(class_stats)), y='Age', ax=ax2, 
             color=color_age, marker='o', linewidth=3, markersize=10)
ax2.set_ylabel('Average Age (Years)', fontsize=12, color=color_age)
ax2.tick_params(axis='y', labelcolor=color_age)

# Adding Titles and Ticks
plt.title('Average Fare and Age per Ticket Class', fontsize=16)
ax1.set_xticks([0, 1, 2])
ax1.set_xticklabels(['1st', '2nd', '3rd'])

plt.show()

This chart shows a high degree of economic disconnect between the ticket classes. The 1st Class was the most expensive and was, on average, held by the oldest demographic, with an average age of around 40. By contrast, 3rd Class passengers paid a fraction of the price and were much younger. The key takeaway from this chart is the clear demographic structure across the ship’s hierarchy, establishing a baseline socio-economic divide that would define the passengers’ experience during the voyage.

Survival Correlation

Statistical correlations can vary dramatically depending on which demographic variables are prioritized. This heatmap shows the pairwise correlations between survival, ticket class, age, fare, number of siblings/spouses, and number of parents/children across the passenger manifest. It gives the report a quantitative map of relationships between variables that are not easily illustrated in standard bar charts.

import matplotlib.pyplot as plt
import seaborn as sns

# Creating numeric survival variable for calculation
df['Survived_Numeric'] = df['Survived_Label'].map({'Yes': 1, 'No': 0})

# Shortening Names for Axis Readability
short_names = {
    'Survived_Numeric': 'Survived',
    'Ticket_Class': 'Class',
    'Siblings_Spouses_Aboard': 'Sib/Sp',
    'Parents_Children_Aboard': 'Par/Chi',
    'Fare': 'Fare',
    'Age': 'Age'
}

# Calculating the correlation between columns
# We use .rename() here so the names in the heatmap are short
corr_columns = ['Survived_Numeric', 'Ticket_Class', 'Age', 'Siblings_Spouses_Aboard', 'Parents_Children_Aboard', 'Fare']
corr_data = df[corr_columns].rename(columns=short_names).corr()

plt.figure(figsize=(10, 8))

# Creating the heatmap
sns.heatmap(corr_data, annot=True, fmt=".2f", cmap='RdBu_r', center=0, linewidths=0.5,
            cbar_kws={"label": "Correlation Strength (-1 to +1)"})

# Adding Titles
plt.title('Titanic Survival Correlation', fontsize=16)
plt.xticks(rotation=45, ha='right', fontsize=12) 
plt.yticks(rotation=0, fontsize=12)
plt.show()

This heatmap reveals a multi-layered statistical divide beyond just survival outcomes. While age has a roughly neutral correlation with survival, it shows a strong positive correlation with fare and a negative correlation with ticket class. Because 1 denotes the highest class, the negative sign means that older passengers tended to hold better tickets, which reconfirms that the ship’s hierarchy was divided by both age and wealth. The correlation between the family variables (Sib/Sp and Par/Chi) also suggests that passengers who did not travel alone tended to travel with their wider family. That said, the presence of parents or children was associated with a larger survival advantage than the presence of siblings or spouses. Ultimately, these correlations suggest that the Titanic’s survival “formula” was heavily weighted toward financial status and family structure, with age tied closely to both. These variables created a web of privilege that served as the primary gatekeeper to survival.
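Because Ticket_Class is an ordinal code rather than a measured quantity, a rank-based coefficient can serve as a cross-check on the Pearson values shown in the heatmap. The sketch below is only an illustration on a small invented table (the `toy` frame and its values are assumptions, not the report's data); it compares Pearson and Spearman correlations with survival using pandas' `DataFrame.corr`.

```python
import pandas as pd

# Hypothetical toy data (NOT the report's data); short column names mirror the heatmap labels.
toy = pd.DataFrame({
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
    "Class":    [1, 1, 2, 2, 1, 3, 3, 3],
    "Fare":     [80.0, 60.0, 26.0, 13.0, 52.0, 8.0, 7.9, 7.2],
})

# Correlation of every variable with survival, under two different coefficients
pearson = toy.corr(method="pearson")["Survived"].drop("Survived")
spearman = toy.corr(method="spearman")["Survived"].drop("Survived")

comparison = pd.DataFrame({"pearson": pearson, "spearman": spearman})
print(comparison.round(2))
```

If the two columns agree in sign and rough magnitude on the real data, the heatmap's reading of class and fare does not hinge on treating class as an interval scale.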

Solo vs. Family Travel Survival Rates

On the Titanic, social interactions and support systems differed significantly based on passengers traveling solo versus together with family. A nested pie chart illustrates the distribution of survival outcomes across solo travelers and family units. It acts as an important demographic layer to the report, emphasizing the human element of the tragedy and how individual relationships helped make it possible for a passenger to get to safety.

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Creating Travel_Group variable
df['Travel_Group'] = (df['Siblings_Spouses_Aboard'] + df['Parents_Children_Aboard'] > 0).map({
    True: 'With Family', 
    False: 'Solo Traveler'
})

# Preparing data for rings (sort_index keeps the outer ring in the same group order as the inner ring)
outer_data = df['Travel_Group'].value_counts().sort_index()
inner_data = df.groupby(['Travel_Group', 'Survived_Label']).size()

# Creating the Plot
fig, ax = plt.subplots(figsize=(10, 8)) # Slightly larger for better readability
size = 0.3

outer_colors = ['#3498db', '#95a5a6'] 
inner_colors = ['#e74c3c', '#2ecc71', '#e74c3c', '#2ecc71'] 

# Adding Outer Ring Labels
ax.pie(outer_data, radius=1, labels=outer_data.index, 
       autopct='%1.1f%%', pctdistance=0.85, 
       colors=outer_colors, 
       textprops={'fontsize': 12, 'weight': 'bold'},
       wedgeprops=dict(width=size, edgecolor='w'))
# Adding Inner Ring Values
ax.pie(inner_data, radius=1-size, 
       autopct='%1.1f%%', 
       pctdistance=0.7,  
       colors=inner_colors,
       textprops={'fontsize': 10, 'color': 'white', 'weight': 'bold'}, 
       wedgeprops=dict(width=size, edgecolor='w'))
# Creating Key
legend_elements = [
    Line2D([0], [0], marker='o', color='w', label='Survived',
           markerfacecolor='#2ecc71', markersize=10),
    Line2D([0], [0], marker='o', color='w', label='Perished',
           markerfacecolor='#e74c3c', markersize=10)
]

ax.legend(handles=legend_elements, loc="center left", 
          bbox_to_anchor=(1, 0, 0.5, 1), title="Legend")

# Adding Title
plt.title('Solo vs. Family Travel Survival Rates', fontsize=16)
plt.tight_layout()
plt.show()

This chart reveals a sharp survival divide between social groups. Although Solo Travelers made up the majority of the passenger manifest, they had a significantly lower survival rate. By contrast, those traveling with at least one family member show a much larger share of green “Survived” outcomes. The key takeaway is the social capital advantage: traveling in a family unit acted as a protective buffer, likely providing the coordination and support needed to navigate the chaos of the evacuation.
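The inner-ring percentages in the pie chart are shares of all passengers, so the survival rate within each travel group has to be computed separately. A minimal sketch, assuming the same column names as the report but using an invented table (the `toy` frame is hypothetical), is:

```python
import pandas as pd

# Hypothetical toy data (NOT the report's data); column names follow the report.
toy = pd.DataFrame({
    "Siblings_Spouses_Aboard": [0, 0, 0, 0, 1, 1, 0, 2],
    "Parents_Children_Aboard": [0, 0, 0, 0, 0, 1, 2, 0],
    "Survived_Label": ["No", "No", "No", "Yes", "Yes", "Yes", "No", "Yes"],
})

# Same grouping rule as the report: any relative aboard counts as "With Family"
has_family = (toy["Siblings_Spouses_Aboard"] + toy["Parents_Children_Aboard"]) > 0
toy["Travel_Group"] = has_family.map({True: "With Family", False: "Solo Traveler"})
toy["Survived_Numeric"] = toy["Survived_Label"].map({"Yes": 1, "No": 0})

# The mean of a 0/1 column within each group is that group's survival rate
survival_rate = toy.groupby("Travel_Group")["Survived_Numeric"].mean()
print(survival_rate)
```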

Survival by Port

Passenger profiles on the Titanic differed considerably depending on the port of embarkation and the socio-economic factors connected to that region. A dual-axis plot shows survival counts and average ticket class for Cherbourg, Queenstown, and Southampton. It adds a geographic benchmark to the report’s narrative, showing how a passenger’s point of embarkation acted as a filter for both social status and overall safety.

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Preparing the data
port_survival = df.groupby(['Embarked', 'Survived_Label']).size().unstack().fillna(0)

# Grouping for Trend Line
port_stats = df.groupby('Embarked')['Ticket_Class'].mean().reset_index()

#  Creating the plot
fig, ax1 = plt.subplots(figsize=(12, 8))

# Creating the Primary Y-Axis
port_survival.plot(kind='bar', stacked=True, ax=ax1, color=['#e74c3c', '#2ecc71'])

# Adding labels to the bars
for container in ax1.containers:
    ax1.bar_label(container, label_type='center', color='white', weight='bold', fontsize=12)

ax1.set_ylabel('Number of Passengers', fontsize=12)
ax1.set_xlabel('Port of Embarkation', fontsize=12)

# Adding the Trend Line/Secondary Y-Axis
ax2 = ax1.twinx()
color_line = '#2c3e50'

sns.lineplot(x=port_stats.index, y=port_stats['Ticket_Class'], ax=ax2, 
             color=color_line, marker='s', linewidth=4, markersize=12, label='Avg Ticket Class')

ax2.set_ylabel('Average Ticket Class (1.0 = 1st, 3.0 = 3rd)', fontsize=12, color=color_line)
ax2.set_ylim(0, 3.5)
ax2.tick_params(axis='y', labelcolor=color_line)

# Adding Title
plt.title('Survival by Port with Average Class Trend', fontsize=16)
ax1.set_xticklabels(port_survival.index, rotation=0)

# Adding legend
lines, labels = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.get_legend().remove() # Remove auto-generated legend
plt.legend(lines + lines2, ['Perished', 'Survived', 'Avg Ticket Class'], loc='upper left', bbox_to_anchor=(1.1, 1))

plt.tight_layout()
plt.show()

This chart indicates a complex relationship between geography, socio-economic status, and survival. While Cherbourg was the most affluent boarding point, with the highest concentration of 1st and 2nd Class passengers, it did not have the highest survival rate. Surprisingly, Queenstown, despite having the least affluent average ticket class, showed the highest percentage of survivors relative to its total passengers. By contrast, Southampton was the deadliest port of origin: it saw nearly twice as many fatalities as survivors, even though its passengers held, on average, better tickets than those who boarded at Queenstown.
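Since the stacked bars show counts, comparing ports of very different size is easier with row-normalized proportions. The following sketch uses `pd.crosstab` with `normalize="index"` on an invented table; the `toy` frame, its values, and the full port names used as `Embarked` codes are assumptions rather than the report's data.

```python
import pandas as pd

# Hypothetical toy data (NOT the report's data); the port labels are assumed codes.
toy = pd.DataFrame({
    "Embarked": ["Cherbourg"] * 4 + ["Queenstown"] * 2 + ["Southampton"] * 4,
    "Survived_Label": ["Yes", "Yes", "No", "No",
                       "Yes", "No",
                       "Yes", "No", "No", "No"],
})

# normalize="index" divides each row by its own total, giving per-port proportions
rates = pd.crosstab(toy["Embarked"], toy["Survived_Label"], normalize="index")
print(rates.round(2))
```

Each row sums to 1, so the "Yes" column can be read directly as the survival rate for that port.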

Survival by Class and Family Status

Passenger experiences and eventual survival on the Titanic differed greatly depending on how socio-economic status intersected with the strength of a passenger’s support system. This stacked bar chart shows how survival counts are spread across ticket class and family status (Solo vs. With Family). It serves as the visual and demographic conclusion of the report, illustrating how these two primary factors jointly affected a passenger’s chances of survival.

import matplotlib.pyplot as plt
import numpy as np

stackedbar_data = df.groupby(['Ticket_Class', 'Travel_Group', 'Survived_Label']).size().unstack().fillna(0)

# Creating X-Axis Labels
clean_labels = [f"{cls} {grp}" for cls, grp in stackedbar_data.index]

# Creating the Plot
ax = stackedbar_data.plot(kind='bar', stacked=True, figsize=(12, 8), color=['#e74c3c', '#2ecc71'])
ax.set_xticklabels(clean_labels, rotation=45, ha='right', fontsize=11)

# Adding labels inside the red/green boxes
for container in ax.containers:
    ax.bar_label(container, label_type='center', color='white', weight='bold', fontsize=11)
current_max = int(ax.get_ylim()[1]) 
plt.yticks(np.arange(0, current_max, 20)) 
plt.grid(axis='y', linestyle=':')
ax.set_axisbelow(True)

# Adding Titles and Legend
plt.title('Survival Counts by Class and Family Status', fontsize=16)
plt.xlabel('Passenger Category (Class + Travel Group)', fontsize=12)
plt.ylabel('Number of Passengers', fontsize=12)
plt.legend(['Perished', 'Survived'], loc='upper left', bbox_to_anchor=(1, 1))

plt.tight_layout()
plt.show()

This chart shows a wide and devastating survival divide across the social hierarchy. Passengers with family did better than those in the same class without family members, and the most isolated group was the 3rd Class Solo Travelers. They made up the largest category of the ship’s manifest and suffered a disproportionately high number of deaths, with only a fraction of the group surviving. In contrast, 1st Class families had a survival rate of 57%. The crucial point of this chart is that survival on the Titanic depended heavily on the combination of class status and social support: passengers who had neither, the 3rd Class solo travelers, formed the most vulnerable category.
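Percentages such as the one quoted for 1st Class families are easier to check when group sizes and rates appear side by side. The sketch below shows one way to build such a table with named aggregation; the `toy` frame and its values are invented for illustration and do not reproduce the report's figures.

```python
import pandas as pd

# Hypothetical toy data (NOT the report's data); column names follow the report.
toy = pd.DataFrame({
    "Ticket_Class": [1, 1, 1, 1, 3, 3, 3, 3, 3, 3],
    "Travel_Group": ["With Family", "With Family", "Solo Traveler", "Solo Traveler",
                     "With Family", "With Family", "Solo Traveler", "Solo Traveler",
                     "Solo Traveler", "Solo Traveler"],
    "Survived_Label": ["Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "No", "Yes"],
})
toy["Survived_Numeric"] = (toy["Survived_Label"] == "Yes").astype(int)

# One row per class/travel-group combination: size, survivor count, and rate
summary = (
    toy.groupby(["Ticket_Class", "Travel_Group"])["Survived_Numeric"]
       .agg(passengers="count", survivors="sum", survival_rate="mean")
)
print(summary)
```

Reporting `passengers` next to `survival_rate` also makes it clear when a rate rests on a small group.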

Conclusion

This analysis of the Titanic’s passenger manifest shows that survival was not random but reflected a strictly stratified social ecosystem. By examining survival across ticket class, geographic origin, and family structure, we can see how social capital functioned as a gatekeeper to survival. The data indicate that while 1st Class status provided a baseline of protection, the most devastating mortality rates were found among solo travelers, particularly those in 2nd and 3rd Class. The most striking finding was the 18% survival rate for 2nd Class solo travelers, a figure even lower than the 27% survival rate of 3rd Class solo travelers. This suggests that in the middle of the ship’s hierarchy, the absence of a family support network was a fatal disadvantage. Port of embarkation also emerged as a factor in survival. While Southampton saw the highest volume of fatalities, Queenstown emerged as a statistical anomaly, with a survival rate of over 54% despite an average ticket class between 2nd and 3rd. In summary, while the sinking of the Titanic was a tragedy, the resulting loss of life closely followed the era’s socio-economic divisions. For a passenger in 1912, safety was a commodity purchased not just with a ticket but also through the strength of one’s social ties, which in turn were linked to the port of embarkation. These findings suggest that even in a crisis, the pre-existing structures of wealth and social connection remained the strongest predictors of who would reach the lifeboats.