Assignment 2

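library(tidyverse)  # loads readr (read_csv), dplyr (%>%, group_by, summarize), and ggplot2

# Load the Titanic passenger data
titanic <- read_csv("titanic_data.csv")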

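# Average fare by sex
titanic %>%
  group_by(Sex) %>%
  summarize(avg_fare = mean(Fare, na.rm = TRUE))

# Average fare by passenger class
titanic %>%
  group_by(Pclass) %>%
  summarize(avg_fare = mean(Fare, na.rm = TRUE))

# Survival rate by sex (assuming Survived is coded 0/1, the mean is the proportion who survived)
titanic %>%
  group_by(Sex) %>%
  summarize(survival_rate = mean(Survived, na.rm = TRUE))

# Survival rate by passenger class
titanic %>%
  group_by(Pclass) %>%
  summarize(survival_rate = mean(Survived, na.rm = TRUE))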

# Fare by Sex
ggplot(titanic, aes(x = Sex, y = Fare)) +
  geom_boxplot() +
  labs(title = "Ticket Fare by Sex", x = "Sex", y = "Fare") +
  theme_minimal()

# Fare by Passenger Class
ggplot(titanic, aes(x = factor(Pclass), y = Fare)) +
  geom_boxplot() +
  labs(title = "Ticket Fare by Passenger Class", x = "Pclass", y = "Fare") +
  theme_minimal()

# Survival Rate by Sex
ggplot(titanic, aes(x = Sex, y = Survived)) +
  geom_bar(stat = "summary", fun = "mean") +
  labs(title = "Survival Rate by Sex", x = "Sex", y = "Survival Rate") +
  theme_minimal()

# Survival Rate by Passenger Class
ggplot(titanic, aes(x = factor(Pclass), y = Survived)) +
  geom_bar(stat = "summary", fun = "mean") +
  labs(title = "Survival Rate by Passenger Class", x = "Pclass", y = "Survival Rate") +
  theme_minimal()

# Dataset on Motor Vehicles 
data(mtcars)

# Average miles per gallon (mpg) by number of cylinders (cyl)
mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg, na.rm = TRUE))

# Visualization for mpg by Cylinder
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(title = "Miles per Gallon by Cylinder Count", x = "Cylinders", y = "Miles per Gallon") +
  theme_minimal()

# Compare average MPG by Transmission Type (0 = Automatic, 1 = Manual)
mtcars %>%
  group_by(am) %>%
  summarize(avg_mpg = mean(mpg, na.rm = TRUE))

# Compare average MPG by Number of Cylinders
mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg, na.rm = TRUE))

# MPG by Transmission Type
ggplot(mtcars, aes(x = factor(am), y = mpg)) +
  geom_boxplot() +
  labs(title = "Fuel Efficiency (MPG) by Transmission Type", 
       x = "Transmission (0 = Automatic, 1 = Manual)", 
       y = "Miles per Gallon") +
  theme_minimal()

# MPG by Number of Cylinders
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(title = "Fuel Efficiency (MPG) by Cylinder Count", 
       x = "Number of Cylinders", 
       y = "Miles per Gallon") +
  theme_minimal()

# Horsepower by Number of Cylinders
ggplot(mtcars, aes(x = factor(cyl), y = hp)) +
  geom_boxplot() +
  labs(title = "Horsepower by Cylinder Count", 
       x = "Number of Cylinders", 
       y = "Horsepower") +
  theme_minimal()

In this analysis of the Titanic dataset, we explored the differences in ticket fare and survival rates between men and women and across passenger classes. First, the average ticket fare differed between men and women, with women paying slightly higher fares on average. This could be attributed to factors such as the type of accommodations or cabins booked. However, the difference may not reflect individual choices alone: given the class-based structure of the ship, it may partly reflect how men and women were distributed across passenger classes.

A clear trend emerged when examining ticket fares by passenger class: passengers in higher classes (Pclass 1) paid substantially more for their tickets than those in lower classes (Pclass 3). This price discrepancy aligns with the ship’s class-based system, where first-class passengers had access to more luxurious accommodations priced accordingly. The fare distribution was spread out more broadly for Pclass 3, indicating that lower-class passengers had a wider range of ticket prices, likely due to the availability of more affordable options.
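One way to check the claim about fare spread is to compute dispersion statistics per class instead of reading them off the boxplot. The sketch below is illustrative only: `toy` is a hypothetical stand-in for `titanic_data.csv` with made-up values, and the column names `Pclass` and `Fare` are assumed to match the real file.

```r
library(dplyr)

# Hypothetical toy data standing in for titanic_data.csv (values are made up)
toy <- data.frame(
  Pclass = c(1, 1, 1, 2, 2, 3, 3, 3),
  Fare   = c(80, 150, 30, 13, 26, 7.25, 8.05, NA)
)

# Spread of fares within each class: standard deviation, IQR, and range
fare_spread <- toy %>%
  group_by(Pclass) %>%
  summarize(
    n        = sum(!is.na(Fare)),          # fares actually observed
    sd_fare  = sd(Fare, na.rm = TRUE),
    iqr_fare = IQR(Fare, na.rm = TRUE),
    min_fare = min(Fare, na.rm = TRUE),
    max_fare = max(Fare, na.rm = TRUE)
  )

fare_spread
```

Replacing `toy` with `titanic` would give the per-class spread for the real data, which can then be compared against the boxplot.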

In terms of survival rates, there was a marked difference between men and women, with women having a much higher survival rate than men. This finding is consistent with the historical narrative of the “women and children first” protocol during the Titanic disaster, which likely contributed to the higher survival rate among women. Similarly, when we analyzed survival by passenger class, passengers in higher classes had a greater chance of survival. This could be explained by their proximity to the lifeboats, better access to emergency resources, and prioritization during the evacuation. In contrast, passengers in the lowest class (Pclass 3) had a much lower survival rate, likely for the opposite reasons: greater distance from the lifeboats and lower priority during the evacuation.

However, it’s important to address the presence of missing data in the dataset. In both the Fare and Survived variables, we observed missing values for certain passengers, which could affect the accuracy of the average fare and survival rate calculations. The missing values were excluded from the analysis using na.rm = TRUE, but this raises a point about potential biases in the dataset. Missing data may reflect incomplete records or passengers who were not assigned fares, and this could impact the reliability of our conclusions, especially for passengers in lower classes or certain demographic groups. Similarly, missing values in the survival status could skew our understanding of the survival rates if they are not randomly distributed across passengers.
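Since na.rm = TRUE drops missing values silently, it can help to count them explicitly and see whether they cluster in one group. The following is a minimal sketch under stated assumptions: `toy` is a hypothetical data frame with made-up values, and the column names are assumed to match those used in the analysis above.

```r
library(dplyr)

# Hypothetical toy data standing in for titanic_data.csv (values are made up)
toy <- data.frame(
  Pclass   = c(1, 1, 2, 3, 3, 3),
  Fare     = c(71.28, NA, 13.00, 7.25, NA, 8.05),
  Survived = c(1, 1, NA, 0, 0, 1)
)

# Number of missing values in each variable of interest
missing_overall <- toy %>%
  summarize(across(c(Fare, Survived), ~ sum(is.na(.x))))

# Same counts by passenger class, to see whether the gaps are concentrated
missing_by_class <- toy %>%
  group_by(Pclass) %>%
  summarize(
    n                = n(),
    missing_fare     = sum(is.na(Fare)),
    missing_survived = sum(is.na(Survived))
  )

missing_overall
missing_by_class
```

If the per-class counts are roughly proportional to class size, dropping the missing rows is less likely to bias the group averages. If they concentrate in one class, the averages for that class deserve more caution.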

The visualizations further underscore these findings, showing that women had higher survival rates than men and that first-class passengers had the highest survival rates overall. These patterns suggest that both gender and social class played significant roles in the likelihood of survival during the Titanic disaster. Because the plots consider sex and class separately, they do not by themselves show how the two factors combine. Nevertheless, missing data should be carefully considered when interpreting these patterns, as it may influence the robustness of our conclusions.
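To look at sex and class together rather than one at a time, the survival rate can be computed for each combination of the two. This is a sketch only: `toy` is a hypothetical data frame with made-up values, and `Survived` is assumed to be coded 0/1 as in the summaries above.

```r
library(dplyr)

# Hypothetical toy data standing in for titanic_data.csv (values are made up)
toy <- data.frame(
  Sex      = c("female", "female", "female", "female",
               "male", "male", "male", "male"),
  Pclass   = c(1, 1, 3, 3, 1, 1, 3, 3),
  Survived = c(1, 1, 1, 0, 1, 0, 0, 0)
)

# Survival rate for every Pclass x Sex combination, with group sizes
survival_by_sex_class <- toy %>%
  group_by(Pclass, Sex) %>%
  summarize(
    survival_rate = mean(Survived, na.rm = TRUE),
    n             = n(),
    .groups       = "drop"
  )

survival_by_sex_class
```

Running the same pipeline on `titanic` would show whether women's survival rate exceeds men's within each class. The resulting table could also be plotted with `geom_col(position = "dodge")`, mapping `fill = Sex`.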

Angel Bayron

2025-02-24