There is a notable difference in the average ticket price between men and women. On average, women paid roughly 75% more than men: about $44 versus about $25. One possible explanation is the gender inequality of the early 20th century, when women were often charged more than men for certain products and services. This comparison does not control for passenger class, however, so the gap may also reflect differences in the classes in which men and women traveled.
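One way to probe this explanation is to compare average fares for men and women within each passenger class: a gap that shrinks inside each class would point to class mix rather than sex. The sketch below uses base R's `aggregate()` on a small made-up data frame (`toy`, with invented values) that borrows the `Sex`, `Pclass`, and `Fare` column names from the data set. It illustrates the technique only and is not a result from the Titanic data.

```r
# Hypothetical toy data: values are invented for illustration only.
toy <- data.frame(
  Sex    = c("female", "female", "male", "male", "female", "male"),
  Pclass = c(1, 1, 1, 3, 3, 3),
  Fare   = c(80, 100, 90, 8, 10, 12)
)

# Average fare by sex alone (the comparison made in the paragraph above).
fare_by_sex <- aggregate(Fare ~ Sex, data = toy, FUN = mean)

# Average fare by sex within each passenger class.
fare_by_sex_class <- aggregate(Fare ~ Sex + Pclass, data = toy, FUN = mean)

print(fare_by_sex)
print(fare_by_sex_class)
```

In this toy example the overall averages differ by sex while the within-class averages do not, which is the pattern to look for. To try it on the real data, replace `toy` with `titanic_data`.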
There is a substantial difference in ticket price across the passenger classes. First-class passengers paid the highest average price ($84), second-class passengers paid considerably less ($20), and third-class passengers paid the least ($13). First class typically cost more than the other two classes because it offered more amenities, such as spacious rooms, gourmet meals, luxury lounges, entertainment, and activities.
There is a large gap between the survival rates of men and women. Women were much more likely to survive than men: women had a 74% survival rate, while men had only an 18% survival rate. The most likely reason is that, as the Titanic was sinking, priority was given to women and children, who were allowed to board the lifeboats before men.
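The survival rates quoted here are means of a 0/1 indicator, where 1 marks a survivor, multiplied by 100. A minimal sketch with an invented vector (`survived_toy` is hypothetical and not taken from the data) shows why the mean works as a rate:

```r
# Hypothetical 0/1 survival indicator for five passengers (1 = survived).
survived_toy <- c(1, 0, 0, 1, 1)

# The mean of a 0/1 vector is the share of ones, so scaling by 100
# turns it into a survival rate in percent.
rate_percent <- mean(survived_toy) * 100
print(rate_percent)
```

Three of the five hypothetical passengers survived, so the rate is three fifths expressed as a percentage.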
There is a meaningful difference in survival rates across the passenger classes. First-class passengers had the highest survival rate at 62%, followed by second-class passengers at 47%, and third-class passengers had the lowest at just 24%. One potential reason for the higher survival rate of first-class passengers is their wealth and social standing, which likely gave them priority access to lifeboats and life vests.
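The sex and class comparisons above each look at one variable at a time. A two-way table of survival rates can show whether the two patterns overlap. The sketch below uses base R's `tapply()` on a made-up data frame (`toy`, with invented values) that reuses the `Sex`, `Pclass`, and `Survived` column names. It is an illustration under those assumptions, not output from the Titanic data.

```r
# Hypothetical toy data: values are invented for illustration only.
toy <- data.frame(
  Sex      = c("female", "female", "male", "male",
               "female", "female", "male", "male"),
  Pclass   = c(1, 1, 1, 1, 3, 3, 3, 3),
  Survived = c(1, 1, 1, 0, 1, 0, 0, 0)
)

# Rows are sexes, columns are classes; each cell is a survival rate in percent.
rate_table <- tapply(toy$Survived, list(toy$Sex, toy$Pclass), mean) * 100
print(rate_table)
```

To apply the same idea to the real data, pass `titanic_data$Survived`, `titanic_data$Sex`, and `titanic_data$Pclass` in place of the `toy` columns.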
#Set up Session
rm(list = ls())
gc()
##           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells  543957 29.1    1204041 64.4         NA   700242 37.4
## Vcells 1008960  7.7    8388608 64.0      16384  1963155 15.0
setwd("/Users/kevingregov/Desktop/Pratctice")
#Import the Data
library(readr)
titanic_data <- read_csv("titanic_data.csv", col_names = TRUE)
## Rows: 891 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Name, Sex, Ticket, Cabin, Embarked
## dbl (7): PassengerId, Survived, Pclass, Age, SibSp, Parch, Fare
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Load Packages that I will use
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
#Gain a better understanding of the Variables and Observations in my Data set
glimpse(titanic_data)
## Rows: 891
## Columns: 12
## $ PassengerId <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ Survived <dbl> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1…
## $ Pclass <dbl> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3, 3…
## $ Name <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Fl…
## $ Sex <chr> "male", "female", "female", "female", "male", "male", "mal…
## $ Age <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, …
## $ SibSp <dbl> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1, 0…
## $ Parch <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0, 0…
## $ Ticket <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "37…
## $ Fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625,…
## $ Cabin <chr> NA, "C85", NA, "C123", NA, NA, "E46", NA, NA, NA, "G6", "C…
## $ Embarked <chr> "S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", "S"…
head(titanic_data)
## # A tibble: 6 × 12
##   PassengerId Survived Pclass Name    Sex     Age SibSp Parch Ticket  Fare Cabin
##         <dbl>    <dbl>  <dbl> <chr>   <chr> <dbl> <dbl> <dbl> <chr>  <dbl> <chr>
## 1           1        0      3 Braund… male     22     1     0 A/5 2…  7.25 <NA>
## 2           2        1      1 Cuming… fema…    38     1     0 PC 17… 71.3  C85
## 3           3        1      3 Heikki… fema…    26     0     0 STON/…  7.92 <NA>
## 4           4        1      1 Futrel… fema…    35     1     0 113803 53.1  C123
## 5           5        0      3 Allen,… male     35     0     0 373450  8.05 <NA>
## 6           6        0      3 Moran,… male     NA     0     0 330877  8.46 <NA>
## # ℹ 1 more variable: Embarked <chr>
tail(titanic_data)
## # A tibble: 6 × 12
##   PassengerId Survived Pclass Name    Sex     Age SibSp Parch Ticket  Fare Cabin
##         <dbl>    <dbl>  <dbl> <chr>   <chr> <dbl> <dbl> <dbl> <chr>  <dbl> <chr>
## 1         886        0      3 "Rice,… fema…    39     0     5 382652 29.1  <NA>
## 2         887        0      2 "Montv… male     27     0     0 211536 13    <NA>
## 3         888        1      1 "Graha… fema…    19     0     0 112053 30    B42
## 4         889        0      3 "Johns… fema…    NA     1     2 W./C.… 23.4  <NA>
## 5         890        1      1 "Behr,… male     26     0     0 111369 30    C148
## 6         891        0      3 "Doole… male     32     0     0 370376  7.75 <NA>
## # ℹ 1 more variable: Embarked <chr>
dim(titanic_data)
## [1] 891 12
#These are the Research Questions I will address.
#1. Is there a difference in ticket price between men and women (Sex)?
#2. Is there a difference in ticket price between the different Passenger Classes?
#3. Is there a difference in survival chance between men and women (Sex)?
#4. Is there a difference in survival chance between the different Passenger Classes?
#The variables I will analyze will be placed in a separate data set called titanic_data1
titanic_data1 <- select(titanic_data, Name, Sex, Pclass, Fare, Survived)
#I will rename some of the variables in titanic_data1 to make them easier for me to understand
titanic_data1 <- rename(titanic_data1, "Passenger Class" = Pclass, "Ticket Price" = Fare, "Survival Chance" = Survived)
#I will recode the observations for the Passenger Class variable to make them easier to understand.
titanic_data1$`Passenger Class` <- factor(titanic_data1$`Passenger Class`, levels = c(1, 2, 3), labels = c("First Class", "Second Class", "Third Class"))
#I will create a new data set that will provide the average ticket prices for Men and Women (Sex)
titanic_data1 <- group_by(titanic_data1,Sex)
Avg.Ticket.Price <- summarize(titanic_data1, "Avg Ticket Price" = mean(`Ticket Price`, na.rm = TRUE))
#Make a graph for average ticket prices for men and women
graph <- ggplot(Avg.Ticket.Price, aes(x = Sex, y = `Avg Ticket Price`, fill = Sex)) + geom_bar(stat = "identity", position = position_dodge()) + labs(title = "Average Ticket Price($) by Sex", x = "Sex", y = "Average Ticket Price")
#I will create a new data set that will provide the average ticket prices for Passenger Class.
titanic_data1 <- group_by(titanic_data1,`Passenger Class`)
Avg.Ticket.Price.PClass <- summarize(titanic_data1, Avg.Ticket.Price = mean(`Ticket Price`, na.rm = TRUE))
#Make a graph for average ticket price for Passenger Class
graph2 <- ggplot(Avg.Ticket.Price.PClass, aes(x = `Passenger Class`, y = Avg.Ticket.Price, fill = `Passenger Class`)) + geom_bar(stat = "identity", position = position_dodge()) + labs(title = "Ticket Price($) Comparison Across Passenger Classes", x = "Passenger Class", y = "Avg Ticket Price")
#I will create a new data set that will provide the average survival rate for Men and Women (Sex).
titanic_data1 <- group_by(titanic_data1, Sex)
avg.survival.rate.sex <- summarize(titanic_data1, "Avg Survival Rate" = mean(`Survival Chance`, na.rm = TRUE))
#Convert the survival rate from a proportion to a percentage
avg.survival.rate.sex <- mutate(avg.survival.rate.sex, "Avg.Survival.Rate" = `Avg Survival Rate` * 100)
#Create a Graph
Graph3 <- ggplot(avg.survival.rate.sex, aes(x = Sex, y = Avg.Survival.Rate, fill = Sex)) + geom_bar(stat = "identity", position = position_dodge()) + labs(title = "Average Survival Rate(%) Between Men and Women(Sex)", x = "Sex", y = "Avg Survival Rate")
#I will create a new data set that will provide the average survival rate for each Passenger Class.
titanic_data1 <- group_by(titanic_data1, `Passenger Class`)
avg.survival.rate.PClass <- summarize(titanic_data1, "Avg Survival Rate" = mean(`Survival Chance`, na.rm = TRUE))
avg.survival.rate.PClass <- mutate(avg.survival.rate.PClass,"Avg.Survival.Rate" = `Avg Survival Rate` * 100)
#Create a Graph
graph4 <- ggplot(avg.survival.rate.PClass, aes(x = `Passenger Class`, y = Avg.Survival.Rate, fill = `Passenger Class`)) + geom_bar(stat = "identity", position = position_dodge()) + labs(title = "Average Survival Rate(%) by Passenger Class", x = "Passenger Class", y = "Avg Survival Rate(%)")