titanic_data <- read.csv("C:\\Users\\18328\\Desktop\\train.csv")
In this analysis, my goal is to understand the age distribution of passengers.
I want to identify the age groups present among Titanic passengers.
I will analyze the age data to create a summary and visualize the distribution using a histogram.
summary(titanic_data$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.42 20.12 28.00 29.70 38.00 80.00 177
hist(titanic_data$Age)
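To make the age groups explicit, the ages can also be binned into labelled ranges and counted. The following is a minimal sketch, not part of the original analysis: the helper name `age_group_counts`, the break points, and the group labels are all assumptions chosen for illustration.

```r
# Hypothetical helper: bin ages into labelled groups and count each group.
# The break points and labels are illustrative assumptions, not from the data.
age_group_counts <- function(age,
                             breaks = c(0, 12, 18, 40, 60, Inf),
                             labels = c("Child", "Teen", "Adult",
                                        "Middle-aged", "Senior")) {
  # right = FALSE makes each interval closed on the left, e.g. [12, 18)
  groups <- cut(age, breaks = breaks, labels = labels, right = FALSE)
  # useNA = "ifany" keeps missing ages visible instead of silently dropping them
  table(groups, useNA = "ifany")
}
```

Calling `age_group_counts(titanic_data$Age)` would return one count per group, plus a separate count for the missing ages reported in the summary above.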
Next, I want to understand the distribution of genders among passengers.
I will analyze the gender data to identify the proportion of males and females.
table(titanic_data$Sex)
##
## female male
## 314 577
barplot(table(titanic_data$Sex), main="Gender Distribution", xlab="Gender", ylab="Count", col=c("blue", "pink"))
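Since the stated aim is the proportion of each gender, the counts can be converted to percentages with `prop.table()`. This is a small sketch; the helper name `category_shares` is an assumption introduced here for illustration.

```r
# Hypothetical helper: share of each category as a percentage,
# rounded to one decimal place.
category_shares <- function(x) {
  round(100 * prop.table(table(x)), 1)
}
```

With the counts shown above (314 female and 577 male out of 891), `category_shares(titanic_data$Sex)` should give about 35.2% female and 64.8% male.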
Next, I want to understand the distribution of fares paid by passengers.
I will analyze the fare data as an indicator of passengers' financial status, since financial status may affect overall well-being and health.
summary(titanic_data$Fare)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 7.91 14.45 32.20 31.00 512.33
hist(titanic_data$Fare, main="Fare Distribution", xlab="Fare", ylab="Count", col="navy")
titanic_data$Survived <- factor(titanic_data$Survived)
boxplot(Age ~ Survived, data = titanic_data, col = c("yellow", "blue"), main = "Age vs. Survival", xlab = "Survived", ylab = "Age")
t.test(Age ~ Survived, data = titanic_data)
##
## Welch Two Sample t-test
##
## data: Age by Survived
## t = 2.046, df = 598.84, p-value = 0.04119
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.09158472 4.47339446
## sample estimates:
## mean in group 0 mean in group 1
## 30.62618 28.34369
gender_survival_table <- table(titanic_data$Sex, titanic_data$Survived)
chisq.test(gender_survival_table)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: gender_survival_table
## X-squared = 260.72, df = 1, p-value < 2.2e-16
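The chi-squared test shows that gender and survival are associated, but not which group fared better. One way to see the direction is to compute the survival share within each gender. This is a minimal sketch; the helper name `survival_rate_by_group` is an assumption, not part of the original analysis.

```r
# Hypothetical helper: proportion of each outcome within each group.
survival_rate_by_group <- function(group, survived) {
  counts <- table(group, survived)
  # margin = 1 normalises within each row, so each group's shares sum to 1
  prop.table(counts, margin = 1)
}
```

Calling `survival_rate_by_group(titanic_data$Sex, titanic_data$Survived)` would return one row per gender, with the share of non-survivors and survivors in the columns.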
boxplot(Fare ~ Survived, data = titanic_data, col = c("blue", "red"), main = "Fare vs. Survival", xlab = "Survived", ylab = "Fare")
t.test(Fare ~ Survived, data = titanic_data)
##
## Welch Two Sample t-test
##
## data: Fare by Survived
## t = -6.8391, df = 436.7, p-value = 2.699e-11
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -33.82912 -18.72592
## sample estimates:
## mean in group 0 mean in group 1
## 22.11789 48.39541
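The fare summary above shows a mean (32.20) well above the median (14.45) and a maximum of 512.33, so the distribution is strongly right-skewed. As a possible robustness check alongside the t-test, a Wilcoxon rank-sum test compares the two groups without assuming normality. This is a different technique from the one used in the analysis, and the helper name `rank_sum_test` is an assumption introduced for illustration.

```r
# Hypothetical helper: compare a numeric variable across two groups
# using the Wilcoxon rank-sum test (a rank-based alternative to the t-test).
rank_sum_test <- function(values, group) {
  # The formula method requires `group` to have exactly two distinct levels
  wilcox.test(values ~ group)
}
```

Calling `rank_sum_test(titanic_data$Fare, titanic_data$Survived)` would return a test object whose `p.value` can be read next to the t-test result. With tied fares, R may warn that it cannot compute an exact p-value and fall back to a normal approximation.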
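I observe a small difference in age between survivors and non-survivors: non-survivors were slightly older on average (30.63 vs. 28.34 years). The t-test finds this difference statistically significant at the 5% level (p-value = 0.04119), but the gap is only about two years and the 95 percent confidence interval (0.09 to 4.47) comes close to zero, so age alone does not clearly separate older and younger passengers in terms of survival.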
Women survived at a higher rate than men. The chi-squared test (X-squared = 260.72, p-value < 2.2e-16) indicates a strong association between gender and survival.
Passengers who paid higher fares tended to have better chances of survival: survivors paid 48.40 on average, compared with 22.12 for non-survivors. The p-value from the t-test is very small (p-value = 2.699e-11), indicating a significant difference in the average fare between the two groups.