Data description:
Calculate the survival rate for male and female passengers. Visualize the survival rates using a bar plot.
## # A tibble: 4 × 4
## # Groups:   Sex [2]
##   Sex    Survived     n percentage
##   <chr>  <fct>    <int>      <dbl>
## 1 female No          81       25.8
## 2 female Yes        233       74.2
## 3 male   No         468       81.1
## 4 male   Yes        109       18.9
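The percentages in this table can be checked from the counts alone. The following is a minimal base-R sketch, not the original analysis code: the matrix is typed in by hand from the table above, and the names `sex_survival` and `sex_rates` are illustrative.

```r
# Counts copied from the summary table above (rows: Sex, columns: Survived).
sex_survival <- matrix(
  c( 81, 233,
    468, 109),
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("female", "male"), Survived = c("No", "Yes"))
)

# Row-wise percentages: each row (each sex) sums to 100.
sex_rates <- round(100 * prop.table(sex_survival, margin = 1), 1)
print(sex_rates)
```

Using `margin = 1` normalizes within each sex, which matches the grouping by `Sex` shown in the tibble header.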
Investigate the relationship between ticket class (Pclass) and survival. What is the survival rate for each passenger class? Visualize it using a stacked bar plot.
##
## Pearson's Chi-squared test
##
## data: table(titanic_3$Pclass, titanic_3$Survived)
## X-squared = 102.89, df = 2, p-value < 2.2e-16
Comment:
Null hypothesis (H0): There is no association between survival and ticket class (Pclass).
Alternative hypothesis (H1): There is an association between survival and ticket class (Pclass).
Since the p-value (< 2.2e-16) is far below the 0.05 significance level, we reject the null hypothesis: survival and ticket class are not independent, so survival on the Titanic was significantly associated with ticket class.
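As a cross-check, the test can be rerun from the per-class counts reported in the Pclass survival table in this section. This is a sketch under the assumption that those counts describe the same 891 passengers as `titanic_3`; the object names are illustrative.

```r
# Counts per class, typed in from the per-class survival table in this report.
pclass_survival <- matrix(
  c( 80, 136,
     97,  87,
    372, 119),
  nrow = 3, byrow = TRUE,
  dimnames = list(Pclass = c("1st", "2nd", "3rd"), Survived = c("No", "Yes"))
)

# Pearson chi-squared test of independence (no continuity correction
# is applied for tables larger than 2 x 2).
class_test <- chisq.test(pclass_survival)
print(class_test)

# Expected counts under independence, useful for seeing which cells
# drive the statistic.
print(round(class_test$expected, 1))
```

Comparing the observed and expected counts shows where the association comes from: 1st class has more survivors than independence would predict, and 3rd class has fewer.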
##
## Pearson's product-moment correlation
##
## data: titanic_3$Survived and titanic_3$Pclass
## t = -10.725, df = 889, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3953692 -0.2790061
## sample estimates:
## cor
## -0.338481
Comment: The Pearson correlation coefficient between Survived and Pclass is -0.338, which indicates a negative correlation: as the Pclass number increases (that is, moving from 1st class to 3rd class), the probability of survival decreases. The strength of this correlation is moderate.
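The coefficient can be reproduced from the per-class counts reported in this section by expanding them into one row per passenger. This sketch assumes the numeric coding Survived = 0/1 and Pclass = 1/2/3, which the output above does not show explicitly; the Spearman line is an added rank-based alternative, not part of the original analysis.

```r
# One row per (Pclass, Survived) cell, with counts from the per-class table.
counts <- data.frame(
  Pclass   = c(1, 2, 3, 1, 2, 3),
  Survived = c(0, 0, 0, 1, 1, 1),   # assumed coding: 0 = No, 1 = Yes
  n        = c(80, 97, 372, 136, 87, 119)
)

# Expand to one row per passenger by repeating each cell n times.
passengers <- counts[rep(seq_len(nrow(counts)), counts$n),
                     c("Pclass", "Survived")]

# Pearson correlation test, as in the output above.
class_cor <- cor.test(passengers$Survived, passengers$Pclass)
print(class_cor)

# Because Pclass is ordinal, a rank-based coefficient is a reasonable
# alternative to report alongside Pearson's r.
print(cor(passengers$Survived, passengers$Pclass, method = "spearman"))
```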
## # A tibble: 6 × 4
## # Groups:   Pclass [3]
##   Survived Pclass     n Survival_rate_pclass
##   <fct>    <fct>  <int>                <dbl>
## 1 No       1st       80                 37.0
## 2 No       2nd       97                 52.7
## 3 No       3rd      372                 75.8
## 4 Yes      1st      136                 63.0
## 5 Yes      2nd       87                 47.3
## 6 Yes      3rd      119                 24.2
Analyze the age distribution of passengers on the Titanic. Compare the distribution for those who survived and those who did not using a boxplot.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    0.42   20.12   28.00   29.70   38.00   80.00
Analyze how family size (number of siblings, spouses, parents, or children aboard) affects the likelihood of survival. Visualize the relationship between family size and survival rate using an appropriate plot.
##
## Pearson's Chi-squared test
##
## data: table(titanic_4$Family_size, titanic_4$Survived)
## X-squared = 1.7778, df = 8, p-value = 0.9871
Comment:
Null hypothesis (H0): There is no association between family size and the likelihood of survival.
Alternative hypothesis (H1): There is an association between family size and the likelihood of survival.
Since the p-value (0.9871) is much larger than the 0.05 significance level, we fail to reject the null hypothesis. This test therefore provides no evidence that family size is associated with the likelihood of survival; note that failing to reject H0 does not prove that family size has no effect.
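The report does not show how `Family_size` was built. One common construction, sketched here as an assumption, adds the `SibSp` and `Parch` columns of the Titanic data plus one for the passenger. The rows in `toy` are invented purely for illustration and are not taken from the dataset.

```r
# Toy rows, invented for illustration only (NOT real passengers).
toy <- data.frame(
  SibSp    = c(0, 1, 1, 0, 3, 4, 0, 1),   # siblings/spouses aboard
  Parch    = c(0, 0, 2, 0, 1, 2, 0, 1),   # parents/children aboard
  Survived = c(0, 1, 1, 0, 0, 0, 1, 1)    # assumed coding: 1 = survived
)

# Assumed definition: the passenger plus all relatives aboard.
toy$Family_size <- toy$SibSp + toy$Parch + 1

# Survival rate (mean of the 0/1 indicator) for each family size.
family_rates <- aggregate(Survived ~ Family_size, data = toy, FUN = mean)
print(family_rates)
```

With nine family-size levels (df = 8) some cells of the real contingency table are likely to be small, so inspecting `chisq.test(...)$expected` before relying on the chi-squared approximation is a sensible extra step.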
Visualize the age distribution of passengers based on their class (Pclass) and gender (Sex) using a facet grid plot.
Group passengers into age categories (children, adults, elderly) and calculate the survival rate for each age group. Visualize the survival rate using a bar plot. Hints: You may need to create a new column.
## # A tibble: 6 × 4
## # Groups:   Age_category [3]
##   Age_category Survived     n Survival_rt
##   <fct>        <fct>    <int>       <dbl>
## 1 Children     No          52        46.0
## 2 Children     Yes         61        54.0
## 3 Adult        No         353        61.4
## 4 Adult        Yes        222        38.6
## 5 Elderly      No          19        73.1
## 6 Elderly      Yes          7        26.9
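The new age-category column can be created with `cut()`. The cut-offs used in the report are not stated, so the breaks below (under 18 = Children, 18 to 59 = Adult, 60 and over = Elderly) are an assumption, and the `ages` vector is a small illustrative sample rather than the real data.

```r
# Illustrative ages only; 0.42 and 80 are the minimum and maximum
# from the age summary in this report.
ages <- c(0.42, 5, 17, 18, 28, 59, 60, 80)

# Assumed cut-offs. right = FALSE makes the intervals [0, 18), [18, 60),
# [60, Inf), so an 18-year-old is an Adult and a 60-year-old is Elderly.
age_category <- cut(
  ages,
  breaks = c(0, 18, 60, Inf),
  labels = c("Children", "Adult", "Elderly"),
  right  = FALSE
)

print(data.frame(ages, age_category))
```

Passengers with a missing age would get `NA` from `cut()` and drop out of a grouped summary unless handled explicitly.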
Perform a chi-square test to determine if there is a significant association between gender (Sex) and survival.
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(titanic_3$Sex, titanic_3$Survived)
## X-squared = 260.72, df = 1, p-value < 2.2e-16
Null hypothesis (H0): There is no association between sex and survival.
Alternative hypothesis (H1): There is an association between sex and survival.
Since the p-value (< 2.2e-16) is far below the 0.05 significance level, we reject the null hypothesis: survival and sex are not independent, so survival on the Titanic was significantly associated with sex.
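For a 2 × 2 table, `chisq.test()` applies Yates' continuity correction by default, which subtracts 0.5 from each absolute difference between observed and expected counts before squaring. The sketch below recomputes the statistic by hand from the sex-by-survival counts reported earlier in this report; the object names are illustrative.

```r
# Observed counts from the survival-by-sex table in this report.
sex_survival <- matrix(
  c( 81, 233,
    468, 109),
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("female", "male"), Survived = c("No", "Yes"))
)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(sex_survival), colSums(sex_survival)) /
  sum(sex_survival)

# Yates-corrected statistic: sum of (|O - E| - 0.5)^2 / E over all cells.
yates_stat <- sum((abs(sex_survival - expected) - 0.5)^2 / expected)
print(yates_stat)

# Built-in test; correct = TRUE is the default for 2 x 2 tables.
sex_test <- chisq.test(sex_survival)
print(sex_test)
```

The hand-computed value agrees with the X-squared of 260.72 shown in the output above.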
Analyze the correlation between fare and survival. Use scatter plots to visualize the relationship and calculate correlation coefficients. Hint: Use cor.test() to test the significance of the relationship.
##
## Pearson's product-moment correlation
##
## data: titanic_3$Fare and titanic_3$Survived
## t = 7.9392, df = 889, p-value = 6.12e-15
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1949232 0.3176165
## sample estimates:
## cor
## 0.2573065
Comment:
Null hypothesis (H0): There is no linear correlation between fare and survival on the Titanic.
Alternative hypothesis (H1): There is a linear correlation between fare and survival on the Titanic.
Since the p-value (6.12e-15) is far smaller than 0.05, we reject the null hypothesis: there is strong evidence of a linear relationship between fare and survival. The positive correlation coefficient (0.2573) suggests that passengers who paid higher fares were more likely to survive, although a coefficient of this size indicates a weak relationship.
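Statistical significance and strength are separate questions here. The sketch below uses only the values printed in the `cor.test()` output (r = 0.2573065, df = 889) to recover the t statistic through the standard relation t = r·sqrt(df) / sqrt(1 − r²) and to compute r², the share of variance in one variable linearly accounted for by the other. Variable names are illustrative.

```r
# Values copied from the cor.test() output above.
r  <- 0.2573065
df <- 889

# t statistic for testing H0: true correlation equals 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df)

# Coefficient of determination.
r_squared <- r^2

print(c(t = t_stat, p = p_value, r_squared = r_squared))
```

The recovered t matches the reported 7.9392, while r² is roughly 0.066: with 891 passengers even a weak correlation is highly significant, so the p-value alone should not be read as a measure of strength.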