Load Packages

Load Dataset

Data description:

Question 1

Calculate the survival rate for male and female passengers. Visualize the survival rates using a bar plot.

## # A tibble: 4 × 4
## # Groups:   Sex [2]
##   Sex    Survived     n percentage
##   <chr>  <fct>    <int>      <dbl>
## 1 female No          81       25.8
## 2 female Yes        233       74.2
## 3 male   No         468       81.1
## 4 male   Yes        109       18.9
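Each percentage is the group's count divided by the total for that sex. For female passengers that is 233 / (81 + 233) ≈ 74.2%. The report's R code is not shown, so here is a minimal Python sketch of the same arithmetic. It works only from the counts in the table above.

```python
# Passenger counts by sex and outcome, copied from the tibble above (n column).
counts = {
    "female": {"No": 81, "Yes": 233},
    "male": {"No": 468, "Yes": 109},
}

def survival_rates(table):
    """Return, for each group, the percentage of passengers who survived ("Yes")."""
    return {
        group: 100 * cells["Yes"] / (cells["Yes"] + cells["No"])
        for group, cells in table.items()
    }

rates = survival_rates(counts)
for sex, pct in rates.items():
    print(f"{sex}: {pct:.1f}%")
```

The resulting dictionary holds the same values that a bar plot of survival rate by sex would display.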

Question 2

Investigate the relationship between ticket class (Pclass) and survival. What is the survival rate for each passenger class? Visualize it using a stacked bar plot.

## 
##  Pearson's Chi-squared test
## 
## data:  table(titanic_3$Pclass, titanic_3$Survived)
## X-squared = 102.89, df = 2, p-value < 2.2e-16

Comment:

Null Hypothesis (H0): There is no association between survival and ticket class (Pclass).

Alternative Hypothesis (H1): There is an association between survival and ticket class (Pclass).

The p-value (< 2.2e-16) is far below the conventional significance level of 0.05, so we reject the null hypothesis. Survival and ticket class are not independent: survival on the Titanic was significantly associated with ticket class (Pclass).
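As a check on the test above, the statistic can be recomputed from the class-by-survival counts in the Pclass summary table later in this section. The table is 3 × 2, so chisq.test() applies no continuity correction. The statistic is therefore the plain sum of (observed − expected)² / expected, where expected = row total × column total / n. The function name in this Python sketch is illustrative and does not come from the report.

```python
# Passenger counts per ticket class as (died, survived), taken from the
# Pclass summary table in this section.
observed = {
    "1st": (80, 136),
    "2nd": (97, 87),
    "3rd": (372, 119),
}

def pearson_chi_squared(rows):
    """Pearson's chi-squared statistic and degrees of freedom for an r x c
    table given as a list of row tuples (no continuity correction)."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = pearson_chi_squared(list(observed.values()))
print(f"X-squared = {stat:.2f}, df = {df}")
```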

## 
##  Pearson's product-moment correlation
## 
## data:  titanic_3$Survived and titanic_3$Pclass
## t = -10.725, df = 889, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3953692 -0.2790061
## sample estimates:
##       cor 
## -0.338481

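Comment: The Pearson correlation coefficient between Survived and Pclass is -0.338 (95% CI -0.395 to -0.279). This is a negative correlation: as the Pclass value increases from 1st toward 3rd class, the probability of survival decreases. The strength of this correlation is moderate. Survived is binary and Pclass is an ordinal code (1, 2, 3), so the coefficient is only a rough summary of this trend. The per-class survival rates in the table below show it more directly.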
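This sketch assumes Survived is coded 0 = No, 1 = Yes and Pclass is coded 1, 2, 3. That coding is an assumption about how the factors were converted before cor.test() was called. Under it, the coefficient depends only on the six class-by-survival counts. The Python sketch below recomputes r from those counts. It also computes the t statistic that cor.test() reports, t = r·√((n − 2) / (1 − r²)).

```python
import math

# Passenger counts keyed by (Pclass code, Survived code), from the Pclass
# summary table in this section. Survived: 0 = No, 1 = Yes (assumed coding).
counts = {
    (1, 0): 80, (2, 0): 97, (3, 0): 372,
    (1, 1): 136, (2, 1): 87, (3, 1): 119,
}

def pearson_r_from_counts(cells):
    """Pearson correlation for paired values given as {(x, y): frequency}."""
    n = sum(cells.values())
    mean_x = sum(x * f for (x, _), f in cells.items()) / n
    mean_y = sum(y * f for (_, y), f in cells.items()) / n
    sxy = sum((x - mean_x) * (y - mean_y) * f for (x, y), f in cells.items())
    sxx = sum((x - mean_x) ** 2 * f for (x, _), f in cells.items())
    syy = sum((y - mean_y) ** 2 * f for (_, y), f in cells.items())
    return sxy / math.sqrt(sxx * syy), n

r, n = pearson_r_from_counts(counts)
# t statistic for H0: correlation = 0, with n - 2 degrees of freedom.
t = r * math.sqrt((n - 2) / (1 - r ** 2))
print(f"r = {r:.4f}, t = {t:.3f}, df = {n - 2}")
```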

## # A tibble: 6 × 4
## # Groups:   Pclass [3]
##   Survived Pclass     n Survival_rate_pclass
##   <fct>    <fct>  <int>                <dbl>
## 1 No       1st       80                 37.0
## 2 No       2nd       97                 52.7
## 3 No       3rd      372                 75.8
## 4 Yes      1st      136                 63.0
## 5 Yes      2nd       87                 47.3
## 6 Yes      3rd      119                 24.2

Question 3

Analyze the age distribution of passengers on the Titanic. Compare the distribution for those who survived and those who did not using a boxplot.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.42   20.12   28.00   29.70   38.00   80.00
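summary() reports quartiles using R's default interpolation (quantile type 7). Comparing survivors with non-survivors amounts to computing the same summary once per group. The Python sketch below reproduces the summary() layout. statistics.quantiles(..., method="inclusive") uses the same linear interpolation as R's type 7. The ages are made-up illustration values, not the Titanic data.

```python
import statistics

def r_style_summary(values):
    """Min, quartiles, mean and max in the layout of R's summary().

    statistics.quantiles(method="inclusive") interpolates the same way
    as R's default quantile type 7.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

# Hypothetical ages for illustration only; not the Titanic data.
ages = [2, 15, 22, 24, 28, 31, 35, 40, 54, 71]
summary = r_style_summary(ages)
print(summary)
```

Calling r_style_summary() separately on the ages of survivors and of non-survivors gives the numbers behind the comparison this question asks for.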

Question 4

Analyze how family size (number of siblings, spouses, parents, or children aboard) affects the likelihood of survival. Visualize the relationship between family size and survival rate using an appropriate plot.

## 
##  Pearson's Chi-squared test
## 
## data:  table(titanic_4$Family_size, titanic_4$Survived)
## X-squared = 1.7778, df = 8, p-value = 0.9871

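Comment:

Null Hypothesis (H0): There is no association between family size and the likelihood of survival.

Alternative Hypothesis (H1): There is an association between family size and the likelihood of survival.

The p-value (0.9871) is much larger than the significance level (0.05), so we fail to reject the null hypothesis. This means the test found no evidence of an association. It does not show that family size has no effect on survival.

The statistic itself also needs checking. X-squared = 1.7778 on df = 8 is exactly the value produced by a 9 × 2 table whose cells are all 0 or 1. That is what table() returns when titanic_4 holds one row per family-size/survival group instead of one row per passenger. If titanic_4 was built that way, the test counts groups rather than passengers. It should then be rerun on the passenger-level data before drawing any conclusion about family size.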
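A chi-squared statistic of 1.7778 on 8 degrees of freedom is unusually small for the full passenger list. One table that reproduces it exactly is a 9 × 2 table in which every cell is 0 or 1. In that table, seven family sizes have both outcomes and two have only the same single outcome. This is what table() yields on a data frame that already has one row per family-size/survival group. The Python sketch below assumes that layout; it is a reconstruction, not the original data. It recomputes the statistic and its p-value. For even degrees of freedom the chi-squared upper tail has the closed form e^(−x/2) · Σ_{k<df/2} (x/2)^k / k!.

```python
import math

def pearson_chi_squared(rows):
    """Pearson's chi-squared statistic and df for an r x c table (no correction)."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(col_totals) - 1)

def chi_squared_sf_even_df(x, df):
    """Upper-tail p-value of a chi-squared variable; closed form for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Hypothetical reconstruction: one row per family size, columns (No, Yes).
# A cell is 1 if that family-size/outcome group exists in the summarized
# data frame, else 0; this is a count of groups, not of passengers.
rows = [(1, 1)] * 7 + [(1, 0)] * 2

stat, df = pearson_chi_squared(rows)
p = chi_squared_sf_even_df(stat, df)
print(f"X-squared = {stat:.4f}, df = {df}, p-value = {p:.4f}")
```

Running the same function on a table of passenger counts per family size and outcome would give the test the question actually calls for.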

Question 5

Visualize the age distribution of passengers based on their class (Pclass) and gender (Sex) using a facet grid plot.

Question 6

Group passengers into age categories (children, adults, elderly) and calculate the survival rate for each age group. Visualize the survival rate using a bar plot. Hint: you may need to create a new column.

## # A tibble: 6 × 4
## # Groups:   Age_category [3]
##   Age_category Survived     n Survival_rt
##   <fct>        <fct>    <int>       <dbl>
## 1 Children     No          52        46.0
## 2 Children     Yes         61        54.0
## 3 Adult        No         353        61.4
## 4 Adult        Yes        222        38.6
## 5 Elderly      No          19        73.1
## 6 Elderly      Yes          7        26.9

Question 7

Perform a chi-square test to determine if there is a significant association between gender (Sex) and survival.

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(titanic_3$Sex, titanic_3$Survived)
## X-squared = 260.72, df = 1, p-value < 2.2e-16

Null Hypothesis (H0): There is no association between sex and survival.

Alternative Hypothesis (H1): There is an association between sex and survival.

The p-value (< 2.2e-16) is far below the conventional significance level of 0.05, so we reject the null hypothesis. Survival and sex are not independent: survival on the Titanic was significantly associated with sex. 74.2% of female passengers survived, compared with 18.9% of male passengers (Question 1).
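For a 2 × 2 table, chisq.test() applies Yates' continuity correction by default. It subtracts 0.5 from each |observed − expected| before squaring, which is why the output above says "with Yates' continuity correction". The Python sketch below reproduces the statistic from the sex-by-survival counts in the Question 1 table. It follows R in capping the 0.5 at the smallest |observed − expected|, a cap that has no effect for these counts.

```python
def chi_squared_yates(table):
    """Pearson chi-squared for a 2 x 2 table with Yates' continuity correction.

    Mirrors chisq.test()'s default for 2 x 2 tables, where the 0.5 correction
    is capped at the smallest |observed - expected|.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    cells = [(i, j) for i in range(2) for j in range(2)]
    yates = min(0.5, min(abs(table[i][j] - expected[i][j]) for i, j in cells))
    return sum(
        (abs(table[i][j] - expected[i][j]) - yates) ** 2 / expected[i][j]
        for i, j in cells
    )

# Rows: female, male; columns: No, Yes (counts from the Question 1 table).
stat = chi_squared_yates([[81, 233], [468, 109]])
print(f"X-squared = {stat:.2f}, df = 1")
```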

Question 8

Analyze the correlation between fare and survival. Use scatter plots to visualize the relationship and calculate correlation coefficients. Hint: use cor.test() to test the significance of the relationship.

## 
##  Pearson's product-moment correlation
## 
## data:  titanic_3$Fare and titanic_3$Survived
## t = 7.9392, df = 889, p-value = 6.12e-15
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1949232 0.3176165
## sample estimates:
##       cor 
## 0.2573065

Comment:

Null Hypothesis (H0): There is no linear correlation between fare and survival on the Titanic.

Alternative Hypothesis (H1): There is a linear correlation between fare and survival on the Titanic.

The p-value (6.12e-15) is far below 0.05, so we reject the null hypothesis. There is strong evidence of a linear association between fare and survival. The positive correlation coefficient (0.2573, 95% CI 0.195 to 0.318) indicates that passengers who paid higher fares were more likely to survive.

However, the association is weak. r² ≈ 0.066, so fare accounts for only about 6.6% of the variance in survival. Higher fares largely correspond to higher ticket classes, so part of this association likely reflects the class effect found in Question 2.
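The other numbers cor.test() prints follow from r and n alone. The t statistic is t = r·√((n − 2) / (1 − r²)). The confidence interval is built on Fisher's z = atanh(r) with standard error 1/√(n − 3), then transformed back with tanh. Here is a Python sketch of that calculation, using r = 0.2573065 and n = 891 (df = 889, plus 2) from the output above.

```python
import math
from statistics import NormalDist

def cor_test_from_r(r, n, conf_level=0.95):
    """t statistic, df and Fisher-z confidence interval for a Pearson r,
    following the quantities cor.test() prints for a two-sided test."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    z = math.atanh(r)                       # Fisher transformation
    se = 1 / math.sqrt(n - 3)               # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + conf_level / 2)
    lower, upper = math.tanh(z - crit * se), math.tanh(z + crit * se)
    return t, df, (lower, upper)

# r and n taken from the cor.test() output above.
t, df, (lo, hi) = cor_test_from_r(0.2573065, 891)
print(f"t = {t:.4f}, df = {df}, 95% CI = ({lo:.4f}, {hi:.4f})")
print(f"r^2 = {0.2573065 ** 2:.3f}")
```

Squaring r gives the share of variance in survival that a linear fit on fare explains. This is the sense in which a highly significant correlation can still be weak.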