The Titanic dataset is a well-known dataset in the field of data analysis and statistics. It contains information about the passengers on the Titanic, including their class, age, sex, and whether they survived or not. In this article, we will explore the dataset using R, step by step.

Step 1: Loading the Dataset

To begin our analysis, we first need to load the Titanic dataset. In R, you can do this using the data() function as follows:

This command loads the Titanic dataset into your R environment.
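A minimal sketch of this step. The inspection calls after data() are additions for illustration; they show that the built-in object is a four-dimensional table of counts rather than a data frame. The name total_people is hypothetical.

```r
# Load the built-in Titanic dataset (part of R's datasets package)
data("Titanic")

# Inspect what was loaded: a 4-way contingency table of counts
class(Titanic)      # "table"
dimnames(Titanic)   # levels of Class, Sex, Age, Survived

# Total number of people recorded in the table
total_people <- sum(Titanic)
total_people
```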

Step 2: Understanding the Dataset

Before we dive into the analysis, it’s essential to understand the structure of the dataset. Titanic is stored as a four-dimensional table, so it is first converted to a data frame with as.data.frame(); the str() function then displays its structure:

## 'data.frame':    32 obs. of  5 variables:
##  $ Class   : Factor w/ 4 levels "1st","2nd","3rd",..: 1 2 3 4 1 2 3 4 1 2 ...
##  $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 2 2 2 2 1 1 ...
##  $ Age     : Factor w/ 2 levels "Child","Adult": 1 1 1 1 1 1 1 1 2 2 ...
##  $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Freq    : num  0 0 35 0 0 0 17 0 118 154 ...

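The output shows 32 rows and 5 variables: four factors (Class, Sex, Age, Survived) and a numeric Freq column. Each row is one combination of the four factors, and Freq is the number of people in that combination.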
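One way to obtain a data frame with this structure is sketched below. The variable name titanic_df is an assumption chosen for illustration; the original article's object name is not shown.

```r
# Load the built-in table and convert it to a long-format data frame
data("Titanic")
titanic_df <- as.data.frame(Titanic)  # hypothetical name

# One row per combination of Class, Sex, Age, and Survived
str(titanic_df)

# The Freq column holds the number of people in each combination
head(titanic_df)
```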

Step 3: Creating Contingency Tables

One of the fundamental steps in analyzing categorical data is to create contingency tables. These tables show the relationships between different variables. In our case, we are interested in the relationship between “Sex” and “Survived” on the Titanic. To do this, we can use the table() function as follows:

##         Survived
## Sex      No Yes
##   Male    8   8
##   Female  8   8

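This table does not yet show passenger counts. Each row of the data frame is one combination of Class, Sex, Age, and Survived, so table() counts 8 combinations per cell rather than people. The passenger counts are stored in the Freq column and have to be used as weights, for example with xtabs(Freq ~ Sex + Survived).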
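The difference between counting rows and counting people can be sketched as follows. The names titanic_df, row_counts, and sex_survival are hypothetical.

```r
data("Titanic")
titanic_df <- as.data.frame(Titanic)  # hypothetical name

# Unweighted: counts rows (category combinations), 8 per cell
row_counts <- table(titanic_df[, c("Sex", "Survived")])
row_counts

# Weighted by Freq: counts people in each Sex/Survived cell
sex_survival <- xtabs(Freq ~ Sex + Survived, data = titanic_df)
sex_survival
```

The same weighted table can also be obtained directly from the original table object with margin.table(Titanic, c(2, 4)).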

Step 4: Adding Margins to the Contingency Table

To further analyze the data, we can add margins to the contingency table to see the totals. The addmargins() function helps us achieve this:

##         Survived
## Sex      No Yes Sum
##   Male    8   8  16
##   Female  8   8  16
##   Sum    16  16  32

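This table includes row and column totals. Note that the grand total of 32 is the number of rows in the data frame (category combinations), not the number of people aboard; totals for people require counts weighted by the Freq column.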
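A sketch of addmargins() applied to Freq-weighted counts, so that the totals refer to people. The names sex_survival and with_totals are hypothetical.

```r
data("Titanic")

# Freq-weighted Sex by Survived table (counts people, not rows)
sex_survival <- xtabs(Freq ~ Sex + Survived, data = as.data.frame(Titanic))

# Append a "Sum" row and a "Sum" column
with_totals <- addmargins(sex_survival)
with_totals
```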

Step 5: Calculating Proportions

In statistics, it’s often valuable to work with proportions rather than raw counts. To do this, we can use the prop.table() function:

##         Survived
## Sex        No  Yes
##   Male   0.25 0.25
##   Female 0.25 0.25

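By default, prop.table() divides each cell by the grand total, so each of the four cells here is 0.25. Because the underlying table counts rows rather than people, these values are not survival proportions; they only become meaningful once the table is built from the Freq-weighted counts.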
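A sketch of proportions computed from Freq-weighted counts. The margin argument controls the denominator: omitted, each cell is divided by the grand total; with margin = 1, each row sums to 1, which gives survival rates within each sex. The variable names are hypothetical.

```r
data("Titanic")
sex_survival <- xtabs(Freq ~ Sex + Survived, data = as.data.frame(Titanic))

# Share of all people aboard falling in each cell
overall_props <- prop.table(sex_survival)
round(overall_props, 3)

# Proportions within each sex (each row sums to 1)
props_by_sex <- prop.table(sex_survival, margin = 1)
round(props_by_sex, 3)
```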

Step 6: Creating a Mosaic Plot

Visualizing the data is crucial for a better understanding. We can create a mosaic plot using the plot() function:

This mosaic plot provides a graphical representation of the relationship between gender and survival on the Titanic.
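A minimal sketch of such a plot, assuming the Freq-weighted table is used so that tile areas reflect numbers of people. Calling plot() on a two-way table draws a mosaic plot; mosaicplot() is called directly here to make that explicit. The title text and variable name are illustrative choices.

```r
data("Titanic")
sex_survival <- xtabs(Freq ~ Sex + Survived, data = as.data.frame(Titanic))

# Tile area is proportional to the number of people in each cell
mosaicplot(sex_survival,
           main = "Survival by sex on the Titanic",
           color = TRUE)
```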

Step 7: Analyzing Class, Age, and Survival

Our analysis doesn’t have to stop at gender and survival. We can create a more complex contingency table involving “Class,” “Sex,” “Age,” and “Survived.” This can be done using the ftable() function:

##                    Survived No Yes
## Class Sex    Age                  
## 1st   Male   Child           1   1
##              Adult           1   1
##       Female Child           1   1
##              Adult           1   1
## 2nd   Male   Child           1   1
##              Adult           1   1
##       Female Child           1   1
##              Adult           1   1
## 3rd   Male   Child           1   1
##              Adult           1   1
##       Female Child           1   1
##              Adult           1   1
## Crew  Male   Child           1   1
##              Adult           1   1
##       Female Child           1   1
##              Adult           1   1

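Every cell here is 1 because each combination of “Class,” “Sex,” “Age,” and “Survived” appears exactly once as a row in the data frame. To explore how these factors relate to survival, the cells need to hold the Freq counts instead of row counts.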
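A sketch of a flat table that holds counts of people. It assumes ftable() is applied to the original four-dimensional Titanic table, which already stores the counts; the row.vars and col.vars arguments are spelled out to reproduce the layout above. The name class_sex_age is hypothetical.

```r
data("Titanic")

# Flatten the 4-way table: Class, Sex, Age as rows; Survived as columns
class_sex_age <- ftable(Titanic,
                        row.vars = c("Class", "Sex", "Age"),
                        col.vars = "Survived")
class_sex_age
```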

Step 8: Performing a Chi-Square Test

Statistical tests are essential for drawing meaningful conclusions. To test the independence of “Sex” and “Survived,” we can use the chi-square test:

## 
##  Pearson's Chi-squared test
## 
## data:  table(Titanic[, c("Sex", "Survived")])
## X-squared = 0, df = 1, p-value = 1

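This test assesses whether “Sex” and “Survived” are independent. The result shown (X-squared = 0, p-value = 1) comes from the unweighted table in which every cell is 8, so it says nothing about the people aboard; the test is only informative when run on the Freq-weighted counts.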
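A sketch of the same test run on Freq-weighted counts. For a 2x2 table, chisq.test() applies Yates' continuity correction by default; passing correct = FALSE gives the uncorrected statistic. The variable names are hypothetical.

```r
data("Titanic")
sex_survival <- xtabs(Freq ~ Sex + Survived, data = as.data.frame(Titanic))

# Test of independence between Sex and Survived on counts of people
chi_result <- chisq.test(sex_survival)
chi_result

# Expected counts under independence, for comparison with the observed table
chi_result$expected
```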

Step 9: Calculating Correlation

Finally, we can calculate the correlation between “Sex” and “Survived” using the cor() function:

##     No Yes
## No   1  NA
## Yes NA   1

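The output above is not a measure of association between “Sex” and “Survived.” The No/Yes labels indicate that cor() was applied to the contingency table, in which case it correlates the table’s columns across its rows; because both columns are constant here, the off-diagonal entries are NA. For two categorical variables, a measure such as the phi coefficient or Cramér’s V is more appropriate than a correlation of table columns.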
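One possible alternative, sketched under the assumption that a single association measure for two binary variables is wanted: expand the aggregated rows into one 0/1 indicator per person and apply cor(). For two binary indicators the Pearson correlation equals the phi coefficient. This swaps in a different technique from the output above, and the variable names are hypothetical.

```r
data("Titanic")
titanic_df <- as.data.frame(Titanic)  # hypothetical name

# Expand the aggregated rows to one 0/1 indicator per person
is_female <- rep(as.integer(titanic_df$Sex == "Female"),
                 times = titanic_df$Freq)
survived  <- rep(as.integer(titanic_df$Survived == "Yes"),
                 times = titanic_df$Freq)

# Pearson correlation of two 0/1 indicators is the phi coefficient
phi <- cor(is_female, survived)
phi
```

A positive value here means that being female is associated with surviving; the magnitude ranges from 0 (no association) to 1 (perfect association).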

In conclusion, the Titanic dataset provides an excellent opportunity for practicing data analysis techniques in R. By following these steps, you can explore the relationships between variables, perform statistical tests, and gain valuable insights into the factors that influenced survival on the Titanic. Happy analyzing!