The Titanic dataset is a well-known dataset in data analysis and statistics. It records the people aboard the Titanic (passengers and crew) by class, sex, age group, and whether they survived. In this article, we will explore the dataset using R, step by step.
To begin our analysis, we first need to load the Titanic dataset. In R, you can do this using the data() function as follows:
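This command loads the Titanic dataset into your R environment. The built-in object is a four-dimensional table of counts; the output in the rest of this article reflects it after conversion to a data frame with one row per combination of categories.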
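A minimal sketch of the loading step. The conversion with as.data.frame() is an assumption inferred from the str() output shown next, and the name titanic_df is hypothetical (the chi-square output later in the article suggests the original code reused the name Titanic).

```r
# Load the built-in dataset; Titanic is a 4-dimensional table of counts
data(Titanic)

# Assumption: convert to a data frame with one row per
# Class/Sex/Age/Survived combination and the head count in Freq
titanic_df <- as.data.frame(Titanic)

head(titanic_df)
```

Keeping the data frame under a separate name leaves the original table available for functions that work on tables directly.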
Before we dive into the analysis, it’s essential to understand the structure of the dataset. We can use the str() function to display the structure of the dataset:
## 'data.frame': 32 obs. of 5 variables:
## $ Class : Factor w/ 4 levels "1st","2nd","3rd",..: 1 2 3 4 1 2 3 4 1 2 ...
## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 2 2 2 2 1 1 ...
## $ Age : Factor w/ 2 levels "Child","Adult": 1 1 1 1 1 1 1 1 2 2 ...
## $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq : num 0 0 35 0 0 0 17 0 118 154 ...
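The output shows four factor columns (Class, Sex, Age, Survived) and a numeric Freq column. Each of the 32 rows is one combination of the four factors, and Freq holds the number of people in that combination.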
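A short sketch of the inspection step, assuming the data frame is built with as.data.frame() and stored under the hypothetical name titanic_df:

```r
# Assumption: data frame version of the built-in table
titanic_df <- as.data.frame(datasets::Titanic)

# Variable names, types, and the first few values
str(titanic_df)

# One row per category combination; the people are counted in Freq
n_rows   <- nrow(titanic_df)
n_people <- sum(titanic_df$Freq)
n_rows
n_people
```

Comparing the number of rows with the sum of Freq is a quick way to see that the rows are aggregated counts rather than individual people.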
One of the fundamental steps in analyzing categorical data is to create contingency tables. These tables show the relationships between different variables. In our case, we are interested in the relationship between “Sex” and “Survived” on the Titanic. To do this, we can use the table() function as follows:
##         Survived
## Sex      No Yes
##   Male    8   8
##   Female  8   8
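Note that every cell is 8. Called on the data frame, table() counts rows (each Sex and Survived pair occurs in 8 of the 32 category combinations), not people. The number of people in each combination is stored in the Freq column, so a table of actual survivors and non-survivors by gender has to weight each row by Freq, for example with xtabs().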
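The difference between counting rows and counting people can be sketched as follows. The variable names are hypothetical, and the xtabs() formula is one possible way to apply the Freq weights, not necessarily what the original code did.

```r
# Assumption: data frame version of the built-in table
titanic_df <- as.data.frame(datasets::Titanic)

# Counts rows of the data frame: 8 in every cell
row_counts <- table(titanic_df[, c("Sex", "Survived")])
row_counts

# Counts people: each row is weighted by its Freq value
sex_survived <- xtabs(Freq ~ Sex + Survived, data = titanic_df)
sex_survived
```

The same people-level table can also be obtained from the original four-dimensional table with margin.table(Titanic, c(2, 4)).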
To further analyze the data, we can add margins to the contingency table to see the totals. The addmargins() function helps us achieve this:
##         Survived
## Sex      No Yes Sum
##   Male    8   8  16
##   Female  8   8  16
##   Sum    16  16  32
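This table includes row and column totals, providing a more comprehensive view of the data. Because the underlying table counts rows of the data frame rather than people, the totals are 16 rows per sex and 32 rows overall.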
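A sketch of the same step applied to people rather than rows, assuming a Freq-weighted table built with xtabs() (the names sex_survived and with_totals are hypothetical):

```r
# Assumption: Freq-weighted Sex x Survived table
sex_survived <- xtabs(Freq ~ Sex + Survived,
                      data = as.data.frame(datasets::Titanic))

# Append a "Sum" row and a "Sum" column
with_totals <- addmargins(sex_survived)
with_totals
```

With the weighted table, the bottom-right cell is the total number of people in the dataset.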
In statistics, it’s often valuable to work with proportions rather than raw counts. To do this, we can use the prop.table() function:
##         Survived
## Sex        No  Yes
##   Male   0.25 0.25
##   Female 0.25 0.25
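Without a margin argument, prop.table() divides each cell by the grand total, so the four cells sum to 1. Each cell is 0.25 here because the four row counts are equal; these are shares of data frame rows, not of people. Passing margin = 1 instead gives proportions within each gender.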
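A sketch of both kinds of proportion on people-level counts, assuming a Freq-weighted table from xtabs() (variable names are hypothetical):

```r
# Assumption: Freq-weighted Sex x Survived table
sex_survived <- xtabs(Freq ~ Sex + Survived,
                      data = as.data.frame(datasets::Titanic))

# Share of all people falling in each cell (cells sum to 1)
overall <- prop.table(sex_survived)

# Share within each sex (each row sums to 1), i.e. survival rates by sex
by_sex <- prop.table(sex_survived, margin = 1)

round(overall, 3)
round(by_sex, 3)
```

The row-wise version is usually the one of interest for a question like "what fraction of women survived?", since it conditions on gender.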
Visualizing the data is crucial for a better understanding. We can create a mosaic plot using the plot() function:
This mosaic plot provides a graphical representation of the relationship between gender and survival on the Titanic.
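A minimal plotting sketch, assuming the Freq-weighted table from xtabs() so that tile areas reflect numbers of people; the plot title is illustrative:

```r
# Assumption: Freq-weighted Sex x Survived table
sex_survived <- xtabs(Freq ~ Sex + Survived,
                      data = as.data.frame(datasets::Titanic))

# plot() on a two-way table draws a mosaic plot;
# mosaicplot(sex_survived) is the direct equivalent
plot(sex_survived, main = "Survival by sex on the Titanic")
```

In a mosaic plot the width of each column reflects the size of the group and the height of each tile reflects the share within that group.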
Our analysis doesn’t have to stop at gender and survival. We can create a more complex contingency table involving “Class,” “Sex,” “Age,” and “Survived.” This can be done using the ftable() function:
##                    Survived No Yes
## Class Sex    Age
## 1st   Male   Child           1   1
##              Adult           1   1
##       Female Child           1   1
##              Adult           1   1
## 2nd   Male   Child           1   1
##              Adult           1   1
##       Female Child           1   1
##              Adult           1   1
## 3rd   Male   Child           1   1
##              Adult           1   1
##       Female Child           1   1
##              Adult           1   1
## Crew  Male   Child           1   1
##              Adult           1   1
##       Female Child           1   1
##              Adult           1   1
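This table lays out every combination of the four factors. Every cell is 1 because each combination appears exactly once as a row of the data frame; to explore how these factors relate to survival, the cells need to hold the Freq counts instead.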
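One way to get a flat table of people rather than rows is to call ftable() on the original four-dimensional table, which already stores the counts. This is a sketch; the row.vars choice mirrors the layout above and the name people_flat is hypothetical.

```r
# The built-in table (not the data frame) already holds the head counts
people_flat <- ftable(datasets::Titanic,
                      row.vars = c("Class", "Sex", "Age"))
people_flat
```

The remaining variable, Survived, becomes the column variable, giving 16 rows and 2 columns as in the layout above.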
Statistical tests are essential for drawing meaningful conclusions. To test the independence of “Sex” and “Survived,” we can use the chi-square test:
##
## Pearson's Chi-squared test
##
## data: table(Titanic[, c("Sex", "Survived")])
## X-squared = 0, df = 1, p-value = 1
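This test assesses whether the variables “Sex” and “Survived” are related or independent. Here, however, the table passed to the test contains the equal row counts (8 in every cell), so the statistic is 0 and the p-value is 1. That result reflects the layout of the data frame, not the people on board; the test needs the Freq-weighted counts to say anything about survival.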
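A sketch of the test on people-level counts, assuming a Freq-weighted table built with xtabs() (variable names are hypothetical):

```r
# Assumption: Freq-weighted Sex x Survived table
sex_survived <- xtabs(Freq ~ Sex + Survived,
                      data = as.data.frame(datasets::Titanic))

# Pearson chi-square test of independence
# (Yates' continuity correction is applied by default for 2x2 tables)
test_result <- chisq.test(sex_survived)
test_result

# Counts expected under independence, for comparison with the observed table
test_result$expected
```

A small p-value here is evidence against independence of gender and survival; comparing observed and expected counts shows in which direction the cells deviate.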
Finally, we can calculate the correlation between “Sex” and “Survived” using the cor() function:
##     No Yes
## No   1  NA
## Yes NA   1
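This output is not a measure of the relationship between the two variables. Applied to a contingency table, cor() treats it as a numeric matrix and correlates its “No” and “Yes” columns; both columns are constant here, so the off-diagonal entries are NA. For two binary factors, a measure such as the phi coefficient, computed from the Freq-weighted table, gives the strength of the association.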
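A sketch of one suitable alternative, the phi coefficient, which is not part of the original code. It is computed here in two ways that should agree: from the uncorrected chi-square statistic, and as the Pearson correlation of 0/1 codes after expanding the data to one row per person. All variable names are hypothetical.

```r
# Assumption: data frame version of the built-in table
titanic_df <- as.data.frame(datasets::Titanic)
sex_survived <- xtabs(Freq ~ Sex + Survived, data = titanic_df)

# Phi for a 2x2 table: sqrt(chi-square / n), without continuity correction
chi_sq <- chisq.test(sex_survived, correct = FALSE)$statistic
phi <- sqrt(unname(chi_sq) / sum(sex_survived))

# Same quantity via cor(): repeat each row Freq times, then correlate 0/1 codes
expanded <- titanic_df[rep(seq_len(nrow(titanic_df)), titanic_df$Freq), ]
r <- cor(as.numeric(expanded$Sex == "Female"),
         as.numeric(expanded$Survived == "Yes"))

phi
r
```

The cor() version also carries a sign: with these codes, a positive value means being female is associated with surviving.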
In conclusion, the Titanic dataset provides an excellent opportunity for practicing data analysis techniques in R. By following these steps, you can explore the relationships between variables, perform statistical tests, and gain valuable insights into the factors that influenced survival on the Titanic. Happy analyzing!