This report presents a basic exploratory data analysis (EDA) of the Titanic dataset using the R programming language. The analysis aims to uncover key patterns in passenger demographics and survival outcomes aboard the RMS Titanic. Bar plots, histograms, boxplots, and a scatterplot are used to summarize individual variables and to explore relationships among age, sex, passenger class, fare, and survival.
The dataset used in this analysis contains information on Titanic passengers: whether they survived, their age, sex, ticket class, and the fare they paid. Through EDA, we aim to identify trends and differences across passenger groups.
This work is part of an academic exercise conducted by an undergraduate student in the Data Science program at Sepuluh Nopember Institute of Technology (ITS), aiming to apply and strengthen foundational skills in data wrangling, visualization, and statistical interpretation using R.
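# Packages: dplyr for data wrangling, ggplot2 for plotting
library(dplyr)
library(ggplot2)

# Load the passenger data (this path is specific to the author's machine; adjust it to where the CSV is stored)
titanic <- read.csv("C:/!KULIAH/COLLEGE/titanic2.csv.csv")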
# Bar chart of passenger counts by sex
ggplot(titanic, aes(x = Sex)) +
  geom_bar(fill = "pink") +
  labs(title = "Gender Distribution of Titanic Passengers",
       x = "Sex", y = "Number of passengers")
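A bar chart is easier to read alongside the counts it encodes. The sketch below tabulates counts and shares by sex. Because the CSV loaded above is not bundled with this report, it runs on R's built-in `Titanic` table from the datasets package. That table aggregates 2,201 people *including crew*, so its numbers are not those of the CSV. With the report's own data, `table(titanic$Sex)` gives the corresponding counts.

```r
# Numeric counterpart to the gender bar chart, sketched on R's built-in
# `Titanic` table (datasets package). Caveat: it covers 2,201 people
# including crew, so these are NOT the counts of the CSV used in the report.
titanic_df <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq

# Weighted count: sum Freq within each level of Sex
sex_counts <- xtabs(Freq ~ Sex, data = titanic_df)
print(sex_counts)

# Share of each sex, rounded to three decimals
print(round(prop.table(sex_counts), 3))
```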
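# Stacked bars rescaled to 1: the share of survivors within each passenger class
ggplot(titanic, aes(x = factor(Pclass), fill = factor(Survived))) +
  geom_bar(position = "fill") +
  labs(title = "Survival Proportions by Passenger Class",
       x = "Passenger class", y = "Proportion",
       fill = "Survived (0 = no, 1 = yes)")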
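With `position = "fill"`, each bar shows the fraction of survivors within a class, not a raw count. The sketch below computes those fractions explicitly with dplyr. It again uses R's built-in `Titanic` table as a stand-in dataset, which includes a "Crew" class absent from the passenger CSV. For the report's data, assuming `Survived` is coded 0/1, the equivalent would be `titanic %>% group_by(Pclass) %>% summarise(prop_survived = mean(Survived))`.

```r
library(dplyr)

# Survival proportion per class: the height of the "survived" segment in
# each bar of a geom_bar(position = "fill") chart. Uses R's built-in
# `Titanic` table as a stand-in (it has an extra "Crew" class).
class_survival <- as.data.frame(Titanic) %>%
  group_by(Class) %>%
  summarise(
    total    = sum(Freq),                     # everyone in this class
    survived = sum(Freq[Survived == "Yes"]),  # survivors in this class
    .groups  = "drop"
  ) %>%
  mutate(prop_survived = survived / total)    # fraction that survived

print(class_survival)
```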
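# Histogram of passenger ages; ggplot2 drops rows with missing Age (if any) and warns
ggplot(titanic, aes(x = Age)) +
  geom_histogram(bins = 20, fill = "purple") +
  labs(title = "Age Distribution of Titanic Passengers",
       x = "Age (years)", y = "Number of passengers")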
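The histogram only reflects passengers whose age is recorded. ggplot2 silently excludes missing values apart from a "Removed ... rows" warning, so it is worth reporting how many are missing before interpreting the shape. Below is a minimal sketch. `count_missing` is a hypothetical helper, and `toy` is a made-up data frame used only to exercise it. With the report's data you would call `count_missing(titanic)`.

```r
# Hypothetical helper: number of missing values in each column
count_missing <- function(df) {
  colSums(is.na(df))  # is.na() on a data frame returns a logical matrix
}

# Made-up toy data, purely for illustration
toy <- data.frame(
  Age  = c(20, NA, 30, 40, NA),
  Fare = c(10, 20, 30, 40, 50)
)

na_counts <- count_missing(toy)
print(na_counts)

# Rows with a known Age: the subset an Age histogram actually uses
toy_age_known <- toy[!is.na(toy$Age), , drop = FALSE]
print(nrow(toy_age_known))
```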
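# Age distribution for non-survivors (0) and survivors (1)
ggplot(titanic, aes(x = factor(Survived), y = Age, fill = factor(Survived))) +
  geom_boxplot() +
  labs(title = "Age Distribution by Survival Status",
       x = "Survived (0 = no, 1 = yes)", y = "Age (years)", fill = "Survived")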
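Each box summarizes a group by its median and quartiles, so reporting those numbers makes the comparison between survivors and non-survivors concrete. The sketch below computes them with dplyr. `age_by_survival` is a hypothetical helper, and `toy` is made-up data used only to run it. It assumes columns named `Survived` and `Age`, as in the report, so `age_by_survival(titanic)` would apply it to the real data. The quartiles use `stats::quantile()` defaults and should closely match the hinges ggplot2 draws.

```r
library(dplyr)

# Hypothetical helper: group size, quartiles and median of Age per survival group
age_by_survival <- function(df) {
  df %>%
    filter(!is.na(Age)) %>%   # the boxplot also ignores missing ages
    group_by(Survived) %>%
    summarise(
      n       = n(),
      q1      = quantile(Age, 0.25),
      median  = median(Age),
      q3      = quantile(Age, 0.75),
      .groups = "drop"
    )
}

# Made-up toy data, purely for illustration
toy <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(20, 30, 40, 10, 25, NA)
)

print(age_by_survival(toy))
```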
# Scatterplot of fare against age with a least-squares regression line
ggplot(titanic, aes(x = Age, y = Fare)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "blue") +
  labs(title = "Relationship Between Age and Ticket Fare",
       x = "Age (years)", y = "Fare")
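`geom_smooth(method = "lm")` draws an ordinary least-squares line of `Fare` on `Age`. The sketch below computes the same intercept and slope by hand and checks them against `lm()`, which makes the meaning of the plotted line explicit. `ols_line` is a hypothetical helper, and the `age`/`fare` vectors are made up for illustration. On the report's data you could call `ols_line(titanic$Age, titanic$Fare)` and, for the strength of the linear association, `cor(titanic$Age, titanic$Fare, use = "complete.obs")`.

```r
# Hypothetical helper: OLS intercept and slope for y ~ x, dropping
# any pair where x or y is missing (lm() does the same by default)
ols_line <- function(x, y) {
  keep <- complete.cases(x, y)
  x <- x[keep]
  y <- y[keep]
  slope <- cov(x, y) / var(x)             # slope = Cov(x, y) / Var(x)
  intercept <- mean(y) - slope * mean(x)  # line passes through (mean x, mean y)
  c(intercept = intercept, slope = slope)
}

# Made-up toy vectors, purely for illustration
age  <- c(10, 20, NA, 40, 50)
fare <- c(15, 20, 30, 35, 45)

by_hand <- ols_line(age, fare)
by_lm   <- coef(lm(fare ~ age))  # the fit geom_smooth(method = "lm") draws

print(by_hand)
print(by_lm)
```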