Introduction

This report presents a basic exploratory data analysis (EDA) of the Titanic dataset using the R programming language. The analysis aims to uncover key patterns related to passenger demographics and survival outcomes aboard the RMS Titanic. Bar plots, a histogram, a boxplot, and a scatterplot are used to summarize and explore the relationships between age, sex, passenger class, and fare.

The dataset used in this analysis contains information about Titanic passengers, including whether they survived, their age, sex, ticket class, and fare paid. Through EDA, we aim to identify trends and differences between passenger groups.

This work is part of an academic exercise conducted by an undergraduate student in the Data Science program at Sepuluh Nopember Institute of Technology (ITS), aiming to apply and strengthen foundational skills in data wrangling, visualization, and statistical interpretation using R.

Week9 VDE 250425

library(dplyr)
library(ggplot2)

Import the dataset

titanic <- read.csv("C:/!KULIAH/COLLEGE/titanic2.csv.csv")
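Before plotting, it can help to confirm that the file loaded with the columns the plots rely on and to see how many values are missing. The following is a minimal sketch, assuming the columns are named Survived, Pclass, Sex, Age, and Fare as in the plotting code; the helper `check_titanic` and the toy data frame are hypothetical and only illustrate the call.

```r
# Hypothetical helper: list expected columns that are absent and count
# missing values (NA) in the expected columns that are present.
check_titanic <- function(df,
                          needed = c("Survived", "Pclass", "Sex", "Age", "Fare")) {
  missing_cols <- setdiff(needed, names(df))
  present <- intersect(needed, names(df))
  na_counts <- colSums(is.na(df[, present, drop = FALSE]))
  list(missing_cols = missing_cols, na_counts = na_counts)
}

# Toy rows made up purely to show the call; not taken from the real file.
toy <- data.frame(
  Survived = c(0, 1, 1, 0),
  Pclass   = c(3, 1, 3, 2),
  Sex      = c("male", "female", "female", "male"),
  Age      = c(22, 38, NA, 35),
  Fare     = c(7.25, 71.28, 7.92, 13.00)
)

result <- check_titanic(toy)
print(result$missing_cols)  # expected columns that are absent
print(result$na_counts)     # NA count per expected column
```

On the real data the call would be `check_titanic(titanic)`. A non-zero count for Age would mean that the age-based plots silently leave those passengers out.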

1. Gender comparison

ggplot(titanic, aes(x=Sex)) +
  geom_bar(fill='pink') +
  labs(title = "Gender Distribution of Titanic Passengers")

2. Comparison of survivors and non-survivors

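ggplot(titanic, aes(x=factor(Pclass), fill = factor(Survived))) +
  geom_bar(position = "fill") +  # scale each class to 1 so bars show proportions
  labs(title = "Survival Proportions by Passenger Class",
       x = "Passenger class", y = "Proportion", fill = "Survived")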
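The stacked bars show proportions only visually, so a small table of the same numbers can make the comparison easier to read. This is a sketch in base R, assuming Pclass and Survived are coded as in the plot; `survival_by_class` is a hypothetical helper and the toy rows are made up for illustration.

```r
# Hypothetical helper: share of each Survived value within each passenger class.
survival_by_class <- function(df) {
  counts <- table(Pclass = df$Pclass, Survived = df$Survived)
  prop.table(counts, margin = 1)  # margin = 1: each row (class) sums to 1
}

# Toy rows made up purely to show the call; not taken from the real file.
toy <- data.frame(
  Pclass   = c(1, 1, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 1, 0, 0, 0, 0, 1)
)

props <- survival_by_class(toy)
print(round(props, 2))
```

With the real data, `survival_by_class(titanic)` would return the values behind the bar heights in the plot above.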

3. Age distribution of passengers

ggplot(titanic, aes(x=Age)) + 
  geom_histogram(bins=20, fill='purple') + 
  labs(title = "Age Distribution of Titanic Passengers")

4. Boxplot of age by survival status

ggplot(titanic, aes(x=factor(Survived), y=Age, fill=factor(Survived))) +
  geom_boxplot() +
  labs(title = "Age by Survival Status", x = "Survived", fill = "Survived")
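The box edges and the middle line of each box correspond to the quartiles of Age within each survival group. As a sketch, those values can be computed directly with `quantile`; the helper `age_quartiles_by_survival` is hypothetical and the toy rows are made up for illustration.

```r
# Hypothetical helper: 25%, 50% and 75% quantiles of Age per Survived group.
# Missing ages are ignored via na.rm = TRUE.
age_quartiles_by_survival <- function(df) {
  sapply(split(df$Age, df$Survived), quantile,
         probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
}

# Toy rows made up purely to show the call; not taken from the real file.
toy <- data.frame(
  Survived = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Age      = c(20, 30, 40, 50, NA, 4, 18, 26, 35, 60)
)

q <- age_quartiles_by_survival(toy)
print(q)  # rows are quantiles, columns are the Survived groups
```

With the real data, `age_quartiles_by_survival(titanic)` would give one column per survival group, which makes it easier to say whether the two medians actually differ.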

5. Scatterplot of age and fare

ggplot(titanic, aes(x=Age, y=Fare)) +
  geom_point() +
  geom_smooth(method="lm", color='blue') +
  labs(title="Relationship Between Age and Ticket Fare")
## `geom_smooth()` using formula = 'y ~ x'
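The message above indicates that the smoothing line is a linear fit of y on x, here Fare on Age. To read off the slope and the strength of the relationship rather than judging them by eye, the same model can be fitted with `lm`. This is a minimal sketch; `fit_fare_on_age` is a hypothetical helper and the toy rows are made up for illustration.

```r
# Hypothetical helper: fit Fare ~ Age and return the intercept, the slope,
# and the Pearson correlation computed on complete rows only.
fit_fare_on_age <- function(df) {
  fit <- lm(Fare ~ Age, data = df)  # rows with NA are dropped by default
  list(
    intercept   = coef(fit)[["(Intercept)"]],
    slope       = coef(fit)[["Age"]],
    correlation = cor(df$Age, df$Fare, use = "complete.obs")
  )
}

# Toy rows made up purely to show the call; not taken from the real file.
toy <- data.frame(
  Age  = c(10, 20, 30, 40, NA),
  Fare = c(12, 18, 35, 39, 50)
)

res <- fit_fare_on_age(toy)
print(res)
```

With the real data, `fit_fare_on_age(titanic)` would report the slope of the blue line in the scatterplot; a correlation close to zero would suggest that age explains little of the variation in fare.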