rm(list = ls())
gc()
##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 537935 28.8    1193727 63.8   686460 36.7
## Vcells 979941  7.5    8388608 64.0  1876069 14.4
knitr::opts_chunk$set(error = TRUE)
library(readr)
## Warning: package 'readr' was built under R version 4.4.2
d_csv <- read_csv("C:/DATA 712/titanic_data.csv", col_names = TRUE)
## Rows: 891 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Name, Sex, Ticket, Cabin, Embarked
## dbl (7): PassengerId, Survived, Pclass, Age, SibSp, Parch, Fare
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(d_csv)
## # A tibble: 6 × 12
##   PassengerId Survived Pclass Name    Sex     Age SibSp Parch Ticket  Fare Cabin
##         <dbl>    <dbl>  <dbl> <chr>   <chr> <dbl> <dbl> <dbl> <chr>  <dbl> <chr>
## 1           1        0      3 Braund… male     22     1     0 A/5 2…  7.25 <NA>
## 2           2        1      1 Cuming… fema…    38     1     0 PC 17… 71.3  C85
## 3           3        1      3 Heikki… fema…    26     0     0 STON/…  7.92 <NA>
## 4           4        1      1 Futrel… fema…    35     1     0 113803 53.1  C123
## 5           5        0      3 Allen,… male     35     0     0 373450  8.05 <NA>
## 6           6        0      3 Moran,… male     NA     0     0 330877  8.46 <NA>
## # ℹ 1 more variable: Embarked <chr>
# Install tidyverse only if it is missing. The CRAN mirror is passed explicitly
# because chooseCRANmirror() cannot prompt when the document is knitted.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org/")
}
# dplyr is not attached at this point, so %>% is unavailable here;
# use the native pipe and dplyr:: prefixes instead.
d_csv |>
  dplyr::group_by(Sex, Pclass) |>
  dplyr::summarise(avg_fare = mean(Fare, na.rm = TRUE), .groups = "drop") |>
  dplyr::arrange(dplyr::desc(avg_fare))
# readr is already loaded, so it does not need reinstalling.
# Install dplyr only if it is missing, passing the CRAN mirror explicitly.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org/")
}
library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
d_csv %>%
  group_by(Sex, Pclass) %>%
  summarise(survival_rate = mean(Survived, na.rm = TRUE)) %>%
  arrange(desc(survival_rate))
## `summarise()` has grouped output by 'Sex'. You can override using the `.groups`
## argument.
## # A tibble: 6 × 3
## # Groups:   Sex [2]
##   Sex    Pclass survival_rate
##   <chr>   <dbl>         <dbl>
## 1 female      1         0.968
## 2 female      2         0.921
## 3 female      3         0.5
## 4 male        1         0.369
## 5 male        2         0.157
## 6 male        3         0.135
In my analysis of the Titanic dataset, I found clear differences in ticket prices and survival rates by sex and passenger class. Average fares rose with class: first-class passengers (Pclass 1) paid more than those in second and third class. Within the same class, women generally paid higher fares than men. Because fare and class both reflect what a passenger could afford, they serve here as rough indicators of wealth, which may have shaped a passenger's experience aboard the Titanic.
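The fare comparison described above can be sketched as follows. This is a minimal example, assuming dplyr is installed. The `toy` data frame and its values are made up for illustration and are not taken from the Titanic data; with the real data, `toy` would be replaced by `d_csv`. The extra `n` column is an addition so that each average can be read alongside its group size.

```r
library(dplyr)

# Hypothetical toy data (NOT Titanic values), using the same column names as d_csv
toy <- data.frame(
  Sex    = c("female", "male", "female", "male", "female", "male"),
  Pclass = c(1, 1, 1, 1, 3, 3),
  Fare   = c(100, 60, 80, 40, 10, 8)
)

# Average fare and group size for every Sex x Pclass combination,
# sorted from the most to the least expensive group
avg_fare <- toy %>%
  group_by(Sex, Pclass) %>%
  summarise(
    avg_fare = mean(Fare, na.rm = TRUE),
    n        = n(),
    .groups  = "drop"   # return an ungrouped result and silence the grouping message
  ) %>%
  arrange(desc(avg_fare))

print(avg_fare)
```

Setting `.groups = "drop"` is a design choice: it avoids the "grouped output" message and hands back a plain table that can be sorted or filtered without leftover grouping.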
Survival rates showed equally clear patterns. Women survived at higher rates than men in every class: 96.8%, 92.1%, and 50.0% for women in first, second, and third class, against 36.9%, 15.7%, and 13.5% for men. Within each sex, survival fell as class fell, so first-class passengers had the best chance of survival. Even so, third-class women (50.0%) survived more often than first-class men (36.9%). This suggests that social and economic status may have played a critical role in survival, possibly through better cabin locations, earlier access to lifeboats, or preferential treatment during evacuation. These findings highlight the inequalities aboard and show how wealth and gender may have influenced survival outcomes.
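The sex-by-class comparison can also be laid out as a two-way table, which makes the gap between women and men within each class easy to read. The sketch below uses base R only, so no packages need to be loaded. The `toy` data frame is hypothetical and its values are not Titanic values; with the real data, `toy` would be replaced by `d_csv`.

```r
# Hypothetical toy data (NOT Titanic values), using the same column names as d_csv
toy <- data.frame(
  Sex      = c("female", "female", "male", "male",
               "female", "female", "male", "male"),
  Pclass   = c(1, 1, 1, 1, 3, 3, 3, 3),
  Survived = c(1, 1, 1, 0, 1, 0, 0, 0)
)

# The mean of a 0/1 indicator is the proportion of survivors in each cell;
# rows are Sex, columns are Pclass
rate <- tapply(toy$Survived, list(Sex = toy$Sex, Pclass = toy$Pclass), mean)

# Female minus male survival rate within each class
gap <- rate["female", ] - rate["male", ]

print(rate)
print(gap)
```

A positive entry in `gap` means women survived at a higher rate than men in that class.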