Titanic_Test

Cleaning Data

rm(list=ls())
test <- read.csv("C:/Users/aarav/Downloads/test.csv")
df_clean <- test

# Impute missing values with the column mean, for numeric columns only
num_cols <- colnames(df_clean)[sapply(df_clean, is.numeric)]
for(i in num_cols)
  df_clean[,i][is.na(df_clean[,i])] <- mean(df_clean[,i], na.rm=TRUE)
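
One way to check that the imputation step did what was intended is to count missing values per column before and after it. The sketch below is only an illustration on a made-up data frame (the name toy and all of its values are hypothetical, not taken from test.csv), and it restricts the loop to numeric columns so that mean() is never called on text.

```r
# Toy data frame with made-up values (not the Titanic data)
toy <- data.frame(
  Age  = c(22, NA, 30, NA),
  Fare = c(7.25, 8.05, NA, 10.50),
  Name = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Missing values per column before imputation
na_before <- colSums(is.na(toy))

# Replace NAs with the column mean, numeric columns only
toy_clean <- toy
for (i in names(toy_clean)[sapply(toy_clean, is.numeric)]) {
  toy_clean[[i]][is.na(toy_clean[[i]])] <- mean(toy_clean[[i]], na.rm = TRUE)
}

# Missing values per column after imputation
na_after <- colSums(is.na(toy_clean))
print(na_before)
print(na_after)
```

The same colSums(is.na(...)) call could be run on test and df_clean to confirm that no numeric column still contains NA.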

Summary Statistics

library(stargazer)

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 
stargazer(df_clean, type = "text", omit.summary.stat = "N", digits = 2, title = "Titanic Test Summary Stats")

Titanic Test Summary Stats
=========================================
Statistic     Mean   St. Dev. Min   Max  
-----------------------------------------
PassengerId 1,100.50  120.81  892  1,309 
Pclass        2.27     0.84    1     3   
Age          30.27    12.63   0.17 76.00 
SibSp         0.45     0.90    0     8   
Parch         0.39     0.98    0     9   
Fare         35.63    55.84   0.00 512.33
-----------------------------------------

Observations:

Parch Statistic:

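Parch is the number of parents and children each individual in this dataset had aboard the Titanic. The mean of Parch is 0.39, which is less than one. Because Parch is a non-negative whole number, a mean of 0.39 means that at most 39% of passengers can have had any parent or child aboard, so most people in this dataset were traveling without parents or children.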
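
The reading of the mean above can be checked directly by computing the share of zeros. This is a minimal sketch on a made-up vector (parch and its values are hypothetical, not the real column); on the actual data the same calls could be applied to df_clean$Parch.

```r
# Made-up Parch-like counts, for illustration only
parch <- c(0, 0, 0, 1, 0, 2, 0, 0, 1, 0)

mean_parch <- mean(parch)        # average count per passenger
share_zero <- mean(parch == 0)   # fraction with no parents or children aboard
counts     <- table(parch)       # number of passengers at each count

print(mean_parch)
print(share_zero)
print(counts)
```

Taking the mean of a logical vector such as parch == 0 gives the proportion of TRUE values, which is why it works as a share.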

Fare Statistic:

Fare is the price of the ticket each passenger on the Titanic purchased. Judging only by the Min (0.00) and Max (512.33), one might expect the average to sit near the midpoint, about 256. Instead the mean is 35.63, which shows that ticket fares are concentrated toward the bottom of the price range, with the high fares pulling the top of the range far above the typical ticket (a right-skewed distribution).
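
A quick way to see this kind of skew is to compare the midpoint of the range, the mean, and the median. The sketch below uses a made-up vector (fare and its values are hypothetical, chosen only to mimic a few cheap tickets and one very expensive one); the same calls could be run on df_clean$Fare.

```r
# Made-up fares, for illustration only (not the Titanic data)
fare <- c(7.25, 7.90, 8.05, 13.00, 26.00, 31.00, 512.33)

midpoint    <- (min(fare) + max(fare)) / 2   # halfway between Min and Max
mean_fare   <- mean(fare)                    # pulled upward by the large value
median_fare <- median(fare)                  # middle value, robust to outliers

print(c(midpoint = midpoint, mean = mean_fare, median = median_fare))
```

When the median sits well below the mean, and the mean well below the midpoint, most values are clustered at the low end of the range.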

Box Plot

boxplot(df_clean$Fare, horizontal = TRUE,
        main = "Box Plot of Ticket Fares for the Titanic", 
        xlab = "Fare")

Histogram

hist(df_clean$Age,
     main = "Histogram of Ages on the Titanic", 
     xlab = "Age",           
     ylab = "Frequency")

Key takeaways:

The formatting and contents of a graph are easy to control in R, which has a function or argument for almost any need. For titling a plot and labeling its axes, the relevant arguments are main =, xlab =, and ylab =.
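
As a small illustration of these three arguments, the sketch below draws a histogram of a made-up vector (ages and its values are hypothetical, not the Titanic data) and keeps the object that hist() returns.

```r
# Made-up ages, for illustration only (not the Titanic data)
ages <- c(2, 18, 22, 25, 30, 30, 34, 41, 58, 70)

# main sets the title; xlab and ylab label the horizontal and vertical axes
h <- hist(ages,
          main = "Histogram of Toy Ages",
          xlab = "Age",
          ylab = "Frequency")

# hist() also returns the bin counts it drew
print(h$counts)
```

The same three arguments are accepted by boxplot() and plot(), so one labeling habit carries across the base graphics functions.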