Day 3 Homework

Author

SB

  1. My first observation is that the standard deviations for Pclass (0.8) and Parch (0.9) are nearly the same, meaning that neither variable varied very much.
  2. My second observation is that the minimum fare was $0, meaning at least one passenger got on the boat for free.
train <- read.csv("~/Downloads/train.csv")
x <- na.omit(train)
df_clean <- x

library(stargazer)

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 
?stargazer

stargazer(x,                                     
          type   = "text",                            
          title  = "Summary Statistics Table",        
          digits = 1)

Summary Statistics Table
========================================
Statistic    N  Mean  St. Dev. Min  Max 
----------------------------------------
PassengerId 714 448.6  259.1    1   891 
Survived    714  0.4    0.5     0    1  
Pclass      714  2.2    0.8     1    3  
Age         714 29.7    14.5   0.4 80.0 
SibSp       714  0.5    0.9     0    5  
Parch       714  0.4    0.9     0    6  
Fare        714 34.7    52.9   0.0 512.3
----------------------------------------
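The two observations at the top can also be checked directly rather than read off the table. The sketch below is a minimal example, not the analysis itself: it uses a small hypothetical data frame called `toy` with invented values, since the real `train.csv` is not reproduced here. Only the column names (Pclass, Parch, Fare) come from the table above.

```r
# Hypothetical toy data standing in for the cleaned data frame `x`.
# The values are invented for illustration and are NOT taken from train.csv.
toy <- data.frame(
  Pclass = c(1, 3, 3, 2, 3, 1),
  Parch  = c(0, 0, 1, 0, 2, 0),
  Fare   = c(71.3, 7.9, 0, 13.0, 0, 53.1)
)

# Standard deviation of each column, rounded to one digit like the stargazer table
sds <- round(sapply(toy, sd), 1)
print(sds)

# Rows where the recorded fare is exactly zero
zero_fare <- toy[toy$Fare == 0, ]
print(nrow(zero_fare))
```

With the real data, replacing `toy` with `x` would show how many passengers have a recorded fare of 0 and let the standard deviations be compared side by side.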
# Load the package
library(ggplot2)

ggplot(x, aes(x = factor(Survived), y = Age)) +
  geom_boxplot() +
  labs(title = "Age Distribution by Survival",   # labs() expects lowercase `title`
       x = "Survived",
       y = "Age")                                # y axis shows Age, not Class

ggplot(data = df_clean, aes(x = Age)) +                                     # creating a histogram
  geom_histogram(binwidth = 2, fill = "lightpink", color = "black") +
  facet_wrap(~ Survived) +
  labs(title = "Histogram of Age by Survival Status",                       # title
       x = "Age",                                                           # axis labels
       y = "Frequency") +
  theme_minimal()

My final takeaway from this chart is that age was not a big factor in whether a passenger survived, since the two histograms follow a similar shape.
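One way to back up a visual comparison like this is to put a number on each group. The sketch below is a rough illustration under stated assumptions: `toy` is a hypothetical data frame with invented ages, and only the column names Survived and Age match the real data. It computes the mean and median age within each survival group using base R's `tapply`.

```r
# Hypothetical toy data; the ages are invented for illustration only
# and do not come from train.csv.
toy <- data.frame(
  Survived = c(0, 0, 0, 0, 1, 1, 1, 1),
  Age      = c(22, 35, 54, 28, 26, 38, 4, 30)
)

# Mean and median age within each survival group (0 = died, 1 = survived)
age_mean   <- tapply(toy$Age, toy$Survived, mean)
age_median <- tapply(toy$Age, toy$Survived, median)

# Stack the two summaries into one small table, one column per group
age_summary <- rbind(mean = age_mean, median = age_median)
print(age_summary)
```

Running the same two `tapply` calls on `df_clean` would show whether the typical ages of the two groups are as close as the histograms suggest.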