Day 3 Discussion

Titanic Data

library(stargazer)

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 
library(ggplot2)
library(rmarkdown)


df <- read.csv("~/Downloads/test.csv")

head(df)
  PassengerId Pclass                                         Name    Sex  Age
1         892      3                             Kelly, Mr. James   male 34.5
2         893      3             Wilkes, Mrs. James (Ellen Needs) female 47.0
3         894      2                    Myles, Mr. Thomas Francis   male 62.0
4         895      3                             Wirz, Mr. Albert   male 27.0
5         896      3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0
6         897      3                   Svensson, Mr. Johan Cervin   male 14.0
  SibSp Parch  Ticket    Fare Cabin Embarked
1     0     0  330911  7.8292              Q
2     1     0  363272  7.0000              S
3     0     0  240276  9.6875              Q
4     0     0  315154  8.6625              S
5     1     1 3101298 12.2875              S
6     0     0    7538  9.2250              S
df_clean <- df


mean_age <- mean(df_clean$Age, na.rm = TRUE)


df_clean$Age[is.na(df_clean$Age)] <- mean_age

mean_fare <- mean(df_clean$Fare, na.rm = TRUE)
df_clean$Fare[is.na(df_clean$Fare)] <- mean_fare
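Before filling in missing values with the mean, it can help to check how many values are actually missing in each column. The block below is a minimal sketch of that check followed by the same mean imputation used above. It runs on a small made-up data frame (`toy`, a hypothetical stand-in for `df_clean`) so that it works without `test.csv`.

```r
# Toy data standing in for the Titanic test set (values are made up
# for illustration; this is NOT the real test.csv).
toy <- data.frame(
  Age  = c(34.5, NA, 62, 27, NA),
  Fare = c(7.8292, 7, NA, 8.6625, 12.2875)
)

# Count missing values per column before imputing.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Mean-impute every numeric column that has missing values,
# mirroring what was done for Age and Fare above.
for (col in names(toy)) {
  missing <- is.na(toy[[col]])
  if (is.numeric(toy[[col]]) && any(missing)) {
    toy[[col]][missing] <- mean(toy[[col]], na.rm = TRUE)
  }
}

# After imputation no NAs should remain.
stopifnot(!anyNA(toy))
```

One thing to keep in mind: mean imputation piles the filled-in rows onto a single value, which pulls the median toward the mean and shrinks the standard deviation. That may be why the median and mean of Age both show up as 30.27 in the summary.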


summary(df_clean)
  PassengerId         Pclass          Name               Sex           
 Min.   : 892.0   Min.   :1.000   Length:418         Length:418        
 1st Qu.: 996.2   1st Qu.:1.000   Class :character   Class :character  
 Median :1100.5   Median :3.000   Mode  :character   Mode  :character  
 Mean   :1100.5   Mean   :2.266                                        
 3rd Qu.:1204.8   3rd Qu.:3.000                                        
 Max.   :1309.0   Max.   :3.000                                        
      Age            SibSp            Parch           Ticket         
 Min.   : 0.17   Min.   :0.0000   Min.   :0.0000   Length:418        
 1st Qu.:23.00   1st Qu.:0.0000   1st Qu.:0.0000   Class :character  
 Median :30.27   Median :0.0000   Median :0.0000   Mode  :character  
 Mean   :30.27   Mean   :0.4474   Mean   :0.3923                     
 3rd Qu.:35.75   3rd Qu.:1.0000   3rd Qu.:0.0000                     
 Max.   :76.00   Max.   :8.0000   Max.   :9.0000                     
      Fare            Cabin             Embarked        
 Min.   :  0.000   Length:418         Length:418        
 1st Qu.:  7.896   Class :character   Class :character  
 Median : 14.454   Mode  :character   Mode  :character  
 Mean   : 35.627                                        
 3rd Qu.: 31.500                                        
 Max.   :512.329                                        
library(knitr)


table_data <- head(df_clean[c("PassengerId", "Pclass", "Sex", "Age", "Fare")], 5)


kable(table_data, caption = "First 5 Passengers of the Titanic Test Set")
First 5 Passengers of the Titanic Test Set

PassengerId  Pclass  Sex      Age     Fare
        892       3  male    34.5   7.8292
        893       3  female  47.0   7.0000
        894       2  male    62.0   9.6875
        895       3  male    27.0   8.6625
        896       3  female  22.0  12.2875
library(stargazer)


stargazer(df_clean,              
          type = "text",         
          title = "Titanic Test Data Summary Statistics",
          header = FALSE,        
          keep = c("Pclass", "Age", "SibSp", "Parch", "Fare"))

Titanic Test Data Summary Statistics
===========================================
Statistic  N   Mean  St. Dev.  Min    Max  
-------------------------------------------
Pclass    418 2.266   0.842     1      3   
Age       418 30.273  12.635  0.170 76.000 
SibSp     418 0.447   0.897     0      8   
Parch     418 0.392   0.981     0      9   
Fare      418 35.627  55.841  0.000 512.329
-------------------------------------------

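Interesting things

The minimum and maximum ages are very far apart: passengers range from 0.17 years (an infant of about two months) to 76 years.

At least one passenger has a recorded fare of 0, so someone didn't pay at all for the ride.

Most people were in classes 2 and 3, though not an overwhelming majority. The median Pclass is 3, so at least half of the passengers were in third class, but the first quartile is 1, so at least a quarter were in first class.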
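These observations can be checked directly rather than read off the summary table. The block below is a rough sketch of how that might look. It uses a small made-up data frame (`toy`, hypothetical values, not the real data); swapping in `df_clean` should give the actual numbers.

```r
# Toy data standing in for df_clean (values are made up for illustration).
toy <- data.frame(
  Pclass = c(3, 3, 2, 3, 1, 3),
  Age    = c(34.5, 47, 62, 0.17, 22, 76),
  Fare   = c(7.8292, 0, 9.6875, 8.6625, 30, 0)
)

# Share of passengers in each class (proportions sum to 1).
class_share <- prop.table(table(toy$Pclass))
print(round(class_share, 3))

# Youngest and oldest passenger.
age_range <- range(toy$Age)
print(age_range)

# How many passengers have a recorded fare of exactly 0?
n_free <- sum(toy$Fare == 0)
print(n_free)
```

Running `table(df_clean$Pclass)` on the real data would show exactly how many passengers were in each class, which is a more direct check than reasoning from the median and quartiles.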