Spaceship Titanic

Welcome to the year 2912, where your data science skills are needed to solve a cosmic mystery. We’ve received a transmission from four lightyears away and things aren’t looking good.

The Spaceship Titanic was an interstellar passenger liner launched a month ago. With almost 13,000 passengers on board, the vessel set out on its maiden voyage transporting emigrants from our solar system to three newly habitable exoplanets orbiting nearby stars.

While rounding Alpha Centauri en route to its first destination—the torrid 55 Cancri E—the unwary Spaceship Titanic collided with a spacetime anomaly hidden within a dust cloud. Sadly, it met a similar fate as its namesake from 1000 years before. Though the ship stayed intact, almost half of the passengers were transported to an alternate dimension!

Download Data

library(readr)
test <- read_csv("test.csv")
train <- read_csv("train.csv")
library(DataExplorer)
create_report(train)
## 
  |                                           
  |                                     |   0%
  |                                           
  |.                                    |   2%                                 
  |                                           
  |..                                   |   5% [global_options]                
  |                                           
  |...                                  |   7%                                 
  |                                           
  |....                                 |  10% [introduce]                     
  |                                           
  |....                                 |  12%                                 
  |                                           
  |.....                                |  14% [plot_intro]                    
  |                                           
  |......                               |  17%                                 
  |                                           
  |.......                              |  19% [data_structure]                
  |                                           
  |........                             |  21%                                 
  |                                           
  |.........                            |  24% [missing_profile]               
  |                                           
  |..........                           |  26%                                 
  |                                           
  |...........                          |  29% [univariate_distribution_header]
  |                                           
  |...........                          |  31%                                 
  |                                           
  |............                         |  33% [plot_histogram]                
  |                                           
  |.............                        |  36%                                 
  |                                           
  |..............                       |  38% [plot_density]                  
  |                                           
  |...............                      |  40%                                 
  |                                           
  |................                     |  43% [plot_frequency_bar]            
  |                                           
  |.................                    |  45%                                 
  |                                           
  |..................                   |  48% [plot_response_bar]             
  |                                           
  |..................                   |  50%                                 
  |                                           
  |...................                  |  52% [plot_with_bar]                 
  |                                           
  |....................                 |  55%                                 
  |                                           
  |.....................                |  57% [plot_normal_qq]                
  |                                           
  |......................               |  60%                                 
  |                                           
  |.......................              |  62% [plot_response_qq]              
  |                                           
  |........................             |  64%                                 
  |                                           
  |.........................            |  67% [plot_by_qq]                    
  |                                           
  |..........................           |  69%                                 
  |                                           
  |..........................           |  71% [correlation_analysis]          
  |                                           
  |...........................          |  74%                                 
  |                                           
  |............................         |  76% [principal_component_analysis]  
  |                                           
  |.............................        |  79%                                 
  |                                           
  |..............................       |  81% [bivariate_distribution_header] 
  |                                           
  |...............................      |  83%                                 
  |                                           
  |................................     |  86% [plot_response_boxplot]         
  |                                           
  |.................................    |  88%                                 
  |                                           
  |.................................    |  90% [plot_by_boxplot]               
  |                                           
  |..................................   |  93%                                 
  |                                           
  |...................................  |  95% [plot_response_scatterplot]     
  |                                           
  |.................................... |  98%                                 
  |                                           
  |.....................................| 100% [plot_by_scatterplot]           
                                                                                                                           
## "C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/pandoc" +RTS -K512m -RTS "C:/Users/hutku/Desktop/FinalPorjesi/report.knit.md" --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc513069a82ffd.html --lua-filter "C:\Users\hutku\AppData\Local\R\win-library\4.3\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\hutku\AppData\Local\R\win-library\4.3\rmarkdown\rmarkdown\lua\latex-div.lua" --embed-resources --standalone --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 6 --template "C:\Users\hutku\AppData\Local\R\win-library\4.3\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable theme=yeti --mathjax --variable "mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --include-in-header "C:\Users\hutku\AppData\Local\Temp\Rtmp27i2e6\rmarkdown-str51304df24e7a.html"
library(tidyverse)
library(explore)
train %>% describe_all()
## # A tibble: 14 × 8
##    variable     type     na na_pct unique   min   mean   max
##    <chr>        <chr> <int>  <dbl>  <int> <dbl>  <dbl> <dbl>
##  1 PassengerId  chr       0    0     8693    NA  NA       NA
##  2 HomePlanet   chr     201    2.3      4    NA  NA       NA
##  3 CryoSleep    lgl     217    2.5      3     0   0.36     1
##  4 Cabin        chr     199    2.3   6561    NA  NA       NA
##  5 Destination  chr     182    2.1      4    NA  NA       NA
##  6 Age          dbl     179    2.1     81     0  28.8     79
##  7 VIP          lgl     203    2.3      3     0   0.02     1
##  8 RoomService  dbl     181    2.1   1274     0 225.   14327
##  9 FoodCourt    dbl     183    2.1   1508     0 458.   29813
## 10 ShoppingMall dbl     208    2.4   1116     0 174.   23492
## 11 Spa          dbl     183    2.1   1328     0 311.   22408
## 12 VRDeck       dbl     188    2.2   1307     0 305.   24133
## 13 Name         chr     200    2.3   8474    NA  NA       NA
## 14 Transported  lgl       0    0        2     0   0.5      1