This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
Install packages
if (!require(tidyverse)) install.packages("tidyverse")
Loading required package: tidyverse
Warning: package ‘tidyverse’ was built under R version 4.4.1
Warning: package ‘ggplot2’ was built under R version 4.4.1
Warning: package ‘tibble’ was built under R version 4.4.1
Warning: package ‘tidyr’ was built under R version 4.4.1
Warning: package ‘readr’ was built under R version 4.4.1
Warning: package ‘purrr’ was built under R version 4.4.1
Warning: package ‘dplyr’ was built under R version 4.4.1
Warning: package ‘forcats’ was built under R version 4.4.1
Warning: package ‘lubridate’ was built under R version 4.4.1
── Attaching core tidyverse packages ─────────────────
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ──────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors
if (!require(readr)) install.packages("readr")
if (!require(dplyr)) install.packages("dplyr")
Training Dataset
# Define the URL of the training data
train_url <- "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/train.csv"
# Read the data from the URL using readr::read_csv
train <- read_csv(train_url)
Rows: 891 Columns: 12
── Column specification ──────────────────────────────
Delimiter: ","
chr (5): Name, Sex, Ticket, Cabin, Embarked
dbl (7): PassengerId, Survived, Pclass, Age, SibSp...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
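The message above suggests declaring column types up front. A minimal sketch of that, assuming readr 2.x (which accepts literal text wrapped in `I()`); the two inline rows and the name `sample_rows` are made up for illustration and are not taken from the Titanic file:

```r
library(readr)

# Tiny inline stand-in for the CSV (hypothetical rows, not the real data)
csv_text <- I("PassengerId,Survived,Sex,Age\n1,0,male,22\n2,1,female,\n")

# Declaring col_types silences the column-specification message
# and fixes the types instead of letting readr guess them
sample_rows <- read_csv(
  csv_text,
  col_types = cols(
    PassengerId = col_integer(),
    Survived    = col_integer(),
    Sex         = col_character(),
    Age         = col_double()
  )
)

# The empty Age field in the second row is read as NA
print(sample_rows)
```

The same `col_types` argument could be passed when reading `train_url`; alternatively, `show_col_types = FALSE` only hides the message and keeps the guessed types.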
head(train)
# Get summary statistics
summary(train)
  PassengerId       Survived          Pclass
 Min.   :  1.0   Min.   :0.0000   Min.   :1.000
 1st Qu.:223.5   1st Qu.:0.0000   1st Qu.:2.000
 Median :446.0   Median :0.0000   Median :3.000
 Mean   :446.0   Mean   :0.3838   Mean   :2.309
 3rd Qu.:668.5   3rd Qu.:1.0000   3rd Qu.:3.000
 Max.   :891.0   Max.   :1.0000   Max.   :3.000
     Name               Sex
 Length:891         Length:891
 Class :character   Class :character
 Mode  :character   Mode  :character
      Age            SibSp           Parch
 Min.   : 0.42   Min.   :0.000   Min.   :0.0000
 1st Qu.:20.12   1st Qu.:0.000   1st Qu.:0.0000
 Median :28.00   Median :0.000   Median :0.0000
 Mean   :29.70   Mean   :0.523   Mean   :0.3816
 3rd Qu.:38.00   3rd Qu.:1.000   3rd Qu.:0.0000
 Max.   :80.00   Max.   :8.000   Max.   :6.0000
 NA's   :177
    Ticket               Fare
 Length:891         Min.   :  0.00
 Class :character   1st Qu.:  7.91
 Mode  :character   Median : 14.45
                    Mean   : 32.20
                    3rd Qu.: 31.00
                    Max.   :512.33
    Cabin             Embarked
 Length:891         Length:891
 Class :character   Class :character
 Mode  :character   Mode  :character
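The summary reports 177 missing Age values, which a model cannot use directly. One common way to handle them is sketched below, assuming median imputation for numeric columns and most-frequent-value imputation for categorical ones. The data frame `toy` and its values are made up for illustration; the same steps could be applied to `train`.

```r
# Toy data frame standing in for `train` (values are made up)
toy <- data.frame(
  Age      = c(22, NA, 35, NA, 54),
  Embarked = c("S", "C", NA, "S", "S"),
  stringsAsFactors = FALSE
)

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Replace missing Age with the median of the observed ages
toy$Age[is.na(toy$Age)] <- median(toy$Age, na.rm = TRUE)

# Replace missing Embarked with the most frequent value.
# Base R's mode() returns the storage type, not the statistical mode,
# so the most common value is taken from table() instead.
most_common <- names(which.max(table(toy$Embarked)))
toy$Embarked[is.na(toy$Embarked)] <- most_common

print(toy)
```

Note that `median()` needs `na.rm = TRUE` here; without it the result is `NA` and nothing is actually imputed.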
Testing Dataset
# Define the URL of the test data and read it with base R's read.csv
test_url <- "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/test.csv"
test <- read.csv(test_url)
Check whether any missing values are left
# Count missing values per column (is.na(test) alone prints a full logical matrix)
colSums(is.na(test))
# Select features for prediction
X_test <- test[c("Pclass", "Sex", "Age", "Embarked")]
# Predict using your model (assuming model is already trained)
y_pred <- predict(model, newdata = X_test, type = "response") # type argument for probabilities
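# Create submission dataframe; Survived is converted from a probability to a 0/1 label
# (the 0.5 cutoff is an assumption, adjust as needed)
df <- data.frame(PassengerId = test$PassengerId, Survived = as.integer(y_pred > 0.5))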
# Save predictions as CSV
write.csv(df, file = "predictions.csv", row.names = FALSE)
print("CSV file saved successfully!")
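The prediction chunk above assumes an object named `model` already exists. A minimal sketch of how such a logistic regression could be fitted and used is shown below, assuming `glm()` with a binomial family. The simulated data, the names `toy_train`, `toy_test`, `toy_model`, the seed, and the 0.5 cutoff are all illustrative assumptions, not the notebook's actual training step.

```r
set.seed(16)  # arbitrary seed so the simulated data are reproducible

# Simulated training data standing in for `train` (not real Titanic rows)
n <- 200
toy_train <- data.frame(
  Pclass = sample(1:3, n, replace = TRUE),
  Sex    = sample(c("male", "female"), n, replace = TRUE),
  Age    = round(runif(n, 1, 70))
)

# Simulated outcome: survival probability depends on class, sex, and age
p <- plogis(1.5 - 0.6 * toy_train$Pclass +
            1.8 * (toy_train$Sex == "female") -
            0.02 * toy_train$Age)
toy_train$Survived <- rbinom(n, 1, p)

# Fit a logistic regression
toy_model <- glm(Survived ~ Pclass + Sex + Age,
                 data = toy_train,
                 family = binomial(link = "logit"))

# Predict probabilities for new passengers, then convert to 0/1 labels
toy_test <- data.frame(Pclass = c(1, 3),
                       Sex    = c("female", "male"),
                       Age    = c(30, 40))
prob  <- predict(toy_model, newdata = toy_test, type = "response")
label <- as.integer(prob > 0.5)

print(data.frame(toy_test, prob = prob, label = label))
```

With `type = "response"` the predictions are probabilities between 0 and 1; without it, `predict()` returns values on the log-odds scale.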