Titanic projesi
Veri seti açıklayın
veri dosayasi.
Veri seti, belirli bir amaca yönelik olarak toplanmış ve organize edilmiş verilerin bir koleksiyonudur. Genellikle bilgisayar algoritmalarını eğitmek, test etmek veya analiz etmek için kullanılırlar. Veri setleri, çeşitli kaynaklardan gelir ve genellikle tablo, metin dosyası veya veritabanı gibi yapılandırılmış formatlarda depolanır.
Veri setleri, makine öğrenimi ve yapay zeka gibi alanlarda çok önemlidir çünkü bu teknolojilerin etkili bir şekilde çalışabilmesi için büyük miktarda veriye ihtiyaçları vardır. Veri setleri, özellikle belirli bir görev veya probleme odaklanan veri bilimcileri ve araştırmacılar için değerli bir araçtır.
## Rows: 4277 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): PassengerId, HomePlanet, Cabin, Destination, Name
## dbl (6): Age, RoomService, FoodCourt, ShoppingMall, Spa, VRDeck
## lgl (2): CryoSleep, VIP
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 8693 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): PassengerId, HomePlanet, Cabin, Destination, Name
## dbl (6): Age, RoomService, FoodCourt, ShoppingMall, Spa, VRDeck
## lgl (3): CryoSleep, VIP, Transported
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
PassengerId - Her yolcu için benzersiz bir Kimlik. Her Kimlik gggg_pp biçimini alır; burada gggg yolcunun birlikte seyahat ettiği grubu belirtir ve pp grup içindeki numaradır. Bir gruptaki insanlar çoğunlukla aile üyeleridir, ancak her zaman değil.
HomePlanet - Yolcunun ayrıldığı gezegen, genellikle daimi ikamet ettikleri gezegen.
CryoSleep - Yolcunun yolculuk süresince askıya alınmış animasyona alınmayı seçip seçmediğini belirtir. Dondurucu uykudaki yolcular kabinlerine hapsedilir.
Cabin - Yolcunun kaldığı kabin numarası. Güverte/numara/yan formunu alır; burada taraf, İskele için P veya Sancak için S olabilir.
Destination : Yolcunun ineceÄŸi gezegen.
Age* - Yolcunun yaşı.
VIP - Yolcunun yolculuk sırasında özel VIP hizmeti için ödeme yapıp yapmadığı.
RoomService, FoodCourt, ShoppingMall, Spa, VRDeck - Yolcunun, Uzay Gemisi Titanic’in birçok lüks olanağının her birinde fatura ettiği tutar.
Name - Yolcunun adı ve soyadı.
Transported - Yolcunun başka bir boyuta taşınıp taşınmadığı. Bu hedeftir, tahmin etmeye çalıştığınız sütundur.
test.csv - Yolcuların geri kalan üçte birinin (~4300) kişisel kayıtları, test verisi olarak kullanılacak. Göreviniz bu setteki yolcular için Taşınan değerini tahmin etmektir.
sample_submission.csv - Doğru formatta bir gönderim dosyası. PassengerId - Test setindeki her yolcunun kimliği. Taşınan - Hedef. Her yolcu için Doğru veya Yanlış’ı tahmin edin.
PassengerId - Test setindeki her yolcunun kimliÄŸi.
Transported - Hedef. Her yolcu için Doğru veya Yanlış’ı tahmin edin.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ purrr 1.0.2
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## # A tibble: 13 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 PassengerId chr 0 0 4277 NA NA NA
## 2 HomePlanet chr 87 2 4 NA NA NA
## 3 CryoSleep lgl 93 2.2 3 0 0.37 1
## 4 Cabin chr 100 2.3 3266 NA NA NA
## 5 Destination chr 92 2.2 4 NA NA NA
## 6 Age dbl 91 2.1 80 0 28.7 79
## 7 VIP lgl 93 2.2 3 0 0.02 1
## 8 RoomService dbl 82 1.9 843 0 219. 11567
## 9 FoodCourt dbl 106 2.5 903 0 439. 25273
## 10 ShoppingMall dbl 98 2.3 716 0 177. 8292
## 11 Spa dbl 101 2.4 834 0 303. 19844
## 12 VRDeck dbl 80 1.9 797 0 311. 22272
## 13 Name chr 94 2.2 4177 NA NA NA
## # A tibble: 14 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 PassengerId chr 0 0 8693 NA NA NA
## 2 HomePlanet chr 201 2.3 4 NA NA NA
## 3 CryoSleep lgl 217 2.5 3 0 0.36 1
## 4 Cabin chr 199 2.3 6561 NA NA NA
## 5 Destination chr 182 2.1 4 NA NA NA
## 6 Age dbl 179 2.1 81 0 28.8 79
## 7 VIP lgl 203 2.3 3 0 0.02 1
## 8 RoomService dbl 181 2.1 1274 0 225. 14327
## 9 FoodCourt dbl 183 2.1 1508 0 458. 29813
## 10 ShoppingMall dbl 208 2.4 1116 0 174. 23492
## 11 Spa dbl 183 2.1 1328 0 311. 22408
## 12 VRDeck dbl 188 2.2 1307 0 305. 24133
## 13 Name chr 200 2.3 8474 NA NA NA
## 14 Transported lgl 0 0 2 0 0.5 1
test[c('group', 'pp')] <- str_split_fixed(test$PassengerId,'-', 2)
train[c('group', 'pp')] <- str_split_fixed(train$PassengerId,'-', 2)## # A tibble: 6 × 3
## PassengerId group pp
## <chr> <chr> <chr>
## 1 0001_01 0001_01 ""
## 2 0002_01 0002_01 ""
## 3 0003_01 0003_01 ""
## 4 0003_02 0003_02 ""
## 5 0004_01 0004_01 ""
## 6 0005_01 0005_01 ""
## # A tibble: 6 × 4
## PassengerId group pp withgroup
## <chr> <chr> <chr> <dbl>
## 1 0013_01 0013_01 "" 0
## 2 0018_01 0018_01 "" 0
## 3 0019_01 0019_01 "" 0
## 4 0021_01 0021_01 "" 0
## 5 0023_01 0023_01 "" 0
## 6 0027_01 0027_01 "" 0
## # A tibble: 6 × 4
## PassengerId group pp withgroup
## <chr> <chr> <chr> <dbl>
## 1 0001_01 0001_01 "" 0
## 2 0002_01 0002_01 "" 0
## 3 0003_01 0003_01 "" 0
## 4 0003_02 0003_02 "" 0
## 5 0004_01 0004_01 "" 0
## 6 0005_01 0005_01 "" 0
train[c('deck', 'num', 'side')] <- str_split_fixed(train$Cabin, '/', 3)
test[c('deck', 'num', 'side')] <- str_split_fixed(test$Cabin, '/', 3)## PassengerId HomePlanet CryoSleep Cabin
## 0001_01: 1 Earth :4602 Mode :logical G/734/S: 8
## 0002_01: 1 Europa:2131 FALSE:5439 B/11/S : 7
## 0003_01: 1 Mars :1759 TRUE :3037 B/201/P: 7
## 0003_02: 1 NA's : 201 NA's :217 B/82/S : 7
## 0004_01: 1 C/137/S: 7
## 0005_01: 1 (Other):8458
## (Other):8687 NA's : 199
## Destination Age VIP RoomService
## 55 Cancri e :1800 Min. : 0.00 Mode :logical Min. : 0.0
## PSO J318.5-22: 796 1st Qu.:19.00 FALSE:8291 1st Qu.: 0.0
## TRAPPIST-1e :5915 Median :27.00 TRUE :199 Median : 0.0
## NA's : 182 Mean :28.83 NA's :203 Mean : 224.7
## 3rd Qu.:38.00 3rd Qu.: 47.0
## Max. :79.00 Max. :14327.0
## NA's :179 NA's :181
## FoodCourt ShoppingMall Spa VRDeck
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 458.1 Mean : 173.7 Mean : 311.1 Mean : 304.9
## 3rd Qu.: 76.0 3rd Qu.: 27.0 3rd Qu.: 59.0 3rd Qu.: 46.0
## Max. :29813.0 Max. :23492.0 Max. :22408.0 Max. :24133.0
## NA's :183 NA's :208 NA's :183 NA's :188
## Name Transported group pp withgroup
## Alraium Disivering: 2 Mode :logical 0001_01: 1 :8693 Min. :0
## Ankalik Nateansive: 2 FALSE:4315 0002_01: 1 1st Qu.:0
## Anton Woody : 2 TRUE :4378 0003_01: 1 Median :0
## Apix Wala : 2 0003_02: 1 Mean :0
## Asch Stradick : 2 0004_01: 1 3rd Qu.:0
## (Other) :8483 0005_01: 1 Max. :0
## NA's : 200 (Other):8687
## deck num side
## F :2794 : 199 : 199
## G :2559 82 : 28 P:4206
## E : 876 19 : 22 S:4288
## B : 779 86 : 22
## C : 747 176 : 21
## (Other): 739 56 : 21
## NA's : 199 (Other):8380
train$num <- droplevels(train$num)
test$num <- droplevels(test$num)
train$side <- droplevels(train$side)
test$side <- droplevels(test$side)## PassengerId HomePlanet CryoSleep Cabin
## 0001_01: 1 Earth :4602 Mode :logical G/734/S: 8
## 0002_01: 1 Europa:2131 FALSE:5439 B/11/S : 7
## 0003_01: 1 Mars :1759 TRUE :3037 B/201/P: 7
## 0003_02: 1 NA's : 201 NA's :217 B/82/S : 7
## 0004_01: 1 C/137/S: 7
## 0005_01: 1 (Other):8458
## (Other):8687 NA's : 199
## Destination Age VIP RoomService
## 55 Cancri e :1800 Min. : 0.00 Mode :logical Min. : 0.0
## PSO J318.5-22: 796 1st Qu.:19.00 FALSE:8291 1st Qu.: 0.0
## TRAPPIST-1e :5915 Median :27.00 TRUE :199 Median : 0.0
## NA's : 182 Mean :28.83 NA's :203 Mean : 224.7
## 3rd Qu.:38.00 3rd Qu.: 47.0
## Max. :79.00 Max. :14327.0
## NA's :179 NA's :181
## FoodCourt ShoppingMall Spa VRDeck
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 458.1 Mean : 173.7 Mean : 311.1 Mean : 304.9
## 3rd Qu.: 76.0 3rd Qu.: 27.0 3rd Qu.: 59.0 3rd Qu.: 46.0
## Max. :29813.0 Max. :23492.0 Max. :22408.0 Max. :24133.0
## NA's :183 NA's :208 NA's :183 NA's :188
## Name Transported group pp withgroup
## Alraium Disivering: 2 Mode :logical 0001_01: 1 :8693 Min. :0
## Ankalik Nateansive: 2 FALSE:4315 0002_01: 1 1st Qu.:0
## Anton Woody : 2 TRUE :4378 0003_01: 1 Median :0
## Apix Wala : 2 0003_02: 1 Mean :0
## Asch Stradick : 2 0004_01: 1 3rd Qu.:0
## (Other) :8483 0005_01: 1 Max. :0
## NA's : 200 (Other):8687
## deck num side
## F :2794 : 199 : 199
## G :2559 82 : 28 P:4206
## E : 876 19 : 22 S:4288
## B : 779 86 : 22
## C : 747 176 : 21
## (Other): 739 56 : 21
## NA's : 199 (Other):8380
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
most_frequent_hp <- train %>%
filter(!is.na(HomePlanet)) %>%
group_by(Destination, HomePlanet) %>%
summarize(count = n()) %>%
arrange(Destination, desc(count)) %>%
slice(1) %>%
ungroup()## `summarise()` has grouped output by 'Destination'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 3
## Destination HomePlanet count
## <fct> <fct> <int>
## 1 55 Cancri e Europa 886
## 2 PSO J318.5-22 Earth 712
## 3 TRAPPIST-1e Earth 3101
## 4 <NA> Earth 99
train$HomePlanet <- as.character(train$HomePlanet)
train$Destination <- as.character(train$Destination)train <- train %>%
mutate(HomePlanet= ifelse(is.na(HomePlanet) & Destination == "55 Cancri e","Europa",
ifelse(is.na(HomePlanet), "Earth", HomePlanet)))test <- test %>%
mutate(HomePlanet= ifelse(is.na(HomePlanet) & Destination == "55 Cancri e","Europa",
ifelse(is.na(HomePlanet), "Earth", HomePlanet)))## # A tibble: 20 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 PassengerId fct 0 0 8693 NA NA NA
## 2 HomePlanet chr 4 0 4 NA NA NA
## 3 CryoSleep lgl 217 2.5 3 0 0.36 1
## 4 Cabin fct 199 2.3 6561 NA NA NA
## 5 Destination chr 182 2.1 4 NA NA NA
## 6 Age dbl 179 2.1 81 0 28.8 79
## 7 VIP lgl 203 2.3 3 0 0.02 1
## 8 RoomService dbl 181 2.1 1274 0 225. 14327
## 9 FoodCourt dbl 183 2.1 1508 0 458. 29813
## 10 ShoppingMall dbl 208 2.4 1116 0 174. 23492
## 11 Spa dbl 183 2.1 1328 0 311. 22408
## 12 VRDeck dbl 188 2.2 1307 0 305. 24133
## 13 Name fct 200 2.3 8474 NA NA NA
## 14 Transported lgl 0 0 2 0 0.5 1
## 15 group fct 0 0 8693 NA NA NA
## 16 pp fct 0 0 1 NA NA NA
## 17 withgroup dbl 0 0 1 0 0 0
## 18 deck fct 199 2.3 9 NA NA NA
## 19 num fct 0 0 1818 NA NA NA
## 20 side fct 0 0 3 NA NA NA
## # A tibble: 19 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 PassengerId fct 0 0 4277 NA NA NA
## 2 HomePlanet chr 2 0 4 NA NA NA
## 3 CryoSleep lgl 93 2.2 3 0 0.37 1
## 4 Cabin fct 100 2.3 3266 NA NA NA
## 5 Destination chr 92 2.2 4 NA NA NA
## 6 Age dbl 91 2.1 80 0 28.7 79
## 7 VIP lgl 93 2.2 3 0 0.02 1
## 8 RoomService dbl 82 1.9 843 0 219. 11567
## 9 FoodCourt dbl 106 2.5 903 0 439. 25273
## 10 ShoppingMall dbl 98 2.3 716 0 177. 8292
## 11 Spa dbl 101 2.4 834 0 303. 19844
## 12 VRDeck dbl 80 1.9 797 0 311. 22272
## 13 Name fct 94 2.2 4177 NA NA NA
## 14 group fct 0 0 4277 NA NA NA
## 15 pp fct 0 0 1 NA NA NA
## 16 withgroup dbl 0 0 1 0 0 0
## 17 deck fct 100 2.3 9 NA NA NA
## 18 num fct 0 0 1506 NA NA NA
## 19 side fct 0 0 3 NA NA NA
most_frequent_destinations <- train %>%
filter(!is.na(Destination)) %>%
group_by(HomePlanet, Destination ) %>%
summarize(count = n()) %>%
arrange(HomePlanet, desc(count)) %>%
slice(1) %>%
ungroup()## `summarise()` has grouped output by 'HomePlanet'. You can override using the
## `.groups` argument.
## # A tibble: 3 × 3
## HomePlanet Destination count
## <chr> <chr> <int>
## 1 Earth TRAPPIST-1e 3251
## 2 Europa TRAPPIST-1e 1189
## 3 Mars TRAPPIST-1e 1475
test$HomePlanet <- as.factor(test$HomePlanet)
test$Destination <- as.factor(test$Destination)
train$HomePlanet <- as.factor(train$HomePlanet)
train$Destination <- as.factor(train$Destination)## PassengerId HomePlanet CryoSleep Cabin
## 0001_01: 1 Earth :4772 Mode :logical G/734/S: 8
## 0002_01: 1 Europa:2162 FALSE:5439 B/11/S : 7
## 0003_01: 1 Mars :1759 TRUE :3037 B/201/P: 7
## 0003_02: 1 NA's :217 B/82/S : 7
## 0004_01: 1 C/137/S: 7
## 0005_01: 1 (Other):8458
## (Other):8687 NA's : 199
## Destination Age VIP RoomService
## 55 Cancri e :1800 Min. : 0.00 Mode :logical Min. : 0.0
## PSO J318.5-22: 796 1st Qu.:19.00 FALSE:8291 1st Qu.: 0.0
## TRAPPIST-1e :6097 Median :27.00 TRUE :199 Median : 0.0
## Mean :28.83 NA's :203 Mean : 224.7
## 3rd Qu.:38.00 3rd Qu.: 47.0
## Max. :79.00 Max. :14327.0
## NA's :179 NA's :181
## FoodCourt ShoppingMall Spa VRDeck
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 458.1 Mean : 173.7 Mean : 311.1 Mean : 304.9
## 3rd Qu.: 76.0 3rd Qu.: 27.0 3rd Qu.: 59.0 3rd Qu.: 46.0
## Max. :29813.0 Max. :23492.0 Max. :22408.0 Max. :24133.0
## NA's :183 NA's :208 NA's :183 NA's :188
## Name Transported group pp withgroup
## Alraium Disivering: 2 Mode :logical 0001_01: 1 :8693 Min. :0
## Ankalik Nateansive: 2 FALSE:4315 0002_01: 1 1st Qu.:0
## Anton Woody : 2 TRUE :4378 0003_01: 1 Median :0
## Apix Wala : 2 0003_02: 1 Mean :0
## Asch Stradick : 2 0004_01: 1 3rd Qu.:0
## (Other) :8483 0005_01: 1 Max. :0
## NA's : 200 (Other):8687
## deck num side
## F :2794 : 199 : 199
## G :2559 82 : 28 P:4206
## E : 876 19 : 22 S:4288
## B : 779 86 : 22
## C : 747 176 : 21
## (Other): 739 56 : 21
## NA's : 199 (Other):8380
train <- train %>%
mutate(RoomService = coalesce(RoomService, 0),
FoodCourt = coalesce(FoodCourt, 0),
ShoppingMall = coalesce(ShoppingMall, 0),
Spa = coalesce(Spa, 0),
VRDeck = coalesce(VRDeck, 0))test <- test %>%
mutate(RoomService = coalesce(RoomService, 0),
FoodCourt = coalesce(FoodCourt, 0),
ShoppingMall = coalesce(ShoppingMall, 0),
Spa = coalesce(Spa, 0),
VRDeck = coalesce(VRDeck, 0))test <- test %>%
group_by(HomePlanet, Destination) %>%
mutate_at(vars(Age), ~replace_na(., mean(., na.rm = TRUE)))train <- train %>%
group_by(HomePlanet, Destination) %>%
mutate_at(vars(Age), ~replace_na(., mean(., na.rm = TRUE)))train$expense <- train$RoomService + train$FoodCourt + train$ShoppingMall + train$Spa + train$VRDeck## # A tibble: 21 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 PassengerId fct 0 0 8693 NA NA NA
## 2 HomePlanet fct 0 0 3 NA NA NA
## 3 CryoSleep chr 98 1.1 3 NA NA NA
## 4 Cabin fct 199 2.3 6561 NA NA NA
## 5 Destination fct 0 0 3 NA NA NA
## 6 Age dbl 0 0 88 0 28.8 79
## 7 VIP lgl 203 2.3 3 0 0.02 1
## 8 RoomService dbl 0 0 1273 0 220. 14327
## 9 FoodCourt dbl 0 0 1507 0 448. 29813
## 10 ShoppingMall dbl 0 0 1115 0 170. 23492
## # ℹ 11 more rows
## PassengerId HomePlanet CryoSleep Cabin
## 0001_01: 1 Earth :4772 Length:8693 G/734/S: 8
## 0002_01: 1 Europa:2162 Class :character B/11/S : 7
## 0003_01: 1 Mars :1759 Mode :character B/201/P: 7
## 0003_02: 1 B/82/S : 7
## 0004_01: 1 C/137/S: 7
## 0005_01: 1 (Other):8458
## (Other):8687 NA's : 199
## Destination Age VIP RoomService
## 55 Cancri e :1800 Min. : 0.00 Mode :logical Min. : 0
## PSO J318.5-22: 796 1st Qu.:20.00 FALSE:8291 1st Qu.: 0
## TRAPPIST-1e :6097 Median :27.00 TRUE :199 Median : 0
## Mean :28.83 NA's :203 Mean : 220
## 3rd Qu.:37.00 3rd Qu.: 41
## Max. :79.00 Max. :14327
##
## FoodCourt ShoppingMall Spa VRDeck
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 448.4 Mean : 169.6 Mean : 304.6 Mean : 298.3
## 3rd Qu.: 61.0 3rd Qu.: 22.0 3rd Qu.: 53.0 3rd Qu.: 40.0
## Max. :29813.0 Max. :23492.0 Max. :22408.0 Max. :24133.0
##
## Name Transported group pp withgroup
## Alraium Disivering: 2 Mode :logical 0001_01: 1 :8693 Min. :0
## Ankalik Nateansive: 2 FALSE:4315 0002_01: 1 1st Qu.:0
## Anton Woody : 2 TRUE :4378 0003_01: 1 Median :0
## Apix Wala : 2 0003_02: 1 Mean :0
## Asch Stradick : 2 0004_01: 1 3rd Qu.:0
## (Other) :8483 0005_01: 1 Max. :0
## NA's : 200 (Other):8687
## deck num side expense cryosleep
## F :2794 : 199 : 199 Min. : 0 FALSE:5558
## G :2559 82 : 28 P:4206 1st Qu.: 0 TRUE :3037
## E : 876 19 : 22 S:4288 Median : 716 NA's : 98
## B : 779 86 : 22 Mean : 1441
## C : 747 176 : 21 3rd Qu.: 1441
## (Other): 739 56 : 21 Max. :35987
## NA's : 199 (Other):8380
## # A tibble: 22 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 PassengerId fct 0 0 8693 NA NA NA
## 2 HomePlanet fct 0 0 3 NA NA NA
## 3 CryoSleep chr 98 1.1 3 NA NA NA
## 4 Cabin fct 199 2.3 6561 NA NA NA
## 5 Destination fct 0 0 3 NA NA NA
## 6 Age dbl 0 0 88 0 28.8 79
## 7 VIP lgl 0 0 2 0 0.02 1
## 8 RoomService dbl 0 0 1273 0 220. 14327
## 9 FoodCourt dbl 0 0 1507 0 448. 29813
## 10 ShoppingMall dbl 0 0 1115 0 170. 23492
## # ℹ 12 more rows
## PassengerId HomePlanet CryoSleep Cabin
## 0001_01: 1 Earth :4772 Length:8693 G/734/S: 8
## 0002_01: 1 Europa:2162 Class :character B/11/S : 7
## 0003_01: 1 Mars :1759 Mode :character B/201/P: 7
## 0003_02: 1 B/82/S : 7
## 0004_01: 1 C/137/S: 7
## 0005_01: 1 (Other):8458
## (Other):8687 NA's : 199
## Destination Age VIP RoomService
## 55 Cancri e :1800 Min. : 0.00 Mode :logical Min. : 0
## PSO J318.5-22: 796 1st Qu.:20.00 FALSE:8494 1st Qu.: 0
## TRAPPIST-1e :6097 Median :27.00 TRUE :199 Median : 0
## Mean :28.83 Mean : 220
## 3rd Qu.:37.00 3rd Qu.: 41
## Max. :79.00 Max. :14327
##
## FoodCourt ShoppingMall Spa VRDeck
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 448.4 Mean : 169.6 Mean : 304.6 Mean : 298.3
## 3rd Qu.: 61.0 3rd Qu.: 22.0 3rd Qu.: 53.0 3rd Qu.: 40.0
## Max. :29813.0 Max. :23492.0 Max. :22408.0 Max. :24133.0
##
## Name Transported group pp withgroup
## Alraium Disivering: 2 Mode :logical 0001_01: 1 :8693 Min. :0
## Ankalik Nateansive: 2 FALSE:4315 0002_01: 1 1st Qu.:0
## Anton Woody : 2 TRUE :4378 0003_01: 1 Median :0
## Apix Wala : 2 0003_02: 1 Mean :0
## Asch Stradick : 2 0004_01: 1 3rd Qu.:0
## (Other) :8483 0005_01: 1 Max. :0
## NA's : 200 (Other):8687
## deck num side expense cryosleep
## F :2794 : 199 : 199 Min. : 0 FALSE:5558
## G :2559 82 : 28 P:4206 1st Qu.: 0 TRUE :3037
## E : 876 19 : 22 S:4288 Median : 716 NA's : 98
## B : 779 86 : 22 Mean : 1441
## C : 747 176 : 21 3rd Qu.: 1441
## (Other): 739 56 : 21 Max. :35987
## NA's : 199 (Other):8380
most_frequent_deck <- train %>%
filter(!is.na(deck)) %>%
group_by (HomePlanet, deck) %>%
summarize(count = n()) %>%
arrange(HomePlanet, desc(count)) %>%
slice (1) %>%
ungroup()## `summarise()` has grouped output by 'HomePlanet'. You can override using the
## `.groups` argument.
## # A tibble: 3 × 3
## HomePlanet deck count
## <fct> <fct> <int>
## 1 Earth G 2553
## 2 Europa B 771
## 3 Mars F 1110
train <- train %>%
mutate(deck = ifelse(is.na(deck) & HomePlanet == "Earth", "G",
ifelse(is.na(deck) & HomePlanet == "Europa", "B",
ifelse(is.na(deck) & HomePlanet == "Mars", "F", deck))))test <- test %>%
mutate(deck = ifelse(is.na(deck) & HomePlanet == "Earth", "G",
ifelse(is.na(deck) & HomePlanet == "Europa", "B",
ifelse(is.na(deck) & HomePlanet == "Mars", "F", deck))))most_frequent_side <- train %>%
filter(!is.na(deck)) %>%
group_by (HomePlanet, deck) %>%
summarize(count = n()) %>%
arrange(HomePlanet, desc(count)) %>%
slice (1) %>%
ungroup()## `summarise()` has grouped output by 'HomePlanet'. You can override using the
## `.groups` argument.
## # A tibble: 3 × 3
## HomePlanet deck count
## <chr> <chr> <int>
## 1 Earth G 2652
## 2 Europa B 834
## 3 Mars F 1147
train <- train %>%
mutate(deck = ifelse(is.na(side) & HomePlanet == "Earth", "G",
ifelse(is.na(side) & HomePlanet == "Europa", "B",
ifelse(is.na(side) & HomePlanet == "Mars", "F", deck))))test <- test %>%
mutate(deck = ifelse(is.na(side) & HomePlanet == "Earth", "G",
ifelse(is.na(side) & HomePlanet == "Europa", "B",
ifelse(is.na(side) & HomePlanet == "Mars", "F", deck))))## # A tibble: 22 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 PassengerId fct 0 0 8693 NA NA NA
## 2 HomePlanet chr 0 0 3 NA NA NA
## 3 CryoSleep chr 98 1.1 3 NA NA NA
## 4 Cabin fct 199 2.3 6561 NA NA NA
## 5 Destination fct 0 0 3 NA NA NA
## 6 Age dbl 0 0 88 0 28.8 79
## 7 VIP lgl 0 0 2 0 0.02 1
## 8 RoomService dbl 0 0 1273 0 220. 14327
## 9 FoodCourt dbl 0 0 1507 0 448. 29813
## 10 ShoppingMall dbl 0 0 1115 0 170. 23492
## # ℹ 12 more rows
## Adding missing grouping variables: `group`
## Adding missing grouping variables: `group`
## # A tibble: 18 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 group fct 0 0 8693 NA NA NA
## 2 PassengerId fct 0 0 8693 NA NA NA
## 3 HomePlanet chr 0 0 3 NA NA NA
## 4 CryoSleep chr 98 1.1 3 NA NA NA
## 5 Destination fct 0 0 3 NA NA NA
## 6 Age dbl 0 0 88 0 28.8 79
## 7 VIP lgl 0 0 2 0 0.02 1
## 8 RoomService dbl 0 0 1273 0 220. 14327
## 9 FoodCourt dbl 0 0 1507 0 448. 29813
## 10 ShoppingMall dbl 0 0 1115 0 170. 23492
## 11 Spa dbl 0 0 1327 0 305. 22408
## 12 VRDeck dbl 0 0 1306 0 298. 24133
## 13 Transported lgl 0 0 2 0 0.5 1
## 14 withgroup dbl 0 0 1 0 0 0
## 15 deck chr 0 0 8 NA NA NA
## 16 side chr 0 0 3 NA NA NA
## 17 expense dbl 0 0 2336 0 1441. 35987
## 18 cryosleep fct 98 1.1 3 NA NA NA
## `mutate_if()` ignored the following grouping variables:
## • Column `group`
## `mutate_if()` ignored the following grouping variables:
## • Column `group`
## group PassengerId HomePlanet CryoSleep
## 0001_01: 1 Length:8693 Europa:2162 FALSE:5558
## 0002_01: 1 Class :character Earth :4772 TRUE :3037
## 0003_01: 1 Mode :character Mars :1759 NA's : 98
## 0003_02: 1
## 0004_01: 1
## 0005_01: 1
## (Other):8687
## Destination Age VIP RoomService
## 55 Cancri e :1800 Min. : 0.00 Mode :logical Min. : 0
## PSO J318.5-22: 796 1st Qu.:20.00 FALSE:8494 1st Qu.: 0
## TRAPPIST-1e :6097 Median :27.00 TRUE :199 Median : 0
## Mean :28.83 Mean : 220
## 3rd Qu.:37.00 3rd Qu.: 41
## Max. :79.00 Max. :14327
##
## FoodCourt ShoppingMall Spa VRDeck
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 448.4 Mean : 169.6 Mean : 304.6 Mean : 298.3
## 3rd Qu.: 61.0 3rd Qu.: 22.0 3rd Qu.: 53.0 3rd Qu.: 40.0
## Max. :29813.0 Max. :23492.0 Max. :22408.0 Max. :24133.0
##
## Transported withgroup deck side expense
## Mode :logical Min. :0 F :2831 P:4206 Min. : 0
## FALSE:4315 1st Qu.:0 G :2658 S:4288 1st Qu.: 0
## TRUE :4378 Median :0 E : 876 : 199 Median : 716
## Mean :0 B : 842 Mean : 1441
## 3rd Qu.:0 C : 747 3rd Qu.: 1441
## Max. :0 D : 478 Max. :35987
## (Other): 261
## cryosleep
## FALSE:5558
## TRUE :3037
## NA's : 98
##
##
##
##
## # A tibble: 17 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 PassengerId chr 0 0 8693 NA NA NA
## 2 HomePlanet chr 0 0 3 NA NA NA
## 3 CryoSleep lgl 98 1.1 3 0 0.35 1
## 4 Destination chr 0 0 3 NA NA NA
## 5 Age dbl 0 0 88 0 28.8 79
## 6 VIP lgl 0 0 2 0 0.02 1
## 7 RoomService int 0 0 1273 0 220. 14327
## 8 FoodCourt int 0 0 1507 0 448. 29813
## 9 ShoppingMall int 0 0 1115 0 170. 23492
## 10 Spa int 0 0 1327 0 305. 22408
## 11 VRDeck int 0 0 1306 0 298. 24133
## 12 Transported lgl 0 0 2 0 0.5 1
## 13 withgroup int 0 0 1 0 0 0
## 14 deck chr 0 0 8 NA NA NA
## 15 side chr 0 0 3 NA NA NA
## 16 expense int 0 0 2336 0 1441. 35987
## 17 cryosleep lgl 98 1.1 3 0 0.35 1
## # A tibble: 17 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 PassengerId chr 0 0 8693 NA NA NA
## 2 HomePlanet chr 0 0 3 NA NA NA
## 3 CryoSleep lgl 98 1.1 3 0 0.35 1
## 4 Destination chr 0 0 3 NA NA NA
## 5 Age dbl 0 0 88 0 28.8 79
## 6 VIP lgl 0 0 2 0 0.02 1
## 7 RoomService int 0 0 1273 0 220. 14327
## 8 FoodCourt int 0 0 1507 0 448. 29813
## 9 ShoppingMall int 0 0 1115 0 170. 23492
## 10 Spa int 0 0 1327 0 305. 22408
## 11 VRDeck int 0 0 1306 0 298. 24133
## 12 Transported lgl 0 0 2 0 0.5 1
## 13 withgroup int 0 0 1 0 0 0
## 14 deck chr 0 0 8 NA NA NA
## 15 side chr 0 0 3 NA NA NA
## 16 expense int 0 0 2336 0 1441. 35987
## 17 cryosleep lgl 98 1.1 3 0 0.35 1
test$HomePlanet <- as.factor(test$HomePlanet)
test$Destination <- as.factor(test$Destination)
test$deck <- as.factor(test$deck)
test$side <- as.factor(test$side)train$HomePlanet <- as.factor(train$HomePlanet)
train$Destination <- as.factor(train$Destination)
train$deck <- as.factor(train$deck)
train$side <- as.factor(train$side)## Warning in cor(D): the standard deviation is zero
## corrplot 0.92 loaded
##
##
## processing file: report.rmd
## | | | 0% | |. | 2% | |.. | 5% [global_options] | |... | 7% | |.... | 10% [introduce] | |.... | 12% | |..... | 14% [plot_intro] | |...... | 17% | |....... | 19% [data_structure] | |........ | 21% | |......... | 24% [missing_profile] | |.......... | 26% | |........... | 29% [univariate_distribution_header] | |........... | 31% | |............ | 33% [plot_histogram] | |............. | 36% | |.............. | 38% [plot_density] | |............... | 40% | |................ | 43% [plot_frequency_bar] | |................. | 45% | |.................. | 48% [plot_response_bar] | |.................. | 50% | |................... | 52% [plot_with_bar] | |.................... | 55% | |..................... | 57% [plot_normal_qq] | |...................... | 60% | |....................... | 62% [plot_response_qq] | |........................ | 64% | |......................... | 67% [plot_by_qq] | |.......................... | 69% | |.......................... | 71% [correlation_analysis] | |........................... | 74% | |............................ | 76% [principal_component_analysis] | |............................. | 79% | |.............................. | 81% [bivariate_distribution_header] | |............................... | 83% | |................................ | 86% [plot_response_boxplot] | |................................. | 88% | |................................. | 90% [plot_by_boxplot] | |.................................. | 93% | |................................... | 95% [plot_response_scatterplot] | |.................................... | 98% | |.....................................| 100% [plot_by_scatterplot]
## output file: C:/Users/cabdi/OneDrive/Documents/titanic proje2/report.knit.md
## "C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/pandoc" +RTS -K512m -RTS "C:\Users\cabdi\OneDrive\DOCUME~1\TITANI~1\REPORT~1.MD" --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc3d08774341fb.html --lua-filter "C:\Users\cabdi\AppData\Local\R\win-library\4.4\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\cabdi\AppData\Local\R\win-library\4.4\rmarkdown\rmarkdown\lua\latex-div.lua" --embed-resources --standalone --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 6 --template "C:\Users\cabdi\AppData\Local\R\win-library\4.4\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable theme=yeti --mathjax --variable "mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --include-in-header "C:\Users\cabdi\AppData\Local\Temp\RtmpA5InqM\rmarkdown-str3d08693e2b9c.html"
##
## Output created: report.html
##
## Call:
## lm(formula = Transported ~ ., data = train[, 2:16])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.47726 -0.30338 -0.03261 0.28466 1.74104
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.595e-01 4.921e-02 7.306 3.00e-13 ***
## HomePlanetEuropa 1.853e-01 2.783e-02 6.656 2.98e-11 ***
## HomePlanetMars 8.492e-02 1.463e-02 5.804 6.71e-09 ***
## CryoSleepTRUE 3.970e-01 1.166e-02 34.062 < 2e-16 ***
## DestinationPSO J318.5-22 -4.581e-02 1.812e-02 -2.528 0.0115 *
## DestinationTRAPPIST-1e -4.454e-02 1.136e-02 -3.920 8.93e-05 ***
## Age -2.258e-03 3.156e-04 -7.154 9.13e-13 ***
## VIPTRUE -3.282e-02 2.967e-02 -1.106 0.2687
## RoomService -1.143e-04 7.054e-06 -16.198 < 2e-16 ***
## FoodCourt 4.453e-05 3.057e-06 14.567 < 2e-16 ***
## ShoppingMall 8.238e-05 7.452e-06 11.056 < 2e-16 ***
## Spa -8.445e-05 4.111e-06 -20.543 < 2e-16 ***
## VRDeck -8.076e-05 4.113e-06 -19.637 < 2e-16 ***
## withgroup NA NA NA NA
## deckB 1.156e-01 2.906e-02 3.977 7.05e-05 ***
## deckC 1.445e-01 2.939e-02 4.917 8.94e-07 ***
## deckD 3.850e-02 3.486e-02 1.105 0.2694
## deckE -9.918e-03 3.595e-02 -0.276 0.7826
## deckF 8.113e-02 3.666e-02 2.213 0.0269 *
## deckG 2.612e-02 3.823e-02 0.683 0.4945
## deckT 5.196e-02 1.810e-01 0.287 0.7741
## sideP -2.148e-02 2.960e-02 -0.726 0.4681
## sideS 6.536e-02 2.958e-02 2.210 0.0271 *
## expense NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4002 on 8573 degrees of freedom
## (98 observations deleted due to missingness)
## Multiple R-squared: 0.3609, Adjusted R-squared: 0.3594
## F-statistic: 230.6 on 21 and 8573 DF, p-value: < 2.2e-16
library(caTools)
set.seed(123)
split = sample.split(train$Transported, SplitRatio = 0.75)
train_train = subset(train, split == TRUE)
train_test = subset(train, split == FALSE)##
## Call:
## lm(formula = Transported ~ ., data = train_train[, -c(1)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.55246 -0.30616 -0.03245 0.28473 1.72031
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.804e-01 5.691e-02 6.685 2.51e-11 ***
## HomePlanetEuropa 1.718e-01 3.220e-02 5.337 9.79e-08 ***
## HomePlanetMars 7.364e-02 1.686e-02 4.368 1.27e-05 ***
## CryoSleepTRUE 3.923e-01 1.343e-02 29.202 < 2e-16 ***
## DestinationPSO J318.5-22 -6.199e-02 2.087e-02 -2.970 0.002989 **
## DestinationTRAPPIST-1e -4.951e-02 1.312e-02 -3.773 0.000163 ***
## Age -2.396e-03 3.666e-04 -6.535 6.84e-11 ***
## VIPTRUE -1.085e-02 3.376e-02 -0.321 0.747941
## RoomService -1.138e-04 7.789e-06 -14.615 < 2e-16 ***
## FoodCourt 4.044e-05 3.468e-06 11.662 < 2e-16 ***
## ShoppingMall 8.551e-05 8.406e-06 10.173 < 2e-16 ***
## Spa -8.337e-05 4.707e-06 -17.712 < 2e-16 ***
## VRDeck -8.022e-05 4.772e-06 -16.813 < 2e-16 ***
## withgroup NA NA NA NA
## deckB 1.069e-01 3.323e-02 3.216 0.001305 **
## deckC 1.360e-01 3.355e-02 4.054 5.09e-05 ***
## deckD 1.807e-02 3.969e-02 0.455 0.648946
## deckE -4.120e-02 4.148e-02 -0.993 0.320636
## deckF 6.175e-02 4.211e-02 1.466 0.142587
## deckG 9.273e-03 4.394e-02 0.211 0.832861
## deckT 4.238e-02 1.816e-01 0.233 0.815497
## sideP -1.001e-02 3.456e-02 -0.290 0.772074
## sideS 8.117e-02 3.452e-02 2.351 0.018753 *
## expense NA NA NA NA
## cryosleepTRUE NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4002 on 6424 degrees of freedom
## (74 observations deleted due to missingness)
## Multiple R-squared: 0.3616, Adjusted R-squared: 0.3596
## F-statistic: 173.3 on 21 and 6424 DF, p-value: < 2.2e-16
## reg_transported_tahmin
## transported_gercek 0 1
## 0 903 170
## 1 315 761
## [1] 0.7743136
## reg_transported_tahmin
## transported_gercek 0 1
## 0 2668 547
## 1 930 2301
## [1] 0.7708657
## Zorunlu paket yükleniyor: rJava
## Zorunlu paket yükleniyor: leaps
regresyon_opt <- glmulti(Transported ~ HomePlanet + Cryosleep + Destination + Age + VIP + RoomService + FoodCourt + ShoppingMall + Spa + VRDeck + withgroup + deck + side, + level = 1, crit = bic, data =train)
modelglmulti <- lm(Transported ~ 1 + HomePlanet + Destination + deck + side + Cryosleep + Age + RoomService + FoodCourt + ShoppingMall + Spa + VRDeck, data = train )
reg_tahmin_glmulti = predict(modelglmulti, newdata = test[, -c(1)])
reg_transported_test_tahmin_glmulti <- ifelse(reg_tahmin_glmulti > 0.5, TRUE, FALSE)
Transported <- as.character(reg_transported_test_tahmin_glmulti)
PassengerId <- test$PassengerId
Transported<-as.vector(Transported)
submission_regrasyon_glmulti <- cbind(PassengerId, Transported)
submission_regrasyon_glmulti <- as.data.frame(submission_regrasyon_glmulti)
submission_regrasyon_glmulti$Transported <- str_to_title(submission_regrasyon_glmulti$Transpored)
write.csv(submission_regrasyon_glmulti, "submission_regrasyon_glmulti.csv", row.names =FALSE, quote=FALSE)
fotograf
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Call:
## glm(formula = Transported ~ ., family = binomial, data = train_train[,
## -c(1)])
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.176e-02 4.265e-01 -0.028 0.978004
## HomePlanetEuropa 1.313e+00 2.522e-01 5.207 1.92e-07 ***
## HomePlanetMars 4.706e-01 1.098e-01 4.285 1.83e-05 ***
## CryoSleepTRUE 1.345e+00 9.356e-02 14.374 < 2e-16 ***
## DestinationPSO J318.5-22 -5.119e-01 1.307e-01 -3.918 8.94e-05 ***
## DestinationTRAPPIST-1e -4.505e-01 9.464e-02 -4.760 1.94e-06 ***
## Age -9.359e-03 2.419e-03 -3.869 0.000109 ***
## VIPTRUE -1.637e-01 2.926e-01 -0.560 0.575788
## RoomService -1.692e-03 1.136e-04 -14.891 < 2e-16 ***
## FoodCourt 4.599e-04 4.441e-05 10.357 < 2e-16 ***
## ShoppingMall 5.432e-04 7.598e-05 7.149 8.77e-13 ***
## Spa -1.924e-03 1.160e-04 -16.579 < 2e-16 ***
## VRDeck -1.852e-03 1.155e-04 -16.038 < 2e-16 ***
## withgroup NA NA NA NA
## deckB 1.198e+00 2.935e-01 4.080 4.50e-05 ***
## deckC 2.328e+00 3.316e-01 7.020 2.22e-12 ***
## deckD 4.992e-01 3.216e-01 1.552 0.120605
## deckE -1.290e-01 3.272e-01 -0.394 0.693288
## deckF 5.016e-01 3.306e-01 1.517 0.129186
## deckG 7.179e-02 3.402e-01 0.211 0.832849
## deckT -3.072e-01 1.793e+00 -0.171 0.863922
## sideP -1.586e-01 2.462e-01 -0.644 0.519550
## sideS 4.515e-01 2.465e-01 1.832 0.067006 .
## expense NA NA NA NA
## cryosleepTRUE NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8936.0 on 6445 degrees of freedom
## Residual deviance: 5539.3 on 6424 degrees of freedom
## (74 observations deleted due to missingness)
## AIC: 5583.3
##
## Number of Fisher Scoring iterations: 7
## 2 3 7 14 16 21
## -1.0380056 -10.8408556 0.8859320 -1.6134173 -0.2589114 -1.7534862
## ── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
## ✔ broom 1.0.6 ✔ rsample 1.2.1
## ✔ dials 1.2.1 ✔ tune 1.2.1
## ✔ infer 1.0.7 ✔ workflows 1.1.4
## ✔ modeldata 1.3.0 ✔ workflowsets 1.1.0
## ✔ parsnip 1.2.1 ✔ yardstick 1.3.1
## ✔ recipes 1.0.10
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ data.table::between() masks dplyr::between()
## ✖ scales::discard() masks purrr::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ data.table::first() masks dplyr::first()
## ✖ recipes::fixed() masks stringr::fixed()
## ✖ dplyr::lag() masks stats::lag()
## ✖ data.table::last() masks dplyr::last()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step() masks stats::step()
## ✖ data.table::transpose() masks purrr::transpose()
## • Learn how to get started at https://www.tidymodels.org/start/
result$Transported <- as.factor(result$Transported)
result$logistic_transported_tahmin <- as.factor(result$logistic_transported_tahmin)## # A tibble: 1 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.785
## Truth
## Prediction 0 1
## 0 913 302
## 1 160 774
## Zorunlu paket yükleniyor: lattice
##
## Attaching package: 'caret'
## The following objects are masked from 'package:yardstick':
##
## precision, recall, sensitivity, specificity
## The following object is masked from 'package:purrr':
##
## lift
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 913 160
## 1 302 774
##
## Accuracy : 0.785
## 95% CI : (0.767, 0.8022)
## No Information Rate : 0.5654
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.5701
##
## Mcnemar's Test P-Value : 5.384e-11
##
## Sensitivity : 0.7514
## Specificity : 0.8287
## Pos Pred Value : 0.8509
## Neg Pred Value : 0.7193
## Prevalence : 0.5654
## Detection Rate : 0.4248
## Detection Prevalence : 0.4993
## Balanced Accuracy : 0.7901
##
## 'Positive' Class : 0
##
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
NAIVE BAYES
##
## Attaching package: 'e1071'
## The following object is masked from 'package:tune':
##
## tune
## The following object is masked from 'package:rsample':
##
## permutations
## The following object is masked from 'package:parsnip':
##
## tune
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## FALSE TRUE
## 0.496319 0.503681
##
## Conditional probabilities:
## HomePlanet
## Y Earth Europa Mars
## FALSE 0.6251545 0.1749073 0.1999382
## TRUE 0.4649817 0.3285627 0.2064555
##
## CryoSleep
## Y FALSE TRUE
## FALSE 0.8693624 0.1306376
## TRUE 0.4243268 0.5756732
##
## Destination
## Y 55 Cancri e PSO J318.5-22 TRAPPIST-1e
## FALSE 0.16069221 0.09363412 0.74567367
## TRUE 0.24969549 0.09043849 0.65986602
##
## Age
## Y [,1] [,2]
## FALSE 30.01152 13.45216
## TRUE 27.84835 14.87502
##
## VIP
## Y FALSE TRUE
## FALSE 0.97126082 0.02873918
## TRUE 0.98142509 0.01857491
##
## RoomService
## Y [,1] [,2]
## FALSE 402.20365 916.6547
## TRUE 56.93484 246.8105
##
## FoodCourt
## Y [,1] [,2]
## FALSE 396.2923 1258.123
## TRUE 514.8018 1918.620
##
## ShoppingMall
## Y [,1] [,2]
## FALSE 160.2250 432.1103
## TRUE 182.7135 748.5867
##
## Spa
## Y [,1] [,2]
## FALSE 561.16193 1554.2636
## TRUE 61.11571 264.0556
##
## VRDeck
## Y [,1] [,2]
## FALSE 533.98733 1542.8703
## TRUE 69.07186 295.0542
##
## withgroup
## Y [,1] [,2]
## FALSE 0 0
## TRUE 0 0
##
## deck
## Y A B C D E
## FALSE 0.0296662546 0.0553152040 0.0568603214 0.0655129790 0.1338071693
## TRUE 0.0301461632 0.1419001218 0.1178440926 0.0484165652 0.0672959805
## deck
## Y F G T
## FALSE 0.3634116193 0.2941903585 0.0012360939
## TRUE 0.2834957369 0.3105968331 0.0003045067
##
## side
## Y P S
## FALSE 0.02348578 0.54079110 0.43572311
## TRUE 0.02131547 0.43300853 0.54567600
##
## expense
## Y [,1] [,2]
## FALSE 2053.8702 3224.219
## TRUE 884.6376 2307.962
##
## cryosleep
## Y FALSE TRUE
## FALSE 0.8693624 0.1306376
## TRUE 0.4243268 0.5756732
## FALSE. TRUE.
## 1 0.33040033 6.695997e-01
## 2 1.00000000 3.164597e-135
## 3 0.06879377 9.312062e-01
## 4 0.71667750 2.833225e-01
## 5 0.03632331 9.636767e-01
## 6 0.62129515 3.787048e-01
## [1] 1 0 1 0 1 0
## Transported
## 2 1
## 3 0
## 7 1
## 14 0
## 16 0
## 21 0
## Transported_pred_nb
## Transported_test_train 0 1
## 0 534 545
## 1 86 1008
## [1] 0.709618
Transported <- as.character(Transported_pred_nb)
PassengerId <- test$PassengerId
Transported <- as.vector(Transported)
sample_submision <- cbind(PassengerId, Transported)
sample_submision <- as.data.frame(sample_submision)
sample_submision$Transported <-str_to_title(sample_submision$Transported)
write.csv(sample_submision, "sub_svm_csv", row.names = FALSE, quote = FALSE)SVM
fit_svm <- svm(Transported ~ .,data = train_train[, -1],
type = 'C-classification',
kernel = 'linear') ## Warning in svm.default(x, y, scale = scale, ..., na.action = na.action):
## Variable(s) 'withgroup' constant. Cannot scale data.
## .
## 2 TRUE
## 3 FALSE
## 7 TRUE
## 14 TRUE
## 16 TRUE
## 21 FALSE
## Transported_pred_nb
## Transported_test_train 0 1
## 0 534 545
## 1 86 1008
## [1] 0.7100782
## Warning in svm.default(x, y, scale = scale, ..., na.action = na.action):
## Variable(s) 'withgroup' constant. Cannot scale data.
Transported <- as.character(Transported_pred_svm)
PassengerId <- test$PassengerId
Transported <- as.vector(Transported)
sample_submision <- cbind(PassengerId, Transported)## Warning in cbind(PassengerId, Transported): number of rows of result is not a
## multiple of vector length (arg 2)
SVM radial
fit_svm <- svm(Transported ~ ., data = train_train[, -1],
type= 'C-classification',
kernel = 'radial' )## Warning in svm.default(x, y, scale = scale, ..., na.action = na.action):
## Variable(s) 'withgroup' constant. Cannot scale data.
## Transported_pred_nb
## Transported_test_train 0 1
## 0 534 545
## 1 86 1008
## [1] 0.709618
## Warning in svm.default(x, y, scale = scale, ..., na.action = na.action):
## Variable(s) 'withgroup' constant. Cannot scale data.
Transported <- as.character(Transported_pred_svm)
PassengerId <- test$PassengerId
Transported <- as.vector(Transported)
sample_submission <- cbind(PassengerId,Transported)## Warning in cbind(PassengerId, Transported): number of rows of result is not a
## multiple of vector length (arg 2)
sample_submission <- as.data.frame(sample_submission)
sample_submission$Transported <- str_to_title(sample_submission$Transported)
write.csv(sample_submission,"sub_svm_radial.csv",row.names = FALSE, quote = FALSE)P <- ggplot(train_train,aes(x=HomePlanet, y=deck,color=factor(Transported))) +
geom_point(aes(shape=factor(Transported)), size=3) +
scale_color_viridis_d() +
labs(title = "", x="HomePlanet", y="deck") +
theme_minimal() +
theme(legend.position = "top")
P##
## Attaching package: 'rpart'
## The following object is masked from 'package:dials':
##
## prune
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:ggplot2':
##
## margin
## Call:
## rpart::rpart(formula = Transported ~ ., data = train_train[,
## -1])
## n= 6520
##
## CP nsplit rel error xerror xstd
## 1 0.23321359 0 1.0000000 1.0001811 0.0001977604
## 2 0.04981598 1 0.7667864 0.7671015 0.0103212807
## 3 0.02807107 2 0.7169704 0.7173428 0.0090686704
## 4 0.02784643 3 0.6888994 0.7031918 0.0095398822
## 5 0.01770479 4 0.6610529 0.6654236 0.0096767137
## 6 0.01381587 5 0.6433481 0.6488743 0.0100411278
## 7 0.01194576 6 0.6295323 0.6381676 0.0101816720
## 8 0.01164524 7 0.6175865 0.6303299 0.0102037441
## 9 0.01000000 8 0.6059413 0.6235169 0.0102521030
##
## Variable importance
## expense cryosleep CryoSleep FoodCourt Spa VRDeck
## 21 16 16 12 11 10
## deck HomePlanet ShoppingMall Age Destination
## 5 5 2 1 1
##
## Node number 1: 6520 observations, complexity param=0.2332136
## mean=0.503681, MSE=0.2499865
## left son=2 (3795 obs) right son=3 (2725 obs)
## Primary splits:
## expense < 0.5 to the right, improve=0.2332136, (0 missing)
## CryoSleep < 0.5 to the left, improve=0.2141494, (74 missing)
## cryosleep < 0.5 to the left, improve=0.2141494, (74 missing)
## RoomService < 0.5 to the right, improve=0.1254121, (0 missing)
## Spa < 0.5 to the right, improve=0.1142180, (0 missing)
## Surrogate splits:
## CryoSleep < 0.5 to the left, agree=0.932, adj=0.837, (0 split)
## cryosleep < 0.5 to the left, agree=0.932, adj=0.837, (0 split)
## Spa < 0.5 to the right, agree=0.784, adj=0.484, (0 split)
## FoodCourt < 0.5 to the right, agree=0.770, adj=0.451, (0 split)
## VRDeck < 0.5 to the right, agree=0.768, adj=0.444, (0 split)
##
## Node number 2: 3795 observations, complexity param=0.02807107
## mean=0.2990777, MSE=0.2096302
## left son=4 (3243 obs) right son=5 (552 obs)
## Primary splits:
## FoodCourt < 1331 to the left, improve=0.05751186, (0 missing)
## ShoppingMall < 627.5 to the left, improve=0.04530804, (0 missing)
## RoomService < 365.5 to the right, improve=0.04440387, (0 missing)
## Spa < 257.5 to the right, improve=0.03347453, (0 missing)
## VRDeck < 721 to the right, improve=0.02290457, (0 missing)
## Surrogate splits:
## expense < 5981 to the left, agree=0.885, adj=0.210, (0 split)
## deck splits as RRRLLLLL, agree=0.884, adj=0.201, (0 split)
## HomePlanet splits as LRL, agree=0.878, adj=0.161, (0 split)
## Spa < 8955.5 to the left, agree=0.856, adj=0.009, (0 split)
## VRDeck < 11692 to the left, agree=0.856, adj=0.009, (0 split)
##
## Node number 3: 2725 observations, complexity param=0.04981598
## mean=0.7886239, MSE=0.1666963
## left son=6 (1449 obs) right son=7 (1276 obs)
## Primary splits:
## deck splits as RRRRLRL-, improve=0.17874760, (0 missing)
## HomePlanet splits as LRR, improve=0.12440710, (0 missing)
## Destination splits as RLL, improve=0.02625136, (0 missing)
## CryoSleep < 0.5 to the left, improve=0.02267577, (74 missing)
## cryosleep < 0.5 to the left, improve=0.02267577, (74 missing)
## Surrogate splits:
## HomePlanet splits as LRR, agree=0.933, adj=0.857, (0 split)
## Age < 24.5 to the left, agree=0.625, adj=0.200, (0 split)
## Destination splits as RLL, agree=0.591, adj=0.126, (0 split)
## VIP < 0.5 to the left, agree=0.538, adj=0.014, (0 split)
## side splits as RLL, agree=0.533, adj=0.002, (0 split)
##
## Node number 4: 3243 observations, complexity param=0.02784643
## mean=0.2537774, MSE=0.1893744
## left son=8 (2577 obs) right son=9 (666 obs)
## Primary splits:
## ShoppingMall < 541.5 to the left, improve=0.07390355, (0 missing)
## RoomService < 365.5 to the right, improve=0.03464407, (0 missing)
## Spa < 240.5 to the right, improve=0.03327259, (0 missing)
## VRDeck < 114 to the right, improve=0.02784287, (0 missing)
## expense < 2867.5 to the right, improve=0.01811461, (0 missing)
## Surrogate splits:
## expense < 18644 to the left, agree=0.795, adj=0.003, (0 split)
##
## Node number 5: 552 observations, complexity param=0.01770479
## mean=0.5652174, MSE=0.2457467
## left son=10 (123 obs) right son=11 (429 obs)
## Primary splits:
## Spa < 1372.5 to the right, improve=0.21272970, (0 missing)
## VRDeck < 1063.5 to the right, improve=0.17089500, (0 missing)
## expense < 5395 to the right, improve=0.06611166, (0 missing)
## deck splits as LLRLLRRL, improve=0.02812225, (0 missing)
## side splits as LLR, improve=0.02807513, (0 missing)
## Surrogate splits:
## expense < 12647 to the right, agree=0.790, adj=0.057, (0 split)
## Age < 13.5 to the left, agree=0.779, adj=0.008, (0 split)
## RoomService < 3895.5 to the right, agree=0.779, adj=0.008, (0 split)
##
## Node number 6: 1449 observations
## mean=0.6266391, MSE=0.2339625
##
## Node number 7: 1276 observations
## mean=0.9725705, MSE=0.02667709
##
## Node number 8: 2577 observations, complexity param=0.01194576
## mean=0.193636, MSE=0.1561411
## left son=16 (2067 obs) right son=17 (510 obs)
## Primary splits:
## FoodCourt < 456.5 to the left, improve=0.04838893, (0 missing)
## expense < 1447.5 to the right, improve=0.04016842, (0 missing)
## HomePlanet splits as RLL, improve=0.02438006, (0 missing)
## Spa < 537.5 to the right, improve=0.01895890, (0 missing)
## RoomService < 400.5 to the right, improve=0.01706521, (0 missing)
## Surrogate splits:
## expense < 12373 to the left, agree=0.804, adj=0.008, (0 split)
## Spa < 13650 to the left, agree=0.803, adj=0.006, (0 split)
## VRDeck < 10123.5 to the left, agree=0.802, adj=0.002, (0 split)
## deck splits as LLLLLLLR, agree=0.802, adj=0.002, (0 split)
##
## Node number 9: 666 observations
## mean=0.4864865, MSE=0.2498174
##
## Node number 10: 123 observations
## mean=0.1382114, MSE=0.119109
##
## Node number 11: 429 observations, complexity param=0.01381587
## mean=0.6876457, MSE=0.2147891
## left son=22 (143 obs) right son=23 (286 obs)
## Primary splits:
## VRDeck < 611 to the right, improve=0.24438400, (0 missing)
## Spa < 225 to the right, improve=0.05300377, (0 missing)
## FoodCourt < 3119.5 to the left, improve=0.05168044, (0 missing)
## side splits as LLR, improve=0.04323810, (0 missing)
## RoomService < 1719.5 to the right, improve=0.03930897, (0 missing)
## Surrogate splits:
## expense < 6032 to the right, agree=0.702, adj=0.105, (0 split)
## Age < 53.5 to the right, agree=0.674, adj=0.021, (0 split)
## FoodCourt < 12128.5 to the right, agree=0.671, adj=0.014, (0 split)
##
## Node number 16: 2067 observations
## mean=0.1504596, MSE=0.1278215
##
## Node number 17: 510 observations, complexity param=0.01164524
## mean=0.3686275, MSE=0.2327413
## left son=34 (204 obs) right son=35 (306 obs)
## Primary splits:
## expense < 1447.5 to the right, improve=0.15990760, (0 missing)
## VRDeck < 86.5 to the right, improve=0.10060710, (0 missing)
## HomePlanet splits as RLL, improve=0.07491751, (0 missing)
## Spa < 500 to the right, improve=0.07353369, (0 missing)
## deck splits as LLLLRRRL, improve=0.05075444, (0 missing)
## Surrogate splits:
## HomePlanet splits as RLL, agree=0.867, adj=0.667, (0 split)
## deck splits as LLLLRRRL, agree=0.839, adj=0.598, (0 split)
## VRDeck < 213.5 to the right, agree=0.818, adj=0.544, (0 split)
## Spa < 219.5 to the right, agree=0.792, adj=0.480, (0 split)
## FoodCourt < 907 to the right, agree=0.722, adj=0.304, (0 split)
##
## Node number 22: 143 observations
## mean=0.3636364, MSE=0.231405
##
## Node number 23: 286 observations
## mean=0.8496503, MSE=0.1277446
##
## Node number 34: 204 observations
## mean=0.1323529, MSE=0.1148356
##
## Node number 35: 306 observations
## mean=0.5261438, MSE=0.2493165
## .
## 2 0.1504596
## 3 0.1382114
## 7 0.8496503
## 14 0.1504596
## 16 0.4864865
## 21 0.1504596
## Transported_pred_nb
## Transported_test_train 0 1
## 0 534 545
## 1 86 1008
## [1] 0.709618
fit_tree <- rpart(Transported ~ ., data = train[, -1])
preds <- predict(fit_tree, newdata = test) %>%
data.frame()Transported <- as.character(Transported_pred_tree)
PassengerId <- test$PassengerId
Transported <- as.vector(Transported)
sample_submission <- cbind(PassengerId, Transported)
sample_submission <- as.data.frame(sample_submission)
sample_submission$Transported <- str_to_title(sample_submission$Transported)
write.csv(sample_submission, "sub_forest.csv", row.names = FALSE, quote = FALSE)Random Forest
## Warning in randomForest.default(m, y, ...): The response has five or fewer
## unique values. Are you sure you want to do regression?
## IncNodePurity
## HomePlanet 38.741878
## CryoSleep 80.608839
## Destination 24.787945
## Age 96.824678
## VIP 2.387148
## RoomService 102.275704
## FoodCourt 114.963217
## ShoppingMall 92.371759
## Spa 117.267689
## VRDeck 103.483651
## withgroup 0.000000
## deck 94.273488
## side 25.773873
## expense 272.213144
## cryosleep 67.135536
## .
## 1 0.9722222
## 2 0.1590679
## 3 0.1260870
## 4 0.1260870
## 5 0.1590679
## 6 0.1590679
Transported <- as.character(Transported_pred_tree)
PassengerId <- test$PassengerId
Transported <- as.vector(Transported)
sample_submission <- cbind(PassengerId, Transported)
sample_submission <- as.data.frame(sample_submission)
sample_submission$Transported <- str_to_title(sample_submission$Transported)
write.csv(sample_submission, "sub_forest.csv", row.names = FALSE, quote = FALSE)