Answer the questions below using markdown and R code. The first cell below loads the data you’ll need to use. Data definitions are available at https://www.kaggle.com/c/titanic/data (note that some columns are not named exactly the same in the Kaggle version, and the version of the data you’ll be using does not have Cabin number or Port of Embarkation).
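library(tidyverse) # dplyr, tidyr and stringr are used below
library(janitor)   # provides clean_names()
titanic <- read.csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")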
Create a new column in the titanic dataframe, with_family, that indicates whether a passenger had any family onboard (i.e., either siblings/spouses or parents/children). This column should be a numeric column that takes on a value of 1 for yes, 0 for no.
colnames(titanic)
## [1] "Survived" "Pclass"
## [3] "Name" "Sex"
## [5] "Age" "Siblings.Spouses.Aboard"
## [7] "Parents.Children.Aboard" "Fare"
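titanic <- clean_names(titanic)
# case_when() uses the first branch that matches. The two siblings_spouses_aboard
# branches match every row, so the parents_children_aboard branches are never
# reached and with_family reflects siblings/spouses only.
mutated_titanic <- mutate(titanic,
  with_family = case_when(
    siblings_spouses_aboard != 0 ~ 1,
    siblings_spouses_aboard == 0 ~ 0,
    parents_children_aboard != 0 ~ 1,
    parents_children_aboard == 0 ~ 0))
mutated_titanic %>% glimpse()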
## Rows: 887
## Columns: 9
## $ survived <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1…
## $ pclass <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2…
## $ name <chr> "Mr. Owen Harris Braund", "Mrs. John Bradley (…
## $ sex <chr> "male", "female", "female", "female", "male", …
## $ age <dbl> 22, 38, 26, 35, 35, 27, 54, 2, 27, 14, 4, 58, …
## $ siblings_spouses_aboard <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0…
## $ parents_children_aboard <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0…
## $ fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.45…
## $ with_family <dbl> 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0…
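A minimal sketch of one way to flag a passenger who has either kind of relative aboard, using a single logical OR over both count columns. It runs on a hypothetical `toy` tibble (not the real data), and choosing `if_else()` over `case_when()` is an assumption about style, not the assignment's required method. Under this rule the ninth passenger in the glimpse above (0 siblings/spouses, 2 parents/children) would get `with_family` 1 rather than 0.

```r
library(dplyr)

# Hypothetical toy rows, not taken from the Titanic file.
toy <- tibble(
  siblings_spouses_aboard = c(1, 0, 0, 3),
  parents_children_aboard = c(0, 0, 2, 1)
)

toy <- toy %>%
  mutate(
    # 1 if either count is positive, 0 otherwise
    with_family = if_else(
      siblings_spouses_aboard > 0 | parents_children_aboard > 0, 1, 0
    )
  )

toy$with_family
```

Applying the same rule to the full data would change the family percentage reported further down, so that figure would need to be recomputed.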
colnames(mutated_titanic)
## [1] "survived" "pclass"
## [3] "name" "sex"
## [5] "age" "siblings_spouses_aboard"
## [7] "parents_children_aboard" "fare"
## [9] "with_family"
mutated_titanic_2 <- mutated_titanic %>%
  # -1 marks a minor (age 18 or under), 0 everyone else
  mutate(is_minor = case_when(age <= 18 ~ -1,
                              age > 18 ~ 0)) %>%
  glimpse()
## Rows: 887
## Columns: 10
## $ survived <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1…
## $ pclass <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2…
## $ name <chr> "Mr. Owen Harris Braund", "Mrs. John Bradley (…
## $ sex <chr> "male", "female", "female", "female", "male", …
## $ age <dbl> 22, 38, 26, 35, 35, 27, 54, 2, 27, 14, 4, 58, …
## $ siblings_spouses_aboard <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0…
## $ parents_children_aboard <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0…
## $ fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.45…
## $ with_family <dbl> 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0…
## $ is_minor <dbl> 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, 0, 0, -…
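Since `age` is stored as a double, a pair of conditions such as `age <= 18` and `age >= 19` leaves any fractional age between 18 and 19 unmatched, and `case_when()` returns `NA` for unmatched rows. The sketch below illustrates this on hypothetical ages (the column names `gap_version` and `full_version` are made up for the example).

```r
library(dplyr)

# Hypothetical ages, chosen to include a value between 18 and 19.
ages <- tibble(age = c(2, 18, 18.5, 19, 40))

ages <- ages %>%
  mutate(
    # Conditions with a gap: 18.5 matches neither branch and becomes NA
    gap_version  = case_when(age <= 18 ~ -1, age >= 19 ~ 0),
    # Complementary conditions: every non-missing age matches one branch
    full_version = case_when(age <= 18 ~ -1, age > 18 ~ 0)
  )

ages
```

In this data set the two group counts add up to all 887 rows, so no passenger appears to fall in that gap, but complementary conditions avoid relying on that.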
Percentage first class = 24.35175 %
Percentage family onboard = 31.9053 %
Percentage minors = 18.71477 %
CLASS
total_obs<- count(mutated_titanic_2)
as.integer(total_obs)
## [1] 887
class_percentages <- mutated_titanic_2 %>%
  group_by(pclass) %>%
  summarise(num_obs = n(),
            constant = 887, # total number of passengers
            percentage_class = (num_obs / 887) * 100)
class_percentages
## # A tibble: 3 × 4
##   pclass num_obs constant percentage_class
##    <int>   <int>    <dbl>            <dbl>
## 1      1     216      887             24.4
## 2      2     184      887             20.7
## 3      3     487      887             54.9
FAMILY
with_fam_perc <- mutated_titanic_2 %>%
  group_by(with_family) %>%
  summarise(num_obs = n(),
            constant = 887, # total number of passengers
            percentage_with_fam = (num_obs / 887) * 100)
with_fam_perc
## # A tibble: 2 × 4
##   with_family num_obs constant percentage_with_fam
##         <dbl>   <int>    <dbl>               <dbl>
## 1           0     604      887                68.1
## 2           1     283      887                31.9
UNDER AGE
is_minor_perc <- mutated_titanic_2 %>%
  group_by(is_minor) %>%
  summarise(num_obs = n(),
            constant = 887, # total number of passengers
            percentage_is_minor = (num_obs / 887) * 100)
is_minor_perc
## # A tibble: 2 × 4
##   is_minor num_obs constant percentage_is_minor
##      <dbl>   <int>    <dbl>               <dbl>
## 1       -1     166      887                18.7
## 2        0     721      887                81.3
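The three summaries above divide by the literal 887. A minimal sketch of the same calculation with the total derived from the data, shown on a hypothetical `toy` tibble so it can be run on its own:

```r
library(dplyr)

# Hypothetical toy data with the same 0 / -1 coding used for is_minor.
toy <- tibble(is_minor = c(-1, 0, 0, 0, -1, 0, 0, 0))

shares <- toy %>%
  count(is_minor, name = "num_obs") %>%               # one row per group
  mutate(percentage = num_obs / sum(num_obs) * 100)   # total comes from the data

shares
```

Because the denominator is `sum(num_obs)`, the percentages stay correct if rows are added or filtered out.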
colnames(mutated_titanic_2)
## [1] "survived" "pclass"
## [3] "name" "sex"
## [5] "age" "siblings_spouses_aboard"
## [7] "parents_children_aboard" "fare"
## [9] "with_family" "is_minor"
mutated_titanic_2 %>%
  select(name, age, fare) %>%
  arrange(desc(fare)) %>%
  head(5)
##                                name age     fare
## 1                   Miss. Anna Ward  35 512.3292
## 2 Mr. Thomas Drake Martinez Cardeza  36 512.3292
## 3             Mr. Gustave J Lesurer  35 512.3292
## 4     Mr. Charles Alexander Fortune  19 263.0000
## 5         Miss. Mabel Helen Fortune  23 263.0000
mutated_titanic_2 %>%
  group_by(pclass) %>%
  summarise(num_obs = n())
## # A tibble: 3 × 2
##   pclass num_obs
##    <int>   <int>
## 1      1     216
## 2      2     184
## 3      3     487
mutated_titanic_2 %>%
  group_by(pclass) %>%
  summarise(oldest_by_class = max(age))
## # A tibble: 3 × 2
##   pclass oldest_by_class
##    <int>           <dbl>
## 1      1              80
## 2      2              70
## 3      3              74
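`max(age)` returns only the age. If the oldest passenger's name is wanted as well, one option is `slice_max()`, sketched here on hypothetical passengers (this assumes dplyr 1.0.0 or later, where `slice_max()` is available):

```r
library(dplyr)

# Hypothetical passengers, not rows from the Titanic file.
toy <- tibble(
  name   = c("A", "B", "C", "D"),
  pclass = c(1, 1, 2, 2),
  age    = c(30, 61, 45, 22)
)

oldest <- toy %>%
  group_by(pclass) %>%
  slice_max(age, n = 1) %>%  # keeps the whole row; ties are kept too
  ungroup()

oldest
```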
Yes: 62.96% of first-class passengers survived (136 of 216), compared with 47.3% in second class and 24.4% in third class.
glimpse(mutated_titanic_2)
## Rows: 887
## Columns: 10
## $ survived <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1…
## $ pclass <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2…
## $ name <chr> "Mr. Owen Harris Braund", "Mrs. John Bradley (…
## $ sex <chr> "male", "female", "female", "female", "male", …
## $ age <dbl> 22, 38, 26, 35, 35, 27, 54, 2, 27, 14, 4, 58, …
## $ siblings_spouses_aboard <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0…
## $ parents_children_aboard <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0…
## $ fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.45…
## $ with_family <dbl> 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0…
## $ is_minor <dbl> 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, -1, 0, 0, 0, -…
mutated_titanic_2 %>% group_by(pclass,survived) %>%
summarise(n_obs=n()) %>%
mutate(total_by_class=sum(n_obs),
per_survived_by_class=(n_obs/total_by_class)*100) %>%
filter(survived>0)
## `summarise()` has grouped output by 'pclass'. You can override using the
## `.groups` argument.
## # A tibble: 3 × 5
## # Groups:   pclass [3]
##   pclass survived n_obs total_by_class per_survived_by_class
##    <int>    <int> <int>          <int>                 <dbl>
## 1      1        1   136            216                  63.0
## 2      2        1    87            184                  47.3
## 3      3        1   119            487                  24.4
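As a quick arithmetic check, the percentages in the table above can be reproduced directly from its counts. The vector names `survivors`, `totals` and `rates` are only for this sketch.

```r
# Counts copied from the table above: survivors and totals per class.
survivors <- c(`1` = 136, `2` = 87, `3` = 119)
totals    <- c(`1` = 216, `2` = 184, `3` = 487)

# Share of each class that survived, in percent, rounded to one decimal
rates <- round(survivors / totals * 100, 1)
rates
```

Since `survived` is coded 0/1, `mean(survived) * 100` within each class would give the same rates without counting survivors and totals separately.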
Most of the variables are more informative when they are analyzed together with another one. For example, it is commonly assumed that more females than males survived; however, when sex is studied alongside class, one typically finds that first-class females survived at a higher rate than females in the other classes. Another interesting variable is age: one would expect minors to have survived at a higher rate than other groups. The code below shows that, proportionally, more minors survived than passengers in the other age group.
mutated_group_by_minor_survived <- mutated_titanic_2 %>% group_by(survived, is_minor) %>%
summarise(n_obs=n(), .groups="drop") %>%
mutate(is_minor_text= case_when(is_minor==-1 ~ "Yes",
is_minor!=-1 ~"No")) %>%
ungroup()
mutated_group_by_minor_survived
## # A tibble: 4 × 4
##   survived is_minor n_obs is_minor_text
##      <int>    <dbl> <int> <chr>
## 1        0       -1    88 Yes
## 2        0        0   457 No
## 3        1       -1    78 Yes
## 4        1        0   264 No
mutated_group_by_minor_survived %>%
  group_by(is_minor) %>%
  # sum(n_obs) is the group total; group_by() already restricts it to each is_minor value
  mutate(total_survived = sum(n_obs),
         perc_if_minor_survived = (n_obs / total_survived) * 100) %>%
  arrange(desc(is_minor))
## # A tibble: 4 × 6
## # Groups:   is_minor [2]
##   survived is_minor n_obs is_minor_text total_survived perc_if_minor_survived
##      <int>    <dbl> <int> <chr>                  <int>                  <dbl>
## 1        0        0   457 No                       721                   63.4
## 2        1        0   264 No                       721                   36.6
## 3        0       -1    88 Yes                      166                   53.0
## 4        1       -1    78 Yes                      166                   47.0
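Base R's `sum()` has no `by` argument: every value passed through `...` is added to the total, whether it is named or not. Inside a grouped `mutate()`, `sum(n_obs)` alone already gives the per-group total. The sketch below uses the minor counts from the survived by is_minor table above (88 died, 78 survived) to show the difference. The vector names are only for this example.

```r
# Counts copied from the survived x is_minor table above.
minor_counts <- c(died = 88, survived = 78)
is_minor     <- c(-1, -1)  # the is_minor values for those two rows

sum(minor_counts)                 # 166: the number of minors
sum(minor_counts, by = is_minor)  # 164: the two -1 values are added in as well

# Share of minors who survived, in percent
share_survived <- minor_counts[["survived"]] / sum(minor_counts) * 100
round(share_survived, 1)          # 47
```

With the correct denominator the two minor rows sum to 100%, which is a useful sanity check on any within-group percentage.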
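There are 640 distinct values in the familyname column.
family_name_titanic <- mutated_titanic_2 %>%
  # Splits name on spaces and keeps only the first three words, so "lastname"
  # holds the third word of the name, which is often a middle name or an initial.
  separate(name, c("prefix", "firstname", "lastname"), sep = " ", remove = FALSE) %>%
  mutate(familyname = str_extract(lastname, "^[^ ]+"))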
## Warning: Expected 3 pieces. Additional pieces discarded in 562 rows [1, 2, 4, 5, 7, 8,
## 9, 10, 11, 13, 14, 15, 16, 18, 19, 21, 24, 25, 26, 27, ...].
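The warning above reports that extra words were discarded in 562 rows, which means the third word is frequently not the end of the name. A hedged alternative is to take the last word of each name with `stringr::word()`. This assumes the surname is always the final word in this file's name format, which the names shown earlier suggest but which is not verified here for all 887 rows.

```r
library(stringr)

# Names copied from the output earlier in this document.
passenger_names <- c(
  "Mr. Owen Harris Braund",
  "Mr. Gustave J Lesurer",
  "Mr. Charles Alexander Fortune",
  "Miss. Mabel Helen Fortune"
)

third_word <- word(passenger_names, 3)   # what a three-way split keeps
last_word  <- word(passenger_names, -1)  # assumed here to be the surname

third_word
last_word
length(unique(last_word))                # distinct surnames in this small sample
```

On these four names the third word gives "Harris", "J", "Alexander" and "Helen", while the last word groups the two Fortune passengers together.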
family_name_titanic %>% select(survived,pclass,familyname) %>%
distinct(familyname)
## familyname
## 1 Harris
## 2 Bradley
## 3 Heikkinen
## 4 Heath
## 5 Henry
## 6 Moran
## 7 J
## 8 Leonard
## 9 W
## 10 (Adele
## 11 Rut
## 12 Bonnell
## 13 Johan
## 14 Amanda
## 15 D
## 16 Rice
## 17 Eugene
## 18 (Emelia
## 19 Masselmani
## 20 Beesley
## 21 McGowan
## 22 Thompson
## 23 Danira
## 24 Oscar
## 25 Chehab
## 26 Alexander
## 27 O'Dwyer
## 28 Todoroff
## 29 E
## 30 Augustus
## 31 Agatha
## 32 H
## 33 Joseph
## 34 Oskar
## 35 Mamee
## 36 Charles
## 37 Maria
## 38 Nicola-Yarred
## 39 (Johanna
## 40 John
## 41 Marie
## 42 Delia
## 43 Lennon
## 44 O'Driscoll
## 45 Samaan
## 46 (Josefine
## 47 Niilo
## 48 Cater
## 49 Sleeper
## 50 (Elizabeth
## 51 Cornelius
## 52 Woolner
## 53 Rugg
## 54 Novel
## 55 Mirium
## 56 Frederick
## 57 Sirayanian
## 58 Icard
## 59 Birkhardt
## 60 Skoog
## 61 A
## 62 Moubarek
## 63 Ramell)
## 64 James
## 65 Alexandra
## 66 Kink
## 67 Curnow
## 68 Amy
## 69 Jr
## 70 Chronopoulos
## 71 Bing
## 72 Hansen
## 73 Staneff
## 74 Haim
## 75 Gates
## 76 Dowdell
## 77 Waelens
## 78 Baptist
## 79 M
## 80 Ilett
## 81 Alfred
## 82 Neal
## 83 Francis
## 84 Helen
## 85 Celotti
## 86 Christmann
## 87 Edvin
## 88 Fuller
## 89 Frank
## 90 Coxon
## 91 B
## 92 Bertram
## 93 T
## 94 Kantor
## 95 Petranec
## 96 Petroff
## 97 Frasar
## 98 Joel
## 99 Vilhelm
## 100 Mionoff
## 101 Kristine
## 102 Rekic
## 103 Chamberlain
## 104 Zabour
## 105 Jussila
## 106 Attalah
## 107 Pekoniemi
## 108 Connors
## 109 Edmond
## 110 Anna
## 111 George
## 112 Nasser
## 113 Webber
## 114 Wayland
## 115 McMahon
## 116 Arne
## 117 Peter
## 118 Ekstrom
## 119 Drazenoic
## 120 Fernandeo
## 121 (Mathilde
## 122 Richard
## 123 Monypeny
## 124 Elon
## 125 Giglio
## 126 (Sultana)
## 127 Sofia
## 128 Pietari
## 129 Burke
## 130 Samuel
## 131 Edvard
## 132 Maggie
## 133 Navratil
## 134 Roussel
## 135 (Edith
## 136 Meo
## 137 Blyler
## 138 Martin
## 139 Duane
## 140 Gilnagh
## 141 Corn
## 142 Smiljanic
## 143 Hatfield
## 144 Viktor
## 145 Calic
## 146 Viljami
## 147 Martha
## 148 (Anna
## 149 Ling
## 150 Van
## 151 Ileen
## 152 Wilhelm
## 153 Clinch
## 154 Albin
## 155 Forbes
## 156 Elizabeth
## 157 Hale
## 158 Gladys
## 159 Pernot
## 160 Gustaf
## 161 F
## 162 Gretchen
## 163 Roscoe
## 164 Hallace
## 165 Bourke
## 166 Turcin
## 167 Pinsky
## 168 Carbines
## 169 Christine
## 170 Lurette
## 171 Mernagh
## 172 Siegwart
## 173 Madigan
## 174 Yrois
## 175 Cyriel
## 176 Sage
## 177 Youseff
## 178 Cohen
## 179 Matilda
## 180 Cassem
## 181 Carr
## 182 Blank
## 183 Ali
## 184 Annie
## 185 Kristensen
## 186 Kiernan
## 187 Newell
## 188 Honkanen
## 189 Bazzani
## 190 Nenkoff
## 191 Maxfield
## 192 Ivar
## 193 Hall
## 194 Jonas
## 195 Lefebre
## 196 Adolf
## 197 Gertrud
## 198 William
## 199 Phoebe
## 200 Hold
## 201 Collyer
## 202 Murphy
## 203 Alexanteri
## 204 Edward
## 205 Thorilda
## 206 (Anna)
## 207 Courtenay
## 208 (Elna
## 209 Thomas
## 210 Arthur
## 211 (Helena
## 212 (Hanne
## 213 Maybelle
## 214 Cherry
## 215 Ward
## 216 Davis)
## 217 Rojj
## 218 Taussig
## 219 Harrison
## 220 Reeves
## 221 Arvid
## 222 Ulrik
## 223 Bissette
## 224 Cairns
## 225 Anne
## 226 Healy
## 227 Theodosia
## 228 Charlotta
## 229 Parkes
## 230 (Rosa
## 231 de
## 232 Stankovic
## 233 Naidenoff
## 234 Hosono
## 235 Connolly
## 236 Barber
## 237 Jacques
## 238 Haas
## 239 Mineff
## 240 G
## 241 Hanna
## 242 Loraine
## 243 Saalfeld
## 244 (Helene
## 245 Katherine
## 246 McCoy
## 247 Cahoone
## 248 Hugh
## 249 Trevor
## 250 Fleming
## 251 Abelson
## 252 Mabel
## 253 Bechstein
## 254 Borie
## 255 Hendekovic
## 256 Hart
## 257 Josefina
## 258 (Miriam
## 259 Moraweck
## 260 Natalie
## 261 Oakley
## 262 Dennis
## 263 Danoff
## 264 Mary
## 265 Grice
## 266 Gertrude
## 267 Partner
## 268 Edmondus
## 269 Denkoff
## 270 Clinton
## 271 Margaret
## 272 Edwart
## 273 Weart
## 274 Roger
## 275 Hubert
## 276 Brown
## 277 Elsie
## 278 Loch
## 279 Dimic
## 280 Fellows
## 281 Elias
## 282 Arnold-Franchi
## 283 Yousif
## 284 Edith
## 285 Clemmer
## 286 McGovern
## 287 del
## 288 David)
## 289 Asim
## 290 O'Brien
## 291 Nils
## 292 Manley
## 293 Boulos)
## 294 Jermyn
## 295 Pauline
## 296 Achilles
## 297 Ringhini
## 298 Viola
## 299 Adelia
## 300 Elkins
## 301 Betros
## 302 Gideon
## 303 Bidois
## 304 Nakid
## 305 Tikkanen
## 306 Plotcharsky
## 307 Buss
## 308 Sadlier
## 309 Lehmann
## 310 Ernest
## 311 Olof
## 312 Birger
## 313 (Agnes
## 314 Johansson
## 315 Olsson
## 316 David
## 317 Pain
## 318 Niskanen
## 319 Adams
## 320 Aina
## 321 Oreskovic
## 322 Gale
## 323 Rowe
## 324 Sdycoff
## 325 Julian
## 326 (Annie
## 327 Vivian
## 328 Karoliina
## 329 Charters
## 330 Zimmerman
## 331 Gilbert
## 332 Wiseman
## 333 V
## 334 Florence
## 335 Flynn
## 336 (Berk
## 337 Hakan
## 338 (Florence
## 339 Erland
## 340 Baird
## 341 Polk
## 342 (Emily
## 343 Fortune
## 344 Henrik
## 345 (Esther
## 346 Hampe
## 347 Emil
## 348 Reynaldo
## 349 Johannesen-Bratthammer
## 350 Dodge
## 351 Violet
## 352 Kimber
## 353 Catherine
## 354 Godfrey
## 355 Olai
## 356 Laventall
## 357 L
## 358 Peduzzi
## 359 Jalsevac
## 360 Davis
## 361 R
## 362 Toomey
## 363 O'Connor
## 364 Anderson
## 365 Morley
## 366 Christian
## 367 Maisner
## 368 Estanslas
## 369 Campbell
## 370 Montgomery
## 371 Scanlan
## 372 Barbara
## 373 Keefe
## 374 Cacic
## 375 S
## 376 Quincy
## 377 August
## 378 Victor
## 379 Wood
## 380 Turkula
## 381 Austin
## 382 Leslie
## 383 Mathias
## 384 Windelov
## 385 Markland
## 386 Artagaveytia
## 387 Roland
## 388 Yousseff
## 389 Mussey
## 390 Svensson
## 391 Canavan
## 392 Maioni
## 393 Margido
## 394 Lang
## 395 Patrick
## 396 Robert
## 397 Coleff
## 398 Milley)
## 399 Ryan
## 400 Pavlovic
## 401 Perreault
## 402 Vovk
## 403 Lahoud
## 404 Albert
## 405 Kassem
## 406 Farrell
## 407 Ridsdale
## 408 Farthing
## 409 Werner
## 410 May
## 411 Toufik
## 412 (Catherine
## 413 Miriam
## 414 Willingham
## 415 LeRoy
## 416 Beard
## 417 Margaritha
## 418 Constanzia
## 419 Elisabeth
## 420 Beane
## 421 Donald
## 422 (Ethel
## 423 Padro
## 424 Morgan
## 425 Borland
## 426 Leeni
## 427 Ohman
## 428 Wright
## 429 Christiana
## 430 Robbins
## 431 (Tillie
## 432 Rowan
## 433 Sivic
## 434 Douglas
## 435 Simmons
## 436 Ogden)
## 437 Stoytcheff
## 438 (Alma
## 439 Doharr
## 440 Jonsson
## 441 Dale
## 442 Irwin
## 443 Kelly
## 444 Patchett
## 445 Garside
## 446 (Maria
## 447 Rachel
## 448 Hugo
## 449 Paulner
## 450 Denzil
## 451 Frolicher-Stehli
## 452 Gilinski
## 453 Murdlin
## 454 Rintamaki
## 455 Baptiste
## 456 Wills
## 457 Johnson
## 458 Boulos
## 459 Edmund
## 460 Slabenoff
## 461 Homer
## 462 Bengtsson
## 463 Karaic
## 464 Williams
## 465 (Juliette
## 466 Neto
## 467 Jane
## 468 Horgan
## 469 Herman
## 470 Louise
## 471 Gavey
## 472 Yasbeck
## 473 Nelson
## 474 Damsgaard
## 475 Sutton
## 476 Fiske
## 477 Bostandyeff
## 478 Stahelin-Maeglin
## 479 Thorneycroft
## 480 Peder
## 481 Sagesser
## 482 Foo
## 483 Baclini
## 484 Cor
## 485 Alfons
## 486 Willey
## 487 Zillah
## 488 Mitkoff
## 489 Doling
## 490 Halvorsen
## 491 O'Leary
## 492 Hegarty
## 493 Mark
## 494 Radeff
## 495 (Catherine)
## 496 Floyd
## 497 Webster
## 498 Badt
## 499 Pomeroy
## 500 Hickman
## 501 Fenton
## 502 Paust
## 503 Cook
## 504 Zebley
## 505 Davidson
## 506 Michael
## 507 Wilhelms
## 508 Hastings
## 509 Hjalmar
## 510 (Augusta
## 511 Drake
## 512 Peters
## 513 Hassab
## 514 Philippe
## 515 Arnold
## 516 Dakic
## 517 Thelander
## 518 Adrian
## 519 Karun
## 520 Lam
## 521 Saad
## 522 Weir
## 523 Mullens
## 524 Jacob
## 525 Gallagher
## 526 Juul
## 527 Pennington
## 528 Cleaver
## 529 Gonios
## 530 Antonine
## 531 Klaber
## 532 Greenberg
## 533 Andreas
## 534 Celia
## 535 Joackim
## 536 Jessie
## 537 Lauritz
## 538 Price
## 539 Mannion
## 540 Walton
## 541 Aaron
## 542 (Margaret
## 543 Ivanoff
## 544 Nankoff
## 545 Parker
## 546 McNamee
## 547 Stranden
## 548 Gifford
## 549 Sinkkonen
## 550 Warner
## 551 Connaghton
## 552 Wells
## 553 Moor
## 554 Jonkoff
## 555 (Jane
## 556 Hamalainen
## 557 Sigfrid
## 558 Andrew
## 559 of
## 560 Garfirth
## 561 Antino
## 562 Assi
## 563 Linus
## 564 C
## 565 Jackson
## 566 Mangan
## 567 Danielsen
## 568 Aime
## 569 Mack
## 570 (Eliza
## 571 Fabian
## 572 Tobin
## 573 Ethel
## 574 Scott
## 575 Ayoub
## 576 Clyde
## 577 (David
## 578 Vere
## 579 Guggenheim
## 580 Keane
## 581 Gaskell
## 582 Fisher
## 583 Dantcheff
## 584 Otter
## 585 (Farnham)
## 586 Osman
## 587 Ibrahim
## 588 Ponesell
## 589 (Charlotte
## 590 Thornton
## 591 Natalia
## 592 Meyer
## 593 Lester
## 594 Iris
## 595 Portage
## 596 Fry
## 597 Mallet
## 598 Fredrik
## 599 Thorsten
## 600 Melville
## 601 Lulic
## 602 Abraham
## 603 (Selini
## 604 Sibley
## 605 Augustsson
## 606 Rebecca
## 607 Pasic
## 608 Sirota
## 609 Chip
## 610 Marechal
## 611 Rudolf
## 612 Serepeca
## 613 Culumovic
## 614 Abbing
## 615 Bullen
## 616 Markoff
## 617 Harper
## 618 Harald
## 619 Conover
## 620 (Leah
## 621 Dennick
## 622 Denis
## 623 (Latifa
## 624 Razi
## 625 Bystrom
## 626 Duran
## 627 van
## 628 Theodor
## 629 Balkic
## 630 Vander
## 631 (Hannah
## 632 Kiamie
## 633 Ossian
## 634 Laleff
## 635 (Imanita
## 636 Markun
## 637 Ulrika
## 638 Montvila
## 639 Howell
## 640 Dooley