This analysis uses four numerical variables from the Titanic dataset: Age, SibSp, Parch, and Fare. Together they describe passenger demographics (Age), family structure (SibSp, the number of siblings or spouses aboard, and Parch, the number of parents or children aboard), and economic condition (Fare). Rows containing missing values were removed with the na.omit() function so that all statistical calculations are based on complete observations.
After cleaning, the dataset consists of 714 observations of 4 numerical variables, down from 891 rows in the original data. All 177 removed rows (19.9%) were missing Age; no other selected variable contains missing values. Age and Fare are continuous variables, while SibSp and Parch are discrete counts. Complete-case analysis keeps the correlation, covariance, and eigen analyses on a common set of observations, but it is unbiased only if the missing ages are unrelated to the other variables, so the results below describe the passengers whose age was recorded.
df <- read.csv("Titanic-Dataset.csv")
head(df)
## PassengerId Survived Pclass
## 1 1 0 3
## 2 2 1 1
## 3 3 1 3
## 4 4 1 1
## 5 5 0 3
## 6 6 0 3
## Name Sex Age SibSp Parch
## 1 Braund, Mr. Owen Harris male 22 1 0
## 2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0
## 3 Heikkinen, Miss. Laina female 26 0 0
## 4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0
## 5 Allen, Mr. William Henry male 35 0 0
## 6 Moran, Mr. James male NA 0 0
## Ticket Fare Cabin Embarked
## 1 A/5 21171 7.2500 S
## 2 PC 17599 71.2833 C85 C
## 3 STON/O2. 3101282 7.9250 S
## 4 113803 53.1000 C123 S
## 5 373450 8.0500 S
## 6 330877 8.4583 Q
summary(df)
## PassengerId Survived Pclass Name
## Min. : 1.0 Min. :0.0000 Min. :1.000 Length:891
## 1st Qu.:223.5 1st Qu.:0.0000 1st Qu.:2.000 Class :character
## Median :446.0 Median :0.0000 Median :3.000 Mode :character
## Mean :446.0 Mean :0.3838 Mean :2.309
## 3rd Qu.:668.5 3rd Qu.:1.0000 3rd Qu.:3.000
## Max. :891.0 Max. :1.0000 Max. :3.000
##
## Sex Age SibSp Parch
## Length:891 Min. : 0.42 Min. :0.000 Min. :0.0000
## Class :character 1st Qu.:20.12 1st Qu.:0.000 1st Qu.:0.0000
## Mode :character Median :28.00 Median :0.000 Median :0.0000
## Mean :29.70 Mean :0.523 Mean :0.3816
## 3rd Qu.:38.00 3rd Qu.:1.000 3rd Qu.:0.0000
## Max. :80.00 Max. :8.000 Max. :6.0000
## NA's :177
## Ticket Fare Cabin Embarked
## Length:891 Min. : 0.00 Length:891 Length:891
## Class :character 1st Qu.: 7.91 Class :character Class :character
## Mode :character Median : 14.45 Mode :character Mode :character
## Mean : 32.20
## 3rd Qu.: 31.00
## Max. :512.33
##
tail(df)
## PassengerId Survived Pclass Name Sex
## 886 886 0 3 Rice, Mrs. William (Margaret Norton) female
## 887 887 0 2 Montvila, Rev. Juozas male
## 888 888 1 1 Graham, Miss. Margaret Edith female
## 889 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female
## 890 890 1 1 Behr, Mr. Karl Howell male
## 891 891 0 3 Dooley, Mr. Patrick male
## Age SibSp Parch Ticket Fare Cabin Embarked
## 886 39 0 5 382652 29.125 Q
## 887 27 0 0 211536 13.000 S
## 888 19 0 0 112053 30.000 B42 S
## 889 NA 1 2 W./C. 6607 23.450 S
## 890 26 0 0 111369 30.000 C148 C
## 891 32 0 0 370376 7.750 Q
str(df)
## 'data.frame': 891 obs. of 12 variables:
## $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
## $ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
## $ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
## $ Sex : chr "male" "female" "female" "female" ...
## $ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
## $ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
## $ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
## $ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
## $ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
## $ Cabin : chr "" "C85" "" "C123" ...
## $ Embarked : chr "S" "C" "S" "S" ...
colnames(df)
## [1] "PassengerId" "Survived" "Pclass" "Name" "Sex"
## [6] "Age" "SibSp" "Parch" "Ticket" "Fare"
## [11] "Cabin" "Embarked"
colSums(is.na(df))
## PassengerId Survived Pclass Name Sex Age
## 0 0 0 0 0 177
## SibSp Parch Ticket Fare Cabin Embarked
## 0 0 0 0 0 0
colMeans(is.na(df)) * 100
## PassengerId Survived Pclass Name Sex Age
## 0.00000 0.00000 0.00000 0.00000 0.00000 19.86532
## SibSp Parch Ticket Fare Cabin Embarked
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
df[!complete.cases(df), ]
## PassengerId Survived Pclass
## 6 6 0 3
## 18 18 1 2
## 20 20 1 3
## 27 27 0 3
## 29 29 1 3
## 30 30 0 3
## 32 32 1 1
## 33 33 1 3
## 37 37 1 3
## 43 43 0 3
## 46 46 0 3
## 47 47 0 3
## 48 48 1 3
## 49 49 0 3
## 56 56 1 1
## 65 65 0 1
## 66 66 1 3
## 77 77 0 3
## 78 78 0 3
## 83 83 1 3
## 88 88 0 3
## 96 96 0 3
## 102 102 0 3
## 108 108 1 3
## 110 110 1 3
## 122 122 0 3
## 127 127 0 3
## 129 129 1 3
## 141 141 0 3
## 155 155 0 3
## 159 159 0 3
## 160 160 0 3
## 167 167 1 1
## 169 169 0 1
## 177 177 0 3
## 181 181 0 3
## 182 182 0 2
## 186 186 0 1
## 187 187 1 3
## 197 197 0 3
## 199 199 1 3
## 202 202 0 3
## 215 215 0 3
## 224 224 0 3
## 230 230 0 3
## 236 236 0 3
## 241 241 0 3
## 242 242 1 3
## 251 251 0 3
## 257 257 1 1
## 261 261 0 3
## 265 265 0 3
## 271 271 0 1
## 275 275 1 3
## 278 278 0 2
## 285 285 0 1
## 296 296 0 1
## 299 299 1 1
## 301 301 1 3
## 302 302 1 3
## 304 304 1 2
## 305 305 0 3
## 307 307 1 1
## 325 325 0 3
## 331 331 1 3
## 335 335 1 1
## 336 336 0 3
## 348 348 1 3
## 352 352 0 1
## 355 355 0 3
## 359 359 1 3
## 360 360 1 3
## 365 365 0 3
## 368 368 1 3
## 369 369 1 3
## 376 376 1 1
## 385 385 0 3
## 389 389 0 3
## 410 410 0 3
## 411 411 0 3
## 412 412 0 3
## 414 414 0 2
## 416 416 0 3
## 421 421 0 3
## 426 426 0 3
## 429 429 0 3
## 432 432 1 3
## 445 445 1 3
## 452 452 0 3
## 455 455 0 3
## 458 458 1 1
## 460 460 0 3
## 465 465 0 3
## 467 467 0 2
## 469 469 0 3
## 471 471 0 3
## 476 476 0 1
## 482 482 0 2
## 486 486 0 3
## 491 491 0 3
## 496 496 0 3
## 498 498 0 3
## 503 503 0 3
## 508 508 1 1
## 512 512 0 3
## 518 518 0 3
## 523 523 0 3
## 525 525 0 3
## 528 528 0 1
## 532 532 0 3
## 534 534 1 3
## 539 539 0 3
## 548 548 1 2
## 553 553 0 3
## 558 558 0 1
## 561 561 0 3
## 564 564 0 3
## 565 565 0 3
## 569 569 0 3
## 574 574 1 3
## 579 579 0 3
## 585 585 0 3
## 590 590 0 3
## 594 594 0 3
## 597 597 1 2
## 599 599 0 3
## 602 602 0 3
## 603 603 0 1
## 612 612 0 3
## 613 613 1 3
## 614 614 0 3
## 630 630 0 3
## 634 634 0 1
## 640 640 0 3
## 644 644 1 3
## 649 649 0 3
## 651 651 0 3
## 654 654 1 3
## 657 657 0 3
## 668 668 0 3
## 670 670 1 1
## 675 675 0 2
## 681 681 0 3
## 693 693 1 3
## 698 698 1 3
## 710 710 1 3
## 712 712 0 1
## 719 719 0 3
## 728 728 1 3
## 733 733 0 2
## 739 739 0 3
## 740 740 0 3
## 741 741 1 1
## 761 761 0 3
## 767 767 0 1
## 769 769 0 3
## 774 774 0 3
## 777 777 0 3
## 779 779 0 3
## 784 784 0 3
## 791 791 0 3
## 793 793 0 3
## 794 794 0 1
## 816 816 0 1
## 826 826 0 3
## 827 827 0 3
## 829 829 1 3
## 833 833 0 3
## 838 838 0 3
## 840 840 1 1
## 847 847 0 3
## 850 850 1 1
## 860 860 0 3
## 864 864 0 3
## 869 869 0 3
## 879 879 0 3
## 889 889 0 3
## Name Sex Age SibSp Parch
## 6 Moran, Mr. James male NA 0 0
## 18 Williams, Mr. Charles Eugene male NA 0 0
## 20 Masselmani, Mrs. Fatima female NA 0 0
## 27 Emir, Mr. Farred Chehab male NA 0 0
## 29 O'Dwyer, Miss. Ellen "Nellie" female NA 0 0
## 30 Todoroff, Mr. Lalio male NA 0 0
## 32 Spencer, Mrs. William Augustus (Marie Eugenie) female NA 1 0
## 33 Glynn, Miss. Mary Agatha female NA 0 0
## 37 Mamee, Mr. Hanna male NA 0 0
## 43 Kraeff, Mr. Theodor male NA 0 0
## 46 Rogers, Mr. William John male NA 0 0
## 47 Lennon, Mr. Denis male NA 1 0
## 48 O'Driscoll, Miss. Bridget female NA 0 0
## 49 Samaan, Mr. Youssef male NA 2 0
## 56 Woolner, Mr. Hugh male NA 0 0
## 65 Stewart, Mr. Albert A male NA 0 0
## 66 Moubarek, Master. Gerios male NA 1 1
## 77 Staneff, Mr. Ivan male NA 0 0
## 78 Moutal, Mr. Rahamin Haim male NA 0 0
## 83 McDermott, Miss. Brigdet Delia female NA 0 0
## 88 Slocovski, Mr. Selman Francis male NA 0 0
## 96 Shorney, Mr. Charles Joseph male NA 0 0
## 102 Petroff, Mr. Pastcho ("Pentcho") male NA 0 0
## 108 Moss, Mr. Albert Johan male NA 0 0
## 110 Moran, Miss. Bertha female NA 1 0
## 122 Moore, Mr. Leonard Charles male NA 0 0
## 127 McMahon, Mr. Martin male NA 0 0
## 129 Peter, Miss. Anna female NA 1 1
## 141 Boulos, Mrs. Joseph (Sultana) female NA 0 2
## 155 Olsen, Mr. Ole Martin male NA 0 0
## 159 Smiljanic, Mr. Mile male NA 0 0
## 160 Sage, Master. Thomas Henry male NA 8 2
## 167 Chibnall, Mrs. (Edith Martha Bowerman) female NA 0 1
## 169 Baumann, Mr. John D male NA 0 0
## 177 Lefebre, Master. Henry Forbes male NA 3 1
## 181 Sage, Miss. Constance Gladys female NA 8 2
## 182 Pernot, Mr. Rene male NA 0 0
## 186 Rood, Mr. Hugh Roscoe male NA 0 0
## 187 O'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey) female NA 1 0
## 197 Mernagh, Mr. Robert male NA 0 0
## 199 Madigan, Miss. Margaret "Maggie" female NA 0 0
## 202 Sage, Mr. Frederick male NA 8 2
## 215 Kiernan, Mr. Philip male NA 1 0
## 224 Nenkoff, Mr. Christo male NA 0 0
## 230 Lefebre, Miss. Mathilde female NA 3 1
## 236 Harknett, Miss. Alice Phoebe female NA 0 0
## 241 Zabour, Miss. Thamine female NA 1 0
## 242 Murphy, Miss. Katherine "Kate" female NA 1 0
## 251 Reed, Mr. James George male NA 0 0
## 257 Thorne, Mrs. Gertrude Maybelle female NA 0 0
## 261 Smith, Mr. Thomas male NA 0 0
## 265 Henry, Miss. Delia female NA 0 0
## 271 Cairns, Mr. Alexander male NA 0 0
## 275 Healy, Miss. Hanora "Nora" female NA 0 0
## 278 Parkes, Mr. Francis "Frank" male NA 0 0
## 285 Smith, Mr. Richard William male NA 0 0
## 296 Lewy, Mr. Ervin G male NA 0 0
## 299 Saalfeld, Mr. Adolphe male NA 0 0
## 301 Kelly, Miss. Anna Katherine "Annie Kate" female NA 0 0
## 302 McCoy, Mr. Bernard male NA 2 0
## 304 Keane, Miss. Nora A female NA 0 0
## 305 Williams, Mr. Howard Hugh "Harry" male NA 0 0
## 307 Fleming, Miss. Margaret female NA 0 0
## 325 Sage, Mr. George John Jr male NA 8 2
## 331 McCoy, Miss. Agnes female NA 2 0
## 335 Frauenthal, Mrs. Henry William (Clara Heinsheimer) female NA 1 0
## 336 Denkoff, Mr. Mitto male NA 0 0
## 348 Davison, Mrs. Thomas Henry (Mary E Finck) female NA 1 0
## 352 Williams-Lambert, Mr. Fletcher Fellows male NA 0 0
## 355 Yousif, Mr. Wazli male NA 0 0
## 359 McGovern, Miss. Mary female NA 0 0
## 360 Mockler, Miss. Helen Mary "Ellie" female NA 0 0
## 365 O'Brien, Mr. Thomas male NA 1 0
## 368 Moussa, Mrs. (Mantoura Boulos) female NA 0 0
## 369 Jermyn, Miss. Annie female NA 0 0
## 376 Meyer, Mrs. Edgar Joseph (Leila Saks) female NA 1 0
## 385 Plotcharsky, Mr. Vasil male NA 0 0
## 389 Sadlier, Mr. Matthew male NA 0 0
## 410 Lefebre, Miss. Ida female NA 3 1
## 411 Sdycoff, Mr. Todor male NA 0 0
## 412 Hart, Mr. Henry male NA 0 0
## 414 Cunningham, Mr. Alfred Fleming male NA 0 0
## 416 Meek, Mrs. Thomas (Annie Louise Rowley) female NA 0 0
## 421 Gheorgheff, Mr. Stanio male NA 0 0
## 426 Wiseman, Mr. Phillippe male NA 0 0
## 429 Flynn, Mr. James male NA 0 0
## 432 Thorneycroft, Mrs. Percival (Florence Kate White) female NA 1 0
## 445 Johannesen-Bratthammer, Mr. Bernt male NA 0 0
## 452 Hagland, Mr. Ingvald Olai Olsen male NA 1 0
## 455 Peduzzi, Mr. Joseph male NA 0 0
## 458 Kenyon, Mrs. Frederick R (Marion) female NA 1 0
## 460 O'Connor, Mr. Maurice male NA 0 0
## 465 Maisner, Mr. Simon male NA 0 0
## 467 Campbell, Mr. William male NA 0 0
## 469 Scanlan, Mr. James male NA 0 0
## 471 Keefe, Mr. Arthur male NA 0 0
## 476 Clifford, Mr. George Quincy male NA 0 0
## 482 Frost, Mr. Anthony Wood "Archie" male NA 0 0
## 486 Lefebre, Miss. Jeannie female NA 3 1
## 491 Hagland, Mr. Konrad Mathias Reiersen male NA 1 0
## 496 Yousseff, Mr. Gerious male NA 0 0
## 498 Shellard, Mr. Frederick William male NA 0 0
## 503 O'Sullivan, Miss. Bridget Mary female NA 0 0
## 508 Bradley, Mr. George ("George Arthur Brayton") male NA 0 0
## 512 Webber, Mr. James male NA 0 0
## 518 Ryan, Mr. Patrick male NA 0 0
## 523 Lahoud, Mr. Sarkis male NA 0 0
## 525 Kassem, Mr. Fared male NA 0 0
## 528 Farthing, Mr. John male NA 0 0
## 532 Toufik, Mr. Nakli male NA 0 0
## 534 Peter, Mrs. Catherine (Catherine Rizk) female NA 0 2
## 539 Risien, Mr. Samuel Beard male NA 0 0
## 548 Padro y Manent, Mr. Julian male NA 0 0
## 553 O'Brien, Mr. Timothy male NA 0 0
## 558 Robbins, Mr. Victor male NA 0 0
## 561 Morrow, Mr. Thomas Rowan male NA 0 0
## 564 Simmons, Mr. John male NA 0 0
## 565 Meanwell, Miss. (Marion Ogden) female NA 0 0
## 569 Doharr, Mr. Tannous male NA 0 0
## 574 Kelly, Miss. Mary female NA 0 0
## 579 Caram, Mrs. Joseph (Maria Elias) female NA 1 0
## 585 Paulner, Mr. Uscher male NA 0 0
## 590 Murdlin, Mr. Joseph male NA 0 0
## 594 Bourke, Miss. Mary female NA 0 2
## 597 Leitch, Miss. Jessie Wills female NA 0 0
## 599 Boulos, Mr. Hanna male NA 0 0
## 602 Slabenoff, Mr. Petco male NA 0 0
## 603 Harrington, Mr. Charles H male NA 0 0
## 612 Jardin, Mr. Jose Neto male NA 0 0
## 613 Murphy, Miss. Margaret Jane female NA 1 0
## 614 Horgan, Mr. John male NA 0 0
## 630 O'Connell, Mr. Patrick D male NA 0 0
## 634 Parr, Mr. William Henry Marsh male NA 0 0
## 640 Thorneycroft, Mr. Percival male NA 1 0
## 644 Foo, Mr. Choong male NA 0 0
## 649 Willey, Mr. Edward male NA 0 0
## 651 Mitkoff, Mr. Mito male NA 0 0
## 654 O'Leary, Miss. Hanora "Norah" female NA 0 0
## 657 Radeff, Mr. Alexander male NA 0 0
## 668 Rommetvedt, Mr. Knud Paust male NA 0 0
## 670 Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright) female NA 1 0
## 675 Watson, Mr. Ennis Hastings male NA 0 0
## 681 Peters, Miss. Katie female NA 0 0
## 693 Lam, Mr. Ali male NA 0 0
## 698 Mullens, Miss. Katherine "Katie" female NA 0 0
## 710 Moubarek, Master. Halim Gonios ("William George") male NA 1 1
## 712 Klaber, Mr. Herman male NA 0 0
## 719 McEvoy, Mr. Michael male NA 0 0
## 728 Mannion, Miss. Margareth female NA 0 0
## 733 Knight, Mr. Robert J male NA 0 0
## 739 Ivanoff, Mr. Kanio male NA 0 0
## 740 Nankoff, Mr. Minko male NA 0 0
## 741 Hawksford, Mr. Walter James male NA 0 0
## 761 Garfirth, Mr. John male NA 0 0
## 767 Brewe, Dr. Arthur Jackson male NA 0 0
## 769 Moran, Mr. Daniel J male NA 1 0
## 774 Elias, Mr. Dibo male NA 0 0
## 777 Tobin, Mr. Roger male NA 0 0
## 779 Kilgannon, Mr. Thomas J male NA 0 0
## 784 Johnston, Mr. Andrew G male NA 1 2
## 791 Keane, Mr. Andrew "Andy" male NA 0 0
## 793 Sage, Miss. Stella Anna female NA 8 2
## 794 Hoyt, Mr. William Fisher male NA 0 0
## 816 Fry, Mr. Richard male NA 0 0
## 826 Flynn, Mr. John male NA 0 0
## 827 Lam, Mr. Len male NA 0 0
## 829 McCormack, Mr. Thomas Joseph male NA 0 0
## 833 Saad, Mr. Amin male NA 0 0
## 838 Sirota, Mr. Maurice male NA 0 0
## 840 Marechal, Mr. Pierre male NA 0 0
## 847 Sage, Mr. Douglas Bullen male NA 8 2
## 850 Goldenberg, Mrs. Samuel L (Edwiga Grabowska) female NA 1 0
## 860 Razi, Mr. Raihed male NA 0 0
## 864 Sage, Miss. Dorothy Edith "Dolly" female NA 8 2
## 869 van Melkebeke, Mr. Philemon male NA 0 0
## 879 Laleff, Mr. Kristo male NA 0 0
## 889 Johnston, Miss. Catherine Helen "Carrie" female NA 1 2
## Ticket Fare Cabin Embarked
## 6 330877 8.4583 Q
## 18 244373 13.0000 S
## 20 2649 7.2250 C
## 27 2631 7.2250 C
## 29 330959 7.8792 Q
## 30 349216 7.8958 S
## 32 PC 17569 146.5208 B78 C
## 33 335677 7.7500 Q
## 37 2677 7.2292 C
## 43 349253 7.8958 C
## 46 S.C./A.4. 23567 8.0500 S
## 47 370371 15.5000 Q
## 48 14311 7.7500 Q
## 49 2662 21.6792 C
## 56 19947 35.5000 C52 S
## 65 PC 17605 27.7208 C
## 66 2661 15.2458 C
## 77 349208 7.8958 S
## 78 374746 8.0500 S
## 83 330932 7.7875 Q
## 88 SOTON/OQ 392086 8.0500 S
## 96 374910 8.0500 S
## 102 349215 7.8958 S
## 108 312991 7.7750 S
## 110 371110 24.1500 Q
## 122 A4. 54510 8.0500 S
## 127 370372 7.7500 Q
## 129 2668 22.3583 F E69 C
## 141 2678 15.2458 C
## 155 Fa 265302 7.3125 S
## 159 315037 8.6625 S
## 160 CA. 2343 69.5500 S
## 167 113505 55.0000 E33 S
## 169 PC 17318 25.9250 S
## 177 4133 25.4667 S
## 181 CA. 2343 69.5500 S
## 182 SC/PARIS 2131 15.0500 C
## 186 113767 50.0000 A32 S
## 187 370365 15.5000 Q
## 197 368703 7.7500 Q
## 199 370370 7.7500 Q
## 202 CA. 2343 69.5500 S
## 215 367229 7.7500 Q
## 224 349234 7.8958 S
## 230 4133 25.4667 S
## 236 W./C. 6609 7.5500 S
## 241 2665 14.4542 C
## 242 367230 15.5000 Q
## 251 362316 7.2500 S
## 257 PC 17585 79.2000 C
## 261 384461 7.7500 Q
## 265 382649 7.7500 Q
## 271 113798 31.0000 S
## 275 370375 7.7500 Q
## 278 239853 0.0000 S
## 285 113056 26.0000 A19 S
## 296 PC 17612 27.7208 C
## 299 19988 30.5000 C106 S
## 301 9234 7.7500 Q
## 302 367226 23.2500 Q
## 304 226593 12.3500 E101 Q
## 305 A/5 2466 8.0500 S
## 307 17421 110.8833 C
## 325 CA. 2343 69.5500 S
## 331 367226 23.2500 Q
## 335 PC 17611 133.6500 S
## 336 349225 7.8958 S
## 348 386525 16.1000 S
## 352 113510 35.0000 C128 S
## 355 2647 7.2250 C
## 359 330931 7.8792 Q
## 360 330980 7.8792 Q
## 365 370365 15.5000 Q
## 368 2626 7.2292 C
## 369 14313 7.7500 Q
## 376 PC 17604 82.1708 C
## 385 349227 7.8958 S
## 389 367655 7.7292 Q
## 410 4133 25.4667 S
## 411 349222 7.8958 S
## 412 394140 6.8583 Q
## 414 239853 0.0000 S
## 416 343095 8.0500 S
## 421 349254 7.8958 C
## 426 A/4. 34244 7.2500 S
## 429 364851 7.7500 Q
## 432 376564 16.1000 S
## 445 65306 8.1125 S
## 452 65303 19.9667 S
## 455 A/5 2817 8.0500 S
## 458 17464 51.8625 D21 S
## 460 371060 7.7500 Q
## 465 A/S 2816 8.0500 S
## 467 239853 0.0000 S
## 469 36209 7.7250 Q
## 471 323592 7.2500 S
## 476 110465 52.0000 A14 S
## 482 239854 0.0000 S
## 486 4133 25.4667 S
## 491 65304 19.9667 S
## 496 2627 14.4583 C
## 498 C.A. 6212 15.1000 S
## 503 330909 7.6292 Q
## 508 111427 26.5500 S
## 512 SOTON/OQ 3101316 8.0500 S
## 518 371110 24.1500 Q
## 523 2624 7.2250 C
## 525 2700 7.2292 C
## 528 PC 17483 221.7792 C95 S
## 532 2641 7.2292 C
## 534 2668 22.3583 C
## 539 364498 14.5000 S
## 548 SC/PARIS 2146 13.8625 C
## 553 330979 7.8292 Q
## 558 PC 17757 227.5250 C
## 561 372622 7.7500 Q
## 564 SOTON/OQ 392082 8.0500 S
## 565 SOTON/O.Q. 392087 8.0500 S
## 569 2686 7.2292 C
## 574 14312 7.7500 Q
## 579 2689 14.4583 C
## 585 3411 8.7125 C
## 590 A./5. 3235 8.0500 S
## 594 364848 7.7500 Q
## 597 248727 33.0000 S
## 599 2664 7.2250 C
## 602 349214 7.8958 S
## 603 113796 42.4000 S
## 612 SOTON/O.Q. 3101305 7.0500 S
## 613 367230 15.5000 Q
## 614 370377 7.7500 Q
## 630 334912 7.7333 Q
## 634 112052 0.0000 S
## 640 376564 16.1000 S
## 644 1601 56.4958 S
## 649 S.O./P.P. 751 7.5500 S
## 651 349221 7.8958 S
## 654 330919 7.8292 Q
## 657 349223 7.8958 S
## 668 312993 7.7750 S
## 670 19996 52.0000 C126 S
## 675 239856 0.0000 S
## 681 330935 8.1375 Q
## 693 1601 56.4958 S
## 698 35852 7.7333 Q
## 710 2661 15.2458 C
## 712 113028 26.5500 C124 S
## 719 36568 15.5000 Q
## 728 36866 7.7375 Q
## 733 239855 0.0000 S
## 739 349201 7.8958 S
## 740 349218 7.8958 S
## 741 16988 30.0000 D45 S
## 761 358585 14.5000 S
## 767 112379 39.6000 C
## 769 371110 24.1500 Q
## 774 2674 7.2250 C
## 777 383121 7.7500 F38 Q
## 779 36865 7.7375 Q
## 784 W./C. 6607 23.4500 S
## 791 12460 7.7500 Q
## 793 CA. 2343 69.5500 S
## 794 PC 17600 30.6958 C
## 816 112058 0.0000 B102 S
## 826 368323 6.9500 Q
## 827 1601 56.4958 S
## 829 367228 7.7500 Q
## 833 2671 7.2292 C
## 838 392092 8.0500 S
## 840 11774 29.7000 C47 C
## 847 CA. 2343 69.5500 S
## 850 17453 89.1042 C92 C
## 860 2629 7.2292 C
## 864 CA. 2343 69.5500 S
## 869 345777 9.5000 S
## 879 349217 7.8958 S
## 889 W./C. 6607 23.4500 S
data_selected <- df[, c("Age", "SibSp", "Parch", "Fare")]
data_clean <- na.omit(data_selected)
str(data_clean)
## 'data.frame': 714 obs. of 4 variables:
## $ Age : num 22 38 26 35 35 54 2 27 14 4 ...
## $ SibSp: int 1 1 0 1 0 0 3 0 1 1 ...
## $ Parch: int 0 0 0 0 0 0 1 2 0 1 ...
## $ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
## - attr(*, "na.action")= 'omit' Named int [1:177] 6 18 20 27 29 30 32 33 37 43 ...
## ..- attr(*, "names")= chr [1:177] "6" "18" "20" "27" ...
dim(data_clean)
## [1] 714 4
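The effect of listwise deletion can be summarised with a short check. The sketch below is illustrative only: missing_report() and compare_kept_dropped() are hypothetical helpers that are not part of the original analysis, and they are demonstrated on R's built-in airquality data so that the block runs without the Titanic file. Applied to this analysis, the same calls would take the selected columns of df and a fully observed variable such as Fare.

```r
# Hypothetical helper: summarise how many rows listwise deletion removes.
missing_report <- function(d) {
  keep <- complete.cases(d)
  list(
    n_total       = nrow(d),
    n_complete    = sum(keep),
    n_dropped     = sum(!keep),
    pct_dropped   = 100 * mean(!keep),
    na_per_column = colSums(is.na(d))
  )
}

# Hypothetical helper: compare the mean of one variable between the rows
# that are kept and the rows that are dropped. A large gap hints that the
# missingness is related to that variable.
compare_kept_dropped <- function(d, var) {
  keep <- complete.cases(d)
  c(kept    = mean(d[[var]][keep], na.rm = TRUE),
    dropped = mean(d[[var]][!keep], na.rm = TRUE))
}

# Demonstration on a built-in dataset that contains missing values.
report     <- missing_report(airquality)
temp_means <- compare_kept_dropped(airquality, "Temp")

print(report$n_complete)
print(temp_means)
```

If the kept and dropped groups differ clearly on an observed variable, complete-case results should be interpreted as describing the retained rows rather than the full sample.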
The histograms give a visual overview of the distributions of Age, SibSp, Parch, and Fare. Age shows a unimodal, roughly symmetric distribution with a mild right skew (mean 29.7, median 28.0), with most passengers concentrated between approximately 20 and 40 years old. This indicates that the majority of Titanic passengers with a recorded age were young to middle-aged adults.
In contrast, SibSp and Parch both exhibit strong right-skewed distributions, where most passengers have values of 0 or 1, meaning they traveled alone or with very few family members. As the number of siblings, spouses, parents, or children increases, the frequency declines sharply, suggesting that large family groups were uncommon.
The Fare variable shows the most extreme right skew, with most passengers paying low ticket prices and a small number paying very high fares, reaching values above 500. This reflects significant economic inequality among passengers and differences in ticket class. Overall, these histograms indicate that Age has a more balanced distribution, while SibSp, Parch, and especially Fare are heavily right-skewed.
par(mfrow = c(2, 2))
hist(data_clean$Age,
     main = "Distribution of Age",
     xlab = "Age",
     col = "lightblue")
hist(data_clean$SibSp,
     main = "Distribution of SibSp",
     xlab = "SibSp",
     col = "lightgreen")
hist(data_clean$Parch,
     main = "Distribution of Parch",
     xlab = "Parch",
     col = "lightpink")
hist(data_clean$Fare,
     main = "Distribution of Fare",
     xlab = "Fare",
     col = "khaki")
par(mfrow = c(1, 1))
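The skew described for these histograms can also be expressed as a number. The following is a minimal sketch, assuming the moment-based definition of sample skewness (third central moment divided by the second central moment raised to the power 3/2). The function skewness_moment() is a hypothetical helper and the two short vectors are illustrative values, not passenger data.

```r
# Hypothetical helper: moment-based sample skewness.
# Positive values indicate a longer right tail, negative a longer left tail.
skewness_moment <- function(x) {
  x  <- x[!is.na(x)]
  m  <- mean(x)
  m2 <- mean((x - m)^2)   # second central moment
  m3 <- mean((x - m)^3)   # third central moment
  m3 / m2^(3 / 2)
}

# Illustrative vectors only (not taken from the Titanic data).
symmetric_example <- c(1, 2, 3)
right_tail_example <- c(1, 1, 1, 2, 10)

skew_symmetric  <- skewness_moment(symmetric_example)
skew_right_tail <- skewness_moment(right_tail_example)

print(skew_symmetric)
print(skew_right_tail)

# With the cleaned data available, one call covers all four variables:
# sapply(data_clean, skewness_moment)
```

Values near zero would support the description of Age as roughly symmetric, while clearly positive values would match the right skew seen for SibSp, Parch, and Fare.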
The correlation matrix was computed to measure the linear (Pearson) relationships among Age, SibSp, Parch, and Fare. The results show a moderate negative correlation between Age and SibSp (r ≈ -0.31), suggesting that younger passengers were more likely to travel with siblings or spouses.
Age also has a weak negative correlation with Parch (r ≈ -0.19), indicating that younger passengers more often traveled with parents or children. The strongest positive correlation appears between SibSp and Parch (r ≈ 0.38), which suggests that passengers traveling with siblings or spouses also tended to travel with other family members.
Fare shows weak positive correlations with SibSp (r ≈ 0.14) and Parch (r ≈ 0.21), while its correlation with Age is very weak (r ≈ 0.10). Overall, the relationships among the variables are weak to moderate, with no correlation exceeding 0.4 in absolute value, so no strong linear dependency is present.
cor_matrix <- cor(data_clean)
cor_matrix
## Age SibSp Parch Fare
## Age 1.00000000 -0.3082468 -0.1891193 0.09606669
## SibSp -0.30824676 1.0000000 0.3838199 0.13832879
## Parch -0.18911926 0.3838199 1.0000000 0.20511888
## Fare 0.09606669 0.1383288 0.2051189 1.00000000
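The ranking of the pairwise correlations can be read off programmatically. The sketch below re-enters the correlation matrix printed above by hand (the object name cor_printed and the table pair_table are introduced here for illustration) and sorts the six distinct pairs by absolute value.

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Correlation matrix copied from the printed output above.
cor_printed <- matrix(
  c( 1.00000000, -0.30824676, -0.18911926, 0.09606669,
    -0.30824676,  1.00000000,  0.38381990, 0.13832879,
    -0.18911926,  0.38381990,  1.00000000, 0.20511888,
     0.09606669,  0.13832879,  0.20511888, 1.00000000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Positions of the upper triangle, i.e. each pair of variables once.
idx <- which(upper.tri(cor_printed), arr.ind = TRUE)

pair_table <- data.frame(
  var1 = vars[idx[, "row"]],
  var2 = vars[idx[, "col"]],
  r    = cor_printed[idx]
)

# Strongest relationships first, regardless of sign.
pair_table <- pair_table[order(-abs(pair_table$r)), ]
rownames(pair_table) <- NULL
print(pair_table)
```

When the cleaned data are in memory, replacing cor_printed with cor_matrix gives the same table without retyping the values.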
The correlation heatmap provides a visual representation of the correlation matrix, with a palette that runs from red for the lowest values through white to blue for the highest values. Read together with the matrix above, the heatmap shows that no off-diagonal correlation is large in magnitude. This suggests that multicollinearity is not a major issue in the dataset.
# zlim = c(-1, 1) is passed on to image() so that white marks zero correlation
heatmap(
  cor_matrix,
  symm = TRUE,
  col = colorRampPalette(c("red", "white", "blue"))(100),
  zlim = c(-1, 1),
  main = "Heatmap of Correlation Matrix"
)
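The variance-covariance matrix was calculated to examine the variability of each variable and how the variables vary together in their original measurement units. The variance of Fare is by far the largest (about 2800, a standard deviation of roughly 52.9), indicating a very high level of dispersion in ticket prices. The variance of Age is also relatively large (about 211, a standard deviation of roughly 14.5), reflecting the wide range of passenger ages.
In contrast, SibSp and Parch have much smaller variances (about 0.86 and 0.73) because they are small counts with limited ranges. The positive covariance between SibSp and Parch (about 0.30) reinforces the earlier finding that family-related variables tend to increase together. Because covariances depend on the units of measurement, their magnitudes cannot be compared across pairs in the way correlations can.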
cov_matrix <- cov(data_clean)
cov_matrix
## Age SibSp Parch Fare
## Age 211.019125 -4.1633339 -2.3441911 73.849030
## SibSp -4.163334 0.8644973 0.3045128 6.806212
## Parch -2.344191 0.3045128 0.7281027 9.262176
## Fare 73.849030 6.8062117 9.2621760 2800.413100
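The link between the covariance and correlation matrices can be verified directly: dividing each covariance by the product of the two standard deviations yields the correlation. The sketch below re-enters the printed covariance matrix by hand (cov_printed is a name introduced here for illustration) and uses the base R function cov2cor().

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Covariance matrix copied from the printed output above.
cov_printed <- matrix(
  c(211.019125, -4.1633339, -2.3441911,   73.849030,
     -4.1633339,  0.8644973,  0.3045128,    6.8062117,
     -2.3441911,  0.3045128,  0.7281027,    9.2621760,
     73.849030,   6.8062117,  9.2621760, 2800.413100),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Standard deviations are the square roots of the diagonal variances.
sds <- sqrt(diag(cov_printed))

# cov2cor() rescales each entry by the two corresponding standard deviations.
cor_from_cov <- cov2cor(cov_printed)

# The same result written out explicitly for one pair.
r_age_sibsp <- cov_printed["Age", "SibSp"] / (sds["Age"] * sds["SibSp"])

print(round(sds, 2))
print(round(cor_from_cov, 4))
```

The rescaled matrix agrees with the correlation matrix reported earlier, which confirms that the two outputs describe the same relationships on different scales.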
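Eigenvalues were obtained from the eigen decomposition of the covariance matrix to identify how much variance is explained by each principal component. The first eigenvalue (about 2802.6) accounts for roughly 93% of the total variance (about 3013.0), and the second (about 209.0) for roughly 7%. Because the decomposition uses the covariance matrix rather than the correlation matrix, these shares largely reflect the much larger measurement scales of Fare and Age.
The scree plot shows a sharp drop after the first component and again after the second, with the last two components together contributing less than 0.1% of the variance. Most of the variance is therefore concentrated in the first one or two components.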
eigen_result <- eigen(cov_matrix)
eigen_values <- eigen_result$values
eigen_vectors <- eigen_result$vectors
eigen_values
## [1] 2802.5636587 209.0385659 0.9438783 0.4787214
eigen_vectors
## [,1] [,2] [,3] [,4]
## [1,] 0.028477552 0.99929943 -0.024018111 0.0035788596
## [2,] 0.002386349 -0.02093144 -0.773693322 0.6332099362
## [3,] 0.003280818 -0.01253786 -0.633088089 -0.7739712590
## [4,] 0.999586200 -0.02837826 0.004609234 0.0009266652
plot(
  eigen_values,
  type = "b",
  pch = 19,
  col = "blue",
  xlab = "Principal Component",
  ylab = "Eigenvalue",
  main = "Scree Plot of Eigenvalues"
)
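The information in the scree plot can be expressed as proportions of variance. The sketch below re-enters the printed eigenvalues by hand (eigen_printed, prop_var, and cum_var are names introduced here for illustration) and divides each by their sum.

```r
# Eigenvalues copied from the printed output above.
eigen_printed <- c(2802.5636587, 209.0385659, 0.9438783, 0.4787214)

# Share of total variance carried by each component.
prop_var <- eigen_printed / sum(eigen_printed)

# Running total across components.
cum_var <- cumsum(prop_var)

variance_table <- data.frame(
  component      = paste0("PC", seq_along(eigen_printed)),
  eigenvalue     = eigen_printed,
  percent        = round(100 * prop_var, 2),
  cumulative_pct = round(100 * cum_var, 2)
)
print(variance_table)
```

The sum of the eigenvalues equals the sum of the variances on the diagonal of the covariance matrix, so these percentages describe how the total variance in original units is divided among the components.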
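Eigenvectors describe how each original variable contributes to the principal components; the rows of the printed matrix follow the variable order Age, SibSp, Parch, Fare. The first principal component is almost entirely dominated by Fare (loading of about 0.9996), indicating that differences in ticket prices are the main source of variability in the dataset.
The second principal component is primarily associated with Age (loading of about 0.9993). The third component weights SibSp and Parch with the same sign and so reflects overall family size, while the fourth contrasts the two; both contribute very little variance. The overall sign of each eigenvector is arbitrary, so only the relative signs within a column are meaningful.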
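The reading of the loadings can be checked with a few lines of code. The sketch below re-enters the printed eigenvector matrix by hand, attaches variable and component labels (the names vec_printed, dominant, and orth_error are introduced here for illustration), identifies the variable with the largest absolute loading on each component, and confirms that the columns are orthonormal up to printing precision.

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Eigenvectors copied from the printed output above (one column per component).
vec_printed <- matrix(
  c(0.028477552,  0.99929943, -0.024018111,  0.0035788596,
    0.002386349, -0.02093144, -0.773693322,  0.6332099362,
    0.003280818, -0.01253786, -0.633088089, -0.7739712590,
    0.999586200, -0.02837826,  0.004609234,  0.0009266652),
  nrow = 4, byrow = TRUE,
  dimnames = list(vars, paste0("PC", 1:4))
)

# Variable with the largest absolute loading on each component.
dominant <- vars[apply(abs(vec_printed), 2, which.max)]
names(dominant) <- colnames(vec_printed)
print(dominant)

# Eigenvectors of a symmetric matrix are orthonormal, so t(V) %*% V
# should equal the identity matrix apart from rounding in the printout.
orth_error <- max(abs(crossprod(vec_printed) - diag(4)))
print(orth_error)
```

Labelling the rows in this way avoids having to map the positional indices [1,] to [4,] back to variable names by hand.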
Based on the entire analysis, Fare and Age are the dominant sources of variation in the dataset when the variables are measured in their original units, while SibSp and Parch play a secondary role related to family structure. The correlations among variables are weak to moderate (no absolute value above 0.38), indicating low multicollinearity.
Eigen analysis of the covariance matrix shows that about 93% of the total variance lies along the first principal component and more than 99.9% along the first two. Because these two components essentially reproduce Fare and Age, the result mainly reflects measurement scale, and a Principal Component Analysis intended for dimensionality reduction would normally standardize the variables first. With that caveat, these results provide a statistical foundation for more advanced modeling or exploratory techniques using the Titanic dataset.
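One way to judge how much of this concentration is due to measurement scale is to repeat the eigen decomposition on the correlation matrix, which is equivalent to standardizing each variable first. The sketch below is an illustrative check rather than part of the original analysis; it re-enters the printed correlation matrix by hand, and the names cor_printed, eig_cor, and share_cor are introduced here.

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Correlation matrix copied from the output printed earlier in the report.
cor_printed <- matrix(
  c( 1.00000000, -0.30824676, -0.18911926, 0.09606669,
    -0.30824676,  1.00000000,  0.38381990, 0.13832879,
    -0.18911926,  0.38381990,  1.00000000, 0.20511888,
     0.09606669,  0.13832879,  0.20511888, 1.00000000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Eigen decomposition of the standardized (correlation) structure.
eig_cor <- eigen(cor_printed, symmetric = TRUE)

# Each standardized variable has variance 1, so the eigenvalues sum to 4.
share_cor <- eig_cor$values / sum(eig_cor$values)

print(round(eig_cor$values, 3))
print(round(100 * share_cor, 1))
```

On the standardized scale the leading share is necessarily below 50%, since no eigenvalue can exceed the largest absolute row sum of the matrix (about 1.83 out of a total of 4). Comparing these shares with the covariance-based ones shows how much of the apparent dominance of the first component comes from the scale of Fare.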