THIS FUNCTION HELPS US TO VIEW THE IMPORTED DATA.
RENAMING THE DATASET.
str(Titanic)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 891 obs. of 12 variables:
$ PassengerId: num 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : num 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : num 3 1 3 1 3 3 1 3 3 2 ...
$ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
$ Sex : chr "male" "female" "female" "female" ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : num 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : num 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : chr NA "C85" NA "C123" ...
$ Embarked : chr "S" "C" "S" "S" ...
THE str() FUNCTION SHOWS THE STRUCTURE OF THE TABLE.
summary(Titanic)
PassengerId    Survived        Pclass         Name              Sex
Min.   :  1.0  Min.   :0.0000  Min.   :1.000  Length:891        Length:891
1st Qu.:223.5  1st Qu.:0.0000  1st Qu.:2.000  Class :character  Class :character
Median :446.0  Median :0.0000  Median :3.000  Mode  :character  Mode  :character
Mean   :446.0  Mean   :0.3838  Mean   :2.309
3rd Qu.:668.5  3rd Qu.:1.0000  3rd Qu.:3.000
Max.   :891.0  Max.   :1.0000  Max.   :3.000
Age            SibSp          Parch           Ticket            Fare
Min.   : 0.42  Min.   :0.000  Min.   :0.0000  Length:891        Min.   :  0.00
1st Qu.:20.12  1st Qu.:0.000  1st Qu.:0.0000  Class :character  1st Qu.:  7.91
Median :28.00  Median :0.000  Median :0.0000  Mode  :character  Median : 14.45
Mean   :29.70  Mean   :0.523  Mean   :0.3816                    Mean   : 32.20
3rd Qu.:38.00  3rd Qu.:1.000  3rd Qu.:0.0000                    3rd Qu.: 31.00
Max.   :80.00  Max.   :8.000  Max.   :6.0000                    Max.   :512.33
NA's   :177
Cabin             Embarked
Length:891        Length:891
Class :character  Class :character
Mode  :character  Mode  :character
THE summary() FUNCTION SHOWS THE OVERALL SUMMARY OF THE TABLE.
THIS FUNCTION HELPS US TO VIEW THE RENAMED DATA.
OMITTING ROWS WITH NA VALUES FROM THE DATASET.
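The drop from 891 to 183 rows in the output that follows is consistent with na.omit() having run while a mostly empty column such as Cabin was still in the table, because na.omit() removes every row that has an NA in any column. A minimal sketch on a toy data frame (the values and the names `toy`, `all_cols` and `model_cols` are made up for illustration) compares that with selecting the modelling columns first:

```r
# Toy data standing in for the Titanic table (values are illustrative only)
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 1),
  Age      = c(22, NA, 26, 35, 54),
  Cabin    = c(NA, "C85", NA, NA, "E46"),
  stringsAsFactors = FALSE
)

# na.omit() on the full table drops rows with an NA in ANY column
all_cols <- na.omit(toy)

# Selecting the columns needed for modelling first keeps more rows
model_cols <- na.omit(toy[, c("Survived", "Age")])

nrow(all_cols)    # rows complete in every column
nrow(model_cols)  # rows complete in Survived and Age only
```

If the sparse columns are not needed for the model, dropping them before calling na.omit() usually retains far more observations.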
CONVERTING THE VARIABLE AGE TO NUMERIC.
SUBSTITUTING THE MEAN VALUE FOR THE NA VALUES IN THE COLUMN.
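A minimal sketch of these two steps, assuming Age arrives as character and that the missing entries are NA (the toy values and the names `toy` and `age_mean` are hypothetical). Mean imputation only has an effect while the NA ages are still in the table, so it would need to run before, or instead of, any step that omits incomplete rows:

```r
# Toy Age column stored as character, with two missing entries
toy <- data.frame(Age = c("22", "38", NA, "35", NA), stringsAsFactors = FALSE)

# Convert Age from character to numeric (NA stays NA)
toy$Age <- as.numeric(toy$Age)

# Replace the missing ages with the mean of the observed ages
age_mean <- mean(toy$Age, na.rm = TRUE)
toy$Age[is.na(toy$Age)] <- age_mean

toy$Age
```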
str(Titanic)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 183 obs. of 7 variables:
$ Survived: num 1 1 0 1 1 1 1 0 1 0 ...
$ Pclass : num 1 1 1 3 1 2 1 1 1 1 ...
$ Sex : chr "female" "female" "male" "female" ...
$ Age : num 38 35 54 4 58 34 28 19 49 65 ...
$ SibSp : num 1 1 0 1 0 0 0 3 1 0 ...
$ Parch : num 0 0 0 1 0 0 0 2 0 1 ...
$ Fare : num 71.3 53.1 51.9 16.7 26.6 ...
- attr(*, "na.action")= 'omit' Named int 1 3 5 6 8 9 10 13 14 15 ...
..- attr(*, "names")= chr "1" "3" "5" "6" ...
CHANGING CATEGORICAL VALUES.
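One way this step might look, sketched on toy data: factor() turns a character or numeric code into a categorical variable. The labels "No" and "Yes" are an assumption for illustration. Pclass is left numeric here because the model output later in this analysis reports a single Pclass coefficient.

```r
# Toy rows standing in for the Titanic table (values are illustrative only)
toy <- data.frame(
  Survived = c(0, 1, 1, 0),
  Pclass   = c(3, 1, 3, 2),
  Sex      = c("male", "female", "female", "male"),
  stringsAsFactors = FALSE
)

# Convert the coded outcome and the character column to factors
toy$Survived <- factor(toy$Survived, levels = c(0, 1), labels = c("No", "Yes"))
toy$Sex      <- factor(toy$Sex)

str(toy)
```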
SURVIVAL ANALYSIS BETWEEN MALE AND FEMALE.
MALE AND FEMALE SURVIVED.
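The same comparison can be read off as numbers with a cross-tabulation. This is a sketch on made-up rows (the names `toy`, `counts` and `rates` are hypothetical), not the plot code used for the figures:

```r
# Toy rows: sex and survival outcome (illustrative values only)
toy <- data.frame(
  Sex      = c("male", "female", "female", "male", "female", "male"),
  Survived = c(0, 1, 1, 0, 0, 1)
)

# Counts of survivors and non-survivors by sex
counts <- table(Sex = toy$Sex, Survived = toy$Survived)

# Row-wise proportions: survival rate within each sex
rates <- prop.table(counts, margin = 1)

counts
rates
```

Using margin = 1 normalises within each row, so each sex sums to 1 and the "1" column is the survival rate for that sex.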
ANIMATION PLOT
SURVIVAL ACCORDING TO PCLASS.
# LT is 131 here while Titanic has 183 rows, so take the first LT rows before filtering on Age
ggplot(data = subset(Titanic[1:LT, ], !is.na(Age)), aes(x = Age, fill = factor(Survived))) + geom_histogram(binwidth = 3)
SURVIVAL GRAPH BY AGE.
USING CATOOLS TO SPLIT THE DATA INTO TRAIN AND TEST SETS.
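A sketch of such a split on toy data. The SplitRatio of 0.7 and the seed are assumptions (the 131 training and 52 test rows reported in this analysis correspond to roughly 72%). caTools::sample.split() returns a logical vector that is stratified on the outcome. The base-R branch is only a fallback for when caTools is not installed, and it is not stratified.

```r
set.seed(42)  # assumed seed, for reproducibility only

# Toy outcome with 40 non-survivors and 60 survivors (illustrative values)
toy <- data.frame(
  Survived = rep(c(0, 1), times = c(40, 60)),
  Age      = round(runif(100, 1, 70))
)

if (requireNamespace("caTools", quietly = TRUE)) {
  # TRUE marks rows assigned to the training set, stratified on Survived
  in_train <- caTools::sample.split(toy$Survived, SplitRatio = 0.7)
} else {
  # Base-R fallback: plain random split without stratification
  in_train <- seq_len(nrow(toy)) %in% sample(nrow(toy), size = round(0.7 * nrow(toy)))
}

Train <- toy[in_train, ]
Test  <- toy[!in_train, ]

c(train = nrow(Train), test = nrow(Test))
```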
summary(Titanic_eq)
Call:
glm(formula = Survived ~ ., family = "binomial", data = Train)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.7848 -0.8345 0.3057 0.7990 1.9862
Coefficients:
              Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)   5.120993    1.416525    3.615    0.0003 ***
Pclass       -0.943987    0.525901   -1.795    0.0727 .
Sexmale      -2.781995    0.601230   -4.627  3.71e-06 ***
Age          -0.041134    0.017174   -2.395    0.0166 *
SibSp         0.124778    0.399146    0.313    0.7546
Parch        -0.366017    0.379845   -0.964    0.3352
Fare          0.002436    0.003245    0.751    0.4528
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 167.22 on 130 degrees of freedom
Residual deviance: 123.17 on 124 degrees of freedom
AIC: 137.17
Number of Fisher Scoring iterations: 5
FITTING A LOGISTIC REGRESSION MODEL TO THE DATA USING GLM.
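To make the coefficient table easier to interpret, here is a self-contained sketch that fits the same kind of model on simulated data. The data, effect sizes and seed are made up, and only the glm() call mirrors the one shown above. Exponentiating a logistic coefficient gives an odds ratio. For the Sexmale estimate of -2.78 reported above, that is roughly 0.06, meaning much lower odds of survival for males with the other variables held fixed.

```r
set.seed(1)  # assumed seed
n <- 200

# Simulated passengers (illustrative only)
Train <- data.frame(
  Sex = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age = round(runif(n, 1, 70))
)

# Simulated outcome: lower survival odds for males and for older passengers
lin <- 2 - 2.5 * (Train$Sex == "male") - 0.03 * Train$Age
Train$Survived <- rbinom(n, size = 1, prob = plogis(lin))

# Logistic regression of Survived on all other columns
Titanic_eq <- glm(Survived ~ ., family = "binomial", data = Train)

coef(summary(Titanic_eq))  # estimates, standard errors, z values, p-values
exp(coef(Titanic_eq))      # odds ratios
```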
Test_pred
1 2 3 4 5 6 7
0.33183508 0.28579938 0.18336024 0.18202548 0.55903832 0.65583572 0.53175246
8 9 10 11 12 13 14
0.97232287 0.59590801 0.89549779 0.95463425 0.51347563 0.95905464 0.85803528
15 16 17 18 19 20 21
0.97684511 0.91351204 0.96746538 0.93201531 0.39959060 0.94364989 0.90699345
22 23 24 25 26 27 28
0.96638574 0.97421048 0.14311553 0.12607187 0.55689872 0.39066676 0.95611654
29 30 31 32 33 34 35
0.94836817 0.49455410 0.89494963 0.89445792 0.92591525 0.94447147 0.24576261
36 37 38 39 40 41 42
0.94811779 0.97077612 0.96638574 0.38320457 0.61569467 0.09957376 0.43325304
43 44 45 46 47 48 49
0.95962013 0.97065871 0.17470205 0.95392464 0.92804777 0.42452603 0.95579534
50 51 52
0.92649941 0.56041957 0.84684367
PREDICTED SURVIVAL PROBABILITIES FOR THE TEST DATA.
sum(diag(T))/sum(T)
[1] 0.8076923
AS PER MY ANALYSIS, THE MODEL IS ABOUT 81% ACCURATE ON THE TEST DATA.
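The accuracy figure is the share of test rows on the diagonal of the confusion matrix. A value of 0.8077 is consistent with 42 of the 52 test rows being classified correctly. Below is a sketch of the whole prediction-to-accuracy step on simulated data. The 0.5 cut-off, the data and the names `fit`, `pred_class` and `conf_mat` are assumptions. The matrix is deliberately not called `T`, since `T` is also shorthand for TRUE in R.

```r
set.seed(1)  # assumed seed
n <- 200

# Simulated passengers and outcome (illustrative only)
toy <- data.frame(
  Sex = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age = round(runif(n, 1, 70))
)
toy$Survived <- rbinom(n, 1, plogis(2 - 2.5 * (toy$Sex == "male") - 0.03 * toy$Age))

Train <- toy[1:150, ]
Test  <- toy[151:200, ]

fit <- glm(Survived ~ ., family = "binomial", data = Train)

# Predicted survival probabilities for the held-out rows
Test_pred <- predict(fit, newdata = Test, type = "response")

# Classify at an assumed 0.5 threshold; fixed levels keep the table 2 x 2
pred_class <- factor(as.integer(Test_pred > 0.5), levels = c(0, 1))
actual     <- factor(Test$Survived, levels = c(0, 1))
conf_mat   <- table(Actual = actual, Predicted = pred_class)

# Accuracy: correctly classified rows divided by all rows
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

Fixing the factor levels on both sides matters: if the model predicted only one class, a plain table() would not be square and diag() would no longer pick out the correct cells.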
ROCR GRAPH TO CHECK THE THRESHOLD.
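An ROC curve plots the true positive rate against the false positive rate at each candidate threshold. The sketch below computes those rates in base R on simulated scores, as a stand-in for the ROCR plot rather than a reproduction of it. The data and the use of Youden's J (TPR minus FPR) to pick a threshold are assumptions.

```r
set.seed(1)  # assumed seed
n <- 200

# Simulated true labels and predicted probabilities (illustrative only)
labels <- rbinom(n, 1, 0.5)
scores <- plogis(rnorm(n, mean = ifelse(labels == 1, 1, -1)))

# True and false positive rates at each candidate threshold
thresholds <- seq(0, 1, by = 0.05)
roc <- data.frame(
  threshold = thresholds,
  tpr = sapply(thresholds, function(th) mean(scores[labels == 1] > th)),
  fpr = sapply(thresholds, function(th) mean(scores[labels == 0] > th))
)

# One common choice: the threshold that maximises TPR - FPR (Youden's J)
best <- roc[which.max(roc$tpr - roc$fpr), ]
best
```

With ROCR installed, performance(prediction(Test_pred, Test$Survived), "tpr", "fpr") yields the same kind of curve, and plot(..., colorize = TRUE) colours it by threshold.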