Task 2

getwd()
## [1] "/Users/mvignesh/Downloads/Prof Sameer Mathur"
list.files()
##  [1] "1- Read Data.R"                                                                  
##  [2] "1-example.Rmd"                                                                   
##  [3] "3- R Functions -- aggregate, by, apply.R"                                        
##  [4] "602096-PDF-ENG.PDF"                                                              
##  [5] "CRMData.csv"                                                                     
##  [6] "Chi-Square Test of Independence.R"                                               
##  [7] "Comparing Groups.R"                                                              
##  [8] "Correlations.R"                                                                  
##  [9] "Describing Data.R"                                                               
## [10] "Fisher's Exact Test, Measure of Association.R"                                   
## [11] "IMB483-PDF-ENG.PDF"                                                              
## [12] "Madan_Gopal_Jhanwar.pdf"                                                         
## [13] "One-Way and Two-Way Contingency Tables.R"                                        
## [14] "Order Confirmation for September 29, 2017 - mvignesh180892@gmail.com - Gmail.pdf"
## [15] "Order Detail - HBR.pdf"                                                          
## [16] "Relationships Between Continuous Variables.R"                                    
## [17] "Single Variable Visualization.R"                                                 
## [18] "StoreData.csv"                                                                   
## [19] "Three-Way Contingency Tables.R"                                                  
## [20] "Titanic Data.csv"                                                                
## [21] "Titanic.Rmd"                                                                     
## [22] "Untitled.Rmd"                                                                    
## [23] "Untitled.html"                                                                   
## [24] "W12307-PDF-ENG.PDF"                                                              
## [25] "rsconnect"
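# stringsAsFactors = TRUE keeps Sex and Embarked as factors, matching the str() output (the default changed to FALSE in R 4.0.0)
tit_df<-read.csv("Titanic Data.csv", stringsAsFactors = TRUE)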

Task 3

##3A : Number of people onboard:-

str(tit_df)
## 'data.frame':    889 obs. of  8 variables:
##  $ Survived: int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass  : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Sex     : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
##  $ Age     : num  22 38 26 35 35 29.7 54 2 27 14 ...
##  $ SibSp   : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch   : int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Fare    : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Embarked: Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
### The output shows that the data frame has 889 observations, corresponding to 889 passengers who purchased tickets. The 'Embarked' variable has only three levels, "C", "Q" and "S", with no blank or missing level, so every passenger has a recorded port of embarkation. This suggests that all the passengers who purchased tickets boarded the ship.
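The conclusion about boarding rests on `Embarked` having no blank or missing level. A minimal sketch of how that can be checked, using a small hypothetical factor `embarked` (not the real data) that deliberately contains one blank entry so the check has something to catch:

```r
# Hypothetical port-of-embarkation values; the last passenger has a blank port
embarked <- factor(c("S", "C", "S", "Q", ""))

# levels() lists every distinct value; a "" level signals a missing port
levels(embarked)

# Count passengers per port, showing NA as its own category if present
table(embarked, useNA = "ifany")

# TRUE if any value is NA or an empty string
has_missing_port <- any(is.na(embarked) | embarked == "")
has_missing_port
```

On the real data, `levels(tit_df$Embarked)` returning only "C", "Q" and "S" corresponds to this check coming out FALSE.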

##3B : Survivors after the sinking:-

survived<-sum(tit_df$Survived == 1)
survived
## [1] 340
### It shows that there were 340 survivors after the sinking of the ship.

##3C : Percentage of survivors:-
per_survived <- (survived/sum(tit_df$Survived==1 | tit_df$Survived == 0))*100
per_survived
## [1] 38.24522
### About 38% of the passengers (340 of 889) survived.
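Because `Survived` is coded 0/1, the count and the percentage can both come straight from a logical vector: `sum()` counts the TRUE values and `mean()` gives their share. A minimal sketch on a hypothetical toy vector (not the Titanic data); on the real data, `mean(tit_df$Survived == 1, na.rm = TRUE) * 100` would play the role of the expression above, with `na.rm = TRUE` excluding missing values just as the `== 1 | == 0` denominator does:

```r
# Hypothetical 0/1 survival indicator (toy values)
survived_flag <- c(0, 1, 1, 0, 0, 1, 0, 0)

# Number of survivors: TRUE counts as 1 when summed
n_survived <- sum(survived_flag == 1)

# Percentage of survivors: the mean of a logical vector is the share of TRUE values
pct_survived <- mean(survived_flag == 1, na.rm = TRUE) * 100

n_survived    # 3
pct_survived  # 37.5
```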

##3D : Number of first class passengers who survived:-
sur_tab<-xtabs(tit_df$Survived ~ tit_df$Pclass)
sur_tab
## tit_df$Pclass
##   1   2   3 
## 134  87 119
### First-class passengers account for the largest number of survivors (134), followed by third-class (119) and then second-class (87) passengers.
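With a variable on the left-hand side of the formula, `xtabs()` sums that variable within each group rather than counting rows. The survivor counts above are correct only because `Survived` is a 0/1 indicator, so its sum equals the number of survivors. A minimal sketch on a hypothetical data frame `toy` (not the real data) comparing the sum with an explicit count:

```r
# Hypothetical toy passengers
toy <- data.frame(
  Survived = c(1, 0, 1, 1, 0, 1),
  Pclass   = c(1, 1, 2, 3, 3, 3)
)

# Sums Survived within each class; equals a count only because Survived is 0/1
by_sum <- xtabs(Survived ~ Pclass, data = toy)

# Explicit count: tabulate the class of the survivors only
by_count <- table(toy$Pclass[toy$Survived == 1])

by_sum
by_count
```

Summing any other variable this way (a class code, for example) yields totals of that variable, not numbers of passengers.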

##3E : Percentage of first class passengers who survived
per_1stclass_sur<-(sum(tit_df$Survived == 1 & tit_df$Pclass == 1)/sum(tit_df$Pclass == 1))*100
per_1stclass_sur
## [1] 62.61682
### Of the passengers who purchased a first-class ticket, around 63% survived.
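The same survival rate can be computed for every class at once by taking the mean of the 0/1 `Survived` variable within each `Pclass` group. A minimal sketch with `tapply()` on a hypothetical toy data frame (not the real data):

```r
# Hypothetical toy passengers
toy <- data.frame(
  Survived = c(1, 0, 1, 1, 0, 1, 0, 0),
  Pclass   = c(1, 1, 1, 2, 2, 3, 3, 3)
)

# tapply() applies mean() to Survived within each Pclass group;
# the mean of a 0/1 variable is the proportion who survived
rate_by_class <- tapply(toy$Survived, toy$Pclass, mean) * 100
round(rate_by_class, 2)
```

On the real data, `tapply(tit_df$Survived, tit_df$Pclass, mean) * 100` should reproduce the first-class figure above in its "1" entry.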

### Prop table
sur_tab<-table(tit_df$Survived, tit_df$Pclass)
prop.table(sur_tab,1)*100
##    
##            1        2        3
##   0 14.57195 17.66849 67.75956
##   1 39.41176 25.58824 35.00000
### The row percentages show that, of the passengers who survived, around 39% held a first-class ticket.
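The `margin` argument of `prop.table()` decides which totals the counts are divided by. `margin = 1` divides by row totals (here, how survivors are spread across classes), while `margin = 2` divides by column totals (the survival rate within each class, which is the quantity computed in 3E). A minimal sketch on a hypothetical toy table (not the real data):

```r
# Hypothetical toy table: rows are Survived (0/1), columns are Pclass
tab <- table(
  Survived = c(0, 0, 1, 1, 1, 0),
  Pclass   = c(1, 3, 1, 2, 3, 3)
)

# margin = 1: each row sums to 100 (class mix among non-survivors / survivors)
row_pct <- prop.table(tab, 1) * 100

# margin = 2: each column sums to 100 (survival rate within each class)
col_pct <- prop.table(tab, 2) * 100

row_pct
col_pct
```

On the real data, the `"1"` row and `"1"` column of `prop.table(sur_tab, 2) * 100` should match the roughly 63% first-class survival rate from 3E.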

##3F : Number of first class female survivors:-
sur_df<-tit_df[tit_df$Survived==1,-1]
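# Count survivors by sex and class; a formula with no left-hand side counts rows
xtabs(~ Sex + Pclass, data = sur_df)
### The cell in the "female" row and the "1" column gives the number of female survivors who held a first-class ticket.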
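A direct way to count first-class female survivors is to combine the three conditions with `&` and sum the resulting logical vector. A minimal sketch on a small hypothetical data frame `toy` (not the real data), assuming the same column names as `tit_df`:

```r
# Hypothetical toy passengers
toy <- data.frame(
  Survived = c(1, 1, 0, 1, 1, 0),
  Pclass   = c(1, 1, 1, 2, 1, 3),
  Sex      = c("female", "female", "female", "male", "male", "female")
)

# TRUE only where all three conditions hold; sum() counts those rows
n_first_female_survivors <- sum(toy$Survived == 1 & toy$Pclass == 1 & toy$Sex == "female")
n_first_female_survivors  # 2
```

Applied to the real data, the same expression would be `sum(tit_df$Survived == 1 & tit_df$Pclass == 1 & tit_df$Sex == "female")`.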

##3G : Percentage of survivors who were female
sur_tab_gender<-table(tit_df$Sex, tit_df$Survived)
prop.table(sur_tab_gender,1)*100
##         
##                 0        1
##   female 25.96154 74.03846
##   male   81.10919 18.89081
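### The row percentages show that about 74% of females survived, compared with about 19% of males, irrespective of ticket class. These are survival rates within each sex; the percentage of survivors who were female (about 68%) is the figure computed in 3H.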

##3H : Percentage of females on the ship who survived
per_fem_sur<-(sum(sur_df$Sex == "female")/sum(sur_df$Sex =="male"|sur_df$Sex == "female"))*100
per_fem_sur
## [1] 67.94118
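### About 68% of the survivors (231 of 340) were female. This is the share of survivors who were female; the percentage of females on board who survived is the 74% survival rate from the row percentages in 3G.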
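Both percentages come from the same Sex by Survived table and differ only in the margin used: row percentages give the survival rate within each sex, column percentages give the make-up of the survivors. A minimal sketch on hypothetical toy data (not the real data), chosen so that the two figures differ:

```r
# Hypothetical toy passengers: 2 females (both survive), 6 males (2 survive)
tab <- table(
  Sex      = c(rep("female", 2), rep("male", 6)),
  Survived = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# margin = 1: percentage of each sex that survived
rate_within_sex <- prop.table(tab, 1) * 100

# margin = 2: within each outcome, percentage that is female or male
share_of_survivors <- prop.table(tab, 2) * 100

rate_within_sex["female", "1"]     # 100: every female survived
share_of_survivors["female", "1"]  # 50: half of the survivors were female
```

On the real data, `prop.table(sur_tab_gender, 2) * 100` should show the roughly 68% of 3H in its female row, "1" column.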

##3I : Pearson's Chisq test
chisq.test(tit_df$Survived, tit_df$Sex)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tit_df$Survived and tit_df$Sex
## X-squared = 258.43, df = 1, p-value < 2.2e-16
### The p-value (< 2.2e-16) is far below 0.05, so we reject the null hypothesis that survival is independent of sex. Sex and survival are strongly associated, which makes Sex an important predictor to carry into the predictive modelling stage.
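The statistic reported above compares observed counts with the counts expected under independence (row total times column total divided by the grand total), with Yates' continuity correction because the table is 2 x 2. A minimal sketch of that calculation on a hypothetical 2 x 2 table (not the real data), checked against `chisq.test()`; the correction term `min(0.5, |O - E|)` mirrors how base R's `chisq.test()` applies Yates' correction:

```r
# Hypothetical 2 x 2 counts: rows are sex, columns are survived (0/1)
obs <- matrix(c(20, 30, 40, 10), nrow = 2,
              dimnames = list(Sex = c("female", "male"), Survived = c("0", "1")))

n <- sum(obs)
# Expected counts under independence: row total x column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / n

# Yates' continuity correction: shrink every |O - E| by min(0.5, smallest |O - E|)
yates <- min(0.5, abs(obs - expected))
x2_manual <- sum((abs(obs - expected) - yates)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1), which is 1 for a 2 x 2 table
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_manual <- pchisq(x2_manual, df = dof, lower.tail = FALSE)

# Compare with the built-in test
builtin <- chisq.test(obs)
c(manual = x2_manual, builtin = unname(builtin$statistic))
```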