This R Markdown document analyses the data in the Titanic Data.csv file.

2b) Reading the titanic data into R and creating a data frame called “titanic”.

titanic <- read.csv("Titanic Data.csv") # read the CSV file into a data frame called "titanic"
View(titanic) # open the data frame in the interactive data viewer
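A minimal sketch of the same read step, which can be run without the original file. The rows in `toy`, the file name "Toy Data.csv" and the variable names are made up for illustration and are not taken from the Titanic data. It also shows `str()` and `head()`, which may be more convenient than `View()` when the document is knitted rather than run interactively.

```r
# Hypothetical toy rows standing in for the real file (NOT Titanic data)
toy <- data.frame(
  Survived = c(1, 0, 1),
  Pclass   = c(1, 3, 2),
  Sex      = c("female", "male", "female")
)

# Write a temporary CSV whose name contains a space, like "Titanic Data.csv"
csv_path <- file.path(tempdir(), "Toy Data.csv")
write.csv(toy, csv_path, row.names = FALSE)

# read.csv() accepts the path directly, so no paste() call is needed
toy_read <- read.csv(csv_path)

str(toy_read)   # column names and types
head(toy_read)  # first rows, printed to the console
```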

3a) To count the total number of passengers on board the Titanic. Each row of the data frame describes one passenger, so the number of rows gives the total number of people on board.

dimension <- dim(titanic) # dimensions of the data frame: rows, then columns
dimension[1] # the 1st element of the dimension vector is the number of rows
## [1] 889
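As a possible shortcut, `nrow()` returns the row count directly. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the Titanic data) to show that both approaches agree.

```r
# Hypothetical toy data (NOT the Titanic data)
toy <- data.frame(
  Survived = c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0),
  Pclass   = c(1, 1, 2, 3, 3, 2, 1, 3, 2, 1)
)

n_passengers <- nrow(toy)  # same value as dim(toy)[1]
n_passengers
```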

3b) To count the number of passengers who survived the sinking of the Titanic, which equals the number of times 1 occurs in the “Survived” column.

a <- table(titanic$Survived)  # frequency table of the Survived column
a[2]  # the 2nd element corresponds to Survived = 1
##   1 
## 340
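Positional indexing such as `a[2]` relies on the level 1 being the second entry of the table. A sketch of two alternatives that do not depend on position, using a made-up vector (`survived` is hypothetical, not the Titanic column):

```r
# Hypothetical 0/1 survival indicator (NOT the Titanic data)
survived <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0)

counts <- table(survived)

by_name <- counts[["1"]]        # look the count up by level name
direct  <- sum(survived == 1)   # count the matching values directly

by_name
direct
```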

3c) To measure the percentage of passengers who survived the sinking of the Titanic. The frequency counts are stored in the table b, and prop.table() converts them to percentages.

b <- table(titanic$Survived) # frequency table of the Survived column
c <- prop.table(b)*100       # prop.table() gives each cell's share of the total; *100 converts it to a percentage
c[2]       # the 2nd element is the percentage for Survived = 1
##        1 
## 38.24522
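Because `Survived` is coded 0/1, the same percentage can also be obtained as the mean of a logical comparison. A sketch on a made-up vector (`survived` is hypothetical, not the Titanic column), checked against the `prop.table()` route:

```r
# Hypothetical 0/1 survival indicator (NOT the Titanic data)
survived <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0)

# mean() of a logical vector is the proportion of TRUE values
pct_mean  <- mean(survived == 1) * 100

# Same quantity via a frequency table, looked up by level name
pct_table <- prop.table(table(survived))[["1"]] * 100

pct_mean
pct_table
```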

3d) To count the number of first-class passengers who survived the sinking of the Titanic, using xtabs() and then displaying the cell that contains the frequency of first-class passengers who survived.

d <- xtabs(~ Survived+Pclass, data=titanic) # 2 x 3 frequency table: Survived in rows, Pclass in columns
d[2,1]  # 2nd row, 1st column: Survived = 1 and Pclass = 1
## [1] 134

3e) To measure the percentage of first-class passengers who survived the sinking of the Titanic using prop.table(). (Finding the percentage of passengers who survived among the first-class passengers alone.)

e <- prop.table(d,2)*100 # column percentages (margin = 2), since the percentages are needed within each Pclass
e[2,1]     # 2nd row, 1st column: Survived = 1 and Pclass = 1
## [1] 62.61682
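The second argument of `prop.table()` selects the margin that is normalised. A sketch on made-up data (`toy` is hypothetical, not the Titanic data) that checks that each class column sums to 100 with `margin = 2`, and looks cells up by dimension name rather than by position:

```r
# Hypothetical toy data (NOT the Titanic data)
toy <- data.frame(
  Survived = c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0),
  Pclass   = c(1, 1, 2, 3, 3, 2, 1, 3, 2, 1)
)

d_toy   <- xtabs(~ Survived + Pclass, data = toy)
col_pct <- prop.table(d_toy, margin = 2) * 100  # percentages within each Pclass

colSums(col_pct)     # every Pclass column adds up to 100

d_toy["1", "1"]      # count of first-class survivors, by name
col_pct["1", "1"]    # percentage of first-class passengers who survived
```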

3f) To count the number of females from First-Class who survived the sinking of the Titanic using xtabs() and ftable().

f <- xtabs( ~Sex + Pclass + Survived , data = titanic) # three-way contingency table
g <- ftable(f)   # ftable() flattens the three-way table into a 6 x 2 table: Sex and Pclass in rows, Survived in columns
g[1,2]  # 1st row (female, Pclass = 1), 2nd column (Survived = 1)
## [1] 89
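The position `[1,2]` in the flat table depends on the ordering of the factor levels. A sketch of a lookup by level name on the three-way table itself, using made-up data (`toy` is hypothetical, not the Titanic data):

```r
# Hypothetical toy data (NOT the Titanic data)
toy <- data.frame(
  Survived = c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0),
  Pclass   = c(1, 1, 2, 3, 3, 2, 1, 3, 2, 1),
  Sex      = c("female", "male", "female", "female", "male",
               "male", "male", "female", "female", "male")
)

f_toy <- xtabs(~ Sex + Pclass + Survived, data = toy)

# Index by level names in the order of the formula: Sex, Pclass, Survived
female_first_survived <- f_toy["female", "1", "1"]
female_first_survived

# The flattened view is still handy for reading the whole table at once
g_toy <- ftable(f_toy)
g_toy
```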

3g) To measure the percentage of survivors who were female using xtabs() and prop.table(). This is the number of female survivors as a percentage of the total number of survivors.

h <- xtabs( ~ Survived + Sex, data = titanic) # 2 x 2 frequency table: Survived in rows, Sex in columns
i <- prop.table(h,1)*100    # row percentages: for each value of Survived, the split between females and males
i[2,1]  # 2nd row, 1st column: the percentage of survivors who were female
## [1] 67.94118

3h) To measure the percentage of females on board the Titanic who survived using the frequency table h which has already been created.

j <- prop.table(h,2)*100 # column percentages (margin = 2): for each Sex, the split between Survived = 0 and Survived = 1
j[2,1] # 2nd row, 1st column: the percentage of females who survived
## [1] 74.03846
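Questions 3g and 3h use the same table and differ only in the margin passed to `prop.table()`. A side-by-side sketch on made-up data (`toy` is hypothetical, not the Titanic data) shows that the two margins answer different questions:

```r
# Hypothetical toy data (NOT the Titanic data)
toy <- data.frame(
  Survived = c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0),
  Sex      = c("female", "male", "female", "female", "male",
               "male", "male", "female", "female", "male")
)

h_toy <- xtabs(~ Survived + Sex, data = toy)

row_pct <- prop.table(h_toy, margin = 1) * 100  # each Survived row sums to 100
col_pct <- prop.table(h_toy, margin = 2) * 100  # each Sex column sums to 100

survivors_female <- row_pct["1", "female"]  # of the survivors, share who are female
females_survived <- col_pct["1", "female"]  # of the females, share who survived

survivors_female
females_survived
```

In this toy table 3 of the 4 survivors are female, while 3 of the 5 females survived, so the two percentages differ.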

3i) To run a Pearson’s Chi-squared test to test the following hypothesis:

Hypothesis: The proportion of females onboard who survived the sinking of the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic.

k <- xtabs(~ Sex + Survived , data = titanic) # Creating the frequency table
chisq.test(k)  # Running the chi-squared test
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  k
## X-squared = 258.43, df = 1, p-value < 2.2e-16

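This output suggests a relationship between Sex and Survived. Since the p-value is small (p < 0.01), we reject the null hypothesis that Sex and Survived are independent. The chi-squared test only establishes an association; the direction stated in the hypothesis has to be read from the survival percentages, such as the 74.04% of females who survived (3h).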
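The hypothesis in 3i is directional, whereas Pearson's chi-squared test of independence is not. One possible way to test the direction explicitly is a one-sided two-sample test of proportions with `prop.test()`. The sketch below is an alternative to the analysis above, not part of it, and the counts are made up for illustration (they are not the Titanic figures).

```r
# Hypothetical counts (NOT the Titanic figures)
survivors <- c(female = 30, male = 10)  # number who survived in each group
totals    <- c(female = 40, male = 40)  # group sizes

# Two-sided test of equal proportions; for two groups this matches
# chisq.test() with Yates' correction on the corresponding 2 x 2 table
two_sided <- prop.test(survivors, totals)

# One-sided test, H1: proportion in the first group (female) is greater
one_sided <- prop.test(survivors, totals, alternative = "greater")

# The same 2 x 2 table passed to chisq.test() for comparison
tab       <- cbind(survived = survivors, died = totals - survivors)
chi_check <- chisq.test(tab)

two_sided$p.value
one_sided$p.value
chi_check$p.value
```

When the observed difference is in the hypothesised direction, the one-sided p-value is half the two-sided one.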