HW1 BC

Download / Load Data

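library(Amelia); library(psych); library(kableExtra)  # missmap(); describe(); kbl(), kable_classic(), %>%
mydata=read.csv('D:/Titanic/train.csv', stringsAsFactors = TRUE)
mydata[mydata==""]<-NA  # replace "" factor levels (e.g., blank Cabin, Embarked) with NA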
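A minimal sketch of what the blank-to-NA step does, on a made-up three-row data frame (the name `toy` and its values are illustrative, not taken from train.csv). Only the empty string in the factor column becomes NA; the numeric column keeps its existing NA and is otherwise untouched.

```r
# Made-up data: an empty Cabin string and a missing Age, mimicking train.csv
toy <- data.frame(Cabin = c("C85", "", "E46"),
                  Age   = c(22, NA, 35),
                  stringsAsFactors = TRUE)

toy[toy == ""] <- NA        # same idiom as above: blank cells become NA

colSums(is.na(toy))         # Cabin 1, Age 1
```

The unused "" level still appears in `levels(toy$Cabin)`; `droplevels()` removes it if that matters for later tables.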

Q1a.

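Types and levels of measurement for PassengerId and Age. PassengerId is qualitative, measured at the NOMINAL level (it names a passenger and nothing more), even though R stores it as an integer. Age is quantitative, measured at the RATIO level (it has a true zero, so 40 is twice 20).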

V1=noquote(c('Passenger ID:', typeof(mydata$PassengerId)))  # storage type of PassengerId
V2=noquote(c('Age: ', typeof(mydata$Age)))                   # storage type of Age
rbind(V1,V2)%>%kbl(caption='Types from Data Frame')%>%kable_classic(html_font='Cambria')

Types from Data Frame
V1	Passenger ID:	integer
V2	Age:	double
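The storage type reported above (integer) and the level of measurement (nominal) are different things. A small sketch with made-up values shows one way to make R respect the nominal level: store the ID as a factor, so numeric summaries no longer apply, while Age keeps meaningful ratios. The names `ids`, `ids_f`, and `ages` are illustrative only.

```r
ids <- c(1L, 2L, 3L)          # made-up PassengerId values
typeof(ids)                   # "integer": how R stores them
ids_f <- factor(ids)          # nominal: treat each ID as a label
is.numeric(ids_f)             # FALSE, so mean(ids_f) returns NA with a warning

ages <- c(20, 40)             # made-up ages; ratio level has a true zero
ages[2] / ages[1]             # 2: a 40-year-old is twice as old as a 20-year-old
```

On the real data the equivalent would be `mydata$PassengerId <- factor(mydata$PassengerId)`; whether to convert is a modeling choice the assignment does not require.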

Q1b.

Which variable has the most missing values?

mymax=which.max(apply(mydata, 2, function(x) sum(is.na(x))))    # column with the most NAs
print(noquote(c(names(mymax),": ", sum(is.na(mydata[,as.numeric(mymax)])))))  # its name and NA count

## [1] Cabin :     687

missmap(mydata)  # Amelia: map of missing (NA) cells by variable
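The same count can be written more directly with `colSums(is.na(...))`, which also ranks every column instead of reporting only the worst one. A sketch on made-up data (`toy` and `nmiss` are illustrative names):

```r
toy <- data.frame(Age   = c(22, NA, 26, NA),
                  Cabin = c(NA, "C85", NA, NA),
                  Fare  = c(7.25, 71.28, 7.92, 8.05))   # made-up rows

nmiss <- colSums(is.na(toy))          # NA count per column
sort(nmiss, decreasing = TRUE)        # Cabin 3, Age 2, Fare 0
names(which.max(nmiss))               # "Cabin"
```

Applied to `mydata`, the top entry is the same Cabin count (687) reported above, since both approaches count `is.na()` per column.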

Q2.

Impute missing observations for Age, SibSp, and Parch with the median.

mydata$Age[is.na(mydata$Age)]=median(mydata$Age, na.rm=TRUE)        # each line fills one column's NAs with that column's median
mydata$SibSp[is.na(mydata$SibSp)]=median(mydata$SibSp, na.rm=TRUE)
mydata$Parch[is.na(mydata$Parch)]=median(mydata$Parch, na.rm=TRUE)  # index Parch's own NAs, not Age's
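Writing the imputation as a loop over column names avoids the copy-paste slip where one column's NAs are indexed by another column. A minimal sketch; `impute_median` is a hypothetical helper, not part of the original assignment or any package, and the data are made up.

```r
# Hypothetical helper: replace NAs in each listed column with that column's median
impute_median <- function(df, cols) {
  for (col in cols) {
    miss <- is.na(df[[col]])                              # NAs in *this* column only
    df[[col]][miss] <- median(df[[col]], na.rm = TRUE)
  }
  df
}

toy <- data.frame(Age = c(22, NA, 30, 40), Parch = c(0, 1, NA, 2))  # made-up rows
out <- impute_median(toy, c("Age", "Parch"))
out$Age     # 22 30 30 40
out$Parch   # 0 1 1 2
```

Median imputation places every filled-in value at the same point, which shrinks the spread; that is worth keeping in mind when reading the Age standard deviation and MAD in Q3.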

Q3.

Descriptive statistics for Age, SibSp, and Parch (after imputation).

round(describe(mydata[c("Age", "SibSp", "Parch")]),3)%>%kbl(caption='Descriptives')%>%kable_classic(html_font='Cambria')  # psych::describe

Descriptives
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
Age	1	891	29.362	13.020	28	28.825	8.896	0.42	80	79.58	0.509	0.973	0.436
SibSp	2	891	0.523	1.103	0	0.272	0.000	0.00	8	8.00	3.683	17.727	0.037
Parch	3	891	0.382	0.806	0	0.182	0.000	0.00	6	6.00	2.740	9.688	0.027
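Several `describe()` columns can be reproduced with base R, which helps when reading the table: `se` is `sd / sqrt(n)`, `trimmed` is the mean after dropping 10% from each tail (the psych default, `trim = 0.1`), and `mad` is the median absolute deviation scaled by 1.4826. A sketch on made-up count data resembling SibSp (`x` and `stats` are illustrative names):

```r
x <- c(0, 0, 1, 0, 2, 0, 1, 8, 0, 3)   # made-up SibSp-like counts
n <- length(x)

stats <- c(mean    = mean(x),
           sd      = sd(x),
           se      = sd(x) / sqrt(n),        # standard error of the mean
           trimmed = mean(x, trim = 0.1),    # drop 10% from each end, then average
           mad     = mad(x))                 # 1.4826 * median(|x - median(x)|)
round(stats, 3)
```

The long right tail shows up as a mean (1.5) well above the trimmed mean (0.875), the same pattern as SibSp and Parch in the table, where the median is 0 and the skew is large.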

Q4.

Provide a cross-tabulation of Survived and Sex (Survived: 0 = did not survive, 1 = survived).

addmargins(table(mydata$Survived, mydata$Sex))%>%kbl(caption="Survival by Sex: Females Survived More Often than Males")%>%kable_classic(html_font='Cambria')

Survival by Sex: Females Survived More Often than Males
	female	male	Sum
0	81	468	549
1	233	109	342
Sum	314	577	891
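Raw counts make the comparison hard to read because there are almost twice as many males as females. Column proportions give the survival rate within each sex. A sketch that rebuilds the 2×2 table from the counts above (`tab` and `rates` are illustrative names; on the real data, `prop.table(table(mydata$Survived, mydata$Sex), margin = 2)` does the same):

```r
# Counts copied from the cross-tabulation above (Survived: 0 = died, 1 = survived)
tab <- matrix(c(81, 233, 468, 109), nrow = 2,
              dimnames = list(Survived = c("0", "1"),
                              Sex      = c("female", "male")))

rates <- prop.table(tab, margin = 2)   # each column (sex) sums to 1
round(rates, 3)                        # female: 0.258 / 0.742, male: 0.811 / 0.189
```

About 74% of females survived versus about 19% of males.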

Q5.

Provide notched boxplots of Age by Survived. What do you notice?

boxplot(Age ~ Survived, data=mydata, notch=TRUE, col=c('red','blue'), main="Age Shows Little Discernible Effect on Survival", xlab="Survived (0 = no, 1 = yes)", ylab="Age")
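R draws each notch at the median ± 1.58 × IQR / √n, using the hinges from `fivenum()`; per `?boxplot`, non-overlapping notches are "strong evidence" that two medians differ. A sketch that reproduces `boxplot.stats()$conf` by hand and checks overlap, on made-up ages (`notch_ci` and `notches_overlap` are hypothetical helper names, and the ages are not Titanic data):

```r
# Made-up ages for two groups; not Titanic data
died     <- c(22, 35, 54, 2, 20, 39, 14, 28, 40, 66, 42, 21, 18, 28, 30)
survived <- c(38, 26, 35, 27, 14, 4, 58, 55, 34, 15, 28, 8, 19, 49, 29)

notch_ci <- function(x) {
  h <- fivenum(x)                                   # min, lower hinge, median, upper hinge, max
  h[3] + c(-1.58, 1.58) * (h[4] - h[2]) / sqrt(length(x))
}
notches_overlap <- function(a, b) {
  ca <- notch_ci(a); cb <- notch_ci(b)
  ca[1] <= cb[2] && cb[1] <= ca[2]                  # TRUE if the two intervals intersect
}

all.equal(notch_ci(died), boxplot.stats(died)$conf) # TRUE: matches R's own notch
notches_overlap(died, survived)
```

On the real data, `boxplot(Age ~ Survived, data = mydata, plot = FALSE)$conf` returns both notches (one column per group) without drawing the plot.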


Sith Fulton

Today

Titanic
