Question 1 (20)

Answer at least 4 questions in the following:

Question 2 (20)

Answer the following questions using the two vectors given in the R code where y is the true response, and yhat is the predicted response.

set.seed(1000)
y <- c(1,0,1,1,0,1,1,0,1,1,0,1,1,1,1,0,0,1,1,0,1,1,0,0,1,1,1,1,0,0,0,1,1,0,1,0)
yhat <- c(1,0,0,1,0,1,1,1,1,0,1,1,0,0,1,0,1,1,1,1,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,0)

Create a confusion matrix without using the ConfusionMatrix function or similar predetermined functions.

table_cf <- table(y,yhat)
table_cf
##    yhat
## y    0  1
##   0  9  5
##   1  8 14

Calculate the accuracy, sensitivity, and specificity without using the ConfusionMatrix function or similar predetermined functions.

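# Read the counts from table_cf (rows = y, columns = yhat; 1 is the positive class)
TP <- table_cf["1", "1"]
TN <- table_cf["0", "0"]
FP <- table_cf["0", "1"]
FN <- table_cf["1", "0"]
tot <- sum(table_cf)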
acc <- (TP + TN)/tot
sens_1 <- TP/(TP + FN)
specif_1 <- TN/(TN + FP)
cat(acc,sens_1,specif_1)
## 0.6388889 0.6363636 0.6428571
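As a cross-check, the four cells can also be counted directly from the vectors with logical comparisons rather than read off the printed table. This is a minimal sketch that reuses y and yhat from the question and treats 1 as the positive class (an assumption consistent with the table above); the name metrics is introduced here for illustration.

```r
# y is the true response, yhat the predicted response (copied from the question)
y    <- c(1,0,1,1,0,1,1,0,1,1,0,1,1,1,1,0,0,1,1,0,1,1,0,0,1,1,1,1,0,0,0,1,1,0,1,0)
yhat <- c(1,0,0,1,0,1,1,1,1,0,1,1,0,0,1,0,1,1,1,1,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,0)

# Count each cell of the confusion matrix, taking 1 as the positive class
TP <- sum(y == 1 & yhat == 1)
TN <- sum(y == 0 & yhat == 0)
FP <- sum(y == 0 & yhat == 1)
FN <- sum(y == 1 & yhat == 0)

metrics <- c(
  accuracy    = (TP + TN) / length(y),   # share of correct predictions
  sensitivity = TP / (TP + FN),          # true positive rate
  specificity = TN / (TN + FP)           # true negative rate
)
round(metrics, 4)
```

Counting with logical vectors avoids retyping numbers from the table, so the metrics stay correct if y or yhat change.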

Question 3 (20)

Suppose that we obtained the following result using a linear regression for the median housing price in Greensboro:

Answer (i) - (iv) where the variable Highway is a binary variable (Yes for a house near a highway and No otherwise), and for (iv) use n=210.

i = -0.0007/0.04
ii = 8.31 * 0.1675
iii = -0.4782/0.2205
iv = 2*pt(iii, 209)
cat(i,ii,iii,iv)
## -0.0175 1.391925 -2.168707 0.03123393
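The blanks follow from the identity t = estimate / standard error, rearranged as needed. The sketch below spells that out. Since the results table itself is not reproduced here, which number is an estimate, a standard error, or a t value is inferred from the calculations above and should be treated as an assumption. The names t_age, se_room, est_room, and t_highway are illustrative.

```r
# Roles of the numbers are inferred from the answer above (assumption)
t_age     <- -0.0007 / 0.04     # t value  = estimate / standard error
se_room   <- 0.1675
est_room  <- 8.31 * se_room     # estimate = t value * standard error
t_highway <- -0.4782 / 0.2205   # t value  = estimate / standard error

# Sanity check: dividing the recovered estimate by its SE returns the t value
est_room / se_room

# Two-sided p-value for Highway(Yes) with the 209 degrees of freedom used above
p_209 <- 2 * pt(-abs(t_highway), df = 209)

# If the model holds only Age, Room, and Highway plus an intercept (assumption),
# the residual degrees of freedom would be n - 4 = 206 instead
p_206 <- 2 * pt(-abs(t_highway), df = 210 - 4)
c(df209 = p_209, df206 = p_206)
```

With this many observations the two choices of degrees of freedom give nearly the same p-value, so the conclusion at the 5% level does not depend on it.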

Interpret the coefficient of the variable Room.

Answer For each additional room, the median housing price increases by about 1.39 on average, holding Age and Highway constant.

Interpret the coefficient of the variable Highway(Yes).

Answer Houses near a highway (Highway = Yes) have a median price about 0.48 lower on average than houses not near a highway (Highway = No), holding Age and Room constant.

When Age = 50, Room = 3, Highway = Yes, and the intercept estimate is 5, estimate the predicted value of the median housing price in Greensboro.

# Highway = Yes is coded as 1, so its coefficient is added once
est <- 5 - 0.0007*50 + ii*3 - 0.4782
est
## [1] 8.662575
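The same prediction can be written as a sum of coefficient times predictor value, which also makes the Highway effect easy to see by switching the dummy between 1 and 0. This is a minimal sketch using the values above; coefs and predict_price are names introduced here for illustration, and Highway(Yes) is coded as 1 (an assumption that matches the hand calculation).

```r
coefs <- c(Intercept = 5, Age = -0.0007, Room = 8.31 * 0.1675, HighwayYes = -0.4782)

# Linear prediction: sum of coefficient * predictor value (1 for the intercept)
predict_price <- function(age, room, highway_yes) {
  sum(coefs * c(1, age, room, highway_yes))
}

near_highway <- predict_price(age = 50, room = 3, highway_yes = 1)
not_near     <- predict_price(age = 50, room = 3, highway_yes = 0)

near_highway               # same value as the hand calculation above
near_highway - not_near    # equals the Highway(Yes) coefficient
```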

Question 4 (20)

Suppose that we obtained the following result using a logistic regression for the admission status (Yes or No) of a graduate school:

Answer (i) - (iv) where the variable SOP denotes Statement of Purpose and it has two categories, Fair and Good.

i = 0.0023/2.25
ii = 2.33 * 0.3318
iii = 0.2312/0.1023
iv = 2*pt(-abs(iii), 22)  # two-sided p-value; iii is positive, so use the tail beyond |iii|
cat(i,ii,iii,round(iv, 5))
## 0.001022222 0.773094 2.26002 0.03406
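A two-sided p-value has to fall between 0 and 1. pt() returns the lower-tail probability, so doubling it is only valid for a negative statistic; taking the tail beyond the absolute value works for either sign. A small sketch follows, where two_sided_p is a helper name introduced here and the 22 degrees of freedom are taken from the answer above.

```r
# Two-sided p-value that is safe for positive and negative test statistics
two_sided_p <- function(stat, df) {
  2 * pt(-abs(stat), df)
}

stat <- 0.2312 / 0.1023          # SOP(Good): estimate / standard error
p_t  <- two_sided_p(stat, 22)

# summary() of a glm with family = binomial reports Wald z statistics, which are
# compared against the standard normal; shown here only as an alternative
p_z  <- 2 * pnorm(-abs(stat))

round(c(t_based = p_t, z_based = p_z), 4)
```

Either reference distribution gives a p-value below 0.05 for this statistic.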

Convert the coefficient of the variable College GPA to the odds ratio and interpret it.

ORgpa = exp(0.773094)

Answer If College GPA increases by one unit, the odds of admission are multiplied by about 2.17, holding GRE and SOP constant.

Convert the coefficient of the variable SOP(Good) to the odds ratio and interpret it.

ORsop = exp(0.2312)

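Answer The odds of admission for an applicant with a Good SOP are about 1.26 times the odds for an applicant with a Fair SOP, holding GRE and College GPA constant.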
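Both odds ratios come from exponentiating the log-odds coefficients. A brief sketch that computes them together; log_odds and odds_ratio are illustrative names, and the GPA coefficient is the value recovered as ii above.

```r
log_odds <- c(GPA = 2.33 * 0.3318, SOPGood = 0.2312)

# exp() turns a log-odds coefficient into a multiplicative change in the odds
odds_ratio <- exp(log_odds)
round(odds_ratio, 2)

# Percent change in the odds for a one-unit increase (or for Good versus Fair)
round(100 * (odds_ratio - 1), 1)
```

An odds ratio describes a change in the odds, not in the probability, so "times more likely" overstates what the coefficient says.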

When GRE = 2000, College GPA = 3.2, SOP = Good, and the intercept estimate is -2, estimate the probability of admission status = Yes, and predict the admission status using the estimated probability.

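# Linear predictor (log-odds); SOP = Good is coded as 1, so its coefficient 0.2312 is added
logit <- -2 + 0.0023*2000 + ii*3.2 + 0.2312
# The inverse logit converts log-odds to a probability
pAdmis <- plogis(logit)
round(pAdmis, 3)
## [1] 0.995

Answer The estimated probability of admission is about 0.995. Because this is greater than 0.5, the predicted admission status is Yes.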
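A probability must lie between 0 and 1, so the linear predictor (the log-odds) has to be passed through the inverse logit, 1 / (1 + exp(-eta)), which R provides as plogis(). The sketch below wraps this in a hypothetical helper, admit_prob, using the coefficients from this question; coding SOP(Good) as 1 and classifying at a 0.5 cutoff are both assumptions.

```r
# Hypothetical helper: estimated probability of admission = Yes
admit_prob <- function(gre, gpa, sop_good) {
  eta <- -2 + 0.0023 * gre + (2.33 * 0.3318) * gpa + 0.2312 * sop_good
  plogis(eta)                      # same as 1 / (1 + exp(-eta))
}

p_good <- admit_prob(gre = 2000, gpa = 3.2, sop_good = 1)
p_fair <- admit_prob(gre = 2000, gpa = 3.2, sop_good = 0)

# Predict "Yes" when the estimated probability exceeds the assumed 0.5 cutoff
ifelse(c(Good = p_good, Fair = p_fair) > 0.5, "Yes", "No")

# The ratio of the two odds recovers the SOP(Good) odds ratio, exp(0.2312)
(p_good / (1 - p_good)) / (p_fair / (1 - p_fair))
```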

Question 5 (20)

Suppose that you want to predict whether or not a customer purchases an item at your online store.

Provide your approach step-by-step to build a predictive model. If necessary, provide a list of predictors and algorithms.

Answer

I would import a data set with a binary response variable (1 = purchase, 0 = no purchase) and predictors such as gender, number of previous purchases, days since last purchase, purchase price, and number of items purchased, and then split the data into a 70% training set and a 30% test set.
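The split described above, followed by one possible modeling step, might look like the sketch below. Everything here is illustrative: the data are simulated placeholders standing in for a real customer data set, the column names are hypothetical, and logistic regression with a 0.5 cutoff is an assumed choice of algorithm rather than the only option.

```r
set.seed(1)
n <- 500

# Simulated placeholder data; a real analysis would import actual store records
shop <- data.frame(
  gender          = factor(sample(c("F", "M"), n, replace = TRUE)),
  n_purchases     = rpois(n, lambda = 3),
  days_since_last = round(rexp(n, rate = 1 / 30))
)
shop$purchase <- rbinom(n, size = 1,
                        prob = plogis(-0.5 + 0.3 * shop$n_purchases -
                                        0.02 * shop$days_since_last))

# 70% training set, 30% test set
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- shop[train_idx, ]
test  <- shop[-train_idx, ]

# Fit on the training set only (logistic regression is an assumed choice)
fit <- glm(purchase ~ gender + n_purchases + days_since_last,
           data = train, family = binomial)

# Evaluate on the held-out test set
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)
conf <- table(actual = test$purchase, predicted = pred)
conf
mean(pred == test$purchase)   # test-set accuracy
```

Keeping the test set out of the fitting step means the confusion matrix and accuracy reflect performance on customers the model has not seen.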