We will use logistic regression to analyze how a social network user's age and estimated salary relate to whether they buy a new luxury SUV offered at a ridiculously low price. We code the outcome as 0 for not buying the SUV and 1 for buying it.
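# Fit a logistic regression of Purchased on all remaining columns
classifier = glm(formula = Purchased ~ .,
                 family = binomial,
                 data = training_set)
# Predicted purchase probabilities for the test set (column 3 is Purchased)
prob_pred = predict(classifier, newdata = test_set[-3], type = "response")
# Classify as 1 (purchase) when the probability exceeds 0.5
y_pred = ifelse(prob_pred > 0.5, 1, 0)
# Confusion matrix: rows are actual classes, columns are predicted classes
cm = table(test_set[, 3], y_pred)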
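The confusion matrix can be reduced to a single accuracy figure, which gives a number to check any claim about prediction quality. The snippet below is a minimal sketch on simulated data, not the original dataset: the data frame `toy`, the coefficients used to simulate it, and the helper `accuracy_from_cm` are all hypothetical names and values introduced for illustration.

```r
# Hypothetical helper: share of correct predictions in a confusion matrix.
# Cells are matched by class label, so it still works if a class is never predicted.
accuracy_from_cm <- function(cm) {
  common <- intersect(rownames(cm), colnames(cm))
  correct <- sum(vapply(common, function(k) cm[k, k], numeric(1)))
  correct / sum(cm)
}

# Simulated stand-in for the training data (assumed, illustrative only)
set.seed(42)
n <- 200
toy <- data.frame(
  Age = runif(n, 18, 60),
  EstimatedSalary = runif(n, 15000, 150000)
)
# Assumed relationship: older, higher-paid users are more likely to buy
lin <- -12 + 0.2 * toy$Age + 0.00005 * toy$EstimatedSalary
toy$Purchased <- rbinom(n, 1, plogis(lin))

# Same modelling steps as above, applied to the simulated data
fit <- glm(Purchased ~ ., family = binomial, data = toy)
pred <- ifelse(predict(fit, type = "response") > 0.5, 1, 0)
cm_toy <- table(Actual = toy$Purchased, Predicted = pred)
acc <- accuracy_from_cm(cm_toy)
print(cm_toy)
print(acc)
```

With the real data, the same helper could be applied as `accuracy_from_cm(cm)`, assuming `cm` was built from the test set as shown earlier.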
library(ElemStatLearn)
## Warning: package 'ElemStatLearn' was built under R version 3.4.2
set = training_set
# Grid covering the range of both features, padded by 1 on each side
x1 = seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
x2 = seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set = expand.grid(x1, x2)
colnames(grid_set) = c("Age", "EstimatedSalary")
# Predicted class at every grid point
prob_set = predict(classifier, type = "response", newdata = grid_set)
y_grid = ifelse(prob_set > 0.5, 1, 0)
# Base graphics calls are issued one after another; they are not chained with `+`
plot(set[, -3], main = "Logistic Regression (Training Set)",
     xlab = "Age", ylab = "Estimated Salary", xlim = range(x1), ylim = range(x2))
contour(x1, x2, matrix(as.numeric(y_grid), length(x1), length(x2)), add = TRUE)
points(grid_set, pch = ".", col = ifelse(y_grid == 1, "springgreen3", "tomato"))
points(set, pch = 21, bg = ifelse(set[, 3] == 1, "green4", "red3"))
As we can see, our prediction is quite accurate. A few points are misclassified, perhaps because some older users with low incomes have enough savings to buy anyway. Overall, young, low-income users did not buy the SUV, while older, high-income users did, which makes sense because an SUV is more of a family car. The prediction regions are therefore reasonable, and the company can target the green region (older users with high salaries).
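The straight edge between the two regions follows from the 0.5 threshold: the predicted class flips where the linear predictor is zero, that is, where b0 + b1 * Age + b2 * EstimatedSalary = 0. The sketch below solves this for salary at a given age. It runs on simulated data rather than the original dataset, and `toy`, its simulation coefficients, and the helper `boundary_salary` are hypothetical names introduced for illustration.

```r
# Hypothetical helper: salary at which the fitted purchase probability is 0.5
# for a given age, obtained by setting the linear predictor to zero.
boundary_salary <- function(fit, age) {
  b <- coef(fit)
  -(b[["(Intercept)"]] + b[["Age"]] * age) / b[["EstimatedSalary"]]
}

# Simulated stand-in for the training data (assumed, illustrative only)
set.seed(1)
n <- 300
toy <- data.frame(
  Age = runif(n, 18, 60),
  EstimatedSalary = runif(n, 15000, 150000)
)
lin <- -12 + 0.2 * toy$Age + 0.00005 * toy$EstimatedSalary
toy$Purchased <- rbinom(n, 1, plogis(lin))

fit <- glm(Purchased ~ Age + EstimatedSalary, family = binomial, data = toy)

# Salary on the boundary for a 40-year-old, and the model's probability there
s40 <- boundary_salary(fit, 40)
p40 <- unname(predict(fit,
                      newdata = data.frame(Age = 40, EstimatedSalary = s40),
                      type = "response"))
print(s40)
print(p40)
```

Users above this salary at a given age fall in the predicted "buy" region, provided the fitted salary coefficient is positive, as it is expected to be for data like this.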