A magazine seller wants to send an email to customers about his new kids' magazine. Before starting his marketing campaign, he wants to know which customers are likely to buy it.
We have past records of about 700 customers (673 of them are used to fit the model), with their details and purchase history. Based on these records, I used logistic regression to predict the probability that a customer buys the kids' magazine.
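Logistic regression models the log-odds of buying as a linear combination of the customer attributes and turns it into a probability with the inverse logit, p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))). The sketch below illustrates this on simulated data, not on KidCreative.csv: the predictors `income` and `female` and their true coefficients are made up for illustration. It checks that `predict(..., type = "response")` is the inverse logit (`plogis`) of the linear predictor returned by `type = "link"`.

```r
set.seed(42)
n <- 500
# Hypothetical predictors, simulated for illustration only
income <- rnorm(n, mean = 50000, sd = 15000)
female <- rbinom(n, 1, 0.5)

# Made-up true model on the log-odds scale, then 0/1 outcomes drawn from it
eta_true <- -10 + 2e-4 * income + 1.5 * female
sim <- data.frame(Buy = rbinom(n, 1, plogis(eta_true)), Income = income, Is.Female = female)

fit <- glm(Buy ~ ., family = binomial, data = sim)

lp <- predict(fit, type = "link")     # b0 + b1*x1 + ... for each row (log-odds)
p <- predict(fit, type = "response")  # predicted probabilities

# The probability is the inverse logit of the linear predictor
all.equal(unname(p), unname(plogis(lp)))          # TRUE
all.equal(unname(p), unname(1 / (1 + exp(-lp))))  # TRUE
```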
# 1. Reading the file
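# Set this to the folder that contains KidCreative.csv
setwd("D:/Raviteja/Raviteja Professional/Data Science/EDA_Course_Materials")
data1 <- read.csv("KidCreative.csv")
View(data1)  # opens a spreadsheet-style viewer (interactive sessions only)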
# 2. Cleaning and preparing the data
# Delete the first column (the serial number), which carries no information useful for the model
data1 <- data1[-1]
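Dropping by position with `data1[-1]` assumes the serial-number column is always first. A slightly more defensive variant drops it by name. The toy data frame below stands in for the real file, and the column name `Obs.No.` is hypothetical; `names(data1)` shows the actual name.

```r
# Toy stand-in for KidCreative.csv; "Obs.No." is a hypothetical ID column name
toy <- data.frame(Obs.No. = 1:3, Buy = c(0, 1, 0), Income = c(42000, 61000, 38000))

# Drop by position: removes whatever column happens to be first
by_position <- toy[-1]

# Drop by name: removes only the ID column, wherever it sits
by_name <- toy[, setdiff(names(toy), "Obs.No."), drop = FALSE]

names(by_position)  # "Buy" "Income"
names(by_name)      # "Buy" "Income"
```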
# 3. Building the model
# Logistic regression of Buy on all remaining columns
mylogit <- glm(formula = Buy ~ ., family = binomial, data = data1)
summary(mylogit)
##
## Call:
## glm(formula = Buy ~ ., family = binomial, data = data1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.36655 -0.08416 -0.00955 -0.00149 2.49038
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.791e+01 2.223e+00 -8.058 7.74e-16 ***
## Income 2.016e-04 2.359e-05 8.545 < 2e-16 ***
## Is.Female 1.646e+00 4.651e-01 3.539 0.000401 ***
## Is.Married 5.662e-01 5.864e-01 0.966 0.334272
## Has.College -2.794e-01 4.437e-01 -0.630 0.528962
## Is.Professional 2.253e-01 4.650e-01 0.485 0.627981
## Is.Retired -1.159e+00 9.323e-01 -1.243 0.214015
## Unemployed 9.886e-01 4.690e+00 0.211 0.833030
## Residence.Length 2.468e-02 1.380e-02 1.788 0.073798 .
## Dual.Income 4.518e-01 5.215e-01 0.866 0.386279
## Minors 1.133e+00 4.635e-01 2.444 0.014521 *
## Own 1.056e+00 5.594e-01 1.888 0.058976 .
## House -9.265e-01 6.218e-01 -1.490 0.136238
## White 1.864e+00 5.454e-01 3.417 0.000632 ***
## English 1.530e+00 8.407e-01 1.821 0.068678 .
## Prev.Child.Mag 1.557e+00 7.119e-01 2.188 0.028704 *
## Prev.Parent.Mag 4.777e-01 6.240e-01 0.766 0.443900
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 646.05 on 672 degrees of freedom
## Residual deviance: 182.33 on 656 degrees of freedom
## AIC: 216.33
##
## Number of Fisher Scoring iterations: 9
# Many variables have p-values (the Pr(>|z|) column above) larger than the usual significance level alpha = 0.05, so their coefficients are not statistically distinguishable from zero: once the other variables are accounted for, the data give no clear evidence that they affect the probability of buying.
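The terms that pass a given threshold can also be picked out programmatically. With the fitted model the p-values are available as `coef(summary(mylogit))[, "Pr(>|z|)"]`; the self-contained sketch below instead copies them from the printed summary (using 2e-16 as a stand-in for Income's `< 2e-16`) and compares the cut-offs 0.05 and 0.005.

```r
# p-values copied from the summary(mylogit) output above;
# Income is printed as "< 2e-16", so 2e-16 is used as an upper bound
p_values <- c(
  "(Intercept)" = 7.74e-16, Income = 2e-16, Is.Female = 0.000401,
  Is.Married = 0.334272, Has.College = 0.528962, Is.Professional = 0.627981,
  Is.Retired = 0.214015, Unemployed = 0.833030, Residence.Length = 0.073798,
  Dual.Income = 0.386279, Minors = 0.014521, Own = 0.058976, House = 0.136238,
  White = 0.000632, English = 0.068678, Prev.Child.Mag = 0.028704,
  Prev.Parent.Mag = 0.443900
)
predictors <- p_values[names(p_values) != "(Intercept)"]

names(predictors)[predictors < 0.05]
# "Income" "Is.Female" "Minors" "White" "Prev.Child.Mag"
names(predictors)[predictors < 0.005]
# "Income" "Is.Female" "White"
```

The stricter 0.005 cut-off keeps only three predictors, dropping Minors and Prev.Child.Mag.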
# 4. Checking the accuracy of the model before predicting
# Predicted probabilities for every customer, using all columns except the response Buy
fitted.results <- predict(mylogit, newdata = subset(data1, select = -Buy), type = "response")
# Classify a customer as a buyer (1) when the predicted probability exceeds 0.5
fitted.results <- ifelse(fitted.results > 0.5, 1, 0)
misClassificationError <- mean(fitted.results != data1$Buy)
print(paste("Accuracy", 1 - misClassificationError))
## [1] "Accuracy 0.936106983655275"
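# The model classifies about 93.6% of the customers correctly with a 0.5 cut-off. Because this is measured on the same data the model was fitted to, it is an optimistic estimate of the accuracy on new customers.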
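A held-out evaluation gives a fairer estimate of accuracy on customers the model has not seen. The sketch below shows a 70/30 train/test split, a confusion matrix and a majority-class baseline. It runs on simulated data with two made-up predictors (`x1`, `x2`) rather than KidCreative.csv, so its numbers say nothing about the magazine model; the same steps apply to `data1` by swapping in that data frame.

```r
set.seed(1)
n <- 600
# Simulated stand-in data; x1 and x2 are hypothetical predictors
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
sim <- data.frame(Buy = rbinom(n, 1, plogis(-0.5 + 1.5 * x1 + x2)), x1 = x1, x2 = x2)

# 70/30 split into training rows and held-out test rows
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

# Fit on the training rows only, then score the unseen test rows
fit <- glm(Buy ~ ., family = binomial, data = train)
prob <- predict(fit, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix: rows are predictions, columns are true outcomes
conf_mat <- table(Predicted = pred, Actual = test$Buy)
conf_mat

test_accuracy <- mean(pred == test$Buy)
# Accuracy of always predicting the more common class, for comparison
baseline_accuracy <- max(mean(test$Buy), 1 - mean(test$Buy))
c(test = test_accuracy, baseline = baseline_accuracy)
```

Comparing against the baseline matters: if most customers in a data set do not buy, a model can reach high accuracy simply by predicting "no" for everyone.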
# 5. Predicting the probability that a new customer buys the magazine
newdata <- data.frame(Income = 50000, Is.Female = 1, Is.Married = 1, Has.College = 0, Is.Professional = 1, Is.Retired = 0,
                      Unemployed = 0, Residence.Length = 10, Dual.Income = 1, Minors = 1, Own = 1, House = 1,
                      White = 1, English = 1, Prev.Child.Mag = 1, Prev.Parent.Mag = 1)
predict(mylogit,newdata,type="response")
## 1
## 0.8803425
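This prediction can be reproduced by hand: it is the inverse logit of the intercept plus each coefficient times the customer's value. The sketch below uses the rounded estimates printed in the summary, so it matches `predict()` only to about three decimal places; with the fitted model, `coef(mylogit)` gives the unrounded values. The label `Intercept` is just a name chosen here.

```r
# Rounded estimates copied from the summary(mylogit) output above
b <- c(Intercept = -17.91, Income = 2.016e-04, Is.Female = 1.646, Is.Married = 0.5662,
       Has.College = -0.2794, Is.Professional = 0.2253, Is.Retired = -1.159,
       Unemployed = 0.9886, Residence.Length = 0.02468, Dual.Income = 0.4518,
       Minors = 1.133, Own = 1.056, House = -0.9265, White = 1.864,
       English = 1.530, Prev.Child.Mag = 1.557, Prev.Parent.Mag = 0.4777)

# The example customer from newdata; the 1 multiplies the intercept
x <- c(Intercept = 1, Income = 50000, Is.Female = 1, Is.Married = 1, Has.College = 0,
       Is.Professional = 1, Is.Retired = 0, Unemployed = 0, Residence.Length = 10,
       Dual.Income = 1, Minors = 1, Own = 1, House = 1, White = 1, English = 1,
       Prev.Child.Mag = 1, Prev.Parent.Mag = 1)

eta <- sum(b * x[names(b)])  # linear predictor on the log-odds scale, about 2.0
plogis(eta)                  # about 0.88, close to predict()'s 0.8803425
```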
# According to the model summary above, Income, Is.Female (sex of the customer), Minors (whether minors live in the household), White (race of the customer) and Prev.Child.Mag (whether the customer previously bought a children's magazine) have p-values below the significance level alpha = 0.05, so their effects on the outcome are statistically significant.
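The p-values say whether an effect is detectable; its size is easier to read as an odds ratio, exp(coefficient), which is the factor by which the odds of buying are multiplied for a one-unit increase in that variable, holding the others fixed. For example, the Is.Female estimate of 1.646 means the odds of buying are about 5.2 times higher for a woman than for an otherwise identical man. The sketch uses the rounded estimates printed above; with the model at hand, `exp(coef(mylogit))` gives the same for every term. Income is rescaled to steps of 10,000 (an illustrative choice) because a one-unit change is too small to read.

```r
# Rounded estimates of the significant predictors, from the summary above
est <- c(Is.Female = 1.646, Minors = 1.133, White = 1.864, Prev.Child.Mag = 1.557)
income_per_unit <- 2.016e-04

# Odds ratios; the Income effect is expressed per 10,000 of income
odds_ratios <- c(exp(est), Income.per.10k = exp(income_per_unit * 10000))
round(odds_ratios, 2)
```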
# Let us change some of these variables one at a time and see how the predicted probability changes.
# Set Is.Female = 0: probability that an otherwise identical male customer buys the magazine
newdata <- data.frame(Income = 50000, Is.Female = 0, Is.Married = 1, Has.College = 0, Is.Professional = 1, Is.Retired = 0,
                      Unemployed = 0, Residence.Length = 10, Dual.Income = 1, Minors = 1, Own = 1, House = 1,
                      White = 1, English = 1, Prev.Child.Mag = 1, Prev.Parent.Mag = 1)
predict(mylogit,newdata,type="response")
## 1
## 0.5865306
# The predicted probability for the male customer is 58.7%, compared with 88.0% for the otherwise identical female customer.
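This drop is what the Is.Female coefficient implies: changing one variable shifts the log-odds by its coefficient, and the probability follows through the inverse logit. The sketch below starts from the female prediction, subtracts the printed estimate 1.646 on the log-odds scale, and also checks that the odds ratio between the two `predict()` outputs is close to exp(1.646).

```r
p_female <- 0.8803425   # predict() output for the female customer
p_male_pred <- 0.5865306  # predict() output for the male customer
b_female <- 1.646       # Is.Female estimate from the summary

# Move to the log-odds scale, remove the Is.Female effect, map back
p_male <- plogis(qlogis(p_female) - b_female)
p_male                  # about 0.5865

# Odds ratio between the two predictions vs exp(coefficient)
odds <- function(p) p / (1 - p)
odds(p_female) / odds(p_male_pred)  # about 5.19
exp(b_female)                       # about 5.19
```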
# Set Minors = 0 (Is.Female back to 1): probability that a female customer with no minors at home buys the kids' magazine
newdata <- data.frame(Income = 50000, Is.Female = 1, Is.Married = 1, Has.College = 0, Is.Professional = 1, Is.Retired = 0,
                      Unemployed = 0, Residence.Length = 10, Dual.Income = 1, Minors = 0, Own = 1, House = 1,
                      White = 1, English = 1, Prev.Child.Mag = 1, Prev.Parent.Mag = 1)
predict(mylogit,newdata,type="response")
## 1
## 0.7032452
# The predicted probability drops from 88.0% to 70.3% when there are no minors in the household, a fall of about 18 percentage points.
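Two details help read this result. The change of about 18 is in percentage points (88.0 minus 70.3); as a relative change it is roughly a 20% drop. And, as with Is.Female, the new probability follows from shifting the log-odds by the Minors estimate of 1.133, which the sketch below checks using the two `predict()` outputs.

```r
p_with_minors <- 0.8803425     # predict() output with Minors = 1
p_without_minors <- 0.7032452  # predict() output with Minors = 0

# Absolute change in percentage points vs relative change in percent
points_drop <- 100 * (p_with_minors - p_without_minors)
relative_drop <- 100 * (p_with_minors - p_without_minors) / p_with_minors
round(c(percentage_points = points_drop, relative_percent = relative_drop), 1)

# Same result from the Minors coefficient (1.133) on the log-odds scale
plogis(qlogis(p_with_minors) - 1.133)  # about 0.703
```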