Questions 1 & 2: Develop the Model & Assess Predictor Significance
Step 1: Install and load required libraries
#install.packages("readxl")
#install.packages("Hmisc")
#install.packages("pscl")
#if(!require(pROC)) install.packages("pROC")
# Load the libraries
library(readxl) #allows us to import excel files
library(Hmisc) # provides rcorr() for correlation matrices with p-values
library(pscl) # provides pR2() for pseudo R-squared measures (e.g., McFadden) to evaluate the model
library(pROC) # provides roc() and auc() to plot the ROC curve and compute the area under the curve (AUC)
Step 2: Import and explore the dataset
simmons_df <- read_excel(file.choose()) # opens a file dialog to select the Excel file
head(simmons_df)
## # A tibble: 6 × 4
## Customer Spending Card Coupon
## <dbl> <dbl> <dbl> <dbl>
## 1 1 2.29 1 0
## 2 2 3.22 1 0
## 3 3 2.13 1 0
## 4 4 3.92 0 0
## 5 5 2.53 1 0
## 6 6 2.47 0 1
# Standard deviation of each column using sd()
sapply(simmons_df, sd)
## Customer Spending Card Coupon
## 29.0114920 1.7412979 0.5025189 0.4923660
Standard deviations: Customer ≈ 29.01 (Customer is an ID number, so its SD carries no meaning), Spending ≈ 1.74, Card ≈ 0.50, Coupon ≈ 0.49.
# Cross tabulation of coupon and card
xtabs(~Coupon + Card, data = simmons_df)
## Card
## Coupon 0 1
## 0 36 24
## 1 14 26
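The counts above can be turned into column percentages and a crude (unadjusted) odds ratio. A minimal sketch that rebuilds the table by hand from the printed counts so it runs without the Excel file; with the data loaded, `prop.table(xtabs(~Coupon + Card, data = simmons_df), margin = 2)` gives the same proportions.

```r
# Rebuild the Coupon x Card table from the counts printed above
# (matrix() fills column by column: first Card = 0, then Card = 1)
tab <- matrix(c(36, 14, 24, 26), nrow = 2,
              dimnames = list(Coupon = c("0", "1"), Card = c("0", "1")))

# Share of each Card group with Coupon = 0 / 1 (column proportions)
col_props <- prop.table(tab, margin = 2)
print(col_props)

# Crude odds ratio of Coupon = 1 for cardholders vs non-cardholders,
# ignoring Spending: (26 / 24) / (14 / 36)
crude_or <- (tab["1", "1"] / tab["0", "1"]) / (tab["1", "0"] / tab["0", "0"])
print(crude_or)
```

About 52% of cardholders versus 28% of non-cardholders have Coupon = 1, a crude odds ratio of roughly 2.79. Because it ignores Spending, it need not match the adjusted odds ratio from the logistic model in Step 4.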
Step 3: Building the Model
sim_logit <- glm(Coupon ~ Card + Spending, data = simmons_df, family = binomial)
summary(sim_logit)
##
## Call:
## glm(formula = Coupon ~ Card + Spending, family = binomial, data = simmons_df)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.1464 0.5772 -3.718 0.000201 ***
## Card 1.0987 0.4447 2.471 0.013483 *
## Spending 0.3416 0.1287 2.655 0.007928 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 134.60 on 99 degrees of freedom
## Residual deviance: 120.97 on 97 degrees of freedom
## AIC: 126.97
##
## Number of Fisher Scoring iterations: 4
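The drop from the null to the residual deviance gives a likelihood-ratio test of whether Card and Spending together improve on an intercept-only model, and the same two numbers give McFadden's pseudo R-squared (one of the measures `pscl::pR2()` reports). A sketch using the deviances printed above; on the fitted object, `sim_logit$null.deviance`, `sim_logit$deviance`, `sim_logit$df.null` and `sim_logit$df.residual` hold the same values.

```r
# Deviances and degrees of freedom copied from summary(sim_logit)
null_dev  <- 134.60; null_df  <- 99
resid_dev <- 120.97; resid_df <- 97

# Likelihood-ratio (deviance) test: chi-square with df = number of added predictors
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R^2 = 1 - logLik(model) / logLik(null).
# For ungrouped 0/1 data the saturated log-likelihood is 0, so this
# equals 1 - residual deviance / null deviance.
mcfadden <- 1 - resid_dev / null_dev

cat(sprintf("LR chi-square = %.2f on %d df, p = %.4f\n",
            lr_stat, as.integer(lr_df), p_value))
cat(sprintf("McFadden pseudo R^2 = %.3f\n", mcfadden))
```

This gives a chi-square of about 13.63 on 2 df (p ≈ 0.001) and a McFadden R² of about 0.10.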
Step 4: Odds Ratio
exp(coef(sim_logit))
## (Intercept) Card Spending
## 0.1169074 3.0003587 1.4072585
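Odds ratios are easier to judge with an interval. The sketch below computes 95% Wald intervals by exponentiating estimate ± 1.96 × SE, using the coefficients and standard errors printed in Step 3; with the fitted model, `exp(confint.default(sim_logit))` returns the same Wald intervals.

```r
# Estimates and standard errors copied from summary(sim_logit)
est <- c(Card = 1.0987, Spending = 0.3416)
se  <- c(Card = 0.4447, Spending = 0.1287)
z   <- qnorm(0.975)  # about 1.96 for a 95% interval

# Wald interval on the log-odds scale, then exponentiate to the odds-ratio scale
or_ci <- cbind(OR    = exp(est),
               lower = exp(est - z * se),
               upper = exp(est + z * se))
print(round(or_ci, 2))
```

Both intervals exclude 1 (roughly 1.25 to 7.17 for Card and 1.09 to 1.81 for Spending), consistent with the significant z tests.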
# Interpretation: Card (p = 0.013) and Spending (p = 0.008) are both statistically significant predictors of Coupon at the 0.05 level.
Interpretation: holding Spending constant, customers who have a Simmons credit card have about 3.00 times the odds of Coupon = 1 compared with customers who do not. In other words, having the card increases the odds by about 200% (the odds triple).
Interpretation: holding Card constant, the odds of Coupon = 1 are multiplied by about 1.41 for each one-unit increase in Spending, i.e., each additional unit of Spending increases the odds by about 40.73%.
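The odds-ratio interpretation can be checked by turning the fitted equation into probabilities. A sketch using the printed (rounded) coefficients; the Spending value of 2 is a hypothetical choice for illustration. With the model object, `predict(sim_logit, newdata = data.frame(Card = c(0, 1), Spending = 2), type = "response")` returns the corresponding probabilities from the unrounded coefficients.

```r
# Coefficients copied (rounded) from summary(sim_logit)
b <- c(intercept = -2.1464, card = 1.0987, spending = 0.3416)

# Predicted probability of Coupon = 1 via the inverse logit, plogis()
prob <- function(card, spending) {
  plogis(b[["intercept"]] + b[["card"]] * card + b[["spending"]] * spending)
}
odds <- function(p) p / (1 - p)

spend <- 2  # hypothetical Spending value
p_card   <- prob(card = 1, spending = spend)
p_nocard <- prob(card = 0, spending = spend)

# Ratio of odds for Card = 1 vs Card = 0 at the same Spending: exp(b_card)
or_card <- odds(p_card) / odds(p_nocard)
# Ratio of odds for a one-unit increase in Spending at the same Card: exp(b_spending)
or_spend <- odds(prob(card = 1, spending = spend + 1)) / odds(p_card)

print(c(p_nocard = p_nocard, p_card = p_card, or_card = or_card, or_spend = or_spend))
```

The probabilities depend on the chosen Spending value, but the odds ratios do not: they stay at about 3.00 and 1.41 wherever Card and Spending are held, which is what makes odds ratios a convenient summary of a logistic model.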