Data Wrangling

The assignment lays out precisely what the decision variables are and how your predicted VIPs will be evaluated. I cannot help you with that part except that you will ultimately use previous patterns of behavior to decide on a set of VIPs with the goal of maximizing returns.

Load the Data

Using the R Commander, let me first import the data from Excel. Of course, your data resides in a different place. This will import Training.Set.

Training.Set <- RcmdrMisc::readXL("C:/Users/rwalker/Downloads/ETOTS Exercise Training.xls", 
  rownames=FALSE, header=TRUE, na="", sheet="VIP Exercise Training Set", 
  stringsAsFactors=TRUE)
summary(Training.Set)
##     HOL_REP          HOL_MVAL          S_99HOL           M_VAL99H      
##  Min.   :0.0000   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:0.0000   1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.00  
##  Median :0.0000   Median :   0.00   Median :   0.00   Median :   0.00  
##  Mean   :0.3333   Mean   :  49.55   Mean   :  24.93   Mean   :  46.49  
##  3rd Qu.:1.0000   3rd Qu.:  45.94   3rd Qu.:  33.69   3rd Qu.:  42.88  
##  Max.   :1.0000   Max.   :3718.31   Max.   :1191.54   Max.   :4077.53  
##                                                                        
##     S_00OFF           M_VAL00O            REC            TENURE     
##  Min.   :   0.00   Min.   :   0.00   Min.   : 0.00   Min.   :12.00  
##  1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.: 9.00   1st Qu.:16.00  
##  Median :   0.00   Median :   0.00   Median :14.00   Median :21.00  
##  Mean   :  11.37   Mean   :  23.25   Mean   :13.51   Mean   :18.95  
##  3rd Qu.:   0.00   3rd Qu.:   0.00   3rd Qu.:21.00   3rd Qu.:21.00  
##  Max.   :1825.94   Max.   :5477.82   Max.   :34.00   Max.   :34.00  
##                                                                     
##      S_IDX           N_S_IDX            Self             Gift       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.6363   Mean   :0.3616   Mean   :0.5257   Mean   :0.2638  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                    NA's   :63       NA's   :63      
##      J_9711           J_9712             J_9801             J_9802        
##  Min.   :0.0000   Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.0000   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.0000   Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :0.0024   Mean   :0.007667   Mean   :0.001033   Mean   :0.001533  
##  3rd Qu.:0.0000   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :1.0000   Max.   :1.000000   Max.   :1.000000   Max.   :1.000000  
##                                                                           
##      J_9803           J_9804             J_9805             J_9806        
##  Min.   :0.0000   Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.0000   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.0000   Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :0.0042   Mean   :0.005667   Mean   :0.002633   Mean   :0.005267  
##  3rd Qu.:0.0000   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :1.0000   Max.   :1.000000   Max.   :1.000000   Max.   :1.000000  
##                                                                           
##      J_9807           J_9808             J_9809            J_9810      
##  Min.   :0.0000   Min.   :0.000000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.000000   Median :0.00000   Median :0.0000  
##  Mean   :0.0062   Mean   :0.007333   Mean   :0.01163   Mean   :0.0269  
##  3rd Qu.:0.0000   3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.000000   Max.   :1.00000   Max.   :1.0000  
##                                                                        
##      J_9811           J_9812           J_9901            J_9902       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.00000  
##  Mean   :0.1488   Mean   :0.2832   Mean   :0.04003   Mean   :0.03573  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  
##                                                                       
##      J_9903            J_9904            J_9905           J_9906       
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000   Median :0.0000   Median :0.00000  
##  Mean   :0.05497   Mean   :0.04807   Mean   :0.0608   Mean   :0.06033  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.0000   Max.   :1.00000  
##                                                                        
##      J_9907           J_9908           J_9909            Random         
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000179  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.2522530  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.4985805  
##  Mean   :0.0598   Mean   :0.0576   Mean   :0.06827   Mean   :0.5009524  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.7490447  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :0.9999999  
##                                                                         
##     Random.1        
##  Min.   :0.0000028  
##  1st Qu.:0.2503906  
##  Median :0.4954041  
##  Mean   :0.4981874  
##  3rd Qu.:0.7452566  
##  Max.   :0.9999947  
## 

Numeric to factors

The zero and one variables in the data are stored as numbers. We may need to use factors. For example, in a logistic regression, R requires that the outcome be a factor instead of a number. We will have to transform the 0 and 1 data to factors. This can be done using Data then Manage variables in the active data set... and then convert numeric variables to factors. This will give us two identical versions of the variables that we choose; one a factor with prefix Fac. and one a number. It is messy but I will just pick the Repeat purcahsers and all the J variables.

FactorEnc

FactorEnc

Training.Set <- within(Training.Set, {
  Fac.HOL_REP <- as.factor(HOL_REP)
  Fac.J_9711 <- as.factor(J_9711)
  Fac.J_9712 <- as.factor(J_9712)
  Fac.J_9801 <- as.factor(J_9801)
  Fac.J_9802 <- as.factor(J_9802)
  Fac.J_9803 <- as.factor(J_9803)
  Fac.J_9804 <- as.factor(J_9804)
  Fac.J_9805 <- as.factor(J_9805)
  Fac.J_9806 <- as.factor(J_9806)
  Fac.J_9807 <- as.factor(J_9807)
  Fac.J_9808 <- as.factor(J_9808)
  Fac.J_9809 <- as.factor(J_9809)
  Fac.J_9810 <- as.factor(J_9810)
  Fac.J_9811 <- as.factor(J_9811)
  Fac.J_9812 <- as.factor(J_9812)
  Fac.J_9901 <- as.factor(J_9901)
  Fac.J_9902 <- as.factor(J_9902)
  Fac.J_9903 <- as.factor(J_9903)
  Fac.J_9904 <- as.factor(J_9904)
  Fac.J_9905 <- as.factor(J_9905)
  Fac.J_9906 <- as.factor(J_9906)
  Fac.J_9907 <- as.factor(J_9907)
  Fac.J_9908 <- as.factor(J_9908)
  Fac.J_9909 <- as.factor(J_9909)
})

Splitting Samples

The first part is to show how to split the samples to engage in modelling and testing. Now let us split the data into two random samples. The way to do this involves generating something random. There are a few ways to do this. The easiest is this. We will generate a uniform random variable and add it to the dataset. Then we will subset the data based on that variable. We could do this using Distributions and Sample in the R Commander or by creating a new variable in the R dataset. I will do the latter. Data then Manage variables in the Active data set and then Compute New Variable. We will have to declare a name for the new variable under New variable name and an expression to compute in Expression to compute. I will call the new variable UnifSampler and the expression is runif(30000). That tells R to generate 30000 uniform random draws between 0 and 1. We could also use a binomial of size one and probability 0.5 to accomplish the same task. I will add this line and call it BinSampler with expression rbinom(30000, size=1, 0.5).

The code looks like this.

Training.Set$UnifSampler <- with(Training.Set, runif(30000))
Training.Set$BinSampler <- with(Training.Set, rbinom(30000, size=1, 0.5))

Subsetting

In Data then Active Dataset then Subset active data set..., we can now split the data. The dialog appears below completed. We will require two things: a subset expression and a name for the new dataset. The subset expression will depend on which method [binomial or uniform] chosen above. The key is to split the data on the basis of the uniform or binomial into two parts. Those above and below one half in a uniform (TrainingSet.UnifSmall and TrainingSet.UnifBig) or the zeroes and the ones in a binomial (TrainingSet.BinSmall and TrainingSet.BinBig). I show the uniform above one half in the graphic.

SubsetDialog

SubsetDialog

I then repeat the process only changing the name to TrainingSet.UnifSmall and changing the inequality to be less than 0.5.

TrainingSet.UnifBig <- subset(Training.Set, subset=UnifSampler>0.5)
TrainingSet.UnifSmall <- subset(Training.Set, subset=UnifSampler<0.5)
TrainingSet.BinBig <- subset(Training.Set, subset=BinSampler==1)
TrainingSet.BinSmall <- subset(Training.Set, subset=BinSampler==0)

I now have five datasets. The original Training.Set imported from Excel and the data split into two parts using a binomial or a uniform. The splits are random. Now how do I use the data.

Now I want to understand purchasing behavior and spending amounts given past data. So you want to model the spending and the purchasing as functions of predictors using linear models or generalized linear models. Suppose that I wanted to fit a model of HOL_MVAL as a function of Gift and Tenure. Let me fit it on dataset Big.

Fitting Models

A model.

LM.Gift.Tenure <- lm(HOL_MVAL ~ Gift+TENURE, data=TrainingSet.UnifBig)
summary(LM.Gift.Tenure)
## 
## Call:
## lm(formula = HOL_MVAL ~ Gift + TENURE, data = TrainingSet.UnifBig)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -60.14  -54.01  -38.01   -4.15 2789.70 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  45.2603     4.9210   9.197  < 2e-16 ***
## Gift        -17.3196     2.3150  -7.482 7.75e-14 ***
## TENURE        0.4377     0.2536   1.726   0.0843 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 125.5 on 14970 degrees of freedom
##   (21 observations deleted due to missingness)
## Multiple R-squared:  0.003838,   Adjusted R-squared:  0.003705 
## F-statistic: 28.84 on 2 and 14970 DF,  p-value: 3.159e-13

The model fits poorly. Nevertheless, the point is that I can use those estimates to predict outcomes for the other set of data as long as the names are the same [which they must be because we subsetted the data]. This will require a bit of syntax, but not much. Here is the exact command to create a new variable Pred that is the prediction of the model from one dataset applied to the new one: TrainingSet.UnifSmall$Pred <- predict(LM.Gift.Tenure, newdata=TrainingSet.UnifSmall). Change the active dataset to TrainingSet.UnifSmall and make sure it worked.

TrainingSet.UnifSmall$Pred <- predict(LM.Gift.Tenure, newdata=TrainingSet.UnifSmall)
summary(TrainingSet.UnifSmall)
##     HOL_REP          HOL_MVAL          S_99HOL           M_VAL99H      
##  Min.   :0.0000   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:0.0000   1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.00  
##  Median :0.0000   Median :   0.00   Median :   0.00   Median :   0.00  
##  Mean   :0.3335   Mean   :  50.27   Mean   :  25.43   Mean   :  47.97  
##  3rd Qu.:1.0000   3rd Qu.:  46.44   3rd Qu.:  33.94   3rd Qu.:  43.94  
##  Max.   :1.0000   Max.   :3718.31   Max.   :1010.53   Max.   :2627.82  
##                                                                        
##     S_00OFF          M_VAL00O            REC            TENURE     
##  Min.   :  0.00   Min.   :   0.00   Min.   : 0.00   Min.   :12.00  
##  1st Qu.:  0.00   1st Qu.:   0.00   1st Qu.: 9.00   1st Qu.:16.00  
##  Median :  0.00   Median :   0.00   Median :14.00   Median :21.00  
##  Mean   : 11.47   Mean   :  23.68   Mean   :13.47   Mean   :18.95  
##  3rd Qu.:  0.00   3rd Qu.:   0.00   3rd Qu.:21.00   3rd Qu.:21.00  
##  Max.   :561.16   Max.   :1620.34   Max.   :34.00   Max.   :34.00  
##                                                                    
##      S_IDX           N_S_IDX            Self             Gift       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.6394   Mean   :0.3578   Mean   :0.5259   Mean   :0.2581  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                    NA's   :42       NA's   :42      
##      J_9711             J_9712             J_9801        
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :0.002865   Mean   :0.007664   Mean   :0.001399  
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :1.000000   Max.   :1.000000   Max.   :1.000000  
##                                                          
##      J_9802             J_9803             J_9804        
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :0.001133   Mean   :0.004398   Mean   :0.004731  
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :1.000000   Max.   :1.000000   Max.   :1.000000  
##                                                          
##      J_9805             J_9806             J_9807        
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :0.002865   Mean   :0.005265   Mean   :0.006197  
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :1.000000   Max.   :1.000000   Max.   :1.000000  
##                                                          
##      J_9808             J_9809            J_9810            J_9811      
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.000000   Median :0.00000   Median :0.00000   Median :0.0000  
##  Mean   :0.007997   Mean   :0.01126   Mean   :0.02739   Mean   :0.1513  
##  3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.0000  
##  Max.   :1.000000   Max.   :1.00000   Max.   :1.00000   Max.   :1.0000  
##                                                                         
##      J_9812          J_9901            J_9902            J_9903       
##  Min.   :0.000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.000   Median :0.00000   Median :0.00000   Median :0.00000  
##  Mean   :0.278   Mean   :0.03905   Mean   :0.03705   Mean   :0.05538  
##  3rd Qu.:1.000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
##                                                                       
##      J_9904            J_9905            J_9906            J_9907       
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000   Median :0.00000   Median :0.00000  
##  Mean   :0.04945   Mean   :0.05971   Mean   :0.06084   Mean   :0.05918  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
##                                                                         
##      J_9908            J_9909            Random         
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.0000179  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.2533203  
##  Median :0.00000   Median :0.00000   Median :0.4993271  
##  Mean   :0.05764   Mean   :0.06931   Mean   :0.5011430  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.7495196  
##  Max.   :1.00000   Max.   :1.00000   Max.   :0.9999999  
##                                                         
##     Random.1         Fac.J_9909 Fac.J_9908 Fac.J_9907 Fac.J_9906
##  Min.   :0.0000256   0:13966    0:14141    0:14118    0:14093   
##  1st Qu.:0.2498218   1: 1040    1:  865    1:  888    1:  913   
##  Median :0.4950599                                              
##  Mean   :0.4975722                                              
##  3rd Qu.:0.7444121                                              
##  Max.   :0.9999947                                              
##                                                                 
##  Fac.J_9905 Fac.J_9904 Fac.J_9903 Fac.J_9902 Fac.J_9901 Fac.J_9812
##  0:14110    0:14264    0:14175    0:14450    0:14420    0:10835   
##  1:  896    1:  742    1:  831    1:  556    1:  586    1: 4171   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##  Fac.J_9811 Fac.J_9810 Fac.J_9809 Fac.J_9808 Fac.J_9807 Fac.J_9806
##  0:12736    0:14595    0:14837    0:14886    0:14913    0:14927   
##  1: 2270    1:  411    1:  169    1:  120    1:   93    1:   79   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##  Fac.J_9805 Fac.J_9804 Fac.J_9803 Fac.J_9802 Fac.J_9801 Fac.J_9712
##  0:14963    0:14935    0:14940    0:14989    0:14985    0:14891   
##  1:   43    1:   71    1:   66    1:   17    1:   21    1:  115   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##  Fac.J_9711 Fac.HOL_REP  UnifSampler          BinSampler    
##  0:14963    0:10002     Min.   :0.0000866   Min.   :0.0000  
##  1:   43    1: 5004     1st Qu.:0.1266897   1st Qu.:0.0000  
##                         Median :0.2486838   Median :1.0000  
##                         Mean   :0.2490167   Mean   :0.5023  
##                         3rd Qu.:0.3716938   3rd Qu.:1.0000  
##                         Max.   :0.4999680   Max.   :1.0000  
##                                                             
##       Pred      
##  Min.   :33.19  
##  1st Qu.:42.38  
##  Median :52.70  
##  Mean   :49.09  
##  3rd Qu.:54.45  
##  Max.   :60.14  
##  NA's   :42

Purchases

With the dataset TrainingSet.UnifBig, we can do the same thing with the purchases. The model is a logistic regression [covered in the textbook]. The idea is that a line will go outside of one and zero so we need something that only predicts probabilities.

The dialog is generalized linear models with the same predictors. I will fit two models. One with Gift and Tenure for the whole training set. Then the same model only on my subset called TrainingSet.UnifBig.

GLMFit

GLMFit

GLMGiftTenureW <- glm(Fac.HOL_REP ~ Gift + TENURE, family=binomial(logit), 
  data=Training.Set)
summary(GLMGiftTenureW)
## 
## Call:
## glm(formula = Fac.HOL_REP ~ Gift + TENURE, family = binomial(logit), 
##     data = Training.Set)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9609  -0.9278  -0.8212   1.4322   1.6689  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.417191   0.059042  -7.066 1.59e-12 ***
## Gift        -0.361473   0.028880 -12.517  < 2e-16 ***
## TENURE      -0.009661   0.003051  -3.167  0.00154 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 38131  on 29936  degrees of freedom
## Residual deviance: 37956  on 29934  degrees of freedom
##   (63 observations deleted due to missingness)
## AIC: 37962
## 
## Number of Fisher Scoring iterations: 4
exp(coef(GLMGiftTenureW))  # Exponentiated coefficients ("odds ratios")
## (Intercept)        Gift      TENURE 
##   0.6588950   0.6966495   0.9903852
GLMGiftTenure <- glm(Fac.HOL_REP ~ Gift + TENURE, family=binomial(logit), 
  data=TrainingSet.UnifBig)
summary(GLMGiftTenure)
## 
## Call:
## glm(formula = Fac.HOL_REP ~ Gift + TENURE, family = binomial(logit), 
##     data = TrainingSet.UnifBig)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9694  -0.9280  -0.8202   1.4277   1.6916  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.366763   0.083846  -4.374 1.22e-05 ***
## Gift        -0.381202   0.040653  -9.377  < 2e-16 ***
## TENURE      -0.012043   0.004338  -2.776   0.0055 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 19065  on 14972  degrees of freedom
## Residual deviance: 18964  on 14970  degrees of freedom
##   (21 observations deleted due to missingness)
## AIC: 18970
## 
## Number of Fisher Scoring iterations: 4
exp(coef(GLMGiftTenure))  # Exponentiated coefficients ("odds ratios")
## (Intercept)        Gift      TENURE 
##   0.6929741   0.6830397   0.9880294

Now let me predict the probability of a purchase for each row in the other subset.

TrainingSet.UnifSmall$PredP <- predict(GLMGiftTenure, newdata=TrainingSet.UnifSmall, type="response")
summary(TrainingSet.UnifSmall)
##     HOL_REP          HOL_MVAL          S_99HOL           M_VAL99H      
##  Min.   :0.0000   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:0.0000   1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.00  
##  Median :0.0000   Median :   0.00   Median :   0.00   Median :   0.00  
##  Mean   :0.3335   Mean   :  50.27   Mean   :  25.43   Mean   :  47.97  
##  3rd Qu.:1.0000   3rd Qu.:  46.44   3rd Qu.:  33.94   3rd Qu.:  43.94  
##  Max.   :1.0000   Max.   :3718.31   Max.   :1010.53   Max.   :2627.82  
##                                                                        
##     S_00OFF          M_VAL00O            REC            TENURE     
##  Min.   :  0.00   Min.   :   0.00   Min.   : 0.00   Min.   :12.00  
##  1st Qu.:  0.00   1st Qu.:   0.00   1st Qu.: 9.00   1st Qu.:16.00  
##  Median :  0.00   Median :   0.00   Median :14.00   Median :21.00  
##  Mean   : 11.47   Mean   :  23.68   Mean   :13.47   Mean   :18.95  
##  3rd Qu.:  0.00   3rd Qu.:   0.00   3rd Qu.:21.00   3rd Qu.:21.00  
##  Max.   :561.16   Max.   :1620.34   Max.   :34.00   Max.   :34.00  
##                                                                    
##      S_IDX           N_S_IDX            Self             Gift       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.6394   Mean   :0.3578   Mean   :0.5259   Mean   :0.2581  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                    NA's   :42       NA's   :42      
##      J_9711             J_9712             J_9801        
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :0.002865   Mean   :0.007664   Mean   :0.001399  
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :1.000000   Max.   :1.000000   Max.   :1.000000  
##                                                          
##      J_9802             J_9803             J_9804        
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :0.001133   Mean   :0.004398   Mean   :0.004731  
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :1.000000   Max.   :1.000000   Max.   :1.000000  
##                                                          
##      J_9805             J_9806             J_9807        
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :0.002865   Mean   :0.005265   Mean   :0.006197  
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :1.000000   Max.   :1.000000   Max.   :1.000000  
##                                                          
##      J_9808             J_9809            J_9810            J_9811      
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.000000   Median :0.00000   Median :0.00000   Median :0.0000  
##  Mean   :0.007997   Mean   :0.01126   Mean   :0.02739   Mean   :0.1513  
##  3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.0000  
##  Max.   :1.000000   Max.   :1.00000   Max.   :1.00000   Max.   :1.0000  
##                                                                         
##      J_9812          J_9901            J_9902            J_9903       
##  Min.   :0.000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.000   Median :0.00000   Median :0.00000   Median :0.00000  
##  Mean   :0.278   Mean   :0.03905   Mean   :0.03705   Mean   :0.05538  
##  3rd Qu.:1.000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
##                                                                       
##      J_9904            J_9905            J_9906            J_9907       
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000   Median :0.00000   Median :0.00000  
##  Mean   :0.04945   Mean   :0.05971   Mean   :0.06084   Mean   :0.05918  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
##                                                                         
##      J_9908            J_9909            Random         
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.0000179  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.2533203  
##  Median :0.00000   Median :0.00000   Median :0.4993271  
##  Mean   :0.05764   Mean   :0.06931   Mean   :0.5011430  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.7495196  
##  Max.   :1.00000   Max.   :1.00000   Max.   :0.9999999  
##                                                         
##     Random.1         Fac.J_9909 Fac.J_9908 Fac.J_9907 Fac.J_9906
##  Min.   :0.0000256   0:13966    0:14141    0:14118    0:14093   
##  1st Qu.:0.2498218   1: 1040    1:  865    1:  888    1:  913   
##  Median :0.4950599                                              
##  Mean   :0.4975722                                              
##  3rd Qu.:0.7444121                                              
##  Max.   :0.9999947                                              
##                                                                 
##  Fac.J_9905 Fac.J_9904 Fac.J_9903 Fac.J_9902 Fac.J_9901 Fac.J_9812
##  0:14110    0:14264    0:14175    0:14450    0:14420    0:10835   
##  1:  896    1:  742    1:  831    1:  556    1:  586    1: 4171   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##  Fac.J_9811 Fac.J_9810 Fac.J_9809 Fac.J_9808 Fac.J_9807 Fac.J_9806
##  0:12736    0:14595    0:14837    0:14886    0:14913    0:14927   
##  1: 2270    1:  411    1:  169    1:  120    1:   93    1:   79   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##  Fac.J_9805 Fac.J_9804 Fac.J_9803 Fac.J_9802 Fac.J_9801 Fac.J_9712
##  0:14963    0:14935    0:14940    0:14989    0:14985    0:14891   
##  1:   43    1:   71    1:   66    1:   17    1:   21    1:  115   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##  Fac.J_9711 Fac.HOL_REP  UnifSampler          BinSampler    
##  0:14963    0:10002     Min.   :0.0000866   Min.   :0.0000  
##  1:   43    1: 5004     1st Qu.:0.1266897   1st Qu.:0.0000  
##                         Median :0.2486838   Median :1.0000  
##                         Mean   :0.2490167   Mean   :0.5023  
##                         3rd Qu.:0.3716938   3rd Qu.:1.0000  
##                         Max.   :0.4999680   Max.   :1.0000  
##                                                             
##       Pred           PredP       
##  Min.   :33.19   Min.   :0.2391  
##  1st Qu.:42.38   1st Qu.:0.2906  
##  Median :52.70   Median :0.3499  
##  Mean   :49.09   Mean   :0.3345  
##  3rd Qu.:54.45   3rd Qu.:0.3609  
##  Max.   :60.14   Max.   :0.3749  
##  NA's   :42      NA's   :42

The variable PredP holds the probability of a purchase for this data. As you can see, no one is all that likely to purchase, it ranges from 0.25 to 0.36.

Now, for proof of concept and, ultimately, to output the data into a spreadsheet, we will need Data then Active Data set then Export active data set... with a dialog box like this.

ExportText

ExportText

write.table(TrainingSet.UnifBig, "C:/Users/rwalker/Downloads/TrainingSet.UnifBig.csv",  sep=",", col.names=TRUE, row.names=FALSE, quote=TRUE, na="NA")
write.table(TrainingSet.UnifSmall, "C:/Users/rwalker/Downloads/TrainingSet.UnifSmall.csv",   sep=",", col.names=TRUE, row.names=FALSE, quote=TRUE, na="NA")

We can import the .csv file back into Excel. Note that unifsmall, has no estimated models, it has two predictions from models estimated on unifbig.

ExcelOut

ExcelOut

More Elaborate

library(effects)
## 
## Attaching package: 'effects'
## The following object is masked from 'package:car':
## 
##     Prestige
LinearModel.4 <- lm(HOL_MVAL ~ Gift + bs(TENURE , df=5) + bs(S_99HOL, df=2), data=TrainingSet.UnifBig)
## Warning in bs(S_99HOL, df = 2): 'df' was too small; have used 3
summary(LinearModel.4)
## 
## Call:
## lm(formula = HOL_MVAL ~ Gift + bs(TENURE, df = 5) + bs(S_99HOL, 
##     df = 2), data = TrainingSet.UnifBig)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -321.72  -28.60  -21.09   -5.81 2634.41 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            20.9404     3.6090   5.802 6.67e-09 ***
## Gift                   -7.5025     2.1803  -3.441 0.000581 ***
## bs(TENURE, df = 5)1    -1.7125     7.5798  -0.226 0.821257    
## bs(TENURE, df = 5)2    -0.1247     5.8273  -0.021 0.982928    
## bs(TENURE, df = 5)3    18.0086     8.1527   2.209 0.027197 *  
## bs(TENURE, df = 5)4   -20.9876    15.8206  -1.327 0.184661    
## bs(TENURE, df = 5)5    -5.6099    13.0083  -0.431 0.666287    
## bs(S_99HOL, df = 2)1  558.5534    16.5212  33.808  < 2e-16 ***
## bs(S_99HOL, df = 2)2 -442.0437    77.3594  -5.714 1.12e-08 ***
## bs(S_99HOL, df = 2)3 1639.3299   113.5589  14.436  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 116.4 on 14963 degrees of freedom
##   (21 observations deleted due to missingness)
## Multiple R-squared:  0.143,  Adjusted R-squared:  0.1425 
## F-statistic: 277.4 on 9 and 14963 DF,  p-value: < 2.2e-16
plot(allEffects(LinearModel.4))
## Warning in bs(S_99HOL, df = 2): 'df' was too small; have used 3

## Warning in bs(S_99HOL, df = 2): 'df' was too small; have used 3

## Warning in bs(S_99HOL, df = 2): 'df' was too small; have used 3