The assignment lays out precisely what the decision variables are and how your predicted VIPs will be evaluated. I cannot help you with that part except that you will ultimately use previous patterns of behavior to decide on a set of VIPs with the goal of maximizing returns.
Using the R Commander, let me first import the data from Excel. Of course, your data resides in a different place. This will import Training.Set.
Training.Set <- RcmdrMisc::readXL("C:/Users/rwalker/Downloads/ETOTS Exercise Training.xls",
rownames=FALSE, header=TRUE, na="", sheet="VIP Exercise Training Set",
stringsAsFactors=TRUE)
summary(Training.Set)
## HOL_REP HOL_MVAL S_99HOL M_VAL99H
## Min. :0.0000 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median :0.0000 Median : 0.00 Median : 0.00 Median : 0.00
## Mean :0.3333 Mean : 49.55 Mean : 24.93 Mean : 46.49
## 3rd Qu.:1.0000 3rd Qu.: 45.94 3rd Qu.: 33.69 3rd Qu.: 42.88
## Max. :1.0000 Max. :3718.31 Max. :1191.54 Max. :4077.53
##
## S_00OFF M_VAL00O REC TENURE
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. :12.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 9.00 1st Qu.:16.00
## Median : 0.00 Median : 0.00 Median :14.00 Median :21.00
## Mean : 11.37 Mean : 23.25 Mean :13.51 Mean :18.95
## 3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.:21.00 3rd Qu.:21.00
## Max. :1825.94 Max. :5477.82 Max. :34.00 Max. :34.00
##
## S_IDX N_S_IDX Self Gift
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.6363 Mean :0.3616 Mean :0.5257 Mean :0.2638
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :63 NA's :63
## J_9711 J_9712 J_9801 J_9802
## Min. :0.0000 Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.0000 Median :0.000000 Median :0.000000 Median :0.000000
## Mean :0.0024 Mean :0.007667 Mean :0.001033 Mean :0.001533
## 3rd Qu.:0.0000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :1.0000 Max. :1.000000 Max. :1.000000 Max. :1.000000
##
## J_9803 J_9804 J_9805 J_9806
## Min. :0.0000 Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.0000 Median :0.000000 Median :0.000000 Median :0.000000
## Mean :0.0042 Mean :0.005667 Mean :0.002633 Mean :0.005267
## 3rd Qu.:0.0000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :1.0000 Max. :1.000000 Max. :1.000000 Max. :1.000000
##
## J_9807 J_9808 J_9809 J_9810
## Min. :0.0000 Min. :0.000000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.0000 Median :0.000000 Median :0.00000 Median :0.0000
## Mean :0.0062 Mean :0.007333 Mean :0.01163 Mean :0.0269
## 3rd Qu.:0.0000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.000000 Max. :1.00000 Max. :1.0000
##
## J_9811 J_9812 J_9901 J_9902
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.1488 Mean :0.2832 Mean :0.04003 Mean :0.03573
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
##
## J_9903 J_9904 J_9905 J_9906
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.00000
## Mean :0.05497 Mean :0.04807 Mean :0.0608 Mean :0.06033
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.00000
##
## J_9907 J_9908 J_9909 Random
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000179
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.2522530
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.4985805
## Mean :0.0598 Mean :0.0576 Mean :0.06827 Mean :0.5009524
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.7490447
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :0.9999999
##
## Random.1
## Min. :0.0000028
## 1st Qu.:0.2503906
## Median :0.4954041
## Mean :0.4981874
## 3rd Qu.:0.7452566
## Max. :0.9999947
##
The zero and one variables in the data are stored as numbers. We may need to use factors. For example, in a logistic regression, R requires that the outcome be a factor instead of a number. We will have to transform the 0 and 1 data to factors. This can be done using Data then Manage variables in the active data set... and then convert numeric variables to factors. This will give us two identical versions of the variables that we choose; one a factor with prefix Fac. and one a number. It is messy but I will just pick the Repeat purcahsers and all the J variables.
FactorEnc
Training.Set <- within(Training.Set, {
Fac.HOL_REP <- as.factor(HOL_REP)
Fac.J_9711 <- as.factor(J_9711)
Fac.J_9712 <- as.factor(J_9712)
Fac.J_9801 <- as.factor(J_9801)
Fac.J_9802 <- as.factor(J_9802)
Fac.J_9803 <- as.factor(J_9803)
Fac.J_9804 <- as.factor(J_9804)
Fac.J_9805 <- as.factor(J_9805)
Fac.J_9806 <- as.factor(J_9806)
Fac.J_9807 <- as.factor(J_9807)
Fac.J_9808 <- as.factor(J_9808)
Fac.J_9809 <- as.factor(J_9809)
Fac.J_9810 <- as.factor(J_9810)
Fac.J_9811 <- as.factor(J_9811)
Fac.J_9812 <- as.factor(J_9812)
Fac.J_9901 <- as.factor(J_9901)
Fac.J_9902 <- as.factor(J_9902)
Fac.J_9903 <- as.factor(J_9903)
Fac.J_9904 <- as.factor(J_9904)
Fac.J_9905 <- as.factor(J_9905)
Fac.J_9906 <- as.factor(J_9906)
Fac.J_9907 <- as.factor(J_9907)
Fac.J_9908 <- as.factor(J_9908)
Fac.J_9909 <- as.factor(J_9909)
})
The first part is to show how to split the samples to engage in modelling and testing. Now let us split the data into two random samples. The way to do this involves generating something random. There are a few ways to do this. The easiest is this. We will generate a uniform random variable and add it to the dataset. Then we will subset the data based on that variable. We could do this using Distributions and Sample in the R Commander or by creating a new variable in the R dataset. I will do the latter. Data then Manage variables in the Active data set and then Compute New Variable. We will have to declare a name for the new variable under New variable name and an expression to compute in Expression to compute. I will call the new variable UnifSampler and the expression is runif(30000). That tells R to generate 30000 uniform random draws between 0 and 1. We could also use a binomial of size one and probability 0.5 to accomplish the same task. I will add this line and call it BinSampler with expression rbinom(30000, size=1, 0.5).
The code looks like this.
Training.Set$UnifSampler <- with(Training.Set, runif(30000))
Training.Set$BinSampler <- with(Training.Set, rbinom(30000, size=1, 0.5))
In Data then Active Dataset then Subset active data set..., we can now split the data. The dialog appears below completed. We will require two things: a subset expression and a name for the new dataset. The subset expression will depend on which method [binomial or uniform] chosen above. The key is to split the data on the basis of the uniform or binomial into two parts. Those above and below one half in a uniform (TrainingSet.UnifSmall and TrainingSet.UnifBig) or the zeroes and the ones in a binomial (TrainingSet.BinSmall and TrainingSet.BinBig). I show the uniform above one half in the graphic.
SubsetDialog
I then repeat the process only changing the name to TrainingSet.UnifSmall and changing the inequality to be less than 0.5.
TrainingSet.UnifBig <- subset(Training.Set, subset=UnifSampler>0.5)
TrainingSet.UnifSmall <- subset(Training.Set, subset=UnifSampler<0.5)
TrainingSet.BinBig <- subset(Training.Set, subset=BinSampler==1)
TrainingSet.BinSmall <- subset(Training.Set, subset=BinSampler==0)
I now have five datasets. The original Training.Set imported from Excel and the data split into two parts using a binomial or a uniform. The splits are random. Now how do I use the data.
Now I want to understand purchasing behavior and spending amounts given past data. So you want to model the spending and the purchasing as functions of predictors using linear models or generalized linear models. Suppose that I wanted to fit a model of HOL_MVAL as a function of Gift and Tenure. Let me fit it on dataset Big.
A model.
LM.Gift.Tenure <- lm(HOL_MVAL ~ Gift+TENURE, data=TrainingSet.UnifBig)
summary(LM.Gift.Tenure)
##
## Call:
## lm(formula = HOL_MVAL ~ Gift + TENURE, data = TrainingSet.UnifBig)
##
## Residuals:
## Min 1Q Median 3Q Max
## -60.14 -54.01 -38.01 -4.15 2789.70
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.2603 4.9210 9.197 < 2e-16 ***
## Gift -17.3196 2.3150 -7.482 7.75e-14 ***
## TENURE 0.4377 0.2536 1.726 0.0843 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 125.5 on 14970 degrees of freedom
## (21 observations deleted due to missingness)
## Multiple R-squared: 0.003838, Adjusted R-squared: 0.003705
## F-statistic: 28.84 on 2 and 14970 DF, p-value: 3.159e-13
The model fits poorly. Nevertheless, the point is that I can use those estimates to predict outcomes for the other set of data as long as the names are the same [which they must be because we subsetted the data]. This will require a bit of syntax, but not much. Here is the exact command to create a new variable Pred that is the prediction of the model from one dataset applied to the new one: TrainingSet.UnifSmall$Pred <- predict(LM.Gift.Tenure, newdata=TrainingSet.UnifSmall). Change the active dataset to TrainingSet.UnifSmall and make sure it worked.
TrainingSet.UnifSmall$Pred <- predict(LM.Gift.Tenure, newdata=TrainingSet.UnifSmall)
summary(TrainingSet.UnifSmall)
## HOL_REP HOL_MVAL S_99HOL M_VAL99H
## Min. :0.0000 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median :0.0000 Median : 0.00 Median : 0.00 Median : 0.00
## Mean :0.3335 Mean : 50.27 Mean : 25.43 Mean : 47.97
## 3rd Qu.:1.0000 3rd Qu.: 46.44 3rd Qu.: 33.94 3rd Qu.: 43.94
## Max. :1.0000 Max. :3718.31 Max. :1010.53 Max. :2627.82
##
## S_00OFF M_VAL00O REC TENURE
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. :12.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 9.00 1st Qu.:16.00
## Median : 0.00 Median : 0.00 Median :14.00 Median :21.00
## Mean : 11.47 Mean : 23.68 Mean :13.47 Mean :18.95
## 3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.:21.00 3rd Qu.:21.00
## Max. :561.16 Max. :1620.34 Max. :34.00 Max. :34.00
##
## S_IDX N_S_IDX Self Gift
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.6394 Mean :0.3578 Mean :0.5259 Mean :0.2581
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :42 NA's :42
## J_9711 J_9712 J_9801
## Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000000 Median :0.000000 Median :0.000000
## Mean :0.002865 Mean :0.007664 Mean :0.001399
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :1.000000 Max. :1.000000 Max. :1.000000
##
## J_9802 J_9803 J_9804
## Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000000 Median :0.000000 Median :0.000000
## Mean :0.001133 Mean :0.004398 Mean :0.004731
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :1.000000 Max. :1.000000 Max. :1.000000
##
## J_9805 J_9806 J_9807
## Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000000 Median :0.000000 Median :0.000000
## Mean :0.002865 Mean :0.005265 Mean :0.006197
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :1.000000 Max. :1.000000 Max. :1.000000
##
## J_9808 J_9809 J_9810 J_9811
## Min. :0.000000 Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.000000 Median :0.00000 Median :0.00000 Median :0.0000
## Mean :0.007997 Mean :0.01126 Mean :0.02739 Mean :0.1513
## 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.000000 Max. :1.00000 Max. :1.00000 Max. :1.0000
##
## J_9812 J_9901 J_9902 J_9903
## Min. :0.000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.278 Mean :0.03905 Mean :0.03705 Mean :0.05538
## 3rd Qu.:1.000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.000 Max. :1.00000 Max. :1.00000 Max. :1.00000
##
## J_9904 J_9905 J_9906 J_9907
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.04945 Mean :0.05971 Mean :0.06084 Mean :0.05918
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
##
## J_9908 J_9909 Random
## Min. :0.00000 Min. :0.00000 Min. :0.0000179
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.2533203
## Median :0.00000 Median :0.00000 Median :0.4993271
## Mean :0.05764 Mean :0.06931 Mean :0.5011430
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.7495196
## Max. :1.00000 Max. :1.00000 Max. :0.9999999
##
## Random.1 Fac.J_9909 Fac.J_9908 Fac.J_9907 Fac.J_9906
## Min. :0.0000256 0:13966 0:14141 0:14118 0:14093
## 1st Qu.:0.2498218 1: 1040 1: 865 1: 888 1: 913
## Median :0.4950599
## Mean :0.4975722
## 3rd Qu.:0.7444121
## Max. :0.9999947
##
## Fac.J_9905 Fac.J_9904 Fac.J_9903 Fac.J_9902 Fac.J_9901 Fac.J_9812
## 0:14110 0:14264 0:14175 0:14450 0:14420 0:10835
## 1: 896 1: 742 1: 831 1: 556 1: 586 1: 4171
##
##
##
##
##
## Fac.J_9811 Fac.J_9810 Fac.J_9809 Fac.J_9808 Fac.J_9807 Fac.J_9806
## 0:12736 0:14595 0:14837 0:14886 0:14913 0:14927
## 1: 2270 1: 411 1: 169 1: 120 1: 93 1: 79
##
##
##
##
##
## Fac.J_9805 Fac.J_9804 Fac.J_9803 Fac.J_9802 Fac.J_9801 Fac.J_9712
## 0:14963 0:14935 0:14940 0:14989 0:14985 0:14891
## 1: 43 1: 71 1: 66 1: 17 1: 21 1: 115
##
##
##
##
##
## Fac.J_9711 Fac.HOL_REP UnifSampler BinSampler
## 0:14963 0:10002 Min. :0.0000866 Min. :0.0000
## 1: 43 1: 5004 1st Qu.:0.1266897 1st Qu.:0.0000
## Median :0.2486838 Median :1.0000
## Mean :0.2490167 Mean :0.5023
## 3rd Qu.:0.3716938 3rd Qu.:1.0000
## Max. :0.4999680 Max. :1.0000
##
## Pred
## Min. :33.19
## 1st Qu.:42.38
## Median :52.70
## Mean :49.09
## 3rd Qu.:54.45
## Max. :60.14
## NA's :42
With the dataset TrainingSet.UnifBig, we can do the same thing with the purchases. The model is a logistic regression [covered in the textbook]. The idea is that a line will go outside of one and zero so we need something that only predicts probabilities.
The dialog is generalized linear models with the same predictors. I will fit two models. One with Gift and Tenure for the whole training set. Then the same model only on my subset called TrainingSet.UnifBig.
GLMFit
GLMGiftTenureW <- glm(Fac.HOL_REP ~ Gift + TENURE, family=binomial(logit),
data=Training.Set)
summary(GLMGiftTenureW)
##
## Call:
## glm(formula = Fac.HOL_REP ~ Gift + TENURE, family = binomial(logit),
## data = Training.Set)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9609 -0.9278 -0.8212 1.4322 1.6689
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.417191 0.059042 -7.066 1.59e-12 ***
## Gift -0.361473 0.028880 -12.517 < 2e-16 ***
## TENURE -0.009661 0.003051 -3.167 0.00154 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 38131 on 29936 degrees of freedom
## Residual deviance: 37956 on 29934 degrees of freedom
## (63 observations deleted due to missingness)
## AIC: 37962
##
## Number of Fisher Scoring iterations: 4
exp(coef(GLMGiftTenureW)) # Exponentiated coefficients ("odds ratios")
## (Intercept) Gift TENURE
## 0.6588950 0.6966495 0.9903852
GLMGiftTenure <- glm(Fac.HOL_REP ~ Gift + TENURE, family=binomial(logit),
data=TrainingSet.UnifBig)
summary(GLMGiftTenure)
##
## Call:
## glm(formula = Fac.HOL_REP ~ Gift + TENURE, family = binomial(logit),
## data = TrainingSet.UnifBig)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9694 -0.9280 -0.8202 1.4277 1.6916
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.366763 0.083846 -4.374 1.22e-05 ***
## Gift -0.381202 0.040653 -9.377 < 2e-16 ***
## TENURE -0.012043 0.004338 -2.776 0.0055 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 19065 on 14972 degrees of freedom
## Residual deviance: 18964 on 14970 degrees of freedom
## (21 observations deleted due to missingness)
## AIC: 18970
##
## Number of Fisher Scoring iterations: 4
exp(coef(GLMGiftTenure)) # Exponentiated coefficients ("odds ratios")
## (Intercept) Gift TENURE
## 0.6929741 0.6830397 0.9880294
Now let me predict the probability of a purchase for each row in the other subset.
TrainingSet.UnifSmall$PredP <- predict(GLMGiftTenure, newdata=TrainingSet.UnifSmall, type="response")
summary(TrainingSet.UnifSmall)
## HOL_REP HOL_MVAL S_99HOL M_VAL99H
## Min. :0.0000 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median :0.0000 Median : 0.00 Median : 0.00 Median : 0.00
## Mean :0.3335 Mean : 50.27 Mean : 25.43 Mean : 47.97
## 3rd Qu.:1.0000 3rd Qu.: 46.44 3rd Qu.: 33.94 3rd Qu.: 43.94
## Max. :1.0000 Max. :3718.31 Max. :1010.53 Max. :2627.82
##
## S_00OFF M_VAL00O REC TENURE
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. :12.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 9.00 1st Qu.:16.00
## Median : 0.00 Median : 0.00 Median :14.00 Median :21.00
## Mean : 11.47 Mean : 23.68 Mean :13.47 Mean :18.95
## 3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.:21.00 3rd Qu.:21.00
## Max. :561.16 Max. :1620.34 Max. :34.00 Max. :34.00
##
## S_IDX N_S_IDX Self Gift
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.6394 Mean :0.3578 Mean :0.5259 Mean :0.2581
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :42 NA's :42
## J_9711 J_9712 J_9801
## Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000000 Median :0.000000 Median :0.000000
## Mean :0.002865 Mean :0.007664 Mean :0.001399
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :1.000000 Max. :1.000000 Max. :1.000000
##
## J_9802 J_9803 J_9804
## Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000000 Median :0.000000 Median :0.000000
## Mean :0.001133 Mean :0.004398 Mean :0.004731
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :1.000000 Max. :1.000000 Max. :1.000000
##
## J_9805 J_9806 J_9807
## Min. :0.000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
## Median :0.000000 Median :0.000000 Median :0.000000
## Mean :0.002865 Mean :0.005265 Mean :0.006197
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
## Max. :1.000000 Max. :1.000000 Max. :1.000000
##
## J_9808 J_9809 J_9810 J_9811
## Min. :0.000000 Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.000000 Median :0.00000 Median :0.00000 Median :0.0000
## Mean :0.007997 Mean :0.01126 Mean :0.02739 Mean :0.1513
## 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.000000 Max. :1.00000 Max. :1.00000 Max. :1.0000
##
## J_9812 J_9901 J_9902 J_9903
## Min. :0.000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.278 Mean :0.03905 Mean :0.03705 Mean :0.05538
## 3rd Qu.:1.000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.000 Max. :1.00000 Max. :1.00000 Max. :1.00000
##
## J_9904 J_9905 J_9906 J_9907
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.04945 Mean :0.05971 Mean :0.06084 Mean :0.05918
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
##
## J_9908 J_9909 Random
## Min. :0.00000 Min. :0.00000 Min. :0.0000179
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.2533203
## Median :0.00000 Median :0.00000 Median :0.4993271
## Mean :0.05764 Mean :0.06931 Mean :0.5011430
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.7495196
## Max. :1.00000 Max. :1.00000 Max. :0.9999999
##
## Random.1 Fac.J_9909 Fac.J_9908 Fac.J_9907 Fac.J_9906
## Min. :0.0000256 0:13966 0:14141 0:14118 0:14093
## 1st Qu.:0.2498218 1: 1040 1: 865 1: 888 1: 913
## Median :0.4950599
## Mean :0.4975722
## 3rd Qu.:0.7444121
## Max. :0.9999947
##
## Fac.J_9905 Fac.J_9904 Fac.J_9903 Fac.J_9902 Fac.J_9901 Fac.J_9812
## 0:14110 0:14264 0:14175 0:14450 0:14420 0:10835
## 1: 896 1: 742 1: 831 1: 556 1: 586 1: 4171
##
##
##
##
##
## Fac.J_9811 Fac.J_9810 Fac.J_9809 Fac.J_9808 Fac.J_9807 Fac.J_9806
## 0:12736 0:14595 0:14837 0:14886 0:14913 0:14927
## 1: 2270 1: 411 1: 169 1: 120 1: 93 1: 79
##
##
##
##
##
## Fac.J_9805 Fac.J_9804 Fac.J_9803 Fac.J_9802 Fac.J_9801 Fac.J_9712
## 0:14963 0:14935 0:14940 0:14989 0:14985 0:14891
## 1: 43 1: 71 1: 66 1: 17 1: 21 1: 115
##
##
##
##
##
## Fac.J_9711 Fac.HOL_REP UnifSampler BinSampler
## 0:14963 0:10002 Min. :0.0000866 Min. :0.0000
## 1: 43 1: 5004 1st Qu.:0.1266897 1st Qu.:0.0000
## Median :0.2486838 Median :1.0000
## Mean :0.2490167 Mean :0.5023
## 3rd Qu.:0.3716938 3rd Qu.:1.0000
## Max. :0.4999680 Max. :1.0000
##
## Pred PredP
## Min. :33.19 Min. :0.2391
## 1st Qu.:42.38 1st Qu.:0.2906
## Median :52.70 Median :0.3499
## Mean :49.09 Mean :0.3345
## 3rd Qu.:54.45 3rd Qu.:0.3609
## Max. :60.14 Max. :0.3749
## NA's :42 NA's :42
The variable PredP holds the probability of a purchase for this data. As you can see, no one is all that likely to purchase, it ranges from 0.25 to 0.36.
Now, for proof of concept and, ultimately, to output the data into a spreadsheet, we will need Data then Active Data set then Export active data set... with a dialog box like this.
ExportText
write.table(TrainingSet.UnifBig, "C:/Users/rwalker/Downloads/TrainingSet.UnifBig.csv", sep=",", col.names=TRUE, row.names=FALSE, quote=TRUE, na="NA")
write.table(TrainingSet.UnifSmall, "C:/Users/rwalker/Downloads/TrainingSet.UnifSmall.csv", sep=",", col.names=TRUE, row.names=FALSE, quote=TRUE, na="NA")
We can import the .csv file back into Excel. Note that unifsmall, has no estimated models, it has two predictions from models estimated on unifbig.
ExcelOut
library(effects)
##
## Attaching package: 'effects'
## The following object is masked from 'package:car':
##
## Prestige
LinearModel.4 <- lm(HOL_MVAL ~ Gift + bs(TENURE , df=5) + bs(S_99HOL, df=2), data=TrainingSet.UnifBig)
## Warning in bs(S_99HOL, df = 2): 'df' was too small; have used 3
summary(LinearModel.4)
##
## Call:
## lm(formula = HOL_MVAL ~ Gift + bs(TENURE, df = 5) + bs(S_99HOL,
## df = 2), data = TrainingSet.UnifBig)
##
## Residuals:
## Min 1Q Median 3Q Max
## -321.72 -28.60 -21.09 -5.81 2634.41
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.9404 3.6090 5.802 6.67e-09 ***
## Gift -7.5025 2.1803 -3.441 0.000581 ***
## bs(TENURE, df = 5)1 -1.7125 7.5798 -0.226 0.821257
## bs(TENURE, df = 5)2 -0.1247 5.8273 -0.021 0.982928
## bs(TENURE, df = 5)3 18.0086 8.1527 2.209 0.027197 *
## bs(TENURE, df = 5)4 -20.9876 15.8206 -1.327 0.184661
## bs(TENURE, df = 5)5 -5.6099 13.0083 -0.431 0.666287
## bs(S_99HOL, df = 2)1 558.5534 16.5212 33.808 < 2e-16 ***
## bs(S_99HOL, df = 2)2 -442.0437 77.3594 -5.714 1.12e-08 ***
## bs(S_99HOL, df = 2)3 1639.3299 113.5589 14.436 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 116.4 on 14963 degrees of freedom
## (21 observations deleted due to missingness)
## Multiple R-squared: 0.143, Adjusted R-squared: 0.1425
## F-statistic: 277.4 on 9 and 14963 DF, p-value: < 2.2e-16
plot(allEffects(LinearModel.4))
## Warning in bs(S_99HOL, df = 2): 'df' was too small; have used 3
## Warning in bs(S_99HOL, df = 2): 'df' was too small; have used 3
## Warning in bs(S_99HOL, df = 2): 'df' was too small; have used 3