Determining Which Online Actions Result in Store Visits

Project Goal

The goal of this project is to determine how online actions influence the number of offline store visits. Two datasets are provided, “online data” and “offline data”, which record consumers' online and offline behavior for 2013 and for the first 11 weeks of 2014. A Poisson-style log-linear model (a linear regression on log(Store.Traffic + 1)) is fitted to describe the relationship between store traffic and online behavior, and the results are used to suggest business decisions.

library(rJava)
library(xlsxjars)
library(xlsx)
library(GGally)
library(ggplot2)
library(dplyr)
library(influence.ME)

Data Loading

Load the two consumer behavior datasets.
The online data is stored in the variable ‘onlinedata’.
The offline data is stored in the variable ‘offlinedata’.

setwd("E:/ResumeApplication/Teleflora")
onlinedata<-read.xlsx("Data Modeling Exercise.xlsx",sheetIndex=2,header=TRUE)
offlinedata<-read.xlsx("Data Modeling Exercise.xlsx",sheetIndex=3,header=TRUE)

Exploratory Data Analysis

The online data consists of 222 observations of 31 variables.
Online actions whose names start with “m” were performed on a smartphone.
Online actions whose names start with “pp” were performed on the product page.

str(onlinedata)

## 'data.frame':    222 obs. of  31 variables:
##  $ Week                     : num  1 1 1 1 1 1 2 2 2 2 ...
##  $ Year                     : num  2014 2014 2014 2014 2014 ...
##  $ Region                   : Factor w/ 6 levels "Bentonville",..: 5 2 3 4 1 6 5 2 3 4 ...
##  $ chat.form                : num  7 5 14 0 1 3 14 11 14 0 ...
##  $ contact.us               : num  1 3 4 0 1 1 0 1 1 0 ...
##  $ create.account           : num  2 3 3 0 2 0 1 0 3 0 ...
##  $ make.appt                : num  2 1 2 0 1 2 3 2 5 0 ...
##  $ mchat                    : num  3 1 6 1 2 2 5 3 17 0 ...
##  $ mfind.store.phone.call   : num  12 5 21 0 2 4 4 10 23 0 ...
##  $ mmain.phone.call         : num  24 16 55 7 17 10 29 12 57 2 ...
##  $ msend.appt.request.ty    : num  2 2 3 0 0 5 1 0 2 0 ...
##  $ calculator               : num  105 107 251 14 39 33 96 80 257 13 ...
##  $ find.store               : num  81 111 212 7 48 49 84 106 203 6 ...
##  $ get.directions           : num  8 11 9 0 4 3 11 14 28 0 ...
##  $ visit.store              : num  172 132 194 11 93 48 162 110 213 9 ...
##  $ mFinancing               : num  19 32 57 2 6 8 12 22 54 10 ...
##  $ mFind.a.Store            : num  101 92 283 12 50 82 90 104 261 8 ...
##  $ mMap                     : num  1 1 5 0 1 1 6 4 6 0 ...
##  $ Print.Coupon             : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ mVisit.Store             : num  36 46 93 4 25 33 35 22 82 2 ...
##  $ mBrowse.products         : num  1 2 6 0 2 0 2 0 6 0 ...
##  $ mBrowse.Product.Selection: num  1290 1017 2923 109 589 ...
##  $ mDesign.Style            : num  244 223 652 21 119 98 202 207 601 20 ...
##  $ mProduct.Type            : num  73 71 174 5 33 25 56 72 166 9 ...
##  $ mSequential.Step         : num  173 136 447 25 77 69 154 151 386 14 ...
##  $ ppBudget                 : num  130 119 245 11 44 31 114 108 256 15 ...
##  $ ppDesign.Style           : num  91 93 236 11 37 37 91 98 225 16 ...
##  $ ppEmail.a.Friend         : num  9 8 19 2 6 4 12 9 14 0 ...
##  $ ppProduct.Type           : num  1490 1231 2973 123 661 ...
##  $ ppSecond.Step            : num  344 296 615 40 169 124 327 342 640 58 ...
##  $ ppSocial.ShareProduct    : num  7 5 7 1 1 1 3 5 12 2 ...

The offline data consists of 282 observations of 4 variables: “Week”, “Year”, “Region” and “Store.Traffic”.

str(offlinedata)

## 'data.frame':    282 obs. of  4 variables:
##  $ Week         : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Year         : num  2013 2013 2013 2013 2013 ...
##  $ Region       : Factor w/ 6 levels "Bentonville",..: 5 2 3 4 1 6 5 2 3 4 ...
##  $ Store.Traffic: num  313 294 407 62 164 0 337 296 418 55 ...

head(offlinedata)

##   Week Year      Region Store.Traffic
## 1    1 2013  Sacramento           313
## 2    1 2013    Brooklyn           294
## 3    1 2013      Denver           407
## 4    1 2013     Phoenix            62
## 5    1 2013 Bentonville           164
## 6    1 2013   Tampa Bay             0

To merge the two datasets, first find the column names they share: “Week”, “Year” and “Region”.

intersect(names(onlinedata),names(offlinedata))

## [1] "Week"   "Year"   "Region"

The tables below count the records per “Year” and “Week” in the online data and the offline data respectively. Weeks 1 to 11 of 2013 have offline data but no online data. Since the aim is to determine which online actions are correlated with store visits, only weeks with both kinds of data are usable: week 27 of 2013 through week 11 of 2014.

 table(onlinedata$Year,onlinedata$Week)

##       
##        1 2 3 4 5 6 7 8 9 10 11 27 28 29 30 31 32 33 34 35 36 37 38 39 40
##   2013 0 0 0 0 0 0 0 0 0  0  0  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   2014 6 6 6 6 6 6 6 6 6  6  6  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##       
##        41 42 43 44 45 46 47 48 49 50 51 52
##   2013  6  6  6  6  6  6  6  6  6  6  6  6
##   2014  0  0  0  0  0  0  0  0  0  0  0  0

 table(offlinedata$Year,offlinedata$Week)

##       
##        1 2 3 4 5 6 7 8 9 10 11 27 28 29 30 31 32 33 34 35 36 37 38 39 40
##   2013 6 5 5 5 5 5 5 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   2014 6 6 6 6 6 6 6 6 6  6  6  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##       
##        41 42 43 44 45 46 47 48 49 50 51 52
##   2013  6  6  6  6  6  6  6  6  6  6  6  6
##   2014  0  0  0  0  0  0  0  0  0  0  0  0

Merge the online data and offline data on all common columns, keeping unmatched rows (all=TRUE).

mergedData<-merge(onlinedata,offlinedata,all=TRUE)
str(mergedData)

## 'data.frame':    282 obs. of  32 variables:
##  $ Week                     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Year                     : num  2013 2013 2013 2013 2013 ...
##  $ Region                   : Factor w/ 6 levels "Bentonville",..: 1 2 3 4 5 6 1 2 3 4 ...
##  $ chat.form                : num  NA NA NA NA NA NA 1 5 14 0 ...
##  $ contact.us               : num  NA NA NA NA NA NA 1 3 4 0 ...
##  $ create.account           : num  NA NA NA NA NA NA 2 3 3 0 ...
##  $ make.appt                : num  NA NA NA NA NA NA 1 1 2 0 ...
##  $ mchat                    : num  NA NA NA NA NA NA 2 1 6 1 ...
##  $ mfind.store.phone.call   : num  NA NA NA NA NA NA 2 5 21 0 ...
##  $ mmain.phone.call         : num  NA NA NA NA NA NA 17 16 55 7 ...
##  $ msend.appt.request.ty    : num  NA NA NA NA NA NA 0 2 3 0 ...
##  $ calculator               : num  NA NA NA NA NA NA 39 107 251 14 ...
##  $ find.store               : num  NA NA NA NA NA NA 48 111 212 7 ...
##  $ get.directions           : num  NA NA NA NA NA NA 4 11 9 0 ...
##  $ visit.store              : num  NA NA NA NA NA NA 93 132 194 11 ...
##  $ mFinancing               : num  NA NA NA NA NA NA 6 32 57 2 ...
##  $ mFind.a.Store            : num  NA NA NA NA NA NA 50 92 283 12 ...
##  $ mMap                     : num  NA NA NA NA NA NA 1 1 5 0 ...
##  $ Print.Coupon             : num  NA NA NA NA NA NA 0 0 0 0 ...
##  $ mVisit.Store             : num  NA NA NA NA NA NA 25 46 93 4 ...
##  $ mBrowse.products         : num  NA NA NA NA NA NA 2 2 6 0 ...
##  $ mBrowse.Product.Selection: num  NA NA NA NA NA ...
##  $ mDesign.Style            : num  NA NA NA NA NA NA 119 223 652 21 ...
##  $ mProduct.Type            : num  NA NA NA NA NA NA 33 71 174 5 ...
##  $ mSequential.Step         : num  NA NA NA NA NA NA 77 136 447 25 ...
##  $ ppBudget                 : num  NA NA NA NA NA NA 44 119 245 11 ...
##  $ ppDesign.Style           : num  NA NA NA NA NA NA 37 93 236 11 ...
##  $ ppEmail.a.Friend         : num  NA NA NA NA NA NA 6 8 19 2 ...
##  $ ppProduct.Type           : num  NA NA NA NA NA ...
##  $ ppSecond.Step            : num  NA NA NA NA NA NA 169 296 615 40 ...
##  $ ppSocial.ShareProduct    : num  NA NA NA NA NA NA 1 5 7 1 ...
##  $ Store.Traffic            : num  164 294 407 62 313 0 133 296 418 55 ...

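Remove the rows with missing values and store the result in ‘mergedData2’. merge() sorts its result by “Week”, then “Year”, so rows 127 to 282 are exactly weeks 27 to 52 of 2013. Note that this slice also drops weeks 1 to 11 of 2014 (they sit among the first 126 rows, interleaved with the 2013 rows), even though online data exist for them; as a result “Year” is constant in ‘mergedData2’.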

which(mergedData$Week==27)

## [1] 127 128 129 130 131 132

mergedData2<-mergedData[127:282,]
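
The row range above depends on how merge() happens to order the rows. As a hedged alternative, the sketch below selects rows by value with complete.cases(). The data frames `online_toy` and `offline_toy` and their values are made up for illustration and are not the project data. Applied to the real data, this approach would also keep the 2014 weeks that have online records, so the results would differ from those reported here.

```r
# Toy stand-ins for the two datasets (hypothetical values).
online_toy <- data.frame(
  Week   = c(27, 27, 1, 1),
  Year   = c(2013, 2013, 2014, 2014),
  Region = c("Denver", "Phoenix", "Denver", "Phoenix"),
  find.store = c(10, 2, 12, 3)
)
offline_toy <- data.frame(
  Week   = c(1, 1, 27, 27, 1, 1),
  Year   = c(2013, 2013, 2013, 2013, 2014, 2014),
  Region = rep(c("Denver", "Phoenix"), 3),
  Store.Traffic = c(400, 60, 520, 74, 410, 55)
)

# Full outer join on the shared key columns, as in the report.
merged_toy <- merge(online_toy, offline_toy,
                    by = c("Week", "Year", "Region"), all = TRUE)

# Keep only rows with no missing values, independent of row order.
complete_toy <- merged_toy[complete.cases(merged_toy), ]
print(complete_toy)
```

Selecting by value avoids hard-coded row numbers, which silently change meaning if the input files gain or lose rows.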

To use ‘Region’ as a predictor, convert the factor into its numeric level codes, which follow the alphabetical order of the region names:
Bentonville=1
Brooklyn=2
Denver=3
Phoenix=4
Sacramento=5
Tampa Bay=6
This treats ‘Region’ as an ordered numeric quantity in the models below.
The merged data consists of 156 observations of 32 variables. The goal is to build a model with “Store.Traffic” as the outcome and the other 31 variables as candidate predictors.

mergedData2$Region<-as.numeric(mergedData2$Region)
str(mergedData2)

## 'data.frame':    156 obs. of  32 variables:
##  $ Week                     : num  27 27 27 27 27 27 28 28 28 28 ...
##  $ Year                     : num  2013 2013 2013 2013 2013 ...
##  $ Region                   : num  1 2 3 4 5 6 1 2 3 4 ...
##  $ chat.form                : num  4 3 7 0 2 0 2 4 10 2 ...
##  $ contact.us               : num  1 1 1 0 0 1 1 1 5 0 ...
##  $ create.account           : num  1 1 1 0 0 0 0 2 3 0 ...
##  $ make.appt                : num  0 0 0 0 0 0 0 0 2 0 ...
##  $ mchat                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mfind.store.phone.call   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mmain.phone.call         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ msend.appt.request.ty    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ calculator               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ find.store               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ get.directions           : num  3 7 16 0 4 2 5 6 15 2 ...
##  $ visit.store              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mFinancing               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mFind.a.Store            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mMap                     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Print.Coupon             : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ mVisit.Store             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mBrowse.products         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mBrowse.Product.Selection: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mDesign.Style            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mProduct.Type            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mSequential.Step         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ ppBudget                 : num  4 20 48 5 19 4 21 74 146 13 ...
##  $ ppDesign.Style           : num  8 27 56 2 26 3 20 73 127 33 ...
##  $ ppEmail.a.Friend         : num  1 3 9 0 7 1 1 4 9 0 ...
##  $ ppProduct.Type           : num  313 832 1658 97 724 ...
##  $ ppSecond.Step            : num  54 140 281 23 148 34 118 268 608 76 ...
##  $ ppSocial.ShareProduct    : num  0 3 6 0 5 0 2 1 5 1 ...
##  $ Store.Traffic            : num  207 319 521 74 293 96 172 257 514 77 ...
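
A small sketch of what the factor-to-numeric conversion does, and of one alternative (keeping ‘Region’ as a factor so that a model would expand it into indicator columns). This is an illustration only; the vector `region` is built here from the six region names, and the models in this report use the numeric codes.

```r
# The six region names from the report, in arbitrary order.
region <- factor(c("Sacramento", "Brooklyn", "Denver",
                   "Phoenix", "Bentonville", "Tampa Bay"))

# Level codes follow the alphabetical order of the names.
codes <- setNames(as.numeric(region), as.character(region))
print(codes)

# Keeping the factor instead yields an intercept plus one indicator
# column per non-reference level.
dummies <- model.matrix(~ region)
print(colnames(dummies))
```

With numeric codes the model estimates a single slope across regions 1 to 6; with indicator columns each region gets its own offset relative to the reference level.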

Explore Correlations Before Modeling

The correlations between Store.Traffic and the other variables are shown below. ppSecond.Step, ppProduct.Type and ppDesign.Style are the three variables most strongly correlated with the outcome.

# Pearson correlation of every variable with Store.Traffic, largest first
cors<-sort(cor(mergedData2,method="pearson")["Store.Traffic",],decreasing=TRUE)
cors

##             Store.Traffic             ppSecond.Step 
##                1.00000000                0.93264116 
##            ppProduct.Type            ppDesign.Style 
##                0.93155491                0.90053394 
##                  ppBudget            get.directions 
##                0.88983663                0.85822139 
##                 chat.form          ppEmail.a.Friend 
##                0.84286271                0.82587836 
##          mSequential.Step mBrowse.Product.Selection 
##                0.81154264                0.81095333 
##             mDesign.Style             mProduct.Type 
##                0.80436577                0.79719714 
##             mFind.a.Store                find.store 
##                0.79676185                0.76484383 
##    mfind.store.phone.call                      mMap 
##                0.76417260                0.75989117 
##                mFinancing          mmain.phone.call 
##                0.75501863                0.74612196 
##              mVisit.Store     ppSocial.ShareProduct 
##                0.74263542                0.72786247 
##                calculator               visit.store 
##                0.71221332                0.65975371 
##     msend.appt.request.ty                 make.appt 
##                0.61212320                0.59558021 
##            create.account          mBrowse.products 
##                0.57806719                0.56731943 
##                     mchat                contact.us 
##                0.55419289                0.43839955 
##              Print.Coupon                      Week 
##                0.09536204                0.07381276 
##                    Region 
##               -0.19972890
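
“Year” does not appear in the output above, presumably because it is constant in ‘mergedData2’, which leaves its correlation undefined. The sketch below, on made-up data (`toy` and its columns are hypothetical), shows one way to drop constant columns explicitly before ranking correlations with the outcome.

```r
set.seed(42)  # toy data only
toy <- data.frame(
  Year   = rep(2013, 50),            # constant, like Year in mergedData2
  clicks = rpois(50, lambda = 100)
)
toy$Store.Traffic <- 3 * toy$clicks + rpois(50, lambda = 20)

# Drop columns with a single distinct value: their correlation is undefined.
varying <- vapply(toy, function(col) length(unique(col)) > 1, logical(1))
cors_toy <- cor(toy[, varying], method = "pearson")["Store.Traffic", ]

# Sort, leaving out the outcome's trivial correlation with itself.
cors_toy <- sort(cors_toy[names(cors_toy) != "Store.Traffic"],
                 decreasing = TRUE)
print(cors_toy)
```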

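Model Building

1. Poisson-style log-linear regression with ppSecond.Step as the predictor and Store.Traffic as the outcome

For count data with no upper bound, a Poisson model is often used to describe the relationship between predictors and outcome. Here it is approximated by an ordinary linear regression on log(Store.Traffic + 1); adding 1 keeps the logarithm defined when the traffic is zero.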
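
For comparison, the sketch below fits both a linear model on log(y + 1) and a Poisson regression proper with glm() to simulated counts. The data, the variable names and the true coefficients (4.45 and 0.003) are assumptions chosen for illustration; this is not the project data and not the model used in this report.

```r
set.seed(1)  # simulated data, not the project data
n <- 156
x <- runif(n, min = 0, max = 600)
y <- rpois(n, lambda = exp(4.45 + 0.003 * x))

# Log-linear approximation: ordinary least squares on the log scale.
fit_lm <- lm(log(y + 1) ~ x)

# Poisson regression proper, with the canonical log link.
fit_glm <- glm(y ~ x, family = poisson(link = "log"))

print(rbind(lm_log = coef(fit_lm), poisson_glm = coef(fit_glm)))
```

When counts are large, the two fits tend to give similar slopes; they can diverge when counts are small or the variance is far from the Poisson assumption.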

lm1<-lm(I(log(Store.Traffic+1))~ppSecond.Step,data=mergedData2)
summary(lm1)

## 
## Call:
## lm(formula = I(log(Store.Traffic + 1)) ~ ppSecond.Step, data = mergedData2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77368 -0.27688  0.03827  0.24399  0.94607 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.4498699  0.0452306   98.38   <2e-16 ***
## ppSecond.Step 0.0030667  0.0001353   22.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3353 on 154 degrees of freedom
## Multiple R-squared:  0.7693, Adjusted R-squared:  0.7678 
## F-statistic: 513.5 on 1 and 154 DF,  p-value: < 2.2e-16

As shown above, the predictor “ppSecond.Step” is significant at the 0.001 level.
The residual standard error is 0.3353.
The adjusted R-squared is 0.7678.

round(exp(coef(lm1)),5)

##   (Intercept) ppSecond.Step 
##      85.61580       1.00307

The exponentiated coefficients of the model are shown above.
85.61580 is the estimated geometric mean store traffic (strictly, of Store.Traffic + 1) when ‘ppSecond.Step’ is 0.
Each one-unit increase in ‘ppSecond.Step’ is associated with a 0.307% increase in store traffic.
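
The interpretation above can be reproduced by hand from the estimates printed in summary(lm1). The variable names below are illustrative; the last two lines additionally show, as an example not taken from the report, how the effect compounds over 100 extra actions.

```r
# Estimates copied from summary(lm1).
b0 <- 4.4498699   # intercept
b1 <- 0.0030667   # slope for ppSecond.Step

baseline   <- exp(b0)              # geometric mean of Store.Traffic + 1 at 0
pct_change <- (exp(b1) - 1) * 100  # percent change per extra ppSecond.Step

print(round(c(baseline = baseline, pct_change = pct_change), 3))

# The effect of 100 extra actions compounds multiplicatively.
pct_change_100 <- (exp(100 * b1) - 1) * 100
print(round(pct_change_100, 1))
```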

plot(lm1,which=1)

The Residuals vs Fitted plot of the model is produced by the call above. For an ideal model, the residuals are scattered randomly around zero, with a mean of 0 across the whole range of fitted values.
In this plot, the residuals are mostly negative when the fitted value is small, positive when the fitted value is in the middle and negative when the fitted value is large. The mean residual changes with the fitted value, so some structure remains unexplained.
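
The negative, positive, negative pattern described above can also be checked numerically by averaging residuals within bins of the fitted values. The sketch below does this on deterministic toy data with built-in curvature; the data and the three-bin split are assumptions for illustration.

```r
# Deterministic toy data with curvature a straight line cannot capture.
x <- seq(0, 1, length.out = 90)
y <- 2 * x - 4 * (x - 0.5)^2

fit <- lm(y ~ x)

# Split the fitted values into three equal-width bins and
# average the residuals within each bin.
bins <- cut(fitted(fit), breaks = 3, labels = c("low", "mid", "high"))
mean_resid <- tapply(residuals(fit), bins, mean)
print(round(mean_resid, 3))
```

Bin means that stay close to zero suggest the mean structure is adequate; a systematic sign pattern like the one here suggests a missing term.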

2. Poisson-style log-linear regression with the full model

Take “Store.Traffic” as the outcome and all other variables as predictors.

lm2<-lm(I(log(Store.Traffic+1))~.,data=mergedData2)
summary(lm2)

## 
## Call:
## lm(formula = I(log(Store.Traffic + 1)) ~ ., data = mergedData2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57856 -0.17813  0.00953  0.15930  0.62791 
## 
## Coefficients: (1 not defined because of singularities)
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                4.7450314  0.2006506  23.648  < 2e-16 ***
## Week                      -0.0023425  0.0047326  -0.495  0.62149    
## Year                              NA         NA      NA       NA    
## Region                    -0.0744006  0.0154158  -4.826 3.98e-06 ***
## chat.form                 -0.0042367  0.0111490  -0.380  0.70459    
## contact.us                -0.0390670  0.0129983  -3.006  0.00320 ** 
## create.account             0.0066206  0.0203568   0.325  0.74555    
## make.appt                 -0.0122821  0.0221770  -0.554  0.58069    
## mchat                     -0.0009696  0.0152434  -0.064  0.94939    
## mfind.store.phone.call    -0.0050440  0.0103714  -0.486  0.62758    
## mmain.phone.call          -0.0094214  0.0055857  -1.687  0.09416 .  
## msend.appt.request.ty     -0.0333352  0.0212407  -1.569  0.11908    
## calculator                -0.0024568  0.0014008  -1.754  0.08191 .  
## find.store                -0.0035532  0.0015435  -2.302  0.02299 *  
## get.directions             0.0162739  0.0084554   1.925  0.05654 .  
## visit.store                0.0031461  0.0010113   3.111  0.00231 ** 
## mFinancing                 0.0040125  0.0045709   0.878  0.38171    
## mFind.a.Store              0.0044726  0.0018634   2.400  0.01786 *  
## mMap                       0.0002982  0.0154004   0.019  0.98458    
## Print.Coupon              -0.0307311  0.0563032  -0.546  0.58617    
## mVisit.Store              -0.0026630  0.0028820  -0.924  0.35725    
## mBrowse.products           0.0308502  0.0386870   0.797  0.42671    
## mBrowse.Product.Selection  0.0007498  0.0004013   1.869  0.06401 .  
## mDesign.Style             -0.0028788  0.0021333  -1.349  0.17963    
## mProduct.Type             -0.0041307  0.0038377  -1.076  0.28384    
## mSequential.Step          -0.0005790  0.0024279  -0.238  0.81192    
## ppBudget                   0.0001179  0.0021444   0.055  0.95625    
## ppDesign.Style            -0.0004033  0.0025634  -0.157  0.87524    
## ppEmail.a.Friend          -0.0006761  0.0096192  -0.070  0.94408    
## ppProduct.Type             0.0009074  0.0002499   3.631  0.00041 ***
## ppSecond.Step             -0.0001579  0.0009570  -0.165  0.86922    
## ppSocial.ShareProduct      0.0137292  0.0162220   0.846  0.39898    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2693 on 125 degrees of freedom
## Multiple R-squared:  0.8792, Adjusted R-squared:  0.8502 
## F-statistic: 30.33 on 30 and 125 DF,  p-value: < 2.2e-16

As shown above, not all variables are significant, and “Year” has no estimate because it is constant in ‘mergedData2’. The variables that are significant at the 0.05 level are listed below.
“Region”
“contact.us”
“find.store”
“visit.store”
“mFind.a.Store”
“ppProduct.Type”
The residual standard error is 0.2693.
The adjusted R-squared is 0.8502.

plot(lm2,which=1)

The Residuals vs Fitted plot is improved: the mean residual is close to 0 across the range of fitted values.

3. Poisson-style log-linear regression with a modified model

Use the variables that are significant in the full model as the predictors of the modified model, with “Store.Traffic” as the outcome.

lm3<-lm(I(log(Store.Traffic+1))~Region+contact.us+find.store+visit.store+mFind.a.Store+ppProduct.Type,data=mergedData2)
summary(lm3)

## 
## Call:
## lm(formula = I(log(Store.Traffic + 1)) ~ Region + contact.us + 
##     find.store + visit.store + mFind.a.Store + ppProduct.Type, 
##     data = mergedData2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6972 -0.1860  0.0410  0.1941  0.7163 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     4.662e+00  6.785e-02  68.707  < 2e-16 ***
## Region         -5.857e-02  1.420e-02  -4.126 6.13e-05 ***
## contact.us     -3.064e-02  1.085e-02  -2.824  0.00539 ** 
## find.store     -7.261e-03  1.174e-03  -6.188 5.62e-09 ***
## visit.store     3.313e-03  7.956e-04   4.164 5.28e-05 ***
## mFind.a.Store   2.311e-03  8.190e-04   2.822  0.00542 ** 
## ppProduct.Type  8.263e-04  5.932e-05  13.930  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2928 on 149 degrees of freedom
## Multiple R-squared:  0.8297, Adjusted R-squared:  0.8229 
## F-statistic:   121 on 6 and 149 DF,  p-value: < 2.2e-16

As shown above, all variables are significant at the 0.01 level.
The residual standard error is 0.2928.
The adjusted R-squared is 0.8229.
Relative to the full model, the residual standard error increases and the adjusted R-squared decreases.

round(exp(coef(lm3)),5)

##    (Intercept)         Region     contact.us     find.store    visit.store 
##      105.84120        0.94311        0.96982        0.99277        1.00332 
##  mFind.a.Store ppProduct.Type 
##        1.00231        1.00083

As shown above, 105.84120 is the estimated geometric mean store traffic when all other variables are 0.
“Region”, “contact.us” and “find.store” are negatively associated with “Store.Traffic”.
“visit.store”, “mFind.a.Store” and “ppProduct.Type” are positively associated with “Store.Traffic”.

plot(lm3,which=1)

The Residuals vs Fitted plot shows no big change relative to the full model.

4. Select the model with stepwise search

To obtain an optimized model, the Bayesian Information Criterion (BIC) is used for model selection.
The model with the lowest BIC is preferred.
The search starts from the full model and combines backward elimination with forward selection (direction="both"); setting k=log(n) in step() replaces the default AIC penalty with the BIC penalty.
At each step, the BIC obtained by dropping each variable in the model or adding back each variable outside it is compared, the move that lowers BIC the most is taken, and the search stops when no move lowers BIC further.
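
A minimal sketch of this search on simulated data, where only two of four candidate predictors truly matter. The data frame `sim`, its columns and the true coefficients are assumptions for illustration; trace = 0 only suppresses the step-by-step log.

```r
set.seed(7)  # simulated data, not the project data
n <- 150
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# Only x1 and x2 truly affect the outcome; x3 and x4 are noise.
sim$y <- 1 + 0.8 * sim$x1 - 0.5 * sim$x2 + rnorm(n, sd = 0.5)

full <- lm(y ~ ., data = sim)

# k = log(n) gives the BIC penalty; trace = 0 suppresses the step log.
selected <- step(full, direction = "both", k = log(n), trace = 0)

print(formula(selected))
print(c(BIC_full = BIC(full), BIC_selected = BIC(selected)))
```

Because step() only accepts moves that lower the criterion, the selected model's BIC is never higher than that of the starting model.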

lm4<-step(lm2,direction="both",k=log(nrow(mergedData2)))

summary(lm4)

## 
## Call:
## lm(formula = I(log(Store.Traffic + 1)) ~ Region + contact.us + 
##     calculator + find.store + get.directions + visit.store + 
##     mFind.a.Store + mDesign.Style + ppProduct.Type, data = mergedData2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.63073 -0.18537  0.01773  0.17270  0.69194 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     4.623e+00  6.265e-02  73.799  < 2e-16 ***
## Region         -6.470e-02  1.312e-02  -4.932 2.19e-06 ***
## contact.us     -3.440e-02  1.004e-02  -3.425 0.000799 ***
## calculator     -2.463e-03  7.083e-04  -3.477 0.000668 ***
## find.store     -5.147e-03  1.210e-03  -4.255 3.72e-05 ***
## get.directions  1.972e-02  7.774e-03   2.537 0.012240 *  
## visit.store     3.378e-03  7.482e-04   4.515 1.30e-05 ***
## mFind.a.Store   3.611e-03  1.067e-03   3.385 0.000914 ***
## mDesign.Style  -1.183e-03  5.214e-04  -2.270 0.024670 *  
## ppProduct.Type  8.464e-04  8.482e-05   9.979  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2678 on 146 degrees of freedom
## Multiple R-squared:  0.8604, Adjusted R-squared:  0.8518 
## F-statistic:   100 on 9 and 146 DF,  p-value: < 2.2e-16

All variables in the stepwise model are significant at the 0.05 level. They are listed below.
“Region”
“contact.us”
“calculator”
“find.store”
“get.directions”
“visit.store”
“mFind.a.Store”
“mDesign.Style”
“ppProduct.Type”
The residual standard error is 0.2678.
The adjusted R-squared is 0.8518.
Relative to the modified model, the residual standard error decreases and the adjusted R-squared improves.

round(exp(coef(lm4)),5)

##    (Intercept)         Region     contact.us     calculator     find.store 
##      101.84680        0.93735        0.96618        0.99754        0.99487 
## get.directions    visit.store  mFind.a.Store  mDesign.Style ppProduct.Type 
##        1.01992        1.00338        1.00362        0.99882        1.00085

As shown above, 101.84680 is the estimated geometric mean store traffic when all other variables are 0.
“Region”, “contact.us”, “calculator”, “find.store” and “mDesign.Style” are negatively associated with the “Store Traffic”.
“get.directions”, “visit.store”, “mFind.a.Store” and “ppProduct.Type” are positively associated with the “Store Traffic”.

plot(lm4,which=1)

plot(lm4,which=3)

In the Residuals vs Fitted plot and the Scale-Location plot (standardized residuals), the points are scattered randomly, which suggests that the model describes the relationship well.
Points 131, 178 and 214 are flagged as outliers in these plots.

plot(lm4,which=4)

In the Cook’s distance plot, the three most influential points are 147, 141 and 261.

Find the influential points

influential<-cooks.distance(lm4)
head(sort(influential,decreasing=TRUE),3)

##        147        141        261 
## 0.56351625 0.10392203 0.09385565
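
As a complement to reading the three largest values, the sketch below flags points whose Cook's distance exceeds a cutoff. The cutoff 4/n is a common rule of thumb and an assumption here, not something used in this report; the data are a deterministic toy example with one planted aberrant point.

```r
# Deterministic toy data with one planted aberrant point (row 20).
x <- 1:20
y <- 2 * x + sin(x)
y[20] <- 100

fit <- lm(y ~ x)
cd <- cooks.distance(fit)

# Rule-of-thumb cutoff (an assumption, not used in the report).
cutoff <- 4 / length(cd)
flagged <- names(cd)[cd > cutoff]
print(flagged)
print(round(sort(cd, decreasing = TRUE)[1:3], 3))
```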

Remove the three outliers (131, 178, 214) and the most influential point (147), then fit the regression model again

mergedData<-mergedData[-c(131,178,214,147),]
which(mergedData$Week==27)

## [1] 127 128 129 130 131

mergedData3<-mergedData[127:278,]
mergedData3$Region<-as.numeric(mergedData3$Region)
lm5<-lm(I(log(Store.Traffic+1))~.,data=mergedData3)

lm6<-step(lm5,direction="both",k=log(nrow(mergedData3)))

summary(lm6)

## 
## Call:
## lm(formula = I(log(Store.Traffic + 1)) ~ Region + calculator + 
##     find.store + get.directions + visit.store + mFind.a.Store + 
##     mProduct.Type + ppProduct.Type, data = mergedData3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.52718 -0.18290  0.01793  0.15998  0.48443 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     4.633e+00  5.877e-02  78.825  < 2e-16 ***
## Region         -6.838e-02  1.213e-02  -5.640 8.80e-08 ***
## calculator     -2.267e-03  6.546e-04  -3.463 0.000705 ***
## find.store     -5.935e-03  1.185e-03  -5.007 1.61e-06 ***
## get.directions  2.252e-02  7.120e-03   3.163 0.001904 ** 
## visit.store     3.018e-03  6.960e-04   4.336 2.72e-05 ***
## mFind.a.Store   4.875e-03  1.080e-03   4.516 1.31e-05 ***
## mProduct.Type  -5.135e-03  1.647e-03  -3.118 0.002204 ** 
## ppProduct.Type  7.986e-04  7.523e-05  10.615  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2488 on 143 degrees of freedom
## Multiple R-squared:  0.8743, Adjusted R-squared:  0.8672 
## F-statistic: 124.3 on 8 and 143 DF,  p-value: < 2.2e-16

As shown above, all the variables in the model are significant at the 0.01 level. They are listed below.
“Region”
“calculator”
“find.store”
“get.directions”
“visit.store”
“mFind.a.Store”
“mProduct.Type”
“ppProduct.Type”
The residual standard error is 0.2488.
The adjusted R-squared is 0.8672.
Relative to the first stepwise model, both the residual standard error and the adjusted R-squared improve slightly.
This is the best model obtained.

round(exp(coef(lm6)),5)

##    (Intercept)         Region     calculator     find.store get.directions 
##      102.80374        0.93390        0.99774        0.99408        1.02278 
##    visit.store  mFind.a.Store  mProduct.Type ppProduct.Type 
##        1.00302        1.00489        0.99488        1.00080

As shown above, 102.80374 is the estimated geometric mean store traffic when all other variables are 0.
“Region”, “calculator”, “mProduct.Type” and “find.store” are negatively associated with the “Store Traffic”.
“get.directions”, “visit.store”, “mFind.a.Store” and “ppProduct.Type” are positively associated with the “Store Traffic”.

plot(lm6,which=1)

plot(lm6,which=3)

In the Residuals vs Fitted plot and the Scale-Location plot, the residuals are scattered randomly, which suggests that the model describes the relationship well.

plot(lm6,which=4)

The Cook’s distances of all points are below 0.12, so no single point influences the model greatly.

Conclusion

In the optimized model, 102.80374 is the estimated geometric mean store traffic when all other variables are 0.
“Region”, “calculator”, “mProduct.Type” and “find.store” are negatively associated with the “Store Traffic”. The influence of these factors should be minimized where possible.
“get.directions”, “visit.store”, “mFind.a.Store” and “ppProduct.Type” are positively associated with the “Store Traffic”. Improvements in these areas would be expected to contribute to higher store traffic.

Percentage change in Store Traffic associated with a one-unit increase in each factor

results<-data.frame(behaviors=c("Region","calculator","mProduct.Type","find.store","get.directions","visit.store","mFind.a.Store","ppProduct.Type"),percentage=c("-6.61%","-0.226%","-0.512%","-0.592%","2.278%","0.302%","0.489%","0.080%"))
results

##        behaviors percentage
## 1         Region     -6.61%
## 2     calculator    -0.226%
## 3  mProduct.Type    -0.512%
## 4     find.store    -0.592%
## 5 get.directions     2.278%
## 6    visit.store     0.302%
## 7  mFind.a.Store     0.489%
## 8 ppProduct.Type     0.080%
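
The percentages in this table follow from the coefficients as (exp(b) - 1) * 100. The sketch below recomputes them from the estimates printed in summary(lm6) rather than typing them by hand; the names `b` and `results_calc` are illustrative. Because it starts from the rounded estimates, the last digit can differ slightly from the table (for example for “Region”).

```r
# Estimates copied from summary(lm6).
b <- c(Region = -6.838e-02, calculator = -2.267e-03,
       mProduct.Type = -5.135e-03, find.store = -5.935e-03,
       get.directions = 2.252e-02, visit.store = 3.018e-03,
       mFind.a.Store = 4.875e-03, ppProduct.Type = 7.986e-04)

# Percent change in store traffic per one-unit increase in each predictor.
results_calc <- data.frame(
  behaviors  = names(b),
  percentage = sprintf("%.3f%%", (exp(b) - 1) * 100)
)
print(results_calc)
```

With the fitted model in memory, the vector `b` could presumably be taken from coef(lm6)[-1] instead of being copied.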