Task: Regression model for conversion rate


Develop a model for predicting the conversion rate of a customer.

Data Preprocessing

Using the gam library, we fit an additive model. Each component function fj can take a different form, such as a linear term, a polynomial, a step function, a degree-k spline, or a natural cubic spline. Additive models are a flexible, smooth extension of linear models: the response may depend linearly on some predictors Xj and nonlinearly on others, which lets the model capture nonlinear relationships between the response and the predictors.
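For reference, the additive model described above is usually written as follows. The symbols n (number of observations), p (number of predictors), and the error term are standard notation assumed here, not taken from the original text.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \varepsilon_i, \qquad i = 1, \dots, n
```

Setting f_j(x) = \beta_j x for every j recovers the ordinary linear model, so a linear term is simply one possible choice for each f_j.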

##     ad_id xyz_campaign_id fb_campaign_id   age gender interest Impressions
## 1  708746             916         103916 30-34      M       15        7350
## 2  708749             916         103917 30-34      M       16       17861
## 3  708771             916         103920 30-34      M       20         693
## 4  708815             916         103928 30-34      M       28        4259
## 5  708818             916         103928 30-34      M       28        4133
## 6  708820             916         103929 30-34      M       29        1915
## 7  708889             916         103940 30-34      M       15       15615
## 8  708895             916         103941 30-34      M       16       10951
## 9  708953             916         103951 30-34      M       27        2355
## 10 708958             916         103952 30-34      M       28        9502
## 11 708979             916         103955 30-34      M       31        1224
## 12 709023             916         103962 30-34      M        7         735
## 13 709038             916         103965 30-34      M       16        5117
## 14 709040             916         103965 30-34      M       16        5120
## 15 709059             916         103968 30-34      M       20       14669
##    Clicks Spent Total_Conversion Approved_Conversion   InvoiceDate InvoiceNo
## 1       1  1.43                2                   1 12/01/10 8:26    536365
## 2       2  1.82                2                   0 12/01/10 8:26    536365
## 3       0  0.00                1                   0 12/01/10 8:26    536365
## 4       1  1.25                1                   0 12/01/10 8:26    536365
## 5       1  1.29                1                   1 12/01/10 8:26    536365
## 6       0  0.00                1                   1 12/01/10 8:26    536365
## 7       3  4.77                1                   0 12/01/10 8:26    536365
## 8       1  1.27                1                   1 12/01/10 8:28    536366
## 9       1  1.50                1                   0 12/01/10 8:28    536366
## 10      3  3.16                1                   0 12/01/10 8:34    536367
## 11      0  0.00                1                   0 12/01/10 8:34    536367
## 12      0  0.00                1                   0 12/01/10 8:34    536367
## 13      0  0.00                1                   0 12/01/10 8:34    536367
## 14      0  0.00                1                   0 12/01/10 8:34    536367
## 15      7 10.28                1                   1 12/01/10 8:34    536367
##      time hour
## 1  120108   08
## 2  120108   08
## 3  120108   08
## 4  120108   08
## 5  120108   08
## 6  120108   08
## 7  120108   08
## 8  120108   08
## 9  120108   08
## 10 120108   08
## 11 120108   08
## 12 120108   08
## 13 120108   08
## 14 120108   08
## 15 120108   08

Model tryouts:

## Warning: Computation failed in `stat_smooth()`:
## could not find function "s"

## Warning: Computation failed in `stat_smooth()`:
## could not find function "ns"

## Warning: Removed 1 rows containing missing values (geom_smooth).
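The two `stat_smooth()` failures above indicate that `s()` and `ns()` were not visible when the plots were drawn. `ns()` lives in the splines package, and `s()` is exported by both gam and mgcv (ggplot2's `method = "gam"` relies on mgcv). The log further down shows splines and gam being loaded only at the next step, so a likely fix is to attach the package before plotting. The sketch below illustrates this for the `ns()` case only. It uses a synthetic data frame named `demo`, which is a hypothetical stand-in and not the original data set.

```r
library(ggplot2)
library(splines)  # provides ns(); attached before the plot is built

# Synthetic stand-in for the ad data (hypothetical values)
set.seed(1)
demo <- data.frame(interest = sample(2:66, 200, replace = TRUE))
demo$Approved_Conversion <- rpois(200, lambda = 1 + demo$interest / 40)

p <- ggplot(demo, aes(x = interest, y = Approved_Conversion)) +
  geom_point(alpha = 0.4) +
  # Natural cubic spline fitted through lm(); ns() resolves because splines is attached
  geom_smooth(method = "lm", formula = y ~ ns(x, df = 4), se = TRUE)

print(p)
```

The choice of `df = 4` is arbitrary here. The original plotting code is not shown, so its settings may differ.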

Additive model
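
The code chunk that produced the output below is not shown in the report. A minimal sketch of an equivalent call, reconstructed from the `Call:` line in that output, might look like this. The data frame is synthetic: the column names are copied from the report, but every value is made up for illustration, so the numbers it produces will not match the report.

```r
library(gam)  # the package reported as "Loaded gam 1.16.1"

# Synthetic stand-in data (hypothetical; only the column names come from the report)
set.seed(42)
n <- 300
data <- data.frame(
  interest         = sample(2:66, n, replace = TRUE),
  Impressions      = round(runif(n, 100, 50000)),
  gender_fac       = factor(sample(0:1, n, replace = TRUE)),
  age              = factor(sample(c("30-34", "35-39", "40-44", "45-49"), n, replace = TRUE)),
  Clicks           = sample(1:400, n, replace = TRUE),
  Total_Conversion = rpois(n, 2) + 1,
  hour             = factor(sample(sprintf("%02d", 8:13), n, replace = TRUE))
)
data$Spent <- round(data$Clicks * runif(n, 1.2, 1.8), 2)
data$Approved_Conversion <- rbinom(n, size = data$Total_Conversion, prob = 0.4)

# Smoothing spline with 6 degrees of freedom on interest, a step function on Clicks,
# and linear or factor terms for everything else (gaussian family by default)
model <- gam(
  Approved_Conversion ~ s(interest, 6) + Impressions + gender_fac + age + Spent +
    cut(Clicks, breaks = c(0, 100, 200, 300, Inf)) + Total_Conversion + hour,
  data = data
)

summary(model)  # ANOVA tables for parametric and nonparametric effects
coef(model)     # coefficients of the parametric part
```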

## Loading required package: splines
## Loading required package: foreach
## Loaded gam 1.16.1
## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts argument
## ignored
## Call:
## gam(formula = Approved_Conversion ~ +s(interest, 6) + Impressions + 
##     gender_fac + age + Spent + cut(Clicks, breaks = c(0, 100, 
##     200, 300, Inf)) + Total_Conversion + hour, data = data)
## 
## Degrees of Freedom: 935 total; 914.0001 Residual
## 207 observations deleted due to missingness 
## Residual Deviance: 769.4782
## 
## Call: gam(formula = Approved_Conversion ~ +s(interest, 6) + Impressions + 
##     gender_fac + age + Spent + cut(Clicks, breaks = c(0, 100, 
##     200, 300, Inf)) + Total_Conversion + hour, data = data)
## Deviance Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4632 -0.4453 -0.1774  0.5367  5.8579 
## 
## (Dispersion Parameter for gaussian family taken to be 0.8419)
## 
##     Null Deviance: 3306.204 on 935 degrees of freedom
## Residual Deviance: 769.4782 on 914.0001 degrees of freedom
## AIC: 2518.888 
## 207 observations deleted due to missingness 
## 
## Number of Local Scoring Iterations: 2 
## 
## Anova for Parametric Effects
##                                                 Df  Sum Sq Mean Sq   F value
## s(interest, 6)                                   1    5.89    5.89    6.9949
## Impressions                                      1 1557.54 1557.54 1850.0705
## gender_fac                                       1   17.57   17.57   20.8735
## age                                              3   83.27   27.76   32.9716
## Spent                                            1  222.14  222.14  263.8658
## cut(Clicks, breaks = c(0, 100, 200, 300, Inf))   3   12.68    4.23    5.0188
## Total_Conversion                                 1  668.91  668.91  794.5486
## hour                                             5    2.34    0.47    0.5552
## Residuals                                      914  769.48    0.84          
##                                                   Pr(>F)    
## s(interest, 6)                                  0.008314 ** 
## Impressions                                    < 2.2e-16 ***
## gender_fac                                     5.579e-06 ***
## age                                            < 2.2e-16 ***
## Spent                                          < 2.2e-16 ***
## cut(Clicks, breaks = c(0, 100, 200, 300, Inf))  0.001867 ** 
## Total_Conversion                               < 2.2e-16 ***
## hour                                            0.734422    
## Residuals                                                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Anova for Nonparametric Effects
##                                                Npar Df Npar F    Pr(F)   
## (Intercept)                                                              
## s(interest, 6)                                       5 3.1035 0.008738 **
## Impressions                                                              
## gender_fac                                                               
## age                                                                      
## Spent                                                                    
## cut(Clicks, breaks = c(0, 100, 200, 300, Inf))                           
## Total_Conversion                                                         
## hour                                                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##                                             (Intercept) 
##                                            2.004944e-01 
##                                          s(interest, 6) 
##                                           -4.674917e-03 
##                                             Impressions 
##                                            1.993109e-06 
##                                             gender_fac1 
##                                           -1.270960e-01 
##                                                age35-39 
##                                            5.373149e-02 
##                                                age40-44 
##                                            6.521062e-02 
##                                                age45-49 
##                                            9.766814e-02 
##                                                   Spent 
##                                           -8.346374e-03 
## cut(Clicks, breaks = c(0, 100, 200, 300, Inf))(100,200] 
##                                            1.630005e-01 
## cut(Clicks, breaks = c(0, 100, 200, 300, Inf))(200,300] 
##                                            5.518351e-01 
## cut(Clicks, breaks = c(0, 100, 200, 300, Inf))(300,Inf] 
##                                           -4.772973e-01 
##                                        Total_Conversion 
##                                            3.357723e-01 
##                                                  hour09 
##                                            4.647636e-02 
##                                                  hour10 
##                                           -1.344962e-01 
##                                                  hour11 
##                                           -5.363080e-02 
##                                                  hour12 
##                                            2.279031e-02 
##                                                  hour13 
##                                            1.079674e-01
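
The "207 observations deleted due to missingness" line is probably linked to the `cut()` term. By default `cut()` builds right-closed intervals, so with a lowest break of 0 the first bin is (0,100] and any row with `Clicks == 0` becomes `NA` and is dropped from the fit. The data preview does contain rows with zero clicks, which is consistent with this reading, although other columns may also contribute missing values. The sketch below uses a handful of illustrative click counts (not taken from the data) to show the default behaviour next to `include.lowest = TRUE`.

```r
clicks <- c(0, 1, 100, 101, 250, 400)  # illustrative values only

# Default: right-closed intervals (0,100], (100,200], ...; 0 falls outside every bin
default_bins <- cut(clicks, breaks = c(0, 100, 200, 300, Inf))

# include.lowest = TRUE closes the first interval on the left, giving [0,100]
closed_bins <- cut(clicks, breaks = c(0, 100, 200, 300, Inf), include.lowest = TRUE)

print(data.frame(clicks, default_bins, closed_bins))

sum(is.na(default_bins))  # number of rows a model would drop with the default bins
sum(is.na(closed_bins))   # number of rows dropped when the lowest break is included
```

Whether zero-click ads should be kept in the model is a modelling decision. The sketch only shows how they end up excluded.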

Effect of each predictor on approved conversions

## Warning in gplot.numeric(x = c(7350L, 17861L, 4259L, 4133L, 15615L, 10951L, :
## Residuals do not match x in "partial for Impressions" preplot object

## Warning in gplot.numeric(x = c(1.42999995, 1.82000002, 1.25, 1.28999996, :
## Residuals do not match x in "partial for Spent" preplot object
## Warning in gplot.numeric(x = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
## Residuals do not match x in "partial for Total_Conversion" preplot object

## Warning in gplot.default(x = c("08", "08", "08", "08", "08", "08", "08", : The
## "x" component of "partial for hour" has class "character"; no gplot() methods
## available
## [1] 769.4782
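
The last warning above says that `hour` is stored as a character vector, so no partial-effect plot can be drawn for it. The bare number printed at the end matches the residual deviance reported in the model summary. A minimal sketch of the plotting step, assuming `hour` is converted to a factor before fitting, is given below. The data are synthetic and the model is reduced to three terms to keep the example short, so neither reflects the original fit.

```r
library(gam)

# Small synthetic example (hypothetical data; column names follow the report)
set.seed(7)
n <- 200
data <- data.frame(
  interest    = sample(2:66, n, replace = TRUE),
  Impressions = round(runif(n, 100, 50000)),
  hour        = sample(sprintf("%02d", 8:13), n, replace = TRUE),  # character, as in the report
  stringsAsFactors = FALSE
)
data$Approved_Conversion <- rpois(n, lambda = 0.5 + data$Impressions / 25000)

# Convert the character column to a factor so plot() has a method for its partial effect
data$hour <- factor(data$hour)

model <- gam(Approved_Conversion ~ s(interest, 6) + Impressions + hour, data = data)

# One panel per term, with pointwise standard-error bands
par(mfrow = c(1, 3))
plot(model, se = TRUE)

deviance(model)  # residual deviance of the fitted model
```

The "Residuals do not match x" warnings probably arise because partial residuals were requested while some rows had been dropped for missingness. Fitting on complete rows only should avoid the length mismatch.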

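Generalized additive models are an effective way to extend linear models: smooth, flexible nonlinear functions are fitted to selected predictors to capture nonlinear relationships in the data. Because each predictor contributes through its own additive term, the fitted model stays interpretable. In this fit, the nonparametric test for s(interest, 6) (p = 0.0087) points to a nonlinear effect of interest, while hour shows no significant effect (p = 0.73).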