Your client is a luxury hotel chain that owns hotels in exotic tourist destinations across the world, including Central America, South America, Asia, and Southern Europe. The chain uses banner advertising on various sites to drive traffic to its website. You are hired as a consultant to determine the relative effectiveness (in terms of driving traffic to the chain’s website) of different types of banner ads, the characteristics of the hosting websites, and the time of day the banner ads were shown.
Recently, a test campaign was conducted in which banners with different content were featured on multiple websites. A randomized design was used, so that no systematic strategy determined the content/website/time combinations. For each hour, the number of click-throughs to the client’s website was recorded, along with the characteristics of the originating website.
Your task is to improve the ROI of banner ads by selecting more effective websites and better times to show the banners. You will analyze the click-through data to identify the characteristics of websites on which banner ads earn higher click-throughs, as well as the most effective time of day and day of the week.
Click_Through: “number of click-throughs during a one-hour period”
Hour: “Hour in 24-hour format”
Day: “Day of the week (1 = Monday through 7 = Sunday)”
English: “1 if the website’s default language is English, 0 otherwise”
Travel: “1 if the website is travel related, 0 otherwise”
Shopping: “1 if the website is shopping related, 0 otherwise”
Search: “1 if the website is a search engine, 0 otherwise”
Promo: “1 if the promotion is mentioned in the ad, 0 otherwise”
Flash: “1 if the promotion consists of a flash animation, 0 otherwise”
Size: “1: Button (120 x 90), 2: Half Banner (234 x 60), 3: Skyscraper (160 x 600), 4: Square (336 x 280), 5: Full (468 x 60)”
Popup: “1 if the ad is shown as a popup, 0 otherwise”
load("BannerAdvertising.RData")
ads = X
knitr::kable(head(ads, 10))
| Click_Through | Hour | Day | English | Travel | Shopping | Search | Promo | Flash | Size | Popup |
|---|---|---|---|---|---|---|---|---|---|---|
| 270236 | 21 | 2 | 0 | 1 | 0 | 0 | 1 | 0 | 4 | 1 |
| 52 | 21 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 2 | 1 |
| 77 | 15 | 5 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 61 | 14 | 4 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 1 |
| 97 | 17 | 3 | 0 | 0 | 1 | 0 | 1 | 1 | 5 | 0 |
| 80698 | 23 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 1 |
| 13 | 21 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 2 | 0 |
| 27 | 3 | 5 | 0 | 0 | 0 | 1 | 1 | 0 | 2 | 1 |
| 80539 | 10 | 4 | 1 | 0 | 1 | 0 | 1 | 0 | 3 | 0 |
| 256687 | 13 | 6 | 1 | 0 | 1 | 0 | 1 | 1 | 4 | 0 |
As mentioned above, the “Click-Through” (CT) variable represents the number of click-throughs from a given banner ad to our client’s website per hour. In order to develop an initial hypothesis regarding the optimal “content/website/time” combination, I’ve extracted the outliers with respect to CT and compared the summary statistics of high-performing banner ads (outliers) to those of “normal” banner ads (non-outliers). Noteworthy findings are discussed below.
Click-through statistics for the original dataset:
- Mean: 73,288.31
- Standard deviation: 122,885.90
- Mode: 14
Outliers constitute 6.66% of the original 10,000 observations, and these anomalous observations range from banner ads that generated approximately 295K click-throughs to those that generated 942K.
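For reproducibility, here is a minimal sketch of the extraction; it assumes the conventional 1.5 × IQR (upper Tukey fence) rule, since the exact cutoff is not recorded above.
#Sketch of the outlier split (assumes the Tukey upper fence; the actual
#cutoff behind the 6.66% figure may differ)
ct <- ads$Click_Through
cutoff <- quantile(ct, 0.75) + 1.5 * IQR(ct)
outliers <- ads[ct > cutoff, ] #high-performing banner ads
non_outliers <- ads[ct <= cutoff, ]
mean(ct > cutoff) #share of observations flagged as outliers
range(outliers$Click_Through) #range of click-throughs among outliers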
English
High-performing banner ads were more likely to be featured on websites where English is the default language.
Travel
Likewise, high-performing banner ads were very likely to be found on travel websites. Given the difference between the two groups, we can expect the coefficient estimate for the Travel variable to be significant and large in magnitude relative to the other website-characteristic variables (“Shopping”, “Search”, and “English”).
Shopping
High-performing banner ads were less likely to be found on websites that are shopping related.
Search
There was not much of a difference between the two groups with respect to whether or not the banner ad was featured on a search engine. Thus, I’d expect this variable to have an insignificant effect on the outcome variable (the number of click-throughs per hour). For this reason, I’ve omitted the visualization.
Hour
In inspecting the graph corresponding to the “hour” variable, we can see that high-performing banner ads experience the most click-throughs to the client’s website during hours 12-16 and hours 21-23.
Day
Likewise, high-performing banner ads were more likely to generate click-throughs on weekends than on weekdays.
Try two different transformations for the hourly and daily data. Interpret the coefficients and compare results for the two transformation specifications.
Under this model we seek the parameters (the coefficients and the dispersion) that maximize the log-likelihood.
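Concretely, matching the `loglik` line in the code below, the exponential-family log-likelihood maximized for the normal model is

$$\ell(\beta,\phi) = \sum_{i=1}^{n}\left[\frac{y_i\,\theta_i - b(\theta_i)}{\phi} + c(y_i,\phi)\right], \qquad b(\theta) = \frac{\theta^{2}}{2}, \qquad c(y,\phi) = -\frac{y^{2}}{2\phi} - \frac{1}{2}\log(2\pi\phi),$$

with the identity link, so $\theta_i = \mu_i = \mathbf{x}_i^{\top}\beta$ and the dispersion $\phi$ plays the role of $\sigma^{2}$.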
Interpretation of Results
It’s important to note that in transforming these variables to dummy variables, the “base case” in each category has been omitted. Specifically, I’ve omitted Sunday (day 7) from the day variable, hour 23 from the hour variable, and size 5 from the banner size variable. These omissions have important implications for interpreting the resulting coefficient estimates: each estimate must be read relative to the omitted base case. For example, the coefficient estimates for days 1-6 (Monday through Saturday) give the change in the dependent variable relative to the base case (Sunday). All of the coefficient estimates corresponding to days 1-5 (Monday through Friday) are negative, meaning banner ads displayed on these days experience fewer click-throughs per hour than banner ads displayed on Sunday. The coefficient estimate corresponding to Saturday is positive, indicating that banner ads experience more click-throughs on Saturday than on Sunday. This does not align with my EDA findings (see the “Day of Week” graph corresponding to outliers), although Saturday is close in value to Sunday there, so it makes sense that Saturday is the only day of the week without a statistically significant coefficient estimate.
In looking at the coefficient estimates for the hour variable (hour 0 = 12AM, hour 22 = 10PM), I’ve omitted the dummy corresponding to hour 23 (11PM). Most of the coefficient estimates are negative, indicating that the number of click-throughs generated in those hours is lower than in the 23rd hour (11PM). To get more specific, I cross-referenced these coefficient estimates with the “Hour of Day” graph created for high-performing banner ads (outliers), and they line up quite nicely with the CT/hour relationship revealed in that graph. As previously noted, the number of click-throughs first peaks around hours 12-16 (12PM-4PM). The coefficient estimates for hours 12 (12PM) and 16 (4PM) are negative and significant, while those for hours 13-15 (1PM-3PM) are negative but insignificant. This makes sense because hours 13-15 represent the highest part of this peak and are very close in value to the peak exhibited in the 23rd hour (11PM). Likewise, the second click-through peak happens during hours 21-23 (9PM-11PM). The coefficient estimate corresponding to 9PM is positive yet insignificant, which also makes sense: the 21st hour (9PM) has the highest number of click-throughs of any hour, but it is still close in value to the omitted base case, 11PM (hence the insignificance).
The coefficient estimates corresponding to website type/characteristics (English, Travel, Shopping, and Search variables) all align with the relationships uncovered in EDA. Namely, the “English” and “Travel” variables have significant, positive effects on the outcome variable as expected. Likewise, the “Shopping” variable has a significant, negative effect on the outcome variable, and the effect of the “Search” variable is insignificant as expected.
The coefficient estimates corresponding to banner ad type (Promo, Flash, Popup) also align with the relationships uncovered in EDA. The “Promo” variable has a significant, positive effect on the outcome variable and is considerably larger in magnitude than its “Flash” and “Popup” counterparts; “Flash” has a slight positive effect, and “Popup” a larger positive effect.
Lastly, the coefficient estimates corresponding to the banner size also align well with what I uncovered during EDA. Namely, banner sizes 3 and 4 have positive, significant effects on the outcome variable, and the estimates are much larger in magnitude relative to the estimates for sizes 1 and 2 (size 5 was the base case here).
#creating dependent variable
y <- ads[ , 1]
#head(y)
#Transformation 1 - Dummy variables
##################################################
ads$Monday = ifelse(ads$Day == 1, 1, 0)
ads$Tuesday = ifelse(ads$Day == 2, 1, 0)
ads$Wednesday = ifelse(ads$Day == 3, 1, 0)
ads$Thursday = ifelse(ads$Day == 4, 1, 0)
ads$Friday = ifelse(ads$Day == 5, 1, 0)
ads$Saturday = ifelse(ads$Day == 6, 1, 0)
#leave Sunday out
#dummies for hours 0-22 only; hour 23 (11PM) is the omitted base case
ads$AM12 = ifelse(ads$Hour == 0, 1, 0)
ads$AM1 = ifelse(ads$Hour == 1, 1, 0)
ads$AM2 = ifelse(ads$Hour == 2, 1, 0)
ads$AM3 = ifelse(ads$Hour == 3, 1, 0)
ads$AM4 = ifelse(ads$Hour == 4, 1, 0)
ads$AM5 = ifelse(ads$Hour == 5, 1, 0)
ads$AM6 = ifelse(ads$Hour == 6, 1, 0)
ads$AM7 = ifelse(ads$Hour == 7, 1, 0)
ads$AM8 = ifelse(ads$Hour == 8, 1, 0)
ads$AM9 = ifelse(ads$Hour == 9, 1, 0)
ads$AM10 = ifelse(ads$Hour == 10, 1, 0)
ads$AM11 = ifelse(ads$Hour == 11, 1, 0)
ads$PM12 = ifelse(ads$Hour == 12, 1, 0)
ads$PM1 = ifelse(ads$Hour == 13, 1, 0)
ads$PM2 = ifelse(ads$Hour == 14, 1, 0)
ads$PM3 = ifelse(ads$Hour == 15, 1, 0)
ads$PM4 = ifelse(ads$Hour == 16, 1, 0)
ads$PM5 = ifelse(ads$Hour == 17, 1, 0)
ads$PM6 = ifelse(ads$Hour == 18, 1, 0)
ads$PM7 = ifelse(ads$Hour == 19, 1, 0)
ads$PM8 = ifelse(ads$Hour == 20, 1, 0)
ads$PM9 = ifelse(ads$Hour == 21, 1, 0)
ads$PM10 = ifelse(ads$Hour == 22, 1, 0)
#ads$PM11 = ifelse(ads$Hour == 23, 1, 0)
#size - leave size 5 out
ads$Size1 = ifelse(ads$Size == 1, 1, 0)
ads$Size2 = ifelse(ads$Size == 2, 1, 0)
ads$Size3 = ifelse(ads$Size == 3, 1, 0)
ads$Size4 = ifelse(ads$Size == 4, 1, 0)
#matrix transformation for dummy independent variables
library(dplyr)
ads_subset = select(ads, -Size)
#head(ads_subset)
X_dummy = as.matrix(ads_subset[ , 4:dim(ads_subset)[2]]) #drop Click_Through and the raw Hour/Day columns (1-3)
#head(X_dummy)
# Add constant term to estimate an intercept
X_dummy = cbind(rep(1, dim(X_dummy)[1]), X_dummy)
#head(X_dummy)
#calculate dimensions of the data
n = dim(ads)[1]
k = dim(X_dummy)[2]
# transformations of X (3rd component of GLM; e.g., to compute dummy variables)
f_z <- function(X_dummy) X_dummy # identity for standard linear model
# exponential family (1st component of GLM; implemented for normal)
# note that these functions are used in the function negll to evaluate the likelihood:
# loglik <- (y * theta - f_b(theta))/phi + f_c(y, phi)
f_b = function(theta) theta ^ 2 / 2 # b( ) - function in exponential family ##change##
f_c = function(y, phi) - (y ^ 2) / (2 * phi) - log(2 * pi * phi) / 2 # c( ) - function in exponential family ##change##
f_theta = function(mu) mu # transformation of mu (E[y|X]) into theta (canonical parameter); this is the inverse of derivative of b( ) function ##change##
# response function (2nd component of GLM)
f_mu = function(eta) eta # response function = inverse of link function (implemented for normal) ##change##
# negative log likelihood (keep unchanged!)
negll = function(p, y, X_dummy) {
n = dim(X_dummy)[1]
k = dim(X_dummy)[2]
b = p[1:k]
if (NROW(p) == k + 1) {
phi = p[k + 1]
}
Z = f_z(X_dummy)
eta = X_dummy %*% b
mu = f_mu(eta)
theta = f_theta(mu)
loglik = (y * theta - f_b(theta))/phi + f_c(y, phi)
return(-sum(loglik))
}
# Initial values - implemented for normal
par0 = solve(t(X_dummy)%*%X_dummy, t(X_dummy)%*%y) # starting values for b coefficients ##change##
par0 = append(par0, sd(y)) # add starting value for phi to the vector ##change##
vec=names(as.data.frame(X_dummy))
vec2=append(vec, "ignore") #extra label for the dispersion parameter phi
# optimization - find the maximum likelihood values of the parameters and store in est.res
est.res = optim(par0, fn=negll, gr=NULL, y, X_dummy, method="BFGS", hessian=TRUE)
if (est.res$convergence>0) {
cat("WARNING: ALGORITHM DID NOT CONVERGE")
} else {
print("Estimated parameters:")
print(cbind(vec2,est.res$par))
}
## [1] "Estimated parameters:"
## vec2
## [1,] "V1" "28822.5586693859"
## [2,] "English" "16673.6693182018"
## [3,] "Travel" "55659.4509023035"
## [4,] "Shopping" "-19046.1843872111"
## [5,] "Search" "1161.98187670357"
## [6,] "Promo" "21699.0271966634"
## [7,] "Flash" "3465.97441272128"
## [8,] "Popup" "9621.36718248774"
## [9,] "Monday" "-70736.2306797505"
## [10,] "Tuesday" "-71784.3083280735"
## [11,] "Wednesday" "-70793.108836484"
## [12,] "Thursday" "-51314.3347658798"
## [13,] "Friday" "-26767.6074876174"
## [14,] "Saturday" "2124.02780869411"
## [15,] "AM12" "-46278.2698344163"
## [16,] "AM1" "-44606.6295997688"
## [17,] "AM2" "-44930.1321311355"
## [18,] "AM3" "-45896.978003602"
## [19,] "AM4" "-46396.1282440049"
## [20,] "AM5" "-47121.2302679496"
## [21,] "AM6" "-49497.7317251148"
## [22,] "AM7" "-50647.8087757924"
## [23,] "AM8" "-42434.7645398897"
## [24,] "AM9" "-42991.1007659385"
## [25,] "AM10" "-37724.0750630811"
## [26,] "AM11" "-43640.3875132811"
## [27,] "PM12" "-19120.9326138373"
## [28,] "PM1" "-5789.97210238509"
## [29,] "PM2" "-8349.38423750802"
## [30,] "PM3" "-6994.02208401432"
## [31,] "PM4" "-18601.928885164"
## [32,] "PM5" "-29974.4482939202"
## [33,] "PM6" "-36183.3816231474"
## [34,] "PM7" "-33217.6539563891"
## [35,] "PM8" "-31124.2283502867"
## [36,] "PM9" "3471.67237237188"
## [37,] "PM10" "-584.541066169871"
## [38,] "Size1" "-1860.00513151508"
## [39,] "Size2" "-4420.40168917384"
## [40,] "Size3" "169192.584069096"
## [41,] "Size4" "197219.699156592"
## [42,] "ignore" "4515804543.86488"
# compute estimates, p-values, predictions and residuals
est.res$se = diag(solve(est.res$hessian)) ^ .5 # standard errors
est.res$t = est.res$par / est.res$se # t-values
est.res$p = 1 - pt(abs(est.res$t), n-k-1) # p-values (one-sided; double these for the usual two-sided test)
est.res$mu = f_mu(f_z(X_dummy) %*% est.res$par[1:k]) # expected values (predictions)
est.res$resid = y - est.res$mu # residuals
est.res$dev = NA # not implemented yet
est.res$pearson = NA # not implemented yet
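#Optional helper (my own addition, not in the original script): collect the
#optimizer estimates into a readable table; the column names are illustrative.
coef_table = data.frame(estimate = est.res$par[1:k],
                        se = est.res$se[1:k],
                        t = est.res$t[1:k],
                        p = est.res$p[1:k],
                        row.names = vec)
round(head(coef_table, 8), 2)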
#regression diagnostics
plot(y, est.res$mu)
hist(est.res$resid, probability=TRUE)
lines(density(est.res$resid, bw="sj"))
#new_data = cbind(y, as.data.frame(X_dummy)[,2:dim(X_dummy)[2]])
new_data = cbind(y, as.data.frame(X_dummy))
#head(new_data)
glm_mod = glm(y ~.-1, data = new_data)
summary(glm_mod)
##
## Call:
## glm(formula = y ~ . - 1, data = new_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -139519 -46015 -11116 31529 607106
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## V1 28823.0 4341.8 6.638 3.33e-11 ***
## English 16673.5 1353.0 12.324 < 2e-16 ***
## Travel 55659.2 1351.7 41.179 < 2e-16 ***
## Shopping -19046.4 1353.1 -14.076 < 2e-16 ***
## Search 1161.9 1352.3 0.859 0.3902
## Promo 21699.4 1351.4 16.057 < 2e-16 ***
## Flash 3466.4 1352.7 2.563 0.0104 *
## Popup 9621.3 1351.8 7.117 1.18e-12 ***
## Monday -70736.2 2520.2 -28.068 < 2e-16 ***
## Tuesday -71784.3 2539.7 -28.265 < 2e-16 ***
## Wednesday -70793.1 2525.9 -28.027 < 2e-16 ***
## Thursday -51314.3 2525.5 -20.319 < 2e-16 ***
## Friday -26767.6 2538.4 -10.545 < 2e-16 ***
## Saturday 2124.0 2534.9 0.838 0.4021
## AM12 -46278.3 4781.7 -9.678 < 2e-16 ***
## AM1 -44606.6 4665.0 -9.562 < 2e-16 ***
## AM2 -44930.1 4547.8 -9.880 < 2e-16 ***
## AM3 -45897.0 4749.3 -9.664 < 2e-16 ***
## AM4 -46396.1 4705.5 -9.860 < 2e-16 ***
## AM5 -47121.2 4666.5 -10.098 < 2e-16 ***
## AM6 -49497.7 4710.4 -10.508 < 2e-16 ***
## AM7 -50647.8 4690.2 -10.799 < 2e-16 ***
## AM8 -42434.8 4673.1 -9.081 < 2e-16 ***
## AM9 -42991.1 4695.2 -9.156 < 2e-16 ***
## AM10 -37724.7 4666.1 -8.085 6.95e-16 ***
## AM11 -43641.1 4722.0 -9.242 < 2e-16 ***
## PM12 -19121.6 4692.1 -4.075 4.63e-05 ***
## PM1 -5789.3 4682.1 -1.236 0.2163
## PM2 -8349.4 4663.4 -1.790 0.0734 .
## PM3 -6994.0 4690.8 -1.491 0.1360
## PM4 -18602.6 4623.6 -4.023 5.78e-05 ***
## PM5 -29974.4 4662.5 -6.429 1.35e-10 ***
## PM6 -36183.4 4639.7 -7.799 6.88e-15 ***
## PM7 -33217.7 4686.7 -7.088 1.46e-12 ***
## PM8 -31123.6 4636.5 -6.713 2.01e-11 ***
## PM9 3471.8 4542.6 0.764 0.4447
## PM10 -584.5 4766.2 -0.123 0.9024
## Size1 -1860.0 2145.6 -0.867 0.3860
## Size2 -4420.4 2138.5 -2.067 0.0387 *
## Size3 169192.5 2140.5 79.044 < 2e-16 ***
## Size4 197219.6 2173.5 90.740 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 4552087588)
##
## Null deviance: 2.0471e+14 on 10000 degrees of freedom
## Residual deviance: 4.5334e+13 on 9959 degrees of freedom
## AIC: 250810
##
## Number of Fisher Scoring iterations: 2
For the second transformation of the hourly and daily data, I’d like to look at the interaction between day and hour. Since the general trend is that the number of click-throughs peaks during the weekend, we can drop the per-day dummy variables and instead use a single indicator that equals 0 for weekdays and 1 for weekends. One important thing to note is that I’m including Friday as part of the weekend here due to my EDA results: in the “Day of Week” graph above featuring outliers (the high-performing banner ads), the number of click-throughs begins to peak on Friday.
I’ve also transformed the hour variable to reflect sleeping and non-sleeping hours. More specifically, I’m assuming that most people are asleep from hour 0 (12AM) through hour 7 (7AM). In this model, sleeping hours are represented by a 0 and non-sleeping hours by a 1.
Interpretation of Results
Due to issues with the numerical optimizer, I’ve opted to use the results from the built-in glm function for interpretation.
Coefficient estimates for the following variables are approximately the same in this model as in previous models: -English -Travel -Shopping -Search -Promo -Flash -Popup -Sizes 1-4
The new “Hour” variable indicates that during non-sleeping hours there are 16,798 more click-throughs per hour than during sleeping hours, on average. Likewise, the new “Day” variable indicates that on weekends (Friday - Sunday) there are 45,504.9 more click-throughs per hour than on weekdays, on average. Both of these new variables are statistically significant.
Lastly, the new interaction term “Weekend_SleepingINT” represents the interaction between sleeping/non-sleeping hours and weekday/weekend. If it is both the weekend (Friday through Sunday) and non-sleeping hours (any hour other than 0-7), then we can expect an additional 18,966.8 click-throughs per hour, on average, on top of the separate weekend and non-sleeping-hour effects (see the “Weekend_SleepingINT” estimate in the output below).
load("BannerAdvertising.RData")
data = X
#head(X)
#just use previously created y variable
#y1 = X[ , 1]
#same size transformation
#size - leave size 5 out
X$Size1 = ifelse(X$Size == 1, 1, 0)
X$Size2 = ifelse(X$Size == 2, 1, 0)
X$Size3 = ifelse(X$Size == 3, 1, 0)
X$Size4 = ifelse(X$Size == 4, 1, 0)
#New hour transformation, 0 indicates sleeping hours, 1 indicates non-sleeping hours
#assuming 7 hours of sleep per night
X$Hour = ifelse(X$Hour >= 0 & X$Hour <= 7, 0, 1)
#NEW weekend/weekday transformation
#including Friday in weekend based on EDA results w/ outliers
X$Day = ifelse(X$Day>=5, 1, 0)
#Omit original variable Size from X
X_subset = select(X, -Size)
#head(X_subset)
#NEW Interaction between weekday and hour
X_subset$Weekend_SleepingINT = X_subset$Day*X_subset$Hour
#head(X_subset)
Y = data[,1]
X_new = as.matrix(X_subset[ , 2:dim(X_subset)[2]])
#head(X_new)
# Add constant term to estimate an intercept
X_new = cbind(rep(1, dim(X_new)[1]), X_new)
#head(X_new)
#calculate dimensions of the data
#The GLM machinery from Transformation 1 (the f_* functions, negll, the optim
#call, and the diagnostic plots) was re-run here for this specification, but
#the optimizer failed to converge (see the discussion below), so that code is
#omitted and the built-in glm() results are used instead.
new_data1 = cbind(Y, as.data.frame(X_new))
#head(new_data1)
glm_mod1 = glm(Y ~.-1, data = new_data1)
summary(glm_mod1)
##
## Call:
## glm(formula = Y ~ . - 1, data = new_data1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -110653 -46747 -9063 32262 634798
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## V1 -79021.5 2797.3 -28.249 < 2e-16 ***
## Hour 16798.0 1950.7 8.611 < 2e-16 ***
## Day 45504.9 2446.7 18.598 < 2e-16 ***
## English 16411.8 1389.9 11.808 < 2e-16 ***
## Travel 55551.6 1389.5 39.978 < 2e-16 ***
## Shopping -19256.8 1390.7 -13.847 < 2e-16 ***
## Search 981.8 1390.0 0.706 0.48000
## Promo 21808.9 1389.8 15.692 < 2e-16 ***
## Flash 3937.0 1389.8 2.833 0.00462 **
## Popup 9601.3 1389.9 6.908 5.21e-12 ***
## Size1 -1631.6 2207.1 -0.739 0.45977
## Size2 -4260.4 2199.5 -1.937 0.05277 .
## Size3 169393.5 2200.3 76.986 < 2e-16 ***
## Size4 198034.8 2234.5 88.626 < 2e-16 ***
## Weekend_SleepingINT 18966.8 2989.3 6.345 2.32e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 4821896209)
##
## Null deviance: 2.0471e+14 on 10000 degrees of freedom
## Residual deviance: 4.8147e+13 on 9985 degrees of freedom
## AIC: 251360
##
## Number of Fisher Scoring iterations: 2
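To make the interaction term concrete, here is a quick sketch (my own addition) reading off the fitted coefficients: the lift associated with non-sleeping hours is larger on weekends by exactly the interaction estimate, holding the site and ad characteristics fixed.
b = coef(glm_mod1)
b["Hour"] + b["Weekend_SleepingINT"] #lift of non-sleeping hours on a weekend
b["Hour"] #lift of non-sleeping hours on a weekday
#the difference equals b["Weekend_SleepingINT"], roughly 18,967 click-throughs/hour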
As stated above, the numerical optimizer only worked for my first model. The coefficients resulting from the numerical optimizer for the first model were almost identical to the coefficient estimates produced by the built-in glm() function, and the log-likelihood values were also extremely similar.
The effect of using different initial values on the optimizer
I set every initial value to 1 to test whether the optimizer would return different coefficient estimates. While the coefficient estimates changed slightly, the log-likelihood remained very close to that of the previous linear regression fit.
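A sketch of that check for the first model: restart the optimizer from a vector of ones instead of the OLS-based starting values and compare the resulting fits.
#Restart from all-ones starting values (par0 above used the OLS solution)
par1 = rep(1, k + 1) #k coefficients plus the dispersion phi
est.res1 = optim(par1, fn=negll, gr=NULL, y, X_dummy, method="BFGS", hessian=TRUE)
c(est.res$value, est.res1$value) #minimized negative log-likelihoods
cbind(est.res$par, est.res1$par) #coefficient estimates side by side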
In using the log-likelihood function for the Poisson distribution below, we seek to estimate the rate parameter $\lambda$, which is recovered from the linear predictor via the inverse of the link function.
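In the exponential-family notation used in the code (with $b(\theta) = e^{\theta}$, canonical parameter $\theta = \log\mu$, and dispersion $\phi = 1$), the Poisson log-likelihood is

$$\ell(\beta) = \sum_{i=1}^{n}\left[y_i\,\theta_i - e^{\theta_i} - \log(y_i!)\right], \qquad \theta_i = \log\lambda_i = \mathbf{x}_i^{\top}\beta,$$

so that $\lambda_i = \exp(\mathbf{x}_i^{\top}\beta) = g^{-1}(\mathbf{x}_i^{\top}\beta)$. The $-\log(y_i!)$ term does not depend on $\beta$, which is why `f_c` is set to zero in the code below.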
Run GLM with Poisson Density using the code provided in GLM.R. Interpret the resulting coefficient and compare these estimates to the previous linear regression models. Which estimates are more reasonable?
For my first model, which included a dummy variable for each day, hour, and size (omitting base cases), the optimization algorithm was unable to converge. Thus, I used my second model, which features fewer predictors: day and hour each reduced to a binary variable, with the same dummy variables for banner sizes 1-4.
Given the following Poisson regression results, we interpret the exponentiated coefficient estimates as rate ratios corresponding to a one-unit difference in the predictor variable.
Hour: 3.658e-01, the anti-log is ~1.44. We therefore conclude that banner ads shown during non-sleeping hours generate 1.44 times as many click-throughs per hour as those shown during sleeping hours (hours 0-7).
Day: 7.874e-01, the anti-log is ~ 2.19. We therefore conclude that banner ads shown on weekends (Friday - Sunday) generate 2.19 times as many click-throughs per hour as those shown during weekdays (Monday-Thursday).
I complete my analysis of the remaining coefficients in the “Recommendations” section so I will not repeat this process here. However, it’s worth noting that all of the predictor variables are statistically significant (except “Search”) and similar in relative magnitude to the coefficient estimates produced in the corresponding linear regression model.
When addressing the question of which estimates are more reasonable (linear regression estimates or Poisson regression estimates), it’s clear that the Poisson regression model is better suited for modeling count data, and its coefficient estimates are more reasonable as a result. A primary assumption of the linear model is that the residual errors are normally distributed, but our dependent variable is a count: discrete and limited to non-negative values, so hourly click-through counts are unlikely to be well approximated by a normal distribution even after transformation. In contrast, the Poisson model assumes the response, conditional on the predictors, follows a Poisson rather than a normal distribution. Additionally, the Poisson coefficient estimates are more reasonable because they can naturally be interpreted in terms of a rate (per hour in our case).
Now use the glm() library to estimate the same Poisson model with the canonical link. Also estimate the model using the inverse link function and interpret your results. Which link function fits the data better? Use the appropriate model comparison technique to test this.
Note that the sample size and the set of predictors are the same for both models. Since both the canonical-link and inverse-link models are fit by maximizing the log-likelihood, the higher (less negative) value indicates the better model. In our case that is -19,772,872, which corresponds to the Poisson model with the canonical (log) link. Thus, the canonical link function fits the data better.
datafname = "BannerAdvertising.RData"
load(file=datafname)
data = X
#UNTRANSFORMED DATA
head(data)
## Click_Through Hour Day English Travel Shopping Search Promo Flash Size
## 1 270236 21 2 0 1 0 0 1 0 4
## 2 52 21 1 1 1 1 0 1 1 2
## 3 77 15 5 1 1 0 1 0 0 1
## 4 61 14 4 0 1 0 0 1 0 2
## 5 97 17 3 0 0 1 0 1 1 5
## 6 80698 23 2 0 0 0 1 0 0 3
## Popup
## 1 1
## 2 1
## 3 1
## 4 1
## 5 0
## 6 1
#TRANSFORMATIONS
data$Day = ifelse(data$Day>=5, 1, 0)
data$Hour = ifelse(data$Hour >= 0 & data$Hour <= 7, 0, 1)
data$Size1 = ifelse(data$Size == 1, 1, 0)
data$Size2 = ifelse(data$Size == 2, 1, 0)
data$Size3 = ifelse(data$Size == 3, 1, 0)
data$Size4 = ifelse(data$Size == 4, 1, 0)
data = select(data, -Size)
head(data)
## Click_Through Hour Day English Travel Shopping Search Promo Flash Popup
## 1 270236 1 0 0 1 0 0 1 0 1
## 2 52 1 0 1 1 1 0 1 1 1
## 3 77 1 1 1 1 0 1 0 0 1
## 4 61 1 0 0 1 0 0 1 0 1
## 5 97 1 0 0 0 1 0 1 1 0
## 6 80698 1 0 0 0 0 1 0 0 1
## Size1 Size2 Size3 Size4
## 1 0 0 0 1
## 2 0 1 0 0
## 3 1 0 0 0
## 4 0 1 0 0
## 5 0 0 0 0
## 6 0 0 1 0
#Data prep
y <- matrix(data[ , 1], nrow(data), 1) # dependent variable as a column vector (avoids hard-coding n)
X <- as.matrix(data[ , 2:dim(data)[2]])
# Add constant term to estimate an intercept
X <- cbind(rep(1, dim(X)[1]), X)
# transformations of X (3rd component of GLM; e.g., to compute dummy variables)
f_z <- function(X) X # identity for the standard linear model ## change unless you have already included the transformations in X ##
# calculate dimensions of the data
n <- dim(f_z(X))[1]
k <- dim(X)[2]
####### function to calculate likelihood
negll <- function(p, y, X) {
n = dim(X)[1]
k = dim(X)[2]
b <- p[1:k]
b=matrix(b,length(b),1)
if (NROW(p) == k + 1) {
phi <- 1 # dispersion is fixed at 1 for the Poisson; any extra parameter is ignored
}
Z <- f_z(X)
eta <- X %*% b
mu <- f_mu(eta)
theta <- f_theta(mu)
loglik <- (y * theta - f_b(theta))/phi + f_c(y, phi)
return(-sum(loglik))
}
#####################
#Poisson regression
#######################
f_b <- function(theta) exp(theta) # b( ) - function in exponential family (Poisson) ##change##
f_c <- function(y, phi) 0 # c( ) - the -log(y!) term is constant in the parameters, so it is dropped ##change##
f_theta <- function(mu) log(mu) # canonical parameter theta = log(mu); inverse of the derivative of b( ) ##change##
# response function (2nd component of GLM)
f_mu <- function(eta) exp(eta) # response function = inverse of the log link (implemented for Poisson) ##change##
# Initial values
par0_poisREG = rep(1,dim(X)[2]) # starting values for b coefficients ##change##
par0_poisREG = append(par0_poisREG, 1) # add starting value for phi to the vector
poisson=glm(Click_Through~., data = data, family = poisson("log"))
#optimization - find the maximum likelihood values of the parameters and store in est.res
est.res_p <- optim(par0_poisREG, fn=negll, gr=NULL, y, X, method="BFGS", hessian=TRUE)
if (est.res_p$convergence > 0) {
cat("WARNING: ALGORITHM DID NOT CONVERGE")
} else {
#optimizer results
print("Estimated parameters:")
print(est.res_p$par)
}
## [1] "Estimated parameters:"
## [1] 4.447245e+00 3.657746e-01 7.873979e-01 1.927485e-01 7.970510e-01
## [6] -2.837159e-01 7.461295e-05 3.053569e-01 5.918036e-02 1.067998e-01
## [11] -2.570653e+00 -2.555636e+00 6.259024e+00 6.370626e+00 1.000000e+00
logLik(poisson)
## 'log Lik.' -19772872 (df=14)
# compute estimates, p-values, predictions and residuals
est.res_p$se <- diag(solve(est.res_p$hessian[1:k,1:k])) ^ .5 # standard errors
est.res_p$t <- est.res_p$par[1:k] / est.res_p$se # t-values
est.res_p$p <- 1 - pt(abs(est.res_p$t), n - k - 1) # p-values (one-sided; double for two-sided)
est.res_p$mu <- f_mu(f_z(X) %*% est.res_p$par[1:k]) # expected values (predictions)
est.res_p$resid <- y - est.res_p$mu # residuals
est.res_p$dev <- NA # not implemented yet
est.res_p$pearson <- NA
#possible diagnostic plots
plot(y, est.res_p$mu)
hist(est.res_p$resid, probability=TRUE)
lines(density(est.res_p$resid, bw="sj"))
poisson$coefficients
## (Intercept) Hour Day English Travel
## 4.509143e+00 3.657504e-01 7.873827e-01 1.927410e-01 7.970800e-01
## Shopping Search Promo Flash Popup
## -2.837743e-01 -1.871687e-05 3.053659e-01 5.916456e-02 1.067658e-01
## Size1 Size2 Size3 Size4
## -2.297121e+00 -2.293776e+00 6.196893e+00 6.309051e+00
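The anti-logs quoted earlier can be reproduced directly from the fitted object:
round(exp(poisson$coefficients), 3) #rate ratios implied by the log-link estimates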
#canonical link results
summary(poisson)
##
## Call:
## glm(formula = Click_Through ~ ., family = poisson("log"), data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -337.85 -3.29 -0.27 2.79 337.46
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.509e+00 1.221e-03 3694.253 <2e-16 ***
## Hour 3.658e-01 8.524e-05 4290.999 <2e-16 ***
## Day 7.874e-01 7.595e-05 10367.775 <2e-16 ***
## English 1.927e-01 7.415e-05 2599.374 <2e-16 ***
## Travel 7.971e-01 7.997e-05 9967.204 <2e-16 ***
## Shopping -2.838e-01 7.504e-05 -3781.872 <2e-16 ***
## Search -1.872e-05 7.400e-05 -0.253 0.8
## Promo 3.054e-01 7.492e-05 4076.069 <2e-16 ***
## Flash 5.916e-02 7.411e-05 798.296 <2e-16 ***
## Popup 1.068e-01 7.407e-05 1441.500 <2e-16 ***
## Size1 -2.297e+00 3.891e-03 -590.323 <2e-16 ***
## Size2 -2.294e+00 3.820e-03 -600.395 <2e-16 ***
## Size3 6.197e+00 1.215e-03 5100.676 <2e-16 ***
## Size4 6.309e+00 1.215e-03 5193.573 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1665186076 on 9999 degrees of freedom
## Residual deviance: 39455434 on 9986 degrees of freedom
## AIC: 39545773
##
## Number of Fisher Scoring iterations: 4
###################
#inverse link model with glm
poisson_inverse = glm(Click_Through ~ ., data = data, family = poisson(link = "inverse"))
summary(poisson_inverse)
##
## Call:
## glm(formula = Click_Through ~ ., family = poisson(link = "inverse"),
## data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -607.76 -13.41 -2.33 5.07 512.32
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.864e-03 3.469e-06 825.5 <2e-16 ***
## Hour -1.233e-06 3.721e-10 -3313.4 <2e-16 ***
## Day -3.403e-06 4.348e-10 -7826.0 <2e-16 ***
## English -5.009e-07 2.542e-10 -1970.3 <2e-16 ***
## Travel -3.716e-06 5.137e-10 -7234.3 <2e-16 ***
## Shopping 8.306e-07 2.720e-10 3053.8 <2e-16 ***
## Search -3.756e-08 2.402e-10 -156.4 <2e-16 ***
## Promo -9.396e-07 2.783e-10 -3376.0 <2e-16 ***
## Flash -1.587e-07 2.417e-10 -656.5 <2e-16 ***
## Popup -3.261e-07 2.446e-10 -1333.1 <2e-16 ***
## Size1 2.486e-02 1.025e-04 242.5 <2e-16 ***
## Size2 2.409e-02 9.769e-05 246.6 <2e-16 ***
## Size3 -2.852e-03 3.469e-06 -822.2 <2e-16 ***
## Size4 -2.852e-03 3.469e-06 -822.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1665186076 on 9999 degrees of freedom
## Residual deviance: 63696898 on 9986 degrees of freedom
## AIC: 63787237
##
## Number of Fisher Scoring iterations: 6
#Comparing log-likelihoods - canonical link vs. inverse link - canonical fits the data better
logLik(poisson)
## 'log Lik.' -19772872 (df=14)
logLik(poisson_inverse)
## 'log Lik.' -31893604 (df=14)
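Since both fits estimate the same number of parameters (df = 14), an AIC comparison is equivalent and points the same way:
AIC(poisson, poisson_inverse) #lower AIC for the canonical (log) link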
Also compare the two alternative transformations suggested under 2, and compare the two models formally in statistical terms.
As you can see from the coefficient estimates in the tran1 model vs. the tran2 model, tran1 provides more granular information with respect to the day and hour variables. To compare the two models’ fit I used Vuong’s closeness test, since the two models are not nested. The null hypothesis is that the two models are equally close to the data-generating process; the alternative hypothesis is that one model is closer. The non-nested likelihood ratio test below reveals that Model 1 (tran1), the more granular model, provides a better fit to the data.
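For reference, Vuong’s statistic is the summed per-observation log-likelihood difference scaled by its estimated standard deviation,

$$z = \frac{\sum_{i=1}^{n}\left[\ell_i^{(1)} - \ell_i^{(2)}\right]}{\sqrt{n}\,\hat\omega_n}, \qquad \hat\omega_n^{2} = \widehat{\operatorname{Var}}\left(\ell_i^{(1)} - \ell_i^{(2)}\right),$$

which is asymptotically standard normal under $H_0$ for non-nested, distinguishable models.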
#transformation 1
tran_data = new_data
tran_data = select(tran_data, -V1)
tran1 = glm(y~., data=tran_data, family = poisson("log"))
summary(tran1)
##
## Call:
## glm(formula = y ~ ., family = poisson("log"), data = tran_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.8883 -0.7017 -0.0119 0.6469 3.7878
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.007e+00 1.230e-03 4883.026 <2e-16 ***
## English 2.001e-01 7.444e-05 2687.480 <2e-16 ***
## Travel 7.998e-01 8.024e-05 9968.328 <2e-16 ***
## Shopping -3.000e-01 7.549e-05 -3974.218 <2e-16 ***
## Search 3.180e-05 7.423e-05 0.428 0.668
## Promo 2.999e-01 7.526e-05 3985.203 <2e-16 ***
## Flash 5.006e-02 7.443e-05 672.608 <2e-16 ***
## Popup 1.001e-01 7.448e-05 1344.365 <2e-16 ***
## Monday -1.000e+00 1.493e-04 -6696.144 <2e-16 ***
## Tuesday -9.999e-01 1.508e-04 -6630.129 <2e-16 ***
## Wednesday -1.000e+00 1.485e-04 -6736.557 <2e-16 ***
## Thursday -6.001e-01 1.315e-04 -4562.442 <2e-16 ***
## Friday -3.000e-01 1.200e-04 -2499.015 <2e-16 ***
## Saturday 7.460e-05 1.126e-04 0.663 0.508
## AM12 -5.996e-01 2.674e-04 -2242.044 <2e-16 ***
## AM1 -6.004e-01 2.599e-04 -2309.712 <2e-16 ***
## AM2 -6.002e-01 2.553e-04 -2350.550 <2e-16 ***
## AM3 -5.995e-01 2.544e-04 -2356.394 <2e-16 ***
## AM4 -5.995e-01 2.453e-04 -2443.380 <2e-16 ***
## AM5 -6.000e-01 2.509e-04 -2391.608 <2e-16 ***
## AM6 -5.999e-01 2.669e-04 -2247.754 <2e-16 ***
## AM7 -6.001e-01 2.572e-04 -2333.425 <2e-16 ***
## AM8 -4.999e-01 2.456e-04 -2035.149 <2e-16 ***
## AM9 -4.998e-01 2.489e-04 -2007.778 <2e-16 ***
## AM10 -4.996e-01 2.452e-04 -2037.444 <2e-16 ***
## AM11 -4.999e-01 2.481e-04 -2014.415 <2e-16 ***
## PM12 -9.986e-02 2.316e-04 -431.222 <2e-16 ***
## PM1 -9.991e-02 2.112e-04 -473.178 <2e-16 ***
## PM2 -9.985e-02 2.111e-04 -472.901 <2e-16 ***
## PM3 -9.992e-02 2.162e-04 -462.123 <2e-16 ***
## PM4 -1.000e-01 2.254e-04 -443.775 <2e-16 ***
## PM5 -3.997e-01 2.559e-04 -1562.051 <2e-16 ***
## PM6 -3.996e-01 2.479e-04 -1612.289 <2e-16 ***
## PM7 -3.999e-01 2.392e-04 -1671.781 <2e-16 ***
## PM8 -3.999e-01 2.379e-04 -1681.397 <2e-16 ***
## PM9 -2.525e-05 2.053e-04 -0.123 0.902
## PM10 1.439e-04 2.161e-04 0.666 0.506
## Size1 -2.301e+00 3.891e-03 -591.371 <2e-16 ***
## Size2 -2.304e+00 3.820e-03 -602.993 <2e-16 ***
## Size3 6.193e+00 1.215e-03 5097.674 <2e-16 ***
## Size4 6.301e+00 1.215e-03 5187.155 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1.6652e+09 on 9999 degrees of freedom
## Residual deviance: 9.8479e+03 on 9959 degrees of freedom
## AIC: 100241
##
## Number of Fisher Scoring iterations: 4
#transformation 2
tran2=glm(Click_Through~., data = data, family = poisson("log"))
summary(tran2)
##
## Call:
## glm(formula = Click_Through ~ ., family = poisson("log"), data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -337.85 -3.29 -0.27 2.79 337.46
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.509e+00 1.221e-03 3694.253 <2e-16 ***
## Hour 3.658e-01 8.524e-05 4290.999 <2e-16 ***
## Day 7.874e-01 7.595e-05 10367.775 <2e-16 ***
## English 1.927e-01 7.415e-05 2599.374 <2e-16 ***
## Travel 7.971e-01 7.997e-05 9967.204 <2e-16 ***
## Shopping -2.838e-01 7.504e-05 -3781.872 <2e-16 ***
## Search -1.872e-05 7.400e-05 -0.253 0.8
## Promo 3.054e-01 7.492e-05 4076.069 <2e-16 ***
## Flash 5.916e-02 7.411e-05 798.296 <2e-16 ***
## Popup 1.068e-01 7.407e-05 1441.500 <2e-16 ***
## Size1 -2.297e+00 3.891e-03 -590.323 <2e-16 ***
## Size2 -2.294e+00 3.820e-03 -600.395 <2e-16 ***
## Size3 6.197e+00 1.215e-03 5100.676 <2e-16 ***
## Size4 6.309e+00 1.215e-03 5193.573 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1665186076 on 9999 degrees of freedom
## Residual deviance: 39455434 on 9986 degrees of freedom
## AIC: 39545773
##
## Number of Fisher Scoring iterations: 4
library(nonnest2)
vuongtest(tran1, tran2)
##
## Model 1
## Class: glm
## Call: glm(formula = y ~ ., family = poisson("log"), data = tran_data)
##
## Model 2
## Class: glm
## Call: glm(formula = Click_Through ~ ., family = poisson("log"), data = data)
##
## Variance test
## H0: Model 1 and Model 2 are indistinguishable
## H1: Model 1 and Model 2 are distinguishable
## w2 = 29722590.350, p = 0.5
##
## Non-nested likelihood ratio test
## H0: Model fits are equal for the focal population
## H1A: Model 1 fits better than Model 2
## z = 36.176, p = <2e-16
## H1B: Model 2 fits better than Model 1
## z = 36.176, p = 1
In this section, you are to perform a test of nested models. Using glm(), estimate a nested model without any time-related transformed variables. Do a likelihood ratio test between the full model and the nested model and report the results. You can use anova(res0, resA, test="LRT") with res0 and resA the glm objects (the results from glm() estimation) of the nested and full models.
As evidenced by the results of the likelihood ratio test below, the improvement of the full model (Model 2) over the nested model is statistically significant, so the nested model should be rejected. This indicates that the time-related predictors (the Hour and Day variables) do matter and provide a better fit to the data. (Note that the nested model below was estimated with a Gaussian family and the untransformed Size variable, so the deviance comparison is only indicative; a like-for-like Poisson refit is sketched after the output.)
#NESTED MODELS TEST
#Full model
glm_full = glm(formula = Click_Through ~., family = poisson("log"), data = data)
summary(glm_full)
##
## Call:
## glm(formula = Click_Through ~ ., family = poisson("log"), data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -337.85 -3.29 -0.27 2.79 337.46
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.509e+00 1.221e-03 3694.253 <2e-16 ***
## Hour 3.658e-01 8.524e-05 4290.999 <2e-16 ***
## Day 7.874e-01 7.595e-05 10367.775 <2e-16 ***
## English 1.927e-01 7.415e-05 2599.374 <2e-16 ***
## Travel 7.971e-01 7.997e-05 9967.204 <2e-16 ***
## Shopping -2.838e-01 7.504e-05 -3781.872 <2e-16 ***
## Search -1.872e-05 7.400e-05 -0.253 0.8
## Promo 3.054e-01 7.492e-05 4076.069 <2e-16 ***
## Flash 5.916e-02 7.411e-05 798.296 <2e-16 ***
## Popup 1.068e-01 7.407e-05 1441.500 <2e-16 ***
## Size1 -2.297e+00 3.891e-03 -590.323 <2e-16 ***
## Size2 -2.294e+00 3.820e-03 -600.395 <2e-16 ***
## Size3 6.197e+00 1.215e-03 5100.676 <2e-16 ***
## Size4 6.309e+00 1.215e-03 5193.573 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1665186076 on 9999 degrees of freedom
## Residual deviance: 39455434 on 9986 degrees of freedom
## AIC: 39545773
##
## Number of Fisher Scoring iterations: 4
#null model: without any time related predictors
glm_null = glm(formula = Click_Through ~ English + Travel + Shopping + Search + Promo + Flash + Size + Popup, family = gaussian("identity"), data = data)
summary(glm_null)
##
## Call:
## glm(formula = Click_Through ~ English + Travel + Shopping + Search +
## Promo + Flash + Size + Popup, family = gaussian("identity"),
## data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -177156 -76581 -15401 37889 786507
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -30800.7 4027.0 -7.649 2.22e-14 ***
## English 12037.2 2297.0 5.240 1.63e-07 ***
## Travel 55891.5 2296.7 24.335 < 2e-16 ***
## Shopping -19980.5 2297.3 -8.697 < 2e-16 ***
## Search 753.3 2297.2 0.328 0.742993
## Promo 24221.3 2297.4 10.543 < 2e-16 ***
## Flash 3334.2 2296.9 1.452 0.146651
## Size 20838.0 815.1 25.564 < 2e-16 ***
## Popup 7849.4 2297.7 3.416 0.000638 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 13181258147)
##
## Null deviance: 1.5099e+14 on 9999 degrees of freedom
## Residual deviance: 1.3169e+14 on 9991 degrees of freedom
## AIC: 261410
##
## Number of Fisher Scoring iterations: 2
anova(glm_null, glm_full, test = "LRT")
## Analysis of Deviance Table
##
## Model 1: Click_Through ~ English + Travel + Shopping + Search + Promo +
## Flash + Size + Popup
## Model 2: Click_Through ~ Hour + Day + English + Travel + Shopping + Search +
## Promo + Flash + Popup + Size1 + Size2 + Size3 + Size4
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 9991 1.3169e+14
## 2 9986 3.9455e+07 5 1.3169e+14 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
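As a cross-check (my own addition, not part of the output above): refitting the nested model with the same Poisson family and the same size dummies makes the likelihood ratio test strictly like-for-like, since the reduced model is then properly nested within glm_full.
#Hypothetical like-for-like refit: same Poisson family, nested predictors
glm_null_pois = glm(Click_Through ~ English + Travel + Shopping + Search +
                      Promo + Flash + Popup + Size1 + Size2 + Size3 + Size4,
                    family = poisson("log"), data = data)
anova(glm_null_pois, glm_full, test = "LRT")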
Using the Poisson estimation results (the coefficients and p-values), explain how website and banner characteristics as well as time of day and week influence click-throughs.
I will base my recommendations on the following model: glm(formula = Click_Through ~ ., family = poisson("log"), data = data).
Note that the interpretation of each individual coefficient assumes that all other predictor variables in the model are held constant.
Note that the data are transformed to include a binary weekend vs. weekday indicator in the “Day” variable and a binary sleeping vs. non-sleeping indicator in the “Hour” variable; dummy variables were created for banner sizes 1-4 (size 5 is the base case).
As stated previously, the following interpretations hold for the Hour and Day coefficient estimates:
Hour: 3.658e-01, the anti-log is ~1.44. We therefore conclude that banner ads shown during non-sleeping hours generate 1.44 times as many click-throughs per hour as those shown during sleeping hours (hours 0-7).
Day: 7.874e-01, the anti-log is ~ 2.19. We therefore conclude that banner ads shown on weekends (Friday - Sunday) generate 2.19 times as many click-throughs per hour as those shown during weekdays (Monday-Thursday).
Based on these estimates, I would highly recommend that the client shift their focus to featuring banner ads more heavily on weekends during non-sleeping hours (i.e., do not invest in showing banner ads between 12AM and 7AM).
Likewise, I’d also recommend that they utilize banner ad sizes 3 and 4 almost exclusively, as this decision will have the largest positive impact on the number of click-throughs generated per hour relative to the other variables considered.
Another predictor with a large effect on the outcome variable is “Travel”, which indicates whether or not the banner ad was featured on a travel-related website. Based on the following interpretation of the coefficient estimate, I’d highly recommend that the client maximize their banner-ad ROI by featuring the majority of their ads on travel-related websites. Of the other two website types considered previously, shopping-related sites have a significant negative impact on the outcome variable and search engines show no significant effect. Thus, the client should shift their investment dollars toward advertising on travel-related websites, preferably those where English is the default language.
Travel: 7.971e-01, the anti-log is ~2.22. We therefore conclude that banner ads that are featured on Travel related websites generate 2.22 times as many click-throughs per hour relative to banner ads featured on non-travel related websites.
English: 1.927e-01, the anti-log is ~1.21. We therefore conclude that banner ads that are featured on websites where English is the default language generate 1.21 times as many click-throughs per hour as banner ads featured on websites where English is not the default language.
Last but not least, I’d also recommend that the client feature promotions, where applicable, in the actual banner ads. This final recommendation is based on the following interpretation of the resulting coefficient estimate. Promo: 3.054e-01, the anti-log is ~1.36. We therefore conclude that banner ads that mention a promotion generate 1.36 times as many click-throughs per hour as banner ads that do not.