Rocket Fuel and TaskaBella: Case Study to Measure the Effectiveness of Online Advertising

IIMK ADSM 2020-21 Batch-2, Group 6: Ramana, Venugopal, Siju, Vikesh, Abdul

Load required libraries. Also load and setup the data.

library(readxl)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(knitr)

campaignData <- read_xlsx("Rocket fuel excel B5896-XLS-ENG.xlsx", sheet = "2017_9_rocketfuel_B5896", col_names = TRUE)

str(campaignData)
## tibble [588,101 x 6] (S3: tbl_df/tbl/data.frame)
##  $ user_id       : num [1:588101] 1069124 1119715 1144181 1435133 1015700 ...
##  $ test          : num [1:588101] 1 1 1 1 1 1 1 1 1 1 ...
##  $ converted     : num [1:588101] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tot_impr      : num [1:588101] 130 93 21 355 276 734 264 17 21 142 ...
##  $ mode_impr_day : num [1:588101] 1 2 2 2 5 6 3 7 2 1 ...
##  $ mode_impr_hour: num [1:588101] 20 22 18 10 14 10 13 18 19 14 ...
campaignData$userType <- "Exposed Group"
campaignData$userType [campaignData$test == 0] <- "Control Group"

campaignData$userBought <- "Yes"
campaignData$userBought [campaignData$converted == 0] <- "No"

campaignData$dayOfMostImpressions <- "Sunday"
campaignData$dayOfMostImpressions [campaignData$mode_impr_day == 1] <- "Monday"
campaignData$dayOfMostImpressions [campaignData$mode_impr_day == 2] <- "Tuesday"
campaignData$dayOfMostImpressions [campaignData$mode_impr_day == 3] <- "Wednesday"
campaignData$dayOfMostImpressions [campaignData$mode_impr_day == 4] <- "Thursday"
campaignData$dayOfMostImpressions [campaignData$mode_impr_day == 5] <- "Friday"
campaignData$dayOfMostImpressions [campaignData$mode_impr_day == 6] <- "Saturday"

campaignData$userType <- factor(campaignData$userType, levels = c("Control Group", "Exposed Group"))
campaignData$userBought <- factor(campaignData$userBought, levels = c("Yes", "No"))
campaignData$dayOfMostImpressions <- factor (campaignData$dayOfMostImpressions, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))

Question 1 - Was the advertising campaign effective? Did additional consumers convert as a result of the ad campaign?

Hypothsesis Testing (Two Sample t-test)

  • u1 = Average conversion rate in Control Group
  • u2 = Average conversion rate in Exposed Group
  • H0: u1 >= u2
  • Ha: u1 < u2
t.test(campaignData$converted~campaignData$userType, alternative = "less", var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  campaignData$converted by campaignData$userType
## t = -7.3704, df = 588099, p-value = 8.517e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##          -Inf -0.005975726
## sample estimates:
## mean in group Control Group mean in group Exposed Group 
##                  0.01785411                  0.02554656
  • As we can see from the Two Sample t-test result, p-value is < 0.05. We hence reject Null Hypothesis (H0). This proves that the conversion rate in Exposed Group is higher than that in Control Group and hence the campaign was effective.
campaignData %>%
 filter(userBought %in% "Yes") %>%
 ggplot() +
 aes(x = userType) +
 geom_bar(fill = "#ef562d") +
 geom_text(stat = "count", aes(label=after_stat(count)), vjust = "inward") +
 labs(title = "Converted Users by Group") +
 theme_minimal()

totalUsers <- nrow(campaignData) # Total users targeted in the campaign

noOfEGUsers <- nrow(subset(campaignData, campaignData$test == 1)) # No. of users in Exposed Group

noOfConvertedEGUsers <- nrow(subset(campaignData, campaignData$test == 1 & campaignData$converted == 1)) # No. of Converted users in Exposed Group

pctConvertedinEG <- (noOfConvertedEGUsers / noOfEGUsers) * 100 # Percentage of converted users in Exposed Group

noOfCGUsers <- totalUsers - noOfEGUsers # No. of users in Control Group

noOfConvertedCGUsers <- nrow(subset(campaignData, campaignData$test == 0 & campaignData$converted == 1)) # No. of Converted users in Control Group

pctConvertedinCG <- (noOfConvertedCGUsers / noOfCGUsers) * 100 # Percentage of converted users in Control Group

diffConversionPct <- pctConvertedinEG - pctConvertedinCG # Difference in conversion percentage between two groups

noOfEGUsersConvertedDueToCampaign <- (diffConversionPct / 100) * noOfEGUsers # No. of users from Exposed Group that would have converted due to the campaign

pctControlGroup <- (noOfCGUsers / totalUsers) * 100 # Percentage of Control Group in all users
  1. Users
  • Total users targeted in the campaign is 588101.
  • No. of users in Exposed Group is 564577.
  • No. of users in Control Group is 23524.
  • Percentage of users in Control Group is 4.00%.
  1. Conversions
  • No. of converted users in Exposed Group is 14423.
  • No. of converted users in Control Group is 420.
  • Percentage of converted users in Exposed Group is 2.55%.
  • Percentage of converted users in Control Group is 1.79%.
  • Difference in percentage of converted users in both groups is 0.77%.
  • No. of users from Exposed Group that could have converted due to the campaign 4342.

As we can see from above numbers, we have extra converted users from Exposed Group due to the campaign. Hence we can again conclude that the campaign was effective.


Question 2 - Was the campaign profitable?

2a) How much more money did TaskaBella make by running the campaign (excluding advertising cost)?

Assuming the same percentage of users as in Control Group would have converted from Exposed Group without the campaign, the worth of extra converted users from Exposed Group due to the campaign is the more money that TaskaBella made.

moneyMadeFromCampaign <- noOfEGUsersConvertedDueToCampaign * 40
  • More money made from the campaign is $173719 USD.

2b) What was the cost of the campaign?

totalImpressions <- sum(campaignData$tot_impr) # Find the total number of impressions served during the campaign
costOfCampaign <- (totalImpressions / 1000) * 9 # Cost per thousand impressions (CPM) is $9
  • Cost of the campaign is $131375 USD.

2c) Calculate the ROI of the campaign. Was the campaign profitable?

profitFromCampaign <- moneyMadeFromCampaign - costOfCampaign
roiOfCampaign <- (profitFromCampaign / costOfCampaign) * 100
  • Profit from the campaign is $42345 USD.
  • ROI of the campaign is 32.23%.
  • As we can see, the ROI and the profit from the campaign, the campaign was indeed profitable.

2d) What was the opportunity cost of including a control group; how much more could have TaskaBella made with a smaller control group or not having a control group at all?

If there was no control group, 0.77% users from control group would have converted. The worth of these users is the opportunity cost.

opportunityCost <- (diffConversionPct / 100) * noOfCGUsers * 40
  • If there was no control group, TaskaBella could have made another $7238 USD.

Question 3 - How did the number of impressions seen by each user influence the effectiveness of advertising?

a. Create a chart of conversion rates as a function of the number of ads displayed to users. Plot conversion rates for those who were in the control group and for those who were exposed to the ad. Group together number of impressions as necessary to obtain a meaningful plot. (Conversion rate means the percentage of unique users who made a purchase)

b. What can you infer from the charts? In what region is advertising most effective?

c. What do the above figures imply for the design of the next campaign assuming that consumer response would be similar?

Histogram for converted users based on no. of impressions served irrespective of the group

campaignData %>%
 filter(userBought %in% "Yes") %>%
 ggplot() +
 aes(x = tot_impr) +
 geom_histogram(bins = 30L, fill = "#ef562d") +
 labs(x = "Total Impressions", y = "Converted Users", title = "Histogram of Total Impressions vs Converted Users", subtitle = "Irrespective of the Group") +
 theme_minimal()

As we can see most of the conversions happened until 250 impressions. So, let us zoom in and see a better histogram until that range

campaignData %>%
 filter(userBought %in% "Yes") %>%
 ggplot() +
 aes(x = tot_impr) +
 geom_histogram(bins = 30L, fill = "#ef562d") +
 labs(x = "Total Impressions", y = "Converted Users", title = "Histogram of Total Impressions vs Converted Users", subtitle = "Irrespective of the Group with up to 250 impressions served") +
 theme_minimal() +
 xlim(0L, 250L)
## Warning: Removed 577 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).

We can see that most of the conversions happened within the range of 1 - 100 impressions.

Now, let us see the conversions by Group.

Histogram of converted users in Control Group

campaignData %>%
 filter(userType %in% "Control Group") %>%
 filter(userBought %in% 
    "Yes") %>%
 ggplot() +
 aes(x = tot_impr) +
 geom_histogram(bins = 30L, fill = "#ef562d") +
 labs(x = "Total Impressions", y = "Converted Users", title = "Histogram of Total Impressions vs Converted Users", subtitle = "Control Group") +
 theme_minimal()

Histogram of converted users in Exposed Group with upto 250 impressions

campaignData %>%
 filter(userType %in% "Exposed Group") %>%
 filter(userBought %in% 
    "Yes") %>%
 ggplot() +
 aes(x = tot_impr) +
 geom_histogram(bins = 30L, fill = "#ef562d") +
 labs(x = "Total Impressions", y = "Converted Users", title = "Histogram of Total Impressions vs Converted Users", subtitle = "Exposed Group with up to 250 impressions") +
 theme_minimal() +
 xlim(0L, 250L)
## Warning: Removed 554 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).

As we can see in both groups, most of the conversions are happening under the range of 250 impressions. So, let us create a data frame with No. of impressions (in multiples of 30), No. of users in that impressions range, No. of converted users in that range, total no. of impressions served in that range. Using that let us understand the conversion rates and further analyze the campaign results.

campaignData$ImpressionRange <- "271+"
campaignData$ImpressionRange[campaignData$tot_impr <= 30] <- "1-30"
campaignData$ImpressionRange[campaignData$tot_impr > 30 & campaignData$tot_impr <= 60] <- "31-60"
campaignData$ImpressionRange[campaignData$tot_impr > 60 & campaignData$tot_impr <= 90] <- "61-90"
campaignData$ImpressionRange[campaignData$tot_impr > 90 & campaignData$tot_impr <= 120] <- "91-120"
campaignData$ImpressionRange[campaignData$tot_impr > 120 & campaignData$tot_impr <= 150] <- "121-150"
campaignData$ImpressionRange[campaignData$tot_impr > 150 & campaignData$tot_impr <= 180] <- "151-180"
campaignData$ImpressionRange[campaignData$tot_impr > 180 & campaignData$tot_impr <= 210] <- "181-210"
campaignData$ImpressionRange[campaignData$tot_impr > 210 & campaignData$tot_impr <= 240] <- "211-240"
campaignData$ImpressionRange[campaignData$tot_impr > 240 & campaignData$tot_impr <= 270] <- "241-270"
campaignData$ImpressionRange <- factor(campaignData$ImpressionRange, levels = c("1-30", "31-60", "61-90", "91-120", "121-150", "151-180", "181-210", "211-240", "241-270", "271+"))

impressionGroups <- group_by(campaignData, ImpressionRange)
campaignSummary <- summarise(impressionGroups, Users = n(), Conversions = sum(converted), TotalNoOfImpressions = sum(tot_impr))
## `summarise()` ungrouping output (override with `.groups` argument)
# Calculate the conversion rate which is percentage of users converted in the impression range
campaignSummary$ConversionRate <- (campaignSummary$Conversions / campaignSummary$Users) * 100

# Calculate the no. of impressions served per conversion in the same impression ranges
campaignSummary$ImprPerConversion <- campaignSummary$TotalNoOfImpressions / campaignSummary$Conversions

Summary of table of Impressions Range, No. of Users, No. of Conversions, Total no. of impressions, Conversion Rate, Impressions Per Conversion

kable(campaignSummary, format = "html")
ImpressionRange Users Conversions TotalNoOfImpressions ConversionRate ImprPerConversion
1-30 457381 3151 4770365 0.6889224 1513.9210
31-60 77335 3823 3261125 4.9434279 853.0277
61-90 25590 3187 1875987 12.4540836 588.6373
91-120 11293 1926 1171092 17.0548127 608.0436
121-150 5825 1037 779726 17.8025751 751.9055
151-180 3304 567 543154 17.1610169 957.9436
181-210 2024 323 394103 15.9584980 1220.1331
211-240 1393 202 312790 14.5010768 1548.4653
241-270 878 139 223135 15.8314351 1605.2878
271+ 3078 488 1265705 15.8544509 2593.6578
ggplot(campaignSummary) +
 aes(x = ImpressionRange, weight = Conversions) +
 geom_bar(fill = "#ef562d") +
 geom_text(aes(y = Conversions, label = Conversions), vjust = "inward") +
 geom_line(aes(x = ImpressionRange, y = ImprPerConversion), size = 1, color = "black", group = 1) +
 scale_y_continuous(name = "Conversions", sec.axis = sec_axis(~ . * 1, name = "Impressions per Conversion")) +
 geom_text(aes(y = ImprPerConversion, label = round(ImprPerConversion)), vjust = "outward") +
 labs(title = "Impressions Range vs Converted Users and Impressions per Conversion") +
 theme_minimal()

ggplot(campaignSummary) +
 geom_line(aes(x = ImpressionRange, y = ConversionRate), size = 1, color = "#ef562d", group = 1) +
 geom_text(aes(x = ImpressionRange, y = ConversionRate, label = format(ConversionRate, digits = 2, format = "f")), vjust = "inward") +
 labs(title = "Impressions Range vs Conversion Rate") +
 theme_minimal()

Interpretation of above 2 graphs:

  1. From the first graph, we can learn that most of the conversions are happening in the range of 1 to 150 impressions. Also the impressions per conversion is low in this range. Beyond 120 / 150 impressions, the conversions are reducing and the impressions per conversion are also high.
  2. Even from the second graph, we can learn that the conversion rate is increasing as the number of impressions increase until 150. After that it is on declining trend and stabilizing at 15%. So the best number of impressions recommended for best results are 30 to 150 per user.

Recommendation

  1. Keep the number of impressions per user in the range of 1 to 150.
  2. Instead of increasing the impressions beyond 150, engage more users in the campaign

Segregate the data by group and analyze

The above analysis / charts don’t distinguish between Exposed Group and Control Group. Let us do this by group and interpret the results.

# Extract Exposed Group Data and analyze
exposedGroupData <- subset(campaignData, campaignData$test == 1)

exposedImpressionGroups <- group_by(exposedGroupData, ImpressionRange)
exposedCampaignSummary <- summarise(exposedImpressionGroups, Users = n(), Conversions = sum(converted), TotalNoOfImpressions = sum(tot_impr))
## `summarise()` ungrouping output (override with `.groups` argument)
# Calculate the conversion rate which is percentage of users converted in the impression range
exposedCampaignSummary$ConversionRate <- (exposedCampaignSummary$Conversions / exposedCampaignSummary$Users) * 100

# Calculate the no. of impressions served per conversion in the same impression ranges
exposedCampaignSummary$ImprPerConversion <- exposedCampaignSummary$TotalNoOfImpressions / exposedCampaignSummary$Conversions

For Exposed Group: Summary of table of Impressions Range, No. of Users, No. of Conversions, Total no. of impressions, Conversion Rate, Impressions Per Conversion

kable(exposedCampaignSummary, format = "html")
ImpressionRange Users Conversions TotalNoOfImpressions ConversionRate ImprPerConversion
1-30 438983 3026 4586696 0.6893205 1515.7621
31-60 74470 3726 3139252 5.0033571 842.5260
61-90 24538 3125 1798915 12.7353493 575.6528
91-120 10819 1885 1121799 17.4230520 595.1188
121-150 5567 1005 745123 18.0528112 741.4159
151-180 3151 550 517931 17.4547763 941.6927
181-210 1918 312 373293 16.2669447 1196.4519
211-240 1334 192 299489 14.3928036 1559.8385
241-270 842 131 214014 15.5581948 1633.6947
271+ 2955 471 1218189 15.9390863 2586.3885
ggplot(exposedCampaignSummary) +
 aes(x = ImpressionRange, weight = Conversions) +
 geom_bar(fill = "#ef562d") +
 geom_text(aes(y = Conversions, label = Conversions), vjust = "inward") +
 geom_line(aes(x = ImpressionRange, y = ImprPerConversion), size = 1, color = "black", group = 1) +
 scale_y_continuous(name = "Conversions", sec.axis = sec_axis(~ . * 1, name = "Impressions per Conversion")) +
 geom_text(aes(y = ImprPerConversion, label = round(ImprPerConversion)), vjust = "outward") +
 labs(title = "Impressions Range vs Converted Users and Impressions per Conversion", subtitle = "Exposed Group") +
 theme_minimal()

ggplot(exposedCampaignSummary) +
 geom_line(aes(x = ImpressionRange, y = ConversionRate), size = 1, color = "#ef562d", group = 1) +
 geom_text(aes(x = ImpressionRange, y = ConversionRate, label = format(ConversionRate, digits = 2, format = "f")), vjust = "inward") +
 labs(title = "Impressions Range vs Conversion Rate", subtitle = "Exposed Group") +
 theme_minimal()

Interpretation of Exposed Group charts

  1. These charts look similar to above charts where we didn’t segregate the groups and analyzed all data.
  2. So, the recommendation of serving up to 150 impressions per user during the advertising campaign for it to be effective still holds good here.
# Extract Control Group Data and analyze
controlGroupData <- subset(campaignData, campaignData$test == 0)

controlImpressionGroups <- group_by(controlGroupData, ImpressionRange)
controlCampaignSummary <- summarise(controlImpressionGroups, Users = n(), Conversions = sum(converted), TotalNoOfImpressions = sum(tot_impr))
## `summarise()` ungrouping output (override with `.groups` argument)
# Calculate the conversion rate which is percentage of users converted in the impression range
controlCampaignSummary$ConversionRate <- (controlCampaignSummary$Conversions / controlCampaignSummary$Users) * 100

# Calculate the no. of impressions served per conversion in the same impression ranges
controlCampaignSummary$ImprPerConversion <- controlCampaignSummary$TotalNoOfImpressions / controlCampaignSummary$Conversions

For Control Group: Summary of table of Impressions Range, No. of Users, No. of Conversions, Total no. of impressions, Conversion Rate, Impressions Per Conversion

kable(controlCampaignSummary, format = "html")
ImpressionRange Users Conversions TotalNoOfImpressions ConversionRate ImprPerConversion
1-30 18398 125 183669 0.6794217 1469.352
31-60 2865 97 121873 3.3856894 1256.423
61-90 1052 62 77072 5.8935361 1243.097
91-120 474 41 49293 8.6497890 1202.268
121-150 258 32 34603 12.4031008 1081.344
151-180 153 17 25223 11.1111111 1483.706
181-210 106 11 20810 10.3773585 1891.818
211-240 59 10 13301 16.9491525 1330.100
241-270 36 8 9121 22.2222222 1140.125
271+ 123 17 47516 13.8211382 2795.059
ggplot(controlCampaignSummary) +
 aes(x = ImpressionRange, weight = Conversions) +
 geom_bar(fill = "#ef562d") +
 geom_text(aes(y = Conversions, label = Conversions), vjust = "inward") +
 geom_line(aes(x = ImpressionRange, y = ImprPerConversion), size = 1, color = "black", group = 1) +
 scale_y_continuous(name = "Conversions", sec.axis = sec_axis(~ . * 1, name = "Impressions per Conversion")) +
 geom_text(aes(y = ImprPerConversion, label = round(ImprPerConversion)), vjust = "outward") +
 labs(title = "Impressions Range vs Converted Users and Impressions per Conversion", subtitle = "Control Group") +
 theme_minimal()

ggplot(controlCampaignSummary) +
 geom_line(aes(x = ImpressionRange, y = ConversionRate), size = 1, color = "#ef562d", group = 1) +
 geom_text(aes(x = ImpressionRange, y = ConversionRate, label = format(ConversionRate, digits = 2, format = "f")), vjust = "inward") +
 labs(title = "Impressions Range vs Conversion Rate", subtitle = "Control Group") +
 theme_minimal()

Interpretation of Control Group charts

  1. We have an interesting trend here. Though there are no ads served (it is only PSAs) to these users, the conversion rate is increasing until 150 impressions, then dips down and is increasing again from 210 to 270 impressions.
  2. The conversion trend in both groups is increasing until 150 impressions.
  3. However, though the conversion rate increased in Control Group from 210 to 270 impressions again, the number of impressions per conversion is also high and users converted is less.
  4. So, the opportunity cost of increasing the number of impressions for such lower number of actual conversions may not be beneficial.

Question 4 - How does consumer response to advertising vary on different days of the week and at different times of the day?

a. Create a chart with conversion rates for the control group and the exposed group as a function of the day of week when they were shown the most impressions.

b. Create the same chart for hours within a day (excluding the period between midnight and 8 am).

c. What days / hours in advertising most / least effective?

campaignData %>%
 filter(userBought %in% "Yes") %>%
 ggplot() +
 aes(x = dayOfMostImpressions) +
 geom_bar(fill = "#238b45") +
 geom_text(stat = "count", aes(label=after_stat(count)), vjust = "inward") +
 labs(title = "Converted Users by Day of Week", subtitle = "Irrespective of the Group") +
 theme_minimal()

campaignData %>%
 filter(userBought %in% "Yes") %>%
 ggplot() +
 aes(x = mode_impr_hour) +
 geom_histogram(bins = 50L, fill = "#238b45") +
 geom_text(stat = "count", aes(label=after_stat(count)), vjust = "inward") +
 labs(title = "Conversions by Time", subtitle = "Irrespective of the Group") +
 theme_minimal()

campaignData %>%
 filter(userBought %in% "Yes") %>%
 ggplot() +
 aes(x = dayOfMostImpressions, y = mode_impr_hour) +
 geom_boxplot(fill = "#238b45") +
 labs(title = "Converted Users by Day of Week and Time during the day", subtitle = "Irrespective of the Group") +
 theme_minimal()

campaignData %>%
 filter(userBought %in% "Yes") %>%
 ggplot() +
 aes(x = dayOfMostImpressions, y = mode_impr_hour) +
 geom_boxplot(fill = "#238b45") +
 labs(title = "Converted Users by Day of Week and Time during the day", subtitle = "Irrespective of the Group (excluding midnight to 8 AM)") +
 theme_minimal() +
 ylim(8L, 23L)
## Warning: Removed 420 rows containing non-finite values (stat_boxplot).

Interpretation of above four charts.

  1. Most of the conversions are happening on Sunday, Monday, Tuesday and Friday with Monday being the most.
  2. Timing wise, most of the conversions are happening between 10 AM to 5 PM with 3 PM being the most.
  3. The two box plots also show that most of conversions are happening between 12 PM to 6 PM and the mean time is around 3 PM.

The above charts aren’t segregated by group. Let us segregate the data by group and analyze the conversion rates as well.

Analyze Exposed Group’s Data by week day and time.

exposedGroupWeekData <- group_by(exposedGroupData, dayOfMostImpressions)
exposedGroupWeekSummary <- summarise(exposedGroupWeekData, Users = n(), Conversions = sum(converted), TotalNoOfImpressions = sum(tot_impr))
## `summarise()` ungrouping output (override with `.groups` argument)
# Calculate the conversion rate which is percentage of users converted in the impression range
exposedGroupWeekSummary$ConversionRate <- (exposedGroupWeekSummary$Conversions / exposedGroupWeekSummary$Users) * 100

# Calculate the no. of impressions served per conversion in the same impression ranges
exposedGroupWeekSummary$ImprPerConversion <- exposedGroupWeekSummary$TotalNoOfImpressions / exposedGroupWeekSummary$Conversions

exposedGroupTimeData <- group_by(exposedGroupData, mode_impr_hour)
exposedGroupTimeSummary <- summarise(exposedGroupTimeData, Users = n(), Conversions = sum(converted), TotalNoOfImpressions = sum(tot_impr))
## `summarise()` ungrouping output (override with `.groups` argument)
# Calculate the conversion rate which is percentage of users converted in the impression range
exposedGroupTimeSummary$ConversionRate <- (exposedGroupTimeSummary$Conversions / exposedGroupTimeSummary$Users) * 100

# Calculate the no. of impressions served per conversion in the same impression ranges
exposedGroupTimeSummary$ImprPerConversion <- exposedGroupTimeSummary$TotalNoOfImpressions / exposedGroupTimeSummary$Conversions

ggplot(exposedGroupWeekSummary) +
 aes(x = dayOfMostImpressions, weight = Conversions) +
 geom_bar(fill = "#238b45") +
 geom_text(aes(y = Conversions, label = Conversions), vjust = "inward") +
 geom_line(aes(x = dayOfMostImpressions, y = ImprPerConversion), size = 1, color = "red", group = 1) +
 scale_y_continuous(name = "Conversions", sec.axis = sec_axis(~ . / 2, name = "Impressions per Conversion")) +
 geom_text(aes(y = ImprPerConversion, label = round(ImprPerConversion)), vjust = "outward") +
 labs(title = "Conversions and Impressions per Conversion by Week Day", subtitle = "Exposed Group", x = "Week Day", y = "Conversions") +
 theme_minimal()

ggplot(exposedGroupWeekSummary) +
 aes(x = dayOfMostImpressions, weight = ConversionRate) +
 geom_line(aes(x = dayOfMostImpressions, y = ConversionRate), size = 1, color = "red", group = 1) +
 geom_text(aes(x = dayOfMostImpressions, y = ConversionRate, label = format(ConversionRate, digits = 2, format = "f")), vjust = "inward") +
 labs(title = "Conversion Rate by Week Day", subtitle = "Exposed Group", x = "Week Day", y = "Conversion Rate") +
 theme_minimal()

Interpretation of these charts

  1. Maximum conversions are happening on Monday, Tuesday and Sunday.
  2. Impressions per conversion are least on Monday and Tuesday. And high on Friday and Saturday.
  3. This implies that the campaign is cost effective on Monday and Tuesday.
  4. But the conversion rate starts picking up from Sunday and reaches the highest on Monday and then starts declining.
  5. This can also be interpreted as users start seeing the ads with the onset of the weekend, consider the purchase during weekend and ultimately purchase on Sunday, Monday and Tuesday. So, we can stick to the current trend of impressions.
  6. If we think this is not the behavior, TaskaBella can consider reducing the impressions on Friday and Saturday and instead consider targeting more users on Monday and Tuesday as these are the days when we have the highest conversion rates.
ggplot(exposedGroupTimeSummary) +
 aes(x = mode_impr_hour, weight = Conversions) +
 geom_bar(fill = "#238b45") +
 geom_text(aes(y = Conversions, label = Conversions), vjust = "inward") +
 geom_line(aes(x = mode_impr_hour, y = ImprPerConversion), size = 1, color = "red", group = 1) +
 scale_y_continuous(name = "Conversions", sec.axis = sec_axis(~ . * 1, name = "Impressions per Conversion")) +
 geom_text(aes(y = ImprPerConversion, label = round(ImprPerConversion)), vjust = "outward") +
 labs(title = "Conversions by Hour of the Day", subtitle = "Exposed Group") +
 theme_minimal()

ggplot(exposedGroupTimeSummary) +
 aes(x = mode_impr_hour, weight = Conversions) +
 geom_bar(fill = "#238b45") +
 geom_text(aes(y = Conversions, label = Conversions), vjust = "inward") +
 geom_line(aes(x = mode_impr_hour, y = ImprPerConversion), size = 1, color = "red", group = 1) +
 scale_y_continuous(name = "Conversions", sec.axis = sec_axis(~ . * 1, name = "Impressions per Conversion")) +
 geom_text(aes(y = ImprPerConversion, label = round(ImprPerConversion)), vjust = "outward") +
 labs(title = "Conversions by Hour of the Day", subtitle = "Exposed Group") +
 theme_minimal() +
 xlim(8L, 23L)
## Warning: Removed 8 rows containing non-finite values (stat_count).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Warning: Removed 8 rows containing missing values (geom_text).
## Warning: Removed 8 row(s) containing missing values (geom_path).
## Warning: Removed 8 rows containing missing values (geom_text).

ggplot(exposedGroupTimeSummary) +
 aes(x = mode_impr_hour, y = ConversionRate) +
 geom_line(size = 1L, colour = "#cb181d") +
 geom_text(aes(x = mode_impr_hour, y = ConversionRate, label = format(ConversionRate, digits = 2, format = "f")), vjust = "outward") +
 labs(x = "Hour", y = "Conversion Rate", title = "Conversion Rate by Hour of the Day", subtitle = "Exposed Group") +
 theme_minimal()

Interpretation of these charts

  1. 2 PM to 4 PM is the time when most of the conversions are happening and the impressions per conversion is low.
  2. Conversion rate is on increasing trend from 10 AM to 4 PM and again increased at 8 PM.
  3. This means, customers are purchasing when they are in relaxed mood. After lunch hour and after dinner.
  4. These are the time frames when TaskaBella can consider having targeted advertising.

Analyze Control Group’s Data by week day and time.

controlGroupWeekData <- group_by(controlGroupData, dayOfMostImpressions)
controlGroupWeekSummary <- summarise(controlGroupWeekData, Users = n(), Conversions = sum(converted), TotalNoOfImpressions = sum(tot_impr))
## `summarise()` ungrouping output (override with `.groups` argument)
# Calculate the conversion rate which is percentage of users converted in the impression range
controlGroupWeekSummary$ConversionRate <- (controlGroupWeekSummary$Conversions / controlGroupWeekSummary$Users) * 100

# Calculate the no. of impressions served per conversion in the same impression ranges
controlGroupWeekSummary$ImprPerConversion <- controlGroupWeekSummary$TotalNoOfImpressions / controlGroupWeekSummary$Conversions

controlGroupTimeData <- group_by(controlGroupData, mode_impr_hour)
controlGroupTimeSummary <- summarise(controlGroupTimeData, Users = n(), Conversions = sum(converted), TotalNoOfImpressions = sum(tot_impr))
## `summarise()` ungrouping output (override with `.groups` argument)
# Calculate the conversion rate which is percentage of users converted in the impression range
controlGroupTimeSummary$ConversionRate <- (controlGroupTimeSummary$Conversions / controlGroupTimeSummary$Users) * 100

# Calculate the no. of impressions served per conversion in the same impression ranges
controlGroupTimeSummary$ImprPerConversion <- controlGroupTimeSummary$TotalNoOfImpressions / controlGroupTimeSummary$Conversions

ggplot(controlGroupWeekSummary) +
 aes(x = dayOfMostImpressions, weight = Conversions) +
 geom_bar(fill = "#238b45") +
 geom_text(aes(y = Conversions, label = Conversions), vjust = "inward") +
 geom_line(aes(x = dayOfMostImpressions, y = ImprPerConversion), size = 1, color = "red", group = 1) +
 scale_y_continuous(name = "Conversions", sec.axis = sec_axis(~ . / 2, name = "Impressions per Conversion")) +
 geom_text(aes(y = ImprPerConversion, label = round(ImprPerConversion)), vjust = "outward") +
 labs(title = "Conversions and Impressions per Conversion by Week Day", subtitle = "Control Group", x = "Week Day", y = "Conversions") +
 theme_minimal()

ggplot(controlGroupWeekSummary) +
 aes(x = dayOfMostImpressions, weight = ConversionRate) +
 geom_line(aes(x = dayOfMostImpressions, y = ConversionRate), size = 1, color = "red", group = 1) +
 geom_text(aes(x = dayOfMostImpressions, y = ConversionRate, label = format(ConversionRate, digits = 2, format = "f")), vjust = "inward") +
 labs(title = "Conversion Rate by Week Day", subtitle = "Control Group", x = "Week Day", y = "Conversion Rate") +
 theme_minimal()

Interpretation of these charts

  1. Conversions are high on Monday and Thursday. Monday matches with Exposed Group as well.
  2. The conversions are low and the impressions per conversion are high which is not effective.
  3. The conversion rates are high on Monday, Thursday and Sunday. Here Monday and Sunday are matching with Exposed Group as well.
ggplot(controlGroupTimeSummary) +
 aes(x = mode_impr_hour, weight = Conversions) +
 geom_bar(fill = "#238b45") +
 geom_text(aes(y = Conversions, label = Conversions), vjust = "inward") +
 geom_line(aes(x = mode_impr_hour, y = ImprPerConversion), size = 1, color = "red", group = 1) +
 scale_y_continuous(name = "Conversions", sec.axis = sec_axis(~ . * 1, name = "Impressions per Conversion")) +
 geom_text(aes(y = ImprPerConversion, label = round(ImprPerConversion)), vjust = "outward") +
 labs(title = "Conversions by Hour of the Day", subtitle = "Control Group") +
 theme_minimal()

ggplot(controlGroupTimeSummary) +
 aes(x = mode_impr_hour, weight = Conversions) +
 geom_bar(fill = "#238b45") +
 geom_text(aes(y = Conversions, label = Conversions), vjust = "inward") +
 geom_line(aes(x = mode_impr_hour, y = ImprPerConversion), size = 1, color = "red", group = 1) +
 scale_y_continuous(name = "Conversions", sec.axis = sec_axis(~ . * 1, name = "Impressions per Conversion")) +
 geom_text(aes(y = ImprPerConversion, label = round(ImprPerConversion)), vjust = "outward") +
 labs(title = "Conversions by Hour of the Day", subtitle = "Control Group") +
 theme_minimal() +
 xlim(8L, 23L)
## Warning: Removed 8 rows containing non-finite values (stat_count).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Warning: Removed 8 rows containing missing values (geom_text).
## Warning: Removed 8 row(s) containing missing values (geom_path).
## Warning: Removed 8 rows containing missing values (geom_text).

ggplot(controlGroupTimeSummary) +
 aes(x = mode_impr_hour, y = ConversionRate) +
 geom_line(size = 1L, colour = "#cb181d") +
 geom_text(aes(x = mode_impr_hour, y = ConversionRate, label = format(ConversionRate, digits = 2, format = "f")), vjust = "outward") +
 labs(x = "Hour", y = "Conversion Rate", title = "Conversion Rate by Hour of the Day", subtitle = "Control Group") +
 theme_minimal()

Interpretation of these charts

  1. Conversions are high at 3 to 4 PM and impressions per conversion are also low. This time window matches with Exposed Group as well.
  2. Conversion rate is on increasing trend from 10 AM to 12 PM and from 3 PM to 4 PM. This window partly matches with Exposed Group as well. But it is a mixed trend overall in Control Group.

Summary of the analysis by week day and hour

  1. Irrespective of the group, most of the conversions are happening on Sunday and Monday around 3 PM.
  2. For best results, TaskaBella can consider campaigning to start on Friday and end on Sunday from 10 AM to 6 PM.

Overall Summary of this case study

The campaign had been effective. However, for best results, TaskaBella can consider the below suggestions for next campaign.
  1. Do not have control group.If we want to maintain it for comparative study, consider it being at 2%.
  2. Maintain 1 to 150 impressions per user.
  3. Do the campaign on Friday, Saturday and Sunday.
  4. Ensure the systems can handle the traffic on Sunday and Monday around 2 to 4 PM when most of the conversions happen.