I worked with Julia French and Iqbal Parekh on this. We chose to analyze Lab 7 for this week’s data dive.
First, the tidyverse library was loaded:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The main dataset in Lab 7 was the marketing data, which is loaded here:
url_ <- "https://raw.githubusercontent.com/leontoddjohnson/i590/main/data/marketing/marketing.csv"
marketing <- read_delim(url_, delim = ",")
## Rows: 40 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (8): spend, clicks, impressions, display, transactions, revenue, ctr, co...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The main issue is that we lack complete knowledge of how this data was collected, which company it was collected for, and other context about the dataset. This makes it hard to judge whether the dataset is biased or applicable to broader uses. As a result, any findings may not apply outside this specific company, or even outside this specific advertising campaign, and further analysis would be needed before generalizing them to any other company. That analysis could include understanding what the display looked like (was its graphic design more aesthetically pleasing than other ad displays, drawing more attention to it?), what the display advertised (items many people already had in their carts, or poor sellers?), where on the site or email the display was shown (placement may have a greater impact than simply having a display), and what kind of company was running these displays (is this Amazon, a clothing retailer, etc.?).
Additionally, several columns are poorly documented. It is unclear what spend, ctr, and con_rate represent. The lab suggests ctr is clicks per impression, but calculating clicks per impression for each day does not reproduce the recorded values. Without better documentation explaining these columns, it is difficult to draw conclusions from them, and the dataset ends up being far less useful than it could be.
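One way to make the ctr discrepancy concrete is to recompute clicks per impression and compare it with the recorded column. The sketch below is only an illustration: it uses a small made-up data frame (the values are not taken from marketing.csv) and a hypothetical helper, check_ctr(), whose tolerance is an assumption. The column names clicks, impressions, and ctr are the ones in the dataset.

```r
# Toy stand-in for the marketing data; these values are invented for illustration.
toy <- data.frame(
  clicks      = c(50, 80, 120),
  impressions = c(2000, 4000, 3000),
  ctr         = c(0.025, 0.50, 0.04)  # second row is deliberately inconsistent
)

# Hypothetical helper: recompute clicks per impression and flag rows where
# the recorded ctr differs from the recomputed value by more than `tol`.
check_ctr <- function(df, tol = 1e-6) {
  df$ctr_recomputed <- df$clicks / df$impressions
  df$ctr_matches <- abs(df$ctr - df$ctr_recomputed) <= tol
  df
}

checked <- check_ctr(toy)
print(checked)
```

Applying the same helper to marketing could help show whether ctr differs from the ratio by a constant factor (for example, a percentage rather than a proportion) or appears unrelated to it.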
Another issue is that there is no demographic or usage data on the customers who clicked versus those who did not. A display campaign may increase clicks per impression overall, but looking at subsets of customers by demographics or customer type (such as regular customers, or customers who produce a large amount of revenue per transaction) might reveal that the campaign was effective only for some segment of the customer base and ineffective or even detrimental for others. In some cases this may be the ideal result, if the campaign was targeted at a particular subset of customers, but for a more general campaign it may not be desirable. Since this information is not in the dataset, the underlying differences in clicks per impression or revenue between groups cannot be visualized. This limits how far conclusions drawn from this dataset can be applied to other campaigns and companies, because the full picture of the campaign’s success or lack thereof is unclear. It is also a bias concern: customers who make up a minority of the customer base may not be catered to by display campaigns, since their reactions to previous campaigns cannot be seen. Thus, the company may find itself appealing to an increasingly narrow demographic because it cannot adequately notice and respond to the smaller groups within its customer base, which could in turn hurt overall revenue.
A final consideration to note (which is also mentioned in Lab 7) is that it is unknown if other campaigns ran during the same period the display campaign was run or not run, which may prove to be a confounding variable in analyzing the display campaign’s effect on things like revenue or clicks per impression.
One of the important visualizations made was a set of box plots comparing revenue between days when the advertising display was up and days when it was not. This is shown here:
marketing |>
  ggplot() +
  geom_boxplot(mapping = aes(x = revenue,
                             y = factor(display, levels = c(0, 1),
                                        labels = c("Normal", "With Display")))) +
  labs(title = "Advertisement Effect on Revenue",
       x = "Revenue (in dollars)",
       y = "Advertisement Variation") +
  theme_minimal()
First, one concern is that revenue is a fairly volatile measure of display success since revenue is not consistent, especially for this company. Revenue in the $100s is not that large, and a few customers choosing to buy items at full price that day could easily elevate revenue for the day regardless of the display’s presence.
There are also no dates listed for the days on which revenue was measured, which may be a confounding factor: special sales days like Black Friday may generate more revenue, or the site may simply get more sales on certain days of the week (for instance, people might shop more on weekends). This may also limit the company's ability to apply any conclusions about this revenue data to other times of year. For instance, if this campaign was run during the holiday season, people may have been more receptive to spending in general, and thus more susceptible to advertising, whereas the same campaign run during an off-season might not produce similar results.
Another concern relates back to the issue of not knowing what products the display advertised. An advertisement promoting higher-cost items would have a much greater effect on revenue for a company like this than one promoting lower-cost items. Thus, the advertising campaign may appear far more effective and significant than it actually was. A display campaign that brings increased revenue may seem successful initially, but if that translates to only a few extra sales compared to normal, it may not be nearly the success it first appeared to be. This is especially concerning given the small number of transactions in this dataset. The most transactions on any given day is 9, which is a very small number and could easily be skewed by a few expensive items or by one customer spending a significant amount in a single purchase.
However, the one positive thing of note was that revenue was approximately normal for both groups, as seen below:
marketing |>
  ggplot() +
  geom_histogram(mapping = aes(x = revenue,
                               fill = factor(display, levels = c(0, 1),
                                             labels = c("Normal", "With Display"))),
                 color = "black") +
  # the y-axis of a histogram is a count of days, not the advertisement variation
  labs(title = "Advertisement Effect on Revenue",
       x = "Revenue (in dollars)",
       y = "Number of Days",
       fill = "Legend") +
  scale_fill_manual(values = c("darkgreen", "lightblue")) +
  theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This matters for hypothesis testing. Bootstrapping can be, and was, used to check that the distribution of the difference between the two groups was approximately normal, but a t-test also assumes that the samples themselves come from approximately normal distributions, which appears to hold for both groups in this case. Also, since this was a small sample (only 40 entries, 20 per group), a two-sample t-test was used, which was an appropriate choice of hypothesis test for this data. One remaining concern about normality is that breaking the groups down further by demographics may produce subgroups in which revenue is not normally distributed.
When factoring in all of these concerns, it is easy to see why the hypothesis test for the difference in revenue was not statistically significant. Thus, the null hypothesis that the display campaign made no difference in revenue could not be rejected. There are simply too many underlying, unmeasured factors that can affect revenue, particularly for a company of such a small size, to conclude that the difference in revenue due to the ad campaign is anything other than zero.
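As a rough illustration of the normality check and the two-sample t-test described above, the following sketch runs both on simulated data. Everything about the data is an assumption: the group means and standard deviations are invented, and only the group size (20 days each) comes from the text. The sketch uses the Welch variant, which is the default of t.test() in R; it is not stated here which variant the lab used.

```r
set.seed(42)  # make the simulated example reproducible

# Simulated stand-in for the marketing data: 20 days per group, as in the text.
# The means and standard deviations are invented for illustration only.
sim <- data.frame(
  display = rep(c(0, 1), each = 20),
  revenue = c(rnorm(20, mean = 280, sd = 60), rnorm(20, mean = 300, sd = 60))
)

# Shapiro-Wilk test within each group; a small p-value suggests non-normality.
normality_p <- tapply(sim$revenue, sim$display,
                      function(x) shapiro.test(x)$p.value)

# Welch two-sample t-test (does not assume equal variances between groups).
t_result <- t.test(revenue ~ display, data = sim)

print(normality_p)
print(t_result)
```

Replacing sim with marketing would run the same checks on the real revenue column; with only 20 observations per group, the Shapiro-Wilk test has limited power, so it is best read alongside the histogram rather than instead of it.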
Overall, revenue was a poor choice of measuring display success for this company, especially in light of other data available in this dataset.
Looking at clicks and impressions, and especially at clicks per impression, would be a better choice for determining the effectiveness of the campaign (as visualized below):
marketing |>
ggplot() +
geom_boxplot(mapping = aes(x = (clicks/impressions),
y = factor(display, levels = c(0, 1),
labels = c("Normal", "With Display")))) +
labs(title = "Advertisement Effect on Clicks Per Impression",
x = "Clicks Per Impression",
y = "Advertisement Variation") +
theme_minimal()
First, a clear difference between groups is shown: the group medians are almost at opposite ends of the graph, and even the outliers do not overlap. The difference is clear visually, and hypothesis testing was also performed in Lab 7 to test the null hypothesis that there is no difference in clicks per impression between the groups. This is shown via the Chi-Squared and Fisher’s Exact Tests below:
ctr_table <- marketing |>
group_by(display) |>
summarize(clicks = sum(clicks),
non_clicks = sum(impressions) - sum(clicks))
ctr_table
## # A tibble: 2 × 3
## display clicks non_clicks
## <dbl> <dbl> <dbl>
## 1 0 5706 279567
## 2 1 5663 181957
The Chi-Squared Test:
chisq.test(select(ctr_table, clicks, non_clicks))
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: select(ctr_table, clicks, non_clicks)
## X-squared = 499.61, df = 1, p-value < 2.2e-16
The Fisher’s Exact Test:
fisher.test(select(ctr_table, clicks, non_clicks))
##
## Fisher's Exact Test for Count Data
##
## data: select(ctr_table, clicks, non_clicks)
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.6316868 0.6808014
## sample estimates:
## odds ratio
## 0.655815
First and foremost, both tests returned a p-value below 2.2e-16, which is extremely low. This is strong evidence against the null hypothesis that clicks per impression are the same on days with and without the display campaign. The odds ratio of about 0.66 compares no-display days to display days (the row order of the table), so the odds of a click were lower without the display. This agrees with the box plots above, which favor the display campaign.
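To see the size of the effect behind these p-values, the click rates and the sample odds ratio can be recomputed directly from the contingency table. This is a minimal sketch that copies the four counts from the ctr_table output above; the vector names are chosen here for illustration. Note that the simple cross-product odds ratio computed below is not exactly the conditional estimate that fisher.test() reports, though the two are very close for counts this large.

```r
# Counts copied from the contingency table above.
clicks      <- c(no_display = 5706, display = 5663)
non_clicks  <- c(no_display = 279567, display = 181957)
impressions <- clicks + non_clicks

# Clicks per impression for each group.
ctr <- clicks / impressions

# Sample odds ratio, no-display relative to display (row order of the table).
odds <- clicks / non_clicks
odds_ratio <- unname(odds["no_display"] / odds["display"])

print(round(ctr, 4))
print(round(odds_ratio, 4))
```

The rates come out to roughly 2.0% of impressions clicked without the display and roughly 3.0% with it, so the statistically significant result corresponds to a difference of about one percentage point.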
However, there is one primary concern when looking at the contingency table: there were far fewer impressions on days with the display than on days without it (almost 100,000 fewer). This may suggest that the display campaign was not deployed effectively, since far fewer people visited the site on days when the display was up, and it makes the true effectiveness of the campaign difficult to measure. Was the display campaign only active on days when sales and revenue would normally be low, or during a slower sales period for the company, to try to boost sales? Would a stronger effect on revenue or clicks per impression be visible if the campaign had been active on days when impressions were normally higher? Ideally, comparing display and non-display days with impressions of similar magnitude would be more valuable for determining the campaign's effectiveness.
This is also where the inability to break the dataset down by type of customer becomes particularly concerning. Different groups are unlikely to have similar clicks-per-impression rates, so a lot of valuable information about the success of the display campaign is unavailable. Thus, the conclusion that the ad campaign increased clicks per impression may not apply to every campaign. If this campaign was designed for the general audience of the website, and that audience is 70% men, it may seem a success. However, breaking down the results further could show that the campaign was effective for men but not for women. A similar campaign with the goal of increasing the amount women spend on the site may then fail, because the original campaign’s ineffectiveness with women was never noticed.
Overall, the quality of the hypothesis testing and of the conclusions drawn from this dataset is hindered tremendously by the narrowness of the dataset and the lack of clarity about the context of its creation. There are a significant number of potentially confounding variables for both revenue and clicks per impression, including the dates of the campaign, demographic differences in responses to the campaign, and large day-to-day fluctuations in revenue due to the small amount of revenue and the size of the business. Additionally, conclusions based on this dataset could lead to campaigns that fail to reach specific portions of the customer base, because there is no information or insight into how different demographics react to the campaign.
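One simple way to compare days of similar impression magnitude is to keep only the days whose impressions fall in the range shared by both groups. The sketch below illustrates the idea on a made-up data frame; the values are not from marketing.csv, and overlap_days() is a hypothetical helper, not something used in the lab. It is a crude form of matching and would shrink an already small sample.

```r
# Toy stand-in for the marketing data; values are invented for illustration.
toy <- data.frame(
  display     = c(0, 0, 0, 0, 1, 1, 1, 1),
  impressions = c(9000, 14000, 16000, 20000, 6000, 9500, 13000, 15000),
  clicks      = c(180, 290, 310, 400, 190, 280, 400, 450)
)

# Hypothetical helper: keep only days whose impressions fall inside the range
# shared by both groups, so the comparison uses days of similar traffic.
overlap_days <- function(df) {
  ranges <- tapply(df$impressions, df$display, range)
  lower <- max(sapply(ranges, `[`, 1))  # largest of the group minimums
  upper <- min(sapply(ranges, `[`, 2))  # smallest of the group maximums
  df[df$impressions >= lower & df$impressions <= upper, ]
}

matched <- overlap_days(toy)

# Clicks per impression by group, using the matched days only.
matched_ctr <- tapply(matched$clicks, matched$display, sum) /
  tapply(matched$impressions, matched$display, sum)

print(matched)
print(matched_ctr)
```

If the difference in clicks per impression persisted on the matched days of the real data, that would make it less likely that the result is driven purely by when the display happened to run.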