Hypothesis 1:
Null Hypothesis: There is no significant relationship between spend and revenue.
Alternative Hypothesis: There is a significant relationship between spend and revenue.
Hypothesis 2:
Null Hypothesis: Spend and whether a display campaign is running do not significantly impact revenue.
Alternative Hypothesis: Spend and whether a display campaign is running significantly impact revenue.
# Load necessary libraries (install only the ones that are missing)
for (pkg in c("ggplot2", "dplyr", "readxl")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library(readxl)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
df <- read_excel("Display_data.xlsx")
df <- as.data.frame(df)
# Simple Regression: Revenue ~ Spend
model_simple <- lm(revenue ~ spend, data = df)
# Summary of the simple regression model
summary(model_simple)
##
## Call:
## lm(formula = revenue ~ spend, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -145.210 -54.647 1.117 67.780 149.476
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.9397 37.9668 0.288 0.775
## spend 4.8066 0.7775 6.182 1.31e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 86.71 on 27 degrees of freedom
## Multiple R-squared: 0.586, Adjusted R-squared: 0.5707
## F-statistic: 38.22 on 1 and 27 DF, p-value: 1.311e-06
Hypothesis 1:
Null Hypothesis: There is no significant relationship between spend and revenue.
Alternative Hypothesis: There is a significant relationship between spend and revenue.
The spend variable has a coefficient of 4.81 with a p-value of 1.31e-06, meaning that each additional dollar of ad spend is associated with an increase in revenue of approximately $4.81. Because the p-value is far below 0.05, this relationship is highly statistically significant. The model’s R squared is 0.586, which means that 58.6% of the variation in revenue is explained by ad spend alone. The Adjusted R squared is 0.5707, only slightly below R squared, as expected for a model with a single predictor and 29 observations. Multiple R, the square root of R squared, is approximately 0.766, showing a strong positive relationship between revenue and spend. The intercept (10.94, p-value 0.775) is not significantly different from zero. Since the p-value for spend is less than 0.05, we reject the null hypothesis and conclude that there is a significant relationship between ad spend and revenue.
The model shows that higher ad spend is associated with significantly higher revenue. With every dollar spent associated with about $4.81 in revenue, we would consider this profitable as long as the margin on that revenue covers the dollar of spend. However, since around 41.4% of the variation in revenue is still unexplained, managers should explore other influential variables to maximize profit.
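The derived figures quoted in this interpretation can be re-computed from the printed summary alone. The following is a minimal sketch that uses only numbers copied from the output above; the variable names are illustrative, and the observation count of 29 is inferred from the 27 residual degrees of freedom plus the two estimated coefficients.

```r
# Values copied from the summary(model_simple) output above
r_squared <- 0.586
t_spend   <- 6.182
n_obs     <- 29  # inferred: 27 residual degrees of freedom + 2 coefficients
n_pred    <- 1   # one predictor (spend)

# Multiple R is the square root of R squared
multiple_r <- sqrt(r_squared)

# Share of the variation in revenue that the model leaves unexplained
unexplained <- 1 - r_squared

# Adjusted R squared penalizes R squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_pred - 1)

# With a single predictor, the F-statistic equals the squared t-statistic
f_from_t <- t_spend^2

print(round(c(multiple_r = multiple_r,
              unexplained = unexplained,
              adj_r_squared = adj_r_squared,
              f_from_t = f_from_t), 4))
```

Small differences from the printed summary in the last digit are expected, because the inputs here are already rounded.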
# Load the data
df <- read_excel("Display_data.xlsx")
# Convert to data frame
df <- as.data.frame(df)
# Multiple Regression: Revenue ~ Spend + Display
model_multiple <- lm(revenue ~ spend + display, data = df)
# Summary of the regression model
summary(model_multiple)
##
## Call:
## lm(formula = revenue ~ spend + display, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -176.730 -35.020 8.661 56.440 129.231
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -50.8612 40.3336 -1.261 0.21850
## spend 5.5473 0.7415 7.482 6.07e-08 ***
## display 93.5856 33.1910 2.820 0.00908 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 77.33 on 26 degrees of freedom
## Multiple R-squared: 0.6829, Adjusted R-squared: 0.6586
## F-statistic: 28 on 2 and 26 DF, p-value: 3.271e-07
Hypothesis 2:
Null Hypothesis: Spend and whether a display campaign is running do not significantly impact revenue.
Alternative Hypothesis: Spend and whether a display campaign is running significantly impact revenue.
The spend variable has a coefficient of 5.55 with a p-value of 6.07e-08, meaning that, holding display status constant, each additional dollar spent is associated with an increase in revenue of approximately $5.55. Since the p-value is less than 0.05, this relationship is highly statistically significant. The display campaign variable has a coefficient of 93.59 with a p-value of 0.00908, indicating that, at the same level of spend, revenue is approximately $93.59 higher on average when a display campaign is running. The model’s R squared is 0.6829, meaning that 68.29% of the variation in revenue is explained by spend and display campaigns together. The Adjusted R squared is 0.6586, up from 0.5707 in the simple regression, so adding display improves the fit even after the penalty for an extra predictor. Multiple R is approximately 0.826, which tells us that there is a strong relationship between revenue and the predictors. Since the p-values for both spend and display are below the significance threshold of 0.05, and the overall F-test is also significant (p-value 3.271e-07), we reject the null hypothesis that spend and display campaigns do not significantly impact revenue.
Since both spend and display campaigns significantly impact revenue, investing more in both areas is likely to increase revenue. Because the display coefficient is estimated with spend held constant, our findings suggest that display ads boost revenue over and above the effect of spend.
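To make the two coefficients concrete, the fitted equation can be evaluated by hand. This is a minimal sketch under two assumptions: `display` is coded 0 (no campaign) and 1 (campaign running), and the spend value of 50 is hypothetical, chosen only for illustration. The helper `predict_revenue` is not part of the original analysis.

```r
# Coefficients copied from the summary(model_multiple) output above
b_intercept <- -50.8612
b_spend     <- 5.5473
b_display   <- 93.5856

# Predicted revenue for a given spend and display status
# (assumption: display is 0 when no campaign runs, 1 when one does)
predict_revenue <- function(spend, display) {
  b_intercept + b_spend * spend + b_display * display
}

example_spend <- 50  # hypothetical spend level, for illustration only

without_display <- predict_revenue(example_spend, display = 0)
with_display    <- predict_revenue(example_spend, display = 1)

# The gap between the two predictions is the display coefficient,
# whatever spend level is plugged in
print(c(without_display = without_display,
        with_display    = with_display,
        display_lift    = with_display - without_display))
```

With the fitted model object available, `predict(model_multiple, newdata = ...)` would give the same predictions without retyping the coefficients.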
# Load necessary libraries (install readr only if it is missing)
if (!requireNamespace("readr", quietly = TRUE)) install.packages("readr")
library(ggplot2)
library(dplyr)
library(readr)
df <- read_csv("ab_testing.csv")
## Rows: 80 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Ads, Purchase
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Keep the control group (Ads = 0) and the two ad versions (Ads = 1, 2)
df <- df %>% filter(Ads %in% c(0, 1, 2))
# Run the regression
summary(lm(Purchase ~ factor(Ads), data = df))
##
## Call:
## lm(formula = Purchase ~ factor(Ads), data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.095 -29.881 0.405 25.980 65.905
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.381 6.766 8.185 4.36e-11 ***
## factor(Ads)1 75.714 9.569 7.912 1.21e-10 ***
## factor(Ads)2 36.557 10.290 3.553 0.000791 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.01 on 55 degrees of freedom
## Multiple R-squared: 0.5324, Adjusted R-squared: 0.5154
## F-statistic: 31.31 on 2 and 55 DF, p-value: 8.338e-10
Null Hypothesis: Advertising exposure (Ads = 1 and Ads = 2) does not significantly affect product purchase compared to no advertising exposure.
Alternative Hypothesis: Advertising exposure (Ads = 1 and Ads = 2) significantly increases product purchases compared to no advertising exposure.
The results of the model tell us that both advertising versions significantly increase product purchases compared to the control group (Ads = 0), whose average purchase is $55.38 (the intercept). Version 1 had a coefficient of 75.714, meaning an average increase of $75.71 in product purchases compared to the control group. Version 2 had a coefficient of 36.557, meaning an average increase of $36.56 in product purchases. The p-values for both Version 1 (1.21e-10) and Version 2 (0.000791) are well below the 0.05 significance threshold, meaning that these differences are statistically significant. The overall p-value for the model is 8.338e-10, which is also highly significant and supports rejecting the null hypothesis in favor of the alternative hypothesis that advertising exposure leads to significantly higher product purchases. The R-squared value of 0.5324 suggests that approximately 53.24% of the variability in product purchases can be explained by the advertising campaigns. This indicates a moderate fit of the model to the data. The Multiple R value, the square root of R-squared, is approximately 0.730 and reflects a strong correlation between the purchases predicted from advertising exposure and the observed purchases, which supports the effectiveness of the ads.
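Because `Ads` enters the model as a factor, the intercept is the control-group average and each coefficient is a difference from it. The sketch below recovers the implied group averages from the printed coefficients; the variable names are illustrative and not part of the original analysis.

```r
# Coefficients copied from the summary(lm(Purchase ~ factor(Ads))) output above
b_control  <- 55.381  # intercept: average purchase in the control group (Ads = 0)
b_version1 <- 75.714  # Version 1 minus control
b_version2 <- 36.557  # Version 2 minus control

# Average purchase implied for each group
group_means <- c(control  = b_control,
                 version1 = b_control + b_version1,
                 version2 = b_control + b_version2)

# Estimated gap between the two ad versions
v1_minus_v2 <- b_version1 - b_version2

print(round(group_means, 2))
print(round(v1_minus_v2, 2))
```

The implied averages are about 131.1 for Version 1 and 91.9 for Version 2, a gap of roughly 39.16 between the two versions.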
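Based on these findings, the company should prioritize Version 1 for future campaigns to maximize product sales, as its estimated effect is roughly twice that of Version 2 ($75.71 versus $36.56). Note that the regression compares each version with the control group, not the two versions with each other. Additionally, the high statistical significance supports the decision to invest in advertising, because it significantly influences purchases.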
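The regression output compares each ad version with the control group, so it does not by itself test Version 1 against Version 2. One way to get that comparison is to refit the model with Version 1 as the reference level. The sketch below shows the technique on simulated data; the data frame `sim`, its group sizes, and its means and standard deviation are all hypothetical and are not the values in ab_testing.csv.

```r
set.seed(42)

# Simulated stand-in data (hypothetical values, NOT the real ab_testing.csv)
sim <- data.frame(
  Ads = rep(c(0, 1, 2), each = 20),
  Purchase = c(rnorm(20, mean = 55,  sd = 30),
               rnorm(20, mean = 130, sd = 30),
               rnorm(20, mean = 90,  sd = 30))
)

# Make Version 1 the reference level, so the coefficient labelled Ads2
# estimates Version 2 minus Version 1
sim$Ads <- relevel(factor(sim$Ads), ref = "1")

fit <- lm(Purchase ~ Ads, data = sim)

# The Ads2 row carries the t-test for the Version 1 vs Version 2 difference
print(coef(summary(fit)))
```

Applied to the real data, the same `relevel()` step on `factor(df$Ads)` would show whether the gap between Version 1 and Version 2 is itself statistically significant.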