Advertising Effectiveness at BBQ2GO: A Multiple Linear Regression Analysis
Abstract
As the owner of BBQ2GO, I want to evaluate the effectiveness of advertising expenditures across three channels: social media, direct mail, and newspapers. Using Multiple Linear Regression (MLR), I aim to quantify the impact of each advertising medium on sales, identify areas for budget optimization, and address potential issues like multicollinearity. By leveraging R for data analysis, I develop a reproducible framework that integrates data cleaning, model fitting, and model diagnostics. This analysis provides actionable insights into advertising strategies while highlighting the utility of MLR in business decision-making.
Introduction
Advertising is a crucial driver of revenue in the competitive food industry. To understand how advertising expenditures influence sales, I employ MLR, a well-established statistical method that estimates the simultaneous effect of multiple predictors on a single response variable. Unlike simple linear regression, MLR allows me to isolate the effect of each advertising channel while holding the others constant. By using RStudio and Quarto, I ensure that my analysis is transparent, reproducible, and comprehensive.
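Concretely, the model I estimate takes the form

sales = β0 + β1 · social_media + β2 · direct_mail + β3 · newspaper + ε

where each coefficient βj measures the expected change in sales from a one-unit increase in spending on that channel, holding the other two channels fixed, and ε is random error.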
Methodology and Code
Data Preparation
The first step in any analysis is ensuring data quality. I simulate a dataset that mimics BBQ2GO’s advertising expenditures and sales, an approach that lets me control the complexity of the data while reflecting realistic patterns.
# Load the required package
if (!requireNamespace("tibble", quietly = TRUE)) {
  install.packages("tibble")
}
library(tibble)

# Step 1: Simulate the data
set.seed(42) # I set the seed to ensure reproducibility of results

# I create variables to simulate advertising data
raw_data <- tibble(
  sales = rnorm(100, mean = 500, sd = 100),       # Sales (dependent variable)
  social_media = rnorm(100, mean = 200, sd = 50), # Social media advertising
  direct_mail = rnorm(100, mean = 150, sd = 40),  # Direct mail advertising
  newspaper = rnorm(100, mean = 80, sd = 20)      # Newspaper advertising
)

# View the simulated data
head(raw_data)
I observed no missing values, which was a relief because missing data often complicates analyses.
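A quick base-R check confirms this (a minimal sketch, since the code for this step is not shown in the original):

# Step 2: Check for missing values in each column (sketch; all zeros mean no NA values)
colSums(is.na(raw_data))

With completeness confirmed, I next focus on identifying and handling outliers.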
Data Cleaning
Outliers can bias the model coefficients, leading to misleading interpretations. I used the interquartile range (IQR) rule to detect and mitigate extreme values: any observation below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is flagged as an outlier and winsorized, that is, clamped to the nearest bound.
# Load the required package
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)
# Step 3: Handle outliers using the IQR rule
outlier_limits <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75)) # I calculate the 25th and 75th percentiles
  iqr <- IQR(x)                           # I calculate the interquartile range (IQR)
  lower <- q[1] - 1.5 * iqr               # I compute the lower bound for outliers
  upper <- q[2] + 1.5 * iqr               # I compute the upper bound for outliers
  c(lower, upper)                         # I return the lower and upper limits as a vector
}

# I apply the IQR rule to all predictors and winsorize (clamp) outliers
winsorize <- function(x) {
  limits <- outlier_limits(x)
  pmin(pmax(x, limits[1]), limits[2])     # values outside the limits are pulled back to the bounds
}

cleaned_data <- raw_data %>%
  mutate(across(c(social_media, direct_mail, newspaper), winsorize))

# View the cleaned data
head(cleaned_data)
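As a sanity check on the cleaning step (a minimal sketch, not part of the original pipeline), comparing each predictor's range before and after winsorizing verifies the clamping: the cleaned ranges should lie at or within the IQR-based limits.

# Compare predictor ranges before and after winsorizing (sketch)
sapply(raw_data[c("social_media", "direct_mail", "newspaper")], range)
sapply(cleaned_data[c("social_media", "direct_mail", "newspaper")], range)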