Financial Insights Through Multiple Regression

Author

Avery Holloman

In financial analytics, I often rely on multiple linear regression to uncover the relationships between variables that influence key outcomes. Its strength is that it evaluates the impact of several predictors on a dependent variable simultaneously. Using the Advertising dataset, I examine how TV, radio, and newspaper budgets relate to product sales. This analysis sharpens my statistical skills and my ability to derive actionable insights for optimizing advertising strategies.
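For reference, the model I fit in Step 4 can be written in the usual multiple linear regression form. The symbols \beta_j and \varepsilon are standard textbook notation that I am supplying here; they do not appear in the R output.

```latex
\[
\text{sales} = \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{radio} + \beta_3\,\text{newspaper} + \varepsilon
\]
```

Each \beta_j is the expected change in sales for a one-unit increase in predictor j with the other two predictors held fixed, and \varepsilon collects whatever the three budgets do not explain.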

One of the first things I always notice when working with multiple predictors is the potential for interaction and multicollinearity. These dynamics remind me of real-world complexities—predictors don’t act in isolation. For instance, TV and radio advertising may amplify each other’s effects on sales, while newspaper ads might share an overlap with radio, resulting in an apparent but misleading association. I aim to disentangle these relationships using regression techniques, allowing me to draw clear and credible conclusions.

Through this essay and analysis, I demonstrate my approach to data exploration, model fitting, and interpretation. The clarity of insights gained from interpreting regression coefficients and correlation matrices allows me to communicate findings effectively. Each step I take is a testament to my belief that statistical rigor is the foundation of meaningful decision-making.

# Loading necessary libraries
# I always start by loading essential libraries to ensure I have the tools for my analysis.
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
if (!requireNamespace("car", quietly = TRUE)) install.packages("car")
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(car)

Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

# Step 1: Loading and preparing the Advertising dataset
# I use this dataset because it provides a realistic scenario of how multiple advertising channels
# interact to drive sales. Here I work with a five-row sample, which keeps the example compact but
# leaves very little data for estimating four coefficients.
data <- data.frame(
  TV = c(230, 44, 17, 151, 180),
  radio = c(37, 39, 45, 41, 10),
  newspaper = c(69, 25, 30, 35, 45),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9)
)

# Step 2: Visualizing relationships
# Visualization is my first step to understanding patterns. It helps me grasp the data's story before
# diving into statistical models.
pairs(data, main = "Pairwise Relationships Between Variables")

# Step 3: Correlation matrix
# I calculate pairwise correlations as a first check for multicollinearity among predictors. It's
# critical for ensuring my coefficient estimates aren't distorted by redundant variables.
correlation_matrix <- cor(data)
print(correlation_matrix)

                  TV       radio  newspaper      sales
TV         1.0000000 -0.47657829  0.8680597 0.85876645
radio     -0.4765783  1.00000000 -0.2619023 0.03668154
newspaper  0.8680597 -0.26190231  1.0000000 0.80014415
sales      0.8587664  0.03668154  0.8001442 1.00000000
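Since the matrix above mixes the response with the predictors, it can help to look at the predictors alone and flag the strongly correlated pairs. The following is a minimal sketch of one way to do that; the 0.8 cutoff and the names `predictor_cor`, `cutoff`, and `flagged` are my own illustrative choices, not a standard.

```r
# Same five-row data frame as in Step 1
data <- data.frame(
  TV = c(230, 44, 17, 151, 180),
  radio = c(37, 39, 45, 41, 10),
  newspaper = c(69, 25, 30, 35, 45),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9)
)

# Correlations among the predictors only (the response column is dropped)
predictor_cor <- cor(data[, c("TV", "radio", "newspaper")])

# Assumed cutoff for "strongly correlated"; 0.8 is a judgment call, not a rule
cutoff <- 0.8

# Look only at the upper triangle so each pair is reported once
high <- which(abs(predictor_cor) > cutoff & upper.tri(predictor_cor), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(predictor_cor)[high[, "row"]],
  var2 = colnames(predictor_cor)[high[, "col"]],
  r    = predictor_cor[high]
)
print(flagged)
```

On this sample the only pair above the cutoff is TV and newspaper, which is the overlap to keep in mind when reading the newspaper coefficient later.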

# Step 4: Fitting the multiple linear regression model
# I use multiple regression because I want to assess the individual contributions of TV, radio,
# and newspaper advertising to sales. This helps me pinpoint the most impactful channels.
model <- lm(sales ~ TV + radio + newspaper, data = data)
summary(model)


Call:
lm(formula = sales ~ TV + radio + newspaper, data = data)

Residuals:
       1        2        3        4        5 
 0.02980  0.12072 -0.08368 -0.04440 -0.02243 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -1.215712   0.321777  -3.778   0.1647  
TV           0.077257   0.002052  37.653   0.0169 *
radio        0.238213   0.006864  34.707   0.0183 *
newspaper   -0.047785   0.009742  -4.905   0.1280  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1579 on 1 degrees of freedom
Multiple R-squared:  0.9998,    Adjusted R-squared:  0.9992 
F-statistic:  1605 on 3 and 1 DF,  p-value: 0.01835
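To make the fitting step less of a black box, the estimates can be reproduced by solving the normal equations directly. This is only a sketch for intuition: `lm()` itself uses a QR decomposition rather than this formula, and the names `X`, `y`, and `beta_hat` are introduced here for illustration.

```r
# Same five-row data frame and model as above
data <- data.frame(
  TV = c(230, 44, 17, 151, 180),
  radio = c(37, 39, 45, 41, 10),
  newspaper = c(69, 25, 30, 35, 45),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9)
)
model <- lm(sales ~ TV + radio + newspaper, data = data)

# Design matrix (with an intercept column) and response vector
X <- model.matrix(~ TV + radio + newspaper, data = data)
y <- data$sales

# Solve the normal equations (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# The two columns should agree up to floating-point error
print(cbind(normal_equations = drop(beta_hat), lm = coef(model)))
```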

# Step 5: Checking multicollinearity
# Multicollinearity can obscure true relationships. I calculate Variance Inflation Factor (VIF)
# to ensure the predictors are not overly correlated.
vif_results <- vif(model)
print(vif_results)

       TV     radio newspaper 
 5.562306  1.471918  4.615550
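The VIF for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the remaining ones. A minimal sketch of that calculation in base R, which should reproduce the `car::vif()` values above; the names `predictors` and `vif_manual` are mine.

```r
# Same five-row data frame as in Step 1
data <- data.frame(
  TV = c(230, 44, 17, 151, 180),
  radio = c(37, 39, 45, 41, 10),
  newspaper = c(69, 25, 30, 35, 45),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9)
)
predictors <- c("TV", "radio", "newspaper")

# For each predictor, regress it on the other two and convert R^2 to a VIF
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = data)
  1 / (1 - summary(aux)$r.squared)
})
print(round(vif_manual, 6))
```

Seen this way, a VIF of about 5.6 for TV means that roughly 82% of the variation in TV spend in this sample is already explained by the radio and newspaper budgets.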

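# Step 6: Interpreting coefficients
# Each coefficient is the expected change in sales for a one-unit increase in that channel's budget,
# holding the other variables constant. In the Advertising dataset, budgets are in thousands of
# dollars and sales in thousands of units, so the TV estimate of 0.077 means roughly 77 more units
# sold per additional $1,000 of TV spend. It's the nuance I need to guide financial decisions.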
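Point estimates are easier to interpret next to their uncertainty. As a sketch, `confint()` gives 95% confidence intervals for each coefficient and `df.residual()` shows how many degrees of freedom are left to estimate the noise; the name `ci` is introduced here for illustration.

```r
# Same five-row data frame and model as above
data <- data.frame(
  TV = c(230, 44, 17, 151, 180),
  radio = c(37, 39, 45, 41, 10),
  newspaper = c(69, 25, 30, 35, 45),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9)
)
model <- lm(sales ~ TV + radio + newspaper, data = data)

# Point estimates alongside 95% confidence intervals
ci <- confint(model, level = 0.95)
print(cbind(estimate = coef(model), ci))

# Residual degrees of freedom: observations minus estimated coefficients
print(df.residual(model))
```

With a single residual degree of freedom the intervals are wide. The intervals for TV and radio stay above zero, while the newspaper interval spans zero, which matches the p-values in the summary.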

# Step 7: Residual analysis
# Residuals help me check the model's assumptions: linearity, homoscedasticity, and normality.
# With five observations and one residual degree of freedom, these plots can only hint at problems.
par(mfrow = c(2, 2))  # I set the layout for diagnostic plots
plot(model)

Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
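My reading of the `NaNs produced` warnings is that they come from the leverage panel: with five observations and four coefficients, some leverages sit very close to 1, and the Cook's distance contours then involve square roots of negative numbers. A short sketch to inspect the leverages directly (the name `h` is mine):

```r
# Same five-row data frame and model as above
data <- data.frame(
  TV = c(230, 44, 17, 151, 180),
  radio = c(37, 39, 45, 41, 10),
  newspaper = c(69, 25, 30, 35, 45),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9)
)
model <- lm(sales ~ TV + radio + newspaper, data = data)

# Leverage (hat value) of each observation; each lies between 0 and 1
h <- hatvalues(model)
print(round(h, 4))

# Leverages always sum to the number of estimated coefficients (4 here)
print(sum(h))
```

Because the five leverages must sum to 4, their average is 0.8, so nearly every point is highly influential and the diagnostic plots have little independent information to show.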

# Step 8: Predicting sales
# I make predictions to illustrate how my model works in action. It helps me see the practical
# implications of my analysis.
new_data <- data.frame(TV = 200, radio = 30, newspaper = 40)
predicted_sales <- predict(model, newdata = new_data)
print(paste("Predicted sales for new inputs:", round(predicted_sales, 2)))

[1] "Predicted sales for new inputs: 19.47"
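A single predicted number hides how uncertain it is. As a sketch, `predict()` can return a 95% prediction interval alongside the point estimate by setting `interval = "prediction"`; the name `pred` is introduced here for illustration.

```r
# Same five-row data frame and model as above
data <- data.frame(
  TV = c(230, 44, 17, 151, 180),
  radio = c(37, 39, 45, 41, 10),
  newspaper = c(69, 25, 30, 35, 45),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9)
)
model <- lm(sales ~ TV + radio + newspaper, data = data)

# Same hypothetical budget mix as in Step 8
new_data <- data.frame(TV = 200, radio = 30, newspaper = 40)

# Columns: fit (point prediction), lwr and upr (95% prediction interval bounds)
pred <- predict(model, newdata = new_data, interval = "prediction", level = 0.95)
print(round(pred, 2))
```

The `fit` column reproduces the 19.47 above, and the `lwr` and `upr` columns show how much room a five-row fit leaves around it.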

# Step 9: Summary and conclusions
# Summarizing the insights derived from the regression model is crucial for translating data
# into actionable strategies.
cat("Key Takeaways:\n")

Key Takeaways:

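cat("- TV and radio have statistically significant positive coefficients (p < 0.05).\n")

- TV and radio have statistically significant positive coefficients (p < 0.05).

cat("- Newspaper's coefficient is not significant (p = 0.128); its 0.87 correlation with TV makes its own effect hard to isolate.\n")

- Newspaper's coefficient is not significant (p = 0.128); its 0.87 correlation with TV makes its own effect hard to isolate.

cat("- The fit favors prioritizing TV and radio, but with one residual degree of freedom it should be confirmed on more data.\n")

- The fit favors prioritizing TV and radio, but with one residual degree of freedom it should be confirmed on more data.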