---
title: "Multiple Linear Regression"
author: "<136814, Jason Mwangi>"
date: "2025-07-10"
output:
  word_document:
    toc: true
    toc_depth: 4
    number_sections: true
    fig_width: 5
    keep_md: true
  html_notebook:
    toc: true
    toc_depth: 4
    number_sections: true
    fig_width: 5
    self_contained: false
  html_document:
    toc: true
    toc_depth: 4
    number_sections: true
    fig_width: 5
    fig_height: 5
    self_contained: false
    keep_md: true
  pdf_document:
    toc: true
    toc_depth: 4
    number_sections: true
    fig_width: 5
    fig_height: 5
    fig_crop: false
    keep_tex: true
    latex_engine: xelatex
---
if (!"pacman" %in% installed.packages()[, "Package"]) {
install.packages("pacman", dependencies = TRUE)
library("pacman", character.only = TRUE)}
pacman::p_load("here")
knitr::opts_knit$set(root.dir = here::here())
pacman::p_load("readr")
advertising_data <- read_csv("/advertising.csv")
Rows: 10 Columns: 4
── Column specification ───────────────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): YouTube, TikTok, Facebook, Sales
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(advertising_data)
dim(advertising_data)
[1] 10 4
sapply(advertising_data, class)
YouTube TikTok Facebook Sales
"numeric" "numeric" "numeric" "numeric"
str(advertising_data)
spc_tbl_ [10 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ YouTube : num [1:10] 1200 1500 1300 1600 1100 1700 1400 1800 1250 1550
$ TikTok : num [1:10] 800 900 850 950 780 1000 880 1020 820 970
$ Facebook: num [1:10] 1000 1100 1050 1150 980 1200 1080 1220 1010 1130
$ Sales : num [1:10] 95.3 101.2 99.5 106.8 93 ...
- attr(*, "spec")=
.. cols(
.. YouTube = col_double(),
.. TikTok = col_double(),
.. Facebook = col_double(),
.. Sales = col_double()
.. )
- attr(*, "problems")=<externalptr>
summary(advertising_data)
YouTube TikTok Facebook Sales
Min. :1100 Min. : 780.0 Min. : 980 Min. : 93.02
1st Qu.:1262 1st Qu.: 827.5 1st Qu.:1020 1st Qu.: 97.27
Median :1450 Median : 890.0 Median :1090 Median :101.95
Mean :1440 Mean : 897.0 Mean :1092 Mean :101.93
3rd Qu.:1588 3rd Qu.: 965.0 3rd Qu.:1145 3rd Qu.:106.17
Max. :1800 Max. :1020.0 Max. :1220 Max. :111.82
sapply(advertising_data[,], var)
YouTube TikTok Facebook Sales
52111.11111 7267.77778 6951.11111 36.13165
sapply(advertising_data[,],sd)
YouTube TikTok Facebook Sales
228.27858 85.25126 83.37332 6.01096
pacman::p_load("e1071")
sapply(advertising_data[,], kurtosis, type =2)
YouTube TikTok Facebook Sales
-1.0800892 -1.4726766 -1.1989242 -0.8697324
sapply(advertising_data[,], skewness, type=2)
YouTube TikTok Facebook Sales
0.08266188 0.09234643 0.19089944 0.10505432
cov(advertising_data, method = "spearman")
YouTube TikTok Facebook Sales
YouTube 9.166667 9.055556 9.166667 9.055556
TikTok 9.055556 9.166667 9.055556 8.944444
Facebook 9.166667 9.055556 9.166667 9.055556
Sales 9.055556 8.944444 9.055556 9.166667
cor.test(advertising_data$Sales, advertising_data$YouTube, method = "spearman")
Spearman's rank correlation rho
data: advertising_data$Sales and advertising_data$YouTube
S = 2, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.9878788
cor.test(advertising_data$Sales, advertising_data$TikTok, method = "spearman")
Spearman's rank correlation rho
data: advertising_data$Sales and advertising_data$TikTok
S = 4, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.9757576
cor.test(advertising_data$Sales, advertising_data$Facebook, method = "spearman")
Spearman's rank correlation rho
data: advertising_data$Sales and advertising_data$Facebook
S = 2, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.9878788
cor(advertising_data, method = "spearman")
YouTube TikTok Facebook Sales
YouTube 1.0000000 0.9878788 1.0000000 0.9878788
TikTok 0.9878788 1.0000000 0.9878788 0.9757576
Facebook 1.0000000 0.9878788 1.0000000 0.9878788
Sales 0.9878788 0.9757576 0.9878788 1.0000000
par(mfrow = c(1,2))
for (i in 1:4) {
  if (is.numeric(advertising_data[[i]])) {
    hist(advertising_data[[i]], main = names(advertising_data)[i], xlab = names(advertising_data)[i])
  } else {
    message(paste("column", names(advertising_data)[i], "is not numeric and will be skipped."))
  }
}
par(mfrow = c(1,2))
for (i in 1:4) {
  if (is.numeric(advertising_data[[i]])) {
    boxplot(advertising_data[[i]], main = names(advertising_data)[i])
  } else {
    message(paste("column", names(advertising_data)[i], "is not numeric and will be skipped."))
  }
}
pacman::p_load("Amelia")
missmap(advertising_data, col= c("red","grey"), legend= TRUE)
pacman::p_load("ggcorrplot")
ggcorrplot(cor(advertising_data[,]))
pacman::p_load("corrplot")
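# Sales ~ . plots Sales against every other column; writing advertising_data$Sales on the left-hand side would add Sales to the matrix twice
pairs(Sales ~ ., data = advertising_data)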
pacman::p_load("ggplot2")
ggplot(advertising_data, aes(x=YouTube, y=Sales)) + geom_point() + geom_smooth(method = lm) + labs(title="Relationship between Sales Revenue and\nExpenditure on YouTube Marketing", x= "Expenditure", y="Sales")
pacman::p_load("dplyr")
advertising_data_composite <- advertising_data %>%
mutate(Total_Expenditure = YouTube + TikTok + Facebook)
ggplot(advertising_data_composite, aes(x=Total_Expenditure, y=Sales)) + geom_point() + geom_smooth(method = lm) + labs(title = "Relationship between Sales Revenue and \nTotal Marketing Expenditure", y="Sales" )
mlr_test <- lm(Sales ~ YouTube + TikTok + Facebook, data = advertising_data)
summary(mlr_test)
Call:
lm(formula = Sales ~ YouTube + TikTok + Facebook, data = advertising_data)
Residuals:
Min 1Q Median 3Q Max
-1.5436 -0.6159 0.1056 0.6598 1.5978
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.019096 24.432018 1.311 0.238
YouTube 0.006088 0.015876 0.384 0.715
TikTok -0.007662 0.032263 -0.237 0.820
Facebook 0.062289 0.048787 1.277 0.249
Residual standard error: 1.2 on 6 degrees of freedom
Multiple R-squared: 0.9734, Adjusted R-squared: 0.9602
F-statistic: 73.32 on 3 and 6 DF, p-value: 4.055e-05
confint(mlr_test, level = 0.95)
2.5 % 97.5 %
(Intercept) -27.76389745 91.80208927
YouTube -0.03275855 0.04493539
TikTok -0.08660653 0.07128263
Facebook -0.05708823 0.18166580
plot(mlr_test, which = 1)
pacman::p_load("lmtest")
dwtest(mlr_test)
Durbin-Watson test
data: mlr_test
DW = 2.1498, p-value = 0.5316
alternative hypothesis: true autocorrelation is greater than 0
plot(mlr_test, which=2)
plot(mlr_test, which = 3)
pacman::p_load("gvlma")
gvlma_results <- gvlma(mlr_test)
summary(gvlma_results)
Call:
lm(formula = Sales ~ YouTube + TikTok + Facebook, data = advertising_data)
Residuals:
Min 1Q Median 3Q Max
-1.5436 -0.6159 0.1056 0.6598 1.5978
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.019096 24.432018 1.311 0.238
YouTube 0.006088 0.015876 0.384 0.715
TikTok -0.007662 0.032263 -0.237 0.820
Facebook 0.062289 0.048787 1.277 0.249
Residual standard error: 1.2 on 6 degrees of freedom
Multiple R-squared: 0.9734, Adjusted R-squared: 0.9602
F-statistic: 73.32 on 3 and 6 DF, p-value: 4.055e-05
ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance = 0.05
Call:
gvlma(x = mlr_test)
pacman::p_load("car")
vif(mlr_test)
YouTube TikTok Facebook
82.13782 47.30912 103.46518
A simultaneous multiple linear regression analysis was conducted on data from 10 observations (N = 10) to examine whether advertising expenditures on YouTube, TikTok, and Facebook collectively predict Sales. The overall model was statistically significant and explained 97.34% of the variance in Sales (multiple R² = .97, adjusted R² = .96, F(3, 6) = 73.32, p < .001). However, none of the individual predictors was statistically significant when controlling for the other two (all p > .05): YouTube (b = 0.01, 95% CI [-.03, .04], SE = 0.02, t(6) = 0.38, p = .715), TikTok (b = -0.01, 95% CI [-.09, .07], SE = 0.03, t(6) = -0.24, p = .820), and Facebook (b = 0.06, 95% CI [-.06, .18], SE = 0.05, t(6) = 1.28, p = .249). The intercept was 32.02 (95% CI [-27.76, 91.80], SE = 24.43, t(6) = 1.31, p = .238). The residual standard error was 1.2 on 6 degrees of freedom, which is small relative to the mean Sales value of 101.93, so the fitted values track the observed Sales closely. The full results are presented in the table below.

Table: Regression Coefficients Predicting Sales from Multiple Advertising Channels

| Predictor   | b     | 95% CI          | SE    | t(6)  | p    |
|-------------|-------|-----------------|-------|-------|------|
| (Intercept) | 32.02 | [-27.76, 91.80] | 24.43 | 1.31  | .238 |
| YouTube     | 0.01  | [-.03, .04]     | 0.02  | 0.38  | .715 |
| TikTok      | -0.01 | [-.09, .07]     | 0.03  | -0.24 | .820 |
| Facebook    | 0.06  | [-.06, .18]     | 0.05  | 1.28  | .249 |

Note. N = 10; b = unstandardized regression coefficient; SE = standard error; CI = confidence interval.
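The t statistics, p values, and confidence intervals reported above all follow from the coefficient estimates and their standard errors. The sketch below is an illustration rather than part of the original analysis: it re-derives them by hand, assuming the estimates and standard errors printed by `summary(mlr_test)` and 6 residual degrees of freedom. The object names (`est`, `se`, `df_resid`, and so on) are introduced here for the example only.

```r
# Estimates and standard errors copied from the summary(mlr_test) output
est <- c(Intercept = 32.019096, YouTube = 0.006088,
         TikTok = -0.007662, Facebook = 0.062289)
se  <- c(Intercept = 24.432018, YouTube = 0.015876,
         TikTok = 0.032263, Facebook = 0.048787)
df_resid <- 6  # 10 observations minus 4 estimated coefficients

# t statistic: estimate divided by its standard error
t_value <- est / se

# Two-sided p value from the t distribution
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical t value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

print(round(cbind(t = t_value, p = p_value, ci), 3))
```

If the copied values are accurate, the printed intervals should agree with the `confint(mlr_test, level = 0.95)` output up to rounding.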
Although the model as a whole fits the data well and the advertising expenditures collectively predict Sales, no individual coefficient reached statistical significance when controlling for the other channels. The channels therefore jointly explain variation in Sales but cannot be shown to predict it uniquely in this small sample. The main cause is severe multicollinearity among the advertising platforms: the variance inflation factors (YouTube = 82.14, TikTok = 47.31, Facebook = 103.47) far exceed the commonly used threshold of 10, and the Spearman correlations among the three channels range from .99 to 1.00. Limited statistical power due to the small sample size (N = 10) compounds the problem. Future research should investigate these predictors with a larger sample in which expenditure on each channel varies more independently of the others.
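The collinearity among the advertising platforms can be checked without any extra package. The sketch below is an illustration, not the method used in the original analysis (which calls `car::vif()`): it computes each variance inflation factor as 1 / (1 - R²), where R² comes from regressing one predictor on the other two. It assumes the predictor values are exactly those printed by `str(advertising_data)`; `predictors` and `vif_manual` are names introduced for this example.

```r
# Predictor values copied from the str(advertising_data) output
predictors <- data.frame(
  YouTube  = c(1200, 1500, 1300, 1600, 1100, 1700, 1400, 1800, 1250, 1550),
  TikTok   = c(800, 900, 850, 950, 780, 1000, 880, 1020, 820, 970),
  Facebook = c(1000, 1100, 1050, 1150, 980, 1200, 1080, 1220, 1010, 1130)
)

# For each predictor, regress it on the remaining predictors and
# convert the resulting R-squared into a variance inflation factor
vif_manual <- sapply(names(predictors), function(target) {
  others  <- setdiff(names(predictors), target)
  aux_fit <- lm(reformulate(others, response = target), data = predictors)
  1 / (1 - summary(aux_fit)$r.squared)
})

print(round(vif_manual, 2))
```

Values far above 10 mean that almost all of a channel's variation is already explained by the other two channels, which is why the model cannot separate their individual effects.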
Business Analysis
Although aggregate digital advertising spend across YouTube, TikTok, and Facebook is highly predictive of Sales (accounting for nearly all observed variation), the absence of statistically significant individual coefficients indicates that no single channel can be reliably credited with driving incremental Sales in this dataset.
This finding suggests that, within the current investment levels and the constraints of a small sample, the three platforms function as a cohesive portfolio rather than as independent drivers of sales revenue.
Recommendation for management:
1. Continue to view YouTube, TikTok, and Facebook as complementary elements of a unified digital marketing strategy, focusing on total expenditure rather than favouring a single platform.
Limitations
1. Small Sample Size (N = 10): Using a limited number of observations restricts statistical power and inflates standard errors, raising the risk of a Type II error (failing to detect true channel effects).
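2. Multicollinearity: High intercorrelations among YouTube, TikTok, and Facebook expenditures (variance inflation factors of 82.14, 47.31, and 103.47, respectively) inflate the standard errors and obscure each channel's unique contribution.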
3. Restricted Expenditure Range: Limited range of advertisement expenditures impairs the ability to detect linear effects.
4. Methodology: Lack of experimental variation in advertisement expenditure limits causal attribution to any single platform.
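The first three limitations act on the same quantity, the standard error of each slope. A standard ordinary least squares identity, added here as an illustration and not stated in the original analysis, makes this explicit. For predictor j, with residual standard error σ̂, sample size n, sample variance s_j² of the predictor, and variance inflation factor VIF_j:

$$
\mathrm{SE}_j = \sqrt{\frac{\hat{\sigma}^2}{(n-1)\,s_j^2} \times \mathrm{VIF}_j}
$$

A small n, a narrow expenditure range (small s_j²), and a large VIF_j each make the standard error larger. Using the values reported above for Facebook (σ̂ = 1.2, n = 10, s_j² = 6951.11, VIF_j = 103.47):

$$
\mathrm{SE}_{\text{Facebook}} = \sqrt{\frac{1.2^2}{9 \times 6951.11} \times 103.47} \approx 0.049
$$

This agrees with the reported standard error of 0.048787 up to rounding of the residual standard error. With uncorrelated predictors (VIF_j = 1) the same formula would give roughly 0.0048, about ten times smaller.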