This dataset, obtained from GitHub, contains advertising budgets (in thousands of dollars) for TV, radio, and newspaper, along with the corresponding sales figures. I use the first 100 observations from this source. The objective of this analysis is to determine which advertising channel is associated with the most sales on average; the analysis below focuses on TV advertising.
# Load necessary libraries
library(tidyverse)
# Read the data
data <- read.csv("Advertising.csv", stringsAsFactors = FALSE)
# View the first few rows
head(data)
## X TV Radio Newspaper Sales
## 1 1 230.1 37.8 69.2 22.1
## 2 2 44.5 39.3 45.1 10.4
## 3 3 17.2 45.9 69.3 9.3
## 4 4 151.5 41.3 58.5 18.5
## 5 5 180.8 10.8 58.4 12.9
## 6 6 8.7 48.9 75.0 7.2
# Keep only the first 100 observations
data100 <- data[1:100, ]
# View summary
summary(data100)
## X TV Radio Newspaper
## Min. : 1.00 Min. : 5.40 Min. : 1.40 Min. : 0.30
## 1st Qu.: 25.75 1st Qu.: 73.67 1st Qu.:13.65 1st Qu.: 16.45
## Median : 50.50 Median :141.10 Median :26.20 Median : 31.40
## Mean : 50.50 Mean :147.16 Mean :24.81 Mean : 32.84
## 3rd Qu.: 75.25 3rd Qu.:216.50 3rd Qu.:36.88 3rd Qu.: 45.92
## Max. :100.00 Max. :293.60 Max. :49.60 Max. :114.00
## Sales
## Min. : 4.80
## 1st Qu.:10.60
## Median :13.30
## Mean :14.46
## 3rd Qu.:18.07
## Max. :25.40
# Scatterplot of sales against TV budget, with a fitted regression line
ggplot(data100, aes(x = TV, y = Sales)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Relationship between TV Advertising Budget and Sales",
       x = "TV Advertising Budget (thousands of dollars)",
       y = "Sales (thousands of units)")
## `geom_smooth()` using formula = 'y ~ x'
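Because the stated objective is to compare channels, the same scatterplot can be drawn once per channel. The sketch below assumes the column names shown in `head(data)` and uses `tidyr::pivot_longer()` with `facet_wrap()`. The helper `plot_channels` is hypothetical, and the demo uses only the six rows printed by `head(data)`, not `data100`, so the panels are illustrative only.

```r
library(ggplot2)
library(tidyr)

# Hypothetical helper (not part of the original analysis): reshape the three
# budget columns into long format and draw one scatterplot per channel.
plot_channels <- function(df) {
  long <- pivot_longer(
    df,
    cols = c(TV, Radio, Newspaper),
    names_to = "Channel",
    values_to = "Budget"
  )
  ggplot(long, aes(x = Budget, y = Sales)) +
    geom_point() +
    # Passing the formula explicitly suppresses the "using formula" message
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    # Budget ranges differ by channel, so each panel gets its own x-axis
    facet_wrap(~ Channel, scales = "free_x") +
    labs(title = "Sales versus Advertising Budget by Channel",
         x = "Advertising Budget (thousands of dollars)",
         y = "Sales (thousands of units)")
}

# Demo on the six rows printed by head(data); in the report,
# call plot_channels(data100) instead.
demo <- data.frame(
  TV        = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7),
  Radio     = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9),
  Newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4, 75.0),
  Sales     = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2)
)
p <- plot_channels(demo)
print(p)  # draws the three panels
```

Using free x-scales keeps each channel's points readable, at the cost of making slopes harder to compare by eye across panels.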
# Run linear regression
model <- lm(Sales ~ TV, data = data100)
summary(model)
##
## Call:
## lm(formula = Sales ~ TV, data = data100)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7061 -1.7965 0.0179 1.7233 6.6984
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.241734 0.625675 11.57 <2e-16 ***
## TV 0.049069 0.003698 13.27 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.087 on 98 degrees of freedom
## Multiple R-squared: 0.6424, Adjusted R-squared: 0.6388
## F-statistic: 176.1 on 1 and 98 DF, p-value: < 2.2e-16
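Written out, the fitted line from the coefficients above is shown below. The TV budget of 100 (that is, $100,000) used in the second line is an arbitrary illustrative input, not a value singled out in the report.

$$
\begin{aligned}
\widehat{\text{Sales}} &= 7.2417 + 0.04907 \times \text{TV} \\
\widehat{\text{Sales}}\big|_{\text{TV}=100} &= 7.2417 + 0.04907 \times 100 = 7.2417 + 4.907 \approx 12.15
\end{aligned}
$$

So a market with a $100,000 TV budget would be predicted to sell about 12.15 thousand units. With a residual standard error of 3.087, individual markets can easily fall a few thousand units above or below the line, and the R-squared of 0.6424 means TV budget accounts for roughly 64% of the variation in sales.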
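The results show a strong positive relationship between TV advertising spending and sales: markets with larger TV budgets generally sell more. The upward trend is easy to see in the scatterplot, and the regression confirms it. The TV coefficient of 0.049 (p < 2e-16) means that each additional $1,000 of TV spending is associated with about 0.049 thousand, or roughly 49, additional units sold, and TV spending alone explains about 64% of the variation in sales (R² = 0.64). Because the data are observational, this is evidence of association rather than proof that TV spending causes higher sales, but it suggests that TV advertising still plays a meaningful role in influencing consumer behavior. Even in a world dominated by social media, TV advertising remains associated with meaningful sales.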
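The stated objective is to compare channels, but the regression above uses only TV. One way to extend it, sketched here under stated assumptions, is to fit the same simple regression for each channel and compare slopes, p-values, and R-squared. The helper `compare_channels` is hypothetical, and the demo uses only the six rows printed by `head(data)`, so its numbers are illustrative, not results for the 100-observation subset.

```r
# Hypothetical helper (not part of the original analysis): fit one simple
# regression of Sales on each channel and collect comparable statistics.
compare_channels <- function(df, channels = c("TV", "Radio", "Newspaper")) {
  rows <- lapply(channels, function(ch) {
    # Build the formula Sales ~ <channel> from the column name
    fit <- lm(reformulate(ch, response = "Sales"), data = df)
    coefs <- summary(fit)$coefficients
    data.frame(
      channel   = ch,
      slope     = coefs[ch, "Estimate"],   # change in Sales per $1,000 of budget
      p_value   = coefs[ch, "Pr(>|t|)"],
      r_squared = summary(fit)$r.squared,
      stringsAsFactors = FALSE
    )
  })
  result <- do.call(rbind, rows)
  # Order channels from largest to smallest slope
  result[order(result$slope, decreasing = TRUE), ]
}

# Demo on the six rows printed by head(data); in the report,
# call compare_channels(data100) instead.
demo <- data.frame(
  TV        = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7),
  Radio     = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9),
  Newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4, 75.0),
  Sales     = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2)
)
comparison <- compare_channels(demo)
print(comparison)
```

Ranking by slope is reasonable here because all three budgets are measured in thousands of dollars. However, separate regressions ignore any correlation between channels; a multiple regression such as `lm(Sales ~ TV + Radio + Newspaper, data = data100)` would instead estimate each channel's association with sales while holding the other two fixed.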
Kaggle. (n.d.). Online Retail Data Set. Retrieved from https://www.kaggle.com/datasets/mashlyn/online-retail-data-set

ChatGPT was used for code suggestions.