Load Packages

library(readxl)
library(ggplot2)

Load Data

my_data <- read_excel("advertising_randomized.xlsx")

Explore Data

head(my_data)
## # A tibble: 6 × 6
##       X    X1    TV radio newspaper sales
##   <dbl> <dbl> <dbl> <dbl>     <dbl> <dbl>
## 1    23    86  99    35       41.5  19.1 
## 2    86   122 178.   35.4     36.4  24.4 
## 3   143   159  48.4  36.6     19.7  12.1 
## 4   161   149 119.   11.2      4.73  5.64
## 5   133    56 149.   16.6      9.28 22.7 
## 6   145    55  84.5  23.6      5.31 14.5
str(my_data)
## tibble [250 × 6] (S3: tbl_df/tbl/data.frame)
##  $ X        : num [1:250] 23 86 143 161 133 145 109 86 159 158 ...
##  $ X1       : num [1:250] 86 122 159 149 56 55 139 50 55 120 ...
##  $ TV       : num [1:250] 99 178.2 48.4 119.2 149.1 ...
##  $ radio    : num [1:250] 35 35.4 36.5 11.2 16.6 ...
##  $ newspaper: num [1:250] 41.47 36.35 19.74 4.73 9.28 ...
##  $ sales    : num [1:250] 19.13 24.41 12.12 5.64 22.7 ...
summary(my_data)
##        X               X1              TV             radio      
##  Min.   :  1.0   Min.   :  1.0   Min.   :  0.17   Min.   : 0.44  
##  1st Qu.: 56.0   1st Qu.: 55.0   1st Qu.: 86.46   1st Qu.:13.80  
##  Median : 98.5   Median :105.0   Median :144.46   Median :22.95  
##  Mean   :100.2   Mean   :102.2   Mean   :156.99   Mean   :25.52  
##  3rd Qu.:139.0   3rd Qu.:146.5   3rd Qu.:222.54   3rd Qu.:36.52  
##  Max.   :247.0   Max.   :258.0   Max.   :433.94   Max.   :63.02  
##    newspaper          sales       
##  Min.   :  0.12   Min.   : 0.670  
##  1st Qu.: 17.51   1st Qu.: 9.807  
##  Median : 33.23   Median :14.355  
##  Mean   : 35.17   Mean   :13.923  
##  3rd Qu.: 48.76   3rd Qu.:17.258  
##  Max.   :101.16   Max.   :27.930
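The summary above has no `NA's` row for any column, which is how `summary()` reports missing values. `X` and `X1` also look like leftover row-index columns rather than advertising variables. The sketch below assumes that reading of `X` and `X1` and drops them before any modelling. The six rows are re-typed from the `head()` output (so they are rounded) only so the block runs without the Excel file; with the real data, apply the same subsetting line to `my_data`.

```r
# Six rows re-typed from the head() output above (rounded values), used only
# so this block runs without advertising_randomized.xlsx
sample_rows <- data.frame(
  X         = c(23, 86, 143, 161, 133, 145),
  X1        = c(86, 122, 159, 149, 56, 55),
  TV        = c(99, 178.2, 48.4, 119.2, 149.1, 84.5),
  radio     = c(35, 35.4, 36.6, 11.2, 16.6, 23.6),
  newspaper = c(41.47, 36.35, 19.74, 4.73, 9.28, 5.31),
  sales     = c(19.13, 24.41, 12.12, 5.64, 22.7, 14.5)
)

# Assumption: X and X1 are row indices left over from an earlier export,
# not advertising variables, so they are dropped before modelling
ad_data <- sample_rows[, setdiff(names(sample_rows), c("X", "X1"))]

# Missing values per column; all zeros means every column is complete
colSums(is.na(ad_data))
```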

TV vs Sales

ggplot(my_data, aes(x = TV, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "TV vs Sales", x = "TV Budget", y = "Sales")
## `geom_smooth()` using formula = 'y ~ x'

TV appears to have a positive relationship with sales.
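`geom_smooth(method = "lm")` draws the ordinary least-squares line from `lm(y ~ x)`, and its grey band is, by default, a 95% confidence band for that line. Fitting the same model directly shows how steep the "positive" relationship is. A minimal sketch on the TV and sales values from the six `head()` rows (re-typed, rounded); on the full data, use `lm(sales ~ TV, data = my_data)`.

```r
# TV budget and sales from the six head() rows above (rounded values)
tv_rows <- data.frame(
  TV    = c(99, 178.2, 48.4, 119.2, 149.1, 84.5),
  sales = c(19.13, 24.41, 12.12, 5.64, 22.7, 14.5)
)

# The model geom_smooth(method = "lm") fits and draws: sales = b0 + b1 * TV
tv_fit <- lm(sales ~ TV, data = tv_rows)

# b1 is the estimated change in sales per one-unit increase in TV budget
coef(tv_fit)

# 95% confidence intervals; an interval for TV that excludes 0 supports a
# non-flat relationship, one that straddles 0 does not
confint(tv_fit, level = 0.95)
```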

Radio vs Sales

ggplot(my_data, aes(x = radio, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Radio vs Sales", x = "Radio Budget", y = "Sales")
## `geom_smooth()` using formula = 'y ~ x'

Radio appears to have a positive relationship with sales.
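Whether an upward tilt like this is distinguishable from noise can be checked with `cor.test()`, which reports Pearson's r, a confidence interval for it, and a p-value for the null hypothesis of zero correlation. The sketch uses the radio and sales values from the six `head()` rows, which are far too few for a meaningful test; on the full data, use `cor.test(my_data$radio, my_data$sales)`.

```r
# Radio budget and sales from the six head() rows above (rounded values)
radio <- c(35, 35.4, 36.6, 11.2, 16.6, 23.6)
sales <- c(19.13, 24.41, 12.12, 5.64, 22.7, 14.5)

# Pearson correlation test; H0: the true correlation is 0
radio_test <- cor.test(radio, sales, method = "pearson")

radio_test$estimate   # sample correlation r
radio_test$conf.int   # 95% confidence interval for r
radio_test$p.value    # small values suggest the relationship is not just noise
```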

Newspaper vs Sales

ggplot(my_data, aes(x = newspaper, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Newspaper vs Sales", x = "Newspaper Budget", y = "Sales")
## `geom_smooth()` using formula = 'y ~ x'

Newspaper appears to have the weakest relationship with sales.
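Calling one channel the weakest is a comparison across three separate fits, so it can be made explicit by fitting `sales` on each channel alone and ranking the R² values. A sketch on the six `head()` rows (re-typed, rounded); on the full data, pass `my_data` instead of `ads`. The helper `channel_r2` is a hypothetical name introduced here.

```r
# The six head() rows above (rounded values): spend columns plus sales
ads <- data.frame(
  TV        = c(99, 178.2, 48.4, 119.2, 149.1, 84.5),
  radio     = c(35, 35.4, 36.6, 11.2, 16.6, 23.6),
  newspaper = c(41.47, 36.35, 19.74, 4.73, 9.28, 5.31),
  sales     = c(19.13, 24.41, 12.12, 5.64, 22.7, 14.5)
)

# Hypothetical helper: R-squared of a one-predictor linear model of sales
channel_r2 <- function(data, channel) {
  fit <- lm(reformulate(channel, response = "sales"), data = data)
  summary(fit)$r.squared
}

channels <- c("TV", "radio", "newspaper")
r2 <- vapply(channels, function(ch) channel_r2(ads, ch), numeric(1))

# Strongest to weakest single-channel fit
ranked <- sort(r2, decreasing = TRUE)
ranked
```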

Compare Advertising Channels

# Long format: one row per (observation, channel) pair so each channel can
# get its own facet; sales is repeated once for each of the three channels
channel_data <- data.frame(
  budget = c(my_data$TV, my_data$radio, my_data$newspaper),
  sales = rep(my_data$sales, times = 3),
  channel = rep(c("TV", "Radio", "Newspaper"), each = nrow(my_data))
)
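The `channel_data` table above can also be built with base R's `stack()`, which turns the selected columns into one `values` column plus an `ind` factor recording which column each value came from. A sketch on the six `head()` rows (re-typed, rounded); with the real data, pass `my_data` instead of `ads`. One difference from the hand-built version: `ind` keeps the original column names, so the facet labels would read `radio` and `newspaper` in lower case.

```r
# The six head() rows above (rounded values)
ads <- data.frame(
  TV        = c(99, 178.2, 48.4, 119.2, 149.1, 84.5),
  radio     = c(35, 35.4, 36.6, 11.2, 16.6, 23.6),
  newspaper = c(41.47, 36.35, 19.74, 4.73, 9.28, 5.31),
  sales     = c(19.13, 24.41, 12.12, 5.64, 22.7, 14.5)
)

# Stack the three spend columns, column by column, into long format
long <- stack(ads[, c("TV", "radio", "newspaper")])
names(long) <- c("budget", "channel")

# stack() concatenates whole columns, so sales repeats once per channel
long$sales <- rep(ads$sales, times = 3)

head(long)
```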

ggplot(channel_data, aes(x = budget, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ channel, scales = "free_x") +
  labs(
    title = "Advertising Channels and Sales",
    x = "Advertising Budget",
    y = "Sales"
  )
## `geom_smooth()` using formula = 'y ~ x'

Based on the faceted plot, newspaper appears to have the clearest positive relationship with sales. Radio also shows a slight positive relationship, but the points are quite spread out. TV does not show a clear upward pattern in this dataset. I would say that the trends here are not clear enough to support an informed business decision, but if I were forced to choose, I would spend more on newspaper advertising.
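The recommendation above compares channels one at a time, but spend on one channel may move together with spend on another. Fitting all three channels in one linear model, `sales ~ TV + radio + newspaper`, estimates each channel's association with sales while holding the other two fixed. A sketch on the six `head()` rows (re-typed, rounded), which leave only two residual degrees of freedom, so the estimates are illustrative only; on the full data, use `data = my_data`.

```r
# The six head() rows above (rounded values)
ads <- data.frame(
  TV        = c(99, 178.2, 48.4, 119.2, 149.1, 84.5),
  radio     = c(35, 35.4, 36.6, 11.2, 16.6, 23.6),
  newspaper = c(41.47, 36.35, 19.74, 4.73, 9.28, 5.31),
  sales     = c(19.13, 24.41, 12.12, 5.64, 22.7, 14.5)
)

# One model with all three channels; each slope is the change in sales per
# unit of that channel's budget, holding the other two channels constant
joint_fit <- lm(sales ~ TV + radio + newspaper, data = ads)

# Estimates, standard errors, t values and p-values for each channel
coef(summary(joint_fit))

# Share of sales variance explained by the three channels together
summary(joint_fit)$r.squared
```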

Reflection

I learned a lot about writing and troubleshooting R Markdown documents and R scripts. Formatting proved challenging. It is interesting that the synthetic dataset I used produced such flat regression lines.