library(readxl)
library(ggplot2)
my_data <- read_excel("advertising_randomized.xlsx")
head(my_data)
## # A tibble: 6 × 6
##       X    X1    TV radio newspaper sales
##   <dbl> <dbl> <dbl> <dbl>     <dbl> <dbl>
## 1    23    86  99    35       41.5  19.1
## 2    86   122 178.   35.4     36.4  24.4
## 3   143   159  48.4  36.6     19.7  12.1
## 4   161   149 119.   11.2      4.73  5.64
## 5   133    56 149.   16.6      9.28 22.7
## 6   145    55  84.5  23.6      5.31 14.5
str(my_data)
## tibble [250 × 6] (S3: tbl_df/tbl/data.frame)
## $ X : num [1:250] 23 86 143 161 133 145 109 86 159 158 ...
## $ X1 : num [1:250] 86 122 159 149 56 55 139 50 55 120 ...
## $ TV : num [1:250] 99 178.2 48.4 119.2 149.1 ...
## $ radio : num [1:250] 35 35.4 36.5 11.2 16.6 ...
## $ newspaper: num [1:250] 41.47 36.35 19.74 4.73 9.28 ...
## $ sales : num [1:250] 19.13 24.41 12.12 5.64 22.7 ...
summary(my_data)
##        X               X1              TV             radio
##  Min.   :  1.0   Min.   :  1.0   Min.   :  0.17   Min.   : 0.44
##  1st Qu.: 56.0   1st Qu.: 55.0   1st Qu.: 86.46   1st Qu.:13.80
##  Median : 98.5   Median :105.0   Median :144.46   Median :22.95
##  Mean   :100.2   Mean   :102.2   Mean   :156.99   Mean   :25.52
##  3rd Qu.:139.0   3rd Qu.:146.5   3rd Qu.:222.54   3rd Qu.:36.52
##  Max.   :247.0   Max.   :258.0   Max.   :433.94   Max.   :63.02
##    newspaper          sales
##  Min.   :  0.12   Min.   : 0.670
##  1st Qu.: 17.51   1st Qu.: 9.807
##  Median : 33.23   Median :14.355
##  Mean   : 35.17   Mean   :13.923
##  3rd Qu.: 48.76   3rd Qu.:17.258
##  Max.   :101.16   Max.   :27.930
ggplot(my_data, aes(x = TV, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "TV vs Sales", x = "TV Budget", y = "Sales")
## `geom_smooth()` using formula = 'y ~ x'
TV appears to have a positive relationship with sales.
ggplot(my_data, aes(x = radio, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Radio vs Sales", x = "Radio Budget", y = "Sales")
## `geom_smooth()` using formula = 'y ~ x'
Radio appears to have a positive relationship with sales.
ggplot(my_data, aes(x = newspaper, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Newspaper vs Sales", x = "Newspaper Budget", y = "Sales")
## `geom_smooth()` using formula = 'y ~ x'
Newspaper appears to have the weakest relationship with sales.
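The visual impressions above can be checked numerically with a correlation coefficient for each channel. The following is a minimal sketch, not part of the original analysis: `channel_correlations` is a hypothetical helper, and `toy_data` is a made-up stand-in with the same column names, included only so the example runs on its own. With the real data, `my_data` would be passed instead.

```r
# Illustrative stand-in data (made-up values, NOT the advertising dataset).
toy_data <- data.frame(
  TV        = c(10, 20, 30, 40, 50, 60),
  radio     = c(5, 3, 8, 1, 6, 4),
  newspaper = c(12, 7, 15, 9, 11, 14),
  sales     = c(4, 6, 5, 9, 8, 11)
)

# Hypothetical helper: Pearson correlation of each channel with sales.
# Returns a named numeric vector, one entry per channel.
channel_correlations <- function(df, channels = c("TV", "radio", "newspaper")) {
  sapply(channels, function(ch) cor(df[[ch]], df[["sales"]]))
}

cors <- channel_correlations(toy_data)
print(round(cors, 2))
```

A value near 0 corresponds to a nearly flat trend line, while values closer to 1 or -1 indicate a tighter linear pattern. Calling `channel_correlations(my_data)` should give the figures for the actual dataset.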
channel_data <- data.frame(
  budget = c(my_data$TV, my_data$radio, my_data$newspaper),
  sales = c(my_data$sales, my_data$sales, my_data$sales),
  channel = c(
    rep("TV", nrow(my_data)),
    rep("Radio", nrow(my_data)),
    rep("Newspaper", nrow(my_data))
  )
)
ggplot(channel_data, aes(x = budget, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ channel, scales = "free_x") +
  labs(
    title = "Advertising Channels and Sales",
    x = "Advertising Budget",
    y = "Sales"
  )
## `geom_smooth()` using formula = 'y ~ x'
Based on the faceted plot, newspaper appears to have the clearest positive relationship with sales. Radio also shows a slight positive relationship, but the points are quite spread out. TV does not show a clear upward pattern in this dataset. The trends in this dataset are not clear enough to support an informed business decision, but if I were forced to choose, I would spend more on newspaper advertising.
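One way to judge whether a trend is clear enough to act on is to look at each fitted slope together with its confidence interval. The sketch below is an illustration under stated assumptions rather than the analysis actually performed: `slope_with_ci` is a hypothetical helper, and `toy_data` holds made-up values that only mimic the column names of `my_data`.

```r
# Illustrative stand-in data (made-up values, NOT the advertising dataset).
toy_data <- data.frame(
  TV        = c(10, 20, 30, 40, 50, 60),
  radio     = c(5, 3, 8, 1, 6, 4),
  newspaper = c(12, 7, 15, 9, 11, 14),
  sales     = c(4, 6, 5, 9, 8, 11)
)

# Hypothetical helper: fit sales ~ channel and return the slope
# with its confidence interval at the requested level.
slope_with_ci <- function(df, channel, level = 0.95) {
  fit <- lm(reformulate(channel, response = "sales"), data = df)
  ci <- confint(fit, parm = channel, level = level)
  c(
    slope = unname(coef(fit)[channel]),
    lower = unname(ci[1, 1]),
    upper = unname(ci[1, 2])
  )
}

channels <- c("TV", "radio", "newspaper")
# One row per channel, with columns slope / lower / upper.
results <- t(sapply(channels, function(ch) slope_with_ci(toy_data, ch)))
print(round(results, 3))
```

If an interval spans zero, the data do not distinguish that channel's slope from a flat line at the chosen level, which would be consistent with the caution expressed above. Replacing `toy_data` with `my_data` should apply the same check to the real dataset.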
I learned a lot about writing and troubleshooting R Markdown documents and R scripts. Formatting proved to be challenging. It is interesting that the synthetic dataset I used produced such flat regression lines.