library(readr)
ad_sales <- read_csv('https://raw.githubusercontent.com/utjimmyx/regression/master/advertising.csv')
## New names:
## • `` -> `...1`
## Rows: 200 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (6): ...1, X1, TV, radio, newspaper, sales
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
plot(sales ~ TV, data = ad_sales)
plot(sales ~ radio, data = ad_sales)
This is the end of Part 1 of my exploratory analysis.
library(ggplot2)
head(ad_sales)
## # A tibble: 6 × 6
## ...1 X1 TV radio newspaper sales
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 230. 37.8 69.2 22.1
## 2 2 2 44.5 39.3 45.1 10.4
## 3 3 3 17.2 45.9 69.3 9.3
## 4 4 4 152. 41.3 58.5 18.5
## 5 5 5 181. 10.8 58.4 12.9
## 6 6 6 8.7 48.9 75 7.2
ggplot(data = ad_sales, aes(x = radio)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
x <- runif(100)
y <- x^2 + 0.2*x
ggplot(data.frame(x=x,y=y), aes(x=x,y=y)) + geom_line()
This is the end of Part 2 of my exploratory analysis.
There is a relationship between TV advertising spend (X) and sales (Y), which can be shown with a scatter plot. The plot shows that when TV ad spending increases, sales also tend to increase.
ggplot(data = ad_sales, aes(x = TV, y = sales)) +
  geom_point(color = "blue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot of TV Advertising vs. Sales",
       x = "TV Advertising Spend",
       y = "Sales") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
The regression coefficient (the slope) shows how much sales change for each additional unit of TV ad spending. The box plot shows that higher TV advertising levels are generally associated with higher sales.
ggplot(data = ad_sales, aes(x = cut(TV, breaks = 5), y = sales)) +
  geom_boxplot(fill = "lightblue", color = "black", alpha = 0.7) +
  labs(title = "Boxplot of Sales by TV Advertising Levels",
       x = "TV Advertising Spend (Binned)",
       y = "Sales") +
  theme_minimal()
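The coefficient mentioned above can be estimated with `lm()`. The following is only a minimal sketch on simulated stand-in data: the data frame `toy_ads` and the numbers used to generate it are invented for illustration and are not taken from the advertising file. With the real data, `data = ad_sales` would be used instead.

```r
# Simulated stand-in for ad_sales; the values are made up for illustration only.
set.seed(42)
toy_ads <- data.frame(TV = runif(200, min = 0, max = 300))
toy_ads$sales <- 7 + 0.05 * toy_ads$TV + rnorm(200, sd = 3)

# Simple linear regression: sales = intercept + slope * TV
fit_tv <- lm(sales ~ TV, data = toy_ads)

# The slope is the estimated change in sales per one-unit increase in TV spend.
tv_slope <- coef(fit_tv)[["TV"]]
print(tv_slope)

# A 95% confidence interval shows how precisely the slope is estimated.
print(confint(fit_tv, "TV", level = 0.95))
```

The slope is read in the units of the data, so it only has a practical meaning once the units of `TV` and `sales` are known.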
Simple regression helps show whether advertising is associated with sales, which advertising channel has the strongest relationship with sales, and how much sales go up for each additional unit of spending. However, a simple regression cannot prove that the advertising caused the sales.
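One way to compare the channels, sketched here under assumptions, is to fit a separate simple regression for each one and look at the slope and R-squared side by side. The data frame `toy_ads`, the helper table `channel_fits`, and the coefficients used to simulate the data are all hypothetical. The real comparison would use `ad_sales`.

```r
# Simulated stand-in for ad_sales; coefficients below are invented for illustration.
set.seed(1)
n <- 200
toy_ads <- data.frame(
  TV = runif(n, 0, 300),
  radio = runif(n, 0, 50),
  newspaper = runif(n, 0, 100)
)
toy_ads$sales <- 3 + 0.045 * toy_ads$TV + 0.19 * toy_ads$radio + rnorm(n, sd = 1.5)

# Fit one simple regression per channel and collect its slope and R-squared.
channels <- c("TV", "radio", "newspaper")
channel_fits <- do.call(rbind, lapply(channels, function(ch) {
  fit <- lm(reformulate(ch, response = "sales"), data = toy_ads)
  data.frame(
    channel = ch,
    slope = unname(coef(fit)[ch]),
    r_squared = summary(fit)$r.squared
  )
}))
print(channel_fits)
```

R-squared describes how much of the variation in sales each channel explains on its own, while the slope describes the change in sales per unit of spend. Neither number shows causation.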
Yes, we can plot the relationship between radio advertising and sales using a scatter plot with a regression line. The plot shows a positive correlation, meaning higher radio ad spending is generally associated with higher sales.
ggplot(data = ad_sales, aes(x = radio, y = sales)) +
  geom_point(color = "blue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(title = "Scatter Plot of Radio Advertising vs. Sales",
       x = "Radio Advertising Spend",
       y = "Sales") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
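The positive correlation seen in the plot can also be put into a number. This is a minimal sketch using `cor()` and `cor.test()` on simulated stand-in data. The data frame `toy_ads` and its values are invented for illustration, so the printed numbers say nothing about the real advertising data.

```r
# Simulated stand-in for ad_sales; values are invented for illustration.
set.seed(7)
toy_ads <- data.frame(radio = runif(200, 0, 50))
toy_ads$sales <- 9 + 0.2 * toy_ads$radio + rnorm(200, sd = 4)

# Pearson correlation: the sign gives the direction, the magnitude the strength
# (always between -1 and 1).
radio_cor <- cor(toy_ads$radio, toy_ads$sales)
print(radio_cor)

# cor.test() adds a confidence interval and a test of zero correlation.
radio_test <- cor.test(toy_ads$radio, toy_ads$sales)
print(radio_test$conf.int)
```

With the real data, the same calls would be `cor(ad_sales$radio, ad_sales$sales)` and `cor.test(ad_sales$radio, ad_sales$sales)`.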
To explore the radio advertising variable, I used a histogram to analyze its distribution. The histogram shows how frequently different spending levels occur.
ggplot(data = ad_sales, aes(x = radio)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black", alpha = 0.7) +
  labs(title = "Histogram of Radio Advertising Spend",
       x = "Radio Advertising Spend",
       y = "Frequency")
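The frequencies behind a histogram like this can also be listed as a table. The sketch below bins a simulated vector into intervals of width 5 with `cut()` and `table()`. The vector `toy_radio` is invented for illustration, and the bin edges here start at 0, which may differ from the edges ggplot2 chooses for `binwidth = 5`.

```r
# Simulated stand-in for ad_sales$radio; values are invented for illustration.
set.seed(3)
toy_radio <- runif(200, min = 0, max = 50)

# Bin edges every 5 units, from 0 to 50.
bins <- seq(0, 50, by = 5)

# Count how many observations fall into each bin.
bin_counts <- table(cut(toy_radio, breaks = bins, include.lowest = TRUE))
print(bin_counts)
```

Each count corresponds to the height of one bar when the histogram uses the same bin edges.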