This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Note: this analysis was performed using the open-source software R and RStudio.
library(readr)
ad_sales <- read_csv('https://raw.githubusercontent.com/utjimmyx/regression/master/advertising.csv')
## New names:
## • `` -> `...1`
## Rows: 200 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (6): ...1, X1, TV, radio, newspaper, sales
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
plot(sales ~ TV, data = ad_sales)
plot(sales ~ radio, data = ad_sales)
This is the end of part 1 for my exploratory analysis.
library(ggplot2)
head(ad_sales)
## # A tibble: 6 × 6
##    ...1    X1    TV radio newspaper sales
##   <dbl> <dbl> <dbl> <dbl>     <dbl> <dbl>
## 1     1     1 230.   37.8      69.2  22.1
## 2     2     2  44.5  39.3      45.1  10.4
## 3     3     3  17.2  45.9      69.3   9.3
## 4     4     4 152.   41.3      58.5  18.5
## 5     5     5 181.   10.8      58.4  12.9
## 6     6     6   8.7  48.9      75     7.2
ggplot(data = ad_sales, aes(x = TV)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
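The message above asks for an explicit bin width. A minimal sketch of how that could look, assuming a width of 25 is reasonable for TV (the value is an illustrative guess, not something the data dictates). To keep the block runnable on its own, it uses only the six TV values printed by head(ad_sales), rounded as displayed; in the report you would pass the full ad_sales instead.

```r
library(ggplot2)

# The six TV values shown by head(ad_sales), rounded as printed.
# This stand-in only keeps the example self-contained.
sample_ads <- data.frame(TV = c(230, 44.5, 17.2, 152, 181, 8.7))

# binwidth = 25 is an assumed, illustrative choice; setting it explicitly
# silences the "Pick better value" message from stat_bin().
tv_hist <- ggplot(data = sample_ads, aes(x = TV)) +
  geom_histogram(binwidth = 25)

print(tv_hist)
```

Trying a few widths and comparing the shapes is usually the quickest way to settle on one.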
Both scatter plots (sales against TV and sales against radio) appear to show positive, linear, and fairly strong relationships.
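A visual impression like this can be checked with a correlation coefficient. The sketch below uses only the six rows printed by head(ad_sales), rounded as displayed, so the number it gives is not the correlation for the full data; it just shows the call.

```r
# First six rows as printed by head(ad_sales), rounded as displayed.
# Six points are far too few to draw conclusions from.
sample_ads <- data.frame(
  TV    = c(230, 44.5, 17.2, 152, 181, 8.7),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2)
)

# Pearson correlation between TV spend and sales (cor() defaults to Pearson).
tv_cor <- cor(sample_ads$TV, sample_ads$sales)
print(tv_cor)
```

With the full data, cor(ad_sales$TV, ad_sales$sales) and cor(ad_sales$radio, ad_sales$sales) would allow a direct comparison of the two channels.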
A coefficient is a numerical or constant factor that multiplies a variable in an algebraic equation. The relationship is positive and linear.
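In a simple regression of sales on TV, the coefficient on TV is the estimated slope: the change in predicted sales for a one-unit increase in TV spend. A rough sketch with lm(), again using only the six rounded rows from head(ad_sales) as a stand-in for the full data:

```r
# First six rows as printed by head(ad_sales), rounded as displayed.
sample_ads <- data.frame(
  TV    = c(230, 44.5, 17.2, 152, 181, 8.7),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2)
)

# Fit sales = b0 + b1 * TV by ordinary least squares.
fit <- lm(sales ~ TV, data = sample_ads)

# coef() returns a named vector: the intercept (b0) and the TV slope (b1).
coefs <- coef(fit)
print(coefs)
```

A positive TV coefficient matches the upward trend in the scatter plot; fitting lm(sales ~ TV, data = ad_sales) would give the estimate for all 200 rows.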
Regression analysis can help address marketing questions like how advertising spend affects sales, the relationship between price and sales volume, and the impact of promotions on customer behavior. However, it assumes linear relationships, may only show correlation (not causation), and can oversimplify complex marketing dynamics by ignoring multiple influencing factors. Additionally, it’s sensitive to outliers, data quality, and multicollinearity, and may not fully capture non-linear or changing market conditions.
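One of the caveats above, multicollinearity, can be screened for roughly by looking at pairwise correlations among the predictors before fitting a multiple regression. The sketch below is a first check rather than a full diagnostic, and it uses only the six rounded rows from head(ad_sales); the helper name predictor_cor is made up for this example.

```r
# First six rows as printed by head(ad_sales), rounded as displayed.
sample_ads <- data.frame(
  TV        = c(230, 44.5, 17.2, 152, 181, 8.7),
  radio     = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9),
  newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4, 75)
)

# Hypothetical helper: pairwise Pearson correlations among predictor columns.
# Off-diagonal values near 1 or -1 hint that two predictors overlap heavily.
predictor_cor <- function(df, predictors) {
  cor(df[, predictors])
}

cor_mat <- predictor_cor(sample_ads, c("TV", "radio", "newspaper"))
print(round(cor_mat, 2))
```

With the full data the same call would be predictor_cor(ad_sales, c("TV", "radio", "newspaper")).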
The relationship between radio and sales is also positive and linear, but not as tightly correlated as the one for TV.
In Chapter 3 of Practical Data Science with R, Second Edition, the ggplot2 package is used to visualize and explore relationships in data. For the variable “radio ads” (the radio column here), you could use ggplot() to create a histogram or a scatter plot, depending on whether you’re exploring its distribution or its relationship with another variable (like sales). For example,
ggplot(ad_sales, aes(x = radio, y = sales)) + geom_point()
would show the relationship between radio ad spend and sales, helping to visually assess any trends or patterns.
# Question 5