Note: this analysis was performed using the open-source software R and RStudio.
library(readr)
ad_sales <- read_csv('https://raw.githubusercontent.com/utjimmyx/regression/master/advertising.csv')
## New names:
## • `` -> `...1`
## Rows: 200 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (6): ...1, X1, TV, radio, newspaper, sales
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
plot(sales ~ TV, data = ad_sales)
plot(sales ~ radio, data = ad_sales)
library(ggplot2)
head(ad_sales)
## # A tibble: 6 × 6
##    ...1    X1    TV radio newspaper sales
##   <dbl> <dbl> <dbl> <dbl>     <dbl> <dbl>
## 1     1     1 230.   37.8      69.2  22.1
## 2     2     2  44.5  39.3      45.1  10.4
## 3     3     3  17.2  45.9      69.3   9.3
## 4     4     4 152.   41.3      58.5  18.5
## 5     5     5 181.   10.8      58.4  12.9
## 6     6     6   8.7  48.9      75     7.2
ggplot(data = ad_sales, aes(x = radio)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
summary(ad_sales$radio)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 9.975 22.900 23.264 36.525 49.600
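The `stat_bin()` message above suggests choosing a bin width deliberately rather than accepting the default of 30 bins. One common rule of thumb (my own choice here, not something used in the original analysis) is the Freedman-Diaconis rule, which sets the width to 2 * IQR / n^(1/3). The sketch below applies it to the quartiles and row count printed above; `fd_binwidth` is a hypothetical helper name.

```r
# Hypothetical helper (not part of the original analysis):
# Freedman-Diaconis bin width from an interquartile range and a sample size.
fd_binwidth <- function(iqr, n) {
  2 * iqr / n^(1 / 3)
}

# Values taken from the output above: 3rd Qu. - 1st Qu. of radio, and 200 rows.
radio_iqr <- 36.525 - 9.975
radio_n <- 200

radio_binwidth <- fd_binwidth(radio_iqr, radio_n)
print(round(radio_binwidth, 2))
```

The result could then be passed to the histogram as `geom_histogram(binwidth = radio_binwidth)`, or rounded to a convenient value first.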
ggplot(ad_sales, aes(x = radio, y = sales)) +
  geom_point() +
  geom_smooth() +
  ggtitle("Sales and Radio Advertisements")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
This is the end of part 2 for my exploratory analysis.
Is there a relationship between x and y? If so, what does the relationship look like?
Yes, there is a relationship between X and Y. The relationship is positive and linear. The correlation also appears strong because the points lie close together along the trend rather than being widely scattered.
What is the meaning of a coefficient? Is there a relationship between TV advertising and Sales? If so, what does the relationship look like?
sales = m * TV ads + b, with m being the coefficient (the slope) and b being the y-intercept.
Based on this equation, the coefficient m tells us how sales change as TV advertising changes: for every additional unit of TV advertising, predicted sales change by m units. A positive m means that more TV advertising is associated with higher sales.
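To make the meaning of m and b concrete, here is a minimal sketch that fits the line with `lm()` and checks that a one-unit increase in TV changes the predicted sales by exactly m. It assumes only the six rows printed by `head(ad_sales)` above, with TV values as displayed (rounded), so the estimates are illustrative and not the results for the full data. The names `sample_ads` and `fit_tv` are my own.

```r
# Six rows as printed by head(ad_sales); TV values are rounded as displayed.
# Assumption: this tiny sample only illustrates the mechanics of the fit.
sample_ads <- data.frame(
  TV    = c(230, 44.5, 17.2, 152, 181, 8.7),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2)
)

# Fit sales = m * TV + b by ordinary least squares.
fit_tv <- lm(sales ~ TV, data = sample_ads)
b <- coef(fit_tv)[["(Intercept)"]]  # y-intercept
m <- coef(fit_tv)[["TV"]]           # coefficient (slope) on TV

# Predicted sales at TV = 100 and TV = 101 differ by exactly m.
pred <- predict(fit_tv, newdata = data.frame(TV = c(100, 101)))
one_unit_change <- unname(pred[2] - pred[1])

print(c(intercept = b, slope = m, one_unit_change = one_unit_change))
```

For the full data set, the same call would be `lm(sales ~ TV, data = ad_sales)`.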
Which marketing questions can we address with a simple regression analysis? Any limitations?
With this simple regression analysis, we can address questions such as “How does TV advertising affect sales of the product?” or “How did actual sales compare with our company’s original predictions?”
Can you plot the relationship between radio advertising and Sales? If so, what does the relationship look like?
Based on the plot, the relationship between radio ads and sales appears positive and roughly linear, but the correlation is weak: the points are widely scattered rather than clustered close together.
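A visual impression of "weak" or "strong" can be backed up with numbers. The sketch below defines a hypothetical helper, `relationship_summary` (not part of the original analysis), that reports the Pearson correlation, the fitted slope, and R-squared for any pair of variables. It only runs on the advertising data if `ad_sales` has already been loaded as above.

```r
# Hypothetical helper: direction and strength of a linear relationship.
relationship_summary <- function(x, y) {
  fit <- lm(y ~ x)
  c(
    correlation = cor(x, y),               # Pearson correlation
    slope       = unname(coef(fit)[2]),    # change in y per unit of x
    r_squared   = summary(fit)$r.squared   # share of variance explained
  )
}

# Small made-up vectors, used only to show the output format.
demo <- relationship_summary(c(1, 2, 3, 4, 5), c(2, 4, 5, 4, 5))
print(round(demo, 3))

# Assumes ad_sales was read in earlier; skipped otherwise.
if (exists("ad_sales")) {
  print(relationship_summary(ad_sales$radio, ad_sales$sales))
  print(relationship_summary(ad_sales$TV, ad_sales$sales))
}
```

Comparing the radio and TV results side by side would show whether the scatter seen in the plots matches the computed correlations.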
Refer to the readings in Chapter 3 (Exploring Data) of the Book, Practical Data Science with R, Second Edition, and use at least one of the ggplot2 methods to explore the variable radio ads.
In addition to the histogram we made in class using ggplot2, I reported summary statistics for radio ads with summary(), which gives the five-number summary plus the mean.
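For reference, base R also has `fivenum()`, which returns only the five values (minimum, lower hinge, median, upper hinge, maximum). A minimal sketch, assuming just the six radio values printed by `head(ad_sales)`; `five_number` and `radio_sample` are names I made up for illustration.

```r
# Hypothetical wrapper that labels the output of fivenum().
five_number <- function(x) {
  setNames(
    fivenum(x),
    c("min", "lower_hinge", "median", "upper_hinge", "max")
  )
}

# The six radio values shown by head(ad_sales); not the full column.
radio_sample <- c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9)

radio_five <- five_number(radio_sample)
print(radio_five)
```

Note that `fivenum()` uses hinges, while `summary()` uses quantiles, so the two can report slightly different quartile values for the same data.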
I also created a scatter plot with a smooth curve (geom_smooth()) to make the trend between radio ads and sales easier to identify.