library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.1 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.3 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
my_data <- read_excel("advertising_randomized.xlsx")
# install.packages("tidyverse")  # one-time setup; run before library(tidyverse) only if the package is missing
head(my_data)
## # A tibble: 6 × 6
##       X    X1    TV radio newspaper sales
##   <dbl> <dbl> <dbl> <dbl>     <dbl> <dbl>
## 1   154    42  85.9 17.2      51.0  20.1
## 2    70   147  92.5 55.9      53.6  15.1
## 3   184     5 267.   8.48     29.6   6.09
## 4   241   141 258   29.1       3.19  8.91
## 5    80    46 207.  31.1      41.8   6.3
## 6   108    27 127.  37.2      52    14.5
glimpse(my_data)
## Rows: 300
## Columns: 6
## $ X <dbl> 154, 70, 184, 241, 80, 108, 167, 109, 20, 109, 29, 193, 155,…
## $ X1 <dbl> 42, 147, 5, 141, 46, 27, 17, 58, 121, 9, 101, 114, 121, 159,…
## $ TV <dbl> 85.94, 92.52, 267.18, 258.00, 207.44, 127.20, 340.50, 267.23…
## $ radio <dbl> 17.18, 55.90, 8.48, 29.14, 31.13, 37.21, 25.79, 25.69, 27.06…
## $ newspaper <dbl> 51.03, 53.62, 29.59, 3.19, 41.83, 52.00, 53.12, 19.78, 26.25…
## $ sales <dbl> 20.06, 15.11, 6.09, 8.91, 6.30, 14.46, 15.62, 10.71, 13.72, …
ggplot(my_data, aes(x = TV, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Sales vs. TV Advertising",
    x = "TV Advertising Budget",
    y = "Sales"
  )
## `geom_smooth()` using formula = 'y ~ x'
TV and Sales: There is a slight negative relationship between sales and TV advertising. As TV advertising increases, sales appear to decrease.
ggplot(my_data, aes(x = radio, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Radio vs. Sales",
    x = "Radio Advertising Budget",
    y = "Sales"
  )
## `geom_smooth()` using formula = 'y ~ x'
Radio and Sales: The scatter plot shows a slight positive relationship between radio advertising and sales. As radio advertising increases, sales appear to increase only marginally.
ggplot(my_data, aes(x = newspaper, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Newspaper vs. Sales",
    x = "Newspaper Advertising Budget",
    y = "Sales"
  )
## `geom_smooth()` using formula = 'y ~ x'
Newspaper and Sales: There is a very small positive relationship between newspaper advertising and sales. As newspaper advertising increases, sales increase slightly.
ggplot(data = my_data,
       mapping = aes(x = TV, y = sales, color = cut(newspaper, breaks = 3))) +
  geom_point()
Comparing all three graphs, TV advertising has the steepest trend line, but its slope is negative, so higher TV spending is associated with lower sales. Radio shows almost no correlation with sales; its positive trend is very small. Newspaper advertising appears to have the best relationship with sales, since more of its data points are clustered closely together.
Across the three graphs, the strongest relationship with sales belongs to newspaper advertising. Its upward trend is clearer than the others, which suggests that companies receive more sales as they increase their newspaper budget. The weakest relationship is with radio advertising, because the trend line is almost completely flat, indicating negligible gains in sales for this channel.
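The comparison above is read off the plots by eye. A minimal sketch of one way to check it numerically, assuming the three channel columns and `sales` are numeric as `glimpse()` reports: compute the Pearson correlation and the slope of the same straight-line fit that `geom_smooth(method = "lm")` draws. The helper name `summarise_channels` and the toy data frame are hypothetical and used only for illustration; they are not part of the original analysis and the toy values are not taken from `advertising_randomized.xlsx`.

```r
# Hypothetical helper (an assumption, not from the original assignment):
# for each advertising channel, report the Pearson correlation with the
# response and the slope of the simple linear regression response ~ channel.
summarise_channels <- function(data,
                               channels = c("TV", "radio", "newspaper"),
                               response = "sales") {
  rows <- lapply(channels, function(ch) {
    # Build the formula response ~ channel, e.g. sales ~ TV
    fit <- lm(reformulate(ch, response = response), data = data)
    data.frame(
      channel     = ch,
      correlation = cor(data[[ch]], data[[response]]),
      slope       = unname(coef(fit)[2])  # second coefficient is the slope
    )
  })
  do.call(rbind, rows)
}

# Toy data for illustration only; NOT the values in the real spreadsheet.
toy <- data.frame(
  TV        = c(10, 20, 30, 40, 50),
  radio     = c(5, 3, 8, 1, 6),
  newspaper = c(2, 4, 6, 8, 10),
  sales     = c(9, 7, 6, 4, 2)
)

channel_summary <- summarise_channels(toy)
print(channel_summary)

# With the real data loaded as my_data, the call would be:
# summarise_channels(my_data)
```

A correlation near zero together with a slope near zero would support calling a relationship weak, while the sign of the slope shows whether the trend line rises or falls.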
What you learned
I learned that segmenting code into chunks is very useful for organization. For example, the "```" fences and the {r} chunk header were the main components I used to organize my code in this assignment.
Which visualization was most informative
The scatterplots were the most useful visualization. They were easy to make, showed every data point in detail, and were easy to customize.
Any challenges you encountered
Formatting my code.
Changing the colors of the graph.
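For the color challenge, one possible approach (a sketch, not necessarily what the assignment expected) is to store the binned newspaper budget as a labeled factor and assign colors by hand with `scale_color_manual()`. The bin labels, the color names, the column name `newspaper_level`, and the toy data below are assumptions chosen for illustration; the toy values are not from `advertising_randomized.xlsx`.

```r
library(ggplot2)

# Toy data for illustration only; NOT the values in the real spreadsheet.
toy <- data.frame(
  TV        = c(50, 120, 200, 260, 90, 310),
  sales     = c(12, 9, 15, 7, 18, 11),
  newspaper = c(5, 20, 35, 50, 65, 80)
)

# Bin the newspaper budget into three equal-width groups with readable labels
# (the labels are an assumption; cut() would otherwise print interval names).
toy$newspaper_level <- cut(toy$newspaper, breaks = 3,
                           labels = c("Low", "Medium", "High"))

p <- ggplot(toy, aes(x = TV, y = sales, color = newspaper_level)) +
  geom_point() +
  # Map each bin to a hand-picked color by name
  scale_color_manual(values = c(Low = "steelblue",
                                Medium = "darkorange",
                                High = "firebrick")) +
  labs(
    x = "TV Advertising Budget",
    y = "Sales",
    color = "Newspaper budget"
  )
print(p)
```

With the real data, the same two steps would apply to `my_data`: create the binned column first, then map it to `color` inside `aes()`.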