R Code to Analyze Advertising Data set

library(readxl)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

my_data <- read_excel("advertising_randomized.xlsx")

# install.packages("tidyverse")  # one-time setup; run before library(tidyverse), not on every knit

head(my_data)

## # A tibble: 6 × 6
##       X    X1    TV radio newspaper sales
##   <dbl> <dbl> <dbl> <dbl>     <dbl> <dbl>
## 1   154    42  85.9 17.2      51.0  20.1 
## 2    70   147  92.5 55.9      53.6  15.1 
## 3   184     5 267.   8.48     29.6   6.09
## 4   241   141 258   29.1       3.19  8.91
## 5    80    46 207.  31.1      41.8   6.3 
## 6   108    27 127.  37.2      52    14.5

glimpse(my_data)

## Rows: 300
## Columns: 6
## $ X         <dbl> 154, 70, 184, 241, 80, 108, 167, 109, 20, 109, 29, 193, 155,…
## $ X1        <dbl> 42, 147, 5, 141, 46, 27, 17, 58, 121, 9, 101, 114, 121, 159,…
## $ TV        <dbl> 85.94, 92.52, 267.18, 258.00, 207.44, 127.20, 340.50, 267.23…
## $ radio     <dbl> 17.18, 55.90, 8.48, 29.14, 31.13, 37.21, 25.79, 25.69, 27.06…
## $ newspaper <dbl> 51.03, 53.62, 29.59, 3.19, 41.83, 52.00, 53.12, 19.78, 26.25…
## $ sales     <dbl> 20.06, 15.11, 6.09, 8.91, 6.30, 14.46, 15.62, 10.71, 13.72, …

Scatterplots

ggplot(my_data, aes(x = TV, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Sales vs. TV Advertising",
    x = "TV Advertising Budget",
    y = "Sales"
  )

## `geom_smooth()` using formula = 'y ~ x'

TV and Sales: There is a slight negative relationship between sales and TV advertising. As TV advertising increases, sales appear to decrease.
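The trend line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit, so its slope can be checked directly with `lm()`. The following is a minimal sketch that uses only the six rows visible in the `glimpse(my_data)` output; `sample_rows` is a hypothetical stand-in for `my_data`, and six rows are not representative of all 300.

```r
# Six rows copied from the glimpse(my_data) output above.
# sample_rows is a stand-in for the full data set (assumption for illustration).
sample_rows <- data.frame(
  TV    = c(85.94, 92.52, 267.18, 258.00, 207.44, 127.20),
  sales = c(20.06, 15.11, 6.09, 8.91, 6.30, 14.46)
)

# Fit the same kind of model that geom_smooth(method = "lm") draws: sales ~ TV
fit <- lm(sales ~ TV, data = sample_rows)

# Extract the slope; a negative value means the fitted line falls as TV budget rises
tv_slope <- coef(fit)[["TV"]]
tv_slope
```

Replacing `sample_rows` with `my_data` would give the slope for the full data set.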

ggplot(my_data, aes(x = radio, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Sales vs. Radio Advertising",
    x = "Radio Advertising Budget",
    y = "Sales"
  )

## `geom_smooth()` using formula = 'y ~ x'

Radio and Sales: The scatterplot shows a positive relationship between radio advertising and sales. As radio advertising increases, sales appear to increase slightly.

ggplot(my_data, aes(x = newspaper, y = sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Sales vs. Newspaper Advertising",
    x = "Newspaper Advertising Budget",
    y = "Sales"
  )

## `geom_smooth()` using formula = 'y ~ x'

Newspaper and Sales: There is a very small positive relationship between newspaper advertising and sales. As newspaper advertising increases, sales increase.

Cutting Data and Breaking

ggplot(data = my_data,
  mapping = aes(x = TV, y = sales, color = cut(newspaper, breaks = 3))) +
  geom_point()
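
To see what `cut(newspaper, breaks = 3)` does before it is mapped to color, it can help to run it on a few values. This sketch uses the six newspaper values shown in the `glimpse(my_data)` output; the `labels` argument and the names `"low"`, `"medium"`, `"high"` are illustrative choices, not part of the original code.

```r
# Six newspaper budgets copied from the glimpse(my_data) output
newspaper <- c(51.03, 53.62, 29.59, 3.19, 41.83, 52.00)

# breaks = 3 splits the observed range into three equal-width intervals
bins <- cut(newspaper, breaks = 3)
levels(bins)   # interval labels generated by cut()
table(bins)    # how many observations fall in each interval

# Optional: give the intervals readable names (hypothetical labels)
bins_named <- cut(newspaper, breaks = 3, labels = c("low", "medium", "high"))
bins_named
```

Equal-width bins do not guarantee equal counts, so one color can dominate the plot when the values are unevenly spread.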

Advertising Channel Comparison:

Comparing all three graphs, TV advertising is the only channel with a negative relationship to sales, and it has the steepest trend line. Radio shows almost no correlation with sales; its positive trend is very small. Newspaper advertising shows the tightest relationship, since its data points cluster most closely around the trend line.

Strongest Relationship:

Across all three graphs, newspaper advertising has the strongest relationship with sales. Its clearer upward trend suggests that sales rise as companies increase their newspaper budgets. Radio advertising has the weakest relationship: its trend line is almost completely flat, indicating negligible gains in sales as the radio budget grows.
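
Judging strength from the plots alone is subjective, so one possible check is to compute the correlation of each channel with sales and rank the channels by absolute value. This is a sketch on the six rows shown in the `glimpse(my_data)` output; `sample_rows` is a hypothetical stand-in, and results on six rows may differ from the full 300-row data set.

```r
# Six rows copied from the glimpse(my_data) output (stand-in for my_data)
sample_rows <- data.frame(
  TV        = c(85.94, 92.52, 267.18, 258.00, 207.44, 127.20),
  radio     = c(17.18, 55.90, 8.48, 29.14, 31.13, 37.21),
  newspaper = c(51.03, 53.62, 29.59, 3.19, 41.83, 52.00),
  sales     = c(20.06, 15.11, 6.09, 8.91, 6.30, 14.46)
)

channels <- c("TV", "radio", "newspaper")

# Pearson correlation of each advertising channel with sales
cors <- sapply(channels, function(ch) cor(sample_rows[[ch]], sample_rows$sales))

# Rank channels from strongest to weakest linear relationship;
# the sign shows direction, the absolute value shows strength
cors[order(abs(cors), decreasing = TRUE)]
```

Swapping `sample_rows` for `my_data` would rank the channels on the full data.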

Final Reflection

What you learned

I learned that segmenting code into chunks is very useful for organization. For example, the "```" fences and the {r} chunk header were the main components I used to organize my code in this assignment.

Which visualization was most informative

The scatterplots were the most useful visualization. They were easy to make, detailed, and easy to customize.

Any challenges you encountered

Formatting my code.
Changing the colors of the graph.
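
One possible way to control the point colors is `scale_color_manual()`, which maps each factor level to a chosen color. This is a sketch only: `sample_rows`, the bin labels, and the specific color names are assumptions for illustration, and the six rows come from the `glimpse(my_data)` output.

```r
library(ggplot2)

# Six rows copied from the glimpse(my_data) output (stand-in for my_data)
sample_rows <- data.frame(
  TV        = c(85.94, 92.52, 267.18, 258.00, 207.44, 127.20),
  newspaper = c(51.03, 53.62, 29.59, 3.19, 41.83, 52.00),
  sales     = c(20.06, 15.11, 6.09, 8.91, 6.30, 14.46)
)

# Bin the newspaper budget first so the legend shows readable labels
sample_rows$newspaper_bin <- cut(
  sample_rows$newspaper,
  breaks = 3,
  labels = c("low", "medium", "high")
)

# Named vector: one color per bin level (hypothetical palette)
bin_colors <- c(low = "steelblue", medium = "darkorange", high = "firebrick")

p <- ggplot(sample_rows, aes(x = TV, y = sales, color = newspaper_bin)) +
  geom_point(size = 3) +
  scale_color_manual(values = bin_colors, name = "Newspaper budget") +
  labs(
    title = "Sales vs. TV Advertising",
    x = "TV Advertising Budget",
    y = "Sales"
  )
p
```

Naming the entries of `bin_colors` after the factor levels keeps each color tied to its bin even if the level order changes.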

Advertising

Nicholas Castillo

2026-06-09