library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(boot)
df <- read.csv('Auto Sales data.csv')
Last week, I looked at the correlations among the QUANTITYORDERED, MSRP, and PRICEEACH columns, alongside a column I created, PRICEDIFF (the difference between the values in PRICEEACH and MSRP), so now I want to look at a different combination:
MSRP and SALES (QUANTITYORDERED * PRICEEACH).
df |>
  ggplot() +
  geom_point(
    aes(x = MSRP, y = SALES)
  ) +
  labs(title = paste("Correlation:",
                     round(cor(df$MSRP,
                               df$SALES), 2)))
The relationship is positive and fairly strong given how scattered the plot is: the higher a product's MSRP, the higher its SALES values tend to be. The point with the highest SALES value looks like an outlier.
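One way to see how much a single extreme point matters is to compare the Pearson correlation with a rank-based (Spearman) correlation, and with the Pearson correlation after dropping the largest SALES value. The sketch below uses a simulated data frame called `toy` as a hypothetical stand-in for `df`; its values are made up for illustration and are not from the real file.

```r
# Hypothetical stand-in for df: simulated values, not the real data.
set.seed(1)
toy <- data.frame(MSRP = runif(300, 30, 220))
toy$SALES <- 20 * toy$MSRP + rnorm(300, sd = 1200)
toy$SALES[1] <- 14000  # plant one extreme point, like the one in the plot

# Pearson (linear) and Spearman (rank-based) correlation on all rows
pearson_all  <- cor(toy$MSRP, toy$SALES)
spearman_all <- cor(toy$MSRP, toy$SALES, method = "spearman")

# Drop the single largest SALES value and recompute Pearson
trimmed <- toy[-which.max(toy$SALES), ]
pearson_trimmed <- cor(trimmed$MSRP, trimmed$SALES)

round(c(pearson = pearson_all,
        spearman = spearman_all,
        pearson_trimmed = pearson_trimmed), 2)
```

If the three numbers are close, the extreme point is not driving the correlation; a large gap would suggest reporting the result with and without it.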
av_SALES <- mean(df$SALES)
sd_SALES <- sd(df$SALES)
## Making a sample
x <- sample(df$SALES, 200)
## Assumed standard error of the sample mean
se_SALES <- 101
## Possible sampling distribution of the sample mean
f_sales <- \(x) dnorm(x, mean = 3553, sd = se_SALES)
## Creating the confidence interval for SALES
P <- 0.95
z_score <- qnorm(p = (1 - P) / 2, lower.tail = FALSE)
ggplot() +
  geom_function(xlim = c(3355, 3751),
                fun = f_sales) +
  ## The interval spans z_score standard errors on each side of the mean
  geom_segment(mapping = aes(x = av_SALES - z_score * se_SALES,
                             y = f_sales(av_SALES),
                             xend = av_SALES + z_score * se_SALES,
                             yend = f_sales(av_SALES),
                             linetype = "proposed interval"),
               color = "gray") +
  geom_point(mapping = aes(x = av_SALES,
                           y = f_sales(av_SALES),
                           color = "our sample"), size = 2) +
  labs(title = "*Possible* Sampling Distribution for df_SALES",
       x = "Sample Mean",
       y = "Probability Density",
       color = "",
       linetype = "") +
  theme_minimal()
If we drew 100 random samples of 200 rows from the automobile data set, we would expect about 95 of their average SALES values to fall between 3355 and 3751 monetary units…
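The "about 95 out of 100" reading can be checked by simulation: draw many samples, take each sample's mean, and count how many means land inside the interval. This is a minimal sketch that assumes a simulated, right-skewed vector `sales` as a hypothetical stand-in for `df$SALES`; the population size, the gamma parameters, and the number of repetitions are illustrative choices, not values from the data.

```r
# Hypothetical population standing in for df$SALES (simulated, right-skewed).
set.seed(42)
sales <- rgamma(2500, shape = 4, scale = 900)

n  <- 200                   # sample size, as in sample(df$SALES, 200)
se <- sd(sales) / sqrt(n)   # estimated standard error of the sample mean
z  <- qnorm(0.975)          # two-sided 95% critical value, about 1.96

# Draw many samples of size n (without replacement) and record each mean
sample_means <- replicate(2000, mean(sample(sales, n)))

# Interval of z standard errors around the population mean
lower <- mean(sales) - z * se
upper <- mean(sales) + z * se

# Share of sample means that fall inside the interval
coverage <- mean(sample_means >= lower & sample_means <= upper)
coverage
```

Swapping `sales` for `df$SALES` would run the same check against the real column, with `se` then computed from the data rather than assumed.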