Students will learn how to …
Last week was hot!!! What refreshing treat do people enjoy on a hot day? ICE CREAM! As summer comes to an end and the cooler weather of fall approaches, let's take a look at this fun dataset about ice cream sales and weather.
These data are provided on GitHub by Guilherme D. F. Silva (guilhermedom).
### Import the data
icecream<-read.csv("https://raw.githubusercontent.com/guilhermedom/perceptron-regression-ice-cream-per-temp/refs/heads/main/data/raw/icecream_sales.csv",
header=TRUE)
### Look at the data
str(icecream)
## 'data.frame': 500 obs. of 2 variables:
## $ Temperature: num 24.6 26 27.8 20.6 11.5 ...
## $ Revenue : num 535 625 661 488 316 ...
#View(icecream)
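As an optional extra step (not part of the original code), a couple of quick base R checks can confirm the import worked and flag any missing values.
### OPTIONAL: quick sanity checks on the import
dim(icecream)              # expect 500 rows and 2 columns, matching str() above
colSums(is.na(icecream))   # count of missing values in each column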
### Store the columns in variables for easy access later
temp<-icecream$Temperature
rev<-icecream$Revenue
### IN BASE
### Make a histogram of the response variable
hist(rev)
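As a small optional touch-up, the base histogram can be given a title, an axis label, and a fill color; this is just a sketch of one way to label it.
### OPTIONAL: histogram with a title and axis label
hist(rev,
     main="Distribution of Ice Cream Revenue",
     xlab="Revenue",
     col="lightblue")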
### IN GGPLOT
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ggplot(icecream, aes(x=rev))+
geom_histogram(bins=10)
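A similar optional polish applies on the ggplot side: mapping the Revenue column by name gives a cleaner axis label, and labs() adds a title.
### OPTIONAL: same histogram with labels, using the column name directly
ggplot(icecream, aes(x=Revenue))+
  geom_histogram(bins=10, fill="lightblue", color="black")+
  labs(title="Distribution of Ice Cream Revenue", x="Revenue", y="Count")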
### Report basic statistics of the response variable
summary(rev)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 405.6 529.4 521.6 642.3 1000.0
mean(rev)
## [1] 521.5708
sd(rev)
## [1] 175.4048
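Since the tidyverse is already loaded, the same summary statistics could also be computed with dplyr; this is just an alternative sketch of the base R calls above.
### OPTIONAL: the same statistics with dplyr
icecream %>%
  summarize(mean_rev   = mean(Revenue),
            sd_rev     = sd(Revenue),
            median_rev = median(Revenue))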
### Run a simple linear regression of revenue on temperature and store the results
results<-lm(rev~temp)
summary(results)
##
## Call:
## lm(formula = rev ~ temp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -73.303 -15.596 -0.167 16.811 91.294
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.8313 3.2718 13.7 <2e-16 ***
## temp 21.4436 0.1383 155.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.01 on 498 degrees of freedom
## Multiple R-squared: 0.9797, Adjusted R-squared: 0.9797
## F-statistic: 2.404e+04 on 1 and 498 DF, p-value: < 2.2e-16
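As an optional follow-up, pieces of the fit can be pulled out directly instead of reading them off the printed summary; confint() gives confidence intervals for the intercept and slope (95% by default).
### Extract the estimated coefficients directly
coef(results)
### Confidence intervals for the intercept and slope (95% by default)
confint(results)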
#### Create a scatterplot
## BASE
plot(temp, rev)
#### Add the least squares regression line
abline(coefficients(results), col="red", lwd=2)
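An optional variant of the base plot adds axis labels and passes the fitted model straight to abline(), which reads the intercept and slope from the lm object.
### OPTIONAL: labeled version; abline() accepts the lm object directly
plot(temp, rev,
     xlab="Temperature", ylab="Revenue",
     main="Ice Cream Revenue vs. Temperature")
abline(results, col="red", lwd=2)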
## GGPLOT
ggplot(icecream, aes(x=temp, y=rev))+
geom_point()+
geom_smooth(method="lm", color="red")
## `geom_smooth()` using formula = 'y ~ x'
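The ggplot version can be polished the same way; mapping the columns by name keeps the axis labels informative, and se=FALSE drops the confidence band so only the fitted line shows. This is one possible styling, not the only one.
### OPTIONAL: labeled ggplot version with just the fitted line
ggplot(icecream, aes(x=Temperature, y=Revenue))+
  geom_point()+
  geom_smooth(method="lm", se=FALSE, color="red")+
  labs(title="Ice Cream Revenue vs. Temperature")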
Use the mathematical equations derived in class to confirm the least squares estimates from lm.
Slope:
\[\hat{\beta}_1 = \frac{n\sum_{i=1}^n x_i Y_i - \sum_{i=1}^n x_i \sum_{i=1}^n Y_i}{n\sum_{i=1}^n x_i^2-\left(\sum_{i=1}^n x_i\right)^2}\]
Intercept:
\[\hat{\beta}_0 = \bar{Y}-\hat{\beta}_1 \bar{x}\]
### FOR EASE OF NOTATION
x<-temp
Y<-rev
### VECTOR MATH IS ELEMENT WISE
head(x)
## [1] 24.56688 26.00519 27.79055 20.59534 11.50350 14.35251
head(Y)
## [1] 534.7990 625.1901 660.6323 487.7070 316.2402 367.9407
head(x*Y)
## [1] 13138.346 16258.189 18359.337 10044.488 3637.868 5280.875
### NUMERATOR (Part 1)
n<-length(x)
num1<-n*sum(x*Y)
### NUMERATOR (Part 2)
num2<-sum(x)*(sum(Y))
### DENOMINATOR (Part 1)
den1<-n*sum(x^2)
### DENOMINATOR (Part 2)
den2<-sum(x)^2
### BETA 1 HAT
beta_1_hat<-(num1-num2)/(den1-den2)
beta_1_hat
## [1] 21.44363
### BETA 0 HAT
beta_0_hat<-mean(Y)-beta_1_hat*mean(x)
beta_0_hat
## [1] 44.83127
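As a check, the hand-computed estimates should match what lm() reported, up to rounding in the printed output.
### COMPARE TO lm()
coef(results)
### Should return TRUE (all.equal allows for tiny floating point differences)
all.equal(unname(coef(results)), c(beta_0_hat, beta_1_hat))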
The rocket data are available here:
rocket<-read.csv("https://raw.githubusercontent.com/kitadasmalley/Teaching_Public/refs/heads/main/STAT441/FA25/data/rocketProp.csv",
header=TRUE)
These rocket data are from:
“Introduction to Linear Regression Analysis (3rd Edition)” by Douglas C. Montgomery, Elizabeth A. Peck, and G. Geoffrey Vining. Published by Wiley-Interscience as part of the Wiley Series in Probability and Statistics.
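The same first steps from above apply to the rocket data; a quick structural look (making no assumptions about the column names) is a sensible starting point.
### Look at the rocket data
str(rocket)
head(rocket)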