2025-10-20

Simple Linear Regression — Temperature vs Sales

  • This presentation uses a simple linear regression model to examine whether daily ice-cream sales increase as outdoor temperature rises.
  • The predictor \(x\) is Temperature (°F), and the response \(y\) is Daily Ice Cream Sales ($).
  • The objective is to visualize the relationship, fit a straight-line model, interpret the slope and model fit, and communicate what the results imply for sales planning.

What is Simple Linear Regression?

  • The model treats the average sales as a linear function of temperature.
  • It chooses the line that minimizes the sum of squared residuals, i.e., the squared gaps between actual and predicted sales, added up over all days.
  • \(\hat\beta_0\) is the expected sales at \(x=0\) (line anchor); \(\hat\beta_1\) is the change in sales per 1°F.
  • Predictions \(\hat y\) give expected sales for a given temperature; residuals show day-to-day factors not explained by temperature.
  • Reasonable use here assumes the trend is roughly linear and residual spread is roughly constant across temperatures.
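
The least-squares idea in the bullets above can be checked numerically. The sketch below uses base R and the eight temperature/sales pairs listed on the R code slide; the helper `sse()` and the 0.1 nudge to the slope are illustrative choices, not part of the original analysis.

```r
# Data as listed on the "R Code (shown)" slide
temperature <- c(60, 65, 70, 75, 80, 85, 90, 95)
sales       <- c(120, 150, 180, 200, 220, 250, 270, 300)

# Hypothetical helper: sum of squared residuals for a candidate line
sse <- function(b0, b1) sum((sales - (b0 + b1 * temperature))^2)

fit <- lm(sales ~ temperature)
b   <- unname(coef(fit))   # b[1] = intercept, b[2] = slope

sse_fit    <- sse(b[1], b[2])        # line chosen by least squares
sse_nudged <- sse(b[1], b[2] + 0.1)  # same intercept, slightly steeper slope

sse_fit < sse_nudged  # TRUE here: the nudged line fits worse
```

Trying other intercepts or slopes in `sse()` should give the same result, since the fitted coefficients are the unique minimizer for this data.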

Regression Equation (LaTeX)

  • The simple linear regression model is expressed as \(\,y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,\; i=1,\dots,n.\)
  • In this expression, \(\beta_0\) is the population intercept, \(\beta_1\) is the population slope, and \(\varepsilon_i\) is the random error for observation \(i\).
  • Interpreted in context, the expected value of sales changes linearly with temperature, while the error term accounts for day-to-day factors not captured by temperature alone.
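
The step from the model equation to the statement about expected sales relies on one standard assumption that the slide leaves implicit: the errors average to zero at every temperature, \(E[\varepsilon_i \mid x_i] = 0\). Under that assumption,

\[
E[y_i \mid x_i] \;=\; \beta_0 + \beta_1 x_i + E[\varepsilon_i \mid x_i] \;=\; \beta_0 + \beta_1 x_i ,
\]

so a one-degree increase in \(x_i\) shifts expected sales by exactly \(\beta_1\).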

Least Squares Coefficients (LaTeX)

  • The least squares estimates are \(\displaystyle \hat{\beta}_1=\frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}\) and \(\displaystyle \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}.\)
  • These estimates minimize the total squared residuals \(\sum_{i=1}^n (y_i-\hat{y}_i)^2\) with \(\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1 x_i.\)
  • In this study, \(\hat{\beta}_1\) indicates how many dollars of sales are associated, on average, with a one-degree Fahrenheit increase in temperature, while \(\hat{\beta}_0\) positions the line when \(x=0\) (often mainly for anchoring if \(x=0\) lies outside the observed range).
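
As a sketch of how the closed-form expressions above work in practice, the coefficients can be computed by hand in base R and compared with `lm()`. The variable names `x_bar`, `y_bar`, `b0`, and `b1` are chosen here for illustration; the data are the eight pairs from the R code slide.

```r
temperature <- c(60, 65, 70, 75, 80, 85, 90, 95)
sales       <- c(120, 150, 180, 200, 220, 250, 270, 300)

x_bar <- mean(temperature)   # 77.5
y_bar <- mean(sales)         # 211.25

# Slope: sum of cross-deviations over sum of squared x-deviations
b1 <- sum((temperature - x_bar) * (sales - y_bar)) /
      sum((temperature - x_bar)^2)

# Intercept: forces the line through the point (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

c(intercept = b0, slope = b1)   # approx. -174.40 and 4.98

# Cross-check against lm(); all.equal() allows for floating-point noise
all.equal(unname(coef(lm(sales ~ temperature))), c(b0, b1))
```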

Data Scatter (ggplot #1)

Fitted Line (ggplot #2)
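
The plotting code for this slide is not shown in the deck. A minimal sketch of how a scatter plot with the fitted line might be drawn is given below, assuming the ggplot2 package is installed; the title, point size, and the object name `p` are illustrative choices rather than the original settings.

```r
library(ggplot2)

df <- data.frame(
  temperature = c(60, 65, 70, 75, 80, 85, 90, 95),
  sales       = c(120, 150, 180, 200, 220, 250, 270, 300)
)

p <- ggplot(df, aes(x = temperature, y = sales)) +
  geom_point(size = 3) +                                      # raw observations
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # least-squares line
  labs(x = "Temperature (°F)", y = "Daily Ice Cream Sales ($)",
       title = "Sales vs temperature with least-squares line")

print(p)
```

Dropping the `geom_smooth()` layer leaves the plain scatter plot.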

Interactive 3D Plot (plotly)

R Code (shown)

# Data and model (duplicated so the code is visible on this slide)
library(tibble)  # provides tibble(); data.frame() would work equally well here
temperature <- c(60, 65, 70, 75, 80, 85, 90, 95)
sales       <- c(120, 150, 180, 200, 220, 250, 270, 300)
df <- tibble(temperature, sales)
model <- lm(sales ~ temperature, data = df)

# Show model summary (output appears below the code)
summary(model)
Call:
lm(formula = sales ~ temperature, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-4.167 -3.512  1.071  1.488  6.071 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -174.4048     9.2013  -18.95 1.39e-06 ***
temperature    4.9762     0.1174   42.37 1.16e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.806 on 6 degrees of freedom
Multiple R-squared:  0.9967,    Adjusted R-squared:  0.9961 
F-statistic:  1795 on 1 and 6 DF,  p-value: 1.157e-08

Interpretation & Conclusion

  • The fitted regression line is \(\hat{y} = -174.40 + 4.98\,x,\) which summarizes the average change in sales as temperature varies.
  • Based on the fitted model, each additional \(1^\circ\)F of temperature is associated with an average increase of 4.98 dollars in daily sales.
  • The model explains nearly all of the variability in sales in this sample: \(R^2 \approx 0.997\) means that about 99.7% of the variation in daily sales is accounted for by temperature alone.
  • In practical terms, warmer days tend to produce higher sales, so temperature is a useful predictor for basic planning and forecasting. Because the dataset is small and includes only one predictor, conclusions should avoid extrapolating far beyond the observed temperature range and should acknowledge that other factors can also influence sales.
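
For planning purposes, the fitted model can be turned into a forecast with `predict()`. The sketch below is illustrative only: the 82°F day is a hypothetical scenario, and the range check is one simple way to guard against the extrapolation mentioned above.

```r
temperature <- c(60, 65, 70, 75, 80, 85, 90, 95)
sales       <- c(120, 150, 180, 200, 220, 250, 270, 300)
df    <- data.frame(temperature, sales)
model <- lm(sales ~ temperature, data = df)

# Hypothetical planning scenario: an 82°F day (inside the observed 60-95°F range)
new_day <- data.frame(temperature = 82)

# Point forecast plus a 95% prediction interval for a single day's sales
forecast <- predict(model, newdata = new_day,
                    interval = "prediction", level = 0.95)
forecast   # columns: fit, lwr, upr

# Guard against extrapolation: is the temperature within the observed range?
in_range <- new_day$temperature >= min(df$temperature) &
            new_day$temperature <= max(df$temperature)
in_range
```

The point forecast is about 233.6 dollars. The interval reflects only the scatter seen in these eight observations, so it should be read as a rough guide.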