The Question/Problem Statement

Problem: A pizza shop wants to estimate delivery times for its customers.

Simple idea: The farther away the customer lives, the longer the delivery takes.

Goal: Build a model that predicts delivery time from distance.

  • X variable: Distance from shop (miles)
  • Y variable: Delivery time (minutes)

What Is Simple Linear Regression?

We want to draw the best straight line through our data: the line that minimizes the sum of squared vertical distances between the points and the line.

The equation: \[Y = \beta_0 + \beta_1 X\]

  • \(\beta_0\) = starting time (prep + loading)
  • \(\beta_1\) = time added per mile
  • \(X\) = distance in miles
  • \(Y\) = total delivery time in minutes

Example: If \(\beta_0 = 15\) and \(\beta_1 = 3\), then a 5-mile delivery takes: \(15 + 3(5) = 30\) minutes
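The arithmetic in this example can be checked with a few lines of R. This is a minimal sketch; `delivery_time` is a hypothetical helper name introduced here for illustration, not something from the original analysis.

```r
# Hypothetical helper: evaluate the line Y = beta0 + beta1 * X.
delivery_time <- function(distance, beta0, beta1) {
  beta0 + beta1 * distance
}

# Worked example from the text: beta0 = 15, beta1 = 3, 5-mile delivery.
delivery_time(5, beta0 = 15, beta1 = 3)  # 15 + 3 * 5 = 30
```

Because the function is vectorized, passing `c(1, 5, 10)` as the distance returns one estimate per delivery.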

Distance (miles)   Time (minutes)
3.2                20.9
8.0                31.4
4.4                24.8
8.9                36.0
9.4                37.2
0.9                15.7
5.5                26.9
9.0                34.3
5.7                25.7
4.8                23.3

60 deliveries recorded over one week
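To experiment with these numbers, the ten rows shown above can be typed into a data frame. This is a minimal sketch: `pizza_sample` is a hypothetical name, and it holds only the ten displayed rows, not the full 60-delivery dataset, so anything fitted to it will differ slightly from the results reported for the full data.

```r
# Ten rows copied from the table; NOT the full 60-delivery dataset.
pizza_sample <- data.frame(
  Distance = c(3.2, 8.0, 4.4, 8.9, 9.4, 0.9, 5.5, 9.0, 5.7, 4.8),          # miles
  Time     = c(20.9, 31.4, 24.8, 36.0, 37.2, 15.7, 26.9, 34.3, 25.7, 23.3)  # minutes
)

str(pizza_sample)      # structure: 10 observations of 2 numeric variables
summary(pizza_sample)  # range and quartiles of each column
```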

Scatter Plot

Clear positive relationship!
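A scatter plot like this one can be drawn with base R graphics. The sketch below uses only the ten rows shown in the data table (stored under the hypothetical name `pizza_sample`), so it has fewer points than a plot of all 60 deliveries would. It also prints the Pearson correlation as a numeric check on the "positive relationship" claim.

```r
# Ten rows copied from the data table; an illustrative sample only.
pizza_sample <- data.frame(
  Distance = c(3.2, 8.0, 4.4, 8.9, 9.4, 0.9, 5.5, 9.0, 5.7, 4.8),
  Time     = c(20.9, 31.4, 24.8, 36.0, 37.2, 15.7, 26.9, 34.3, 25.7, 23.3)
)

# Distance on the x-axis, delivery time on the y-axis.
plot(Time ~ Distance, data = pizza_sample,
     xlab = "Distance from shop (miles)",
     ylab = "Delivery time (minutes)",
     main = "Delivery time vs. distance (10-row sample)",
     pch = 19)

# Pearson correlation: a value near +1 indicates a strong positive linear relationship.
r <- cor(pizza_sample$Distance, pizza_sample$Time)
r
```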

The Regression Line

The Formula

Our fitted equation: \[\widehat{\text{Time}} = 12.51 + 2.44 \times \text{Distance}\]

What this means:

  • Base time (prep + loading): 12.5 minutes
  • Each mile adds: 2.4 minutes

Example prediction:

For a 6-mile delivery: \(12.51 + 2.44 \times 6 \approx 27.2\) minutes
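Where do the intercept and slope come from? For a single predictor, the standard least-squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. The sketch below applies those formulas to the ten rows shown in the data table (hypothetical name `pizza_sample`) and cross-checks them against `lm()`. Since it uses 10 rows rather than all 60, the numbers should come out close to, but not equal to, 12.51 and 2.44.

```r
# Ten rows copied from the data table; an illustrative sample only.
pizza_sample <- data.frame(
  Distance = c(3.2, 8.0, 4.4, 8.9, 9.4, 0.9, 5.5, 9.0, 5.7, 4.8),
  Time     = c(20.9, 31.4, 24.8, 36.0, 37.2, 15.7, 26.9, 34.3, 25.7, 23.3)
)
x <- pizza_sample$Distance
y <- pizza_sample$Time

# Closed-form least-squares estimates for a single predictor.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                      # intercept

c(intercept = b0, slope = b1)

# Cross-check against lm(), which should give the same two numbers.
fit <- lm(Time ~ Distance, data = pizza_sample)
all.equal(unname(coef(fit)), c(b0, b1))
```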

R Code

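# Basic regression in R - it's this simple!
# Fit Time as a linear function of Distance
# (pizza_data: data frame with Time and Distance columns)
model <- lm(Time ~ Distance, data = pizza_data)

# Show coefficients, standard errors, and R-squared
summary(model)

# Predict the delivery time for a 6-mile trip
predict(model, newdata = data.frame(Distance = 6))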

That’s it! Just one line to fit the model.

How Good Is Our Model?

Metric                                    Value
R-squared                                 0.94
Base time in minutes, (Intercept) term    12.51
Time per mile in minutes, Distance term   2.44

R² = 0.94 means distance explains 94% of the variation in delivery time.

Pretty good!
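R² can also be computed by hand as one minus the ratio of unexplained (residual) variation to total variation, which makes the "percent of variation explained" reading concrete. The sketch below does this on the ten rows shown in the data table (hypothetical name `pizza_sample`), so its value will not match the 0.94 reported for the full 60-delivery dataset.

```r
# Ten rows copied from the data table; an illustrative sample only.
pizza_sample <- data.frame(
  Distance = c(3.2, 8.0, 4.4, 8.9, 9.4, 0.9, 5.5, 9.0, 5.7, 4.8),
  Time     = c(20.9, 31.4, 24.8, 36.0, 37.2, 15.7, 26.9, 34.3, 25.7, 23.3)
)
fit <- lm(Time ~ Distance, data = pizza_sample)

ss_res <- sum(residuals(fit)^2)                                 # unexplained variation
ss_tot <- sum((pizza_sample$Time - mean(pizza_sample$Time))^2)  # total variation
r_squared <- 1 - ss_res / ss_tot

r_squared
summary(fit)$r.squared  # same quantity, as reported by summary()
```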

Testing & Checking the Model

3D Visualization: Predictions with Confidence

Generating the Predictions

Distance (miles)   Predicted time (min)   Lower (min)   Upper (min)
2                  17.4                   13.9          20.9
5                  24.7                   21.3          28.2
8                  32.0                   28.6          35.5

Example: A 5-mile delivery takes about 25 minutes, give or take about 3.5 minutes (21.3 to 28.2)
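A table of this shape can be produced with `predict()` and its `interval` argument. The source does not state how the bounds were computed; their width is consistent with 95% prediction intervals for individual deliveries, so that is assumed here. The sketch fits the model to the ten rows shown in the data table (hypothetical name `pizza_sample`), so its numbers will differ from those in the table above.

```r
# Ten rows copied from the data table; an illustrative sample only.
pizza_sample <- data.frame(
  Distance = c(3.2, 8.0, 4.4, 8.9, 9.4, 0.9, 5.5, 9.0, 5.7, 4.8),
  Time     = c(20.9, 31.4, 24.8, 36.0, 37.2, 15.7, 26.9, 34.3, 25.7, 23.3)
)
fit <- lm(Time ~ Distance, data = pizza_sample)

# Distances to predict for, matching the rows of the table.
new_deliveries <- data.frame(Distance = c(2, 5, 8))

# Assumption: 95% prediction intervals (range for a single new delivery).
intervals <- predict(fit, newdata = new_deliveries,
                     interval = "prediction", level = 0.95)

# Columns: fit = predicted time, lwr/upr = interval bounds.
cbind(new_deliveries, round(intervals, 1))
```

Switching to `interval = "confidence"` would instead give a narrower range for the average delivery time at each distance.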

Comparison Text: Short vs Long Deliveries

Summary

  1. Simple linear regression finds the best-fitting straight line through the data
  2. Our model: Time = 12.5 + 2.4 × Distance
  3. Each mile adds about 2.4 minutes
  4. The model fits the data well (R² = 0.94)

Use: The pizza shop can now give customers a time estimate based on their distance!

Key takeaway: Linear regression helps us make predictions from simple relationships in data.