November 16, 2024

How long is your flight going to take?

In a one-hour sample of domestic flights from NYC:

  • Over half of flights took between 75 and 175 minutes.
  • Nearly all flights took between 33 and 266 minutes, but there were outliers in the 307-322 minute range.
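
The ranges above look like a box-plot summary. The original figure is not reproduced here, so that is an assumption. In base R, `boxplot.stats()` returns the same pieces: the whisker ends, the hinges that bound roughly the middle half of the data, and any points flagged as outliers. The `air_times` vector is made-up illustration data, not the NYC sample.

```r
# Made-up air times in minutes (illustration only, NOT the NYC sample)
air_times <- c(40, 75, 90, 110, 130, 150, 175, 200, 260, 400)

bp <- boxplot.stats(air_times)

# Five values: lower whisker, lower hinge, median, upper hinge, upper whisker.
# The hinges bound roughly the middle half of the data.
bp$stats

# Points more than 1.5 * (upper hinge - lower hinge) beyond a hinge are reported as outliers
bp$out
```

With the default `coef = 1.5`, the hinges are 90 and 200, so the upper fence is 200 + 1.5 × 110 = 365. The 400-minute value falls past it and is flagged, and the upper whisker stops at 260.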

What if there were a way to predict, within a much narrower range, how long a particular flight will take?

Spoiler alert: We can do that by looking for correlations between flight time and other flight data that we have.

Let’s factor in distance.

  • When we plot distance against flight time, a nearly linear relationship emerges.
  • This distance/time relationship is comically intuitive.
    • The fact that flights covering greater distances take longer won’t be a surprise to anyone. I chose a fairly obvious relationship because I wanted to demonstrate linear regression on two variables with a clear correlation.
    • That said, I was a bit surprised by the high degree of linearity. I expected to see a bit more deviation, particularly in the longest and shortest flights.
  • We can model the relationship between flight time and distance as a linear function. This lets us predict the flight time of a flight of a given distance, at least within the range of distances in our sample.
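
Before fitting a line, you can summarize the strength of a linear relationship with Pearson's correlation coefficient, `cor()` in base R. Values near 1 mean the points lie close to an increasing straight line. The six distance/time pairs below are made up for illustration and are not the NYC data.

```r
# Hypothetical flights: distance in miles, air time in minutes (made-up values)
toy <- data.frame(
  distance = c(200, 500, 750, 1000, 1500, 2500),
  air_time = c(45, 78, 110, 135, 195, 315)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r <- cor(toy$distance, toy$air_time)
r
```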

What is linear regression?

  • Linear regression is the technique of finding the best-fitting linear model to represent a correlation between variables.
  • The “best-fit line” (the least-squares line) is the line that minimizes the sum of squared errors between the observed values of the response variable (flight time, in our example) and the values the line predicts from the predicting variable (distance, in our example).
    • Each error (residual) is the vertical distance between a data point and the regression line: \(e_i = y_i - \hat{y}_i\).
    • Sum of squared errors \(= \sum_{i=1}^n e_i^2\)
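  • Our linear model can be written in slope-intercept form:
    • \(y=b+mx\)
    • \(\widehat{\text{Time}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Distance}\), where the intercept \(\hat{\beta}_0\) plays the role of \(b\) and the slope \(\hat{\beta}_1\) plays the role of \(m\)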
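
For a single predictor, the least-squares line has a closed form: \(\hat{\beta}_1 = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\). The sketch below uses made-up data, not the NYC sample. It computes these estimates by hand, checks them against `lm()`, and confirms that moving away from them increases the sum of squared errors.

```r
# Made-up data (not the NYC sample): x = distance in miles, y = air time in minutes
x <- c(200, 500, 750, 1000, 1500, 2500)
y <- c(45, 78, 110, 135, 195, 315)

# Closed-form least-squares estimates for a single predictor
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() should return the same intercept and slope
fit <- lm(y ~ x)
coef(fit)

# Sum of squared errors for any candidate line y = b0 + b1 * x
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

sse(b0, b1)         # the minimum over all lines
sse(b0, b1 * 1.01)  # a slightly steeper line does worse
sse(b0 + 1, b1)     # so does a line shifted up by one minute
```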

Linear Regression in R

# Data frame is named fl_timedist

# We'll use the lm function to perform the fit.
# The first argument is a "formula" of the 
# form [response variable] ~ [predicting variable]

lin_reg <- lm(air_time ~ distance, data=fl_timedist)

lin_reg
## 
## Call:
## lm(formula = air_time ~ distance, data = fl_timedist)
## 
## Coefficients:
## (Intercept)     distance  
##     18.7348       0.1187
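
After a model like `lin_reg` is fitted, `coef()` extracts \(\hat{\beta}_0\) and \(\hat{\beta}_1\), and `predict()` applies them to new distances. To make the sketch run without the original data, it uses a small made-up data frame, `toy_flights`. This is a hypothetical stand-in for `fl_timedist` with the same column names.

```r
# Hypothetical stand-in for fl_timedist, with the same column names (made-up values)
toy_flights <- data.frame(
  distance = c(200, 500, 750, 1000, 1500, 2500),
  air_time = c(45, 78, 110, 135, 195, 315)
)

toy_reg <- lm(air_time ~ distance, data = toy_flights)

# Named vector: "(Intercept)" and "distance"
coef(toy_reg)

# newdata must contain a column named like the predictor in the formula
new_flights <- data.frame(distance = c(300, 1200))
preds <- predict(toy_reg, newdata = new_flights)
preds
```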

Data Points with Linear Regression Line

Drawing Conclusions from the Model

\(\hat{\beta}_0\) (y-intercept) \(=18.7348\), \(\hat{\beta}_1\) (slope) \(=0.1187\)

We can therefore estimate a flight's in-air time (in minutes) from its distance (in miles) with the formula \(\widehat{\text{Time}} = 18.7348 + 0.1187 \cdot \text{Distance}\)
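
The fitted formula can be wrapped in a small R function. `estimate_air_time()` is a hypothetical helper name. The coefficients are the ones reported by `lm()` above. The input is assumed to be in miles, as in the fitted data, and the output is in minutes.

```r
# Hypothetical helper: in-air time (minutes) from distance (miles),
# using the intercept and slope reported by lm() above
estimate_air_time <- function(distance) {
  18.7348 + 0.1187 * distance
}

# Vectorized: works on a single distance or a vector of distances
estimate_air_time(c(200, 1000, 2500))
```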

The slope 0.1187 is in minutes per mile, so its reciprocal is a speed:

\(1/0.1187 \approx 8.425\) miles/minute, and \(8.425 \times 60 \approx 505.5\) miles/hour
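
The same conversion can be done in R, with one extra comparison. Because the model also has an 18.7348-minute intercept, the average speed over a whole flight is always lower than the speed implied by the slope alone. The 1000-mile flight below is an arbitrary example distance.

```r
slope <- 0.1187       # minutes per mile, from the fitted model
intercept <- 18.7348  # minutes, from the fitted model

# Marginal speed: each extra mile adds `slope` minutes, so 1/slope is miles per minute
mph_marginal <- (1 / slope) * 60

# Average speed over a whole (example) 1000-mile flight, intercept included
distance <- 1000
mph_average <- distance / (intercept + slope * distance) * 60

round(c(marginal = mph_marginal, average = mph_average), 1)
```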

Just for Fun

Attribution