NY Flight Delay Analysis with Linear Regression

2025-02-07

Introduction

Using the “flights” data set, we compare distance, air_time, and arrival delays. NY fliers do not normally have this data available. The idea is that flight distance can predict flight time, which in turn might predict which flights are most likely to be delayed by acts of nature.

# A tibble: 5 × 3
  arr_delay distance air_time
      <dbl>    <dbl>    <dbl>
1        11     1400      227
2        20     1416      227
3        33     1089      160
4        12      719      150
5        19     1065      158
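The preview above can be approximated with a short dplyr pipeline. This is a minimal sketch, assuming that “flights” comes from the nycflights13 package and that the preview keeps only late arrivals (arr_delay > 0); the name `preview` is introduced here for illustration and is not from the original slides.

```r
library(nycflights13)  # assumed source of the flights data set
library(dplyr)

# Keep late arrivals only, then the three columns compared in this analysis
preview <- flights %>%
  filter(arr_delay > 0) %>%            # also drops rows where arr_delay is NA
  select(arr_delay, distance, air_time) %>%
  head(5)

print(preview)
```

In this data set, distance is recorded in miles, and air_time and arr_delay in minutes.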

Flight Distance of <2.5k Miles Predicting Flight Airtime of 5 to 8 Hours

Flight Airtime <8 Hours Predicting Arrival Delays >6 hours

Distance Flown Predicting Arrival Delays

Plotly Code

library(nycflights13)  # assumed source of the flights data set
library(dplyr)
library(plotly)

# Keep late arrivals on flights of at most 2600 miles
df3 = flights %>%
  filter(!is.na(arr_delay), !is.na(distance)) %>%
  filter(arr_delay > 0, distance <= 2600) %>%
  select(arr_delay, distance)

# Simple linear regression of arrival delay on distance
delaydist = lm(arr_delay ~ distance, data = df3)

x = df3$distance; y = df3$arr_delay

# Axis settings
xax = list(title = "Distance Flown (miles)",
           titlefont = list(family = "Modern Computer Roman"))
yax = list(title = "Arrival Delays (mins)",
           titlefont = list(family = "Modern Computer Roman"),
           range = c(0, 1000))

# Scatter plot of the data with the fitted regression line on top
fig = plot_ly(x = x, y = y, type = "scatter", mode = "markers",
              name = "data", width = 800, height = 430) %>%
  add_lines(x = x, y = fitted(delaydist), name = "fitted") %>%
  layout(title = "Regression for (Distance Flown, Arrival Delays)") %>%
  layout(xaxis = xax, yaxis = yax) %>%
  layout(margin = list(l = 150, r = 50, b = 10, t = 110))

# Hide the plotly logo in the mode bar
config(fig, displaylogo = FALSE)

Math for Flight Distance Predicting Flight Airtime

fitted:
\(\text{air_time} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{distance}\)

\(\hat{\beta}_0 = b_0\), the estimate of \(\beta_0\);
\(\hat{\beta}_1 = b_1\), the estimate of \(\beta_1\)

slope:
\(\hat{\beta}_1 = \frac{N\sum(xy) - (\sum x)(\sum y)}{N\sum(x^{2}) - (\sum x)^{2}}\)

y-intercept:
\(\hat{\beta}_0 = \frac{\sum y - \hat{\beta}_1 \sum x}{N}\)
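As a worked check of the slope formula, the five rows from the data preview can be plugged in by hand and compared with R's lm(). This is only a sketch: following the section heading, distance is taken as the predictor x and air_time as the response y, the intercept is computed as the mean of y minus the slope times the mean of x, and five rows are far too few to say anything about the full data set.

```r
# Five rows copied from the data preview:
# x = distance (miles), y = air_time (minutes)
x <- c(1400, 1416, 1089, 719, 1065)
y <- c(227, 227, 160, 150, 158)
N <- length(x)

# Slope from the closed-form least-squares formula
slope <- (N * sum(x * y) - sum(x) * sum(y)) / (N * sum(x^2) - sum(x)^2)

# Intercept: mean(y) - slope * mean(x), written with sums
intercept <- (sum(y) - slope * sum(x)) / N

# The same fit computed by lm(), for comparison
fit <- lm(y ~ x)

print(c(intercept = intercept, slope = slope))
print(coef(fit))
```

The two printed pairs should agree up to floating-point rounding, since lm() solves the same least-squares problem.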

Math for Flight Airtime Predicting Arrival Delays

fitted:
\(\text{arr_delay} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{air_time}\)

\(\hat{\beta}_0 = b_0\), the estimate of \(\beta_0\);
\(\hat{\beta}_1 = b_1\), the estimate of \(\beta_1\)

slope:
\(\hat{\beta}_1 = \frac{N\sum(xy) - (\sum x)(\sum y)}{N\sum(x^{2}) - (\sum x)^{2}}\)

y-intercept:
\(\hat{\beta}_0 = \frac{\sum y - \hat{\beta}_1 \sum x}{N}\)

Results and Discussion

The visualizations for the data set (NYflights) did not reveal a dependable way to predict arrival delays. Distance plotted against air time (slide 3) shows that distance can predict how long a plane is in flight, but air time does not predict the length of an arrival delay (slide 4). The plotly graph (slide 6) compares distance flown to arrival delays; the fitted regression line runs only from about 28 to about 36 minutes between its two ends, so the predicted delay barely changes with distance. Fliers have little chance of predicting delays from a plane's travel distance and air time.
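One way to put numbers on these visual impressions is to compare the R-squared of the three regressions. The sketch below is not part of the original slides: it assumes “flights” comes from nycflights13 and applies the same filters as the plotly code (late arrivals, distance of at most 2600 miles) to all three models, which may differ from the filters behind slides 3 and 4. The names `late`, `fits`, and `r_squared` are illustrative.

```r
library(nycflights13)  # assumed source of the flights data set
library(dplyr)

# Late arrivals with complete values, same distance cutoff as the plotly code
late <- flights %>%
  filter(!is.na(arr_delay), !is.na(air_time), !is.na(distance)) %>%
  filter(arr_delay > 0, distance <= 2600)

# One simple linear regression per comparison in the slides
fits <- list(
  airtime_from_distance = lm(air_time ~ distance, data = late),
  delay_from_airtime    = lm(arr_delay ~ air_time, data = late),
  delay_from_distance   = lm(arr_delay ~ distance, data = late)
)

# Share of variance in the response explained by each predictor
r_squared <- sapply(fits, function(m) summary(m)$r.squared)
print(round(r_squared, 3))
```

An R-squared near 1 means the predictor explains most of the variation in the response, while a value near 0 means it explains almost none.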