class: center, middle, inverse, title-slide

.title[
# Flight Delay Prediction Based on Linear Regression
]
.author[
### Echefu Chinwendu
]
.institute[
### West Chester University of Pennsylvania
]
.date[
### February 6, 2024
]

---
class:inverse, top

<h2 align="center"> Table of Contents</h2>
<BR>

.pull-left[
- Introduction
  - Objectives of the study
- Data
  - Response variable
  - Model
  - Description of the data
- Exploratory Data Analysis
  - Categorical Variables
  - Numerical Variables
- Linear Regression Model
  - Description of the model
  - Regression Plots
]

.pull-right[
- Regression Coefficients of the model
- Model Equation and Interpretation
  - Model Equation
  - Model Interpretation
- Conclusion
  - Discussion
  - Limitations
]

---

<h2 align="center"> Introduction</h2>
<BR>

.pull-left[
The data set contains information on flight delays. Flight delays pose a major problem for the aviation industry, and customer satisfaction depends on how long flights are delayed. This study aims to predict flight delay using factors that include:

- Weather condition
- Distance between airports
- Total number of flights in the airport
- Total number of support crew available
]

.pull-right[
<h4 align="center"> Objectives of the Study </h4>
<BR>

- To predict flight delay in minutes
- To obtain the average flight delay
- To determine the factors that influence flight delay
- To identify notable patterns between the factors and flight delay
]

---

<h2 align="center"> Data </h2>
<BR>

.pull-left[
**Response Variable** <br><br>
The response variable is `Arrival Delay (Arr_delay)`, a continuous variable ranging from 0 to 180 minutes. Larger values mean a longer delay.

**Model** <br><br>
A multiple linear regression model was used to predict the flight arrival delay. Multiple linear regression analyzes the relationship between two or more predictor variables and a response variable. In this model, the coefficient of each independent variable gives the estimated change in the response associated with a one-unit increase in that variable, with the other independent variables held constant.
]

.pull-right[
**Description of the Data**<br><br>
The data set contains 3593 records and 11 variables, including the response variable `arr_delay`, with no missing values. There are 10 numerical variables and 1 categorical variable. The predictor variables are:

- Distance between airports
- Weather condition
- Time in minutes for loading the baggage
- Total number of flights in the airport
- Total number of support crew available
- Time in minutes for late arriving aircraft
- Time in minutes for aircraft cleaning
- Time in minutes for aircraft fueling
- Time in minutes for security checking
]

---

<h2 align="center"> Exploratory Data Analysis </h2>
<BR>

.pull-left[
<img src="flight_delay_files/figure-html/unnamed-chunk-2-1.png" width="400" height="300" />

- There are 14 different carriers.
- UA has the most flights in the data set (729).
- Carrier type was not included in the analysis.
]

.pull-right[
<img src="flight_delay_files/figure-html/unnamed-chunk-3-1.png" width="400" height="300" />

- The weather condition variable was rated on a scale from 0 to 10, with 0 being mild and 10 being extreme.
- Only two values were reported in the entire data set: 5 and 6.
- The variable was therefore recoded as a binary variable: 0 and 1.
]

---

<h2 align="center"> Exploratory Data Analysis </h2>
<BR>

.pull-left[
<img src="flight_delay_files/figure-html/unnamed-chunk-4-1.png" width="400" height="230" />
<img src="flight_delay_files/figure-html/unnamed-chunk-5-1.png" width="400" height="230" />
]

.pull-right[
<img src="flight_delay_files/figure-html/unnamed-chunk-6-1.png" width="400" height="230" />
<img src="flight_delay_files/figure-html/unnamed-chunk-7-1.png" width="400" height="230" />
]

---

<h2 align="center"> Linear Regression Model </h2>
<BR>

.pull-left[
A linear regression was fitted with 9 predictor variables:

- Distance between airports
- Weather condition
- Time in minutes for loading the baggage
- Total number of flights in the airport
- Total number of support crew available
- Time in minutes for late arriving aircraft
- Time in minutes for aircraft cleaning
- Time in minutes for aircraft fueling
- Time in minutes for security checking

The aim was to predict flight delay in minutes. The regression plots show a few outliers, but no obvious violation of the constant-variance or normality assumptions for the residuals.
]

.pull-right[
<img src="flight_delay_files/figure-html/unnamed-chunk-9-1.png" width="500" height="300" />
]

---

<h2 align="center"> Regression Coefficients of the model </h2>
<BR>
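
As a rough illustration of how regression coefficients of this kind are estimated, the sketch below solves the least-squares normal equations in plain Python. The function name `fit_ols` and the synthetic demo data are assumptions made for illustration; they are not the author's code and not the flight data.

```python
import random


def fit_ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    rows: one list of predictor values per record; y: list of responses.
    Returns [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y.
    """
    # Design matrix with a leading column of ones for the intercept.
    design = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(design[0])

    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(p)]
        + [sum(d[i] * yi for d, yi in zip(design, y))]
        for i in range(p)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    # The matrix is now diagonal, so each coefficient is a simple ratio.
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Synthetic, noise-free demo data (hypothetical, NOT the flight data):
# the response is built as 1 + 2*x1 - 3*x2, so the fit should recover
# those three numbers up to floating-point rounding.
random.seed(0)
demo_rows = [[random.uniform(0, 10), random.uniform(0, 5)] for _ in range(50)]
demo_y = [1.0 + 2.0 * a - 3.0 * b for a, b in demo_rows]
demo_coef = fit_ols(demo_rows, demo_y)
```

In practice a statistics package would normally be used instead, since it also reports standard errors and p-values for each coefficient.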
---

<h2 align="center"> Model Equation and Interpretation </h2>
<BR>

.pull-left[
#### Model Equation

Y = -553.69 + 0.173X1 + 0.0044X2 - 0.0486X3 + 13.49X4 + 6.898X5 + 0.0526X6 - 0.0587X7 + 0.0085X8 + 4.466X9,

where

- Y = Flight delay time in minutes
- X1 = Airport distance
- X2 = Number of flights
- X3 = Support Crew Available
- X4 = Baggage loading time
- X5 = Late arrival of flight time
- X6 = Time for Cleaning of aircraft
- X7 = Time for Fueling of aircraft
- X8 = Time for Security checking
- X9 = Weather condition
]

.pull-right[
#### Model Interpretation

Each effect below holds the other predictors constant.

For every one-unit increase in the distance between airports, we expect the average flight delay time to increase by 0.173 minutes. For every one-unit increase in the number of flights in the airport, we expect it to increase by 0.0044 minutes.

For every one-unit increase in the number of support crew available, we expect the average flight delay time to decrease by 0.0486 minutes.

For every additional minute of baggage loading time, we expect the average flight delay time to increase by 13.49 minutes. For every additional minute of late-arriving aircraft time, we expect it to increase by 6.898 minutes.

For every additional minute of aircraft cleaning time, we expect the average flight delay time to increase by 0.0526 minutes.
]

---

<h2 align="center"> Conclusion </h2>
<BR>

#### Discussion

Flight delay prediction is essential because it gives customers the information they need. Accurate predictions matter because they have a significant influence on customer satisfaction. With the linear regression model, we used several predictor variables to estimate flight delay.

#### Limitations

Using a linear regression model to predict flight delay time has some limitations.
One is that a linear model captures only the linear (straight-line) relationship between the predictors and the response. Another is that it is very sensitive to extreme values, which may affect the overall estimates.

#### Dataset

A copy of this publicly available data is stored at: https://raw.githubusercontent.com/chinwex/datasets/main/Flight_delay-data.csv
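
#### Illustrative Prediction Sketch

The model equation from the Model Equation slide can be evaluated directly. The sketch below copies the reported coefficients into a small function. The dictionary keys, the function name `predict_delay`, and the two example flights are hypothetical and chosen only for illustration; they are not column names from the data set.

```python
# Coefficients copied from the model equation; the key names are
# illustrative labels for X1..X9, not the data set's column names.
INTERCEPT = -553.69
COEFFICIENTS = {
    "distance": 0.173,         # X1: airport distance
    "num_flights": 0.0044,     # X2: number of flights
    "support_crew": -0.0486,   # X3: support crew available
    "baggage_loading": 13.49,  # X4: baggage loading time
    "late_arrival": 6.898,     # X5: late arrival of flight time
    "cleaning": 0.0526,        # X6: time for cleaning of aircraft
    "fueling": -0.0587,        # X7: time for fueling of aircraft
    "security": 0.0085,        # X8: time for security checking
    "weather": 4.466,          # X9: weather condition (recoded 0/1)
}


def predict_delay(**predictors):
    """Return the predicted delay in minutes for one flight."""
    missing = set(COEFFICIENTS) - set(predictors)
    if missing:
        raise ValueError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(
        COEFFICIENTS[name] * predictors[name] for name in COEFFICIENTS
    )


# Two made-up flights that differ only by one minute of baggage loading.
flight_a = dict(distance=500, num_flights=1000, support_crew=50,
                baggage_loading=30, late_arrival=10, cleaning=15,
                fueling=20, security=25, weather=0)
flight_b = dict(flight_a, baggage_loading=31)

# The gap between the two predictions equals the baggage_loading
# coefficient (13.49), up to floating-point rounding.
effect = predict_delay(**flight_b) - predict_delay(**flight_a)
```

Comparing two inputs that differ in a single predictor is the "one-unit increase, others held constant" reading of a regression coefficient.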