Association of Flight Delays

The correlation between delayed departures and delayed arrivals

Onjira Benjarangseepornchai

Last updated: 03 June, 2017

Introduction

Introduction Cont.

Source: https://think-tasmania.com/flight/

Problem Statement

Data

library(readr)   # read_csv()
library(dplyr)   # %>%, filter(), summarise()
Domestic_Airlines <- read_csv("~/Downloads/domestic_time_performance.csv")
Melbourne_Airport <- Domestic_Airlines %>% filter(Departing_Port == "Melbourne")

Data Cont.

Descriptive Statistics

Descriptive Statistics Cont. (2)

Departures Delayed

Melbourne_Airport %>% summarise(Min = min(Departures_Delayed,na.rm = TRUE),
                                           Q1 = quantile(Departures_Delayed,probs = .25,na.rm = TRUE),
                                           Median = median(Departures_Delayed, na.rm = TRUE),
                                           Q3 = quantile(Departures_Delayed,probs = .75,na.rm = TRUE),
                                           Max = max(Departures_Delayed,na.rm = TRUE),
                                           Mean = mean(Departures_Delayed, na.rm = TRUE),
                                           SD = sd(Departures_Delayed, na.rm = TRUE),
                                           n = n(),
                                           Missing = sum(is.na(Departures_Delayed))) -> table1
knitr::kable(table1)
Min  Q1  Median  Q3  Max  Mean      SD        n     Missing
0    6   15      31  337  24.81968  31.51441  6496  13

Descriptive Statistics Cont. (3)

Arrivals Delayed

Melbourne_Airport %>% summarise(Min = min(Arrivals_Delayed,na.rm = TRUE),
                                           Q1 = quantile(Arrivals_Delayed,probs = .25,na.rm = TRUE),
                                           Median = median(Arrivals_Delayed, na.rm = TRUE),
                                           Q3 = quantile(Arrivals_Delayed,probs = .75,na.rm = TRUE),
                                           Max = max(Arrivals_Delayed,na.rm = TRUE),
                                           Mean = mean(Arrivals_Delayed, na.rm = TRUE),
                                           SD = sd(Arrivals_Delayed, na.rm = TRUE),
                                           n = n(),
                                           Missing = sum(is.na(Arrivals_Delayed))) -> table2
knitr::kable(table2)
Min  Q1  Median  Q3  Max  Mean      SD        n     Missing
0    7   16      32  372  27.28226  36.50968  6496  9
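
The two summaries above compute the same nine statistics for different columns. A minimal sketch of a reusable helper, written in base R and run on made-up delay counts; the function name `describe_delay` and the toy vector are illustrative assumptions, not part of the original analysis:

```r
# Hypothetical helper: the same nine statistics as the tables above,
# for any numeric vector that may contain missing values.
describe_delay <- function(x) {
  data.frame(
    Min     = min(x, na.rm = TRUE),
    Q1      = unname(quantile(x, probs = 0.25, na.rm = TRUE)),
    Median  = median(x, na.rm = TRUE),
    Q3      = unname(quantile(x, probs = 0.75, na.rm = TRUE)),
    Max     = max(x, na.rm = TRUE),
    Mean    = mean(x, na.rm = TRUE),
    SD      = sd(x, na.rm = TRUE),
    n       = length(x),          # row count, including missing values
    Missing = sum(is.na(x))
  )
}

# Toy data (made up), not the Melbourne figures.
toy_delays  <- c(0, 6, 15, 31, NA, 40)
toy_summary <- describe_delay(toy_delays)
print(toy_summary)
```

With the real data, the helper could be called as `describe_delay(Melbourne_Airport$Departures_Delayed)` and again for `Arrivals_Delayed`.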

Linear Regression

plot(log(Departures_Delayed) ~ log(Arrivals_Delayed), data = Melbourne_Airport)

- The plot above shows a strong positive correlation between the log of departures delayed and the log of arrivals delayed.
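
Because the minimum in both descriptive tables is 0, `log()` returns `-Inf` for some rows. The sketch below, on made-up values, shows that behaviour and one possible way to restrict the data to strictly positive counts before taking logs. Dropping the zeros is an assumption made for illustration, not the author's stated method.

```r
# Toy data frame (made up) reusing the column names from the slides.
toy <- data.frame(
  Departures_Delayed = c(0, 5, 12, 30, 80),
  Arrivals_Delayed   = c(0, 6, 14, 33, 85)
)

# log(0) is -Inf in R, so zero-delay rows are not finite on the log scale.
log_dep <- log(toy$Departures_Delayed)
print(log_dep)

# Keep only rows where both counts are strictly positive.
positive <- subset(toy, Departures_Delayed > 0 & Arrivals_Delayed > 0)
print(nrow(positive))
```

The filtered frame can then be passed to the same `plot()` call through `data = positive`.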

Correlation

\[r = \frac{L_{xy}}{\sqrt{L_{xx}L_{yy}}}\]

cor(Melbourne_Airport$Departures_Delayed,Melbourne_Airport$Arrivals_Delayed, use="complete.obs")
## [1] 0.9575857
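
As a check on the definition, Pearson's r can be computed by hand: the sum of cross-products about the means, divided by the square root of the product of the two sums of squares. A minimal sketch on made-up values; the toy vectors `x` and `y` are illustrative assumptions:

```r
# Toy paired observations (made up), not the Melbourne data.
x <- c(2, 5, 9, 14, 20)
y <- c(3, 6, 8, 15, 22)

# Sums of squares and cross-products about the means.
L_xy <- sum((x - mean(x)) * (y - mean(y)))
L_xx <- sum((x - mean(x))^2)
L_yy <- sum((y - mean(y))^2)

# Pearson's r: cross-product sum over sqrt(L_xx * L_yy).
r_manual <- L_xy / sqrt(L_xx * L_yy)

# The hand calculation agrees with the built-in cor().
print(all.equal(r_manual, cor(x, y)))
```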

Correlation Cont.

library(Hmisc)
bivariate<-as.matrix(dplyr::select(Melbourne_Airport, Departures_Delayed,Arrivals_Delayed))
rcorr(bivariate, type = "pearson")
##                    Departures_Delayed Arrivals_Delayed
## Departures_Delayed               1.00             0.96
## Arrivals_Delayed                 0.96             1.00
## 
## n
##                    Departures_Delayed Arrivals_Delayed
## Departures_Delayed               6483             6483
## Arrivals_Delayed                 6483             6487
## 
## P
##                    Departures_Delayed Arrivals_Delayed
## Departures_Delayed                     0              
## Arrivals_Delayed    0
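
The n matrix reports 6483 observations for the pair, fewer than the 6496 rows, because a row contributes to the correlation only when both values are present. A short sketch with made-up values showing how that count of complete pairs can be obtained in base R:

```r
# Toy data (made up) with missing values in either column.
toy <- data.frame(
  Departures_Delayed = c(4, NA, 10, 7, NA),
  Arrivals_Delayed   = c(5, 3, NA, 8, NA)
)

# complete.cases() is TRUE only for rows with no missing value.
n_pairs <- sum(complete.cases(toy))
print(n_pairs)
```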

Hypothesis Testing

\[H_{0}:\rho = 0 \] \[H_{A}:\rho \neq 0 \]

Hypothesis Testing Cont.

# Two-sided p-value for t = r*sqrt(n - 2)/sqrt(1 - r^2), here with r = 0.96 and n = 6496
2*pt(q = 276.293, df = 6496 - 2, lower.tail=FALSE)
## [1] 0
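
The statistic for testing zero correlation is t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The sketch below recomputes it: plugging in the rounded r = 0.96 and n = 6496 gives a value that rounds to the 276.293 used above. Using the 6483 complete pairs reported by `rcorr` would arguably be the more appropriate n; that is an editorial assumption, and the conclusion does not change.

```r
# Inputs taken from the slides: rounded correlation and total row count.
r <- 0.96
n <- 6496

# t statistic for H0: no linear correlation, with n - 2 degrees of freedom.
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the upper tail of the t distribution.
p_value <- 2 * pt(q = abs(t_stat), df = n - 2, lower.tail = FALSE)

print(round(t_stat, 3))
print(p_value)
```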

Confidence Interval

\[z=\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)\]

0.5*(log((1+.96)/(1-.96)))
## [1] 1.94591

Confidence Interval Cont.

library(psychometric)
r=cor(Melbourne_Airport$Departures_Delayed,Melbourne_Airport$Arrivals_Delayed,use="complete.obs")
CIr(r = r, n = 6496, level = .95)
## [1] 0.9555183 0.9595589
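
The interval can also be built by hand: transform r to Fisher's z, add and subtract a normal quantile times the approximate standard error 1/sqrt(n - 3), and transform back with `tanh`. A sketch using the r printed by `cor()` earlier and the same n = 6496 passed to `CIr`; the variable names are illustrative:

```r
r <- 0.9575857   # correlation printed by cor() on the slides
n <- 6496        # same n as passed to CIr()

z    <- atanh(r)          # Fisher z, equal to 0.5 * log((1 + r) / (1 - r))
se_z <- 1 / sqrt(n - 3)   # approximate standard error of z
crit <- qnorm(0.975)      # two-sided 95% normal quantile

# Interval on the z scale, back-transformed to the r scale.
ci <- tanh(z + c(-1, 1) * crit * se_z)
print(ci)
```

The result should land very close to the interval reported by `CIr`.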

Interpretation

Discussion

References