MATH2349 Semester 2, 2019

Required packages

library(readr)
library(tidyr)
library(dplyr)
library(Hmisc)
library(outliers)
library(forecast)

Executive Summary

The task at hand was to acquire two sets of data, pre-process and merge them, and bring the result to a state where data transformation could take place.

Both data sets were obtained from the Australian Federal Government’s data portal (http://data.gov.au) and are provided under a Creative Commons licence. Both were originally sourced from the Bureau of Infrastructure, Transport and Regional Economics (BITRE). BITRE’s website (https://www.bitre.gov.au) describes the Bureau as:

“The Bureau of Infrastructure, Transport and Regional Economics (BITRE) provides economic analysis, research and statistics on infrastructure, transport and regional development issues to inform both Australian Government policy development and wider community understanding.”

The first dataset, Domestic Airlines - On Time Performance, has statistics on the punctuality of airlines in Australia for domestic routes. The dataset is described on the data portal as:

“Covers monthly punctuality and reliability data of major domestic and regional airlines operating between Australian airports. Details are published for individual airlines on competitive routes and for airports on those routes.”

It was evident from the outset that the two sets of data could not be merged ‘as is’. Each data set first had to be pre-processed individually before merging.

Once the data was merged, missing values were removed and new variables were created, such as average passengers per flight.

Finally, in an effort to achieve a symmetrical distribution, several data transformation methods were applied to the data. It was shown that the \(ln\) transformation provided the best results for this particular data set.

Data

passengers <- read.csv('https://data.gov.au/data/dataset/cc5d888f-5850-47f3-815d-08289b22f5a8/resource/38bdc971-cb22-4894-b19a-814afc4e8164/download/mon_pax_web.csv',stringsAsFactors = FALSE)
performance <- read.csv('https://data.gov.au/data/dataset/29128ebd-dbaa-4ff5-8b86-d9f30de56452/resource/cf663ed1-0c5e-497f-aea9-e74bfda9cf44/download/otp_time_series_web.csv',stringsAsFactors = FALSE)

str(passengers)

## 'data.frame':    8694 obs. of  12 variables:
##  $ AIRPORT      : chr  "ADELAIDE" "ALICE SPRINGS" "All Australian Airports" "BALLINA" ...
##  $ Year         : int  1985 1985 1985 1985 1985 1985 1985 1985 1985 1985 ...
##  $ Month        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Dom_Pax_In   : int  81661 19238 1192395 0 120950 18840 33809 17450 35352 2106 ...
##  $ Dom_Pax_Out  : int  81630 17887 1192395 0 120776 19865 30739 14328 44203 2398 ...
##  $ Dom_Pax_Total: int  163291 37125 2384790 0 241726 38705 64548 31778 79555 4504 ...
##  $ Int_Pax_In   : int  5806 0 263795 0 25867 1683 0 2942 0 0 ...
##  $ Int_Pax_Out  : int  4733 0 208770 0 19178 1329 0 1837 0 0 ...
##  $ Int_Pax_Total: int  10539 0 472565 0 45045 3012 0 4779 0 0 ...
##  $ Pax_In       : int  87467 19238 1456190 0 146817 20523 33809 20392 35352 2106 ...
##  $ Pax_Out      : int  86363 17887 1401165 0 139954 21194 30739 16165 44203 2398 ...
##  $ Pax_Total    : int  173830 37125 2857355 0 286771 41717 64548 36557 79555 4504 ...

head(passengers)

str(performance)

## 'data.frame':    81677 obs. of  14 variables:
##  $ Route             : chr  "Adelaide-Brisbane" "Adelaide-Canberra" "Adelaide-Gold Coast" "Adelaide-Melbourne" ...
##  $ Departing_Port    : chr  "Adelaide" "Adelaide" "Adelaide" "Adelaide" ...
##  $ Arriving_Port     : chr  "Brisbane" "Canberra" "Gold Coast" "Melbourne" ...
##  $ Airline           : chr  "All Airlines" "All Airlines" "All Airlines" "All Airlines" ...
##  $ Month             : int  37987 37987 37987 37987 37987 37987 37987 37987 37987 37987 ...
##  $ Sectors_Scheduled : num  155 75 40 550 191 ...
##  $ Sectors_Flown     : int  155 75 40 548 191 485 168 63 31685 155 ...
##  $ Cancellations     : num  0 0 0 2 0 1 0 0 228 0 ...
##  $ Departures_On_Time: num  123 72 36 478 169 ...
##  $ Arrivals_On_Time  : num  120 72 35 487 168 ...
##  $ Departures_Delayed: num  32 3 4 70 22 ...
##  $ Arrivals_Delayed  : num  35 3 5 61 23 20 22 0 4210 15 ...
##  $ Year              : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
##  $ Month_Num         : int  1 1 1 1 1 1 1 1 1 1 ...

head(performance)

After reading the data into R we see that there are 14 fields present:

Route – The route, a combination of the Departing_Port and Arriving_Port fields, stored as chr.
Departing_Port – The airport flights departed from, stored as chr.
Arriving_Port – The airport flights arrived at, stored as chr.
Airline – The individual airline, stored as chr.
Month – Numeric value of the first day of the reported month/year, stored as integer.
Sectors_Scheduled – There was no information on this field on either the Government’s data portal or BITRE’s website. An assumption made is that each flight route is broken up into several sectors. In any case this data was not going to be used in the final analysis. Stored as numeric.
Sectors_Flown – As above, stored as integer.
Cancellations – Number of flights cancelled, stored as numeric.
Departures_On_Time – Number of flights that departed on time, stored as numeric.
Arrivals_On_Time – Number of flights that arrived on time, stored as numeric.
Departures_Delayed – Number of flights whose departure was delayed, stored as numeric.
Arrivals_Delayed – Number of flights whose arrival was delayed, stored as numeric.
Year – Year of the data, stored as integer.
Month_Num – Month of the data, stored as integer.

A note on the above, BITRE describe on-time as follows:

“A flight arrival is counted as “on time” if it arrived at the gate before 15 minutes after the scheduled arrival time shown in the carriers’ schedule. Neither diverted nor cancelled flights count as on time. Similarly, a flight departure is counted as “on time” if it departs the gate before 15 minutes after the scheduled departure time shown in the carriers’ schedule.”
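The first five rows shown in the str(performance) output can be used for a quick consistency check of these counts. The sketch below copies those values by hand into a small data frame (the names sample_rows and on_time_rate are illustrative and not part of the dataset). It checks whether on-time plus delayed departures account for every sector flown, and whether scheduled sectors minus cancellations give the sectors flown. Agreement on these rows would be consistent with a sector being a single flight, although five rows are not proof.

```r
# Values copied from the first five rows of the str(performance) output
sample_rows <- data.frame(
  Sectors_Scheduled  = c(155, 75, 40, 550, 191),
  Sectors_Flown      = c(155, 75, 40, 548, 191),
  Cancellations      = c(0, 0, 0, 2, 0),
  Departures_On_Time = c(123, 72, 36, 478, 169),
  Departures_Delayed = c(32, 3, 4, 70, 22)
)

# Do on-time and delayed departures account for every sector flown?
departures_match <- with(sample_rows,
                         Departures_On_Time + Departures_Delayed == Sectors_Flown)

# Do scheduled sectors minus cancellations give the sectors flown?
cancellations_match <- with(sample_rows,
                            Sectors_Scheduled - Cancellations == Sectors_Flown)

# Share of flown sectors that departed on time (illustrative column name)
sample_rows$on_time_rate <- with(sample_rows, Departures_On_Time / Sectors_Flown)

print(all(departures_match))
print(all(cancellations_match))
print(round(sample_rows$on_time_rate, 3))
```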

The second dataset, Airport Passenger Movements by Month - 20 major airports, looks at passenger movements through 21 Australian airports. Data is broken down by airport and month, with separate counts for domestic and international passengers.

After reading the data into R we see that there are 12 fields present:

AIRPORT – Name of the airport. There are 21 unique airports listed, as well as the category ‘All Australian Airports’ that sums the values for all the other airports. Stored as chr.
Year – Year of the data, stored as integer.
Month – Month of the data, stored as integer.
Dom_Pax_In – Domestic passengers arriving, stored as integer.
Dom_Pax_Out – Domestic passengers departing, stored as integer.
Dom_Pax_Total – Total domestic passengers, stored as integer.
Int_Pax_In – International passengers arriving, stored as integer.
Int_Pax_Out – International passengers departing, stored as integer.
Int_Pax_Total – Total international passengers, stored as integer.
Pax_In – Total passengers arriving, stored as integer.
Pax_Out – Total passengers departing, stored as integer.
Pax_Total – Total passengers arriving and departing, stored as integer.

To get the data into a state where it could be merged, additional pre-processing was required. For example, airport names in one dataset were all in upper case while in the other they were not.
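To illustrate why the case difference matters, here is a minimal sketch on made-up values (the data frames pax_toy and otp_toy are hypothetical stand-ins for the two datasets). A join on the raw names finds no matches, whereas lower-casing both keys first lets every row match.

```r
library(dplyr)

# Toy stand-ins for the two datasets (values are made up)
pax_toy <- data.frame(AIRPORT = c("ADELAIDE", "BRISBANE"), pax = c(100, 200),
                      stringsAsFactors = FALSE)
otp_toy <- data.frame(Airport = c("Adelaide", "Brisbane"), flights = c(10, 20),
                      stringsAsFactors = FALSE)

# Joining on the raw names finds no matches because the case differs
raw_join <- inner_join(pax_toy, otp_toy, by = c("AIRPORT" = "Airport"))

# Lower-casing both keys first lets every row match
pax_toy$airport <- tolower(pax_toy$AIRPORT)
otp_toy$airport <- tolower(otp_toy$Airport)
clean_join <- inner_join(pax_toy, otp_toy, by = "airport")

print(nrow(raw_join))
print(nrow(clean_join))
```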

Pre processing the individual data sets
Punctuality data

Punctuality data is listed by route. Whilst we won’t be looking at individual routes, the route fields allow the data to be split into arrivals and departures by airport.

First the punctuality data is split into two data frames, departures and arrivals, with the relevant columns extracted for each. The data is filtered to ‘All Airlines’, as this exercise will not be looking at individual airlines and, in any case, the passenger data is not provided at an airline level:

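# Keep only the all-airline totals, then pick the columns for each direction
all_airlines <- performance %>% filter(Airline == 'All Airlines')
departures <- all_airlines %>% select(Departing_Port, Month, Departures_On_Time, Departures_Delayed, Year, Month_Num)
arrivals <- all_airlines %>% select(Arriving_Port, Month, Arrivals_On_Time, Arrivals_Delayed, Year, Month_Num)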

str(departures)

## 'data.frame':    21268 obs. of  6 variables:
##  $ Departing_Port    : chr  "Adelaide" "Adelaide" "Adelaide" "Adelaide" ...
##  $ Month             : int  37987 37987 37987 37987 37987 37987 37987 37987 37987 37987 ...
##  $ Departures_On_Time: num  123 72 36 478 169 ...
##  $ Departures_Delayed: num  32 3 4 70 22 ...
##  $ Year              : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
##  $ Month_Num         : int  1 1 1 1 1 1 1 1 1 1 ...

head(departures)

str(arrivals)

## 'data.frame':    21268 obs. of  6 variables:
##  $ Arriving_Port   : chr  "Brisbane" "Canberra" "Gold Coast" "Melbourne" ...
##  $ Month           : int  37987 37987 37987 37987 37987 37987 37987 37987 37987 37987 ...
##  $ Arrivals_On_Time: num  120 72 35 487 168 ...
##  $ Arrivals_Delayed: num  35 3 5 61 23 20 22 0 4210 15 ...
##  $ Year            : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
##  $ Month_Num       : int  1 1 1 1 1 1 1 1 1 1 ...

head(arrivals)

The data then needs to be summarised, and redundant columns removed:

# Sum the route-level counts to one row per airport and month
departures <- departures %>%
  group_by(Departing_Port, Month, Year, Month_Num) %>%
  summarise(Departures_On_Time = as.integer(sum(Departures_On_Time, na.rm = TRUE)),
            Departures_Delayed = as.integer(sum(Departures_Delayed, na.rm = TRUE))) %>%
  ungroup() %>% as.data.frame()

arrivals <- arrivals %>%
  group_by(Arriving_Port, Month, Year, Month_Num) %>%
  summarise(Arrivals_On_Time = as.integer(sum(Arrivals_On_Time, na.rm = TRUE)),
            Arrivals_Delayed = as.integer(sum(Arrivals_Delayed, na.rm = TRUE))) %>%
  ungroup() %>% as.data.frame()

str(departures)

## 'data.frame':    6670 obs. of  6 variables:
##  $ Departing_Port    : chr  "Adelaide" "Adelaide" "Adelaide" "Adelaide" ...
##  $ Month             : int  37987 38018 38047 38078 38108 38139 38169 38200 38231 38261 ...
##  $ Year              : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
##  $ Month_Num         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Departures_On_Time: int  1334 1320 1416 1319 1375 1242 1309 1335 1299 1318 ...
##  $ Departures_Delayed: int  160 69 75 109 103 177 179 140 193 188 ...

head(departures)

str(arrivals)

## 'data.frame':    6671 obs. of  6 variables:
##  $ Arriving_Port   : chr  "Adelaide" "Adelaide" "Adelaide" "Adelaide" ...
##  $ Month           : int  37987 38018 38047 38078 38108 38139 38169 38200 38231 38261 ...
##  $ Year            : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
##  $ Month_Num       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Arrivals_On_Time: int  1354 1311 1423 1328 1373 1242 1264 1299 1293 1310 ...
##  $ Arrivals_Delayed: int  140 77 72 107 102 177 222 177 200 197 ...

head(arrivals)

The airport column in both subsets is renamed to ‘Airport’ so that the two can be joined. After the join, column names are amended and airport names converted to lowercase to allow a join to the passengers dataset:

colnames(departures)[1] <- 'Airport'
colnames(arrivals)[1] <- 'Airport'

performance <- inner_join(departures,arrivals,by=c('Airport','Month','Year','Month_Num')) #join the dataframes

colnames(performance)[2] <- 'Month.date'
colnames(performance)[4] <- 'Month'

performance$Airport <- performance$Airport %>% tolower()
colnames(performance) <- colnames(performance) %>% tolower()

str(performance)

## 'data.frame':    6670 obs. of  8 variables:
##  $ airport           : chr  "adelaide" "adelaide" "adelaide" "adelaide" ...
##  $ month.date        : int  37987 38018 38047 38078 38108 38139 38169 38200 38231 38261 ...
##  $ year              : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
##  $ month             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ departures_on_time: int  1334 1320 1416 1319 1375 1242 1309 1335 1299 1318 ...
##  $ departures_delayed: int  160 69 75 109 103 177 179 140 193 188 ...
##  $ arrivals_on_time  : int  1354 1311 1423 1328 1373 1242 1264 1299 1293 1310 ...
##  $ arrivals_delayed  : int  140 77 72 107 102 177 222 177 200 197 ...

head(performance)

Passenger data

Less work was required to prepare the passenger data for a merge with the punctuality data. Airport names were converted to lower case, as were column names.

passengers$AIRPORT <- passengers$AIRPORT %>% tolower()
colnames(passengers) <- colnames(passengers) %>% tolower()

The datasets were then merged on ‘airport’, ‘month’ and ‘year’:

mergedData <- left_join(passengers,performance,by=c('airport','month','year')) #join the dataframes
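Because a left join keeps every row of the left-hand table, rows without a partner end up with NA in the joined columns. A minimal sketch on made-up data (the data frames left_toy and right_toy are hypothetical) shows this, and uses dplyr's anti_join() to list the rows that found no match:

```r
library(dplyr)

# Toy data (made up): the left table has one airport/month with no match
left_toy  <- data.frame(airport = c("adelaide", "adelaide", "ballina"),
                        year = c(2004, 2004, 2004), month = c(1, 2, 1),
                        stringsAsFactors = FALSE)
right_toy <- data.frame(airport = c("adelaide", "adelaide"),
                        year = c(2004, 2004), month = c(1, 2),
                        on_time = c(1334, 1320), stringsAsFactors = FALSE)

keys <- c("airport", "month", "year")

# left_join keeps every left row; unmatched rows get NA in the new columns
joined <- left_join(left_toy, right_toy, by = keys)

# anti_join returns only the left rows that found no partner
unmatched <- anti_join(left_toy, right_toy, by = keys)

print(sum(is.na(joined$on_time)))
print(unmatched$airport)
```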

Understand

The merged dataset has the following fields:

airport
year
month
dom_pax_in
dom_pax_out
dom_pax_total
int_pax_in
int_pax_out
int_pax_total
pax_in
pax_out
pax_total
month.date
departures_on_time
departures_delayed
arrivals_on_time
arrivals_delayed

More detailed descriptions are provided in the previous section.

str(mergedData)

## 'data.frame':    8694 obs. of  17 variables:
##  $ airport           : chr  "adelaide" "alice springs" "all australian airports" "ballina" ...
##  $ year              : int  1985 1985 1985 1985 1985 1985 1985 1985 1985 1985 ...
##  $ month             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ dom_pax_in        : int  81661 19238 1192395 0 120950 18840 33809 17450 35352 2106 ...
##  $ dom_pax_out       : int  81630 17887 1192395 0 120776 19865 30739 14328 44203 2398 ...
##  $ dom_pax_total     : int  163291 37125 2384790 0 241726 38705 64548 31778 79555 4504 ...
##  $ int_pax_in        : int  5806 0 263795 0 25867 1683 0 2942 0 0 ...
##  $ int_pax_out       : int  4733 0 208770 0 19178 1329 0 1837 0 0 ...
##  $ int_pax_total     : int  10539 0 472565 0 45045 3012 0 4779 0 0 ...
##  $ pax_in            : int  87467 19238 1456190 0 146817 20523 33809 20392 35352 2106 ...
##  $ pax_out           : int  86363 17887 1401165 0 139954 21194 30739 16165 44203 2398 ...
##  $ pax_total         : int  173830 37125 2857355 0 286771 41717 64548 36557 79555 4504 ...
##  $ month.date        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ departures_on_time: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ departures_delayed: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ arrivals_on_time  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ arrivals_delayed  : int  NA NA NA NA NA NA NA NA NA NA ...

head(mergedData)

As the two datasets cover different time periods, and only 21 airports are listed in the passenger data, there are many missing values that need to be removed. This can be done using a combination of the is.na() and filter() functions.

stats <- data.frame(dataset = c('passengers','performance'),
                    mindate = c(min(passengers$year),min(performance$year)),
                    maxdate = c(max(passengers$year),max(performance$year)),
                    airports = c(length(unique(passengers$airport)),length(unique(performance$airport))))
stats

mergedData <- mergedData %>% filter(!is.na(month.date))
str(mergedData)

## 'data.frame':    3584 obs. of  17 variables:
##  $ airport           : chr  "adelaide" "alice springs" "brisbane" "cairns" ...
##  $ year              : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
##  $ month             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ dom_pax_in        : int  197163 22856 451144 97476 77334 36637 117455 55054 35898 17748 ...
##  $ dom_pax_out       : int  197850 22426 464032 105594 74318 32208 123164 61581 39235 17951 ...
##  $ dom_pax_total     : int  395013 45282 915176 203070 151652 68845 240619 116635 75133 35699 ...
##  $ int_pax_in        : int  13508 0 145938 36512 0 4087 5066 0 0 0 ...
##  $ int_pax_out       : int  11289 0 133521 33924 0 2935 4704 0 0 0 ...
##  $ int_pax_total     : int  24797 0 279459 70436 0 7022 9770 0 0 0 ...
##  $ pax_in            : int  210671 22856 597082 133988 77334 40724 122521 55054 35898 17748 ...
##  $ pax_out           : int  209139 22426 597553 139518 74318 35143 127868 61581 39235 17951 ...
##  $ pax_total         : int  419810 45282 1194635 273506 151652 75867 250389 116635 75133 35699 ...
##  $ month.date        : int  37987 37987 37987 37987 37987 37987 37987 37987 37987 37987 ...
##  $ departures_on_time: int  1334 61 2813 561 1093 112 627 340 216 161 ...
##  $ departures_delayed: int  160 2 531 59 91 18 139 64 39 13 ...
##  $ arrivals_on_time  : int  1354 62 2857 566 1070 105 628 336 214 154 ...
##  $ arrivals_delayed  : int  140 1 452 56 118 26 142 65 41 20 ...

head(mergedData)

statsMerged <- data.frame(dataset = c('merged'),
                    mindate = c(min(mergedData$year)),
                    maxdate = c(max(mergedData$year)),
                    airports = c(length(unique(mergedData$airport))))
statsMerged
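A quick way to confirm that a filter of this kind has left no gaps is to count the missing values per column before and after. The sketch below uses a made-up data frame (df_toy is hypothetical) rather than the merged data:

```r
# Toy data frame (made up) with a missing value in one column
df_toy <- data.frame(airport = c("a", "b", "c"),
                     month.date = c(37987, NA, 38018),
                     pax = c(10, 20, 30))

# Count missing values in each column before filtering
na_before <- colSums(is.na(df_toy))

# Keep only rows where month.date is present
df_clean <- df_toy[!is.na(df_toy$month.date), ]

# After filtering, no column should contain missing values
na_after <- colSums(is.na(df_clean))

print(na_before)
print(na_after)
```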

There are also columns that need to be removed as they do not apply to this investigation. Data on international passengers is not relevant, as we are looking at domestic travel, and the overall passenger totals can also be removed as they include international travel.

mergedData <- mergedData[-c(7:12)]
str(mergedData)

## 'data.frame':    3584 obs. of  11 variables:
##  $ airport           : chr  "adelaide" "alice springs" "brisbane" "cairns" ...
##  $ year              : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
##  $ month             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ dom_pax_in        : int  197163 22856 451144 97476 77334 36637 117455 55054 35898 17748 ...
##  $ dom_pax_out       : int  197850 22426 464032 105594 74318 32208 123164 61581 39235 17951 ...
##  $ dom_pax_total     : int  395013 45282 915176 203070 151652 68845 240619 116635 75133 35699 ...
##  $ month.date        : int  37987 37987 37987 37987 37987 37987 37987 37987 37987 37987 ...
##  $ departures_on_time: int  1334 61 2813 561 1093 112 627 340 216 161 ...
##  $ departures_delayed: int  160 2 531 59 91 18 139 64 39 13 ...
##  $ arrivals_on_time  : int  1354 62 2857 566 1070 105 628 336 214 154 ...
##  $ arrivals_delayed  : int  140 1 452 56 118 26 142 65 41 20 ...

The data frame is starting to take shape, but there are still a few things that need to be done. There are month and year fields as well as the month.date field, which is stored as a numerical value. Using as.Date(), the month.date field can be converted to a date variable and the separate year and month fields removed.

mergedData$month.date <- mergedData$month.date %>% as.Date(origin = "1899-12-30")
mergedData <- mergedData[-c(2:3)]
str(mergedData)

## 'data.frame':    3584 obs. of  9 variables:
##  $ airport           : chr  "adelaide" "alice springs" "brisbane" "cairns" ...
##  $ dom_pax_in        : int  197163 22856 451144 97476 77334 36637 117455 55054 35898 17748 ...
##  $ dom_pax_out       : int  197850 22426 464032 105594 74318 32208 123164 61581 39235 17951 ...
##  $ dom_pax_total     : int  395013 45282 915176 203070 151652 68845 240619 116635 75133 35699 ...
##  $ month.date        : Date, format: "2004-01-01" "2004-01-01" ...
##  $ departures_on_time: int  1334 61 2813 561 1093 112 627 340 216 161 ...
##  $ departures_delayed: int  160 2 531 59 91 18 139 64 39 13 ...
##  $ arrivals_on_time  : int  1354 62 2857 566 1070 105 628 336 214 154 ...
##  $ arrivals_delayed  : int  140 1 452 56 118 26 142 65 41 20 ...

head(mergedData)
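The origin of "1899-12-30" treats the numeric values as Excel-style serial dates. That is an inference from the values themselves rather than something stated by BITRE. A small sketch using the first three serial values from the data shows the conversion, and how the year and month can be recovered from the resulting Date if they are needed later:

```r
# First three Month values from the str() output; the origin assumes
# Excel-style serial dates, which is an inference, not documented by BITRE
serial <- c(37987, 38018, 38047)
dates  <- as.Date(serial, origin = "1899-12-30")

# Year and month can be recovered from the Date if needed later
years  <- as.integer(format(dates, "%Y"))
months <- as.integer(format(dates, "%m"))

print(dates)
print(years)
print(months)
```

The recovered months agree with the Month_Num values 1, 2, 3 shown for the same rows.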

One last thing, order the columns in a logical manner:

mergedData <- mergedData  %>% select(month.date,
                                     airport,
                                     dom_pax_in,
                                     dom_pax_out,
                                     dom_pax_total,
                                     arrivals_on_time,
                                     arrivals_delayed,
                                     departures_on_time,
                                     departures_delayed)
head(mergedData)

Tidy & Manipulate Data I

Because the combined data deals with multiple variables, presenting it in a ‘wide’ format meets tidy requirements. The removal of the ‘dom_pax_total’ field (total passengers), which is simply the sum of dom_pax_in and dom_pax_out, is the only action required to make the data conform with tidy requirements.

mergedData <- mergedData[-c(5)]
head(mergedData)

Tidy & Manipulate Data II

With the data provided, and with the later analysis in mind, we can calculate the average number of passengers per flight, both arriving and departing, for each airport each month. Using mutate() we can create two new fields, ‘averagePassengersDeparting’ and ‘averagePassengersArriving’.

\(\text{averagePassengersDeparting} = \frac{\text{departing passengers}}{\text{on-time departures} + \text{delayed departures}}\)

\(\text{averagePassengersArriving} = \frac{\text{arriving passengers}}{\text{on-time arrivals} + \text{delayed arrivals}}\)

mergedData <- mergedData %>% mutate(averagePassengersDeparting = dom_pax_out/(departures_on_time+departures_delayed),
                                    averagePassengersArriving = dom_pax_in/(arrivals_on_time+arrivals_delayed))

head(mergedData)
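As a worked example of the two formulas, the sketch below plugs in the Adelaide, January 2004 values copied from the str(mergedData) output. The variable names avg_departing and avg_arriving are illustrative only.

```r
# Adelaide, January 2004, values copied from the str(mergedData) output
dom_pax_in         <- 197163
dom_pax_out        <- 197850
departures_on_time <- 1334
departures_delayed <- 160
arrivals_on_time   <- 1354
arrivals_delayed   <- 140

# Departing passengers divided by all departures (on time plus delayed)
avg_departing <- dom_pax_out / (departures_on_time + departures_delayed)

# Arriving passengers divided by all arrivals (on time plus delayed)
avg_arriving  <- dom_pax_in / (arrivals_on_time + arrivals_delayed)

print(round(avg_departing, 2))   # about 132.43
print(round(avg_arriving, 2))    # about 131.97
```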

Scan I

Missing values were handled above with the removal of rows containing NA values.
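One case an NA filter does not cover: if an airport recorded zero flights in a month, a per-flight average would come out as Inf or NaN rather than NA. Whether that occurs in this dataset is not established here. The sketch below, on made-up values (pax_toy and flights_toy are hypothetical), shows how it could be checked with is.finite():

```r
# Made-up values: the second row has no flights,
# the third has neither flights nor passengers
pax_toy     <- c(1200, 50, 0)
flights_toy <- c(10, 0, 0)

avg <- pax_toy / flights_toy   # 120, Inf, NaN

# is.finite() is FALSE for Inf, -Inf, NaN and NA
bad <- !is.finite(avg)

# Replace non-finite results with NA so they can be handled like other gaps
avg_clean <- ifelse(bad, NA, avg)

print(sum(bad))
print(avg_clean)
```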

Scan II

Using a box plot to detect outliers could be problematic: smaller aircraft fly to regional airports, so these results might be picked up as outliers. Plotting averagePassengersDeparting against averagePassengersArriving and observing the slope gives a clearer indication of whether outliers are present.

plot(averagePassengersDeparting ~ averagePassengersArriving,main="Average Passengers Departing v Average Passengers Arriving",
     data = mergedData, xlab = "Average Passengers Arriving", ylab = "Average Passengers Departing")

As we can see, the slope is uniform all the way through, which suggests that no obvious outliers are present.
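The concern about regional airports suggests another option: comparing each observation with its own airport rather than with the whole dataset. This is a minimal sketch of that alternative, not the method used in this report. It runs on made-up numbers, uses a hypothetical cut-off of 2 standard deviations, and computes z-scores within each airport using base R's ave():

```r
# Made-up data: a large and a small airport, with one unusual month
# at the small one
toy <- data.frame(
  airport = rep(c("big", "small"), each = 6),
  avg_pax = c(130, 132, 128, 131, 129, 130,
              30, 31, 29, 30, 60, 30)
)

# z-score of each value relative to its own airport's mean and sd
toy$z <- ave(toy$avg_pax, toy$airport,
             FUN = function(x) (x - mean(x)) / sd(x))

# Flag values more than 2 standard deviations from the airport mean
# (the threshold of 2 is an arbitrary choice for illustration)
toy$flag <- abs(toy$z) > 2

print(toy[toy$flag, ])
```

Grouping first means the small airport's typical values are not flagged merely for being far below the large airport's.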

Transform

There are several data transformation options that can be applied to the data set to achieve a symmetrical distribution.

The variable that I have chosen to apply data transformation to is the newly created Average Passengers Departing variable, which in its unchanged form is distributed as follows:

transformData <- data.frame(averagePassengersDeparting = mergedData$averagePassengersDeparting)
hist(transformData$averagePassengersDeparting,main = 'Distribution - Average Passengers Departed')

As we can see the distribution is skewed heavily to the right. The following transformation methods will be applied to the data in an effort to produce a symmetrical distribution:

\(log_{10}\) transformation
\(ln\) transformation
Square root transformation
Square transformation
Reciprocal transformation
Box-Cox transformation

Although this may be overkill, it will be interesting to see the results side by side.
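For reference, the Box-Cox family contains the log transformation as a special case. The sketch below is a hand-written version of the standard formula for positive data. The function name box_cox_manual is hypothetical, and it is not the forecast package's implementation, which can additionally choose lambda from the data.

```r
# Standard Box-Cox transform for positive x:
#   (x^lambda - 1) / lambda   when lambda != 0
#   log(x)                    when lambda == 0
box_cox_manual <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

x <- c(1, 10, 100)

# lambda = 1 only shifts the data, lambda = 0.5 behaves like a
# rescaled square root, and lambda = 0 is the natural log
print(box_cox_manual(x, 1))
print(box_cox_manual(x, 0.5))
print(box_cox_manual(x, 0))
```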

transformData$logPassengersDeparting <- log10(transformData$averagePassengersDeparting)
transformData$lnPassengersDeparting <- log(transformData$averagePassengersDeparting)
transformData$sqrtPassengersDeparting <- sqrt(transformData$averagePassengersDeparting)
transformData$x2PassengersDeparting <- (transformData$averagePassengersDeparting)^2
transformData$recipPassengersDeparting <- 1/(transformData$averagePassengersDeparting)
transformData$boxCoxPassengersDeparting <- BoxCox(transformData$averagePassengersDeparting,lambda = "auto")

par(mfrow=c(2,3))
hist(transformData$logPassengersDeparting,main = 'log10 Transformation')
hist(transformData$lnPassengersDeparting,main = 'ln Transformation')
hist(transformData$sqrtPassengersDeparting,main = 'Square Root Transformation')
hist(transformData$x2PassengersDeparting,main = 'Square Transformation')
hist(transformData$recipPassengersDeparting,main = 'Reciprocal Transformation')
hist(transformData$boxCoxPassengersDeparting,main = 'Box-Cox Transformation')

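Both the \(log_{10}\) and \(ln\) transformations provided the best results, with the \(ln\) histogram appearing slightly more symmetrical. Since the two differ only by a constant scale factor, that visual difference is due to histogram binning rather than the shape of the data.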
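Histograms depend on the choice of bins, so a numeric skewness value can make the comparison less subjective. The sketch below defines a simple moment-based skewness function (the name skewness_moment is hypothetical) and applies it to simulated right-skewed data, not the airline data. Because \(log_{10}(x) = ln(x) / ln(10)\), the two log transforms differ only by a constant factor and so have identical skewness.

```r
# Moment-based sample skewness: third central moment over sd cubed
skewness_moment <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

set.seed(1)
x <- rlnorm(1000, meanlog = 4, sdlog = 0.6)   # simulated right-skewed data

skews <- c(raw   = skewness_moment(x),
           log10 = skewness_moment(log10(x)),
           ln    = skewness_moment(log(x)),
           sqrt  = skewness_moment(sqrt(x)))

# Values near 0 indicate a roughly symmetrical distribution
print(round(skews, 3))
```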

DATA LICENCES

Data is provided by the Bureau of Infrastructure, Transport and Regional Economics under a Creative Commons Attribution 3.0 Australia licence.

Bureau of Infrastructure, Transport and Regional Economics, Canberra, Domestic Airlines - On Time Performance, Sourced on 26 October 2019, https://data.gov.au/dataset/ds-dga-29128ebd-dbaa-4ff5-8b86-d9f30de56452/details?q=*

Bureau of Infrastructure, Transport and Regional Economics, Canberra, Airport Passenger Movements by Month, Sourced on 26 October 2019, https://data.gov.au/dataset/ds-dga-cc5d888f-5850-47f3-815d-08289b22f5a8/distribution/dist-dga-38bdc971-cb22-4894-b19a-814afc4e8164/details?q=*