It has been almost three months since I moved to Sydney, and these days I am considering domestic trips with my friend. I hope this analysis helps me plan them.
The airport traffic data comes from the "Domestic Airlines - Top Routes and Totals" dataset. Its original data is collected by an organisation backed by the Australian government, so it should be reliable and accurate.
library(data.table)
library(tidyverse)
library(stringr)
library(plotly)
airport <- fread("audomcitypairs-20180406.csv", data.table = F)
# Convert city names to title case (e.g. "Albury") so they match the names in the city coordinates table
airport <- airport %>% mutate(City1 = str_to_title(City1), City2 = str_to_title(City2))
# Keep the years 2000 to 2017
airport <- airport %>% filter(Year >= 2000, Year < 2018)
city <- fread("worldcitiespop.csv", data.table = F)
## Read 3173958 rows and 7 (of 7) columns from 0.153 GB file in 00:00:15
city.australia <- city %>% filter(Country == "au")
# Keep AccentCity (the properly capitalised name), Latitude and Longitude
city.australia <- city.australia %>% select(-Country, -Population, -Region, -City)
names(city.australia)[1] <- "City"
airport %>% str()
## 'data.frame': 13169 obs. of 12 variables:
## $ City1 : chr "Albury" "Albury" "Albury" "Albury" ...
## $ City2 : chr "Sydney" "Sydney" "Sydney" "Sydney" ...
## $ Month : int 36526 36557 36586 36617 36647 36678 36708 36739 36770 36800 ...
## $ Passenger_Trips : int 8708 8785 10390 9693 9831 9440 10244 12360 12912 10926 ...
## $ Aircraft_Trips : int 401 398 423 394 418 403 458 589 566 580 ...
## $ Passenger_Load_Factor: num 62.5 63.6 70.4 70.7 67.3 67 63.9 59.5 64.2 53.1 ...
## $ Distance_GC_(km) : int 452 452 452 452 452 452 452 452 452 452 ...
## $ RPKs : int 3936016 3970820 4696280 4381236 4443612 4266880 4630288 5586720 5836224 4938552 ...
## $ ASKs : int 6297264 6243024 6667904 6196016 6601912 6364160 7245560 9387136 9096952 9292216 ...
## $ Seats : int 13932 13812 14752 13708 14606 14080 16030 20768 20126 20558 ...
## $ Year : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
## $ Month_num : int 1 2 3 4 5 6 7 8 9 10 ...
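In the str() output, Month holds integers such as 36526 rather than dates. I am assuming these are Excel-style serial day numbers: counting from the usual Excel origin of 1899-12-30, 36526 falls on 1 January 2000, which agrees with the Year and Month_num columns (2000 and 1, 2, 3). A minimal sketch of the conversion, using a hypothetical helper excel_serial_to_date():

```r
# Hypothetical helper: interpret Month as an Excel-style serial date
# (assumption: origin 1899-12-30, the usual Excel convention)
excel_serial_to_date <- function(serial) {
  as.Date(serial, origin = "1899-12-30")
}

# The first values printed by str() above
months <- c(36526L, 36557L, 36586L)
excel_serial_to_date(months)
#> [1] "2000-01-01" "2000-02-01" "2000-03-01"

# On the full data this would be:
# airport$Date <- excel_serial_to_date(airport$Month)
```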
The dataset contains monthly aggregated data on flights between pairs of Australian cities.
Seats: (I couldn’t find a definition.)
“Flight Stage” means the operation of an aircraft from take-off to landing.
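Although I couldn't find a definition of Seats, the first rows of the str() output suggest how the columns relate: ASKs (available seat kilometres) equals Seats times Distance_GC_(km), RPKs (revenue passenger kilometres) equals Passenger_Trips times the distance, and Passenger_Load_Factor is RPKs as a percentage of ASKs. That would make Seats the number of seats flown on the route in that month. The check below is only a sketch on three rows copied from the output above, not a documented definition:

```r
# First three rows of the str() output above (Albury-Sydney, January to March 2000)
passenger_trips <- c(8708, 8785, 10390)
seats <- c(13932, 13812, 14752)
distance_km <- c(452, 452, 452)
rpks <- c(3936016, 3970820, 4696280)
asks <- c(6297264, 6243024, 6667904)
load_factor <- c(62.5, 63.6, 70.4)

# RPKs: passengers carried x great-circle distance
rpk_matches <- all(passenger_trips * distance_km == rpks)
# ASKs: seats flown x great-circle distance
ask_matches <- all(seats * distance_km == asks)
# Load factor: RPKs as a percentage of ASKs, reported to one decimal place
lf_matches <- all(abs(100 * rpks / asks - load_factor) < 0.05)

c(rpk_matches, ask_matches, lf_matches)
#> [1] TRUE TRUE TRUE
```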
port.city <- c("Adelaide", "Albury", "Alice Springs", "Armidale", "Ayers Rock","Ballina", "Brisbane",
"Broome", "Bundaberg", "Burnie", "Cairns", "Canberra", "Coffs Harbour", "Darwin", "Devonport",
"Dubbo", "Emerald", "Geraldton", "Gladstone", "Gold Coast", "Hamilton Island", "Hervey Bay", "Hobart",
"Kalgoorlie","Karratha", "Launceston", "Mackay", "Melbourne", "Mildura", "Moranbah", "Mount Isa", "Newcastle",
"Newman", "Perth", "Port Hedland", "Port Lincoln", "Port Macquarie", "Proserpine", "Rockhampton",
"Sunshine Coast", "Sydney", "Tamworth", "Townsville", "Wagga Wagga")
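city.australia <- city.australia %>% filter(City %in% port.city)
# Attach coordinates for the first city of each pair;
# merge() puts the key column first, so the new Latitude/Longitude columns are 13 and 14
airport <- merge(airport, city.australia, by.x = "City1", by.y = "City")
names(airport)[13] <- "City1.Latitude"
names(airport)[14] <- "City1.Longitude"
# Then attach coordinates for the second city (columns 15 and 16)
airport <- merge(airport, city.australia, by.x = "City2", by.y = "City")
names(airport)[15] <- "City2.Latitude"
names(airport)[16] <- "City2.Longitude"
airport <- airport %>% mutate(id = rownames(airport))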
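Both merge() calls return one row for every matching pair of keys, so if a name in port.city matched more than one place in worldcitiespop.csv (two towns with the same name in different states, say), the corresponding routes would silently be duplicated. I haven't verified whether that happens here. The sketch below shows the check on made-up tables, where "Springfield" is a hypothetical duplicated name and the coordinates are invented:

```r
library(dplyr)

# Made-up lookup table: "Springfield" appears twice on purpose, coordinates are invented
lookup <- data.frame(
  City = c("Springfield", "Springfield", "Albury"),
  Latitude = c(-27.6, -33.9, -36.1),
  Longitude = c(153.0, 150.7, 146.9),
  stringsAsFactors = FALSE
)
# Made-up route table with one row per route
routes <- data.frame(
  City1 = c("Springfield", "Albury"),
  Passenger_Trips = c(100, 200),
  stringsAsFactors = FALSE
)

# City names that occur more than once in the lookup table
dup_names <- lookup %>% count(City) %>% filter(n > 1) %>% pull(City)
dup_names
#> [1] "Springfield"

# merge() keeps every matching pair, so the Springfield route is duplicated
merged <- merge(routes, lookup, by.x = "City1", by.y = "City")
nrow(merged)
#> [1] 3
```

On the real data, the same check would be city.australia %>% count(City) %>% filter(n > 1).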
airport.1 <- airport %>%
select(-contains("Latitude"), -contains("Longitude"))
airport.1 <- airport.1 %>%
gather('City1', 'City2', key = "Airport.type", value = "City")
airport.1$Airport.type <- airport.1$Airport.type %>% str_replace(pattern = "City1", replacement = "Departure")
airport.1$Airport.type <- airport.1$Airport.type %>% str_replace(pattern = "City2", replacement = "Arrive")
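The gather() step turns each route row into two rows, one for the departure city and one for the arrival city, and both keep the same id. That shared id is what lets geom_line(group = id) join the two airports of a route on the map. A small sketch on a made-up two-route table (the trip counts are invented):

```r
library(tidyr)

# Made-up wide table shaped like `airport`: one row per route
routes <- data.frame(
  id = c("1", "2"),
  City1 = c("Albury", "Adelaide"),
  City2 = c("Sydney", "Melbourne"),
  Aircraft_Trips = c(401, 500),
  stringsAsFactors = FALSE
)

# Same call as in the analysis: City1/City2 become key/value pairs
long <- gather(routes, "City1", "City2", key = "Airport.type", value = "City")
# Sort by id so both rows of each route sit together
long[order(long$id), ]
```

In newer tidyr releases gather() is superseded by pivot_longer(); pivot_longer(routes, c(City1, City2), names_to = "Airport.type", values_to = "City") gives the same rows in a different order.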
airport.1 <- merge(airport.1, city.australia, by.x = "City", by.y = "City")
world.map <- map_data("world")
au.map <- world.map %>% filter(region == "Australia")
au.map <- fortify(au.map)
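ggplot() +
# geom_polygon() draws the outline from long/lat directly, avoiding the "Ignoring unknown aesthetics" warning from geom_map()
geom_polygon(data = au.map, aes(x = long, y = lat, group = group),
fill = "white", colour = "black") +
geom_point(data = airport.1, aes(x = Longitude, y = Latitude)) +
# Each route has a departure row and an arrival row sharing one id, so group = id draws one segment per route
geom_line(data = airport.1, aes(x = Longitude, y = Latitude, group = id), colour = "red", alpha = .1) +
# Zoom with the coordinate system rather than xlim()/ylim(), which would drop polygon vertices outside the range
coord_quickmap(xlim = c(110, 155), ylim = c(-43, -10)) +
labs(title = "Australian Domestic Aircraft Routes")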
As the map shows, the east coast is crowded with overlapping routes.
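# geom_bar() counts rows unless given a weight, so weight by Aircraft_Trips to show trip totals
plot.year <- airport.1 %>%
ggplot(aes(x = Year, fill = City, weight = Aircraft_Trips)) +
geom_bar() +
labs(title = "Airport Traffic by City from 2000 to 2017", y = "Aircraft Trips")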
plot.year %>% ggplotly()
traffic.transition <- airport.1 %>%
group_by(City, Year) %>%
summarise(Annual.Aircraft_Trips = sum(Aircraft_Trips)) %>%
ungroup() %>%
ggplot(aes(x = Year, y = Annual.Aircraft_Trips, group = City, colour = City)) +
geom_line(show.legend = FALSE) +
labs(title = "Annual Airport Traffic by City from 2000 to 2017", y = "Annual Aircraft Trips")
traffic.transition %>% ggplotly()
As the plot shows, Perth, Sydney, Melbourne and Brisbane are the four cities with the most traffic over the whole period.
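To back up that reading of the plot with numbers, one option is to total the trips per city over the whole period and sort them. A sketch with a hypothetical helper top_cities(), run on a made-up table with the same column names as airport.1 (the values are invented, so the ranking it prints says nothing about the real data):

```r
library(dplyr)

# Made-up long-format table with the columns used above
toy <- data.frame(
  City = c("Sydney", "Sydney", "Perth", "Hobart", "Hobart"),
  Year = c(2000, 2001, 2000, 2000, 2001),
  Aircraft_Trips = c(900, 950, 400, 60, 70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total trips per city over all years, largest first, top n
top_cities <- function(df, n = 4) {
  df %>%
    group_by(City) %>%
    summarise(Total.Aircraft_Trips = sum(Aircraft_Trips)) %>%
    arrange(desc(Total.Aircraft_Trips)) %>%
    head(n)
}

top_cities(toy, n = 2)
```

On the real data this would be top_cities(airport.1).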
Next, I'm going to scrape a table from the Wikipedia page "List of cities in Australia by population" to get the state of each city.
library(rvest)
library(magrittr)
page <- read_html("https://en.wikipedia.org/wiki/List_of_cities_in_Australia_by_population")
au.state.list <- page %>% html_table(header = TRUE, fill = TRUE) %>% extract2(1)
au.state.list <- au.state.list %>% select(contains("SUA"), contains("State"))
names(au.state.list)[1] <- "City"
names(au.state.list)[2] <- "State"
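merge() silently drops cities without an exact name match, and the Wikipedia table lists significant urban areas (the SUA column) rather than airport towns, so some names may not line up. Before merging, anti_join() can list which airport cities would be lost. A sketch on made-up tables (which city is missing here is invented for the example):

```r
library(dplyr)

# Made-up tables: one airport city is missing from the state list
airport_cities <- data.frame(
  City = c("Sydney", "Hobart", "Ayers Rock"),
  stringsAsFactors = FALSE
)
state_list <- data.frame(
  City = c("Sydney", "Hobart"),
  State = c("New South Wales", "Tasmania"),
  stringsAsFactors = FALSE
)

# Rows of airport_cities with no match in state_list, i.e. what merge() would drop
unmatched <- anti_join(airport_cities, state_list, by = "City")
unmatched$City
#> [1] "Ayers Rock"
```

On the real data the check would be anti_join(distinct(airport.1, City), au.state.list, by = "City").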
airport.state <- merge(airport.1, au.state.list, by.x = "City", by.y = "City")
airport.state$Month_num <- as.factor(airport.state$Month_num)
airport.state %>%
ggplot(aes(x = Month_num, y = Aircraft_Trips, fill = Month_num)) +
geom_bar(stat = "identity") +
facet_wrap(~State, scales = "free") +
labs(x = "Month", y = "Monthly Aircraft Trips", title = "Monthly Aircraft Trips by State")
Although the number of aircraft trips at airports in the Northern Territory and Tasmania is small, the two show opposite seasonal patterns. Tasmania has fewer flights in the winter months (June to August), whereas the Northern Territory has more.
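Because the facets use free y scales, bar heights can't be compared across states directly. One way to compare the seasonal shape is to express each month as a share of the state's total, so every state sums to 1. A sketch on made-up numbers (not the real data):

```r
library(dplyr)

# Made-up monthly totals for two states
toy <- data.frame(
  State = rep(c("Tasmania", "Northern Territory"), each = 3),
  Month_num = rep(c(6, 7, 8), times = 2),
  Aircraft_Trips = c(80, 70, 50, 120, 150, 130),
  stringsAsFactors = FALSE
)

# summarise() leaves the result grouped by State, so sum(Trips) in mutate()
# is the state total and Share sums to 1 within each state
monthly_share <- toy %>%
  group_by(State, Month_num) %>%
  summarise(Trips = sum(Aircraft_Trips)) %>%
  mutate(Share = Trips / sum(Trips)) %>%
  ungroup()

monthly_share
```

Plotting Share instead of Aircraft_Trips would put every state on a common scale.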