Data 608 - Final Project

Amit Kapoor

12/01/2020

OpenFlights

Introduction

The airline industry is now a major mode of transportation within and between countries. Although it operates under strict guidelines for airport operations, flights and routes, many countries are investing substantially in it to attract visitors and businesses and boost their economies. A well-developed aviation sector also reflects good infrastructure. Thousands of airlines operate across the globe, connecting thousands of airports over different routes, and I have always been curious about airline routes, connecting airports and related insights.

Objective

The objective of this project is to present airline data insights visually: airline routes, the busiest routes, the most-used airlines by airport, and any other patterns in the data. The OpenFlights platform provides data that maps flights around the world and makes it possible to meet these objectives.

This exercise is relevant given the number of airlines and planes operating flights around the world and the limited availability of customized visualizations. It allows exploring flights from a given origin, among other things. It could also be the start of a future project in which this data is combined with other relevant data for more meaningful insights.

# load necessary libraries
library(maps)
library(RCurl)
library(tidyr)
library(dplyr)
library(readr)
library(ggplot2)
library(ggthemes)
library(patchwork)
library(janitor)
library(treemap)
library(ggiraph)
library(plotly)

Data

I will use data from OpenFlights (https://openflights.org/data.html), an open-source online flights platform with data on airports, flights, routes, country codes and planes. At the time of writing, it covers ~14K airports and the corresponding airlines and routes around the globe. The final project uses the following data sets.

  • airlines.csv
  • airports.csv
  • routes.csv
# get data
airlines_url <- getURL('https://raw.githubusercontent.com/amit-kapoor/data608/master/FinalProject/airlines.csv')
airports_url <- getURL('https://raw.githubusercontent.com/amit-kapoor/data608/master/FinalProject/airports.csv')
routes_url <- getURL('https://raw.githubusercontent.com/amit-kapoor/data608/master/FinalProject/routes.csv')

airlines data columns

  • Airline ID - Unique OpenFlights identifier for this airline.
  • Name - Name of the airline.
  • Alias - Alias of the airline. For example, All Nippon Airways is commonly known as “ANA”.
  • IATA - 2-letter IATA code, if available.
  • ICAO - 3-letter ICAO code, if available.
  • Callsign - Airline callsign.
  • Country - Country or territory where the airline is incorporated.
  • Active - “Y” if the airline is or has until recently been operational, “N” if it is defunct.
airlines <- read_csv(airlines_url)
head(airlines)
## # A tibble: 6 x 8
##   AirlineID Name                   Alias IATA  ICAO  Callsign  Country    Active
##       <dbl> <chr>                  <chr> <chr> <chr> <chr>     <chr>      <chr> 
## 1        -1 Unknown                "\\N" -     N/A   "\\N"     "\\N"      Y     
## 2         1 Private flight         "\\N" -     N/A    <NA>      <NA>      Y     
## 3         2 135 Airways            "\\N" <NA>  GNL   "GENERAL" "United S… N     
## 4         3 1Time Airline          "\\N" 1T    RNX   "NEXTIME" "South Af… Y     
## 5         4 2 Sqn No 1 Elementary… "\\N" <NA>  WYT    <NA>     "United K… N     
## 6         5 213 Flight Unit        "\\N" <NA>  TFU    <NA>     "Russia"   N
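
The "\\N" strings in the output above are how the OpenFlights files mark missing values. One possible way to have them read as NA is the na argument of read_csv. The sketch below uses a small made-up file (the rows and the name airlines_demo are illustrative assumptions, not the real data) rather than the project files.

```r
library(readr)

# Tiny stand-in for airlines.csv (hypothetical rows; "\N" marks a missing value)
tmp <- tempfile(fileext = ".csv")
writeLines(c("AirlineID,Name,Alias,IATA",
             "1,Private flight,\\N,-",
             "3,1Time Airline,\\N,1T"), tmp)

# Treat "\N" (written "\\N" in R source) as NA in addition to the defaults
airlines_demo <- read_csv(tmp, na = c("", "NA", "\\N"))
airlines_demo
```

With this setting the Alias column of the demo file is read as missing instead of as the literal text.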

airports data columns

  • Airport ID - Unique OpenFlights identifier for this airport.
  • Name - Name of airport. May or may not contain the City name.
  • City - Main city served by airport. May be spelled differently from Name.
  • Country - Country or territory where airport is located. See Countries to cross-reference to ISO 3166-1 codes.
  • IATA - 3-letter IATA code. Null if not assigned/unknown.
  • ICAO - 4-letter ICAO code. Null if not assigned.
  • Latitude - Decimal degrees, usually to six significant digits. Negative is South, positive is North.
  • Longitude - Decimal degrees, usually to six significant digits. Negative is West, positive is East.
  • Altitude - In feet.
  • Timezone - Hours offset from UTC. Fractional hours are expressed as decimals, e.g. India is 5.5.
  • DST - Daylight saving time. One of E (Europe), A (US/Canada), S (South America), O (Australia), Z (New Zealand), N (None) or U (Unknown).
  • Tz database time zone - Timezone in “tz” (Olson) format, e.g. “America/Los_Angeles”.
  • Type - Type of the airport. Value “airport” for air terminals, “station” for train stations, “port” for ferry terminals and “unknown” if not known. In airports.csv, only type=airport is included.
  • Source - Source of this data. “OurAirports” for data sourced from OurAirports, “Legacy” for old data not matched to OurAirports (mostly DAFIF), “User” for unverified user contributions. In airports.csv, only source=OurAirports is included.
airports <- read_csv(airports_url)
head(airports)
## # A tibble: 6 x 14
##   `Airport ID` Name  City  Country IATA  ICAO  Latitude Longitude Altitude
##          <dbl> <chr> <chr> <chr>   <chr> <chr>    <dbl>     <dbl>    <dbl>
## 1            1 Goro… Goro… Papua … GKA   AYGA     -6.08      145.     5282
## 2            2 Mada… Mada… Papua … MAG   AYMD     -5.21      146.       20
## 3            3 Moun… Moun… Papua … HGU   AYMH     -5.83      144.     5388
## 4            4 Nadz… Nadz… Papua … LAE   AYNZ     -6.57      147.      239
## 5            5 Port… Port… Papua … POM   AYPY     -9.44      147.      146
## 6            6 Wewa… Wewak Papua … WWK   AYWK     -3.58      144.       19
## # … with 5 more variables: Timezone <dbl>, DST <chr>, `Tz database time
## #   zone` <chr>, Type <chr>, Source <chr>

routes data columns

  • Airline - 2-letter (IATA) or 3-letter (ICAO) code of the airline.
  • Airline ID - Unique OpenFlights identifier for airline (see Airline).
  • Source airport - 3-letter (IATA) or 4-letter (ICAO) code of the source airport.
  • Source airport ID - Unique OpenFlights identifier for source airport (see Airport)
  • Destination airport - 3-letter (IATA) or 4-letter (ICAO) code of the destination airport.
  • Destination airport ID - Unique OpenFlights identifier for destination airport (see Airport)
  • Codeshare - “Y” if this flight is a codeshare (that is, not operated by Airline, but another carrier), empty otherwise.
  • Stops - Number of stops on this flight (“0” for direct)
  • Equipment - 3-letter codes for plane type(s) generally used on this flight, separated by spaces
routes <- read_csv(routes_url)
head(routes)
## # A tibble: 6 x 9
##   Airline AirlineID source_airport `Source airport… destination_air…
##   <chr>   <chr>     <chr>          <chr>            <chr>           
## 1 2B      410       AER            2965             KZN             
## 2 2B      410       ASF            2966             KZN             
## 3 2B      410       ASF            2966             MRV             
## 4 2B      410       CEK            2968             KZN             
## 5 2B      410       CEK            2968             OVB             
## 6 2B      410       DME            4029             KZN             
## # … with 4 more variables: `Destination airport ID` <chr>, Codeshare <chr>,
## #   Stops <dbl>, Equipment <chr>

Data Preparation

First, let's list all the countries in the airports data.

# countries
unique(airports$Country)
##   [1] "Papua New Guinea"                 "Greenland"                       
##   [3] "Iceland"                          "Canada"                          
##   [5] "Algeria"                          "Benin"                           
##   [7] "Burkina Faso"                     "Ghana"                           
##   [9] "Cote d'Ivoire"                    "Nigeria"                         
##  [11] "Niger"                            "Tunisia"                         
##  [13] "Togo"                             "Belgium"                         
##  [15] "Germany"                          "Estonia"                         
##  [17] "Finland"                          "United Kingdom"                  
##  [19] "Guernsey"                         "Jersey"                          
##  [21] "Isle of Man"                      "Falkland Islands"                
##  [23] "Netherlands"                      "Ireland"                         
##  [25] "Denmark"                          "Faroe Islands"                   
##  [27] "Luxembourg"                       "Norway"                          
##  [29] "Poland"                           "Sweden"                          
##  [31] "South Africa"                     "Botswana"                        
##  [33] "Congo (Brazzaville)"              "Congo (Kinshasa)"                
##  [35] "Swaziland"                        "Central African Republic"        
##  [37] "Equatorial Guinea"                "Saint Helena"                    
##  [39] "Mauritius"                        "British Indian Ocean Territory"  
##  [41] "Cameroon"                         "Zambia"                          
##  [43] "Comoros"                          "Mayotte"                         
##  [45] "Reunion"                          "Madagascar"                      
##  [47] "Angola"                           "Gabon"                           
##  [49] "Sao Tome and Principe"            "Mozambique"                      
##  [51] "Seychelles"                       "Chad"                            
##  [53] "Zimbabwe"                         "Malawi"                          
##  [55] "Lesotho"                          "Mali"                            
##  [57] "Gambia"                           "Spain"                           
##  [59] "Sierra Leone"                     "Guinea-Bissau"                   
##  [61] "Liberia"                          "Morocco"                         
##  [63] "Senegal"                          "Mauritania"                      
##  [65] "Guinea"                           "Cape Verde"                      
##  [67] "Ethiopia"                         "Burundi"                         
##  [69] "Somalia"                          "Egypt"                           
##  [71] "Kenya"                            "Libya"                           
##  [73] "Rwanda"                           "Sudan"                           
##  [75] "South Sudan"                      "Tanzania"                        
##  [77] "Uganda"                           "Albania"                         
##  [79] "Bulgaria"                         "Cyprus"                          
##  [81] "Croatia"                          "France"                          
##  [83] "Saint Pierre and Miquelon"        "Greece"                          
##  [85] "Hungary"                          "Italy"                           
##  [87] "Slovenia"                         "Czech Republic"                  
##  [89] "Israel"                           "Malta"                           
##  [91] "Austria"                          "Portugal"                        
##  [93] "Bosnia and Herzegovina"           "Romania"                         
##  [95] "Switzerland"                      "Turkey"                          
##  [97] "Moldova"                          "Macedonia"                       
##  [99] "Gibraltar"                        "Serbia"                          
## [101] "Montenegro"                       "Slovakia"                        
## [103] "Turks and Caicos Islands"         "Dominican Republic"              
## [105] "Guatemala"                        "Honduras"                        
## [107] "Jamaica"                          "Mexico"                          
## [109] "Nicaragua"                        "Panama"                          
## [111] "Costa Rica"                       "El Salvador"                     
## [113] "Haiti"                            "Cuba"                            
## [115] "Cayman Islands"                   "Bahamas"                         
## [117] "Belize"                           "Cook Islands"                    
## [119] "Fiji"                             "Tonga"                           
## [121] "Kiribati"                         "Wallis and Futuna"               
## [123] "Samoa"                            "American Samoa"                  
## [125] "French Polynesia"                 "Vanuatu"                         
## [127] "New Caledonia"                    "New Zealand"                     
## [129] "Antarctica"                       "Afghanistan"                     
## [131] "Bahrain"                          "Saudi Arabia"                    
## [133] "Iran"                             "Jordan"                          
## [135] "West Bank"                        "Kuwait"                          
## [137] "Lebanon"                          "United Arab Emirates"            
## [139] "Oman"                             "Pakistan"                        
## [141] "Iraq"                             "Syria"                           
## [143] "Qatar"                            "Northern Mariana Islands"        
## [145] "Guam"                             "Marshall Islands"                
## [147] "Midway Islands"                   "Micronesia"                      
## [149] "Palau"                            "Taiwan"                          
## [151] "Japan"                            "South Korea"                     
## [153] "Philippines"                      "Argentina"                       
## [155] "Brazil"                           "Chile"                           
## [157] "Ecuador"                          "Paraguay"                        
## [159] "Colombia"                         "Bolivia"                         
## [161] "Suriname"                         "French Guiana"                   
## [163] "Peru"                             "Uruguay"                         
## [165] "Venezuela"                        "Guyana"                          
## [167] "Antigua and Barbuda"              "Barbados"                        
## [169] "Dominica"                         "Martinique"                      
## [171] "Guadeloupe"                       "Grenada"                         
## [173] "Virgin Islands"                   "Puerto Rico"                     
## [175] "Saint Kitts and Nevis"            "Saint Lucia"                     
## [177] "Aruba"                            "Netherlands Antilles"            
## [179] "Anguilla"                         "Trinidad and Tobago"             
## [181] "British Virgin Islands"           "Saint Vincent and the Grenadines"
## [183] "Kazakhstan"                       "Kyrgyzstan"                      
## [185] "Azerbaijan"                       "Russia"                          
## [187] "Ukraine"                          "Belarus"                         
## [189] "Turkmenistan"                     "Tajikistan"                      
## [191] "Uzbekistan"                       "India"                           
## [193] "Sri Lanka"                        "Cambodia"                        
## [195] "Bangladesh"                       "Hong Kong"                       
## [197] "Laos"                             "Macau"                           
## [199] "Nepal"                            "Bhutan"                          
## [201] "Maldives"                         "Thailand"                        
## [203] "Vietnam"                          "Burma"                           
## [205] "Indonesia"                        "Malaysia"                        
## [207] "Brunei"                           "East Timor"                      
## [209] "Singapore"                        "Australia"                       
## [211] "Christmas Island"                 "Norfolk Island"                  
## [213] "China"                            "North Korea"                     
## [215] "Mongolia"                         "United States"                   
## [217] "Latvia"                           "Lithuania"                       
## [219] "Armenia"                          "Eritrea"                         
## [221] "Palestine"                        "Georgia"                         
## [223] "Yemen"                            "Bermuda"                         
## [225] "Solomon Islands"                  "Nauru"                           
## [227] "Tuvalu"                           "Namibia"                         
## [229] "Djibouti"                         "Montserrat"                      
## [231] "Johnston Atoll"                   "Western Sahara"                  
## [233] "Niue"                             "Cocos (Keeling) Islands"         
## [235] "Myanmar"                          "Svalbard"                        
## [237] "Wake Island"

In the next few steps, two country names are renamed and the routes column Airline is renamed to IATA, which is then used to merge routes with the airlines data.

# rename countries
airports$Country[airports$Country=='Congo (Kinshasa)'] <- 'Democratic Republic of the Congo'
airports$Country[airports$Country=='Congo (Brazzaville)'] <- 'Republic of Congo'

#rename column
colnames(routes)[1]<-'IATA'

#merge routes and airline data
routes1<-data.frame(merge(routes, airlines %>% select(IATA, Name), by='IATA',sort=F))
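
merge() as used above performs an inner join, so if an IATA code appears more than once in the airlines table, each matching route is repeated once per match. The toy tables below (all rows and the *_demo names are hypothetical) are a minimal sketch of that behaviour and of a quick check for repeated codes.

```r
library(dplyr)

# Hypothetical toy tables: code "AA" appears twice in the airline lookup
routes_demo <- data.frame(IATA = c("AA", "AA", "2B"),
                          source_airport = c("ORD", "ORD", "AER"),
                          destination_airport = c("AUS", "BDL", "KZN"),
                          stringsAsFactors = FALSE)
airlines_demo <- data.frame(IATA = c("AA", "AA", "2B"),
                            Name = c("American Airlines", "Old Carrier", "Aerocondor"),
                            stringsAsFactors = FALSE)

# Inner join as in the text: every matching pair of rows is returned
merged_demo <- merge(routes_demo, airlines_demo, by = "IATA", sort = FALSE)
nrow(merged_demo)   # 5 rows from 3 routes, because "AA" matched twice

# Codes that occur more than once show where rows will multiply
dup_codes <- airlines_demo %>% count(IATA) %>% filter(n > 1)
dup_codes
```

Running the same count on the real airlines table would show whether route counts per airline are inflated by the merge.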

Data Visualization

Top 25 Countries having most airports

To get the top 25 countries by number of airports, we first tabulate the Country variable from the airports data and arrange the counts in descending order. We then select the top 25 and draw a bar graph.

data.frame(table(airports$Country)) %>% 
  arrange(desc(Freq)) %>% 
  head(25) %>% 
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1, label = Freq)) + 
  geom_bar(stat = "identity", show.legend = F) +
  labs(title = "Top 25 Countries having most Airports", 
       x = "Country", y = "The number of Airports") +
  #geom_label(angle = 45, show.legend = F) +
  geom_text(show.legend = F, vjust = -.5) + 
  scale_fill_viridis_d(option = "cividis") +
  theme_fivethirtyeight() +
  theme(axis.text.x = element_text(angle = 40, size = 15), panel.grid.major = element_blank(), panel.grid.minor = element_blank())
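
The tabulate, sort and truncate steps can also be written with dplyr's count(). This is only a sketch of an equivalent approach on a made-up Country column (airports_demo is a hypothetical stand-in), not the code used for the graph.

```r
library(dplyr)

# Hypothetical stand-in for airports$Country
airports_demo <- data.frame(
  Country = c("Canada", "Brazil", "Canada", "India", "Canada", "Brazil"),
  stringsAsFactors = FALSE)

top_countries <- airports_demo %>%
  count(Country, sort = TRUE, name = "Freq") %>%  # one row per country, largest first
  head(2)                                         # the report keeps the top 25
top_countries
```

The resulting data frame has a Country column instead of Var1, so the ggplot aesthetics would need the matching column name.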

treemap(data.frame(table(airports$Country)),
        index="Var1",
        vSize="Freq",
        type="index",
        palette = 'Pastel1', #Set3, Set2, Pastel1, 
        title = "Overall Number of Airport by countries")

The United States has by far the most airports, most probably because of its high connectivity both internally and with the rest of the world. It is interesting that Brazil and Russia have the same number of airports even though they differ significantly in total area. India also makes the top 10. The number of airports in a country depends on its economy and connectivity.

Top 25 Countries having most airlines

To get the top 25 countries by number of airlines, we first tabulate the Country variable from the airlines data and arrange the counts in descending order. We then select the top 25 and draw a bar graph.

data.frame(table(airlines$Country)) %>% 
  arrange(desc(Freq)) %>% head(25) %>% 
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, 
             fill = Var1, label = Freq)) + 
  geom_bar(stat = "identity", show.legend = F) +
  #geom_label(show.legend = F, vjust = -.1) + 
  geom_text(show.legend = F, vjust = -.5) + 
  scale_fill_viridis_d(option = "cividis") +  #plasma, magma, inferno, cividis
  theme_fivethirtyeight() +
  theme(axis.text.x = element_text(angle = 40, size = 15), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  labs(x = "Country", y = "The number of Airlines", 
       title = "Top 25 Countries having most airlines")

treemap(data.frame(table(airlines$Country)),
        index="Var1",
        vSize="Freq",
        type="index",
        palette = 'Pastel1', #Set3, Set2, Pastel1, 
        algorithm = 'pivotSize', 
        title = "Overall Number of Airlines by countries")

Again, as expected, the US is on top with the most airlines. India does not appear here, although it was in the top 25 countries by airports.

Routes from an airport

First create a basic map.

countries_map <-map_data("world")
world_map <- ggplot() + 
  geom_map(data = countries_map, 
           map = countries_map,aes(x = long, y = lat, map_id = region, group = group),
           fill = "white", color = "black", size = 0.05)

world_map

To find routes from a given airport, we first create a function routes_through as below:

  • create a dataframe filtering the given airport - d1
  • get all the destination airports
  • merge it with the airports.csv data - d2
  • cbind d1 and d2
routes_through <- function(iata_start){
  
  d1 <- data.frame(routes1 %>% filter(source_airport == iata_start))
  d1.iata.start<-data.frame(d1 %>% select(destination_airport) %>% rename(IATA = destination_airport))
  
  #merge with the airport data
  d2<-data.frame(merge(d1.iata.start,airports %>% select(Name,City,Country,IATA,Latitude,Longitude),by='IATA', sort=F))
  colnames(d2)<-c("iata_end","arp_name_dest","city_name_dest","sntry_name_dest","lat_end","long_end")
  
  #get geo locations of source.airport
  lat.start<-rep(airports[airports$IATA==iata_start,'Latitude'],nrow(d1))
  long.start<-rep(airports[airports$IATA==iata_start,'Longitude'],nrow(d1))
  d1$lat.start = lat.start
  d1$long.start = long.start
  
  #cbind all
  res <- data.frame(cbind(d1,d2))
  
  return(res)
}

Let's take Chicago's ORD airport and find its routes around the world.

arp_origin <- routes_through('ORD')
colnames(arp_origin)
##  [1] "IATA"                   "AirlineID"              "source_airport"        
##  [4] "Source.airport.ID"      "destination_airport"    "Destination.airport.ID"
##  [7] "Codeshare"              "Stops"                  "Equipment"             
## [10] "Name"                   "lat.start"              "long.start"            
## [13] "iata_end"               "arp_name_dest"          "city_name_dest"        
## [16] "sntry_name_dest"        "lat_end"                "long_end"
head(arp_origin)
##   IATA AirlineID source_airport Source.airport.ID destination_airport
## 1   3E     10739            ORD              3830                 DEC
## 2   3E     10739            ORD              3830                 BRL
## 3   AA        24            ORD              3830                 AUH
## 4   AA        24            ORD              3830                 AUS
## 5   AA        24            ORD              3830                 AZO
## 6   AA        24            ORD              3830                 BDL
##   Destination.airport.ID Codeshare Stops Equipment              Name lat.start
## 1                   4042      <NA>     0       CNC    Air Choice One   41.9786
## 2                   5726      <NA>     0       CNC    Air Choice One   41.9786
## 3                   2179         Y     0       777 American Airlines   41.9786
## 4                   3673      <NA>     0   M80 M83 American Airlines   41.9786
## 5                   4039         Y     0   ER4 ERD American Airlines   41.9786
## 6                   3825         Y     0   ER4 E75 American Airlines   41.9786
##   long.start iata_end                          arp_name_dest city_name_dest
## 1   -87.9048      DEC                        Decatur Airport        Decatur
## 2   -87.9048      BRL        Southeast Iowa Regional Airport     Burlington
## 3   -87.9048      AUH        Abu Dhabi International Airport      Abu Dhabi
## 4   -87.9048      AUH        Abu Dhabi International Airport      Abu Dhabi
## 5   -87.9048      AUS Austin Bergstrom International Airport         Austin
## 6   -87.9048      AUS Austin Bergstrom International Airport         Austin
##        sntry_name_dest lat_end long_end
## 1        United States 39.8346 -88.8657
## 2        United States 40.7832 -91.1255
## 3 United Arab Emirates 24.4330  54.6511
## 4 United Arab Emirates 24.4330  54.6511
## 5        United States 30.1945 -97.6699
## 6        United States 30.1945 -97.6699
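
In the output above, rows 4 to 6 pair destination_airport AUS, AZO and BDL with iata_end AUH, AUS and AUS. merge() does not keep the rows in the order of d1, so cbind(d1, d2) can place a route next to another route's destination details. The map below only uses the destination columns, so it is not affected, but row-wise comparisons would be. A minimal sketch of a join-based variant that keeps each route with its own destination follows; the toy tables and the name routes_through_joined are hypothetical.

```r
library(dplyr)

# Hypothetical toy versions of routes1 and airports
routes_demo <- data.frame(IATA = c("AA", "AA", "AA"),
                          source_airport = "ORD",
                          destination_airport = c("AUH", "AUS", "AUH"),
                          stringsAsFactors = FALSE)
airports_demo <- data.frame(IATA = c("AUS", "AUH", "ORD"),
                            Name = c("Austin Bergstrom International Airport",
                                     "Abu Dhabi International Airport",
                                     "Chicago O'Hare International Airport"),
                            Latitude = c(30.1945, 24.4330, 41.9786),
                            Longitude = c(-97.6699, 54.6511, -87.9048),
                            stringsAsFactors = FALSE)

routes_through_joined <- function(iata_start) {
  # Lookup tables for the destination and the origin coordinates
  dest <- airports_demo %>%
    select(iata_end = IATA, arp_name_dest = Name,
           lat_end = Latitude, long_end = Longitude)
  start <- airports_demo %>%
    select(source_airport = IATA, lat.start = Latitude, long.start = Longitude)

  routes_demo %>%
    filter(source_airport == iata_start) %>%
    left_join(start, by = "source_airport") %>%                 # origin coordinates
    left_join(dest, by = c("destination_airport" = "iata_end")) # row-aligned destinations
}

res <- routes_through_joined("ORD")
res
```

Because left_join() keeps one output row per route (given unique airport codes), the destination name and coordinates always belong to the destination_airport on the same row.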

To draw all the routes starting from ‘ORD’, we first group by destination to avoid duplicates. We then draw the routes on the world_map created above, using geom_curve to show connectivity from the origin (in this case ORD) to the various destinations.

arp_origin_grpd <- arp_origin %>% 
  select(arp_name_dest ,lat.start, long.start,lat_end,long_end) %>% 
  group_by(arp_name_dest) %>% 
  mutate(count=n()) %>% 
  distinct()

maxFlights <- max(arp_origin_grpd$count)

world_map + 
  geom_curve(data=arp_origin_grpd, 
             aes(x=unlist(long.start),
                 y=unlist(lat.start),
                 xend=unlist(long_end),
                 yend=unlist(lat_end),
                 color=factor(count)),
             curvature = 0.25, arrow = arrow(length = unit(0.008, "npc")), alpha=.70,size=1) + 
  geom_point_interactive(data=arp_origin_grpd, 
                         aes( tooltip=arp_name_dest, label= arp_name_dest, x=unlist(long_end),y=unlist(lat_end)), size=0.01) + 
  theme_fivethirtyeight() + theme(
    legend.title = element_text(face = "bold", size=15), 
    legend.text = element_text(colour="black", size=15, color = "orangered4"), 
    panel.grid.major = element_blank(),
    axis.text=element_blank(),
    axis.ticks=element_blank()) + 
  scale_color_manual(name="Routes serving airlines",values=rev(viridis::viridis(maxFlights))) +
  ggtitle(paste0('Departures from ',(airports %>% filter(IATA=='ORD'))$Name))

Busiest routes

The busiest routes here are the ones served by a high number of different plane types. In the routes data, the Equipment column lists the plane type(s) used, separated by spaces. Therefore a string split gives the count, i.e. the number of different aircraft types per route.

routes$num_aircraft <- sapply(routes$Equipment, function(x) length(strsplit(x," ")[[1]]))
routes %>% group_by(num_aircraft) %>% summarise(count=n()) %>% mutate(perc = 100*count / sum(count)) %>% mutate(perc = round(perc,2))
## # A tibble: 9 x 3
##   num_aircraft count  perc
##          <int> <int> <dbl>
## 1            1 50534 74.7 
## 2            2 11659 17.2 
## 3            3  3524  5.21
## 4            4  1250  1.85
## 5            5   469  0.69
## 6            6   139  0.21
## 7            7    51  0.08
## 8            8    29  0.04
## 9            9     8  0.01
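
The same count can be obtained without sapply(), since strsplit() is vectorised and lengths() returns the number of pieces per element. The sketch below uses made-up Equipment values (equipment_demo is a hypothetical vector) in the same space-separated format.

```r
# Hypothetical Equipment values in the same space-separated format
equipment_demo <- c("CR2", "M80 M83", "ER4 ERD E75", NA)

# Split every element at once, then count the pieces per element
num_aircraft_demo <- lengths(strsplit(equipment_demo, " ", fixed = TRUE))
num_aircraft_demo
```

As with the sapply() version, a missing Equipment value is counted as one aircraft type, so such rows may need to be filtered out first if that matters for the analysis.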

Above we see that almost 75% of all routes use a single type of aircraft. We will draw a visualization on the world map of the routes that use 6, 7, 8 or 9 types of aircraft. First we create a function that accepts a data frame of routes and a label and returns a data frame with the relevant details. The function loops over the routes and looks up the longitude and latitude of each source and destination airport. We then build the routes for these 4 groups and finally draw them on the map.

airlineConnect <- function(routes, name){
  
  arpt_src<-c()
  arpt_dest<-c()
  arpt_src_long<-c()
  arpt_src_lat<-c()
  arpt_dest_long<-c()
  arpt_dest_lat<-c()
  
  for(i in 1:nrow(routes)){
    arpt_src[i]<-routes$source_airport[i]
    arpt_dest[i]<-routes$destination_airport[i]
    arpt_src_long[i]<- airports[airports$IATA==arpt_src[i],'Longitude']
    arpt_src_lat[i]<- airports[airports$IATA==arpt_src[i],'Latitude']
    arpt_dest_long[i]<- airports[airports$IATA==arpt_dest[i],'Longitude']
    arpt_dest_lat[i]<- airports[airports$IATA==arpt_dest[i],'Latitude']
  }
  
  res<-data.frame('arl_name' = rep(name,nrow(routes)),
                  'arpt_src'= arpt_src,
                  'arpt_dest'= arpt_dest,
                  'arpt_src_long'= unlist(arpt_src_long),
                  'arpt_src_lat'= unlist(arpt_src_lat),
                  'arpt_dest_long'= unlist(arpt_dest_long),
                  'arpt_dest_lat'= unlist(arpt_dest_lat))
  
  return(res)
}
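
# The row-by-row lookup above can also be done in one step with match().
# This is only a sketch of an alternative, using hypothetical toy inputs with
# the same column names as routes and airports; airline_connect_vec is an
# illustrative name, not part of the project code.

```r
# Hypothetical toy inputs
airports_demo <- data.frame(IATA = c("ORD", "AUS", "FRA"),
                            Latitude = c(41.9786, 30.1945, 50.0333),
                            Longitude = c(-87.9048, -97.6699, 8.5706),
                            stringsAsFactors = FALSE)
routes_demo <- data.frame(source_airport = c("ORD", "FRA"),
                          destination_airport = c("AUS", "ORD"),
                          stringsAsFactors = FALSE)

airline_connect_vec <- function(routes, name, airports) {
  src  <- match(routes$source_airport, airports$IATA)       # row index of each source
  dest <- match(routes$destination_airport, airports$IATA)  # row index of each destination
  data.frame(arl_name = rep(name, nrow(routes)),
             arpt_src = routes$source_airport,
             arpt_dest = routes$destination_airport,
             arpt_src_long = airports$Longitude[src],
             arpt_src_lat = airports$Latitude[src],
             arpt_dest_long = airports$Longitude[dest],
             arpt_dest_lat = airports$Latitude[dest])
}

demo <- airline_connect_vec(routes_demo, "demo label", airports_demo)
demo
```

# match() returns the first matching row, and NA for a code that is not in
# airports, so unknown airports produce NA coordinates instead of an error.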
routes$num_aircraft <- sapply(routes$Equipment, function(x) length(strsplit(x," ")[[1]]))

#routes_5_aircrafts <- routes %>% dplyr::filter(num_aircraft==5)
routes_6_aircrafts <- routes %>% dplyr::filter(num_aircraft==6)
routes_7_aircrafts <- routes %>% dplyr::filter(num_aircraft==7)
routes_8_aircrafts <- routes %>% dplyr::filter(num_aircraft==8)
routes_9_aircrafts <- routes %>% dplyr::filter(num_aircraft==9)

#routes_5 <- airlineConnect(routes_5_aircrafts, 'For 5')
routes_6 <- airlineConnect(routes_6_aircrafts, 'routes having 6 aircraft types')
routes_7 <- airlineConnect(routes_7_aircrafts, 'routes having 7 aircraft types')
routes_8 <- airlineConnect(routes_8_aircrafts, 'routes having 8 aircraft types')
routes_9 <- airlineConnect(routes_9_aircrafts, 'routes having 9 aircraft types')

tot_routes <- rbind(routes_6, routes_7, routes_8, routes_9)
world_map + 
      geom_curve(data=tot_routes,aes(x=arpt_src_long,
                                     y=arpt_src_lat,
                                     xend=arpt_dest_long,
                                     yend=arpt_dest_lat,
                                     color=arl_name),
                 curvature = 0.2, arrow = arrow(length = unit(0.005, "npc")), alpha=1,size=.5) + 
      theme_fivethirtyeight() + 
      theme(
        legend.position="bottom",
        legend.text = element_text(colour="black", size=9, color = "orangered4"), 
        panel.grid.major = element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),plot.title=element_text(face="bold",hjust=0,vjust=.75,colour="#3C3C3C",size=19),
        plot.subtitle=element_text(size=15, hjust=0, face="italic", color="black")) + 
      labs(
        title="Busiest Routes in the World",
        subtitle="Routes shown here use 6 or more different aircraft types") +
      scale_color_manual(name="",values= c("orange" ,"red", "green", "blue"))

Airline routes - Codeshare vs Non Codeshare

Here we create two graphs for a given airline (Lufthansa in this case), one for codeshare routes and one for non-codeshare routes. First we filter the airline's records into codeshare and non-codeshare sets, and then use the previously created function to get the corresponding routes.

head(routes)
## # A tibble: 6 x 10
##   IATA  AirlineID source_airport `Source airport… destination_air…
##   <chr> <chr>     <chr>          <chr>            <chr>           
## 1 2B    410       AER            2965             KZN             
## 2 2B    410       ASF            2966             KZN             
## 3 2B    410       ASF            2966             MRV             
## 4 2B    410       CEK            2968             KZN             
## 5 2B    410       CEK            2968             OVB             
## 6 2B    410       DME            4029             KZN             
## # … with 5 more variables: `Destination airport ID` <chr>, Codeshare <chr>,
## #   Stops <dbl>, Equipment <chr>, num_aircraft <int>
#merge routes and airline data
routes<-data.frame(merge(routes, airlines %>% select(IATA, Name), by='IATA',sort=F))
head(routes)
##   IATA AirlineID source_airport Source.airport.ID destination_airport
## 1   2B       410            ASF              2966                 KZN
## 2   2B       410            NBC              6969                 SVX
## 3   2B       410            NJC              2972                 SVX
## 4   2B       410            KZN              2990                 DME
## 5   2B       410            NBC              6969                 DME
## 6   2B       410            KZN              2990                 LED
##   Destination.airport.ID Codeshare Stops Equipment num_aircraft       Name
## 1                   2990      <NA>     0       CR2            1 Aerocondor
## 2                   2975      <NA>     0       CR2            1 Aerocondor
## 3                   2975      <NA>     0       CR2            1 Aerocondor
## 4                   4029      <NA>     0       CR2            1 Aerocondor
## 5                   4029      <NA>     0       CR2            1 Aerocondor
## 6                   2948      <NA>     0       CR2            1 Aerocondor
# Lufthansa
al_noncs <- routes %>% dplyr::filter(Name=="Lufthansa") %>% dplyr::filter(is.na(Codeshare))
al_cs <- routes %>% dplyr::filter(Name=="Lufthansa" & Codeshare == 'Y')
    
al_noncs_routes <- airlineConnect(al_noncs, "Lufthansa")
al_cs_routes <- airlineConnect(al_cs, "Lufthansa")

w1 <- world_map + 
  geom_curve(data=al_noncs_routes, aes(x=arpt_src_long, 
                                          y=arpt_src_lat, 
                                          xend=arpt_dest_long, 
                                          yend=arpt_dest_lat #, color=Codeshare
  ), 
  curvature = 0.3, 
  arrow = arrow(length = unit(0.005, "npc")), 
  color = "orangered2",
  alpha=.5,size=.25) + 
  theme_fivethirtyeight() +
  #theme_classic() +
  theme(legend.position=c(.85,1.04),
        panel.grid.major = element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        plot.title=element_text(face="bold",hjust=0,vjust=.8,colour="#3C3C3C",size=20),
        plot.subtitle=element_text(size=15, hjust=0, face="italic", color="black")) + 
  labs(title=paste0("Routes taken by airplanes for ",al_noncs_routes$arl_name[1]), # single label, not one per row
       subtitle="Operated directly by airline") + 
  scale_color_brewer(palette='Set1')

w2<- world_map + 
  geom_curve(data=al_cs_routes, aes(x=arpt_src_long, 
                                    y=arpt_src_lat, 
                                    xend=arpt_dest_long, 
                                    yend=arpt_dest_lat #, color=Codeshare
  ), 
  curvature = 0.3, 
  arrow = arrow(length = unit(0.005, "npc")), 
  color = "green4",
  alpha=.5,size=.25) + 
  theme_fivethirtyeight() +
  #theme_classic() +
  theme(legend.position=c(.85,1.04),
        panel.grid.major = element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        plot.title=element_text(face="bold",hjust=0,vjust=.8,colour="#3C3C3C",size=20),
        plot.subtitle=element_text(size=15, hjust=0, face="italic", color="black")) + 
  labs(title=paste0("Routes taken by airplanes for ",al_cs_routes$arl_name[1]), # single label, not one per row
       subtitle="Operated not directly by airline but through another carrier") + 
  scale_color_brewer(palette='Set1')

w1+w2

Summary

The graphs above give clear insights into the busiest routes, the countries with the most airports and the countries with the most airlines. The United States has by far the most airports, most probably because of its high connectivity both internally and with the rest of the world. It is interesting that Brazil and Russia have the same number of airports even though they differ significantly in total area. India also makes the top 10. The number of airports in a country depends on its economy and connectivity. As expected, the US also has the most airlines, while India does not appear in that list although it was in the top 25 countries by airports. The busiest routes appear to be mainly in the United States and Europe. Looking at a few airline routes, US airlines (American Airlines, Delta) and European airlines (Lufthansa) have connectivity across the globe.

This data exploration could be expanded further with the extended airports data, which covers airports, train stations and ferry terminals, including user contributions.