This project, titled “US Airport Network” and authored by Eric Jiahong Li, uses R to conduct a network analysis of flights between US airports in 2010. It aims to refine the author’s network analysis and visualization skills using real-world data. The analysis builds an edge-list representation of connections between airports, then explores graph properties such as edge betweenness, k-step reach, and degree.
The project begins with a succinct description of the data, which consists of a directed network of flights between US airports, with each edge representing a connection from one airport to another. The weight of an edge denotes the number of flights on that connection in the given direction for the year 2010. Using the igraph library in R, a graph object is created from this edge list. The project goes further by visualizing the connections and then delves into the graph’s components and centralization measures, followed by an examination of its diameter.
The project also includes an analysis of edge betweenness, which can help understand the most ‘central’ connections in terms of the flow of traffic. A k-step reach function is created to assess the reachability within the graph. Lastly, the project examines the degree of each node in the graph, capturing the total degree, in-degree, and out-degree. These measures give insights into the busiest airports in terms of incoming and outgoing flights, providing a comprehensive understanding of the US airport network.
US airports
This is the directed network of flights between US airports in 2010. Each edge represents a connection from one airport to another, and the weight of an edge shows the number of flights on that connection in the given direction in 2010.
Data Sources
The complete US airport network in 2010. This is the network used in the first part of the blog post “Why Anchorage is not (that) important: Binary ties and Sample selection”. The data was downloaded from the Bureau of Transportation Statistics (BTS) Transtats site (Table T-100; id 292) with the following filters: Geography=all; Year=2010; Months=all; and columns: Passengers, Origin, Dest. Based on this table, the airport codes are converted into id numbers, and the weights of duplicated ties are summed. Ties with a weight of 0 (cargo only) and self-loops are removed.
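The preprocessing steps described above can be sketched in base R. This is a minimal illustration, not the procedure actually used to build the dataset: the `flights` data frame and its values are invented, the alphabetical id scheme is hypothetical, and taking the weight from the `Passengers` column is an assumption based on the filter list.

```r
# Toy stand-in for the BTS T-100 extract; every value here is invented.
flights <- data.frame(
  Origin     = c("JFK", "JFK", "LAX", "ANC", "ORD", "LAX"),
  Dest       = c("LAX", "LAX", "JFK", "ANC", "JFK", "ORD"),
  Passengers = c(120,   80,    150,   30,    0,     60),
  stringsAsFactors = FALSE
)

# Convert airport codes to integer ids (hypothetical scheme: alphabetical order).
codes <- sort(unique(c(flights$Origin, flights$Dest)))
flights$from <- match(flights$Origin, codes)
flights$to   <- match(flights$Dest, codes)

# Sum the weights of duplicated ties (same origin and destination).
edges <- aggregate(Passengers ~ from + to, data = flights, FUN = sum)
names(edges)[3] <- "weight"

# Remove zero-weight ties (cargo only) and self-loops.
edges <- edges[edges$weight > 0 & edges$from != edges$to, ]
edges <- edges[order(edges$from, edges$to), ]
rownames(edges) <- NULL
edges
```

The two JFK to LAX rows collapse into a single tie, while the ANC self-loop and the zero-weight ORD to JFK tie are dropped.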
setwd("~/Desktop/BIS 411A/final")
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
airport <- read.table("USairport_2010.txt") # read in edge list as data frame
aircodes <- read.table("USairport_2010_codes.txt") # read in airport codes as data frame
aircodes2 <- aircodes[aircodes$V1 %in% airport$V1 | aircodes$V1 %in% airport$V2,] # delete airport codes not in the edge list data frame
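graph <- graph_from_data_frame(airport, directed=FALSE) # undirected view, used for components and diameter
V(graph)$label <- aircodes2$V2[match(V(graph)$name, aircodes2$V1)] # label each vertex with its airport code, matched by id rather than by row order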
By executing this setup code, the necessary data is loaded, and a graph object is created, ready for further analysis and visualization of the US airport network.
# Plotting the connections with modified vertex size
plot(graph, vertex.color = "gray40", edge.color = "gray70", edge.arrow.mode = 0, vertex.label = NA,
vertex.size = 4, main = "US airports connection")
This resulting plot provides a visual representation of the US airport network, where the vertices represent airports, and the edges represent the connections between them. By examining the graph, one can gain an understanding of the overall structure and connectivity of the US airport network in 2010.
##components
# Get the components
comp <- components(graph)
# Count the number of nodes in each component
comp_sizes <- table(comp$membership)
# Print the count of nodes in each component
print(comp_sizes)
##
## 1 2
## 1572 2
# Print the total number of components
cat("Total number of components: ", max(comp$membership), "\n")
## Total number of components: 2
farthest_vertices(graph)
## $vertices
## + 2/1574 vertices, named, from 3db194f:
## [1] 277 1740
##
## $distance
## [1] 8
##Centralization
centr_degree(graph)$centralization
## [1] 0.356086
##Diameter
diameter(graph)
## [1] 8
graph.diameter <- shortest_paths(graph,
from = V(graph)["277"],
to = V(graph)["1740"],
output = "both")
ecol <- rep("gray70", ecount(graph))
ecol[unlist(graph.diameter$epath)] <- "tomato"
ew <- rep(2, ecount(graph))
ew[unlist(graph.diameter$epath)] <- 4
vcol <- rep("gray40", vcount(graph))
vcol[unlist(graph.diameter$vpath)] <- "gold"
plot(graph, vertex.color=vcol, edge.color=ecol, edge.width=ew, edge.arrow.mode=0,
vertex.label.cex=.7)
The component table above (equivalently, comp$csize, which gives 1572 2) shows that the network has two components. The major component contains 1,572 airports. This is the main network of US airports, in which every airport can be reached from every other airport through some path of connecting flights when edge direction is ignored. The smaller component contains only 2 airports: a pair connected to each other but not connected, directly or indirectly, to the rest of the airports in the dataset. In short, the vast majority of airports belong to a single large network, and one pair of airports is isolated from it.
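To make the component output easier to read, here is a small sketch on a toy graph, not the airport data. The vertex names are invented; `components()`, `graph_from_literal()`, and the `no`, `csize`, and `membership` fields are standard igraph.

```r
library(igraph)

# Toy undirected graph: a four-airport chain plus one isolated pair.
g <- graph_from_literal(A - B - C - D, X - Y)

comp <- components(g)
comp$no                    # number of components
comp$csize                 # number of vertices in each component
table(comp$membership)     # same counts, tabulated from the membership vector

# Names of the vertices that fall outside the largest component
isolated <- V(g)$name[comp$membership != which.max(comp$csize)]
isolated
```

Applied to the airport graph, the same last step would list the two airports that form the small component.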
The output $vertices [1] 277 1740 gives the IDs of the two airports that are farthest apart in the network (CEX, Chena Hot Springs Airport, and WST, Westerly State Airport). For this pair, the shortest travel path, measured in number of flights, is at least as long as for any other pair of connected airports.
The output $distance [1] 8 indicates that these two airports are 8 flights apart. This means that to travel from one of these airports to the other, you would need to take at least 8 flights, assuming you always choose the quickest route with the fewest connecting flights.
A degree centralization score of 0.356086 means that, while some airports have a high number of connections, the network is not strongly dominated by them. Connections are moderately concentrated: the score is well below 1, the value that corresponds roughly to a star-shaped network built around a single hub.
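For intuition about the scale of this score, the textbook Freeman formula can be worked through by hand in base R. This is a sketch under stated assumptions: `degree_centralization` is a hypothetical helper, and it uses the (n - 1)(n - 2) normalizing constant for simple undirected graphs. igraph's `centr_degree()` may normalize by a slightly different constant depending on its `loops` argument, so its values need not match this formula exactly.

```r
# Freeman degree centralization for an undirected simple graph:
# the sum over vertices of (max degree - degree), divided by the largest
# value that sum can take, which a star graph attains: (n - 1) * (n - 2).
degree_centralization <- function(deg) {
  n <- length(deg)
  sum(max(deg) - deg) / ((n - 1) * (n - 2))
}

star_deg <- c(4, 1, 1, 1, 1)   # one hub linked to four leaves
ring_deg <- c(2, 2, 2, 2, 2)   # every vertex has the same degree

degree_centralization(star_deg)  # 1: a single hub dominates
degree_centralization(ring_deg)  # 0: no vertex stands out
```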
graph.diameter <- shortest_paths(graph, from = V(graph)["277"], to = V(graph)["1740"], output = "both"): This line computes a shortest path between the two farthest airports, which are identified by their vertex names (277 and 1740). With output = "both", the result holds both the vertices and the edges along the path, which the plotting code uses to highlight the route in gold and tomato.
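The relationship between `farthest_vertices()`, `diameter()`, and `shortest_paths()` can be checked on a toy graph. The graph below is invented for illustration. As in the airport graph, it has no edge attribute named `weight`, so distances are plain hop counts. That matches the airport case only under the assumption that the weight column was read in under its default name `V3`.

```r
library(igraph)

# Toy undirected graph: a five-vertex chain with a short branch at C.
g <- graph_from_literal(A - B - C - D - E, C - F)

far <- farthest_vertices(g)
far$distance                      # hop count of the longest shortest path
diameter(g)                       # same value

# Recover one shortest path between the two endpoints, as vertices and edges.
p <- shortest_paths(g,
                    from = far$vertices[1],
                    to   = far$vertices[2],
                    output = "both")
as_ids(p$vpath[[1]])              # vertex names along the path
length(p$epath[[1]])              # number of edges equals the distance
```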
##K-Step Reach
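reach2 <- function(x) {
  r <- vector(length = vcount(x))
  for (i in 1:vcount(x)) {
    n <- neighborhood(x, 2, nodes = i) # vertices within 2 steps of node i, including i itself
    ni <- unlist(n)
    l <- length(ni)
    r[i] <- l / vcount(x) # share of all vertices that lie within 2 steps
  }
  r
}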
# install DT package if it is not installed
if(!require(DT)) {
install.packages("DT")
}
## Loading required package: DT
# load DT library
library(DT)
# get reach2 values
reach2_values <- reach2(graph)
# create a data frame
df <- data.frame(Node = 1:length(reach2_values), Reach = reach2_values)
# create an interactive datatable and display 10 rows at a time
datatable(df, options = list(pageLength = 10))
# Calculate 2-step reach for each node
reachability <- reach2(graph)
# Find the maximum 2-step reach
max_reachability <- max(reachability)
print(max_reachability)
## [1] 0.821474
graph_reach2 <- reach2(graph)
set.seed(2022)
plot(graph, edge.arrow.size=.2, vertex.label=NA, vertex.color="tomato",
vertex.size=graph_reach2*40)
In the context of the US airport network, the 2-step reach can provide valuable insights about the connectivity and accessibility of each airport. Airports with a larger 2-step reach could be considered as more accessible as they can connect to a larger proportion of the network with at most one connecting flight. This analysis could be used to identify key airports for connectivity in the network. The airports with larger 2-step reach could also be considered as potential hubs for further expansion or increased capacity.
The maximum 2-step reach of 0.821474 means that the best-connected airport can reach approximately 82.15% of the airports in the network either directly or with one connecting flight, that is, with no more than one stopover. Because reach2 counts the starting airport as part of its own neighborhood, this share is taken over all airports in the network, the starting airport included.
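The effect of counting the starting airport can be seen on a toy chain. This sketch uses igraph's `ego_size()` in place of the loop in reach2; the graph is invented, and `reach_excl` shows an alternative convention (share of the other vertices) that the analysis above does not use.

```r
library(igraph)

# Toy undirected chain of five airports.
g <- graph_from_literal(A - B - C - D - E)
n <- vcount(g)

# ego_size() counts the vertices within `order` steps, the start vertex included.
within2 <- ego_size(g, order = 2)

reach_incl <- within2 / n              # convention used by reach2()
reach_excl <- (within2 - 1) / (n - 1)  # alternative: share of the other vertices

data.frame(airport = V(g)$name, reach_incl, reach_excl)
```

For the end vertex A, the two conventions give 0.6 and 0.5 respectively, so the choice matters most in small graphs and much less in a network of 1,574 airports.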
##degree
graph <- graph_from_data_frame(airport, directed=TRUE)
graph_degree <- degree(graph, mode="total")
graph_indeg <- degree(graph, mode="in")
graph_outdeg <- degree(graph, mode="out")
dat <- data.frame(graph_degree, graph_indeg, graph_outdeg)
dat
# Find the maximum in-degree
max_indegree <- max(dat$graph_indeg)
# Find the maximum degree
max_degree <- max(dat$graph_degree)
# Find the maximum out-degree
max_outdegree <- max(dat$graph_outdeg)
# Print the results
cat("Maximum In-Degree:", max_indegree, "\n")
## Maximum In-Degree: 294
cat("Maximum Degree:", max_degree, "\n")
## Maximum Degree: 596
cat("Maximum Out-Degree:", max_outdegree, "\n")
## Maximum Out-Degree: 302
par(mfrow=c(1,3), mar=c(1,1,1,1))
set.seed(2022)
plot(graph, edge.arrow.size=.2, vertex.label=NA, vertex.color="tomato",
vertex.size=graph_degree/8)
title(main="total degree", line= -5)
set.seed(2022)
plot(graph, edge.arrow.size=.2, vertex.label=NA, vertex.color="tomato",
vertex.size=graph_indeg/8)
title(main="in-degree", line= -5)
set.seed(2022)
plot(graph, edge.arrow.size=.2, vertex.label=NA, vertex.color="tomato",
vertex.size=graph_outdeg/8)
title(main="out-degree", line= -5)
294 is the highest in-degree: the largest number of distinct incoming routes (origin airports with flights arriving) at any airport in the network. It counts connections rather than individual flights, and indicates an airport that is served from many other airports.
596 is the highest total degree: the largest combined number of incoming and outgoing routes at any airport in the network. This airport has the most overall connectivity and serves as a major hub.
302 is the highest out-degree: the largest number of distinct outgoing routes from any airport in the network. This airport has direct flights to more destinations than any other.
These maximum degree values suggest that airports with high degrees are likely to be significant hubs or major airports within the US airport network. They play a crucial role in connecting various airports and facilitating passenger travel across the network. Airports with higher degrees generally have a greater impact on the overall connectivity, efficiency, and accessibility of the US airport system.
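Degree counts routes, not traffic. To compare the two, the edge weights can be summed with `strength()`. The sketch below uses an invented edge list whose third column is named `V3`, mimicking a header-less file read by `read.table`; that column name is an assumption about how the real data was loaded.

```r
library(igraph)

# Toy directed edge list; V3 mimics the unnamed weight column of the real file.
el <- data.frame(V1 = c("A", "B", "C", "A"),
                 V2 = c("B", "A", "A", "C"),
                 V3 = c(500, 450, 20, 25))
g <- graph_from_data_frame(el, directed = TRUE)

in_routes  <- degree(g, mode = "in")                        # distinct incoming routes
in_traffic <- strength(g, mode = "in", weights = E(g)$V3)   # summed incoming weight

in_routes
in_traffic

# Name of the vertex with the most incoming routes
names(which.max(in_routes))
```

In this toy case A has the most incoming routes while B receives the most weight, which shows that the busiest airport by routes need not be the busiest by volume.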
In this analysis of the US airport network, several key findings emerge. The network consists of a main component comprising 1,572 interconnected airports, so that passengers can travel between any two of them through a series of connecting flights. This reflects a well-connected and extensive network. However, there is also a smaller component consisting of only two airports that are isolated from the main network, with no direct or indirect routes to the other airports in this dataset.
Furthermore, the analysis identifies the two airports that are farthest apart in number of flights. The airports with vertex IDs 277 and 1740 are eight flights apart, meaning that at least eight flights are needed to travel between them. This finding shows that some airports sit far out on the periphery of the network and are hard to reach.
The degree centrality measures shed light on the importance and connectivity of airports. The maximum in-degree, degree, and out-degree values indicate that specific airports play critical roles in facilitating air travel. The airport with the highest degree has the most connections, both incoming and outgoing, making it a major hub within the US airport system.
Overall, this analysis emphasizes the extensive reach and connectivity of the US airport network, enabling efficient travel across the country. While most airports are interconnected within the main network, a few airports may require more indirect routes to reach. Understanding the structure and connectivity of the airport network is crucial for network planning and optimization. Identifying the major hubs, airports with high degrees, and airports with the most connections allows airport authorities and aviation professionals to allocate resources effectively. It helps in determining optimal routes, scheduling flights, and improving connectivity between airports, ultimately enhancing the efficiency and reliability of air travel.