Airports across the U.S. are part of a large and complex network, but not all airports play the same role. Some act as major hubs that connect a huge number of routes, while others are much smaller and less connected. In this project, I explore how these connections are structured and which airports are the most central to the overall system. It’s a chance to apply network analysis techniques I’ve learned in class to a real-world topic and build a visual representation of how the U.S. air travel network operates.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(igraph)
library(readr)
library(janitor)
# Load airport data
airports <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat", col_names = FALSE)
## Rows: 7698 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): X2, X3, X4, X5, X6, X10, X11, X12, X13, X14
## dbl (4): X1, X7, X8, X9
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(airports) <- c("ID", "Name", "City", "Country", "IATA", "ICAO", "Latitude", "Longitude", "Altitude", "Timezone", "DST", "TZ", "Type", "Source")
# Load routes data
routes <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat", col_names = FALSE)
## Rows: 67663 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): X1, X2, X3, X4, X5, X6, X7, X9
## dbl (1): X8
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(routes) <- c("Airline", "AirlineID", "SourceAirport", "SourceID", "DestAirport", "DestID", "Codeshare", "Stops", "Equipment")
# Filter to U.S. airports only
us_airports <- airports %>% filter(Country == "United States" & IATA != "\\N")
# Filter routes to only include U.S. airports
us_routes <- routes %>%
  filter(SourceAirport %in% us_airports$IATA & DestAirport %in% us_airports$IATA)
# Create edge list for graph
edges <- us_routes %>% select(SourceAirport, DestAirport) %>% drop_na()
# Create graph object
airport_graph <- graph_from_data_frame(edges, directed = TRUE)
This dataset includes:
- Nodes: Airports (filtered to U.S. only)
- Edges: Direct flight routes between U.S. airports
- Source: OpenFlights public dataset (GitHub Repo)
- Last updated: ~2017
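As a quick sanity check on the node and edge definitions above, the sketch below counts airports and route records and looks for parallel edges. It uses a small hypothetical edge list (`toy_edges`, not the OpenFlights data) so that it runs without a download. It assumes what the `Airline` column in routes.dat implies: the same origin-destination pair can appear once per airline.

```r
library(igraph)

# Hypothetical toy edge list: two airlines fly ATL -> ORD, so that pair appears twice
toy_edges <- data.frame(
  SourceAirport = c("ATL", "ATL", "ORD", "ORD", "DFW"),
  DestAirport   = c("ORD", "ORD", "ATL", "DFW", "ATL")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = TRUE)

vcount(toy_graph)        # number of airports (nodes)
ecount(toy_graph)        # number of route records (edges), duplicates included
any_multiple(toy_graph)  # TRUE when at least one pair has parallel edges

# Collapse parallel edges so each origin-destination pair is counted once
toy_simple <- simplify(toy_graph, remove.multiple = TRUE, remove.loops = TRUE)
ecount(toy_simple)
```

Running the same calls on `airport_graph` would show how many of its edges are repeated airline listings rather than distinct airport pairs.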
set.seed(42)
plot(
  airport_graph,
  vertex.size = 3,
  vertex.label = NA,
  edge.arrow.size = 0.2,
  layout = layout_with_fr,
  main = "U.S. Airport Network"
)
# Degree Centrality (mode = "all" sums inbound and outbound edges; parallel edges are counted separately)
degree_centrality <- degree(airport_graph, mode = "all")
top_airports <- sort(degree_centrality, decreasing = TRUE)[1:10]
top_airports
## ATL ORD DFW DEN LAX CLT PHX PHL LAS MSP
## 1496 752 657 653 606 477 441 412 404 380
# Betweenness Centrality
betweenness_centrality <- betweenness(airport_graph)
top_betweenness <- sort(betweenness_centrality, decreasing = TRUE)[1:10]
top_betweenness
## ANC ATL DEN ORD SEA DFW LAX BET
## 87962.95 46629.63 44481.38 43040.45 37869.65 24376.85 21140.99 20896.28
## FAI MSP
## 19459.08 18799.67
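These measures help identify the most connected and influential airports in the U.S. network. Degree centrality, computed here with mode = "all", counts every inbound and outbound route record at an airport; because routes.dat lists each airline's service separately, a busy airport pair contributes several edges. Betweenness highlights airports that lie on many shortest paths between other airports and so serve as key bridges.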
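To make the difference between the two measures concrete, here is a minimal sketch on a hypothetical seven-node graph, not the airport data. Two tightly knit triangles are joined only through a node named `X`. The node names and layout are invented for illustration, and the graph is undirected to keep the arithmetic simple.

```r
library(igraph)

# Hypothetical toy graph: triangles A-B-C and D-E-F, connected only via X
toy <- graph_from_literal(A - B, B - C, A - C, C - X, X - D, D - E, E - F, D - F)

deg <- degree(toy)       # number of edges touching each node
btw <- betweenness(toy)  # number of shortest paths passing through each node

# X has only two connections, yet every path between the triangles crosses it
data.frame(degree = deg, betweenness = btw)
```

In this toy graph, `X` has degree 2 but the highest betweenness (9), because all nine pairs with one node in each triangle route through it. The same mechanism may explain why an airport such as ANC can top the betweenness ranking above without appearing in the degree top ten.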
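This project showed how network analysis can highlight major hubs in the U.S. airline system. Airports like ATL, ORD, and DFW stand out as central connectors, ranking in the top ten for both degree and betweenness. Betweenness also surfaces airports that degree misses: ANC ranks first, and BET and FAI make the top ten, which likely reflects how many small Alaskan airports reach the rest of the network only through them. These insights could be useful for understanding system efficiency, planning routes, or assessing network resilience. If I had more time, I’d explore clustering or community detection to find regional groupings of airports.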
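As a starting point for that follow-up, the sketch below runs Louvain community detection on a hypothetical eight-airport graph. The airport groupings and the single cross-country edge are invented for illustration. Louvain is one of several algorithms igraph offers and is not necessarily the best fit for this network. Note that `cluster_louvain()` only accepts undirected graphs, so the real edge list would need to be loaded with `directed = FALSE` (or otherwise converted) first.

```r
library(igraph)

# Hypothetical regional groups, each fully connected internally
west <- c("LAX", "SFO", "SEA", "PDX")
east <- c("JFK", "BOS", "DCA", "PHL")

# All within-region pairs plus a single cross-country route
toy_edges <- rbind(
  t(combn(west, 2)),
  t(combn(east, 2)),
  c("LAX", "JFK")
)

# Build an undirected graph and drop any duplicate edges or self-loops
toy_graph <- simplify(graph_from_edgelist(toy_edges, directed = FALSE))

set.seed(42)  # Louvain visits nodes in random order
communities <- cluster_louvain(toy_graph)

membership(communities)  # community id for each airport
sizes(communities)       # number of airports per community
modularity(communities)  # quality of the partition
```

On the full network, `membership()` could be joined back to `us_airports` by IATA code to check whether the detected communities line up with geographic regions.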