1 Introduction

Airports across the U.S. are part of a large and complex network, but not all airports play the same role. Some act as major hubs that connect a huge number of routes, while others are much smaller and less connected. In this project, I explore how these connections are structured and which airports are the most central to the overall system. It’s a chance to apply network analysis techniques I’ve learned in class to a real-world topic and build a visual representation of how the U.S. air travel network operates.

2 Setup

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(igraph)
## 
## Attaching package: 'igraph'
## 
## The following objects are masked from 'package:lubridate':
## 
##     %--%, union
## 
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## 
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## 
## The following object is masked from 'package:tidyr':
## 
##     crossing
## 
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## 
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## 
## The following object is masked from 'package:base':
## 
##     union
library(readr)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

3 Load and Clean Data

# Load airport data
airports <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat", col_names = FALSE)
## Rows: 7698 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): X2, X3, X4, X5, X6, X10, X11, X12, X13, X14
## dbl  (4): X1, X7, X8, X9
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(airports) <- c("ID", "Name", "City", "Country", "IATA", "ICAO", "Latitude", "Longitude", "Altitude", "Timezone", "DST", "TZ", "Type", "Source")

# Load routes data
routes <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat", col_names = FALSE)
## Rows: 67663 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): X1, X2, X3, X4, X5, X6, X7, X9
## dbl (1): X8
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(routes) <- c("Airline", "AirlineID", "SourceAirport", "SourceID", "DestAirport", "DestID", "Codeshare", "Stops", "Equipment")

# Filter to U.S. airports with an IATA code (OpenFlights writes missing values as the literal string \N)
us_airports <- airports %>% filter(Country == "United States" & IATA != "\\N")
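The filter above compares against `"\\N"` because OpenFlights stores missing values as a backslash followed by N, and in R source that two-character string is written `"\\N"`. A minimal alternative sketch, assuming readr's `na` argument is used at load time instead, turns those markers into real `NA` values so that `is.na()` and `drop_na()` behave as expected (the filter would then become `!is.na(IATA)`). The two-row file is a hypothetical stand-in for `airports.dat`. Whether quoted `\N` fields in the real file are matched the same way can depend on the readr version, so checking `sum(is.na(airports$IATA))` after loading is worthwhile.

```r
library(readr)

# Hypothetical two-row stand-in for airports.dat; the second row has no IATA
# code and uses the OpenFlights missing-value marker (a backslash then N).
demo_file <- tempfile(fileext = ".dat")
writeLines(c("1,Example Intl,ATL", "2,Example Field,\\N"), demo_file)

# Treat "\N" as missing, in addition to readr's default "" and "NA".
demo_airports <- read_csv(
  demo_file,
  col_names = c("ID", "Name", "IATA"),
  na = c("", "NA", "\\N"),
  show_col_types = FALSE
)

demo_airports$IATA   # "ATL" followed by NA
nchar("\\N")         # 2: the R literal is one backslash plus N
```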

# Filter routes to only include U.S. airports
us_routes <- routes %>%
  filter(SourceAirport %in% us_airports$IATA & DestAirport %in% us_airports$IATA)

# Create edge list for graph (routes.dat has one row per airline per direction, so busy routes become parallel edges)
edges <- us_routes %>% select(SourceAirport, DestAirport) %>% drop_na()

# Create graph object
airport_graph <- graph_from_data_frame(edges, directed = TRUE)
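Because `routes.dat` lists each airline's service on a route as its own row, `airport_graph` is a multigraph: the same airport pair can be joined by several parallel edges. The sketch below uses a hypothetical four-row edge list to show how igraph's `any_multiple()` detects this and how `simplify()` collapses the duplicates, along with the effect on degree. (In the session above, `simplify` resolves to igraph's version because igraph was attached after tidyverse and masks `purrr::simplify`.)

```r
library(igraph)

# Hypothetical edge list: two airlines fly A -> B, so that route appears twice.
toy_edges <- data.frame(
  SourceAirport = c("A", "A", "B", "A"),
  DestAirport   = c("B", "B", "A", "C")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = TRUE)

any_multiple(toy_graph)   # TRUE: the A -> B edge is duplicated
ecount(toy_graph)         # 4 edges, one per row

# Collapse parallel edges (and drop any self-loops) into a simple directed graph.
toy_simple <- simplify(toy_graph)
ecount(toy_simple)        # 3 edges: A -> B, B -> A, A -> C

degree(toy_graph, mode = "all")[["A"]]    # 4: both copies of A -> B count
degree(toy_simple, mode = "all")[["A"]]   # 3
```

Note that `simplify()` keeps A -> B and B -> A as separate arcs, so even on the simplified graph, `mode = "all"` degree counts a two-way route twice.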

4 Summary of the Data

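This dataset includes:
- Nodes: airports (filtered to the U.S. only; only airports that appear in at least one U.S.-to-U.S. route end up in the graph)
- Edges: directed flight routes between U.S. airports, one per airline per direction, so a route served by several carriers appears as several parallel edges
- Source: OpenFlights public dataset (GitHub repo)
- Last updated: the route table is a June 2014 snapshot; the airport table dates from around 2017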

5 Network Visualization

set.seed(42)
plot(
  airport_graph,
  vertex.size = 3,
  vertex.label = NA,
  edge.arrow.size = 0.2,
  layout = layout_with_fr,
  main = "U.S. Airport Network"
)
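With parallel edges on every busy route, the full plot is dense and every airport is drawn the same size. One option, sketched here on a hypothetical six-edge graph rather than the real data, is to collapse parallel edges with `simplify()` and scale `vertex.size` by degree so hubs stand out. The square-root scaling and the constants 2 and 3 are arbitrary display choices, not part of the original analysis.

```r
library(igraph)

# Hypothetical stand-in for airport_graph: a small directed multigraph
# in which the A -> B route is listed twice (two carriers).
g <- graph_from_data_frame(
  data.frame(from = c("A", "A", "A", "B", "C", "D"),
             to   = c("B", "B", "C", "C", "D", "A")),
  directed = TRUE
)

# Collapse parallel edges so each directed airport pair is drawn once.
g_plot <- simplify(g)

# Size vertices by the square root of degree so hubs stand out without
# dwarfing everything else; 2 and 3 are arbitrary display constants.
node_size <- 2 + 3 * sqrt(degree(g_plot, mode = "all"))

set.seed(42)
plot(
  g_plot,
  vertex.size = node_size,
  vertex.label = NA,
  edge.arrow.size = 0.2,
  layout = layout_with_fr,
  main = "Toy network, vertices sized by degree"
)
```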

6 Network Analysis

# Degree centrality: in- plus out-edges per airport, with each airline's parallel edge counted separately
degree_centrality <- degree(airport_graph, mode = "all")
top_airports <- sort(degree_centrality, decreasing = TRUE)[1:10]
top_airports
##  ATL  ORD  DFW  DEN  LAX  CLT  PHX  PHL  LAS  MSP 
## 1496  752  657  653  606  477  441  412  404  380
# Betweenness centrality: unnormalized, based on directed shortest paths between airport pairs
betweenness_centrality <- betweenness(airport_graph)
top_betweenness <- sort(betweenness_centrality, decreasing = TRUE)[1:10]
top_betweenness
##      ANC      ATL      DEN      ORD      SEA      DFW      LAX      BET 
## 87962.95 46629.63 44481.38 43040.45 37869.65 24376.85 21140.99 20896.28 
##      FAI      MSP 
## 19459.08 18799.67

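These measures help identify the most connected and influential airports in the U.S. network. Degree centrality (with mode = "all") counts every route record that starts or ends at an airport. Because OpenFlights lists each airline's service separately, and lists the two directions of a route separately, this measures route volume rather than the number of distinct airports served. Betweenness counts how often an airport lies on the shortest paths between other pairs of airports, so it highlights airports that serve as key bridges between otherwise distant parts of the network.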
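To see why the two measures can disagree (ANC, for instance, leads betweenness without appearing in the degree top ten), here is a sketch on a hypothetical "barbell" network: two groups of four fully connected nodes linked only through a node named `bridge`. The helper `clique_edges()` is made up for this illustration. The toy graph is undirected, so igraph counts each unordered pair once; on a directed graph such as `airport_graph`, it counts ordered source-destination pairs instead.

```r
library(igraph)

# Hypothetical helper: every pair of the given nodes becomes an edge.
clique_edges <- function(nodes) {
  pairs <- t(combn(nodes, 2))
  data.frame(from = pairs[, 1], to = pairs[, 2])
}

# Two dense groups joined only through "bridge" (bridge - a1 and bridge - b1).
toy_edges <- rbind(
  clique_edges(c("a1", "a2", "a3", "a4")),
  clique_edges(c("b1", "b2", "b3", "b4")),
  data.frame(from = c("bridge", "bridge"), to = c("a1", "b1"))
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

toy_degree <- degree(toy_graph)
toy_betweenness <- betweenness(toy_graph)

toy_degree[["bridge"]]        # 2, the smallest degree in the graph
toy_betweenness[["bridge"]]   # 16: all 4 x 4 a-b pairs route through it
toy_betweenness[["a1"]]       # 15: a2..a4 paired with bridge and b1..b4
```

The bridge node has the fewest connections but the highest betweenness, because every path between the two groups has to pass through it.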

7 Conclusion

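This project showed how network analysis can highlight major hubs in the U.S. airline system. ATL, ORD, and DFW rank in the top ten on both degree and betweenness, with ATL leading degree by a wide margin (1496 versus 752 for ORD). Betweenness adds a different angle: Anchorage (ANC) ranks first, and two other Alaskan airports, Bethel (BET) and Fairbanks (FAI), also make the top ten, likely because many small Alaskan airports reach the rest of the network mainly through them. These insights could be useful for understanding system efficiency, planning routes, or assessing network resilience. If I had more time, I'd explore clustering or community detection to find regional groupings of airports.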
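As a starting point for the community detection mentioned above, here is a sketch using igraph's greedy modularity method, `cluster_fast_greedy()`, on a hypothetical toy network of two triangles joined by a single edge. This method needs an undirected graph without parallel edges, so `airport_graph` would first have to be collapsed to undirected, simple form; `cluster_louvain()` is another common choice for larger graphs. Neither step is part of the original analysis.

```r
library(igraph)

# Hypothetical undirected toy network: triangles x1-x2-x3 and y1-y2-y3,
# joined by the single edge x3 - y1.
toy_edges <- data.frame(
  from = c("x1", "x1", "x2", "y1", "y1", "y2", "x3"),
  to   = c("x2", "x3", "x3", "y2", "y3", "y3", "y1")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Greedy modularity optimisation; the returned split maximises modularity.
communities <- cluster_fast_greedy(toy_graph)

# Community label for each airport, keyed by vertex name.
groups <- setNames(as.vector(membership(communities)), V(toy_graph)$name)
groups                     # x1, x2, x3 share one label; y1, y2, y3 the other
modularity(communities)    # 5/14, about 0.357
```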