See updates spanning 12/20/2024–01/08/2025 for initial Alaska carrier-route-year build and carrier lists, 06/01/2025–07/26/2025 for HHI, Bush vs. Mainline segmentation, and integrated bankruptcy/M&A analysis.
Full script builds the panel, merges event data (with alias/parent splits),tracks exits and market concentration.
Purpose of this code: - Build a detailed carrier-route-year dataset for Alaska (post-2002). - Add labels for route type (mainline vs. bush) and carrier exit timing. - Track passenger, freight, and mail totals per carrier per route per year. - Enrich the dataset with carrier counts per route and route-year.
Katie code: Route is defined by the undirected pair of city names,origin_code and destination_code are airport codes in lexical order.
Each row is a unique carrier’s operation on a route in a given year/month,capturing summed traffic and performance metrics.
Note: Summarize traffic by carrier, route, year, and month; route is undirected city pair.
Extract all mainline (hub) airport codes from the bush_mainline_cities reference sheet. ‘Bush1_Hub0’ == 0 denotes a mainline hub; these codes define the state’s high-capacity trunk network. These codes are critical for distinguishing trunk routes (hub-to-hub) from bush service. Get list of hub airports for trunk network (mainline hubs).
Shows first 10 rows so we can visually confirm classification results.
Note: Check first 10 rows to confirm classification worked.
While each row represents one carrier per route-year, this step finds the total number of unique carriers that ever operated on each route across the dataset. Adds context on market diversity.
Note: Count how many unique carriers ever flew each route (context for competition).
Joins the route-level carrier counts back to the detailed panel dataset so every carrier-route-year record has a field for overall market diversity.
Note: Add carrier count back to full dataset for each route.
Finds the maximum observed number of carriers on any single route,providing a quick diagnostic for market competition extremes.
Creates route-specific and state-level first/last year markers for each carrier. These are essential for understanding timing of exits relative to route and state.
Note: First and last year for each carrier on each route.
This calculation aggregates the passenger numbers to give insights into carrier performance and route popularity.
Analyzing freight allows us to gauge the volume of goods transported, offering insights into economic activity. This metric can highlight the role of carriers in facilitating communication and commerce through mail transport.
Creates a new ‘total_freight_mail’ column that represents the aggregate of both freight and mail per carrier, per route, per year.
The Herfindahl-Hirschman Index (HHI) measures market concentration by summing the squared market shares of all participants within a market. Higher HHI values reflect a more concentrated market and lower competition. These calculations will produce HHI values for passengers, freight, mail,and combined freight+mail by route and year.
Katie code: HHI is calculated for each route and year based on passenger totals.
Mike note: NEED ***** guardrails so the HHI is not calculated for a group when total_passengers = 0. Was previously defaulting to HHI = 10,000, which skews results. Includes filter to avoid zero-passenger markets.
Note: Calculate passenger-based HHI per route-year (guarded for zero totals).
HHI is computed for freight volumes per route-year using the same method as passengers. Note: Calculate freight-based HHI per route-year.
HHI is computed for mail volumes per route-year. Note: Calculate mail-based HHI per route-year.
This combines freight and mail volumes to reflect total cargo activity. Note: Calculate combined cargo HHI per route-year.
Note: These HHI values (passenger, freight, mail, cargo) are now ready for integration into market concentration and bankruptcy/event carrier analysis.
Sort the data by the key variables (route, year, carrier_name) for clear output and analysis order.
Also, create the yearly number of carriers serving each route within this master dataset to support market concentration and HHI analysis.
Note: Order rows by route, year, and carrier for clarity.
The purpose of this script is to integrate data from two sources to analyze airline mergers and bankruptcies. By merging the Certified Carrier List with bankruptcy and merger data, the code aims to create a consolidated dataset for tracking airline operational changes, including aliases, operating statuses, and financial outcomes (e.g., reorganizations or service cessations). The final outputs support further analysis or reporting.
Goal: The objective of this script is to clean and process the Bankruptcy dataset.
Chapter 7 = liquidated (exited the market) Chapter 11 = reorganized (continues operating)
Key objectives: 1. Standardize column names and handle missing
values. 2. Map Chapter
data to descriptive statuses
(Chapter 7 vs Chapter 11). 3. Extract and format date information (year
and month). 4. Extract contextual notes from raw text. 5. Flag carriers
relevant to Alaska via fuzzy matching.
Goal: Use Exact, Token, and Fuzzy matching against the AK list (all_data_new$unique_carrier_name) – but match from bankruptcy_carriers.
At this point: • br_carriers_ak
contains your Alaska
carriers, • bankruptcy_carriers
remains the full, cleaned
panel, • non_matches
shows the carriers still unmatched and
near_misses
highlights borderline fuzzy cases for any
manual review.