| Field Name | Description | Data Type | Notes / Transformation |
|---|---|---|---|
| FL_DATE | Flight date standardized to ISO (YYYY-MM-DD) | Date | Parsed first with ymd(), then mdy() for
remaining formats |
| ROUTE_KEY | Canonical “origin–destination” route identifier | character | ifelse(ORIGIN<DEST, paste(ORIGIN,DEST), paste(DEST,ORIGIN)) |
| DEP_DELAY, ARR_DELAY | Departure/arrival delay in minutes | numeric | Negative values floored at zero for cost modeling; extremes capped/filtered in EDA |
| AIR_TIME, DISTANCE | Flight time (min) and distance (mi) | numeric | Stripped non-digits via gsub("[^0-9\\.]","",…), then
as.numeric() |
| OCCUPANCY_RATE | Fraction of seats filled per flight | numeric | Missing values imputed to the mean occupancy of that origin–destination pair |
| PASSENGERS | Estimated passengers per flight (capacity × occupancy) | numeric | Computed as OCCUPANCY_RATE * 200 |
| Ticket_Revenue | Revenue from ticket sales per flight | numeric | PASSENGERS * AVG_FARE (uses half the round-trip fare
per leg) |
| Baggage_Revenue | Ancillary baggage fee revenue per flight | numeric | PASSENGERS * 35 * 0.5 (one bag on 50% of passengers per
leg) |
| Total_Revenue | Sum of Ticket_Revenue and Baggage_Revenue | numeric | — |
| Fuel_Cost, Depreciation_Cost | Variable operating costs per flight | numeric | Fuel_Cost = DISTANCE * 8;
Depreciation_Cost = DISTANCE * 1.18 |
| Airport_Fee | Fixed airport usage fee at destination | numeric | $10 000 if large airport; otherwise
$5 000 |
| Delay_Cost | Penalty cost for delays | numeric | 75 * (max(DEP_DELAY–15,0) + max(ARR_DELAY–15,0)) |
| Total_Cost | Sum of Fuel_Cost, Depreciation_Cost, Airport_Fee, and Delay_Cost | numeric | — |
| Profit | Operating profit per flight | numeric | Total_Revenue − Total_Cost |
| Flights | Total one-way flight legs in the quarter | integer | n() rows per ROUTE_KEY |
| Round_Trips | Number of complete round trips | integer | floor(Flights / 2) |
| Total_Pax | Total passengers carried on that route (round trips basis) | numeric | sum(PASSENGERS) / 2 |
| Total_Revenue (route) | Sum of Total_Revenue across all flights in the quarter | numeric | — |
| Total_Cost (route) | Sum of Total_Cost across all flights in the quarter | numeric | — |
| Total_Profit (route) | Sum of Profit across all flights in the quarter | numeric | — |
| Profit_per_Round_Trip | Average profit generated by one round trip | numeric | Total_Profit / Round_Trips |
| Breakeven_Round_Trips | Number of round trips needed to recoup the $90 M plane cost | integer | ceiling(90e6 / Profit_per_Round_Trip) |
FL_DATE appeared as both
"YYYY-MM-DD" and "M/D/YY".clean_dates() to try
ymd() then mdy() so all dates become
Date type.AIR_TIME, DISTANCE, and
ITIN_FARE contained non-digit characters (e.g. “.0”).gsub()
then as.numeric(). Checked how many NAs were
introduced.NA in
OCCUPANCY_RATE.distinct() at each
cleaning step to remove duplicates.End of documentation.