Approach

In this assignment, I will analyze arrival performance data for two airlines, Alaska and AM West, across five destination cities. I will begin by recreating the dataset in wide format exactly as it appears in the source table. Every value will be entered explicitly, and any empty or missing cells in the original table will be kept as blank entries so that they can be identified and handled deliberately after import. The dataset will be saved in a publicly accessible location, such as a GitHub repository, to ensure reproducibility and transparency.
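As a minimal sketch of what the wide file might look like and how its blank cells could be checked after import, the example below reads a small CSV from inline text. The column names, the two placeholder cities (city_a, city_b), and all counts are invented for illustration; the real file would contain the five cities and the values from the source table. The blank airline cells assume a layout in which the airline name appears only on the first row of each pair.

```r
# Hypothetical wide-format CSV. Column names and counts are placeholders,
# not values from the source table.
csv_text <- "airline,status,city_a,city_b
Alaska,on time,95,850
,delayed,5,150
AM West,on time,920,80
,delayed,80,20"

# Treat empty cells as NA so they remain visible after import.
flights_wide <- read.csv(text = csv_text, na.strings = "",
                         stringsAsFactors = FALSE)

# Count missing cells per column to confirm none were silently dropped.
missing_per_column <- colSums(is.na(flights_wide))

print(flights_wide)
print(missing_per_column)
```

For the published file, the text argument would be replaced by the path or raw URL of the CSV in the repository.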

After importing the data into R, I will apply tidy data principles to transform the dataset from a wide format into a long format. This restructuring will allow each row to represent a single observation defined by airline, destination city, and arrival status (on time or delayed). Transforming the data into this format will make it easier to summarize, visualize, and compare airline performance consistently.
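The wide-to-long step described above could be sketched as follows, assuming the tidyr package is used. The data frame, the placeholder city columns, and the counts are invented for illustration; only the two airline names come from the assignment.

```r
library(tidyr)

# Placeholder wide table: one column per (hypothetical) city.
# Counts are invented, not taken from the source table.
flights_wide <- data.frame(
  airline = c("Alaska", NA, "AM West", NA),
  status  = c("on time", "delayed", "on time", "delayed"),
  city_a  = c(95, 5, 920, 80),
  city_b  = c(850, 150, 80, 20)
)

# Carry each airline name down into the blank row beneath it.
flights_filled <- fill(flights_wide, airline, .direction = "down")

# Reshape so each row is one airline, city, and arrival status.
flights_long <- pivot_longer(
  flights_filled,
  cols      = c(city_a, city_b),
  names_to  = "city",
  values_to = "flights"
)

print(flights_long)
```

The four wide rows become eight long rows, one for each combination of airline, city, and status.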

For the analysis, I will calculate both counts and percentages of delayed and on-time arrivals. I will first compare overall airline performance by examining the percentage of delayed flights across all cities combined. I will then conduct city-by-city comparisons to evaluate how each airline performs within individual destinations. These results will be summarized in tables, visualizations, or both, and each will be accompanied by a written explanation of the observed patterns.
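One way the two levels of comparison might be computed is sketched below, assuming the dplyr package and a long-format table with columns named airline, city, status, and flights. Those column names, the placeholder cities, and the counts are assumptions made for illustration.

```r
library(dplyr)

# Placeholder long-format data; cities and counts are invented.
flights_long <- data.frame(
  airline = rep(c("Alaska", "AM West"), each = 4),
  city    = rep(c("city_a", "city_a", "city_b", "city_b"), times = 2),
  status  = rep(c("on time", "delayed"), times = 4),
  flights = c(95, 5, 850, 150, 920, 80, 80, 20)
)

# Counts and delay percentage for each airline within each city.
by_city <- flights_long %>%
  group_by(airline, city) %>%
  summarise(
    total       = sum(flights),
    delayed     = sum(flights[status == "delayed"]),
    pct_delayed = 100 * delayed / total,
    .groups     = "drop"
  )

# Counts and delay percentage for each airline across all cities.
overall <- flights_long %>%
  group_by(airline) %>%
  summarise(
    total       = sum(flights),
    delayed     = sum(flights[status == "delayed"]),
    pct_delayed = 100 * delayed / total
  )

print(by_city)
print(overall)
```

Keeping the counts next to the percentages makes it easy to see how many flights stand behind each rate.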

Finally, I will examine discrepancies between the overall comparison and the city-level comparisons. I will explain how differences in flight volume across cities can influence aggregate statistics: an airline's overall delay rate is a weighted average of its city-level rates, with each city weighted by its share of that airline's flights, so an airline that flies mostly into delay-prone cities can look worse overall even when it performs better within each city. This discussion will highlight the importance of careful data aggregation and interpretation.
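To illustrate how flight volume can drive such a discrepancy, the sketch below rebuilds each airline's overall delay rate as a volume-weighted average of its city rates. The two placeholder cities and all figures are invented for illustration and are not the values from the source table.

```r
# Placeholder city-level figures (invented, not from the source table).
city_rates <- data.frame(
  airline     = c("Alaska", "Alaska", "AM West", "AM West"),
  city        = c("city_a", "city_b", "city_a", "city_b"),
  total       = c(100, 1000, 1000, 100),
  pct_delayed = c(5, 15, 8, 20)
)

# Each city's share of the airline's flights acts as its weight.
city_rates$weight <- city_rates$total /
  ave(city_rates$total, city_rates$airline, FUN = sum)

# Overall rate = sum of (weight * city rate) within each airline.
overall_pct <- tapply(city_rates$weight * city_rates$pct_delayed,
                      city_rates$airline, sum)

print(city_rates)
print(round(overall_pct, 1))
```

With these made-up numbers, Alaska has the lower delay rate in both cities (5% versus 8%, and 15% versus 20%) but the higher overall rate (about 14.1% versus 9.1%), because most of its flights go to the city where delays are more common.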

Anticipated Data Challenges

One anticipated challenge is accurately recreating the dataset to match the original table, particularly when handling empty or missing values. Another challenge is avoiding misleading conclusions based solely on raw counts rather than percentages, especially given the unequal number of flights across cities. Clear documentation of assumptions and careful data transformation will be essential to ensure valid comparisons and conclusions.