Airline Route Analysis Submission

Welcome! This repository contains all deliverables for the Q1 2019 Airline Route Profitability analysis and accompanying Shiny dashboard. Below is a guide to the contents, how to run each component, and what each file does.


📁 Directory Structure

## Data Acquisition

> **Note:** The `data/` folder with the original CSV files is _not_ included in this submission (the files are too large for this repo).
> To reproduce the analyses and run the Shiny app, you must download the three datasets yourself:

1. **Clone or download** the raw data from the Capital One GitHub repository:
   https://github.com/CapitalOneRecruiting/DA-Airline-Data-Challenge

2. **Locate** the files:
   - `Flights.csv`
   - `Tickets.csv`
   - `Airport_Codes.csv`

3. **Create** a folder named `data` in the project root, and **copy** those three CSVs into it:

.
├── data/
│   ├── Flights.csv
│   ├── Tickets.csv
│   └── Airport_Codes.csv
│
├── 1_Raw_analysis.R
├── 2_Shinyapp_Airline_Route_Anaysis.R
├── 3_Airline_Route_Analysis.Rmd
├── 4_Metadata_and_Data_Quality_Insights.Rmd
└── README.md           ← you are here
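
Before running any component, it may help to confirm that the three CSVs are where the scripts expect them. The snippet below is a minimal sketch, assuming the working directory is the project root; `find_missing_data()` is a hypothetical helper written for this README and is not part of the submitted code.

```r
# Names of the three CSVs listed in the Data Acquisition steps.
required_files <- c("Flights.csv", "Tickets.csv", "Airport_Codes.csv")

# Hypothetical helper (not part of the submission): returns the names of
# any required files that are absent from `data_dir`.
find_missing_data <- function(data_dir = "data", files = required_files) {
  files[!file.exists(file.path(data_dir, files))]
}

absent <- find_missing_data()
if (length(absent) > 0) {
  message("Missing from data/: ", paste(absent, collapse = ", "))
} else {
  message("All three CSV files found in data/.")
}
```

If any file is reported missing, repeat step 3 of Data Acquisition before knitting the reports or launching the app.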

1. data/

Holds the three raw CSV files once you have downloaded them (see Data Acquisition):

  • Flights.csv
    Flight-level data (Q1 2019), including dates, delays, occupancy rates, etc.

  • Tickets.csv
    Sample ticketing data for round-trip itineraries.

  • Airport_Codes.csv
    Metadata on airport size and country codes.


2. Airline_Route_Analysis.Rmd

Main analysis report in RMarkdown. When you knit it, it will:

  1. Load & clean the three datasets.
  2. Perform full EDA with histograms, density plots, boxplots, and a correlation heatmap.
  3. Compute route-level metrics (revenue, cost, profit, breakeven).
  4. Display the “Top 10 Busiest,” “Top 10 Profitable,” and “Recommended” routes, plus the full summary table.
  5. Include a “What’s Next” section with suggested future work.
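
To give a feel for the route-level metrics in step 3, here is a toy sketch in base R. It is not the report's actual code: the column names, the sample values, and the `upfront_cost` figure are all assumptions made for illustration, and the report itself may define revenue, cost, and breakeven differently.

```r
# Toy data: every column name and value here is made up for illustration.
trips <- data.frame(
  route   = c("JFK-LAX", "JFK-LAX", "ORD-SFO"),
  revenue = c(50000, 52000, 40000),  # revenue per round trip, USD
  cost    = c(35000, 36000, 30000)   # cost per round trip, USD
)

upfront_cost <- 90e6  # assumed one-time cost per route, USD (placeholder)

# Sum revenue and cost by route, then count round trips per route.
by_route <- aggregate(cbind(revenue, cost) ~ route, data = trips, FUN = sum)
by_route$round_trips <- as.vector(table(trips$route)[as.character(by_route$route)])

by_route$profit <- by_route$revenue - by_route$cost
by_route$profit_per_trip <- by_route$profit / by_route$round_trips

# Round trips needed to recover the upfront cost at the current margin.
by_route$breakeven_trips <- ceiling(upfront_cost / by_route$profit_per_trip)

print(by_route)
```

A route with zero or negative profit per trip never breaks even under this definition, so a real implementation would need to handle that case explicitly.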

How to run:

# In RStudio:
# 1. Set your working directory to the project root.
# 2. Open Airline_Route_Analysis.Rmd.
# 3. Click Knit → Knit to HTML.
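
If you are not using the RStudio Knit button, the report can probably be rendered from a plain R session instead. The sketch below assumes the `rmarkdown` package is installed (it is not in the Dependencies list) and that pandoc is available, which RStudio normally bundles; `knit_report()` is a hypothetical wrapper, not something shipped in this repository.

```r
# Hypothetical helper: knit one report to HTML without the RStudio UI.
# Assumes the rmarkdown package is installed and that the working
# directory is the project root.
knit_report <- function(path) {
  if (!file.exists(path)) {
    stop("Report not found: ", path,
         " (is the working directory the project root?)")
  }
  rmarkdown::render(path, output_format = "html_document")
}

# Render the main report only if it is present in the working directory.
report <- "Airline_Route_Analysis.Rmd"
if (file.exists(report)) knit_report(report)
```

The same helper should work for any other `.Rmd` file in the project by passing its path.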

3. Metadata_and_Data_Quality_Insights.Rmd

Documentation of all newly created fields (metadata) and the key data-quality issues discovered (date formats, missing values, outliers, duplicates, sparse routes), along with the remediation steps applied.

How to run:

# Knit this file to HTML just like the main report.

4. Raw_analysis.R

A pure R script version of the data cleaning and EDA code (equivalent to the “global.R” plus visualization chunks). Useful if you prefer running an R script instead of an Rmd.

How to run:

Rscript Raw_analysis.R

or source inside R:

source("Raw_analysis.R")

5. Shinyapp_Airline_Route_Anaysis.R

The Shiny application code. Launches an interactive dashboard with:

  • Analysis tab:
      • Top 10 most profitable routes (table + bar chart)
      • Top 10 busiest routes
      • Composite “recommended” routes penalizing delays
      • Full route summary table + profit-per-round-trip chart

  • EDA tab: Separate sub-tabs for Flights, Tickets, and Airports with interactive Plotly charts and data tables.

How to run:

  1. Open R or RStudio in the project root.

  2. Install Shiny & dependencies if needed:

    install.packages(c("shiny","shinyjs","readr","dplyr","lubridate",
                       "plotly","DT"))

  3. Launch the app:

    library(shiny)
    runApp("Shinyapp_Airline_Route_Anaysis.R")

    Or simply open the file in RStudio and click Run App.


Dependencies

All code uses the following R packages (install via install.packages()):

readr, dplyr, tidyr, lubridate, stringr,
ggplot2, corrplot, DT, shiny, shinyjs, plotly
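
To avoid reinstalling packages you already have, one option is to check which of the packages listed above are missing first. This is a minimal sketch; `not_installed()` is a hypothetical helper, and the `install.packages()` call is left commented out so nothing is installed without your say-so.

```r
# Package list copied from the Dependencies section.
pkgs <- c("readr", "dplyr", "tidyr", "lubridate", "stringr",
          "ggplot2", "corrplot", "DT", "shiny", "shinyjs", "plotly")

# Hypothetical helper: names of packages whose namespace cannot be loaded.
not_installed <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

to_install <- not_installed(pkgs)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All dependencies are installed.")
}
```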

Contact

If you have any questions or run into issues, please reach out to Bharat K Karumuri at BharatBiomed@gmail.com.