Welcome! This repository contains all deliverables for the Q1 2019 Airline Route Profitability analysis and accompanying Shiny dashboard. Below is a guide to the contents, how to run each component, and what each file does.
## Data Acquisition
> **Note:** The `data/` folder with the original CSV files is _not_ included in this submission (too large for this repo).
> To reproduce the analyses and run the Shiny app, you must download the three datasets yourself:
1. **Clone or download** the raw data from the Capital One GitHub:
https://github.com/CapitalOneRecruiting/DA-Airline-Data-Challenge
2. **Locate** the files:
- `Flights.csv`
- `Tickets.csv`
- `Airport_Codes.csv`
3. **Create** a folder named `data` in the project root, and **copy** those three CSVs into it:

       .
       ├── data/
       │   ├── Flights.csv
       │   ├── Tickets.csv
       │   └── Airport_Codes.csv
       │
       ├── 1_Raw_analysis.R
       ├── 2_Shinyapp_Airline_Route_Anaysis.R
       ├── 3_Airline_Route_Analysis.Rmd
       ├── 4_Metadata_and_Data_Quality_Insights.Rmd
       └── README.md   ← you are here

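Before running anything, it can help to confirm that the three CSVs landed in the right place. The following is a minimal sketch, not part of the repository's scripts: the helper name `find_missing_data_files` is hypothetical, and it assumes the working directory is the project root.

```r
# Hypothetical helper (not part of the repository): report which of the
# three expected CSVs are absent from the data folder.
find_missing_data_files <- function(data_dir = "data") {
  expected <- c("Flights.csv", "Tickets.csv", "Airport_Codes.csv")
  present <- file.exists(file.path(data_dir, expected))
  expected[!present]
}

# Assumes the working directory is the project root.
missing_files <- find_missing_data_files("data")
if (length(missing_files) > 0) {
  message("Missing from data/: ", paste(missing_files, collapse = ", "))
} else {
  message("All three data files found.")
}
```

An empty result means the scripts and the Shiny app should be able to find their inputs, provided they read from `data/` relative to the project root.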
## `data/`
Contains the three raw CSV files provided:
- `Flights.csv`: Flight-level data (Q1 2019), including dates, delays, occupancy rates, etc.
- `Tickets.csv`: Sample ticketing data for round-trip itineraries.
- `Airport_Codes.csv`: Metadata on airport size and country codes.
## `3_Airline_Route_Analysis.Rmd`
Main analysis report in R Markdown. Knitting it renders the analysis as an HTML report.

How to run (in RStudio):
1. Set your working directory to the project root.
2. Open `3_Airline_Route_Analysis.Rmd`.
3. Click Knit → Knit to HTML.
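
As a non-interactive alternative to the Knit button, the report can also be rendered from the R console. This is a sketch under two assumptions: the `rmarkdown` package is installed, and the file is named as shown in the project tree. The wrapper name `knit_report` is hypothetical.

```r
# Hypothetical wrapper: render an R Markdown file to HTML from the console.
# Assumes the rmarkdown package is installed.
knit_report <- function(rmd_path = "3_Airline_Route_Analysis.Rmd") {
  if (!file.exists(rmd_path)) {
    stop("Cannot find ", rmd_path,
         "; is the working directory the project root?")
  }
  # rmarkdown::render() returns the path of the rendered file (invisibly).
  rmarkdown::render(rmd_path, output_format = "html_document")
}

# Example call, run from the project root:
# knit_report()
```

The same call should work for the metadata report by passing its file name as `rmd_path`.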
## `4_Metadata_and_Data_Quality_Insights.Rmd`
Documentation of all newly created fields (metadata) and the key data-quality issues discovered (date formats, missing values, outliers, duplicates, sparse routes), along with the remediation steps applied.

How to run: knit this file to HTML in the same way as the main report.
## `1_Raw_analysis.R`
A pure R script version of the data cleaning and EDA code (equivalent to `global.R` plus the visualization chunks). Useful if you prefer running an R script instead of an Rmd.

How to run from a terminal:

    Rscript 1_Raw_analysis.R

Or source it inside R:

    source("1_Raw_analysis.R")

## `2_Shinyapp_Airline_Route_Anaysis.R`
The Shiny application code. Launches an interactive dashboard with:
- **Analysis tab:**
  - Top 10 most profitable routes (table + bar chart)
  - Top 10 busiest routes
  - Composite “recommended” routes that penalize delays
  - Full route summary table + profit-per-round-trip chart
- **EDA tab:** Separate sub-tabs for Flights, Tickets, and Airports with interactive Plotly charts and data tables.

How to run:
1. Open R or RStudio in the project root.
2. Install Shiny and its dependencies if needed:

       install.packages(c("shiny", "shinyjs", "readr", "dplyr", "lubridate",
                          "plotly", "DT"))

3. Launch the app:

       library(shiny)
       runApp("2_Shinyapp_Airline_Route_Anaysis.R")

Or simply open the file in RStudio and click Run App.
## Dependencies
All code uses the following R packages (install via `install.packages()`):
`readr`, `dplyr`, `tidyr`, `lubridate`, `stringr`, `ggplot2`, `corrplot`, `DT`, `shiny`, `shinyjs`, `plotly`
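
To avoid reinstalling packages that are already present, the list above can be checked first. This is a minimal sketch; the helper name `find_missing_packages` is hypothetical and not part of the repository, and the install call is left commented out so nothing is downloaded unintentionally.

```r
# Package list copied from this README.
required_pkgs <- c("readr", "dplyr", "tidyr", "lubridate", "stringr",
                   "ggplot2", "corrplot", "DT", "shiny", "shinyjs", "plotly")

# Hypothetical helper: return the packages that are not yet installed.
# system.file(package = p) returns "" when package p cannot be found.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs,
                      function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- find_missing_packages(required_pkgs)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```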
If you have any questions or run into issues, please reach out to Bharat K Karumuri at BharatBiomed@gmail.com.