The sample data for assignment 6 was stored as a table in a database.
I used the RMySQL package to retrieve the data.
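The retrieval step might look roughly like the sketch below. The source does not show the connection details, so the helper name `fetch_flights`, the default host, and the table name `flights` are all placeholders of mine, not the actual configuration.

```r
# Hypothetical helper: every connection detail here is a placeholder,
# not the configuration actually used for the assignment.
fetch_flights <- function(dbname, user, password,
                          host = "localhost", table = "flights") {
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = dbname, host = host,
                        user = user, password = password)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # close the connection on exit
  DBI::dbReadTable(con, table)                 # returns the table as a data.frame
}
```

Calling something like `fetch_flights("mydb", "me", "secret")` against a database holding the table would return a data frame like the one printed here.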
  id airline ontime_delay Los.Angeles Phoenix San.Diego San.Francisco Seattle
1  1  ALASKA      on time         497     221       212           403    1841
2  2              delayed          62      12        20           102     305
3  3 AM WEST      on time         694    4840       383           320     201
4  4              delayed         117     415        65           129      61
I then used the tidyr and dplyr packages to reshape the table into tidy data.
Here are the first 10 rows of the result:
   airline ontime        city freq
1   ALASKA      t Los Angeles  497
2   ALASKA      f Los Angeles   62
3  AM WEST      t Los Angeles  694
4  AM WEST      f Los Angeles  117
5   ALASKA      t     Phoenix  221
6   ALASKA      f     Phoenix   12
7  AM WEST      t     Phoenix 4840
8  AM WEST      f     Phoenix  415
9   ALASKA      t   San Diego  212
10  ALASKA      f   San Diego   20
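One way to get from the wide table to this tidy layout is sketched below. The original code is not shown, so this is an assumption about the approach: it uses `pivot_longer()` (tidyr 1.0 or later), rebuilds the wide table by hand in place of the database read, and treats the blank airline cells as `NA`. The names `flights_wide` and `flights_tidy` are mine.

```r
library(dplyr)
library(tidyr)

# Wide table as printed earlier; blank airline cells are entered as NA (assumption).
flights_wide <- data.frame(
  id            = 1:4,
  airline       = c("ALASKA", NA, "AM WEST", NA),
  ontime_delay  = c("on time", "delayed", "on time", "delayed"),
  Los.Angeles   = c(497, 62, 694, 117),
  Phoenix       = c(221, 12, 4840, 415),
  San.Diego     = c(212, 20, 383, 65),
  San.Francisco = c(403, 102, 320, 129),
  Seattle       = c(1841, 305, 201, 61),
  stringsAsFactors = FALSE
)

flights_tidy <- flights_wide %>%
  fill(airline) %>%                          # carry each airline name down one row
  pivot_longer(Los.Angeles:Seattle,          # one row per airline/status/city
               names_to = "city", values_to = "freq") %>%
  mutate(
    city   = gsub(".", " ", city, fixed = TRUE),         # "Los.Angeles" -> "Los Angeles"
    ontime = ifelse(ontime_delay == "on time", "t", "f")
  ) %>%
  arrange(city, id) %>%                      # city by city, original row order within
  select(airline, ontime, city, freq)

print(head(as.data.frame(flights_tidy), 10))
```

The `arrange(city, id)` step is only there to reproduce the row order of the printout; the tidy structure itself does not depend on it.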
Next, I compared the arrival delays for the two airlines:
If we take a second look at the data without Phoenix, the share of AM WEST flights that are delayed rises sharply, from about 10.9% to about 18.9%, compared with about 14.2% for ALASKA. So, stick with ALASKA airlines (unless you’re going to Phoenix).
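The comparison with and without Phoenix can be checked with a short dplyr summary. This is a minimal sketch rather than the original analysis code: the data frame `flights` is rebuilt by hand from the counts in the first table, and `delay_rate` is a helper name of mine.

```r
library(dplyr)

cities <- c("Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle")

# Counts copied from the first table; each vector follows the order of `cities`.
flights <- bind_rows(
  data.frame(airline = "ALASKA",  ontime = "t", city = cities,
             freq = c(497, 221, 212, 403, 1841)),
  data.frame(airline = "ALASKA",  ontime = "f", city = cities,
             freq = c(62, 12, 20, 102, 305)),
  data.frame(airline = "AM WEST", ontime = "t", city = cities,
             freq = c(694, 4840, 383, 320, 201)),
  data.frame(airline = "AM WEST", ontime = "f", city = cities,
             freq = c(117, 415, 65, 129, 61))
)

# Percentage of each airline's flights that were delayed.
delay_rate <- function(df) {
  df %>%
    group_by(airline) %>%
    summarise(delayed     = sum(freq[ontime == "f"]),
              total       = sum(freq),
              pct_delayed = round(100 * delayed / total, 1),
              .groups = "drop")
}

all_cities <- delay_rate(flights)
no_phoenix <- delay_rate(filter(flights, city != "Phoenix"))

print(all_cities)
print(no_phoenix)
```

Comparing the two summaries side by side shows how much Phoenix, which accounts for most of AM WEST's flights, pulls that airline's overall figure down.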