This dataset contains passenger air traffic statistics for San Francisco International Airport (SFO) from July 2005 to June 2017. The database is updated daily but published quarterly by www.data.gov. I intend to do some analysis based on the air traffic provided in the data set. For the sake of brevity, I will refer to the airport throughout the report by its IATA airport code (SFO).
There were a couple of steps I had to take to clean my data and format it in a way that would facilitate future analysis. The first thing I did, as shown below, was to create two columns and extract the year and month from the “Activity Period” column.
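As an illustration of this step, here is a minimal base R sketch. It assumes the period is stored as an integer of the form YYYYMM (as the summary output later in this report suggests) and that the month is kept as a zero-padded string. The data frame name `traffic` and its three sample values are made up for the example.

```r
# Toy stand-in for the SFO data; these three values are made up.
traffic <- data.frame(Activity.Period = c(200507L, 201108L, 201706L))

# Activity.Period has the form YYYYMM, so integer division by 100 gives
# the year and the remainder gives the month.
traffic$Activity.Year  <- as.character(traffic$Activity.Period %/% 100L)
# Zero-padding the month is an assumption, not something the report states.
traffic$Activity.Month <- sprintf("%02d", traffic$Activity.Period %% 100L)

print(traffic)
```

Both new columns are stored as character vectors here, which is consistent with the classes reported in the summary further down.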
I then used the newly extracted month data and grouped the months by quarter, i.e., Q1/Q2/Q3/Q4.
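One way the quarter grouping could be done is sketched below. It assumes calendar quarters (January to March is Q1, and so on); the sample months are made up.

```r
# Made-up sample of month strings, one from each quarter.
months <- c("01", "04", "08", "12")

# ceiling(month / 3) maps months 1-3 to 1, 4-6 to 2, 7-9 to 3, 10-12 to 4.
quarter <- paste0("Q", ceiling(as.integer(months) / 3))

print(quarter)
```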
I then wanted to create a column which would show if the published airline (the one where the ticket booking/finances are done) was the same as the operating airline (the airline that actually flies the passengers from origin to destination). I had to convert both columns to character vectors in order for the matching to take place.
## [1] "character"
## [1] "character"
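The sketch below shows why the conversion matters and how the match column might be built. The three rows are made up, and the "Yes"/"No" labels are an assumption; the report only shows that the resulting column is of class character.

```r
# Made-up rows; the columns start out as factors with different level sets.
traffic <- data.frame(
  Operating.Airline = factor(c("SkyWest Airlines", "United Airlines", "Air Canada")),
  Published.Airline = factor(c("United Airlines", "United Airlines", "Air Canada"))
)

# Comparing two factors with == fails when their level sets differ,
# so both columns are converted to character first.
traffic$Operating.Airline <- as.character(traffic$Operating.Airline)
traffic$Published.Airline <- as.character(traffic$Published.Airline)

# "Yes"/"No" are hypothetical labels chosen for this example.
traffic$Operating.Airline.Match <- ifelse(
  traffic$Operating.Airline == traffic$Published.Airline, "Yes", "No"
)

print(traffic)
```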
Below, I’m going to analyze the data in one variable.
##  Activity.Period      Operating.Airline   Operating.Airline.IATA.Code
##  Min.   :200507   United Airlines : 3442   UA     : 3442
##  1st Qu.:200807   SkyWest Airlines: 1080   OO     : 1080
##  Median :201108   Alaska Airlines :  817   AS     :  817
##  Mean   :201117   Virgin America  :  422   VX     :  422
##  3rd Qu.:201409   Delta Air Lines :  420   DL     :  420
##  Max.   :201706   Air Canada      :  396   AC     :  396
##                   (Other)         :10466   (Other):10466
##  Published.Airline        Published.Airline.IATA.Code
##  United Airlines  :4224   UA     :4224
##  Alaska Airlines  :1049   AS     :1049
##  Delta Air Lines  : 899   DL     : 899
##  American Airlines: 507   AA     : 507
##  Air Canada       : 436   AC     : 436
##  Virgin America   : 422   VX     : 422
##  (Other)          :9506   (Other):9506
##  GEO.Summary           GEO.Region                  Activity.Type.Code
##  Domestic     : 6402   US                 :6402   Deplaned      :8034
##  International:10641   Asia               :3714   Enplaned      :8023
##                        Europe             :2463   Thru / Transit: 986
##                        Canada             :1623
##                        Mexico             :1299
##                        Australia / Oceania: 842
##                        (Other)            : 700
##  Price.Category.Code   Terminal              Boarding.Area   Passenger.Count
##  Low Fare: 2170        International:10649   A      :6006    Min.   :     1
##  Other   :14873        Other        :   27   G      :4663    1st Qu.:  5400
##                        Terminal 1   : 3442   B      :2068    Median :  9210
##                        Terminal 2   :  440   F      :1544    Mean   : 29717
##                        Terminal 3   : 2485   C      :1354    3rd Qu.: 20990
##                                              E      : 941    Max.   :659837
##                                              (Other): 467
##  Activity.Month      Activity.Year       Quarter
##  Length:17043        Length:17043        Length:17043
##  Class :character    Class :character    Class :character
##  Mode  :character    Mode  :character    Mode  :character
##  Operating.Airline.Match
##  Length:17043
##  Class :character
##  Mode  :character
Above, we have a summary of the data set, with the basic statistics for each variable/column. We can see that United Airlines has far and away the highest traffic at SFO, which isn’t entirely surprising because SFO is one of its hubs.
Above is a graph showing the number of domestic and international flights through SFO between July 2005 and June 2017. We can see that there were roughly 6,300 domestic flights and 10,600 international flights. This estimate matches the exact results of the summary() command above, which reports 6,402 and 10,641 flights respectively.
I then wanted to see how many total flights there were per year. Below are the results:
Above, we see that the number of flights is rather consistent from 2006 to 2015, roughly in the range of 1375-1400. There is a peak in 2016, in excess of 1500. This particular plot made me realize that this dataset was a tiny sample and not the complete version, because 1500 flights per year translates to roughly 4 per day - which is certainly nowhere near the true traffic at a busy airport such as SFO.
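The per-year tally and the per-day figure quoted above can be sketched as follows. The five-element year vector is made up; only the 1500-per-year figure comes from the text.

```r
# Made-up sample of the year column; each element stands for one row.
activity_year <- c("2015", "2015", "2016", "2016", "2016")

# table() counts how many rows fall into each year.
rows_per_year <- table(activity_year)
print(rows_per_year)

# The per-day rate quoted in the text: about 1500 rows in a year.
per_day <- 1500 / 365
print(round(per_day, 1))
```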
I then wanted to see how many flights were flown through SFO on a per-quarter basis.
We can see that the number of flights is roughly the same in each quarter, with Q2 and Q3 seeing slightly more activity than Q1 and Q4. This might be because more people travel during the summer months.
Next, I wanted to see the split between “Low Fare” flights and the rest (classified as “Other” in the dataset). “Low Fare” flights are those operated by low-cost carriers; “Other” covers all remaining airlines.
Not surprisingly, the budget flights are in the clear minority of all air travel.
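The size of that minority can be worked out from the Price.Category.Code counts in the summary() output earlier in the report. The vector name `price_counts` is introduced here only for illustration.

```r
# Counts copied from the Price.Category.Code column of the summary() output.
price_counts <- c("Low Fare" = 2170, "Other" = 14873)

# prop.table() divides each count by the total (17043 rows).
shares <- round(100 * prop.table(price_counts), 1)
print(shares)
```

By this calculation, “Low Fare” rows make up roughly 13% of the data.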
I then wanted to see which regions the flights at SFO travel to and from the most.
We can see above that the US is the single most common region for flights in and out of SFO, even though international flights outnumber domestic ones overall. Out of the international flights, the largest share is in and out of Asia, followed by Europe. This makes sense, as the West Coast has easy access to Asia via the Pacific Ocean. I would expect the reverse to be true at a major airport on the East Coast (JFK, EWR, BOS, DCA etc.), with easy access to Europe over the Atlantic.
I wanted to see which airlines had the most domestic traffic in and out of SFO. Below are my results:
We can see that United Airlines had far and away the most activity domestically, distantly followed by SkyWest Airlines. I then wanted to see how this graph looks without United, since everything else is so much further behind it.
We now have a clearer plot when we remove United from the picture. We see that SkyWest has the highest domestic traffic at SFO (with over 600 flights), followed by Alaska Airlines and Southwest Airlines with roughly 380 and 370 flights respectively.
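A sketch of how United could be excluded before ranking the remaining airlines is given below. It assumes the ranking is based on the Operating.Airline column and uses base R rather than whatever the original chunk used; the rows are made up.

```r
# Made-up rows standing in for the real data.
traffic <- data.frame(
  Operating.Airline = c("United Airlines", "United Airlines", "United Airlines",
                        "SkyWest Airlines", "SkyWest Airlines",
                        "Alaska Airlines", "Air China"),
  GEO.Summary = c("Domestic", "Domestic", "Domestic", "Domestic",
                  "Domestic", "Domestic", "International"),
  stringsAsFactors = FALSE
)

# Keep domestic rows only and drop United so the smaller carriers are visible.
domestic <- subset(
  traffic,
  GEO.Summary == "Domestic" & Operating.Airline != "United Airlines"
)

# Count rows per airline and sort from most to least frequent.
counts <- sort(table(domestic$Operating.Airline), decreasing = TRUE)
print(counts)
```

The sorted table could then be passed to a bar plot to reproduce a chart like the one described above.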
I wanted to delve into this data and see which airlines travelling to and from Asia had the most traffic through SFO.
Above, we see that United Airlines had the greatest traffic. This is somewhat surprising, as United only has 3 Asian destinations from SFO at the time of writing - Beijing (PEK), Hong Kong (HKG) and Tokyo (NRT). I find it surprising that it had roughly 33% more flights than any of its nearest competitors on the same graph, given that they are all Asia-based airlines.
I then wanted to see which airlines had the most traffic to and from Europe. Below is the plot:
Again, we see that United Airlines tops the list on the European sector, with close to 400 flights. After United, the next most trafficked airlines are Air France, British Airways, KLM, Lufthansa and Virgin Atlantic - all major European airlines.
I then wanted to see United Airlines’ geographical activity, i.e., where is it flying to/from the most?
We can see that the vast majority of United Airlines’ transit airports are in the US, which isn’t surprising in the least. Following from afar are Canada and Mexico, which also makes sense since they are in North America. Among United’s intercontinental destinations, Asia, Europe and Australia all seem to have similar amounts of traffic (close to the 400 mark).
The next plot shows which airline had the most activity in the international terminal.
We see that United again had the most international traffic, but other than that, it is hard to gain any other meaningful information. Let’s look at this plot without United in the picture.
We see that the most trafficked airlines on international segments excluding United were SkyWest Airlines, Alaska Airlines and Air Canada, in that order. They had roughly 440, 440 and 395 flights respectively.
I then wanted to see which terminals at SFO were the most trafficked.
The plot above shows that the international terminal was the most trafficked of them all, which lines up with the previous plot showing significantly more international than domestic flights over the years. One part of this plot that struck me as odd was the minimally trafficked Terminal 2, especially compared to Terminals 1 and 3. I did a little research and found that Terminal 2 underwent renovation and only reopened in April 2011, which is when its data in the dataset starts getting populated.
Now, I’m going to do some more complex analysis and dissect this dataset in two variables.
I wanted to see how many domestic and international flights there were for each year from 2005 to 2017.
## # A tibble: 26 x 3
## # Groups:   Activity.Year [?]
##    Activity.Year   GEO.Summary Number.Of.Flights
##            <chr>        <fctr>             <int>
##  1          2005      Domestic               290
##  2          2005 International               405
##  3          2006      Domestic               558
##  4          2006 International               811
##  5          2007      Domestic               625
##  6          2007 International               784
##  7          2008      Domestic               620
##  8          2008 International               813
##  9          2009      Domestic               564
## 10          2009 International               829
## # ... with 16 more rows
Above, the table shows the number of domestic and international flights per year from 2005 to 2017. As useful as this table is, I wanted to display this in graphical format since that would give us a better idea of the trends in the number of flights per type over time.
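A table of this shape can be produced by counting rows for each combination of year and flight type. The tibble output above suggests a grouped summary was used; the sketch below is a dependency-free base R alternative, not the original code, and its five rows are made up.

```r
# Made-up rows standing in for the real data.
traffic <- data.frame(
  Activity.Year = c("2005", "2005", "2005", "2006", "2006"),
  GEO.Summary   = c("Domestic", "International", "International",
                    "Domestic", "Domestic"),
  stringsAsFactors = FALSE
)

# Cross-tabulate year against flight type, then flatten to long format.
counts <- as.data.frame(
  table(Activity.Year = traffic$Activity.Year,
        GEO.Summary   = traffic$GEO.Summary),
  responseName = "Number.Of.Flights",
  stringsAsFactors = FALSE
)

# Order by year, then by flight type, to match the layout of the table above.
counts <- counts[order(counts$Activity.Year, counts$GEO.Summary), ]
print(counts)
```

Unlike a grouped summary, this approach also lists combinations with zero rows (here, international rows in 2006).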
Above, from 2006 to 2016, we see a broadly similar trend for each type of flight. The values for 2005 and 2017 seem much lower than the others, and rightly so: this dataset only contains the values for the second half of 2005 and the first half of 2017. We see peaks in 2006 and 2016 for the international flights, and peaks in 2007 and 2008 for domestic flights. There is a sharp increase in international flights from 2014 to 2016, and I wonder why that is. I tried looking online to see if there was an expansion in the international terminal during that time period, but I couldn’t find any specific instance or event that corroborated these findings.
Next, I wanted to see the number of flights per type per terminal over time.
We see above that there has been a downward trend in the number of domestic flights at Terminal 1 over time. Ignoring the years of 2005 and 2017 (since their full data is not present), we can see that there is a rather steady decrease from 2006 (~325 flights) to 2016 (~160 flights). There is no data for international flights after 2014, and I suspect this is because all international traffic was moved to the International Terminal. This makes sense, as there was minimal international activity in Terminal 1 to begin with, with roughly 25 flights per year (according to this subset of the true dataset).
Below is the corresponding data for Terminal 2:
As mentioned earlier, due to renovation, SFO’s Terminal 2 only reopened in April 2011, which is why the data is only provided after that time. Again, ignoring the half-year data of 2017, we can see that there is a steady increase in the traffic at Terminal 2, with it roughly doubling from 2011 (~36 flights) to 2016 (~73 flights). Similar to Terminal 1, there were minimal international flights at Terminal 2.
Below is the corresponding data for Terminal 3:
Similar to Terminal 1, we see a steady decrease in domestic air traffic from 2006 to 2016, with a trough in 2013. In terms of international travel, there was an overall increase in traffic from 2006 to 2016 and, as with its domestic counterpart, a trough in 2013. Interestingly enough, in this subset of the true dataset, there were the same number of domestic and international flights at T3 in 2015.
Below is the corresponding data for the International Terminal:
As expected, there is significantly more international traffic here than at Terminals 1, 2 or 3. There has been a steady increase in traffic from 2006 to 2016, and there is a high peak in 2016, which corroborates an earlier plot that showed higher activity in 2016. This suggests that the 2016 peak was driven by an increase in international travel. Similar (but in reverse) to the other three terminals, there was minimal domestic activity at this terminal.
Now, I am going to step my analysis up a notch and analyze the data on multiple variables.
I wanted to see which airlines were operating on the European sector over time, and to differentiate them based on low-cost carrier or not. Below are the results:
We can see above that the vast majority of the carriers were not low-cost, with the exception of Servisair, which operated from mid-2010 to mid-2011. Unsurprisingly, we see that the major carriers (Air France, British Airways, KLM, Lufthansa, United and Virgin Atlantic) did not have any breaks in activity over the years. This was not the case for the other airlines. Another detail I noticed concerns Finnair: its activity at SFO only shows up from mid-2016 onwards. I did some research online and found that Finnair recently announced a non-stop service from Helsinki to San Francisco.
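Breaks in activity like these could also be checked programmatically rather than read off the plot. The sketch below flags an airline if any calendar month is missing between its first and last active month. The helper names `month_index` and `has_gap`, the airline names and the periods are all hypothetical.

```r
# Convert a YYYYMM integer to a running month number so that consecutive
# calendar months differ by exactly 1, even across a year boundary.
month_index <- function(period) {
  (period %/% 100L) * 12L + (period %% 100L)
}

# TRUE if at least one month is missing between the first and last period.
has_gap <- function(periods) {
  idx <- sort(unique(month_index(periods)))
  any(diff(idx) > 1L)
}

# Made-up activity: "Airline A" flies every month, "Airline B" skips two.
traffic <- data.frame(
  Operating.Airline = c("Airline A", "Airline A", "Airline A",
                        "Airline B", "Airline B", "Airline B"),
  Activity.Period   = c(201011L, 201012L, 201101L,
                        201011L, 201102L, 201103L)
)

# Apply the check to each airline's set of active periods.
gaps <- sapply(split(traffic$Activity.Period, traffic$Operating.Airline), has_gap)
print(gaps)
```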
Next, I wanted to analyze the same variables but over the American domestic market instead.
We see a lot more “low-fare” activity on this chart than on the European market, which is completely unsurprising. Like earlier, we see that the major airlines such as American, Delta and United do not have any breaks in activity over time. ATA Airlines only has activity until mid-2008 because that is when it went defunct. We see that Spirit Airlines had a brief stint at SFO between 2005 and 2007. I assume it ceased operations there after that, since SFO is no longer one of its destinations.
I found that there were far more flights on the international segment than on the domestic segment at SFO. Most of the international flights go to Asian destinations, which makes sense because of SFO’s easy access to Asia via the Pacific Ocean. I found that there is an increase in air traffic in the middle 6 months of the year as compared to Q1 and Q4, and I suspect this is due to increased passenger traffic during the summer months. The most interesting piece of information that I found through the data was that bigger airlines tended to have more control over which airports they serve, as they do not have breaks in service over time unless they choose to. One of the limitations of this dataset was its size, as it is clearly a very small subset of the actual version. I feel like I would have some very different results if I had access to the whole thing. That said, I do believe that the authorities at SFO could use the information about the increased activity during Q2 and Q3, perhaps by increasing security or by hiring more staff to give passengers a seamless journey from arriving at the airport to fastening their seatbelts.