
Administrative:

Please indicate

  • Who you collaborated with: Yuchen, Connor
  • Roughly how much time you spent on this HW so far: 6.5 hours
  • The URL of the RPubs published URL here.
  • What gave you the most trouble: Understanding the variables that large datasets contain; the glimpse() function does not help enough.
  • Any comments you have: My code seems too bulky, and I would like to make it more compact.

Question 1:

Plot a “time series” of the proportion of flights that were delayed by more than 30 minutes on each day; that is:

  • the x-axis should be some notion of time
  • the y-axis should be the proportion.

Using this plot, describe the seasonality of when delays over 30 minutes tend to occur.
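The daily proportion can be computed as the mean of a logical "delayed" flag within each day. This is a minimal base-R sketch on made-up toy rows. The column names (date, dep_delay, cancelled), the use of departure delay instead of arrival delay, and counting cancelled flights as delayed (following the note below) are all assumptions, not the actual schema of the course data.

```r
# Hypothetical toy data standing in for the flights table; the column
# names date, dep_delay and cancelled are assumptions.
flights <- data.frame(
  date      = as.Date(c("2011-01-01", "2011-01-01", "2011-01-01",
                        "2011-01-02", "2011-01-02")),
  dep_delay = c(45, 5, NA, 10, 31),
  cancelled = c(0, 0, 1, 0, 0)
)

# A flight counts as "delayed > 30 min" if it departed more than 30 minutes
# late or was cancelled (cancelled flights have no delay value).
flights$late <- with(flights,
                     cancelled == 1 | (!is.na(dep_delay) & dep_delay > 30))

# The mean of a logical vector is the proportion of TRUE values, so the
# per-day mean is the daily proportion of delayed flights.
daily <- aggregate(late ~ date, data = flights, FUN = mean)
names(daily)[2] <- "prop_late"

# Time-series plot: x = date, y = proportion.
plot(daily$date, daily$prop_late, type = "l",
     xlab = "Date", ylab = "Proportion delayed > 30 min")
```

With dplyr and ggplot2, the same idea is group_by(date), then summarise(prop_late = mean(late)), then geom_line().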

Interpretation

I examined the daily proportion of flights delayed by more than 30 minutes from January 2011 to January 2012.

The proportion stays almost constant from January to July, dips until October, and then rises again until January. The months from January to July include several major holidays (New Year's Day, the Fourth of July, etc.) as well as the summer vacation, when people tend to travel a lot, so the increased traffic may cause more delays.

I also examined daily delays against temperature and noticed that colder months had more delays and warmer months had fewer, with the exception of April to July (which could be due to the increased traffic mentioned above).

Notes

I included cancelled flights when I counted delays over 30 minutes.

The traffic explanation seems more trustworthy than the temperature explanation.

Removing outliers might make the trend clearer.

Question 2:

Some people prefer flying on older planes. Even though they aren’t as nice, they tend to have more room. Which airlines should these people favor?
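One way to compare carriers is to count each aircraft once per carrier, join it to its manufacture year, and then summarise fleet size and median age. The sketch below uses hypothetical toy tables. The column names (carrier, plane, year) and the reference year 2011 for computing age are assumptions.

```r
# Hypothetical toy tables; the real flights and planes data may use other names.
flights <- data.frame(
  carrier = c("AA", "AA", "AA", "DL", "DL", "WN"),
  plane   = c("N1", "N1", "N2", "N3", "N4", "N5")
)
planes <- data.frame(
  plane = c("N1", "N2", "N3", "N4", "N5"),
  year  = c(1988, 1990, 1985, 2005, 2000)
)

# Keep one row per (carrier, plane) so frequently flown aircraft are not
# counted repeatedly, then attach each aircraft's manufacture year.
fleet <- unique(flights[, c("carrier", "plane")])
fleet <- merge(fleet, planes, by = "plane")
fleet$age <- 2011 - fleet$year   # assumed reference year of the flight data

# Fleet size and median aircraft age per carrier, oldest fleets first.
n_planes   <- tapply(fleet$age, fleet$carrier, length)
median_age <- tapply(fleet$age, fleet$carrier, median)
summary_tbl <- data.frame(carrier    = names(n_planes),
                          n_planes   = as.vector(n_planes),
                          median_age = as.vector(median_age),
                          stringsAsFactors = FALSE)
summary_tbl <- summary_tbl[order(-summary_tbl$median_age), ]
print(summary_tbl)
```

A boxplot of fleet$age by carrier, for example boxplot(age ~ carrier, data = fleet), shows the spread in ages that the median alone hides.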

Interpretation

For people who prefer aircraft that are more than 20 years old, American Airlines (AA) is the best option, since it has a large fleet and almost all of its aircraft are older than 20 years.

Delta Air Lines (DL) is also a good option, since its median aircraft age is about 25 years, although the ages of its aircraft vary much more (both above and below the median).

Envoy Air (MQ) is not a good option, since it has only two aircraft.

For aircraft in the 10 to 15 year age range, United Airlines (UA), US Airways (US), Continental Airlines (CO), ExpressJet (EV) and others are also good options. Southwest Airlines (WN) has too much variation in aircraft age to be a dependable choice.

Question 3:

  • What states did Southwest Airlines’ flight paths tend to fly to?
  • What states did Southwest Airlines’ flights tend to fly to?

For example, Southwest Airlines Flight 60 to Dallas consists of a single flight path, but since it flew 299 times in 2013, it would be counted as 299 flights.
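The distinction between flights and flight paths comes down to whether repeated rows are collapsed before counting. This is a minimal sketch on hypothetical toy rows. It assumes that a flight path is a unique (flight number, destination) pair, which matches the Flight 60 to Dallas example, and that each row already carries its destination state.

```r
# Hypothetical toy data: each row is one Southwest flight actually flown.
wn <- data.frame(
  flight = c(60, 60, 60, 61, 70),
  dest   = c("DAL", "DAL", "DAL", "AUS", "MCO"),
  state  = c("TX", "TX", "TX", "TX", "FL"),
  stringsAsFactors = FALSE
)

# Flights: every row counts, so a route flown 299 times counts 299 times.
flights_by_state <- prop.table(table(wn$state))

# Flight paths: collapse repeats to unique (flight number, destination)
# pairs first, so each route counts once however often it was flown.
paths <- unique(wn[, c("flight", "dest", "state")])
paths_by_state <- prop.table(table(paths$state))

print(flights_by_state)
print(paths_by_state)
```

In this toy example Texas holds 80% of flights but only two of the three flight paths, because Flight 60 is flown repeatedly.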

Interpretation

The largest proportion of Southwest Airlines flights and flight paths go to destinations within Texas. Texas is followed by Florida and California (including Los Angeles and Oakland), among others, though each of these accounts for 10% or less.

Question 4:

I want to know, for each carrier, what proportion of its flights to/from Houston in the month of July go to each region (NE, South, West, Midwest). Consider the month() function from the lubridate package.
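This is a row-normalised carrier-by-region table built from July flights only. The sketch uses hypothetical toy data. The column names (date, carrier, state) and the three-state region lookup are illustrative assumptions; a real analysis needs a complete state-to-region table. Base R's format() is used to extract the month so that the block runs without extra packages.

```r
# Hypothetical toy data for flights out of Houston.
flights <- data.frame(
  date    = as.Date(c("2011-07-01", "2011-07-02", "2011-07-03",
                      "2011-07-04", "2011-08-01")),
  carrier = c("AA", "B6", "CO", "CO", "CO"),
  state   = c("TX", "NY", "CA", "TX", "NY"),
  stringsAsFactors = FALSE
)
# Illustrative subset of a state-to-region lookup (not a full mapping).
regions <- data.frame(
  state  = c("TX", "NY", "CA"),
  region = c("South", "NE", "West"),
  stringsAsFactors = FALSE
)

# Keep July only; with lubridate, month(flights$date) == 7 does the same.
july <- flights[as.integer(format(flights$date, "%m")) == 7, ]

# Attach each destination's region, then normalise within each carrier
# (margin = 1) so every carrier's row of proportions sums to 1.
july <- merge(july, regions, by = "state")
region_props <- prop.table(table(july$carrier, july$region), margin = 1)
print(region_props)
```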

Interpretation

American Airlines (AA), AirTran (FL) and Mesa Airlines (YV) fly to/from Houston only from the South region. Frontier Airlines (F9) and Alaska Airlines (AS) fly only to/from the West, and JetBlue (B6) only to/from the Northeast. The other airlines serve a mix of regions.

Notes

I looked only at the destinations of flights departing Houston. Since most flights are round trips, I assume that each airline's flights into Houston have the same regional proportions.