Please indicate
The box-and-whisker plot produced from the code below (Plot #01) portrays each season as a time interval. The y-axis shows the proportion of flights delayed on each day, which are resembled by points. One can see the median for each season, and can make the conclusion that one would tend to see the most delays of greater than 30 minutes in Spring 2011 and Summer 2011. In other words, 50% of the total number of days in the spring and 50% of the total number of days in the spring had the highest proportion of flights delayed more than 30 min. 50% of the total number of days in Fall 2011 had the lowest proportion of flights delayed more than 30 min and winter had the second lowest proportion. This means that Fall 2011 tended to have the least amount of delays of greater than 30 minutes. That being said, one can see the outliers in the fall, however this should not effect the overall conclusion. The outlier represents one particular day that had many delays. This could be due to an element independent from the season altogether such as logistics and holiday weekends, and therefore, should not affect the conclusion. It is important to understand the limitation of this figure. Fall could have many flights delayed by 29 minutes. Therefore, we cannot make the conclusion that Spring and Summer have MORE delayed flights than Fall, but only that Spring and Summer have more flights delayed over 30 minutes than Fall 2011. In conclusion, if one wished to avoid delays of 30 minutes or more in 2011, one should have flown in the fall or the winter rather than the spring or summer.
Some people prefer flying on older planes. Even though they aren’t as nice, they tend to have more room. Which airlines should these people favor?
One looking to fly in older planes should fly with American Airlines, Delta Airlines, Envoy Air inc., and US Airlines. This is illustrated by the box-and- whisker plot (Plot #02) created by the code below. The box-plot for each carrier depicts when the older 25%, 50%, and 75% of planes were made. The year 1980 is overlaid on this graph to resemble the deregulation of flights that occurred in the late 1970s. The planes with the most legroom, should be those before 1980.* According to the figure, American Airlines and Envoy Air have the most amount of planes built before 1980. Since no airlines have medians below 1980, the airlines with the lowest medians should be favored by individuals wishing to fly on older planes, meaning airlines AA, DL, UA, MQ, US, and WN should be favored. They should avoid Airlines AS, B6, CO, EV, F9, OO, and YV. There are outliers that are below 1980, however, one would be careful selecting the airline based off outliers, since they do not summarize the data set. For example, CO has two planes built well before 1980 as seen the plot. The median, however, for CO’s planes is in the 2000s. One cannot chose which plane they fly on when booked a flight. It is mostly chance, which plane they are seating on. They can only rely on probability. Therefore, those individuals have the greatest probability of securing a seat on an old plane if they fly with those airlines with the lowest medians (lowest meaning oldest). Lastly an important note, to make this plot, the flights that had ‘na’ for year of manufacture were removed. It is possible that these planes were too old to have their year of manufacture noted. It is not possible to determine the true age of these planes, so they were, therefore, removed from our plot. They are, nevertheless, still important to note.
*http://fortune.com/2015/09/12/airline-seats-shrink/
In order to answer this question, a bar plot (Plot #03) was made using each distinct flight number and matching the flight number to the destination airport and state. From this bar plot, on can see that the number of flight numbers dedicated to flights to Texas dominated over the other states. The most common flied to states were those in the South and also resort or vacation states, such as FL, LA, CA, and OK. Illinois was the 6th most common state that listed Southwest flights flew to. This is most likely due to Chicago. There are 31 states that Delta does not fly to from Houston. PA, NJ, and AR are flied to less than any other state that Delta does fly to from Houston.
What is perhaps more shocking is the amount of flight numbers that were assigned to multiple destination airports. For example, flight number WN167 is assigned to flights from Houston to MSY, DEN, and LAS. Although, MSY is the far more frequently flown destination airport to number WN167. DEN and LAS are assigned to WN167 only once in 2011. This could have resulted from an error due to human mistake when inputting the data, but it could also be fact. If there was no WN167 flight to MSY that day, it is fair to assign that number to DEN. I would, however, think this would cause confusion. Shockingly, WN167 is only one example of a common pattern in Delta flight number allocation. A second bar plot was created (Plot #04) which shows the number (or count) of flight numbers that were assigned one or more destination airports. In other words, since WN167 was assigned three destination cities, it contributes to the bar found at 3 on the x-axis (There are 144 additional flights with 3 varying airport destinations).
The bar plot made from all Southwest Airlines flights (Plot #05) appears very similar to the bar plot made in the first part of question 3. Again, Texas is the most flight to state in total for all flights leaving from Houston. These flights range from flights to Dallas to Austin. Second to Texas, Florida is found more than 10,000 flights below Texas. Texas as a destination, therefore, dominates the number of flights leaving Houston. Again, LA, and CA are the third and fourth most common destination states. Most of the states are the destination state between 2500 and 0 flights.
What weather patterns are associated with the biggest departure delays?
To answer this question two different plots were designed. Prior to creating plots, scatter plots for the weather variables including temp, precip, gust speed, wind speed, wind direction, dew point, humidity, and visibility were made against minutes delayed for each flight. The most telling scatter plot was plotting the dew point (ºF) against the delayed time (min) (Plot #07). This showed a positive relationship, meaning that the higher the dew point temperature became, the longer the delay became. Dew Point is defined as the temperature where condensation begins, or where the relative humidity would be 100% if the air was cooled. This is way to measure moisture in the air. The higher the dew point temperature the more water vapor is in the air. Therefore, dew point is similar to humidity, and it relates to precipitation. Days with high dew point temperature could be relatively stormy or raining days. It is important not to forget the magnitude of each point on the plot. The darker the circle, the more common that point is. The most common points are those between 60ºF and 80ºF, which also have the highest delays. Therefore, it is difficult to determine weather the flight delays are due to the dew point temp or if the delays are due to logistical or other weather problems that just happen to occur while dew point is relatively normal between 60 and 80ºF.
To further determine which weather patterns were associated with delayed flights a box-and-whisker plot (Plot #08) was created to demonstrate the effects of each condition on the time of delay. The x-axis shows the log10 of the delay time (min) in order to ease viewing of the plots. At first glance, one can see that the medians from light freezing rain and freezing rain resulted in the longest flight log10-delays (min). Freezing rain can be quite problematic for a flight- especially for one in Houston, where freezing conditions are not common. Freezing conditions can cause problems with the runway, cleaning and water loading, and engines. The plane actually has to be sprayed to prevent freezing while the plane is in flight. In addition, in-flight conditions could be effected, so waiting for the freezing rain to perhaps subside is a reason for increases delays. Looking back to the plot, one sees many outliers from scattered clods, partly cloudy, overcast, and mostly cloudy, as well as clear. The increase in delays on clear days is most likely due to factors external from weather such as logistics and holiday breaks. The outliers in the cloudy conditions as well as the hazy condition could be a result of decreased visibility. It would be beneficial to cross reference the dates that resulted in outliers with the visibility data. Lastly, thunderstorms and rain also has an increase in delay time. Thunderstorms and rain is severe weather, which make it hard to fly.
It is important to mention that ‘na’ observations were removed from the data. These values could contribute to a more defined conclusion, but they are not helpful when presenting the data in this example, since we have no further information.
## Warning: Removed 302 rows containing non-finite values (stat_smooth).
## Warning: Removed 303 rows containing missing values (geom_point).
I want to know what proportionately regions (NE, south, west, midwest) each carrier flies to from Houston in the month of July. Consider the month() function from the lubridate package.
’’ From the data given, the figure (Plot #08) from the code below was made to illustrate the proportion of flights to four geographical regions for each carrier in the month of July. Airlines YV, FL, and AA fly only to states in the SOUTH, while the airline B6 flies only to the NORTHEAST, AS and F9 fly only to the WEST. B6 is Jetblue, who has a hub in New York City- they only fly to New York from Houston. AS is Alaskan Airlines, who only flies to Seattle from Houston, and F9 is Frontier Airlines which flies to the Denver from Houston. The other airlines are a mixture of midwest, NE, south, and West. One can see trends. For example, the most flied to region is the South. The least flied region is the NE. The west is the second most flied to region, and the midwest is the third most. Geographically, this make sense. The South is the closest region to Texas, then the West, then midwest, and then the NE. The Northeast is the farthest region away from Texas.