Case 2

Question 1. For CVG_Flights.csv, how many variables and how many observations in the data?

For this first question, I want to first read the data and load programs to assist in this case.

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.5     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
CVG.Flights <- read.csv("CVG_Flights.csv", header = TRUE, stringsAsFactors = TRUE, na.strings = "")
airlines <- read.csv("airlines.xls.csv", header = TRUE, stringsAsFactors = TRUE, na.strings = "")
airports <- read.csv("airports.csv", header = TRUE, stringsAsFactors = TRUE, na.strings = "")

After reading, I can now run some code. First I will find how many variables there are in the data sets for CVG_Flights.

ncol(CVG.Flights)
## [1] 27

This output shows me that there are 27 different variables for the data set. Next I will look to see how many observations there are.

nrow(CVG.Flights)
## [1] 7763

This output tells me that CVG_Flights has 7763 observations.

Question 2. Are there missing values in the data? If so, can you show/visualize how data is missing?

To find the number of missing variables in the dataset, I will you the sum function to find the number of NA’s.

sum(is.na(CVG.Flights))
## [1] 40270

This output shows that there is 40,270 missing values in the data set. To help visualize the missing data, I will show the number of NA’s per variable.

colSums(is.na(CVG.Flights))
##         FLIGHT_DATE             AIRLINE       FLIGHT_NUMBER      ORIGIN_AIRPORT 
##                   0                   0                   0                   0 
## DESTINATION_AIRPORT SCHEDULED_DEPARTURE      DEPARTURE_TIME     DEPARTURE_DELAY 
##                   0                   0                 370                 370 
##            TAXI_OUT          WHEELS_OFF      SCHEDULED_TIME        ELAPSED_TIME 
##                 378                 378                   0                 392 
##            AIR_TIME            DISTANCE           WHEELS_ON             TAXI_IN 
##                 392                   0                 381                 381 
##   SCHEDULED_ARRIVAL        ARRIVAL_TIME       ARRIVAL_DELAY            DIVERTED 
##                   0                 381                 392                   0 
##           CANCELLED CANCELLATION_REASON    AIR_SYSTEM_DELAY      SECURITY_DELAY 
##                   0                7385                5814                5814 
##       AIRLINE_DELAY LATE_AIRCRAFT_DELAY       WEATHER_DELAY 
##                5814                5814                5814

This output tells me that there are missing values for the following variables: * DEPARTURE_TIME * DEPARTURE_DELAY * TAXI_OUT * WHEELS_OFF * ELAPSED_TIME * AIR_TIME * WHEELS_ON * TAXI_IN * ARRIVAL_TIME * ARRIVAL_DELAY * CANCELLATION_REASON * AIR_SYSTEM_DELAY * SECURITY_DELAY * AIRLINE_DELAY * LATE_AIRCRAFT_DELAY * WEATHER_DELAY.

Question 3. For each variable in the data, please describe what you observe, such as some summary statistics, their distributions, and etc.

To begin, I will first get a summary of the data set to help with the observations to be made.

summary(CVG.Flights)
##     FLIGHT_DATE   AIRLINE   FLIGHT_NUMBER  ORIGIN_AIRPORT DESTINATION_AIRPORT
##  3/2/2015 : 139   DL:2289   Min.   :  62   CVG    :3886   CVG    :3877       
##  1/2/2015 : 136   EV:1553   1st Qu.:1908   ORD    : 727   ORD    : 762       
##  3/5/2015 : 135   F9: 515   Median :3246   DFW    : 458   DFW    : 460       
##  2/13/2015: 134   MQ:2080   Mean   :3194   ATL    : 412   ATL    : 413       
##  2/23/2015: 134   OO:1326   3rd Qu.:4536   EWR    : 299   EWR    : 294       
##  3/6/2015 : 134             Max.   :6297   IAH    : 247   IAH    : 242       
##  (Other)  :6951                            (Other):1734   (Other):1715       
##  SCHEDULED_DEPARTURE    DEPARTURE_TIME DEPARTURE_DELAY     TAXI_OUT     
##  4:15:00 PM: 210     9:28:00 AM:  58   Min.   :-24.00   Min.   :  1.00  
##  9:30:00 AM: 190     9:27:00 AM:  54   1st Qu.: -5.00   1st Qu.: 12.00  
##  6:00:00 AM: 179     9:25:00 AM:  40   Median : -1.00   Median : 15.00  
##  6:00:00 PM: 106     9:30:00 AM:  40   Mean   : 12.35   Mean   : 18.52  
##  4:00:00 PM: 104     9:29:00 AM:  35   3rd Qu.: 12.00   3rd Qu.: 21.00  
##  8:30:00 AM: 103     (Other)   :7166   Max.   :708.00   Max.   :122.00  
##  (Other)   :6871     NA's      : 370   NA's   :370      NA's   :378     
##       WHEELS_OFF   SCHEDULED_TIME   ELAPSED_TIME      AIR_TIME    
##  9:41:00 AM:  35   Min.   : 60.0   Min.   : 49.0   Min.   : 36.0  
##  4:24:00 PM:  33   1st Qu.: 85.0   1st Qu.: 83.0   1st Qu.: 56.0  
##  9:43:00 AM:  32   Median :126.0   Median :123.0   Median : 99.0  
##  9:40:00 AM:  28   Mean   :128.1   Mean   :127.3   Mean   :100.4  
##  4:20:00 PM:  27   3rd Qu.:156.0   3rd Qu.:152.0   3rd Qu.:125.0  
##  (Other)   :7230   Max.   :313.0   Max.   :340.0   Max.   :319.0  
##  NA's      : 378                   NA's   :392     NA's   :392    
##     DISTANCE           WHEELS_ON       TAXI_IN         SCHEDULED_ARRIVAL
##  Min.   : 229.0   1:13:00 AM:  98   Min.   :  1.000   7:24:00 PM: 139   
##  1st Qu.: 308.0   1:05:00 AM:  97   1st Qu.:  5.000   7:25:00 PM: 120   
##  Median : 589.0   1:12:00 AM:  97   Median :  6.000   3:30:00 PM:  87   
##  Mean   : 675.9   1:19:00 AM:  97   Mean   :  8.448   8:39:00 PM:  79   
##  3rd Qu.: 871.0   1:25:00 AM:  97   3rd Qu.:  9.000   9:30:00 PM:  78   
##  Max.   :2036.0   (Other)   :6896   Max.   :128.000   7:05:00 PM:  75   
##                   NA's      : 381   NA's   :381       (Other)   :7185   
##       ARRIVAL_TIME  ARRIVAL_DELAY       DIVERTED          CANCELLED      
##  7:23:00 PM :  23   Min.   :-56.00   Min.   :0.000000   Min.   :0.00000  
##  10:25:00 AM:  20   1st Qu.:-12.00   1st Qu.:0.000000   1st Qu.:0.00000  
##  6:06:00 PM :  20   Median : -2.00   Median :0.000000   Median :0.00000  
##  10:53:00 AM:  19   Mean   : 10.65   Mean   :0.001803   Mean   :0.04869  
##  10:58:00 AM:  19   3rd Qu.: 16.00   3rd Qu.:0.000000   3rd Qu.:0.00000  
##  (Other)    :7281   Max.   :716.00   Max.   :1.000000   Max.   :1.00000  
##  NA's       : 381   NA's   :392                                          
##  CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY   
##  A   :  71           Min.   :  0.00   Min.   :0.00   Min.   :  0.00  
##  B   : 180           1st Qu.:  0.00   1st Qu.:0.00   1st Qu.:  0.00  
##  C   : 127           Median : 10.00   Median :0.00   Median :  0.00  
##  NA's:7385           Mean   : 16.35   Mean   :0.01   Mean   : 17.06  
##                      3rd Qu.: 23.00   3rd Qu.:0.00   3rd Qu.: 18.00  
##                      Max.   :258.00   Max.   :9.00   Max.   :435.00  
##                      NA's   :5814     NA's   :5814   NA's   :5814    
##  LATE_AIRCRAFT_DELAY WEATHER_DELAY    
##  Min.   :  0.00      Min.   :  0.000  
##  1st Qu.:  0.00      1st Qu.:  0.000  
##  Median :  2.00      Median :  0.000  
##  Mean   : 23.24      Mean   :  3.873  
##  3rd Qu.: 30.00      3rd Qu.:  0.000  
##  Max.   :435.00      Max.   :638.000  
##  NA's   :5814        NA's   :5814

For FLIGHT_DATE, I observed that the data was only collected in Jan, Feb and March 2015.

For AIRLINE, there are 5 total airlines in the dataset, and Delta having the most inputs.

For Origin airport, I want to see how many total airports are in the data set.

summary(CVG.Flights$ORIGIN_AIRPORT)
##  ATL  BDL  BNA  BOS  CLT  CVG  DEN  DFW  DTW  EWR  FLL  GRR  IAD  IAH  JFK  LAS 
##  412    3  100   58   31 3886  142  458   31  299  100   81    6  247   69   79 
##  LAX  LGA  MCI  MCO  MEM  MIA  MKE  MSN  MSP  ORD  PHX  PIT  RDU  RSW  SAN  SEA 
##   87   53    1  196   16  134   54   33   85  727   41    3    4   72    4    4 
##  SFO  SLC  STL  TPA  TTN  XNA 
##   47   92   29   59    3   17

This summary tells me there are 38 total airports in the data set.

For FLIGHT_NUMBER, the highest was 6297, meaning there was 6297 different flight numbers used in the dataset.

For DESTINATION_AIRPORT, I wanted to something similar to ORIGIN_AIRPORT to find the total number of airports used as destinations.

summary(CVG.Flights$DESTINATION_AIRPORT)
##  ATL  BDL  BNA  BOS  CLT  CVG  DEN  DFW  DTW  EWR  FLL  GRR  IAD  IAH  JFK  LAS 
##  413    3   99   59   31 3877  118  460   31  294  102   80    4  242   68   80 
##  LAX  LGA  MCI  MCO  MEM  MIA  MKE  MSN  MSP  ORD  PHX  PIT  RDU  RSW  SAN  SEA 
##   88   52    1  197   16  133   57   33   81  762   43    3    5   73    3    4 
##  SFO  SLC  STL  TPA  TTN  XNA 
##   47   92   30   61    3   18

Again, there is a total of 38 different airport destinations, mirroring the origin airport.

For SCHEDULED_DEPARTURE, we can see that 4:15 PM is the most common departure time for aircraft at CVG.

For DEPARTURE_TIME, the most common time listed is 9:28 am. This means that planes scheduled for 4:30 rarely make their departure on time.

DEPARTURE_DELAY tells us that the average delay across all the airports is 12.35 minutes.

TAXI_OUT tells us that the average time is takes to taxi across all airports is 18.52 minutes.

WHEELS_OFF, unlike DEPARTURE_TIME is the exact moment the plane leaves the ground, THis means the most common time a plane takes off in these airports is 9:41 am.

FLIGHT_TIME for these aircraft have an average 127 mins of time in the air.

SCHEDULED_TIME for these aircraft go no lower than 60 minutes. This means that these airports do not have anyflights which would take less than one hour in the air.

AIR_TIME is the time a plane is actually in the air, not including taxi time and other things that make it wait. The average time a plane is in the air is 100.4 minutes.

The average DISTANCE planes fly from these airports is 589 miles.

1:13 am is the most often time planes touchdown after a flight.

On average, it takes a little under 9 minutes for planes to taxi in after touchdown.

The most common scheduled arrival time is 7:24, which mirrors the most common takeoff time of 4:30 pm.

The most common arrival time is 7:23, which makes the SCHEDULED_ARRIVAL time a very accurate predictor.

On average, aircraft arrive two minutes ahead of schedule.

for DIVERTED, I want to take a summary.

summary(CVG.Flights$DIVERTED)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.000000 0.000000 0.001803 0.000000 1.000000

This tells us that DIVERTED is a dummy variable, with 0 meaning no, and 1 meaning yes if a plane was diverted.

This is also true with CANCELLED, with 1 being yes and 0 being no.

CANCELLATION_REASON has three different levels: A, B and C. These are code for the three types of cancellation reasons the airports use.

AIR_SYSTEM_DELAY on average causes 16.25 minutes of delay.

SECURITY_DELAY does not happen often, but when it does, it causes at most 9 minutes of delay.

AIRLINE_DELAY causes on average 17.06 minutes of delay.

LATE_AIRCRAFT_DELAY causes 23.24 minutes of delay on average.

Finally, weather delays cause an average of 3.87 minutes of delay, and at most cause a delay of 638 minutes.

Question 4. Visualize the association between some variable pairs of your choice. For any two variables, you can check to see if there is an interesting relationship worth mentioning. If so, you can explore further and visualize what you have found.

I want to look at the variable between DISTANCE and AIR_TIME. I anticipate a very strong correlation between them, as longer distance flights take more time. To visualize his data, I will use a scatter plot.

ggplot(CVG.Flights, aes(AIR_TIME, DISTANCE)) +
  geom_point(alpha = 0.5) +
  ggtitle("Distance v. Air Time") +
  xlab("Air Time") +
  ylab("Flight Distance")
## Warning: Removed 392 rows containing missing values (geom_point).

We can see there is definitely a positive linear relationship between AIR_TIME and DISTANCE. However, it can be seen that for each distance, there is a large number of different Air times. This could be caused by jetstreams from going east to west, and other factors such as landing delays.

Question 5. Visualize the association between some variable pairs (of your choice) conditional on some other variables (of your choice). This is similar to the previous questions but your visualization involves more than two variables.

For this next relationship, I wanted to see if longer scheduled flights have a longer delay time to leave the airport. This would be interesting as longer flights may have more passengers and prep that go into each flight. Within this comparison, I will also be comparing these flights by each airline, to see if we can see more in this relationship.

ggplot(CVG.Flights, aes(x = SCHEDULED_TIME, y = DEPARTURE_DELAY, col = AIRLINE)) +
  geom_point() +
  ggtitle("Scheduled Time and Delay by AIrline") +
  xlab("Scheduled Flight Time") +
  ylab("Departure Delay")
## Warning: Removed 370 rows containing missing values (geom_point).

We can see with this scatter plot that flights with a shorter flight time seem to have more instances of long delay. While these flights are more common, they still show large signs of high median delays compared to the longest of flight times. We can also see by airline that DL looks to have the lowest average delay time across the highest scheduled time of flights, while it is about the same from flights below 150 scheduled flight time.

Question 6. For airlines.csv and airports,csv data sets, how many variables and how many observations in the data? Are there missing values? If so, visualize the missing data pattern.

For the airlines.cvs, I will first check the rows and columns.

ncol(airlines)
## [1] 2
nrow(airlines)
## [1] 14

For this output, we can see there are 2 variables and 14 observations in the airlines dataset.

Now to check the number of NAs in the dataset, I will do the sum function.

sum(is.na(airlines))
## [1] 0

This output tells me there is 0 NAs in the airlines dataset.

For airports.csv, I will again check the number of rows and columns to see the number of variables and observations.

ncol(airports)
## [1] 7
nrow(airports)
## [1] 322

This output tells me there are 7 variables and 322 observations.

To check the NA values, I will again sum function to check NA.

sum(is.na(airports))
## [1] 6

This output tells me there are 6 NA values in the airport dataset. To visualize this missing data, I will use the colsum function to find where the NA values are.

colSums(is.na(airports))
## IATA_CODE   AIRPORT      CITY     STATE   COUNTRY  LATITUDE LONGITUDE 
##         0         0         0         0         0         3         3

We can see that both LATITUDE and LONGITUDE each have 3 missing values. To visualize this even further, I will make a simple barplot to show the NA values.

barplot(colSums(is.na(airports)), las = 2, main = "Number of Missing Values", xlab = "Variables", ylab = "Number")

Question 7. Merge all three data sets CVG_Flights.csv, airlines.csv, and airports,csv according to the IATA code for airlines and airports.

To merege these three datasets, I will be using the left_join function to create new dfs. I will call this final dataset with all three flightsmerged.

flightswairlines <- left_join(CVG.Flights, airlines, by = c("AIRLINE" = "IATA_CODE"))
flightsworigins <- left_join(flightswairlines, airports, by = c("ORIGIN_AIRPORT" = "IATA_CODE"))
flightsmerged <- left_join(flightsworigins, airports, by = c("DESTINATION_AIRPORT" = "IATA_CODE"))

This new flightsmerged data frame tells me I have successfully merged all three data sets by IATA_CODE. Here is a quick summary visualizing the merger.

summary(flightsmerged)
##     FLIGHT_DATE      AIRLINE     FLIGHT_NUMBER  ORIGIN_AIRPORT
##  3/2/2015 : 139   DL     :2289   Min.   :  62   CVG    :3886  
##  1/2/2015 : 136   MQ     :2080   1st Qu.:1908   ORD    : 727  
##  3/5/2015 : 135   EV     :1553   Median :3246   DFW    : 458  
##  2/13/2015: 134   OO     :1326   Mean   :3194   ATL    : 412  
##  2/23/2015: 134   F9     : 515   3rd Qu.:4536   EWR    : 299  
##  3/6/2015 : 134   AA     :   0   Max.   :6297   IAH    : 247  
##  (Other)  :6951   (Other):   0                  (Other):1734  
##  DESTINATION_AIRPORT SCHEDULED_DEPARTURE    DEPARTURE_TIME DEPARTURE_DELAY 
##  CVG    :3877        4:15:00 PM: 210     9:28:00 AM:  58   Min.   :-24.00  
##  ORD    : 762        9:30:00 AM: 190     9:27:00 AM:  54   1st Qu.: -5.00  
##  DFW    : 460        6:00:00 AM: 179     9:25:00 AM:  40   Median : -1.00  
##  ATL    : 413        6:00:00 PM: 106     9:30:00 AM:  40   Mean   : 12.35  
##  EWR    : 294        4:00:00 PM: 104     9:29:00 AM:  35   3rd Qu.: 12.00  
##  IAH    : 242        8:30:00 AM: 103     (Other)   :7166   Max.   :708.00  
##  (Other):1715        (Other)   :6871     NA's      : 370   NA's   :370     
##     TAXI_OUT           WHEELS_OFF   SCHEDULED_TIME   ELAPSED_TIME  
##  Min.   :  1.00   9:41:00 AM:  35   Min.   : 60.0   Min.   : 49.0  
##  1st Qu.: 12.00   4:24:00 PM:  33   1st Qu.: 85.0   1st Qu.: 83.0  
##  Median : 15.00   9:43:00 AM:  32   Median :126.0   Median :123.0  
##  Mean   : 18.52   9:40:00 AM:  28   Mean   :128.1   Mean   :127.3  
##  3rd Qu.: 21.00   4:20:00 PM:  27   3rd Qu.:156.0   3rd Qu.:152.0  
##  Max.   :122.00   (Other)   :7230   Max.   :313.0   Max.   :340.0  
##  NA's   :378      NA's      : 378                   NA's   :392    
##     AIR_TIME        DISTANCE           WHEELS_ON       TAXI_IN       
##  Min.   : 36.0   Min.   : 229.0   1:13:00 AM:  98   Min.   :  1.000  
##  1st Qu.: 56.0   1st Qu.: 308.0   1:05:00 AM:  97   1st Qu.:  5.000  
##  Median : 99.0   Median : 589.0   1:12:00 AM:  97   Median :  6.000  
##  Mean   :100.4   Mean   : 675.9   1:19:00 AM:  97   Mean   :  8.448  
##  3rd Qu.:125.0   3rd Qu.: 871.0   1:25:00 AM:  97   3rd Qu.:  9.000  
##  Max.   :319.0   Max.   :2036.0   (Other)   :6896   Max.   :128.000  
##  NA's   :392                      NA's      : 381   NA's   :381      
##   SCHEDULED_ARRIVAL      ARRIVAL_TIME  ARRIVAL_DELAY       DIVERTED       
##  7:24:00 PM: 139    7:23:00 PM :  23   Min.   :-56.00   Min.   :0.000000  
##  7:25:00 PM: 120    10:25:00 AM:  20   1st Qu.:-12.00   1st Qu.:0.000000  
##  3:30:00 PM:  87    6:06:00 PM :  20   Median : -2.00   Median :0.000000  
##  8:39:00 PM:  79    10:53:00 AM:  19   Mean   : 10.65   Mean   :0.001803  
##  9:30:00 PM:  78    10:58:00 AM:  19   3rd Qu.: 16.00   3rd Qu.:0.000000  
##  7:05:00 PM:  75    (Other)    :7281   Max.   :716.00   Max.   :1.000000  
##  (Other)   :7185    NA's       : 381   NA's   :392                        
##    CANCELLED       CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY
##  Min.   :0.00000   A   :  71           Min.   :  0.00   Min.   :0.00  
##  1st Qu.:0.00000   B   : 180           1st Qu.:  0.00   1st Qu.:0.00  
##  Median :0.00000   C   : 127           Median : 10.00   Median :0.00  
##  Mean   :0.04869   NA's:7385           Mean   : 16.35   Mean   :0.01  
##  3rd Qu.:0.00000                       3rd Qu.: 23.00   3rd Qu.:0.00  
##  Max.   :1.00000                       Max.   :258.00   Max.   :9.00  
##                                        NA's   :5814     NA's   :5814  
##  AIRLINE_DELAY    LATE_AIRCRAFT_DELAY WEATHER_DELAY    
##  Min.   :  0.00   Min.   :  0.00      Min.   :  0.000  
##  1st Qu.:  0.00   1st Qu.:  0.00      1st Qu.:  0.000  
##  Median :  0.00   Median :  2.00      Median :  0.000  
##  Mean   : 17.06   Mean   : 23.24      Mean   :  3.873  
##  3rd Qu.: 18.00   3rd Qu.: 30.00      3rd Qu.:  0.000  
##  Max.   :435.00   Max.   :435.00      Max.   :638.000  
##  NA's   :5814     NA's   :5814        NA's   :5814     
##                         AIRLINE.y   
##  Delta Air Lines Inc.        :2289  
##  American Eagle Airlines Inc.:2080  
##  Atlantic Southeast Airlines :1553  
##  Skywest Airlines Inc.       :1326  
##  Frontier Airlines Inc.      : 515  
##  Alaska Airlines Inc.        :   0  
##  (Other)                     :   0  
##                                               AIRPORT.x   
##  Cincinnati/Northern Kentucky International Airport:3886  
##  Chicago O'Hare International Airport              : 727  
##  Dallas/Fort Worth International Airport           : 458  
##  Hartsfield-Jackson Atlanta International Airport  : 412  
##  Newark Liberty International Airport              : 299  
##  George Bush Intercontinental Airport              : 247  
##  (Other)                                           :1734  
##                CITY.x        STATE.x     COUNTRY.x    LATITUDE.x   
##  Covington        :3886   KY     :3886   USA:7763   Min.   :25.79  
##  Chicago          : 727   IL     : 727              1st Qu.:36.12  
##  Dallas-Fort Worth: 458   TX     : 705              Median :39.05  
##  Atlanta          : 412   FL     : 561              Mean   :37.64  
##  Newark           : 299   GA     : 412              3rd Qu.:39.05  
##  Houston          : 247   NJ     : 302              Max.   :47.45  
##  (Other)          :1734   (Other):1170                             
##   LONGITUDE.x                                                   AIRPORT.y   
##  Min.   :-122.37   Cincinnati/Northern Kentucky International Airport:3877  
##  1st Qu.: -87.90   Chicago O'Hare International Airport              : 762  
##  Median : -84.66   Dallas/Fort Worth International Airport           : 460  
##  Mean   : -87.06   Hartsfield-Jackson Atlanta International Airport  : 413  
##  3rd Qu.: -84.66   Newark Liberty International Airport              : 294  
##  Max.   : -71.01   George Bush Intercontinental Airport              : 242  
##                    (Other)                                           :1715  
##                CITY.y        STATE.y     COUNTRY.y    LATITUDE.y   
##  Covington        :3877   KY     :3877   USA:7763   Min.   :25.79  
##  Chicago          : 762   IL     : 762              1st Qu.:36.08  
##  Dallas-Fort Worth: 460   TX     : 702              Median :39.05  
##  Atlanta          : 413   FL     : 566              Mean   :37.64  
##  Newark           : 294   GA     : 413              3rd Qu.:39.05  
##  Houston          : 242   NJ     : 297              Max.   :47.45  
##  (Other)          :1715   (Other):1146                             
##   LONGITUDE.y     
##  Min.   :-122.37  
##  1st Qu.: -87.90  
##  Median : -84.66  
##  Mean   : -87.02  
##  3rd Qu.: -84.66  
##  Max.   : -71.01  
##