Prep

I’ve read the assignment thoroughly and know what is required to complete all three projects. As always, before I begin the task, I have to make sure all necessary packages are installed and loaded.

These include:

1. dplyr
2. magrittr
3. utils
4. dataset
5. nycflights13
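A minimal sketch of the install-and-load step, in case it helps to see it written out. It assumes the list's "dataset" refers to the base `datasets` package; `utils` and `datasets` ship with R and are attached by default, so only the three add-on packages are handled here. The variable names `pkgs` and `missing_pkgs` are illustrative.

```r
# Add-on packages from the list above (utils and datasets come with base R).
pkgs <- c("dplyr", "magrittr", "nycflights13")

# Find the packages that are not installed yet.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing, then attach everything.
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
invisible(lapply(pkgs, library, character.only = TRUE))
```

Checking with `requireNamespace()` first avoids reinstalling packages that are already present each time the report is knitted.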

Next, I load the data set “flights.” This is a large data set, so I don’t want to print it all. I create a tibble so I am able to view just a snapshot.

# Store flights as a tibble so printing shows only the first rows
flts.tbl <- as_tibble(flights)
flts.tbl
## # A tibble: 336,776 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>
##  1  2013     1     1      517            515         2      830
##  2  2013     1     1      533            529         4      850
##  3  2013     1     1      542            540         2      923
##  4  2013     1     1      544            545        -1     1004
##  5  2013     1     1      554            600        -6      812
##  6  2013     1     1      554            558        -4      740
##  7  2013     1     1      555            600        -5      913
##  8  2013     1     1      557            600        -3      709
##  9  2013     1     1      557            600        -3      838
## 10  2013     1     1      558            600        -2      753
## # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
## #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>

Next, I run the commands.

Project 1

Task

Display the minimum, maximum, and average flight time and the average distance traveled for all United Airlines (UA) flights departing JFK during March 2013.

I think I’ll have to complete the Project 1 tasks in three steps:

flts.tbl %>%
  select(carrier, origin, month, air_time, distance) %>%
  filter(month==3) %>%
  summarise_each(funs(min(., na.rm=TRUE), max(., na.rm=TRUE)), matches("air_time"))
## `summarise_each()` is deprecated.
## Use `summarise_all()`, `summarise_at()` or `summarise_if()` instead.
## To map `funs` over a selection of variables, use `summarise_at()`
## # A tibble: 1 x 2
##   air_time_min air_time_max
##          <dbl>        <dbl>
## 1           21          695
flts.tbl %>%
  select(carrier, origin, month, air_time, distance) %>%
  filter(month==3) %>%
  summarise(avg_flttime = mean(air_time, na.rm=TRUE))
## # A tibble: 1 x 1
##   avg_flttime
##         <dbl>
## 1     149.077
flts.tbl %>%
  select(carrier, origin, month, air_time, distance) %>%
  filter(month==3) %>%
  summarise(avg_dist = mean(distance, na.rm=TRUE))
## # A tibble: 1 x 1
##   avg_dist
##      <dbl>
## 1 1011.987

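I’m having trouble with these commands. They run, but they filter only on month, so the results cover every carrier and origin in March rather than just UA flights departing JFK. I tried my best, did research, and watched videos, but I’m unable to complete this task.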
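One possible way to apply the restrictions the task names, offered as a sketch rather than a checked solution. It assumes the carrier code "UA" and the origin code "JFK" as given in the task, puts all three conditions in a single `filter()`, and computes all four statistics in one `summarise()` call, which also avoids the deprecated `summarise_each()`. The result name `ua_jfk_march` and the column names are illustrative.

```r
library(dplyr)
library(nycflights13)

ua_jfk_march <- flights %>%
  # Keep only United flights that left JFK in March
  filter(carrier == "UA", origin == "JFK", month == 3) %>%
  # air_time is in minutes, distance in miles; NAs (e.g. cancelled flights) are skipped
  summarise(
    min_air_time = min(air_time, na.rm = TRUE),
    max_air_time = max(air_time, na.rm = TRUE),
    avg_air_time = mean(air_time, na.rm = TRUE),
    avg_distance = mean(distance, na.rm = TRUE)
  )

ua_jfk_march
```

The `select()` step from the earlier attempts is not needed here, because `summarise()` returns only the columns it creates.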

Project 2

Display the minimum, maximum, and average departure delays in minutes for June 2013, grouped by airport.

*Some dep_delay records are negative, which means the flight left early. I have to include only actual delays and exclude early departures.

To find the average departure delay in minutes:

flts.tbl %>%
  select(month, origin, dep_delay) %>%
  filter(month==6, dep_delay>0) %>%
  group_by(origin) %>%
  summarise(avg_del = mean(dep_delay, na.rm=TRUE))
## # A tibble: 3 x 2
##   origin  avg_del
##    <chr>    <dbl>
## 1    EWR 47.92212
## 2    JFK 47.98522
## 3    LGA 54.96745

To find the minimum and maximum delays in minutes:

flts.tbl %>%
  select(month, origin, dep_delay) %>%
  filter(month==6, dep_delay>0) %>%
  group_by(origin) %>%
  summarise_each(funs(min(., na.rm=TRUE), max(., na.rm=TRUE)), matches("dep_delay"))
## `summarise_each()` is deprecated.
## Use `summarise_all()`, `summarise_at()` or `summarise_if()` instead.
## To map `funs` over a selection of variables, use `summarise_at()`
## # A tibble: 3 x 3
##   origin dep_delay_min dep_delay_max
##    <chr>         <dbl>         <dbl>
## 1    EWR             1           502
## 2    JFK             1          1137
## 3    LGA             1           803
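The two Project 2 pipelines can likely be merged into one. The sketch below assumes the same filter as above (June, `dep_delay > 0`) and computes all three statistics in a single `summarise()`, which also avoids the deprecated `summarise_each()`. The names `june_delays`, `min_delay`, `max_delay`, and `avg_delay` are illustrative.

```r
library(dplyr)
library(nycflights13)

june_delays <- flights %>%
  # dep_delay > 0 drops early and on-time departures; filter() also drops NA delays
  filter(month == 6, dep_delay > 0) %>%
  group_by(origin) %>%
  summarise(
    min_delay = min(dep_delay),
    max_delay = max(dep_delay),
    avg_delay = mean(dep_delay)
  )

june_delays
```

Because `filter()` keeps only rows where the condition is `TRUE`, rows with a missing `dep_delay` are already gone, so `na.rm = TRUE` is not required in the summary functions.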

Project 3

Display the minimum, maximum and average miles traveled per hour for United Airlines and American Airlines flights flying between all three airports and Chicago’s O’Hare International Airport in June, July and August 2013.

*air_time is recorded in minutes, not hours, so I have to divide it by 60 to get hours; miles per hour is then distance divided by hours in the air.
flts.tbl %>%
  select(month, origin, dest, air_time, carrier) %>%
  filter(month==6 | month==7 | month==8)%>%
  group_by(origin, dest, carrier) %>%
  summarise(avg_miles = mean(air_time/60, na.rm=TRUE)) %>%
  summarise_each(funs(min(., na.rm=TRUE), max(., na.rm=TRUE)), matches("air_time"))
## `summarise_each()` is deprecated.
## Use `summarise_all()`, `summarise_at()` or `summarise_if()` instead.
## To map `funs` over a selection of variables, use `summarise_at()`
## # A tibble: 201 x 2
## # Groups:   origin [?]
##    origin  dest
##     <chr> <chr>
##  1    EWR   ALB
##  2    EWR   ANC
##  3    EWR   ATL
##  4    EWR   AUS
##  5    EWR   AVL
##  6    EWR   BDL
##  7    EWR   BNA
##  8    EWR   BOS
##  9    EWR   BQN
## 10    EWR   BTV
## # ... with 191 more rows

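I am unable to select only UA and AA; I keep running into an error. The attempt above is also incomplete: it does not restrict the destination to O’Hare, it averages air_time in hours instead of computing miles per hour, and the final step finds no air_time column left to summarise, so only origin and dest are returned. I’ve researched and tried my best to find a solution, but no luck.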
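A possible approach to Project 3, sketched under a few assumptions: the carrier codes are "UA" and "AA", O’Hare is the destination code "ORD", and results are grouped by origin and carrier (the task does not say how to group, so that choice is a guess). The `%in%` operator tests membership in a set of values, which is one way to select two carriers at once. The names `ord_speed` and `mph` are illustrative, and `.groups = "drop"` needs dplyr 1.0.0 or later.

```r
library(dplyr)
library(nycflights13)

ord_speed <- flights %>%
  # Two carriers, one destination, three summer months
  filter(carrier %in% c("UA", "AA"), dest == "ORD", month %in% 6:8) %>%
  # distance is in miles, air_time in minutes, so convert minutes to hours first
  mutate(mph = distance / (air_time / 60)) %>%
  group_by(origin, carrier) %>%
  summarise(
    min_mph = min(mph, na.rm = TRUE),
    max_mph = max(mph, na.rm = TRUE),
    avg_mph = mean(mph, na.rm = TRUE),
    .groups = "drop"
  )

ord_speed
```

Dropping the `group_by()` line (and the `.groups` argument) would give a single row covering both carriers and all three airports, if that reading of the task is preferred.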