I’ve read the assignment thoroughly and know what is required to complete all three projects. As always, before I begin the task, I have to make sure all necessary packages are installed and loaded.
These include:
1. dplyr
2. magrittr
3. utils
4. datasets
5. nycflights13
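Here is a minimal sketch of the install-and-load step. It assumes the CRAN mirror at https://cloud.r-project.org is reachable. utils and datasets ship with base R and are attached by default, so only the add-on packages need to be installed and loaded.

```r
# Add-on packages used in this assignment (utils and datasets are attached by default).
pkgs <- c("dplyr", "magrittr", "nycflights13")

# Install only the packages that are not already available.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach each package so its functions (and the flights data) are on the search path.
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}
```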
Next, I load the “flights” data set from nycflights13. It is large (336,776 rows), so I don’t want to print all of it. As a tibble, it prints only the first 10 rows and the columns that fit on screen. That gives me a quick snapshot.
flts.tbl <- as_tibble(flights)
flts.tbl
## # A tibble: 336,776 x 19
## year month day dep_time sched_dep_time dep_delay arr_time
## <int> <int> <int> <int> <int> <dbl> <int>
## 1 2013 1 1 517 515 2 830
## 2 2013 1 1 533 529 4 850
## 3 2013 1 1 542 540 2 923
## 4 2013 1 1 544 545 -1 1004
## 5 2013 1 1 554 600 -6 812
## 6 2013 1 1 554 558 -4 740
## 7 2013 1 1 555 600 -5 913
## 8 2013 1 1 557 600 -3 709
## 9 2013 1 1 557 600 -3 838
## 10 2013 1 1 558 600 -2 753
## # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
## # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
## # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## # minute <dbl>, time_hour <dttm>
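Printing a tibble already limits the output to 10 rows. The sketch below shows two other standard ways to get a snapshot. glimpse() lists every column with its type, and head() keeps the first few rows. flights from nycflights13 is already a tibble, so converting it with as_tibble() (the current replacement for the deprecated tbl_df()) is harmless but optional.

```r
library(dplyr)
library(nycflights13)

# as_tibble() replaces the deprecated tbl_df(); flights is already a tibble.
flts.tbl <- as_tibble(flights)

# One line per column: name, type, and the first few values.
glimpse(flts.tbl)

# Keep only the first 5 rows for a quick look.
first_rows <- head(flts.tbl, 5)
print(first_rows)
```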
Next, I run the commands for each of the three projects.
Project 1: Display the minimum, maximum, and average flight time and the average distance traveled for all United Airlines (UA) flights departing JFK during March 2013.
I think I’ll have to complete the Project 1 tasks in three steps:
flts.tbl %>%
select(carrier, origin, month, air_time, distance) %>%
filter(month==3) %>%
summarise(air_time_min = min(air_time, na.rm=TRUE), air_time_max = max(air_time, na.rm=TRUE))
## # A tibble: 1 x 2
## air_time_min air_time_max
## <dbl> <dbl>
## 1 21 695
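summarise_each() and funs() are deprecated in current dplyr. Since dplyr 1.0.0, the documented replacement is across() inside summarise(). The sketch below uses it, and the default "{.col}_{.fn}" naming reproduces the air_time_min / air_time_max column names. toy_flights is a made-up tibble for illustration only. On the real data you would pipe flts.tbl instead.

```r
library(dplyr)

# Made-up rows for illustration only (not real flights data).
toy_flights <- tibble(
  month    = c(3, 3, 3, 4),
  air_time = c(40, NA, 300, 500)
)

air_time_range <- toy_flights %>%
  filter(month == 3) %>%
  # A named list of functions yields one column per function: air_time_min, air_time_max.
  summarise(across(
    air_time,
    list(min = ~ min(.x, na.rm = TRUE), max = ~ max(.x, na.rm = TRUE))
  ))

print(air_time_range)
```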
flts.tbl %>%
select(carrier, origin, month, air_time, distance) %>%
filter(month==3) %>%
summarise(avg_flttime = mean(air_time, na.rm=TRUE))
## # A tibble: 1 x 1
## avg_flttime
## <dbl>
## 1 149.077
flts.tbl %>%
select(carrier, origin, month, air_time, distance) %>%
filter(month==3) %>%
summarise(avg_dist = mean(distance, na.rm=TRUE))
## # A tibble: 1 x 1
## avg_dist
## <dbl>
## 1 1011.987
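I’m having trouble with these commands. The pipelines above filter only by month, so they cover all March flights rather than only UA flights departing JFK. I tried my best, did research, and watched videos, but I’m unable to complete this task.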
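Here is a minimal sketch of one way to cover all of Project 1 in a single pipeline. filter() accepts several comma-separated conditions that must all hold, so carrier, origin, and month can be restricted together. summarise() can then compute all four statistics at once. toy_flights is made up so the result can be checked by hand. Swap in flts.tbl for the real answer.

```r
library(dplyr)

# Made-up rows for illustration only; the column names match nycflights13::flights.
toy_flights <- tibble(
  carrier  = c("UA", "UA", "UA", "AA", "UA"),
  origin   = c("JFK", "JFK", "EWR", "JFK", "JFK"),
  month    = c(3, 3, 3, 3, 4),
  air_time = c(100, 200, 50, 300, 120),
  distance = c(1000, 2000, 500, 3000, 900)
)

ua_jfk_march <- toy_flights %>%
  # Comma-separated conditions are combined with AND.
  filter(carrier == "UA", origin == "JFK", month == 3) %>%
  summarise(
    min_air_time = min(air_time, na.rm = TRUE),
    max_air_time = max(air_time, na.rm = TRUE),
    avg_air_time = mean(air_time, na.rm = TRUE),
    avg_distance = mean(distance, na.rm = TRUE)
  )

print(ua_jfk_march)
```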
Project 2: Display the minimum, maximum, and average departure delays in minutes for June 2013, grouped by origin airport. *Some dep_delay values are negative, which means the flight left early. I have to include only delayed departures and exclude early ones, so I filter for dep_delay > 0.
To find the average departure delay in minutes:
flts.tbl %>%
select(month, origin, dep_delay) %>%
filter(month==6, dep_delay>0) %>%
group_by(origin) %>%
summarise(avg_del = mean(dep_delay, na.rm=TRUE))
## # A tibble: 3 x 2
## origin avg_del
## <chr> <dbl>
## 1 EWR 47.92212
## 2 JFK 47.98522
## 3 LGA 54.96745
To find the minimum and maximum delays in minutes:
flts.tbl %>%
select(month, origin, dep_delay) %>%
filter(month==6, dep_delay>0) %>%
group_by(origin) %>%
summarise(dep_delay_min = min(dep_delay, na.rm=TRUE), dep_delay_max = max(dep_delay, na.rm=TRUE))
## # A tibble: 3 x 3
## origin dep_delay_min dep_delay_max
## <chr> <dbl> <dbl>
## 1 EWR 1 502
## 2 JFK 1 1137
## 3 LGA 1 803
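The two Project 2 pipelines above can be merged into one, as the sketch below shows, since a single summarise() can return the minimum, maximum, and average per group. filter() keeps only rows where the condition is TRUE, so dep_delay > 0 also drops missing delays. toy_flights is made up for checking by hand. Use flts.tbl for the real result.

```r
library(dplyr)

# Made-up rows for illustration only; the column names match nycflights13::flights.
toy_flights <- tibble(
  month     = c(6, 6, 6, 6, 6, 7),
  origin    = c("EWR", "EWR", "EWR", "JFK", "JFK", "JFK"),
  dep_delay = c(10, -5, 30, NA, 8, 100)
)

june_delays <- toy_flights %>%
  # dep_delay > 0 drops early and on-time departures; NA rows are dropped too.
  filter(month == 6, dep_delay > 0) %>%
  group_by(origin) %>%
  summarise(
    min_delay = min(dep_delay),
    max_delay = max(dep_delay),
    avg_delay = mean(dep_delay)
  )

print(june_delays)
```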
Project 3: Display the minimum, maximum, and average miles traveled per hour for United Airlines (UA) and American Airlines (AA) flights flying between all three airports and Chicago’s O’Hare International Airport (ORD) in June, July, and August 2013.
*air_time is recorded in minutes, not hours, so I have to use mutate() to convert it to hours (divide by 60) and then divide distance by that value to get miles per hour.
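Written out as a formula, with distance in miles and air_time in minutes, the conversion the note describes is:

$$\text{mph} = \frac{\text{distance}}{\text{air\_time}/60} = \frac{60 \cdot \text{distance}}{\text{air\_time}}$$

For example, a hypothetical 720-mile flight with an air time of 120 minutes averages 60 × 720 / 120 = 360 miles per hour.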
flts.tbl %>%
select(month, origin, dest, air_time, carrier) %>%
filter(month==6 | month==7 | month==8)%>%
group_by(origin, dest, carrier) %>%
summarise(avg_miles = mean(air_time/60, na.rm=TRUE)) %>%
summarise_each(funs(min(., na.rm=TRUE), max(., na.rm=TRUE)), matches("air_time"))
## `summarise_each()` is deprecated.
## Use `summarise_all()`, `summarise_at()` or `summarise_if()` instead.
## To map `funs` over a selection of variables, use `summarise_at()`
## # A tibble: 201 x 2
## # Groups: origin [?]
## origin dest
## <chr> <chr>
## 1 EWR ALB
## 2 EWR ANC
## 3 EWR ATL
## 4 EWR AUS
## 5 EWR AVL
## 6 EWR BDL
## 7 EWR BNA
## 8 EWR BOS
## 9 EWR BQN
## 10 EWR BTV
## # ... with 191 more rows
I am unable to select only UA and AA flights. I keep running into an error. I’ve researched and tried my best to find a solution, but had no luck.
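Here is a minimal sketch of one way to approach Project 3. It assumes the question means NYC-to-ORD flights. nycflights13::flights contains only departures from EWR, JFK, and LGA, so "between" the three airports and O’Hare reduces to dest == "ORD". The sketch differs from the pipeline above in three ways:

- %in% tests membership in a set. carrier %in% c("UA", "AA") keeps both carriers, and month %in% 6:8 replaces the chain of | comparisons.
- mutate() creates an mph column before summarise(). In the code above, mean(air_time/60) is the average number of hours in the air, not a speed.
- The summary is computed directly from mph. In the code above, the first summarise() leaves no air_time column, so matches("air_time") selects nothing and only origin and dest remain in the output.

Grouping by carrier is an assumption; drop group_by() for a single overall row. toy_flights is made up for checking by hand.

```r
library(dplyr)

# Made-up rows for illustration only; the column names match nycflights13::flights.
toy_flights <- tibble(
  month    = c(6, 7, 8, 6, 9, 7),
  origin   = c("EWR", "JFK", "LGA", "LGA", "JFK", "EWR"),
  dest     = c("ORD", "ORD", "ORD", "ORD", "ORD", "ATL"),
  carrier  = c("UA", "AA", "UA", "DL", "UA", "AA"),
  air_time = c(120, 150, 120, 110, 130, 140),
  distance = c(720, 750, 840, 733, 740, 760)
)

ord_speeds <- toy_flights %>%
  # %in% keeps a row if the value matches any element of the set.
  filter(carrier %in% c("UA", "AA"), dest == "ORD", month %in% 6:8) %>%
  # air_time is in minutes, so divide by 60 to get hours before dividing distance.
  mutate(mph = distance / (air_time / 60)) %>%
  group_by(carrier) %>%
  summarise(
    min_mph = min(mph, na.rm = TRUE),
    max_mph = max(mph, na.rm = TRUE),
    avg_mph = mean(mph, na.rm = TRUE)
  )

print(ord_speeds)
```

One common source of an error when selecting two carriers is writing filter(carrier == "UA" | "AA"). Because == binds more tightly than |, R tries to combine a logical vector with the string "AA" and stops with an error. Writing carrier == "UA" | carrier == "AA", or carrier %in% c("UA", "AA"), avoids this.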