How many tornadoes occurred in WA (Washington) and MS (Mississippi)?
sum(st=="WA" | st == "MS")
## [1] 530
How many tornadoes occurred in WA before 2012?
sum(st=="WA" & yr <= 2012 )
## [1] 13
Get all tornadoes that occurred in WA in 2011, 2013, and 2014.
sum(st=="WA" & yr %in% c(2011,2013,2014))
## [1] 7
The List is as follows:
filter(df_t, st=="WA" & yr %in% c(2011,2013,2014))
## yr mo dy date time tz st stf stn f
## 1 2011 5 27 2011-05-27 14:50:00 3 WA 53 1 EF-0
## 2 2011 10 5 2011-10-05 16:19:00 3 WA 53 2 EF-0
## 3 2013 3 21 2013-03-21 18:00:00 3 WA 53 0 EF-0
## 4 2013 9 30 2013-09-30 08:20:00 3 WA 53 0 EF-1
## 5 2014 4 27 2014-04-27 18:30:00 3 WA 53 0 EF-0
## 6 2014 8 13 2014-08-13 19:30:00 3 WA 53 0 EF-0
## 7 2014 10 23 2014-10-23 14:40:00 3 WA 53 0 EF-1
Which ’mo’ (month) had the most tornadoes in HI (Hawaii)? How do you know.
month.name[max((df_t[st=="HI",] %>% group_by(mo)%>% tally())$n)]
## [1] "February"
There were two tornadoes in the month of Feb, other months only had one each (as seen in table below).
df_t %>% group_by(st, mo) %>% filter(st=="HI") %>% tally()
## # A tibble: 5 × 3
## # Groups: st [1]
## st mo n
## <chr> <int> <int>
## 1 HI 2 2
## 2 HI 3 1
## 3 HI 4 1
## 4 HI 9 1
## 5 HI 12 1
Get the count of tornadoes by year.
df_t %>% group_by(yr) %>% tally()
## # A tibble: 9 × 2
## yr n
## <int> <int>
## 1 2007 1116
## 2 2008 1738
## 3 2009 1182
## 4 2010 1315
## 5 2011 1777
## 6 2012 956
## 7 2013 939
## 8 2014 906
## 9 2015 1212
Find the carrier (UniqueCarrier), the flight number (FlightNum), and origin (origin) of all the flights that: Departed (DepTime) after 9pm and flew (Dest) to Nashville (’BNA’).
df_a %>%
select(UniqueCarrier, FlightNum, Origin) %>%
filter(DepTime>2100 & Dest=="BNA")
## UniqueCarrier FlightNum Origin
## 1 WN 2181 BWI
## 2 OH 5421 CVG
## 3 DL 938 ATL
## 4 AA 2435 ORD
## 5 DH 7332 ORD
Find the carrier (UniqueCarrier), the flight number (FlightNum), and origin (origin) of all the flights that: departed (DepTime) after 9pm, but Originated (Origin) from Nashville ’BNA’ or landed (Dest) in Memphis ’MEM’.
df_a %>%
select(UniqueCarrier, FlightNum, Origin) %>%
filter(DepTime>2100 & (Origin=="BNA" | Dest =="MEM"))
## UniqueCarrier FlightNum Origin
## 1 WN 1447 BNA
## 2 WN 2658 BNA
## 3 WN 1511 BNA
## 4 WN 1732 BNA
## 5 WN 2769 BNA
## 6 WN 459 BNA
## 7 OO 6838 ORD
## 8 DL 579 ATL
## 9 FL 1723 ATL
## 10 NW 279 DTW
## 11 DH 6270 CVG
Find the carrier (UniqueCarrier), the flight number (FlightNum), and origin (origin) of all the flights that: were early (ArrDelay) and strictly less than 8 minutes of travel (AirTime). Note that a negative delay is a flight that arrived early.
df_a %>%
select(UniqueCarrier, FlightNum, Origin) %>%
filter(ArrDelay<0 & AirTime<8)
## UniqueCarrier FlightNum Origin
## 1 OH 5495 ATL
## 2 OH 5106 SWF
## 3 OH 5165 FNT
## 4 OH 5340 BOS
## 5 OH 5379 DCA
## 6 OH 5791 HOU
## 7 OH 5949 ROC
## 8 OH 5762 SBN
## 9 OH 5242 PIT
## 10 OH 5945 SLC
## 11 EV 4197 ATL
## 12 EV 4527 ATL
## 13 EV 4588 ATL
## 14 EV 4755 ATL
## 15 EV 4784 DFW
## 16 EV 4909 CVG
## 17 HA 90 LNY
## 18 DH 7459 TYS
Find the carrier (UniqueCarrier), the flight number (FlightNum), and origin (origin) of all the flights that: arrived (ArrDelay) more than two hours late, but didn’t leave late (DepDelay).
df_a %>%
select(UniqueCarrier, FlightNum, Origin) %>%
filter(ArrDelay>120 & DepDelay<=0)
## UniqueCarrier FlightNum Origin
## 1 TZ 505 PIE
Arrange: Sort flights to find the 5 least delayed (DepDelay) flights
df_a %>%
arrange(-desc(DepDelay)) %>%
slice(1:5)
## Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime
## 1 2004 3 25 4 2047 2113 2225 2227
## 2 2004 3 25 4 911 935 1053 1109
## 3 2004 3 25 4 1053 1115 1129 1159
## 4 2004 3 25 4 2109 2130 2159 2218
## 5 2004 3 25 4 1015 1035 1226 1243
## UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime
## 1 MQ 3647 N817MQ 98 74 52
## 2 OH 5260 N933CA 102 94 75
## 3 NW 707 N986US 36 44 28
## 4 AA 1685 N4XGAA 50 48 36
## 5 NW 1926 N9346 71 68 46
## ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled
## 1 -2 -26 DFW SGF 364 3 43 0
## 2 -16 -24 RIC CVG 413 1 26 0
## 3 -30 -22 GTF FCA 146 2 6 0
## 4 -19 -21 HOU AUS 148 5 9 0
## 5 -17 -20 MDW DTW 229 12 13 0
## CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
## LateAircraftDelay
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
Arrange: Sort flights to find the 5 least delayed (ArrDelay) flights
df_a %>%
arrange(-desc(ArrDelay)) %>%
slice(1:5)
## Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime
## 1 2004 3 25 4 600 605 751 843
## 2 2004 3 25 4 557 600 722 811
## 3 2004 3 25 4 1530 1540 1720 1809
## 4 2004 3 25 4 1355 1401 1702 1750
## 5 2004 3 25 4 1647 1651 1932 2020
## UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime
## 1 OH 5071 N374CA 111 158 88
## 2 OH 5710 N473CA 85 131 75
## 3 OH 5082 N510CA 110 149 95
## 4 OO 6704 N964SW 121 169 314
## 5 DL 1181 N375DA 225 269 205
## ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled
## 1 -52 -5 BUF ATL 712 10 13 0
## 2 -49 -3 BGM CVG 506 2 8 0
## 3 -49 -10 BUF ATL 712 6 9 0
## 4 -48 0 MFR DEN 964 5 6 0
## 5 -48 -4 BOS DFW 1562 5 15 0
## CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
## LateAircraftDelay
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
Transform using the Airline data: create a new data frame with only the columns ’DepDelay’, ’ArrDelay’, ’Origin’, ’Dest’, AirTime’, and ’Distance’. In addition keep only the observation for flights that were delayed (DepDelay) by more than 1 hour.
new_df_b1 <- subset(df_a, DepDelay>60)
new_df_b <-select(new_df_b1,c(DepDelay, ArrDelay, Origin, Dest, AirTime, Distance))
Add a new column reporting departure delays minus the median departure delay to the new data frame
med_diff <- new_df_b$DepDelay - median(new_df_b$DepDelay)
new_df_b01 <- cbind(new_df_b,med_diff)
head(kable(new_df_b01),10)
## [1] "| | DepDelay| ArrDelay|Origin |Dest | AirTime| Distance| med_diff|"
## [2] "|:-----|--------:|--------:|:------|:----|-------:|--------:|--------:|"
## [3] "|8 | 112| 102|DEN |SNA | 116| 846| 17|"
## [4] "|23 | 65| 78|ORD |ROC | 72| 528| -30|"
## [5] "|44 | 73| 72|IAH |ORD | 139| 925| -22|"
## [6] "|65 | 164| 144|BUF |ORD | 79| 473| 69|"
## [7] "|66 | 130| 138|ORD |MSP | 57| 334| 35|"
## [8] "|78 | 122| 144|ORD |SAN | 238| 1723| 27|"
## [9] "|86 | 65| 180|CMH |ORD | 71| 296| -30|"
## [10] "|107 | 86| 79|LAX |DEN | 109| 862| -9|"
Convert the departure and arrival delays columns from minutes into hours
conv_depdel_hrs <- substr(times((new_df_b01$DepDelay%/%60 + new_df_b01$DepDelay%%60 /60)/24), 1, 5)
conv_arrdel_hrs <- substr(times((new_df_b01$ArrDelay%/%60 + new_df_b01$ArrDelay%%60 /60)/24), 1, 5)
new_df_b02 <- subset(new_df_b01, select = -c(1:2))
new_df_b03 <- cbind(new_df_b02,conv_arrdel_hrs,conv_depdel_hrs)
head(kable(new_df_b03),10)
## [1] "| |Origin |Dest | AirTime| Distance| med_diff|conv_arrdel_hrs |conv_depdel_hrs |"
## [2] "|:-----|:------|:----|-------:|--------:|--------:|:---------------|:---------------|"
## [3] "|8 |DEN |SNA | 116| 846| 17|01:42 |01:52 |"
## [4] "|23 |ORD |ROC | 72| 528| -30|01:18 |01:05 |"
## [5] "|44 |IAH |ORD | 139| 925| -22|01:12 |01:13 |"
## [6] "|65 |BUF |ORD | 79| 473| 69|02:24 |02:44 |"
## [7] "|66 |ORD |MSP | 57| 334| 35|02:18 |02:10 |"
## [8] "|78 |ORD |SAN | 238| 1723| 27|02:24 |02:02 |"
## [9] "|86 |CMH |ORD | 71| 296| -30|03:00 |01:05 |"
## [10] "|107 |LAX |DEN | 109| 862| -9|01:19 |01:26 |"
Add a column with the average flight speed (in mph)
avg_flt_spd_in_mph <- new_df_b1$Distance / new_df_b1$AirTime * 60
new_df_b04 <- cbind(new_df_b03,avg_flt_spd_in_mph)
head(kable(new_df_b04),10)
## [1] "| |Origin |Dest | AirTime| Distance| med_diff|conv_arrdel_hrs |conv_depdel_hrs | avg_flt_spd_in_mph|"
## [2] "|:-----|:------|:----|-------:|--------:|--------:|:---------------|:---------------|------------------:|"
## [3] "|8 |DEN |SNA | 116| 846| 17|01:42 |01:52 | 437.58621|"
## [4] "|23 |ORD |ROC | 72| 528| -30|01:18 |01:05 | 440.00000|"
## [5] "|44 |IAH |ORD | 139| 925| -22|01:12 |01:13 | 399.28058|"
## [6] "|65 |BUF |ORD | 79| 473| 69|02:24 |02:44 | 359.24051|"
## [7] "|66 |ORD |MSP | 57| 334| 35|02:18 |02:10 | 351.57895|"
## [8] "|78 |ORD |SAN | 238| 1723| 27|02:24 |02:02 | 434.36975|"
## [9] "|86 |CMH |ORD | 71| 296| -30|03:00 |01:05 | 250.14085|"
## [10] "|107 |LAX |DEN | 109| 862| -9|01:19 |01:26 | 474.49541|"
Report your new table using the function summary()
summary(new_df_b04)
## Origin Dest AirTime Distance
## Length:783 Length:783 Min. :-1395.0 Min. : 56.0
## Class :character Class :character 1st Qu.: 55.0 1st Qu.: 316.0
## Mode :character Mode :character Median : 90.0 Median : 590.0
## Mean : 101.1 Mean : 679.2
## 3rd Qu.: 134.0 3rd Qu.: 867.0
## Max. : 346.0 Max. :2704.0
## med_diff conv_arrdel_hrs conv_depdel_hrs avg_flt_spd_in_mph
## Min. :-34.00 Length:783 Length:783 Min. :-11.35
## 1st Qu.:-21.00 Class :character Class :character 1st Qu.:328.80
## Median : 0.00 Mode :character Mode :character Median :393.05
## Mean : 10.97 Mean :378.69
## 3rd Qu.: 29.00 3rd Qu.:442.68
## Max. :985.00 Max. :543.11