How many flights were carried out on the first day of January 2013?
## [1] 842
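The code that produced this count is not shown. A minimal sketch, assuming the data is the `flights` table from the nycflights13 package (which the column names in the later output suggest) and that dplyr is installed; the name `n_jan1` is illustrative only:

```r
# Assumption: the dataset is nycflights13::flights and dplyr is available.
library(dplyr)
library(nycflights13)

# Keep only the flights dated 1 January 2013, then count the rows.
n_jan1 <- flights %>%
  filter(year == 2013, month == 1, day == 1) %>%
  nrow()

n_jan1
```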
Which day in 2013 had the maximum number of flights from New York?
##     Month Day Total_flight
## 323    11  27         1014
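One possible way to find the busiest day, again assuming nycflights13 and dplyr. Every row of `flights` departs from a New York City airport, so no filter on `origin` should be needed. The column name `Total_flight` follows the output above; `busiest_day` is an illustrative name:

```r
# Assumption: the dataset is nycflights13::flights and dplyr (>= 1.0) is available.
library(dplyr)
library(nycflights13)

# Count flights per calendar day, then keep the day(s) with the largest count.
busiest_day <- flights %>%
  count(month, day, name = "Total_flight") %>%
  slice_max(Total_flight, n = 1)

busiest_day
```

`slice_max()` keeps ties by default, so more than one row would be returned if several days shared the maximum.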
List the 10 airline companies that operated the largest number of flights, showing the result in descending order.
## # A tibble: 10 x 2
##    carrier     n
##    <chr>   <int>
##  1 UA      58665
##  2 B6      54635
##  3 EV      54173
##  4 DL      48110
##  5 AA      32729
##  6 MQ      26397
##  7 US      20536
##  8 9E      18460
##  9 WN      12275
## 10 VX       5162
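A sketch of how such a ranking could be produced, assuming nycflights13 and dplyr; `top10` is an illustrative name:

```r
# Assumption: the dataset is nycflights13::flights and dplyr is available.
library(dplyr)
library(nycflights13)

# count(..., sort = TRUE) orders carriers by number of flights, largest first;
# head(10) then keeps the ten busiest carriers.
top10 <- flights %>%
  count(carrier, sort = TRUE) %>%
  head(10)

top10
```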
If we keep only the rows with no missing values in the variable “dep_delay” and name the result df_full_dep_delay, how many rows does this dataset have?
## # A tibble: 328,521 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <fct> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013 1         1      517            515         2      830            819
##  2  2013 1         1      533            529         4      850            830
##  3  2013 1         1      542            540         2      923            850
##  4  2013 1         1      544            545        -1     1004           1022
##  5  2013 1         1      554            600        -6      812            837
##  6  2013 1         1      554            558        -4      740            728
##  7  2013 1         1      555            600        -5      913            854
##  8  2013 1         1      557            600        -3      709            723
##  9  2013 1         1      557            600        -3      838            846
## 10  2013 1         1      558            600        -2      753            745
## # ... with 328,511 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
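The tibble header above gives the answer: 328,521 rows. A minimal sketch of the filtering step, assuming nycflights13 and dplyr (the output above shows `month` as a factor, a conversion this sketch does not reproduce):

```r
# Assumption: the dataset is nycflights13::flights and dplyr is available.
library(dplyr)
library(nycflights13)

# Drop every row whose departure delay is missing.
df_full_dep_delay <- flights %>%
  filter(!is.na(dep_delay))

nrow(df_full_dep_delay)
```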
Calculate the average departure delay for the 10 airline companies in the list created in question 3, and arrange the result in descending order of average delay.
## # A tibble: 16 x 2
##    carrier avg_delay_company
##    <chr>               <dbl>
##  1 US                   3.78
##  2 HA                   4.90
##  3 AS                   5.80
##  4 AA                   8.59
##  5 DL                   9.26
##  6 MQ                  10.6
##  7 UA                  12.1
##  8 OO                  12.6
##  9 VX                  12.9
## 10 B6                  13.0
## 11 9E                  16.7
## 12 WN                  17.7
## 13 FL                  18.7
## 14 YV                  19.0
## 15 EV                  20.0
## 16 F9                  20.2
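The output above covers all 16 carriers in ascending order, whereas the question asks for the 10 busiest carriers in descending order. A sketch that follows the question as worded, assuming nycflights13 and dplyr and assuming the delay in question is `dep_delay`; `top10_carriers` and `avg_delay_top10` are illustrative names:

```r
# Assumption: the dataset is nycflights13::flights and dplyr is available.
library(dplyr)
library(nycflights13)

# The ten carriers with the most flights (the list from question 3).
top10_carriers <- flights %>%
  count(carrier, sort = TRUE) %>%
  head(10) %>%
  pull(carrier)

# Mean departure delay per carrier, restricted to those ten carriers,
# ignoring rows with a missing delay and sorting from largest to smallest.
avg_delay_top10 <- flights %>%
  filter(!is.na(dep_delay), carrier %in% top10_carriers) %>%
  group_by(carrier) %>%
  summarise(avg_delay_company = mean(dep_delay)) %>%
  arrange(desc(avg_delay_company))

avg_delay_top10
```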
Create an empty vector (with no elements) named space_vector. Write a for loop that computes the square of each integer from 1 to 10 and stores the results in space_vector.
## [1] 1 4 9 16 25 36 49 64 81 100
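A minimal base R sketch of the loop. Starting from `c()` (which is `NULL`) is one way to read "empty vector"; `numeric(0)` would work equally well:

```r
# Start with an empty vector.
space_vector <- c()

# Store the square of each integer from 1 to 10 at the matching position.
for (i in 1:10) {
  space_vector[i] <- i^2
}

space_vector
```

Growing a vector inside a loop is fine at this size; for long loops, preallocating with `numeric(10)` avoids repeated copying.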
Create a function (count_missing) that counts the number of missing observations in a given vector. Use this function and a for loop to summarize the number of missing observations in every column of flights, and show the result.
## [1] 5
##          col_name n_missing
## 1            year         0
## 2           month         0
## 3             day         0
## 4        dep_time      8255
## 5  sched_dep_time         0
## 6       dep_delay      8255
## 7        arr_time      8713
## 8  sched_arr_time         0
## 9       arr_delay      9430
## 10        carrier         0
## 11         flight         0
## 12        tailnum      2512
## 13         origin         0
## 14           dest         0
## 15       air_time      9430
## 16       distance         0
## 17           hour         0
## 18         minute         0
## 19      time_hour         0
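A sketch of the function and the loop, assuming the nycflights13 package. The small test vector below is hypothetical and is not the one behind the `5` printed above; `missing_summary` is an illustrative name:

```r
# Assumption: the dataset is nycflights13::flights.
library(nycflights13)

# Count the NA entries in a vector.
count_missing <- function(x) {
  sum(is.na(x))
}

# Quick check on a hypothetical vector with two missing values.
count_missing(c(1, NA, 3, NA))

# Apply the function to each column in turn, storing one count per column.
n_missing <- integer(ncol(flights))
for (j in seq_along(flights)) {
  n_missing[j] <- count_missing(flights[[j]])
}

missing_summary <- data.frame(col_name = names(flights), n_missing = n_missing)
missing_summary
```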
Learn about Benford’s law and write a function that calculates the probability that the leading digit of a number is d, for each digit d in the interval [1..9].
## [1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 0.06694679 0.05799195
## [8] 0.05115252 0.04575749
##   digits Probability
## 1      1  0.30103000
## 2      2  0.17609126
## 3      3  0.12493874
## 4      4  0.09691001
## 5      5  0.07918125
## 6      6  0.06694679
## 7      7  0.05799195
## 8      8  0.05115252
## 9      9  0.04575749
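Benford’s law gives the probability of leading digit d as log10(1 + 1/d), so digit 1 has probability log10(2), about 0.301, and the nine probabilities sum to 1. A minimal base R sketch; the names `benford_prob` and `benford` are illustrative only:

```r
# Probability that the leading digit equals d under Benford's law.
benford_prob <- function(d) {
  stopifnot(all(d %in% 1:9))  # only the digits 1 to 9 are valid leading digits
  log10(1 + 1 / d)
}

digits <- 1:9
benford <- data.frame(digits = digits, Probability = benford_prob(digits))
benford

# Sanity check: the probabilities should add up to 1.
sum(benford$Probability)
```

A common slip is to evaluate the formula at d + 1 instead of d, which shifts every probability down by one digit and makes the total fall short of 1, so the sum is a useful check.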
The bar plot of these probabilities, drawn with ggplot2
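The plot itself is not reproduced here. One way it might be drawn, assuming ggplot2 is installed; the axis labels, title, and object names are illustrative choices:

```r
# Assumption: ggplot2 is available.
library(ggplot2)

# Benford probabilities for the leading digits 1 to 9.
benford <- data.frame(
  digits = 1:9,
  Probability = log10(1 + 1 / (1:9))
)

# geom_col() draws bars whose heights are the values in the data;
# factor(digits) gives one discrete bar per digit.
p <- ggplot(benford, aes(x = factor(digits), y = Probability)) +
  geom_col() +
  labs(x = "Leading digit", y = "Probability", title = "Benford's law")

p
```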