R-Test 1

Question 1

How many flights were operated on the first day of January 2013?

## [1] 842
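
One way to obtain such a count, sketched in base R. The sheet does not show the code behind the output above, so `toy_flights` (a tiny stand-in with the same `month` and `day` columns as the flights table) and the helper `count_flights_on` are assumptions made for illustration.

```r
# Hypothetical stand-in for the flights table (only the columns needed here).
toy_flights <- data.frame(
  month = c(1, 1, 1, 2),
  day   = c(1, 1, 2, 1)
)

# Count the rows whose month and day match the requested date.
count_flights_on <- function(df, m, d) {
  sum(df$month == m & df$day == d)
}

count_flights_on(toy_flights, m = 1, d = 1)
```

The same call with the real flights table in place of `toy_flights` would give the count for 1 January.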

Question 2

Which day in 2013 had the largest number of flights from New York?

##     Month Day Total_flight
## 323    11  27         1014
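
A possible base R sketch for finding the busiest day: count rows per (month, day) pair, then keep the row with the largest count. The toy data and the helper column `n` are assumptions; the output column names mirror the ones printed above.

```r
# Hypothetical stand-in for the flights table.
toy_flights <- data.frame(
  month = c(1, 1, 1, 2, 2),
  day   = c(1, 1, 1, 1, 2)
)

# Add a constant column and sum it within each (month, day) group.
toy_flights$n <- 1
per_day <- aggregate(n ~ month + day, data = toy_flights, FUN = sum)
names(per_day) <- c("Month", "Day", "Total_flight")

# which.max() returns the first maximum, so ties resolve to the earliest row.
busiest <- per_day[which.max(per_day$Total_flight), ]
busiest
```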

Question 3

List the 10 airlines that operated the largest number of flights, in descending order.

## # A tibble: 10 x 2
##    carrier     n
##    <chr>   <int>
##  1 UA      58665
##  2 B6      54635
##  3 EV      54173
##  4 DL      48110
##  5 AA      32729
##  6 MQ      26397
##  7 US      20536
##  8 9E      18460
##  9 WN      12275
## 10 VX       5162
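
A minimal base R sketch of a "top carriers" ranking, assuming a `carrier` column as in the output above. The toy data is hypothetical, and the tibble printed above suggests the original used dplyr rather than this approach.

```r
# Hypothetical stand-in for the flights table.
toy_flights <- data.frame(
  carrier = c("UA", "UA", "UA", "B6", "B6", "DL"),
  stringsAsFactors = FALSE
)

# Tabulate flights per carrier and turn the table into a data frame.
counts <- as.data.frame(
  table(carrier = toy_flights$carrier),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Sort by count, largest first, and keep at most 10 rows.
counts <- counts[order(counts$n, decreasing = TRUE), ]
top_carriers <- head(counts, 10)
top_carriers
```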

Question 4

If we keep only the rows with no missing value in the variable “dep_delay” and name the result df_full_dep_delay, how many rows does this dataset have?

## # A tibble: 328,521 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <fct> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013 1         1      517            515         2      830            819
##  2  2013 1         1      533            529         4      850            830
##  3  2013 1         1      542            540         2      923            850
##  4  2013 1         1      544            545        -1     1004           1022
##  5  2013 1         1      554            600        -6      812            837
##  6  2013 1         1      554            558        -4      740            728
##  7  2013 1         1      555            600        -5      913            854
##  8  2013 1         1      557            600        -3      709            723
##  9  2013 1         1      557            600        -3      838            846
## 10  2013 1         1      558            600        -2      753            745
## # ... with 328,511 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
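
The filtering step can be sketched in base R as follows. Only the name `df_full_dep_delay` comes from the question; the toy data is a hypothetical stand-in for the flights table.

```r
# Hypothetical stand-in for the flights table.
toy_flights <- data.frame(dep_delay = c(2, NA, -1, NA, 4))

# Keep only rows where dep_delay is present; drop = FALSE keeps a data frame
# even when a single column remains.
df_full_dep_delay <- toy_flights[!is.na(toy_flights$dep_delay), , drop = FALSE]

# Number of complete rows.
nrow(df_full_dep_delay)
```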

Question 5

Calculate the average delay for the 10 airlines listed in Question 3 and arrange the result in descending order of average delay.

## # A tibble: 10 x 2
##    carrier avg_delay_company
##    <chr>               <dbl>
##  1 EV                  20.0
##  2 WN                  17.7
##  3 9E                  16.7
##  4 B6                  13.0
##  5 VX                  12.9
##  6 UA                  12.1
##  7 MQ                  10.6
##  8 DL                   9.26
##  9 AA                   8.59
## 10 US                   3.78

R-Test 2

Question 1

Create an empty vector (with no elements) named space_vector. Write a for loop that computes the squares of the integers from 1 to 10 and assigns the results to space_vector.

##  [1]   1   4   9  16  25  36  49  64  81 100
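
A minimal sketch of such a loop. Only the name `space_vector` comes from the question; growing the vector by index is one of several ways to write it.

```r
# Start from an empty vector.
space_vector <- c()

# Store the square of i at position i.
for (i in 1:10) {
  space_vector[i] <- i^2
}

space_vector
```

For longer loops, preallocating with `numeric(10)` avoids repeatedly growing the vector.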

Question 2

Create a function (count_missing) that counts the number of missing observations in a given vector. Use this function and a for loop to summarize the number of missing observations in every column of flights, and show the result.

## [1] 5
##          col_name n_missing
## 1            year         0
## 2           month         0
## 3             day         0
## 4        dep_time      8255
## 5  sched_dep_time         0
## 6       dep_delay      8255
## 7        arr_time      8713
## 8  sched_arr_time         0
## 9       arr_delay      9430
## 10        carrier         0
## 11         flight         0
## 12        tailnum      2512
## 13         origin         0
## 14           dest         0
## 15       air_time      9430
## 16       distance         0
## 17           hour         0
## 18         minute         0
## 19      time_hour         0
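
A sketch of `count_missing` and the column loop in base R. The test vector and `toy_flights` are hypothetical stand-ins; the column names `col_name` and `n_missing` follow the output above.

```r
# Count NA values in a vector.
count_missing <- function(x) {
  sum(is.na(x))
}

# Hypothetical test vector containing five NAs.
count_missing(c(1, NA, 3, NA, NA, 6, NA, NA))

# Hypothetical stand-in for the flights table.
toy_flights <- data.frame(
  year     = c(2013, 2013, 2013),
  dep_time = c(517, NA, 542),
  carrier  = c("UA", "AA", NA),
  stringsAsFactors = FALSE
)

# Preallocate the result, then fill it one column at a time.
n_missing <- integer(ncol(toy_flights))
for (j in seq_along(toy_flights)) {
  n_missing[j] <- count_missing(toy_flights[[j]])
}

missing_summary <- data.frame(col_name = names(toy_flights), n_missing = n_missing)
missing_summary
```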

R-Test 3

Question 1

Learn about Benford’s law and write a function that calculates the probability of each leading digit in the interval [1..9].

## [1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 0.06694679 0.05799195
## [8] 0.05115252 0.04575749
##   digits Probability
## 1      1  0.30103000
## 2      2  0.17609126
## 3      3  0.12493874
## 4      4  0.09691001
## 5      5  0.06694679
## 6      6  0.05799195
## 7      7  0.05115252
## 8      8  0.04575749
## 9      9  0.04139269
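
Benford's law gives the probability of leading digit d as log10(1 + 1/d), for d from 1 to 9. A minimal sketch of such a function follows; the name `benford_prob` is an assumption, since the sheet does not show the original code.

```r
# Probability that the leading digit equals d under Benford's law.
benford_prob <- function(d) {
  stopifnot(all(d %in% 1:9))  # only the digits 1..9 are valid leading digits
  log10(1 + 1 / d)
}

probs <- benford_prob(1:9)

# Same layout as the table above.
benford_table <- data.frame(digits = 1:9, Probability = probs)
benford_table

# Sanity check: the nine probabilities sum to 1 (up to floating point error).
sum(probs)
```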

Question 2

Draw a bar plot of these probabilities using ggplot2.
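
A possible ggplot2 sketch, assuming the package is installed. It rebuilds the probabilities from the Benford formula log10(1 + 1/d) so the block runs on its own; the axis labels are illustrative choices.

```r
library(ggplot2)

# Leading-digit probabilities under Benford's law.
benford_table <- data.frame(
  digits      = 1:9,
  Probability = log10(1 + 1 / (1:9))
)

# geom_col() draws bars whose heights are the values in the data;
# factor(digits) makes the x axis discrete, one bar per digit.
p <- ggplot(benford_table, aes(x = factor(digits), y = Probability)) +
  geom_col() +
  labs(x = "Leading digit", y = "Probability")

p
```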