Looking at the data set, we see that it contains NA values. We must remove them for a proper analysis.
flights_nona <- flights |>
  filter(!is.na(distance) & !is.na(air_time)) # remove rows with NA in distance or air_time
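To see where the missing values sit before removing them, one option is to count the NAs per column. The sketch below is only an illustration: `toy_flights` and its values are made up to stand in for the real table, and it uses base R only.

```r
# Hypothetical toy data standing in for the flights table (values are made up)
toy_flights <- data.frame(
  tailnum  = c("N0001", "N0002", "N0003", "N0004"),
  distance = c(500, NA, 1000, 750),
  air_time = c(100, 120, NA, 110)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy_flights))
print(na_per_column)

# Rows that remain when both distance and air_time must be present
complete_rows <- toy_flights[
  !is.na(toy_flights$distance) & !is.na(toy_flights$air_time),
]
print(nrow(complete_rows))
```

Comparing the row count before and after the filter is a quick way to check how much data the NA removal costs.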
The data set
Looking at the data set we see headers such as “tailnum”, “distance” and so on. From the names alone we cannot be sure what they really mean: is the distance in miles, kilometers, air miles? This is the R source for the data set, which explains everything in more detail.
From it we learned that distance is in miles and that tailnum identifies the aircraft used for the flight.
Aircraft speed
I am interested in the average speed of each aircraft. What affects it: the distance, the time, or the number of flights made with it? To find out, we must compile everything into a new data frame based on tail number.
First we need a measure of speed: a new column that divides distance by air_time.
flights_nona <- flights_nona %>% # starts from the cleaned data set
  select(tailnum, air_time, distance) %>% # keeps only the columns used
  mutate( # function that creates a new column
    Speed_avarage = (distance * 60) / air_time # distance per unit of air_time, times 60
  )
head(flights_nona)
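The speed formula can be checked by hand on a couple of rows. The sketch below assumes, as the factor of 60 suggests, that air_time is in minutes and distance in miles, so the result is in miles per hour. `toy_flights`, its tail numbers and `miles_per_minute` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy flights: distance in miles, air_time assumed to be in minutes
toy_flights <- data.frame(
  tailnum  = c("N0001", "N0002"), # made-up tail numbers
  distance = c(500, 1000),
  air_time = c(100, 150)
)

toy_speed <- toy_flights |>
  mutate(
    miles_per_minute = distance / air_time,   # intermediate step, for clarity
    Speed_avarage    = miles_per_minute * 60  # same as (distance * 60) / air_time
  )

print(toy_speed)
```

A 500-mile flight lasting 100 minutes covers 5 miles per minute, that is 300 miles per hour, and a 1000-mile flight lasting 150 minutes comes out at 400 miles per hour.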
Now let's separate only this information from the original set into a new one, containing only what we will need.
by_tailnum <- flights_nona |>
  group_by(tailnum) |> # groups all rows with the same tail number together
  summarise(
    count = n(), # counts the flights for each tail number
    dist = mean(distance), # calculates the mean distance traveled
    time = mean(air_time), # calculates the mean air time for the plane
    Speed = mean(Speed_avarage) # calculates the mean of the average speed
  )
head(by_tailnum)
Now we keep only the fastest ones for easier viewing, because there are still 4037 rows. Keeping only 100 will not harm the analysis and will make the view easier to read.
Warning: Setting row names on a tibble is deprecated.
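The code that picks the 100 fastest planes is not shown here. One possible way to do it, offered only as a sketch and not as the code actually used, is dplyr's `slice_max()`. The data frame `toy_by_tailnum` and the name `top100_sketch` are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for by_tailnum: 250 made-up planes with random values
set.seed(1)
toy_by_tailnum <- data.frame(
  tailnum = sprintf("N%04d", 1:250),
  count   = sample(1:300, 250, replace = TRUE),
  dist    = runif(250, 100, 2500),
  time    = runif(250, 30, 350),
  Speed   = runif(250, 200, 550)
)

# Keep the 100 rows with the highest mean speed, fastest first
top100_sketch <- toy_by_tailnum |>
  slice_max(Speed, n = 100, with_ties = FALSE)

print(nrow(top100_sketch))
head(top100_sketch)
```

Setting `with_ties = FALSE` guarantees exactly 100 rows even if several planes share the same speed at the cut-off.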
mat <- data.matrix(top100) # arranges everything into a matrix, the form heat maps are created from
mat_final <- mat[, 2:5] # drops the first column, which is the name of the plane
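The plotting call itself is not shown. As a rough sketch, a numeric matrix like this can be drawn with base R's `heatmap()`. The matrix `toy_mat` below is made up, and the choices `Rowv = NA`, `Colv = NA` and `scale = "column"` are assumptions for illustration, not necessarily the settings used for the figure discussed here.

```r
# Hypothetical numeric matrix standing in for mat_final (one row per plane)
set.seed(1)
toy_mat <- cbind(
  count = sample(1:300, 10, replace = TRUE),
  dist  = runif(10, 100, 2500),
  time  = runif(10, 30, 350),
  Speed = runif(10, 200, 550)
)
rownames(toy_mat) <- sprintf("N%04d", 1:10) # made-up tail numbers as row labels

# Rowv = NA and Colv = NA keep rows and columns in their original order.
# scale = "column" standardises each column, so columns measured in
# different units can share one color scale.
hm <- heatmap(toy_mat, Rowv = NA, Colv = NA, scale = "column")
```

Scaling by column matters here because flight counts, miles, minutes and speeds live on very different numeric ranges. Without it, the column with the largest values would dominate the colors.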
Heat maps are never easy to understand at first. The more orange the cell, the higher the number. Planes with high average speeds have fewer flights, which might be because they are newer and more technologically advanced, which would explain why they are faster.
Another really interesting thing is that, as speed grows, distance and flight time stay almost constant relative to each other: their colors change together. That is because of the planes' cruise speed, which can only be reached a long time after the flight begins and must be dropped a long time before it ends, so that the plane can take off and land correctly. So the longer a plane is in the air, the longer it can stay at its highest speed, which raises its average speed. The relationship with distance is the same: the greater the distance, the more time is spent at high altitude and therefore at higher speed. Average speed, air time and distance are thus directly related.
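The direct relationship read off the colors can also be checked numerically with a correlation matrix. The sketch below uses made-up per-plane means in `toy_by_tailnum`, chosen so that longer routes are flown at higher average speed. On the real data, the same `cor()` call on the dist, time and Speed columns would show how strong the relationship actually is.

```r
# Hypothetical per-plane means (made-up values)
toy_by_tailnum <- data.frame(
  dist = c(200, 500, 1000, 1500, 2400), # miles
  time = c(45, 90, 160, 220, 330)       # minutes
)
# For this toy table only, speed is derived directly from the two means
toy_by_tailnum$Speed <- toy_by_tailnum$dist * 60 / toy_by_tailnum$time

# Pairwise Pearson correlations between distance, air time and speed
cor_matrix <- cor(toy_by_tailnum[, c("dist", "time", "Speed")])
print(round(cor_matrix, 2))
```

Values close to 1 indicate that two columns rise together, which is what matching colors in the heat map suggest.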
One important point is that the aircraft names could be removed without harming the analysis.