Assignment Week 5 - Davi Krause

Author

Davi Krause

A Look into the Flights Dataset

Source: FAA Aircraft Registry,
https://www.faa.gov/licenses_certificates/aircraft_certification/aircraft_registry/releasable_aircraft_download/

First we load the data set and look at it to understand it better. Here we also load our libraries.

#install.packages("nycflights13")
library(treemap)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(RColorBrewer)
library(nycflights13)
data(flights)
head(flights)

# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      542            540         2      923            850
4  2013     1     1      544            545        -1     1004           1022
5  2013     1     1      554            600        -6      812            837
6  2013     1     1      554            558        -4      740            728
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

Looking at the data set, we see that it has NA (missing) values; for example, cancelled flights have no air_time. We must remove them for a proper analysis.

flights_nona <- flights |>
  filter(!is.na(distance) & !is.na(air_time))  # remove rows with NA distance or NA air_time
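To see how many values are missing in each column, and to confirm that the filter removes them, we can count the NAs per column with colSums(is.na(...)). This sketch uses a small hypothetical data frame (made-up values, not taken from flights) so that it runs on its own; on the real data the same call would be colSums(is.na(flights)).

```r
# Hypothetical toy data frame mimicking three columns of flights.
# The values are made up for illustration only.
toy <- data.frame(
  tailnum  = c("N1", "N2", "N3", "N4"),
  distance = c(1400, 1416, 1089, 762),
  air_time = c(227, NA, 160, NA)   # NA = no recorded air time
)

# Count missing values in every column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only rows where both distance and air_time are present,
# the same condition as !is.na(distance) & !is.na(air_time)
toy_nona <- toy[!is.na(toy$distance) & !is.na(toy$air_time), ]
print(nrow(toy_nona))
```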

The data set

Looking at the data set, we see column headers such as “tailnum” and “distance”. From the names alone we cannot be sure what they mean: is the distance in miles, kilometers, or nautical miles? The R documentation for the data set (?flights) explains every column in more detail.

From it we learn that distance is in miles, air_time is in minutes, and tailnum is the plane's tail number, which identifies the aircraft that made the flight.

Aircraft speed

I am interested in each aircraft's average speed and in what affects it: the flight distance, the time in the air, or the number of flights the aircraft made. So we need to summarize everything into a new data frame grouped by tail number.

First we compute the speed of each flight: a new column that divides distance by air_time, converting minutes to hours so the result is in miles per hour.
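Because distance is in miles and air_time is in minutes, distance divided by air_time gives miles per minute, and multiplying by 60 converts it to miles per hour. As a check against the first row of the output below (distance 1400, air_time 227):

```latex
\text{Speed (mph)} = \frac{\text{distance}}{\text{air\_time}/60} = \frac{60 \cdot \text{distance}}{\text{air\_time}}
\qquad
\frac{60 \cdot 1400}{227} = \frac{84000}{227} \approx 370.0
```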

flights_nona <- flights_nona |>                 # start from the NA-free data set
  select(tailnum, air_time, distance) |>        # keep only the columns we need
  mutate(                                       # add a new column
    Speed_avarage = (distance * 60) / air_time  # speed in mph: miles / (minutes / 60)
  )

head(flights_nona)

# A tibble: 6 × 4
  tailnum air_time distance Speed_avarage
  <chr>      <dbl>    <dbl>         <dbl>
1 N14228       227     1400          370.
2 N24211       227     1416          374.
3 N619AA       160     1089          408.
4 N804JB       183     1576          517.
5 N668DN       116      762          394.
6 N39463       150      719          288.

Now let's summarize this information into a new data set that contains only what we need, with one row per aircraft.

by_tailnum <- flights_nona |>
  group_by(tailnum) |>                  # one group per tail number (aircraft)
  summarise(count = n(),                # number of flights made by each aircraft
            dist  = mean(distance),     # mean distance flown (miles)
            time  = mean(air_time),     # mean air time (minutes)
            Speed = mean(Speed_avarage) # mean speed (mph)
            )

head(by_tailnum)

# A tibble: 6 × 5
  tailnum count  dist  time Speed
  <chr>   <int> <dbl> <dbl> <dbl>
1 D942DN      4  854. 135.   381.
2 N0EGMQ    352  679. 104.   391.
3 N10156    145  756. 115.   385.
4 N102UW     48  536.  82.8  394.
5 N103US     46  535.  83.3  388.
6 N104UW     46  535.  81.3  400.
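Note that Speed here is the mean of the per-flight speeds. That is not the same as total distance divided by total air time, which gives long flights more weight. This small sketch uses hypothetical numbers for a single aircraft to show the difference. Either definition is defensible, but they can rank aircraft differently.

```r
# Hypothetical flights for a single aircraft (values made up for illustration)
distance <- c(200, 1400)   # miles
air_time <- c(60, 180)     # minutes

# Per-flight speed in mph, computed the same way as Speed_avarage
speed <- distance * 60 / air_time          # 200 and about 466.7

# Mean of the per-flight speeds (what mean(Speed_avarage) computes per aircraft)
mean_of_speeds <- mean(speed)

# Alternative: total distance over total time in the air
overall_speed <- sum(distance) * 60 / sum(air_time)

print(c(mean_of_speeds = mean_of_speeds, overall_speed = overall_speed))
```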

Now we keep only the fastest aircraft for easier viewing, because there are still 4037 rows. Keeping 100 makes the heatmap readable; the conclusions then describe the fastest aircraft rather than the whole fleet.

top100 <- by_tailnum |>
  arrange(desc(Speed)) |>  # sort from fastest to slowest
  head(100) |>             # keep the 100 fastest aircraft
  arrange(Speed)           # re-sort ascending: heatmap() draws the first row at the bottom, so the fastest end up on top

# heatmap() needs a numeric matrix: keep the four numeric columns
# and use the tail numbers as row names (the tailnum column itself is text)
mat_final <- data.matrix(top100[, c("count", "dist", "time", "Speed")])
rownames(mat_final) <- top100$tailnum
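The order of the steps matters when picking the fastest aircraft: head() keeps the first rows in whatever order the data currently has, so it only selects the fastest ones if it comes after sorting by speed. This base R sketch uses hypothetical speeds (made-up values) to show the difference. With dplyr, slice_max(Speed, n = 100) is another way to keep the top 100.

```r
# Hypothetical per-aircraft average speeds (made-up values), listed
# alphabetically by tail number, as group_by()/summarise() would return them
speeds <- c(N001 = 380, N002 = 520, N003 = 300, N004 = 450)

# Taking the first rows, then sorting: keeps N001 and N002 regardless of speed
first_then_sort <- sort(head(speeds, 2))

# Sorting from fastest to slowest, then taking the first rows: keeps the 2 fastest
sort_then_first <- head(sort(speeds, decreasing = TRUE), 2)

print(first_then_sort)
print(sort_then_first)
```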

Now we plot the graph.

heatmap(mat_final,
        Rowv = NA, Colv = NA,        # keep our own row/column order (no clustering or dendrograms)
        col = heat.colors(20),       # 20-step palette from red (low) to pale yellow (high)
        scale = "column",            # standardize each column so all columns share one color scale
        margins = c(7, 10),          # room for the column and row labels
        main = "Heat Map of Average Aircraft Speed",
        xlab = "Flight Characteristics",
        ylab = "Aircraft Tail Number",
        labCol = c("Flights", "Distance (mi)", "Flight time (min)", "Average speed (mph)"),
        cexCol = 1,
        cexRow = 1)

Wow!
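The heatmap call uses scale = "column", which standardizes each column (subtracts its mean and divides by its standard deviation) before coloring. A color therefore compares an aircraft with the other aircraft in the same column; a flight count is never compared directly with a speed. This sketch reproduces that step with base R's scale() on a small hypothetical matrix (made-up values).

```r
# Hypothetical 3 x 2 matrix: a flight-count column and a speed column (made-up values)
m <- cbind(count = c(4, 352, 145), Speed = c(381, 391, 385))

# Standardize each column: subtract its mean, divide by its standard deviation.
# heatmap(..., scale = "column") applies this same transformation before coloring.
z <- scale(m)

print(round(z, 2))
```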

About the graph

Heat maps are never easy to read at first. With heat.colors(), the lighter (more yellow) a cell, the higher its value compared with the other aircraft in the same column, and the redder, the lower. The first thing to notice is that the aircraft with the highest average speeds have fewer flights. That might be because they are newer and more technologically advanced, which would explain why they are faster.

Another interesting point is that distance and flight time change together: their colors move in step across the rows, and average speed rises with them. A likely reason is cruise speed. An aircraft reaches cruise speed only some time after take-off and must slow down well before landing, so the longer it stays in the air, the larger the share of the flight it spends at cruise speed, which raises its average speed. Distance works the same way: a longer flight means more time at cruising altitude and speed. In this sample, average speed, air time, and distance therefore appear to be positively related.
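To put a number on the relationship between average speed, air time, and distance, we can compute correlations between the per-aircraft summary columns. This sketch assumes the nycflights13 and dplyr packages are installed and rebuilds the summary from scratch so that it runs on its own. It uses all aircraft rather than only the 100 plotted, and cor() computes Pearson correlations by default (method = "spearman" would give a rank-based alternative).

```r
library(dplyr)
library(nycflights13)

# Rebuild the per-aircraft summary used for the heatmap
by_tailnum_all <- flights |>
  filter(!is.na(distance), !is.na(air_time)) |>   # drop rows with missing values
  mutate(speed_mph = distance * 60 / air_time) |>  # per-flight speed in mph
  group_by(tailnum) |>
  summarise(count = n(),
            dist  = mean(distance),
            time  = mean(air_time),
            Speed = mean(speed_mph))

# Pearson correlations between the four summary columns
cors <- cor(by_tailnum_all[, c("count", "dist", "time", "Speed")])
print(round(cors, 2))
```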

One important point: the aircraft tail numbers could be removed from the plot without affecting the analysis.