NYC Flights

Author

bodidi

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data(flights)
summary(flights)
      year          month             day           dep_time     sched_dep_time
 Min.   :2023   Min.   : 1.000   Min.   : 1.00   Min.   :   1    Min.   : 500  
 1st Qu.:2023   1st Qu.: 3.000   1st Qu.: 8.00   1st Qu.: 931    1st Qu.: 930  
 Median :2023   Median : 6.000   Median :16.00   Median :1357    Median :1359  
 Mean   :2023   Mean   : 6.423   Mean   :15.74   Mean   :1366    Mean   :1364  
 3rd Qu.:2023   3rd Qu.: 9.000   3rd Qu.:23.00   3rd Qu.:1804    3rd Qu.:1759  
 Max.   :2023   Max.   :12.000   Max.   :31.00   Max.   :2400    Max.   :2359  
                                                 NA's   :10738                 
   dep_delay          arr_time     sched_arr_time   arr_delay       
 Min.   : -50.00   Min.   :   1    Min.   :   1   Min.   : -97.000  
 1st Qu.:  -6.00   1st Qu.:1105    1st Qu.:1135   1st Qu.: -22.000  
 Median :  -2.00   Median :1519    Median :1551   Median : -10.000  
 Mean   :  13.84   Mean   :1497    Mean   :1552   Mean   :   4.345  
 3rd Qu.:  10.00   3rd Qu.:1946    3rd Qu.:2007   3rd Qu.:   9.000  
 Max.   :1813.00   Max.   :2400    Max.   :2359   Max.   :1812.000  
 NA's   :10738     NA's   :11453                  NA's   :12534     
   carrier              flight         tailnum             origin         
 Length:435352      Min.   :   1.0   Length:435352      Length:435352     
 Class :character   1st Qu.: 364.0   Class :character   Class :character  
 Mode  :character   Median : 734.0   Mode  :character   Mode  :character  
                    Mean   : 785.2                                        
                    3rd Qu.:1188.0                                        
                    Max.   :1972.0                                        
                                                                          
     dest              air_time        distance           hour      
 Length:435352      Min.   : 18.0   Min.   :  80.0   Min.   : 5.00  
 Class :character   1st Qu.: 77.0   1st Qu.: 479.0   1st Qu.: 9.00  
 Mode  :character   Median :121.0   Median : 762.0   Median :13.00  
                    Mean   :141.8   Mean   : 977.5   Mean   :13.35  
                    3rd Qu.:177.0   3rd Qu.:1182.0   3rd Qu.:17.00  
                    Max.   :701.0   Max.   :4983.0   Max.   :23.00  
                    NA's   :12534                                   
     minute        time_hour                     
 Min.   : 0.00   Min.   :2023-01-01 05:00:00.00  
 1st Qu.:10.00   1st Qu.:2023-03-30 20:00:00.00  
 Median :29.00   Median :2023-06-27 08:00:00.00  
 Mean   :28.53   Mean   :2023-06-29 10:02:22.39  
 3rd Qu.:45.00   3rd Qu.:2023-09-27 11:00:00.00  
 Max.   :59.00   Max.   :2023-12-31 23:00:00.00  
                                                 
bradleysplot <- flights |>
  select(air_time, distance, dest)
bradleysplot
# A tibble: 435,352 × 3
   air_time distance dest 
      <dbl>    <dbl> <chr>
 1      367     2500 SMF  
 2      108      760 ATL  
 3      190     1576 BQN  
 4      108      636 CHS  
 5       80      488 DTW  
 6      154     1085 MIA  
 7      192     1576 BQN  
 8      119      719 ORD  
 9      258     1400 IAH  
10      157     1065 FLL  
# ℹ 435,342 more rows
# Keep only flights bound for LAX or IAD, with the columns of interest
bradleysplot <- flights |>
  filter(dest %in% c("LAX", "IAD")) |>
  select(air_time, distance, dest)
# Histogram of air time; bar height is the number of flights, not distance
ggplot(bradleysplot, aes(x = air_time, fill = dest)) +
  geom_histogram() +
  scale_fill_manual(values = c("LAX" = "blue", "IAD" = "red")) +
  labs(title = "Air Time for LAX & IAD Flights",
    x = "Air Time (minutes)",
    y = "Number of Flights",
    fill = "Destination",
    caption = "Source: nycflights23")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 299 rows containing non-finite outside the scale range
(`stat_bin()`).
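
The two messages above come from the default of 30 bins and from rows where `air_time` is missing. A minimal sketch of one way to address both, assuming the same `flights` data: the name `lax_iad` and the 10-minute `binwidth` are illustrative choices, not part of the original analysis.

```r
library(tidyverse)
library(nycflights23)

# Keep the two destinations and drop rows with no recorded air time,
# so stat_bin() has no non-finite values to remove
lax_iad <- flights |>
  filter(dest %in% c("LAX", "IAD"), !is.na(air_time)) |>
  select(air_time, distance, dest)

# binwidth = 10 (minutes) is an assumed value; try others to see what reads best
p_hist <- ggplot(lax_iad, aes(x = air_time, fill = dest)) +
  geom_histogram(binwidth = 10) +
  scale_fill_manual(values = c("LAX" = "blue", "IAD" = "red")) +
  labs(title = "Air Time for LAX & IAD Flights",
       x = "Air Time (minutes)",
       y = "Number of Flights",
       fill = "Destination")
p_hist
```

Setting `binwidth` explicitly silences the `stat_bin()` hint, and filtering out the missing values beforehand makes it clear which rows were left out and why.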

• Write a brief paragraph that describes the visualization you have created and at least one aspect of the plot that you would like to highlight. The paragraph should be around 150-250 words as a good estimate.

The graph I created is a histogram of air time for 2023 flights that left the New York City airports bound for IAD (Washington Dulles International Airport) and LAX (Los Angeles International Airport). The x-axis shows air time, the height of each bar shows how many flights fall in that range, and the fill color marks the destination. The plot shows air time only: distance is selected in the code but is not plotted. I am not fully confident in the result, because I spent almost three days trying different code and kept getting errors. In the end I adapted the code from the hate crimes example, which finally let the code run and the histogram appear. I feel that if I had been able to filter the data a little more, I would have gotten a graph that shows much more than this one.
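
Since the histogram plots only one variable, one possible way to look at air time and distance together is a scatter plot, with a small per-destination summary alongside it. This is a rough sketch under the same assumptions as the code above. The names `lax_iad`, `dest_summary`, and `p_scatter` are made up for illustration, and the sketch is not the plot the assignment produced.

```r
library(tidyverse)
library(nycflights23)

# Same filter as before, dropping flights with no recorded air time
lax_iad <- flights |>
  filter(dest %in% c("LAX", "IAD"), !is.na(air_time)) |>
  select(air_time, distance, dest)

# One row per destination: number of flights and typical air time and distance
dest_summary <- lax_iad |>
  group_by(dest) |>
  summarise(n_flights = n(),
            median_air_time = median(air_time),
            median_distance = median(distance))
dest_summary

# Distance on the x-axis, air time on the y-axis, one point per flight
p_scatter <- ggplot(lax_iad, aes(x = distance, y = air_time, color = dest)) +
  geom_point(alpha = 0.3) +
  scale_color_manual(values = c("LAX" = "blue", "IAD" = "red")) +
  labs(title = "Air Time vs. Distance for LAX & IAD Flights",
       x = "Distance (miles)",
       y = "Air Time (minutes)",
       color = "Destination")
p_scatter
```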