SNCF Trains and US Flights Project

Richard Loeur & Arnaud Lemoine & Antoine Cremel
18/12/2019

Transport Project

Our datasets:

  • SNCF train dataset (they used to have trains, I guess)

This dataset describes train lines: for each line we have the departure and arrival stations, the average journey time (in minutes), and further information such as delays and cancellations.

  • 2015 US flights

This dataset represents all the flights in the US for the year 2015. It contains the flight number, the airline, the departure and arrival airports, the date, and further information such as distance and delays.

Summary of the Train Dataset

      year          month                 service    
 Min.   :2015   Min.   : 1.000   International: 432  
 1st Qu.:2016   1st Qu.: 3.000   National     :3600  
 Median :2017   Median : 6.000   NA's         :1430  
 Mean   :2017   Mean   : 6.369                       
 3rd Qu.:2018   3rd Qu.: 9.000                       
 Max.   :2018   Max.   :12.000                       

            departure_station             arrival_station journey_time_avg
 PARIS LYON          :1139    PARIS LYON          :1139   Min.   : 45.96  
 PARIS MONTPARNASSE  : 752    PARIS MONTPARNASSE  : 752   1st Qu.:100.77  
 PARIS EST           : 282    PARIS EST           : 282   Median :160.84  
 LYON PART DIEU      : 246    LYON PART DIEU      : 246   Mean   :165.39  
 PARIS NORD          : 188    PARIS NORD          : 188   3rd Qu.:205.70  
 MARSEILLE ST CHARLES: 174    MARSEILLE ST CHARLES: 174   Max.   :481.00  
 (Other)             :2681    (Other)             :2681                   
 total_num_trips num_of_canceled_trains comment_cancellations
 Min.   :  6.0   Min.   :  0.000        Mode:logical         
 1st Qu.:181.0   1st Qu.:  0.000        NA's:5462            
 Median :238.0   Median :  1.000                             
 Mean   :281.1   Mean   :  7.737                             
 3rd Qu.:390.0   3rd Qu.:  4.000                             
 Max.   :878.0   Max.   :279.000                             

 num_late_at_departure avg_delay_late_at_departure avg_delay_all_departing
 Min.   :  0.00        Min.   :  0.00              Min.   : -4.468        
 1st Qu.: 10.00        1st Qu.: 11.98              1st Qu.:  0.896        
 Median : 23.00        Median : 15.84              Median :  1.783        
 Mean   : 41.58        Mean   : 16.81              Mean   :  2.539        
 3rd Qu.: 51.75        3rd Qu.: 20.28              3rd Qu.:  3.243        
 Max.   :451.00        Max.   :173.57              Max.   :173.571        

 comment_delays_at_departure num_arriving_late avg_delay_late_on_arrival
 Mode:logical                Min.   :  0.00    Min.   :  0.00           
 NA's:5462                   1st Qu.: 17.00    1st Qu.: 23.81           
                             Median : 30.00    Median : 30.76           
                             Mean   : 38.03    Mean   : 32.45           
                             3rd Qu.: 50.00    3rd Qu.: 38.77           
                             Max.   :235.00    Max.   :258.00           
                             NA's   :9         NA's   :9                
 avg_delay_all_arriving
 Min.   :-143.969      
 1st Qu.:   2.706      
 Median :   4.581      
 Mean   :   5.287      
 3rd Qu.:   7.252      
 Max.   :  36.817      

 comment_delays_on_arrival
 Ce mois-ci, l'OD a été touchée par les incidents suivants: \nLe 3 : Avarie à la caténaire à l'entrée de la gare de Paris Montparnasse (46 TGV ; 476mn)\nLe 6 : épisode neigeux sur toute la France (187 TGV ; 18980mn ; 23 suppressions)\nLe 7 : épisode neigeux sur toute la France (134 TGV ; 6294mn ; 77 suppressions)\nLe 8 : épisode neigeux sur toute la France (166 TGV ; 2814mn ; 30 suppressions)\nLe 9 : épisode neigeux sur toute la France (151 TGV ; 2830mn ; 15 suppressions)\nLe 15 : Incident caténaire à l'entrée de la gare de Paris Montparnasse (78 TGV ; 2349mn ; 15 suppressions)\nLe 15 : Dérangement d'installation sur la ligne grande vitesse (22 TGV ; 672mn)\nLe 21 : Présences de Chèvres aux abords de la ligne grande vitesse à Marcoussis (15 TGV ; 292mn)\nLe 21 : Avarie Matérielle sur la ligne grande vitesse au niveau de St Leger (62 TGV ; 2150mn)\nLe 28 : Bâche dans la caténaire à l'entrée de la gare de Paris Montparnasse (28 TGV ; 605mn):  34
 Mois marqué par neuf accidents de personne et cinq heurts d'animaux, qui ont eu un fort impact sur l'ensemble des relations:  19
 Ce mois-ci, l'OD a été touchée par: \nLe 7 : Défaillance matérielle à la sortie de la gare de Paris Montparnasse (87 TGV ; 2202')\nLe 11 : Divergence entre les équipements au sol et les TGV en circulation sur la grande ceinture parisienne (36 TGV ; 1034')\nLe 13 : Dérangement d'aiguille sur le tronc commun des lignes grandes vitesse à St Arnoult (19 TGV ; 289')\nLe 13 : Défaut d'alimentation électrique à la sortie de la gare de Paris Montparnasse (30 TGV ; 422')\nLe 17 : Rupture caténaire en gare de Paris Montparnasse (75 TGV ; 4801')\nLe 19 : Personnes dans les voies à la sortie de la gare de Paris Montparnasse (24 TGV ; 363')\nLe 23 : Présence d'objets pris dans la caténaire à la sortie du Mans (16 TGV ; 372')\nDu 29 au 31 : Dérangement du poste de Vanves (450 TGV ; 27110'):  18
 Des travaux de modernisation de l'infrastructure ont perturbé la régularité de cette relation en Juillet:  17
 Des travaux de modernisation de l'infrastructure ont perturbé la régularité de cette relation en Juin:  16
 (Other):1437
 NA's   :3921
 delay_cause_external_cause delay_cause_rail_infrastructure
 Min.   :0.0000             Min.   :0.0000                 
 1st Qu.:0.1667             1st Qu.:0.1515                 
 Median :0.2571             Median :0.2353                 
 Mean   :0.2780             Mean   :0.2518                 
 3rd Qu.:0.3684             3rd Qu.:0.3333                 
 Max.   :1.0000             Max.   :1.0000                 
 NA's   :170                NA's   :170                    
 delay_cause_traffic_management delay_cause_rolling_stock
 Min.   :0.0000                 Min.   :0.00000          
 1st Qu.:0.0800                 1st Qu.:0.09292          
 Median :0.1613                 Median :0.15843          
 Mean   :0.1831                 Mean   :0.17877          
 3rd Qu.:0.2571                 3rd Qu.:0.24000          
 Max.   :1.0000                 Max.   :1.00000          
 NA's   :170                    NA's   :170              
 delay_cause_station_management delay_cause_travelers num_greater_15_min_late
 Min.   :0.00000                Min.   :0.00000       Min.   :  0.00         
 1st Qu.:0.00000                1st Qu.:0.00000       1st Qu.: 11.00         
 Median :0.05263                Median :0.02128       Median : 20.00         
 Mean   :0.06999                Mean   :0.03730       Mean   : 26.09         
 3rd Qu.:0.10256                3rd Qu.:0.05769       3rd Qu.: 35.00         
 Max.   :1.00000                Max.   :0.66667       Max.   :192.00         
 NA's   :170                    NA's   :170           NA's   :5              
 avg_delay_late_greater_15_min num_greater_30_min_late num_greater_60_min_late
 Min.   :-118.022              Min.   : 0.00           Min.   : 0.000         
 1st Qu.:   8.994              1st Qu.: 4.00           1st Qu.: 1.000         
 Median :  31.533              Median : 9.00           Median : 3.000         
 Mean   :  28.984              Mean   :11.65           Mean   : 4.197         
 3rd Qu.:  41.000              3rd Qu.:16.00           3rd Qu.: 6.000         
 Max.   : 258.000              Max.   :91.00           Max.   :36.000         
 NA's   :5                     NA's   :5               NA's   :5              

Summary of the Flights Dataset

      YEAR          MONTH        DAY     DAY_OF_WEEK    AIRLINE  
 Min.   :2015   Min.   :1   Min.   :1   Min.   :4    UA     :40  
 1st Qu.:2015   1st Qu.:1   1st Qu.:1   1st Qu.:4    B6     :22  
 Median :2015   Median :1   Median :1   Median :4    OO     :22  
 Mean   :2015   Mean   :1   Mean   :1   Mean   :4    AA     :21  
 3rd Qu.:2015   3rd Qu.:1   3rd Qu.:1   3rd Qu.:4    EV     :20  
 Max.   :2015   Max.   :1   Max.   :1   Max.   :4    DL     :16  
                                                     (Other):59  
 FLIGHT_NUMBER   TAIL_NUMBER  ORIGIN_AIRPORT DESTINATION_AIRPORT
 Min.   :  17   N107SY :  1   BOS    : 12    IAH    : 24        
 1st Qu.: 513   N11140 :  1   ANC    : 10    DEN    : 20        
 Median :1222   N11150 :  1   LAS    : 10    DFW    : 15        
 Mean   :2000   N12142 :  1   LAX    : 10    EWR    : 13        
 3rd Qu.:2539   N12563 :  1   SFO    :  9    MIA    : 12        
 Max.   :7419   N12967 :  1   PHX    :  8    ATL    : 11        
                (Other):194   (Other):141    (Other):105        
 SCHEDULED_DEPARTURE DEPARTURE_TIME   DEPARTURE_DELAY      TAXI_OUT    
 Min.   :  5.0       Min.   :   2.0   Min.   :-18.000   Min.   : 4.00  
 1st Qu.:295.0       1st Qu.: 310.0   1st Qu.: -6.000   1st Qu.:11.00  
 Median :535.0       Median : 538.0   Median : -2.000   Median :14.00  
 Mean   :433.3       Mean   : 451.5   Mean   :  5.654   Mean   :16.15  
 3rd Qu.:550.0       3rd Qu.: 555.0   3rd Qu.:  3.500   3rd Qu.:19.00  
 Max.   :600.0       Max.   :2354.0   Max.   :213.000   Max.   :43.00  
                     NA's   :9        NA's   :9         NA's   :9      
   WHEELS_OFF     SCHEDULED_TIME   ELAPSED_TIME      AIR_TIME    
 Min.   :  14.0   Min.   : 36.0   Min.   : 35.0   Min.   : 20.0  
 1st Qu.: 320.5   1st Qu.:105.0   1st Qu.:111.5   1st Qu.: 85.0  
 Median : 553.0   Median :161.5   Median :163.0   Median :138.0  
 Mean   : 470.4   Mean   :164.3   Mean   :163.0   Mean   :138.9  
 3rd Qu.: 610.0   3rd Qu.:210.0   3rd Qu.:201.5   3rd Qu.:182.0  
 Max.   :1006.0   Max.   :404.0   Max.   :396.0   Max.   :376.0  
 NA's   :9                        NA's   :9       NA's   :9      
    DISTANCE        WHEELS_ON         TAXI_IN       SCHEDULED_ARRIVAL
 Min.   :  84.0   Min.   : 254.0   Min.   : 2.000   Min.   : 320.0   
 1st Qu.: 518.2   1st Qu.: 619.0   1st Qu.: 5.000   1st Qu.: 630.0   
 Median : 989.0   Median : 730.0   Median : 7.000   Median : 736.5   
 Mean   :1033.5   Mean   : 750.4   Mean   : 7.906   Mean   : 757.8   
 3rd Qu.:1448.0   3rd Qu.: 846.5   3rd Qu.: 9.000   3rd Qu.: 850.0   
 Max.   :2762.0   Max.   :1344.0   Max.   :52.000   Max.   :1411.0   
                  NA's   :9        NA's   :9                         
  ARRIVAL_TIME    ARRIVAL_DELAY        DIVERTED   CANCELLED    
 Min.   : 259.0   Min.   :-36.000   Min.   :0   Min.   :0.000  
 1st Qu.: 628.5   1st Qu.:-14.000   1st Qu.:0   1st Qu.:0.000  
 Median : 740.0   Median : -6.000   Median :0   Median :0.000  
 Mean   : 762.5   Mean   :  1.764   Mean   :0   Mean   :0.045  
 3rd Qu.: 858.0   3rd Qu.:  6.500   3rd Qu.:0   3rd Qu.:0.000  
 Max.   :1357.0   Max.   :226.000   Max.   :0   Max.   :1.000  
 NA's   :9        NA's   :9                                    
 CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY  
  :191               Min.   : 0.00    Min.   :0      Min.   : 0.00  
 A:  3               1st Qu.: 0.00    1st Qu.:0      1st Qu.: 0.00  
 B:  6               Median : 9.50    Median :0      Median :15.00  
                     Mean   :10.54    Mean   :0      Mean   :23.21  
                     3rd Qu.:16.25    3rd Qu.:0      3rd Qu.:53.25  
                     Max.   :43.00    Max.   :0      Max.   :85.00  
                     NA's   :172      NA's   :172    NA's   :172    
 LATE_AIRCRAFT_DELAY WEATHER_DELAY   
 Min.   :0           Min.   :  0.00  
 1st Qu.:0           1st Qu.:  0.00  
 Median :0           Median :  0.00  
 Mean   :0           Mean   : 23.93  
 3rd Qu.:0           3rd Qu.:  0.00  
 Max.   :0           Max.   :213.00  
 NA's   :172         NA's   :172     

What we can learn from the Flights Dataset

We can see that the most popular airline in the US in 2015 was Southwest Airlines Co., with 1 261 855 flights that year. The second is American Airlines Inc., with 725 984 flights.

Unsurprisingly, Southwest Airlines is also the airline with the most delayed flights: 646 569.
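Counts like the ones above can be obtained by grouping flights by airline. The sketch below is only an illustration on invented toy data, not the code used for this report. It assumes that a flight counts as "delayed" when `DEPARTURE_DELAY` is greater than 0, which the report does not state. It also computes the share of delayed flights, since the airline with the most flights is likely to have the most delayed flights in absolute terms.

```r
# Toy stand-in for the flights data; the values are invented for illustration.
flights <- data.frame(
  AIRLINE         = c("WN", "WN", "WN", "AA", "AA"),
  DEPARTURE_DELAY = c(12, -3, 5, 0, 40)
)

# Total number of flights per airline.
total_flights <- tapply(flights$DEPARTURE_DELAY, flights$AIRLINE, length)

# Number of delayed flights per airline.
# Assumption: "delayed" means a departure delay strictly greater than 0 minutes.
delayed_flights <- tapply(flights$DEPARTURE_DELAY > 0, flights$AIRLINE,
                          sum, na.rm = TRUE)

# Share of delayed flights, which is comparable across airlines of any size.
delayed_share <- delayed_flights / total_flights

# Airlines ranked by number of flights, busiest first.
ranking <- sort(total_flights, decreasing = TRUE)

print(ranking)
print(delayed_flights)
print(round(delayed_share, 2))
```

On the real data, the share would show whether the airline with the most delayed flights is also the one most often late relative to its size.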

Here we can see the application: a map showing the airports in the US, a radio button that lets us choose whether the data is visualised by airline or by departure airport, and a few tabs.

The data shown from the Flights Dataset

[1] "Total number of flights per Airline"
   Var1 Freq
1    AA   21
2    AS   11
3    B6   22
4    DL   16
5    EV   20
6    F9    4
7    HA    6
8    MQ    4
9    NK   12
10   OO   22
11   UA   40
12   US   14
13   WN    8
[1] "Mean time per airline"
   Group.1         x
1       AA        NA
2       AS        NA
3       B6 158.90909
4       DL 166.68750
5       EV  96.10000
6       F9 140.00000
7       HA  99.83333
8       MQ        NA
9       NK 165.00000
10      OO        NA
11      UA 188.70000
12      US 197.14286
13      WN 149.37500
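The `NA` entries in the "Mean time per airline" table are what base R's `mean()` returns when any value in a group is missing. The `Group.1` / `x` column names suggest the table came from `aggregate()` with a `by` list, and the flights summary shows 9 missing values in the elapsed and air time columns. The sketch below uses invented toy data and assumes the averaged column is `ELAPSED_TIME`, which the report does not confirm. It shows how passing `na.rm = TRUE` gives a mean for every airline.

```r
# Toy stand-in for the flights data; the values are invented for illustration.
# Assumption: the "time" being averaged is ELAPSED_TIME.
flights <- data.frame(
  AIRLINE      = c("AA", "AA", "B6", "B6", "UA"),
  ELAPSED_TIME = c(120, NA, 150, 170, 200)
)

# Number of flights per airline (gives the Var1 / Freq columns).
flights_per_airline <- as.data.frame(table(flights$AIRLINE))

# Without na.rm, a single missing value makes the whole group mean NA.
mean_with_na <- aggregate(flights$ELAPSED_TIME,
                          by = list(flights$AIRLINE), FUN = mean)

# Extra arguments to aggregate() are passed on to FUN, so na.rm = TRUE
# makes mean() ignore the missing values.
mean_no_na <- aggregate(flights$ELAPSED_TIME,
                        by = list(flights$AIRLINE), FUN = mean, na.rm = TRUE)

print(flights_per_airline)
print(mean_with_na)
print(mean_no_na)
```

Whether missing times should be dropped or reported separately depends on why they are missing. In the summary above, the number of missing values matches the number of cancelled flights.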

What we can learn from the Trains Dataset

The train dataset shows how late departures and late arrivals evolve per year and per departure station, together with the causes of the delays.

Having the causes lets us draw some tentative conclusions about how each of them affects the trains.
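One way to look at the causes is to average each `delay_cause_*` column per year. The sketch below is an illustration on invented toy data, not the code behind this report. It uses only two of the six cause columns and relies on the fact, visible in the summary, that these columns hold proportions between 0 and 1 and contain some missing values.

```r
# Toy stand-in for the train data; the values are invented for illustration.
trains <- data.frame(
  year                            = c(2017, 2017, 2018, 2018),
  departure_station               = c("PARIS LYON", "PARIS EST",
                                      "PARIS LYON", "PARIS EST"),
  delay_cause_external_cause      = c(0.30, 0.20, 0.40, NA),
  delay_cause_rail_infrastructure = c(0.25, 0.35, 0.20, NA)
)

# Select every delay-cause column by its common name prefix.
cause_cols <- grep("^delay_cause_", names(trains), value = TRUE)

# Mean share of each cause per year, ignoring rows where the cause is missing.
cause_by_year <- aggregate(trains[cause_cols],
                           by = list(year = trains$year),
                           FUN = mean, na.rm = TRUE)

# Express the proportions as percentages for easier reading.
cause_by_year[cause_cols] <- round(100 * cause_by_year[cause_cols], 1)

print(cause_by_year)
```

Replacing `trains$year` with `trains$departure_station` in the `by` list would give the same breakdown per station.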

We have two drop-down lists from which we can choose either the year or the departure station, followed by the corresponding data. We also have a list for choosing the delay cause whose percentage we want to see.
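The selection logic behind these lists can be sketched as a plain function, independent of the user interface. The function name `cause_percentage` and its arguments are hypothetical, and the data is invented. The sketch filters by year, by departure station, or by both, and returns the mean share of the chosen cause as a percentage.

```r
# Toy stand-in for the train data; the values are invented for illustration.
trains <- data.frame(
  year                      = c(2017, 2017, 2018),
  departure_station         = c("PARIS LYON", "PARIS LYON", "PARIS EST"),
  delay_cause_travelers     = c(0.02, 0.06, 0.10),
  delay_cause_rolling_stock = c(0.20, 0.10, 0.30)
)

# Hypothetical helper: mean percentage of one delay cause for the rows
# matching the selected year and/or departure station.
# Leaving a selection as NULL means "do not filter on it".
cause_percentage <- function(data, cause,
                             selected_year = NULL, selected_station = NULL) {
  stopifnot(cause %in% names(data))
  rows <- rep(TRUE, nrow(data))
  if (!is.null(selected_year)) {
    rows <- rows & data$year == selected_year
  }
  if (!is.null(selected_station)) {
    rows <- rows & data$departure_station == selected_station
  }
  100 * mean(data[[cause]][rows], na.rm = TRUE)
}

pct_year    <- cause_percentage(trains, "delay_cause_travelers",
                                selected_year = 2017)
pct_station <- cause_percentage(trains, "delay_cause_rolling_stock",
                                selected_station = "PARIS EST")

print(pct_year)
print(pct_station)
```

Keeping the filtering in a function like this makes it easy to check the numbers outside the application.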