This R Notebook is published on the RPubs website.

This notebook requires the following packages:

install.packages('ggplot2')
install.packages('ggmap')
install.packages('plotly')
install.packages('dplyr')
install.packages('magrittr')
# Development version of ggplot2 from GitHub (requires the devtools package)
devtools::install_github('tidyverse/ggplot2')

Objective:

This notebook analyzes transponder flight data around the Santa Monica VOR (SMOVOR), which is located on the southwest edge of Santa Monica Airport (SMO).

Dataset Scope:

The analysis looks for quantifiable differences in flight activity before and after Columbus Day (Monday, October 12, 2015), an oft-cited milestone for recent changes in noise.

Two dates are compared: March 14, 2015 (before) and December 12, 2015 (after).
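As a quick sanity check (not part of the original analysis), base R confirms that both sampled dates fall on a Saturday, so the comparison is between the same day of the week on either side of Columbus Day, which fell on a Monday.

```r
# Dates used in this notebook: the two sampled days and the Columbus Day milestone
dates <- as.Date(c("2015-03-14", "2015-10-12", "2015-12-12"))

# POSIXlt$wday numbers the days 0 (Sunday) to 6 (Saturday) and,
# unlike weekdays(), does not depend on the session locale
wday <- as.POSIXlt(dates)$wday
print(wday)  # [1] 6 1 6
```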

Data Wrangling:

1. There are four data files in total:

  1. day1_night: 00:00AM ~ 06:30AM on March 14, 2015
  2. day1_day: 06:30AM on March 14, 2015 ~ 00:00AM on March 15, 2015
  3. day2_night: 00:00AM ~ 06:30AM on December 12, 2015
  4. day2_day: 06:30AM on December 12, 2015 ~ 00:00AM on December 13, 2015

2. Data cleaning process:

  1. Load the data
  2. Select the columns needed for the analysis
  3. Convert the date and time values to timestamps
  4. Drop unused columns by setting them to NULL
  5. Name the columns
  6. Keep rows with a valid altitude (alt) and latitude (lat)
  7. Strip whitespace from flight names, and fill in empty flight names with the unique flight name associated with the same flight code
  8. Remove private planes, identified by flight codes a60, d60, c60, c50, and d20 with a null flight name
  9. Remove duplicate rows
  10. Keep only the first record for each combination of position (alt, lon, lat), timestamp, and flight
  11. Add a track variable that separates records of the same flight on the same date that are more than 1 hour apart, and remove tracks with fewer than 6 records
  12. Reorder the columns
  13. Keep only altitudes between 3,000 and 11,000 feet
  14. Keep only flights that come within 2 km of SMOVOR
  15. Remove any rows with NA values
  16. Filter each data file to its time period and save it in CSV format

3. Final dataset:

df1_day <- read.csv("RTL150314_day.csv")
str(df1_day)
'data.frame':   1984 obs. of  9 variables:
 $ timestamp: Factor w/ 1977 levels "2015-03-14 07:32:53",..: 453 452 451 450 449 448 447 446 445 444 ...
 $ flight   : Factor w/ 41 levels "3527","464","AAL1150",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ code     : Factor w/ 40 levels "71BE09","71BF00",..: 36 36 36 36 36 36 36 36 36 36 ...
 $ track    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ id       : Factor w/ 41 levels "71BE09KAL2130",..: 36 36 36 36 36 36 36 36 36 36 ...
 $ lon      : num  -118 -118 -118 -118 -118 ...
 $ lat      : num  34 34 34 34 34 ...
 $ alt      : int  3625 4500 5475 5475 5600 5625 5775 6000 6000 6150 ...
 $ dist     : num  13100 13100 12795 11792 11792 ...
df1_day$dt <- as.POSIXct(df1_day$timestamp, tz="America/Los_Angeles")
df1_night <- read.csv("RTL150314_night.csv")
str(df1_night)
'data.frame':   85 obs. of  9 variables:
 $ timestamp: Factor w/ 85 levels "2015-03-14 01:44:12",..: 85 84 83 82 81 80 79 78 77 76 ...
 $ flight   : Factor w/ 2 levels "FDX1508","TWY878": 1 1 1 1 1 1 1 1 1 1 ...
 $ code     : Factor w/ 2 levels "A76535","AC145B": 1 1 1 1 1 1 1 1 1 1 ...
 $ track    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ id       : Factor w/ 2 levels "A76535FDX15080",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ lon      : num  -118 -118 -118 -118 -118 ...
 $ lat      : num  34 34 34 34 34 ...
 $ alt      : int  7075 7400 7425 7850 7825 8000 8025 8100 8225 8300 ...
 $ dist     : num  1298 1298 1298 1298 9646 ...
df1_night$dt <- as.POSIXct(df1_night$timestamp, tz="America/Los_Angeles")
df2_day <- read.csv("RTL151212_day.csv")
str(df2_day)
'data.frame':   24054 obs. of  9 variables:
 $ timestamp: Factor w/ 20371 levels "2015-12-12 06:31:03",..: 6444 6443 6442 6441 6440 6439 6438 6437 6436 6435 ...
 $ flight   : Factor w/ 131 levels "AAL1143","AAL155",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ code     : Factor w/ 123 levels "3A2DD5","3C6517",..: 106 106 106 106 106 106 106 106 106 106 ...
 $ track    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ id       : Factor w/ 131 levels "3A2DD5THT70",..: 113 113 113 113 113 113 113 113 113 113 ...
 $ lon      : num  -118 -118 -118 -118 -118 ...
 $ lat      : num  34 34 34 34 34 ...
 $ alt      : int  4225 4875 5350 5500 5550 5625 6050 6075 6225 6300 ...
 $ dist     : num  8048 8048 7691 7164 7164 ...
df2_day$dt <- as.POSIXct(df2_day$timestamp, tz="America/Los_Angeles")
df2_night <- read.csv("RTL151212_night.csv")
str(df2_night)
'data.frame':   4998 obs. of  9 variables:
 $ timestamp: Factor w/ 4248 levels "2015-12-12 01:14:40",..: 1878 1877 1875 1873 1871 1868 1864 1857 1854 1852 ...
 $ flight   : Factor w/ 24 levels "1735","AAL14",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ code     : Factor w/ 24 levels "424970","78023E",..: 16 16 16 16 16 16 16 16 16 16 ...
 $ track    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ id       : Factor w/ 24 levels "424970ABW5970",..: 16 16 16 16 16 16 16 16 16 16 ...
 $ lon      : num  -118 -118 -118 -118 -118 ...
 $ lat      : num  34 34 34 34 34 ...
 $ alt      : int  4675 5050 5150 5175 5275 5375 5475 5650 6000 6000 ...
 $ dist     : num  7690 7690 7425 7086 6431 ...
df2_night$dt <- as.POSIXct(df2_night$timestamp, tz="America/Los_Angeles")
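The cleaning code itself came from the course instructor and is not reproduced here. As a rough illustration only, the sketch below re-implements three of the listed cleaning steps in base R: track splitting (step 11), the altitude band (step 13), and the 2 km proximity filter (step 14). The function names `haversine_m`, `add_tracks`, and `filter_near_vor` are hypothetical. The sketch also makes several assumptions. A new track starts after a gap of more than 1 hour, which is one reading of step 11. Distance is the horizontal great-circle (haversine) distance. The SMO coordinates used for the maps stand in for the SMOVOR position.

```r
# Hypothetical re-implementation of cleaning steps 11, 13 and 14 (base R only).
# Assumed columns: flight, dt (POSIXct), lon, lat, alt (feet).

# Horizontal great-circle (haversine) distance in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Step 11 (interpreted): a new track starts when consecutive records of the
# same flight are more than `gap_secs` apart; tracks with fewer than
# `min_records` records are dropped.
add_tracks <- function(df, gap_secs = 3600, min_records = 6) {
  df <- df[order(df$flight, df$dt), ]
  new_track <- ave(as.numeric(df$dt), df$flight,
                   FUN = function(t) c(0, diff(t)) > gap_secs)
  df$track <- as.integer(ave(new_track, df$flight, FUN = cumsum))
  sizes <- ave(seq_len(nrow(df)), df$flight, df$track, FUN = length)
  df[sizes >= min_records, ]
}

# Steps 13-14: keep altitudes within `alt_range`, then keep every record of
# any flight that comes within `max_m` metres of the reference point.
filter_near_vor <- function(df, ref_lon, ref_lat, max_m = 2000,
                            alt_range = c(3000, 11000)) {
  df <- df[df$alt >= alt_range[1] & df$alt <= alt_range[2], ]
  df$dist <- haversine_m(df$lon, df$lat, ref_lon, ref_lat)  # recomputed here
  near <- unique(df$flight[df$dist <= max_m])
  df[df$flight %in% near, ]
}

# Toy data: AAL1 passes directly over the reference point; UAL2 stays more
# than 30 km away and has one extra record two hours after the rest.
toy <- data.frame(
  flight = rep(c("AAL1", "UAL2"), each = 7),
  dt = as.POSIXct("2015-12-12 08:00:00", tz = "America/Los_Angeles") +
    c(0:6 * 60, 0:5 * 60, 7200),
  lon = rep(c(-118.456667, -118.2), each = 7),
  lat = rep(c(34.010167, 34.2), each = 7),
  alt = rep(c(5000, 6000), each = 7)
)
tracked <- add_tracks(toy)  # UAL2's isolated record forms a 1-record track and is dropped
near <- filter_near_vor(tracked, ref_lon = -118.456667, ref_lat = 34.010167)
```

On the toy input, `tracked` keeps 13 of the 14 rows. `near` keeps only the 7 AAL1 rows, because UAL2 never comes within 2 km of the reference point. Keeping every record of a qualifying flight, not only the records inside the 2 km radius, is consistent with the `dist` values above 2,000 in the final datasets.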

Comparisons:

2. Differences in paths (latitude, longitude, and altitude) over time

library('ggmap')
library('ggplot2')
library('plotly')
SMO <- data.frame(label = "SMO", lon=-118.456667, lat=34.010167)
smo <- c(SMO$lon, SMO$lat)
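title <- "00:00AM ~ 06:30AM on Mar. 14, 2015"
# Google Static Maps now requires an API key; register it once per session with
# ggmap::register_google(key = "YOUR_KEY") before calling get_map().
map.google <- get_map(location = smo, zoom = 10)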
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=34.010167,-118.456667&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
p <- ggmap(map.google) +
  geom_point(data = SMO, aes(x=lon, y=lat), color="red", size=5, alpha=.5) +
  geom_path(data = df1_night, aes(x=lon, y=lat, color=alt), alpha=.5) +
  scale_colour_gradient(limits=c(3000, 11000), low="orange", high="blue" ) +
  ggtitle(paste(title))
p

Figure 2-1. Flight paths at night on Mar. 14, 2015

library('ggmap')
library('ggplot2')
library('plotly')
SMO <- data.frame(label = "SMO", lon=-118.456667, lat=34.010167)
smo <- c(SMO$lon, SMO$lat)
title <- "06:30AM on Mar. 14, 2015 ~ 00:00AM on Mar. 15, 2015"
map.google <- get_map(location = smo, zoom = 10)
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=34.010167,-118.456667&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
p <- ggmap(map.google) +
  geom_point(data = SMO, aes(x=lon, y=lat), color="red", size=5, alpha=.5) +
  geom_path(data = df1_day, aes(x=lon, y=lat, color=alt), alpha=.5) +
  scale_colour_gradient(limits=c(3000, 11000), low="red", high="green" ) +
  ggtitle(paste(title))
p

Figure 2-2. Flight paths during the day on Mar. 14, 2015

library('ggmap')
library('ggplot2')
library('plotly')
SMO <- data.frame(label = "SMO", lon=-118.456667, lat=34.010167)
smo <- c(SMO$lon, SMO$lat)
title <- "00:00AM ~ 06:30AM on Dec. 12, 2015"
map.google <- get_map(location = smo, zoom = 10)
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=34.010167,-118.456667&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
p <- ggmap(map.google) +
  geom_point(data = SMO, aes(x=lon, y=lat), color="red", size=5, alpha=.5) +
  geom_path(data = df2_night, aes(x=lon, y=lat, color=alt), alpha=.5) +
  scale_colour_gradient(limits=c(3000, 11000), low="orange", high="blue" ) +
  ggtitle(paste(title))
p

Figure 2-3. Flight paths at night on Dec. 12, 2015

library('ggmap')
library('ggplot2')
library('plotly')
SMO <- data.frame(label = "SMO", lon=-118.456667, lat=34.010167)
smo <- c(SMO$lon, SMO$lat)
title <- "06:30AM on Dec. 12, 2015 ~ 00:00AM on Dec. 13, 2015"
map.google <- get_map(location = smo, zoom = 10)
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=34.010167,-118.456667&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
p <- ggmap(map.google) +
  geom_point(data = SMO, aes(x=lon, y=lat), color="red", size=5, alpha=.5) +
  geom_path(data = df2_day, aes(x=lon, y=lat, color=alt), alpha=.5) +
  scale_colour_gradient(limits=c(3000, 11000), low="red", high="green" ) +
  ggtitle(paste(title))
p

Figure 2-4. Flight paths during the day on Dec. 12, 2015

3. Differences in descent rate (altitude vs time)

library('ggmap')
library('ggplot2')
library('plotly')
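title <- "00:00AM ~ 06:30AM on Mar. 14, 2015"
# Colour points by the first three characters of the flight name (typically the airline code).
# `text` is not a ggplot2 aesthetic, so ggplot2 warns, but ggplotly() uses it for tooltips.
p <- ggplot(df1_night, aes(x = dt, y = alt, colour = factor(substring(flight,1,3)))) +
  geom_point(alpha = 1/3, aes(text = paste("Flight:", flight, "<br>CODE:", code))) +
  labs(x="Date Time", y="Altitude (feet)", colour = "Airlines" ) +
  ggtitle(paste(title)) 
Ignoring unknown aesthetics: text
ggplotly(p)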

Figure 3-1. Descent rate at night on Mar. 14, 2015

library('ggmap')
library('ggplot2')
library('plotly')
title <- "06:30AM on Mar. 14, 2015 ~ 00:00AM on Mar. 15, 2015"
p <- ggplot(df1_day, aes(x = dt, y = alt,  colour = factor(substring(flight,1,3)))) +
  geom_point(alpha = 1/3) +
  labs(x="Date Time", y="Altitude (feet)", colour = "Airlines" ) +
  ggtitle(paste(title)) 
ggplotly(p)

Figure 3-2. Descent rate during the day on Mar. 14, 2015

library('ggmap')
library('ggplot2')
library('plotly')
title <- "00:00AM ~ 06:30AM on Dec. 12, 2015"
p <- ggplot(df2_night, aes(x = dt, y = alt, colour = factor(substring(flight,1,3)))) +
  geom_point(alpha = 1/3) +
  labs(x="Date Time", y="Altitude (feet)", colour = "Airlines" ) +
  ggtitle(paste(title)) 
ggplotly(p)

Figure 3-3. Descent rate at night on Dec. 12, 2015

library('ggmap')
library('ggplot2')
library('plotly')
title <- "06:30AM on Dec. 12, 2015 ~ 00:00AM on Dec. 13, 2015"
p <- ggplot(df2_day, aes(x = dt, y = alt, colour = factor(substring(flight,1,3)))) +
  geom_point(alpha = 1/3) +
  labs(x="Date Time", y="Altitude (feet)", colour = "Airlines" ) +
  ggtitle(paste(title)) 
ggplotly(p)

Figure 3-4. Descent rate during the day on Dec. 12, 2015
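The figures in this section plot altitude against time and do not compute a rate directly. One way to reduce each track to a single descent-rate number is the slope of a least-squares line of altitude on elapsed time. The helper below, `descent_rate_fpm`, is a hypothetical sketch and not part of the original analysis. It assumes the `flight`, `track`, `dt`, and `alt` (feet) columns of the cleaned data frames and reports feet per minute, where negative values mean the aircraft is descending.

```r
# Hypothetical helper: per-track descent rate in feet per minute, taken as the
# slope of a least-squares fit of altitude on elapsed time.
descent_rate_fpm <- function(df) {
  # One group per flight/track combination; empty combinations are dropped
  groups <- split(df, list(df$flight, df$track), drop = TRUE)
  rates <- vapply(groups, function(g) {
    if (nrow(g) < 2) return(NA_real_)
    # Elapsed time in minutes since the first record of the track
    minutes <- as.numeric(difftime(g$dt, min(g$dt), units = "mins"))
    # Slope of alt ~ minutes (feet per minute)
    unname(coef(lm(g$alt ~ minutes))[2])
  }, numeric(1))
  data.frame(id = names(rates), fpm = unname(rates), row.names = NULL)
}

# Toy data: two steady descents sampled once per minute
toy <- data.frame(
  flight = rep(c("AAL1", "UAL2"), each = 4),
  track = 0L,
  dt = as.POSIXct("2015-12-12 08:00:00", tz = "UTC") + rep(0:3 * 60, 2),
  alt = c(8000, 7000, 6000, 5000, 6000, 5500, 5000, 4500)
)
res <- descent_rate_fpm(toy)
res
```

For the toy tracks, this gives -1000 ft/min for `AAL1.0` and -500 ft/min for `UAL2.0`. Applied to `df1_day` or `df2_day`, it would give one rate per flight track, which could then be compared between the two dates. A linear fit is a simplification, because real approaches often include level segments.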

Conclusion:

Flight patterns around SMOVOR changed markedly between Mar. 14, 2015 and Dec. 12, 2015. According to the figures above, the total number of flights, the number of airlines, and the density of flight paths all increased. Total flights rose from 42 to 152 across these two dates, an increase of about 262% (152 is roughly 3.6 times 42).
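For reference, the percentage change between the two flight counts is (new - old) / old × 100. The ratio new / old instead gives the December count as a multiple of the March count. The notebook does not show how the counts of 42 and 152 were obtained. The `count_flights` helper below is therefore a hypothetical sketch that assumes a flight is identified by its `flight` value across the night and day files of one date.

```r
# Flight counts reported in the conclusion
flights_mar <- 42
flights_dec <- 152

# Percentage change relative to the March baseline: (new - old) / old * 100
growth_pct <- (flights_dec - flights_mar) / flights_mar * 100
# Ratio of the two counts (December as a multiple of March)
ratio <- flights_dec / flights_mar

round(growth_pct, 1)  # 261.9
round(ratio, 2)       # 3.62

# Hypothetical helper: number of distinct flight names across the night and
# day data frames of one date
count_flights <- function(night, day) {
  length(unique(c(as.character(night$flight), as.character(day$flight))))
}
```

It could be applied as `count_flights(df1_night, df1_day)` and `count_flights(df2_night, df2_day)`. Whether this reproduces the reported counts depends on how the original counts treated flights that appear in both files.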

Reference & Tools:

  1. Data Source: Airplane transponder data provided in the INF554 class at the University of Southern California.
  2. Data Wrangling: Source code provided by the instructor, Dr. Luciano Nocera, in the INF554 class at the University of Southern California.
  3. Tools: R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
