Introduction

The project encompasses the time series climate data analysis to make short-term prediction of maximum wind speed, potential wind direction around flight landing zones and forecast extreme precipitation at JFK airport.

library(xts) #  for manipulating and analyzing time series data
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(tsbox)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(dplyr)
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:xts':
## 
##     first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(imputeTS)
## 
## Attaching package: 'imputeTS'
## The following object is masked from 'package:zoo':
## 
##     na.locf
library(ggplot2)

Load the Data

The real time observational climate data in csv file is downloaded from National Oceanic and Atmospheric Administration (NOAA), https://www.ncei.noaa.gov/ at the JFK airport’s weather station. After eliminating unwanted columns, the nyc_weather dataset has 3997 observations and 13 variables.

The variables of particular interest for this analysis are the following.

WDF5: Wind Direction of fastest 5-sec (degrees) WSF5: Wind Speed of fastest 5-sec (kmph) PRCP: Precipitiation in (inches)

csv = read.csv("https://www.ncei.noaa.gov/orders/cdo/3544714.csv", na.strings = c("", "NA", "N/A")) 

range(csv$DATE)
## [1] "2013-01-01" "2023-12-11"
# Delete empty data columns
csv_del <- csv[, -c(16:29)]
nyc_weather <- csv_del[, !names(csv_del) %in% c("PGTM", "TAVG")]
head(nyc_weather)
##       STATION                             NAME       DATE  AWND PRCP SNOW SNWD
## 1 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-01 15.21    0    0    0
## 2 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-02 14.54    0    0    0
## 3 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-03 11.63    0    0    0
## 4 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-04 16.33    0    0    0
## 5 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-05 11.63    0    0    0
## 6 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-06 10.29    0    0    0
##   TMAX TMIN WDF2 WDF5 WSF2 WSF5
## 1   41   27  320  320 29.1 36.9
## 2   35   23  330  310 28.0 33.1
## 3   33   25  260  320 17.9 23.0
## 4   39   30  250  260 25.1 30.0
## 5   45   32  320  310 25.1 30.0
## 6   45   33  290  290 17.0 23.0

Converting Time-Series Data

One of the primary objectives of converting observational to time series data is to be able to forecast the values for that series at future times. In the project, ARIMA forecasting model is applied to visualize the trends of accurate of NYC weather data analysis at JFK airport.

# Creating an xts object from the selected columns of the 'nyc_weather' dataset
real_time_nyc_weather = xts(nyc_weather[,c("WDF5","WSF5","PRCP")], order.by=as.Date(nyc_weather$DATE))
# Ensuring regular time intervals in the time series data
real_time_nyc_weather = ts_regular(real_time_nyc_weather)
# Filling missing values by extending the last observation forward
real_time_nyc_weather = na.fill(real_time_nyc_weather, "extend")
## Warning in storage.mode(fill.) <- storage.mode(object): NAs introduced by
## coercion
# Trimming the time series data
real_time_nyc_weather = window(real_time_nyc_weather, start=as.Date("2013-01-01"), end=as.Date("2023-12-11"))

Visualization of Real_Time_NYC_Weather_Data

# Plotting Wind Direction and Wind Speed
plot(ts_ts(real_time_nyc_weather$WDF5), col="brown", bty="n", las=1, fg=NA, 
    ylim=c(20, 360), ylab="Wind Direction (deg) and Speed (mph)")
lines(ts_ts(real_time_nyc_weather$WSF5), col="purple")
grid(nx=NA, ny=NULL, lty=1, col="gray")
legend("topright", fill=c("brown", "purple"), cex=0.7,
    legend=c("WDF5", "WSF5"), bg="white")

# Creating a bar plot for Daily Rainfall
barplot(real_time_nyc_weather$PRCP, border=NA, col="darkblue", ylim=c(0, 2),
    space=0, bty="n", las=1, fg=NA, ylab="Daily Rainfall (inches)")
grid(nx=NA, ny=NULL, lty=1)

A diagram of WDT5 and WSF5 shows the annual cycle of wind speed and direction as well as gusty wind that spike above the general curve. A diagram of daily of rainfall shows no clear seasonal pattern, although the presence of a number of high precipitation days over a decade is noticeable.

Analyze Data

A descriptive statistics and characteristics of a time series data set with three columns: ‘WDF5’, ‘WSF5’, and ‘PRCP’ for the specified time period (2013-01-01 to 2023-12-11), offering insights into the range, central tendency, and variability of the recorded wind speed, direction and precipitation measurements.

#Summary Descriptive Data 
summary(real_time_nyc_weather)
##      Index                 WDF5            WSF5           PRCP       
##  Min.   :2013-01-01   Min.   : 10.0   Min.   : 8.1   Min.   :0.0000  
##  1st Qu.:2015-09-27   1st Qu.:150.0   1st Qu.:21.9   1st Qu.:0.0000  
##  Median :2018-06-22   Median :210.0   Median :25.9   Median :0.0000  
##  Mean   :2018-06-22   Mean   :213.6   Mean   :27.8   Mean   :0.1206  
##  3rd Qu.:2021-03-17   3rd Qu.:310.0   3rd Qu.:33.1   3rd Qu.:0.0400  
##  Max.   :2023-12-11   Max.   :360.0   Max.   :70.0   Max.   :8.0500  
##                       NA's   :48      NA's   :48     NA's   :1
#Structure of Data 
str(real_time_nyc_weather)
## An xts object on 2013-01-01 / 2023-12-11 containing: 
##   Data:    double [3997, 3]
##   Columns: WDF5, WSF5, PRCP
##   Index:   Date [3997] (TZ: "UTC")

Time Series Decomposition

Time series decomposition separates data set into three fundamental components that can be added together to create the original data: A seasonal component A long-term trend component and A random component

When using weather data over a past decade to anticipate what might happen in the future, we often wish to separates the seasonal and random components to see the long term trends of historical data.

# Performing interpolation to fill missing values 
imputed_ts_nyc_weather <- na.interpolation(real_time_nyc_weather)
## Warning: na.interpolation will be replaced by na_interpolation.
##            Functionality stays the same.
##            The new function name better fits modern R code style guidelines.
##            Please adjust your code accordingly.
#Analysis on Fastest Wind Direction
decomposition = stl(ts_ts(imputed_ts_nyc_weather$WDF5), s.window=365, t.window=7001)
plot(decomposition)

summary(decomposition$time.series[,"trend"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   212.8   212.8   213.0   213.6   214.3   216.0
#Analysis on Fastest Wind Speed
decomposition = stl(ts_ts(imputed_ts_nyc_weather$WSF5), s.window=365, t.window=7001)
plot(decomposition)

summary(decomposition$time.series[,"trend"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   27.61   27.73   27.78   27.76   27.81   27.82
#Analysis on precipitation
decomposition = stl(ts_ts(imputed_ts_nyc_weather$PRCP), s.window=365, t.window=7001)
plot(decomposition)

summary(decomposition$time.series[,"trend"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1118  0.1169  0.1213  0.1211  0.1255  0.1293

The trend of wind direction indicates an increase of approximately 0.2 degrees over the time period 2013 - 2023. The trend of wind speeds illustrates a slightly increase from 2013 - 2018, then wind speed is getting decrease until the end of the year 2023. The trend of precipitation has a small daily rise nearly 0.01 inches over the past 10 years,

Find average monthly wind speed, wind direction and rainfall at JFK Airport

Although having daily weather observations is extremely useful for analysis, visualization of decomposition captures trends, but might be confusing to others with no experience with time series analysis. Therefore, the aggregation by mean value smooths out the daily randomness to make the overall cycles and trends clearer.

# Use aggregate() function
monthly_wind_speed <- aggregate(
  imputed_ts_nyc_weather$WSF5,
  as.yearmon,
  FUN = mean
)

# Plot the aggregated series
plot(ts_ts(monthly_wind_speed), col = "darkred", ylim = c(10, 50), 
     lwd = 3, bty = "n", las = 1, fg = NA, ylab = "Average Wind Speed (kmph)")
grid(nx = NA, ny = NULL, lty = 1)

# Find Monthly maximum rainfall amount
monthly_max_rainfall <- aggregate(
  imputed_ts_nyc_weather$PRCP,
  as.yearmon,
  FUN = max
)

# Plot the aggregated series
plot(ts_ts(monthly_max_rainfall), col = "darkred", ylim = c(0, 10), 
     lwd = 3, bty = "n", las = 1, fg = NA, ylab = "Monthly Max rainfall (inches)")
grid(nx = NA, ny = NULL, lty = 1)

# Use aggregate() function
monthly_wind_dir <- aggregate(
  imputed_ts_nyc_weather$WDF5,
  as.yearmon,
  FUN = mean
)

# Plot the aggregated series of Wind Direction
plot(ts_ts(monthly_wind_dir), col = "darkred", ylim = c(1, 360), 
     lwd = 3, bty = "n", las = 1, fg = NA, ylab = "Average Wind Direction (degree)")
grid(nx = NA, ny = NULL, lty = 1)

A line plot of average monthly wind speed shows the trend of average wind speed (in kilometers per hour), maximum monthly rainfall (in inches), and average wind direction (in degrees) across different months between 2013 to 2024.

Time Series Analysis: ARIMA Forecasts

Auto regressive integrated moving average (ARIMA) model involves a more detailed analysis of the training data using lags and forecast errors.For highly detailed weather observations data, the model takes several minutes of computing time rather than seconds.

## Apply an ARIMA model to wind speed data

training.data = ts_ts(imputed_ts_nyc_weather$WSF5)
parameters = auto.arima(training.data)
print(parameters)
## Series: training.data 
## ARIMA(4,0,2) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2      ar3     ar4     ma1      ma2     mean
##       0.3576  0.8253  -0.2767  0.0698  -0.003  -0.9095  27.7443
## s.e.  0.0288  0.0348   0.0215  0.0170   0.024   0.0225   0.4569
## 
## sigma^2 = 64.06:  log likelihood = -13981.69
## AIC=27979.37   AICc=27979.41   BIC=28029.72

This detailed information about the fitted ARIMA model, including its orders p = 4 (four autoregressive terms), d = 0 (no differencing), and q = 2 (two moving average terms), coefficients Autoregressive coefficients for lag 1, 2, 3, and 4, standard error for each estimates, other statistical significance, information criteria used for model selection, Lower values of AIC, AICc and BIC indicate a better-fitting model and measures evaluating the model’s goodness of fit to selected data points.

Genearate a Forecast for 1825 days (approximately 5 years)

# Build an ARIMA model, generate forecasts
arima.model = arima(training.data, order = c(5,0,1), seasonal = list(order=c(0,1,0), period=365))
arima.windspeed = forecast(arima.model, 1825)

# Plotpredicted mean monthly wind speed
plot(arima.windspeed, lwd=3, bty="n", las=1, fg=NA, 
    xlim=c(2013, 2024), ylab="Mean Monthly Wind Speed (kmph)")
grid(nx=NA, ny=NULL, lty=1)

The plot displays the forecasted values of the mean monthly wind speed over time.

Forecasting extreme Precipation with ARIMA model

## Apply an ARIMA model to nonseasonal data
training.data = ts_ts(imputed_ts_nyc_weather$PRCP)
parameters = auto.arima(training.data)
print(parameters)
## Series: training.data 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2    mean
##       0.0747  -0.047  0.1211
## s.e.  0.0159   0.016  0.0056
## 
## sigma^2 = 0.1185:  log likelihood = -1407.05
## AIC=2822.1   AICc=2822.11   BIC=2847.28
arima.model = arima(training.data, order = c(5,0,1), seasonal = list(order=c(0,1,0), period=365))
arima.windspeed = forecast(arima.model, 1825)

plot(arima.windspeed, lwd=3, bty="n", las=1, fg=NA, 
    xlim=c(2013, 2024), ylab="Monthly Precipatation (inches)")
grid(nx=NA, ny=NULL, lty=1)

ARIMA mode is bulit to daily precipitation data points for JFK airport that generates forecasts, and creates a visualization of the predicted monthly precipitation over a specified time period.

Conclusions

ARIMA model is a popular time series forecasting technique used to predict future points in a series based on its past values. Predicting weather accurately for next five years can offer valuable insights into potential weather conditions that make passengers a safer trip. However, the project is mainly analyzed on three variables: wind speed, wind direction and precipitation due to the limited data access. Combining ARIMA with other predictive models and incorporating additional weather-related factors, such as humidity, or pressure that could provide a more comprehensive forecast.