The project encompasses the time series climate data analysis to make short-term prediction of maximum wind speed, potential wind direction around flight landing zones and forecast extreme precipitation at JFK airport.
library(xts) # for manipulating and analyzing time series data
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(tsbox)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(dplyr)
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:xts':
##
## first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(imputeTS)
##
## Attaching package: 'imputeTS'
## The following object is masked from 'package:zoo':
##
## na.locf
library(ggplot2)
The real time observational climate data in csv file is downloaded from National Oceanic and Atmospheric Administration (NOAA), https://www.ncei.noaa.gov/ at the JFK airport’s weather station. After eliminating unwanted columns, the nyc_weather dataset has 3997 observations and 13 variables.
The variables of particular interest for this analysis are the following.
WDF5: Wind Direction of fastest 5-sec (degrees) WSF5: Wind Speed of fastest 5-sec (kmph) PRCP: Precipitiation in (inches)
csv = read.csv("https://www.ncei.noaa.gov/orders/cdo/3544714.csv", na.strings = c("", "NA", "N/A"))
range(csv$DATE)
## [1] "2013-01-01" "2023-12-11"
# Delete empty data columns
csv_del <- csv[, -c(16:29)]
nyc_weather <- csv_del[, !names(csv_del) %in% c("PGTM", "TAVG")]
head(nyc_weather)
## STATION NAME DATE AWND PRCP SNOW SNWD
## 1 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-01 15.21 0 0 0
## 2 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-02 14.54 0 0 0
## 3 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-03 11.63 0 0 0
## 4 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-04 16.33 0 0 0
## 5 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-05 11.63 0 0 0
## 6 USW00094789 JFK INTERNATIONAL AIRPORT, NY US 2013-01-06 10.29 0 0 0
## TMAX TMIN WDF2 WDF5 WSF2 WSF5
## 1 41 27 320 320 29.1 36.9
## 2 35 23 330 310 28.0 33.1
## 3 33 25 260 320 17.9 23.0
## 4 39 30 250 260 25.1 30.0
## 5 45 32 320 310 25.1 30.0
## 6 45 33 290 290 17.0 23.0
One of the primary objectives of converting observational to time series data is to be able to forecast the values for that series at future times. In the project, ARIMA forecasting model is applied to visualize the trends of accurate of NYC weather data analysis at JFK airport.
# Creating an xts object from the selected columns of the 'nyc_weather' dataset
real_time_nyc_weather = xts(nyc_weather[,c("WDF5","WSF5","PRCP")], order.by=as.Date(nyc_weather$DATE))
# Ensuring regular time intervals in the time series data
real_time_nyc_weather = ts_regular(real_time_nyc_weather)
# Filling missing values by extending the last observation forward
real_time_nyc_weather = na.fill(real_time_nyc_weather, "extend")
## Warning in storage.mode(fill.) <- storage.mode(object): NAs introduced by
## coercion
# Trimming the time series data
real_time_nyc_weather = window(real_time_nyc_weather, start=as.Date("2013-01-01"), end=as.Date("2023-12-11"))
# Plotting Wind Direction and Wind Speed
plot(ts_ts(real_time_nyc_weather$WDF5), col="brown", bty="n", las=1, fg=NA,
ylim=c(20, 360), ylab="Wind Direction (deg) and Speed (mph)")
lines(ts_ts(real_time_nyc_weather$WSF5), col="purple")
grid(nx=NA, ny=NULL, lty=1, col="gray")
legend("topright", fill=c("brown", "purple"), cex=0.7,
legend=c("WDF5", "WSF5"), bg="white")
# Creating a bar plot for Daily Rainfall
barplot(real_time_nyc_weather$PRCP, border=NA, col="darkblue", ylim=c(0, 2),
space=0, bty="n", las=1, fg=NA, ylab="Daily Rainfall (inches)")
grid(nx=NA, ny=NULL, lty=1)
A diagram of WDT5 and WSF5 shows the annual cycle of wind speed and direction as well as gusty wind that spike above the general curve. A diagram of daily of rainfall shows no clear seasonal pattern, although the presence of a number of high precipitation days over a decade is noticeable.
A descriptive statistics and characteristics of a time series data set with three columns: ‘WDF5’, ‘WSF5’, and ‘PRCP’ for the specified time period (2013-01-01 to 2023-12-11), offering insights into the range, central tendency, and variability of the recorded wind speed, direction and precipitation measurements.
#Summary Descriptive Data
summary(real_time_nyc_weather)
## Index WDF5 WSF5 PRCP
## Min. :2013-01-01 Min. : 10.0 Min. : 8.1 Min. :0.0000
## 1st Qu.:2015-09-27 1st Qu.:150.0 1st Qu.:21.9 1st Qu.:0.0000
## Median :2018-06-22 Median :210.0 Median :25.9 Median :0.0000
## Mean :2018-06-22 Mean :213.6 Mean :27.8 Mean :0.1206
## 3rd Qu.:2021-03-17 3rd Qu.:310.0 3rd Qu.:33.1 3rd Qu.:0.0400
## Max. :2023-12-11 Max. :360.0 Max. :70.0 Max. :8.0500
## NA's :48 NA's :48 NA's :1
#Structure of Data
str(real_time_nyc_weather)
## An xts object on 2013-01-01 / 2023-12-11 containing:
## Data: double [3997, 3]
## Columns: WDF5, WSF5, PRCP
## Index: Date [3997] (TZ: "UTC")
Time series decomposition separates data set into three fundamental components that can be added together to create the original data: A seasonal component A long-term trend component and A random component
When using weather data over a past decade to anticipate what might happen in the future, we often wish to separates the seasonal and random components to see the long term trends of historical data.
# Performing interpolation to fill missing values
imputed_ts_nyc_weather <- na.interpolation(real_time_nyc_weather)
## Warning: na.interpolation will be replaced by na_interpolation.
## Functionality stays the same.
## The new function name better fits modern R code style guidelines.
## Please adjust your code accordingly.
#Analysis on Fastest Wind Direction
decomposition = stl(ts_ts(imputed_ts_nyc_weather$WDF5), s.window=365, t.window=7001)
plot(decomposition)
summary(decomposition$time.series[,"trend"])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 212.8 212.8 213.0 213.6 214.3 216.0
#Analysis on Fastest Wind Speed
decomposition = stl(ts_ts(imputed_ts_nyc_weather$WSF5), s.window=365, t.window=7001)
plot(decomposition)
summary(decomposition$time.series[,"trend"])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 27.61 27.73 27.78 27.76 27.81 27.82
#Analysis on precipitation
decomposition = stl(ts_ts(imputed_ts_nyc_weather$PRCP), s.window=365, t.window=7001)
plot(decomposition)
summary(decomposition$time.series[,"trend"])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1118 0.1169 0.1213 0.1211 0.1255 0.1293
The trend of wind direction indicates an increase of approximately 0.2 degrees over the time period 2013 - 2023. The trend of wind speeds illustrates a slightly increase from 2013 - 2018, then wind speed is getting decrease until the end of the year 2023. The trend of precipitation has a small daily rise nearly 0.01 inches over the past 10 years,
Although having daily weather observations is extremely useful for analysis, visualization of decomposition captures trends, but might be confusing to others with no experience with time series analysis. Therefore, the aggregation by mean value smooths out the daily randomness to make the overall cycles and trends clearer.
# Use aggregate() function
monthly_wind_speed <- aggregate(
imputed_ts_nyc_weather$WSF5,
as.yearmon,
FUN = mean
)
# Plot the aggregated series
plot(ts_ts(monthly_wind_speed), col = "darkred", ylim = c(10, 50),
lwd = 3, bty = "n", las = 1, fg = NA, ylab = "Average Wind Speed (kmph)")
grid(nx = NA, ny = NULL, lty = 1)
# Find Monthly maximum rainfall amount
monthly_max_rainfall <- aggregate(
imputed_ts_nyc_weather$PRCP,
as.yearmon,
FUN = max
)
# Plot the aggregated series
plot(ts_ts(monthly_max_rainfall), col = "darkred", ylim = c(0, 10),
lwd = 3, bty = "n", las = 1, fg = NA, ylab = "Monthly Max rainfall (inches)")
grid(nx = NA, ny = NULL, lty = 1)
# Use aggregate() function
monthly_wind_dir <- aggregate(
imputed_ts_nyc_weather$WDF5,
as.yearmon,
FUN = mean
)
# Plot the aggregated series of Wind Direction
plot(ts_ts(monthly_wind_dir), col = "darkred", ylim = c(1, 360),
lwd = 3, bty = "n", las = 1, fg = NA, ylab = "Average Wind Direction (degree)")
grid(nx = NA, ny = NULL, lty = 1)
A line plot of average monthly wind speed shows the trend of average wind speed (in kilometers per hour), maximum monthly rainfall (in inches), and average wind direction (in degrees) across different months between 2013 to 2024.
Auto regressive integrated moving average (ARIMA) model involves a more detailed analysis of the training data using lags and forecast errors.For highly detailed weather observations data, the model takes several minutes of computing time rather than seconds.
## Apply an ARIMA model to wind speed data
training.data = ts_ts(imputed_ts_nyc_weather$WSF5)
parameters = auto.arima(training.data)
print(parameters)
## Series: training.data
## ARIMA(4,0,2) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 mean
## 0.3576 0.8253 -0.2767 0.0698 -0.003 -0.9095 27.7443
## s.e. 0.0288 0.0348 0.0215 0.0170 0.024 0.0225 0.4569
##
## sigma^2 = 64.06: log likelihood = -13981.69
## AIC=27979.37 AICc=27979.41 BIC=28029.72
This detailed information about the fitted ARIMA model, including its orders p = 4 (four autoregressive terms), d = 0 (no differencing), and q = 2 (two moving average terms), coefficients Autoregressive coefficients for lag 1, 2, 3, and 4, standard error for each estimates, other statistical significance, information criteria used for model selection, Lower values of AIC, AICc and BIC indicate a better-fitting model and measures evaluating the model’s goodness of fit to selected data points.
# Build an ARIMA model, generate forecasts
arima.model = arima(training.data, order = c(5,0,1), seasonal = list(order=c(0,1,0), period=365))
arima.windspeed = forecast(arima.model, 1825)
# Plotpredicted mean monthly wind speed
plot(arima.windspeed, lwd=3, bty="n", las=1, fg=NA,
xlim=c(2013, 2024), ylab="Mean Monthly Wind Speed (kmph)")
grid(nx=NA, ny=NULL, lty=1)
The plot displays the forecasted values of the mean monthly wind speed over time.
## Apply an ARIMA model to nonseasonal data
training.data = ts_ts(imputed_ts_nyc_weather$PRCP)
parameters = auto.arima(training.data)
print(parameters)
## Series: training.data
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.0747 -0.047 0.1211
## s.e. 0.0159 0.016 0.0056
##
## sigma^2 = 0.1185: log likelihood = -1407.05
## AIC=2822.1 AICc=2822.11 BIC=2847.28
arima.model = arima(training.data, order = c(5,0,1), seasonal = list(order=c(0,1,0), period=365))
arima.windspeed = forecast(arima.model, 1825)
plot(arima.windspeed, lwd=3, bty="n", las=1, fg=NA,
xlim=c(2013, 2024), ylab="Monthly Precipatation (inches)")
grid(nx=NA, ny=NULL, lty=1)
ARIMA mode is bulit to daily precipitation data points for JFK airport that generates forecasts, and creates a visualization of the predicted monthly precipitation over a specified time period.
ARIMA model is a popular time series forecasting technique used to predict future points in a series based on its past values. Predicting weather accurately for next five years can offer valuable insights into potential weather conditions that make passengers a safer trip. However, the project is mainly analyzed on three variables: wind speed, wind direction and precipitation due to the limited data access. Combining ARIMA with other predictive models and incorporating additional weather-related factors, such as humidity, or pressure that could provide a more comprehensive forecast.