The Baregg Tunnel dataset records the number of vehicles that passed through the tunnel each day from November 2003 to November 2005. This daily series can be used to forecast future traffic volumes. The goal is to apply different forecasting models and identify which one produces the most accurate predictions over a held-out validation period.
Naïve Model vs. Linear Regression
Validation period: Jul 2005 – Nov 2005.
The data shows a clear pattern that repeats every week.
There is a slight upward trend over time.
There are no significant or extreme outliers present in the dataset.
Training: Nov 1, 2003 – Jun 30, 2005
Validation: Jul 1, 2005 – Nov 30, 2005
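As a rough illustration of the split, the sketch below partitions a daily series at the stated dates and counts the days in each window. It uses Python's standard library rather than the tooling behind the report, and the series values are a hypothetical placeholder. It also assumes one observation per calendar day with no gaps, which the report does not state.

```python
from datetime import date, timedelta

# Split boundaries as stated in the report.
TRAIN_START, TRAIN_END = date(2003, 11, 1), date(2005, 6, 30)
VALID_START, VALID_END = date(2005, 7, 1), date(2005, 11, 30)


def days_inclusive(start, end):
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


def split_by_date(series):
    """Split a {date: value} mapping into training and validation dicts."""
    train = {d: v for d, v in series.items() if TRAIN_START <= d <= TRAIN_END}
    valid = {d: v for d, v in series.items() if VALID_START <= d <= VALID_END}
    return train, valid


# Hypothetical placeholder series: one made-up value per calendar day.
n_days = days_inclusive(TRAIN_START, VALID_END)
series = {TRAIN_START + timedelta(days=i): 100_000 for i in range(n_days)}

train, valid = split_by_date(series)
print(len(train), len(valid))  # 608 153
```

Under the no-gaps assumption, the validation window implies a forecast horizon of 153 days.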
## # A tibble: 1 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Naive Test -12734. 16821. 13606. -12.5 13.1 2.78 1.97 0.376
## # A tibble: 1 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 LinReg Test -1603. 5869. 3900. -1.82 3.70 0.798 0.686 0.622
The naïve forecast repeats the last training observation, so it stays constant over the whole validation period and cannot follow the weekly cycle.
The linear regression captures the weekly fluctuations in the data.
The regression model follows the actual traffic values more closely.
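The contrast between the two forecasts can be sketched in a few lines. This is a minimal pure-Python illustration, not the report's implementation (the recommendation refers to a TSLM model). It fits an ordinary least squares regression on an intercept, a linear trend, and six day-of-week dummies, and compares it with a naïve forecast. The series is synthetic and noise-free, and the names `weekly`, `truth`, `design_row`, `fit_ols`, and `predict` are hypothetical.

```python
def design_row(t):
    """Intercept, linear trend, and six day-of-week dummies (day 0 is the baseline)."""
    dummies = [1.0 if t % 7 == d else 0.0 for d in range(1, 7)]
    return [1.0, float(t)] + dummies


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(ts, ys):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    rows = [design_row(t) for t in ts]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, t):
    """Fitted or forecast value at time index t."""
    return sum(b * x for b, x in zip(beta, design_row(t)))


# Synthetic, noise-free series (made-up numbers, not the Baregg data).
weekly = [0, 3000, 4000, 4000, 5000, -8000, -12000]


def truth(t):
    return 100000 + 15 * t + weekly[t % 7]


train_t = list(range(140))
train_y = [truth(t) for t in train_t]
test_t = list(range(140, 168))

beta = fit_ols(train_t, train_y)
regression_fc = [predict(beta, t) for t in test_t]
naive_fc = [train_y[-1]] * len(test_t)  # last training value, repeated

for t, r, n in list(zip(test_t, regression_fc, naive_fc))[:7]:
    print(t, truth(t), round(r), n)
```

On this toy series the regression forecast tracks the weekly pattern while the naïve forecast is a flat line, which mirrors the behaviour described above.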
## # A tibble: 2 × 5
## .model ME RMSE MAE MAPE
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 LinReg -1603. 5869. 3900. 3.70
## 2 Naive -12734. 16821. 13606. 13.1
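For reference, the error measures in this table can be written out as short functions. This is a sketch using the common definitions, with the error taken as actual minus forecast, so a negative ME means the forecasts run too high on average. The function names and the small example numbers are hypothetical, and the `m` parameter of `mase` is an assumption about how the scaling benchmark might be chosen.

```python
import math


def errors(actual, forecast):
    """Forecast errors, defined as actual minus forecast."""
    return [a - f for a, f in zip(actual, forecast)]


def me(actual, forecast):
    e = errors(actual, forecast)
    return sum(e) / len(e)


def rmse(actual, forecast):
    e = errors(actual, forecast)
    return math.sqrt(sum(x * x for x in e) / len(e))


def mae(actual, forecast):
    e = errors(actual, forecast)
    return sum(abs(x) for x in e) / len(e)


def mape(actual, forecast):
    """Mean absolute percentage error; requires non-zero actual values."""
    e = errors(actual, forecast)
    return 100 * sum(abs(x / a) for x, a in zip(e, actual)) / len(e)


def mase(actual, forecast, train, m=1):
    """MAE scaled by the in-sample MAE of a lag-m naive forecast.

    m=1 scales by the one-step naive benchmark; m=7 would scale by a
    weekly seasonal naive benchmark for daily data.
    """
    scale = sum(abs(train[i] - train[i - m]) for i in range(m, len(train)))
    scale /= len(train) - m
    return mae(actual, forecast) / scale


# Tiny made-up example, unrelated to the tunnel data.
train = [90, 100, 95, 105]
actual = [100, 110, 120]
forecast = [110, 110, 110]

print(me(actual, forecast), mae(actual, forecast), mase(actual, forecast, train))
```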
## MASE Naïve: 1.444
## MASE Linear Regression: 0.414
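A MASE value below 1 indicates that the model's average absolute error is smaller than that of the in-sample naïve benchmark used for scaling.
The Linear Regression model has a lower MASE than the naïve approach (0.414 vs. 1.444), showing it provides more accurate predictions.
The other error measures are also smaller for the regression model: ME is closer to zero (-1603 vs. -12734), RMSE falls from 16821 to 5869, MAE from 13606 to 3900, and MAPE from 13.1% to 3.70%. These confirm its better performance.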
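The MASE values printed in the accuracy tables (2.78 and 0.798) differ from the ones reported separately (1.444 and 0.414). Since MASE is the MAE divided by a scaling constant, the implied scale can be backed out from the reported numbers. The sketch below does that arithmetic. One possible explanation, which the report does not confirm, is that the two sets use different in-sample benchmarks, for example a weekly seasonal naïve forecast versus a one-step naïve forecast.

```python
# Values copied from the report's output; only the arithmetic is new.
reported = {
    "Naive": {"MAE": 13606.0, "table": 2.78, "separate": 1.444},
    "LinReg": {"MAE": 3900.0, "table": 0.798, "separate": 0.414},
}

# MASE = MAE / scale, so scale = MAE / MASE.
implied_scale = {
    model: {
        source: vals["MAE"] / vals[source]
        for source in ("table", "separate")
    }
    for model, vals in reported.items()
}

for model, scales in implied_scale.items():
    print(model, round(scales["table"]), round(scales["separate"]))
```

Within each set the two models imply nearly the same scale (about 4,890 for the table values and about 9,420 for the separate values, up to rounding in the inputs). That is consistent with each set using a single benchmark, with the separate values scaled by an error roughly twice as large. Both sets rank the models the same way.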
The residuals vary randomly over time, indicating they are mostly white noise.
The histogram is centered around zero, which suggests there is no systematic bias in the model.
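Residual checks of this kind can also be run numerically rather than read off plots. The sketch below computes the residual mean and the sample autocorrelation at a given lag, and compares the latter with the usual approximate white-noise bound of ±1.96/√n. The residuals here are a made-up periodic sequence chosen to show what a leftover weekly pattern would look like. They are not the report's residuals.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def acf(xs, lag):
    """Sample autocorrelation of xs at the given positive lag."""
    m = mean(xs)
    num = sum((xs[t] - m) * (xs[t - lag] - m) for t in range(lag, len(xs)))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


def white_noise_bound(n):
    """Approximate 95% bound for the autocorrelations of white noise."""
    return 1.96 / math.sqrt(n)


# Hypothetical residuals: a weekly pattern repeated for 8 weeks.
pattern = [2, 1, 0, -1, -2, 1, -1]
residuals = pattern * 8

bound = white_noise_bound(len(residuals))
print("mean:", mean(residuals))
for lag in (1, 7):
    r = acf(residuals, lag)
    print(lag, round(r, 3), "outside bound" if abs(r) > bound else "inside bound")
```

In this toy case the mean is zero but the lag-7 autocorrelation is far outside the bound, which is the signature of weekly seasonality the model has not absorbed. Few such spikes in the real residuals would point the other way.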
The ACF plot shows very few spikes, meaning most of the autocorrelation has been captured by the model.

Conclusion

The Linear Regression model outperforms the Naïve method on every reported error measure, including RMSE, MAE, MAPE, and MASE.
Including both the trend and weekly seasonality helps improve the accuracy of the forecasts.
The residuals do not show any clear pattern, which suggests the model has captured the main structure in the data and supports the reliability of its forecasts.
Recommendation: The Linear Regression (TSLM) model should be used to forecast daily traffic at the Baregg Tunnel.
The traffic data shows a clear weekly seasonal pattern along with a mild upward trend over time. By incorporating both the trend and weekly seasonality, the Linear Regression model generates more accurate predictions than the Naïve approach. As a result, it is the more suitable of the two methods for forecasting future traffic volumes.