Author

Farzana

1 Section 1: Dataset Overview and Context

1.1 Overview

The dataset tracks monthly total vehicle sales in the United States: 587 monthly observations starting in January 1976 and running into 2024. The data shows clear seasonal patterns, with peaks typically occurring from March through June and troughs in January and February. Vehicle sales are influenced by economic conditions (GDP, interest rates, employment), consumer confidence, manufacturer incentives, and seasonal buying patterns. The series exhibits cyclical behavior aligned with economic cycles as well as sharp disruptions such as the 2008 financial crisis and the 2020 pandemic.

Forecasting vehicle sales presents moderate difficulty. While strong seasonality and economic relationships provide useful signals, the series is subject to unpredictable shocks (oil prices, supply chain disruptions) and changing consumer preferences. Long-term forecasting is complicated by industry transformations like the shift toward electric vehicles and evolving mobility trends.

2 Exploratory Data Analysis

2.1 Time Series Visualization

library(dplyr)
library(ggplot2)
library(tidyr)
library(kableExtra)
library(zoo)
library(forecast)
library(tseries)
library(lubridate)

vehicle_sales <- read.csv("total_vehicle_sales.csv")
vehicle_sales$date <- as.Date(vehicle_sales$date)

# time series object
vehicle_ts <- ts(vehicle_sales$vehicle_sales, start = c(1976, 1), frequency = 12)

# Basic time series plot
ggplot(vehicle_sales, aes(x = date, y = vehicle_sales)) +
  geom_line(color = "#2C3E50") +
  labs(title = "US Vehicle Sales Over Time",
       x = "Year",
       y = "Total Sales",
       caption = "Source: Your Data Source") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))
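
The `ts()` call above takes only a start date and a frequency, so it silently assumes the CSV rows are complete, ordered months beginning in January 1976. A minimal sketch of a check for that assumption follows; `demo` and `is_regular_monthly` are hypothetical names, and the data frame is a stand-in rather than the real file.

```r
# Hypothetical stand-in for the CSV: 24 consecutive months from January 1976.
demo <- data.frame(
  date = seq(as.Date("1976-01-01"), by = "month", length.out = 24),
  vehicle_sales = seq(900, by = 5, length.out = 24)
)

# TRUE when the dates form an unbroken, ordered monthly sequence
# starting at the earliest date in the vector.
is_regular_monthly <- function(dates) {
  expected <- seq(min(dates), by = "month", length.out = length(dates))
  all(dates == expected)
}

is_regular_monthly(demo$date)      # complete series
is_regular_monthly(demo$date[-5])  # one month removed
```

If the check fails on the real data, the gap would need to be filled or handled before building the `ts` object, since every later month would otherwise be mislabeled.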

# Density plot
ggplot(vehicle_sales, aes(x = vehicle_sales)) +
  geom_density(fill = "#3498DB", alpha = 0.7) +
  labs(title = "Distribution of Vehicle Sales",
       x = "Sales Volume",
       y = "Density") +
  theme_minimal()

# Monthly boxplot
vehicle_sales$month <- format(vehicle_sales$date, "%b")
vehicle_sales$month <- factor(vehicle_sales$month, levels = month.abb)

ggplot(vehicle_sales, aes(x = month, y = vehicle_sales)) +
  geom_boxplot(fill = "#3498DB", alpha = 0.7) +
  labs(title = "Monthly Distribution of Vehicle Sales",
       x = "Month",
       y = "Sales Volume") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))
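
The monthly boxplot can be backed by numbers. The sketch below averages a series by calendar month using `cycle()`. It runs on a synthetic series whose level, seasonal pattern, and noise are invented for illustration, so the printed months say nothing about the real data.

```r
set.seed(1)

# Invented seasonal pattern (Jan..Dec) with a trough in January and a peak in May.
pattern <- c(-150, -120, 60, 90, 140, 80, 20, 10, -30, -40, -60, 40)
y <- ts(1000 + rep(pattern, times = 10) + rnorm(120, sd = 10),
        start = c(1976, 1), frequency = 12)

# cycle() returns the month position (1..12) of every observation.
monthly_mean <- setNames(as.numeric(tapply(y, cycle(y), mean)), month.abb)

round(monthly_mean, 1)
names(which.max(monthly_mean))  # month with the highest average
names(which.min(monthly_mean))  # month with the lowest average
```

Applied to `vehicle_ts`, the same two lines would show whether the March to June peak and January to February trough described in the overview hold on average.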

# Calculate summary statistics
summary_stats <- data.frame(
  Metric = c("Number of Observations",
             "Mean",
             "Median",
             "Standard Deviation",
             "Minimum",
             "Maximum",
             "1st Quartile",
             "3rd Quartile"),
  Value = c(length(vehicle_ts),
            mean(vehicle_ts),
            median(vehicle_ts),
            sd(vehicle_ts),
            min(vehicle_ts),
            max(vehicle_ts),
            quantile(vehicle_ts, 0.25),
            quantile(vehicle_ts, 0.75))
)

# Create formatted table
kable(summary_stats, 
      caption = "Summary Statistics of Vehicle Sales",
      format = "html",
      digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Summary Statistics of Vehicle Sales

Metric                      Value
Number of Observations     587.00
Mean                      1263.82
Median                    1271.20
Standard Deviation         220.79
Minimum                    670.47
Maximum                   1845.71
1st Quartile              1119.52
3rd Quartile              1421.77

  • Central Tendency: The mean (1263.82) and median (1271.20) differ by less than 1%, which suggests the pooled distribution is roughly symmetric with little skew. This describes the values as a whole; it does not show whether sales were stable over time.
  • Variability: The standard deviation of 220.79 is about 17% of the mean, a moderate spread. If the distribution is close to normal, roughly two thirds of months would fall within one standard deviation of the mean (about 1043 to 1485).
  • Range: The minimum of 670.47 likely falls in a weak economic period or an off-peak month, and the maximum of 1845.71 in a high-demand season or favorable economic conditions, although the summary table alone cannot confirm the timing.
  • Quartiles: The interquartile range (IQR = 1421.77 - 1119.52 = 302.25) measures the spread of the middle 50% of monthly sales levels. It is robust to extreme values, but it describes levels rather than month-to-month changes.
  • Outliers: By definition no value lies outside the minimum and maximum. Under the common 1.5 × IQR rule, neither extreme is far enough from the quartiles to be flagged as an outlier, so major economic events are more likely to appear as runs of unusually low or high months than as isolated outliers.
  • Seasonality: Summary statistics ignore the time order of the observations, so they cannot confirm cyclical or seasonal patterns. The monthly boxplot above and the decomposition in Section 3 are the appropriate evidence.
  • Challenges in Forecasting: The variability indicated by the standard deviation, along with occasional extremes in the data, makes forecasting somewhat challenging, particularly when factoring in external shocks.
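
The outlier and variability points can be checked with arithmetic on the table values alone. The sketch below applies Tukey's 1.5 × IQR fences and computes the coefficient of variation; the variable names are illustrative, and the inputs are copied from the summary table rather than recomputed from the data.

```r
# Values copied from the summary statistics table.
q1 <- 1119.52
q3 <- 1421.77
obs_min <- 670.47
obs_max <- 1845.71
avg <- 1263.82
std_dev <- 220.79

iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr   # values below this count as outliers
upper_fence <- q3 + 1.5 * iqr   # values above this count as outliers
cv <- std_dev / avg             # spread relative to the mean

round(c(iqr = iqr, lower = lower_fence, upper = upper_fence, cv = cv), 3)
c(min_is_outlier = obs_min < lower_fence,
  max_is_outlier = obs_max > upper_fence)
```

Both flags come out FALSE: the minimum sits just above the lower fence and the maximum below the upper fence.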

3 Section 3: Time Series Components Analysis

3.1 Moving Average Analysis

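# Calculate 12-month moving average.
# With an even window, zoo's "center" alignment places each average at the
# 6th of its 12 months, i.e. half a month before the true midpoint.
ma_12 <- rollmean(vehicle_ts, k = 12, align = "center")

# Create data frame for plotting; rows 6..(n - 6) match the positions
# returned by rollmean() above
ma_df <- data.frame(
  date = vehicle_sales$date[6:(length(vehicle_ts)-6)],
  original = vehicle_ts[6:(length(vehicle_ts)-6)],
  ma = as.numeric(ma_12)
)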

# Plot original series with moving average
ggplot(ma_df, aes(x = date)) +
  geom_line(aes(y = original, color = "Original"), alpha = 0.7) +
  geom_line(aes(y = ma, color = "12-Month Moving Average"), linewidth = 1) +
  scale_color_manual(values = c("Original" = "#2C3E50", 
                               "12-Month Moving Average" = "#E74C3C")) +
  labs(title = "Vehicle Sales with 12-Month Moving Average",
       x = "Year",
       y = "Sales Volume",
       color = "Series") +
  theme_minimal()

# Calculate and plot remainder series
ma_df$remainder <- ma_df$original - ma_df$ma

ggplot(ma_df, aes(x = date, y = remainder)) +
  geom_line(color = "#2C3E50") +
  labs(title = "Remainder Series (Original - Moving Average)",
       x = "Year",
       y = "Remainder") +
  theme_minimal()
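
A 12-term window has no single middle month, so a "centered" 12-month average is offset by half a month. A common remedy is the 2×12 moving average, which spans 13 months and gives half weight to the two end months. The sketch below is a base-R illustration on a synthetic series (trend, seasonal pattern, and noise are invented), not a replacement for the analysis above.

```r
set.seed(42)
n <- 60
t_idx <- seq_len(n)

# Synthetic monthly series: linear trend + zero-sum seasonal pattern + noise.
seasonal <- rep(c(-30, -20, 10, 25, 30, 20, 5, 0, -5, -10, -15, -10),
                length.out = n)
y <- ts(500 + 2 * t_idx + seasonal + rnorm(n, sd = 3),
        start = c(1976, 1), frequency = 12)

# 2x12 MA weights: 1/24 on the two end months, 1/12 on the 11 in between.
w <- c(0.5, rep(1, 11), 0.5) / 12
trend_2x12 <- stats::filter(y, filter = w, sides = 2)

sum(w)                  # the weights sum to 1
sum(is.na(trend_2x12))  # 6 undefined values at each end, 12 in total
```

Because every calendar month receives the same total weight, a stable seasonal pattern cancels out and the result tracks the underlying trend.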

3.2 Seasonality Analysis

# Decompose time series
decomp <- decompose(vehicle_ts, type = "multiplicative")

# Plot decomposition
autoplot(decomp) +
  theme_minimal() +
  labs(title = "Multiplicative Time Series Decomposition")

# Seasonal plot
ggseasonplot(vehicle_ts, 
             year.labels = TRUE, 
             year.labels.left = TRUE) +
  theme_minimal() +
  labs(title = "Seasonal Plot of Vehicle Sales",
       x = "Month",
       y = "Sales Volume")

3.2.1 Observations

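  • Cyclicality: The 12-month moving average smooths out within-year variation and exposes the trend-cycle, including multi-year swings consistent with economic cycles.
  • Seasonality: Because a 12-month window averages over a full year, the moving average itself contains little seasonality. The seasonal pattern appears instead in the remainder series and in the seasonal component of the decomposition; its intensity could be quantified further, for example through spectral analysis.
  • Forecasting Challenges: The remainder series also contains irregular movements, which suggests that unexpected short-term events, such as shocks or anomalies, could pose challenges for accurate forecasting.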
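
One way to put a number on seasonal intensity is the strength-of-seasonality measure F_S = max(0, 1 - Var(R) / Var(S + R)), where S is the seasonal component and R the remainder of an STL decomposition. The sketch below computes it on a synthetic series with an additive structure; the data are invented, and for a multiplicative series such as the one decomposed above, taking logs first would be a reasonable assumption.

```r
set.seed(7)
n <- 120

# Synthetic monthly series: mild trend + fixed seasonal pattern + noise.
seasonal <- rep(c(-30, -20, 10, 25, 30, 20, 5, 0, -5, -10, -15, -10),
                length.out = n)
y <- ts(500 + 0.5 * seq_len(n) + seasonal + rnorm(n, sd = 5),
        start = c(1976, 1), frequency = 12)

# STL with a fixed (periodic) seasonal component.
fit <- stl(y, s.window = "periodic")
S <- fit$time.series[, "seasonal"]
R <- fit$time.series[, "remainder"]

# Share of detrended variance explained by the seasonal component.
seasonal_strength <- max(0, 1 - var(R) / var(S + R))
seasonal_strength
```

Values near 1 indicate that seasonality dominates the detrended series, while values near 0 indicate that it is indistinguishable from noise.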

4 Section 4: Naive Forecasting

4.1 Forecasting Results

# Create seasonal naive forecast
forecast_length <- 6
snaive_forecast <- snaive(vehicle_ts, h = forecast_length)

# Plot forecast
autoplot(snaive_forecast) +
  theme_minimal() +
  labs(title = "6-Period Seasonal Naive Forecast",
       x = "Year",
       y = "Sales Volume") +
  guides(colour = guide_legend(title = "Series"))

# Calculate accuracy metrics
accuracy_metrics <- accuracy(snaive_forecast)
kable(accuracy_metrics, 
      caption = "Forecast Accuracy Metrics",
      format = "html",
      digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Forecast Accuracy Metrics

                 ME     RMSE      MAE     MPE    MAPE   MASE   ACF1
Training set  5.285  157.667  114.406  -0.498   9.701      1  0.625

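
The MASE of exactly 1 in this table is expected rather than informative: for seasonal data, MASE divides the MAE by the in-sample MAE of the seasonal naive method, so the seasonal naive model's own training errors give a ratio of 1 by construction. The sketch below reproduces this by hand in base R on a synthetic series (invented data, illustrative variable names).

```r
set.seed(3)
m <- 12   # seasonal period
n <- 96
y <- 1000 + 150 * sin(2 * pi * seq_len(n) / m) + rnorm(n, sd = 40)

# Seasonal naive fitted value for month t is the observation from month t - m.
fitted_snaive <- c(rep(NA, m), y[1:(n - m)])
errors <- y - fitted_snaive

mae   <- mean(abs(errors), na.rm = TRUE)
rmse  <- sqrt(mean(errors^2, na.rm = TRUE))
scale <- mean(abs(diff(y, lag = m)))   # in-sample seasonal naive MAE
mase  <- mae / scale

c(MAE = mae, RMSE = rmse, MASE = mase)
```

MASE becomes useful only when a different model, or a hold-out period, is scored against this same scale.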
# Compare with simple naive forecast
naive_forecast <- naive(vehicle_ts, h = forecast_length)
naive_accuracy <- accuracy(naive_forecast)

# Compare both approaches
forecast_comparison <- autoplot(vehicle_ts) +
  autolayer(naive_forecast, series = "Naive", PI = FALSE) +
  autolayer(snaive_forecast, series = "Seasonal Naive", PI = FALSE) +
  theme_minimal() +
  labs(title = "Comparison of Naive and Seasonal Naive Forecasts",
       x = "Year",
       y = "Sales Volume")

print(forecast_comparison)
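
The plot and the training-set metrics do not show how either method performs on months it has not seen. A minimal sketch of a hold-out comparison follows, written in base R on a synthetic seasonal series; the data are invented, and with the real series the same split could be applied to `vehicle_ts` using `window()`.

```r
set.seed(11)
m <- 12
n <- 120
y <- 1000 + 200 * sin(2 * pi * seq_len(n) / m) + rnorm(n, sd = 20)

h <- 12                          # hold out the final year
train <- y[1:(n - h)]
test  <- y[(n - h + 1):n]

# Naive: repeat the last training value.
naive_fc <- rep(train[length(train)], h)
# Seasonal naive: repeat the last full season of the training data.
snaive_fc <- rep(tail(train, m), length.out = h)

mae <- function(actual, predicted) mean(abs(actual - predicted))
c(naive = mae(test, naive_fc), seasonal_naive = mae(test, snaive_fc))
```

On this synthetic series the seasonal naive error is much smaller because the data were built with strong seasonality; the real series may behave differently, which is what a hold-out check would reveal.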

4.1.1 Analysis of Forecasting Results

  1. 6-Period Seasonal Naive Forecast
    • This plot displays the predicted sales for the next six months using a seasonal naïve model.
    • The model sets each forecast equal to the observed value from the same month one year earlier, so it reproduces the most recent seasonal pattern but ignores trend and any change since then.
    • The training-set metrics show sizeable errors (MAPE 9.70%, MAE 114.41), and the lag-1 residual autocorrelation of 0.625 indicates structure the model does not capture. The MASE of 1 is expected by construction, since MASE is scaled by the in-sample seasonal naïve error.
  2. Comparison of Naive and Seasonal Naive Forecasts
    • The naive forecast holds the most recent observation constant over the forecast horizon, disregarding any seasonal effects.
    • In contrast, the seasonal naïve forecast carries forward last year's monthly pattern, so its predictions vary by month in line with the observed seasonality.
    • This comparison underscores the seasonal naïve forecast’s strength in delivering more realistic short-term predictions, especially for datasets with strong