The dataset tracks monthly total vehicle sales in the United States from 1976 to 2024. The data shows clear seasonal patterns with peaks typically occurring in March-June and troughs in January-February. Vehicle sales are influenced by economic conditions (GDP, interest rates, employment), consumer confidence, manufacturer incentives, and seasonal buying patterns. The series exhibits both cyclical behavior aligned with economic cycles and structural changes like the 2008 financial crisis and 2020 pandemic disruptions.
Forecasting vehicle sales presents moderate difficulty. While strong seasonality and economic relationships provide useful signals, the series is subject to unpredictable shocks (oil prices, supply chain disruptions) and changing consumer preferences. Long-term forecasting is complicated by industry transformations like the shift toward electric vehicles and evolving mobility trends.
```r
library(dplyr)
library(ggplot2)
library(tidyr)
library(kableExtra)
library(zoo)
library(forecast)
library(tseries)
library(lubridate)

vehicle_sales <- read.csv("total_vehicle_sales.csv")
vehicle_sales$date <- as.Date(vehicle_sales$date)

# Time series object
vehicle_ts <- ts(vehicle_sales$vehicle_sales, start = c(1976, 1), frequency = 12)

# Basic time series plot
ggplot(vehicle_sales, aes(x = date, y = vehicle_sales)) +
  geom_line(color = "#2C3E50") +
  labs(title = "US Vehicle Sales Over Time",
       x = "Year",
       y = "Total Sales",
       caption = "Source: Your Data Source") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))
```
```r
# Density plot
ggplot(vehicle_sales, aes(x = vehicle_sales)) +
  geom_density(fill = "#3498DB", alpha = 0.7) +
  labs(title = "Distribution of Vehicle Sales",
       x = "Sales Volume",
       y = "Density") +
  theme_minimal()
```
Central Tendency: The mean and median are close together (both fall between 1263 and 1271), which suggests a roughly symmetric distribution without significant skewness. It also points to a typical monthly sales level that has stayed broadly stable over the years.
Variability: A standard deviation of 220.79 indicates moderate variation in sales values, with most months falling within one standard deviation of the mean.
Range: The minimum value of 670.47 reflects lower sales during challenging economic times or off-peak seasons, while the maximum value of 1845.71 represents sales at their peak during high-demand seasons or favorable economic conditions.
Quartiles: The interquartile range (IQR = 1421.77 - 1119.52 = 302.25) reflects moderate variability in the middle 50% of the sales data. This range captures the typical month-to-month fluctuations in vehicle sales, excluding any extreme outliers or unusual events.
Outliers: Sales values near the minimum and maximum are likely tied to major economic events, such as financial crises or periods of economic boom.
Seasonality: The close alignment of the quartiles, mean, and median is consistent with a series that fluctuates around a stable level, but summary statistics alone cannot establish cyclical or seasonal patterns; those are examined in the time series components analysis.
Challenges in Forecasting: The variability indicated by the standard deviation, along with occasional extremes in the data, makes forecasting somewhat challenging, particularly when factoring in external shocks.
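The quartiles reported above can also be used for a quick outlier screen. The sketch below applies the conventional 1.5 × IQR fences to the figures quoted in the text; the 1.5 multiplier is a common rule of thumb and an assumption here, not something the original analysis used.

```r
# Quartiles and extremes as quoted in the summary statistics above
q1 <- 1119.52
q3 <- 1421.77
min_sales <- 670.47
max_sales <- 1845.71

# Interquartile range and the conventional 1.5 * IQR fences (assumed rule)
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

cat("IQR:", iqr, "\n")
cat("Lower fence:", lower_fence, " Upper fence:", upper_fence, "\n")

# A value is flagged only if it lies beyond a fence
cat("Minimum flagged:", min_sales < lower_fence, "\n")
cat("Maximum flagged:", max_sales > upper_fence, "\n")
```

With these figures neither the minimum nor the maximum crosses a fence, so this rule would flag no month as an outlier, even though the extremes may still correspond to notable economic events.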
3 Section 3: Time Series Components Analysis
3.1 Moving Average Analysis
```r
# Calculate 12-month moving average
ma_12 <- rollmean(vehicle_ts, k = 12, align = "center")

# Create data frame for plotting
ma_df <- data.frame(
  date = vehicle_sales$date[6:(length(vehicle_ts) - 6)],
  original = vehicle_ts[6:(length(vehicle_ts) - 6)],
  ma = ma_12
)

# Plot original series with moving average
ggplot(ma_df, aes(x = date)) +
  geom_line(aes(y = original, color = "Original"), alpha = 0.7) +
  geom_line(aes(y = ma, color = "12-Month Moving Average"), size = 1) +
  scale_color_manual(values = c("Original" = "#2C3E50",
                                "12-Month Moving Average" = "#E74C3C")) +
  labs(title = "Vehicle Sales with 12-Month Moving Average",
       x = "Year",
       y = "Sales Volume",
       color = "Series") +
  theme_minimal()
```
```r
# Calculate and plot remainder series
ma_df$remainder <- ma_df$original - ma_df$ma

ggplot(ma_df, aes(x = date, y = remainder)) +
  geom_line(color = "#2C3E50") +
  labs(title = "Remainder Series (Original - Moving Average)",
       x = "Year",
       y = "Remainder") +
  theme_minimal()
```
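A 12-term window has no single middle observation, so a common textbook alternative is the 2×12 moving average, which gives half weight to the two end points. The sketch below is not the method used above; it is a base-R illustration on a simulated series (the trend and `month_effect` values are invented for the example) showing that such a filter recovers a linear trend exactly and leaves the seasonal pattern in the remainder.

```r
# Simulated monthly series: linear trend plus a zero-sum seasonal pattern
# (all values here are hypothetical, chosen only for illustration)
n_years <- 10
month_effect <- c(-3, -2, 1, 2, 3, 2, 1, 0, -1, -1, -1, -1)
t_index <- seq_len(12 * n_years)
trend <- 100 + 0.5 * t_index
seasonal <- rep(month_effect, n_years)
y <- ts(trend + seasonal, start = c(2000, 1), frequency = 12)

# 2x12 moving average: 13 symmetric weights, half weight at both ends
w <- c(0.5, rep(1, 11), 0.5) / 12
ma_2x12 <- stats::filter(y, filter = w, sides = 2)

# Remainder after removing the smoothed series
remainder <- y - ma_2x12

# The filter is undefined for the first and last 6 observations
inner <- 7:(length(y) - 6)
cat("Max trend error:", max(abs(ma_2x12[inner] - trend[inner])), "\n")
cat("Max seasonal error:", max(abs(remainder[inner] - seasonal[inner])), "\n")
```

`stats::filter` is called with its namespace because `dplyr`, loaded earlier in this report, masks `filter`.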
3.2 Seasonality Analysis
```r
# Decompose time series
decomp <- decompose(vehicle_ts, type = "multiplicative")

# Plot decomposition
autoplot(decomp) +
  theme_minimal() +
  labs(title = "Multiplicative Time Series Decomposition")
```
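In a multiplicative decomposition the series is modelled as trend × seasonal × remainder, so the seasonal component is a set of indices that average to one. As a rough illustration of that structure, the sketch below runs `decompose()` on the built-in `AirPassengers` series, used here only as a stand-in because the vehicle sales CSV is not bundled with this text.

```r
# AirPassengers ships with base R and is a stand-in for the vehicle series
d <- decompose(AirPassengers, type = "multiplicative")

# Twelve seasonal indices, one per month, scaled to average 1
seasonal_index <- d$figure
print(round(seasonal_index, 3))
cat("Mean of seasonal indices:", mean(seasonal_index), "\n")

# Multiplying the components back together recovers the original series
# wherever the trend is defined (it is NA at both ends of the sample)
rebuilt <- d$trend * d$seasonal * d$random
defined <- !is.na(rebuilt)
cat("Max reconstruction error:",
    max(abs(rebuilt[defined] - AirPassengers[defined])), "\n")
```

An index above one marks a month that typically runs above trend, and an index below one marks a month that runs below it.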
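Cyclicality: The 12-month moving average brings out strong multi-year cyclical swings, while the remainder series shows the shorter-term movements around them.
Seasonality: Because a 12-month moving average smooths out within-year variation, the seasonal pattern appears in the remainder series rather than in the average itself; further decomposition or spectral analysis is needed to confirm its intensity.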
Forecasting Challenges: The remainder series suggests that unexpected short-term events, such as shocks or anomalies, could pose challenges for accurate forecasting.
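One way to put a number on the intensity of seasonality is the "strength of seasonality" measure, max(0, 1 - Var(R) / Var(S + R)), computed from an STL decomposition. This is a technique swapped in for illustration and not part of the original analysis; the sketch uses the built-in `AirPassengers` series (logged) as a stand-in for the vehicle sales data.

```r
# STL decomposition of a stand-in monthly series (log scale makes the
# seasonal swings roughly additive)
fit <- stl(log(AirPassengers), s.window = "periodic")
components <- fit$time.series

seasonal_part <- components[, "seasonal"]
remainder_part <- components[, "remainder"]

# Strength of seasonality: close to 1 means the seasonal component
# dominates the remainder, close to 0 means little seasonality
seasonal_strength <- max(
  0,
  1 - var(remainder_part) / var(seasonal_part + remainder_part)
)
cat("Seasonal strength:", round(seasonal_strength, 3), "\n")
```

Applied to `vehicle_ts`, the same calculation would give a single figure to set against the visual impression from the plots.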
4 Section 4: Naive Forecasting
4.1 Forecasting Results
```r
# Create seasonal naive forecast
forecast_length <- 6
snaive_forecast <- snaive(vehicle_ts, h = forecast_length)

# Compare with simple naive forecast
naive_forecast <- naive(vehicle_ts, h = forecast_length)
naive_accuracy <- accuracy(naive_forecast)

# Compare both approaches
forecast_comparison <- autoplot(vehicle_ts) +
  autolayer(naive_forecast, series = "Naive", PI = FALSE) +
  autolayer(snaive_forecast, series = "Seasonal Naive", PI = FALSE) +
  theme_minimal() +
  labs(title = "Comparison of Naive and Seasonal Naive Forecasts",
       x = "Year",
       y = "Sales Volume")
print(forecast_comparison)
```
4.1.1 Analysis of Forecasting Results
6-Period Seasonal Naive Forecast
This plot displays the predicted sales for the next six periods using a seasonal naïve model.
The model reproduces recurring seasonal patterns by repeating the value observed in the same month of the previous year.
The forecast follows the observed seasonal pattern, especially in months of high and low demand, which makes the model a reasonable baseline given this dataset’s strong seasonality.
Comparison of Naive and Seasonal Naive Forecasts
The naive forecast assumes the most recent data point remains constant, disregarding any seasonal effects.
In contrast, the seasonal naïve forecast accounts for recurring seasonal patterns, producing predictions that more accurately reflect observed trends.
This comparison underscores the seasonal naïve forecast’s strength in delivering more realistic short-term predictions, especially for datasets with strong seasonality.
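The two benchmark rules described above are simple enough to write out by hand. The sketch below is a base-R illustration, not the `forecast` package implementation: it holds out the last six months of the built-in `AirPassengers` series (a stand-in for the vehicle sales data) and compares both rules by mean absolute error. The helper `mae` is defined here for the example.

```r
# Stand-in monthly series and a 6-month holdout
y <- as.numeric(AirPassengers)
h <- 6
n <- length(y)
train <- y[1:(n - h)]
test <- y[(n - h + 1):n]
n_train <- length(train)

# Naive rule: repeat the last observed value for every horizon
naive_fc <- rep(train[n_train], h)

# Seasonal naive rule: reuse the value from the same month one year earlier
# (this indexing assumes h <= 12)
snaive_fc <- train[n_train - 12 + seq_len(h)]

# Mean absolute error on the holdout (helper defined for this sketch)
mae <- function(actual, predicted) mean(abs(actual - predicted))

mae_naive <- mae(test, naive_fc)
mae_snaive <- mae(test, snaive_fc)
cat("MAE naive:", mae_naive, "\n")
cat("MAE seasonal naive:", mae_snaive, "\n")
```

Evaluating on a holdout like this gives an out-of-sample comparison of the two rules, whereas `accuracy()` called on a forecast object alone reports in-sample fit.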
Source Code
---
title: "Assignment 2"
author: "Farzana"
format:
  html:
    code-fold: true
    code-summary: "Show code"
    code-tools: true
    toc: true
    toc-depth: 2
    toc-location: left
    theme: cosmo
    css: styles.css
    self-contained: true
    embed-resources: true
    number-sections: true
    html-math-method: katex
---

```{r}
#| include: false
knitr::opts_chunk$set(
  fig.path = "figures/",
  fig.width = 10,
  fig.height = 6,
  fig.retina = 2,
  out.width = "100%",
  cache = TRUE
)
```

# Section 1: Dataset Overview and Context {.tabset}

## Overview

The dataset tracks monthly total vehicle sales in the United States from 1976 to 2024. The data shows clear seasonal patterns with peaks typically occurring in March-June and troughs in January-February. Vehicle sales are influenced by economic conditions (GDP, interest rates, employment), consumer confidence, manufacturer incentives, and seasonal buying patterns. The series exhibits both cyclical behavior aligned with economic cycles and structural changes like the 2008 financial crisis and 2020 pandemic disruptions.

Forecasting vehicle sales presents moderate difficulty. While strong seasonality and economic relationships provide useful signals, the series is subject to unpredictable shocks (oil prices, supply chain disruptions) and changing consumer preferences. Long-term forecasting is complicated by industry transformations like the shift toward electric vehicles and evolving mobility trends.

# Exploratory Data Analysis {.tabset}

## Time Series Visualization

::: {.panel-tabset}

### Line Chart

```{r}
#| label: setup
#| warning: false
#| message: false
library(dplyr)
library(ggplot2)
library(tidyr)
library(kableExtra)
library(zoo)
library(forecast)
library(tseries)
library(lubridate)

vehicle_sales <- read.csv("total_vehicle_sales.csv")
vehicle_sales$date <- as.Date(vehicle_sales$date)

# Time series object
vehicle_ts <- ts(vehicle_sales$vehicle_sales, start = c(1976, 1), frequency = 12)

# Basic time series plot
ggplot(vehicle_sales, aes(x = date, y = vehicle_sales)) +
  geom_line(color = "#2C3E50") +
  labs(title = "US Vehicle Sales Over Time",
       x = "Year",
       y = "Total Sales",
       caption = "Source: Your Data Source") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))
```

### Density Plot

```{r}
# Density plot
ggplot(vehicle_sales, aes(x = vehicle_sales)) +
  geom_density(fill = "#3498DB", alpha = 0.7) +
  labs(title = "Distribution of Vehicle Sales",
       x = "Sales Volume",
       y = "Density") +
  theme_minimal()
```

### Monthly Boxplot

```{r}
# Monthly boxplot
vehicle_sales$month <- format(vehicle_sales$date, "%b")
vehicle_sales$month <- factor(vehicle_sales$month, levels = month.abb)

ggplot(vehicle_sales, aes(x = month, y = vehicle_sales)) +
  geom_boxplot(fill = "#3498DB", alpha = 0.7) +
  labs(title = "Monthly Distribution of Vehicle Sales",
       x = "Month",
       y = "Sales Volume") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))
```

### Summary Statistics

```{r}
#| label: summary-stats
#| warning: false
#| message: false
# Calculate summary statistics
summary_stats <- data.frame(
  Metric = c("Number of Observations", "Mean", "Median", "Standard Deviation",
             "Minimum", "Maximum", "1st Quartile", "3rd Quartile"),
  Value = c(length(vehicle_ts),
            mean(vehicle_ts),
            median(vehicle_ts),
            sd(vehicle_ts),
            min(vehicle_ts),
            max(vehicle_ts),
            quantile(vehicle_ts, 0.25),
            quantile(vehicle_ts, 0.75))
)

# Create formatted table
kable(summary_stats,
      caption = "Summary Statistics of Vehicle Sales",
      format = "html",
      digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
```

### Understanding Summary Statistics

- **Central Tendency**: The mean and median are close together (both fall between 1263 and 1271), which suggests a roughly symmetric distribution without significant skewness. It also points to a typical monthly sales level that has stayed broadly stable over the years.
- **Variability**: A standard deviation of 220.79 indicates moderate variation in sales values, with most months falling within one standard deviation of the mean.
- **Range**: The minimum value of 670.47 reflects lower sales during challenging economic times or off-peak seasons, while the maximum value of 1845.71 represents sales at their peak during high-demand seasons or favorable economic conditions.
- **Quartiles**: The interquartile range (IQR = 1421.77 - 1119.52 = 302.25) reflects moderate variability in the middle 50% of the sales data. This range captures the typical month-to-month fluctuations in vehicle sales, excluding any extreme outliers or unusual events.

### Initial Analysis

- **Outliers**: Sales values near the minimum and maximum are likely tied to major economic events, such as financial crises or periods of economic boom.
- **Seasonality**: The close alignment of the quartiles, mean, and median is consistent with a series that fluctuates around a stable level, but summary statistics alone cannot establish cyclical or seasonal patterns; those are examined in the time series components analysis.
- **Challenges in Forecasting**: The variability indicated by the standard deviation, along with occasional extremes in the data, makes forecasting somewhat challenging, particularly when factoring in external shocks.

:::

# Section 3: Time Series Components Analysis {.tabset}

## Moving Average Analysis

```{r}
#| label: moving-average
#| warning: false
#| message: false
# Calculate 12-month moving average
ma_12 <- rollmean(vehicle_ts, k = 12, align = "center")

# Create data frame for plotting
ma_df <- data.frame(
  date = vehicle_sales$date[6:(length(vehicle_ts) - 6)],
  original = vehicle_ts[6:(length(vehicle_ts) - 6)],
  ma = ma_12
)

# Plot original series with moving average
ggplot(ma_df, aes(x = date)) +
  geom_line(aes(y = original, color = "Original"), alpha = 0.7) +
  geom_line(aes(y = ma, color = "12-Month Moving Average"), size = 1) +
  scale_color_manual(values = c("Original" = "#2C3E50",
                                "12-Month Moving Average" = "#E74C3C")) +
  labs(title = "Vehicle Sales with 12-Month Moving Average",
       x = "Year",
       y = "Sales Volume",
       color = "Series") +
  theme_minimal()

# Calculate and plot remainder series
ma_df$remainder <- ma_df$original - ma_df$ma

ggplot(ma_df, aes(x = date, y = remainder)) +
  geom_line(color = "#2C3E50") +
  labs(title = "Remainder Series (Original - Moving Average)",
       x = "Year",
       y = "Remainder") +
  theme_minimal()
```

## Seasonality Analysis

```{r}
#| label: seasonality
#| warning: false
#| message: false
# Decompose time series
decomp <- decompose(vehicle_ts, type = "multiplicative")

# Plot decomposition
autoplot(decomp) +
  theme_minimal() +
  labs(title = "Multiplicative Time Series Decomposition")

# Seasonal plot
ggseasonplot(vehicle_ts, year.labels = TRUE, year.labels.left = TRUE) +
  theme_minimal() +
  labs(title = "Seasonal Plot of Vehicle Sales",
       x = "Month",
       y = "Sales Volume")
```

### Observations

- **Cyclicality**: The 12-month moving average brings out strong multi-year cyclical swings, while the remainder series shows the shorter-term movements around them.
- **Seasonality**: Because a 12-month moving average smooths out within-year variation, the seasonal pattern appears in the remainder series rather than in the average itself; further decomposition or spectral analysis is needed to confirm its intensity.
- **Forecasting Challenges**: The remainder series suggests that unexpected short-term events, such as shocks or anomalies, could pose challenges for accurate forecasting.

# Section 4: Naive Forecasting {.tabset}

## Forecasting Results

```{r}
#| label: naive-forecast
#| warning: false
#| message: false
# Create seasonal naive forecast
forecast_length <- 6
snaive_forecast <- snaive(vehicle_ts, h = forecast_length)

# Plot forecast
autoplot(snaive_forecast) +
  theme_minimal() +
  labs(title = "6-Period Seasonal Naive Forecast",
       x = "Year",
       y = "Sales Volume") +
  guides(colour = guide_legend(title = "Series"))

# Calculate accuracy metrics
accuracy_metrics <- accuracy(snaive_forecast)
kable(accuracy_metrics,
      caption = "Forecast Accuracy Metrics",
      format = "html",
      digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

# Compare with simple naive forecast
naive_forecast <- naive(vehicle_ts, h = forecast_length)
naive_accuracy <- accuracy(naive_forecast)

# Compare both approaches
forecast_comparison <- autoplot(vehicle_ts) +
  autolayer(naive_forecast, series = "Naive", PI = FALSE) +
  autolayer(snaive_forecast, series = "Seasonal Naive", PI = FALSE) +
  theme_minimal() +
  labs(title = "Comparison of Naive and Seasonal Naive Forecasts",
       x = "Year",
       y = "Sales Volume")
print(forecast_comparison)
```

### Analysis of Forecasting Results

1. **6-Period Seasonal Naive Forecast**
   - This plot displays the predicted sales for the next six periods using a seasonal naïve model.
   - The model reproduces recurring seasonal patterns by repeating the value observed in the same month of the previous year.
   - The forecast follows the observed seasonal pattern, especially in months of high and low demand, which makes the model a reasonable baseline given this dataset's strong seasonality.
2. **Comparison of Naive and Seasonal Naive Forecasts**
   - The naive forecast assumes the most recent data point remains constant, disregarding any seasonal effects.
   - In contrast, the seasonal naïve forecast accounts for recurring seasonal patterns, producing predictions that more accurately reflect observed trends.
   - This comparison underscores the seasonal naïve forecast's strength in delivering more realistic short-term predictions, especially for datasets with strong seasonality.