Introduction
Understanding the changes in the vehicle market is crucial for consumers, policymakers, economists, and multiple industries. According to the Alliance for Automotive Innovation, around 4.8 percent of the nominal GDP in the United States is within the automotive ecosystem (i.e., car servicing, dealerships, transportation services, etc.). In addition, cars make up a critical part of our lives in the United States. We rely on them to get us from place to place, to and from work, and to safely transport ourselves and our families. Put simply, cars are what drive Americans daily, and with our current infrastructure, their absence would make our lives much more difficult.
Thus, it is important to consider how changes in the car market affect consumers. Most households buy a car at some point, and better information about changes in price, inflation, and sales can help them decide when and how to spend their money. In our economic outlook project, we analyze real-time consumer interest in vehicles in the U.S. and observe how it interacts with price changes. By studying this interaction, we aim to provide tools that help consumers take advantage of opportunities within the car market.
Data
This report uses multiple datasets to analyze the U.S. vehicle market, focusing on total vehicle sales and the consumer price indexes (CPI) for new and used vehicles. Four primary time series were sourced from the Federal Reserve Economic Data (FRED) database, including total vehicle sales, the used car CPI (series CUSR0000SETA02), and the new car CPI.
In addition to these FRED data, we include data from Google Trends, using search frequency for car prices as a stand-in for consumer interest, or market hotness. The specific query we analyze is “new car prices,” which we pair with each of our three series of interest.
Interaction between Total Vehicle Sales and “new car prices”: by comparing these two datasets, we can observe whether search frequency predicts vehicle sales, and by how much. We can also incorporate “new car prices” searches in our forecasts through ARIMAX and VAR, with the aim of producing more accurate forecasts of future total vehicle sales. Our justification is that search frequency for new car prices measures “market hotness,” which may in turn help predict the volume of total vehicle sales in the United States. With VAR, we can test whether one series helps predict the other (Granger causality), which is a statement about predictive content rather than proof of a causal mechanism. We repeated this process with the same Google Trends data for the used and new car CPIs as well.
Research Questions
In this outlook, we analyze real-time consumer interest and how it interacts with real price changes. To do so, we ask the following questions:
Can we accurately predict total vehicle sales and inflation in the car market?
How large a role does Google search frequency for terms like “new car prices” play in car-market inflation?
Summary of Findings
We implemented both ARIMA and ARIMAX models for the total vehicle sales series after normalizing, log-transforming, and differencing the data to achieve stationarity. Among the ARIMA specifications, the automated model performed best based on AICc, and its residuals showed desirable properties, with minimal autocorrelation and an approximately normal distribution. The ARIMAX model, incorporating Google Trends data as an exogenous predictor, did not improve on our best ARIMA model and had a higher (worse) AICc. We found that forecasting total vehicle sales in the U.S. is complex and likely unreliable because of how severe the recent shocks to the car market were. Further steps should be taken to incorporate additional regressors and assess the viability of other potential predictors.
For the used car CPI series, we implemented both ARIMA and ARIMAX models after log-transforming and differencing the data to achieve stationarity. Among the ARIMA specifications, the automated model performed best based on AICc, and its residuals showed desirable properties, with minimal autocorrelation and an approximately normal distribution. The ARIMAX model, incorporating Google Trends data as an exogenous predictor, produced more adaptive forecasts that aligned better with the actual series, especially in the post-2021 period, where price dynamics were more volatile. This suggests that search-based interest captured relevant external demand-side signals that were not embedded in historical price movements alone.
In modeling the new car CPI, we followed a similar approach by performing seasonal diagnostics, STL decomposition, and second-order differencing to address trend and seasonality. The ARIMA(2,1,0)(1,0,0) model emerged as the best performer with the lowest AICc when more training data was included, and its residuals indicated a well-specified model with low serial correlation. However, out-of-sample forecasts from ARIMA alone tended to overshoot the actual log CPI, while the ARIMAX model incorporating Google Trends slightly improved alignment, capturing the flattening trend more accurately. Overall, the exogenous variable contributed modestly to forecast precision, especially in recent stabilization periods.
General Overview
## # A tsibble: 6 x 4 [1M]
## date `total sales` log_sales total_cpi
## <mth> <dbl> <dbl> <dbl>
## 1 2000 Jan 18.6 1.40 4.61
## 2 2000 Feb 19.4 1.44 4.61
## 3 2000 Mar 18.3 1.38 4.61
## 4 2000 Apr 17.9 1.36 4.61
## 5 2000 May 17.9 1.36 4.61
## 6 2000 Jun 17.6 1.34 4.62
This plot displays total U.S. vehicle sales from 2000 onward. Sales hovered around 16–20 million units per year prior to the Great Recession, with a sharp collapse beginning in 2008 as the financial crisis severely impacted consumer demand and credit availability. A slow but steady recovery followed, with sales gradually returning to pre-recession levels by the mid-2010s.
This plot shows the log transformation of sales, which will help us interpret the changes more clearly.
This seasonal plot shows little evidence of strong seasonality in total vehicle sales. We can observe some spikes around Q3 across years, but the overall movements are not driven by seasonality. The plot also shows that the series lacks a linear upward trend.
This subseries plot shows that total vehicle sales exhibit little to no meaningful seasonal variation. Monthly patterns are mostly flat, with the possible exception of July, where the mean is slightly higher than in the other months. We can observe significant troughs during the Great Recession and the pandemic.
This STL decomposition of the log-transformed total vehicle sales shows that the dominant driver of changes in sales over time is the trend component, which captures the collapse around 2008, the gradual recovery through the mid-2010s, and the pandemic trough in 2020. The seasonal component is minor and consistent, suggesting only modest month-to-month variation. The remainder (or residual) component stays close to zero with occasional deviations, indicating that the decomposition isolates the main patterns in the data.
Stationarity test
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.461 0.0508
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 0
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 0
ACF & PACF plots
The KPSS p-value (0.0508) sits just above the 5% threshold and ndiffs returns 0, so the evidence against stationarity is borderline. We nonetheless take one difference, which the ACF and PACF plots suggest is beneficial.
The ACF plot of the differenced series shows significant spikes at lags 1, 2, and 12, suggesting a possible AR(1) structure or seasonal effects (because of the spikes near lags 12 and 24). However, most of the lags fall within the bands, suggesting that little structure remains. The differenced series appears stationary and is ready for ARIMA modelling.
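As a rough illustration of what "within the bands" means, the following standard-library Python sketch computes sample autocorrelations and the usual approximate 95% white-noise bound of 1.96/sqrt(n). It is not the code used for the plots in this report; the function names and the input values are hypothetical.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a sequence x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    acf = []
    for k in range(1, max_lag + 1):
        num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf

def white_noise_band(n):
    """Approximate 95% bound for the ACF of white noise: 1.96 / sqrt(n)."""
    return 1.96 / math.sqrt(n)

# Hypothetical differenced series (illustrative values, not the report's data).
diffs = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03,
         0.02, 0.00, -0.01, 0.01, -0.02, 0.02]
band = white_noise_band(len(diffs))
# Lags whose autocorrelation falls outside the band would show as spikes.
significant_lags = [k + 1 for k, r in enumerate(sample_acf(diffs, 4))
                    if abs(r) > band]
```

Lags listed in `significant_lags` correspond to the spikes that cross the dashed lines in an ACF plot; with many lags tested, a few crossings are expected by chance.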
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 auto 0.00438 343. -676. -675. -658. <cpl [24]> <cpl [2]>
## 2 arima2 0.00443 341. -674. -674. -660. <cpl [0]> <cpl [14]>
## 3 arima4 0.00448 339. -671. -671. -657. <cpl [14]> <cpl [0]>
## 4 arima1 0.00454 337. -669. -669. -662. <cpl [0]> <cpl [1]>
## 5 arima3 0.00461 335. -665. -665. -654. <cpl [13]> <cpl [0]>
We tested several ARIMA models using training data from 2000 to 2021. Based on AICc values, the auto ARIMA model is the best fit. This suggests that it captures the underlying time series structure more effectively than our manual alternatives. The auto model is defined as: ARIMA(0,1,2)(2,0,0)[12].
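For readers less familiar with the notation, ARIMA(0,1,2)(2,0,0)[12] can be written out with the backshift operator B. The form below is a sketch that assumes the common convention in which moving-average terms enter with a plus sign and omits any constant or drift term; y_t denotes the modelled (log) sales series and the coefficient symbols are generic, not estimates from this report.

```latex
(1 - \Phi_1 B^{12} - \Phi_2 B^{24})\,(1 - B)\, y_t
  = (1 + \theta_1 B + \theta_2 B^{2})\, \varepsilon_t ,
\qquad \varepsilon_t \sim \text{white noise}
```

The factor (1 - B) is the single non-seasonal difference, the two seasonal autoregressive terms act at lags 12 and 24, and the two moving-average terms act at lags 1 and 2.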
The residual plots from the selected ARIMA model look clean, with the ACF plot nearly indistinguishable from white noise. There are some dips during recessionary periods, but the residual plot looks mostly stable.
The forecast vs. actual plot shows that the model underestimates sales within the test set. This may be because the end of the training window is dominated by the pandemic shock to the automotive market. This observation highlights how the choice of training window can influence forecasts. To address this, we will include more data in the training set:
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 auto 0.00417 382. -749. -749. -724. <cpl [26]> <cpl [2]>
## 2 arima2 0.00423 379. -749. -749. -735. <cpl [0]> <cpl [14]>
## 3 arima4 0.00428 377. -746. -746. -731. <cpl [14]> <cpl [0]>
## 4 arima1 0.00433 374. -744. -744. -737. <cpl [0]> <cpl [1]>
## 5 arima3 0.00440 372. -739. -739. -728. <cpl [13]> <cpl [0]>
We extended the end of our training set from 2021 to 2023. This should help the forecasts, because the model now sees the recovery from the 2020 trough. The best-performing model among our candidates is the auto model, which is ARIMA(2,1,2)(2,0,0)[12].
The residual diagnostics for ARIMA(2,1,2)(2,0,0)[12] look good, with the ACF mostly indistinguishable from white noise. The residuals are approximately normally distributed, and the residual plot shows little variation outside recessionary periods.
ARIMAX
## Month used.car.price...United.States.
## 1 2004-01 86
## 2 2004-02 78
## 3 2004-03 85
## 4 2004-04 82
## 5 2004-05 80
## 6 2004-06 74
## # A tsibble: 6 x 3 [1M]
## month used_car_price_united_states date
## <chr> <int> <mth>
## 1 2004-01 86 2004 Jan
## 2 2004-02 78 2004 Feb
## 3 2004-03 85 2004 Mar
## 4 2004-04 82 2004 Apr
## 5 2004-05 80 2004 May
## 6 2004-06 74 2004 Jun
Here, we import and prepare Google Trends data on car-price search frequency in the U.S. (the output above shows the “used car price” query), which will serve as an external regressor in our ARIMAX model. The dataset is cleaned, converted to a monthly tsibble, and aligned by date. This setup allows us to test whether public search interest (market hotness) helps explain or improve forecasts of total vehicle sales.
To express the Google Trends data in real terms, we adjust the nominal search index using the 2020 average CPI as a base. This creates a new variable (real_price) that accounts for inflation and aligns the trend data with actual economic conditions. The step raised a warning about a vector length mismatch; in R such a warning usually means a shorter vector was recycled, so the alignment of real_price with date is worth verifying even though the merged output has the expected 253 monthly rows.
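One way to make this kind of adjustment robust to length mismatches is to join the two series on their month key before dividing. The standard-library Python sketch below illustrates the idea. It is not the code used in this report: the function name, the "YYYY-MM" key format, the sample values, and the direction of the adjustment (nominal value times base CPI divided by that month's CPI) are all assumptions made for illustration.

```python
from statistics import mean

def deflate(nominal, cpi, base_year):
    """Return {month: nominal * base_cpi / cpi[month]} for months in both inputs.

    Keys are "YYYY-MM" strings; base_cpi is the average CPI over base_year.
    """
    base_values = [v for m, v in cpi.items() if m.startswith(f"{base_year}-")]
    if not base_values:
        raise ValueError(f"no CPI observations for base year {base_year}")
    base_cpi = mean(base_values)
    # Join on the month key so each value is paired with the CPI of the
    # same month; months missing from either series are dropped, not recycled.
    return {m: nominal[m] * base_cpi / cpi[m] for m in nominal if m in cpi}

# Hypothetical inputs (illustrative values only).
trends = {"2019-12": 80, "2020-01": 90, "2020-02": 100, "2021-01": 70}
cpi = {"2020-01": 100.0, "2020-02": 110.0, "2021-01": 120.0}
real = deflate(trends, cpi, 2020)
```

Because the join is explicit, a month that is missing from either input simply drops out of the result instead of silently shifting every later value.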
## # A tsibble: 253 x 4 [1M]
## date log_sales diff_log_sales real_price
## <mth> <dbl> <dbl> <dbl>
## 1 2004 Jan 1.30 -0.0387 87.0
## 2 2004 Feb 1.32 0.0195 78.3
## 3 2004 Mar 1.33 0.0128 58.3
## 4 2004 Apr 1.31 -0.0204 43.4
## 5 2004 May 1.39 0.0735 55.0
## 6 2004 Jun 1.27 -0.117 61.6
## 7 2004 Jul 1.34 0.0684 88.0
## 8 2004 Aug 1.33 -0.00835 73.4
## 9 2004 Sep 1.37 0.0394 72.1
## 10 2004 Oct 1.35 -0.0211 64.5
## # ℹ 243 more rows
## Series: log_sales
## Model: LM w/ ARIMA(1,0,2) errors
##
## Coefficients:
## ar1 ma1 ma2 log_real_price intercept
## 0.9512 -0.1337 -0.1222 -0.0394 1.3613
## s.e. 0.0225 0.0701 0.0786 0.0224 0.1011
##
## sigma^2 estimated as 0.004172: log likelihood=318.57
## AIC=-625.15 AICc=-624.79 BIC=-604.26
Looking at the residuals, the ARIMAX model looks slightly worse than our ARIMA model. When comparing the two ACF plots, the ARIMA residuals are closer to white noise, whereas the ARIMAX residuals have a significant spike at lag 13. This suggests that the additional regressor may not improve our model.
This 12-month out-of-sample forecast from the ARIMA(2,1,2)(2,0,0)[12] model projects a flat trajectory for percent changes in total vehicle sales. The model smooths over recent fluctuations, suggesting that it prioritizes long-run average behavior over short-term volatility. The forecast resembles a mean-reverting process, indicating limited predictive power for capturing month-to-month changes in sales growth.
Our ARIMAX forecast is not very different from our ARIMA forecast. It does appear flatter, however, resembling the mean-reverting process discussed above. This could imply either that the predictor is only weakly informative or that the system is stabilizing after recent shocks.
## Series: log_sales
## Model: LM w/ ARIMA(2,0,1) errors
##
## Coefficients:
## ar1 ar2 ma1 log_real_price intercept
## 1.5434 -0.5583 -0.7446 -0.0444 1.3839
## s.e. 0.1711 0.1614 0.1447 0.0214 0.1011
##
## sigma^2 estimated as 0.004012: log likelihood=340.69
## AIC=-669.37 AICc=-669.03 BIC=-648.17
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 arima1 0.00433 374. -744. -744. -737. <cpl [0]> <cpl [1]>
## 2 arima2 0.00423 379. -749. -749. -735. <cpl [0]> <cpl [14]>
## 3 arima3 0.00440 372. -739. -739. -728. <cpl [13]> <cpl [0]>
## 4 arima4 0.00428 377. -746. -746. -731. <cpl [14]> <cpl [0]>
## 5 auto 0.00417 382. -749. -749. -724. <cpl [26]> <cpl [2]>
When comparing the AICc of the ARIMAX model (-669.03) with that of our best-performing ARIMA model (auto, about -749), the ARIMA model has the lower, and therefore better, value, again suggesting that the regressor adds little predictive power for total vehicle sales. The comparison is indicative rather than exact, because the ARIMAX model was fit to the shorter 2004-onward sample and without differencing.
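To make the AICc figures concrete, the sketch below recomputes AIC and AICc from the log likelihood printed in the second ARIMAX report above. It is a minimal standard-library Python illustration, not the report's own code. Two inputs are assumptions: k = 6 counts the five printed coefficients plus the innovation variance, and n = 253 is taken from the row count of the merged tsibble.

```python
def aic(log_lik, k):
    """Akaike information criterion: -2 * log-likelihood + 2 * k."""
    return -2.0 * log_lik + 2.0 * k

def aicc(log_lik, k, n):
    """AIC with the small-sample correction 2k(k+1) / (n - k - 1)."""
    return aic(log_lik, k) + (2.0 * k * (k + 1)) / (n - k - 1)

# Log likelihood as printed in the ARIMAX report (rounded to two decimals).
log_lik = 340.69
k = 6    # assumption: ar1, ar2, ma1, log_real_price, intercept, sigma^2
n = 253  # assumption: number of monthly observations used in the fit

arimax_aic = aic(log_lik, k)
arimax_aicc = aicc(log_lik, k, n)
```

Under these assumptions the two values agree with the printed AIC and AICc to within rounding of the log likelihood. Lower values are better, but only across models fit to the same observations with the same differencing.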
Total Vehicle Sales & Google Search Data
In this section, we test a Vector Autoregressive (VAR) model for Total Vehicle Sales and Google search frequency and measure its forecasting accuracy. We create a forecast from the training data and compare it with the actual values of the original series.
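For reference, a VAR of the kind estimated below, with two lags, a constant, and eleven monthly seasonal dummies, can be written as follows. The symbols are generic notation introduced here for illustration; the vector y_t stacks the differenced log sales series and the differenced log search series.

```latex
\mathbf{y}_t = \mathbf{c} + A_1 \mathbf{y}_{t-1} + A_2 \mathbf{y}_{t-2}
  + \sum_{j=1}^{11} \boldsymbol{\delta}_j\, s_{j,t} + \mathbf{u}_t ,
\qquad
\mathbf{y}_t =
\begin{pmatrix}
  \Delta \log(\text{sales}_t) \\
  \Delta \log(\text{search}_t)
\end{pmatrix}
```

Each row of A_1 and A_2 corresponds to one of the two estimated equations, so the off-diagonal entries are the coefficients that the Granger causality tests examine.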
Unit-root tests
## [1] 1
## [1] 1
Run VAR:
##
## VAR Estimation Results:
## =========================
## Endogenous variables: total.sales.d, new.car.prices.search.d
## Deterministic variables: const
## Sample size: 226
## Log Likelihood: 518.455
## Roots of the characteristic polynomial:
## 0.3737 0.3737 0.3173 0.3173
## Call:
## VAR(y = ys, p = p_opt, type = "const", season = 12L)
##
##
## Estimation results for equation total.sales.d:
## ==============================================
## total.sales.d = total.sales.d.l1 + new.car.prices.search.d.l1 + total.sales.d.l2 + new.car.prices.search.d.l2 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11
##
## Estimate Std. Error t value Pr(>|t|)
## total.sales.d.l1 -0.129145 0.070297 -1.837 0.0676 .
## new.car.prices.search.d.l1 -0.067477 0.049978 -1.350 0.1784
## total.sales.d.l2 -0.141114 0.070580 -1.999 0.0469 *
## new.car.prices.search.d.l2 -0.035003 0.048599 -0.720 0.4722
## const -0.001456 0.004562 -0.319 0.7500
## sd1 -0.005445 0.022478 -0.242 0.8088
## sd2 0.001889 0.022621 0.083 0.9335
## sd3 -0.016529 0.022187 -0.745 0.4571
## sd4 0.016662 0.022138 0.753 0.4525
## sd5 -0.005600 0.022178 -0.253 0.8009
## sd6 0.026552 0.022109 1.201 0.2311
## sd7 0.005331 0.022161 0.241 0.8101
## sd8 -0.015602 0.022119 -0.705 0.4814
## sd9 -0.015453 0.022763 -0.679 0.4980
## sd10 -0.003651 0.022830 -0.160 0.8731
## sd11 0.002181 0.022121 0.099 0.9216
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.06797 on 210 degrees of freedom
## Multiple R-Squared: 0.07522, Adjusted R-squared: 0.009165
## F-statistic: 1.139 on 15 and 210 DF, p-value: 0.3234
##
##
## Estimation results for equation new.car.prices.search.d:
## ========================================================
## new.car.prices.search.d = total.sales.d.l1 + new.car.prices.search.d.l1 + total.sales.d.l2 + new.car.prices.search.d.l2 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11
##
## Estimate Std. Error t value Pr(>|t|)
## total.sales.d.l1 -0.177779 0.099234 -1.792 0.07465 .
## new.car.prices.search.d.l1 -0.237856 0.070551 -3.371 0.00089 ***
## total.sales.d.l2 -0.010365 0.099632 -0.104 0.91724
## new.car.prices.search.d.l2 -0.102193 0.068604 -1.490 0.13782
## const -0.010224 0.006441 -1.587 0.11393
## sd1 0.006331 0.031731 0.200 0.84204
## sd2 -0.010985 0.031933 -0.344 0.73119
## sd3 -0.038013 0.031320 -1.214 0.22623
## sd4 -0.013007 0.031251 -0.416 0.67769
## sd5 -0.035170 0.031307 -1.123 0.26256
## sd6 -0.009403 0.031210 -0.301 0.76350
## sd7 -0.062944 0.031283 -2.012 0.04549 *
## sd8 -0.155066 0.031224 -4.966 0.00000141 ***
## sd9 -0.104773 0.032133 -3.261 0.00130 **
## sd10 -0.054854 0.032227 -1.702 0.09022 .
## sd11 -0.051561 0.031227 -1.651 0.10019
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.09595 on 210 degrees of freedom
## Multiple R-Squared: 0.2205, Adjusted R-squared: 0.1648
## F-statistic: 3.96 on 15 and 210 DF, p-value: 0.000002744
##
##
##
## Covariance matrix of residuals:
## total.sales.d new.car.prices.search.d
## total.sales.d 0.004620 0.001462
## new.car.prices.search.d 0.001462 0.009205
##
## Correlation matrix of residuals:
## total.sales.d new.car.prices.search.d
## total.sales.d 1.0000 0.2242
## new.car.prices.search.d 0.2242 1.0000
## AIC(n)
## 2
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var_fit
## Chi-squared = 60.315, df = 40, p-value = 0.0205
Forecasting with VAR
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var_fit
## Chi-squared = 57.107, df = 44, p-value = 0.08882
Impulse Response Functions
## [1] "total.sales.d" "new.car.prices.search.d"
Granger Causality
## $Granger
##
## Granger causality H0: total.sales.d do not Granger-cause
## new.car.prices.search.d
##
## data: VAR object var_fit
## F-Test = 4.0921, df1 = 1, df2 = 448, p-value = 0.04368
##
##
## $Instant
##
## H0: No instantaneous causality between: total.sales.d and
## new.car.prices.search.d
##
## data: VAR object var_fit
## Chi-squared = 12.893, df = 1, p-value = 0.0003297
## $Granger
##
## Granger causality H0: new.car.prices.search.d do not Granger-cause
## total.sales.d
##
## data: VAR object var_fit
## F-Test = 0.69067, df1 = 1, df2 = 448, p-value = 0.4064
##
##
## $Instant
##
## H0: No instantaneous causality between: new.car.prices.search.d and
## total.sales.d
##
## data: VAR object var_fit
## Chi-squared = 12.893, df = 1, p-value = 0.0003297
Accuracy check
##
## Accuracy for total.sales.d
## ME RMSE MAE MPE
## Training set -0.000000000000000001765202 0.06725857 0.03808528 143.04777
## Test set 0.002183941212102147918367 0.03516229 0.02961127 92.17631
## MAPE MASE ACF1 Theil's U
## Training set 151.91710 0.6857797 -0.02118466 NA
## Test set 96.53059 0.5331931 -0.43516430 0.9103917
##
## Accuracy for new.car.prices.search.d
## ME RMSE MAE MPE MAPE
## Training set 0.000000000000000002011267 0.1028375 0.07222670 NaN Inf
## Test set -0.002996017723777167664212 0.1284703 0.09211164 Inf Inf
## MASE ACF1 Theil's U
## Training set 0.7469200 0.001841961 NA
## Test set 0.9525566 -0.146851986 NaN
##
## ACCURACY — Total-sales differenced log
## ME RMSE MAE MPE MAPE
## Test set 0.002183941 0.03516229 0.02961127 92.17631 96.53059
##
## ACCURACY — Prices-search differenced log
## ME RMSE MAE MPE MAPE
## Test set -0.002996018 0.1284703 0.09211164 Inf Inf
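The accuracy tables above report RMSE, MAE, and MAPE. The standard-library Python sketch below shows how these measures are commonly computed and why MAPE can be infinite for a differenced series: any actual value of exactly zero puts a zero in the denominator, which is likely for a differenced integer search index that is unchanged from one month to the next. The function name and inputs are hypothetical, and treating every zero actual as an infinite MAPE is a simplification.

```python
import math

def accuracy(actual, forecast):
    """Return RMSE, MAE and MAPE (in percent) for paired actuals and forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    if any(a == 0 for a in actual):
        # Percentage errors are undefined when an actual value is zero.
        mape = math.inf
    else:
        mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}

# Hypothetical differenced values (illustrative only).
with_zero = accuracy([0.0, 0.1], [0.01, 0.1])
no_zero = accuracy([1.0, 2.0], [0.5, 2.5])
```

For differenced series, whose values sit close to zero, scale-dependent measures such as RMSE and MAE are generally easier to interpret than percentage errors.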
## # A tsibble: 6 x 5 [1M]
## date series_id value realtime_start realtime_end
## <mth> <chr> <dbl> <date> <date>
## 1 2000 Jan CUSR0000SETA02 154. 2025-07-04 2025-07-04
## 2 2000 Feb CUSR0000SETA02 153 2025-07-04 2025-07-04
## 3 2000 Mar CUSR0000SETA02 153 2025-07-04 2025-07-04
## 4 2000 Apr CUSR0000SETA02 154 2025-07-04 2025-07-04
## 5 2000 May CUSR0000SETA02 155. 2025-07-04 2025-07-04
## 6 2000 Jun CUSR0000SETA02 156. 2025-07-04 2025-07-04
The autoplot() visualization presents the full historical trajectory of the used car CPI from 2000 onward. The most striking feature is the dramatic and unprecedented spike around 2021, when the index surged well beyond 200. This sharp rise reflects a unique period of post-pandemic disruption, characterized by semiconductor shortages, supply chain bottlenecks, and a surge in demand for used vehicles due to limited new vehicle production. Prior to this shock, the index showed relatively moderate fluctuations, with some cyclical downturns during economic slowdowns (e.g., around 2009). The gradual decline following the 2021 peak suggests a market correction, although prices remain elevated compared to historical norms.
We apply a logarithmic transformation to the used car CPI series to stabilize its variance. The result from ndiffs(), shown below, indicates that one level of differencing is required for stationarity.
The seasonal plot of the log-transformed used car CPI reveals a lack of strong, consistent seasonal patterns across years. While recent years like 2021–2024 show higher overall price levels and some upward movement in the summer and late-year months, earlier years display relatively flat and stable trends. This suggests that short-term monthly seasonality is weak compared to broader structural shifts in the market, especially during the pandemic period.
The subseries plot provides a breakdown of used car CPI trends by month across different years. While the spike in values during 2021–2022 is clearly visible across all months, there is little evidence of consistent intra-year seasonality: monthly patterns are not strongly repetitive over time. This supports the earlier observation that structural shocks, rather than seasonal effects, drive most of the variation in used vehicle prices.
The STL decomposition breaks down the log-transformed used car CPI into its trend, seasonal, and remainder components. The trend component captures the long-run upward shift starting around 2020, reflecting the pandemic-driven price surge. Seasonal effects are relatively minor and stable over time, while the remainder highlights short-term volatility, especially during the COVID-19 period, when residuals become more erratic. This suggests unusual shocks beyond trend or seasonality.
Stationarity test
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 1.66 0.01
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 0
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
The log-transformed CPI plot highlights relative percent changes over time. The steep rise post-2020 confirms a rapid inflation in used car prices. This transformation prepares the data for ARIMA modeling by stabilizing variance.
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.207 0.1
The first-order differenced log CPI series shows approximate month-to-month percent changes. The KPSS test no longer rejects stationarity (p-value 0.1), and the mean is roughly constant, although extreme spikes around 2021 reflect volatile price swings during the pandemic.
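The link between differenced logs and percent changes can be checked numerically. The short standard-library Python sketch below is an illustration with hypothetical index values, not the report's code; it computes both quantities so they can be compared side by side.

```python
import math

def log_diff(values):
    """First difference of natural logs; the first element has no predecessor."""
    logs = [math.log(v) for v in values]
    return [None] + [logs[i] - logs[i - 1] for i in range(1, len(logs))]

def pct_change(values):
    """Simple period-over-period growth rate as a fraction."""
    return [None] + [values[i] / values[i - 1] - 1.0
                     for i in range(1, len(values))]

# Hypothetical index levels (illustrative only).
index = [153.0, 153.0, 154.0, 155.4]
d_log = log_diff(index)
growth = pct_change(index)
```

For small monthly movements the two series are nearly identical, which is why the differenced log CPI can be read as an approximate percent change. The approximation weakens for large jumps such as those seen around 2021.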
ACF & PACF
The ACF plot shows a slow decay, while the PACF cuts off sharply after lag 1. This suggests an AR(1) structure is appropriate for modeling. The differenced series is now stationary and ready for ARIMA fitting.
## # A tsibble: 6 x 3 [1M]
## date log_cpi used_car_cpi_FOD
## <mth> <dbl> <dbl>
## 1 2000 Jan 5.04 NA
## 2 2000 Feb 5.03 -0.00587
## 3 2000 Mar 5.03 0
## 4 2000 Apr 5.04 0.00651
## 5 2000 May 5.05 0.00905
## 6 2000 Jun 5.05 0.00193
## # A tsibble: 6 x 3 [1M]
## date log_cpi used_car_cpi_FOD
## <mth> <dbl> <dbl>
## 1 2024 Aug 5.16 -0.00156
## 2 2024 Sep 5.17 0.00503
## 3 2024 Oct 5.18 0.0121
## 4 2024 Nov 5.19 0.0129
## 5 2024 Dec 5.20 0.00758
## 6 2025 Jan 5.22 0.0217
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 auto 0.0000933 845. -1679. -1678. -1657. <cpl [3]> <cpl [13]>
## 2 arima2 0.0000973 842. -1677. -1676. -1662. <cpl [0]> <cpl [14]>
## 3 arima4 0.000105 833. -1658. -1657. -1643. <cpl [14]> <cpl [0]>
## 4 arima3 0.000105 832. -1657. -1657. -1647. <cpl [13]> <cpl [0]>
## 5 arima1 0.000130 804. -1604. -1604. -1597. <cpl [0]> <cpl [1]>
We tested several ARIMA specifications using training data from 2000 to 2021. Based on AICc values, the automatic model selection (auto) provided the best fit. This suggests it captures the underlying time series structure more effectively than our manual alternatives.
The residual plots from the selected ARIMA model look clean. There’s no strong autocorrelation left in the ACF, and the residuals are roughly normally distributed around zero. The slight increase in volatility after 2020 makes sense given the market turbulence during the pandemic.
The forecast vs. actual plot shows that the model projects a strong upward trend in used car CPI, continuing the steep rise from 2020–2021. However, actual prices have started to level off or decline slightly, indicating that the model may be overestimating future growth. Combined with the previous forecast plot, this highlights the model’s limitations in capturing recent turning points, likely due to structural changes not present in the training data.
Since the forecast above is not accurate when compared to real data, we add more data to the training set:
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 arima2 0.000108 905. -1802. -1802. -1788. <cpl [0]> <cpl [14]>
## 2 auto 0.000108 905. -1802. -1802. -1788. <cpl [0]> <cpl [14]>
## 3 arima4 0.000116 895. -1781. -1781. -1767. <cpl [14]> <cpl [0]>
## 4 arima3 0.000119 891. -1775. -1775. -1764. <cpl [13]> <cpl [0]>
## 5 arima1 0.000134 873. -1741. -1741. -1734. <cpl [0]> <cpl [1]>
We extended the training window through 2023 to incorporate more recent price dynamics. With this additional data, ARIMA(0,1,2)(0,0,1)[12] (arima2) emerged as the best-performing model based on AICc. The auto-selected model has identical fit statistics, indicating that it chose the same specification. This suggests that including the post-pandemic adjustment phase helps the model better capture the full structure of the used car CPI.
The residual diagnostics for ARIMA(0,1,2)(0,0,1) look solid overall. Residuals are centered around zero with no major autocorrelation, though there is a minor spike at lag 10. The distribution is close to normal, suggesting the model is well-specified despite some noise in the pandemic period.
ARIMAX
## Month new.car.prices...United.States.
## 1 2004-01 98
## 2 2004-02 76
## 3 2004-03 91
## 4 2004-04 83
## 5 2004-05 100
## 6 2004-06 94
## # A tsibble: 6 x 3 [1M]
## month new_car_prices_united_states date
## <chr> <int> <mth>
## 1 2004-01 98 2004 Jan
## 2 2004-02 76 2004 Feb
## 3 2004-03 91 2004 Mar
## 4 2004-04 83 2004 Apr
## 5 2004-05 100 2004 May
## 6 2004-06 94 2004 Jun
Here, we import and preprocess Google Trends data on “new car prices” in the U.S., which will serve as an external regressor in our ARIMAX model. The dataset is cleaned, converted to a monthly tsibble, and aligned by date. This setup allows us to test whether public search interest helps explain or improve forecasts of used car CPI.
## # A tsibble: 253 x 8 [1M]
## month new_car_prices_united_states date series_id value realtime_start
## <chr> <int> <mth> <chr> <dbl> <date>
## 1 2004-01 98 2004 Jan CUSR0000S… 132. 2025-07-04
## 2 2004-02 76 2004 Feb CUSR0000S… 132. 2025-07-04
## 3 2004-03 91 2004 Mar CUSR0000S… 132. 2025-07-04
## 4 2004-04 83 2004 Apr CUSR0000S… 132. 2025-07-04
## 5 2004-05 100 2004 May CUSR0000S… 132. 2025-07-04
## 6 2004-06 94 2004 Jun CUSR0000S… 130. 2025-07-04
## 7 2004-07 95 2004 Jul CUSR0000S… 131. 2025-07-04
## 8 2004-08 96 2004 Aug CUSR0000S… 132 2025-07-04
## 9 2004-09 77 2004 Sep CUSR0000S… 136. 2025-07-04
## 10 2004-10 69 2004 Oct CUSR0000S… 137. 2025-07-04
## # ℹ 243 more rows
## # ℹ 2 more variables: realtime_end <date>, real_price <dbl>
To express the Google Trends data in real terms, we adjust the nominal search index using the 2020 average CPI as a base. This creates a new variable (real_price) that accounts for inflation and aligns the trend data with actual economic conditions. The step raised a warning about a vector length mismatch; in R such a warning usually means a shorter vector was recycled, so the alignment of real_price with date is worth verifying even though the merged output has the expected 253 monthly rows.
## # A tsibble: 253 x 3 [1M]
## date log_cpi real_price
## <mth> <dbl> <dbl>
## 1 2004 Jan 4.88 104.
## 2 2004 Feb 4.88 81.1
## 3 2004 Mar 4.88 97.8
## 4 2004 Apr 4.88 87.6
## 5 2004 May 4.88 105.
## 6 2004 Jun 4.87 97.1
## 7 2004 Jul 4.87 99.3
## 8 2004 Aug 4.88 105.
## 9 2004 Sep 4.91 86.9
## 10 2004 Oct 4.92 78.7
## # ℹ 243 more rows
## Series: log_cpi
## Model: LM w/ ARIMA(0,1,2)(0,0,1)[12] errors
##
## Coefficients:
## ma1 ma2 sma1 log_real_price
## 0.6977 0.4683 -0.1589 0.0000
## s.e. 0.0569 0.0626 0.0676 0.0048
##
## sigma^2 estimated as 0.0001242: log likelihood=737.24
## AIC=-1464.48 AICc=-1464.22 BIC=-1447.09
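The ARIMAX model incorporates Google Trends data as an external regressor and forecasts a more stable path for used car CPI. Compared to the pure ARIMA model, it flattens the trend and better aligns with the recent dip in actual values. This suggests that search interest may help improve short-term forecasting accuracy, although the estimated coefficient on log_real_price is approximately zero (0.0000, s.e. 0.0048), so the flatter path may reflect the different estimation sample rather than the regressor itself.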
The ARIMAX residuals appear roughly centered around zero and normally distributed, suggesting a good model fit. There’s a small autocorrelation spike at lag 7, but overall the residuals are well-behaved. This supports the reliability of the ARIMAX model, especially given the added complexity of external regressors.
This 12-month out-of-sample forecast from the ARIMA-only model shows a modest upward trend in used car CPI. The model appears to smooth over recent volatility and does not fully capture the leveling off seen in actual prices. This suggests that while ARIMA provides a reasonable baseline, it may underreact to recent turning points without external information.
The ARIMAX 12-month out-of-sample forecast produces a flatter and more stable projection than the ARIMA-only model. By incorporating Google Trends data, the model better reflects the recent slowdown in price growth. This suggests the external regressor helps anchor the forecast in current market sentiment.
## Series: log_cpi
## Model: LM w/ ARIMA(0,1,2)(0,0,1)[12] errors
##
## Coefficients:
## ma1 ma2 sma1 log_real_price
## 0.6977 0.4683 -0.1589 0.0000
## s.e. 0.0569 0.0626 0.0676 0.0048
##
## sigma^2 estimated as 0.0001242: log likelihood=737.24
## AIC=-1464.48 AICc=-1464.22 BIC=-1447.09
## Warning in report.mdl_df(used_car_cpi_fit): Model reporting is only supported
## for individual models, so a glance will be shown. To see the report for a
## specific model, use `select()` and `filter()` to identify a single model.
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 arima1 0.000134 873. -1741. -1741. -1734. <cpl [0]> <cpl [1]>
## 2 arima2 0.000108 905. -1802. -1802. -1788. <cpl [0]> <cpl [14]>
## 3 arima3 0.000119 891. -1775. -1775. -1764. <cpl [13]> <cpl [0]>
## 4 arima4 0.000116 895. -1781. -1781. -1767. <cpl [14]> <cpl [0]>
## 5 auto 0.000108 905. -1802. -1802. -1788. <cpl [0]> <cpl [14]>
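Compared to the best ARIMA-only model (arima2, AICc about -1802), the ARIMAX model has a higher AICc (-1464.22), suggesting that the external regressor does not improve model fit in this case. Because the ARIMAX model was fit to the shorter 2004-onward sample, the two values are not strictly comparable.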
Used Vehicle CPI & Google Search Data
In this section, we test a Vector Autoregressive (VAR) model for Used Vehicle CPI and Google search frequency and measure its forecasting accuracy. We create a forecast from the training data and compare it with the actual values of the original series.
Unit-root tests
## [1] 1
## [1] 1
Run VAR:
##
## VAR Estimation Results:
## =========================
## Endogenous variables: used.cpi.d, new.car.prices.search.d
## Deterministic variables: const
## Sample size: 219
## Log Likelihood: 930.786
## Roots of the characteristic polynomial:
## 0.8996 0.8996 0.8907 0.8803 0.8803 0.8755 0.8755 0.8239 0.8239 0.8132 0.8132 0.8082 0.8082 0.7798 0.7798 0.734 0.734 0.2132
## Call:
## VAR(y = ys, p = p_opt, type = "const", season = 12L)
##
##
## Estimation results for equation used.cpi.d:
## ===========================================
## used.cpi.d = used.cpi.d.l1 + new.car.prices.search.d.l1 + used.cpi.d.l2 + new.car.prices.search.d.l2 + used.cpi.d.l3 + new.car.prices.search.d.l3 + used.cpi.d.l4 + new.car.prices.search.d.l4 + used.cpi.d.l5 + new.car.prices.search.d.l5 + used.cpi.d.l6 + new.car.prices.search.d.l6 + used.cpi.d.l7 + new.car.prices.search.d.l7 + used.cpi.d.l8 + new.car.prices.search.d.l8 + used.cpi.d.l9 + new.car.prices.search.d.l9 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11
##
## Estimate Std. Error t value Pr(>|t|)
## used.cpi.d.l1 0.65453197 0.07246198 9.033 < 0.0000000000000002 ***
## new.car.prices.search.d.l1 -0.02029503 0.00872399 -2.326 0.0211 *
## used.cpi.d.l2 0.09628844 0.08619849 1.117 0.2654
## new.car.prices.search.d.l2 -0.00667651 0.00906578 -0.736 0.4624
## used.cpi.d.l3 -0.37067051 0.08622141 -4.299 0.0000274 ***
## new.car.prices.search.d.l3 0.00502632 0.00895297 0.561 0.5752
## used.cpi.d.l4 0.13489662 0.08983993 1.502 0.1349
## new.car.prices.search.d.l4 0.00615081 0.00896921 0.686 0.4937
## used.cpi.d.l5 0.10477528 0.08967939 1.168 0.2441
## new.car.prices.search.d.l5 0.00303149 0.00906259 0.335 0.7384
## used.cpi.d.l6 -0.06968356 0.08940957 -0.779 0.4367
## new.car.prices.search.d.l6 -0.00005928 0.00901777 -0.007 0.9948
## used.cpi.d.l7 0.10684916 0.08532235 1.252 0.2120
## new.car.prices.search.d.l7 -0.00183708 0.00897180 -0.205 0.8380
## used.cpi.d.l8 0.19610613 0.08677028 2.260 0.0250 *
## new.car.prices.search.d.l8 0.00734601 0.00891265 0.824 0.4109
## used.cpi.d.l9 -0.13580115 0.07514067 -1.807 0.0723 .
## new.car.prices.search.d.l9 0.00522073 0.00844483 0.618 0.5372
## const 0.00029654 0.00086702 0.342 0.7327
## sd1 -0.00161145 0.00386131 -0.417 0.6769
## sd2 -0.00285933 0.00413575 -0.691 0.4902
## sd3 0.00366518 0.00416154 0.881 0.3796
## sd4 0.00212393 0.00423061 0.502 0.6162
## sd5 -0.00086037 0.00429345 -0.200 0.8414
## sd6 -0.00244336 0.00411460 -0.594 0.5533
## sd7 0.00423444 0.00402194 1.053 0.2938
## sd8 0.00111245 0.00391550 0.284 0.7766
## sd9 -0.00226393 0.00408889 -0.554 0.5805
## sd10 -0.00315525 0.00407424 -0.774 0.4396
## sd11 0.00140619 0.00380067 0.370 0.7118
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.01084 on 189 degrees of freedom
## Multiple R-Squared: 0.5046, Adjusted R-squared: 0.4286
## F-statistic: 6.639 on 29 and 189 DF, p-value: < 0.00000000000000022
##
##
## Estimation results for equation new.car.prices.search.d:
## ========================================================
## new.car.prices.search.d = used.cpi.d.l1 + new.car.prices.search.d.l1 + used.cpi.d.l2 + new.car.prices.search.d.l2 + used.cpi.d.l3 + new.car.prices.search.d.l3 + used.cpi.d.l4 + new.car.prices.search.d.l4 + used.cpi.d.l5 + new.car.prices.search.d.l5 + used.cpi.d.l6 + new.car.prices.search.d.l6 + used.cpi.d.l7 + new.car.prices.search.d.l7 + used.cpi.d.l8 + new.car.prices.search.d.l8 + used.cpi.d.l9 + new.car.prices.search.d.l9 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11
##
## Estimate Std. Error t value Pr(>|t|)
## used.cpi.d.l1 2.0199721 0.5990324 3.372 0.000905 ***
## new.car.prices.search.d.l1 -0.3039052 0.0721199 -4.214 0.0000388 ***
## used.cpi.d.l2 -1.0621445 0.7125901 -1.491 0.137749
## new.car.prices.search.d.l2 -0.1554874 0.0749455 -2.075 0.039371 *
## used.cpi.d.l3 0.3432003 0.7127796 0.481 0.630721
## new.car.prices.search.d.l3 -0.1077265 0.0740129 -1.456 0.147187
## used.cpi.d.l4 0.4694867 0.7426934 0.632 0.528059
## new.car.prices.search.d.l4 -0.0571300 0.0741471 -0.770 0.441969
## used.cpi.d.l5 -0.5053135 0.7413662 -0.682 0.496328
## new.car.prices.search.d.l5 0.0290829 0.0749191 0.388 0.698312
## used.cpi.d.l6 -0.4839026 0.7391356 -0.655 0.513465
## new.car.prices.search.d.l6 -0.1058502 0.0745486 -1.420 0.157290
## used.cpi.d.l7 1.6950459 0.7053472 2.403 0.017222 *
## new.car.prices.search.d.l7 0.1056817 0.0741685 1.425 0.155839
## used.cpi.d.l8 -1.1529600 0.7173170 -1.607 0.109653
## new.car.prices.search.d.l8 0.1372754 0.0736795 1.863 0.063994 .
## used.cpi.d.l9 1.2206480 0.6211768 1.965 0.050873 .
## new.car.prices.search.d.l9 -0.1114998 0.0698121 -1.597 0.111905
## const -0.0162332 0.0071675 -2.265 0.024658 *
## sd1 0.0229882 0.0319209 0.720 0.472314
## sd2 -0.0031860 0.0341897 -0.093 0.925853
## sd3 0.0110184 0.0344029 0.320 0.749114
## sd4 0.0093923 0.0349738 0.269 0.788567
## sd5 -0.0313080 0.0354933 -0.882 0.378852
## sd6 -0.0001634 0.0340148 -0.005 0.996171
## sd7 -0.0469277 0.0332488 -1.411 0.159767
## sd8 -0.1507831 0.0323689 -4.658 0.0000060 ***
## sd9 -0.0882489 0.0338023 -2.611 0.009759 **
## sd10 -0.0504991 0.0336812 -1.499 0.135457
## sd11 -0.0497024 0.0314196 -1.582 0.115346
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.08965 on 189 degrees of freedom
## Multiple R-Squared: 0.3605, Adjusted R-squared: 0.2624
## F-statistic: 3.675 on 29 and 189 DF, p-value: 0.00000003338
##
##
##
## Covariance matrix of residuals:
## used.cpi.d new.car.prices.search.d
## used.cpi.d 0.00011761 0.00009569
## new.car.prices.search.d 0.00009569 0.00803781
##
## Correlation matrix of residuals:
## used.cpi.d new.car.prices.search.d
## used.cpi.d 1.00000 0.09842
## new.car.prices.search.d 0.09842 1.00000
## AIC(n)
## 9
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var_fit
## Chi-squared = 19.262, df = 12, p-value = 0.08239
Forecasting with VAR
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var_fit
## Chi-squared = 23.27, df = 12, p-value = 0.02552
Impulse Response Functions
## [1] "used.cpi.d" "new.car.prices.search.d"
Granger Causality
## $Granger
##
## Granger causality H0: used.cpi.d do not Granger-cause
## new.car.prices.search.d
##
## data: VAR object var_fit
## F-Test = 3.5059, df1 = 9, df2 = 400, p-value = 0.000342
##
##
## $Instant
##
## H0: No instantaneous causality between: used.cpi.d and
## new.car.prices.search.d
##
## data: VAR object var_fit
## Chi-squared = 1.2696, df = 1, p-value = 0.2598
## $Granger
##
## Granger causality H0: new.car.prices.search.d do not Granger-cause
## used.cpi.d
##
## data: VAR object var_fit
## F-Test = 0.87009, df1 = 9, df2 = 400, p-value = 0.5521
##
##
## $Instant
##
## H0: No instantaneous causality between: new.car.prices.search.d and
## used.cpi.d
##
## data: VAR object var_fit
## Chi-squared = 1.2696, df = 1, p-value = 0.2598
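The Granger output can be read as a small decision table. The sketch below, in plain Python, applies a 5% significance level to the p-values copied from the output above; the 5% threshold is an assumption here, since the report does not state one.

```python
ALPHA = 0.05  # conventional significance level; an assumption, not stated in the report

# (null hypothesis, reported p-value) pairs copied from the output above
tests = [
    ("used.cpi.d does not Granger-cause new.car.prices.search.d", 0.000342),
    ("new.car.prices.search.d does not Granger-cause used.cpi.d", 0.5521),
    ("no instantaneous causality between the two series", 0.2598),
]


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Return the test decision for a null hypothesis at the given level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


for null, p in tests:
    print(f"{decide(p):<18} (p = {p:g})  H0: {null}")
```

Read this way, lagged used-car CPI changes help predict search interest, while lagged search interest adds little for predicting used-car CPI changes. Granger causality concerns predictive content only and does not by itself establish a structural cause.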
Accuracy check
##
## Accuracy for used.cpi.d
## ME RMSE MAE MPE
## Training set -0.0000000000000000001064399 0.01029270 0.00653742 132.5995
## Test set 0.0007418851346656051264777 0.01607424 0.01271855 86.4099
## MAPE MASE ACF1 Theil's U
## Training set 296.42889 0.4989852 -0.0151869 NA
## Test set 88.32723 0.9707753 0.2898438 0.9146236
##
## Accuracy for new.car.prices.search.d
## ME RMSE MAE MPE MAPE
## Training set 0.000000000000000001921055 0.09245507 0.06738042 NaN Inf
## Test set 0.009672906235877005640122 0.12722813 0.08910886 NaN Inf
## MASE ACF1 Theil's U
## Training set 0.6968030 -0.00688993 NA
## Test set 0.9215039 -0.12962685 NaN
##
## ACCURACY — Used-CPI differenced log
## ME RMSE MAE MPE MAPE
## Test set 0.0007418851 0.01607424 0.01271855 86.4099 88.32723
##
## ACCURACY — Prices-search differenced log
## ME RMSE MAE MPE MAPE
## Test set 0.009672906 0.1272281 0.08910886 NaN Inf
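The NaN and Inf entries for MPE and MAPE, and the very large MAPE for the used-CPI differences, have a likely explanation under the usual definitions of these measures: percentage errors divide by the actual value, and a differenced-log series sits close to zero and can equal zero exactly. The sketch below uses hypothetical numbers, not the report's data, to show the effect in plain Python.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error; treated as infinite if any actual is zero."""
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            return math.inf  # dividing by a zero actual
        total += abs((a - f) / a)
    return 100.0 * total / len(actual)


# Hypothetical differenced-log values with one exact zero.
actual = [0.010, -0.002, 0.0, 0.004]
forecast = [0.008, 0.001, 0.003, 0.002]

print("RMSE:", round(rmse(actual, forecast), 5))
print("MAE: ", round(mae(actual, forecast), 5))
print("MAPE:", mape(actual, forecast))

# Dropping the zero still leaves a large MAPE because the actuals are tiny.
print("MAPE without the zero:", round(mape(actual[:2] + actual[3:], forecast[:2] + forecast[3:]), 2))
```

For differenced-log series, the scale-dependent measures (RMSE, MAE) and the scaled MASE are therefore the more informative columns in the accuracy tables.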
## # A tsibble: 6 x 5 [1M]
## date series_id value realtime_start realtime_end
## <mth> <chr> <dbl> <date> <date>
## 1 2000 Jan CUUR0000SETA01 143. 2025-07-04 2025-07-04
## 2 2000 Feb CUUR0000SETA01 143 2025-07-04 2025-07-04
## 3 2000 Mar CUUR0000SETA01 143. 2025-07-04 2025-07-04
## 4 2000 Apr CUUR0000SETA01 144. 2025-07-04 2025-07-04
## 5 2000 May CUUR0000SETA01 143. 2025-07-04 2025-07-04
## 6 2000 Jun CUUR0000SETA01 143. 2025-07-04 2025-07-04
This plot shows the CPI for new vehicles from 2000 onward. Prices were fairly stable for nearly two decades, followed by a sharp and sustained increase beginning around 2021. The steep post-pandemic rise likely reflects production shortages, supply chain disruptions, and elevated consumer demand—all of which drove new car prices significantly higher.
This seasonal plot shows that new car CPI exhibits weak month-to-month seasonality. Most lines follow a relatively flat pattern within each year, with only minor upward drifts in some months. The real standout is the post-2020 period, where overall price levels shift significantly higher—but that’s a level change, not a seasonal effect.
The subseries plot confirms that new car CPI shows little to no meaningful seasonal variation. Monthly patterns are fairly flat across years, with no consistent peaks or dips by month. Instead, the dominant feature is a structural level shift after 2020, not cyclical behavior.
The STL decomposition shows that the main driver of new car CPI is the long-term trend, especially the sharp rise post-2020. The seasonal component is small and stable, confirming minimal monthly effects. The remainder is centered around zero with occasional spikes, suggesting the model captures most of the variation in the trend and seasonal components.
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 3.30 0.01
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 0
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 2
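The KPSS test rejects stationarity for the undifferenced series (statistic 3.30, p-value 0.01). The nsdiffs test recommends no seasonal differencing, while the ndiffs test indicates that the log-transformed new car CPI requires two differences to achieve stationarity.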
This second differenced plot shows smooth, gradual growth before 2020 and a sharp spike during the pandemic period, which justifies the need for multiple differences. Taking two differences helps remove both trend and acceleration, preparing the data for ARIMA modeling.
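The effect of taking two differences can be shown on a toy series. The sketch below, in plain Python, builds a hypothetical index whose growth rate itself rises over time (it is not the FRED series), takes logs, and differences once and twice.

```python
import math
from typing import List, Sequence


def difference(series: Sequence[float], order: int = 1) -> List[float]:
    """Apply first differencing `order` times: y_t - y_{t-1}."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Hypothetical index with accelerating growth (not the FRED data).
cpi = [100.0 * math.exp(0.001 * t + 0.0005 * t * t) for t in range(8)]
log_cpi = [math.log(v) for v in cpi]

d1 = difference(log_cpi, order=1)  # still trending upward
d2 = difference(log_cpi, order=2)  # roughly constant

print([round(v, 4) for v in d1])
print([round(v, 4) for v in d2])
```

In this toy case the first difference still drifts upward, while the second difference is flat, which is the pattern that motivates a second difference when growth accelerates.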
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.00937 0.1
The second-order differenced log CPI series looks stationary, with fluctuations centered around zero, and the KPSS test no longer rejects stationarity (statistic 0.00937, p-value 0.1). The ACF shows mild spikes but no strong autocorrelation, and the PACF cuts off quickly, which supports the use of a low-order ARIMA model. Overall, the transformation appears appropriate for model fitting.
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 auto 0.0000121 1113. -2216. -2216. -2199. <cpl [12]> <cpl [14]>
## 2 arima4 0.0000126 1112. -2215. -2215. -2201. <cpl [14]> <cpl [0]>
## 3 arima2 0.0000127 1111. -2214. -2214. -2200. <cpl [0]> <cpl [14]>
## 4 arima3 0.0000130 1107. -2207. -2207. -2197. <cpl [13]> <cpl [0]>
## 5 arima1 0.0000143 1095. -2184. -2184. -2174. <cpl [0]> <cpl [1]>
We trained several ARIMA models on second-order differenced new car CPI data (2000–2021). The auto-selected model had the lowest AICc (−2216), narrowly ahead of arima4 (−2215) and arima2 (−2214), indicating the best in-sample fit among the candidates. This suggests that the automated specification captures the data’s structure well, making it a reasonable choice for forecasting.
The residuals from the auto-selected ARIMA model appear well-behaved: they’re centered around zero, roughly symmetric, and show no obvious patterns over time. The ACF of residuals lacks significant spikes, suggesting no strong autocorrelation remains. This supports the model’s adequacy and indicates it’s capturing the key structure in the new car CPI data.
This forecast vs. actual plot shows the ARIMA model’s prediction for new car CPI on the log scale. The forecast captures the upward trend but slightly overestimates future values, with actual data trending toward the lower bound of the prediction intervals. Overall, the model gives a reasonable projection, but the widening prediction bands highlight growing uncertainty over time.
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 arima4 0.0000120 1220. -2431. -2431. -2417. <cpl [14]> <cpl [0]>
## 2 arima2 0.0000121 1219. -2430. -2430. -2416. <cpl [0]> <cpl [14]>
## 3 arima3 0.0000124 1215. -2423. -2423. -2412. <cpl [13]> <cpl [0]>
## 4 auto 0.0000121 1215. -2420. -2420. -2402. <cpl [12]> <cpl [14]>
## 5 arima1 0.0000139 1199. -2393. -2393. -2382. <cpl [0]> <cpl [1]>
After extending the training data through 2023, model arima4 (ARIMA(2,1,0)(1,0,0)) provides the best fit based on the lowest AICc value (−2431), narrowly ahead of arima2 (−2430), while the auto-selected model drops to fourth (−2420). Because AICc depends on the number of observations, these values cannot be compared with those from the shorter training window; only the ranking within each table is meaningful. The update shows that including more recent data changes which specification is preferred, which may improve forecast reliability.
This forecast vs. actual plot for the arima4 model shows that the forecasted values (blue line) closely track the actual Log CPI (black line) for new cars. The actual observations mostly fall within the 80% and 95% prediction intervals, suggesting the model captures the recent dynamics well. Visually, there’s no sign of major divergence, so this model seems reasonably calibrated for short-term forecasting.
ARIMAX
## Month new.car.prices...United.States.
## 1 2004-01 98
## 2 2004-02 76
## 3 2004-03 91
## 4 2004-04 83
## 5 2004-05 100
## 6 2004-06 94
## # A tsibble: 6 x 3 [1M]
## month new_car_prices_united_states date
## <chr> <int> <mth>
## 1 2004-01 98 2004 Jan
## 2 2004-02 76 2004 Feb
## 3 2004-03 91 2004 Mar
## 4 2004-04 83 2004 Apr
## 5 2004-05 100 2004 May
## 6 2004-06 94 2004 Jun
## # A tsibble: 253 x 8 [1M]
## month new_car_prices_united_states date series_id value realtime_start
## <chr> <int> <mth> <chr> <dbl> <date>
## 1 2004-01 98 2004 Jan CUUR0000S… 138 2025-07-04
## 2 2004-02 76 2004 Feb CUUR0000S… 138. 2025-07-04
## 3 2004-03 91 2004 Mar CUUR0000S… 138. 2025-07-04
## 4 2004-04 83 2004 Apr CUUR0000S… 138. 2025-07-04
## 5 2004-05 100 2004 May CUUR0000S… 137. 2025-07-04
## 6 2004-06 94 2004 Jun CUUR0000S… 137. 2025-07-04
## 7 2004-07 95 2004 Jul CUUR0000S… 136. 2025-07-04
## 8 2004-08 96 2004 Aug CUUR0000S… 135. 2025-07-04
## 9 2004-09 77 2004 Sep CUUR0000S… 135. 2025-07-04
## 10 2004-10 69 2004 Oct CUUR0000S… 136. 2025-07-04
## # ℹ 243 more rows
## # ℹ 2 more variables: realtime_end <date>, real_price <dbl>
## Series: log_cpi
## Model: LM w/ ARIMA(1,2,2)(1,0,1)[12] errors
##
## Coefficients:
## ar1 ma1 ma2 sar1 sma1 log_real_price
## 0.3752 -0.6588 -0.3265 0.8858 -0.7062 -0.0013
## s.e. 0.1202 0.1322 0.1259 0.1595 0.2580 0.0016
##
## sigma^2 estimated as 0.00001244: log likelihood=1008.99
## AIC=-2003.98 AICc=-2003.5 BIC=-1979.68
This ARIMAX residual diagnostic plot looks solid overall. The residuals are centered around zero with constant variance, and the histogram appears roughly normal, suggesting the model’s errors are well-behaved. The ACF shows no major autocorrelation left in the residuals, meaning the ARIMAX model captured most of the serial dependence in the data.
The ARIMA model’s 12-month out-of-sample forecast (red) remains flat, suggesting stabilization in log CPI for new vehicles. It tracks recent levels but fails to capture the sharp post-2021 volatility. The forecast is statistically reasonable but lacks responsiveness to external drivers.
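The ARIMAX model’s 12-month out-of-sample forecast (blue) looks almost the same as the ARIMA forecast, suggesting no significant effect of adding Google Trends data. This is consistent with the small and imprecisely estimated coefficient on log_real_price (−0.0013, s.e. 0.0016).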
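A quick way to check whether the search regressor matters is to compare its estimate with its standard error. The sketch below, in plain Python, computes the t-ratio and an approximate 95% interval for the reported log_real_price coefficient; the normal critical value of 1.96 is an assumption made for simplicity.

```python
def wald_summary(estimate: float, std_error: float, z: float = 1.96):
    """Return the t-ratio and an approximate 95% interval (normal approximation)."""
    t_ratio = estimate / std_error
    half_width = z * std_error
    return t_ratio, (estimate - half_width, estimate + half_width)


# Reported ARIMAX coefficient on log_real_price for new-vehicle CPI.
t_ratio, (lower, upper) = wald_summary(estimate=-0.0013, std_error=0.0016)

print(f"t-ratio: {t_ratio:.2f}")
print(f"approx. 95% interval: ({lower:.4f}, {upper:.4f})")
print("interval contains zero:", lower < 0 < upper)
```

The interval comfortably spans zero, so the data give little evidence that the regressor shifts the forecast.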
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `real_price = new_car_prices_united_states *
## (base_cpi_new/value)`.
## Caused by warning in `base_cpi_new / value`:
## ! longer object length is not a multiple of shorter object length
## Series: log_cpi
## Model: LM w/ ARIMA(1,2,2)(1,0,1)[12] errors
##
## Coefficients:
## ar1 ma1 ma2 sar1 sma1 log_real_price
## 0.3752 -0.6588 -0.3265 0.8858 -0.7062 -0.0013
## s.e. 0.1202 0.1322 0.1259 0.1595 0.2580 0.0016
##
## sigma^2 estimated as 0.00001244: log likelihood=1008.99
## AIC=-2003.98 AICc=-2003.5 BIC=-1979.68
## Warning in report.mdl_df(new_car_cpi_fit): Model reporting is only supported
## for individual models, so a glance will be shown. To see the report for a
## specific model, use `select()` and `filter()` to identify a single model.
## # A tibble: 5 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 arima1 0.0000139 1199. -2393. -2393. -2382. <cpl [0]> <cpl [1]>
## 2 arima2 0.0000121 1219. -2430. -2430. -2416. <cpl [0]> <cpl [14]>
## 3 arima3 0.0000124 1215. -2423. -2423. -2412. <cpl [13]> <cpl [0]>
## 4 arima4 0.0000120 1220. -2431. -2431. -2417. <cpl [14]> <cpl [0]>
## 5 auto 0.0000121 1215. -2420. -2420. -2402. <cpl [12]> <cpl [14]>
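The `mutate()` warning above indicates that `base_cpi_new` holds more than one element and that its length does not divide evenly into the length of `value`, so R recycled it across rows. The sketch below mimics that recycling in plain Python with hypothetical numbers and contrasts it with deflating by a single base-period value. That a scalar base was intended is an assumption, not something the output confirms.

```python
from itertools import cycle, islice
from typing import List, Sequence


def r_style_divide(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Element-wise x / y with R-like recycling of the shorter vector."""
    n = max(len(x), len(y))
    xs = islice(cycle(x), n)
    ys = islice(cycle(y), n)
    return [a / b for a, b in zip(xs, ys)]


# Hypothetical values, not the FRED series.
value = [138.0, 138.5, 137.0, 136.0, 135.0]  # monthly CPI
base_cpi_new = [138.0, 150.0]                # accidentally a length-2 vector

recycled = r_style_divide(base_cpi_new, value)
# The base alternates between 138 and 150, so the deflator jumps month to month.
print([round(r, 3) for r in recycled])

# Assumed intent: deflate every month by one scalar base-period CPI.
scalar_base = base_cpi_new[0]
intended = [scalar_base / v for v in value]
print([round(r, 3) for r in intended])
```

If something similar happened here, the log_real_price regressor would be distorted, so the ARIMAX estimates are worth re-checking once the base is reduced to a single value.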
New Vehicle CPI & Google Search Data
In this section, we fit a Vector Autoregressive (VAR) model to New Vehicle CPI and Google search frequency and measure its forecasting accuracy. We estimate the model on the training data, produce a forecast, and compare it with the actual values in the original series.
Unit-root tests
## [1] 2
## [1] 1
Run VAR:
##
## VAR Estimation Results:
## =========================
## Endogenous variables: new.cpi.d, new.car.prices.search.d
## Deterministic variables: const
## Sample size: 217
## Log Likelihood: 1180.344
## Roots of the characteristic polynomial:
## 0.9097 0.9097 0.8915 0.8915 0.8861 0.8861 0.8842 0.8842 0.8593 0.8511 0.8511 0.8352 0.8352 0.8217 0.8217 0.813 0.7774 0.7774 0.2832 0.02261
## Call:
## VAR(y = ys, p = p_opt, type = "const", season = 12L)
##
##
## Estimation results for equation new.cpi.d:
## ==========================================
## new.cpi.d = new.cpi.d.l1 + new.car.prices.search.d.l1 + new.cpi.d.l2 + new.car.prices.search.d.l2 + new.cpi.d.l3 + new.car.prices.search.d.l3 + new.cpi.d.l4 + new.car.prices.search.d.l4 + new.cpi.d.l5 + new.car.prices.search.d.l5 + new.cpi.d.l6 + new.car.prices.search.d.l6 + new.cpi.d.l7 + new.car.prices.search.d.l7 + new.cpi.d.l8 + new.car.prices.search.d.l8 + new.cpi.d.l9 + new.car.prices.search.d.l9 + new.cpi.d.l10 + new.car.prices.search.d.l10 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11
##
## Estimate Std. Error t value Pr(>|t|)
## new.cpi.d.l1 -0.351844548 0.072422496 -4.858 0.000002515912 ***
## new.car.prices.search.d.l1 -0.001028050 0.002650068 -0.388 0.698511
## new.cpi.d.l2 -0.493517246 0.074470229 -6.627 0.000000000363 ***
## new.car.prices.search.d.l2 0.005392229 0.002678833 2.013 0.045575 *
## new.cpi.d.l3 -0.345071652 0.079885063 -4.320 0.000025468178 ***
## new.car.prices.search.d.l3 0.010468537 0.002654767 3.943 0.000114 ***
## new.cpi.d.l4 -0.364033103 0.081395521 -4.472 0.000013473170 ***
## new.car.prices.search.d.l4 0.003717817 0.002685912 1.384 0.167967
## new.cpi.d.l5 -0.227181579 0.080402856 -2.826 0.005239 **
## new.car.prices.search.d.l5 0.003133426 0.002723808 1.150 0.251470
## new.cpi.d.l6 -0.392618809 0.080903106 -4.853 0.000002575959 ***
## new.car.prices.search.d.l6 -0.000006646 0.002711436 -0.002 0.998047
## new.cpi.d.l7 -0.239867708 0.080026796 -2.997 0.003097 **
## new.car.prices.search.d.l7 -0.002772125 0.002706089 -1.024 0.306983
## new.cpi.d.l8 -0.276305667 0.078279355 -3.530 0.000525 ***
## new.car.prices.search.d.l8 -0.002778146 0.002734083 -1.016 0.310901
## new.cpi.d.l9 -0.249651344 0.071201671 -3.506 0.000570 ***
## new.car.prices.search.d.l9 -0.000150883 0.002782615 -0.054 0.956816
## new.cpi.d.l10 -0.158607047 0.071573660 -2.216 0.027911 *
## new.car.prices.search.d.l10 -0.004791734 0.002648021 -1.810 0.071988 .
## const 0.000123712 0.000241502 0.512 0.609079
## sd1 -0.001642847 0.001227715 -1.338 0.182495
## sd2 -0.000343074 0.001364936 -0.251 0.801824
## sd3 -0.000657730 0.001384411 -0.475 0.635278
## sd4 -0.000406746 0.001444803 -0.282 0.778623
## sd5 -0.001122140 0.001453543 -0.772 0.441097
## sd6 -0.002989324 0.001372364 -2.178 0.030654 *
## sd7 0.000501228 0.001398603 0.358 0.720470
## sd8 0.002521717 0.001388110 1.817 0.070888 .
## sd9 0.001504080 0.001332576 1.129 0.260486
## sd10 0.002034929 0.001312761 1.550 0.122823
## sd11 0.003351020 0.001201169 2.790 0.005826 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.003287 on 185 degrees of freedom
## Multiple R-Squared: 0.4732, Adjusted R-squared: 0.385
## F-statistic: 5.362 on 31 and 185 DF, p-value: 0.000000000000127
##
##
## Estimation results for equation new.car.prices.search.d:
## ========================================================
## new.car.prices.search.d = new.cpi.d.l1 + new.car.prices.search.d.l1 + new.cpi.d.l2 + new.car.prices.search.d.l2 + new.cpi.d.l3 + new.car.prices.search.d.l3 + new.cpi.d.l4 + new.car.prices.search.d.l4 + new.cpi.d.l5 + new.car.prices.search.d.l5 + new.cpi.d.l6 + new.car.prices.search.d.l6 + new.cpi.d.l7 + new.car.prices.search.d.l7 + new.cpi.d.l8 + new.car.prices.search.d.l8 + new.cpi.d.l9 + new.car.prices.search.d.l9 + new.cpi.d.l10 + new.car.prices.search.d.l10 + const + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 + sd8 + sd9 + sd10 + sd11
##
## Estimate Std. Error t value Pr(>|t|)
## new.cpi.d.l1 5.024774 2.005527 2.505 0.013092 *
## new.car.prices.search.d.l1 -0.262183 0.073386 -3.573 0.000451 ***
## new.cpi.d.l2 4.520455 2.062233 2.192 0.029626 *
## new.car.prices.search.d.l2 -0.131389 0.074182 -1.771 0.078179 .
## new.cpi.d.l3 5.080553 2.212180 2.297 0.022760 *
## new.car.prices.search.d.l3 -0.103484 0.073516 -1.408 0.160916
## new.cpi.d.l4 -0.192388 2.254008 -0.085 0.932073
## new.car.prices.search.d.l4 -0.123981 0.074378 -1.667 0.097228 .
## new.cpi.d.l5 3.442750 2.226519 1.546 0.123754
## new.car.prices.search.d.l5 -0.031936 0.075428 -0.423 0.672497
## new.cpi.d.l6 2.323831 2.240372 1.037 0.300972
## new.car.prices.search.d.l6 -0.059462 0.075085 -0.792 0.429414
## new.cpi.d.l7 6.369936 2.216105 2.874 0.004522 **
## new.car.prices.search.d.l7 0.202396 0.074937 2.701 0.007558 **
## new.cpi.d.l8 2.166523 2.167715 0.999 0.318882
## new.car.prices.search.d.l8 0.208092 0.075712 2.748 0.006581 **
## new.cpi.d.l9 7.191208 1.971720 3.647 0.000345 ***
## new.car.prices.search.d.l9 0.010558 0.077056 0.137 0.891163
## new.cpi.d.l10 2.395606 1.982021 1.209 0.228333
## new.car.prices.search.d.l10 0.070027 0.073329 0.955 0.340843
## const -0.009314 0.006688 -1.393 0.165361
## sd1 -0.003112 0.033998 -0.092 0.927170
## sd2 0.011498 0.037798 0.304 0.761315
## sd3 0.035316 0.038337 0.921 0.358148
## sd4 -0.032924 0.040010 -0.823 0.411616
## sd5 -0.016109 0.040252 -0.400 0.689467
## sd6 -0.051922 0.038004 -1.366 0.173518
## sd7 -0.114936 0.038730 -2.968 0.003398 **
## sd8 -0.084428 0.038440 -2.196 0.029308 *
## sd9 -0.064054 0.036902 -1.736 0.084263 .
## sd10 -0.063543 0.036353 -1.748 0.082134 .
## sd11 -0.016300 0.033263 -0.490 0.624692
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.09101 on 185 degrees of freedom
## Multiple R-Squared: 0.3504, Adjusted R-squared: 0.2416
## F-statistic: 3.22 on 31 and 185 DF, p-value: 0.0000004912
##
##
##
## Covariance matrix of residuals:
## new.cpi.d new.car.prices.search.d
## new.cpi.d 0.00001080 -0.00002342
## new.car.prices.search.d -0.00002342 0.00828352
##
## Correlation matrix of residuals:
## new.cpi.d new.car.prices.search.d
## new.cpi.d 1.00000 -0.07831
## new.car.prices.search.d -0.07831 1.00000
## AIC(n)
## 10
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var_fit
## Chi-squared = 11.556, df = 8, p-value = 0.1721
Forecasting with VAR
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var_fit
## Chi-squared = 13.693, df = 8, p-value = 0.09013
Impulse Response Functions
## [1] "new.cpi.d" "new.car.prices.search.d"
Granger Causality
## $Granger
##
## Granger causality H0: new.cpi.d do not Granger-cause
## new.car.prices.search.d
##
## data: VAR object var_fit
## F-Test = 5.1341, df1 = 10, df2 = 392, p-value = 0.000000456
##
##
## $Instant
##
## H0: No instantaneous causality between: new.cpi.d and
## new.car.prices.search.d
##
## data: VAR object var_fit
## Chi-squared = 1.9016, df = 1, p-value = 0.1679
## $Granger
##
## Granger causality H0: new.car.prices.search.d do not Granger-cause
## new.cpi.d
##
## data: VAR object var_fit
## F-Test = 1.6387, df1 = 10, df2 = 392, p-value = 0.0936
##
##
## $Instant
##
## H0: No instantaneous causality between: new.car.prices.search.d and
## new.cpi.d
##
## data: VAR object var_fit
## Chi-squared = 1.9016, df = 1, p-value = 0.1679
At the 5% level, new car CPI changes Granger-cause search interest (p = 0.000000456), whereas the evidence that search interest Granger-causes new car CPI changes is only marginal (p = 0.0936). The test for instantaneous causality between the two series is not significant (p = 0.1679).