library(readr)
library(tidyverse)
library(tsibble)
library(lubridate)
library(forecast)
library(tseries)
library(seasonal)
# setwd() / getwd() : define & query working directory
# read_csv(), read.csv() : import data
# ts(), plot.ts(), autoplot() : create & plot ts objects
# arima.sim() : simulate ARMA
# abline(), lm() : add mean / trend lines
# ARMAacf() : theoretical ACF/PACF
# acf(), pacf(), tsdisplay() : sample ACF/PACF
# polyroot() : roots → causality / invertibility
# BoxCox(), BoxCox.lambda() : Box-Cox transform
# diff(ts, lag = k) : lag-k differencing
# ar(..., method = "yw") : Yule–Walker estimation
# arima(), forecast::Arima() : MLE fitting + AIC/AICc
ppi_raw <- read_csv("/Users/kadenpatenaude/Desktop/PSTAT 274/SemiconductorPPI.csv", col_types = cols()) |>
rename(date = observation_date,
ppi = PCU33443344) |>
mutate(date = ymd(date))
ppi_ts <- ts(ppi_raw$ppi,
start = c(year(min(ppi_raw$date)), month(min(ppi_raw$date))),
frequency = 12)
I have chosen The Producer Price Index by Industry: Semiconductor and Other Electronic Component Manufacturing (PCU33443344.csv) which is a monthly, not-seasonally-adjusted price index published by the U.S. Bureau of Labor Statistics (base = 100 at December 1984) at https://fred.stlouisfed.org/series/PCU33443344. It currently spans December 1984 – March 2025 (486 monthly observations), with another coming May 15th 2025.
I chose this dataset because semiconductors sit at the core of nearly every modern supply chain. Tracking and modeling their producer-price inflation helps: gauge cost-push inflation pressures that may spill into the CPI, support real-time business planning for electronics manufacturers, and provide an economic view on any indicators of supply-chain stress. The objective is to build a conservative stochastic model of the monthly percentage change in the index and generate short-term forecasts (1–12 months ahead).
autoplot(ppi_ts) +
labs(title = "Semiconductor PPI (NSA, Dec 1984 = 100)",
y = "Index level")
# STL to visualize seasonality, though its weakly seasonal due to its decomposition
ppi_stl <- stl(ppi_ts, s.window = "periodic")
autoplot(ppi_stl) + labs(title = "STL decomposition")
Because of the a pronounced upward drift until the early 2000s, mild decline, then accelerated growth post-2020, we can notice the breaks and level shifts around 2001 and 2020, both times where the technology sectors had major changes, with the dot-com boom in 2001 and teh 2020 COVID supply chain shock.
# Because the prices are non-stationary, we can log-transform to stabilize variance and difference to remove the stochastic trend.
log_ppi <- log(ppi_ts)
d_log_ppi <- diff(log_ppi, lag = 1) # monthly log-returns
# Augmented Dickey-Fuller on levels and differences
adf_level <- adf.test(log_ppi, k = 12) # allow annual lags
adf_diff <- adf.test(na.omit(d_log_ppi), k = 12)
adf_level$p.value
## [1] 0.9208342
adf_diff$p.value
## [1] 0.01
#levels fail to reject unit root (p > 0.1); first-differences are stationary (p < 0.01) so we'll just let d=1.
tsdisplay(d_log_ppi,
lag.max = 48,
main = "Monthly log-growth: ACF / PACF")
fit011_011 <- Arima(log_ppi,
order = c(0,1,1),
seasonal = list(order = c(0,0,1), period = 12),
method = "ML")
fit011_011
## Series: log_ppi
## ARIMA(0,1,1)(0,0,1)[12]
##
## Coefficients:
## ma1 sma1
## 0.1815 0.1070
## s.e. 0.0400 0.0422
##
## sigma^2 = 2.18e-05: log likelihood = 1908.18
## AIC=-3810.35 AICc=-3810.3 BIC=-3797.81
checkresiduals(fit011_011, lag = 36)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(0,0,1)[12]
## Q* = 60.891, df = 34, p-value = 0.003099
##
## Model df: 2. Total lags used: 36
The sample ACF shows a gradual taper and a clear seasonal spike at lag 12, indicating one seasonal MA term. The sample PACF has a single pronounced spike at lag 1, then small values, which matches an MA(1). A smaller spike at lag 12 supports the seasonal MA(1).
C. III ONLY: The graph shows clearly seasonal spikes around lags 1, 12, and 14 which is obvious of seasonality, but the ACF values stay positive and decaying which means the series is non-stationary, but the sign of any trend is not encoded in the ACF.
Let \[ \bar x_n = \frac{1}{n}\sum_{t=1}^n x_t ,\qquad \hat\gamma_n(h)=\frac{1}{n}\sum_{t=1}^{\,n-h}(x_t-\bar x_n)(x_{t+h}-\bar x_n),\qquad \hat r_1=\frac{\hat\gamma_n(1)}{\hat\gamma_n(0)} . \]
And let \(x_{n+1}=\bar x_n\)
then \[ \bar x_{\,n+1} =\frac{n\bar x_n+\bar x_n}{n+1} =\bar x_n . \]
So the mean does not change.
Checking the Autocovariance at lag 1:
Because \(x_{n+1}-\bar x_n = 0\), \[ \tilde\gamma_{\,n+1}(1) =\frac{1}{n+1}\sum_{t=1}^{n}(x_t-\bar x_n)(x_{t+1}-\bar x_n) =\frac{n}{n+1}\hat\gamma_n(1). \]
and the variance (lag 0)
\[ \tilde\gamma_{\,n+1}(0) =\frac{1}{n+1}\sum_{t=1}^{n+1}(x_t-\bar x_n)^2 =\frac{n}{n+1}\hat\gamma_n(0). \]
we have a new sample ACF
\[ \tilde r_1 =\frac{\tilde\gamma_{\,n+1}(1)}{\tilde\gamma_{\,n+1}(0)} =\frac{\tfrac{n}{n+1}\hat\gamma_n(1)}{\tfrac{n}{n+1}\hat\gamma_n(0)} =\hat r_1 . \]
so B (\(\tilde r_1 = r_1\)) is correct.
Let:
\[ X_t = 0.8\,X_{t-1} + Z_t, \qquad Z_t \sim \mathcal N(0,1), \]
and we want to show
\[ \bar X \;=\; \frac14\sum_{t=1}^{4} X_t ,\qquad \operatorname{Var}(\bar X)\;=\;\frac1{4^2}\,\operatorname{Var}\!\Bigl(\sum_{t=1}^{4} X_t\Bigr). \]
For an AR(1) with coefficient \(\phi=0.8\) and innovation variance \(\sigma_Z^2=1\):
\[ \gamma(0)=\frac{\sigma_Z^2}{1-\phi^{2}} =\frac{1}{1-0.8^{2}} =\frac{25}{9},\qquad \gamma(h)=\phi^{\,h}\,\gamma(0). \]
Using the general formula for \(\operatorname{Var}(\bar X)\)
\[ \operatorname{Var}(\bar X)= \frac1{n^2}\Bigl[n\,\gamma(0) + 2\sum_{h=1}^{n-1}(n-h)\,\gamma(h)\Bigr], \quad n=4. \]
we can compute each term:
\[ 2\sum_{h=1}^{3}(4-h)\,\gamma(h) =2\!\left[3\gamma(1)+2\gamma(2)+1\gamma(3)\right] =2\!\left[3\cdot\frac{20}{9}+2\cdot\frac{16}{9}+\frac{64}{45}\right] =\frac{524}{45}. \]
and together \[ \operatorname{Var}(\bar X) =\frac1{16}\Bigl(\frac{100}{9}+\frac{524}{45}\Bigr) =\frac1{16}\cdot\frac{774}{45} =\frac{387}{180} =\frac{43}{20} =2.15. \]
Sample size \(n = 100\)
\(\hat\rho(1)=0.438,\;
\hat\rho(2)=0.145\)
First lets compute the theoretical autocorrelations for an MA(1)
\[ \rho(1)=\frac{\theta}{1+\theta^{2}} =\frac{0.6}{1+0.6^{2}} =0.441176, \qquad \rho(k)=0 \text{ for } k\ge 2 . \]
And teh variances of the sample ACF using Bartlett’s formula
For an MA(1)
\(w_{11}=1-3\rho(1)^{2}+4\rho(1)^{4}\)
\(w_{ii}=1+2\rho(1)^{2}\) for \(i>1\)
\[ \begin{aligned} \rho(1)^{2} &=0.194633,\\ \rho(1)^{4} &=0.037012,\\ w_{11} &=1-3(0.194633)+4(0.037012)=0.5641,\\ w_{22} &=1+2(0.194633)=1.3893. \end{aligned} \]
Sampling variances:
\[ \operatorname{Var}\!\bigl[\hat\rho(1)\bigr]=\frac{w_{11}}{n}=0.00564,\qquad \operatorname{Var}\!\bigl[\hat\rho(2)\bigr]=\frac{w_{22}}{n}=0.01389. \]
Standard errors:
\[ \text{SE}_{1}=0.0751,\qquad \text{SE}_{2}=0.1179. \]
95 % confidence interval is then:
\[ \hat\rho(h)\;\pm\;1.96\; \text{SE}_{h} \]
and checking for consistency with the MA(1) model
Since both conditions hold, the data is consistent with an MA(1) model with \(\theta = 0.6\)