library(fpp3)
library(fredr)
library(tidyverse)
library(patchwork)
library(knitr)
library(kableExtra)
library(gridExtra)
library(writexl)
library(tseries)
library(forecast)
library(quantmod)
Sys.setlocale("LC_TIME", "en_US.UTF-8")
remove(list=ls())
fredr_set_key("YOUR_FRED_API_KEY")  # replace with your own FRED API key
Discussion3_Stationarity
1. Time series stationarity
1.1 TSMC closing price
Load data
# load data
getSymbols("TSM", src = "yahoo", from = "2021-01-01")[1] "TSM"
tsm <- tibble(
date = index(TSM),
close = as.numeric(TSM$TSM.Close)
) %>%
  as_tsibble(index = date)
Time series plot
A simple eyeball test makes it clear that TSMC’s stock price is not stationary, mainly because its mean is obviously unstable over time.
autoplot(tsm, close) +
  labs(title = "TSMC closing price",
       subtitle = "Daily, from 2021/01/04 to 2026/02/06",
       x = "Year",
       y = "Price (USD)",
       caption = "Source: Yahoo"
  )
KPSS test
Using the KPSS test to assess stationarity:
Null hypothesis: the TSMC stock price series is stationary.
Alternative hypothesis: the TSMC stock price series is non‑stationary.
Since the KPSS p‑value is only 0.01, we reject the null hypothesis at the 5% significance level. Therefore, the TSMC stock price series is non‑stationary.
tsm |>
  features(close, unitroot_kpss)
1.2 Unemployment Rate in U.S.
Load data
unrate <- fredr(series_id = "UNRATE",
observation_start = as.Date("2021-01-01"),
observation_end = as.Date("2025-09-01")
) |>
transmute(Month = yearmonth(date), value) |>
  as_tsibble(index = Month)
Time series plot
The U.S. unemployment rate series is non‑stationary because its mean clearly changes over time. The downward trend in 2021–2022, the flat period in 2022–2024, and the upward movement in 2024–2025 all indicate violations of mean stationarity.
autoplot(unrate, value) +
  labs(title = "Unemployment rate in U.S.",
       subtitle = "Monthly, from Jan-2021 to Sep-2025",
       x = "Month",
       y = "Unemployment rate",
       caption = "Source: FRED"
  )
KPSS test
Using the KPSS test to assess stationarity:
Null hypothesis: the U.S. unemployment rate series is stationary.
Alternative hypothesis: the U.S. unemployment rate series is non‑stationary.
Since the KPSS p‑value is 0.04, we reject the null hypothesis at the 5% significance level. Therefore, the U.S. unemployment rate series is non‑stationary.
unrate |>
  features(value, unitroot_kpss)
1.3 Industrial Production: Utilities: Electric and Gas Utilities
Load data
ipg <- fredr(series_id = "IPG2211A2N",
observation_start = as.Date("2021-01-01"),
observation_end = as.Date("2025-12-01")
) |>
transmute(Month = yearmonth(date), value) |>
  as_tsibble(index = Month)
Time series plot
From the IPG plot, the mean, variance, and autocorrelation structure all appear stable based on a simple visual inspection.
autoplot(ipg, value) +
  labs(title = "Industrial Production: Utilities: Electric and Gas Utilities",
       subtitle = "Monthly, from Jan-2021 to Dec-2025",
       x = "Month",
       y = "Index 2017=100",
       caption = "Source: FRED"
  )
ADF & KPSS test
To verify that the series is stationary, I applied both the ADF test and the KPSS test as a form of double validation.
Using the ADF test to assess stationarity:
Null hypothesis: the Industrial Production: Utilities: Electric and Gas Utilities series is non‑stationary.
Alternative hypothesis: the Industrial Production: Utilities: Electric and Gas Utilities series is stationary.
Since the ADF p‑value is 0.01, we reject the null hypothesis at the 5% significance level. Therefore, the Industrial Production: Utilities: Electric and Gas Utilities series is stationary.
adf.test(ipg$value)
Warning in adf.test(ipg$value): p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: ipg$value
Dickey-Fuller = -5.9577, Lag order = 3, p-value = 0.01
alternative hypothesis: stationary
Using the KPSS test to assess stationarity:
Null hypothesis: the Industrial Production: Utilities: Electric and Gas Utilities series is stationary.
Alternative hypothesis: the Industrial Production: Utilities: Electric and Gas Utilities series is non‑stationary.
Since the KPSS p‑value is 0.1, we do not have sufficient evidence to reject the null hypothesis at the 5% significance level. Therefore, there is no evidence suggesting that the series is non‑stationary.
ipg |>
  features(value, unitroot_kpss)
2. ACF & PACF in time series
2.1 TSMC closing price
Make time series stationary
I applied first differencing to the TSMC stock price data. Visually, aside from a few outliers, the differenced series appears stationary. The ADF test confirms this: it yields a p-value of 0.01, so we reject the null hypothesis of non-stationarity at the 5% significance level and conclude that the change in TSMC price is stationary.
tsm_diff <- tsm |>
mutate(diff_close = difference(close))
autoplot(tsm_diff, diff_close) +
  labs(title = "Change in TSMC price",
       x = "Year",
       y = "Price (USD)")
Warning: Removed 1 row containing missing values or values outside the scale range (`geom_line()`).
adf.test(na.omit(tsm_diff$diff_close))
Warning in adf.test(na.omit(tsm_diff$diff_close)): p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: na.omit(tsm_diff$diff_close)
Dickey-Fuller = -9.9277, Lag order = 10, p-value = 0.01
alternative hypothesis: stationary
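As a cross-check in the same spirit as Part 1.3, the KPSS test could also be applied to the differenced series. This is a minimal sketch reusing the tsm_diff tsibble defined above; a large p-value would be consistent with stationarity.
# KPSS test on the differenced TSMC closing price as a second check
tsm_diff |>
  features(diff_close, unitroot_kpss)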
ACF & PACF plot
From the ACF plot, aside from lag 0, where perfect correlation is expected, the majority of lags fall within the significance bounds marked by the dashed lines. The PACF plot displays a similar pattern; a few lags slightly exceed the limits, but most remain within the interval. Consequently, the differenced data behaves like white noise, suggesting that the original series follows a random walk. This implies that short-term stock price fluctuations are stochastic and difficult to predict from historical price patterns alone.
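Before the plots, the white-noise claim can be checked numerically with a Ljung-Box test. This is a minimal sketch on the differenced series; the choice of 10 lags is an assumption, and a large p-value would be consistent with white noise.
# Ljung-Box test on the differenced series (a large p-value means no
# evidence against white noise)
tsm_diff |>
  features(diff_close, ljung_box, lag = 10)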
clean_tsm <- na.omit(tsm_diff$diff_close)
par(mfrow = c(1,3))
plot.ts(clean_tsm, main = "Change in Price", ylab = "USD")
acf(clean_tsm, main = "ACF", lag.max = 20)
pacf(clean_tsm, main = "PACF", lag.max = 20)
2.2 Unemployment Rate in U.S.
Make time series stationary
For the unemployment rate, the significant downward trend in the early part of the sample makes the mean unstable, so I applied first differencing to the data. Visually, aside from a few months with higher volatility, most fluctuations in the differenced series stay within a range of about 0.2 percentage points. Furthermore, the Phillips-Perron test yields a p-value of 0.01, so we reject its null hypothesis of a unit root and conclude that the differenced data is stationary.
unrate_diff <- unrate |>
mutate(diff_unrate = difference(value))
autoplot(unrate_diff, diff_unrate) +
  labs(title = "Change in Unemployment rate",
       x = "Year",
       y = "Unemployment rate")
Warning: Removed 1 row containing missing values or values outside the scale range (`geom_line()`).
clean_unrate <- na.omit(unrate_diff$diff_unrate)
pp.test(clean_unrate)
Warning in pp.test(clean_unrate): p-value smaller than printed p-value
Phillips-Perron Unit Root Test
data: clean_unrate
Dickey-Fuller Z(alpha) = -72.337, Truncation lag parameter = 3, p-value = 0.01
alternative hypothesis: stationary
ACF & PACF plot
Looking at the charts, the ACF exhibits a wave-like, tailing-off pattern, while the PACF shows a distinct cut-off. Since the PACF has significant spikes up to lag 2 and is negligible afterwards, the differenced series is consistent with an AR(2) process. Therefore, we adopt an ARIMA(2, 1, 0) model for the original series.
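Following this identification, the model can be fitted directly with fable. This is a minimal sketch using the unrate tsibble from Part 1; the model name ar2 is arbitrary, and an automatic ARIMA() search might prefer a different specification.
# Fit ARIMA(2,1,0) to the original (undifferenced) unemployment rate;
# the d = 1 in pdq() handles the first difference internally
unrate |>
  model(ar2 = ARIMA(value ~ pdq(2, 1, 0) + PDQ(0, 0, 0))) |>
  report()
The ACF and PACF plots that motivate this choice are produced below.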
par(mfrow = c(1,3))
plot.ts(clean_unrate, main = "Change in Unemployment rate", ylab = "%")
acf(clean_unrate, main = "ACF", lag.max = 20)
pacf(clean_unrate, main = "PACF", lag.max = 20)
2.3 Industrial Production: Utilities: Electric and Gas Utilities
ACF & PACF plot
In Part 1, the ADF and KPSS tests indicated that the series is stationary. However, the ACF plot shows a slow, wave-like decay, which reveals a strong seasonal pattern and is characteristic of an AR process.
Furthermore, the PACF plot displays a sharp cut-off after lag 2, with subsequent lags mostly falling within the significance bounds.
Since the ACF tails off and the PACF cuts off, an AR model looks like the most appropriate fit for this time series.
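As a rough illustration, the PACF-suggested AR(2) can be compared with an automatically selected model. This is a minimal sketch using the ipg tsibble from Part 1; the model names are arbitrary, and given the strong seasonality noted above, the automatic search may well prefer a seasonal specification.
# Compare a non-seasonal AR(2) with fable's automatic ARIMA search
ipg |>
  model(
    ar2  = ARIMA(value ~ pdq(2, 0, 0) + PDQ(0, 0, 0)),
    auto = ARIMA(value)
  ) |>
  glance()
The plots that motivate this reading follow.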
ipg_ts <- ts(ipg$value, start = c(2021, 1), frequency = 12)
par(mfrow = c(1,3))
plot(ipg_ts, main = "IPG Time Series", ylab = "Value", xlab = "Time")
acf(as.numeric(ipg_ts), main = "ACF of IPG", lag.max = 20)
pacf(as.numeric(ipg_ts), main = "PACF of IPG", lag.max = 20)
3. Decomposition in time series
3.1 TSMC closing price
From this decomposition, the variation in TSMC’s stock price is driven mainly by the trend component, which is clearly increasing.
Although the season_year panel looks like a seasonal pattern, its scale is very small compared to the trend.
This matches our result from differencing: once the strong trend is removed, the remaining variation behaves like white noise.
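A rough way to quantify how small the seasonal component is relative to the trend is the STL feature set in feasts. This is a minimal sketch using the same gap-filling as the decomposition code below; a seasonal strength close to 0 alongside a high trend strength would support the point.
# Trend and seasonal strength of the gap-filled TSMC closing price
tsm |>
  fill_gaps() |>
  fill(close, .direction = "down") |>
  features(close, feat_stl)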
tsm |>
fill_gaps() |> fill(close, .direction = "down") |>
model(
STL(close ~ season(window = "periodic"),
robust = TRUE)
) |>
components() |>
autoplot() +
labs(title = "STL Decomposition: TSMC")3.2 Unemployment Rate in U.S.
The decomposition shows that the unemployment rate is mainly affected by the trend, which makes the data non-stationary.
We can also see some large fluctuations in the remainder, similar to what we saw after differencing.
The scale of the seasonality is very small, so it doesn’t have much effect. This is likely because the data downloaded from FRED was already seasonally adjusted.
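One way to check this would be to pull the not-seasonally-adjusted counterpart and compare seasonal strength. This is a minimal sketch; it assumes UNRATENSA is the NSA unemployment rate series ID on FRED.
# Seasonal strength of the SA series used above vs. its NSA counterpart
unrate_nsa <- fredr(series_id = "UNRATENSA",
                    observation_start = as.Date("2021-01-01"),
                    observation_end = as.Date("2025-09-01")) |>
  transmute(Month = yearmonth(date), value) |>
  as_tsibble(index = Month)
unrate |> features(value, feat_stl)
unrate_nsa |> features(value, feat_stl)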
unrate |>
model(
STL(value ~ season(window = "periodic"),
robust = TRUE)
) |>
components() |>
autoplot() +
labs(title = "STL Decomposition: Unemployment Rate")3.3 Industrial Production: Utilities: Electric and Gas Utilities
The decomposition of the IPG series shows a distinct pattern compared to the previous two. While the upward trend is mild (increasing from 102 to 110), the seasonality has a massive impact on the data.
Although initial statistical tests indicated the data was stationary (which is why no differencing was applied), the ACF plot and this decomposition clearly reveal a strong seasonal pattern.
This suggests a conflict between the test results and the visual evidence. While the tests failed to reject stationarity, the strong seasonality observed here implies that the data is not strictly stationary. For future research, applying seasonal differencing would be highly recommended to remove this pattern and obtain a better-behaved series for modeling (a sketch follows the decomposition code below).
ipg |>
model(
STL(value ~ season(window = "periodic"),
robust = TRUE)
) |>
components() |>
autoplot() +
labs(title = "STL Decomposition: IPG")