INTRODUCTION TO STATISTICAL DATA ANALYSIS
Time Series Analysis
Work with a Time Series Dataset:
Install Packages and Load Libraries
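# install.packages(c("ggplot2", "forecast", "tseries"))  # run once if the packages are not installed
library(ggplot2)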
library(forecast)
library(tseries)
Set a Working Directory
setwd("~/R_TRAINING")
Import a Time Series Dataset and Visualize it
cpi_data<-read.csv("cpi_data.csv")
head(cpi_data)
DATE CPI
1 1/1/1967 34.8
2 2/1/1967 34.7
3 3/1/1967 34.7
4 4/1/1967 34.6
5 5/1/1967 34.6
6 6/1/1967 34.9
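read.csv() imports the DATE column as plain text. A minimal sketch of converting it to a proper Date, assuming the dates are in month/day/year order (consistent with the monthly rows above). The inline text and the name sample_cpi are illustrative stand-ins for the real file:

```r
# Tiny inline copy of the first rows shown above, so the sketch runs without the CSV file.
csv_text <- "DATE,CPI
1/1/1967,34.8
2/1/1967,34.7
3/1/1967,34.7"
sample_cpi <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# DATE arrives as character; convert it, assuming month/day/year order.
sample_cpi$DATE <- as.Date(sample_cpi$DATE, format = "%m/%d/%Y")

str(sample_cpi)          # DATE is now of class Date, CPI is numeric
range(sample_cpi$DATE)   # first and last date in the sample
```

The ts() conversion in the next step does not need the Date column, but checking the range is a quick way to confirm the start year and month passed to ts().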
Convert to time series object
ts_CPI<-ts(cpi_data$CPI, start = c(1967,1), frequency = 12)
head(ts_CPI)
Jan Feb Mar Apr May Jun
1967 34.8 34.7 34.7 34.6 34.6 34.9
#View(cpi_data)
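In ts(), start = c(1967, 1) means "year 1967, period 1" and frequency = 12 means twelve observations per year. A small sketch using only the six values printed above (the name ts_small is illustrative) shows how to inspect these attributes:

```r
# First six CPI values from the head() output above.
cpi_values <- c(34.8, 34.7, 34.7, 34.6, 34.6, 34.9)
ts_small <- ts(cpi_values, start = c(1967, 1), frequency = 12)

start(ts_small)      # year and period of the first observation
end(ts_small)        # year and period of the last observation
frequency(ts_small)  # observations per year

# window() extracts a sub-period, here March to May 1967.
window(ts_small, start = c(1967, 3), end = c(1967, 5))
```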
Plot time series data
plot(ts_CPI, main="Time Series of CPI", col="magenta", lwd=2)

Decompose the time series into trend, seasonality, and residuals.
plot(decompose(ts_CPI))
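By default decompose() fits an additive model, in which observed = trend + seasonal + random. The sketch below checks that identity on a synthetic monthly series (the data are simulated for illustration and are not the CPI series):

```r
set.seed(1)
n <- 120
t_idx <- seq_len(n)

# Synthetic series: linear trend + yearly cycle + noise.
synthetic <- ts(0.5 * t_idx + 3 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 0.5),
                start = c(2000, 1), frequency = 12)

parts <- decompose(synthetic, type = "additive")

# The moving-average trend is NA at both ends, so compare only complete rows.
rebuilt <- parts$trend + parts$seasonal + parts$random
ok <- !is.na(rebuilt)
max_gap <- max(abs(rebuilt[ok] - synthetic[ok]))
max_gap          # effectively zero: the three components add back to the data

parts$figure     # the 12 estimated seasonal effects, one per month
```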

Check Stationarity using ADF
adf.test(ts_CPI)
Augmented Dickey-Fuller Test
data: ts_CPI
Dickey-Fuller = -1.7508, Lag order = 8, p-value = 0.6838
alternative hypothesis: stationary
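- If p-value < 0.05, reject the null hypothesis of a unit root: the series is stationary.
- If p-value >= 0.05, the null hypothesis cannot be rejected: treat the series as non-stationary and take the first difference. Here p-value = 0.6838, so ts_CPI is non-stationary.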
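The decision rule above can be written as a small function. interpret_adf is a hypothetical helper, not part of tseries; it only turns a p-value into the decision described above:

```r
# Hypothetical helper: maps an ADF p-value to the decision rule described above.
interpret_adf <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject the unit-root null: treat the series as stationary"
  } else {
    "cannot reject the unit-root null: difference the series and retest"
  }
}

interpret_adf(0.6838)  # p-value reported for ts_CPI above
interpret_adf(0.01)    # a small p-value, for comparison

# With tseries loaded, the p-value can be read from the test object:
# interpret_adf(adf.test(ts_CPI)$p.value)
```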
Take the first difference and test for stationarity
diff_data<-diff(ts_CPI)
plot(diff_data, main="Differenced Time Series", col="tan2")

adf.test(diff_data)
Augmented Dickey-Fuller Test
data: diff_data
Dickey-Fuller = -5.1378, Lag order = 8, p-value = 0.01
alternative hypothesis: stationary
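With p-value = 0.01 < 0.05, the differenced series passes the ADF check. As a sketch of what diff() computes, using the first six CPI values shown earlier: each first difference is x[t] - x[t-1], and differences = 2 applies the operation twice (the order d = 2 that appears in the ARIMA model fitted later in this tutorial).

```r
x <- c(34.8, 34.7, 34.7, 34.6, 34.6, 34.9)  # first six CPI values shown earlier

d1 <- diff(x)                       # first difference: x[t] - x[t-1]
d1_manual <- x[-1] - x[-length(x)]  # the same thing written out by hand

d2 <- diff(x, differences = 2)      # second difference: diff of the first differences

d1
d2
```

Each round of differencing shortens the series by one observation.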
Analyze autocorrelation using ACF and PACF.
Plot the ACF and PACF
par(mfrow=c(1,2))
acf(ts_CPI, main="ACF Plot")
pacf(ts_CPI, main="PACF Plot")

par(mfrow=c(1,1))
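The ACF of a trending, non-stationary series usually decays very slowly, so it is common to also inspect the ACF of the differenced series when choosing AR and MA orders. A sketch on a simulated random walk with drift (a stand-in for a trending series, not the CPI data):

```r
set.seed(42)

# Synthetic random walk with drift stands in for a trending series such as CPI.
level_series <- ts(cumsum(0.3 + rnorm(240)), frequency = 12)

# plot = FALSE returns the autocorrelations without drawing anything.
acf_level <- acf(level_series, lag.max = 24, plot = FALSE)
acf_diff  <- acf(diff(level_series), lag.max = 24, plot = FALSE)

# Element 1 of $acf is lag 0, so element 2 is the lag-1 autocorrelation.
round(c(level = acf_level$acf[2], differenced = acf_diff$acf[2]), 3)
```

On the tutorial's data the equivalent calls would be acf(diff_data) and pacf(diff_data).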
Build and evaluate an ARIMA model for forecasting.
Fit ARIMA
model<-auto.arima(ts_CPI)
summary(model)
Series: ts_CPI
ARIMA(1,2,1)(0,0,2)[12]
Coefficients:
         ar1      ma1     sma1     sma2
      0.2005  -0.7875  -0.1435  -0.1493
s.e.  0.0643   0.0464   0.0390   0.0388
sigma^2 = 0.1455: log likelihood = -305.59
AIC=621.19 AICc=621.28 BIC=643.75
Training set error measures:
                      ME      RMSE       MAE        MPE      MAPE       MASE
Training set 0.008997016 0.3797776 0.2660481 0.01059324 0.2197267 0.05301554
                     ACF1
Training set -0.003908124
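ARIMA(1,2,1)(0,0,2)[12] reads as: one AR term, two rounds of differencing, one MA term, plus two seasonal MA terms at a period of 12 months. The sketch below fits a fixed non-seasonal ARIMA(1,2,1) with base R's arima() on simulated data; the simulation parameters are arbitrary illustrations and the result is not a reproduction of the auto.arima() fit.

```r
set.seed(123)

# Simulate an ARIMA(1,2,1) process (illustrative parameters only).
sim <- arima.sim(model = list(order = c(1, 2, 1), ar = 0.2, ma = -0.8), n = 300)

# Fit the same order with stats::arima().
fit <- arima(sim, order = c(1, 2, 1))

coef(fit)   # estimated ar1 and ma1
fit$aic     # AIC, comparable across models with the same differencing order

# On the tutorial's data, the selected order could be fitted explicitly with:
# arima(ts_CPI, order = c(1, 2, 1), seasonal = list(order = c(0, 0, 2), period = 12))
```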
Model Diagnostics
tsdiag(model)

checkresiduals(model)

Ljung-Box test
data: Residuals from ARIMA(1,2,1)(0,0,2)[12]
Q* = 18.379, df = 20, p-value = 0.5624
Model df: 4. Total lags used: 24
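The p-value of 0.5624 is well above 0.05, so there is no evidence of autocorrelation left in the residuals. The reported df = 20 is the 24 lags minus the 4 estimated coefficients. A sketch of the same test with base R's Box.test(), using simulated white noise as a stand-in for residuals(model):

```r
set.seed(7)
fake_resid <- rnorm(300)   # stand-in for residuals(model); simulated, not real residuals

# fitdf = 4 removes one degree of freedom per estimated ARIMA coefficient.
lb <- Box.test(fake_resid, lag = 24, type = "Ljung-Box", fitdf = 4)

lb$parameter   # degrees of freedom: 24 - 4 = 20
lb$p.value     # a large p-value means the residuals look like white noise
```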
Forecasting
forecast_values<-forecast(model, h=12)
head(forecast_values)
$method
[1] "ARIMA(1,2,1)(0,0,2)[12]"
$model
Series: ts_CPI
ARIMA(1,2,1)(0,0,2)[12]
Coefficients:
         ar1      ma1     sma1     sma2
      0.2005  -0.7875  -0.1435  -0.1493
s.e.  0.0643   0.0464   0.0390   0.0388
sigma^2 = 0.1455: log likelihood = -305.59
AIC=621.19 AICc=621.28 BIC=643.75
$level
[1] 80 95
$mean
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
2023                                     319.3141 320.0558 320.9268 321.9420
2024 326.4583 327.3673 328.4571 329.4843
          Sep      Oct      Nov      Dec
2023 322.7623 323.5906 324.5323 325.5392
2024
$lower
80% 95%
May 2023 318.8252 318.5664
Jun 2023 319.2095 318.7616
Jul 2023 319.7379 319.1085
Aug 2023 320.4088 319.5972
Sep 2023 320.8763 319.8778
Oct 2023 321.3397 320.1482
Nov 2023 321.9033 320.5115
Dec 2023 322.5180 320.9187
Jan 2024 323.0310 321.2166
Feb 2024 323.5198 321.4830
Mar 2024 324.1756 321.9091
Apr 2024 324.7552 322.2518
$upper
80% 95%
May 2023 319.8030 320.0618
Jun 2023 320.9021 321.3501
Jul 2023 322.1158 322.7452
Aug 2023 323.4751 324.2867
Sep 2023 324.6484 325.6468
Oct 2023 325.8415 327.0331
Nov 2023 327.1614 328.5532
Dec 2023 328.5604 330.1597
Jan 2024 329.8856 331.7000
Feb 2024 331.2148 333.2515
Mar 2024 332.7386 335.0051
Apr 2024 334.2133 336.7167
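The interval bounds follow from the point forecast plus or minus a normal quantile times the forecast standard error. As a rough check for the first step (May 2023), assuming the one-step standard error is about sqrt(sigma^2) from the model summary; the match is only approximate because the printed sigma^2 is rounded:

```r
point_forecast <- 319.3141   # $mean for May 2023
sigma2 <- 0.1455             # sigma^2 from the model summary (rounded)
se_1 <- sqrt(sigma2)         # approximate one-step-ahead standard error

conf_levels <- c(0.80, 0.95)
z <- qnorm(0.5 + conf_levels / 2)   # two-sided normal quantiles

lower <- point_forecast - z * se_1
upper <- point_forecast + z * se_1

# Compare with the May 2023 rows of $lower and $upper above.
round(rbind(lower, upper), 3)
```

For later horizons the standard error grows, which is why the intervals widen from May 2023 to April 2024.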
Plot the Forecasted Values
plot(forecast_values, main = "ARIMA FORECAST CPI", col = "magenta")
