Abstract
In this workshop we learn how to design/calibrate a seasonal ARIMA model (SARIMA) for a time series.You will work in RStudio. Create an R Notebook document (File -> New File -> R Notebook), where you have to write whatever is asked in this workshop.
At the beginning of the R Notebook write Workshop 2 - Financial Econometrics II and your name (as we did in previous workshop).
Create an R Notebook document (File -> New File -> R Notebook), where you have to write whatever is asked in this workshop. More specifically, you have to:
Replicate all the R Code along with its output.
You have to do whatever is asked in the workshop. It can be: a) Responses to specific questions and/or do an exercise/challenge.
Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.
You have to keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.
A seasonal ARIMA model, also called ARIMA/SARIMA model, is usually applied to monthly or quarterly time series. This model is very effective for any of the following type of economic/financial series:
Historical monthly or quarterly sales data
Historical monthly or quarterly price data
Historical monthly or quarterly data related to any variable of any financial statement. For example: cost of good sold, sales and general administrative expenses, income tax, EBIT, etc.
Historical monthly or quarterly data related to any macro economic variable. For example: inflation indexes, GDP indexes, exchange rates, etc.
These models are not so effective for daily stock prices or financial indexes. The main reason is that one of the most important conditions for a financial market to exist is that no body can predict stock prices since the log of stock prices are supposed to behave as a Random Walk! However, these models are useful in this context compared to just doing intuitive guesses.
Each ARIMA-SARIMA model needs to be CALIBRATED. The general steps to calibrate a model are the following:
I recommend that if you have monthly or quarterly data of any financial or economic variable, start testing whether the seasonal difference of the log of the series is stationary. If you have daily data, it is hard to model for seasonality. In the case of daily data, start testing whether the first difference of the log is stationary.
In this step you start with a version of the series.
For next steps we will assume that we decided to use the seasonal difference of the log of the series.
Test whether the seasonal difference is stationary. If so, continue to next step. If not, then change the variable to the first difference of the log of the series, which is the % change for each period. Most of the economic and financial series become stationary with the first log difference.
Run the ACF and PACF plots to identify the ARIMA-SARIMA parameters p, d, q, P, D, Q. The ACF is the auto-correlation plot, and the PACF is the partial autocorrelation plot. Both plots show auto*correlations between the variable and its own lags.
An ARIMA/SARIMA model has the following “parameters” that need to be defined:
arima(p,d,q) sarima(P,D,Q,#Periods)
p: refers to the number of autoregressive (AR) terms. Usually this parameter is either 0,1 or 2.
d: refers to how many first differences where needed to the series in order to make the series stationary. Usually this parameter is either 0 or 1.
q: refers to the number of moving average (MA) terms in the model. Usually this parameter is either 0,1 or 2.
P: refers to the number of SEASONAL autoregressive terms. Usually this parameter is either 0 or 1.
D: refers to how many SEASONAL differences were needed to the series to make the series stationary. Usually this parameter is either 0 or 1.
Q: refers to the number of SEASONAL moving average terms. Usually this parameter is either 0 or 1.
#periods: refers to the number of periods in the year. For example, if the data is monthly, then #periods=12.
These parameters are usually expressed as: ARIMA(p,d,q) SARIMA(P,D,Q, #periods).
When you identify the first values of these paremeters, then you will have your first CALIBRATION of the model.
Estimate/Run the ARIMA-SARIMA model
Run the ACF and PACF of the residuals/errors of the model to check whether the errors is a white noise series. In other words, if there is no significant autocorrelations of the errors.
If there is one or more significant autocorrelations, we can include other term(s) in the ARIMA-SARIMA model, and go back to step 4.
If there the errors seem like a white noise (no signficant autocorrelations), then we can continue and finish the calibration process.
Interpret the model
Run a forecast using the model
Let’s start practicing with these models.
We will work with a simple dataset that has the number of air passengers of an US Airline. This series might have a strong seasonality since people used to fly in vacation periods, which is the the same each year.
Download the excel file from a web site:
download.file("http://www.apradie.com/ec2004/air2.xlsx", "air2.xlsx", mode="wb")
In order to read Excel files, we need the readxl package:
library(readxl)
## Warning: package 'readxl' was built under R version 4.0.5
Now, we read the data from the Excel file:
<- read_excel("air2.xlsx")
dataset # I familiarize with the data
head(dataset)
## # A tibble: 6 x 4
## air time t month
## <dbl> <dbl> <dbl> <dttm>
## 1 112 1949 1 1949-01-01 00:00:00
## 2 118 1949. 2 1949-02-01 00:00:00
## 3 132 1949. 3 1949-03-01 00:00:00
## 4 129 1949. 4 1949-04-01 00:00:00
## 5 121 1949. 5 1949-05-01 00:00:00
## 6 135 1949. 6 1949-06-01 00:00:00
In order to perform time-series analysis, it is better to use xts objects. We can construct this by using the xts() method of the xts package. Objects of class xts are composed by 2 elements: the core data and the index (class Date, POSIX,etc).
# Load library xts and zoo. Install the packages if you haven't already.
library(xts)
library(zoo)
library(tseries)
library(forecast)
# I get the date of the dataset:
<- dataset$month
date # I join both objects using xts() and assign it to a new object called air.xts
<- xts(x=dataset$air, frequency = "monthly", order.by=date)
air.xtsnames(air.xts)<-c("Passengers")
class(air.xts)
## [1] "xts" "zoo"
# I see behaviour of air passengers over time
plot(air.xts$Passengers)
This dataset contains air passengers by month from an airline. Do the following:
Does the series look stationary? Obviously not, it looks with a clear growing trend, and a seasonality pattern. The mean of the series GROWS OVER TIME.
Get the natural logarithm of the series:
= log(air.xts) lnair.xts
Since we see a clear seasonality pattern, it is strongly recommended to use the seasonal difference of the log of the series. So, we will check whether we can treat the seasonal difference of the log as STATIONARY.
We start by plotting the seasonal difference of the log, which is the annual % growth (in cc) of the air passengers:
plot(diff(lnair.xts,lag=12))
It is hard to see whether this is stationary or not, so we have to run the Dicky-Fuller test. Run the Dicky-Fuller test using the adf.test function of the tseries packages.
adf.test(na.omit(diff(lnair.xts,lag=12)),alternative="stationary",k=0)
## Warning in adf.test(na.omit(diff(lnair.xts, lag = 12)), alternative =
## "stationary", : p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(diff(lnair.xts, lag = 12))
## Dickey-Fuller = -4.988, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary
We got a p-value<0.05, so we can treat the seasonal difference (the annual % growth) as stationary and continue with this series to calibrate an ARIMA-SARIMA model.
Once we confirm that our series is stationary, then the first step is to do the AC and PAC plots to identify how many AR and MA terms we can include in our ARIMA/SARIMA model.
Remember that we can compute the autocorrelation plots with the acf2 function from the astsa package. Let’s examine the autocorrelations of our series. Remember that our series is the seasonal difference of the log of air passengers, which is the annual % growth of air passengers month by month.
library(astsa)
# I create a ts object from the xts; the acf2 works better with ts objects
=ts(lnair.xts$Passengers)
lnair acf2(diff(lnair,lag=12))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.71 0.62 0.48 0.44 0.39 0.32 0.24 0.19 0.15 -0.01 -0.11 -0.24 -0.14
## PACF 0.71 0.23 -0.06 0.10 0.05 -0.06 -0.06 0.01 -0.01 -0.29 -0.16 -0.14 0.29
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## ACF -0.14 -0.10 -0.15 -0.10 -0.11 -0.14 -0.16 -0.11 -0.08
## PACF 0.06 0.05 -0.04 0.13 -0.06 -0.15 -0.02 0.12 -0.18
I used the ts function to transform the dataset into a ts dataset since this type of R object will allow us to easily run and forecast arima-sarima models.
The ACF shows the autocorrelations (AC) between the series and its lags, and the PACF shows the partial autocorrelations (PAC) between the series and its lags.
The VERTICAL lines show the MAGNITUD of the AUTOCORRELATION between a LAG and the current value of the series.
The blue horizontal dotted lines in the plots cover the 95% confidence interval for the autocorrelations. So, if the vertical line croses the blue dotted line, it means that that specific autocorrelation is SIGNIFICANTLY DIFFERENT THAN ZERO (it could be positive or negative).
Now what is the general interpretation of the these plots?
First, we always have to remember what we are modelling. In this case we are modelling the seasonal difference of the log, which is the ANNUAL % GROWTH of passengers month by month.
In the ACF plot I see that the first 8 ACs are positive and significant, and they gradually decay. For example, the AC of LAG 1 is about 70% (almost 0.70). This means that ANNUAL % growth of air passengers is strongly correlated (with a correlation of 70%) with ITS OWN ANNUAL % GROWTH of the PREVIOUS month (its LAG 1).
The AC of LAG 2 is about 62%, so the ANNUAL % GROWTH of passengers is strongly correlated with ITS OWN ANNUAL % GROWTH of 2 MONTHS AGO (its LAG 2)… and so on.
It is important to mention that the ACF plot shows autocorrelations independently of each other.
In the case of PACF, we see PARTIAL autocorrelations, which measure HOW MUCH ELSE the series is correlated with each its own LAGS, AFTER CONSIDERING the effect of LOWER-ORDER LAG AUTOCORRELATIONS.
I see that the PACF shows only the first 2 autocorrelations to be positive AND SINGIFICANT, and the magnitude of the following autocorrelations goes down to zero or negative very quickly.
When the ACF plot shows a SLOW DECLINE of autocorrelations, and the PACF shows a FAST DECLINE of autocorrelatinos, this is a clear PATTERN OF AN AR(p) MODEL. In this case, the # of AR terms is determined by the SIGNIFICANT LAGS in the PACF plot.
We see that also the AC of LAG 10 is statistically significant and negative, but it is recommended first to start with the AR(2) model, and then check the residuals of the model to see whether there is another significant lag.
Then, in this case we start with an ARIMA with p=2 and q=0.
Read Dr. Nau slides to review more PATTERNS of ACF and PACF plots.
In this case, I can start with the following calibration. Now we can use the log series, and leave the model to automatically calculate the SEASONAL difference. Then, I will use the log of the series (log of air passengers) and define my model as:
arima(p,d,q) sarima(P,D,Q,#periods in the year)
In our example, the values for the parameters are:
arima(2,0,0) sarima(0,1,0,12)
What this model tells us?
You first have to look at the # of periods. In this case, you have 12 periods in a year, so your series should be monthly.
Then you look at d and D (first differences and the seasonal differences). In this case, d=0 and D=1. What does this mean? this means that we are modeling the SEASONAL difference (D=1) of the log of the variable, which is the annual % growth (month by month). Since d=0 this means that we are NOT using the first difference of the series.
Then you look at the p, q, P, Q. p tells you how many AR terms are included in your model. In this case, p=2, so your model includes the first 2 AR terms. Since q, P and Q are zero, then your model does not include any of these type of terms.
How this model can be expressed mathematically?
Since you are modeling the seasonal difference, you can express this model as the following equation:
\(\triangle^{12}lnair=\phi_{0}+\phi_{1}(L^{1}\triangle^{12}lnair_{t})+\phi_{2}(L^{2}\triangle^{12}lnair_{t})+\varepsilon_{t}\)
Using a simpler notation, this equation can be expressed as:
\(s12.lnair_{t}=\phi_{0}+\phi_{1}L1.s12.lnair_{t}+\phi_{2}L2.s12.lnair_{t}+\varepsilon_{t}\)
Remember that s12.lnair is the seasonal difference of the log of air passengers, which is the annual %growth of air passengers month by month. Then we can read this mathematical expression as: “The annual %growth (month by month) of air passengers at time t can be determined by its own annual % growth of air passengers at time t-1 and t-2 (the %growth of the previous 2 months), and by a random shock”
Now we can estimate the ARIMA-SARIMA model using the Arima function of the forecast package.
In this case, instead of creating a variable for the seasonal difference of the log (s12.lnair), we will configure the parameters of the Arima function so that the log transformation and the seasonal difference will be calculated before estimating the arima coefficients.
In our first calibration we discussed above, we defined the following values of the ARIMA-SARIMA model for the log of the number of air passengers:
arima(2,0,0) sarima(0,1,0,12)
This means that D=1, so R will calculate the seasonal difference, and then estimate the arima model and include 2 AR terms and no MA terms.
We can run this model in R as follows:
<- Arima(air.xts$Passengers, # We use the Passengers column of the dataset
m1 order = c(2,0,0), # we indicate that p=0; d=0; q=0
seasonal = list(order=c(0,1,0),period=12), #P=0; D=1; Q=0, #periods=12
include.constant = TRUE, # Here we indicate to include the phi0 coefficient
lambda = 0) # Here we indicate to apply the natural log to the series
m1
## Series: air.xts$Passengers
## ARIMA(2,0,0)(0,1,0)[12] with drift
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 drift
## 0.5540 0.2378 0.0096
## s.e. 0.0845 0.0848 0.0014
##
## sigma^2 estimated as 0.001742: log likelihood=233.13
## AIC=-458.26 AICc=-457.95 BIC=-446.73
If we set lamda = 0 this means that the the model will use the log of the variable (passengers); this is one of the Box-Jenkins mathematical transformations, and it is the most common for financial and economic series.
We can alternatively use the function sarima from the astsa model to run the same arima model. The sarima function provides more detail in the output, but the Arima function has advantages when we want to get the forecast with the model. We run sarima for the same model as:
<-sarima(log(air.xts$Passengers), p=2, d=0, q=0,P=0,D=1,Q=0,S=12) m1a
## initial value -2.794390
## iter 2 value -2.940651
## iter 3 value -3.175869
## iter 4 value -3.186870
## iter 5 value -3.189461
## iter 6 value -3.189605
## iter 7 value -3.189606
## iter 8 value -3.189607
## iter 9 value -3.189607
## iter 10 value -3.189608
## iter 11 value -3.189609
## iter 12 value -3.189609
## iter 12 value -3.189609
## iter 12 value -3.189609
## final value -3.189609
## converged
## initial value -3.184486
## iter 2 value -3.184596
## iter 3 value -3.184959
## iter 4 value -3.185009
## iter 5 value -3.185059
## iter 6 value -3.185073
## iter 7 value -3.185076
## iter 8 value -3.185076
## iter 8 value -3.185076
## final value -3.185076
## converged
m1a
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 constant
## 0.5540 0.2378 0.0096
## s.e. 0.0845 0.0848 0.0014
##
## sigma^2 estimated as 0.001701: log likelihood = 233.13, aic = -458.26
##
## $degrees_of_freedom
## [1] 129
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.5540 0.0845 6.5597 0.0000
## ar2 0.2378 0.0848 2.8037 0.0058
## constant 0.0096 0.0014 6.8655 0.0000
##
## $AIC
## [1] -3.204618
##
## $AICc
## [1] -3.203411
##
## $BIC
## [1] -3.12398
As we see, we need to take the log first and then specify the model. The result is exactly the same as the Arima result, but with sarima we can see the coefficients, their p-values and other important plots of the model. Also, we can see important plots such as ACF of the errors.
Since the Arima function does not display the p values and t values of the coefficients, we can use the function coeftest from the lmtest library.
You have to install the lmtest library and the load it:
library(lmtest)
Now we can see the coefficients and their corresponding standard error, pvalue and t value. We apply the coeftest function to the m1 model created with Arima:
coeftest(m1)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.5540129 0.0844572 6.5597 5.392e-11 ***
## ar2 0.2377939 0.0848135 2.8037 0.005052 **
## drift 0.0095854 0.0013962 6.8655 6.625e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The drift coefficient is the phi0 coefficient. phi0 and both lags are positive and SIGNIFICANT. What does this mean? First, I have to remember which variable I am modelling. I am modeling the seasonal difference of log passengers, which is the annual % growth of air passengers month by month. Then, I can interpret this ARIMA-SARIMA model as follows:
The annual % growth of air passengers is positively and significantly related with its annual % growth of air passengers of last month (lag1). In a similar way, the annual % growth of air passengers is positively and significantly related with the annual % growth of two months ago. More specifically, 55.4% of annual growth of last month is passed to this month’s annual % growth, and only 23.78% of the annual growth of 2 months ago is passed to the current annual % growth.
In other words, for each percentual point of annual growth of previous month, it is expected that the current annual growth will grow about 0.55 percent points
It is interesting to see that the sum of phi1 and phi2 (0.55+0.23) MUST be < 1 in order for the series to be stationary. If you include more arima terms, the sum of the phi’s must be less than one. If you include MA terms, the sum of the theta coefficients also must be less than one. In addition, the constant or phi0 is positive and significant, indicating that the annual growth of air passengers has a positive tendency to grow over time.
The mathematical model is: \[ E[s12.lnair] = 0.0096 + 0.554*L1.s12.lnair + 0.2378*L2.s12.lnair \] or \[ s12.lnair = 0.0096 + 0.554*L1.s12.lnair + 0.2378*L2.s12.lnair + error \]
We need to generate a series for the residuals of the ARIMA-SARIMA model. To do this, get the residuals as follows:
<- m1$residuals
air_res head(air_res)
## Time Series:
## Start = 1
## End = 432001
## Frequency = 1.15740740740741e-05
## [1] 0.004708908 0.004751501 0.004854026 0.004821445 0.004747833 0.004847728
Test whether the residual series (the air_res variable) is a white noise running the ac and pac plots. A white noise is a series that has no signicant autocorrelations with any of its own lags.
acf2(air_res)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.01 0.01 -0.13 0.02 0.09 0.05 -0.03 0.05 0.20 -0.01 -0.09 -0.40 0.04
## PACF 0.01 0.01 -0.13 0.02 0.10 0.03 -0.03 0.08 0.21 -0.04 -0.10 -0.37 0.01
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## ACF 0.01 0.11 -0.11 0.07 0.04 -0.05 -0.13 -0.03 -0.03
## PACF -0.04 0.01 -0.09 0.16 0.11 -0.08 -0.08 0.16 -0.09
INTERPRET these plots
Can you add another AR or MA term to improve the previous model and then to make the residual a white noise ?
We re-run the Arima model we run above, but now we will transform the dataset from xts to a ts R object to improve the visualization of the forecast:
#Convert the xts to a ts dataset:
<- ts(coredata(air.xts$Passengers),start=c(1949,1),frequency=12)
air.ts # We display the content of the ts dataset:
air.ts
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
# Run the Arima model:
<- Arima(air.ts, order = c(2,0,0), # p=2:include 2 AR terms; q=0: No MA terms;
m1 seasonal = list(order=c(0,1,0), # D=1: use seasonal difference; No Seasonal terms
period=12), # periods=12 period in the year
include.constant = TRUE, # include the phi0 (drift) coefficient
lambda = 0) # do the log transformation
m1
## Series: air.ts
## ARIMA(2,0,0)(0,1,0)[12] with drift
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 drift
## 0.5540 0.2378 0.0096
## s.e. 0.0845 0.0848 0.0014
##
## sigma^2 estimated as 0.001742: log likelihood=233.13
## AIC=-458.26 AICc=-457.95 BIC=-446.73
# We can see the pvalues of each ARIMA-SARIMA coefficient using the
# coeftest from the lmtest library:
coeftest(m1)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.5540129 0.0844572 6.5597 5.392e-11 ***
## ar2 0.2377939 0.0848135 2.8037 0.005052 **
## drift 0.0095854 0.0013962 6.8655 6.625e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The pvalue of the drift, \(\phi_0\), and the pvalues of the AR coefficients, \(\phi_1\) and \(\phi_2\), are very significant (<0.05). Then we keep these terms in the model.
Using the last Arima model, run a forecast for the following 24 months.
We run the forescast using the arima model m1 (the one I run with the Arima function. We indicate to do a forecast for 24 months, specifying h=24:
<- forecast(m1, h=24)
forecast_air # I stored the information of the forecast in forecast_air
# I plot the forecast
autoplot(forecast_air)
The forecast_air R object has both, historical data and forecast data. A very nice feature of the forecast function after using Arima function, is that we DO NOT HAVE TO do transformations to get the original scale of the variable, which is in tons. The Arima function automatically does the log transformation, and the forecast function automatically does the exponential transofrmation to get the forecast in tons!
The forecast_air R object is a list type of object that contains several elements.
attributes(forecast_air)
## $names
## [1] "method" "model" "level" "mean" "lower" "upper"
## [7] "x" "series" "fitted" "residuals"
##
## $class
## [1] "forecast"
The forecast values are in the mean attribute:
$mean forecast_air
## Jan Feb Mar Apr May Jun Jul Aug
## 1961 450.5662 424.4918 457.4920 505.5161 519.5149 590.6793 688.5216 672.2711
## 1962 503.1498 474.3893 511.5911 565.5928 581.5127 661.4150 771.2137 753.2077
## Sep Oct Nov Dec
## 1961 564.5825 513.1307 434.6578 481.9836
## 1962 632.6921 575.1384 487.2573 540.3795
forecast_air also has historical data (x), residuals, the lower and upper limit of the 95% confidence interval of the forecast, etc !
Go to the course site (Material) and download Dr. Nau slides about ARIMA/SARIMA models. Read the slides and focus on his recommendations to select the number of p, d, q, P, D, Q for any ARIMA/SARIMA model. Download the following Dataset:
download.file("http://www.apradie.com/ec2004/salesfab2.xlsx", "salesfabs2.xlsx", mode="wb")
Now, we read the data from the Excel file and construct an xts dataset:
<- read_xlsx("salesfabs2.xlsx", sheet = "Sheet1")
sales_brands
#I create an xts object to handle the object and dates in a better way:
# I construct a date variable as a sequence of dates using the month column:
<- seq(as.Date(sales_brands$month[1]),
date by="month",
length.out = nrow(sales_brands) )
# I delete the first column since it has the month, and I will construct
# an xts dataset with the index equal to the month
= sales_brands[,-1]
sales_brands
# I create the xts object:
<- xts(x=sales_brands, frequency = "monthly", order.by=date) sales_brands
This dataset contains consumer national monthly sales of a Category of products. The sales is in tons (qfab#) and prices (pfab#) and include sales of few products. You have to do the following:
Generate the log variables for price (pfab1) and sales in volume (qfab1) for product 1.
Now you have to work in a model for qfab1, sales in tons of product 1:
Graph the natural log of sales in tons of this product. WHAT DO YOU OBSERVE about the series? does it look stationary or nonstationary? does it look with seasonality?
If the series is not stationary, try whether the seasonal difference of the log looks as stationary (s12.lnqfab1). Do a time series plot, run and INTERPRET the Dicky-Fuller test
According to Dr Nau slides and what you learned in this workshop, calibrate the best ARIMA/SARIMA model.
What is a white noise series? RESPOND WITH YOUR WORDS
Create a variable for the residuals of your model. Run the ACF plot and the PACF plot for this residual. Does the residual look like a white noise?
Why do you think that the residuals of your best model should look like A White noise? RESPOND WITH YOUR OWN WORDS
Create 24 new observations in the future
Create a forecast for the future observations using your last model
(optional) Estimate the Akaike Information Criterion (AIC) of the model. To do this, run the following command right after running your arima model:
Do a quick research about what is the AIC of the model and EXPLAIN WITH YOUR OWN WORDS
Go to the course site (Material) and download Dr. Nau slides about ARIMA/SARIMA models.
Go to Canvas and respond Quiz 4. You will have 3 attempts. Questions in this Quiz are related to concepts of the readings related to this Workshop.
The grade of this Workshop will be the following:
Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.