Time Series
The basic idea behind time series analysis is that we use the past behavior of a variable to predict its future values.
Time series arise in a vast variety of circumstances, including but not limited to:
- Daily closing stock prices
- Daily relative values of currencies
- Monthly unemployment rates in a country
- Quarterly public debt levels in a country
- Weekly viewership figures of a TV series
- Average annual CO2 emissions
- Quarterly sales figures of a retailer
- Monthly production figures of a factory
- Annual % growth of GDP of an economy
Example 1
To gain a better understanding of how time series operate, we will examine the monthly sales figures for a mobile phone, Model M, sold by a particular retailer. The monthly sales figures over a 24-month period are given in the table below:
| Month | Year 1 | Year 2 |
|---|---|---|
| Jan | 197 | 296 |
| Feb | 211 | 276 |
| Mar | 203 | 305 |
| Apr | 247 | 308 |
| May | 239 | 356 |
| Jun | 269 | 393 |
| Jul | 308 | 363 |
| Aug | 262 | 386 |
| Sep | 258 | 443 |
| Oct | 256 | 308 |
| Nov | 261 | 358 |
| Dec | 288 | 384 |
- To begin the time series analysis of this data, we create a data vector MS (Monthly Sales) to contain the sales data:
MS<-c(197,211,203,247,239,269,308,262,258,256,261,288,296,276,305,308,356,393,363,386,443,308,358,384)
MS
length(MS)
Next we need to create a time vector against which these sales figures are plotted. We need a vector with the same length as MS, i.e. with 24 entries, starting at 1 and increasing in increments of 1. We can automate much of this for future examples using the length() and seq() functions:
Time <- seq(1,length(MS),1)
Time
length(Time)
This creates a sequence of values starting at 1, ending at length(MS) and increasing with a step size of 1.
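As a side note, base R also provides seq_along(), which builds the same index directly from a vector. The sketch below is only an alternative to the seq() call above; the name Time_alt is introduced here purely for illustration.

```r
# Monthly sales data from Example 1
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)

# seq_along(MS) returns 1, 2, ..., length(MS)
Time_alt <- seq_along(MS)

# Check that it holds the same values as the seq() version
all(Time_alt == seq(1, length(MS), 1))
```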
Time and MS now both have 24 entries, so we can plot them on the same graph:
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
- The function lines() indicates that a line should be drawn between each of the data points of the time series.
- Recall that the argument pch appearing in plot() selects the type of marker used to mark the data points. The built-in symbols are selected by the integer values 0 to 25.
Forecasting
Recall from lectures that we used a linear regression model to model the data in this time series. This model was given by \[
\hat{y}_t=198.03+8.07t
\] where \(t\) referred to a month number.
We will now create our own R function corresponding to this model, which we will call Forecast1:
Forecast1 <- function(t){
198.03+8.07*t
}
- The values predicted by this model at each of the months in Time are now given by
Forecast1(Time)
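The coefficients quoted from lectures can be checked in R. The following is a minimal sketch, assuming the model was obtained by ordinary least squares of MS on Time; the object name fit is introduced here for illustration only.

```r
# Monthly sales data and time index from Example 1
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq(1, length(MS), 1)

# Fit sales against month number by least squares
fit <- lm(MS ~ Time)
round(coef(fit), 2)   # intercept and slope, to compare with 198.03 and 8.07

# Overlay the fitted line on the time series plot
plot(Time, MS, pch = 15, col = "red",
     xlab = "Month", ylab = "Monthly Sales Figures")
lines(Time, MS)
abline(fit, col = "blue", lty = 2)
```

The dashed line shows the linear trend that Forecast1 encodes.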
Mean Absolute Deviation (MAD) & Mean Square Error (MSE)
- Recall that the Mean Absolute Deviation (MAD) of a model was given by
\[
MAD= \frac{1}{n}\sum_{t=1}^{n}\vert y_{t}-\hat{y}_t\vert
\]
- \(y_t\) denotes the actual value of the variable \(y\) at time \(t\)
- \(\hat{y}_t\) denotes the predicted value of the variable \(y\) at time \(t\)
- \(n\) is the number of observations we have, i.e. the number of actual values \(y_t\).
- R will calculate this for us automatically as follows:
MAD1<-mean(abs(MS-Forecast1(Time)))
MAD1
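Since the later exercises repeat this calculation for other models, it may help to wrap the error measures in small functions. This is only a sketch: the names mean_abs_dev and mean_sq_err are hypothetical and not part of the worksheet (base R's own mad() computes a different quantity, the median absolute deviation, so that name is avoided here).

```r
# Data, time index and forecast function from Example 1
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq(1, length(MS), 1)
Forecast1 <- function(t) {
  198.03 + 8.07 * t
}

# Hypothetical helpers: mean absolute and mean squared forecast error
mean_abs_dev <- function(actual, predicted) mean(abs(actual - predicted))
mean_sq_err  <- function(actual, predicted) mean((actual - predicted)^2)

mean_abs_dev(MS, Forecast1(Time))
mean_sq_err(MS, Forecast1(Time))
```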
Exercise 1
Modify this code block to find the MSE of the model.
MSE1<-mean(abs(MS-Forecast1(Time))^2)
MSE1
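For reference, the quantity computed by this code can be written in the same notation as the MAD. This is the usual definition of the mean square error as an average over all \(n\) observations, which is what mean() computes here:

\[
MSE= \frac{1}{n}\sum_{t=1}^{n}( y_{t}-\hat{y}_t)^2
\]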
Prediction Intervals
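The critical value \(t_{\frac{\alpha}{2},n-2}\): for a 90% prediction interval \(\alpha=0.10\), so we use the qt() function with probability 0.05 and \(n-2=22\) degrees of freedom.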
t_star = abs(qt(0.05,df=22)) # df= Number of months-2
\(x^*\)
x_star=27
\(y^*\)
y_star=Forecast1(27)
y_star
MSE
MSE1
\(\bar{x}\)
x_bar=mean(Time)
\(\Sigma_{i=1}^{n}(x_i-\bar{x})^2\)
Sum1=sum((Time-x_bar)^2)
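The quantities defined so far combine into the usual textbook prediction interval for simple linear regression at a new point \(x^*\). It is written here with MSE standing in for the residual variance estimate, as in this worksheet, and with \(n=24\) observations:

\[
y^* \pm t_{\frac{\alpha}{2},n-2}\sqrt{MSE\left(1+\frac{1}{n}+\frac{(x^*-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)}
\]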
Upper boundary of the prediction interval
y_star+t_star*sqrt(MSE1*(1+1/24+(x_star-x_bar)^2/Sum1))
Lower boundary of the prediction interval
y_star-t_star*sqrt(MSE1*(1+1/24+(x_star-x_bar)^2/Sum1))
The 90% Prediction Interval
We are 90% confident that sales in the 27th month will be between 358 and 474 units.
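As a cross-check, R can produce a prediction interval directly from a fitted lm object. This is a sketch under the assumption that the model is the least-squares fit of MS on Time; the names fit and pred are illustrative. Note that predict() estimates the residual variance by dividing the sum of squared residuals by \(n-2\) rather than \(n\), so its interval is expected to be somewhat wider than one built from MSE1.

```r
# Data and time index from Example 1
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq(1, length(MS), 1)

# Least-squares fit of sales on month number
fit <- lm(MS ~ Time)

# 90% prediction interval for month 27
pred <- predict(fit, newdata = data.frame(Time = 27),
                interval = "prediction", level = 0.90)
pred   # columns: fit (point forecast), lwr and upr (interval limits)
```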
Exercise 2
Find the 90% prediction interval for sales in the 27th month.
Exercise 3
The closing values of Apple Inc. (AAPL) Stock on the NASDAQ Stock Exchange from 8 August 2017 to 8 November 2017 are given in the data file AppleQuotes(3M).csv, available on Moodle. (Available at http://www.nasdaq.com)
Using this data set answer the following:
- Import the data in this file using the following and call the data structure AAPL
AAPL<-read.csv('AppleQuotes(3M).csv')
- Create two data vectors from this file, one for the closing value and one for the day
Create a time series plot for this data.
From this data plot, determine if there is a trend in the closing value of Apple stock over the past 3 months.
Use the function lm(Closing Value ~ Day) to create a linear regression model for this data.
lm(Closing~Day)
Create a linear model to forecast this data.
Create an R function to represent this model.
Find the MAD and MSE of this model
Find the 95% prediction interval for the closing price of Apple Stock 10 days from now.
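The workflow in this exercise can be sketched on made-up data before applying it to the real file. Everything below is an assumption for illustration: the data frame quotes is synthetic, the column names Day and Closing may not match those in the CSV file, and Forecast2, MAD2 and MSE2 are hypothetical names.

```r
# Synthetic stand-in for the imported quotes (NOT the real Apple data)
set.seed(1)
quotes <- data.frame(Day = 1:60)
quotes$Closing <- 150 + 0.3 * quotes$Day + rnorm(60, sd = 2)

# Two data vectors: one for the day, one for the closing value
Day <- quotes$Day
Closing <- quotes$Closing

# Linear regression model and a forecast function built from its coefficients
fit <- lm(Closing ~ Day)
b <- coef(fit)
Forecast2 <- function(t) unname(b[1] + b[2] * t)

# Error measures for the fitted model
MAD2 <- mean(abs(Closing - Forecast2(Day)))
MSE2 <- mean((Closing - Forecast2(Day))^2)
c(MAD = MAD2, MSE = MSE2)
```

With the real data, the same steps apply once Day and Closing are taken from the appropriate columns of AAPL.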
Exercise 4
The closing values of Google Inc. (GOOGL) Stock on the NASDAQ Stock Exchange from 8 November 2016 to 8 November 2017 are given in the data file GoogleQuotes(1Y).csv, available on Moodle. (Available at http://www.nasdaq.com)
Using this data set answer the following:
- Import the data in this file using the following and call the data structure GOOGL
GOOGL<-read.csv('GoogleQuotes(1Y).csv')
Create two data vectors from this file, one for the trading value and one for the day
Create a time series plot for this data.
From this data plot, determine if there is a trend in the closing value of Google stock over the past year.
Use the function lm(Trading Value ~ Day) to create a linear regression model for this data.
Create a linear model to forecast this data.
Create an R function to represent this model.
Find the MAD and MSE of this model
Find the 90% and 99% prediction intervals for the closing value of Google Stock 10 days from now.
Exercise 5
The % growth in GDP of China, the UK, the US, Ireland, the EU, the OECD and the World for the years 1961-2016 is given in the data file RegionalGDPGrowth(1961-2016).csv. Import this data file into R and answer the following questions. (Available at http://www.worldbank.org)
Create a data vector for the Year and a separate vector for the GDP growth of each country in the data file.
Use the function par(mfrow=c(A,B)) to create a collection of time-series plots in A=1 row and B=2 columns for the GDP growth of China and the US. To illustrate how this function works, the time-series plot from Example 1 is plotted twice below, in 1 row and 2 columns:
par(mfrow=c(1,2))
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
Use the function par(mfrow=c(A,B)) to create a collection of time-series plots in A=3 rows and B=1 column for the GDP growth of China, the US and the UK.
Use the function par(mfrow=c(A,B)) to create a collection of time-series plots in A=2 rows and B=2 columns for the GDP growth of the EU, the US, the OECD and the World.
Is there any apparent trend in economic growth observable from these time series?
From the time-series plots, which region has shown the most consistent economic growth between 1961 and 2016?
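The multi-panel layouts in this exercise can be sketched with a loop. The data below are randomly generated placeholders, not World Bank figures, and the column names RegionA to RegionD are hypothetical. Saving the value returned by par() makes it easy to restore the single-plot layout afterwards.

```r
# Synthetic stand-in for four growth series (NOT real GDP data)
set.seed(42)
Year <- 1961:2016
growth <- data.frame(
  RegionA = rnorm(length(Year), mean = 3, sd = 2),
  RegionB = rnorm(length(Year), mean = 2, sd = 1),
  RegionC = rnorm(length(Year), mean = 4, sd = 3),
  RegionD = rnorm(length(Year), mean = 3, sd = 1)
)

# 2 rows and 2 columns; par() returns the previous settings
old_par <- par(mfrow = c(2, 2))
for (region in names(growth)) {
  # type = "o" draws both the markers and the connecting line
  plot(Year, growth[[region]], type = "o", pch = 15, col = "red",
       xlab = "Year", ylab = "% GDP growth", main = region)
}
par(old_par)   # restore the original layout
```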
