List of R colors:

http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

Time Series

Example 1

To gain a better understanding of how time series operate, we will examine the monthly sales figures for a mobile phone, Model M, sold by a particular retailer. The monthly sales figures over a 24-month period are given in the table below:

Month   Monthly Sales Figures (Year 1)   Monthly Sales Figures (Year 2)
Jan     197                              296
Feb     211                              276
Mar     203                              305
Apr     247                              308
May     239                              356
Jun     269                              393
Jul     308                              363
Aug     262                              386
Sep     258                              443
Oct     256                              308
Nov     261                              358
Dec     288                              384
  • To begin the time series analysis of this data, we create a data vector MS (Monthly Sales) to contain the sales data:
MS<-c(197,211,203,247,239,269,308,262,258,256,261,288,296,276,305,308,356,393,363,386,443,308,358,384)
MS
length(MS)

Next we need to create a time vector against which these sales figures are plotted. We need a vector with the same length as MS, i.e. with 24 entries, starting at 1 and increasing in increments of 1.

  • We can automate a lot of this for future examples using the length() and seq() functions:

Time <- seq(1,length(MS),1)
Time
length(Time)
  • This creates a sequence of values starting at 1, ending at length(MS) and increasing with a step size of 1.

  • Time and MS now both have 24 entries, so we can plot them on the same graph:

plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
  • The function lines() indicates that a line should be drawn between each of the data points of the time series.
  • Recall that the argument pch appearing in plot() selects the type of marker used to mark the data points. The built-in plotting symbols are numbered 0 to 25.
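As a small aside, the same time index and plot can be produced a little more compactly. The sketch below assumes the MS vector from above; seq_along() and the type = "o" argument are base R features that are not otherwise used in this worksheet.

```r
# Monthly sales data from Example 1
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)

# seq_along(MS) gives 1, 2, ..., length(MS),
# the same values as seq(1, length(MS), 1)
Time <- seq_along(MS)

# type = "o" overplots the markers and the connecting line in one call,
# so no separate lines() call is needed; note that the line is then
# drawn in the same colour as the markers (red here)
plot(Time, MS, type = "o", pch = 15, col = "red",
     ylab = "Monthly Sales Figures", xlab = "Month",
     main = "Monthly Sales Figures of Phone Model M")
```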

Forecasting

  • Recall from lectures that we used a linear regression model to model the data in this time series. This model was given by \[ \hat{y}_t=198.03+8.07t \] where \(t\) referred to a month number.

  • We will now create our own R function corresponding to this model, which we will call Forecast1:

Forecast1 <- function(t){
  198.03+8.07*t
}
  • The values predicted by this model at each of the months in Time are now given by
Forecast1(Time)
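The coefficients 198.03 and 8.07 are quoted from lectures. As a check, and assuming the model was obtained by ordinary least squares on the data above, a sketch using lm() (the function that also appears in Exercise 3) should recover them to two decimal places. The object name fit is our own choice.

```r
# Monthly sales data and time index from Example 1
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq(1, length(MS), 1)

# Fit the straight-line model MS = a + b * Time by least squares
fit <- lm(MS ~ Time)

# coef() returns the intercept a and the slope b
round(coef(fit), 2)

# Forecast1 from the text, for comparison with the exact fitted values
Forecast1 <- function(t){
  198.03 + 8.07*t
}

# Largest gap between the rounded model and the exact least-squares fit
max(abs(Forecast1(Time) - fitted(fit)))
```

The small remaining gap comes only from rounding the coefficients to two decimal places.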

Mean Absolute Deviation (MAD) & Mean Square Error (MSE)

  • Recall that the Mean Absolute Deviation (MAD) of a model is given by

\[ MAD= \frac{1}{n}\sum_{t=1}^{n}\vert y_{t}-\hat{y}_t\vert \]

  • \(y_t\) denotes the actual value of the variable \(y\) at time \(t\)
  • \(\hat{y}_t\) denotes the predicted value of the variable \(y\) at time \(t\)
  • \(n\) is the number of observations we have, i.e. the number of actual values \(y_t\).
  • R will calculate this for us automatically as follows:
MAD1<-mean(abs(MS-Forecast1(Time)))
MAD1
  • The code and the formula correspond as follows

    1. MS \(\leftrightarrow y_t\),

    2. Forecast1(Time) \(\leftrightarrow \hat{y}_t\)

    3. abs(MS-Forecast1(Time)) \(\leftrightarrow \vert y_t-\hat{y}_t\vert\)

    4. mean(abs(MS-Forecast1(Time))) \(\leftrightarrow \frac{1}{n}\sum_{t=1}^{n}\vert y_t-\hat{y}_t\vert\)

  • Recall also that the Mean Square Error (MSE) of a model is given by \[ MSE=\frac{1}{n}\sum_{t=1}^{n}\vert y_t-\hat{y}_t\vert^2 \]

Exercise 1

Modify the MAD code above to find the MSE of the model.

MSE1<-mean(abs(MS-Forecast1(Time))^2)
MSE1
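Since the same two error measures are needed again in Exercises 3 and 4, it may be convenient to wrap them in a small helper. The function name forecast_errors below is hypothetical (it is not part of base R). Note that base R's own mad() function computes the median absolute deviation, which is a different quantity from the MAD defined here.

```r
# Hypothetical helper: MAD and MSE of a set of predictions
forecast_errors <- function(actual, predicted) {
  err <- actual - predicted
  c(MAD = mean(abs(err)),  # mean absolute deviation
    MSE = mean(err^2))     # mean square error
}

# Data and model from Example 1
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq(1, length(MS), 1)
Forecast1 <- function(t){
  198.03 + 8.07*t
}

# Returns a named vector with both error measures
forecast_errors(MS, Forecast1(Time))
```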

Prediction Intervals

The critical value \(t_{\frac{\alpha}{2},n-2}\): for a 90% interval we use \(\frac{\alpha}{2}=0.05\), with \(n-2=22\) degrees of freedom.

t_star = abs(qt(0.05,df=22)) # df= Number of months-2

\(x^*\)

x_star=27

\(y^*\)

y_star=Forecast1(27)
y_star

MSE

MSE1

\(\bar{x}\)

x_bar=mean(Time)

\(\sum_{i=1}^{n}(x_i-\bar{x})^2\)

Sum1=sum((Time-x_bar)^2)

Upper boundary of the prediction interval

y_star+t_star*sqrt(MSE1*(1+1/24+(x_star-x_bar)^2/Sum1))

Lower boundary of the prediction interval

y_star-t_star*sqrt(MSE1*(1+1/24+(x_star-x_bar)^2/Sum1))

The 90% Prediction Interval

We are 90% confident that sales in the 27th month will be between approximately 358 and 474 units.
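The individual steps above can be collected into one function. This is only a sketch under the worksheet's conventions: it uses the MSE as defined earlier (dividing by \(n\)) and the squared distance \((x^*-\bar{x})^2\) in the last term. The name prediction_interval and its arguments are hypothetical. R's built-in predict() with interval = "prediction" on an lm object estimates the error variance by dividing by \(n-2\) instead, so it is expected to return a slightly wider interval.

```r
# Hypothetical helper following the step-by-step calculation above
prediction_interval <- function(x, y, forecast, x_star, level = 0.90) {
  n      <- length(y)
  t_star <- qt(1 - (1 - level) / 2, df = n - 2)  # critical value
  mse    <- mean((y - forecast(x))^2)            # MSE as defined earlier
  x_bar  <- mean(x)
  sxx    <- sum((x - x_bar)^2)                   # sum of squared deviations
  half   <- t_star * sqrt(mse * (1 + 1/n + (x_star - x_bar)^2 / sxx))
  c(lower = forecast(x_star) - half,
    upper = forecast(x_star) + half)
}

# Data and model from Example 1
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq(1, length(MS), 1)
Forecast1 <- function(t){
  198.03 + 8.07*t
}

# 90% prediction interval for month 27
prediction_interval(Time, MS, Forecast1, x_star = 27, level = 0.90)
```

Passing the forecast function as an argument means the same helper can be reused with the models built in Exercises 3 and 4.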

Exercise 2

Find the 90% prediction interval for sales in the 27th month.

Exercise 3

The closing values of Apple Inc. (AAPL) Stock on the NASDAQ Stock Exchange from 8 August 2017 to 8 November 2017 are given in the data file AppleQuotes(3M).csv, available on Moodle. (Available at http://www.nasdaq.com)

Using this data set answer the following:

  1. Import the data in this file using the following and call the data structure AAPL
AAPL<-read.csv('AppleQuotes(3M).csv')
  2. Create two data vectors from this file, one for the closing value and one for the day.

  3. Create a time series plot for this data.

  4. From this data plot, determine if there is a trend in the closing value of Apple stock over the past 3 months.

  5. Use the function lm(Closing Value ~ Day) to create a linear regression model for this data.

lm(Closing~Day)
  6. Create a linear model to forecast this data.

  7. Create an R function to represent this model.

  8. Find the MAD and MSE of this model.

  9. Find the 95% prediction interval for the closing price of Apple stock 10 days from now.
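Two ingredients change between Example 1 and this exercise: the confidence level and the point being forecast. The sketch below shows one way to handle both. The helper name t_crit and the value of n_days are placeholders for illustration only; the actual number of trading days should be taken from the imported data.

```r
# Two-sided critical value t_{alpha/2, n-2} for a given confidence level
t_crit <- function(level, n) {
  alpha <- 1 - level
  qt(1 - alpha / 2, df = n - 2)
}

t_crit(0.90, 24)   # equals abs(qt(0.05, df = 22)) used in Example 1
t_crit(0.95, 24)   # a 95% interval needs a larger critical value

# Forecasting 10 days past the last observation
n_days <- 60            # placeholder, NOT the real length of the AAPL data
x_star <- n_days + 10   # day number at which to evaluate the forecast
```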

Exercise 4

The closing values of Google Inc. (GOOGL) Stock on the NASDAQ Stock Exchange from 8 November 2016 to 8 November 2017 are given in the data file GoogleQuotes(1Y).csv, available on Moodle. (Available at http://www.nasdaq.com)

Using this data set answer the following:

  1. Import the data in this file using the following and call the data structure GOOGL
GOOGL<-read.csv('GoogleQuotes(1Y).csv')
  2. Create two data vectors from this file, one for the trading value and one for the day.

  3. Create a time series plot for this data.

  4. From this data plot, determine if there is a trend in the closing value of Google stock over the past year.

  5. Use the function lm(Trading Value ~ Day) to create a linear regression model for this data.

  6. Create a linear model to forecast this data.

  7. Create an R function to represent this model.

  8. Find the MAD and MSE of this model.

  9. Find the 90% and 99% prediction intervals for the trading value of Google stock 10 days from now.

Exercise 5

The % growth in GDP of China, the UK, the US, Ireland, the EU, the OECD and the World for the years 1961-2016 is given in the data file RegionalGDPGrowth(1961-2016).csv. Import this data file into R and answer the following questions. (Available at http://www.worldbank.org)

  1. Create a data vector for the Year and a separate vector for the GDP growth of each country or region in the data file.

  2. Use the function par(mfrow=c(A,B)) to create a collection of time-series plots in A=1 row and B=2 columns for the GDP growth of China and the US. To illustrate how this function works, the time-series plot from Example 1 is plotted twice below, in 1 row and 2 columns:

par(mfrow=c(1,2))
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
  3. Use the function par(mfrow=c(A,B)) to create a collection of time-series plots in A=3 rows and B=1 column for the GDP growth of China, the US and the UK.

  4. Use the function par(mfrow=c(A,B)) to create a collection of time-series plots in A=2 rows and B=2 columns for the GDP growth of the EU, the US, the OECD and the World.

  5. Is there any apparent trend in economic growth observable from these time series?

  6. From the time-series plots, which region has shown the most consistent economic growth between 1961 and 2016?
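When changing the plot layout it is often worth restoring the previous settings afterwards, otherwise later plots keep the multi-panel layout. The sketch below illustrates this using the Example 1 sales data split by year, since the column names of the GDP file are not shown here. The variable names Year1, Year2, Month, y_range and old_par are our own.

```r
# Monthly sales data from Example 1, split into the two years
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Year1 <- MS[1:12]
Year2 <- MS[13:24]
Month <- 1:12

# par() returns the previous settings, so they can be restored later
old_par <- par(mfrow = c(1, 2))

# A common y-axis range makes the two panels directly comparable
y_range <- range(MS)
plot(Month, Year1, type = "o", pch = 15, col = "red", ylim = y_range,
     xlab = "Month", ylab = "Monthly Sales Figures", main = "Year 1")
plot(Month, Year2, type = "o", pch = 15, col = "red", ylim = y_range,
     xlab = "Month", ylab = "Monthly Sales Figures", main = "Year 2")

# Return to the layout that was in place before
par(old_par)
```

Fixing ylim across panels is a design choice: without it each panel picks its own scale, which can make different series look more alike than they are.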

