---
title: "Data Visualisation 2019 - Assignment 6"
output:
  html_notebook: default
  html_document:
    df_print: paged
  pdf_document: default
---



## List of R colors:
http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf


# Time Series

* The basic idea behind time series is that we use the past behavior of a variable to predict its future values.

* Time series arise in a vast variety of circumstances, including but not limited to:
    
    * Daily closing stock prices
    * Daily relative values of currencies
    * Monthly unemployment rates in a country
    * Quarterly public debt levels in a country
    * Weekly viewership figures of a TV series
    * Average annual CO2 emissions
    * Quarterly sales figures of a retailer
    * Monthly production figures of a factory
    * Annual % growth of GDP of an economy
    * etc. etc. etc.
    
## Example 1 

To gain a better understanding of how time series operate, we will examine the monthly sales figures for a mobile phone __Model M__ sold by a particular retailer. The monthly sales figures over a 24-month period are given in the table below:

|  Month  | Monthly Sales Figures (Year 1) | Monthly Sales Figures (Year 2)  |
|---------|--------------------------------|---------------------------------|
|   Jan   |               197              |             296                 |
|   Feb   |               211              |             276                 |
|   Mar   |               203              |             305                 |
|   Apr   |               247              |             308                 |
|   May   |               239              |             356                 |
|   Jun   |               269              |             393                 |
|   Jul   |               308              |             363                 |
|   Aug   |               262              |             386                 |
|   Sep   |               258              |             443                 |
|   Oct   |               256              |             308                 |
|   Nov   |               261              |             358                 |
|   Dec   |               288              |             384                 |

* To begin the time series analysis of this data, we create a data vector __MS__ (Monthly Sales) to contain the sales data:
```{r}
MS<-c(197,211,203,247,239,269,308,262,258,256,261,288,296,276,305,308,356,393,363,386,443,308,358,384)
MS
length(MS)
```
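Base R also has a dedicated time-series class, __ts__, which stores the observations together with their starting point and sampling frequency. The sketch below assumes the 24 months run from January of Year 1 (hence __start = c(1, 1)__) and uses __frequency = 12__ for monthly data; the name __MS_ts__ is our own choice, and nothing later in this notebook depends on it.

```{r}
# Example 1 data, repeated so this chunk runs on its own
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)

# Monthly series starting in month 1 of year 1 (an assumed labelling)
MS_ts <- ts(MS, start = c(1, 1), frequency = 12)

frequency(MS_ts)                # 12 observations per year
cycle(MS_ts)                    # position of each value within its year (1 = Jan, ..., 12 = Dec)
window(MS_ts, start = c(2, 1))  # the Year 2 figures only
```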
* Next, we need to create a _time vector_ against which these sales figures are plotted:
    * We need a vector with the same length as __MS__, i.e. with 24 entries, starting at 1 and increasing in increments of 1.
    * We can automate a lot of this for future examples using the __length()__ and __seq()__ functions.
```{r}
Time <- seq(1,length(MS),1)
Time
length(Time)
```
* This creates a sequence of values starting at 1, ending at __length(MS)__ and increasing with a step size of 1.
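Base R also has shorter equivalents for "the positions 1, 2, ..., n of a vector": __seq_along()__ and __seq_len()__. A minimal comparison, assuming __MS__ holds the 24 sales figures (the names __Time_seq__, __Time_along__ and __Time_len__ are for illustration only):

```{r}
# Example 1 data, repeated so this chunk runs on its own
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)

Time_seq   <- seq(1, length(MS), 1)  # the approach used above
Time_along <- seq_along(MS)          # 1, 2, ..., length(MS)
Time_len   <- seq_len(length(MS))    # same values as seq_along(MS)

# The values agree; the internal storage type (integer or double) may differ
all(Time_seq == Time_along)
all(Time_seq == Time_len)
```

One practical advantage of __seq_along()__: for an empty vector it returns an empty sequence, whereas __seq(1, length(x), 1)__ stops with an error.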

* __Time__ and __MS__ now both have 24 entries, so we can plot them on the same graph:
```{r}
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
```
* The function __lines()__ draws line segments joining consecutive data points of the time series.
* Recall that the argument __pch__ appearing in __plot()__ selects the type of marker used to mark the data points. The values 0 to 25 select R's built-in plotting symbols.
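The __plot()__ + __lines()__ pair can also be written as a single call: __type = "o"__ draws the points and overplots the joining lines (__type = "b"__ leaves small gaps instead). A sketch reusing the Example 1 data and the marker settings from above; note that __col__ then colours the lines as well as the markers.

```{r}
# Example 1 data, repeated so this chunk runs on its own
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq_along(MS)

# type = "o": points and connecting lines drawn in one call
plot(Time, MS, type = "o", pch = 15, col = "red",
     ylab = "Monthly Sales Figures", xlab = "Month",
     main = "Monthly Sales Figures of Phone Model M")
```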

## Forecasting
* Recall from lectures that we used a __linear regression model__ to model the data in this time series. This model was given by
\[
\hat{y}_t=198.03+8.07t
\]
where $t$ referred to a month number.

* We will now create our own __R function__ corresponding to this, which we are going to call __Forecast1__

```{r}
Forecast1 <- function(t){
  198.03+8.07*t
}
```
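The coefficients 198.03 and 8.07 can be reproduced with R's __lm()__ function, which fits the least-squares line of __MS__ against __Time__. This is a sketch for checking the lecture values; the object name __fit__ is our own choice.

```{r}
# Example 1 data, repeated so this chunk runs on its own
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq_along(MS)

# Least-squares fit of monthly sales on month number
fit <- lm(MS ~ Time)

# Intercept and slope rounded to 2 decimal places
round(coef(fit), 2)   # expected: (Intercept) 198.03, Time 8.07
```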
* The values predicted by this model at each of the months in __Time__ are now given by
```{r}
Forecast1(Time)
```
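Because __Forecast1()__ uses only vectorised arithmetic, it accepts a whole vector of month numbers, so forecasts beyond the 24 observed months come from the same call. For months 25 to 27 (bearing in mind that this simply extends the fitted straight line):

```{r}
Forecast1 <- function(t){
  198.03+8.07*t
}

# Forecasts for the three months after the data end
Forecast1(25:27)   # 399.78 407.85 415.92
```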
## Mean Absolute Deviation (MAD) & Mean Square Error (MSE)
* Recall that the __Mean Absolute Deviation__  (__MAD__) of a model is given by

\[
MAD= \frac{1}{n}\sum_{t=1}^{n}\vert y_{t}-\hat{y}_t\vert
\]
    
  * $y_t$ denotes the actual value of the variable $y$ at time $t$
  * $\hat{y}_t$ denotes the predicted value of the variable $y$ at time $t$
  * $n$ is the number of observations we have, i.e. the number of actual values $y_t$.
  * __R__ will calculate this for us automatically as follows:
```{r}
MAD1<-mean(abs(MS-Forecast1(Time)))
MAD1
```
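Broken into intermediate steps, the same calculation makes each part of the formula visible. The names __errors__, __abs_errors__ and __n__ are introduced here for illustration only:

```{r}
# Example 1 data and model, repeated so this chunk runs on its own
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq_along(MS)
Forecast1 <- function(t){
  198.03+8.07*t
}

errors     <- MS - Forecast1(Time)  # y_t - yhat_t for each month
abs_errors <- abs(errors)           # |y_t - yhat_t|
n          <- length(MS)            # number of observations

sum(abs_errors) / n                 # same value as mean(abs(MS - Forecast1(Time)))
```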
* The code and the formula correspond as follows:

    1. __MS__ $\leftrightarrow y_t$
    
    2. __Forecast1(Time)__ $\leftrightarrow \hat{y}_t$
    
    3. __abs(MS-Forecast1(Time))__ $\leftrightarrow \vert y_t-\hat{y}_t\vert$
    
    4. __mean(abs(MS-Forecast1(Time)))__ $\leftrightarrow \frac{1}{n}\sum_{t=1}^{n}\vert y_t-\hat{y}_t\vert$

* Recall also that the __Mean Square Error__  (__MSE__) of a model is given by
\[
MSE=\frac{1}{n}\sum_{t=1}^{n}\vert y_t-\hat{y}_t\vert^2
\]

## Exercise 1

Modify this code block to find the __MSE__ of the model. 

```{r}
MSE1<-mean(abs(MS-Forecast1(Time))^2)
MSE1
```
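Exercises 3 and 4 ask for the MAD and MSE of further models, so it may be convenient to wrap both formulas in one function. The function name __accuracy_measures()__ and its arguments are hypothetical, not part of base R:

```{r}
# Hypothetical helper: MAD and MSE of predicted values against actual values
accuracy_measures <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  errors <- actual - predicted
  c(MAD = mean(abs(errors)),  # (1/n) * sum |y_t - yhat_t|
    MSE = mean(errors^2))     # (1/n) * sum (y_t - yhat_t)^2
}

# Usage with the Example 1 model
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq_along(MS)
Forecast1 <- function(t){
  198.03+8.07*t
}
accuracy_measures(MS, Forecast1(Time))
```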


# Prediction Intervals
* We can also use the model to forecast future sales values of phone model M.

* To find the __90% Prediction Interval__ of the model at, say, month 27, we can use the following formula:
\[
  \hat{y}_{x^{*}}\pm t_{\frac{\alpha}{2},n-2}\sqrt{MSE\left(1+\frac{1}{n}+\frac{(x^{*}-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)}
\]
* The symbols in the formula have the following meanings:
    1. $x^{*}=27$, the month number we want to predict sales figures for
    2. $\hat{y}_{x^{*}} = \hat{y}_{27}$, the predicted sales for month 27
    3. $\alpha = 1-\frac{90}{100}=0.1$, the significance level for the 90% Prediction Interval
    4. $n=24$, the number of actual data values we have
    5. $n-2=22$, the number of __degrees of freedom (df)__ used
    6. $t_{\frac{\alpha}{2},n-2}$, the critical $t$-value given these parameters
    7. $MSE$, the mean square error of the model
    8. $\bar{x}$, the mean month number, in this case $\bar{x}=\frac{1}{24}\sum_{i=1}^{24}i = \frac{1+2+\ldots+24}{24}=12.5$
* To find the critical value $t_{\frac{\alpha}{2},n-2}=t_{0.05,22}$, we use the __qt()__ function. __qt(0.05,df=22)__ gives the point with 5% of the $t$-distribution below it, which is negative, so we take its absolute value:
```{r}
t_star = abs(qt(0.05,df=22)) # df = number of months - 2
```
* $x^*$, the month we are forecasting:
```{r}
x_star=27
```

* $\hat{y}_{x^{*}}$, the forecast for month 27:

```{r}
y_star=Forecast1(x_star)
y_star
```
* $MSE$, the value __MSE1__ found in Exercise 1:
```{r}
MSE1
```

* $\bar{x}$, the mean month number:

```{r}
x_bar=mean(Time)
```

* $\sum_{i=1}^{n}(x_i-\bar{x})^2$:
```{r}
Sum1=sum((Time-x_bar)^2)
```

### Upper bound of the prediction interval

```{r}
y_star+t_star*sqrt(MSE1*(1+1/24+(x_star-x_bar)^2/Sum1))
```
### Lower bound of the prediction interval

```{r}
y_star-t_star*sqrt(MSE1*(1+1/24+(x_star-x_bar)^2/Sum1))
```

## The 90% Prediction Interval
We are 90% confident that sales in the 27th month will be between about 358 and 474 units.
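As a cross-check, R can compute a prediction interval directly with __predict()__ on a fitted __lm__ object, using __interval = "prediction"__ and __level = 0.90__. Two differences from the hand calculation are worth noting: __predict()__ uses the exact least-squares coefficients rather than the rounded 198.03 and 8.07, and it estimates the error variance as $\sum_{t=1}^{n}(y_t-\hat{y}_t)^2/(n-2)$ rather than dividing by $n$ as our $MSE$ does. Its bounds will therefore differ slightly from the ones above. This is a sketch; __fit__ and __pi_27__ are our own names.

```{r}
# Example 1 data, repeated so this chunk runs on its own
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq_along(MS)

fit <- lm(MS ~ Time)

# 90% prediction interval at month 27; newdata must use the
# same variable name (Time) as the model formula
pi_27 <- predict(fit, newdata = data.frame(Time = 27),
                 interval = "prediction", level = 0.90)
pi_27   # columns: fit (point forecast), lwr, upr
```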



# Exercise 2
Find the 90% prediction interval for sales in the 27th month.

# Exercise 3

The closing values of Apple Inc. (AAPL) Stock on the NASDAQ Stock Exchange from 8 August 2017 to 8 November 2017 are given in the data file __AppleQuotes(3M).csv__, available on Moodle.
(Available at http://www.nasdaq.com)


Using this data set answer the following:

1. Import the data in this file using the following and call the data structure __AAPL__
```{r}
AAPL<-read.csv('AppleQuotes(3M).csv')
```
2. Create __two__ data vectors from this file, one for the __closing value__ and one for the __day__

```{r}

```
3. Create a time series plot for this data.

4. From this data plot, determine if there is a trend in the closing value of Apple stock over the past 3 months.

5. Use the function __lm(Closing ~ Day)__, where __Closing__ and __Day__ are the vectors created in step 2, to create a linear regression model for this data.
```{r}
lm(Closing~Day)
```
6. Create a linear model to forecast this data.

7. Create an __R__ function to represent this model.

8. Find the MAD and MSE of this model

9. Find the 95% prediction interval for the closing price of Apple stock 10 days from now.


# Exercise 4

The closing values of Google Inc. (GOOGL) Stock on the NASDAQ Stock Exchange from 8 November 2016 to 8 November 2017 are given in the data file __GoogleQuotes(1Y).csv__, available on Moodle.
(Available at http://www.nasdaq.com).


Using this data set answer the following:

1. Import the data in this file using the following and call the data structure __GOOGL__
```{r}
GOOGL<-read.csv('GoogleQuotes(1Y).csv')
```
2. Create __two__ data vectors from this file, one for the __trading value__ and one for the __day__

3. Create a time series plot for this data.

4. From this data plot, determine if there is a trend in the closing value of Google stock over the past year.

5. Use the function __lm(Trading Value ~ Day)__ to create a linear regression model for this data.
```{r}

```
6. Create a linear model to forecast this data.

7. Create an __R__ function to represent this model.

8. Find the __MAD__ and __MSE__ of this model

9. Find the 90% and 99% prediction intervals for the closing value of Google stock 10 days from now.

# Exercise 5

The % growth in GDP of China, the UK, the US, Ireland, the EU, the OECD and the World for the years 1961-2016 is given in the data file __RegionalGDPGrowth(1961-2016).csv__. Import this data file into __R__ and answer the following questions. 
(Available at http://www.worldbank.org)


1. Create a data vector for the __Year__ and a separate vector for the GDP growth of each country or region in the data file.

2. Use the function __par(mfrow=c(A,B))__ to create a collection of time-series plots in __A=1__ row and __B=2__ columns for the GDP growth of China and the US.
 To illustrate how this function works, the time-series plot from __Example 1__ is drawn twice, in __1__ row and __2__ columns:
```{r}
par(mfrow=c(1,2))  # split the plotting area into 1 row and 2 columns
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
```
3. Use the function __par(mfrow=c(A,B))__ to create a collection of time-series plots in __A=3__ rows and __B=1__ column for the GDP growth of China, the US and the UK.

 
4. Use the function __par(mfrow=c(A,B))__ to create a collection of time-series plots in __A=2__ rows and __B=2__ columns for the GDP growth of the EU, the US, the OECD and the World.

5. Is there any apparent trend in economic growth observable in these time series?

6. From the time-series plots, which region has shown the most consistent economic growth between 1961 and 2016?
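When you change the layout with __par()__, it returns the previous settings, so a grid such as __mfrow = c(1, 2)__ can be undone once the plots are finished; otherwise later plots in the same session keep using the grid. A sketch using the Example 1 data split into its two years (the name __old_par__ is our own choice):

```{r}
# Example 1 data, repeated so this chunk runs on its own
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)

# Switch to a 1 x 2 grid, keeping the old setting in old_par
old_par <- par(mfrow = c(1, 2))

plot(1:12, MS[1:12], type = "o", xlab = "Month",
     ylab = "Monthly Sales Figures", main = "Year 1")
plot(1:12, MS[13:24], type = "o", xlab = "Month",
     ylab = "Monthly Sales Figures", main = "Year 2")

# Restore the previous layout
par(old_par)
```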
