---
title: "Data Visualisation 2019 - Assignment 6"
output:
  html_notebook: default
  html_document:
    df_print: paged
  pdf_document: default
---



## List of R colors:
http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf


# Time Series

* The basic idea behind time series is that we use the past behavior of a variable to predict its future values.

* Time series arise in a vast variety of circumstances, including but not limited to:
    
    * Daily closing stock prices
    * Daily relative values of currencies
    * Monthly unemployment rates in a country
    * Quarterly public debt levels in a country
    * Weekly viewership figures of a TV series
    * Average annual CO2 emissions
    * Quarterly sales figures of a retailer
    * Monthly production figures of a factory
    * Annual % growth of GDP of an economy
    * etc. etc. etc.
    
## Example 1 

To gain a better understanding of how time series operate, we will examine the monthly sales figures for a mobile phone __Model M__ sold by a particular retailer. The monthly sales figures over a 24-month period are given in the table below:

|  Month  | Monthly Sales Figures (Year 1) | Monthly Sales Figures (Year 2)  |
|---------|--------------------------------|---------------------------------|
|   Jan   |               197              |             296                 |
|   Feb   |               211              |             276                 |
|   Mar   |               203              |             305                 |
|   Apr   |               247              |             308                 |
|   May   |               239              |             356                 |
|   Jun   |               269              |             393                 |
|   Jul   |               308              |             363                 |
|   Aug   |               262              |             386                 |
|   Sep   |               258              |             443                 |
|   Oct   |               256              |             308                 |
|   Nov   |               261              |             358                 |
|   Dec   |               288              |             384                 |

* To begin the time series analysis of this data, we create a data vector __MS__ (Monthly Sales) to contain the sales data:
```{r}
MS<-c(197,211,203,247,239,269,308,262,258,256,261,288,296,276,305,308,356,393,363,386,443,308,358,384)
MS
length(MS)
```
* Next we need to create a _time vector_ against which these sales figures are plotted:
    * We need a vector with the same length as __MS__, i.e. with 24 entries, starting at 1 and increasing in increments of 1.
    * We can automate a lot of this for future examples using the __length()__ and __seq()__ functions.
```{r}
Time <- seq(1,length(MS),1)
Time
length(Time)
```
* This creates a sequence of values starting at 1, ending at __length(MS)__ and increasing with a step size of 1.

* __Time__ and __MS__ now both have 24 entries, so we can plot them on the same graph:
```{r}
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
```
* The function __lines()__ indicates that a line should be drawn between each of the data points of the time series. 
* Recall that the argument __pch__ appearing in __plot()__ selects the type of marker used to mark the data points. Its integer values 0 to 25 select the built-in plotting symbols.
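
As a quick reference for the marker styles, the following sketch draws each built-in plotting symbol above its __pch__ code. The layout choices (colours, axis limits, label positions) and the name `pch_codes` are illustrative assumptions, not part of the example above.

```{r}
# Illustrative sketch: show the built-in plotting symbols, pch = 0 to 25
pch_codes <- 0:25
plot(pch_codes, rep(1, length(pch_codes)),
     pch = pch_codes, col = "red", bg = "yellow",
     ylim = c(0.5, 1.5), yaxt = "n",
     xlab = "pch value", ylab = "",
     main = "Built-in plotting symbols")
# Print each code underneath its symbol
text(pch_codes, rep(0.8, length(pch_codes)), labels = pch_codes, cex = 0.7)
```

Symbols 21 to 25 can also be filled, with the fill colour taken from the __bg__ argument.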

## Forecasting
* Recall from lectures that we used a __linear regression model__ to model the data in this time series. This model was given by
\[
\hat{y}_t=198.03+8.07t
\]
where $t$ is the month number.

* We will now create our own __R function__ corresponding to this, which we are going to call __Forecast1__

```{r}
Forecast1 <- function(t){
  198.03+8.07*t
}
```
* The values predicted by this model at each of the months in __Time__ are now given by
```{r}
Forecast1(Time)
```
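
The coefficients quoted from lectures can be checked against the data. This is a minimal sketch, assuming the model was fitted by ordinary least squares with __lm()__; the object name `fit` is introduced here for illustration only.

```{r}
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq_along(MS)          # same values as seq(1, length(MS), 1)

fit <- lm(MS ~ Time)           # least-squares line through the sales data
round(coef(fit), 2)            # intercept and slope, rounded to 2 decimals

# Overlay the fitted line on the time series plot
plot(Time, MS, pch = 15, col = "red", ylab = "Monthly Sales Figures",
     xlab = "Month", main = "Monthly Sales Figures of Phone Model M")
lines(Time, MS)
abline(fit, col = "blue", lty = 2)
```

The rounded intercept and slope should agree with the values 198.03 and 8.07 used in __Forecast1__.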
## Mean Absolute Deviation (MAD) & Mean Square Error (MSE)
* Recall that the __Mean Absolute Deviation__ (__MAD__) of a model is given by

\[
MAD= \frac{1}{n}\sum_{t=1}^{n}\vert y_{t}-\hat{y}_t\vert
\]
    
  * $y_t$ denotes the actual value of the variable $y$ at time $t$
  * $\hat{y}_t$ denotes the predicted value of the variable $y$ at time $t$
  * $n$ is the number of observations we have, i.e. the number of actual values $y_t$.
  * __R__ will calculate this for us automatically as follows:
```{r}
MAD1<-mean(abs(MS-Forecast1(Time)))
MAD1
```
* The code and the formula correspond as follows

    1. __MS__ $\leftrightarrow y_t$, 
    
    2. __Forecast1(Time)__ $\leftrightarrow \hat{y}_t$
    
    3. __abs(MS-Forecast1(Time))__ $\leftrightarrow \vert y_t-\hat{y}_t\vert$
    
    4. __mean(abs(MS-Forecast1(Time)))__ $\leftrightarrow \frac{1}{n}\sum_{t=1}^{n}\vert y_t-\hat{y}_t\vert$

* Recall also that the __Mean Square Error__  (__MSE__) of a model is given by
\[
MSE=\frac{1}{n}\sum_{t=1}^{n}\vert y_t-\hat{y}_t\vert^2
\]

## Exercise 1

Modify this code block to find the __MSE__ of the model. 

```{r}
MSE1<-mean(abs(MS-Forecast1(Time))^2)
MSE1
```
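
Both error measures can be wrapped in one small helper so that they are easy to reuse for other models. This is a sketch only; the function name `forecast_errors` and the toy numbers are made up for illustration.

```{r}
# Hypothetical helper: returns MAD and MSE for actual vs. predicted values
forecast_errors <- function(actual, predicted) {
  residuals <- actual - predicted
  c(MAD = mean(abs(residuals)), MSE = mean(residuals^2))
}

# Toy check with errors -1, 0 and 2:
# MAD = (1 + 0 + 2)/3 = 1 and MSE = (1 + 0 + 4)/3 = 5/3
toy_errors <- forecast_errors(actual = c(10, 12, 15), predicted = c(11, 12, 13))
toy_errors
```

For the model above, `forecast_errors(MS, Forecast1(Time))` would return the same values as __MAD1__ and __MSE1__.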


# Prediction Intervals
* We can also use the model to forecast future sales values of phone model M.

* To find the __90% Prediction Interval__ of the model at month 27, say, we can use the following formula for this interval
\[
  \hat{y}_{x^{*}}\pm t_{\frac{\alpha}{2},n-2}\sqrt{MSE\left(1+\frac{1}{n}+\frac{(x^{*}-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)}
\]
* The symbols in the formula have the following meaning
    1. $x^{*}=27$, the month number we want to predict sales figures for
    2. $\hat{y}_{x^{*}} = \hat{y}_{27}$, the predicted sales for month 27
    3. $\alpha = 1-\frac{90}{100}=0.1$, the confidence parameter for the 90% Prediction Interval
    4. $n=24$, the number of actual data values we have
    5. $n-2$, the number of __degrees of freedom (df)__ used
    6. $t_{\frac{\alpha}{2},n-2}$, the critical $t$-value given these parameters
    7. $MSE$, the mean square error of the model
    8. $\bar{x}$, the mean month number, in this case $\bar{x}=\frac{1}{24}\sum_{i=1}^{24}i = \frac{1+2+\ldots+24}{24}$
    9. $\sum_{i=1}^{n}(x_i-\bar{x})^2$, the sum of squared deviations of the month numbers from their mean

* To find the critical value $t_{\frac{\alpha}{2},n-2}$ we use the function __qt()__:
```{r}
t_star <- abs(qt(0.05, df = 22)) # alpha/2 = 0.05, df = number of months - 2
```
* The month number $x^*$ we want to predict for:
```{r}
x_star <- 27
```

* The predicted value $\hat{y}_{x^{*}}$:

```{r}
y_star <- Forecast1(x_star)
y_star
```
* The mean square error $MSE$ of the model:
```{r}
MSE1
```

* The mean month number $\bar{x}$:

```{r}
x_bar <- mean(Time)
```

* The sum of squares $\sum_{i=1}^{n}(x_i-\bar{x})^2$:
```{r}
Sum1 <- sum((Time - x_bar)^2)
```

### Upper boundary of the prediction interval

```{r}
y_star + t_star * sqrt(MSE1 * (1 + 1/24 + (x_star - x_bar)^2 / Sum1))
```
### Lower boundary of the prediction interval

```{r}
y_star - t_star * sqrt(MSE1 * (1 + 1/24 + (x_star - x_bar)^2 / Sum1))
```

## The 90% Prediction Interval
We are 90% confident that sales in the 27th month will be between approximately 358 and 474 units.
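
The individual steps above can be collected into a single function so that other months or confidence levels are easy to try. The following is a sketch under stated assumptions: the name `prediction_interval` and its arguments are hypothetical, it uses the squared distance $(x^{*}-\bar{x})^2$ in the last term under the square root, and it follows this notebook's definition of the MSE (dividing by $n$).

```{r}
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
Time <- seq(1, length(MS), 1)
Forecast1 <- function(t) 198.03 + 8.07 * t

# Hypothetical helper: prediction interval for a linear forecast at x_star
prediction_interval <- function(x_star, level, x, y, forecast) {
  n      <- length(y)
  alpha  <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = n - 2)   # critical t-value
  mse    <- mean((y - forecast(x))^2)       # MSE as defined in this notebook
  x_bar  <- mean(x)
  half_width <- t_crit * sqrt(mse * (1 + 1 / n +
                  (x_star - x_bar)^2 / sum((x - x_bar)^2)))
  centre <- forecast(x_star)
  c(lower = centre - half_width, fit = centre, upper = centre + half_width)
}

pi_90 <- prediction_interval(27, 0.90, Time, MS, Forecast1)
pi_99 <- prediction_interval(27, 0.99, Time, MS, Forecast1)
pi_90
pi_99
```

A higher confidence level gives a wider interval, so the 99% interval contains the 90% one.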



# Exercise 2
Find the 90% prediction interval for sales in the 27th month.

# Exercise 3

The closing values of Apple Inc. (AAPL) Stock on the NASDAQ Stock Exchange from 8 August 2017 to 8 November 2017 are given in the data file __AppleQuotes(3M).csv__, available on Moodle.
(Available at http://www.nasdaq.com)


Using this data set answer the following:

1. Import the data in this file using the following and call the data structure __AAPL__
```{r}
AAPL<-read.csv('AppleQuotes(3M).csv')
```
2. Create __two__ data vectors from this file, one for the __closing value__ and one for the __day__

```{r}

```
3. Create a time series plot for this data.

4. From this data plot, determine if there is a trend in the closing value of Apple stock over the past 3 months.

5. Use the function __lm(Closing Value ~ Day)__ to create a linear regression model for this data.
```{r}
lm(Closing~Day)
```
6. Create a linear model to forecast this data.

7. Create an __R__ function to represent this model.

8. Find the MAD and MSE of this model

9. Find the 95% prediction interval for the closing price of Apple Stock 10 days from now.
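
As a hint for steps 5 to 7, the sketch below shows how the coefficients returned by __lm()__ can be turned into a forecasting function. The numbers are made-up toy values, not Apple quotes, and the names `toy`, `toy_fit`, `toy_coef` and `ToyForecast` are hypothetical.

```{r}
# Toy data only (not real stock prices)
toy <- data.frame(Day = 1:6, Closing = c(101, 104, 105, 108, 111, 112))

# Extract the columns into separate vectors with $
toy_day     <- toy$Day
toy_closing <- toy$Closing

toy_fit  <- lm(toy_closing ~ toy_day)   # linear regression of value on day
toy_coef <- unname(coef(toy_fit))       # c(intercept, slope)

# Forecasting function built from the fitted coefficients
ToyForecast <- function(t) {
  toy_coef[1] + toy_coef[2] * t
}
ToyForecast(7)   # forecast one day past the end of the toy data
```

The column names in the real data file may differ from `Day` and `Closing`; `names(AAPL)` lists the names actually present.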


# Exercise 4

The closing values of Google Inc. (GOOGL) Stock on the NASDAQ Stock Exchange from 8 November 2016 to 8 November 2017 are given in the data file __GoogleQuotes(1Y).csv__, available on Moodle.
(Available at http://www.nasdaq.com).


Using this data set answer the following:

1. Import the data in this file using the following and call the data structure __GOOGL__
```{r}
GOOGL<-read.csv('GoogleQuotes(1Y).csv')
```
2. Create __two__ data vectors from this file, one for the __trading value__ and one for the __day__

3. Create a time series plot for this data.

4. From this data plot, determine if there is a trend in the closing value of Google stock over the past year.

5. Use the function __lm(Trading Value ~ Day)__ to create a linear regression model for this data.
```{r}

```
6. Create a linear model to forecast this data.

7. Create an __R__ function to represent this model.

8. Find the __MAD__ and __MSE__ of this model

9. Find the 90% and 99% prediction intervals for the trading volume of Google Stock 10 days from now.
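
Only the critical $t$-value changes between prediction intervals with different confidence levels. A small sketch, assuming $n=24$ observations as in Example 1 (for the stock data, the degrees of freedom would instead be the number of trading days minus 2); the names `conf_levels`, `n_obs` and `t_crit` are illustrative.

```{r}
conf_levels <- c(0.90, 0.95, 0.99)      # confidence levels
alpha  <- 1 - conf_levels
n_obs  <- 24                            # assumption: sample size from Example 1
t_crit <- qt(1 - alpha / 2, df = n_obs - 2)
data.frame(level = conf_levels, alpha = alpha, t_crit = round(t_crit, 3))
```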

# Exercise 5

The % growth in GDP of China, the UK, the US, Ireland, the EU, the OECD and the World for the years 1961-2016 is given in the data file __RegionalGDPGrowth(1961-2016).csv__. Import this data file into __R__ and answer the following questions.
(Available at http://www.worldbank.org)


1. Create a data vector for the __Year__ and a separate vector for the GDP growth of each country in the data file.

2. Use the function __par(mfrow=c(A,B))__ to create a collection of time-series plots in __A=1__ row and __B=2__ columns for the GDP growth of China and the US.
 To illustrate how this function works, the time-series plot from __Example 1__ is plotted twice in __1__ row and __2__ columns:
```{r}
par(mfrow=c(1,2))
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
plot(Time,MS,pch=15,col="red",ylab="Monthly Sales Figures", xlab="Month",main="Monthly Sales Figures of Phone Model M")
lines(Time,MS)
```
3. Use the function __par(mfrow=c(A,B))__ to create a collection of time-series plots in __A=3__ rows and __B=1__ column for the GDP growth of China, the US and the UK.

 
4. Use the function __par(mfrow=c(A,B))__ to create a collection of time-series plots in __A=2__ rows and __B=2__ columns for the GDP growth of the EU, the US, the OECD and the World.

5. Is there any apparent trend in economic growth observable from these time series?

6. From the time-series plots, which region has shown the most consistent economic growth between 1961 and 2016?
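
When several panels are drawn, it can be convenient to restore the previous layout afterwards so that later plots are not affected. A minimal sketch using the sales data from __Example 1__, split into its two years; the names `old_par`, `year1`, `year2` and `months` are illustrative.

```{r}
MS <- c(197,211,203,247,239,269,308,262,258,256,261,288,
        296,276,305,308,356,393,363,386,443,308,358,384)
year1  <- MS[1:12]
year2  <- MS[13:24]
months <- 1:12

old_par <- par(mfrow = c(2, 1))   # 2 rows, 1 column; par() returns the old setting
plot(months, year1, type = "b", pch = 15, col = "red",
     xlab = "Month", ylab = "Monthly Sales Figures", main = "Year 1")
plot(months, year2, type = "b", pch = 15, col = "blue",
     xlab = "Month", ylab = "Monthly Sales Figures", main = "Year 2")
par(old_par)                      # restore the previous layout
```

Here `type = "b"` draws both the points and the connecting lines in a single call, so a separate call to __lines()__ is not needed.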
