Assignment1-Rpubs-.knit

Introduction
Loading Require Packages and Data File
Descriptive analysis of Time Series Data
Modelling (Model Fitting)
4.1. Linear Trend Model in Time
4.2. Quadratic Trend Model in Time
4.3. Seasonal or Cyclical Trend Model in Time
4.4. Cosine or Harmonic Trend Model in Time
Residual Analysis and Diagnosis
5.1. Residual Analysis and Diagnosis of Linear Trend Model
5.2. Residual Analysis and Diagnosis of Quadratic Trend Model
5.3. Residual Analysis and Diagnosis of Seasonal or Cyclical Trend Model
5.4. Residual Analysis and Diagnosis of Cosine or Harmonic Trend Model
Choosing the Best Fit Model
Forecasting Using Best Fit Model
Proposed Solution
Conclusion
References

1. Introduction

In this report, we will analyse dataset containing the return (in AUD 100,000) of a share market trader’s investment portfolio based on 127 observations out of possible 252 trading days in year. We will assume all the data are collected in the same year and consecutives days, thus, will disregard weekend day. As Friday and Monday will be taken as consecutive days, there will not be any adjustments for weekend days.
- From time series plot, we will look for Trend, Seasonality, Changing Variance, Behaviours and Change Point.
- Afterwards, will find the best fitting model among Linear, Quadratic, Seasonal/ Cyclical and Cosine/ Harmonic by implementing models and using residual analysis.
- Lastly, based on the best model, prediction of next 15 trading days will be made.

2. Setting Up Packages and Loading Data File

Package TSA will be loaded using code as it’s useful for time series analysis and working directory is set as well as data will be loaded as below.

# Set working directory not to repeat the path to the folder while loading data
setwd("~/Desktop/2nd Year 1st Sem/Time Series/Assignment 1")
return <- read.csv("assignment1Data2023.csv")
head(return)

##   X        x
## 1 1 150.8031
## 2 2 152.2122
## 3 3 151.4941
## 4 4 150.6815
## 5 5 151.0155
## 6 6 151.0590

As the current data frame of our dataset doesn’t have column names in R, we will rename those columns as follows.

colnames(return) <- c("Day","Value")
head(return)

##   Day    Value
## 1   1 150.8031
## 2   2 152.2122
## 3   3 151.4941
## 4   4 150.6815
## 5   5 151.0155
## 6   6 151.0590

class(return)

## [1] "data.frame"

3. Descriptive analysis of Time Series Data

Time Series plot of the dataset will be looked at first. To work with time series analysis packages, we need to convert the data type of our dataset to time series. Thus, we will create a time series object named as returnts as below. The start of the series is 1, end of the series is 127 and, the frequency of series is 1 as it is daily dataset. The start and end of the series is known by taking a look at dataset. The observation starts from day 1 and end in day 127.

# Convert to the TS object
returnts <- ts(as.vector(return$Value), start=1, end = 127)
returnts

## Time Series:
## Start = 1 
## End = 127 
## Frequency = 1 
##   [1] 150.80305807 152.21215799 151.49405248 150.68153801 151.01548379
##   [6] 151.05896873 152.24529300 154.32836585 154.99211320 151.47319862
##  [11] 148.65887120 149.42475661 149.76723188 152.79419244 156.32484804
##  [16] 155.47711985 148.64751873 144.28974550 145.15092082 145.12823085
##  [21] 150.98001094 155.18967939 152.31702307 141.41912041 135.88269351
##  [26] 137.35092521 136.38396383 145.79343214 151.98285080 146.45194985
##  [31] 130.45934365 122.22024827 123.96194926 123.76340190 137.00970421
##  [36] 143.94322947 135.92688865 117.26486197 108.57562583 109.74256049
##  [41] 110.60954367 129.01589743 135.49843374 124.31976202 104.65195279
##  [46]  95.26900949  94.29487663  94.37477787 117.53027695 123.77761512
##  [51] 109.88510644  88.11615162  78.25287150  76.08756434  75.57225471
##  [56] 102.50237665 108.08497244  92.67132160  69.40893637  58.84454029
##  [61]  56.13468662  56.91251923  87.40934899  93.06782601  76.51780460
##  [66]  52.35868054  41.32878155  38.34250451  40.69185825  74.43156392
##  [71]  81.05790547  63.98375651  38.45748121  27.31432454  24.78323409
##  [76]  29.52859185  64.81749992  71.96911598  54.21016437  26.14338285
##  [81]  14.02972048  10.95491723  17.52416632  52.46378082  59.97072454
##  [86]  43.17610325  12.84834639   0.01752939  -4.03357918   3.83155944
##  [91]  38.65303227  47.93226560  32.04728507   0.78978042 -12.15661705
##  [96] -16.54803490  -6.29142400  28.71521115  39.03123618  22.97984606
## [101] -10.09253006 -23.88937164 -28.87604996 -15.81700422  21.48876386
## [106]  33.26822703  17.48928106 -16.72770955 -32.43157779 -38.77786096
## [111] -21.00997026  18.81868817  32.96428210  16.06261533 -19.34561245
## [116] -36.37031978 -44.01729753 -20.96522767  21.14554925  37.57495177
## [121]  19.41408315 -16.29602394 -35.55475699 -44.62194939 -17.30734816
## [126]  25.25656020  43.71448310

summary(returnts)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -44.62   21.32   71.97   71.35  135.90  156.32

According to summary statistics below, minimum is -44.62, 1st Quantile is 21.32, Median is 71.97, Mean is 71.35, 3rd Quantile is 135.90 and Max is 156.32 (in 100,000 AUD).

#Time Series Plot
plot(returnts,
     type = "o",
     xlab = "Time",
     ylab = "Investment Return (in AUD 100,000)",
     main = " Time series plot of share market trader's investment return (127 days)",
     cex.main = 0.9
     )

Now, we will focus on five bullet points for investment return series as follows. From the above graph, the following five behaviour are observed.

Trend: An overall downward trend is observed with small changing local trends. (The pink line in above Figure)
Seasonality: Repeating pattern is obviously noticed. (The navy line in in above Figure)
Changing Variance: Fluctuations are getting larger over the time. So, changing variance is present is this time series. (The green line in above Figure).
Behaviours (Moving Average/ Autoregressive): Succeeding time points are not spotted, so are is no autoregressive behaviours. However, moving average is present, points are going up and down smoothly. (The orange line in above Figure.
Intervention/ Change Point: There is no sudden dip or peak point changes in the series, there’s no intervention or change point.

Afterwards, we will find out impact of yesterday’s value on today’s value. For this, zlag() function which generates first and subsequent lags of time series will be used. We create variable y with our time series and variable x with first lag of series.

#Finding correlation from lags
y = returnts
head(y)

## [1] 150.8031 152.2122 151.4941 150.6815 151.0155 151.0590

# Generate first lag of investment return series
x = zlag(returnts)
head(x)

## [1]       NA 150.8031 152.2122 151.4941 150.6815 151.0155

In first lag, the first value is denoted as NA because as it is first lag and values of x are being push to right by one value. Thus, we create variable name index to get rid of that NA value in x afterwards find correlation.

#Create an index to get rid of first NA value in x
index = 2:length(x)
#Calculate correlation between numerical values in x and y
cor(y[index],x[index])

## [1] 0.9635593

plot(y[index],x[index], 
     ylab = "Investment Return Series", 
     xlab = "First lag of Investment Return series", 
     main = "Scatter plot of Investment Return Series and its first lag",
     cex.main = 0.9)

Correlation between original series and first lag is 0.96 meaning yesterday has impact on today. As you can see from Figure 1, the plot is going up and down smoothly rather than going down and up suddenly, thus, autocorrelation is expected as shown in Figure 2. If moving average behaviour is present, we can expect the significant correlation in lags.

4. Modelling (Model Fitting)

We aim to find best-fitting model by trying out and analysing various model. Our series tend to have deterministic trend so, we will model this trend with 4 trend models:
1. Linear Trend
2. Quadratic Trend
3. Seasonal or Cyclical Trend
4. Cosine or Harmonic Trend
From those models, we know that seasonality can’t be captured with linear and quadratic trend model. With frequency of 1, the model assumes there is no seasonality and hence it will fail to capture seasonality. Thus, we will use frequency of greater than 1 in seasonal and cosine trend model with the purpose of capturing seasonality. We will find frequency of time series by constructing ACF Plot as below and the plot is shown in following figure.

#Finding frequency from ACF Plot
acf(returnts,lag.max = 50, main="ACF Plot of Investment Return Series Over 50 lag")

From the Figure, we can see that the pattern is repeating every 6 or 7 lags. By trying out both of those frequencies, frequency of 7 has better statistics and diagnosis than frequency of 6. Thus, we will choose frequency of 7 for seasonal and cosine model.
We will create new time series object seasonal_returnts with frequency of 7 and its’ time series plot as below and the plot is shown in above Figure.

#Creating new time series with frequency of 7 for seasonal and cosine trend model
seasonal_retrunts <- ts(data = as.vector(return$Value), start = 1, frequency = 7)
seasonal_retrunts

## Time Series:
## Start = c(1, 1) 
## End = c(19, 1) 
## Frequency = 7 
##   [1] 150.80305807 152.21215799 151.49405248 150.68153801 151.01548379
##   [6] 151.05896873 152.24529300 154.32836585 154.99211320 151.47319862
##  [11] 148.65887120 149.42475661 149.76723188 152.79419244 156.32484804
##  [16] 155.47711985 148.64751873 144.28974550 145.15092082 145.12823085
##  [21] 150.98001094 155.18967939 152.31702307 141.41912041 135.88269351
##  [26] 137.35092521 136.38396383 145.79343214 151.98285080 146.45194985
##  [31] 130.45934365 122.22024827 123.96194926 123.76340190 137.00970421
##  [36] 143.94322947 135.92688865 117.26486197 108.57562583 109.74256049
##  [41] 110.60954367 129.01589743 135.49843374 124.31976202 104.65195279
##  [46]  95.26900949  94.29487663  94.37477787 117.53027695 123.77761512
##  [51] 109.88510644  88.11615162  78.25287150  76.08756434  75.57225471
##  [56] 102.50237665 108.08497244  92.67132160  69.40893637  58.84454029
##  [61]  56.13468662  56.91251923  87.40934899  93.06782601  76.51780460
##  [66]  52.35868054  41.32878155  38.34250451  40.69185825  74.43156392
##  [71]  81.05790547  63.98375651  38.45748121  27.31432454  24.78323409
##  [76]  29.52859185  64.81749992  71.96911598  54.21016437  26.14338285
##  [81]  14.02972048  10.95491723  17.52416632  52.46378082  59.97072454
##  [86]  43.17610325  12.84834639   0.01752939  -4.03357918   3.83155944
##  [91]  38.65303227  47.93226560  32.04728507   0.78978042 -12.15661705
##  [96] -16.54803490  -6.29142400  28.71521115  39.03123618  22.97984606
## [101] -10.09253006 -23.88937164 -28.87604996 -15.81700422  21.48876386
## [106]  33.26822703  17.48928106 -16.72770955 -32.43157779 -38.77786096
## [111] -21.00997026  18.81868817  32.96428210  16.06261533 -19.34561245
## [116] -36.37031978 -44.01729753 -20.96522767  21.14554925  37.57495177
## [121]  19.41408315 -16.29602394 -35.55475699 -44.62194939 -17.30734816
## [126]  25.25656020  43.71448310

#Time Series Plot
plot(seasonal_retrunts,
     type = "o",
     xlab = "Time",
     ylab = "Investment Return (in AUD 100,000)",
     main = " Time series plot of share market trader's investment return (127 days)",
     cex.main = 0.9)

Time series object named returnts which is created with frequency of 1 will be used for linear and quadratic trend model.
Time series object named seasonal_returnts which is created with frequency of 7 will be used for linear and quadratic trend model.

4.1. Linear Trend Model in Time

The deterministic linear tend model is $\mu_t = \beta_0 + \beta_1 t$ where $\beta_0$ is intercept and $\beta_1$ is the slope of the linear trends. In terms of time series data $Y = \mu_t + e_t$ where $mu_t$ is the trend component that we want to model or remove and $e_t$ contains leftover after capturing trend.
We will fit linear trend line to time series data. In order to do that, we will extract time from time series object using time() function and afterwards use lm() function which fit linear model using our time series returnts and time t into model1.

#First model : Linear Trend model in time
#Time Series Plot
t <- time(returnts) 
t

## Time Series:
## Start = 1 
## End = 127 
## Frequency = 1 
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127

model1 <- lm(returnts ~ t)  #Label the model

The linear trend model of time series is plotted as below and plot is shown as below.

plot(returnts,
     ylab = "Investment Return (in AUD 100,000)",
     xlab='Time',
     type='o', 
     main = "Linear trend line to share market trader's investment return series",
     cex.main = 0.9)
abline(model1)

summary(model1)

## 
## Call:
## lm(formula = returnts ~ t)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.053 -17.959   2.056  15.095  72.799 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 173.38533    3.85513   44.98   <2e-16 ***
## t            -1.59425    0.05227  -30.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.59 on 125 degrees of freedom
## Multiple R-squared:  0.8816, Adjusted R-squared:  0.8806 
## F-statistic: 930.3 on 1 and 125 DF,  p-value: < 2.2e-16

From summary() functions, we get detailed statistics of the model1 as above.
We will perform hypothesis testing.
H0: $\beta_0 = 0$ H1: $\beta_0!=0$
H0: $\beta_1 = 0$ H1: $\beta_1!=0$
For intercept ($\beta_0$) and linear time component t ($\beta_1$), p-value is <2e-16***. As it’s less than 0.05, we will reject null hypothesis for $\beta_0$ and $\beta_1$ Thus, Linear trend coefficient is statistically significant.
The overall p-value 2.2e-16 is also less that 0.05, so we can reject null hypothesis. Reject null hypothesis implies that linear model fits the series and R-squared value 0.8806 (88%) variation in investment return series can be explained by Linear Trend Model. This R-squared value can be considered as ideal value.
Taking a look at residual, the range is a bit huge in the error and the median is 2.056 implies that we are expecting not very super symmetrical residuals. The best-case scenario of median should be near 0. However, we will deep dive more into residuals by doing more analysis in the later part of this report.

4.2 Quadratic Trend Model in Time

The deterministic linear tend model is $\hat{u}_t$= $\beta_0$+ $\beta_1$ t+ $\beta_0$ t^2 where $\beta_0$ is intercept, $\beta_1$ corresponds to quadratic trend in time and $\beta_2$ to quadratic trend in time.
We will fit linear trend line to time series data. In order to do that, we will extract time from time series object using time() function and afterwards use lm() function which fit quadratic model using our time series returnts and time t into model2. Here, we need t^2 since it’s a quadratic trend model.

#Second model : Quadratic Trend model in time
t = time(returnts) # Get time points
t2 = t^2 # Create t^2
#Label the model 
model2 = lm(returnts ~ t + t2)

The quadratic trend model of time series is plotted as below and plot shown in following Figure.

plot(
  ts(fitted(model2)),
  ylab = "Investment Return (in AUD 100,000)",
  xlab = "Time",
  main = "Fitted quadratic curve to share market trader's investment return series",
  cex.main = 0.9,
  ylim = c(min(c(fitted(model2), as.vector(returnts))) ,
           max(c(fitted(model2), as.vector(returnts)))
  ) 
)
lines(as.vector(returnts),type="o")

summary(model2)

## 
## Call:
## lm(formula = returnts ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.345 -18.387   0.941  15.584  68.478 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 177.915494   5.838752  30.471  < 2e-16 ***
## t            -1.804958   0.210586  -8.571 3.43e-14 ***
## t2            0.001646   0.001594   1.033    0.304    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.59 on 124 degrees of freedom
## Multiple R-squared:  0.8826, Adjusted R-squared:  0.8807 
## F-statistic: 465.9 on 2 and 124 DF,  p-value: < 2.2e-16

From summary() functions, we get detailed statistics of the model2 as above.
We will perform hypothesis testing.
H0: $\beta_0 = 0$ H1: $\beta_0!=0$
H0: $\beta_1 = 0$ H1: $\beta_1!=0$
H0: $\beta_2 = 0$ H1: $\beta_2!=0$
The p-value of intercept and t which are <2e-16*** and 3.43e-14*** respectively are less than 0.05. Thus, we will reject null hypothesis for intercept and linear trend coefficient. However, the p-value of t2 is greater than 0.05, meaning that we fail to reject null hypothesis. In conclusion, the linear trend coefficient is significant whereas quadratic trend coefficient isn’t.
The overall p-value 2.2e-16 is less that 0.05, so we can reject null hypothesis. Reject null hypothesis implies that quadratic model fits the series and R-squared value 0.8807 (88%) variation in investment return series can be explained by Quadratic Trend Model. This R-squared value can be considered as very ideal value.
Taking a look at residual, the range is a bit huge in the error and the median is 0.941 implies that we are expecting more symmetrical residuals than linear trend model. The best-case scenario of median should be near 0. However, we will deep dive more into residuals by doing more analysis in the later part of this report.

4.3. Seasonal or Cyclical Trend Model in Time

We will use season() function to generate seasonal pattern from time series and create a variable named pattern to store them. Then, we create model3 by using lm() which fit seasonal model using our time series seasonal_returnts and the variable season(), and -1 is included as a parameter to remove intercept term.

#Third Model : Seasonal Trend Model in Time
pattern=season(seasonal_retrunts)
model3=lm(seasonal_retrunts ~ pattern -1) # -1 removes the intercept term

Afterwards, I renamed coefficients in my model as Pattern 1, Pattern 2, …, Pattern 7.

names(model3$coefficients) <- c('Pattern 1', 'Pattern 2', 'Pattern 3', 'Pattern 4', 'Pattern 5', 'Pattern 6', 'Pattern 7')

Pattern 1 corresponds to t = 1, 8, 15, …
Pattern 2 corresponds to t = 2, 9, 16, …
…
Pattern 7 corresponds to t = 7, 14, 21, …
The seasonal trend model of time series is plotted as below and plot shown in following Figure.

plot(
  ts(fitted(model3)), 
  ylab = "Investment Return (in AUD 100,000)",
  xlab = "Time",
  main = "Fitted Seasonal model to share market trader's investment return series",
  cex.main = 0.9,
  ylim = c(min(c(fitted(model3), as.vector(seasonal_retrunts))),
           max(c(fitted(model3), as.vector(seasonal_retrunts)))), 
  col = "green" 
) 
lines(as.vector(seasonal_retrunts),type="o")

summary(model3)

## 
## Call:
## lm(formula = seasonal_retrunts ~ pattern - 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -96.865 -56.530  -2.747  58.007  98.773 
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## Pattern 1    95.81      14.17   6.761 5.27e-10 ***
## Pattern 2    87.23      14.56   5.991 2.24e-08 ***
## Pattern 3    65.06      14.56   4.469 1.80e-05 ***
## Pattern 4    54.72      14.56   3.758 0.000266 ***
## Pattern 5    52.24      14.56   3.588 0.000483 ***
## Pattern 6    58.54      14.56   4.021 0.000102 ***
## Pattern 7    84.50      14.56   5.804 5.39e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61.77 on 120 degrees of freedom
## Multiple R-squared:  0.5979, Adjusted R-squared:  0.5744 
## F-statistic: 25.49 on 7 and 120 DF,  p-value: < 2.2e-16

From summary() functions, we get detailed statistics of the model3 as above.
We can see that all the p-values for the model are less than 0.05, so we can reject null hypothesis and can conclude all the coefficients are significant.
The overall p-value 2.2e-16 is also less that 0.05, so we can reject null hypothesis. Reject null hypothesis implies that seasonal model fits the series and R-squared value 0.5744 (57%) variation in investment return series can be explained by Seasonal Trend Model. However, this R-squared value is not very ideal value but not also very bad value.
Taking a look at residual, the range is huge in the error and the median is -2.747 implies that we are expecting not very symmetrical residuals. The best-case scenario of median should be near 0. However, we will deep dive more into residuals by doing more analysis in the later part of this report.

4.4. Cosine or Harmonic Trend Model in Time

Cyclic mathematical function can also be used to create repeating patterns. One of the functions is cosine curve/ wave which is $\mu_t=\beta_0+\beta_1 cos(2 pi.f.t) + \beta_2 sine(2 pi.f.t)$ where f is the frequency of the curve. As t varies, the curve oscillates within [-$\beta,\beta$] interval.
The curve repeats itself exactly every 1/f time units (1/f is the period of cosine wave).
We will fit cosine curve at the frequency of 1 to investment return series. Then, we create model4 by using lm() which fit cosine model using our time series seasonal_returnts and the har.

#Fouth model : Cosine Trend model in time
har <- harmonic(seasonal_retrunts, 1)
#Label the model
model4 <- lm(seasonal_retrunts ~ har)

The cosine trend model of time series is plotted as below and plot shown in following Figure .

plot(
  ts(fitted(model4)), 
  ylab = "Investment Return (in AUD 100,000)",
  xlab = "Time",
  main = "Fitted cosine wave to share market trader's investment return series",
  ylim = c(min(c(fitted(model4), as.vector(seasonal_retrunts))),
           max(c(fitted(model4), as.vector(seasonal_retrunts)))), 
  col = "green" 
) 
lines(as.vector(seasonal_retrunts),type="o")

summary(model4)

## 
## Call:
## lm(formula = seasonal_retrunts ~ har)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -94.243 -56.084  -6.569  59.584 101.395 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      71.175      5.399  13.184  < 2e-16 ***
## harcos(2*pi*t)   22.608      7.605   2.973  0.00355 ** 
## harsin(2*pi*t)    2.731      7.665   0.356  0.72217    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60.84 on 124 degrees of freedom
## Multiple R-squared:  0.06742,    Adjusted R-squared:  0.05238 
## F-statistic: 4.482 on 2 and 124 DF,  p-value: 0.0132

From summary() functions, we get detailed statistics of the model4 as above.
We can see that sin coefficient and overall model is insignificant as their p-value are greater than 0.05. Also, R-squared value is very low and it’s not something we want to use for future prediction.
Taking a look at residual, the range is a very huge in the error and the median is -6.569 implies that we are expecting not symmetrical residuals in error. The best-case scenario of median should be near 0. We can ignore this model as we will not use this one for future analysis however, we will deep dive more into residuals to see how they are behaving.

5. Residual Analysis and Diagnosis

Residuals are the remaining values or observation which are not able to captured by the model. If the model is suitable and working, the residuals will behave roughly which will look like white noise. If not, pattern, seasonality, trend will be spotted in the residuals. The followings describe the behaviours we expect from plots of residual or standardised residuals and how to implement those plots. Model name needs modification for each model.
- In Time series plot, we want totally random pattern and standardised residuals to be limited between -4 and +4.
- In Histogram plot, we want symmetrical distribution and lies in between -4 and +4.
- In Quantile-Quantile (QQ) plot, we want all the dot stick to reference line as that straight-line pattern support assumption of normally distributed stochastic component.
- In Shapiro-Walk test, we want p-value to be greater than $\alpha=0.05$ in order not to reject null hypothesis which is H0: Data are normally distributed. If we don’t reject null hypothesis, it implies the normality of residuals.
- In Sample Autocorrelation Function (ACF), we want all the bars to be under horizontal dashed confidence limits.

5.1. Residual Analysis and Diagnosis of Linear Trend Model

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model1
## W = 0.97322, p-value = 0.01269

Following are the findings derived from the above analysis.
- Standardized Residuals Plot. The values are not completely random and there’s still a pattern. Seasonality, Changing Variance and Moving average is still present.
- Histogram Plot: Most of the standard residuals are between -1 and 1 and almost symmetrical but it seems to be slight right-skewed distribution.
- QQ Plot: Considerable number of values are lining on the reference line. However, a few points at the beginning and end are further away but it’s still acceptable result.
- ACF Plot: Most of the lags still have significant autocorrelation and they are all in the residuals meaning our model failed to capture most of the important information from time series. The wave pattern suggesting seasonality is still present in time series.
- PACF Plot: Same as ACF, there is still autocorrelation till lag 11. It means we fail to capture most important information from time series.
- Shapiro-wilk test: P-value 0.01269, it’s smaller than $$5. Therefore, we conclude to reject null hypothesis, H_0: Data are normally distributed. Hence, residuals are not normal. If other tests are showing residuals are normal, we can take $\alpha=0.01$ and make residuals normal. Since other tests are also showing other way, we will continue with 0.05.

5.2. Residual Analysis and Diagnosis of Quadratic Trend Model

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model2
## W = 0.97472, p-value = 0.01764

Following are the findings derived from the above analysis.
- Standardized Residuals Plot. The values are not completely random and there’s still a pattern. Seasonality, Changing Variance and Moving average is still present.
- Histogram Plot: Most of the standard residuals are between -1 and 1 and almost symmetrical.
- QQ Plot: Considerable number of values are lining on the reference line. However, a few points at the beginning and end are further away but it’s still acceptable result.
- ACF Plot: Most of the lags still have significant autocorrelation and they are all in the residuals meaning our model failed to capture most of the important information from time series. The wave pattern suggesting seasonality is still present in time series.
PACF Plot: Same as ACF, there is still autocorrelation till lag 11. It means we fail to capture most important information from time series.
Shapiro-wilk test: P-value 0.01764, it’s smaller than $\alpha=0.05$ Therefore, we conclude to reject null hypothesis, H_0: Data are normally distributed. Hence, residuals are not normal. If other tests are showing residuals are normal, we can take $\alpha=0.01$ and make residuals normal. Since other tests are also showing other way, we will continue with 0.05.

5.3. Residual Analysis and Diagnosis of Seasonal or Cyclic Trend Model

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model3
## W = 0.92612, p-value = 3.159e-06

Following are the findings derived from the above analysis.
- Standardized Residuals Plot. The values are not completely random. The downward trend, moving average and a bit seasonality at start and end is still present.
- Histogram Plot: Most of the standard residuals are between -1 and 1 and they are almost symmetrical.
- QQ Plot: Considerable number of values are lining on the reference line. However, nearly half of points at the beginning and end are further away.
- ACF Plot: All of the lags still have significant autocorrelation and they are all in the residuals meaning our model failed to capture important information from time series.
- PACF Plot: Same as ACF, there is still some autocorrelation.
- Shapiro-wilk test: P-value 3.159e-0.6, it’s smaller than $\alpha=0.05$ Therefore, we conclude to reject null hypothesis, H_0: Data are normally distributed. Hence, residuals are not normal.

5.4. Residual Analysis and Diagnosis of Cosine or Harmonic Trend Model

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model4
## W = 0.92615, p-value = 3.17e-06

Following are the findings derived from the above analysis.
- Standardized Residuals Plot. The values are not completely random. The downward trend, moving average and a bit seasonality at start and end is still present.
- Histogram Plot: Most of the standard residuals are between -1 and 1 and they are almost symmetrical.
- QQ Plot: Considerable number of values are lining on the reference line. However, nearly half of points at the beginning and end are further away.
- ACF Plot: All of the lags still have significant autocorrelation and they are all in the residuals meaning our model failed to capture important information from time series.
- PACF Plot: Same as ACF, there is still some autocorrelation.
- Shapiro-wilk test: P-value 3.17e-0.6, it’s smaller than $\alpha=0.05$. Therefore, we conclude to reject null hypothesis, H0: Data are normally distributed. Hence, residuals are not normal.

6. Choosing the Best Fit Model

Taking a look at all the models, all of them doesn’t seem to capture the return investment series very well. Yet, we have to choose one model in order to for further prediction. Based on the analysis and findings from all 4 models:
- Linear Trend model vs Quadratic Trend model: Both of those models are able to capture downward trend very well in the time series. But not moving average and seasonality, thus, the pattern is still presents in the residuals. The autocorrelation is still present as well. And also fail the Shapiro-Walk Test. Despite a bit difference in R-squared value and Histogram of residual, they are quite identical. Thus, if I have to choose 1 model among these two models, I would go with Quadratic Trend model as its’ R-squared value 0.8807 is a bit higher than Linear trend model’s R-squared 0.8807 and Histogram is more symmetric.
- Seasonal Trend model vs Harmonic Trend model: Both of those models are able to capture seasonality almost perfectly but they can’t seem to capture moving average and downward trends which are still present in the residuals. There isn’t much difference in residual analysis for both models. The autocorrelation is still present, downward trend is still there and fail the Shapiro-Walk test. Nevertheless, the R-squared value of Seasonal Trend (0.5744) is much more promising than Harmony Trend model’s R-squared value (0.05238). Thus, I will go with Seasonal Trend Model rather than Harmonic Trend Model.
- Quadratic Trend model vs Seasonal Trend Model: Between those two models, one of them (Quadratic Trend model) can’t capture seasonality and another model (Seasonal Trend Model) can’t capture the downward trend. Both model’s residual analysis is not ideal, however, R-square and ACF of Quadratic trend model is much better. So, I will choose Quadratic Trend Model as the best fit among all four models and proceed the forecasting with that model.

7. Forecasting Using Best Fit Model

Out of four deterministic trend models we fitted, after doing modelling and detailed diagnosis check, we have decided to choose quadratic trend model as the most effective among four since R-squared value and analysis proved more appropriate compared with others. Using this model, we will predict investment return of share market trader for next 15 trading days.
- Our original observation is till day 127 and we want to know for next 15 trading days. Thus, we create variable t and store trading day 128 to 142.
- As we are using quadratic trend model, we need t^2 as well and afterwards create data frame with t and t^2 in it. Then, we can proceed with prediction as follows. The results look as follow.

#Forecasting with best fit model
t = c(128:142)
t2 = t^2
future <- data.frame(t, t2)
forecast <- predict(model2 , future , interval = "prediction") 
print(forecast)

##          fit       lwr        upr
## 1  -26.14887 -70.41401 18.1162589
## 2  -27.53078 -71.89306 16.8315133
## 3  -28.90938 -73.37393 15.5551602
## 4  -30.28470 -74.85674 14.2873401
## 5  -31.65673 -76.34165 13.0281928
## 6  -33.02546 -77.82877 11.7778576
## 7  -34.39090 -79.31827 10.5364729
## 8  -35.75305 -80.81027  9.3041762
## 9  -37.11190 -82.30491  8.0811037
## 10 -38.46746 -83.80232  6.8673905
## 11 -39.81974 -85.30264  5.6631702
## 12 -41.16871 -86.80600  4.4685747
## 13 -42.51440 -88.31253  3.2837342
## 14 -43.85679 -89.82237  2.1087771
## 15 -45.19590 -91.33562  0.9438294

The forecast using Quadratic Trend Model is shown in following Figure.

plot(returnts,
     type='o',
     xlim =c(1,142),
     ylim=c(-90,150),
     ylab = "Investment Return (in AUD 100,000)",
     xlab = "Time",
     main="Forecasts using Quadratic Trend Model Fitted to Investment Return Series",
     cex.main=0.9)
lines(ts(as.vector(forecast[,1]), start = 128), col="red", type="l")
lines(ts(as.vector(forecast[,2]), start = 128), col="blue", type="l") 
lines(ts(as.vector(forecast[,3]), start = 128), col="blue", type="l")
legend("bottomleft", 
       lty=1, 
       pch=1, col=c("black","blue","red"),
       text.width = 20, c("Data","5% forecast limits", "Forecasts"))

In above figure,
- The Black line shows observation for 127 trading days.
- The Red Line shows prediction for next 15 trading days.
- The Blue Line shows upper bound and lower bound of prediction value for next 15 trading days.

8. Recommend Solution

As mentioned in in Section 6 – Choosing Bet Fit Model, all four models aren’t able to capture the pattern in time series very well. Each of the model has its weakness and their residual analysis isn’t good enough to use that model for future analysis. In my opinion, the better opinion for this type of time series pattern is to use the mix of quadratic and seasonal model. The reason is – with quadratic model, we can capture downward trend and with seasonal model, we are able to capture seasonal trends. We will give a try to that model to see whether it could really capture most of the patterns present in our investment return series.
The seasonal trend model of time series is plotted as below and plot shown in following Figure.

#Recommend Solution
#Combination of Cubic Trend Model and Seasonal Model 
t <- time(returnts) 
t2 <- t^2
model5=lm(seasonal_retrunts ~ pattern + t + t2  - 1 )
summary(model5)

## 
## Call:
## lm(formula = seasonal_retrunts ~ pattern + t + t2 - 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.649  -9.099  -0.641   9.216  45.273 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## patternMonday    201.206050   4.820589   41.74   <2e-16 ***
## patternTuesday   188.813995   4.932101   38.28   <2e-16 ***
## patternWednesday 168.244133   4.949564   33.99   <2e-16 ***
## patternThursday  159.502619   4.965967   32.12   <2e-16 ***
## patternFriday    158.620533   4.981318   31.84   <2e-16 ***
## patternSaturday  166.512593   4.995628   33.33   <2e-16 ***
## patternSunday    194.064846   5.008908   38.74   <2e-16 ***
## t                 -1.755723   0.139203  -12.61   <2e-16 ***
## t2                 0.001253   0.001054    1.19    0.237    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.26 on 118 degrees of freedom
## Multiple R-squared:  0.9789, Adjusted R-squared:  0.9773 
## F-statistic: 609.4 on 9 and 118 DF,  p-value: < 2.2e-16

x11()
plot(
  ts(fitted(model5)), 
  ylab='y', 
  main = "Fitted seaonal plus quadratic curve to investment return series",
  ylim = c(min(c(fitted(model5), as.vector(seasonal_retrunts))),
           max(c(fitted(model5), as.vector(seasonal_retrunts)))),
  col="red"
)
lines(as.vector(seasonal_retrunts),type="o")

From summary() functions, we get detailed statistics of the model5 as below.

summary(model5)

## 
## Call:
## lm(formula = seasonal_retrunts ~ pattern + t + t2 - 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.649  -9.099  -0.641   9.216  45.273 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## patternMonday    201.206050   4.820589   41.74   <2e-16 ***
## patternTuesday   188.813995   4.932101   38.28   <2e-16 ***
## patternWednesday 168.244133   4.949564   33.99   <2e-16 ***
## patternThursday  159.502619   4.965967   32.12   <2e-16 ***
## patternFriday    158.620533   4.981318   31.84   <2e-16 ***
## patternSaturday  166.512593   4.995628   33.33   <2e-16 ***
## patternSunday    194.064846   5.008908   38.74   <2e-16 ***
## t                 -1.755723   0.139203  -12.61   <2e-16 ***
## t2                 0.001253   0.001054    1.19    0.237    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.26 on 118 degrees of freedom
## Multiple R-squared:  0.9789, Adjusted R-squared:  0.9773 
## F-statistic: 609.4 on 9 and 118 DF,  p-value: < 2.2e-16

Following Figure is the plots from residual analysis.

# Residual analysis
par(mfrow=c(2,3))
res.model5 = rstudent(model5)
plot(y = res.model5, 
     x = as.vector(time(seasonal_retrunts)),
     xlab = 'Time',
     ylab='Standardized Residuals',
     type='o',
     main = "Standardised residuals from Seasonal plus Quadratic Trend model")
hist(res.model5,
     xlab='Standardized Residuals', 
     main = "Histogram of standardised residuals")
qqnorm(y=res.model5, 
       main = "QQ plot of standardised residuals")
qqline(y=res.model5, 
       col = 2, 
       lwd = 1, 
       lty = 2)
shapiro.test(res.model5)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model5
## W = 0.97937, p-value = 0.0496

acf(res.model5, main = "ACF of standardized residuals.")
pacf(res.model5, main = "PACF of standardized residuals.")

Based on the above analysis, this model is able to capture more important information. Only a bit seasonality is present in residual. This might be because the model isn’t able to fit the moving average pattern in the return investment series. Maybe, ARIMA model might be a better fit. Although current model looks better than other 4 models, this is not suitable for prediction as R-value 0.9773 mean it’s over fitting.

9. Conclusion

In this report, we fitted four different deterministic trend models, namely, Linear, Quadratic, Seasonal/ Cyclic and Cosine/ Harmonic models to market shareholder’s investment series over 127 days period. All models have a lot of limitation and shortcoming. Yet, from those four models, we chose Quadratic Trend Model as best fit model and did forecasting for 15 more trading days as its’ results are more significant and diagnosis are more acceptable, nevertheless it’s not ideal. Since it’s not ideal, we proposed a combination of quadratic and seasonal model and its’ result looks better than all four models though it’s over fitted and not appropriate for forecasting. In conclusion, since all the models aren’t fitted efficiently in the time series, maybe, time series is stochastic trend instead of deterministic trend. If that’s correct, in my opinion, ARIMA model might be the better fit model than current tested models in this report.

10. References

Hyndman, R., Athanasopoulos, G (n.d.). Forecasting: Principles and Practice. https://otexts.com/fpp2/
Crequeira, V., (n.d). Understanding Time Series Trend. https://towardsdatascience.com/understanding-time-series-trend-addfd9d7764e
Choudhury, S., (2021). Approaches For Time Series Analysis. https://medium.com/analytics-vidhya/approaches-for-time-series-analysis-db27fe252c70

Time Series Analysis of
Share Market Trader’s Investment Return (127 days) and
Prediction for Next 15 Trading Days

MATH1318 - Time Series Analysis Assignment 1 (2022)

Su Myat Noe Yee (s3913797)

Table of Contents

1. Introduction

2. Setting Up Packages and Loading Data File

3. Descriptive analysis of Time Series Data

4. Modelling (Model Fitting)

4.1. Linear Trend Model in Time

4.2 Quadratic Trend Model in Time

4.3. Seasonal or Cyclical Trend Model in Time

4.4. Cosine or Harmonic Trend Model in Time

5. Residual Analysis and Diagnosis

5.1. Residual Analysis and Diagnosis of Linear Trend Model

5.2. Residual Analysis and Diagnosis of Quadratic Trend Model

5.3. Residual Analysis and Diagnosis of Seasonal or Cyclic Trend Model

5.4. Residual Analysis and Diagnosis of Cosine or Harmonic Trend Model

6. Choosing the Best Fit Model

7. Forecasting Using Best Fit Model

9. Conclusion

10. References

Time Series Analysis of Share Market Trader’s Investment Return (127 days) and Prediction for Next 15 Trading Days

MATH1318 - Time Series Analysis Assignment 1 (2022)

Su Myat Noe Yee (s3913797)

Table of Contents

1. Introduction

2. Setting Up Packages and Loading Data File

3. Descriptive analysis of Time Series Data

4. Modelling (Model Fitting)

4.1. Linear Trend Model in Time

4.2 Quadratic Trend Model in Time

4.3. Seasonal or Cyclical Trend Model in Time

4.4. Cosine or Harmonic Trend Model in Time

5. Residual Analysis and Diagnosis

5.1. Residual Analysis and Diagnosis of Linear Trend Model

5.2. Residual Analysis and Diagnosis of Quadratic Trend Model

5.3. Residual Analysis and Diagnosis of Seasonal or Cyclic Trend Model

5.4. Residual Analysis and Diagnosis of Cosine or Harmonic Trend Model

6. Choosing the Best Fit Model

7. Forecasting Using Best Fit Model

8. Recommend Solution

9. Conclusion

10. References

Time Series Analysis of
Share Market Trader’s Investment Return (127 days) and
Prediction for Next 15 Trading Days