1 Introduction

This assignment studies time series decomposition and forecasting based on decomposition. Using monthly Seattle weather records, we compare two decomposition methods and then look for the training size that gives the best forecast performance.

1.1 Data Description

  • Month: month of the year as a number (1 = Jan, 2 = Feb, …)
  • Year: calendar year of the observation
  • LowTemp: lowest temperature in the month
  • HighTemp: highest temperature in the month
  • WarmestMin: warmest daily minimum temperature
  • ColdestHigh: coldest daily maximum temperature
  • AveMin: average minimum temperature
  • AveMax: average maximum temperature
  • meanTemp: mean temperature
  • TotPrecip: total precipitation
  • TotSnow: total snowfall
  • Max24hrPrecip: maximum precipitation in any 24-hour period

2 Define time series object

Since this is monthly data, frequency = 12 is used to define the time series object.

Figure: Seattle monthly mean temperature

2.1 Forecasting with Decomposing

The following visual representations show the different behaviors of the two methods of decomposition.

Figure: Classical decomposition of additive time series

Figure: STL decomposition of additive time series

Both decompositions are easy to read, but the STL decomposition (the second figure) shows the trend more clearly: its trend component is smoother than the one from the classical decomposition. In both figures the trend reaches a maximum around 2015 and a minimum around 2018.

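We next perform error analysis. The last 7 observations are held out as a test set, and the same STL-based naive forecast is fitted to four training sets of decreasing size.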

Table: Error comparison between forecast results with different sample sizes

Training size    MSE          MAPE
n = 144          0.1555844    0.0001788
n = 109          0.1603712    0.0001822
n = 73           0.1732466    0.0001917
n = 48           0.1981214    0.0002088

The table reports the MSE and MAPE (as a proportion, not a percentage) of the same forecasting procedure fitted to four training sets of different sizes. Both measures rise steadily as the training set shrinks, so the largest training set (n = 144) gives the most accurate forecasts of the held-out observations. Because the errors are computed on observations that were not used for fitting, this advantage is not a symptom of overfitting. We therefore take the n = 144 training set as the most representative of the data.

3 Conclusion

We presented the original time series, its classical and STL decompositions, and an error analysis of the decomposition-based forecasts. The STL decomposition gave the smoother and more readable trend, and the largest training size (n = 144) produced the lowest MSE and MAPE.

---
title: "Seattle Weather Time Series"
author: "Ryan Lebo"
date: "2024-11-27"
output: 
  html_document:
    toc: yes
    toc_depth: 4
    toc_float: yes
    fig_width: 4
    fig_caption: yes
    number_sections: yes
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    theme: lumen
  word_document:
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
  pdf_document:
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
editor_options:
  chunk_output_type: inline
always_allow_html: true
---

```{=html}

<style type="text/css">

/* Cascading Style Sheets (CSS) is a stylesheet language used to describe the presentation of a document written in HTML or XML. It is a simple mechanism for adding style (e.g., fonts, colors, spacing) to Web documents. */

h1.title {  /* Title - font specifications of the report title */
  font-size: 24px;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}
h4.author { /* Header 4 - font specifications for authors  */
  font-size: 20px;
  font-family: system-ui;
  color: DarkRed;
  text-align: center;
}
h4.date { /* Header 4 - font specifications for the date  */
  font-size: 18px;
  font-family: system-ui;
  color: DarkBlue;
  text-align: center;
}
h1 { /* Header 1 - font specifications for level 1 section title  */
    font-size: 22px;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: center;
}
h2 { /* Header 2 - font specifications for level 2 section title */
    font-size: 20px;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - font specifications of level 3 section title  */
    font-size: 18px;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - font specifications of level 4 section title  */
    font-size: 18px;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

body { background-color:white; }

.highlightme { background-color:yellow; }

p { background-color:white; }

</style>
```
```{r setup, include=FALSE}
# Detect, install, and load packages if needed.
pkgs <- c("knitr", "leaflet", "EnvStats", "MASS", "phytools", "mlbench",
          "pander", "ISwR", "ggplot2", "forecast")
for (pkg in pkgs) {
  # require() returns FALSE when the package is not installed
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
}
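knitr::opts_chunk$set(echo = FALSE,     # hide code by default
                      warning = FALSE,  # suppress warnings
                      message = FALSE,  # suppress package messages
                      results = TRUE,   # show text output
                      comment = NA)     # no prefix on printed output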
```

```{r}
# Read the Seattle weather data and keep the 151 most recent monthly records.
s_weath <- read.csv("https://raw.githubusercontent.com/RyanLebo/STA-321/refs/heads/main/seattle_weather.csv", header = TRUE)
n.row = nrow(s_weath)
data.sea.weath = s_weath[(n.row-150):n.row, ]
```

# Introduction
This assignment studies time series decomposition and forecasting based on decomposition. Using monthly Seattle weather records, we compare two decomposition methods and then look for the training size that gives the best forecast performance.

## Data Description

* Month: month of the year as a number (1 = Jan, 2 = Feb, ...)
* Year: calendar year of the observation
* LowTemp: lowest temperature in the month
* HighTemp: highest temperature in the month
* WarmestMin: warmest daily minimum temperature
* ColdestHigh: coldest daily maximum temperature
* AveMin: average minimum temperature
* AveMax: average maximum temperature
* meanTemp: mean temperature
* TotPrecip: total precipitation
* TotSnow: total snowfall
* Max24hrPrecip: maximum precipitation in any 24-hour period


# Define time series object

Since this is monthly data, `frequency = 12` is used to define the time series object.
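
As a small illustration of what `frequency = 12` and `start` do, the sketch below builds a toy monthly series and inspects its time index. The values 1 to 24 and the name `toy.ts` are placeholders for illustration only, not the Seattle data.

```r
# Toy monthly series: 24 placeholder values starting in January 2008.
toy.ts <- ts(1:24, frequency = 12, start = c(2008, 1))

start(toy.ts)   # first (year, month) pair of the series
end(toy.ts)     # last (year, month) pair of the series
cycle(toy.ts)   # month position (1 to 12) of every observation

# window() selects a sub-period by (year, month).
toy.2009 <- window(toy.ts, start = c(2009, 1))
length(toy.2009)
```

With `frequency = 12`, functions such as `decompose()` and `stl()` treat every 12 consecutive observations as one seasonal cycle.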

```{r  fig.align='center', fig.width=5, fig.height= 3, fig.cap="Seattle monthly mean temperature"}
# Column 9 is meanTemp (see Data Description).
seaweath.ts = ts(data.sea.weath[,9], frequency = 12, start = c(2008, 1))
par(mar=c(2,2,2,2))
plot(seaweath.ts, main="Seattle Monthly Mean Temperature", ylab="MeanTemp", xlab="year")
```

## Forecasting with Decomposing

The following visual representations show the different behaviors of the two methods of decomposition.

```{r fig.align='center', fig.cap= "Classical decomposition of additive time series", fig.width=6, fig.height=4}
sea.decomp = decompose(seaweath.ts)
par(mar=c(2,2,2,2))
plot(sea.decomp, xlab="")
```



```{r fig.align='center', fig.cap= "STL decomposition of additive time series", fig.width=7, fig.height=4}
sea2.decomp=stl(seaweath.ts, s.window = 12)
par(mar=c(2,2,2,2))
plot(sea2.decomp)
```

Both decompositions are easy to read, but the STL decomposition (the second figure) shows the trend more clearly: its trend component is smoother than the one from the classical decomposition. In both figures the trend reaches a maximum around 2015 and a minimum around 2018.
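
One way to compare the two trend estimates directly is to extract them and overlay them on a single plot. The sketch below is only an illustration: it uses the built-in `nottem` series (monthly temperatures at Nottingham) as a stand-in for the Seattle data, and `roughness` is a hypothetical helper, not part of any package.

```r
# Stand-in monthly series shipped with R (240 observations).
y <- nottem

classical <- decompose(y)           # additive; trend is a centred moving average
stl.fit   <- stl(y, s.window = 13)  # loess-based; s.window should be odd

trend.classical <- classical$trend
trend.stl       <- stl.fit$time.series[, "trend"]

# The moving-average trend is undefined for the first and last 6 months,
# while the STL trend covers the whole series.
sum(is.na(trend.classical))
sum(is.na(trend.stl))

# Hypothetical roughness measure: variance of the second differences.
roughness <- function(x) var(diff(as.numeric(na.omit(x)), differences = 2))
c(classical = roughness(trend.classical), stl = roughness(trend.stl))

# Overlay the two trend estimates.
plot(trend.classical, col = "darkred", ylab = "Trend", main = "Trend estimates")
lines(trend.stl, col = "blue")
legend("topleft", c("classical", "STL"), col = c("darkred", "blue"),
       lty = 1, bty = "n", cex = 0.7)
```

A smaller roughness value indicates a smoother trend, which gives a numeric check on the visual impression from the plots.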


```{r}
## Column 9 is meanTemp, the series decomposed above (column 2 is Year).
ini.data = data.sea.weath[,9]
n0 = length(ini.data)
## Four training windows of decreasing length, all ending 7 months before the last record
train.data01 = ini.data[1:(n0-7)]
train.data02 = ini.data[37:(n0-7)]
train.data03 = ini.data[73:(n0-7)]
train.data04 = ini.data[97:(n0-7)]
## last 7 observations
test.data = ini.data[(n0-6):n0]
##
train01.ts = ts(train.data01, frequency = 12, start = c(2008, 1))
train02.ts = ts(train.data02, frequency = 12, start = c(2011, 1))
train03.ts = ts(train.data03, frequency = 12, start = c(2014, 1))
train04.ts = ts(train.data04, frequency = 12, start = c(2016, 1))
##
stl01 = stl(train01.ts, s.window = 12)
stl02 = stl(train02.ts, s.window = 12)
stl03 = stl(train03.ts, s.window = 12)
stl04 = stl(train04.ts, s.window = 12)
## Forecast with decomposing
fcst01 = forecast(stl01,h=7, method="naive")
fcst02 = forecast(stl02,h=7, method="naive")
fcst03 = forecast(stl03,h=7, method="naive")
fcst04 = forecast(stl04,h=7, method="naive")
```

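We next perform error analysis. The last 7 observations are held out as a test set, and the forecasts from the four training sets are compared against them using MSE and MAPE.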
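
For reference, the conventional definitions of the two accuracy measures are given below. The notation is introduced here for illustration and does not come from the original analysis: $y_t$ is a held-out observation, $\hat{y}_t$ is its forecast, and $h = 7$ is the number of held-out months. MAPE is written as a proportion rather than a percentage.

$$
\mathrm{MSE} = \frac{1}{h}\sum_{t=1}^{h}\left(y_t - \hat{y}_t\right)^2,
\qquad
\mathrm{MAPE} = \frac{1}{h}\sum_{t=1}^{h}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
$$

MSE is measured in squared units of the series and penalizes large errors heavily, while MAPE is unit-free and therefore easier to compare across series.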

```{r}
## MAPE is kept as a proportion (not multiplied by 100).
## Percentage errors are taken relative to the observed values.
PE01=(test.data-fcst01$mean)/test.data
PE02=(test.data-fcst02$mean)/test.data
PE03=(test.data-fcst03$mean)/test.data
PE04=(test.data-fcst04$mean)/test.data
###
MAPE1 = mean(abs(PE01))
MAPE2 = mean(abs(PE02))
MAPE3 = mean(abs(PE03))
MAPE4 = mean(abs(PE04))
###
E1=test.data-fcst01$mean
E2=test.data-fcst02$mean
E3=test.data-fcst03$mean
E4=test.data-fcst04$mean
##
MSE1=mean(E1^2)
MSE2=mean(E2^2)
MSE3=mean(E3^2)
MSE4=mean(E4^2)
###
MSE=c(MSE1, MSE2, MSE3, MSE4)
MAPE=c(MAPE1, MAPE2, MAPE3, MAPE4)
accuracy=cbind(MSE=MSE, MAPE=MAPE)
row.names(accuracy)=c("n.144", "n.109", "n. 73", "n. 48")
kable(accuracy, caption="Error comparison between forecast results with different sample sizes")
```



```{r}
par(mfrow=c(1,2))
labs=c("n=144", "n=109", "n=73", "n=48")
## Left panel: MSE by training size
plot(1:4, MSE, type="b", col="darkred", ylab="Error", xlab="",
     main="MSE", axes=FALSE)
axis(1, at=1:4, labels=labs)
axis(2)
text(1:4, MSE, round(MSE,4), pos=3, col="darkred", cex=0.7, xpd=NA)
## Right panel: MAPE by training size
plot(1:4, MAPE, type="b", col="blue", ylab="Error", xlab="",
     main="MAPE", axes=FALSE)
axis(1, at=1:4, labels=labs)
axis(2)
text(1:4, MAPE, round(MAPE,4), pos=3, col="blue", cex=0.7, xpd=NA)

```

The table and plots report the MSE and MAPE (as a proportion, not a percentage) of the same forecasting procedure fitted to four training sets of different sizes. Both measures rise steadily as the training set shrinks, so the largest training set (n = 144) gives the most accurate forecasts of the held-out observations. Because the errors are computed on observations that were not used for fitting, this advantage is not a symptom of overfitting. We therefore take the n = 144 training set as the most representative of the data.
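
The comparison can also be written as a loop over training sizes. The sketch below is a simplified illustration rather than the analysis above: it uses the built-in `nottem` series as a stand-in for the Seattle data, and it replaces the `forecast` package with a hand-written helper, `stl_naive_forecast` (a hypothetical name), which repeats the last seasonally adjusted value and adds back the most recent year of seasonal estimates.

```r
# Hypothetical helper: STL decomposition plus a naive forecast, base R only.
stl_naive_forecast <- function(train.ts, h) {
  fit      <- stl(train.ts, s.window = 13)
  seasonal <- fit$time.series[, "seasonal"]
  seas.adj <- train.ts - seasonal
  n <- length(train.ts)
  f <- frequency(train.ts)
  # Naive forecast of the seasonally adjusted series: repeat its last value.
  level <- as.numeric(seas.adj[n])
  # Re-seasonalize with the last full year of seasonal estimates.
  last.season <- as.numeric(seasonal[(n - f + 1):n])
  level + rep(last.season, length.out = h)
}

y <- nottem                      # stand-in monthly series (assumption)
h <- 7                           # number of held-out months
n <- length(y)
test  <- as.numeric(y[(n - h + 1):n])
sizes <- c(144, 108, 72, 48)     # illustrative training sizes

errors <- t(sapply(sizes, function(k) {
  idx   <- (n - h - k + 1):(n - h)   # the k months just before the test set
  train <- ts(as.numeric(y[idx]), frequency = 12)
  fc    <- stl_naive_forecast(train, h)
  c(MSE = mean((test - fc)^2), MAPE = mean(abs((test - fc) / test)))
}))
rownames(errors) <- paste0("n=", sizes)
errors
```

Every training window ends immediately before the test set, so the windows differ only in how far back they reach. That isolates the effect of training size on forecast error.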

# Conclusion

We presented the original time series, its classical and STL decompositions, and an error analysis of the decomposition-based forecasts. The STL decomposition gave the smoother and more readable trend, and the largest training size (n = 144) produced the lowest MSE and MAPE.




