Problem_2.1

  1. Use the help function to explore what the series gold, woolyrnq and gas represent.
  2. Use autoplot() to plot each of these in separate plots.
# Proxy settings specific to the author's network; remove them if you are not behind this proxy.
library(httr)
set_config(use_proxy(url="10.3.100.207",port=8080))
library(ggplot2)
library(forecast)
library(tidyverse)
library(readr)
library(fpp2)

theme_set(theme_classic())
autoplot(gold) + 
  ggtitle('Gold prices: forecast package')

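# woolyrnq is quarterly woollen yarn production in Australia (tonnes), not a price series
autoplot(woolyrnq) + 
  ggtitle('Quarterly woollen yarn production in Australia: forecast package') +
  xlab("Time") +
  ylab("Woollen yarn production (tonnes)")

# gas is Australian monthly gas production, not a price series
autoplot(gas) + 
  ggtitle('Australian monthly gas production: forecast package')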

  3. What is the frequency of each series? Hint: apply the frequency() function.

## [1] "The frequency in gold dataset is: 1"
## [1] "The frequency in woolyrnq dataset is: 4"
## [1] "The frequency in gas dataset is: 12"
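
The code behind the output above is not shown in this write-up. One way to produce it, assuming the fpp2 package is attached as in the plotting step, is sketched below; the loop over series names is an illustrative choice, not necessarily how the original output was generated.

```r
library(fpp2)  # attaches forecast, which provides gold, woolyrnq and gas

# Look up each series by name and report its frequency,
# i.e. the number of observations per seasonal period.
for (name in c("gold", "woolyrnq", "gas")) {
  print(paste0("The frequency in ", name, " dataset is: ", frequency(get(name))))
}
```

A frequency of 1 means no seasonal period is recorded in the ts object, 4 means quarterly and 12 means monthly.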
  4. Use which.max() to spot the outlier in the gold series. Which observation was it?
## [1] "The observation that is considered outlier is: 770"
## [1] "The gold price on that observation was: 593.7"
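
The call that located the outlier is also not shown. A minimal sketch, assuming fpp2 is attached (the variable name outlier_idx is introduced here for illustration):

```r
library(fpp2)

# which.max() returns the position of the largest value; missing values are skipped.
outlier_idx <- which.max(gold)

print(paste0("The observation that is considered outlier is: ", outlier_idx))
print(paste0("The gold price on that observation was: ", gold[outlier_idx]))
```

Indexing gold with that position returns the price recorded at the spike.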

Problem_2.2

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

  1. You can read the data into R with the following script:
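
The script itself is missing from this write-up. A minimal sketch, assuming tute1.csv has already been downloaded into the working directory; the helper name read_tute1 is hypothetical and only wraps base R's read.csv():

```r
# Hypothetical helper: read the CSV, treating the first row as column names.
read_tute1 <- function(path = "tute1.csv") {
  read.csv(path, header = TRUE)
}

# Only read the file if it is present (assumption: it sits in the working directory).
if (file.exists("tute1.csv")) {
  tute1 <- read_tute1()
  head(tute1)  # quick look at the first few quarters
}
```

The resulting tute1 data frame is what the ts() call in the next step expects.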

  2. Convert the data to time series

mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)

(The [,-1] removes the first column which contains the quarters as we don’t need them now.)

  3. Construct time series plots of each of the three series
autoplot(mytimeseries, facets=TRUE)

Check what happens when you don’t include facets=TRUE.

autoplot(mytimeseries)

Without facets=TRUE, all three series are drawn in a single panel on a shared y-axis. They are distinguished by colour, with a legend on the right-hand side.

Problem_2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

  1. You can read the data into R with the following script:
library(readxl)
if(!file.exists("retail.xlsx")){
  download.file("https://otexts.com/fpp2/extrafiles/retail.xlsx", "retail.xlsx")
}
retaildata <- readxl::read_excel("retail.xlsx", skip=1)

The second argument (skip=1) is required because the Excel sheet has two header rows.

  2. Select one of the time series as follows (but replace the column name with your own chosen column):
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))
  3. Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf(). Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
autoplot(myts) 

The time plot shows an upward trend in retail sales. The increase flattened out between 2000 and 2010, but after 2010 sales resumed their upward trend.

ggseasonplot(myts, polar = TRUE)

The polar seasonal plot reveals the underlying seasonal pattern more clearly: retail sales spike every year from Oct to Dec.

ggsubseriesplot(myts)

The subseries plot emphasises the seasonal pattern seen in the polar plot: sales increase in Oct and continue increasing through Nov and Dec. This seems logical, as these are the holiday months.

gglagplot(myts)

The lag plot shows a positive linear relationship at most lags, and especially at lag 12, confirming that the data have annual seasonality.

ggAcf(myts, lag.max = 75)

The ACF measures the extent of the linear relationship between lagged values of the series. The correlogram shows that r1 and r12 are higher than the other lags, which reflects the seasonal pattern in the data. The peaks tend to be a year apart and the troughs tend to fall about six months after the peaks. Finally, the correlations are significantly different from zero.

Problem_2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

The hsales is the monthly sales of new one-family houses sold in the USA since 1973.

autoplot(hsales) 

ggseasonplot(hsales, polar = TRUE)

ggsubseriesplot(hsales)

gglagplot(hsales)

ggAcf(hsales, lag.max = 70)

This series shows cyclic behaviour with a period of roughly 6-9 years.

The plots indicate that the series has annual seasonality and depends strongly on the previous month's value. The seasonal plot confirms this seasonality, with a peak in March and April. The subseries plot shows that sales rise from Jan to March, then fluctuate while declining until Dec.

The lag plot shows a strong linear relationship at lag 1. The relationship then weakens across lags up to lag 16. This suggests that the time series has annual seasonality but depends mainly on the previous month's value.

In the autocorrelation plot, r12 is the highest. Although all lags are significantly different from 0, the majority of the lags have a negative correlation.

autoplot(usdeaths) 

ggseasonplot(usdeaths, polar = TRUE)

ggsubseriesplot(usdeaths)

gglagplot(usdeaths)

ggAcf(usdeaths, lag.max = 70)

This series is seasonal, with a peak in July and a trough in Feb.

The seasonal plot confirms the annual seasonality. It also hints at a decreasing trend, but there is not enough data to confirm it.

The subseries plot shows the seasonal behavior of the time series (decreasing from July to Feb., then increasing from Feb. to July).

The lag plot has the strongest linear relationship at lag 12, confirming the annual seasonality present in the time series.

The autocorrelation plot has r12 higher than the other lags, due to the annual seasonal pattern in the data. Even higher is r1, which shows that the previous month in the time series is indicative of the next. r6 and r18 show strong negative correlations, which confirm the annual seasonality.
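
The lags discussed above can also be read off numerically rather than from the chart. A minimal sketch, assuming fpp2 is attached; it uses base R's acf() instead of ggAcf() so that the values are returned directly, and the names acf_vals and r are introduced here for illustration:

```r
library(fpp2)

# plot = FALSE returns the autocorrelations without drawing the correlogram.
acf_vals <- acf(usdeaths, lag.max = 18, plot = FALSE)$acf

# Element 1 is lag 0 (always 1), so drop it and label the rest by lag in months.
r <- setNames(as.numeric(acf_vals)[-1], 1:18)

# Compare the lags mentioned in the text: 1, 6, 12 and 18.
round(r[c("1", "6", "12", "18")], 2)
```

Positive values at lag 12 alongside negative values at lags 6 and 18 are the numeric counterpart of the annual pattern seen in the plots.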

autoplot(bricksq) 

ggseasonplot(bricksq, polar = TRUE)

ggsubseriesplot(bricksq)

gglagplot(bricksq)

ggAcf(bricksq, lag.max = 70)

The series seems to have annual seasonality with an increasing trend and a cycle of about 8 years.

From the seasonal plot, it seems that the series peaks in Q1 and Q3.

The subseries plot shows that the increases and decreases in this time series are constant across quarters.

The lag plot shows that lag 1 and lag 4 have the strongest linear relationship.

The autocorrelation plot shows peaks at lags 4, 8, 12, ... because of the quarterly seasonality. All lags are significantly different from zero.

autoplot(sunspotarea) 

# ggseasonplot(sunspotarea)
# ggsubseriesplot(sunspotarea)
gglagplot(sunspotarea)

ggAcf(sunspotarea, lag.max = 70)

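The sunspotarea series is not seasonal. It is annual data, so ggseasonplot() and ggsubseriesplot() do not apply, which is why they are commented out. The series is cyclic instead, with peaks roughly every 10 to 11 years.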
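
A quick way to decide whether the seasonal plotting functions make sense for a series is to check its frequency first. The sketch below assumes fpp2 is attached; the helper has_seasonal_period is hypothetical and not part of any package.

```r
library(fpp2)

# Hypothetical helper: a ts object only has a seasonal period
# when it holds more than one observation per period.
has_seasonal_period <- function(x) {
  frequency(x) > 1
}

frequency(sunspotarea)            # observations per seasonal period
has_seasonal_period(sunspotarea)  # FALSE: skip ggseasonplot() and ggsubseriesplot()
has_seasonal_period(usdeaths)     # TRUE: monthly data, seasonal plots apply
```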

autoplot(gasoline) 

ggseasonplot(gasoline, polar = TRUE)

# ggsubseriesplot(gasoline)
gglagplot(gasoline)

ggAcf(gasoline, lag.max = 70)

This series has annual seasonality with an increasing trend. The seasonal plot indicates a peak in weeks 30-39 and a trough during weeks 5-11.
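
The ggsubseriesplot() call is commented out for gasoline. A likely reason, stated here as an assumption rather than something the write-up confirms, is that the weekly series has a non-integer number of observations per year. This can be checked as follows, assuming fpp2 is attached (the variable name f is illustrative):

```r
library(fpp2)

f <- frequency(gasoline)  # observations per year for the weekly series
f

# TRUE when the seasonal period is not a whole number of observations.
f != round(f)
```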