Please submit exercises 2.1, 2.2, 2.3 and 2.6 from Hyndman and Athanasopoulos's online book Forecasting: Principles and Practice. Please submit your RPubs link and attach the .Rmd file with your code.

2.1. Use the help function to explore what the series gold, woolyrnq and gas represent.
library(fpp2)  # attaches forecast, ggplot2, fma and expsmooth (gold, woolyrnq, gas, ...)
head(gold)
## Time Series:
## Start = 1 
## End = 6 
## Frequency = 1 
## [1] 306.25 299.50 303.45 296.75 304.40 298.35
tsdisplay(gold)

head(woolyrnq)
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1965 6172 6709 6633 6660
## 1966 6786 6800
tsdisplay(woolyrnq)

head(gas)
##       Jan  Feb  Mar  Apr  May  Jun
## 1956 1709 1646 1794 1878 2173 2321
tsdisplay(gas)

Use autoplot() to plot each of these in separate plots.

autoplot(gold) + ggtitle("Daily Morning Gold Prices ($) Jan 1 1985 - Mar 31 1989") +
  ylab("$") + xlab("Days") 

autoplot(woolyrnq) + ggtitle("Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994") +
  ylab("Tonnes") + xlab("Year")

autoplot(gas) + ggtitle("Australian monthly gas production: 1956 - 1995") +
  ylab("") + xlab("Year")

What is the frequency of each series? Hint: apply the frequency() function.

paste0("Gold Frequency: ", frequency(gold))
## [1] "Gold Frequency: 1"
paste0("Woolyrnq Frequency: ", frequency(woolyrnq))
## [1] "Woolyrnq Frequency: 4"
paste0("Gas Frequency: ", frequency(gas))
## [1] "Gas Frequency: 12"

Use which.max() to spot the outlier in the gold series. Which observation was it?

paste0("Index of gold's maximum value: ", which.max(gold))
## [1] "Index of gold's maximum value: 770"
paste0("Gold's maximum value: ", gold[which.max(gold)])
## [1] "Gold's maximum value: 593.7"

The outlier is observation 770, where the price reaches $593.70.
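
One detail worth knowing when applying which.max() to real price data: it silently skips missing values and returns the position of the largest remaining value. A minimal sketch on a small hypothetical vector (not the gold series itself):

```r
# Hypothetical price vector with one missing value and one spike (not real gold data)
prices <- c(306.25, 299.50, NA, 593.70, 304.40)

# which.max() ignores the NA and returns the position of the largest value
spike_index <- which.max(prices)
print(spike_index)          # position of the spike
print(prices[spike_index])  # value of the spike
```

Indexing the series with the result of which.max(), as done above for gold, is what turns the position into the observed value.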

2.2. Download the file tute1.csv from OTexts.org/fpp2/extrafiles/tute1.csv, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

You can read the data into R with the following script:

tute1 <- read.csv('https://raw.githubusercontent.com/niteen11/CUNY_DATA_624/master/Dataset/tute1.csv')
knitr::kable(head(tute1))
X        Sales   AdBudget  GDP
Mar-81   1020.2  659.2     251.8
Jun-81    889.2  589.0     290.9
Sep-81    795.0  512.5     290.8
Dec-81   1003.9  614.1     292.4
Mar-82   1057.7  647.2     279.1
Jun-82    944.4  602.0     254.0

Convert the data to time series

mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
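
Here tute1[,-1] drops the X column of quarter labels, and start=1981 with frequency=4 tells R that the first row is the first quarter of 1981. To see how rows map to quarters, a minimal sketch using only the first six Sales values from the table above:

```r
# First six Sales values (Mar-81 to Jun-82), copied from the table above
sales <- ts(c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
            start = 1981, frequency = 4)

# time() gives each observation's position in decimal years
print(time(sales))
# cycle() gives the quarter (1-4) that each observation falls in
print(cycle(sales))
```

So Mar-81 is treated as 1981 Q1, Jun-81 as Q2, and so on, which matches the labels in the X column.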

Construct time series plots of each of the three series

autoplot(mytimeseries, facets=TRUE)

Check what happens when you don’t include facets=TRUE.

autoplot(mytimeseries)

Without facets=TRUE, all three series are drawn in a single panel on a shared y-axis, distinguished by colour, instead of in separate panels.

2.3. Download some monthly Australian retail data from OTexts.org/fpp2/extrafiles/retail.xlsx. These represent retail sales in various categories for different Australian states, and are stored in an MS Excel file.

You can read the data into R with the following script:

# Assumes retail.xlsx has been downloaded into the working directory
retaildata <- readxl::read_excel("retail.xlsx", skip=1)

Select one of the time series as follows (but replace the column name with your own chosen column):

myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))

Explore your chosen retail time series using the following functions:
autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

autoplot(myts)

ggseasonplot(myts)

ggsubseriesplot(myts)

gglagplot(myts, lags = 12)

ggAcf(myts)

Using the textbook definitions of trend, seasonality and cyclicity, the series shows a positive trend. This is also visible in the ACF (autocorrelation function): a trended series has large positive autocorrelations at small lags that decrease slowly as the lag increases, and that is the pattern here. The seasonal pattern appears weak, with no pronounced peaks at the seasonal lags, and there is no evident cyclic behaviour. Overall, sales increase over the period.
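
The "slowly decreasing positive ACF" signature of a trend can be reproduced on simulated data. A minimal sketch, assuming a hypothetical monthly series made of a linear trend plus small noise (this is not the retail data):

```r
set.seed(1)
# Hypothetical trended monthly series: linear trend plus small noise (not the retail data)
n <- 120
trended <- ts(100 + 2 * seq_len(n) + rnorm(n, sd = 5),
              start = c(1982, 4), frequency = 12)

# Sample autocorrelations for lags 1-24 (drop lag 0, which is always 1)
r <- acf(trended, lag.max = 24, plot = FALSE)$acf[-1]
print(round(r, 2))
```

Every autocorrelation stays positive and shrinks steadily with the lag, which is the same shape that ggAcf() shows for a trended series.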

2.6. Use the following graphics functions: autoplot, ggseasonplot, ggsubseriesplot, gglagplot, ggAcf and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?

Exploring hsales

head(hsales)
##      Jan Feb Mar Apr May Jun
## 1973  55  60  68  63  65  61
autoplot(hsales)

ggseasonplot(hsales)

ggsubseriesplot(hsales)

gglagplot(hsales)

ggAcf(hsales, lag.max = 400)

Based on the plots above, we can spot both seasonality and cyclicity. Sales peak seasonally in spring, around March. The cyclic component appears as alternating troughs and crests with a period of roughly 10 years.

Exploring usdeaths

autoplot(usdeaths)

ggseasonplot(usdeaths)

ggsubseriesplot(usdeaths)

gglagplot(usdeaths)

ggAcf(usdeaths, lag.max = 60)

The usdeaths series shows clear seasonality. From the plots above, July appears to have the highest number of deaths.
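
The peak month can also be checked numerically by averaging each calendar month across years. A minimal sketch, assuming base R's datasets::USAccDeaths (monthly US accidental deaths, 1973-1978) as a stand-in for usdeaths so that no extra package is needed:

```r
# Assumption: datasets::USAccDeaths is used as a stand-in for usdeaths
deaths <- USAccDeaths

# cycle() gives the month index (1-12); average the deaths for each month
monthly_mean <- tapply(deaths, cycle(deaths), mean)
names(monthly_mean) <- month.abb

peak_month <- names(which.max(monthly_mean))
print(round(monthly_mean))
print(peak_month)
```

This is the numeric counterpart of the horizontal mean lines that ggsubseriesplot() draws for each month.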

Exploring bricksq

head(bricksq)
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956  189  204  208  197
## 1957  187  214
autoplot(bricksq)

ggseasonplot(bricksq)

ggsubseriesplot(bricksq)

gglagplot(bricksq)

ggAcf(bricksq, lag.max = 200)

These plots show a strong trend in the bricksq series, with only weak seasonality.
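
"Strong trend, little seasonality" can be quantified with the trend and seasonal strength measures from the decomposition chapter of the same book, computed from an STL decomposition. A minimal sketch using base R's stl(); the helper name feature_strength and the simulated quarterly series are hypothetical, not bricksq itself:

```r
# Hypothetical helper: strength of trend and seasonality from an STL decomposition
#   F_T = max(0, 1 - Var(R) / Var(T + R)),  F_S = max(0, 1 - Var(R) / Var(S + R))
feature_strength <- function(x) {
  fit <- stl(x, s.window = "periodic")
  trend     <- fit$time.series[, "trend"]
  seasonal  <- fit$time.series[, "seasonal"]
  remainder <- fit$time.series[, "remainder"]
  c(trend    = max(0, 1 - var(remainder) / var(trend + remainder)),
    seasonal = max(0, 1 - var(remainder) / var(seasonal + remainder)))
}

set.seed(7)
# Hypothetical quarterly series: strong linear trend, small seasonal swing, noise
n <- 80
quarterly <- ts(2 * seq_len(n) + rep(c(1, -1, 0.5, -0.5), length.out = n) + rnorm(n, sd = 2),
                start = c(1956, 1), frequency = 4)

strength <- feature_strength(quarterly)
print(round(strength, 2))
```

Both measures lie between 0 and 1; with fpp2 loaded, feature_strength(bricksq) would give the corresponding numbers for the actual series (not run here).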

Exploring sunspotarea

head(sunspotarea)
## Time Series:
## Start = 1875 
## End = 1880 
## Frequency = 1 
## [1] 213.13333 109.28333  92.85833  22.21667  36.33333 446.75000
autoplot(sunspotarea)

#ggseasonplot(sunspotarea) # Seasonal plots do not work with frequencies of 1

Because sunspotarea is annual (frequency 1), it has no seasonal period, so neither ggseasonplot() nor ggsubseriesplot() can be drawn: both require a seasonal time series.
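
A simple guard avoids calling the seasonal plots on non-seasonal data. A minimal sketch with a hypothetical helper has_seasonal_period(); the annual values are the first six sunspotarea observations from the output above, rounded:

```r
# Hypothetical helper: a series has a seasonal period only if its frequency exceeds 1
has_seasonal_period <- function(x) frequency(x) > 1

annual    <- ts(c(213.1, 109.3, 92.9, 22.2, 36.3, 446.8), start = 1875)  # frequency 1
quarterly <- ts(1:8, start = c(1956, 1), frequency = 4)

print(has_seasonal_period(annual))     # FALSE
print(has_seasonal_period(quarterly))  # TRUE
print(cycle(annual))                   # every observation falls in "season" 1
```

In the Rmd, a call such as `if (has_seasonal_period(sunspotarea)) ggseasonplot(sunspotarea)` would skip the plot instead of stopping the knit.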

gglagplot(sunspotarea)

ggAcf(sunspotarea, lag.max = 50)

I can spot strong cyclicity in the sunspotarea series. Because the data are annual, there can be no seasonality, and there is no clear evidence of a trend.
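
The cycle length can be read off the ACF as the lag of the first large positive peak after the autocorrelations turn negative. A minimal sketch, assuming base R's datasets::sunspot.year (annual sunspot numbers, 1700-1988) as a related stand-in for sunspotarea, which measures sunspot area rather than counts:

```r
# Assumption: datasets::sunspot.year stands in for sunspotarea (needs no extra package)
r <- acf(sunspot.year, lag.max = 30, plot = FALSE)$acf[-1]  # r[k] is the lag-k autocorrelation

# Skip the initial decay: start searching where the ACF first turns negative
first_negative <- which(r < 0)[1]
# The lag with the highest autocorrelation after that point estimates the cycle length
cycle_length <- first_negative - 1 + which.max(r[first_negative:length(r)])
print(cycle_length)
```

The result should land near the well-known solar cycle of roughly 11 years; because the cycles are aperiodic, this is a rough estimate rather than an exact period.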

Exploring gasoline

head(gasoline)
## Time Series:
## Start = 1991.1 
## End = 1991.19582477755 
## Frequency = 52.1785714285714 
## [1] 6.621 6.433 6.582 7.224 6.875 6.947
autoplot(gasoline)

ggseasonplot(gasoline)

gglagplot(gasoline)

ggAcf(gasoline, lag.max = 1000)

I can clearly spot both seasonality and trend in the gasoline series. The lag plots show strong positive correlation between the series and its lagged values, and together with the seasonal plot this suggests a seasonal component in addition to the trend. The series trends upward over the period, flattening somewhat from around 2005 onward.
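
The unusual frequency in the head(gasoline) output, 52.1785714285714, is the average number of weeks per year, 365.25 / 7. A minimal sketch rebuilding the first six weekly values from that output shows how the start, end and frequency fit together:

```r
# First six gasoline values from the head() output above
weeks_per_year <- 365.25 / 7
x <- ts(c(6.621, 6.433, 6.582, 7.224, 6.875, 6.947),
        start = 1991.1, frequency = weeks_per_year)

# tsp() returns the start time, end time and frequency of the series
print(tsp(x), digits = 15)
```

The end time is the start plus five weeks, 1991.1 + 5 / 52.18, which reproduces the End value printed by head(gasoline).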

References:
https://otexts.com/fpp2/autocorrelation.html
https://otexts.com/fpp2/tspatterns.html
https://otexts.com/fpp2/lag-plots.html