Please submit exercises 2.1, 2.2, 2.3 and 2.6 from the Hyndman online Forecasting book. Please submit both your Rpubs link as well as attach the .rmd file with your code.
library(fpp2)   # attaches forecast, ggplot2 and the data sets used below
library(knitr)  # kable()
head(gold)
## Time Series:
## Start = 1
## End = 6
## Frequency = 1
## [1] 306.25 299.50 303.45 296.75 304.40 298.35
tsdisplay(gold)
head(woolyrnq)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1965 6172 6709 6633 6660
## 1966 6786 6800
tsdisplay(woolyrnq)
head(gas)
## Jan Feb Mar Apr May Jun
## 1956 1709 1646 1794 1878 2173 2321
tsdisplay(gas)
Use autoplot() to plot each of these in separate plots.
autoplot(gold) + ggtitle("Daily Morning Gold Prices ($) Jan 1 1985 - Mar 31 1989") +
ylab("$") + xlab("Days")
autoplot(woolyrnq) + ggtitle("Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994") +
ylab("Tonnes") + xlab("")
autoplot(gas) + ggtitle("Australian monthly gas production: 1956 - 1995") +
ylab("") + xlab("Months")
What is the frequency of each series? Hint: apply the frequency() function.
paste0("Gold Frequency: ", frequency(gold))
## [1] "Gold Frequency: 1"
paste0("Woolyrnq Frequency: ", frequency(woolyrnq))
## [1] "Woolyrnq Frequency: 4"
paste0("Gas Frequency: ", frequency(gas))
## [1] "Gas Frequency: 12"
Use which.max() to spot the outlier in the gold series. Which observation was it?
paste0("Observation with the maximum gold price: ", which.max(gold))
## [1] "Observation with the maximum gold price: 770"
paste0("Maximum gold price: ", gold[which.max(gold)])
## [1] "Maximum gold price: 593.7"
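The same lookup can be sketched on a small toy series to show how the position returned by which.max() maps back to a time index and to the neighbouring values. The numbers below are invented for illustration and are not taken from the gold data.

```r
# Toy series with one injected spike (hypothetical values, not the gold data)
x <- ts(c(300, 302, 299, 301, 450, 300, 298))

i <- which.max(x)   # position of the largest observation
time(x)[i]          # time index of that observation
x[i]                # the value itself

# Look at the neighbours to judge whether the maximum is an isolated spike
window(x, start = i - 1, end = i + 1)
```

Comparing the maximum with the observations on either side is a quick way to decide whether it is a genuine outlier or part of a gradual rise.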
You can read the data into R with the following script:
tute1 <- read.csv('https://raw.githubusercontent.com/niteen11/CUNY_DATA_624/master/Dataset/tute1.csv')
kable(head(tute1))
X | Sales | AdBudget | GDP |
---|---|---|---|
Mar-81 | 1020.2 | 659.2 | 251.8 |
Jun-81 | 889.2 | 589.0 | 290.9 |
Sep-81 | 795.0 | 512.5 | 290.8 |
Dec-81 | 1003.9 | 614.1 | 292.4 |
Mar-82 | 1057.7 | 647.2 | 279.1 |
Jun-82 | 944.4 | 602.0 | 254.0 |
Convert the data to time series
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
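As a rough check of what ts() does with start and frequency, the sketch below builds a quarterly series from a small invented data frame laid out like tute1 (the values and the name df are assumptions for illustration) and inspects the resulting time attributes.

```r
# Hypothetical quarterly data frame mimicking the layout of tute1 (values invented)
df <- data.frame(
  X        = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales    = c(100, 90, 80, 105, 110, 95),
  AdBudget = c(60, 55, 50, 62, 65, 58)
)

# Drop the date label column; the time index comes from start and frequency
qts <- ts(df[, -1], start = c(1981, 1), frequency = 4)

frequency(qts)  # observations per year
start(qts)      # first (year, quarter)
end(qts)        # last (year, quarter)
cycle(qts)      # quarter number of each row
```

Dropping the first column matters because ts() expects numeric data; the date labels are replaced by the regular index implied by start and frequency.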
Construct time series plots of each of the three series
autoplot(mytimeseries, facets=TRUE)
Check what happens when you don’t include facets=TRUE.
autoplot(mytimeseries)
Without facets=TRUE, all three series are drawn in a single panel on a shared y-axis instead of in separate plots.
You can read the data into R with the following script:
retaildata <- readxl::read_excel("C:\\NITEEN\\GitHub\\CUNY_DATA_624\\Dataset\\retail.xlsx", skip=1)
Select one of the time series as follows (but replace the column name with your own chosen column):
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))
Explore your chosen retail time series using the following functions:
autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
autoplot(myts)
ggseasonplot(myts)
ggsubseriesplot(myts)
gglagplot(myts, lags = 12)
ggAcf(myts)
Following the textbook definitions of trend, seasonality and cyclicity, the series appears to have a positive trend: the ACF (autocorrelation function) shows positive values that decrease slowly as the lag increases, which is typical of trended series. The ACF does not appear to show pronounced spikes at the seasonal lags, and there is no evident cyclic behaviour. Overall, sales trend upward over the period.
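To illustrate the ACF pattern described here, the following sketch computes the autocorrelations of a synthetic monthly series made of a linear trend plus a small seasonal wave. The series is an assumption for illustration only and is not the retail data.

```r
# Synthetic monthly series: linear trend plus a mild 12-month wave (invented data)
idx <- 1:120
x <- ts(idx + 10 * sin(2 * pi * idx / 12), frequency = 12)

# Autocorrelations for lags 0..24; element 1 is lag 0
r <- acf(x, lag.max = 24, plot = FALSE)$acf[, 1, 1]

# Lags 1..12: large positive values that shrink slowly when a trend dominates
round(r[2:13], 2)
```

When the trend dominates, the autocorrelations stay positive and decay slowly, and any seasonal pattern shows up only as a small ripple on top of that decay.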
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
Exploring hsales
head(hsales)
## Jan Feb Mar Apr May Jun
## 1973 55 60 68 63 65 61
autoplot(hsales)
ggseasonplot(hsales)
ggsubseriesplot(hsales)
gglagplot(hsales)
ggAcf(hsales, lag.max = 400)
Based on the plots above, we can spot seasonality and cyclicity. The seasonal peak appears in spring (March). The cyclic component shows up as peaks and troughs with a period of roughly 10 years.
Exploring usdeaths
autoplot(usdeaths)
ggseasonplot(usdeaths)
ggsubseriesplot(usdeaths)
gglagplot(usdeaths)
ggAcf(usdeaths, lag.max = 60)
We can spot seasonality in the usdeaths series. In the plots above, July appears to have the highest number of deaths.
Exploring bricksq
head(bricksq)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1956 189 204 208 197
## 1957 187 214
autoplot(bricksq)
ggseasonplot(bricksq)
ggsubseriesplot(bricksq)
gglagplot(bricksq)
ggAcf(bricksq, lag.max = 200)
The plots above reveal a strong trend in the bricksq series, with little seasonality.
Exploring sunspotarea
head(sunspotarea)
## Time Series:
## Start = 1875
## End = 1880
## Frequency = 1
## [1] 213.13333 109.28333 92.85833 22.21667 36.33333 446.75000
autoplot(sunspotarea)
#ggseasonplot(sunspotarea) # Seasonal plots do not work with frequencies of 1
ggseasonplot() and ggsubseriesplot() are skipped because the series is annual (frequency 1) and so has no seasonal period to plot.
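One way to avoid this error is to check the frequency before calling a seasonal plot. The sketch below uses a hypothetical helper, is_seasonal(), and two built-in base R data sets (sunspot.year and AirPassengers) as stand-ins for the series in this exercise.

```r
# Hypothetical helper: a series has a seasonal period only if frequency > 1
is_seasonal <- function(x) frequency(x) > 1

is_seasonal(sunspot.year)   # annual data, so seasonal plots do not apply
is_seasonal(AirPassengers)  # monthly data, so seasonal plots make sense

# Guard pattern: report whether a seasonal plot would be possible
for (name in c("sunspot.year", "AirPassengers")) {
  series <- get(name)
  msg <- if (is_seasonal(series)) "seasonal plot possible" else "skip seasonal plot"
  cat(name, ":", msg, "\n")
}
```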
gglagplot(sunspotarea)
ggAcf(sunspotarea, lag.max = 50)
I can spot strong cyclicity in the sunspotarea series. The data are annual, and there is no clear evidence of trend or seasonality.
Exploring gasoline
head(gasoline)
## Time Series:
## Start = 1991.1
## End = 1991.19582477755
## Frequency = 52.1785714285714
## [1] 6.621 6.433 6.582 7.224 6.875 6.947
autoplot(gasoline)
ggseasonplot(gasoline)
gglagplot(gasoline)
ggAcf(gasoline, lag.max = 1000)
I can certainly spot seasonality and trend in the gasoline series. The lag plot shows some positive correlation, which indicates a seasonal component. The series trends upward over the period, flattening somewhat around and after 2005.
Reference:
https://otexts.com/fpp2/autocorrelation.html
https://otexts.com/fpp2/tspatterns.html
https://otexts.com/fpp2/lag-plots.html