Use the help function to explore what the series gold, woolyrnq and gas represent.
autoplot(gold) +
ggtitle('Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.') +
xlab('Year') +
ylab('Price')
autoplot(woolyrnq) +
ggtitle('Quarterly production of woollen yarn in Australia') +
xlab('Quarter') +
ylab('Production')
## [1] 1
## [1] 4
## [1] 12
gold is a daily series stored with frequency = 1 (no seasonal period is set), woolyrnq is quarterly (frequency = 4) and gas is monthly (frequency = 12).
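The three values printed above are what `frequency()` reports for each series. A minimal sketch, assuming the fpp2 package (which ships these datasets) is installed:

```r
library(fpp2)  # provides gold, woolyrnq and gas

# frequency() returns the number of observations per seasonal period
frequency(gold)      # daily data, stored without a seasonal period
frequency(woolyrnq)  # quarterly
frequency(gas)       # monthly
```

A frequency of 1 only says that no seasonal period has been declared; it does not by itself mean the data are annual.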
## [1] 593.7
## [1] "1987-02-09"
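The maximum gold price was 593.7. The date shown above (9 February 1987) appears to come from counting calendar days from 1 January 1985; if the series skips non-trading days, the actual date is later.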
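One way to locate the maximum is `which.max()`, which returns the position of the largest value. This is a sketch assuming fpp2 is installed; `idx` is just an illustrative name:

```r
library(fpp2)  # provides gold

idx <- which.max(gold)  # position of the largest value (NAs are skipped)
idx
gold[idx]               # the maximum price itself
```

The position is an observation count, not a calendar date, so turning it into a date requires knowing which days the series actually covers.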
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labeled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
(The [,-1] removes the first column which contains the quarters as we don’t need them now.)
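To see what `[,-1]` does when building the time series, here is a small sketch on placeholder data. The numbers and the `Quarter` labels are made up; only the column layout follows the description of tute1.csv, and `mytimeseries` is an illustrative name.

```r
# Placeholder data standing in for tute1.csv: the values are invented,
# only the layout (quarter label + three series) follows the text.
tute1 <- data.frame(
  Quarter  = c("Mar-81", "Jun-81", "Sep-81", "Dec-81"),
  Sales    = c(100, 110, 105, 120),
  AdBudget = c(50, 55, 52, 60),
  GDP      = c(30, 31, 29, 32)
)

# Drop column 1 (the quarter labels) and declare a quarterly series from 1981
mytimeseries <- ts(tute1[, -1], start = 1981, frequency = 4)

colnames(mytimeseries)
frequency(mytimeseries)
start(mytimeseries)
```

With the real file, the data frame would come from `read.csv("tute1.csv", header = TRUE)` instead, assuming the file sits in the working directory.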
Check what happens when you don’t include facets=TRUE.
Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
The second argument (skip=1) is required because the Excel sheet has two header rows.
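The effect of skipping one header row can be illustrated on a small in-memory table. This sketch uses base R's `read.csv()` as a stand-in for `readxl::read_excel()`, which also accepts a `skip` argument; the table contents are made up.

```r
# In-memory stand-in for a sheet with two header rows (values are invented)
raw <- "Retail turnover,New South Wales,Victoria
Series ID,S1,S2
Apr-1982,303.1,41.7
May-1982,297.8,43.1"

# Without skip, the first header row becomes the column names and the
# second header row is read as data, so every column comes in as text
no_skip <- read.csv(text = raw, stringsAsFactors = FALSE)

# With skip = 1 the first row is dropped and the second supplies the names
with_skip <- read.csv(text = raw, skip = 1, stringsAsFactors = FALSE)

sapply(no_skip, class)
sapply(with_skip, class)
```

Only the version read with `skip = 1` has numeric columns that can be passed straight to `ts()`.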
Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
There is a clear trend, with the value increasing steadily over time. We see clear yearly seasonality, with a peak in December and a drop in February. This is confirmed by the lag plot, where lag 12 shows a very strong linear correlation. In addition, while it’s subtle, there is a cyclic pattern: the values around 1990, 2000 and ~2009 sit a little above the otherwise fairly linear trend line.
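The lag-12 relationship read off the lag plot can also be checked numerically. The retail series is not bundled with R, so this sketch uses the built-in `AirPassengers` series (also monthly, with trend and yearly seasonality) purely as a stand-in:

```r
# Stand-in monthly series; swap in your own retail series here
x <- AirPassengers

# Sample autocorrelations; element k + 1 of $acf is the correlation at lag k
r <- acf(x, lag.max = 12, plot = FALSE)$acf
r[12 + 1]  # autocorrelation at lag 12

# The same idea by hand: pair each value with the one 12 months before it
n <- length(x)
cor(x[13:n], x[1:(n - 12)])
```

The hand-computed correlation corresponds to the scatter in the lag-12 panel of `gglagplot()`, while the ACF value uses the overall mean and variance of the series, so the two numbers differ somewhat.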
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
There does not appear to be any clear trend over time. We see clear yearly seasonality, with a peak in March and a trough in December. This is confirmed by the ACF, where lag 12 is significant. Note that sales are also correlated at lag 1 (a given month’s sales are close to those of the previous month). In addition, there is a cyclic pattern with troughs at 1975, 1982 and 1992 and peaks at 1978, 1986 and 1994 (a period of roughly 8 years).
There does not appear to be any clear trend over time. We see clear yearly seasonality, with a peak in July and a trough in February. This is confirmed by the ACF, where lag 12 is significant. Note that deaths are also correlated at lag 1 (a given month is close to the deaths of the previous month). In addition, there might be a slight cyclic pattern, with higher deaths in 1973 and 1979 and a dip in between. It’s hard to tell offhand whether this is truly a cycle or just random fluctuation; we would need more years of data to see if the cycle repeats at all.
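The peak and trough months can be checked by averaging each calendar month across years. This sketch assumes base R's `USAccDeaths` (monthly accidental deaths in the USA, 1973 to 1978) is an acceptable stand-in for usdeaths; `monthly_mean` is an illustrative name.

```r
# Assumption: USAccDeaths from base R stands in for fpp2's usdeaths
deaths <- USAccDeaths

# cycle() gives the month index (1..12) of every observation
monthly_mean <- tapply(deaths, cycle(deaths), mean)

month.abb[which.max(monthly_mean)]  # month with the highest average
month.abb[which.min(monthly_mean)]  # month with the lowest average
```

This is the numerical counterpart of the horizontal mean lines drawn by `ggsubseriesplot()`.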
autoplot(bricksq) +
ggtitle('Australian Quarterly clay brick production') +
xlab('Year') +
ylab('Production')
There is a clear increasing trend from the start of the series to about 1975, after which we see a cyclic pattern with a period of ~8 years. We see clear yearly seasonality with a peak in Q3. This is confirmed by the ACF: because the data are quarterly, the yearly pattern shows up at lag 4 and its multiples, including lag 12. Note that the lag plot shows a strong correlation up until 1975, when the cyclic pattern starts, but we still see a reasonable correlation after 1975. Note that production is also correlated at lag 1 (a given quarter is close to the production of the previous quarter).
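Because bricksq is quarterly, one year corresponds to four lags. The following sketch, assuming fpp2 is installed, prints the autocorrelations at one, two and three years:

```r
library(fpp2)  # provides bricksq

frequency(bricksq)  # observations per year

# Element k + 1 of $acf is the autocorrelation at lag k, counted in quarters
r <- acf(bricksq, lag.max = 12, plot = FALSE, na.action = na.pass)$acf
round(r[c(4, 8, 12) + 1], 2)  # lags of one, two and three years
```

Comparing these values with their neighbours (for example lags 3 and 5) shows how much of the correlation is seasonal rather than simply due to the trend.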
Sunspot activity doesn’t have a traditional seasonal component, but rather a strong cyclic component with a period of 10 to 11 years. Since the period varies slightly and is longer than a year, I’m guessing “seasonality” isn’t the right term, even though the frequency is nearly fixed. On the other hand, cyclic implies irregular, which this isn’t, so maybe it is a 10 to 11 year seasonal effect. That said, the series is annual (frequency = 1), so R has no seasonal period to work with and several of the seasonal time series functions give errors.
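With no seasonal period available, the cycle length can still be estimated from the autocorrelations. This sketch uses base R's `sunspot.year` (annual sunspot numbers) as a stand-in for sunspotarea, which is an assumption; `cycle_lag` is an illustrative name.

```r
# Assumption: sunspot.year from base R stands in for fpp2's sunspotarea
x <- sunspot.year

frequency(x)  # one observation per year, so there is no within-year season

# Estimate the cycle length as the strongest autocorrelation beyond lag 5
r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]  # drop lag 0; r[k] is lag k
cycle_lag <- which.max(r[6:20]) + 5
cycle_lag
```

Skipping the first few lags avoids picking up the short-range correlation between neighbouring years.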
# Note: Since the frequency of this weekly series is non-integer, we get an error when trying to do ggsubseriesplot(). The solution appears to be to set frequency = 52 and then identify the years with 53 weeks and remove 1 week from each. That seems out of scope for this problem, so I'll just skip the problematic ggsubseriesplot().
# gasoline_2 <- ts(as.numeric(gasoline), frequency = 52)  # use ts(); as.ts() ignores a frequency argument
autoplot(gasoline) +
ggtitle('US finished motor gasoline product supplied') +
xlab('Year') +
ylab('million barrels per day')
We see a strong increasing trend from 1992 until ~2008, when the financial collapse hit and caused a massive drop in auto purchases, until the economy picked back up around 2014. We see a seasonal component with a trough in February and a peak in September/October (possibly when new car models hit the market).
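The note about skipping `ggsubseriesplot()` comes down to the weekly frequency not being a whole number. The sketch below uses synthetic random data, not the gasoline series, and assumes the weekly frequency is stored as 365.25 / 7; `weekly` and `weekly_52` are illustrative names.

```r
# Synthetic weekly series (random numbers, not the gasoline data)
set.seed(1)
weekly <- ts(rnorm(520), start = 1991, frequency = 365.25 / 7)

frequency(weekly)                               # about 52.18
frequency(weekly) == round(frequency(weekly))   # not a whole number

# Approximation: treat every year as exactly 52 weeks
weekly_52 <- ts(as.numeric(weekly), start = 1991, frequency = 52)
frequency(weekly_52)
```

Forcing the frequency to 52 lets seasonal plots run, at the cost of the time index drifting by roughly a day or two per year relative to the calendar.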