Use the help function to explore what the series gold, woolyrnq and gas represent.
The gold series describes gold prices (in US dollars) on a daily basis, from January 1, 1985 through March 31, 1989.
The woolyrnq series shows production of woolen yarn (in tonnes) in Australia. The data is gathered on a quarterly basis from Mar 1965 through Sep 1994.
The gas data set shows monthly gas production in Australia from 1956 through 1995.
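Use autoplot() to plot each of these in separate plots.
# gold holds 1108 trading-day observations; forcing start/end with frequency=365.25
# makes ts() recycle values to fill the range, so plot it against its own index.
autoplot(gold) +
ggtitle("Gold Prices") + ylab("US Dollars") +
xlab("Trading Day")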
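One caveat when rebuilding a series with ts(): if start, end and frequency together imply more observations than the data contains, ts() silently recycles the values to fill the range. The sketch below uses a made-up five-value vector (not the gold data) in base R to show the effect; it is an illustration under that assumption, not a statement about any particular plot.

```r
# Hypothetical toy data: five observations.
x <- 1:5

# start/end with frequency = 12 imply 8 monthly slots (Jan-Aug 2000),
# so ts() repeats the data from the beginning to fill the extra slots.
recycled <- ts(x, start = c(2000, 1), end = c(2000, 8), frequency = 12)
print(recycled)

# Leaving out `end` lets the length of the data decide where the series stops.
as_is <- ts(x, start = c(2000, 1), frequency = 12)
print(length(recycled))
print(length(as_is))
```

Comparing length() of the result with length() of the input is a cheap way to catch this before plotting.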
autoplot(woolyrnq) +
ggtitle("Woolen Yarn Production", subtitle="Australia") +
scale_y_continuous(labels = unit_format(accuracy = 1, unit = "K", scale = 1e-3)) +
ylab("Tonnes") +
xlab("Date")
autoplot(gas) +
ggtitle("Gas Production", subtitle="Australia") +
scale_y_continuous(labels = unit_format(unit = "K", scale = 1e-3)) +
ylab("Production") +
xlab("Date")
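The frequency of each series and the location of the largest gold price can be read straight from the objects. This is a minimal sketch, assuming the fpp2 package is installed (it loads forecast, which supplies gold, woolyrnq and gas); the variable names freqs and outlier_idx are only illustrative.

```r
library(fpp2)  # assumed available; attaches forecast, which provides the three series

# Number of observations per seasonal period for each series
freqs <- c(gold = frequency(gold),
           woolyrnq = frequency(woolyrnq),
           gas = frequency(gas))
print(freqs)

# Position and value of the largest gold price (the apparent outlier).
# which.max() skips missing values, which matters because gold contains NAs.
outlier_idx <- which.max(gold)
print(outlier_idx)
print(gold[outlier_idx])
```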
The gold dataset has a frequency of 1, while the woolyrnq dataset has a frequency of 4, and the gas dataset has a frequency of 12.
The outlier in the gold dataset is the 770th observation, which has a value of $593.70.
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- read_csv("tute1.csv")
tutets <- ts(tute1[,-1],start = 1981,frequency = 4)
autoplot(tutets,facets = TRUE)
Check what happens when you don’t include facets=TRUE.
autoplot(tutets)
Omitting facets = TRUE shows all 3 series in the same graph, without breaking them out into facets. This may cause issues with scale on some datasets.
retail <- readxl::read_xlsx("retail.xlsx",skip=1)
# A3349414R series ("Turnover ; Victoria ; Liquor retailing ;")
myts <- ts(retail[,"A3349414R"], frequency = 12, start = c(1982,4))
Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf(). Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
# Initial plot of the data
autoplot(myts) +
labs(title="Liquor Retailing Turnover",subtitle="Victoria") +
xlab("Date") + ylab("Turnover")
ggseasonplot(myts) +
labs(title="Seasonal Plot", subtitle="Liquor Retail Turnover in Victoria") +
ylab("Turnover")
The seasonal plot, while busy, reveals more about the seasonality. Turnover clearly increases at the end of the year.
There may also be a smaller seasonal rise around March that becomes more pronounced as the years go by.
ggsubseriesplot(myts) +
labs(title="Subseries Plot", subtitle="Liquor Retail Turnover in Victoria") +
ylab("Turnover")
gglagplot(myts, lags=12) +
labs(title="Lag Plots", subtitle="Liquor Retail Turnover in Victoria")
This graph shows that turnover is positively correlated with its values in earlier months at most of the 12 lags, to varying degrees.
Let’s drill down and look at some specifics:
gglagplot(myts, lags=1) +
labs(title="Lag Plots", subtitle="Liquor Retail Turnover in Victoria")
gglagplot(myts, lags=12, set.lags = 12) +
labs(title="Lag Plots", subtitle="Liquor Retail Turnover in Victoria")
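Each panel of a lag plot is a scatterplot of the series against a copy of itself shifted back by k periods, so the strength of the pattern in a panel can be summarised by an ordinary correlation. A minimal base-R sketch follows. Here lag_cor is a hypothetical helper, and the series is synthetic (a made-up trend plus a 12-month sine wave) standing in for myts, which needs the retail.xlsx file.

```r
# Hypothetical helper: Pearson correlation between y_t and y_{t-k}.
lag_cor <- function(y, k) {
  y <- as.numeric(y)
  n <- length(y)
  cor(y[(k + 1):n], y[1:(n - k)])
}

# Synthetic monthly stand-in: linear trend plus a 12-month seasonal wave.
t_idx <- 1:120
demo <- ts(0.05 * t_idx + sin(2 * pi * t_idx / 12),
           start = c(1982, 4), frequency = 12)

# Correlations at a short lag, a half-year lag and a full-year lag.
cors <- sapply(c(1, 6, 12), function(k) lag_cor(demo, k))
names(cors) <- paste0("lag", c(1, 6, 12))
print(round(cors, 2))
```

On seasonal monthly data the lag-12 value is typically the one that stands out, which is why the lag-12 panel is worth viewing on its own.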
ggAcf(myts) +
labs(title = "Autocorrelation Plot",
subtitle="Liquor Retail Turnover in Victoria")
The autocorrelation plot shows much the same as what we saw above: there is an upward trend in the data, indicated by large values at small lags that decrease as the lags get larger.
The plot also confirms the slight seasonality we suspected, shown by slightly larger values at lags 12 and 24.
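The same reading can be done on the numbers behind an autocorrelation plot. The sketch below uses base R's acf() on a synthetic monthly series (a made-up stand-in for myts) and compares the coefficients at lags 12 and 24 with the approximate 95% bound of 1.96/sqrt(n) that such plots usually draw as dashed lines.

```r
# Synthetic monthly stand-in: upward trend plus a yearly wave (made-up data).
t_idx <- 1:240
demo <- ts(0.02 * t_idx + sin(2 * pi * t_idx / 12),
           start = c(1982, 4), frequency = 12)

# For a monthly ts, acf() reports lags in years, so element 1 is lag 0,
# element 13 is lag 12 months and element 25 is lag 24 months.
a <- acf(demo, lag.max = 24, plot = FALSE)
r <- as.numeric(a$acf)

bound <- qnorm(0.975) / sqrt(length(demo))  # approximate 95% significance bound
seasonal_lags <- c(lag12 = r[13], lag24 = r[25])
print(round(seasonal_lags, 2))
print(seasonal_lags > bound)

# Trend shows up as slow decay: the lag-1 value exceeds the lag-24 value.
print(r[2] > r[25])

# Seasonality shows up as a bump: lag 12 exceeds the half-year lag 6.
print(r[13] > r[7])
```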
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline. Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
hsales
autoplot(hsales) +
labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")
ggseasonplot(hsales, polar = TRUE) +
labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")
ggsubseriesplot(hsales) +
labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")
gglagplot(hsales) +
labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")
ggAcf(hsales) +
labs(title="Sales of Single-Family Houses", subtitle="USA since 1973")
Looking at the hsales dataset, we see evidence of a cycle (about 7 years or so in length) and seasonality, but little evidence of a trend.
usdeaths
autoplot(usdeaths) +
labs(title="Monthly Accidental Deaths", subtitle="United States") +
scale_y_continuous(labels = comma)
ggseasonplot(usdeaths) +
labs(title="Monthly Accidental Deaths", subtitle="United States") +
scale_y_continuous(labels = comma)
ggsubseriesplot(usdeaths) +
labs(title="Monthly Accidental Deaths", subtitle="United States") +
scale_y_continuous(labels = comma)
gglagplot(usdeaths, lags = 12) +
labs(title="Monthly Accidental Deaths", subtitle="United States") +
scale_y_continuous(labels = unit_format(unit = "K", scale = 1e-3))
ggAcf(usdeaths) +
labs(title="Monthly Accidental Deaths", subtitle="United States")
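Seasonality of this kind can also be checked numerically by averaging the observations for each calendar month. The sketch below uses base R's USAccDeaths, which covers the same topic and period (monthly accidental deaths in the USA, 1973-1978). Treating it as a stand-in for usdeaths is an assumption made here so that the example runs without extra packages.

```r
# Stand-in series from base R (assumed to match usdeaths).
deaths <- USAccDeaths

# cycle() gives each observation's position in the year (1 = Jan, ..., 12 = Dec);
# tapply() then averages the observations month by month.
month_pos <- as.integer(cycle(deaths))
monthly_mean <- setNames(as.numeric(tapply(as.numeric(deaths), month_pos, mean)),
                         month.abb)
print(round(monthly_mean))

# Months with the highest and lowest average counts.
peak_month <- names(which.max(monthly_mean))
trough_month <- names(which.min(monthly_mean))
print(c(peak = peak_month, trough = trough_month))
```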
Looking at the usdeaths dataset, we see definite seasonality. The number of accidental deaths seems to increase in the summer months and decrease in the winter months. There doesn’t appear to be much trend. The values for 1973 are higher, though that might be due to a change in record-keeping methods, definitions of an “accident”, or the federal 55 mph speed limit on highways (signed into law in 1974).
bricksq
autoplot(bricksq) +
labs(title="Quarterly Clay Brick Production", subtitle="Australia") +
scale_y_continuous(labels = comma)
ggseasonplot(bricksq) +
labs(title="Quarterly Clay Brick Production", subtitle="Australia") +
scale_y_continuous(labels = comma)
ggsubseriesplot(bricksq) +
labs(title="Quarterly Clay Brick Production", subtitle="Australia") +
scale_y_continuous(labels = comma)
gglagplot(bricksq, lags = 12) +
labs(title="Quarterly Clay Brick Production", subtitle="Australia") +
scale_y_continuous(labels = unit_format(unit = "K", scale = 1e-3))
ggAcf(bricksq) +
labs(title="Quarterly Clay Brick Production", subtitle="Australia")
Looking at the bricksq dataset, we see an upward trend through about 1980, when the trend seems to flatten. The data also appear to become rather volatile around 1975, which may be the start of a cyclical pattern (increase, big drop, increase, big drop). There does appear to be some slight seasonality, with Q1 generally being lower than the following quarters.
sunspotarea
autoplot(sunspotarea) +
labs(title="Annual Average Sunspot Area") +
scale_y_continuous(labels = comma)
gglagplot(sunspotarea, lags = 12) +
labs(title="Annual Average Sunspot Area") +
scale_y_continuous(labels = unit_format(unit = "K", scale = 1e-3))
ggAcf(sunspotarea) +
labs(title="Annual Average Sunspot Area")
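A rough cycle length can be read off an autocorrelation function as the lag of its first peak after lag 0. The sketch below does that in base R on sunspot.year (yearly sunspot numbers), used here as a stand-in because it ships with R. It is a different measurement from sunspotarea, so treat the result as illustrative only.

```r
# Stand-in annual series from base R (sunspot counts, not sunspot area).
spots <- sunspot.year

a <- acf(spots, lag.max = 30, plot = FALSE)
r <- as.numeric(a$acf)[-1]  # drop lag 0 so r[k] is the autocorrelation at lag k

# A local maximum is where the slope of r changes from rising to falling.
slope_sign <- sign(diff(r))
peak_lags <- which(diff(slope_sign) == -2) + 1

cycle_estimate <- peak_lags[1]  # first peak = approximate cycle length in years
print(cycle_estimate)
```

The first trough of the ACF sits at roughly half the cycle length, so it is worth checking that the peak, rather than the trough, is being read.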
Looking at the sunspotarea dataset, we can see that there is a cycle of about 10-11 years in length. There is no seasonality, as the data are collected annually. As far as a trend, there doesn’t appear to be one (though perhaps there is a larger cycle at work here).
gasoline
autoplot(gasoline) +
labs(title="Weekly Gasoline Supplied", subtitle="United States") +
ylab("Barrels Per Day") +
scale_y_continuous(labels = unit_format(unit = "M"))
ggseasonplot(gasoline) +
labs(title="Weekly Gasoline Supplied", subtitle="United States") +
ylab("Barrels Per Day") +
scale_y_continuous(labels = unit_format(unit = "M"))
ggsubseriesplot(ts(gasoline, frequency = 52)) +
labs(title="Weekly Gasoline Supplied", subtitle="United States") +
ylab("Barrels Per Day") +
scale_y_continuous(labels = unit_format(unit = "M"))
gglagplot(gasoline, lags = 10) +
labs(title="Weekly Gasoline Supplied", subtitle="United States") +
scale_y_continuous(labels = unit_format(unit = "M"))
ggAcf(gasoline) +
labs(title="Weekly Gasoline Supplied", subtitle="United States")
Looking at the gasoline dataset, we see a lot of data. The first graph shows what appears to be an upward trend from the early 1990s until about 2005. The data then seem to decline slightly until about 2012, when another increase begins.
There does appear to be some seasonality in the middle of the year, where the values increase, but it is difficult to see with all the years on one plot.
# end=1995 would stop at the first week of 1995, so extend through the end of that year
ggseasonplot(window(gasoline, end=1995.99)) +
labs(title="Weekly Gasoline Supplied", subtitle="United States") +
ylab("Barrels Per Day") +
scale_y_continuous(labels = unit_format(unit = "M"))
ggseasonplot(window(gasoline, start=1996, end=2000.99)) +
labs(title="Weekly Gasoline Supplied", subtitle="United States") +
ylab("Barrels Per Day") +
scale_y_continuous(labels = unit_format(unit = "M"))
ggseasonplot(window(gasoline, start=2001, end=2005.99)) +
labs(title="Weekly Gasoline Supplied", subtitle="United States") +
ylab("Barrels Per Day") +
scale_y_continuous(labels = unit_format(unit = "M"))
ggseasonplot(window(gasoline, start=2006, end=2010.99)) +
labs(title="Weekly Gasoline Supplied", subtitle="United States") +
ylab("Barrels Per Day") +
scale_y_continuous(labels = unit_format(unit = "M"))
ggseasonplot(window(gasoline, start=2011, end=2017)) +
labs(title="Weekly Gasoline Supplied", subtitle="United States") +
ylab("Barrels Per Day") +
scale_y_continuous(labels = unit_format(unit = "M"))
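When splitting a series this way, it helps to remember how window() reads a bare year: end = 1995 means time 1995.0, the first observation of 1995, not the end of that year. A small base-R sketch with a made-up monthly series shows the difference. The same idea carries over to weekly data, although the exact cut points there are an assumption to check against time(gasoline).

```r
# Made-up monthly series covering Jan 1994 - Dec 1996.
x <- ts(1:36, start = c(1994, 1), frequency = 12)

through_start_1995 <- window(x, end = 1995)        # stops at Jan 1995
through_end_1995 <- window(x, end = c(1995, 12))   # stops at Dec 1995
from_1996 <- window(x, start = 1996)               # starts at Jan 1996

print(length(through_start_1995))  # 13
print(length(through_end_1995))    # 24

# With end = 1995, Feb-Dec 1995 fall in neither piece:
missing_obs <- length(x) - length(through_start_1995) - length(from_1996)
print(missing_obs)                 # 11
```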