library(fpp2)
library(readxl)
#?gold # daily morning gold prices in US dollars (01.01.1985 - 03.31.1989)
#?woolyrnq # quarterly production of woollen yarn in Australia in tons (03.1965 - 09.1994)
#?gas # Australian monthly gas production (1956 - 1995)
autoplot(gold) # no seasonal pattern (frequency 1); clear outlier near day 770
autoplot(woolyrnq) #seasonality (quarter)
autoplot(gas) #seasonality (month) and trend (starting 1970)
lapply(list(gold,woolyrnq,gas), frequency) # apply frequency() to each series
## [[1]]
## [1] 1
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 12
The frequency of the gold series is 1 (one observation per time step, here a day, so no seasonal period is defined), the frequency of the woolyrnq series is 4 (quarterly), and the frequency of the gas series is 12 (monthly).
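As a small illustration (base R only, made-up values), `frequency()` counts observations per seasonal cycle rather than naming a unit of time. `cycle()` then gives each observation's position within that cycle:

```r
# frequency(): number of observations per seasonal cycle (1 = no seasonal period)
annual    <- ts(c(5, 7, 6, 8), start = 1990)             # frequency defaults to 1
quarterly <- ts(1:8, start = c(1981, 1), frequency = 4)
monthly   <- ts(1:24, start = c(1956, 1), frequency = 12)

freqs <- sapply(list(annual, quarterly, monthly), frequency)
print(freqs)

# cycle(): position of each observation within its seasonal cycle (quarter 1-4 here)
print(cycle(quarterly))
```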
which.max(gold) # day 770 - plot observation confirmed
## [1] 770
The maximum of the gold series falls on day 770, which confirms the outlier I spotted in the initial plot.
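A note on `which.max()`: it returns the index of the first maximum and silently drops missing values. That matters for a daily price series with gaps. The sketch below uses made-up prices with one spike and one `NA`:

```r
# Hypothetical daily prices: one missing day and one spike
prices <- ts(c(300, 302, NA, 301, 450, 299))

spike <- which.max(prices)   # NA at position 3 is ignored
print(prices[spike])         # value at the spike
print(time(prices)[spike])   # time index of the spike (start = 1, frequency = 1)
```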
tute1 <- read.csv("tute1.csv", header=TRUE) # 100 x 4
View(tute1)
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4) # remove first column and create time series object starting in 1981 with a frequency of 4 (quarterly)
autoplot(mytimeseries, facets=TRUE)
autoplot(mytimeseries, facets=FALSE)
With facets = TRUE, autoplot() draws the three series in separate panels, each with its own data-specific y-axis but a shared x-axis. With facets = FALSE, the three series are combined in a single plot with shared x- and y-axes.
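To see how `start = 1981, frequency = 4` assigns the 100 rows of tute1 to quarters, here is a sketch on random stand-in data. The column names `Sales`, `AdBudget` and `GDP` are an assumption about tute1's layout. Row i lands in quarter `(i - 1) %% 4 + 1` of year `1981 + (i - 1) %/% 4`, so 100 rows span 25 years:

```r
set.seed(1)
# Random stand-in for tute1[,-1]: 100 rows x 3 columns (column names assumed)
values <- matrix(rnorm(300), nrow = 100, ncol = 3,
                 dimnames = list(NULL, c("Sales", "AdBudget", "GDP")))
quarterly_ts <- ts(values, start = 1981, frequency = 4)

print(start(quarterly_ts))      # first year and quarter
print(end(quarterly_ts))        # last year and quarter
print(time(quarterly_ts)[1:5])  # 1981.00, 1981.25, 1981.50, ...
```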
retaildata <- read_excel("retail.xlsx", skip=1) # 381 x 19
myts <- ts(retaildata[,4], frequency=12, start=c(1982,4)) # 4th column ("A3349338X")
autoplot(myts)
ggseasonplot(myts)
ggsubseriesplot(myts)
gglagplot(myts)
ggAcf(myts)
autoplot()
The time series is trending upward from the start and shows monthly seasonality. Starting in late 1994, there is increasing variation that continues to grow over time. The series also shows a notable dip around the start of the year 2000.
ggseasonplot()
The seasonal plot reflects the initial plot in that more recent years show greater sales. There is variation across years, but more recently, there appear to be common dips in sales in February, June, and November. Notably, irrespective of year, sales show a clear increase from November to December. The holiday shopping season could be one driver of that increase.
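A seasonal plot draws one line per year across the months. Reshaping a monthly series into a year-by-month matrix gives the same view in numbers, and it lets a claim like "sales rise from November to December in every year" be checked directly. This sketch uses a hypothetical series with made-up month effects that starts in January and covers whole years. The retail series starts in April, so it would need trimming first, for example with `window()`:

```r
set.seed(42)
# Hypothetical month effects: February/June/November dips, December spike
month_effect <- c(0, -8, 0, 0, 0, -4, 0, 0, 0, 0, -2, 15)
x <- ts(100 + rep(month_effect, 5) + rnorm(60), start = c(1990, 1), frequency = 12)

# One row per year, one column per month: each row is one line of the seasonal plot
m <- matrix(x, ncol = 12, byrow = TRUE, dimnames = list(1990:1994, month.abb))

nov_to_dec <- m[, "Dec"] - m[, "Nov"]
print(round(nov_to_dec, 1))
print(all(nov_to_dec > 0))   # does the Nov -> Dec increase hold in every year?
```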
ggsubseriesplot()
The seasonal subseries plot confirms the notably higher sales in December and relatively lower sales in February and June. Regardless of month, recent sales have decreased from a relative peak a few years earlier (roughly 5-10 years earlier, based on the prior plots).
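The horizontal lines in a subseries plot are the per-month means taken across all years. A base R sketch of that summary groups by `cycle()`. The series is hypothetical, with the same made-up month effects as a seasonal example:

```r
set.seed(7)
# Hypothetical monthly series, 10 years, with a December peak and a February low
month_effect <- c(0, -8, 0, 0, 0, -4, 0, 0, 0, 0, -2, 15)
x <- ts(100 + rep(month_effect, 10) + rnorm(120), start = c(1985, 1), frequency = 12)

# Mean of each calendar month across years (what the subseries plot's lines show)
monthly_means <- tapply(as.numeric(x), as.vector(cycle(x)), mean)
print(round(monthly_means, 1))
print(which.max(monthly_means))   # month with the highest average
print(which.min(monthly_means))   # month with the lowest average
```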
gglagplot()
The lag plots are faceted and somewhat difficult to interpret. All of the plots show strong positive relationships, which largely reflect the upward trend; seasonality would show up more specifically as a tighter relationship at lag 12. There also appears to be increasing variation as k increases, which reflects the general change in the time series over time.
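Each lag-plot panel shows x_t against x_(t-k), and the correlation of those pairs summarizes the panel. The sketch below uses a hypothetical linear trend plus a 12-month sine wave. It shows that the trend alone keeps every lag strongly positive, while the seasonal component pushes lag 12 to a near-perfect correlation:

```r
# Hypothetical trended monthly series with a 12-month pattern
i <- 1:120
x <- i + 10 * sin(2 * pi * i / 12)

# Correlation between the series and itself shifted back by k steps
lag_cor <- function(x, k) cor(x[-seq_len(k)], x[seq_len(length(x) - k)])

r <- sapply(1:16, function(k) lag_cor(x, k))
print(round(r, 3))
```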
ggAcf()
The autocorrelation plot reflects the trend: the autocorrelations are large and positive and decrease slowly as k increases. The slightly higher values at lags 12 and 24 (multiples of the 12-month seasonal period) suggest seasonality.
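The trend-plus-season ACF pattern can be reproduced with base `acf()` on a hypothetical series (a mild linear trend plus a 12-month sine wave). The autocorrelations fall off with k, but lags 12 and 24 sit above their neighbours:

```r
# Hypothetical monthly series: mild linear trend plus 12-month seasonality
i <- 1:120
x <- ts(0.1 * i + 4 * sin(2 * pi * i / 12), frequency = 12)

# acf()$acf starts at lag 0; dropping it makes r[k] the autocorrelation at lag k
r <- as.numeric(acf(x, lag.max = 30, plot = FALSE)$acf)[-1]
print(round(r[c(1, 11, 12, 13, 23, 24, 25)], 3))
```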
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
plot_function <- function(x) { # draws all five plots for one series (saves space); `x` avoids shadowing stats::ts
  print(autoplot(x))
  print(ggseasonplot(x))
  print(ggsubseriesplot(x))
  print(gglagplot(x, do.lines = FALSE)) # points instead of lines; the line version is an absolute mess
  print(ggAcf(x))
}
#?hsales # monthly sales of new single family homes in US (1973 -)
plot_function(hsales)
autoplot()
Monthly single family home sales show seasonality (monthly). There is no clear overall trend, but there does appear to be cyclicity, of roughly seven to ten years, from trough to peak and back (e.g. 1975 to 1982, 1982 to 1992).
ggseasonplot()
The seasonal plot shows some patterns: notably, home sales tend to increase into early spring, and they tend to decrease into winter.
ggsubseriesplot()
The seasonal subseries plot confirms the seasonal plot in that spring months tend to show higher sales than do late fall and early winter months.
gglagplot()
The lag plots show a somewhat positive relationship regardless of k, though the relationship appears weaker as k increases.
ggAcf()
The autocorrelation plot suggests strong seasonality given the relative peaks at k = 1, 12, and 24. The autocorrelations also decline as k increases, though it's unclear whether that decline indicates a clear trend. Around k = 18, the autocorrelations fall inside the dashed significance bounds, meaning they are not significantly different from zero.
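The dashed lines in an ACF plot are approximate 95% bounds at ±1.96/√T, where T is the number of observations. An autocorrelation inside them is not significantly different from zero. The sketch below applies that rule to a hypothetical vector of autocorrelations:

```r
# Approximate 95% significance bound for sample autocorrelations
acf_bound <- function(n_obs) 1.96 / sqrt(n_obs)

# Hypothetical autocorrelations at lags 1-4 from a series of 100 observations
r <- c(0.85, 0.40, 0.05, -0.30)
significant_lags <- which(abs(r) > acf_bound(100))
print(acf_bound(100))
print(significant_lags)
```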
#?usdeaths # monthly accidental deaths in USA (1973 - 1978)
plot_function(usdeaths)
autoplot()
Monthly accidental deaths in the USA (1973 to 1978) show clear seasonality (monthly) but no apparent trend or cyclicity. There are defined peaks during the summer months and troughs during the winter months.
ggseasonplot()
The seasonal plot shows, irrespective of year, the number of deaths increasing through the spring months to a peak in July, and decreasing through the fall and winter to a trough in February.
ggsubseriesplot()
Likewise, the seasonal subseries plot shows the July peaks and February troughs, though there is some within-month variation in deaths over the course of the time series.
gglagplot()
The faceted lag plots show seasonality, with a positive relationship for k = 1 and a clearer, stronger positive one for k = 12. There are some possibly negative relationships for k = 5 through k = 7.
ggAcf()
The autocorrelation plot resembles a cosine wave, with clear seasonality given the relative peaks at k = 1, 12, and 24 (and relative troughs at k = 6 and 18). Between those peaks and troughs, the positive autocorrelations are larger in magnitude and more clearly significant than the negative ones.
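The cosine shape is what a series dominated by a 12-month cycle produces. For a pure 12-month sine wave, the sample ACF is close to cos(2πk/12), shrunk by roughly (T - k)/T. The sketch below uses a hypothetical noise-free sine over 72 months (the same span as 1973 to 1978). It gives peaks at lags 12 and 24 and troughs at 6 and 18:

```r
# Hypothetical pure seasonal series: six years of monthly data
i <- 1:72
x <- ts(sin(2 * pi * i / 12), frequency = 12)

r <- as.numeric(acf(x, lag.max = 24, plot = FALSE)$acf)[-1]   # r[k] = lag k
# Lag 12 pairs 60 matching points out of 72: r[12] = 30/36
print(round(r[c(1, 6, 12, 18, 24)], 3))
```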
#?bricksq # quarterly clay brick production (1956 - 1994)
plot_function(bricksq)
autoplot()
Quarterly clay brick production shows seasonality (quarterly) and a trend from 1956 to roughly 1975, when production growth began to flatten. There are several large decreases (around 1975, 1983, and 1991) and possibly cycles of roughly seven years (1975 to 1982, 1983 to 1990).
ggseasonplot()
The seasonal plot appears to show slightly higher brick production during quarters 2 and 3, with relatively lower production during quarter 1 in particular. There is clear growth in production over time, to a point, which reflects the prior time series plot.
ggsubseriesplot()
The seasonal subseries plot reveals similar production patterns--upward trend to plateau with greater variation--across quarters, though at different levels. Here again, quarters 2 and 3 have the highest average production, and quarter 1 has the lowest.
gglagplot()
Regardless of the value of k and the quarter, the faceted lag plots show positive relationships, with greater scatter (somewhat weaker correlations) as the lag increases.
ggAcf()
The autocorrelation plot shows seasonality, given the relative peaks at lags that are multiples of four, as well as the overall trend in production, given that the large positive autocorrelations decrease as the lag increases.
#?sunspotarea # annual average sunspot area (1875 - 2015)
autoplot(sunspotarea)
gglagplot(sunspotarea, do.lines = FALSE)
ggAcf(sunspotarea)
autoplot()
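The annual averages show roughly decade-long rises and falls from trough to peak and back. Because their length varies and is not tied to the calendar (the solar cycle averages about 11 years), these patterns are better described as cycles than as seasonality. The peaks trend upward from 1875 to around 1960.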
ggseasonplot()
The function returns an error because the data are annual (frequency 1) and therefore have no seasonal period.
ggsubseriesplot()
The function returns an error because the data are annual (frequency 1) and therefore have no seasonal period.
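Because two of the five plots fail on some series, a helper could check the frequency first. The sketch below is hypothetical: `applicable_plots()` is not part of fpp2 or forecast. It assumes seasonal plots need frequency > 1 and that subseries plots also need a whole-number frequency. The second assumption is a guess based on the weekly gasoline series, whose frequency is about 52.18:

```r
# Hypothetical helper: which of the five plot functions make sense for series x?
applicable_plots <- function(x) {
  f <- frequency(x)
  plots <- c("autoplot", "gglagplot", "ggAcf")
  if (f > 1) plots <- c(plots, "ggseasonplot")                          # needs a seasonal period
  if (f > 1 && f == round(f)) plots <- c(plots, "ggsubseriesplot")      # assumed: integer frequency
  plots
}

print(applicable_plots(ts(1:10, start = 1875)))            # annual
print(applicable_plots(ts(1:24, frequency = 12)))          # monthly
print(applicable_plots(ts(1:200, frequency = 365.25 / 7))) # weekly, fractional frequency
```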
gglagplot()
The faceted lag plots do not depict particularly strong relationships. The plot for k = 1 is somewhat more positive, while the plots for k = 4, 5, and 6 are more negative.
ggAcf()
The autocorrelation plot reflects the patterns apparent in the time series plot. There are relatively strong peaks at k = 1, 10/11, and 21, and relatively strong troughs at k = 5 and 16.
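One way to turn those ACF peaks into a cycle-length estimate is to take the first local maximum after lag 0. The sketch below applies this to a hypothetical fixed 11-year sine wave over 141 years (the span of 1875 to 2015). Real sunspot cycles vary in length, so on the actual data this gives only a rough estimate:

```r
# Hypothetical annual series with a fixed 11-year cycle
i <- 1:141
x <- ts(sin(2 * pi * i / 11), start = 1875)

r <- as.numeric(acf(x, lag.max = 30, plot = FALSE)$acf)[-1]   # r[k] = lag k

# A local maximum is where the slope changes sign from + to -
first_peak <- which(diff(sign(diff(r))) == -2)[1] + 1
print(first_peak)
```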
#?gasoline # US finished motor gasoline product supplied (02.02.1991 - 01.20.2017)
autoplot(gasoline)
ggseasonplot(gasoline)
gglagplot(gasoline, do.lines = FALSE)
ggAcf(gasoline)
autoplot()
US gasoline supply trends upward from 1991 before leveling off around 2007 and decreasing slightly through around 2014. Some seasonality is apparent, though at weekly resolution it is hard to tell from this plot alone that the pattern repeats each year. There could be cyclicity given the slight years-long decrease and then increase from 2007 through 2017.
ggseasonplot()
Given the number of years in the time series, the seasonal plot is cluttered and tough to interpret. Generally, there appears to be slightly greater relative supply during the middle of the year.
ggsubseriesplot()
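The function returns an error, likely because the weekly series has a non-integer frequency (365.25/7, about 52.18), so weeks do not line up exactly from year to year. Regardless, given previous subseries plots, I am unsure how a weekly subseries plot with 52 panels could be visualized clearly.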
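If a weekly subseries-style summary were still wanted, one hypothetical workaround is to bin each observation by its position within the year and then average per bin. The sketch below assumes a frequency of 365.25/7 and uses random stand-in data. Since one week (about 0.01916 of a year) is shorter than one bin (1/52, about 0.01923), no bin is skipped and every year fills all 52 bins:

```r
set.seed(3)
# Hypothetical weekly series: 520 weeks at a fractional frequency of 365.25 / 7
wk <- ts(rnorm(520), start = 1991, frequency = 365.25 / 7)
print(frequency(wk) == round(frequency(wk)))   # FALSE: not a whole number

# Fractional part of time() = position within the year; split it into 52 bins
week_of_year <- floor((as.numeric(time(wk)) %% 1) * 52) + 1
weekly_means <- tapply(as.numeric(wk), week_of_year, mean)
print(length(weekly_means))
```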
gglagplot()
The faceted lag plots reveal relatively strong positive autocorrelations regardless of the value of k or, perhaps, the week of the year. There is no clear evidence of negative autocorrelation.
ggAcf()
The autocorrelation plot suggests the time series features both a trend (large positive autocorrelations that decrease in size as k increases) and seasonality (relative peaks at one-year lags, around k = 52).