Explore what the series gold, woolyrnq and gas represent.
Use autoplot() to plot each of these in separate plots.
Use which.max() to spot the outlier in the gold series. Which observation was it?

This data represents the daily morning gold prices in US dollars from 1 January 1985 to 31 March 1989.
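The series is recorded daily. That is, consecutive entries are separated by 1 day of recorded prices. Note that frequency(gold) reports 1 rather than 365, because the ts object stores no seasonal period.
The outlier in the gold series is the 770th observation, with a value of 593.7. This is the 770th recorded price, which is not necessarily 770 calendar days after 1/1/1985 if the series skips non-trading days.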
data(gold)
autoplot(gold) +
ggtitle('Daily Morning Gold Prices in US dollars', subtitle='1/1/1985 to 31/3/1989') +
labs(x='Days', y='Price') +
geom_point(data=data.frame(x=which.max(gold), y=gold[which.max(gold)]),
aes(x, y), color='red')
which.max(gold)
## [1] 770
gold[which.max(gold)]
## [1] 593.7
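The lookup above can be sketched on a small made-up series, which also shows that which.max() skips missing values and that time() maps the index back to the series' own time scale. The `prices` vector is purely illustrative and is not taken from gold.

```r
# Synthetic stand-in for a daily price series (values are invented).
prices <- ts(c(310, 312, NA, 315, 590, 314, 313), start = 1, frequency = 1)

idx  <- which.max(prices)    # position of the largest non-missing value
peak <- prices[idx]          # the value at that position
when <- time(prices)[idx]    # the time stamp attached to that position

print(c(index = idx, value = peak, time = when))
```

Because the position counts observations rather than calendar days, it should be read against the series' time index and not converted directly into a date.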
The data represents the quarterly production of woollen yarn in Australia, measured in tonnes, from March 1965 to September 1994.
The frequency of this series is quarterly (or 4). That is, each entry’s value is separated by 1 quarter.
data(woolyrnq)
autoplot(woolyrnq) +
ggtitle('Quarterly Yarn Production in Australia', subtitle='3/1965 to 9/1994') +
labs(x='Year', y='Tonnes')
The data represents the monthly gas production of Australia from 1956 to 1995.
The frequency of this series is monthly (or 12). That is, each entry’s value is separated by 1 month.
data(gas)
autoplot(gas) +
ggtitle('Monthly Gas Production in Australia', subtitle='1956 to 1995') +
labs(x='Year', y='Production')
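The frequencies quoted for each series can be checked with frequency(). As a minimal sketch, the three synthetic series below (random values, hypothetical names) mimic how an unlabelled, a quarterly and a monthly ts object are stored; the real gold, woolyrnq and gas objects are assumed to behave the same way.

```r
set.seed(123)

# Hypothetical stand-ins for the three kinds of series discussed above.
no_period <- ts(rnorm(10))                                  # no seasonal period stored
quarterly <- ts(rnorm(8),  start = c(1965, 1), frequency = 4)
monthly   <- ts(rnorm(24), start = c(1956, 1), frequency = 12)

# frequency() returns the number of observations per seasonal period.
freqs <- sapply(list(no_period = no_period,
                     quarterly = quarterly,
                     monthly   = monthly), frequency)
print(freqs)
```

A series built without a `frequency` argument reports 1, whatever the real spacing of its observations is.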
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
mytimeseries <- read_csv('./tute1.csv') %>%
select(-X1) %>%
ts(start=1981, frequency=4)
mytimeseries %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Sales | AdBudget | GDP |
|---|---|---|
| 1020.2 | 659.2 | 251.8 |
| 889.2 | 589.0 | 290.9 |
| 795.0 | 512.5 | 290.8 |
| 1003.9 | 614.1 | 292.4 |
| 1057.7 | 647.2 | 279.1 |
| 944.4 | 602.0 | 254.0 |
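To see what ts(start=1981, frequency=4) attaches to the data, here is a minimal base R sketch that rebuilds the first four rows of the table above by hand instead of reading the CSV. The names `first_rows` and `qts` are illustrative only.

```r
# First four rows of the table above, typed in by hand.
first_rows <- data.frame(
  Sales    = c(1020.2, 889.2, 795.0, 1003.9),
  AdBudget = c(659.2, 589.0, 512.5, 614.1),
  GDP      = c(251.8, 290.9, 290.8, 292.4)
)

# A data frame is coerced to a numeric matrix; each column becomes a series.
qts <- ts(first_rows, start = 1981, frequency = 4)

print(time(qts))    # fractional years: one step is a quarter
print(cycle(qts))   # quarter number (1 to 4) of each row
```

The time index advances by 0.25 per row, which is what lets the plotting functions label the x-axis in years.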
mytimeseries %>%
autoplot(facets=TRUE)
mytimeseries %>%
autoplot()
Facets separate the data into three panels, each with its own y-axis scale. This makes the changes within each series easier to see. The second plot, without facets, may be advantageous if we are interested in the absolute differences between the series rather than their relative changes over time.
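A rough way to quantify why separate panels help is to compare each series' own spread with the span of a single shared axis. The sketch below uses only the six rows shown in the table earlier, so the numbers are indicative at best; `share` is a hypothetical helper variable.

```r
# The six rows shown in the table, typed in by hand.
m <- cbind(
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0),
  GDP      = c(251.8, 290.9, 290.8, 292.4, 279.1, 254.0)
)

spread      <- apply(m, 2, function(v) diff(range(v)))  # max minus min per series
shared_span <- diff(range(m))                           # span of one common y-axis

# Fraction of a shared axis that each series' variation would occupy.
share <- spread / shared_span
print(round(share, 3))
```

On these rows GDP's movement fills only a small slice of a shared axis, so its changes are much easier to read in its own panel.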
retaildata <- readxl::read_excel('./retail.xlsx', skip=1)
retaildata %>%
select(1:5) %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Series ID | A3349335T | A3349627V | A3349338X | A3349398A |
|---|---|---|---|---|
| 1982-04-01 | 303.1 | 41.7 | 63.9 | 408.7 |
| 1982-05-01 | 297.8 | 43.1 | 64.0 | 404.9 |
| 1982-06-01 | 298.0 | 40.3 | 62.7 | 401.0 |
| 1982-07-01 | 307.9 | 40.9 | 65.6 | 414.4 |
| 1982-08-01 | 299.2 | 42.1 | 62.6 | 403.8 |
| 1982-09-01 | 305.4 | 42.0 | 64.4 | 411.8 |
myts <- retaildata %>%
select(A3349335T) %>%
ts(frequency=12, start=c(1982, 4))
myts %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
| A3349335T |
|---|
| 303.1 |
| 297.8 |
| 298.0 |
| 307.9 |
| 299.2 |
| 305.4 |
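The start=c(1982, 4) argument means "period 4 of 1982", which is April for a monthly series. A minimal sketch using the six values shown above (the name `turnover` is illustrative) confirms the dates that ts() assigns and shows how window() selects by date:

```r
# The six values shown in the table above.
turnover <- ts(c(303.1, 297.8, 298.0, 307.9, 299.2, 305.4),
               start = c(1982, 4), frequency = 12)

print(start(turnover))   # year and month of the first observation
print(end(turnover))     # year and month of the last observation

# Keep only the observations from July 1982 onwards.
from_july <- window(turnover, start = c(1982, 7))
print(from_july)
```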
myts %>%
autoplot()
This plot appears to show a general upward trend over time.
myts %>%
ggseasonplot()
This plot, combined with the previous one, appears to show seasonality in the data. Specifically, the data appears to fall in February (in recent years) and rise in December.
myts %>%
ggsubseriesplot()
myts %>%
gglagplot()
The previous two plots appear to show that the data is highly correlated year after year. That is, while the level appears to be rising, the seasonal pattern appears to stay the same. Notice that the most highly correlated lag is lag 12, indicating that while all months are similar to each other, identical months are even more similar.
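What a lag plot shows can be reproduced numerically by correlating the series with a shifted copy of itself. The sketch below uses a synthetic monthly series (a linear trend plus a December bump plus noise), not the retail data; `lag_cor` is a hypothetical helper.

```r
set.seed(1)
n     <- 120
month <- rep(1:12, length.out = n)

# Synthetic series: upward trend, a December bump, and some noise.
x <- 0.5 * seq_len(n) + 10 * (month == 12) + rnorm(n)

# Correlation between x[t] and x[t - k], i.e. one panel of a lag plot.
lag_cor <- function(x, k) {
  cor(x[-seq_len(k)], x[seq_len(length(x) - k)])
}

cors <- sapply(1:12, function(k) lag_cor(x, k))
print(round(cors, 3))
best_lag <- which.max(cors)
```

The trend makes every lag highly correlated, while lag 12 comes out highest because it lines each December up with the previous December.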
myts %>%
ggAcf()
This plot appears to support the previous assertion that the data is highly correlated month after month. The lower correlation as the lag grows can be attributed to the trend.
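The effect of a trend on the autocorrelations can be illustrated with base R's acf(), which computes the same quantities that ggAcf() draws. This is a sketch on synthetic data (a linear trend plus noise), not on myts.

```r
set.seed(42)
n <- 120

# Synthetic monthly series dominated by a linear trend.
x <- ts(0.5 * seq_len(n) + rnorm(n), frequency = 12)

# Autocorrelations at lags 1 to 24 (the first element, lag 0, is dropped).
acf_raw  <- acf(x,       lag.max = 24, plot = FALSE)$acf[-1]
acf_diff <- acf(diff(x), lag.max = 24, plot = FALSE)$acf[-1]

print(round(acf_raw[c(1, 12, 24)], 2))   # large and slowly decaying
print(round(acf_diff[c(1, 12, 24)], 2))  # differencing removes the trend effect
```

With the trend present the autocorrelations start near 1 and shrink slowly as the lag grows. After differencing, the slow decay disappears, which is consistent with attributing it to the trend.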
In conclusion, the data appears to be trending upwards and is highly correlated month over month and year over year, with a seasonal peak in December. There do not appear to be any signs of cyclicity.
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
This question requires the same handful of plots for each time series. To simplify this, I wrote two functions to display the plots. The upside is that they greatly simplify the code. The downside is that I cannot customize each plot, so these plots should really only be used for EDA. I also wrote a third function to display a portion of the data for reference.
SINGLE.PLOT <- function(plot){
tryCatch({
plot %>%
print()
},
error=function(cond){
message(cond)
})
}
PLOT.GENERATOR <- function(data){
data %>%
autoplot() %>%
SINGLE.PLOT()
SINGLE.PLOT(data %>% ggseasonplot())
SINGLE.PLOT(data %>% ggsubseriesplot())
data %>%
ggAcf() %>%
SINGLE.PLOT()
data %>%
gglagplot() %>%
SINGLE.PLOT()
}
DISPLAY.SERIES <- function(data){
data %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
}
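A wrapper like SINGLE.PLOT can catch errors raised while building the plot because R evaluates function arguments lazily: the expression passed in is only evaluated when it is first used, which happens inside tryCatch. The sketch below shows this in base R; `safe_print` is a hypothetical stand-in, and the stop() call simulates a plotting function that fails (for example, a seasonal plot requested for non-seasonal data).

```r
# Hypothetical stand-in for SINGLE.PLOT that reports success or failure.
safe_print <- function(plot) {
  tryCatch({
    print(plot)   # `plot` is evaluated for the first time here
    TRUE
  },
  error = function(cond) {
    message("skipped: ", conditionMessage(cond))
    FALSE
  })
}

ok  <- safe_print(1 + 1)                                    # prints 2
bad <- safe_print(stop("seasonal plot needs seasonal data")) # error is caught
```

The second call does not abort the script: the error is raised inside tryCatch, reported as a message, and the remaining plots can still be drawn.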
This data represents the monthly sales of new one-family houses sold in the USA since 1973.
data(hsales)
hsales %>%
DISPLAY.SERIES()
| x |
|---|
| 55 |
| 60 |
| 68 |
| 63 |
| 65 |
| 61 |
hsales %>%
PLOT.GENERATOR()
hsales appears to show cycles of roughly 7 to 10 years. Notice the troughs around 1975, 1982 and 1992 with peaks in between. This possibly reflects recession and recovery cycles. There is also strong seasonality, with sales being highest in the spring. It makes sense that sales would be higher when the weather is nicer and coming out of the winter. Finally, there is no apparent trend in the data. House sales rise and fall but stay within a fairly standard range of values.
This data represents monthly accidental deaths in the USA.
data(usdeaths)
usdeaths %>%
DISPLAY.SERIES()
| x |
|---|
| 9007 |
| 8106 |
| 8928 |
| 9137 |
| 10017 |
| 10826 |
usdeaths %>%
PLOT.GENERATOR()
usdeaths shows strong seasonality, with deaths peaking in the summer months. This appears to reflect the fact that when the weather is nice more people leave their homes to partake in activities, and these activities lead to more accidental deaths. Because the data is monthly, it cannot reveal within-month effects such as a reporting spike on the first of each month; any regular spikes in the plots reflect the annual seasonal pattern. The apparent yearly cycle is simply another view of this seasonality. There appears to be no longer-term cyclicity or trend in the data.
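The seasonal pattern can be summarised numerically by averaging over each calendar month. The sketch below assumes that base R's built-in USAccDeaths is the same series as usdeaths (its first six values match the table above); that equivalence is an assumption, not something stated in the exercise.

```r
# Assumption: datasets::USAccDeaths holds the same data as usdeaths.
deaths <- datasets::USAccDeaths

# Average deaths for each calendar month across all years.
monthly_mean <- tapply(deaths, cycle(deaths), mean)
names(monthly_mean) <- month.abb
print(round(monthly_mean))

peak_month   <- names(which.max(monthly_mean))
trough_month <- names(which.min(monthly_mean))
print(c(peak = peak_month, trough = trough_month))
```

This is the same information that ggsubseriesplot() shows as the horizontal mean line in each monthly panel.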
This data represents Australian quarterly clay brick production from 1956 to 1994.
data(bricksq)
bricksq %>%
DISPLAY.SERIES()
| x |
|---|
| 189 |
| 204 |
| 208 |
| 197 |
| 187 |
| 214 |
bricksq %>%
PLOT.GENERATOR()
bricksq shows a strong upward trend that levels off around 1975. This suggests a growing need for clay bricks in Australia until the market was saturated in the mid-1970s. Afterwards there appears to be cyclicity that likely aligns with general recessions, which would result in fewer buildings being built (and thus fewer bricks needed). A downturn appears roughly every 10 years, followed by a rebound. There is also strong seasonality, with Q1 having much lower production than the other quarters. This likely reflects the fact that fewer buildings are built during this time of the year.
This data represents annual average of sunspot areas.
data(sunspotarea)
sunspotarea %>%
DISPLAY.SERIES()
| x |
|---|
| 213.13333 |
| 109.28333 |
| 92.85833 |
| 22.21667 |
| 36.33333 |
| 446.75000 |
sunspotarea %>%
PLOT.GENERATOR()
sunspotarea shows strong cyclicity on a roughly decade scale. This seems to indicate that sunspots grow and fade in intensity over a period of many years. Within that, however, there is much noise. The lag plots show that year-over-year changes are widely inconsistent. Because the data is yearly, there is no seasonality. There also does not appear to be any trend, although given the long time scale of the cyclicity, we may need a broader picture to see any trends.
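One way to put a number on a cycle length is to look for the lag at which the autocorrelation peaks again. The sketch below does this on a synthetic annual series with a known 11-year period; the period, amplitude and noise level are assumptions chosen for illustration and are not estimated from sunspotarea.

```r
set.seed(7)
years <- 1:150

# Synthetic annual series: an 11-year sinusoidal cycle plus noise.
x <- ts(100 + 80 * sin(2 * pi * years / 11) + rnorm(150, sd = 15))

# Autocorrelations at lags 1 to 20 (lag 0 dropped).
r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]

# Skip the short lags, which are always high, and find the next peak.
lags      <- 5:20
cycle_len <- lags[which.max(r[lags])]
print(cycle_len)
```

Applied to a real series, the same idea gives only a rough estimate, since real cycles vary in length from one cycle to the next.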
This data represents weekly US motor gasoline supplies from 1991 to 2017.
data(gasoline)
gasoline %>%
DISPLAY.SERIES()
| x |
|---|
| 6.621 |
| 6.433 |
| 6.582 |
| 7.224 |
| 6.875 |
| 6.947 |
gasoline %>%
PLOT.GENERATOR()
gasoline shows a strong upward trend throughout the 1990s until it levels off in the 2000s. I am not confident calling this a trend, as it also correlates highly with the economy. The data may instead be cyclical, with barrels supplied rising and falling with the economy. In general we would expect people to travel less and order fewer things when the economy is poor. There is strong seasonality showing that people tend to travel more during the summer months. This is unsurprising, as people are likely travelling on vacations in July and August.