Using R's random-number functions (e.g. runif()), define the following two ts objects:
set.seed(101) # For reproducibility
# Assume the observation values follow Unif(1, 100)
# The frequency is 365.25 to account for leap years.
# ts() takes start as c(year, period); a third (day) element is not used.
myts1 <- ts(runif(floor(365.25 * 2), 1, 100), start = c(2019, 1), frequency = 365.25)
#Correct Answer
myts2 <- ts(runif(365.25/7*5*26, 1, 100), start = c(2019, 1), frequency = 365.25/7*5)
#library(timeDate)
#Assume starting date is 2019-01-01 and ending date is 2019-06-30
#tS <- timeSequence(from = "2019-01-01", to = "2019-06-30")
#Remove weekends from tS
#inds <- tS[isWeekday(tS)]
#Create time series with "zoo" function
#Assume the observation values follow Unif(1, 100)
#myts2 <- zoo(runif(length(inds), 1, 100), inds, frequency = 1)
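The commented-out approach above relies on the timeDate and zoo packages. As a minimal base-R sketch of the same idea, assuming the same window (2019-01-01 to 2019-06-30) and Unif(1, 100) values, weekdays can be selected with seq() and format(). The names `all_days`, `weekdays_only` and `vals` are illustrative only, and public holidays are not removed.

```r
set.seed(101)  # for reproducibility

# All calendar days in the assumed window
all_days <- seq(as.Date("2019-01-01"), as.Date("2019-06-30"), by = "day")

# "%u" gives the ISO weekday number (1 = Monday, ..., 7 = Sunday),
# which avoids locale-dependent day names
is_weekday <- as.integer(format(all_days, "%u")) <= 5
weekdays_only <- all_days[is_weekday]

# One simulated observation per weekday
vals <- runif(length(weekdays_only), 1, 100)

length(all_days)       # 181 calendar days
length(weekdays_only)  # 129 weekdays
```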
The help() function will help you to explore and describe the existing time series objects in the fpp2 package. Use the help function to explore the following time series: “chicken”, “dole”, “usdeaths”, “gold”, “h02”, “gasoline”
- Plot each time series and describe your observations including the frequency of time series and outliers. (Hint: You can use frequency() and which.max() functions)
- Spot any seasonality, cyclicity, or trend
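The hint about frequency() and which.max() can be sketched as a small helper. `describe_ts` is a hypothetical name, not an fpp2 function, and the built-in USAccDeaths series is used only so that the example runs without fpp2 installed.

```r
# Hypothetical helper: summarise the basic properties of a ts object
describe_ts <- function(x) {
  i <- which.max(x)                      # position of the largest observation
  list(
    frequency = frequency(x),            # observations per seasonal cycle
    start     = start(x),
    end       = end(x),
    max_index = i,
    max_time  = as.numeric(time(x))[i],  # time stamp of the maximum
    max_value = as.numeric(x)[i]
  )
}

# Demonstrated on a built-in monthly series
info <- describe_ts(USAccDeaths)
info$frequency   # 12, i.e. monthly data
info$max_index

# With fpp2 attached, the same call applies to the exercise data, e.g.
# describe_ts(chicken); describe_ts(gold)
```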
Dataset Description: Price of chicken in the United States from 1924 to 1993.
autoplot(chicken) + ggtitle({"Price of Chicken in the U.S."}) + xlab("Year") + ylab("Dollars")
The time plot reveals the following features:
Dataset Description: Monthly total of people on unemployment benefits in Australia from Jan 1956 to Jul 1992.
autoplot(dole) + ggtitle({"Unemployment Benefits in Australia"}) + xlab("Month") + ylab("Total Number of People")
The time plot reveals the following features:
ggseasonplot(dole, year.labels = TRUE, year.labels.left = TRUE) + ggtitle({"Unemployment Benefits in Australia"}) + xlab("Month") + ylab("Total Number of People")
According to the seasonal plot above, the number of people on unemployment benefits increased sharply in 1975 and 1982. It has also risen continuously since 1991, most steeply in 1992.
ggsubseriesplot(dole) + ggtitle({"Unemployment Benefits in Australia"}) + xlab("Month") + ylab("Total Number of People")
The subseries plot above shows no seasonality in this series: the monthly means are very close to one another, so the month is not a factor related to the number of people on unemployment benefits. On the other hand, there is an overall upward trend within each month.
ggAcf(window(dole)) + ggtitle({"Unemployment Benefits in Australia"})
The correlogram confirms our conclusion that there is no seasonality in this time series. In addition, the autocorrelations decrease slowly as the lag increases, which indicates a trend, as observed in the subseries plot.
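To illustrate why a slowly decaying ACF points to a trend, the following sketch compares a simulated trended series with pure noise. Both series are made up for illustration and are not the dole data.

```r
set.seed(1)
n <- 120

# Simulated series: a linear trend plus noise, and pure noise
trended <- ts(seq_len(n) + rnorm(n, sd = 5), frequency = 12)
noise   <- ts(rnorm(n), frequency = 12)

acf_trended <- as.numeric(acf(trended, lag.max = 24, plot = FALSE)$acf)
acf_noise   <- as.numeric(acf(noise,   lag.max = 24, plot = FALSE)$acf)

# Element 1 is lag 0 (always 1); element k + 1 is lag k
round(acf_trended[c(2, 13, 25)], 2)  # lags 1, 12, 24: large, slowly shrinking
round(acf_noise[c(2, 13, 25)], 2)    # lags 1, 12, 24: near zero
```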
Dataset Description: Monthly accidental deaths in USA.
autoplot(usdeaths) + ggtitle({"Accidental Deaths in USA"}) + xlab("Time") + ylab("Total Number of Deaths")
The time plot reveals the following features:
p1 <- ggseasonplot(usdeaths, year.labels = TRUE, year.labels.left = TRUE, main = NULL) + ylab("Total Number of Deaths")
p2 <- ggseasonplot(usdeaths, polar = TRUE, main = NULL)
grid.arrange(p1, p2, ncol = 2, top = "Accidental Deaths in USA")
According to the seasonal plots, it is observed that:
ggsubseriesplot(usdeaths) + ggtitle({"Accidental Deaths in USA"}) + xlab("Time") + ylab("Total Number of Deaths")
The subseries plot shows that the monthly means follow the same pattern as the seasonal plots above, which further confirms the strong seasonality. However, no trend is observed within each month.
ggAcf(window(usdeaths)) + ggtitle({"Accidental Deaths in USA"})
The “scalloped” shape in the correlogram is due to the seasonality. The seasonal period is 12 months, which is the number of lags between two successive peaks. On the other hand, no trend is observed in this plot. In addition, some coefficients fall within the significance bounds, which indicates that the series also contains noise.
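The seasonal peaks and the significance bounds can also be checked numerically. This is a minimal sketch that uses the base-R series USAccDeaths (monthly accidental deaths in the USA, 1973 to 1978) as an assumed stand-in for usdeaths, so that it runs without fpp2. The bound is the usual large-sample 95% limit of ±1.96/√n for white noise.

```r
# USAccDeaths ships with base R; used here as a stand-in for usdeaths
x <- USAccDeaths
r <- as.numeric(acf(x, lag.max = 24, plot = FALSE)$acf)

# Approximate 95% bounds for a white-noise series of the same length
bound <- qnorm(0.975) / sqrt(length(x))

# Element k + 1 of r is the autocorrelation at lag k
lags <- 0:24
acf_table <- data.frame(lag = lags, acf = round(r, 2), significant = abs(r) > bound)

# Half-year lags (6, 18) versus full-year lags (12, 24)
acf_table[acf_table$lag %in% c(6, 12, 18, 24), ]
```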
Dataset Description: Daily morning gold prices in US dollars from January 1st, 1985 to March 31st, 1989.
autoplot(gold) + geom_smooth() + ggtitle({"Daily Morning Gold Prices"}) + xlab("Day") + ylab("Dollars")
## Warning: Removed 34 rows containing non-finite values (stat_smooth).
which.max(gold)
## [1] 770
The time plot reveals the following features:
ggAcf(window(gold)) + ggtitle({"Daily Morning Gold Prices"})
The correlogram confirms that there is no seasonality in this dataset, since no “scalloped” shape is observed. The coefficients decrease slowly as the lag increases, which is due to a trend. However, since the coefficients are small and approach zero, the trend changes direction over the series.
Dataset Description: Total monthly scripts for pharmaceutical products falling under ATC code H02, as recorded by the Australian Health Insurance Commission, from July 1991 to June 2008. Measured in millions of scripts.
autoplot(h02) + ggtitle({"Monthly Corticosteroid Drug Sales in Australia"}) + xlab("Month") + ylab("Millions of Scripts")
The time plot reveals the following features:
p1 <- ggseasonplot(h02, year.labels = TRUE, year.labels.left = TRUE, main = NULL) + ylab("Millions of Scripts")
p2 <- ggseasonplot(h02, polar = TRUE, main = NULL)
grid.arrange(p1, p2, ncol = 2, top = "Monthly Corticosteroid Drug Sales in Australia")
The seasonal plot shows a clear drop every February. Thus, the seasonal period is 12 months. On the other hand, there is a clear increasing trend from February to the following January.
ggsubseriesplot(h02) + ggtitle({"Monthly Corticosteroid Drug Sales in Australia"}) + xlab("Month") + ylab("Millions of Scripts")
The subseries plot shows two interesting features:
ggAcf(window(h02)) + ggtitle({"Monthly Corticosteroid Drug Sales in Australia"})
The correlogram reveals strong seasonality with a period of 12 months.
Dataset Description: Weekly data of US finished motor gasoline product supplied, beginning February 2nd 1991, ending January 20th 2017. Units are “million barrels per day”.
autoplot(gasoline) + geom_smooth() + ggtitle({"US Finished Motor Gasoline Product Supplied"}) + xlab("Week") + ylab("Million Barrels per Day")
The time plot reveals the following features:
ggAcf(window(gasoline)) + ggtitle({"US Finished Motor Gasoline Product Supplied"})
According to the correlogram, there is a trend, because the ACF decreases slowly as the lag increases. Even though there is a decreasing trend between 2007 and 2012, the overall product supplied increases substantially. On the other hand, there is a “scalloped” shape in the correlogram, which means this time series has a seasonal pattern with a period of about 52.18 weeks (one year).
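The non-integer seasonal period of weekly data follows from the average year length. A short sketch of the arithmetic, with a simulated weekly series (not the gasoline data) to show that ts() accepts such a frequency:

```r
# Average number of weeks per year, allowing for leap years
weeks_per_year <- 365.25 / 7
round(weeks_per_year, 2)  # 52.18

# A weekly ts object can be built with this non-integer frequency;
# the values and the start year below are placeholders
set.seed(1)
weekly <- ts(rnorm(200), start = 1991, frequency = weeks_per_year)
frequency(weekly)
```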
Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states and are stored in an Excel file.
Read the data into R and define a ts object of your chosen column.
Use the read_excel() function from the readxl library to read the retail data into R. The first column is selected, which is the sales of supermarket and grocery stores in New South Wales. The data comprise 381 monthly observations from April 1982 to December 2013.
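The code that builds myts is not shown here, so the following is only a sketch of one way it could be defined. The file name "retail.xlsx", the `skip = 1` argument, the column position and the helper name `retail_to_ts` are all assumptions made for illustration.

```r
# Hypothetical helper: turn one column of the retail table into a monthly ts.
# col = 2 assumes column 1 holds the dates and column 2 is the chosen series.
retail_to_ts <- function(df, col = 2) {
  ts(df[[col]], start = c(1982, 4), frequency = 12)
}

# Read the file only if readxl and the (assumed) file are available
if (requireNamespace("readxl", quietly = TRUE) && file.exists("retail.xlsx")) {
  retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
  myts <- retail_to_ts(retaildata)
}

# Date arithmetic check with placeholder values: 381 monthly
# observations starting in April 1982 end in December 2013
placeholder <- data.frame(date = seq_len(381), sales = seq_len(381))
end(retail_to_ts(placeholder))
```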
Explore the defined time series using functions that were described in the lecture (Ex: autoplot(), ggAcf(), gglagplot()). Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
autoplot(myts) + ggtitle({"New South Wales: Sales of Supermarket and Grocery Stores"}) + xlab("Month") + ylab("Sales")
The time plot reveals the following features:
p1 <- ggseasonplot(myts, year.labels = TRUE, year.labels.left = TRUE, main = NULL) + ylab("Sales")
p2 <- ggseasonplot(myts, polar = TRUE, main = NULL)
grid.arrange(p1, p2, ncol = 2, top = "New South Wales: Sales of Supermarket and Grocery Stores")
According to the seasonal plots, it is clear that the sales increase each year. However, there are two outliers, September 2008 and November 2011, with unusually low sales. In addition, there is no seasonal or cyclic pattern shown in the plots.
p1 <- ggsubseriesplot(myts, main = NULL) + xlab("Month") + ylab("Sales")
p2 <- ggAcf(window(myts), main = NULL)
grid.arrange(p1, p2, ncol = 2, top = "New South Wales: Sales of Supermarket and Grocery Stores")
The subseries plot and the correlogram reveal the same features as the seasonal plots: an upward trend and no seasonal or cyclic pattern.
The dj time series object contains 292 consecutive trading days of the Dow Jones Index.
- Use ddj <- diff(dj) to compute the daily changes in the index. Plot ddj and its ACF. Do the changes in the Dow Jones Index look like white noise?
ddj <- diff(dj)
p1 <- autoplot(ddj, main = NULL) + xlab("Day") + ylab("Index")
p2 <- ggAcf(window(ddj))
grid.arrange(p1, p2, ncol = 2, top = "Daily Change in the Dow Jones Index")
The time plot shows random fluctuations over time and no trend is observed. The autocorrelation coefficients are small and close to zero, with some random variation. Therefore, the daily changes in the Dow Jones Index look like white noise.
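A visual reading of the ACF can be complemented with a portmanteau test. The sketch below applies a Ljung-Box test to simulated data (a random walk of the same length as dj, not the real index). The choice of 10 lags is an assumption.

```r
set.seed(123)

# Simulated stand-in for an index: a random walk, whose first
# differences are white noise by construction
index_sim <- ts(cumsum(rnorm(292)))
d_sim <- diff(index_sim)

# Ljung-Box test: the null hypothesis is that the first 10
# autocorrelations are jointly zero (consistent with white noise)
lb <- Box.test(d_sim, lag = 10, type = "Ljung-Box")
lb$p.value  # a large p-value gives no evidence against white noise

# With fpp2 attached, the same test applies to the real data:
# Box.test(diff(dj), lag = 10, type = "Ljung-Box")
```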
The arrivals data set comprises quarterly international arrivals (in thousands) to Australia from Japan, New Zealand, UK and the US.
- Compare the differences between the arrivals from these four countries. Can you identify any unusual observations?
arr_ts <- arrivals
autoplot(arr_ts, facets = TRUE) + geom_smooth() + ggtitle({"International Arrivals to Australia"}) + xlab("Quarter") + ylab("Thousands")
The time plots reveal the following features:
ggpairs(as.data.frame(arr_ts))
According to the correlation plots, the arrivals from the US, the UK and New Zealand have similar patterns, with larger pairwise correlation coefficients. The arrivals from Japan differ from the other three countries because the Japanese series has a trend that changes direction.
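The correlation coefficients shown by ggpairs() can also be tabulated directly. `cor_table` is a hypothetical helper, and the base-R EuStockMarkets series is used only so that the sketch runs without fpp2.

```r
# Hypothetical helper: pairwise correlations between the columns of a
# multivariate time series, rounded for readability
cor_table <- function(mts) {
  round(cor(as.data.frame(mts)), 2)
}

# Demonstrated on a built-in multivariate series with four columns
ct <- cor_table(EuStockMarkets)
ct

# With fpp2 attached: cor_table(arrivals)
```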