Use autoplot() to plot each of these in separate plots (gold, woolyrnq and gas).
# Set minimal theme
theme_set(theme_minimal())
# Autoplot gold data
data(gold)
autoplot(gold)# Autoplot wolyrnq data
data(woolyrnq)
autoplot(woolyrnq)# Autoplot gas data
data(gas)
autoplot(gas)What is the frequency of each series? Hint: apply the frequency() function.
# Show frequencies
print(paste0("Frequency of gold data: ", frequency(gold)))## [1] "Frequency of gold data: 1"
print(paste0("Frequency of woolyrnq data: ", frequency(woolyrnq)))## [1] "Frequency of woolyrnq data: 4"
print(paste0("Frequency of gas data: ", frequency(gas)))## [1] "Frequency of gas data: 12"
Use which.max() to spot the outlier in the gold series. Which observation was it?
# Outlier in gold series
print(paste0("Max in gold data appears at index ", which.max(gold), " (", as.Date("1985-01-01") + which.max(gold), ")"))## [1] "Max in gold data appears at index 770 (1987-02-10)"
print(paste0("Max value in gold data is ", gold[which.max(gold)]))## [1] "Max value in gold data is 593.7"
Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file. You can read the data into R with the following script:
# Load data
GET('https://github.com/klgriffen96/summer23_data624/raw/main/hw_1/retail.xlsx', write_disk(tmpfile <- tempfile(fileext=".xlsx")))## Response [https://raw.githubusercontent.com/klgriffen96/summer23_data624/main/hw_1/retail.xlsx]
## Date: 2023-06-10 15:38
## Status: 200
## Content-Type: application/octet-stream
## Size: 639 kB
## <ON DISK> C:\Users\micha\AppData\Local\Temp\RtmpqsvPO3\filea1898f13ee.xlsx
retaildata <- readxl::read_excel(tmpfile, skip=1)Select one of the time series as follows (but replace the column name with your own chosen column):
# Choose variable A3349415T
myts <- ts(retaildata[,"A3349415T"], frequency=12, start=c(1982, 4))
head(myts, 24)## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1982 15.6 15.8 15.2 15.2 14.5 15.1 16.3 17.5 21.5
## 1983 15.6 16.3 16.8 16.0 16.1 16.1 16.2 17.2 16.9 18.2 19.4 24.8
## 1984 17.5 17.7 18.8
Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
There is a gradual rise from around 20 to about 100 units from 1982 through around 2007, when the trend takes a downward turn, possibly corresponding to the global economic downturn at that time. Seasonally, sales are fairly steady throughout the year until December, when there is an uptick. The uptick appears to be more pronounced in more recent years, with smaller December peaks in the 1980s. There doesn’t appear to be any cyclic patterns to the time series.
# Plots - autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()
# Time-series autoplot
autoplot(myts)# Seasonal plot
myts %>% ggseasonplot()# Seasonal subseries plots
myts %>% ggsubseriesplot()# Lag plot
myts %>%gglagplot(lags=12, do.lines=T)# Autocorrelation function plot (correlogram)
myts %>% ggAcf(lag=144)Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?
There is an upward trend from roughly 975 to 1375. There doesn’t seem to be any cyclicty, but there is a pronounced seasonal component in which the data peak in September and dipping in February.
# Load data
data(plastics)
str(plastics)## Time-Series [1:60] from 1 to 5.92: 742 697 776 898 1030 ...
summary(plastics)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 697.0 947.8 1148.0 1162.4 1362.5 1637.0
head(plastics, 24)## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
# Time-series plot
autoplot(plastics)Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.
# Function to calculate MA
calcMA <- function(tsvals, m) {
# For odd m, each moving average will be calculated using the timeseries values from (t - k) to (t + k), where
# k = (m - 1) / 2, since m = 2k + 1.
# For even m, calculate each average from timeseries values from (t - k_lo) to (t + k_hi), where
# k_lo = -floor((m - 1) / 2)
# and
# k_hi = ceiling((m - 1) / 2)
# After calculating m-MA, caclate 2xm-MA by calling calcMA recursively with m = 2
# Calculate k based on m = 2k + 1
if ((m %% 2) == 0) {
# Even m
k_lo <- floor((m - 1) / 2)
k_hi <- ceiling((m - 1) / 2)
} else {
# Odd m
k_lo <- (m - 1) / 2
k_hi <- k_lo
}
# Find starting and ending values of t
t_start <- 1 + k_lo
t_end <- length(tsvals) - k_hi
# Error check to make sure t_end isn't zero or negative
if (t_end < 1) {
print("Ending value can't be zero or negative!")
return(c(0))
}
# MA calcs
print(paste0("Calculating ", m, " -MA from ", t_start, " to ", t_end, " using sliding window from t-", k_lo, " to t+", k_hi))
retval <- c() # Initialize return value
for (t in seq(t_start, t_end)) {
#print(paste0("t=", t, ", t - k_lo=", t - k_lo, ", t + k_hi=", t+ k_hi, ", mean=", mean(tsvals[(t - k_lo):(t + k_hi)])))
retval <- c(retval, mean(tsvals[(t - k_lo):(t + k_hi)]))
}
# Handle even values of m
if ((m != 2) & (m %% 2) == 0) {
retval <- calcMA(retval, 2)
}
# Pad NAs at front and back to keep the timeseries length the same as the original series
if (m != 2) {
retval <- c(rep(NA, floor(m / 2)), retval)
retval <- c(retval, rep(NA, floor(m / 2)))
}
# Return
return(retval)
}
# Calculate 12-MA
trendcycle <- ts(calcMA(plastics, 12), frequency=12)## [1] "Calculating 12 -MA from 6 to 54 using sliding window from t-5 to t+6"
## [1] "Calculating 2 -MA from 1 to 48 using sliding window from t-0 to t+1"
print("trend-cycle:")## [1] "trend-cycle:"
trendcycle## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500 NA
## Aug Sep Oct Nov Dec
## 1 977.0417 977.0833 978.4167 982.7083 990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5 NA NA NA NA NA
trendcycle %>% autoplot() +
ggtitle("Trend-cycle (2x12 Moving average)")print("")## [1] ""
# Check using built-in function
print("Checking with built-in function")## [1] "Checking with built-in function"
plastics %>% ma(12, centre=T)## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500 NA
## Aug Sep Oct Nov Dec
## 1 977.0417 977.0833 978.4167 982.7083 990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5 NA NA NA NA NA
# Calculate detrended series using classical multiplicative decomposition
detrended <- plastics / trendcycle
print("detrended data:")## [1] "detrended data:"
detrended## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 1.1924766
## 2 0.7406605 0.6922411 0.7571225 0.9007369 1.0511298 1.1598830 1.2103679
## 3 0.8018794 0.7070625 0.7827241 0.9232452 1.0437044 1.1401548 1.1133184
## 4 0.7867903 0.7049913 0.7615439 0.8919870 1.0118138 1.1139108 1.1540627
## 5 0.7492044 0.7466313 0.8152036 0.9375570 1.0864015 1.2296714 NA
## Aug Sep Oct Nov Dec
## 1 1.2445733 1.2363326 1.1559492 0.9880856 0.7905764
## 2 1.2535718 1.2363720 1.1842376 0.9656890 0.8098573
## 3 1.2216078 1.2477323 1.2261603 0.9830556 0.8545771
## 4 1.1979585 1.2216299 1.2046681 1.0442225 0.8885622
## 5 NA NA NA NA NA
# Calculate seasonal component as the average of the detrended values for each period
seasonalvals <- c()
for (i in seq(1, 12)) {
# Create boolean array containing true values for every value corresponding to the month of integer i
bool_array <- c()
for (j in seq(1, length(detrended))) {
bool_array <- c(bool_array, (j %% 12) == (i %% 12))
}
seasonalvals <- c(seasonalvals, mean(detrended[bool_array], na.rm=T))
}
seasonalvals <- ts(rep(seasonalvals, floor(length(detrended) / 12)), frequency=12)
print("seasonal data:")## [1] "seasonal data:"
seasonalvals## Jan Feb Mar Apr May Jun Jul
## 1 0.7696337 0.7127315 0.7791485 0.9133815 1.0482624 1.1609050 1.1675564
## 2 0.7696337 0.7127315 0.7791485 0.9133815 1.0482624 1.1609050 1.1675564
## 3 0.7696337 0.7127315 0.7791485 0.9133815 1.0482624 1.1609050 1.1675564
## 4 0.7696337 0.7127315 0.7791485 0.9133815 1.0482624 1.1609050 1.1675564
## 5 0.7696337 0.7127315 0.7791485 0.9133815 1.0482624 1.1609050 1.1675564
## Aug Sep Oct Nov Dec
## 1 1.2294279 1.2355167 1.1927538 0.9952632 0.8358933
## 2 1.2294279 1.2355167 1.1927538 0.9952632 0.8358933
## 3 1.2294279 1.2355167 1.1927538 0.9952632 0.8358933
## 4 1.2294279 1.2355167 1.1927538 0.9952632 0.8358933
## 5 1.2294279 1.2355167 1.1927538 0.9952632 0.8358933
seasonalvals %>% autoplot() +
ggtitle("Seasonal component")Do the results support the graphical interpretation from part a?
Yes, the trend-cycle graph illustrates that the trend is generally upward from roughly 975 to 1375, which corresponds visually to the average values on the full time-series plot. Likewise, the seasonal graphs are close to the seasonality evident in the full plot, peaking in September and reaching its nadir in February.
Compute and plot the seasonally adjusted data.
# Compute seasonally adjusted data
seasonally_adj <- plastics / seasonalvals
print("seasonally adjusted data:")## [1] "seasonally adjusted data:"
seasonally_adj## Jan Feb Mar Apr May Jun Jul
## 1 964.0950 977.9278 995.9590 983.1598 982.5784 953.5664 997.8105
## 2 962.7957 982.1370 993.3921 1020.3841 1048.4017 1053.4884 1104.8717
## 3 1164.1902 1112.6209 1135.8553 1155.0485 1148.5674 1142.2123 1116.0060
## 4 1235.6528 1208.0285 1203.8783 1214.1695 1215.3446 1224.9064 1272.7436
## 5 1338.2990 1447.9505 1445.1673 1406.8601 1400.4128 1410.1068 1379.8049
## Aug Sep Oct Nov Dec
## 1 989.0780 977.7286 948.2259 975.6213 936.7225
## 2 1097.2584 1085.3759 1086.5612 1071.0735 1077.8888
## 3 1168.0230 1192.2137 1218.1894 1175.5684 1223.8405
## 4 1264.8160 1298.2423 1341.4336 1409.6773 1446.3569
## 5 1307.9255 1236.7295 1190.5223 1124.3257 1211.8772
seasonally_adj %>% autoplot() +
ggtitle("Seasonally adjusted data")Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
As shown below, There is a noticeable effect on the trend, which exhibits a noticeable increase in the moving average over the entire year in which the outlier was introduced. Additionally, the seasonal data exhibits a sharp upward spike for the month in which the outlier was introduced (November). And as expected, the remainder for the month where the outlier was added exhibits a sharp increase.
# Add an outlier near the middle
tmp_plastics <- plastics
tmp_plastics[30] <- tmp_plastics[30] + 500
fit <- tmp_plastics %>%
decompose(type='multiplicative')
autoplot(fit) +
xlab('Year') +
ggtitle("Classical multiplicative decomposition with outlier near middle")# Seasonally adjusted data
autoplot(tmp_plastics, series='Data') +
autolayer(seasadj(fit), series='Seasonally Adjusted') +
scale_color_manual(values=c('gray', 'red'), breaks=c('Data', 'Seasonally Adjusted')) +
ggtitle('Seasonally adjusted data with outlier near middle') +
xlab('Year')Does it make any difference if the outlier is near the end rather than in the middle of the time series?
There is less of an effect if the outlier is introduced at the beginning or end, since a twelve-month moving average is taken; this means the first and last six values of the trend-cycle data can’t be calculated and, therefore, won’t affect the plots for the trend-cycle or seasonally adjusted data.
# Add an outlier near the beginning
tmp_plastics <- plastics
tmp_plastics[2] <- tmp_plastics[2] + 500
fit <- tmp_plastics %>%
decompose(type='multiplicative')
autoplot(fit) +
xlab('Year') +
ggtitle("Classical multiplicative decomposition with outlier near beginning")# Seasonally adjusted data
autoplot(tmp_plastics, series='Data') +
autolayer(seasadj(fit), series='Seasonally Adjusted') +
scale_color_manual(values=c('gray', 'red'), breaks=c('Data', 'Seasonally Adjusted')) +
ggtitle('Seasonally adjusted data with outlier near beginning') +
xlab('Year')tmp_plastics %>%
decompose(type='multiplicative')## $x
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 1197 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
##
## $seasonal
## Jan Feb Mar Apr May Jun Jul
## 1 0.7682391 0.7114401 0.7777367 0.9117265 1.0463629 1.1588014 1.1532684
## 2 0.7682391 0.7114401 0.7777367 0.9117265 1.0463629 1.1588014 1.1532684
## 3 0.7682391 0.7114401 0.7777367 0.9117265 1.0463629 1.1588014 1.1532684
## 4 0.7682391 0.7114401 0.7777367 0.9117265 1.0463629 1.1588014 1.1532684
## 5 0.7682391 0.7114401 0.7777367 0.9117265 1.0463629 1.1588014 1.1532684
## Aug Sep Oct Nov Dec
## 1 1.2207160 1.2332780 1.1905925 0.9934598 0.8343786
## 2 1.2207160 1.2332780 1.1905925 0.9934598 0.8343786
## 3 1.2207160 1.2332780 1.1905925 0.9934598 0.8343786
## 4 1.2207160 1.2332780 1.1905925 0.9934598 0.8343786
## 5 1.2207160 1.2332780 1.1905925 0.9934598 0.8343786
##
## $trend
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 1018.6250
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500 NA
## Aug Sep Oct Nov Dec
## 1 997.8750 977.0833 978.4167 982.7083 990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5 NA NA NA NA NA
##
## $random
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 0.9917020
## 2 0.9641016 0.9730140 0.9734946 0.9879464 1.0045557 1.0009334 1.0495110
## 3 1.0437889 0.9938468 1.0064126 1.0126340 0.9974593 0.9839087 0.9653593
## 4 1.0241477 0.9909356 0.9791796 0.9783493 0.9669817 0.9612612 1.0006888
## 5 0.9752230 1.0494648 1.0481743 1.0283314 1.0382645 1.0611579 NA
## Aug Sep Oct Nov Dec
## 1 0.9982580 1.0024769 0.9709025 0.9945905 0.9475031
## 2 1.0269153 1.0025088 0.9946623 0.9720464 0.9706113
## 3 1.0007306 1.0117203 1.0298740 0.9895273 1.0242078
## 4 0.9813573 0.9905552 1.0118223 1.0510969 1.0649389
## 5 NA NA NA NA NA
##
## $figure
## [1] 0.7682391 0.7114401 0.7777367 0.9117265 1.0463629 1.1588014 1.1532684
## [8] 1.2207160 1.2332780 1.1905925 0.9934598 0.8343786
##
## $type
## [1] "multiplicative"
##
## attr(,"class")
## [1] "decomposed.ts"
# Add an outlier near the end
tmp_plastics <- plastics
tmp_plastics[58] <- tmp_plastics[58] + 500
fit <- tmp_plastics %>%
decompose(type='multiplicative')
autoplot(fit) +
xlab('Year') +
ggtitle("Classical multiplicative decomposition with outlier near end")# Seasonally adjusted data
autoplot(tmp_plastics, series='Data') +
autolayer(seasadj(fit), series='Seasonally Adjusted') +
scale_color_manual(values=c('gray', 'red'), breaks=c('Data', 'Seasonally Adjusted')) +
ggtitle('Seasonally adjusted data with outlier near end') +
xlab('Year')tmp_plastics %>%
decompose(type='multiplicative')## $x
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1920 1119 1013
##
## $seasonal
## Jan Feb Mar Apr May Jun Jul
## 1 0.7683844 0.7115746 0.7778838 0.9083952 1.0384496 1.1497059 1.1656612
## 2 0.7683844 0.7115746 0.7778838 0.9083952 1.0384496 1.1497059 1.1656612
## 3 0.7683844 0.7115746 0.7778838 0.9083952 1.0384496 1.1497059 1.1656612
## 4 0.7683844 0.7115746 0.7778838 0.9083952 1.0384496 1.1497059 1.1656612
## 5 0.7683844 0.7115746 0.7778838 0.9083952 1.0384496 1.1497059 1.1656612
## Aug Sep Oct Nov Dec
## 1 1.2274323 1.2335112 1.1908177 0.9936477 0.8345364
## 2 1.2274323 1.2335112 1.1908177 0.9936477 0.8345364
## 3 1.2274323 1.2335112 1.1908177 0.9936477 0.8345364
## 4 1.2274323 1.2335112 1.1908177 0.9936477 0.8345364
## 5 1.2274323 1.2335112 1.1908177 0.9936477 0.8345364
##
## $trend
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1391.4167 1392.9167 1372.9167 NA
## Aug Sep Oct Nov Dec
## 1 977.0417 977.0833 978.4167 982.7083 990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5 NA NA NA NA NA
##
## $random
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 1.0230045
## 2 0.9639193 0.9728300 0.9733105 0.9915695 1.0122107 1.0088519 1.0383530
## 3 1.0435915 0.9936589 1.0062223 1.0163476 1.0050603 0.9916925 0.9550960
## 4 1.0239540 0.9907482 0.9789945 0.9819372 0.9743504 0.9688659 0.9900498
## 5 0.9750386 1.0492663 1.0479761 1.0166492 1.0148819 1.0370931 NA
## Aug Sep Oct Nov Dec
## 1 1.0139650 1.0022873 0.9707189 0.9944024 0.9473240
## 2 1.0212962 1.0023192 0.9944743 0.9718626 0.9704278
## 3 0.9952548 1.0115290 1.0296793 0.9893402 1.0240141
## 4 0.9759874 0.9903679 1.0116310 1.0508982 1.0647375
## 5 NA NA NA NA NA
##
## $figure
## [1] 0.7683844 0.7115746 0.7778838 0.9083952 1.0384496 1.1497059 1.1656612
## [8] 1.2274323 1.2335112 1.1908177 0.9936477 0.8345364
##
## $type
## [1] "multiplicative"
##
## attr(,"class")
## [1] "decomposed.ts"