library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.0 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.5.0
## ✔ ggplot2 4.0.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(scales)
library(ggrepel)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(patchwork)
library(slider)
global_economy. Plot the GDP per capita for each country
over time. Which country has the highest GDP per capita? How has this
changed over time?
In this code cell I calculate GDP per capita for each country and
apply separate Box-Cox and log transformations to the GDP per capita
values.
I also filter the dataset to keep only values for individual countries,
excluding aggregated values for developed and undeveloped countries.
global_economy
gdp_per_capita <- global_economy |>
mutate(Code = as.character(Code)) |>
filter(nchar(Code) == 3) |>
filter(!is.na(GDP), !is.na(Population)) |>
mutate(GDP_per_capita = GDP / Population) |>
as_tsibble(key = Country, index = Year)
gdp_per_capita
lambda <- BoxCox.lambda(gdp_per_capita$GDP_per_capita, method = 'guerrero')
gdp_per_capita <- gdp_per_capita |>
mutate(GDP_boxcox = BoxCox(GDP_per_capita, lambda)) |>
mutate(GDP_pc_log = log(GDP_per_capita))
gdp_per_capita
Let’s plot GDP per capita for all countries
gdp_per_capita |>
ggplot(aes(x= Year, y = GDP_per_capita, group = Country))+
geom_line(alpha = 0.15)+
labs(
y= 'GDP per capita',
title = 'Plot of GDP/capita for each country'
)
The plot above does not give a clear picture of the GDP per capita series: many countries are compressed at the bottom of the plot.
A better idea is to plot the values on a log scale.
gdp_per_capita |> ggplot(aes(x= Year,
y = GDP_per_capita,
group = Country))+
geom_line(alpha = 0.15) +
scale_y_log10(labels = label_number())+
labs(
y = 'Log transformed GDP/capita',
title = "Log transformed GDP/capita time series for each country"
)
The log transformed plot gives a cleaner view by compressing extreme values and allowing growth patterns to be compared more effectively.
Trend: There is an overall upward trend.
Cyclic: Before the 2000s there is a strong visual
impression of synchronized cyclic patterns among
countries with higher GDP per capita, while lower-income countries do not
share the same cyclic pattern.
After the 2000s, cyclic patterns appear to become more aligned across a
larger group of countries. This suggests that
most countries became integrated into the world economy after
2000.
Seasonality: There is no visual evidence of seasonality in this plot because the data is annual.
highest_gdp <- gdp_per_capita |>
slice_max(GDP_per_capita, n = 1)
print(highest_gdp[c('Country', "Year", 'GDP_per_capita')])
## # A tsibble: 1 x 3 [1Y]
## # Key: Country [1]
## Country Year GDP_per_capita
## <fct> <dbl> <dbl>
## 1 Monaco 2014 185153.
Monaco had the highest GDP per capita in the data: about $185,153 in 2014.
top_gdp_by_years <- gdp_per_capita |>
as_tibble() |>
group_by(Year) |>
slice_max(GDP_per_capita, n = 1, with_ties = FALSE) |>
ungroup() |>
as_tsibble(index = Year, key = Country)
top_gdp_by_years
top_gdp_by_years |>
ggplot(aes(x = Year,
y = GDP_per_capita))+
geom_line()+
geom_point(aes(colour = Country), size = 1.9, alpha = 0.8)+
scale_y_log10(labels = label_number())
There is a rising trend in this series, with a suggestion of a cyclic pattern after the 1980s.
In this plot we see that the United States dominated during the 1960s and 1970s, with the
highest GDP per capita in the world.
However, Monaco took over after 1970 and dominated for most of the years
until the end of the series. Liechtenstein and Luxembourg also had a few years
with the highest GDP per capita in recent years.
global_economy.
In general, it is better to transform economic data to a per capita basis.
us_gdp <- global_economy |>
filter(Code == "USA") |>
mutate(GDP_per_capita = GDP / Population)
p1 <- us_gdp |>
ggplot(aes(x = Year, y = GDP / 1000000)) +
geom_line(color = "blue") +
scale_y_continuous(labels = label_number(big.mark = ",")) +
labs(title = "United States GDP",
y = "GDP (millions)")
p2 <- us_gdp |>
ggplot(aes(x = Year, y = GDP_per_capita)) +
geom_line(color = "blue") +
labs(title = "United States GDP per capita",
y = "GDP per capita")
p1 / p2
The per capita transformation did not make much difference to the shape of the series. For further analysis, however, I would use the per capita data.
aus_livestock.
victoria_livestock <- aus_livestock |>
filter(Animal == 'Bulls, bullocks and steers') |>
filter(State == "Victoria")
victoria_livestock |> autoplot(Count)
The plot reveals a slightly decreasing overall trend, with suggestions of a cyclic pattern.
In general, mathematical transformations such as Box-Cox or a simple log transformation deliver the desired outcome when the data meets the following conditions:
Heteroscedasticity: the data exhibits non-constant variance.
Variability increases as the level of the series increases, indicating a multiplicative rather than an additive structure.
The distribution of the data is positively skewed, since transformations can help make it more symmetric.
The seasonal fluctuations grow proportionally with the trend.
The data is strictly positive.
If the variance is already stable and the seasonal pattern is additive, applying such transformations may not improve the analysis and can mislead interpretation of the data.
It is useful to make a ‘rolling mean vs. rolling standard
deviation’ plot. If the points in that plot trend upwards, the variance
depends on the level, so log/Box-Cox transformations are probably
useful.
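As a rough illustration of this diagnostic, the sketch below compares a simulated additive series with a simulated multiplicative one, using base R only. The data, the window width of 12, and the helper name `roll_stats()` are all assumptions made up for this example; they are not part of the exercise data.

```r
set.seed(42)
n <- 240
level  <- seq(100, 500, length.out = n)      # made-up rising level
season <- sin(2 * pi * (1:n) / 12)           # made-up 12-period seasonality

# Additive: seasonal swings have a fixed size
additive <- level + 20 * season + rnorm(n, sd = 5)
# Multiplicative: seasonal swings grow with the level
multiplicative <- level * (1 + 0.2 * season) * exp(rnorm(n, sd = 0.02))

# Hypothetical helper: rolling mean and SD over windows of `width` points.
# embed() returns one row per window.
roll_stats <- function(x, width = 12) {
  windows <- embed(x, width)
  data.frame(mean = rowMeans(windows), sd = apply(windows, 1, sd))
}

add_stats  <- roll_stats(additive)
mult_stats <- roll_stats(multiplicative)

# Correlation between rolling mean and rolling SD:
# it should be clearly positive only for the multiplicative series.
cor(add_stats$mean, add_stats$sd)
cor(mult_stats$mean, mult_stats$sd)
```

A strongly positive correlation for the multiplicative series corresponds to the upward-trending point cloud described above.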
victoria_livestock_rm_rsd <- victoria_livestock |>
mutate(
rolling_mean = slide_dbl(Count, mean, .before = 11, .complete = TRUE),
rolling_sd = slide_dbl(Count, sd, .before = 11, .complete = TRUE)
) |>
filter(!is.na(rolling_mean), !is.na(rolling_sd))
ggplot(victoria_livestock_rm_rsd,
aes(x = rolling_mean, y = rolling_sd))+
geom_point()
The rolling mean versus rolling standard deviation plot does not show a strong upward linear trend. The points appear scattered and do not form a clear increasing pattern.
From both plots I can say that mathematical transformations will probably not be helpful for this time series.
Let’s apply a Box-Cox transformation and see whether my findings are correct.
lambda_meat <- BoxCox.lambda(victoria_livestock$Count, method = 'guerrero')
lambda_meat
## [1] 0.1615099
Lambda is 0.16, which is close to zero, so this transformation will behave similarly to a log transformation.
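To make the "similar to log" remark concrete, here is a minimal base R sketch of the standard Box-Cox formula. The helper `box_cox_manual()` is a hypothetical name written only for this illustration (it is not the `forecast::BoxCox()` implementation), and the input values are made up.

```r
# Hypothetical helper: standard Box-Cox formula for strictly positive y.
# For lambda = 0 the transformation is defined as log(y).
box_cox_manual <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(50, 100, 200, 400)            # made-up, strictly positive values

bc_small <- box_cox_manual(y, 0.16)  # lambda close to zero
bc_zero  <- box_cox_manual(y, 0)     # identical to log(y)

# Both transforms are increasing and compress large values;
# on this sample they are almost perfectly correlated.
cor(bc_small, bc_zero)
```

The two transformed series differ in scale but have nearly the same shape, which is why a small lambda and a log transformation usually lead to the same visual conclusion.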
Let’s take a look at a plot of Box-Cox transformed time series.
victoria_livestock <- victoria_livestock |>
mutate(Count_bxcx = BoxCox(Count, lambda_meat))
victoria_livestock |> autoplot(Count_bxcx)+
labs(title = 'Box-Cox transformed time series. Lambda = 0.16',
y = 'Count (Box-Cox transformed)')
As we can see, there is not much difference between the plots of the original and the transformed data.
vic_elec.
vic_elec
vic_elec |>
autoplot(Demand) +
labs(title = 'Victorian Electricity Demand. Time period - 30 minutes')
From this plot I can say:
The structure appears additive, not multiplicative.
The overall variance appears stable over the years, except for a small number of larger spikes at the start of each year.
Mathematical transformations are unlikely to be helpful for this series.
Let’s apply a Box-Cox transformation.
lambda_ve <- vic_elec |>
features(Demand, features = guerrero) |>
pull(lambda_guerrero)
lambda_ve
## [1] 0.09993089
Lambda = 0.0999, which is close to zero, so it is appropriate to simply apply a log transformation.
vic_elec |>
mutate(demand_bxcx = box_cox(Demand, 0)) |>
autoplot(demand_bxcx)+
scale_y_continuous(labels = label_number())+
labs(title = 'Log transformed time series',
y = 'Demand (log transformed)')
The log transformation only compressed the variation in this data and did not produce a more homogeneous result. Therefore, a transformation is not needed for this time series.
aus_production.
gas_production <- aus_production |>
select(c('Quarter', 'Gas'))
#gas_production
gas_production |> autoplot(Gas)+
labs(title = "Gas production time series",
y = 'Production Volumes')
The plot reveals the following insights:
There is a clear upward long-run trend along with strong quarterly seasonality.
The size of the seasonal swings increases as the overall level of the series rises, which suggests a multiplicative structure rather than an additive one.
This pattern indicates heteroscedasticity, since variability grows with the level of production over time.
A log transformation can stabilize the variance and convert the multiplicative structure into an additive one.
lambda_gas <- gas_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
lambda_gas
## [1] 0.1095171
Lambda = 0.11, which is close to zero, so a log transformation is reasonable for this time series.
gas_production <- gas_production |>
mutate(Gas_bxcx = box_cox(Gas, lambda_gas)) |>
mutate(Gas_log = log(Gas))
gas_production |> autoplot(Gas_log)+
labs(title = 'Log transformed gas production time series',
y = 'Production volumes (log transformed)' )
We see that the log transformation helped to stabilize the seasonal variation, and the transformed time series now appears to have an additive structure.
canadian_gas data?
canadian_gas
lambda_canada_gas <- canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
lambda_canada_gas
## [1] 0.5767648
canada_gas <- canadian_gas |>
mutate(Volume_bxcx = box_cox(Volume, lambda_canada_gas)) |>
mutate(Volume_log = log(Volume)) |>
mutate(Volume_inverse = box_cox(Volume, -0.3))
canada_gas |>
autoplot(Volume)+
labs(title = 'Canadian gas production time series')
The plot reveals that:
canada_gas <- canada_gas |>
mutate(
rolling_mean = slide_dbl(Volume, mean, .before = 11,.complete = TRUE),
rolling_sd = slide_dbl(Volume, sd, .before = 11, .complete = TRUE)
)
ggplot(canada_gas, aes(rolling_mean, rolling_sd))+
geom_point()
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_point()`).
The rolling mean vs. rolling SD plot does not show a linear rising trend of
rolling SDs against rolling means.
This suggests that the variance is not entirely proportional to the level of
the series.
Box-Cox and log transformations
may not stabilize the seasonal variance.
canada_gas |>
autoplot(Volume_bxcx)+
labs(title = "Box-cox transformed time series")
The plot of the Box-Cox transformed series is almost identical to the plot of the original time series, which confirms my findings above.
canada_gas |>
autoplot(Volume_log)+
labs(title = 'Log transformed time series')
It appears that the log transformation helped to stabilize the seasonal variation until the 1990s, but not for the entire time series.
aus_retail |>
distinct(State)
aus_retail |>
distinct(Industry)
For this exercise I will use “Electrical and electronic goods retailing” time series
ind_name <- 'Electrical and electronic goods retailing'
ind_states <- aus_retail |>
filter(Industry == ind_name)
industry_aus <- ind_states |>
as_tibble() |>
group_by(Month) |>
summarise(Turnover = sum(Turnover, na.rm = TRUE),
State = 'Australia',
Industry = ind_name,
.groups = 'drop'
) |>
as_tsibble(key = c(State, Industry), index = Month)
industry_all <- bind_rows(ind_states, industry_aus) |>
filter(State == 'Australia') |>
arrange(Month)
industry_all |>
autoplot(Turnover)+
labs(title = "Electrical and electronic goods retailing time series")
There is strong evidence of a multiplicative pattern in this time
series.
A Box-Cox transformation can stabilize the seasonal variance.
lambda_retail <- industry_all |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
lambda_retail
## [1] -0.02386302
Lambda = -0.02, so the transformation will be similar to a log transformation.
industry_all <- industry_all |>
mutate(Turnover_bxcx = box_cox(Turnover, lambda_retail))
industry_all |>
autoplot(Turnover_bxcx)
As we can see, the Box-Cox transformation helped to stabilize the seasonal variance and made the data close to homogeneous.
aus_production, Economy class passengers between Melbourne
and Sydney from ansett, and Pedestrian counts at Southern
Cross Station from pedestrian.
aus_production |>
autoplot(Tobacco)
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
From this plot I can say that mathematical transformations may not stabilize the variance. The data does not appear to meet the requirements for mathematical transformation listed in one of the previous exercises:
There are both rising and declining trends in this time series.
The seasonal variance does not appear to change proportionally with the level of the trend.
lambda_tobacco <- aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
lambda_tobacco
## [1] 0.9264636
Lambda = 0.93, which is very close to one, so applying this lambda will likely not change the seasonal variance.
aus_production |>
mutate(Tobacco_bxcx = box_cox(Tobacco, lambda_tobacco)) |>
autoplot(Tobacco_bxcx)+
labs(title = 'Box-cox transformed time series. Lambda = 0.93')
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
As we see Box-cox transformation did not stabilize the variance.
ansett
melsyd_econ_passengers <- ansett |>
filter(
Airports == 'MEL-SYD',
Class == 'Economy'
)
melsyd_econ_passengers|>
autoplot(Passengers)+
labs(title = "Economy class passengers time series")
any(melsyd_econ_passengers$Passengers <= 0)
## [1] TRUE
melsyd_econ_passengers |>
filter(Passengers == 0) |>
select(Week, Passengers)
The seasonal variance does not appear to change
proportionally with the level of the trend.
There are also several extreme outliers causing irregular
fluctuations.
I don’t think mathematical transformations will help to stabilize the variance. A transformation might help after aggregating the series to monthly or quarterly periods. Since that is not a requirement of the exercise, I will not produce results here, but I will experiment with it later on my own.
Also, there is a continuous period of zero values that occurred during
the pilots’ strike. A log transformation is not applicable unless that period
is removed. Another option is to apply a log1p
transformation.
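The difference between the two transformations at zero can be seen in a short base R sketch. The passenger counts below are made up for illustration and are not taken from `ansett`.

```r
passengers <- c(0, 0, 1200, 1500)    # made-up counts that include zeros

log_vals   <- log(passengers)        # zeros become -Inf
log1p_vals <- log1p(passengers)      # log(1 + x): zeros map to 0

log_vals
log1p_vals
```

For large counts `log1p(x)` is almost the same as `log(x)`, so the shape of the series away from the zero period is essentially unchanged.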
Let’s apply Box-cox transformation
lambda_melsyd <- melsyd_econ_passengers |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
lambda_melsyd
## [1] 1.999927
Lambda = 2, which will not help to stabilize the variance; it will increase the variance at higher levels.
melsyd_econ_passengers <- melsyd_econ_passengers |>
mutate(Passengers_bxcx = box_cox(Passengers, lambda_melsyd)) |>
# filter(Passengers>0) |>
mutate(Passengers_log1p = log1p(Passengers))
melsyd_econ_passengers |> autoplot(Passengers_bxcx)+
labs(title = 'Box-cox transformed time series. Lambda = 2')
As we see Box-cox with lambda = 2 did not bring stable seasonal variance.
pedestrian_station <- pedestrian |>
filter(Sensor == 'Southern Cross Station')
pedestrian_station
pedestrian_station |>
autoplot(Count)+
labs(title = "Pedestrian counts at Southern Cross Station time series")
This time series plot is dense, and it is difficult to decide
whether a transformation would be helpful.
I’ll use a rolling mean vs. rolling SD plot to see whether the variance
increases along with the level of the series.
pedestrian_station <- pedestrian_station |>
fill_gaps() |>
mutate(
rolling_mean = slide_dbl(Count, mean, .before = 24*7, na.rm = TRUE, .complete = TRUE),
rolling_sd = slide_dbl(Count, sd, .before = 24*7, na.rm = TRUE, .complete = TRUE)
)
ggplot(pedestrian_station,aes(rolling_mean, rolling_sd))+
geom_point()+
labs(title = "Rolling mean vs rolling SD's plot")
## Warning: Removed 168 rows containing missing values or values outside the scale range
## (`geom_point()`).
This plot shows that as the rolling mean increases, the rolling standard deviation also increases. This is strong evidence of heteroscedasticity and a multiplicative structure.
This time series likely has multiple seasonalities:
strong daily seasonality (rush hours versus late-night hours)
weekly seasonality (weekends probably have fewer pedestrian crossings than weekdays)
possible monthly seasonality
This strongly suggests that the seasonal variance of the time series can
be stabilized by a transformation.
lambda_pedestrian <- pedestrian_station |>
features(Count, features = guerrero) |>
pull(lambda_guerrero)
lambda_pedestrian
## [1] -0.2413498
Lambda = -0.24, which is reasonably close to zero, so a log transformation is a sensible approximation.
Let’s apply a log transformation.
pedestrian_station <- pedestrian_station |>
mutate(Count_bxcx = box_cox(Count, 0))
pedestrian_station |>
autoplot(Count_bxcx) +
labs(title = 'Log transformed time series',
y = 'Pedestrians count (log transformed)',
x = "Time period 1 hour")
After the transformation we see that:
the original high values are compressed
extreme spikes are reduced
there is still a wide vertical spread
the pattern is still very dense (because it is hourly data)
We just need 7 positions (x1..x7), but we won’t assign values.
We’ll compute how many times each position appears in the 3 windows.
w <- rep(0, 7) # weights for x1..x7
Here each 5-term window contributes 1/5 to its included positions
w[1:5] <- w[1:5] + 1/5 # window 1: x1..x5
w[2:6] <- w[2:6] + 1/5 # window 2: x2..x6
w[3:7] <- w[3:7] + 1/5 # window 3: x3..x7
Now we take the 3-term average of the three windows, that is, we multiply the summed contributions by 1/3.
w <- w / 3
w
## [1] 0.06666667 0.13333333 0.20000000 0.20000000 0.20000000 0.13333333 0.06666667
round(w, 3)
## [1] 0.067 0.133 0.200 0.200 0.200 0.133 0.067
The result is the set of weights of a 7-term weighted moving average: 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
Let’s label the weights by position. They sum to 1, as moving-average weights should (1/15 + 2/15 + 3/15 + 3/15 + 3/15 + 2/15 + 1/15 = 15/15).
names(w) <- paste0("x", 1:7)
w
## x1 x2 x3 x4 x5 x6 x7
## 0.06666667 0.13333333 0.20000000 0.20000000 0.20000000 0.13333333 0.06666667
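As a numerical cross-check of this equivalence, the sketch below applies a 5-term moving average followed by a 3-term moving average to arbitrary simulated data and compares the result with a single pass of the 7-term weights. It uses `stats::filter()` from base R, called with its namespace because `dplyr::filter()` masks it once fpp3 is loaded; the data are random numbers chosen only for illustration.

```r
set.seed(1)
x <- rnorm(30)                       # arbitrary made-up data

# 5-term centred moving average, then a 3-term average of that result
ma5   <- stats::filter(x, rep(1/5, 5), sides = 2)
ma3x5 <- stats::filter(ma5, rep(1/3, 3), sides = 2)

# Single pass with the combined 7-term weights
w7  <- c(1, 2, 3, 3, 3, 2, 1) / 15
wma <- stats::filter(x, w7, sides = 2)

# The weights add up to 1 and both routes give the same values
# (the first and last three positions are NA in both results).
weights_sum_to_one <- isTRUE(all.equal(sum(w7), 1))
same_result <- isTRUE(all.equal(as.numeric(ma3x5), as.numeric(wma)))
weights_sum_to_one
same_result
```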
aus_production.
gas <- tail(aus_production, 5*4) |>
select(Gas)
gas
7a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
gas |>
autoplot(Gas)
The time series plot reveals the following:
Seasonality: Production is lowest in each Q1, increases in Q2, peaks in Q3, and then declines again in Q4. This pattern repeats across all years, which is strong evidence of quarterly seasonality.
Trend-cycle: There is strong
evidence of an upward trend, because the overall level of production increases
from year to year. I cannot say that multi-year cycles appear in the given
data.
7b. Use classical_decomposition with
type=multiplicative to calculate the trend-cycle and
seasonal indices.
gas_indices <- gas |>
model(
classical_decomposition(Gas, type = 'multiplicative')
) |>
components()
gas_indices |>
select(Quarter, trend, seasonal)
7c. Do the results support the graphical interpretation from part a?
We can see that the trend values rise from year to year,
which confirms my previous observation. There is also no evidence of
long-term cycles; we might need to observe more data to find out whether
there are cycles in gas production.
The seasonal indices show quarterly seasonality:
Q1 has the smallest index in each year (index values less than 1)
Q2 indices are above the average of 1
Q3 indices are the highest in each year
Q4 indices are below the average of 1.
This confirms strong quarterly seasonality.
7d. Compute and plot the seasonally adjusted data.
gas_season_adj <- gas_indices |>
mutate(seasonal_adj = Gas/seasonal)
gas_season_adj |>
select(Quarter, trend, seasonal, seasonal_adj)
gas_season_adj |>
autoplot(seasonal_adj)
The quarterly fluctuations are removed in the seasonally adjusted series, which now consists of the trend and a small unexplained remainder component. This confirms the rising trend in the time series.
7e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas_with_outlier <- gas
gas_with_outlier$Gas[6] <- gas_with_outlier$Gas[6] + 300
gas_with_outlier
gas_with_outlier_components <- gas_with_outlier |>
model(classical_decomposition(Gas, type = 'multiplicative')
) |>
components() |>
mutate(seasonal_adjust = Gas / seasonal)
gas_with_outlier_components |>
select(Quarter, trend, seasonal, seasonal_adjust)
gas_with_outlier_components |>
autoplot(seasonal_adjust)
Classical decomposition applies moving averages to estimate
the trend and seasonal components, and for a quarterly
time series it uses a centered 2 x 4 MA. Moving averages are sensitive to
extreme outliers, and a single extreme outlier in a quarterly
series changes the trend level at the outlier itself and at the 2 periods before and 2 periods after
it.
Therefore, both the trend-cycle and the seasonally adjusted series become
locally biased around this extreme outlier.
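The reach of a single outlier through the centered 2 x 4 MA can be sketched in base R. The flat series of 20 values below is made up for illustration (it is not the gas data), and the outlier is placed at position 6 to mirror the experiment above.

```r
# Weights of the centred 2x4 moving average used for quarterly data
w_2x4 <- c(1/8, 1/4, 1/4, 1/4, 1/8)

y <- rep(100, 20)                    # made-up flat quarterly series
y_out <- y
y_out[6] <- y_out[6] + 300           # outlier at position 6

trend_clean <- stats::filter(y, w_2x4, sides = 2)
trend_out   <- stats::filter(y_out, w_2x4, sides = 2)

# Positions whose trend estimate changed because of the outlier
affected <- which(as.numeric(abs(trend_out - trend_clean)) > 1e-9)
affected
```

Only positions 4 to 8 change, that is, the outlier and two periods on each side, because the filter spans five observations.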
7f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Let’s set the last period of the series as an outlier.
gas_with_end_outlier <- gas
gas_with_end_outlier$Gas[20] <- gas_with_end_outlier$Gas[20] + 300
gas_with_end_outlier
gas_with_end_outlier_components <- gas_with_end_outlier |>
model(classical_decomposition(Gas, type = 'multiplicative')
) |>
components() |>
mutate(seasonal_adjust = Gas / seasonal)
gas_with_end_outlier_components |>
select(Quarter, trend, seasonal, seasonal_adjust)
gas_with_end_outlier_components |>
autoplot(seasonal_adjust)
As mentioned earlier, classical decomposition uses moving averages,
and an outlier affects the trend locally around it. However, in
this case the outlier occurs at the end of the series, and the centered MA cannot be
calculated for the last two periods. As a result, fewer
complete moving-average windows include the outlier.
Therefore, the distortion is less symmetric and affects fewer
trend estimates compared to an outlier in the middle of the
series.
The same reasoning would apply if an outlier were at the first period of the
series.
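A small base R sketch shows how few trend estimates an end-of-series outlier can reach. As before, the flat series of 20 values is made up for illustration and is not the gas data.

```r
# Weights of the centred 2x4 moving average used for quarterly data
w_2x4 <- c(1/8, 1/4, 1/4, 1/4, 1/8)

y <- rep(100, 20)                    # made-up flat quarterly series
y_end <- y
y_end[20] <- y_end[20] + 300         # outlier in the last period

trend_clean <- stats::filter(y, w_2x4, sides = 2)
trend_end   <- stats::filter(y_end, w_2x4, sides = 2)

# The centred MA is undefined for the first two and last two periods
missing_trend <- which(is.na(as.numeric(trend_end)))

# Positions whose trend estimate changed because of the outlier
affected_end <- which(as.numeric(abs(trend_end - trend_clean)) > 1e-9)

missing_trend
affected_end
```

Here only the trend estimate at position 18 moves, and it receives the outlier with the small edge weight of 1/8, while the trend at positions 19 and 20 is missing altogether.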
However, even when located at the boundaries, an outlier can still
create the illusion of a sudden increase or decrease in the underlying
trend, especially in the last few observed periods. This can lead to
misleading interpretations of recent changes in the series.
As a result, extreme outliers should always be investigated
before applying classical decomposition, because this method is not
robust to outliers and may produce biased trend and seasonal
patterns.
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries
myseries |>
autoplot(Turnover)
The plot reveals the following insights.
Trend: There is strong evidence of an upward trend.
Seasonality: The same pattern of fluctuations repeats from year to year, which is strong evidence of seasonality in the series.
Outliers: There is a suggestion of an extreme outlier around 1997-1998.
First, let’s apply classical decomposition so that we can compare it with the X11 decomposition that follows.
myseries |>
model(
classical_decomposition(Turnover, type = 'multiplicative')
) |>
components() |>
autoplot() +
labs(title = 'Classical decomposition of liquor sales in Australia')
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
myseries_x11_dcmp <- myseries |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
myseries_x11_dcmp |>
autoplot()+
labs(title = "X11 decomposition of liquor sales in Australia")
The classical and X11 decomposition plots reveal the following findings.
Trend: Both decomposition methods show an
upward trend over the years.
Classical decomposition applies a centered moving average, which results in
a smoother trend line.
X11, however, uses iterative procedures that make the trend
component respond more to medium-term movements and structural
changes, so it appears less smooth than the classical
trend.
Seasonality: The seasonal components differ
because classical decomposition averages the
detrended values for each period of the year. As a result, the seasonal
pattern has the same shape from year to year.
X11, on the other hand, allows the seasonal factors to vary gradually over
time through iterative filtering procedures.
As a result, the classical seasonal component appears stable and
repetitive, while the X11 seasonal component offers more flexibility
and time variation.
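The averaging step of the classical method can be sketched by hand in base R. This is a simplified illustration on a made-up quarterly series, under the assumption of a multiplicative structure; it is not the `classical_decomposition()` implementation, and the variable names are chosen only for this example.

```r
# Made-up quarterly series: linear trend times a fixed seasonal pattern
true_seasonal <- c(0.8, 1.1, 1.3, 0.8)
n_years <- 6
trend <- seq(100, by = 5, length.out = 4 * n_years)
y <- trend * rep(true_seasonal, n_years)

# Step 1: estimate the trend with a centred 2x4 moving average
trend_hat <- stats::filter(y, c(1/8, 1/4, 1/4, 1/4, 1/8), sides = 2)

# Step 2: detrend (multiplicative), then average the ratios by quarter
detrended <- as.numeric(y / trend_hat)
quarter <- rep(1:4, n_years)
idx <- tapply(detrended, quarter, mean, na.rm = TRUE)

# Step 3: normalise so that the indices average to 1
seasonal_idx <- idx / mean(idx)
round(seasonal_idx, 2)
```

Because step 2 produces a single average per quarter, the same four indices are reused in every year, which is why the classical seasonal component cannot change shape over time.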
Random/Irregular components: Both
classical and X11 reveal significant irregular fluctuations around
1987-1988.
This tells us that these fluctuations are not explained by the trend and
seasonality. Movements that are not explained by trend and seasonality
usually leak into the remainder/irregular component.
This strongly suggests the presence of outliers in the
series, and these outliers require further investigation.
a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
Trend: The STL decomposition reveals a rising long-term trend, with the level of the labor series rising from about 6500 to 9000 units. The large scale range tells us that most of the variation in the series is explained by the trend component.
Seasonality: The monthly seasonal component
is roughly in the range of +-100 units. This tells us that the seasonal component
is a secondary source of variation in this series.
There is also visual evidence of a slight increase in seasonal variation
over the years.
Remainder: Most of the values are in
the range of +-50 units, with a few reaching the +-100 units range.
There are a couple of extreme drops around 1992-1993 reaching -400 units,
which indicates the presence of outliers in this time series.
b. Is the recession of 1991/1992 visible in the estimated
components?
Yes. The extreme outliers in the remainder component are explained by the recession of the Australian
economy during 1992-1993.
Also, the seasonal subseries in Figure 2 reveal that the seasonal effects
are not constant over the years.
For example:
in March and April the values rise and then fall over the years
in October, December, January and February they fall and then rise
the rest of the months show either a rise or a fall over the years
Therefore, we cannot assume constant seasonal indices over the years for this series.