Consider the GDP information in global_economy.
Plot the
GDP per capita for each country over time. Which country has the highest
GDP per capita? How has this changed over time?
The legend had to be hidden in order to see the actual plot since there are 263 countries within the data set.
global_economy %>%
autoplot(GDP/Population, show.legend = FALSE)
## Warning: Removed 3242 rows containing missing values (`geom_line()`).
global_economy %>%
mutate(GDP_per_capita = GDP/Population) %>%
top_n(n = 1)
## Selecting by GDP_per_capita
## # A tsibble: 1 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Populat…¹ GDP_p…²
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7060236168. 7.18 NA NA NA 38132 185153.
## # … with abbreviated variable names ¹Population, ²GDP_per_capita
global_economy %>%
filter(Country == "Monaco") %>%
mutate(GDP_per_capita = GDP/Population,
`5-MA` = slider::slide_dbl(GDP_per_capita, mean, .before = 2, .after = 2, .complete = TRUE)) %>%
autoplot(GDP_per_capita) +
geom_line(aes(y = `5-MA`), colour = "#D55E00") +
labs(y = "GDP Per Capita",
title = "Monaco GDP Per Capita Time Series") +
guides(colour = guide_legend(title = "series"))
## Warning: Removed 11 rows containing missing values (`geom_line()`).
## Warning: Removed 15 rows containing missing values (`geom_line()`).
The output above shows us that Monaco has the highest GDP per capita. The plot that was generated above also shows us that the GDP per capita (black line) of Monaco has steadily risen over the years, with some significant dips in the mid-1980s and early-2000s, but rising back up after several years. The general trend and these dips are also reflected in the 5-MA (red line).
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
global_economy
.aus_livestock
.vic_elec
.aus_production
.global_economy
.global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP) +
labs(title = "GDP of the United States")
Since GDP changes with population, we have to apply a population adjustment, where we divide the GDP by the population.
global_economy %>%
filter(Country == "United States") %>%
mutate(GDP_per_capita = GDP/Population) %>%
autoplot(GDP_per_capita) +
labs(title = "GDP Per Capita of the United States")
aus_livestock
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
autoplot(Count) +
labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers")
It does not look like that the seasonality varies with respect to the level of the series. We see a giant drop that occurred in 1979, but we have too little data before this drop to determine if the seasonality of the data was any different after this drop. Let’s apply a Box-Cox anyways just to see what the graph looks like.
lambda <- aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
autoplot(box_cox(Count, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Count with $\\lambda$ = ",
round(lambda,2))))
The guerrero
transformation gives us a lambda value of
-0.04, which is pretty close to zero. Also, as we can see, the seasonal
variation doesn’t look all that different when comparing the two
graphs.
vic_elec
vic_elec %>%
autoplot(Demand) %>%
labs(title = "Victorian Electricity Demand")
## [[1]]
##
## $title
## [1] "Victorian Electricity Demand"
##
## attr(,"class")
## [1] "labels"
I don’t think that we have to apply any transformations here.
aus_production
aus_production %>%
autoplot(Gas) %>%
labs(title = "Gas Production")
## [[1]]
##
## $title
## [1] "Gas Production"
##
## attr(,"class")
## [1] "labels"
Right off the bat, we should apply a Box-Cox transformation, to account for the increase in variation in the series that we see in the plot above.
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
The output above shows us that a \(\lambda\) value of 0.11 has transformed the data to a point where the size of the seasonal variation is about the same across the entire series.
Why is a Box-Cox transformation unhelpful for the
canadian_gas
data?
canadian_gas %>%
autoplot(Volume)
A Box-Cox transformation would be unhelpful for the
canadian_gas
data because the size of the seasonal
variation is not significant enough to warrant the use of a Box-Cox
transformation across the entire data set as shown in the plot above. We
see an increase in the seasonal variation between 1970 to 1990, but this
variation is not as significant as the variation shown in the gas
production data in aus_production
.
What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
set.seed(23987)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Turnover`
The output above shows us that some sort of transformation must take place in order to reduce the seasonal variability that is present in the plot above. The variability increases as the level of the series increases.
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries %>%
autoplot(box_cox(Turnover, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed turnover with $\\lambda$ = ",
round(lambda,2))))
By using the guerrero
feature, the graph above shows us
that a \(\lambda\) of 0.14 gives us a
graph where the seasonal variation is consistent across the entire
series. Therefore, I would go with a Box-Cox transformation with a \(\lambda\) value of 0.14 for my dataset.
For the following series, find an appropriate Box-Cox transformation
in order to stabilize the variance. Tobacco from
aus_production
, Economy class passengers between Melbourne
and Sydney from ansett
, and Pedestrian counts at Southern
Cross Station from pedestrian
.
aus_production
.aus_production %>%
autoplot(Tobacco)
## Warning: Removed 24 rows containing missing values (`geom_line()`).
It does not look like a Box-Cox transformation would help this data out that much. There’s a bit of seasonal variation in the beginning, but it almost looks indistinguishable from the rest of the time series.
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tobacco with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 24 rows containing missing values (`geom_line()`).
The lambda as determined by the guerrero
feature is
almost 1, and this is reflected in the plot too. The values may have
changed when we compare the y-axis between the two plots, but the shape
and variation looks more or less the same.
ansett
ansett %>%
filter(Class == "Economy" & Airports == "MEL-SYD") %>%
autoplot(Passengers)
lambda <- ansett %>%
filter(Class == "Economy" & Airports == "MEL-SYD") %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
ansett %>%
filter(Class == "Economy" & Airports == "MEL-SYD") %>%
autoplot(box_cox(Passengers, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Passengers with $\\lambda$ = ",
round(lambda,2))))
If we compare the two graphs, we can see that a Box-Cox
transformation using a \(lambda\) value
of 2 that was determined by the guerrero
feature has a
resulted in a reduction in the y-axis range/a more stable variance.
pedestrian
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
select(Date_Time, Count) %>%
autoplot(Count)
The graph above shows us a huge drop in pedestrian traffic in 2016. So it’s possible that a Box-Cox transformation could stabilize the variance shown. The plot above also shows us that a significant amount of the counts never exceed 500.
lambda <- pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
select(Date_Time, Count)%>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
select(Date_Time, Count) %>%
autoplot(box_cox(Count, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Count with $\\lambda$ = ",
round(lambda,2))))
A Box-Cox transformation using a \(\lambda\) value of -0.25 as determined by
the guerrero
featured produced the plot above. The huge
drop is less noticeable in the transformed Box-Cox plot shown above.
Notice that the middle of the graph above is darker than the rest while
with the original data, the darker part was on the bottom of the graph,
so the Box-Cox transformation has shifted those counts upwards.
Consider the last five years of the Gas data from
aus_production
.
gas <- tail(aus_production, 5*4) |> select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition
with
type = multiplicative
to calculate the trend-cycle and
seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas %>%
autoplot(Gas)
The plot above exhibits consistent seasonality that does not vary and an upward moving trend-cycle.
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() |>
autoplot() +
labs(title = "Classical multiplicative decomposition of Gas")
## Warning: Removed 2 rows containing missing values (`geom_line()`).
The results in part b support the graphical interpretation from part a. As we can see from the part b answer, there is an upward trend, and consistent seasonality that does not vary.
gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Original Data")) +
geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted Data")) +
scale_colour_manual(
values = c("grey", "red"),
breaks = c("Original Data", "Seasonally Adjusted Data")
)
gas_copy <- gas
gas_copy$Gas[10] <- gas_copy$Gas[10] + 300
gas_copy %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Original Data")) +
geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted Data")) +
scale_colour_manual(
values = c("grey", "red"),
breaks = c("Original Data", "Seasonally Adjusted Data")
)
The outlier has drastically altered the shape of the “Seasonally Adjusted Data” plot. While the upward trend in the “Seasonally Adjusted Data” plot using the original data is easy to see, the upward trend is much more subdued in the data with the outlier. Also, the “Seasonally Adjusted Data” plot using the data with the outlier has this giant spike where the outlier is in the “Original Data”. We can also see that there is this saw-tooth pattern in the “Seasonally Adjusted Data” plot that repeats each year. It looks like the seasonality has leaked over to the “Seasonally Adjusted Data” plot as a result of the giant outlier.
gas_copy <- gas
gas_copy$Gas[length(gas_copy$Gas) - 1] <- gas_copy$Gas[length(gas_copy$Gas) - 1] + 300
gas_copy %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Original Data")) +
geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted Data")) +
scale_colour_manual(
values = c("grey", "red"),
breaks = c("Original Data", "Seasonally Adjusted Data")
)
The plot above shows us that it actually does make a difference if the outlier is placed near the end rather than the middle. As we can see, we do not have as pronounced of a sawtooth pattern as we did when we had the outlier in the middle of the data in Question 7e. The “Seasonally Adjusted Data” plot for the first 18 values looks like it closely matches the “Seasonally Adjusted Data” plot when the original data was used in Question 7d.
Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
set.seed(23987)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11_dcmp_myseries <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
autoplot(x11_dcmp_myseries) +
labs(title =
"Decomposition of Turnover Using X-11.")
The trend is moving upwards but we expect to see that if we just look
at a plot of the original data. The seasonality is more or less the same
across the entire time series. The irregular
plot shows us
that there may have been an increase in Turnover
in 1989,
but other than that there is not really anything unusual I can see. I
also do not see any outliers across all 4 plots.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
Is the recession of 1991/1992 visible in the estimated components?
Notice how in Figure 3.19, the remainder
plot shows us a
giant variation that begins sometime in the year 1991. Notice how the
grey bar to the left of the remainder part is larger than the grey bar
for the value
and trend
plots and smaller than
the season_year
plot. The larger the bar, the smaller the
variation, which indicates that the remainder has a larger variation
than the seasonality. The y-axis range is only 200 for the
season_year
plot while it is 400 for the
remainder
plot. There must have been a sharp decrease in
the civilian labor force that occurred sometime in 1991. You can see
this in the value
plot too; there is a small decrease in
1991.
The recession of 1991/1992 is definitely visible in the estimated
components, especially in the remainder
plot. We can
definitely see in the remainder
plot, a significant period
of variation that starts in 1991 and ends in the following year.