Exercises: 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book.
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
index(global_economy)
## Year
global_economy <- global_economy %>%
mutate(GDP_per_capita = GDP / Population)
global_economy %>%
autoplot(GDP_per_capita, show.legend = FALSE) +
labs(title = "GDP per Capita Per Country Over Time")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
top_countries <- global_economy %>%
filter(Year == max(Year, na.rm = TRUE)) %>%
arrange(desc(GDP_per_capita)) %>%
slice_head(n=10)
top_countries
## # A tsibble: 10 x 10 [1Y]
## # Key: Country [10]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Luxembourg LUX 2017 6.24e10 2.30 111. 194. 230. 599449
## 2 Macao SAR, China MAC 2017 5.04e10 9.10 136. 32.0 79.4 622567
## 3 Switzerland CHE 2017 6.79e11 1.09 98.3 53.9 65.0 8466017
## 4 Norway NOR 2017 3.99e11 1.92 115. 33.1 35.5 5282223
## 5 Iceland ISL 2017 2.39e10 3.64 122. 42.8 47.0 341284
## 6 Ireland IRL 2017 3.34e11 7.80 105. 87.9 120. 4813608
## 7 Qatar QAT 2017 1.67e11 1.58 116. 37.3 51.0 2639211
## 8 United States USA 2017 1.94e13 2.27 112. NA NA 325719178
## 9 North America NAC 2017 2.10e13 2.35 NA NA NA 362492702
## 10 Singapore SGP 2017 3.24e11 3.62 113. 149. 173. 5612253
## # ℹ 1 more variable: GDP_per_capita <dbl>
global_economy %>%
filter(Country %in% top_countries$Country) %>%
autoplot(GDP_per_capita, show.legend = TRUE) +
labs(title = "GDP per Capita Per Country Over Time",
subtitle = "Only showing the top 10 countries as of 2017")
## Warning: Removed 32 rows containing missing values or values outside the scale range
## (`geom_line()`).
Luxembourg has the highest GDP per capita as of the most recent date in the data (2017).
top_per_year <- global_economy %>%
index_by(Year) %>%
slice_max(order_by = GDP_per_capita, n = 1)
top_per_year %>%
ggplot(aes(x = Year, y=GDP_per_capita, fill=Country)) +
geom_bar(stat="identity") +
labs(title="Top County Each Year by GDP Per Capita")
In the past, Monaco has regularly had the highest GDP per Capita. Liechtenstein broke through as the highest in 2016 and 2014.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
help(aus_production)
## starting httpd help server ... done
help(global_economy)
help(aus_livestock)
help(vic_elec)
United States GDP
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP) +
labs(title = "United States GDP Over Time", x = "Year", y="GDP")
A transformation does not seem appropriate for this graph.
Slaughter of Victorian “Bulls, bullocks and steers”.
index(aus_livestock)
## Month
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
autoplot(Count) +
labs(title = "Slaughter of Victorian “Bulls, bullocks and steers", x = "Year-Month", y="# Slaughterd")
Victorian Electricity Demand
head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
index(vic_elec)
## Time
vic_elec %>%
autoplot(Demand) +
labs(title = "Victorian Electricity Demand", x = "Time (30min)", y="Electricity Demand")
This view of the Victorian electricity demand is very dense and noisy. I thought to sum it up to the daily view, but I doubt I’ll get much more out of that, so I’ll sum it up to the weekly view instead.
vic_elec %>%
mutate(Year_Week = yearweek(Date)) %>%
index_by(Year_Week) %>%
summarize(Weekly_Demand = sum(Demand)) %>%
autoplot(Weekly_Demand) +
labs(title = "Weekly Victorian Electricity Demand", x = "Time (Weekly)", y="Electricity Demand")
Now we can see that electricity demands hover in between 1250000 and 2000000 units each week, and that early 2014 saw a big spike in particular.
Gas production
aus_production %>%
autoplot(Gas) +
labs(title = "Australian Gas Production by Quarter", x = "Quarter", y="Gas Production")
The Australian gas production over time graph is increasing variance
as time goes on. This makes it a good fit to apply
box_cox().
# Referenced sec. 3.1 to get the lambda
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) +
labs(title = paste0("Transformed Australian Gas Production by Quarter. Lambda = ", round(lambda,2))
, x = "Quarter", y="Gas Production")
With the box-cox transformation, the variance now looks more consistent
over time (rather than increasing over time). We can see that around the
1970s, the gas production so a significant uptick.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
head(canadian_gas)
## # A tsibble: 6 x 2 [1M]
## Month Volume
## <mth> <dbl>
## 1 1960 Jan 1.43
## 2 1960 Feb 1.31
## 3 1960 Mar 1.40
## 4 1960 Apr 1.17
## 5 1960 May 1.12
## 6 1960 Jun 1.01
canadian_gas %>%
autoplot(Volume) +
labs(title = "Canadian Gas Production by Year-Month", x = "Year-Month", y="Gas Production")
canadian_gas %>%
gg_subseries(Volume) +
labs(title = "Canadian Gas Production Segmented by Month")
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
For the Canadian gas production, the variance does not have an easy-to-follow pattern of increasing or decreasing. This makes it hard for box-cox to do anything.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
Quick recap of what this was
Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):
set.seed(12345678) myseries <- aus_retail |> filter(
Series ID== sample(aus_retail$Series ID,1)) Explore your chosen retail time series using the following functions:autoplot(), gg_season(), gg_subseries(), gg_lag(),
ACF() |> autoplot()
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
head(aus_retail)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Apr 4.4
## 2 Australian Capital Territory Cafes, restaurants… A3349849A 1982 May 3.4
## 3 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Jun 3.6
## 4 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Jul 4
## 5 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Aug 3.6
## 6 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Sep 4.2
aus_retail %>%
filter(`Series ID` == "A3349561R") %>%
autoplot(Turnover, show.legend = FALSE) +
labs(title = "AUS Retail Series A3349561R: Turnover Over Time")
aus_retail_lambda <- aus_retail %>%
filter(`Series ID` == "A3349561R") %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
aus_retail %>%
filter(`Series ID` == "A3349561R") %>%
autoplot(box_cox(Turnover, aus_retail_lambda)) +
labs(title = paste0("Transformed AUS Retail Series A3349561R: Turnover Over Time. Lambda = ", round(aus_retail_lambda,2)),
y = "Transformed Turnover")
Pulling a random series of the aus_retail data, I would use a lambda of 0.13 for the Box-Cox transformation.
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
Tobacco
aus_production %>%
autoplot(Tobacco, show.legend = TRUE) +
labs(title = "AUS Tobacco Production over Time",
y = "Tobacco Production")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
tobacco_lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco, tobacco_lambda), show.legend = TRUE) +
labs(title = paste0("Transformed AUS Tobacco Production over Time. Lambda = ", round(tobacco_lambda, 2)),
y = "Transformed Tobacco Production")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
Lambda is very close to 1, so next to no transformation is happening here to present the data better.
Economy class passengers between Melbourne and Sydney
ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
autoplot(Passengers, show.legend = TRUE) +
labs(title = "Economy class passengers between MEL-SYD over Time",
y = "# Passengers")
passenger_lambda <- ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
autoplot(box_cox(Passengers, passenger_lambda), show.legend = TRUE) +
labs(title =
paste0("Transformed Economy class passengers MEL-SYD over Time. Lambda = ",
round(passenger_lambda, 2)),
y = "Transformed # Passengers")
This lambda value is 2, indicating that a squared transformation is
being used to show represent the data better.
Pedestrian counts at Southern Cross Station
pedestrian %>%
filter(Sensor == "Southern Cross Station" ) %>%
autoplot(Count, show.legend = TRUE) +
labs(title = "Pedestrian counts at Southern Cross Station over Time",
y = "# Pedestrians")
pedestrian_lambda <- pedestrian %>%
filter(Sensor == "Southern Cross Station" ) %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
autoplot(box_cox(Count, pedestrian_lambda), show.legend = TRUE) +
labs(title =
paste0("Transformed Pedestrian counts at Southern Cross Station over Time. Lambda = ",
round(pedestrian_lambda, 2)),
y = "Transformed # Pedestrians")
The pedestrian count data is simply too noisy at the hourly level. Re-eggregating at the weekly level.
pedestrian %>%
filter(Sensor == "Southern Cross Station" ) %>%
mutate(Year_Week = yearweek(Date)) %>%
index_by(Year_Week) %>%
summarize(Count = sum(Count)) %>%
autoplot(Count, show.legend = TRUE) +
labs(title = "Pedestrian counts at Southern Cross Station by Week",
y = "# Pedestrians")
pedestrian_lambda <- pedestrian %>%
filter(Sensor == "Southern Cross Station" ) %>%
mutate(Year_Week = yearweek(Date)) %>%
index_by(Year_Week) %>%
summarize(Count = sum(Count)) %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
pedestrian %>%
filter(Sensor == "Southern Cross Station" ) %>%
mutate(Year_Week = yearweek(Date)) %>%
index_by(Year_Week) %>%
summarize(Count = sum(Count)) %>%
autoplot(box_cox(Count, pedestrian_lambda), show.legend = TRUE) +
labs(title =
paste0("Transformed Pedestrian counts at Southern Cross Station by Week. Lambda = ",
round(pedestrian_lambda, 2)),
y = "Transformed # Pedestrians")
Lambda is aroun -0.1, so the transformation is a bit of an inverse relationship.
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
gas %>%
autoplot(Gas)
yes, it appears that there are seasonal fluctuations with a high point mid year. It also looks like it is trending up.
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a?
gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of AUS gas production")
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_line()`).
Yes, using classical multiplicative decomposition, we see a positive trend up, and seasonal peaks mid year.
Compute and plot the seasonally adjusted data.
gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Persons (thousands)",
title = "Total employment in US retail") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas[10,]$Gas <- gas[10,]$Gas + 300
gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Persons (thousands)",
title = "Total employment in US retail") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
The outlier has altered the seasonal adjustment and the trend of the data pretty substantially. Both are very different in window of time before and after the outlier
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas[10,]$Gas <- gas[10,]$Gas - 300 # undo previous outlier
gas[2,]$Gas <- gas[2,]$Gas + 300 # place new outlier near beginning
gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Persons (thousands)",
title = "Total employment in US retail") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
gas[2,]$Gas <- gas[2,]$Gas - 300 # undo outlier
gas[19,]$Gas <- gas[19,]$Gas + 300 # new outlier near end
gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Persons (thousands)",
title = "Total employment in US retail") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
When the outlier is near the beginning or end, it seems to have a less drastic impact on the seasonally adjusted line or the trend line as a whole. However, with the outlier at the end, one might come to misleading conclusions when inferring what the data will look like in the future.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
aus_retail %>%
filter(`Series ID` == "A3349561R") %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components() %>%
autoplot() +
labs(title = "Decomposition of Series A3349561R AUS Retail Turnover using X-11.")
Yes, with the use of X-11, I can see some irregular spiked in a couple
of places. An example of one is around the start of 2005, which I
otherwise would not have seen because it was masked by the seasonal
spike around that time.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
We see a relatively steady, positive trend over the time frame (though it looks to be losing a little bit of traction after 1990, around the 1991/1992 recession point). We can also see outliers in the remainder section around this point in time (addressed in part b. of this question). When we look at the seasonal breakdown, we can also see that the March spike has slowly lost amplitude over the years, whereas the September and December ones have gained amplitude.
b. Is the recession of 1991/1992 visible in the estimated components?
Yes, it is very visible. The remainder section fo the decomposition shows a very clear drop between 1991 and 1992.A huge decrease in the number of persons in the civilian labour force would make sense at the time of a recession.