library(fpp3)
DATA 624: Assignment 02
Setup
Assignment
Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code.
3.1
Consider the GDP information in global_economy
. Plot the GDP
per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
<- global_economy |>
data filter(is.na(GDP) == FALSE & is.na(Population) == FALSE) |>
mutate(
GDP_PC = GDP/Population
)
|>
data autoplot(GDP_PC, show.legend = FALSE) +
facet_wrap(~Country, ncol = 8)
It looks as if the countries with the highest and most recent GDP per capita are Liechtenstein and Monaco. Let’s run a little query to see if this is true.
as_tibble(data) |>
group_by(Year) |>
summarize(
Max_GDP_PC = max(GDP_PC),
Country = Country[which.max(GDP_PC)]
|>
) arrange(desc(Year)) |>
head(n = 20)
# A tibble: 20 × 3
Year Max_GDP_PC Country
<dbl> <dbl> <fct>
1 2017 104103. Luxembourg
2 2016 168011. Monaco
3 2015 167591. Liechtenstein
4 2014 185153. Monaco
5 2013 173528. Liechtenstein
6 2012 152000. Monaco
7 2011 162155. Monaco
8 2010 144569. Monaco
9 2009 149221. Monaco
10 2008 180640. Monaco
11 2007 167125. Monaco
12 2006 133195. Monaco
13 2005 124374. Monaco
14 2004 123382. Monaco
15 2003 108978. Monaco
16 2002 89061. Monaco
17 2001 82553. Monaco
18 2000 82535. Monaco
19 1999 91384. Monaco
20 1998 93093. Monaco
Well, low lying Luxembourg had the highest GDP per capita in 2017. It’s a blip on the radar that wasn’t apparent in our plot.
Let’s see who has the highest average GDP per capita.
as_tibble(data) |>
group_by(Country) |>
summarize(
Mean_GDP_PC = mean(GDP_PC)
|>
) arrange(desc(Mean_GDP_PC)) |>
head(n = 5)
# A tibble: 5 × 2
Country Mean_GDP_PC
<fct> <dbl>
1 Monaco 86474.
2 Liechtenstein 68092.
3 San Marino 59263.
4 Cayman Islands 55868.
5 Channel Islands 50912.
3.2
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
- United States GDP from
global_economy
.
|>
global_economy filter(Country == "United States" & is.na(GDP) == FALSE) |>
autoplot(GDP) +
labs(title = "United States GDP")
The GDP is certainly exponential. We could straighten it out with a log transformation. I’m not totally sure why we would want to do this, but I have no doubt understanding will follow.
|>
global_economy filter(Country == "United States" & is.na(GDP) == FALSE) |>
autoplot(log(GDP)) +
labs(
title = "United States GDP",
y = "Log(GDP)"
)
- Slaughter of Victorian “Bulls, bullocks and steers” in
aus_livestock
.
|>
aus_livestock filter(Animal == "Bulls, bullocks and steers" & is.na(Count) == FALSE) |>
autoplot(Count, show.legend = FALSE) +
labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers") +
facet_wrap(~State, ncol = 2, scales = "free_y")
If we were isolating “Northern Territory” then I would try an inverted power plot. But I have doubts that it would be fruitful, because the data is not strictly increasing or decreasing.
- Victorian Electricity Demand from
vic_elec
.
|>
vic_elec filter(is.na(Demand) == FALSE) |>
autoplot(Demand) +
labs(title = "Victorian Electricity Demand")
We can clearly an annual seasonal pattern, but being that the data is sampled every 30 minutes over the course of three years, it’s extremely dense. Their summers appear to be very noisy. Let’s zoom in on the summer of 2013.
|>
vic_elec filter(is.na(Demand) == FALSE
& Date >= date("2012-11-01")
& Date < date("2013-03-01")
|>
) autoplot(Demand) +
labs(
title = "Victorian Electricity Demand (Summer 2012-2013)"
)
There is a lot that is interesting in this plot, but I don’t think that there is a transformation that would help. But being zoomed in, we can clearly see a daily seasonal interval.
- Gas production from
aus_production
.
|>
aus_production filter(is.na(Gas) == FALSE) |>
autoplot(Gas) +
labs(title = "Gas Production")
The gas production’s seasonal component’s variation is clearly increasing with time. Let’s try applying a log to it.
|>
aus_production filter(is.na(Gas) == FALSE) |>
autoplot(log(Gas)) +
labs(
title = "Gas Production",
y = "Log(Gas)"
)
Ah, that has worked nicely. The seasonal variation is a lot more constant now.
3.3
Why is a Box-Cox transformation unhelpful for the canadian_gas
data?
|>
canadian_gas filter(is.na(Volume) == FALSE) |>
autoplot(Volume) +
labs(title = "Canadian Gas")
I believe that it’s not helpful because the data is not strictly increasing or decreasing. It is fairly constant for the first third of the series. The second third increases and then decreases and the final third if fairly constant again. The box-cox wants to either apply a log or power transformation to the whole series. It will not help this series.
Let’s see what querrero
suggests as a lambda value and plot it.
<- canadian_gas |>
lambda features(Volume, features = guerrero) |>
pull(lambda_guerrero)
|>
canadian_gas filter(is.na(Volume) == FALSE) |>
mutate(Volume = box_cox(Volume, lambda)) |>
autoplot(Volume) +
labs(
title = "Canadian Gas with Box-Cox Transformation",
subtitle = paste("Lambda =", round(lambda, 3))
)
It compressed the data, but did not help with the variance. The variance is still increasing and decreasing as in the non-transformed state.
3.4
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
2.10.7: Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):
set.seed(314159)
<- aus_retail |>
data filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
|>
data filter(is.na(Turnover) == FALSE) |>
autoplot(Turnover) +
labs(title = "Retail Data")
The series is very clearly exhibits a seasonal pattern, with a variance that increases with the level of the series. I would apply a log transformation. Let’s try calculating a \(\lambda\) value using the Guerrero method.
<- data |>
lambda filter(is.na(Turnover) == FALSE) |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
|>
data filter(is.na(Turnover) == FALSE) |>
mutate(Turnover = box_cox(Turnover, lambda)) |>
autoplot(Turnover) +
labs(
title = "Retail Data with Box-Cox Transformation",
subtitle = paste("Lambda =", round(lambda, 3))
)
That has stabilised the variance very nicely. We now have a fairly constant variance across the series.
3.5
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance:
- Tobacco from
aus_production
,
|>
aus_production filter(is.na(Tobacco) == FALSE) |>
autoplot(Tobacco) +
labs(title = "Tobacco Production")
<- aus_production |>
lambda filter(is.na(Tobacco) == FALSE) |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
|>
aus_production filter(is.na(Tobacco) == FALSE) |>
mutate(Tobacco = box_cox(Tobacco, lambda)) |>
autoplot(Tobacco) +
labs(
title = "Tobacco Production with Box-Cox Transformation",
subtitle = paste("Lambda =", round(lambda, 3))
)
That changed the scale but did not change the variance. I cannot think of a transformation that would help with this series. I think seasonal decomposition with a noise component would help this series.
- Economy class passengers between Melbourne and Sydney from
ansett
,
<- ansett |>
data filter(
== "Economy"
Class & Airports == "MEL-SYD"
& is.na(Passengers) == FALSE
)|>
data autoplot(Passengers) +
labs(title = "Economy Class Passengers between Melbourne and Sydney")
<- data |>
lambda features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
|>
data mutate(Passengers = box_cox(Passengers, lambda)) |>
autoplot(Passengers) +
labs(
title = "Economy Class Passengers between Melbourne and Sydney with Box-Cox Transformation",
subtitle = paste("Lambda =", round(lambda, 3))
)
- Pedestrian counts at Southern Cross Station from
pedestrian
.
<- pedestrian |>
data filter(
== "Southern Cross Station"
Sensor & is.na(Count) == FALSE
)|>
data autoplot(Count) +
labs(title = "Pedestrian Counts at Southern Cross Station")
<- data |>
lambda features(Count, features = guerrero) |>
pull(lambda_guerrero)
|>
data mutate(Count = box_cox(Count, lambda)) |>
autoplot(Count) +
labs(
title = "Pedestrian Counts at Southern Cross Station with Box-Cox Transformation",
subtitle = paste("Lambda =", round(lambda, 3))
)
3.7
Consider the last five years of the Gas data from aus_production
.
<- tail(aus_production, 5*4) |> select(Gas) gas
- Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
|>
gas autoplot(Gas) +
labs(
title = "Gas Production"
)
Yes, I can definitely see a seasonal pattern. There very clearly is a yearly seasonal pattern. The trend-cycle in increasing with a slight logarithmic downturn.
- Use
classical_decomposition
with type=multiplicative to calculate the trend-cycle and seasonal indices.
<- gas |>
data filter(is.na(Gas) == FALSE) |>
model(
classical_decomposition(Gas, type = "multiplicative")
|>
) components()
|>
data autoplot() +
labs(
title = "Gas Production"
)
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_line()`).
- Do the results support the graphical interpretation from part a?
Partly. It also has identified a strong yearly seasonal pattern. And it also has identified the trend-cycle as increasing, but it describes the trend-cycle as being step-wise linear where I described it as increasing with a logarithmic downturn.
- Compute and plot the seasonally adjusted data.
|>
data mutate(
Gas = Gas / seasonal
|>
) select(Gas) |>
autoplot(Gas) +
labs(
title = "Seasonally Adjusted Gas Production"
)
- Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
<- gas
gas_outlier $Gas[3] <- gas_outlier$Gas[3] + 300
gas_outlier|>
gas_outlier model(
classical_decomposition(Gas, type = "multiplicative")
|>
) components() |>
mutate(
Gas = Gas / seasonal
|>
) select(Gas) |>
autoplot(Gas) +
labs(
title = "Seasonally Adjusted Gas Production with Outlier"
)
The outlier has a very large effect on the seasonally adjusted data. First, the outlier remains very visible in the seasonally adjusted data. I think the more interesting side effect is that the outlier has also changed the seasonal component. I believe the seasonal component has been attenuated, which is causing the seasonal component to be much more pronounced in the seasonally adjusted data.
- Does it make any difference if the outlier is near the end rather than in the middle of the time series?
<- gas
gas_outlier_b <- gas
gas_outlier_e $Gas[1] <- gas$Gas[1] + 300
gas_outlier_b$Gas[20] <- gas$Gas[20] + 300
gas_outlier_e
<- function(data, title) {
plot |>
data model(
classical_decomposition(Gas, type = "multiplicative")
|>
) components() |>
mutate(
Gas = Gas / seasonal
|>
) select(Gas) |>
autoplot(Gas) +
labs(
title = title
)
}
plot(gas_outlier_b, "Seasonally Adjusted Gas Production with Outlier at Beginning")
plot(gas_outlier_e, "Seasonally Adjusted Gas Production with Outlier at End")
Yes, putting the outlier at the extremes has a different effect than placing the outlier within the series. When the effect of putting an outlier at the beginning is compared to the effect of putting an outlier at the end, the effects are more similar than when compared to the outlier in the middle. Nonetheless, they do differ. The outlier at the beginning of the series has a larger effect on the seasonal component. The outlier at the end of the series has a larger effect on the trend-cycle component.
3.8
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
2.10.7: Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):
set.seed(314159)
<- aus_retail |>
data filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
|>
data model(
x11 = X_13ARIMA_SEATS(Turnover ~ x11())
|>
) components() |>
autoplot() +
labs(
title = "Retail Data with X-11 Decomposition"
)
I don’t think that it revealed anything that I had not noticed previously. But it does exaggerate outliers in the trend-cycle component that were not nearly as clear as they are when isolated. It also does a great job of amplifying the seasonal component from the early part of the series in which the seasonal component was not very clear.
3.9
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
- Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
The trend-cycle component is increasing and is very linear in its increase. The seasonal component is a lot more clear than in the original time series, though it’s amplitude is very small when compared to the trend-cycle’s range. The seasonal component has a clear annual pattern, but I think it also has a biannual pattern that splits a year into two parts that approximate a 60%/40% division (in time). The remainder component exhibits the same variation as the seasonal component does, but this might be due to a radical outlier around 1992. Which I see is in the next question.
- Is the recession of 1991/1992 visible in the estimated components?
Yes, the recession of 1991/1992 is very visible in the remainder component. It appears as two enormous, downward facing spikes around 1991/1992. I find it interesting that the trend-cycle component shows a very slight hump that peaks just before the recession strikes. But, for the most part, the trend cycle component is not very affected by the recession. The seasonal component does not appear to be affected by the recession.