Assignment 2 involves answering questions 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8, 3.9 from the textbook Forecasting: principles and practice by Rob J Hyndman and George Athanasopoulos.
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
global_economy_ts <- global_economy
global_economy_ts %>%
fill_gaps(GDP = NA, Population = NA)
## # A tsibble: 15,150 x 9 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
## 7 Afghanistan AFG 1966 1399999967. NA NA 18.6 8.57 10152331
## 8 Afghanistan AFG 1967 1673333418. NA NA 14.2 6.77 10372630
## 9 Afghanistan AFG 1968 1373333367. NA NA 15.2 8.90 10604346
## 10 Afghanistan AFG 1969 1408888922. NA NA 15.0 10.1 10854428
## # ℹ 15,140 more rows
global_economy_ts %>%
mutate(GDP_per_capita = GDP/Population) %>%
autoplot(GDP_per_capita)
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
Monaco has the highest GDP per capita.
global_economy_ts %>%
mutate(GDP_per_capita = GDP / Population) %>%
select(Country, Year, GDP, GDP_per_capita) %>%
arrange(desc(GDP_per_capita))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
## # A tsibble: 15,150 x 4 [1Y]
## # Key: Country [263]
## Country Year GDP GDP_per_capita
## <fct> <dbl> <dbl> <dbl>
## 1 Monaco 2014 7060236168. 185153.
## 2 Monaco 2008 6476490406. 180640.
## 3 Liechtenstein 2014 6657170923. 179308.
## 4 Liechtenstein 2013 6391735894. 173528.
## 5 Monaco 2013 6553372278. 172589.
## 6 Monaco 2016 6468252212. 168011.
## 7 Liechtenstein 2015 6268391521. 167591.
## 8 Monaco 2007 5867916781. 167125.
## 9 Liechtenstein 2016 6214633651. 164993.
## 10 Monaco 2015 6258178995. 163369.
## # ℹ 15,140 more rows
We can see an overall growth trend. There appears some cyclic declines that happens around 1985, 2000 and 2010.
ge_monaco <- global_economy_ts %>%
filter(Country == "Monaco") %>%
mutate(GDP_per_capita = GDP/Population)
autoplot(ge_monaco, GDP_per_capita)
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect. - United States GDP from global_economy - Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock - Victorian Electricity Demand from vic_elec - Gas production from aus_production
I opted use a log_10 transformation because to reduce the exponential pattern. When using the transformation, we can more clearly see a linear relationship.
ge_USA <- global_economy %>%
filter(Country == "United States")
autoplot(ge_USA, GDP)
ge_USA <- ge_USA %>%
mutate(log10_GDP = log10(GDP))
autoplot(ge_USA, log10_GDP)
I used a Box-Cox transformation to address the fluctuations.
bbs_aus_livestock <- aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria")
autoplot(bbs_aus_livestock,Count)
lambda_bbs <- bbs_aus_livestock %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
bbs_aus_livestock %>%
autoplot(box_cox(Count, lambda_bbs))
Similarly, I tried to use the Box-Cox transformation with this dataset. Unfortunately, it doesn’t reduce the large spikes.
vic_elec %>%
autoplot(Demand)
vic_lambda <- vic_elec %>%
features(Demand, features = guerrero) %>%
pull(lambda_guerrero)
vic_elec %>%
autoplot(box_cox(Demand, vic_lambda))
No transformations are needed.
aus_production %>%
autoplot(Gas)
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
It’s not useful in the case because the Box-Cox transformation is used towards data that is non-linear and exhibits high variance.
In the chart we can observe a linear relationship, some seasonality and an upward trend.
canadian_gas %>%
autoplot(Volume)
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
Looking at the chart, there is evidence of both growing trend and increasing variance. To stablize the variance we can utilize the Box-Cox transformation. The ideal value or parameter for our transformation will be found using Guerrero’s method. Using this, it determines that our lambda will be 0.3.
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
autoplot(Turnover)
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.1375013
# plot with lambda transformation
myseries %>%
autoplot(box_cox(Turnover, lambda))
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.
The appropriate transformation found through the guerrero method is 0.9264636.
aus_production %>%
autoplot(Tobacco)
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda_t <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
lambda_t
## [1] 0.9264636
aus_production %>%
autoplot(box_cox(Tobacco,lambda_t))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
The appropriate transformation found through the guerrero method is 1.999927.
econ_mel_syd <- ansett %>%
filter(Airports == "MEL-SYD",Class == "Economy")
autoplot(econ_mel_syd,Passengers)
lambda_mel <- econ_mel_syd %>%
features(Passengers, guerrero) %>%
pull(lambda_guerrero)
lambda_mel
## [1] 1.999927
econ_mel_syd %>%
autoplot(box_cox(Passengers, lambda_mel))
The appropriate transformation found through the guerrero method is -0.2501616.
scs_ped <- pedestrian %>%
filter(Sensor == "Southern Cross Station")
autoplot(scs_ped, Count)
lambda_scs <- scs_ped %>%
features(Count, guerrero) %>%
pull(lambda_guerrero)
lambda_scs
## [1] -0.2501616
scs_ped %>%
autoplot(box_cox(Count, lambda_scs))
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
The graph shows a clear consistent seasonal pattern. The series is consistent and does not show a clear upward or downward trend.
gas %>%
autoplot(Gas)
### Part B
decomp_gas <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
autoplot(decomp_gas)
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
The classical decomposition supports the earlier interpretation in that it keeps most of the same shape with of the chart in part A. The seasonal components are less pronounced when decomposition is performed however. It’s still stable and keeps the same patterns.
decomp_gas %>%
select(Quarter, season_adjust) %>%
autoplot(season_adjust)
When inflating an observation by 300, it subdues the seasonality of the data. In other words, it distorts the time series patterns that is present.
gas_outlier <- gas
gas_outlier$Gas[1] <- gas_outlier$Gas[1] + 300
decomp_gas_outlier <- gas_outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
decomp_gas_outlier %>%
select(Quarter, season_adjust) %>%
autoplot(season_adjust)
With the outlier at the end, it’s difficult to take note of any patterns. When the outlier is in the middle, it distorts any existing patterns.
gas_outlier_end <- gas
gas_outlier_end$Gas[nrow(gas_outlier_end) - 1] <- gas_outlier_end$Gas[nrow(gas_outlier_end) - 1] + 300
decomp_gas_outlier_end <- gas_outlier_end %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
decomp_gas_outlier_end %>%
select(Quarter, season_adjust) %>%
autoplot(season_adjust)
gas_outlier_middle <- gas
middle_index <- round(nrow(gas_outlier_middle) / 2)
gas_outlier_middle$Gas[middle_index] <- gas_outlier_middle$Gas[middle_index] + 300
decomp_gas_outlier_middle <- gas_outlier_middle %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
decomp_gas_outlier_middle %>%
select(Quarter, season_adjust) %>%
autoplot(season_adjust)
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
I noticed more outliers around the early 2000’s after using the X-11 decomposition.
myseries %>%
autoplot(Turnover)
decomp_myseries <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(decomp_myseries)
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995
When observing for each part of the decomposition, we can see there is a general upward trend in the labor force, showing there is long-term growth. There are consistent fluctuations visible, a sign of seasonality. Looking at the remainder section, we can see that there are irregularities in the data.
The recession of 1991/1992 is visible in the estimated components. There is a noticeable dip around this time seen in the remainder section, which indicates a slow down in growth of the labor force.