library(tidyverse)
library(fpp3)
library(seasonal)
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
Monaco has the highest GDP per capita. Monaco and Liechtenstein have had the highest GDP per capita from 1985 to 2017. Before 1985, the differences in GDP per capita between countries were not as large.
global_economy %>%
autoplot(GDP / Population, show.legend = FALSE) +
labs(title = "GDP per capita", y = "$US")
global_economy %>%
mutate(gdp_percapita = GDP/Population) %>%
arrange(desc(gdp_percapita))
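To see how the leading country has changed over time, the sketch below keeps only the country with the highest GDP per capita in each year. It assumes the same fpp3 setup as above; `top_by_year` is an illustrative name.

```r
library(fpp3)

# GDP per capita for every country-year, then keep the top country in each year
top_by_year <- global_economy %>%
  mutate(gdp_percapita = GDP / Population) %>%
  as_tibble() %>%                       # drop the tsibble structure so we can group by Year
  filter(!is.na(gdp_percapita)) %>%
  group_by(Year) %>%
  slice_max(gdp_percapita, n = 1) %>%   # ties, if any, are all kept
  ungroup() %>%
  select(Year, Country, gdp_percapita)

top_by_year
```

Reading down the Country column shows the years in which the top position changed hands.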
## # A tsibble: 15,150 x 10 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7060236168. 7.18 NA NA NA 38132
## 2 Monaco MCO 2008 6476490406. 0.732 NA NA NA 35853
## 3 Liechtenstein LIE 2014 6657170923. NA NA NA NA 37127
## 4 Liechtenstein LIE 2013 6391735894. NA NA NA NA 36834
## 5 Monaco MCO 2013 6553372278. 9.57 NA NA NA 37971
## 6 Monaco MCO 2016 6468252212. 3.21 NA NA NA 38499
## 7 Liechtenstein LIE 2015 6268391521. NA NA NA NA 37403
## 8 Monaco MCO 2007 5867916781. 14.4 NA NA NA 35111
## 9 Liechtenstein LIE 2016 6214633651. NA NA NA NA 37666
## 10 Monaco MCO 2015 6258178995. 4.94 NA NA NA 38307
## # ℹ 15,140 more rows
## # ℹ 1 more variable: gdp_percapita <dbl>
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
-United States GDP from global_economy.
No transformation is needed: the series rises smoothly, almost in a straight line, with no variation that changes with its level, so a transformation would add little.
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP) +
labs(title = "United States GDP", y = "$US")
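One way to check this is to estimate a Box-Cox λ with Guerrero's method (`guerrero` from feasts, loaded by fpp3) and plot the transformed series; a λ close to 1 would support leaving the data untransformed, since λ = 1 only shifts the series. A sketch, with `us_gdp` and `lambda_us` as illustrative names:

```r
library(fpp3)

us_gdp <- global_economy %>%
  filter(Country == "United States")

# Guerrero's method estimates a Box-Cox lambda for the series
lambda_us <- us_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

us_gdp %>%
  autoplot(box_cox(GDP, lambda_us)) +
  labs(title = "Box-Cox transformed United States GDP",
       y = paste0("lambda = ", round(lambda_us, 2)))
```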
-Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
No transformation is applied, as none of the usual adjustments seems appropriate for this series; the estimated Box-Cox λ would also be negative.
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
autoplot(Count) +
labs(title = "Slaughter of bulls, bullocks and steers in Victoria")
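The sign of λ mentioned above can be checked directly. A minimal sketch, assuming fpp3 is loaded; `vic_bulls` and `bulls_lambda` are illustrative names:

```r
library(fpp3)

vic_bulls <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria")

# Guerrero's method aims to pick the lambda that makes the variation most stable
bulls_lambda <- vic_bulls %>%
  features(Count, features = guerrero)

bulls_lambda
```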
-Victorian Electricity Demand from vic_elec.
vic_elec is recorded half-hourly. Aggregating it to monthly totals makes the dips and rises in electricity demand easier to see.
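The aggregation step is not shown above; a sketch of one way to do it, assuming fpp3 is loaded and using the `Date` column of vic_elec to assign each half-hour to a month (`vic_monthly` is an illustrative name):

```r
library(fpp3)

# Sum the half-hourly Demand values within each calendar month
vic_monthly <- vic_elec %>%
  index_by(Month = yearmonth(Date)) %>%
  summarise(Demand = sum(Demand))

vic_monthly %>%
  autoplot(Demand) +
  labs(title = "Monthly Victorian electricity demand", y = "Total demand")
```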
-Gas production from aus_production.
After a Box-Cox transformation, the variation in gas production is more even across the series.
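The evening-out effect comes from the shape of the Box-Cox family. The sketch below writes out the sign-adjusted Box-Cox definition used in fpp3 as a small base R function; `box_cox_manual` is a hypothetical helper for illustration (in practice fabletools' `box_cox()` does this). It shows that λ = 1 only shifts the data, λ = 0 is the log, and a negative λ still keeps the observations in order while compressing large values.

```r
# Sign-adjusted Box-Cox transformation:
#   w = log(y)                              if lambda == 0
#   w = (sign(y) * |y|^lambda - 1) / lambda  otherwise
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) {
    log(y)
  } else {
    (sign(y) * abs(y)^lambda - 1) / lambda
  }
}

y <- c(1, 10, 100, 1000)

box_cox_manual(y, 1)     # 0 9 99 999: a shift only, shape unchanged
box_cox_manual(y, 0)     # same as log(y): equal ratios become equal steps
box_cox_manual(y, -0.5)  # about 0.000 1.368 1.800 1.937: still increasing
```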
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
The Box-Cox transformation is unhelpful for the canadian_gas data because the transformed series looks almost the same as the original. A transformation should make the variation more consistent across the series, and here the variation is left almost unchanged.
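A Box-Cox transformation can only help when the variation grows or shrinks with the level of the series. In canadian_gas the seasonal swings appear to widen and then narrow again while the level keeps rising, which a single power transformation cannot undo. A sketch for comparing the two plots, assuming fpp3 is loaded (`lambda_gas` is an illustrative name):

```r
library(fpp3)

# Estimate lambda, then plot the transformed series for comparison with the original
lambda_gas <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume, lambda_gas)) +
  labs(title = "Box-Cox transformed Canadian gas production",
       y = paste0("lambda = ", round(lambda_gas, 2)))
```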
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
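A Box-Cox transformation with λ = 0.08 works best, as the seasonal variation becomes roughly even across the series. This value comes from Guerrero's method (output below); being close to 0, it is close to a log transformation.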
## # A tibble: 1 × 3
## State Industry lambda_guerrero
## <chr> <chr> <dbl>
## 1 Northern Territory Clothing, footwear and personal accessory … 0.0830
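The code behind this output is not shown. A sketch that reproduces the calculation, assuming fpp3 is loaded; `myseries` here is a hypothetical reconstruction that selects the Northern Territory clothing series by matching the start of its industry name, since the original Exercise 7 code is not included.

```r
library(fpp3)

# Hypothetical reconstruction of `myseries` (industry matched by prefix only)
myseries <- aus_retail %>%
  filter(State == "Northern Territory",
         startsWith(Industry, "Clothing, footwear and personal accessory"))

lambda_retail <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda_retail)) +
  labs(title = "Box-Cox transformed retail turnover",
       y = paste0("lambda = ", round(lambda_retail, 2)))
```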
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.926
The Box-Cox transformation of tobacco production changes the chart very little. The Guerrero estimate of λ for stabilising the variance is 0.926; being close to 1, it leaves the series almost unchanged.
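A sketch of the tobacco calculation, assuming fpp3 is loaded. Missing quarters (if any) are dropped before estimating λ; this filtering step is an assumption and may make the estimate differ slightly from the 0.926 shown above.

```r
library(fpp3)

# Keep only the Tobacco column and drop missing quarters, if any
tobacco <- aus_production %>%
  select(Tobacco) %>%
  filter(!is.na(Tobacco))

lambda_tobacco <- tobacco %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

tobacco %>%
  autoplot(box_cox(Tobacco, lambda_tobacco)) +
  labs(title = "Box-Cox transformed tobacco production",
       y = paste0("lambda = ", round(lambda_tobacco, 2)))
```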
For economy-class passengers between Melbourne and Sydney, a Box-Cox transformation with λ = 2 made the pattern in the series slightly clearer.
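A sketch of this plot, assuming fpp3 is loaded and using the λ = 2 from the text; `mel_syd` is an illustrative name.

```r
library(fpp3)

# Economy-class passengers on the Melbourne-Sydney route
mel_syd <- ansett %>%
  filter(Airports == "MEL-SYD", Class == "Economy")

# lambda = 2, as used in the text
mel_syd %>%
  autoplot(box_cox(Passengers, 2)) +
  labs(title = "Box-Cox transformed economy passengers, Melbourne-Sydney",
       y = "lambda = 2")
```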
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 -0.111
For the Southern Cross Station pedestrian counts, the data were first aggregated to weekly totals and then Box-Cox transformed. The Guerrero estimate of λ was negative (-0.111), so I used a positive λ of 2 instead, although a negative λ is still a valid Box-Cox transformation.
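A sketch of the weekly aggregation and transformation, assuming fpp3 is loaded; `southern_cross` is an illustrative name. Note that partial weeks at the start and end of the data will have smaller totals.

```r
library(fpp3)

# Aggregate the hourly counts to weekly totals using the Date column
southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  index_by(Week = yearweek(Date)) %>%
  summarise(Count = sum(Count))

# Guerrero estimate of lambda for the weekly series
southern_cross %>%
  features(Count, features = guerrero)

southern_cross %>%
  autoplot(box_cox(Count, 2)) +
  labs(title = "Box-Cox transformed weekly pedestrians, Southern Cross Station",
       y = "lambda = 2")
```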
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
The plot shows both seasonal fluctuations and a trend-cycle that is slowly moving upwards.
gas %>%
autoplot(Gas) +
labs(title = "Gas Production")
gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot() + xlab("Quarter") +
ggtitle("Classical multiplicative decomposition of Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
Yes, the results support the graphical interpretation from part (a): the decomposition shows a seasonal fluctuation and an upward trend in the data.
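The classical multiplicative decomposition follows a few fixed steps: a centred 2×4 moving average for the trend, division to detrend, averaging by quarter for the seasonal indices (rescaled to average 1), and division again for the remainder. The sketch below writes these steps out in base R on the built-in quarterly `UKgas` series, used only as a stand-in for the aus_production data; `classical_mult` is a hypothetical helper, and its output should match `stats::decompose(type = "multiplicative")`.

```r
# Classical multiplicative decomposition, step by step (even seasonal period m)
classical_mult <- function(y) {
  m <- frequency(y)
  stopifnot(m %% 2 == 0)
  # 1. Trend: centred 2 x m moving average
  w <- c(0.5, rep(1, m - 1), 0.5) / m
  trend <- stats::filter(y, w, sides = 2)
  # 2. Detrend by division (multiplicative model)
  ratio <- y / trend
  # 3. Seasonal index: average ratio at each position in the cycle,
  #    rescaled so the m indices average to 1
  pos <- (seq_along(y) - 1) %% m + 1
  index <- tapply(as.numeric(ratio), pos, mean, na.rm = TRUE)
  index <- index / mean(index)
  seasonal <- ts(as.numeric(index[pos]), start = start(y), frequency = m)
  # 4. Remainder and seasonally adjusted series
  list(trend = trend, seasonal = seasonal,
       remainder = y / (trend * seasonal),
       season_adjust = y / seasonal)
}

dcmp <- classical_mult(UKgas)
round(dcmp$seasonal[1:4], 3)  # one seasonal index per quarter
```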
gas2 <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative"))
# Plot the original series (grey) with the seasonally adjusted series on top
components(gas2) %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y = season_adjust), colour = "#D55E00") +
ggtitle("Seasonally adjusted gas production")
The outlier shows up as a spike in the seasonally adjusted data, unlike the previous result, where the seasonally adjusted series was much smoother.
# Create an outlier by adding 300 to the observation whose value is 192
gas3 <- gas %>%
mutate(Gas = ifelse(Gas == 192, Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative"))
components(gas3) %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y = season_adjust), colour = "#D55E00") +
ggtitle("Seasonally adjusted gas production with an outlier")
The outlier makes a significant difference to the seasonally adjusted data: it introduces a spike, whereas without it the seasonally adjusted series was consistent across the whole period.
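A natural follow-up is whether the position of the outlier matters. The sketch below adds 300 to one observation by position (rather than by value) and compares the seasonally adjusted series for an outlier in the middle and at the end; `adjusted_with_outlier` is a hypothetical helper and fpp3 is assumed to be loaded. With a classical decomposition one would expect a difference: the centred moving average has no trend estimate for the last two quarters, so an outlier there does not feed into the seasonal indices and appears only as a single spike, while an outlier in the middle distorts the trend around it and the seasonal indices.

```r
library(fpp3)

gas <- tail(aus_production, 5*4) %>% select(Gas)

# Hypothetical helper: add 300 to the observation in row i, then return the
# seasonally adjusted series from a classical multiplicative decomposition
adjusted_with_outlier <- function(data, i) {
  data %>%
    mutate(Gas = Gas + 300 * (row_number() == i)) %>%
    model(classical_decomposition(Gas, type = "multiplicative")) %>%
    components() %>%
    as_tibble() %>%
    select(Quarter, season_adjust)
}

both <- bind_rows(
  middle = adjusted_with_outlier(gas, 10),
  end    = adjusted_with_outlier(gas, nrow(gas)),
  .id = "outlier_position"
)

both %>%
  ggplot(aes(x = Quarter, y = season_adjust, colour = outlier_position)) +
  geom_line() +
  labs(title = "Seasonally adjusted gas production with one outlier", y = "Gas")
```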
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
# myseries: the retail series from Exercise 7 in Section 2.10
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title = "Decomposition of total Retail data using X-11.")
The X-11 decomposition reveals irregularities between 1990 and 2000 and between 2010 and 2020 that were not apparent in the original Turnover data.
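To locate these irregularities numerically rather than by eye, one option is to list the months whose X-11 irregular component is furthest from its median. This sketch assumes the seasonal package and its X-13ARIMA-SEATS binary are installed, and rebuilds `myseries` as the Northern Territory clothing series from the Box-Cox output above (a hypothetical reconstruction). The median is used so the check works whether X-11 settles on an additive (irregular around 0) or multiplicative (around 1) decomposition.

```r
library(fpp3)
library(seasonal)

# Hypothetical reconstruction of `myseries` (industry matched by prefix only)
myseries <- aus_retail %>%
  filter(State == "Northern Territory",
         startsWith(Industry, "Clothing, footwear and personal accessory"))

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()

# Five months whose irregular component is furthest from its median
top_irregular <- x11_dcmp %>%
  as_tibble() %>%
  mutate(deviation = abs(irregular - median(irregular, na.rm = TRUE))) %>%
  slice_max(deviation, n = 5, with_ties = FALSE) %>%
  select(Month, Turnover, irregular)

top_irregular
```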
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
The STL decomposition shows a large irregularity around 1992. January and August tend to have the lowest average labour force, and December the highest. July has increased somewhat in more recent years, while March has declined over the years. Overall, there is an upward trend in the data.
The 1991/1992 recession is visible in the estimated components as a large drop in the remainder during that period.
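The labour force data behind Figures 3.19 and 3.20 are not loaded in this document, so the sketch below uses base R's built-in `UKDriverDeaths` monthly series as a stand-in to show how a remainder component can be checked numerically: fit a robust STL with `stats::stl()` and list the months with the largest absolute remainder. The choice of series and of `s.window = 13` are assumptions for illustration.

```r
# Robust STL reduces the influence of one-off shocks on the trend and seasonal
# components, so such shocks tend to show up in the remainder instead
fit <- stl(UKDriverDeaths, s.window = 13, robust = TRUE)
rem <- fit$time.series[, "remainder"]

# Five months with the largest absolute remainder, with their dates
idx <- order(abs(rem), decreasing = TRUE)[1:5]
data.frame(time = round(time(rem)[idx], 2), remainder = round(rem[idx], 1))
```

Since STL is additive, the seasonal, trend and remainder columns of `fit$time.series` add back up to the original data, so a large remainder marks a month the other two components could not explain.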