library(fpp3)
Exercises: 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
global_economy
## # A tsibble: 15,150 x 9 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
## 7 Afghanistan AFG 1966 1399999967. NA NA 18.6 8.57 10152331
## 8 Afghanistan AFG 1967 1673333418. NA NA 14.2 6.77 10372630
## 9 Afghanistan AFG 1968 1373333367. NA NA 15.2 8.90 10604346
## 10 Afghanistan AFG 1969 1408888922. NA NA 15.0 10.1 10854428
## # i 15,140 more rows
global_economy %>%
mutate(GDP_Per_Capita=GDP/Population) -> nge
nge %>%
autoplot(GDP_Per_Capita, show.legend=F)
## Warning: Removed 3242 rows containing missing values (`geom_line()`).
Assuming the question is asking about the line that has had the highest
y-axis position for the longest time, we can find out which country it
corresponds to by plotting with the condition
GDP_Per_Capita > $150,000.
nge %>%
filter(GDP_Per_Capita>150000) %>%
autoplot(GDP_Per_Capita)
Monaco has the highest GDP per capita.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.
United States GDP from global_economy.
global_economy %>%
filter(Country=="United States") %>%
autoplot(GDP) +
labs(title="United States GDP, 1960-2017", y="$USD")
This one doesn’t need a transformation.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
aus_livestock %>%
filter(Animal=="Bulls, bullocks and steers", State=="Victoria") -> bulls
bulls %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Count`
lambda <- bulls %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
bulls %>%
autoplot(box_cox(Count, 5))
Using a value of lambda=5 for a Box-Cox transformation, the plot appears
more homogenous.
Victorian Electricity Demand from vic_elec.
vic_elec %>%
autoplot(Demand)
lambda <- vic_elec %>%
features(Demand, features = guerrero) %>%
pull(lambda_guerrero)
vic_elec %>%
autoplot(-1/Demand)
Using an inverse transformation has slightly lessened the variation.
Gas production from aus_production.
aus_production %>%
select(Quarter, Gas) %>%
autoplot(Gas)
The line has different variation over time.
aus_production %>%
select(Quarter, Gas) %>%
autoplot(log(Gas))
Now the graph is more homogenous in variation.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
canadian_gas %>% autoplot()
## Plot variable not specified, automatically selected `.vars = Volume`
lambda <- canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
canadian_gas |>
autoplot(box_cox(Volume, lambda))
The variation in the line doesn’t differ much over time.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(50)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries
## # A tsibble: 441 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Queensland Other recreational goods retailing A3349480L 1982 Apr 11.1
## 2 Queensland Other recreational goods retailing A3349480L 1982 May 11.7
## 3 Queensland Other recreational goods retailing A3349480L 1982 Jun 11.5
## 4 Queensland Other recreational goods retailing A3349480L 1982 Jul 13.1
## 5 Queensland Other recreational goods retailing A3349480L 1982 Aug 13
## 6 Queensland Other recreational goods retailing A3349480L 1982 Sep 13
## 7 Queensland Other recreational goods retailing A3349480L 1982 Oct 12
## 8 Queensland Other recreational goods retailing A3349480L 1982 Nov 13.2
## 9 Queensland Other recreational goods retailing A3349480L 1982 Dec 16.2
## 10 Queensland Other recreational goods retailing A3349480L 1983 Jan 12
## # i 431 more rows
myseries %>%
autoplot(Turnover)
lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries %>%
autoplot(box_cox(Turnover, lambda))
A lambda value of -0.20 removes some variaion. There is still a
difference between the first part of the plot and the middle/end.
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
Tobacco
aus_production %>%
autoplot(Tobacco)
## Warning: Removed 24 rows containing missing values (`geom_line()`).
lambda <- aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
aus_production %>% autoplot(box_cox(Tobacco, lambda))
## Warning: Removed 24 rows containing missing values (`geom_line()`).
Economy class passengers between Melbourne and Sydney from ansett
ansett %>%
filter(Class=="Economy", Airports=="MEL-SYD") -> economy
economy %>% autoplot(Passengers)
lambda <- economy |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
economy %>% autoplot(box_cox(Passengers, 6) )
Pedestrian counts at Southern Cross Station from pedestrian
ped <- pedestrian %>%
filter(Sensor=="Southern Cross Station")
ped %>% autoplot(Count)
lambda <- ped %>% features(Count, features=guerrero) %>% pull(lambda_guerrero)
ped %>% autoplot(box_cox(Count, lambda))
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
gas %>% autoplot()
## Plot variable not specified, automatically selected `.vars = Gas`
The trend is increasing. There is a trough every year in the first
quarter. The peaks appear to be around the middle of each year, around
the end of the second quarter.
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() %>%
select(trend)
## # A tsibble: 20 x 2 [1Q]
## trend Quarter
## <dbl> <qtr>
## 1 NA 2005 Q3
## 2 NA 2005 Q4
## 3 200. 2006 Q1
## 4 204. 2006 Q2
## 5 207 2006 Q3
## 6 210. 2006 Q4
## 7 213 2007 Q1
## 8 216. 2007 Q2
## 9 219. 2007 Q3
## 10 219. 2007 Q4
## 11 219. 2008 Q1
## 12 219 2008 Q2
## 13 219 2008 Q3
## 14 220. 2008 Q4
## 15 222. 2009 Q1
## 16 223. 2009 Q2
## 17 225. 2009 Q3
## 18 226 2009 Q4
## 19 NA 2010 Q1
## 20 NA 2010 Q2
Do the results support the graphical interpretation from part a? Yes, the indices are increasing which aligns with what we saw in the plot from part a.
Compute and plot the seasonally adjusted data.
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() %>%
select(seasonal) %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = seasonal`
gas[3,1] <- gas[3,1] + 500
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() %>%
select(seasonal) %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = seasonal`
The outlier changed the shape of the seasonally adjusted data. Now peaks
are at the first quarter of each year, not troughs. The troughs are in
two places: between the first and second quarters, and during the third
quarter.
Part e had an outlier towards the beginning, so we can try to put it at the end.
gas <- tail(aus_production, 5*4) |> select(Gas)
gas[19,1] <- gas[19,1] + 500
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() %>%
select(seasonal) %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = seasonal`
It looks more similar in shape to the original time series. The peaks
and troughs have shifted about a quarter.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
library(seasonal)
Original
myseries %>% autoplot(Turnover)
Decomposed
set.seed(50)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11_dcmp <- myseries |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of Retail Turnover")
It’s easier to see the trend, how it increases for years before becoming unstable later. I also notice the change in seasonality between 2000 and 2010.
The trend has the same scale as the original time series. The seasonal component graph shows that the distance between the peaks and troughs increases over time. The irregular graph reveals that the decrease seen in the original series is a severe outlier. It is shown as a precipitous drop.