library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.1
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## Warning: package 'tsibble' was built under R version 4.4.1
## Warning: package 'feasts' was built under R version 4.4.1
## Warning: package 'fabletools' was built under R version 4.4.1
## Warning: package 'fable' was built under R version 4.4.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.4.1
##
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
##
## view
library(x13binary)
3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
global_economy|> #Captia=GDP/population
autoplot(GDP/Population, show.legend=FALSE)+ #removed legends as it was a large list
labs(y="US$", title = "GDP per Capita Overtime")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
#get country with max captia
global_economy |>
mutate(GDPCaptia= GDP/Population)|>
filter(GDPCaptia==max(GDPCaptia, na.rm=TRUE))|>
select(Country,GDPCaptia)
## # A tsibble: 1 x 3 [1Y]
## # Key: Country [1]
## Country GDPCaptia Year
## <fct> <dbl> <dbl>
## 1 Monaco 185153. 2014
#plot country with max GPD per captia
global_economy |>
filter(Country == "Monaco") |>
autoplot(GDP/Population) +
labs(title= "GDP per capita", y = "$US")
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).
Monaco was the country with the highest GDP per capita, their max GDP per captia was in 2014. Overtime the trend for Monaco was upwards mostly above the other countries’ GPD captia
3.2 For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.
“If the data shows variation that increases or decreases with the level of the series, then a transformation can be useful”
global_economy|>
filter(Country=="United States")|>
autoplot(GDP)+
labs(y="US$", title = "GDP per Capita Overtime for US")
No transformation needed since there isn’t much variance, for the times
series on United States GDP from global_economy, over all the trend is
upward.
head(aus_livestock)
## # A tsibble: 6 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory 2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory 2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory 2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory 1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory 2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory 1800
aus_livestock|> #Captia=GDP/population
filter(State=="Victoria", Animal=="Bulls, bullocks and steers")|>
autoplot(Count)+
labs(y="Count of animals slaughtered", title = "Count of slaughtered Bulls, bullocks and steers in Victoria")
No transformation was needed, overall the trend is downward.
head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
vic_elec|>
autoplot(Demand)+
labs(y= "MWh", title ="Victory Electricity Demands" )
#transformation
vic_elec|>
index_by(Date) |>
summarise(Demand = sum(Demand))|>
autoplot(Demand) +
labs(title = 'Daily Victory Electricity Demands', y = 'MWh')
A lot variance and noise in the time series plot for Victorian
Electricity Demand from vic_elec, I transformed the time series to daily
and it seemed to have les noise than the original time series. There
seems to be no trend but there is a seasonality pattern.
head(aus_production)
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
aus_production %>% autoplot(Gas) +
labs(title = 'Gas Production', y = 'petajoules')
lambda <- aus_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero) #lambda =.1095171
aus_production |>
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title = (paste0(
"Transformed gas production with lambda$ = ",
round(lambda,2))))
The time series of Gas production from aus_production was transformed
using Box-cox using a lambda of 0.11, the shape of the trend was
different more exponential for the the tranformed data while for the
regular time series the trend was more of liner upwards pattern.
3.3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?
head(canadian_gas)
## # A tsibble: 6 x 2 [1M]
## Month Volume
## <mth> <dbl>
## 1 1960 Jan 1.43
## 2 1960 Feb 1.31
## 3 1960 Mar 1.40
## 4 1960 Apr 1.17
## 5 1960 May 1.12
## 6 1960 Jun 1.01
canadian_gas|>
autoplot(Volume) +
labs(title = "Canadian Gas Data", y ="Volumne in Billions of Cubic Metres")
lambda2 <- canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero) #lambda =.
canadian_gas |>
autoplot(box_cox(Volume, lambda2)) +
labs(y = "",
title = (paste0(
"Transformed gas production with lambda$ = ",
round(lambda2,2))))
The Box-Cox transformation was unhelpful because the transformation
didn’t make the seasonal pattern is the same for both plot.
3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries|>
autoplot() +
labs(title = "Australian Retail Trade Turnover")
## Plot variable not specified, automatically selected `.vars = Turnover`
lambda3 <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero) #lambda =.
myseries |>
autoplot(box_cox(Turnover, lambda3)) +
labs(y = "",
title = (paste0(
"AUS Retail Trade Turnover with lambda$ = ",
round(lambda3,2))))
The lambda for for the Australian retail turnover was 0.08 which is near
to 0, therefore the natural logarithms should be used.
3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
lambda4<-aus_production|>
features(Tobacco, features = guerrero)|>
pull(lambda_guerrero)
aus_production|>
autoplot(box_cox(Tobacco, lambda4)) +
labs(y = "",
title = (paste0(
"AUS Retail Trade Turnover with lambda$ = ",
round(lambda4,2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
The variance is somewhat stable with a lambda of 0.93.
head(ansett)
## # A tsibble: 6 x 4 [1W]
## # Key: Airports, Class [1]
## Week Airports Class Passengers
## <week> <chr> <chr> <dbl>
## 1 1989 W28 ADL-PER Business 193
## 2 1989 W29 ADL-PER Business 254
## 3 1989 W30 ADL-PER Business 185
## 4 1989 W31 ADL-PER Business 254
## 5 1989 W32 ADL-PER Business 191
## 6 1989 W33 ADL-PER Business 136
Economy<-ansett |>
filter(Class == 'Economy') |>
filter(Airports == 'MEL-SYD')
lambda5<-Economy|>
features(Passengers, features = guerrero)|>
pull(lambda_guerrero)
Economy|>
autoplot(box_cox(Passengers, lambda5)) +
labs(y = "",
title = paste0(
"Economy passengers MEL-SYD with lambda$ = ",
round(lambda5,2)))
The variance for the economy passengers was staized with a lambda of
2.
PedesCount<-pedestrian|>
filter(Sensor == 'Southern Cross Station')
lambda6 <- PedesCount |>
features(Count, features = guerrero) |>
pull(lambda_guerrero)
PedesCount|>
autoplot(box_cox(Count, lambda6)) +
labs(y = "",
title = paste0(
"Pedestrian counts at Southern Cross with $\\lambda$ = ",
round(lambda6,2)))
The the plot seem hard to readbut the variance is stable at -0.25 for
pedestrian counts at Southern cross.
3.7 Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas) Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle? Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas <- tail(aus_production, 5*4) |> select(Gas)
#plot time series
gas|>
autoplot(Gas)
Based on the time series plot the trend is upwards, with a strong
seasonality pattern with an alternating up or down peak at every 3rd
quarter.
gas|>
model(classical_decomposition(Gas, type = "multiplicative"))|>
components()|>
autoplot() +
labs(title = "Classical multiplicative decomposition of Australian gas production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
The decomposition plot proves that there was an upward trend and
seasonality.
#decomposition using STL
dcmp_gas <- gas|>
model(stl = STL(Gas))
#seasonally adjusted data
components(dcmp_gas)|>
as_tsibble() |>
autoplot(Gas, color="blue") +
geom_line(aes(y=season_adjust)) +
labs(y = "Gas production",
title = "Australian Gas Production")
In this plot the black line would be the season adjusted plot and the
blue line would be the original plot.
Gas2 <- gas
Gas2$Gas[8] <- Gas2$Gas[8] + 300
dcmp_gas<-Gas2|>
model(stl=STL(Gas))
components(dcmp_gas)|>
as_tsibble() |>
autoplot(Gas, color="blue") +
geom_line(aes(y=season_adjust)) +
labs(y = "Gas production",
title = "Australian Gas Production")
Gas3 <- gas
Gas3$Gas[1] <- Gas3$Gas[1] + 300
dcmp_gas<-Gas3|>
model(stl=STL(Gas))
components(dcmp_gas)|>
as_tsibble() |>
autoplot(Gas, color="blue") +
geom_line(aes(y=season_adjust)) +
labs(y = "Gas production",
title = "Australian Gas Production")
Gas4 <- gas
Gas4$Gas[20] <- Gas4$Gas[20] + 300
dcmp_gas<-Gas4|>
model(stl=STL(Gas))
components(dcmp_gas)|>
as_tsibble() |>
autoplot(Gas, color="blue") +
geom_line(aes(y=season_adjust)) +
labs(y = "Gas production",
title = "Australian Gas Production")
Adding 300 as an outlier effected the plot, causing a high peak and
changed the pattern of the adjusted season line.
If the outlier is applied in the ends or in the middle, it will effect the plots the same way by changing the adjusted seasonal line.
3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
checkX13binary(fail.unsupported = FALSE, verbose = TRUE)
## x13binary is working properly
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries|>
autoplot() +
labs(title = "Australian Retail Trade Turnover")
## Plot variable not specified, automatically selected `.vars = Turnover`
head(myseries)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Northern Territory Clothing, footwear and perso… A3349767W 1988 Apr 2.3
## 2 Northern Territory Clothing, footwear and perso… A3349767W 1988 May 2.9
## 3 Northern Territory Clothing, footwear and perso… A3349767W 1988 Jun 2.6
## 4 Northern Territory Clothing, footwear and perso… A3349767W 1988 Jul 2.8
## 5 Northern Territory Clothing, footwear and perso… A3349767W 1988 Aug 2.9
## 6 Northern Territory Clothing, footwear and perso… A3349767W 1988 Sep 3
myseries_dcmp <- myseries|>
model(X11 = X_13ARIMA_SEATS(Turnover ~ x11()))|>
components()
autoplot(myseries_dcmp) +
labs(title = "Decomposition of Australian clothing and accessory retailing trade turnover using X-11")
3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. Is the recession of 1991/1992 visible in the estimated components?
The decomposition shows a trend that is upward. The season-year window shows a seasonal repetitive pattern, where there seems to be a upward and downward peak taht seems to increase in size overtime. The remainder shows doesn’t show much noise up until after 1987 and there is deep around 1991. The recession of 1991/1992 was visible in the estimated components in figure 3.19 since we can see a slight decrease in the trend line and the decrease in the remainder window.