library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.3 ✔ fable 0.5.0
## ✔ ggplot2 3.5.1
## Warning: package 'dplyr' was built under R version 4.4.2
## Warning: package 'ggplot2' was built under R version 4.4.2
## Warning: package 'tsibble' was built under R version 4.4.3
## Warning: package 'tsibbledata' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(tsibble)
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
global_economy |>
mutate(adj_GDP = GDP / Population) |>
drop_na(adj_GDP) |>
select(Country, Year, adj_GDP) |>
autoplot(adj_GDP) + guides(col="none") +
labs(title = "GDP per capita over time",
x = "Year",
y = "GDP per capita")
The plot has too many countries to differentiate each from each other
effectively. I will create another plot of the 10 countries with the
highest GDP.
top_GDP <- global_economy |>
mutate(adj_GDP = GDP / Population) |>
filter(!is.na(adj_GDP)) |>
group_by(Country) |>
filter(adj_GDP == max(adj_GDP)) |>
ungroup() |>
slice_max(adj_GDP, n=10) |>
pull(Country)
global_economy |>
mutate(adj_GDP = GDP/Population) |>
filter(Country %in% top_GDP) |>
autoplot(adj_GDP) +
labs(title = "Top 10 countries GDP per capita",
x = "Year",
y = "GDP per capita")
## Warning: Removed 133 rows containing missing values or values outside the scale range
## (`geom_line()`).
We can see that Monaco and Liechtenstein have the highest GDP with
Monaco historically higher. However, Liechtenstein has data from 1960
while Monaco has data starting from ~1970 and would have the highest GDP
if Monaco was not added to the mix.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
global_economy |>
filter(Code == "USA") |>
autoplot(GDP) +
labs(title = "US GDP")
global_economy |>
filter(Code == "USA") |>
mutate(adj_gdp = GDP/Population) |>
autoplot(adj_gdp) +
labs(title = "US GDP per capita", x = "Year", y = "GDP per capita")
It seemed appropriate to transform the GDP to GDP per capita but the
shape of the data looks nearly identical to the original plot.
aus_livestock |>
filter(Animal == "Bulls, bullocks and steers") |>
filter(State == "Victoria") |>
autoplot(Count) + labs(title = "Victoria Bulls, Bullocks, and Steers Slaughtered")
as_tibble(aus_livestock) |>
filter(Animal== "Bulls, bullocks and steers") |>
group_by(Month) |>
mutate(total_perc = Count/sum(Count)) |>
filter(State == "Victoria") |>
ungroup() |>
as_tsibble(index =Month ) |>
autoplot(total_perc) +
labs(title = "Share of Victorian Bulls, bullocks and steers slaughered",
x= "Date",
y= "Percent of total")
The bulls, bullocks, and steers data was transformed to reflect the
percentage of total bulls, bullocks, and steers slaughtered in Victoria
vs Australia. The shape of the plots are similar, showing a an overall
trend down but the transformed plot is less sharp in regards to change
in direction.
vic_elec |>
autoplot(Demand) +
labs(title = "Victorian electricity demand")
I don’t think this data requires transformation so I will keep the
original plot.
aus_production |>
select(Quarter, Gas) |>
autoplot(Gas) + labs(title = "Australian gas production")
gas_data <-aus_production |>
select(Quarter, Gas)
lambda <-gas_data |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
gas_data <- gas_data |>
mutate(bc = box_cox(Gas, lambda))
gas_data |>
autoplot(bc) +
labs(title = "Box cox transformed AUS gas production")
After a box cox transformation, the overall shape of the data is similar
to the original plot but it helped to stabilize the data.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
autoplot(canadian_gas, Volume)
A box-cox transformation is unhelpful because the data variation is
relatively consistent and the result will be similar to the original
plot.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
# Code from section 2.10
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
# filter data
myseries <-myseries |>
select(Month, Turnover)
# Find lambda value
lambda1 <-myseries |>
features(Turnover,features = guerrero) |>
pull(lambda_guerrero)
print(lambda1)
## [1] 0.08303631
# Box cox transformation
myseries_bc <- myseries |>
mutate(bc = box_cox(Turnover, lambda1))
# Plot data
myseries_bc |>
autoplot(Turnover) +
labs(title = "Original data")
myseries_bc |>
autoplot(bc) +
labs(title ="Transformed data")
I prefer the 2nd transformed plot since it seems a bit more stable. The
spike around 1995 is still prominent but the overall data seems more
normalized.
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian
# Filter data to time and tobacco
tobacco_df <- aus_production |>
select(Quarter, Tobacco)
# Find lambda value
lambda2 <- tobacco_df |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
print(lambda2)
## [1] 0.9264636
# Box-cox transform data
tobacco_df <- tobacco_df |>
mutate(bc = box_cox(Tobacco, lambda2))
# Plot
tobacco_df |>
autoplot(Tobacco) +
labs (title = "Original data")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
tobacco_df |>
autoplot(bc) +
labs (title = "Transformed data")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
Although lambda is close to 1 and data has been transformed using
box-cox, the plot is very similar to the original.
# filter data
passenger_df <- ansett |>
filter(Class == "Economy") |>
filter(Airports == "MEL-SYD") |>
select(Week, Passengers)
# lambda value
lambda3 <- passenger_df |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
print(lambda3)
## [1] 1.999927
# box-cox transformation
passenger_df <- passenger_df |>
mutate(bc = box_cox(Passengers, lambda3))
# plot
passenger_df |>
autoplot(Passengers) +
labs(Title = "Original")
passenger_df |>
autoplot(bc) +
labs(Title = "Transformed")
The lambda is 1.999 however there seems to be a large outlier right
before 1990.
# filter data
pedestrian_df <- pedestrian |>
filter(Sensor == "Southern Cross Station") |>
select(Date_Time, Count)
# lambda value
lambda4 <- pedestrian_df |>
features(Count, features = guerrero) |>
pull(lambda_guerrero)
print(lambda4)
## [1] -0.2501616
# box cox transformation
pedestrian_df <- pedestrian_df |>
mutate(bc = box_cox(Count, lambda4))
# plot
pedestrian_df |>
autoplot(Count) +
labs(title= "Original")
pedestrian_df |>
autoplot(bc) +
labs(title = "Transformed")
The lambda for this data set is around -0.25.
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
gas |>
autoplot(Gas)
There are seasonal fluctuations with an upward trend. There is an
increase of gas between Q1 and Q3. There are decreases during Q3 and
Q4.
gas |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
autoplot()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
### c. Do the results support the graphical interpretation from part a?
Yes, the results seem to support the plot in a since we can see the
clear seasonal patterns with the increase in Q1 and Q2 and decrease in
Q3 and Q4.
gas |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
as_tsibble() |>
autoplot(Gas) +
geom_line(aes(y=season_adjust))
gas_adj <- gas
gas_adj$Gas[5] <-gas_adj$Gas[5] + 100
gas_adj |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
as_tsibble() |>
autoplot(Gas) +
geom_line(aes(y=season_adjust,
color = "red"))
The outlier seems to impact the seasonally adjusted data by slightly
pushing the values up. For example, at 2009 Q1, the gas seems to be
around 225 in the original plot but with the outlier, the value is over
225.
gas_adj2 <- gas
gas_adj2$Gas[10] <-gas_adj$Gas[10] + 100
gas_adj2 |>
model(classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
as_tsibble() |>
autoplot(Gas) +
geom_line(aes(y=season_adjust,
color = "red"))
Since my original outlier plot had the outlier closer to the end, I
moved it closer to the middle in this version. It does not appear to
make a big difference where the outlier is.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
# Code from section 2.10
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
# X-11
myseries |>
select(Month, Turnover) |>
model(X_13ARIMA_SEATS(Turnover ~ x11())) |>
components() |>
autoplot()
This plot doesn’t appear to show anything new but does confirm and help
make the patterns more clean. There is a clear upward trend and there is
a somewhat seasonal component to the data.
The value plot shows the original data and presents a clear trend upward.
The trend plot has the same scale as the value plot, and appears to be more clearly increasing over time.
The season_year plot shows that there is a seasonal component to the data, but the scale is much smaller compared to the previous graphs and suggests that seasonality does not result in significant variation. This plot also shows the seasonal impact becoming slightly more pronounced in more recent years but the impact will be small since the scale is so small.
The remainder plot shows the variation not accounted for by the trend or seasonal elements. The plot does not appear to show substantial variation but it does have an outlier around 1991. The scale here is different from the other plots as well.
b.Is the recession of 1991/1992 visible in the estimated components?
Yes the recesssion is clearly visible in the remainder plot as a major outlier and less clearly in the value and trend plots where the upward trend becomes less steep and flattens slightly.