Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code1
Loading Library
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.0 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.4.1
## ✔ ggplot2 3.5.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
glimpse(global_economy)
## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…
global_economy_per_cap <- global_economy %>%
mutate(GDP_per_capita = GDP / Population)
global_economy_per_cap %>%
autoplot(GDP_per_capita, show.legend = FALSE) +
labs(x = "Year", y = "GDP Per Capita")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
Exploring the top 5 highest GDP per capita over time
global_economy_per_cap %>%
slice_max(order_by = GDP_per_capita, n = 5) %>%
ungroup()
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
## # A tsibble: 5 x 10 [1Y]
## # Key: Country [2]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7060236168. 7.18 NA NA NA 38132
## 2 Monaco MCO 2008 6476490406. 0.732 NA NA NA 35853
## 3 Liechtenstein LIE 2014 6657170923. NA NA NA NA 37127
## 4 Liechtenstein LIE 2013 6391735894. NA NA NA NA 36834
## 5 Monaco MCO 2013 6553372278. 9.57 NA NA NA 37971
## # ℹ 1 more variable: GDP_per_capita <dbl>
Ploting for Monaco and Liechtenstein’s GDP Per Capita
global_economy_per_cap %>%
filter (Country %in% c("Monaco", "Liechtenstein")) %>%
autoplot (GDP_per_capita) +
labs (title = "Monaco & Liechtenstein GDP Per Capita" , x = "Year", y = "$US")
## Warning: Removed 22 rows containing missing values or values outside the scale range
## (`geom_line()`).
Overall, Monaco had the highest GDP per capita in 2014. Apart from Monaco, Liechtenstein recorded the second-highest GDP per capita in 2014 and even exceeded Monaco in 2013.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
global_economy %>%
filter (Country == 'United States') %>%
autoplot(GDP)
global_economy %>%
filter (Country == 'United States') %>%
autoplot(GDP/Population) +
labs ()
It has remained on the same trend over time, with little change overall
by transformation.
aus_livestock %>%
filter (State == "Victoria", Animal== "Bulls, bullocks and steers") %>%
autoplot () +
labs (title = "Victorian Livestock Slaughter")
## Plot variable not specified, automatically selected `.vars = Count`
aus_livestock %>%
filter (State == "Victoria", Animal== "Bulls, bullocks and steers") %>%
autoplot (log(Count)) +
labs (title = "Log Transaformation of Victorian Livestock Slaughter" , y= "Log(Count)")
It doesn’t seem to change much after applying the log
transformation.
vic_elec %>%
autoplot(Demand) +
labs(title="Victorian Electricity Demand" )
vic_transform <- vic_elec %>%
mutate(Month = month(Date)) %>%
index_by(Month) %>%
summarise(Demand=sum(Demand))
vic_transform %>%
autoplot(Demand / 1e6) +
labs(title = "Victorian Electricity Demand by Aggregation of Month",
y = "Demand (millions of kWh)")
After aggregating the monthly demand, it is observed that electricity
demand aligns with Victoria’s seasonal patterns, peaking in July during
winter and then declining. (For reference, Australian seasons are:
Summer: Dec–Feb, Autumn: Mar–May, Winter: Jun–Aug, Spring: Sep–Nov.)
aus_production %>%
autoplot(Gas)+
labs (title="Gas Production", y= "Petajoules(PJ)")
aus_production %>%
autoplot(log(Gas))+
labs(title="Log of Gas Production", y= "Petajoules(PJ)")
After applying a log transformation, the seasonal pattern becomes
clearer, and shows the upward trended.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
canadian_gas %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Volume`
lambda <- canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume,lambda))
A Box–Cox transformation is often used when larger values are associated with greater fluctuations. Therefore, it is not helpful for the canadian_gas data. Comparing the two graphs, there is little to no noticeable change.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
Usinging the code directly from the Exercise 7 in Section 2.10
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
Ploting the time series graph
myseries %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Turnover`
Calculating the lanbda value
lambda_2 <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda_2
## [1] 0.08303631
Ploting the time series based on the Box-Cox transformation
myseries %>%
autoplot(box_cox(Turnover,lambda_2))+
labs(title = "Austrailian Retail with lambda")
Based on the output of lambda value is 0.08 for the data, which is
approximately a log transaofrmation (lambda=0).
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
aus_production %>%
autoplot(Tobacco)
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda_3 <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
lambda_3
## [1] 0.9264636
aus_production %>%
autoplot(box_cox(Tobacco,lambda_3))+
labs(title = "Tobacco with lambda")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
Based on the function output, a lambda value of 0.92 was selected
ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
autoplot() +
labs(title = "Economy Class Passengers between Melborne and Sydney")
## Plot variable not specified, automatically selected `.vars = Passengers`
lambda_4 <- ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
lambda_4
## [1] 1.999927
ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
autoplot(box_cox(Passengers,lambda_4))+
labs(title = "Economy Class Passengers between Melborne and Sydney with lambda")
Based on the function output, a lambda value of apporiximatelt 2 was
selected
pedestrian %>%
filter (Sensor == "Southern Cross Station") %>%
autoplot(Count) +
labs (title="Pedestrian counts at Southern Cross Station")
lambda_5 <- pedestrian %>%
filter (Sensor == "Southern Cross Station") %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
lambda_5
## [1] -0.2501616
pedestrian %>%
filter (Sensor == "Southern Cross Station") %>%
autoplot(box_cox(Count,lambda_5))+
labs(title = "Economy Class Passengers between Melborne and Sydney with lambda")
Based on the function output, a lambda value of -0.2 was selected
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
gas %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Gas`
Yes and there is a significant seasonal pattern with a slight upward
trend.
gas_classical <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
gas_classical %>%
autoplot()+
labs (title = "Classical Multiplicative Decomposition of Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
#### c. Do the results support the graphical interpretation from part a?
Indeed, the trend is slightly upward, and the seasonal patterns show a
clear up-and-down movement.
gas_classical %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs( title = "Gas Production") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
#### e. Change one observation to be an outlier (e.g., add 300 to one
observation), and recompute the seasonally adjusted data. What is the
effect of the outlier?
gas_300 <- gas
gas_300$Gas[1] <- gas_300$Gas[1] + 300
autoplot(gas_300,Gas)
gas_300_classical <- gas_300 %>%
model(classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
gas_300_classical %>%
autoplot()+
labs (title = "Classical Multiplicative Decomposition of Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
gas_300_classical %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
labs( title = "Gas Production - Seasonally Adjusted") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Adding 300 to the first data point caused a large initial drop, after
which the series stabilized and maintained the same pattern over
time.
gas_300_2 <- gas
gas_300_2$Gas[10] <- gas_300$Gas[10] + 300
autoplot(gas_300_2,Gas)
gas_300_2_classical <- gas_300_2 %>%
model(classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
gas_300_2_classical %>%
autoplot()+
labs (title = "Classical Multiplicative Decomposition of Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
gas_300_2_classical %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
labs( title = "Gas Production - Seasonally Adjusted") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
gas_300_3 <- gas
gas_300_3$Gas[20] <- gas_300$Gas[20] + 300
autoplot(gas_300_3,Gas)
gas_300_3_classical <- gas_300_3 %>%
model(classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
gas_300_3_classical %>%
autoplot()+
labs (title = "Classical Multiplicative Decomposition of Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
gas_300_3_classical %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
labs( title = "Gas Production - Seasonally Adjusted") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Overall, outliers in the middle of the series tend to create wider local
distortions in the trend and seasonal components, while outliers at the
end often produce large changes at the final points, making
end-of-sample trend estimates unstable.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
Based on the Exercise 3.4
##install.packages("seasonal")
library(seasonal)
##
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
##
## view
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of Retail data using X-11.")
There are several spikes observed in the graph, with some values
slightly higher or lower than the regular pattern. Despite these
fluctuations, the overall trend continues upward over time.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
The trend shows steady growth in the Australian civilian labour force between 1978 and 1995 based on the trend component. Additionally, there is a clear seasonal pattern repeating each year. Lastly, a sharp decline appears around 1991, reflecting the impact of the recession.
Yes. The 1991/1992 recession is visible as a dip in the trend component and irregular movements in the remainder.