HW_2

Assignment Instructions:

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code1

Loading Library

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

## ✔ tibble      3.3.0     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

Exercise 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

glimpse(global_economy)

## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country    <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code       <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year       <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP        <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports    <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports    <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…

global_economy_per_cap <- global_economy %>%
  mutate(GDP_per_capita = GDP / Population)

global_economy_per_cap %>%
  autoplot(GDP_per_capita, show.legend = FALSE) +
  labs(x = "Year", y = "GDP Per Capita")

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

Exploring the top 5 highest GDP per capita over time

global_economy_per_cap %>%
  slice_max(order_by = GDP_per_capita, n = 5) %>%
  ungroup()

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

## # A tsibble: 5 x 10 [1Y]
## # Key:       Country [2]
##   Country       Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>         <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Monaco        MCO    2014 7060236168.  7.18     NA      NA      NA      38132
## 2 Monaco        MCO    2008 6476490406.  0.732    NA      NA      NA      35853
## 3 Liechtenstein LIE    2014 6657170923. NA        NA      NA      NA      37127
## 4 Liechtenstein LIE    2013 6391735894. NA        NA      NA      NA      36834
## 5 Monaco        MCO    2013 6553372278.  9.57     NA      NA      NA      37971
## # ℹ 1 more variable: GDP_per_capita <dbl>

Ploting for Monaco and Liechtenstein’s GDP Per Capita

global_economy_per_cap %>%
  filter (Country %in% c("Monaco", "Liechtenstein")) %>%
  autoplot (GDP_per_capita) +
  labs (title = "Monaco & Liechtenstein GDP Per Capita" , x = "Year", y = "$US")

## Warning: Removed 22 rows containing missing values or values outside the scale range
## (`geom_line()`).

Overall, Monaco had the highest GDP per capita in 2014. Apart from Monaco, Liechtenstein recorded the second-highest GDP per capita in 2014 and even exceeded Monaco in 2013.

Exercise 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

a. United States GDP from global_economy.

global_economy %>% 
  filter (Country == 'United States') %>%
  autoplot(GDP)

global_economy %>% 
  filter (Country == 'United States') %>%
  autoplot(GDP/Population) +
  labs ()

It has remained on the same trend over time, with little change overall by transformation.

b. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>%
  filter (State == "Victoria", Animal== "Bulls, bullocks and steers") %>%
  autoplot () +
  labs (title = "Victorian Livestock Slaughter")

## Plot variable not specified, automatically selected `.vars = Count`

aus_livestock %>%
  filter (State == "Victoria", Animal== "Bulls, bullocks and steers") %>%
  autoplot (log(Count)) +
  labs (title = "Log Transaformation of Victorian Livestock Slaughter" , y= "Log(Count)")

It doesn’t seem to change much after applying the log transformation.

c. Victorian Electricity Demand from vic_elec.

vic_elec %>%
  autoplot(Demand) +
  labs(title="Victorian Electricity Demand" )

vic_transform <- vic_elec %>%
  mutate(Month = month(Date)) %>%
  index_by(Month) %>%
  summarise(Demand=sum(Demand))


vic_transform %>%
  autoplot(Demand / 1e6) +
  labs(title = "Victorian Electricity Demand by Aggregation of Month",
       y = "Demand (millions of kWh)")

After aggregating the monthly demand, it is observed that electricity demand aligns with Victoria’s seasonal patterns, peaking in July during winter and then declining. (For reference, Australian seasons are: Summer: Dec–Feb, Autumn: Mar–May, Winter: Jun–Aug, Spring: Sep–Nov.)

d. Gas production from aus_production.

aus_production %>%
  autoplot(Gas)+
  labs (title="Gas Production", y= "Petajoules(PJ)")

aus_production %>%
  autoplot(log(Gas))+
  labs(title="Log of Gas Production", y= "Petajoules(PJ)")

After applying a log transformation, the seasonal pattern becomes clearer, and shows the upward trended.

Exercise 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = Volume`

lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume,lambda))

A Box–Cox transformation is often used when larger values are associated with greater fluctuations. Therefore, it is not helpful for the canadian_gas data. Comparing the two graphs, there is little to no noticeable change.

Exercise 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Usinging the code directly from the Exercise 7 in Section 2.10

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Ploting the time series graph

myseries %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = Turnover`

Calculating the lanbda value

lambda_2 <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_2

## [1] 0.08303631

Ploting the time series based on the Box-Cox transformation

myseries %>%
  autoplot(box_cox(Turnover,lambda_2))+
  labs(title = "Austrailian Retail with lambda")

Based on the output of lambda value is 0.08 for the data, which is approximately a log transaofrmation (lambda=0).

Exercise 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

a. Tobacco from aus_production

aus_production %>%
  autoplot(Tobacco)

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambda_3 <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_3

## [1] 0.9264636

aus_production %>%
  autoplot(box_cox(Tobacco,lambda_3))+
  labs(title = "Tobacco with lambda")

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Based on the function output, a lambda value of 0.92 was selected

b. Economy class passengers between Melbourne and Sydney from ansett

ansett %>%
  filter(Class == "Economy", Airports == "MEL-SYD") %>%
  autoplot() +
  labs(title = "Economy Class Passengers between Melborne and Sydney")

## Plot variable not specified, automatically selected `.vars = Passengers`

lambda_4 <- ansett %>%
  filter(Class == "Economy", Airports == "MEL-SYD") %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_4

## [1] 1.999927

ansett %>%
  filter(Class == "Economy", Airports == "MEL-SYD") %>%
  autoplot(box_cox(Passengers,lambda_4))+
  labs(title = "Economy Class Passengers between Melborne and Sydney with lambda")

Based on the function output, a lambda value of apporiximatelt 2 was selected

c. Pedestrian counts at Southern Cross Station from pedestrian.

pedestrian %>%
  filter (Sensor == "Southern Cross Station") %>%
  autoplot(Count) +
  labs (title="Pedestrian counts at Southern Cross Station")

lambda_5 <- pedestrian %>%
  filter (Sensor == "Southern Cross Station") %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_5

## [1] -0.2501616

pedestrian %>%
  filter (Sensor == "Southern Cross Station") %>%
  autoplot(box_cox(Count,lambda_5))+
  labs(title = "Economy Class Passengers between Melborne and Sydney with lambda")

Based on the function output, a lambda value of -0.2 was selected

Exercise 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = Gas`

Yes and there is a significant seasonal pattern with a slight upward trend.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_classical <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")
        ) %>%
  components()

gas_classical %>%
  autoplot()+
  labs (title = "Classical Multiplicative Decomposition of Gas Production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

#### c. Do the results support the graphical interpretation from part a? Indeed, the trend is slightly upward, and the seasonal patterns show a clear up-and-down movement.

d. Compute and plot the seasonally adjusted data.

gas_classical %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs( title = "Gas Production") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

#### e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_300 <- gas

gas_300$Gas[1] <-  gas_300$Gas[1] + 300

autoplot(gas_300,Gas)

gas_300_classical <- gas_300 %>%
  model(classical_decomposition(Gas, type = "multiplicative")
        ) %>%
  components()

gas_300_classical %>%
  autoplot()+
  labs (title = "Classical Multiplicative Decomposition of Gas Production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

gas_300_classical %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  labs( title = "Gas Production - Seasonally Adjusted") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Adding 300 to the first data point caused a large initial drop, after which the series stabilized and maintained the same pattern over time.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas_300_2 <- gas

gas_300_2$Gas[10] <-  gas_300$Gas[10] + 300

autoplot(gas_300_2,Gas)

gas_300_2_classical <- gas_300_2 %>%
  model(classical_decomposition(Gas, type = "multiplicative")
        ) %>%
  components()

gas_300_2_classical %>%
  autoplot()+
  labs (title = "Classical Multiplicative Decomposition of Gas Production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

gas_300_2_classical %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  labs( title = "Gas Production - Seasonally Adjusted") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

gas_300_3 <- gas

gas_300_3$Gas[20] <-  gas_300$Gas[20] + 300

autoplot(gas_300_3,Gas)

gas_300_3_classical <- gas_300_3 %>%
  model(classical_decomposition(Gas, type = "multiplicative")
        ) %>%
  components()

gas_300_3_classical %>%
  autoplot()+
  labs (title = "Classical Multiplicative Decomposition of Gas Production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

gas_300_3_classical %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  labs( title = "Gas Production - Seasonally Adjusted") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

Overall, outliers in the middle of the series tend to create wider local distortions in the trend and seasonal components, while outliers at the end often produce large changes at the final points, making end-of-sample trend estimates unstable.

Exercise 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Based on the Exercise 3.4

##install.packages("seasonal")
library(seasonal)

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()

autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Retail data using X-11.")

There are several spikes observed in the graph, with some values slightly higher or lower than the regular pattern. Despite these fluctuations, the overall trend continues upward over time.

Exercise 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a.Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The trend shows steady growth in the Australian civilian labour force between 1978 and 1995 based on the trend component. Additionally, there is a clear seasonal pattern repeating each year. Lastly, a sharp decline appears around 1991, reflecting the impact of the recession.

b. Is the recession of 1991/1992 visible in the estimated components?

Yes. The 1991/1992 recession is visible as a dip in the trend component and irregular movements in the remainder.