lab_02

Author

Henock Montcho

Published

March 16, 2025

library(fpp3)
library(tsibble)
library(dplyr)
#library(ggplot)

1-)

-Plot the GDP per capita for each country over time:

global_economy  |>
  autoplot(GDP / Population, show.legend = FALSE) +
  labs(title= "GDP per capita", y = "$US")

-Which country has the highest GDP per capita?

global_economy  |>
  mutate(GDP_per_capita = GDP / Population)  |>
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE))  |>
  select(Country, GDP_per_capita)

# A tsibble: 1 x 3 [1Y]
# Key:       Country [1]
  Country GDP_per_capita  Year
  <fct>            <dbl> <dbl>
1 Monaco         185153.  2014

Comment: Monaco has the highest GDP per capita.

-How has Monaco GDP per capita changed over time?

global_economy  |>
  filter(Country == "Monaco")  |>
  autoplot(GDP/Population) +
  labs(title= "GDP per capita for Monaco over time", y = "$US")

Comment: Monaco’s GDP per capita shows an overall trend of increase over the years. It also shows a cycle that lasts approximately 15 years with a decline in 2009 (probably due to the 2009 great recession) that shortens the 15-years cycle.

2-) For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

-United States GDP from “global_economy”.

global_economy  |>
  filter (Country == "United States")  |>
  autoplot(GDP / 10^13) +
  labs(title = "United States GDP over the years" , 
       y = "GDP ($Trillions)")

Comment: No huge transformation was needed beside the converting the y-axis into “$Trillions”

-Slaughter of Victorian “Bulls, bullocks and steers” in “aus_livestock”.

aus_livestock  |>
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria")  |>
  autoplot(Count) +
  labs(title= "Slaughter of Victoria Bulls, Bullocks, and Steers")

Comment: There was no transformation done.

-Victorian Electricity Demand from “vic_elec”

victorian_demand <- vic_elec  |>
  group_by(Date)  |>
  mutate(Demand = sum(Demand))  |>
  distinct(Date, Demand)
victorian_demand

# A tibble: 1,096 × 2
# Groups:   Date [1,096]
   Date        Demand
   <date>       <dbl>
 1 2012-01-01 222438.
 2 2012-01-02 257965.
 3 2012-01-03 267099.
 4 2012-01-04 222742.
 5 2012-01-05 210585.
 6 2012-01-06 210247.
 7 2012-01-07 202526.
 8 2012-01-08 193413.
 9 2012-01-09 213804.
10 2012-01-10 215020.
# ℹ 1,086 more rows

victorian_demand  |> 
  as_tsibble(index = Date)  |>
  autoplot(Demand) +
  labs(title= "Daily Victorian Electricity Demand", y = "$US (in trillions)")

victorian_demand  |>
  mutate(Date = yearmonth(Date))  |>
  group_by(Date)  |>
  summarise(Demand = sum(Demand)) |>
  as_tsibble(index = Date)  |>
  autoplot(Demand) +
  labs(title= "Monthly Victorian Electricity Demand", y = "$US (in trillions)")

victorian_demand  |>
  mutate(Date = year(Date))  |>
  group_by(Date)  |>
  summarise(Demand = sum(Demand)) |>
  as_tsibble(index = Date)  |>
  autoplot(Demand) +
  labs(title= "Yearly Victorian Electricity Demand", y = "$US (in trillions)")

Comment: There were some transformations made to visualize the daily, monthly and Yearly Demand of the Victorian Electricity. The Victorian Electricity Demand shows a decreasing trend throughout 2012 and 2014 considering the Yearly data. Considering the Daily Victorian Electricity Demand, it shows a cycle with its peak in the mid of the year.

-Gas production from “aus_production”.

aus_production  |>
  select("Gas")  |>
  autoplot()+
  labs(y = "Gas Production in petajoules",
       title = "Australian Quarterly Gas Production")

lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)
lambda

[1] 0.1095171

aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = "Transformed Australian Gas Production (adjusted with lambda)",
         round(lambda,2))

Comment: The Gas production from ‘aus_production’ shows variation that increases with the level of the series, then a transformation can be useful. A Box-Cox transformation has been performed with lambda = 0.11. With that transformation, the variation has been maintained constant with the level of the series.

3-) Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas  |>
  autoplot(Volume) +
  labs(title = "Original Canadian Gas Production")

lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)
lambda

[1] 0.5767648

canadian_gas |>
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "",
       title = "Transformed Canadian Gas Production (adjusted with lambda)")

Comment: The Box-Cox transformation did not make any significant change to the Canadian gas Production: the variation has not been maintained constant with the level of the series.

4-) What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(001)
myseries <- aus_retail  |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries, Turnover) +
  labs(title = "Retail Turnover", y = "$millions AUD")

lambda <- myseries  |>
  features(Turnover, features = guerrero)  |>
  pull(lambda_guerrero)
lambda

[1] -0.009186879

myseries  |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "", title = "Transformed Retail Turnover with lambda")

Comment: a Box-Cox transformation where lambda = ‘-0.01’ would be selected.

5-) For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

-Tobacco from aus_production

tobacco <- aus_production  |>
  select("Tobacco")

tobacco  |>
  autoplot() +
  labs(title = "Quarterly Production of Tobacco in Australia")

lambda <- tobacco  |>
  features(Tobacco, features = guerrero)  |>
  pull(lambda_guerrero)
lambda

[1] 0.9264636

tobacco  |>
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "", title = "Transformed Quarterly Production of Tobacco in Australia with lambda")

Comment: Lambda is approximately equal to ‘1’ (0.9). There will be no transformation.

-Economy class passengers between Melbourne and Sydney from ansett

economy_passengers <- ansett  |>
  filter(Class == "Economy", Airports == "MEL-SYD")  |>
  select("Passengers")

economy_passengers  |>
  autoplot(Passengers) +
  labs(title = "Economy class passengers between Melbourne and Sydney")

lambda <- economy_passengers  |>
  features(Passengers, features = guerrero)  |>
  pull(lambda_guerrero)
lambda

[1] 1.999927

economy_passengers  |>
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "", title = "Transformed Passengers numbers on Ansett airline flights Economy Class")

Comment: Lambda is approximately equal to 2. This means that each data value is squarred, and then 1 is substracted from the result, followed by dividing by 2. The purpose of this transformation is to stabilize variance and make the data more normally distributed.

-Pedestrian counts at Southern Cross Station from pedestrian

pedestrian_southern_cross <- pedestrian  |>
  filter(Sensor == "Southern Cross Station")

pedestrian_southern_cross  |>
  autoplot(Count) +
  labs(title = "Hourly Pedestrian counts in the city of Melbourne")

lambda <- pedestrian_southern_cross  |>
  features(Count, features = guerrero)  |>
  pull(lambda_guerrero)
lambda

[1] -0.2501616

pedestrian_southern_cross  |>
  autoplot(box_cox(Count, lambda)) +
  labs(y = "", title = "Transformed Hourly Pedestrian counts in the city of Melbourne with lambda")

#Transform Data
#Daily
pedestrian_southern_cross_daily <- pedestrian_southern_cross  |>
  index_by(Date)  |>
  summarise(Count = sum(Count))

pedestrian_southern_cross_daily  |>
  autoplot(Count) +
  labs(title = "Daily Pedestrian counts in the city of Melbourne")

lambda <- pedestrian_southern_cross_daily  |>
  features(Count, features = guerrero)  |>
  pull(lambda_guerrero)
lambda

[1] 0.2726316

pedestrian_southern_cross_daily  |>
  autoplot(box_cox(Count, lambda)) +
  labs(y = "", title = "Transformed Daily Pedestrian counts in the city of Melbourne with lambda")

#Weekly
pedestrian_southern_cross_weekly <- pedestrian_southern_cross_daily  |>
  mutate(Week = yearweek(Date))  |>
  index_by(Week)  |>
  summarise(Count = sum(Count))

pedestrian_southern_cross_weekly  |>
  autoplot(Count) +
  labs(title = "Weekly Pedestrian counts in the city of Melbourne")

lambda <- pedestrian_southern_cross_weekly  |>
  features(Count, features = guerrero)  |>
  pull(lambda_guerrero)
lambda

[1] -0.1108714

pedestrian_southern_cross_weekly  |>
  autoplot(box_cox(Count, lambda)) +
  labs(y = "", title = "Transformed Weekly Pedestrian counts in the city of Melbourne with lambda")

Comment: A better interpretation cannot be made from the hourly and the daily visualizations even when the data have been transformed with lambda. The weekly visualization (not transformed) on its own gives better insights. With lambda, there is not much change.

7-)

a- Plot the time series

gas <- tail(aus_production, 5*4) |> select(Gas)

autoplot(gas, Gas)

Comment: An overall increasing trend is observed with seasonal patterns where Q1 in any given year, has the lowest production of gas and Q3 has the highest production of gas and a decrease in Q4. An observed yearly cycle is repeated throughout the Quarters.

b- Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_dcmp_multi <- gas  |>
  model(classical_decomposition(Gas, type = "multiplicative"))
  components (gas_dcmp_multi)  |>
  autoplot() +
  labs(title = "Multiplicative Decomposition of Quarterly Gas Production")

c- Do the results support the graphical interpretation from part a?

Interpretation:

-The ‘trend’ graph shows an increasing trend (same conclusion from part a).

-The seasonality conclusion is the same with part a. Q1 still being the lowest and Q2 the highest throughout the years considering the ‘seasonal’ graph. Also, a yearly cycle is observed as in part a.

Conclusion:

The results support the graphical interpretation from part a.

d- Compute and plot the seasonally adjusted data.

components(gas_dcmp_multi) |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "gas")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  labs(title = "Quartely Gas Production vs. Adjusted Quartely Gas Production")

e- Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas  |>
  mutate(Gas = ifelse(Gas == 192, Gas + 300, Gas))  |>
  model(classical_decomposition(Gas, type = "multiplicative"))  |>
  components()  |>
  as_tsibble()  |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "blue") +
  labs(title = "Seasonally Adjusted Gas Production with an Outlier")

Comment: 2006 Q4 is the outlier. When added, the outlier drastically changed the graph compared to the graph from part d. However, the adjusted data seems “pretty close” to the non-adjusted data and the trend and seasonality are pretty close (both from part e).

f- Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas  |>
  mutate(Gas = ifelse(Gas == 236, Gas + 300, Gas))  |>
  model(classical_decomposition(Gas, type = "multiplicative"))  |>
  components()  |>
  as_tsibble()  |>
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "blue") +
  labs(title = "Seasonally Adjusted Gas Production with an Outlier")

Comment: The seasonality remains almost the same for the non adjusted data except for the outlier from parts e and f. The adjusted season has changed considerably from part e to part f.

8-Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(seasonal)

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

x11_dcmp <- myseries  |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()

Comment: No outliers are revealed. No unusual features not previously noticed before are revealed.

9-Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a- Write about 3-5 sentences describing the results of the decomposition:

From figure 3.19:

The trend graph shows an increasing trend in the labour force throughout the years. The scale of the trend graph is the same as the value graph which depicts the non decomposed time series. The trend from the value graph is approximately similar to the trend graph. The season_year graph shows a steady seasonality. However, the scale is much lower which could indicate the negligible impact of the season of the year on the labour force. There is a trough between 1991 and 1992 shown in the remainder graph with an impact noticed on the value graph, but not noticed on the trend graph.

b- Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession of 1991/1992 is visible in the remainder graph with the noticeable troughs.