#Import needed libraries

library(tsibbledata) #to use the time series data in it for the exercises.
library(tsibble) # to use function as_tsibble
library(tibble) # to use view function
library(ggplot2)
library(feasts) # to use the functions for graphics like autoplot()

library(readr) # to uses read_csv function
library(dplyr) # to use Filter, mutate function etc
library(tidyr) # to use pivot_longer function

library(USgas) # to use us_total data

library(fpp3)  # to use us_gasoline dataset 
library(seasonal) # X-13ARIMA-SEATS decomposition

Exercises:

3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Answers:

GDP per capita for each country over time is plotted through autoplot.

In 2017, Luxemborg has the highest GDP per capita.

However there is changes over time.

From 1960-1970 , United States was leading with few exception years indicating Kuwait.

From 1970-2016 , Monaco was leading with increasing GDP per capita with every passing year.

1976,1977, United Arab Emirates had the highest GDP per capita.

2013 and 2015, Liechtenstein had the highest GDP per capita.

global_economy |>
  autoplot(GDP/Population,show.legend = FALSE) +
  labs(title= "GDP per capita", y = "$USD")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

global_economy |>
  mutate(GDPperCapita = GDP/Population) |>
  index_by(Year) |>
  filter(GDPperCapita == max(GDPperCapita, na.rm = TRUE)) |>
  select(Country, GDPperCapita) |>
  arrange(Year) |>
  ggplot(aes(x = Year, y = GDPperCapita, fill = Country)) +
  geom_bar(stat = "identity") +
  labs(title= "GDP per capita", y = "$USD") +
  scale_x_continuous(breaks = seq(min(global_economy$Year), max(global_economy$Year), by = 5))

3.2 For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

The plot before transformation shows a quadratic upward trend.the variance increases with the level of the series.

The plot after population adjustment shows a better trend with a drop in 2008 due to financila crisis.

The lambda value of 0.39 indicates that the data needs a moderate transformation to make it more suitable for analysis and forecasting.Transformation needed is slightly stronger than a square root transformation (λ = 0.5) but not as strong as a log transformation (λ = 0)

# before transformation
global_economy |>
  filter(Country == "United States") |>
  autoplot(GDP) +
  labs(title= "United States GDP", y = "$USD")+
  scale_x_continuous(breaks = seq(min(global_economy$Year), max(global_economy$Year), by = 5))+
  scale_y_continuous(labels = scales::comma)

# performing population adjustments by dividing GDP by population

global_economy |>
  filter(Country == "United States") |>
  autoplot(GDP/Population) +
  labs(title= "United States GDPperCapita", y = "$USD")+
  scale_x_continuous(breaks = seq(min(global_economy$Year), max(global_economy$Year), by = 5))+
  scale_y_continuous(labels = scales::comma)

# applying transformation
us_economy <- global_economy |>
  filter(Country == "United States") |>
  mutate(GDPperCapita = GDP/Population) 

lambda <- us_economy |>
  features(GDPperCapita, features = guerrero) |>
  pull(lambda_guerrero)

us_economy |>
  autoplot(box_cox(GDPperCapita, lambda)) +
  labs(y = "",title = paste("United States Transformed GDPperCapita with λ = ",round(lambda,2)))+
  scale_x_continuous(breaks = seq(min(global_economy$Year), max(global_economy$Year), by = 5))

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

The plot before any transformation shows a changing trend direction.There is seasonality.

Check for distribution reveals the data is right skewed indicating the seasonal fluctuations that grow with the level of the series.

Check for heteroscedasticity shows it is not the case.

The lambda value of -0.04 indicates that the data needs a strong transformation as a log transformation (λ = 0)to make it more suitable for analysis and forecasting.

#original plot
aus_livestock |>
  filter( Animal == 'Bulls, bullocks and steers' & State == 'Victoria') |>
autoplot(Count) +
  labs(title= "Meat production in Australia for human consumption", y = "Number of animals slaughtered")

# create a refined set
aus_vic_livestock <- aus_livestock |>
  filter( Animal == 'Bulls, bullocks and steers' & State == 'Victoria')

# Check distribution
aus_vic_livestock |>
  ggplot(aes(x = Count)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of Count")

# Check for heteroscedasticity
aus_vic_livestock |>
  ggplot(aes(x = Month, y = Count)) +
  geom_boxplot() +
  labs(title = "Monthly Distribution of Counts")
## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

# box con transformation
lambda <- aus_vic_livestock |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

aus_vic_livestock |>
  autoplot(box_cox(Count, lambda)) +
  labs(y = "",title = paste("Meat production in Australia for human consumption with  λ = ",round(lambda,2)))

#log transformation
aus_livestock |>
  filter( Animal == 'Bulls, bullocks and steers' & State == 'Victoria') |>
autoplot(log(Count)) +
  labs(y = "Log(Count)", title= "Meat production in Australia for human consumption (log transformed)")

Victorian Electricity Demand from vic_elec.

The plot before any transformation does not show any trend and its diffcult to observe any seasonality or cyclic pattern.

Check for distribution reveals the data is right skewed indicating the seasonal fluctuations that grow with the level of the series.

Check for heteroscedasticity shows it is not the case.

I used yearmonth class function to transform to a monthly time series

The lambda value of 1.999 indicates that the data needs no transformation.

In the month time series , we observe a seasonality across the three years 2013, 2014,2015.

The Demand for electricity peaks in July.

#original time series
autoplot(vic_elec, Demand)+
  ggtitle("Half-hourly electricity demand for Victoria, Australia") +
  xlab("Date ") +
  ylab("Demand") 

# Check distribution
vic_elec |>
  ggplot(aes(x = Demand)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of Count")

# Check for heteroscedasticity
vic_elec |>
  ggplot(aes(x = Time, y = Demand)) +
  geom_boxplot() +
  labs(title = "Monthly Distribution of Counts")
## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

# using yearmonth class function to transform to a monthly time series
vic_elec_month <- vic_elec  |>
  mutate(YearMonth = yearmonth(Date))  |>
  index_by(YearMonth)  |>
  summarise(Demand = sum(Demand)) 

# box con transformation
lambda <- vic_elec_month |>
  features(Demand, features = guerrero) |>
  pull(lambda_guerrero)

print(lambda)
## [1] -0.8999268
  autoplot(vic_elec_month,Demand)+
  ggtitle("Half-hourly electricity demand for Victoria, Australia") +
  xlab("Month ") +
  ylab("Demand") 

# Method 2: Simple reciprocal transformation (1/x)
vic_elec_month |>
  mutate(Demand_reciprocal = 1/Demand) |>
  autoplot(Demand_reciprocal) +
  ggtitle("Reciprocal transformed electricity demand") +
  xlab("Month") +
  ylab("1/Demand")

Gas production from aus_production.

The time series shows a upward trend with seasonality behavior. Also the magnitude of change increases over the level of the series.

Check for distribution reveals the data is highly right skewed indicating the seasonal fluctuations that grow with the level of the series.

Check for heteroscedasticity shows it is not the case.

The lambda value of 0.11 indicates that the data needs a strong transformation as a log transformation (λ = 0)to make it more suitable for analysis and forecasting.

#original time series
aus_production |>
  autoplot(Gas) +
  labs ( title = "Quarterly production of Gas in Australia", y = "Gas production in petajoules")

# Check distribution
aus_production |>
  ggplot(aes(x = Gas)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of Count")

# Check for heteroscedasticity
aus_production |>
  ggplot(aes(x = Quarter, y = Gas)) +
  geom_boxplot() +
  labs(title = "Monthly Distribution of Counts")
## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

# Estimate the lambda
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = paste("Transformed gas production with = ",round(lambda,2)))

# perform log transformation
aus_production |>
  autoplot(log(Gas)) +
  labs ( title = "Quarterly production of Gas in Australia (log transformed)", y = "Log(Gas production in petajoules)")

3.3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

The time series shows a upward trend which is not linear. there is seasonality which is changing over time.

The distribution shows that it is multi modal as data has multiple peaks or “natural clusters.”

Box-Cox transformations are designed to address skewness and variance stabilization, but they cannot handle multimodality. A Box-Cox transformation with λ = 0.58 (close to 0.5, suggests a square root transformation) would:Still maintain the multiple modes,Not make the data more “normal” in distribution Not address the underlying pattern in your data.

Hence it is not useful

autoplot(canadian_gas, Volume)+
  labs( y = "gas production in billions of cubic metres", title ="Monthly Canadian gas production")

# Check distribution
canadian_gas |>
  ggplot(aes(x = Volume)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of Count")

# Check for heteroscedasticity
canadian_gas |>
  ggplot(aes(x = Month, y = Volume)) +
  geom_boxplot() +
  labs(title = "Monthly Distribution of Counts")
## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

# Estimate the lambda
lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

canadian_gas |>
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "",
       title = paste("Transformed gas production with λ = ",round(lambda,2)))

3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

the time series is indicating a changing direction trend as its on a upward trend from 1982 to 2002. Then it is decreasing till 2006 and so on.

the seasonality show the retail turnover is highest in December over the years and relatively lowest in February.

The lambda value of 0.47 indicates the needs for a transformation to make it more suitable for analysis and forecasting.Transformation needed is a square root transformation (λ = 0.5).

set.seed(12345698)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries,Turnover)+
  labs( y = "Retail turnover in $Million AUD", title = "Australian retail trade turnover")

gg_season(myseries,Turnover)+
  labs( y = "Retail turnover in $Million AUD", title = "Australian retail trade turnover")

gg_subseries(myseries,Turnover)+
  labs( y = "Retail turnover in $Million AUD", title = "Australian retail trade turnover")

# Estimate the lambda
lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

myseries |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title = paste("Transformed Australian retail trade turnover = ",round(lambda,2)))

autoplot(myseries,sqrt(Turnover))+
  labs( y = "SQRT(Retail turnover in $Million AUD)", title = "Australian retail trade turnover( SQRT transformed)")

3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production

Box con transformation generated a lambda of 0.93, which indicates no transformation is needed.

aus_production |>
  autoplot(Tobacco) + 
  labs(title= "Quarterly Production of Tobbaco" , y = "Tobacco and cigarette production in tonnes")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Generating the  lambda 
lambda <- aus_production %>%
  features(Tobacco,features = guerrero) %>%
  pull(lambda_guerrero)

aus_production |>
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y =  "Tobacco and cigarette production in tonnes",
       title = paste("Transformed Tobacco production with λ = ",round(lambda,2)))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Economy class passengers between Melbourne and Sydney from ansett

Lambda value of 2 indicates the need for a square Transformation to stabilize the variance.

mel_syd_ansett <- ansett |>
  filter(Airports == "MEL-SYD" & Class == "Economy") 

autoplot(mel_syd_ansett,Passengers) + labs(title = "Passenger numbers on Ansett airline flights" , y = "Total air passengers travelling with Ansett")+
  scale_y_continuous(labels = scales::comma)

#Generating the  lambda 
lambda <- mel_syd_ansett %>%
  features(Passengers,features = guerrero) %>%
  pull(lambda_guerrero)

mel_syd_ansett |>
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "(Total air passengers travelling with Ansett)^2",
       title = paste("Square Transformed Passenger numbers on Ansett airline flights with λ = ",round(lambda,2))) +
  scale_y_continuous(labels = scales::comma)

Pedestrian counts at Southern Cross Station from pedestrian.

The original time series is not inferable at all.

After adjusting to grouping by the Weekly time series, i observe a constant trend in pedestrian count of 3000 from 2015 to 2017, except for a dip around Jan 2016.There is seasonality fluctations with no cyclic behavior.

With lambda equal to -0.11 (very close to 0), this suggests that a log transformation would be most appropriate to stabilise the variance.

#original time series
pedestrian |>
   filter(Sensor == "Southern Cross Station") |>
    autoplot(Count) + labs(title = "Pedestrian counts in the city of Melbourne", y = "Hourly pedestrian counts")

# Convert to weekly time series
weekly_pedestrian <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  mutate(YearWeek = yearweek(Date))  |>
  index_by(YearWeek)  |>
  summarise(Count = sum(Count)) 


lambda <- weekly_pedestrian |>
  features(Count,features = guerrero) %>%
  pull(lambda_guerrero)

print(lambda)
## [1] -0.1108714
weekly_pedestrian |>
    autoplot(box_cox(Count,lambda)) + 
  labs(title = paste("Log Transformed Pedestrian counts in the city of Melbourne with λ = ",round(lambda,2)), 
       y = "log(Weekly pedestrian counts)")

3.7 Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

Do the results support the graphical interpretation from part a?

Compute and plot the seasonally adjusted data.

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Answer: a) Gas production in Australia for the years 2005 to 2010, based on the time series is indicating a upward trend with a seasonality fluctuation. It increases from Q1 to Q3 and decreases in Q4 for these years.

The classical multiplicative decomposition shows the trend cycle is general increasing without the other components. The seasonal index shows the repeating pattern each year. The remainder index shows values around 1.0 indicate normal variation.

The results support the graphical interpretation from part a.

After computing and plotting the seasonal adjusted data, i can see the true trend cycle without the seasonal patterns. Thus better comparison across the year like a dip in 2008 Q2.

After changing one observation 2010 q4 to be an outlier at the middle of the time series and recomputing the seasonally adjusted data,i see that it significantly affected the trend cycle. the trend is not showing that it is upward trend and highlighting a spike.

If i change the outlier to be added to the near end of the time series, there is a difference in where the outlier is placed showing the spike and accordingly a difference in the season adjusted representation.

# gas production last 5 years
gas_aus_production <- tail(aus_production, 5*4) |> 
      select(Gas)

# plot the time series
autoplot(gas_aus_production, Gas)+labs ( title = "Quarterly production of Gas in Australia", y = "Gas production in petajoules")

#Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gas_aus_production  |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Quarterly production of Gas in Australia")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Compute and plot the seasonally adjusted data.
gas_aus_production |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>

  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, color = "Original")) +
  geom_line(aes(y = season_adjust, color = "Seasonally Adjusted")) +
  labs(title = "Original vs Seasonally Adjusted Gas Production",
       y = "Gas Production",
       color = "Series") +
  scale_color_manual(values = c("Original" = "blue", "Seasonally Adjusted" = "red"))

## Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data.
gas_aus_production_test <- gas_aus_production
gas_aus_production_test$Gas[10] = gas_aus_production_test$Gas[10] +300


## Re Compute and plot the seasonally adjusted data showing the effect of the outlier
gas_aus_production_test |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>

  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, color = "Original")) +
  geom_line(aes(y = season_adjust, color = "Seasonally Adjusted")) +
  labs(title = "Original vs Seasonally Adjusted Gas Production( outlier in middle of series)",
       y = "Gas Production",
       color = "Series") +
  scale_color_manual(values = c("Original" = "blue", "Seasonally Adjusted" = "red"))

#change the outlier to be added to the near end of the time series
gas_aus_production_new <- gas_aus_production
gas_aus_production_new$Gas[19] = gas_aus_production_new$Gas[19] +300


## Re Compute and plot the seasonally adjusted data showing the effect of the outlier
gas_aus_production_new |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>

  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, color = "Original")) +
  geom_line(aes(y = season_adjust, color = "Seasonally Adjusted")) +
  labs(title = "Original vs Seasonally Adjusted Gas Production(outlier in the near end of the series)",
       y = "Gas Production",
       color = "Series") +
  scale_color_manual(values = c("Original" = "blue", "Seasonally Adjusted" = "red"))

3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Answer: The retail time series data is indicating a changing direction trend as its on a upward trend from 1982 to 2002. Then it is decreasing till 2006 and so on.The seasonality showed the retail turnover is highest in December over the years and relatively lowest in February. Decomposing the series using X-11 revealed the key outliers that was not clearly visible previously. This is evident in the irregular component which shows spikes around 1990 to 2000.

myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries,Turnover)+
  labs( y = "Retail turnover in $Million AUD", title = "Australian retail trade turnover")

x11_dcmp <- myseries |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components() 

autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Australian retail trade turnover using X-11.")

3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

b. Is the recession of 1991/1992 visible in the estimated components?

Answer:

  1. The decomposition of the number of persons in the civilian labour force is using a additive STL decomposition indicating positive trend which is increasing over time.STL is better at handling outliers than other traditional techniques.As observed in the season_year component, there is seasonality as there is variations within the year, however they have a wide range from -100 to 100. It is highest in the month of December followed by March.It is least in Jan and August.There is no cyclic behavior.
  2. Using the 3.19 graph, it is evident that the remainder component highlights the strong outliers in 1991 and 1992 which was probably due to recession.Other than that , it shows a normal variation.