Homework 2 Data 624

library(lubridate)
library(tsibble)

## Warning: package 'tsibble' was built under R version 4.3.3

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.3.3

library(tidyverse)
library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

library(forecast)

## Warning: package 'forecast' was built under R version 4.3.3

library(seasonal)

## Warning: package 'seasonal' was built under R version 4.3.3

1) Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy <- global_economy %>%
  mutate(GDP_per_capita = GDP/Population) 

global_economy %>%
  autoplot(GDP_per_capita, show.legend = FALSE) +
  labs(title = "GDP per Capita", y = "$US")

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

max_gdp_country <- global_economy %>%
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
  select(Country, Year, GDP_per_capita)

max_gdp_country

## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country  Year GDP_per_capita
##   <fct>   <dbl>          <dbl>
## 1 Monaco   2014        185153.

global_economy %>%
  filter(Country == "Monaco") %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  autoplot(GDP_per_capita) +
  labs(title = "GDP per Capita for Monaco")

## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).

Monaco is the country with the highest GDP per capita over time.Overtime, it has increased and seems to be greater than the other countries mostly. The GDP per capita has an increasing trend for majority of the countries.

………………………………………………………………………………………………………….

2 For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy %>%
  filter(Country == "United States") %>%
  ggplot(aes(x = Year, y = GDP)) +
  geom_line() +
  labs(title = "United States GDP", y = "$US") +
  theme_minimal()

It seems like it doesn’t need a transformation.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>% filter(Animal == 'Bulls, bullocks and steers') %>%
  filter(State == 'Victoria') %>%
  autoplot(Count) +
  labs(title = "Slaughter of Victorian Bulls, bullocks and steers", y = 'Count')

aus_livestock %>% 
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
  mutate(Quarter = yearquarter(Month)) %>%
  index_by(Quarter) %>%
  summarise(Count = sum(Count)) %>%
  autoplot(Count)

I transformed the data to quaterly so it can provide us with a more readble result.

Victorian Electricity Demand from vic_elec.

vic_elec %>% autoplot(Demand) + 
  labs(title = 'Victory Electricity Demands', y = 'MWh')

This one clearly needs a transformation since there is too much noise.

vic_elec %>%
  index_by(Year = year(Date)) %>%   # Convert the Date to yearly format
  summarise(Demand = sum(Demand)) %>%  # Sum the demand for each year
  autoplot(Demand) +  # Plot using autoplot
  labs(title = "Victorian Electricity Demand (Yearly)", 
       x = "Year", 
       y = "Demand (MW)") +
  theme_minimal()

With this transformation seasonality became less apparent and long-term trends are emphasized.

Gas production from aus_production.

aus_production %>% autoplot(Gas) +
  labs(title = 'Gas Production')

For this one I’ll use the Box-Cox transformation and use the guerrero feature (as seen in the book) with a lamda of 0.11

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = paste("Transformed gas production with lambda = ", round(lambda,2)))

This transformation made the seasonality more apparent.

………………………………………………………………………………………………………….

3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume) +
  labs(title= "Monthly Gas Production in Canada")

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "",
       title =(paste0(
         "Transformed gas production with lambda = ",
         round(lambda,2))))

For the canadian_gas data, the seasonal patterns remain fairly consistent throughout the series, except between 1978 and 1990 when these patterns become more intense. When the transformation is applied, it reduces the impact of these larger seasonal shifts, making them less visible.However, since the transformation isn’t designed to highlight or detect seasonality, it can hide these important patterns in the data.

……………………………………………………………………………………………………………

4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(2001)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 
myseries %>% autoplot(Turnover) +
  labs(title = "Retail Data ",
       y = "$AUD (Millions)")

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title =(paste0(
         "Transformed food services turnover with lambda = ",
         round(lambda,2))))

I’ll use a Box-Cox transformation with lambda 0.19 as it smooths the variation in the data. The Box-Cox transformation was used because it handles exponential growth with natural logarithms. Guerrero’s method found a good value for lamda, making forecasting easier.

…………………………………………………………………………………………………………

5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production

aus_production %>% autoplot(Tobacco) +
  labs(title = "*Original* Tobacco Production")

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambda <- aus_production %>%
  features(Tobacco, features = guerrero)%>%
  pull(lambda_guerrero)

aus_production %>% autoplot(box_cox(Tobacco,lambda)) +
  labs(title = paste("*Transformed* Tobacco Production with lamda =", round(lambda, 2)))

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

For the Tobacco data we see that Box-Cox transformation we have a lamda value of 0.93 which signifies that there was barely a transformation in data.

Economy class passengers between Melbourne and Sydney from ansett

economy_pass <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")

autoplot(economy_pass, Passengers)+
  labs(title = "*Original* Economy class Passengers Between Melbourne and Sydney")

lambda <- economy_pass %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

economy_pass %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(title = paste("**Transformed** Economy Class Passengers Count with lamda =", round(lambda, 2)))

The lamda here is equal to 2 which squares the data and can emphasize larger values and smooth out smaller differences.

Pedestrian counts at Southern Cross Station from pedestrian

pedestrian %>% filter(Sensor =='Southern Cross Station') %>% autoplot(Count)+
  labs(title = "*Original* Pedestrian Count Hourly")

The data has way too much noise and DEFINITELY needs some transformation. I’ll change it from hourly to weekly and move forward from there.

pedestrian_week <- pedestrian %>%
  mutate(Week = yearweek(Date)) %>%
  index_by(Week) %>%
  summarise(Count = sum(Count)) 
 pedestrian_week %>% autoplot(Count)+
  labs(title = "*Transformed 1* Pedestrian Count Weekly")

lambda <- pedestrian_week %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

pedestrian_week %>% autoplot(box_cox(Count,lambda)) +
  labs(title = paste("*Transformed 2* Weekly Pedestrian Count with lamda =", round(lambda, 2)))

Once again here lamda=2.

………………………………………………………………………………………………………….

7 Consider the last five years of the Gas data from aus_production. gas <- tail(aus_production, 5*4) |> select(Gas)

a Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

b Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

c Do the results support the graphical interpretation from part a?

d Compute and plot the seasonally adjusted data.

e Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data.What is the effect of the outlier?

f Does it make any difference if the outlier is near the end rather than in the middle of the time series?

a

gas <- tail(aus_production, 5*4) |> select(Gas)
gas %>% autoplot(Gas) +
  labs(title = "Australia Gas Production")

There is a clear increasing trend in the data, along with seasonality. The data typically peaks around mid-year, near the end of Q2 and reaches its lowest points in the first quarter (Q1).

b

gas %>% model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  autoplot()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

c

The data clearly shows the upward trend previously described and the seasonality is even more clear now, with peaks in Q2 and noticable lows in Q1.

d

gas %>% model(classical_decomposition(Gas, type = "multiplicative"))  %>%    
  components(gas_season) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "darkgray") +
  geom_line(aes(y=season_adjust), colour = "darkred") +
  labs(title = "Seasonally Adjusted Gas Production")

e

gas_withoutlier <- gas
gas_withoutlier$Gas[10] <- gas_withoutlier$Gas[10] + 300  # Add 300 to the 10th observation

gas_withoutlier %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "darkgray") +
  geom_line(aes(y=season_adjust), colour = "darkblue") +
  labs(title = "Seasonally Adjusted Data with an Outlier")

The outlier in the data caused the decomposition to be distorted and the seasonally adjusted data to show an unnatural spike.

f

I’ll now put the outlier in the end to see if that will make any difference.

gas_withoutlier_end <- gas
gas_withoutlier_end$Gas[nrow(gas)] <- gas_withoutlier_end$Gas[nrow(gas)] + 300

gas_withoutlier_end %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "darkgray") +
  geom_line(aes(y=season_adjust), colour = "darkgreen") +
  labs(title = "Seasonally Adjusted Data with an Outlier")

It didn’t make a significant difference by putting it in the end. The only noticeable effect (when comparing the two plots) is that the outlier near the end has less impact on the earlier data points.

……………………………………………………………………………………………………………

8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

myseries %>% #### myseries was created for excercise 4
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components() %>% 
  autoplot()+
   labs(title = "X-11 Decomposition of Retail Time Series")

#### The trend is generally upward, with some flattening periods in the middle (around 1990 to 2005) and a notable rise in recent years. There are some irregularities in the early part of the series (1980s) and around 2020, where larger spikes are observed. The unusual features in 2020 could be attributed to the COVID-19 pandemic, which impacted retail activity in many regions.

…………………………………………………………………………………………………………….

9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The provided chart displays an STL decomposition of the number of persons in the civilian labor force in Australia from February 1978 to August 1995. The top panel shows the overall trend component, which exhibits a steady increase over the period, suggesting a growth in the labor force. The second panel details the seasonality, indicating regular fluctuations that repeat annually. The third panel, the remainder component, shows the irregularities not accounted for by the trend or seasonality, showing some variability but not a distinct pattern.

b Is the recession of 1991/1992 visible in the estimated components?

Regarding the recession of 1991/1992, it appears in the remainder component with some negative spikes, but is not clearly visible in the trend component, which continues to increase through those years. The seasonal component also doesn’t show a clear change, suggesting that seasonal labor patterns stayed consistent despite the economic downturn.

Homework 2 Data 624

Nikoleta Emanouilidi

2024-09-10

1) Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Monaco is the country with the highest GDP per capita over time.Overtime, it has increased and seems to be greater than the other countries mostly. The GDP per capita has an increasing trend for majority of the countries.

………………………………………………………………………………………………………….

2 For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

It seems like it doesn’t need a transformation.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

I transformed the data to quaterly so it can provide us with a more readble result.

Victorian Electricity Demand from vic_elec.

This one clearly needs a transformation since there is too much noise.

With this transformation seasonality became less apparent and long-term trends are emphasized.

Gas production from aus_production.

For this one I’ll use the Box-Cox transformation and use the guerrero feature (as seen in the book) with a lamda of 0.11

This transformation made the seasonality more apparent.

………………………………………………………………………………………………………….

3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

……………………………………………………………………………………………………………

4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

I’ll use a Box-Cox transformation with lambda 0.19 as it smooths the variation in the data. The Box-Cox transformation was used because it handles exponential growth with natural logarithms. Guerrero’s method found a good value for lamda, making forecasting easier.

…………………………………………………………………………………………………………

5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production

For the Tobacco data we see that Box-Cox transformation we have a lamda value of 0.93 which signifies that there was barely a transformation in data.

Economy class passengers between Melbourne and Sydney from ansett

The lamda here is equal to 2 which squares the data and can emphasize larger values and smooth out smaller differences.

Pedestrian counts at Southern Cross Station from pedestrian

The data has way too much noise and DEFINITELY needs some transformation. I’ll change it from hourly to weekly and move forward from there.

Once again here lamda=2.

………………………………………………………………………………………………………….

7 Consider the last five years of the Gas data from aus_production. gas <- tail(aus_production, 5*4) |> select(Gas)

a Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

b Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

c Do the results support the graphical interpretation from part a?

d Compute and plot the seasonally adjusted data.

e Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data.What is the effect of the outlier?

f Does it make any difference if the outlier is near the end rather than in the middle of the time series?

a

There is a clear increasing trend in the data, along with seasonality. The data typically peaks around mid-year, near the end of Q2 and reaches its lowest points in the first quarter (Q1).

b

c

The data clearly shows the upward trend previously described and the seasonality is even more clear now, with peaks in Q2 and noticable lows in Q1.

d

e

The outlier in the data caused the decomposition to be distorted and the seasonally adjusted data to show an unnatural spike.

f

I’ll now put the outlier in the end to see if that will make any difference.

It didn’t make a significant difference by putting it in the end. The only noticeable effect (when comparing the two plots) is that the outlier near the end has less impact on the earlier data points.

……………………………………………………………………………………………………………

8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

…………………………………………………………………………………………………………….

9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

b Is the recession of 1991/1992 visible in the estimated components?