624_HW2

Exercises FPP3, section 3.7

Q_3.1:

global_economy|>
  as_tsibble(index=Year)

## # A tsibble: 15,150 x 9 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # ℹ 15,140 more rows

gdp_percapita <- global_economy |>
  mutate(GDP_per_capita = GDP / Population) |>
  as_tsibble(key = Country, index = Year)

gdp_percapita

## # A tsibble: 15,150 x 10 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # ℹ 15,140 more rows
## # ℹ 1 more variable: GDP_per_capita <dbl>

avg_gdp <- gdp_percapita |>
  index_by(Year) |>
  summarise(avg_gdp = mean(GDP_per_capita, na.rm = TRUE))


avg_gdp|>
  autoplot(avg_gdp) +
  geom_smooth(method = "loess", span = 0.4) +
  labs(
    title = ("Avg GDP per Capita Over Time"),
    y = ("GDP per capita"),
    x = ("Time")
  )

## `geom_smooth()` using formula = 'y ~ x'

#In this graph we can see that GDP per capita took a global change in course around the 2008 financial crisis and has since flattened out and even shown decline over covid. I chose a fairly balanced span for this graph that still showed this trend. 


gdp_percapita |>
  filter(Country %in% c("United States", "China", "Japan", "Germany", "United Kingdom", "Canada","India","France","Russia","Italy")) |>
  autoplot(GDP_per_capita) +
  labs(
    title = "GDP per Capita Over Time",
    y = "GDP per capita",
    x = "Time"
  )

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

#In this graph we can see the US has the highest GDP per capita by a decent amount and is one of the few countries that have shown growth even in the last decade. For such a high GDP country, it is interesting to see that China is so low in this graph however their growth has been shown very consistent even with an exponentially larger population over the last 15 years.

Q_3.2:

#global_economy
#Here we again look at the GDP trend
Top_gdp <- global_economy |>
  filter(Country %in% c("United States", "China", "Japan", "Germany", "United Kingdom", "Canada"))
  
autoplot(Top_gdp,GDP) +
  labs(title = "GDP of Selected Countries")

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

#In this plot we can see with a log transformation how the data is smoothened and the trend becomes clearer for all examples.
autoplot(Top_gdp, log(GDP)) +
  labs(title = "Log US GDP")

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

#aus_livestock
#In this plot we can see the count of animals slaughtered and with a very noisy graph I did a log() transformation as well. Here we can see that the variance has decreased which gives a bit of a better picture but still a very noisy graph in general.
bulls_etc <- aus_livestock |>
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria")

autoplot(bulls_etc, Count) +
  labs(title = "Victoria Slaughter")

autoplot(bulls_etc, log(Count)) +
  labs(title = "Log Victorian Slaughter")

#vic_electric
#Here with the vic_electric demand data we can see there is also a lot of noise but it seems the log transformation has once again helped to cut the noise down. The graph has a lot of seasonal variation as expected even after this is applied which can be betetr seen with the log transformation

autoplot(vic_elec, Demand) +
  labs(title = "Electricity Demand")

autoplot(vic_elec, log(Demand)) +
  labs(title = "Log Electricity Demand")

#aus_production
#Here we can see a graph that doesn't appear to have as much variation but the way this original production gas graph looks like very linear growth, the log transform here can help us to see the flattening pattern of growth as production has grown variation has decreased. This trend now becomes more linear in view
aus_gas <- aus_production |>
  select(Quarter, Gas)

autoplot(aus_gas, Gas) +
  labs(title = "Gas Production")

autoplot(aus_gas, log(Gas)) +
  labs(title = "Log Gas Production")

Q_3.3:

lambda <- canadian_gas |>
  features(Volume, features = guerrero)

lambda

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.577

canadian_gas_bc <- canadian_gas |>
  mutate(Gas_bc = box_cox(Volume, lambda))

autoplot(canadian_gas_bc, Gas_bc) +
  labs(title = "Box-Cox Transformed Gas Production")

can_gas <- canadian_gas |>
  select(Month, Volume)

autoplot(can_gas,Volume) +
  labs(title = "Canadian Gas Non Box")

#In this case we used the geurrero feature to choose a value for the box-cox transformation but from the two graphs we can see there wasn't a very substantial change in the variance we observed. If variation increased with the level of gas volume here, the transformation would be more useful but since that is not the case the transformation fails to help us stabilize variability

Q_3.4:

set.seed(111)

my_retail <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

lambda_retail <- my_retail |>
  features(Turnover, features = guerrero)

lambda_retail

## # A tibble: 1 × 3
##   State    Industry                                      lambda_guerrero
##   <chr>    <chr>                                                   <dbl>
## 1 Victoria Cafes, restaurants and takeaway food services           0.176

lambda_retail <- lambda_retail$lambda_guerrero

my_retail |>
  autoplot(Turnover) +
  labs(title = "Retail turnover (original)")

my_retail |>
  mutate(Turnover_bc = box_cox(Turnover, lambda_retail)) |>
  autoplot(Turnover_bc) +
  labs(title = paste0("Retail turnover (Box-Cox, lambda=", round(lambda_retail, 3), ")"))

#Based on the lambda value here of .176 we would reccomend a log transformation since lambda is close to 0. This suggests that as retail turnover grows over time so does the seasonal variation and this low lambda value tells us a log transformation will make the seasonal pattern we observe more consistent.

Q_3.5:

#The high value here very close to 1 suggests very constant variation and no transformation is necessary to show the seasonal trend as there is minimal noise and variation seemingly consistent around 400.
tobacco <- aus_production |>
  select(Quarter, Tobacco)

lambda_tobacco<-tobacco |>
  features(Tobacco, guerrero)

lambda_tobacco

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.926

autoplot(tobacco, Tobacco)

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

#Here we encountered some problems due to a massive change in variance of the data, lambda was returning a value of 2. This is likely due to the heteroscedasticity of the data but when we set lambda to 0 and use thus a log transformation we see that it does effectively decrease variance as hoped. This is a great example of where the guerrero function may fail but with some intuition can help us understand the effects of heteroscedacity in time series.

economy_melsyd <- ansett |>
  filter(Airports == "MEL-SYD", Class == "Economy") |>
  as_tibble() |>
  as_tsibble(index = Week)

economy_melsyd |>
  features(Passengers, guerrero)

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1            2.00

lambda <- economy_melsyd |>
  features(Passengers, guerrero) |>
  pull(lambda_guerrero)

economy_melsyd |>
  mutate(
    bc = box_cox(Passengers, lambda),
    logp = log(Passengers)
  ) |>
  autoplot(bc) +
  labs(title = paste0("Box-Cox transformed passengers, lambda=", round(lambda, 3)))

autoplot(economy_melsyd, Passengers)

lambda_2 = 0
economy_melsyd |>
  mutate(
    bc = box_cox(Passengers, lambda_2),
    logp = log(Passengers)
  ) |>
  autoplot(bc) +
  labs(title = paste0("Box-Cox transformed passengers, lambda=", lambda_2))

#In this example I chose to aggregate the pedestrian data by date and found that there was very apparent seasonlity. At first I attempted to model this using the Date_time but there was clearly huge heterscedascity within the daily data. After this change the seasonality for each day becomes a lot clearer as emphasized with the gg_season() plot at the end. After this we were able to stabilize variance and get a lambda of 0.27 that transformed the data to show the expected weekly seasonal pattern of pedestrian counts.
sc_station <- pedestrian |>
  filter(Sensor == "Southern Cross Station") |>
  index_by(Date) |>
  summarise(Count = sum(Count))

sc_station |>
  features(Count, guerrero)

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1           0.273

lambda <- sc_station |> features(Count, guerrero) |> pull(lambda_guerrero)

sc_station |>
  mutate(Count_bc = box_cox(Count, lambda)) |>
  autoplot(Count_bc)

sc_station |>
  gg_season(Count)

## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Q_3.7:

#a.
#The seasonal fluctuations here are extremely strong repeating quarterly pattern along an increasing trend.
gas <- tail(aus_production, 5*4) |> 
  select(Gas)

gas |> 
  autoplot(Gas) +
  labs(title = "Gas Production (Last 5 Years)")

#b.
#Here from our classical decomposition we can see that there is a very smooth trend line as well as a small remainder from .98 to 1.02 which tells us that this data exhibits a multiplicative structure. Our seasonal pattern is so consistent in the seasonal graph as well, we can interpret this as being incredibly strong seasonality with positve multiplicative structure.
gas_decomp <- gas |>
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_decomp) |> 
  autoplot()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

#c.For the Reasons above I would say the decomposition strongly supports the graphical interpretation made from the graphs in part a to an even further extent than I previously emphasized.

#d.
#Our seasonally adjusted plot shows the growth going into 2008 with a noticeable slowdown and recovery going into 2010 that actually has a sharp decline again right after. This becomes clearly visible when seasonally adjusted
gas_sa <- components(gas_decomp)

gas_sa |> 
  autoplot(season_adjust) +
  labs(title="Seasonally Adjusted Gas Production")

#e.
#An outlier this strong has a big effect since this decomposition is using moving averages. This causes a substantial effect on the seasonal estimates, remainder, and trend line before and after the outlier. This severely changes how we would now interpret our seasonally adjusted data which is a strong argument for the removal of outliers when trying to draw trend data, especially when looking at an oulier that holds in the middle of your data.
gas_outlier <- gas
gas_outlier$Gas[10] <- gas_outlier$Gas[10] + 300

gas_outlier_decomp <- gas_outlier |>
  model(classical_decomposition(Gas, type="multiplicative"))

components(gas_outlier_decomp) |> autoplot()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

components(gas_outlier_decomp) |>
  autoplot(season_adjust) +
  labs(title="Seasonally Adjusted With Outlier")

#f.
#Here we move the outlier to the end and we can observe that this has a strong affect on all the data that comes before.
gas_end <- gas
gas_end$Gas[20] <- gas_end$Gas[20] + 300

gas_end_decomp <- 
  gas_end |>
  model(classical_decomposition(Gas, type="multiplicative"))

components(gas_end_decomp) |> 
  autoplot(season_adjust) +
  labs(title = "Gas End Outlier DATA")

Q_3.8:

retail_x11 <- my_retail |>
  model(
    x11 = X_13ARIMA_SEATS(box_cox(Turnover, lambda_retail) ~ x11())
  )

components(retail_x11) |>
  autoplot()

components(retail_x11) |>
  autoplot(irregular)

#Here we see that X-11 decomposition separates the retail turnover series into trend, seasonal and irregular components. There are some noticeable large spikes in the irregular graph around 2001-2002. This is an outlier that may not have been previously emphasized so easily with just the trend and seasonal component.

Q_3.9:

#a. The SQL decomposition shows very clear seasonality in the data with a consistent upward trend. We do see a bit of a substantial deviation around 1992 implying some sort of negative economic event but the seasonal component is still consistent. This economic event is most clearly show in the remainder graph around those early 1990s approximately 1991-1992. With the exception of that period, the seasonality is very well shown with a spike right before january of every year and even a small remainder throughout.
#b. I am interpretting the estimated component as the remainder which is exactly where the recession of 1991/1992 is clearly visible. Here we get an obvious picture that shows up hardly in our trend and value graphs with almost no visiblity in the seasonal portion of the decomposition.

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

624_HW2

2026-02-15