HW 4 – Applied Predictive Modeling

library(tidyverse)
library(fpp3)

Problem 3.1 – Transformations and Adjustments

Transformations and adjustments simplify time series data by removing known sources of variation. This makes the remaining patterns easier to interpret and forecast.

Calendar Adjustments

Calendar adjustments remove variation caused by differences in the number of days in each period. For example, retail sales may vary by month simply because some months have more trading days. Adjusting the data to represent average sales per day removes this calendar effect and produces a more consistent series.
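
A minimal base-R sketch of a calendar adjustment. The monthly totals in sales are made-up numbers used only for illustration. The number of days per month is computed from consecutive month-start dates, and dividing each total by it gives average sales per day.

```r
# Hypothetical monthly sales totals for Jan-Mar 2023 (illustrative only)
sales <- c(Jan = 310, Feb = 280, Mar = 310)

# Days in each month: gaps between consecutive month-start dates
month_starts <- seq(as.Date("2023-01-01"), by = "month", length.out = 4)
days <- as.numeric(diff(month_starts))   # 31, 28, 31

# Calendar adjustment: average sales per day
sales_per_day <- sales / days
print(sales_per_day)
```

February's total is lower only because it has fewer days; once expressed per day, all three months are identical.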

Population Adjustments

Population adjustments convert totals into per-capita values. This removes the effect of population growth and allows for more meaningful comparisons across time. Example: GDP per capita.

global_economy |>
  filter(Country == "Australia") |>
  autoplot(GDP/Population) +
  labs(title="GDP per capita (Australia)", y="$US")

Using per-capita data shows whether growth in total GDP reflects population growth or a genuine increase in output per person.
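
To make the distinction concrete, here is a small sketch with hypothetical numbers (not Australian data): total GDP rises while population rises faster, so GDP per capita actually falls.

```r
# Hypothetical totals for two years (illustrative numbers, not real data)
gdp        <- c(y1 = 100, y2 = 110)   # total GDP
population <- c(y1 = 10,  y2 = 12)    # population

gdp_per_capita <- gdp / population

# Percentage growth from year 1 to year 2
total_growth      <- 100 * (gdp[["y2"]] / gdp[["y1"]] - 1)
per_capita_growth <- 100 * (gdp_per_capita[["y2"]] / gdp_per_capita[["y1"]] - 1)

cat(sprintf("Total GDP growth:  %.1f%%\nPer-capita growth: %.1f%%\n",
            total_growth, per_capita_growth))
```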

Inflation Adjustments

Inflation adjustments convert monetary values into constant dollars so values from different years can be compared fairly.

Example: adjusting print-media retail turnover using the CPI:

# Annual turnover of Australian newspaper and book retailers,
# summed across states
print_retail <- aus_retail |>
  filter(Industry == "Newspaper and book retailing") |>
  group_by(Industry) |>
  index_by(Year = year(Month)) |>
  summarise(Turnover = sum(Turnover))

aus_economy <- global_economy |>
  filter(Code == "AUS")

# Join the annual CPI and express turnover in constant prices
print_retail |>
  left_join(aus_economy, by="Year") |>
  mutate(Adjusted_turnover = Turnover/CPI*100) |>
  pivot_longer(c(Turnover,Adjusted_turnover),
               values_to="Turnover") |>
  ggplot(aes(x=Year,y=Turnover)) +
  geom_line() +
  facet_grid(name~., scales="free_y") +
  labs(title="Turnover: Australian Print Media Industry",
       y="$AU")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

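Adjusting for inflation reveals the real long-term trend, separating genuine changes in turnover from increases caused only by rising prices. The warning above appears because global_economy has no CPI value for 2018, the final year of the retail data, so that year's adjusted turnover is missing.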
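
The same Turnover/CPI*100 formula can be checked by hand on a few hypothetical values (not the print-media data). In year 2 the nominal rise matches the CPI exactly, so real turnover is flat; in year 3 turnover outpaces prices.

```r
# Hypothetical nominal turnover and CPI (base year CPI = 100); illustrative only
turnover <- c(100, 110, 150)
cpi      <- c(100, 110, 120)

# Real (constant-price) turnover, using the same formula as the example above
adjusted_turnover <- turnover / cpi * 100
print(adjusted_turnover)
```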

Mathematical Transformations

Mathematical transformations can stabilize the variation in a time series, which is useful when the size of the fluctuations changes with the level of the series.

Common transformations include:

• logarithm
• square root
• power transformations
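
A short sketch of why these help, using a hypothetical series that grows by 10% each period. On the original scale the changes grow with the level; after a log they are constant; a square root sits in between.

```r
# Hypothetical series growing by 10% each period (illustrative numbers only)
x <- 100 * 1.1^(0:5)

# On the original scale, the period-to-period changes grow with the level
diff(x)

# After a log transformation, every change equals log(1.1): constant variation
diff(log(x))

# A square root shrinks the growth in the changes (about 4.9% per period
# instead of 10%) but does not remove it
d_sqrt <- diff(sqrt(x))
d_sqrt[-1] / d_sqrt[-length(d_sqrt)]
```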

A widely used transformation is the Box-Cox transformation.

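The value of λ (lambda) determines the transformation: λ = 0 corresponds to a natural log transformation, λ = 1 only shifts the series down by 1 without changing its shape, and other values give power transformations.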
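
A minimal sketch of the textbook Box-Cox formula for positive data. box_cox_manual is a hypothetical helper written only to show the formula; the example below uses fpp3's own box_cox() instead.

```r
# Hypothetical helper implementing the Box-Cox formula for positive data:
#   w = log(y)                   if lambda == 0
#   w = (y^lambda - 1) / lambda  otherwise
box_cox_manual <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 4, 9, 16)
box_cox_manual(y, 1)      # lambda = 1: y - 1, a shift only
box_cox_manual(y, 0.5)    # lambda = 0.5: a rescaled square root
box_cox_manual(y, 0)      # lambda = 0: natural log
box_cox_manual(y, 1e-6)   # lambda close to 0: very close to the log
```

The last line shows why λ = 0 is defined as the log: as λ approaches zero, (y^λ - 1)/λ converges to log(y).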

Example using Australian gas production:

# Guerrero's method: choose a Box-Cox lambda for gas production
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

lambda

## [1] 0.1095171

aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(title="Box-Cox Transformation of Gas Production")

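The Guerrero method selects the value of λ that makes the size of the seasonal variation about the same across the whole series. Here λ ≈ 0.11 is close to zero, so the transformation behaves much like a log.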

Problem 3.2 – Time Series Components

Time series data can be broken down into several components that describe the structure of the series: the trend-cycle, seasonal, and remainder components. An additive decomposition assumes the data can be expressed as:

$$Y_t = S_t + T_t + R_t$$

where:

• $Y_t$ = observed data
• $S_t$ = seasonal component
• $T_t$ = trend-cycle component
• $R_t$ = remainder (random noise)

Additive decomposition is appropriate when the size of the seasonal fluctuations stays roughly constant over time, regardless of the level of the series.
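
A sketch of the additive model on synthetic data: a linear trend plus a fixed monthly pattern, with no noise. It uses base R's classical decompose() rather than STL, swapped in only because it needs no extra packages; the series itself is invented for illustration.

```r
# Synthetic monthly series: linear trend + fixed seasonal pattern (no noise)
season_pattern <- c(-5, -3, -1, 1, 3, 5, 5, 3, 1, -1, -3, -5)  # sums to zero
trend <- seq(100, by = 0.5, length.out = 48)                   # 4 years
y <- ts(trend + rep(season_pattern, times = 4),
        start = c(2000, 1), frequency = 12)

# Classical additive decomposition (base R, not STL)
dec <- decompose(y, type = "additive")

# Estimated seasonal pattern, one value per month; matches season_pattern
round(dec$figure, 6)
```

Because the seasonal swing does not depend on the trend, the additive model recovers the pattern exactly here.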

A multiplicative decomposition is written as:

$$Y_t = S_t \times T_t \times R_t$$

This approach is appropriate when the seasonal variation increases as the level of the series increases, which is common in economic and financial data.

Instead of using a multiplicative decomposition directly, a log transformation can be applied to stabilize the variance. Taking the logarithm converts the multiplicative relationship into an additive one:

$$\log Y_t = \log S_t + \log T_t + \log R_t$$

An additive decomposition can then be applied to the logged series, which simplifies modeling and analysis.
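
A small synthetic check of this idea (all values invented for illustration). With an exponential trend multiplied by fixed seasonal factors, the yearly seasonal swing grows on the original scale but is identical every year after taking logs.

```r
# Synthetic multiplicative series: exponential trend x seasonal factors
season_factors <- c(0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.9, 1.1, 1.0, 1.0, 0.9, 1.1)
trend <- 100 * 1.02^(0:35)                 # 3 years of monthly data
s     <- rep(season_factors, times = 3)
y     <- trend * s
year  <- rep(1:3, each = 12)

# Seasonal swing per year on the original scale: grows with the trend
raw_swing <- tapply(y - trend, year, function(d) diff(range(d)))

# Swing after taking logs: log(y) - log(trend) = log(s), the same every year
log_swing <- tapply(log(y) - log(trend), year, function(d) diff(range(d)))

raw_swing
log_swing
```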

Example: US Retail Employment

We can illustrate time series decomposition using employment data in the US retail sector.

library(fpp3)

us_retail_employment <- us_employment |>
  filter(year(Month) >= 1990, Title == "Retail Trade") |>
  select(-Series_ID)

autoplot(us_retail_employment, Employed) +
  labs(
    y = "Persons (thousands)",
    title = "Total employment in US retail"
  )

Next, we apply an STL decomposition to separate the series into its components.

# Fit an STL decomposition to retail employment
dcmp <- us_retail_employment |>
  model(STL(Employed))

components(dcmp) |> autoplot()

The decomposition separates the time series into:

• Trend-cycle component – long-term movement in employment
• Seasonal component – repeating yearly patterns
• Remainder component – random fluctuations not explained by trend or seasonality
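
As a self-contained sketch of the same idea, base R's stl() (swapped in here for fpp3's STL() model) can be run on the built-in monthly co2 series. The choice of dataset and of s.window = "periodic" are assumptions made for illustration. In an additive decomposition the three components add back up to the original data.

```r
# STL decomposition of the built-in monthly co2 series (datasets package);
# s.window = "periodic" forces the same seasonal pattern every year
fit <- stl(co2, s.window = "periodic")
parts <- fit$time.series          # columns: seasonal, trend, remainder
head(parts)

# Additive decomposition: seasonal + trend + remainder reproduces the data
reconstructed <- rowSums(parts)
all.equal(as.numeric(reconstructed), as.numeric(co2))
```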

Seasonally Adjusted Data

Seasonally adjusted data remove the seasonal component so that the underlying trend can be examined more clearly.

For an additive model, the seasonally adjusted series is $Y_t - S_t = T_t + R_t$. We can visualize this using the STL output.

# Original data in gray, seasonally adjusted series in blue
components(dcmp) |>
  as_tsibble() |>
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y = season_adjust), colour = "blue") +
  labs(
    y = "Persons (thousands)",
    title = "Seasonally Adjusted Retail Employment"
  )

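Seasonally adjusted data help analysts focus on economic trends rather than seasonal effects. Because the seasonally adjusted series equals the trend-cycle plus the remainder, it still contains random variation and is not as smooth as the trend-cycle alone.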
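
A base-R sketch of seasonal adjustment, using stl() on the built-in monthly UKDriverDeaths series (both the function and the dataset are stand-ins chosen for illustration, not the retail data). Subtracting the seasonal component gives the same series as trend plus remainder.

```r
# STL on the built-in monthly UKDriverDeaths series (datasets package)
fit <- stl(UKDriverDeaths, s.window = "periodic")
comps <- fit$time.series
seasonal <- comps[, "seasonal"]

# Additive model: seasonally adjusted = Y_t - S_t
season_adjusted <- UKDriverDeaths - seasonal

# This equals trend + remainder, so random variation is still present
trend_plus_remainder <- comps[, "trend"] + comps[, "remainder"]
all.equal(as.numeric(season_adjusted), as.numeric(trend_plus_remainder))

# Original data in gray, seasonally adjusted series in blue
plot(UKDriverDeaths, col = "gray")
lines(season_adjusted, col = "blue")
```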

Izza Khan