Data 624 –Predictive Analytics-Homework -2

3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

 knitr::opts_chunk$set(warning = FALSE, message = FALSE)   
  # Import fpp3 libraries
  library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

  global_economy %>%
  autoplot(GDP / Population, show.legend =  FALSE) +
  labs(title= "GDP per capita for Each Country", y = "GDP Per Capita")

The plot above shows all the GDP per capita data from global_economy. In general the trend is increasing.

  # Find the country with the highest GDP per capita
  global_economy %>%
    mutate(GDP_per_capita = GDP / Population) %>%  # Calculate GDP per capita
    filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%  # Find the highest GDP per capita
    select(Country, Year, GDP_per_capita)  # Select relevant columns (Country, Year, GDP per capita)

## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country  Year GDP_per_capita
##   <fct>   <dbl>          <dbl>
## 1 Monaco   2014        185153.

Monaco has the highest GDP per capita with 185153 in 2014.

  # Filter the data for Monaco
  global_economy %>%
    filter(Country == "Monaco") %>%
    autoplot(GDP/Population)  +
    geom_line(color = "green") +
    labs(title= "GDP per capita for Monaco", y = "GDP_per_capita")

Above graph shows visual representation of Monaco’s economic performance relative to its population.Every year Monaco increase in GDP per capita, the line is trending upward direction.There is a small fluctuation in the graph.

3.2. For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy %>%
  filter(Country == "United States") %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  ggplot(aes(x = Year, y = GDP_per_capita)) +
  geom_line(color = "blue") +
  labs(
    title = "GDP Per Capita for United States",
    x = "Year",
    y = "GDP Per Capita"
  )

This graph shows a representation of United States relative to its population over time. Its and increasing in GDP per capita, the line is trending upward. There is a small fluctuation, slight rise and fall in 2007 - 2009.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

  aus_livestock %>%
    filter(Animal == "Bulls, bullocks and steers",
           State == "Victoria") %>%
    autoplot(Count) +
    labs(title= "Slaughter of Victoria Bulls, Bullocks, and Steers")

The livestock data represents excessive fluctuations and overall decreasing trend.

Victorian Electricity Demand from vic_elec.

  # Convert to tibble from tsibble 
  vc_elec_tib <- as_tibble(vic_elec) %>% 
    select(-c(Time))
  
  # Convert back to tsibble grouping and summing Total Daily Demand 
  vc_elec_tsib <- vc_elec_tib %>%
    group_by(Date) %>%
    summarise(DailyTotal = sum(Demand)) %>%
    mutate(Date = as_date(Date)) %>%
    as_tsibble(index = Date, key = DailyTotal)
  
  # Plot new tsibble, Daily Total Demand by Date
  vc_elec_tsib %>%
    ggplot(aes(x = Date, y = DailyTotal)) +
    geom_line() +
    labs(title = "Electricity Demand in Victoria",
         subtitle = "Victoria, Australia",
         y = "Daily Total (in MW / Megawatts)")

This plot gives a better view of the seasonal changes in electricity demand. This method aggregates the half-hourly data into daily averages, providing a clearer view of the overall trend and reducing short-term changes.

Gas production from aus_production.

aus_production %>%
    autoplot(Gas) +
    labs(title = "Quarterly production of Gas Production in Australia")

  lambda <- aus_production |>
    features(Gas, features = guerrero) |>
    pull(lambda_guerrero)
  aus_production |>
    autoplot(box_cox(Gas, lambda)) +
    labs(y = "",
         title = paste("Transformed Gas Production with Lambda = ", round(lambda,2)))

Transformed Australian quarterly gas production with the lambda parameter chosen using the Guerrero method. (This code was provided in the Forcasting: Principles and Practice text (3rd ed), Chapter 3)

3.3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

  head(canadian_gas)

## # A tsibble: 6 x 2 [1M]
##      Month Volume
##      <mth>  <dbl>
## 1 1960 Jan   1.43
## 2 1960 Feb   1.31
## 3 1960 Mar   1.40
## 4 1960 Apr   1.17
## 5 1960 May   1.12
## 6 1960 Jun   1.01

head(canadian_gas)

## # A tsibble: 6 x 2 [1M]
##      Month Volume
##      <mth>  <dbl>
## 1 1960 Jan   1.43
## 2 1960 Feb   1.31
## 3 1960 Mar   1.40
## 4 1960 Apr   1.17
## 5 1960 May   1.12
## 6 1960 Jun   1.01

  canadian_gas %>%
    autoplot(Volume) +
    labs(title = "Monthly Canadian Gas Production")

  lambda <- canadian_gas %>%
    features(Volume, features = guerrero) %>%
    pull(lambda_guerrero)
  
  canadian_gas %>%
    autoplot(box_cox(Volume, lambda)) +
    labs(y = "",
         title = paste("Transformed Canadian Gas volume with lambda = ", round(lambda,2)))

Box-Cox transformation is unhelpful because it does not make the seasonal variation uniform. it suggests a non-monotonic relationship between the variance and the mean, which is not the Box-Cox transformation can handle effectively.

3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

  # View the first 6 rows 
  head(aus_production)

## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7

  set.seed(12345)
  myseries <- aus_retail |>
    filter(`Series ID` == sample(aus_retail$`Series ID`,1))
  autoplot(myseries, Turnover) + 
    labs(title = "Retail Data Turnover")

  lambda <- myseries %>%
    features(Turnover, features = guerrero) %>%
    pull(lambda_guerrero)
  
  myseries %>% autoplot(box_cox(Turnover, lambda))+
    labs(title = paste("Transformed Retail Turnover with lambda round(lambda, 2)"))

I can see in a plot more uniform seasonal variation. Box-Cox transformation used since it uses a natural logarithm for exponential growth found a good value of lambda to help making the forecasting simpler.

3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

  # Tobacco
  # First time I have installed latex2exp package 
 # install.packages("latex2exp")
  library(latex2exp)
  
  lambda <- aus_production %>%
    features(Tobacco, features = guerrero) %>%
    pull(lambda_guerrero)
  
  aus_production %>%
    autoplot(box_cox(Tobacco, lambda)) +
    labs(y = "",
         title = latex2exp::TeX(paste0(
           "Transformed tobacco production with $\\lambda$ = ",
           round(lambda,2))))

From the above plot, we see that Box-Cox transformation have a lambda value of 0.93 which signifies that there was barely a transformation in data set.

ansett

melb_syd <- ansett %>%
    filter(Class == "Economy",
           Airports == "MEL-SYD")
  
  autoplot(melb_syd, Passengers)+
    labs(title = "Economy class Between Melbourne and Sydney")

  lambda <- melb_syd %>%
    features(Passengers, features = guerrero) %>%
    pull(lambda_guerrero)
  
  melb_syd %>%
    autoplot(box_cox(Passengers, lambda)) +
    labs(y = "", title = TeX(paste0("Transformed Number of Passengers with $\\lambda$ = ",
    round(lambda,2))))

For Economy class passengers between Melbourne and Sydney from ansett dataset, we see a Box-Cox transformation with a lambda value of 2 indicating that the data is Squared in order to show variations better.

pedestrian

   # Pedestrian counts 
  lambda <- pedestrian %>%
    filter(Sensor == 'Southern Cross Station') %>%
    features(Count, features = guerrero) %>%
    pull(lambda_guerrero)
  
  pedestrian %>%
    filter(Sensor == 'Southern Cross Station') %>%
    autoplot(box_cox(Count, lambda)) +
    labs(y = "",
         title = latex2exp::TeX(paste0(
           "Transformed Number of Passengers with $\\lambda$ = ",
           round(lambda,2))))

  weekly_ped <- pedestrian %>%
    mutate(Week = yearweek(Date)) %>%
    index_by(Week) %>%
    summarise(Count = sum(Count))
  
  weekly_ped %>% autoplot(Count)+
    labs(title = "Weekly Pedestrian Count")

  lambda <- weekly_ped %>%
    features(Count, features = guerrero) %>%
    pull(lambda_guerrero)
  
  weekly_ped %>% autoplot(box_cox(Count,lambda)) +
    labs(title = paste("Transformed Weekly Pedestrian Count with lambda =", round(lambda, 2)))

The Box-Cox Transformation appears to apply a transformation with a lambda value of 2. The significant adjustment was aggregating the data from an hourly pedestrian count to a weekly count, as the overall time frame spans from Jan 2015 to Dec 2016.

3.7 Consider the last five years of the Gas data from aus_production.

  gas <- tail(aus_production, 5*4) %>% select(Gas)
  
  head(gas)

## # A tsibble: 6 x 2 [1Q]
##     Gas Quarter
##   <dbl>   <qtr>
## 1   221 2005 Q3
## 2   180 2005 Q4
## 3   171 2006 Q1
## 4   224 2006 Q2
## 5   233 2006 Q3
## 6   192 2006 Q4

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>%
  autoplot(Gas)

Trend-cycle shows an increase over the past five years. There is an increase after the first quarter, that peaks in the third quarter and then decreases again.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

  decomp <- gas %>%
    model(classical_decomposition(Gas, type = "multiplicative")) %>%
    components() 
  
  decomp %>%
    autoplot() +
    labs(title = "Classical multiplicative decomposition of gas production")

c. Do the results support the graphical interpretation from part a?

Yes, the trend plot shows an increase from left to right with a almost constant in the middle. The seasonal indices shows an almost perfect seasonal variance over the years (2006-2010) window.

d. Compute and plot the seasonally adjusted data.

  decomp %>%
    ggplot(aes(x = Quarter)) +
    geom_line(aes(y = Gas, colour = "Data")) +
    geom_line(aes(y = season_adjust,
                  colour = "Seasonally Adjusted")) +
    geom_line(aes(y = trend, colour = "Trend")) +
    labs(y = "Gas",
         title = "Seasonally Adjusted Gas Production") +
    scale_colour_manual(
      values = c("gray", "green", "black"),
      breaks = c("Data", "Seasonally Adjusted", "Trend")
    )

The seasonally adjusted data shows hat there is an increasing trend in gas production.

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas %>%
  mutate(Gas = ifelse(Gas == 249, Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "blue") +
  labs(title = "Seasonally Adjusted Gas Production with an Outlier")

In 2008 Q3 became an outlier when 400 was added to it. There is a significant increase there in both the data and the seasonally adjusted data. It should be noted that the increase is smaller in the seasonally adjusted data. The trend also seems to be disrupted.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas %>%
  mutate(Gas = ifelse(Gas == 236, Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "blue") +
  labs(title = "Seasonally Adjusted Gas Production with an Outlier at the End")

It does not seem to make a different a difference if the outlier is near the end or in the middle as there is still a spike where the outlier is and the trend is not noticeable.

3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

#  install.packages("seasonal")
  
 library(seasonal) 
  myseries <- aus_retail %>%
    filter(`Series ID` == sample(aus_retail$`Series ID`,1))
  
  x11_dcomp <- myseries %>%
    model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
    components()
  
  autoplot(x11_dcomp) +
    labs(title = "Decomposition of Retail Turnover using X-11.")

Yes, the seasonal variance flips over time. The seasonal plot shows spikes of increased turnover whereas later in the plot, the spikes are for the lower turnover values. The plot indicates a few outliers as identified from the “irregular” chart. Overall, the trend plot does not uncover any unusual aspect of the data.

3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Decomposition of the number of persons in the civilian labor force in Australia each month from Feb,1978 to Aug,1995. Seasonal component from the decomposition shown in the previous figure.

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Increasing trend in the number of persons in the civilian labor force in Australia. There is also a seasonality, whose scale is much smaller than the remainder. This may signify that the seasonality is not as important in the labor force data. There is also a decrease in the early 1990s which was due to recession.

b. Is the recession of 1991/1992 visible in the estimated components?

Yes, the remainder component shows very clear data in 1991 and 1992, which actually give the remainder decomposition more impact than the season year decomposition based on the scales. The overall data and the trend, show a slight decrease in these years.

Data 624 –Predictive Analytics-Homework -2

Bikash Bhowmik

16-Feb-2025