Assignment 2 Data 624

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.4.1

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## Warning: package 'tsibble' was built under R version 4.4.1

## Warning: package 'feasts' was built under R version 4.4.1

## Warning: package 'fabletools' was built under R version 4.4.1

## Warning: package 'fable' was built under R version 4.4.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(seasonal)

## Warning: package 'seasonal' was built under R version 4.4.1

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

library(x13binary)

R Markdown

3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

head(global_economy)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

global_economy|> #Captia=GDP/population
  autoplot(GDP/Population, show.legend=FALSE)+ #removed legends as it was a large list
  labs(y="US$", title = "GDP per Capita Overtime")

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

#get country with max captia
global_economy |>
  mutate(GDPCaptia= GDP/Population)|>
  filter(GDPCaptia==max(GDPCaptia, na.rm=TRUE))|>
  select(Country,GDPCaptia)

## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country GDPCaptia  Year
##   <fct>       <dbl> <dbl>
## 1 Monaco    185153.  2014

#plot country with max GPD per captia
global_economy |>
  filter(Country == "Monaco") |>
  autoplot(GDP/Population) +
  labs(title= "GDP per capita", y = "$US")

## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).

Monaco was the country with the highest GDP per capita, their max GDP per captia was in 2014. Overtime the trend for Monaco was upwards mostly above the other countries’ GPD captia

Including Plots

3.2 For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.

“If the data shows variation that increases or decreases with the level of the series, then a transformation can be useful”

global_economy|>
  filter(Country=="United States")|>
  autoplot(GDP)+
  labs(y="US$", title = "GDP per Capita Overtime for US")

No transformation needed since there isn’t much variance, for the times series on United States GDP from global_economy, over all the trend is upward.

head(aus_livestock)

## # A tsibble: 6 x 4 [1M]
## # Key:       Animal, State [1]
##      Month Animal                     State                        Count
##      <mth> <fct>                      <fct>                        <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory  2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory  2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory  2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory  1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory  2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory  1800

aus_livestock|> #Captia=GDP/population
  filter(State=="Victoria", Animal=="Bulls, bullocks and steers")|>
  autoplot(Count)+
  labs(y="Count of animals slaughtered", title = "Count of slaughtered Bulls, bullocks and steers in Victoria")

No transformation was needed, overall the trend is downward.

head(vic_elec)

## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
##   Time                Demand Temperature Date       Holiday
##   <dttm>               <dbl>       <dbl> <date>     <lgl>  
## 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
## 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
## 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
## 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
## 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
## 6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE

vic_elec|>
  autoplot(Demand)+
  labs(y= "MWh", title ="Victory Electricity Demands" )

#transformation
vic_elec|>
  index_by(Date) |>
  summarise(Demand = sum(Demand))|>
  autoplot(Demand) + 
  labs(title = 'Daily Victory Electricity Demands', y = 'MWh')

A lot variance and noise in the time series plot for Victorian Electricity Demand from vic_elec, I transformed the time series to daily and it seemed to have les noise than the original time series. There seems to be no trend but there is a seasonality pattern.

head(aus_production)

## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7

aus_production %>% autoplot(Gas) +
  labs(title = 'Gas Production', y = 'petajoules')

lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero) #lambda =.1095171

aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = (paste0(
         "Transformed gas production with lambda$ = ",
         round(lambda,2))))

The time series of Gas production from aus_production was transformed using Box-cox using a lambda of 0.11, the shape of the trend was different more exponential for the the tranformed data while for the regular time series the trend was more of liner upwards pattern.

3.3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

head(canadian_gas)

## # A tsibble: 6 x 2 [1M]
##      Month Volume
##      <mth>  <dbl>
## 1 1960 Jan   1.43
## 2 1960 Feb   1.31
## 3 1960 Mar   1.40
## 4 1960 Apr   1.17
## 5 1960 May   1.12
## 6 1960 Jun   1.01

canadian_gas|>
  autoplot(Volume) +
  labs(title = "Canadian Gas Data", y ="Volumne in Billions of Cubic Metres")

lambda2 <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero) #lambda =.

canadian_gas |>
  autoplot(box_cox(Volume, lambda2)) +
  labs(y = "",
       title = (paste0(
         "Transformed gas production with lambda$ = ",
         round(lambda2,2))))

The Box-Cox transformation was unhelpful because the transformation didn’t make the seasonal pattern is the same for both plot.

3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries|>
  autoplot() +
  labs(title = "Australian Retail Trade Turnover")

## Plot variable not specified, automatically selected `.vars = Turnover`

lambda3 <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero) #lambda =.

myseries |>
  autoplot(box_cox(Turnover, lambda3)) +
  labs(y = "",
       title = (paste0(
         "AUS Retail Trade Turnover with lambda$ = ",
         round(lambda3,2))))

The lambda for for the Australian retail turnover was 0.08 which is near to 0, therefore the natural logarithms should be used.

3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

lambda4<-aus_production|>
  features(Tobacco, features = guerrero)|>
  pull(lambda_guerrero)

aus_production|>
  autoplot(box_cox(Tobacco, lambda4)) +
  labs(y = "",
       title = (paste0(
         "AUS Retail Trade Turnover with lambda$ = ",
         round(lambda4,2))))

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

The variance is somewhat stable with a lambda of 0.93.

head(ansett)

## # A tsibble: 6 x 4 [1W]
## # Key:       Airports, Class [1]
##       Week Airports Class    Passengers
##     <week> <chr>    <chr>         <dbl>
## 1 1989 W28 ADL-PER  Business        193
## 2 1989 W29 ADL-PER  Business        254
## 3 1989 W30 ADL-PER  Business        185
## 4 1989 W31 ADL-PER  Business        254
## 5 1989 W32 ADL-PER  Business        191
## 6 1989 W33 ADL-PER  Business        136

Economy<-ansett |>
    filter(Class == 'Economy') |>
    filter(Airports == 'MEL-SYD')

lambda5<-Economy|>
  features(Passengers, features = guerrero)|>
  pull(lambda_guerrero)

Economy|>
  autoplot(box_cox(Passengers, lambda5)) +
  labs(y = "",
       title = paste0(
         "Economy passengers MEL-SYD with lambda$ = ",
         round(lambda5,2)))

The variance for the economy passengers was staized with a lambda of 2.

PedesCount<-pedestrian|>
    filter(Sensor == 'Southern Cross Station')

lambda6 <- PedesCount |>
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

PedesCount|>
  autoplot(box_cox(Count, lambda6)) +
  labs(y = "",
       title = paste0(
         "Pedestrian counts at Southern Cross with $\\lambda$ = ",
         round(lambda6,2)))

The the plot seem hard to readbut the variance is stable at -0.25 for pedestrian counts at Southern cross.

3.7 Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas) Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle? Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas <- tail(aus_production, 5*4) |> select(Gas)
#plot time series
gas|>
  autoplot(Gas)

Based on the time series plot the trend is upwards, with a strong seasonality pattern with an alternating up or down peak at every 3rd quarter.

gas|>
  model(classical_decomposition(Gas, type = "multiplicative"))|>
  components()|>
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Australian gas production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

The decomposition plot proves that there was an upward trend and seasonality.

#decomposition using STL
dcmp_gas <- gas|>
  model(stl = STL(Gas))

#seasonally adjusted data
components(dcmp_gas)|>
  as_tsibble() |>
  autoplot(Gas, color="blue") +
  geom_line(aes(y=season_adjust)) +
  labs(y = "Gas production",
       title = "Australian Gas Production")

In this plot the black line would be the season adjusted plot and the blue line would be the original plot.

Gas2 <- gas
Gas2$Gas[8] <- Gas2$Gas[8] + 300

dcmp_gas<-Gas2|>
  model(stl=STL(Gas))

components(dcmp_gas)|>
  as_tsibble() |>
  autoplot(Gas, color="blue") +
  geom_line(aes(y=season_adjust)) +
  labs(y = "Gas production",
       title = "Australian Gas Production")

Gas3 <- gas
Gas3$Gas[1] <- Gas3$Gas[1] + 300

dcmp_gas<-Gas3|>
  model(stl=STL(Gas))

components(dcmp_gas)|>
  as_tsibble() |>
  autoplot(Gas, color="blue") +
  geom_line(aes(y=season_adjust)) +
  labs(y = "Gas production",
       title = "Australian Gas Production")

Gas4 <- gas
Gas4$Gas[20] <- Gas4$Gas[20] + 300

dcmp_gas<-Gas4|>
  model(stl=STL(Gas))

components(dcmp_gas)|>
  as_tsibble() |>
  autoplot(Gas, color="blue") +
  geom_line(aes(y=season_adjust)) +
  labs(y = "Gas production",
       title = "Australian Gas Production")

Adding 300 as an outlier effected the plot, causing a high peak and changed the pattern of the adjusted season line.

If the outlier is applied in the ends or in the middle, it will effect the plots the same way by changing the adjusted seasonal line.

3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

checkX13binary(fail.unsupported = FALSE, verbose = TRUE)

## x13binary is working properly

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries|>
  autoplot() +
  labs(title = "Australian Retail Trade Turnover")

## Plot variable not specified, automatically selected `.vars = Turnover`

head(myseries)

## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State              Industry                      `Series ID`    Month Turnover
##   <chr>              <chr>                         <chr>          <mth>    <dbl>
## 1 Northern Territory Clothing, footwear and perso… A3349767W   1988 Apr      2.3
## 2 Northern Territory Clothing, footwear and perso… A3349767W   1988 May      2.9
## 3 Northern Territory Clothing, footwear and perso… A3349767W   1988 Jun      2.6
## 4 Northern Territory Clothing, footwear and perso… A3349767W   1988 Jul      2.8
## 5 Northern Territory Clothing, footwear and perso… A3349767W   1988 Aug      2.9
## 6 Northern Territory Clothing, footwear and perso… A3349767W   1988 Sep      3

myseries_dcmp <- myseries|>
  model(X11 = X_13ARIMA_SEATS(Turnover ~ x11()))|>
  components() 

autoplot(myseries_dcmp) + 
labs(title = "Decomposition of Australian clothing and accessory retailing trade turnover using X-11")

3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. Is the recession of 1991/1992 visible in the estimated components?

The decomposition shows a trend that is upward. The season-year window shows a seasonal repetitive pattern, where there seems to be a upward and downward peak taht seems to increase in size overtime. The remainder shows doesn’t show much noise up until after 1987 and there is deep around 1991. The recession of 1991/1992 was visible in the estimated components in figure 3.19 since we can see a slight decrease in the trend line and the decrease in the remainder window.

Assignment 2 Data 624

Andreina A

2025-02-16

R Markdown

Including Plots