Homework 2

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code

{r}# install.packages("seasonal")

library("fpp3")
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library("ggplot2")
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ✔ purrr   1.0.2     ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("seasonal") 
## 
## Attaching package: 'seasonal'
## 
## The following object is masked from 'package:tibble':
## 
##     view

Question 3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

data("global_economy")
global_economy
## # A tsibble: 15,150 x 9 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # ℹ 15,140 more rows

add hastags to lookback thru notes for better understanding

time series graph for the countries gdp per capita

global_economy |>
  autoplot(GDP/Population, show.legend =  FALSE)  +
  labs(title= "GDP per capita for each country overtime", y = "GDP per captia in $US")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

Overtime most countries GDP per captia is increasing Now to find the country with the highest GDP

finding country with highest gpd from the data

global_economy |>
  mutate(GDP_per_captia = GDP/Population)  %>%  #create colum for gpd per capita
  filter(GDP_per_captia == max(GDP_per_captia, na.rm = TRUE)) %>%  # look for highest gpd in the country set
  select(Country,GDP_per_captia)  # look for contry with max gdp
## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country GDP_per_captia  Year
##   <fct>            <dbl> <dbl>
## 1 Monaco         185153.  2014

Monaco is contry with the highest GDP per captia, and it highest was in the year 2014

Monaco plot overtime

global_economy |>
  filter(Country == "Monaco") |> # filter to get country Monaco
  autoplot(GDP/Population) +  #time series function for finding trend, cycle, and seasonalty in data 
  labs(title= "GDP per capita", y = "GDP Per Capita in $US") 
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).

Overall the graph is increasing overtime and does have some fluctuation over time, which we can derive form economic factors such as tourism, population growth, currency risk, industrial development and etc.

Question 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy |>
  filter(Country == "United States") |> # filter to get country US
  autoplot(GDP/Population) +  #time series function for finding trend, cycle, and seasonality in data 
  labs(title= "United State GDP per capita Over Time", y = "GDP Per Capita in $US") 

This graph visualizes the GDP per capita for the United States from 1960 to 2017. The growth of the GDP per capita can be contributed to the constant expansion of technological innovation, government investment, foriegn investment, and population growth. Advances in software developments and AI has helped drive companies to be efficient and productive. Investment from government increase business expansion and global trade help export technology and energy across the world. However, there is dip in constant increase due to the 2008 recession, which had impact of employment and production throughout the economy.

This graph does not need a transformation, as there is not enough variation within the data and the constant trend of increase overtime lacks any variation through the years.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

data("aus_livestock")
aus_livestock
## # A tsibble: 29,364 x 4 [1M]
## # Key:       Animal, State [54]
##       Month Animal                     State                        Count
##       <mth> <fct>                      <fct>                        <dbl>
##  1 1976 Jul Bulls, bullocks and steers Australian Capital Territory  2300
##  2 1976 Aug Bulls, bullocks and steers Australian Capital Territory  2100
##  3 1976 Sep Bulls, bullocks and steers Australian Capital Territory  2100
##  4 1976 Oct Bulls, bullocks and steers Australian Capital Territory  1900
##  5 1976 Nov Bulls, bullocks and steers Australian Capital Territory  2100
##  6 1976 Dec Bulls, bullocks and steers Australian Capital Territory  1800
##  7 1977 Jan Bulls, bullocks and steers Australian Capital Territory  1800
##  8 1977 Feb Bulls, bullocks and steers Australian Capital Territory  1900
##  9 1977 Mar Bulls, bullocks and steers Australian Capital Territory  2700
## 10 1977 Apr Bulls, bullocks and steers Australian Capital Territory  2300
## # ℹ 29,354 more rows
help("aus_livestock")
aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>% # look for bulls and bullocks in animal colum
  autoplot(Count) + #time series function for number of animals killed
  labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers", 
       y = "Animal Count")

This data shows the monthly meat production of Bulls, bullocks and steers in Victoria, Australia from July 1972 to December 2018. The data overall has been decreasing overtime, which fluctuations prevalent throughout. These variations make it difficult to observe the data. Using a transformation will make the data more even throughout each fluctuations become more orderly. We can simplify the variations by using a box cox transformation.

need new dataset

aus_livestock  = aus_livestock %>% 
  filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>% # look for bulls and bullocks in animal colum 
mutate(Count = as.numeric(Count))    # colum for  meat proudction as numeric value 
lambda = aus_livestock %>% 
  features(Count, features = guerrero)  %>% 
  pull(lambda_guerrero)
aus_livestock  %>% 
  autoplot(box_cox(Count,lambda)) + labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers After Box Cox Transformation", 
       y = "Animal Count")

This lambda has negative value, meaning the there are extreme values whitin in the data points which have extreme volatility

we can use the moving avergae to transform and make it easier to understand

aus_livestock<- aus_livestock |>
  mutate(
    `5-MA` = slider::slide_dbl(Count, mean,
                .before = 2, .after = 2, .complete = TRUE)
  )
aus_livestock |>
  autoplot(Count) +
  geom_line(aes(y = `5-MA`), colour = "red") +
 labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers After Box Cox Transformation", 
       y = "Animal Count")
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

Victorian Electricity Demand from vic_elec.

data("vic_elec")
vic_elec
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
##    Time                Demand Temperature Date       Holiday
##    <dttm>               <dbl>       <dbl> <date>     <lgl>  
##  1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
##  2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
##  3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
##  4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
##  5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
##  6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE   
##  7 2012-01-01 03:00:00  3694.        20.1 2012-01-01 TRUE   
##  8 2012-01-01 03:30:00  3562.        19.6 2012-01-01 TRUE   
##  9 2012-01-01 04:00:00  3433.        19.1 2012-01-01 TRUE   
## 10 2012-01-01 04:30:00  3359.        19.0 2012-01-01 TRUE   
## # ℹ 52,598 more rows
help("vic_elec")
vic_elec %>% 
  autoplot(Demand) + labs(title =  "Half-hourly electricity demand for Victoria, Australia" , y = "Demand in MWh")

This data visualizes the hourly demand for electricity in Victoria Australia in Mwh. This data is difficult to observe due to many fluctuations through the data which is influenced from the seasonal component and white noise throughout the data. Because the demand is measured hourly, we can focus our observation on monthly or weekly scale to interpret the data. We can transform this data using adjustment, we will adjust the hourly demand into a weekly demand to observe the data in clear visual

vic_elec = vic_elec %>% 
  index_by(Date = yearweek(Time)) %>%  # turns time into week
  summarise(Avg_Elec_Demand = mean(Demand)) # finds the avergae of the demand
vic_elec %>% 
  autoplot(Avg_Elec_Demand) + labs(title =  "Weekly electricity demand for Victoria, Australia" , y = "Demand in MWh", x = "Weeks")

Gas production from aus_production.

data("aus_production")
aus_production
## # A tsibble: 218 x 7 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
##  1 1956 Q1   284    5225    189    465        3923     5
##  2 1956 Q2   213    5178    204    532        4436     6
##  3 1956 Q3   227    5297    208    561        4806     7
##  4 1956 Q4   308    5681    197    570        4418     6
##  5 1957 Q1   262    5577    187    529        4339     5
##  6 1957 Q2   228    5651    214    604        4811     7
##  7 1957 Q3   236    5317    227    603        5259     7
##  8 1957 Q4   320    6152    222    582        4735     6
##  9 1958 Q1   272    5758    199    554        4608     5
## 10 1958 Q2   233    5641    229    620        5196     7
## # ℹ 208 more rows
help("aus_production")
aus_production %>% 
  autoplot(Gas)  %>% 
  labs(title =  "Quarterly production of selected commodities in Australia", y = "Gas in Petajoules")
## [[1]]

## 
## $y
## [1] "Gas in Petajoules"
## 
## $title
## [1] "Quarterly production of selected commodities in Australia"
## 
## attr(,"class")
## [1] "labels"

This data visualizes the quarterly gas production in petajoules in Australia. Overall, there is an increase in data throughout the graph, with fluctuation throughout that can be transformed. This transformation will allows to change the variations throughout the data, and make it even throughout. We can use the Guerrero feature to choose the lambda that will fit our box cox function.

lambda1 <- aus_production |>
  features(Gas, features = guerrero) |> # fuction to find lambda
  pull(lambda_guerrero)
aus_production |>
  autoplot(box_cox(Gas, lambda1)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda1,2))))

Question 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

data("canadian_gas")
canadian_gas
## # A tsibble: 542 x 2 [1M]
##       Month Volume
##       <mth>  <dbl>
##  1 1960 Jan  1.43 
##  2 1960 Feb  1.31 
##  3 1960 Mar  1.40 
##  4 1960 Apr  1.17 
##  5 1960 May  1.12 
##  6 1960 Jun  1.01 
##  7 1960 Jul  0.966
##  8 1960 Aug  0.977
##  9 1960 Sep  1.03 
## 10 1960 Oct  1.25 
## # ℹ 532 more rows
help("canadian_gas")
canadian_gas%>% 
  autoplot(Volume)  %>% 
  labs(title =  "Quarterly production of selected commodities in Australia")
## [[1]]

## 
## $title
## [1] "Quarterly production of selected commodities in Australia"
## 
## attr(,"class")
## [1] "labels"
lambda2 <-  canadian_gas |>
  features(Volume, features = guerrero) |> # fuction to find lambda
  pull(lambda_guerrero)
canadian_gas |>
  autoplot(box_cox(Volume, lambda2)) +  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda2,2))))

The Box cox transformation is not helpful in this data, because the variance is stable enough to be observed and trend looks the same after the transformation. Because of the strong seasonality and trend component. The main issue of variance is not address in this data set using the box cox transformation. In order to address this issue a seasonality decomposition would be allows us to observe the seasonality characteristics throughout the gas production.

Question 3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries  %>% 
  autoplot(Turnover) %>% 
  labs(title = "Retail Data Turnover")
## [[1]]

## 
## $title
## [1] "Retail Data Turnover"
## 
## attr(,"class")
## [1] "labels"
lambda3 <- myseries |>
  features(Turnover, features = guerrero) |> # fuction to find lambda
  pull(lambda_guerrero)
myseries |>
  autoplot(box_cox(Turnover, lambda3)) +  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Retail Data Transformation with $\\lambda$ = ",
         round(lambda3,2))))

After using the Guerrero feature, we can create a box transformation transformation that will calculate a lambda of 0.8 for our graph.

Question 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production

help("aus_production")
aus_production %>% 
  autoplot(Tobacco)  %>% 
  labs(title =  "Quarterly production of Tobacco in Australia") 
## [[1]]
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

## 
## $title
## [1] "Quarterly production of Tobacco in Australia"
## 
## attr(,"class")
## [1] "labels"
lambda4 <- aus_production |>
  features(Tobacco, features = guerrero) |> # fuction to find lambda
  pull(lambda_guerrero)
aus_production %>% 
  autoplot(box_cox(Tobacco, lambda4)) +  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Tabocco production with $\\lambda$ = ",
         round(lambda4,2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Melbourne and Sydney from ansett

data("ansett")
ansett
## # A tsibble: 7,407 x 4 [1W]
## # Key:       Airports, Class [30]
##        Week Airports Class    Passengers
##      <week> <chr>    <chr>         <dbl>
##  1 1989 W28 ADL-PER  Business        193
##  2 1989 W29 ADL-PER  Business        254
##  3 1989 W30 ADL-PER  Business        185
##  4 1989 W31 ADL-PER  Business        254
##  5 1989 W32 ADL-PER  Business        191
##  6 1989 W33 ADL-PER  Business        136
##  7 1989 W34 ADL-PER  Business          0
##  8 1989 W35 ADL-PER  Business          0
##  9 1989 W36 ADL-PER  Business          0
## 10 1989 W37 ADL-PER  Business          0
## # ℹ 7,397 more rows
help("ansett")
ansett= ansett %>% 
  filter ( Class == "Economy" & Airports == "MEL-SYD")
ansett %>% 
  autoplot(Passengers) +
  labs(title = "Economy class passengers between Melbourne and Sydney")

lambda5 = ansett %>% 
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
ansett %>% 
    autoplot(box_cox(Passengers, lambda5)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Tabocco production with $\\lambda$ = ",
         round(lambda5,2)))) +ylab("Passengers")

Pedestrian counts at Southern Cross Station from pedestrian

data("pedestrian")
pedestrian
## # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne>
## # Key:       Sensor [4]
##    Sensor         Date_Time           Date        Time Count
##    <chr>          <dttm>              <date>     <int> <int>
##  1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
##  2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
##  3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
##  4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
##  5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
##  6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77
##  7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01     6    44
##  8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01     7    56
##  9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01     8   113
## 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01     9   166
## # ℹ 66,027 more rows
help("pedestrian")
pedestrian %>% 
  filter( Sensor == "Southern Cross Station") %>% 
  autoplot(Count) +
  labs(title = "Pedestrian counts in the city of Melbourne
")

need to index into weekly data

ped_week = pedestrian %>% 
  mutate(Weekly = yearweek(Date)) %>% 
  index_by(Weekly) %>% 
  summarise(Count = sum(Count, na.rm = TRUE))
ped_week %>% 
  autoplot(Count) + labs(title = "Pedestrian counts in the city of Melbourne")

lambda6 <- ped_week %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
ped_week %>% 
  autoplot(box_cox(Count,lambda6)) + labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Pedestrian counts with $\\lambda$ = ",
         round(lambda6,2))))

Question 3.7

Consider the last five years of the Gas data from aus_production.

help("aus_production")
gas <- tail(aus_production, 5*4) |> select(Gas)
  1. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
gas %>% 
  autoplot(Gas) + labs(title = "Gas Production in Australia" , y = "Gas in Petajoules")

There are some clear seasonal fluctuations repeating every year in a pattern. Gas production will begun at its lowest production in quarter 1 of each year and increasing gradualy throughout quarter 3 and hit its peak in quarter 3 and the decreasing of gas production occurs in quarter 4. This cycle repeats every year throughout the data.

  1. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gas_decomp = gas %>% 
  model(
    classical_decomposition(Gas, type = "multiplicative")
  )  %>%  
  components()
gas_decomp %>%  
  autoplot() +xlab("Quarter") + ggtitle("Classical Multiplicative. Decompostion of Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

  1. Do the results support the graphical interpretation from part a?

Comparing a our graph to the classical decomposition graph, shows similarities. The decomposition will expose the seasonality, trend, and remainder variables, which we can explore separately. All three variables show an increasing trend, just the original time series trend in A.

  1. Compute and plot the seasonally adjusted data.
 gas_tsibble =  as_tsibble(gas_decomp)
gas_tsibble %>% 
  autoplot(season_adjust) + labs(title = "Seasonal Adjusted Data for Gas Production", y = "Gas Production in Petajoules") 

  1. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas2 = gas %>% 
  mutate( Gas = if_else(Quarter ==yearquarter("2009Q3"), Gas + 300, Gas)) %>% 
    model(
    classical_decomposition(Gas, type = "multiplicative")
  )  %>%  
  components()
gas2tssible = as_tsibble(gas2)
gas2tssible%>% 
  autoplot(season_adjust) + labs(title = "Seasonal Adjusted Data for Gas Production in Quarter 3 of 2009 When 300 + ", y = "Gas Production in Petajoules") 

  1. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

It does make a difference when an outlier is added near the end rather than the middle of the time series, because it sticks out and it develops peak towards the end of the data. it make the data visually skewed, when 300 more petajoules of gas is added to quarter 3 in 2009. The outlier at the end make it difficult to understand any trend or pattern that will occur in the data, because the huge peak created has impact on how the data is interpreted.

Question 3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of total US retail employment using X-11.")

X-11 model can allows us to observe the robust outlier drawn thorughout the graph. Some unusual patterns are shown to occur in in the begining and throughout of the 1990s and this pattern is observed again in the end of 2000s. These fluctuations could be result of economic downsides like recession from dot.com bubble in the 90s and housing crash in 2008. These outlier highlight irregular trends in the employment data.

Question 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

  1. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

In figure 3.19, we can observe a steady increase in the number of person in the civilian labor force from 1978 to 1995. There is noticeable in the beginning of the 1990s, due to some economic recessions. When we observe figure 3.20, there some strong seasonal fluctuation that occur in the Australian civilian labor force. The labor force hits its peak in two months March and December, due to hiring programs for holidays. In the remainder component the sharp decline in 1991 visualizes the impact of the 1991/1992 recession.

Is the recession of 1991/1992 visible in the estimated components?

The recession is clearly visible in the remainder component. This recession caused a decline the civilian labour force, and is detailed to have strong impacted the upwards trends of labour force growth.