Week 3 Decomposition |14-Feb 20-Feb

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with your .rmd file.

QUESTIONS

3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

EXPLORATION

head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
global_economy %>%
  tsibble(key = Code, index = Year)%>%
  autoplot(GDP/Population, show.legend = FALSE) +
  labs(title= "GDP per capita",
       y = "$United States")

my_global_economy <- global_economy %>% 
  mutate(GDP_per_capita = GDP/Population) 

my_global_economy %>%
  filter(GDP_per_capita > 100000) %>%
  autoplot(GDP_per_capita) +
  labs(title= "GDP per capita",
       subtitle = "Years",
       y = "USD")

z <- global_economy %>%
  group_by(Country, GDP, Population) %>%
  summarise(GD = GDP/Population) %>% 
  arrange(desc(GD))
head(z)
## # A tsibble: 6 x 5 [1Y]
## # Key:       Country, GDP, Population [6]
## # Groups:    Country, GDP [6]
##   Country               GDP Population  Year      GD
##   <fct>               <dbl>      <dbl> <dbl>   <dbl>
## 1 Monaco        7060236168.      38132  2014 185153.
## 2 Monaco        6476490406.      35853  2008 180640.
## 3 Liechtenstein 6657170923.      37127  2014 179308.
## 4 Liechtenstein 6391735894.      36834  2013 173528.
## 5 Monaco        6553372278.      37971  2013 172589.
## 6 Monaco        6468252212.      38499  2016 168011.
global_economy %>%
  tsibble(key = Code, index = Year)%>%
  filter(Country=="Monaco") %>%
  autoplot(GDP/Population)

Monaco has highest GDP per capita and we see an overall increase for subsequent countries as it pertains to gdp growth

3.2 For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP/Population) +
  labs(title= "The GDP per capita", y = "$USD")

#I believe no necessary change is needed on this one due to pop growth in relation to gdp

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot(Count) +
  labs(title= "Slaughter of Victorian “Bulls, bulls and steers", y = "The Total Count")

# when we analyze the below we see clear downward trend in slaughter of Victorian bulls, bulls, and steers. Interesting to also see possible cycle pattern with peaks happening every 5 to 10 years or so.

Victorian Electricity Demand from vic_elec.

autoplot(vic_elec, Demand)

#when viewing the vic_elec demand plot we witness seasonality. The spikes seen are likely summer or winter [extreme months] We may say that no trend for demand is seen possibly no cycle. 

Gas production from aus_production.

autoplot(aus_production, Gas)

lamb_cg <- canadian_gas %>%
                  features(Volume, features = guerrero) %>%
                  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda = lamb_cg))+
  labs(title = latex2exp::TeX(paste0(
         "Box Cox Transformation of Canadian Gas Production with $\\lambda$ = ",
         round(lamb_cg,2))))+
  theme_replace()+
  geom_line(col = "#69b3a2")

#when viewing the plot of monthly Canadian gas production we see seasonality of one year and a low seasonal variance. 
#Guerrero's method for Box Cox lambda selection
#Applies Guerrero's (1993) method to select the lambda which minimises the coefficient of variation for subseries of x.

3.3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Canadian Gas Production")

lambda_x <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume, lambda_x)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda_x,2))))

From our plot of monthly Canadian gas production we witness seasonality of 1 year and the seasonal variance that is low from 1960 through 1978. Later, we see larger from 1978 through 1988. Since the seasonal variation increases and then decreases, we can say that the Box Cox transformation cannot be used to make the seasonal variation uniform.

3.4 What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(646)
p <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
p %>% autoplot(Turnover)

lambda_retail <- p %>%
                  features(Turnover, features = guerrero) %>%
                  pull(lambda_guerrero)

p %>%
  autoplot(box_cox(Turnover, lambda_retail))+
  labs(title = latex2exp::TeX(paste0(
         "Box Cox Transformation of Australian Retail Trade Turnover with $\\lambda$ = ",
         round(lambda_retail,2))))

#We view that in the data it contains little max(y) and little min(y) value also variation of values appears to be more random than following the levels, this is why Box Cox transformation wouldn't work with this data.
# just to try, I explore further by using the guerrero feat. to extract the optimal lambda and plot the resulting box cox transformation of Turnover provided

3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

head(aus_production)
## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7
aus_production %>%
  autoplot(Tobacco)

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "",
       title = paste("Box-Cox Transformation with lambda = ", round(lambda,2)))

The variance appears to be some what stabile with a lambda of 0.93.

#between economy class of Melbourne and Sydney
ansett %>%
    filter(Class == 'Economy') %>%
    filter(Airports == 'MEL-SYD') -> economy_x

lambda_y <- economy_x %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

economy_x %>%
  autoplot(box_cox(Passengers, lambda_y)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Economy passengers MEL-SYD with $\\lambda$ = ",
         round(lambda_y,2))))

We see that the variance appears to be somewhat stable with a lambda of 2, with occurence of some drops

#Pedestrian counts at Southern Cross Station from pedestrian
pedestrian %>%
    filter(Sensor == 'Southern Cross Station') -> sc_sss

lambda_c <- sc_sss %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

sc_sss %>%
  autoplot(box_cox(Count, lambda_c)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "The Pedestrian counts at Southern Cross Station with $\\lambda$ = ",
         round(lambda_c,2))))

I would say variance looks to be stable, however this data freq. is visible difficulty.

3.7 Consider the last five years of the Gas data from aus_production.

head(aus_production)
## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas
## # A tsibble: 20 x 2 [1Q]
##      Gas Quarter
##    <dbl>   <qtr>
##  1   221 2005 Q3
##  2   180 2005 Q4
##  3   171 2006 Q1
##  4   224 2006 Q2
##  5   233 2006 Q3
##  6   192 2006 Q4
##  7   187 2007 Q1
##  8   234 2007 Q2
##  9   245 2007 Q3
## 10   205 2007 Q4
## 11   194 2008 Q1
## 12   229 2008 Q2
## 13   249 2008 Q3
## 14   203 2008 Q4
## 15   196 2009 Q1
## 16   238 2009 Q2
## 17   252 2009 Q3
## 18   210 2009 Q4
## 19   205 2010 Q1
## 20   236 2010 Q2

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>%
  autoplot(Gas) +
  labs(title = "The Quarterly Australian Gas Production")

Wee see that there is an upward trend with a quarterly seasonality.

B. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a?

#gas_x <- gas %>% 
#  model(
#    classical_decomposition(Gas,type = "multiplicative")
#    ) %>% 
#  components() 
#gas_x

Compute and plot the seasonally adjusted data.

#gas_seasonally_adjusted <- gas %>% model(classical_decomposition(Gas, type = 'multiplicative')) %>% components()
#
#gas_seasonally_adjusted %>%
#  ggplot(aes(x = Quarter)) +
#  geom_line(aes(y = Gas, color = 'Data')) +
#  geom_line(aes(y = season_adjust, color = 'Seasonally Adjusted')) +
#  geom_line(aes(y = trend, color = 'Trend')) +
#  labs(title = 'Last Five Years of Gas Data Seasonally Adjusted')

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?

#gas %>%
#  mutate(Gas = if_else(Quarter==yearquarter("2007Q2"), Gas + 400, Gas)) %>%
#  model(classical_decomposition(Gas, type = "multiplicative")) %>%
#  components() %>%
#  as_tsibble() %>%
#  autoplot(season_adjust) +
#  labs(title = 'Seasonally Adjusted Data with 400')

3.8 Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

set.seed(222)

#myseries <- aus_retail %>%
#  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

#x11_dcmp <- myseries %>%
#  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
#  components()
#autoplot(x11_dcmp) +
#  labs(title =
#    "Decomposition of Australian Retail Turnover using X-11.")

3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

  1. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. We witness that the number of persons in the civilian labour force in Australia has grown at steady pace over time. Focusing on the trend data in figure 3.19. When analyzing the month to month of the seasonal data in figure 3.20 this tells us that some months display greater variation freqs than other months. We can also identify a seasonal pattern in the “season_year” graph.

b.Is the recession of 1991/1992 visible in the estimated components?

I believe so, a drop in employment during the years 1991 until 1992 can be seen in the remainder graphic. This decline is not explained by the seasonality or trend data.