HW2 Chapter 3 624

library(fpp3)

## -- Attaching packages -------------------------------------------- fpp3 0.4.0 --

## v tibble      3.1.4      v tsibble     1.0.1 
## v dplyr       1.0.7      v tsibbledata 0.3.0 
## v tidyr       1.1.3      v feasts      0.2.2 
## v lubridate   1.7.10     v fable       0.3.1 
## v ggplot2     3.3.5

## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date()    masks base::date()
## x dplyr::filter()      masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval()  masks lubridate::interval()
## x dplyr::lag()         masks stats::lag()
## x tsibble::setdiff()   masks base::setdiff()
## x tsibble::union()     masks base::union()

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v readr   2.0.1     v stringr 1.4.0
## v purrr   0.3.4     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x lubridate::as.difftime() masks base::as.difftime()
## x lubridate::date()        masks base::date()
## x dplyr::filter()          masks stats::filter()
## x tsibble::intersect()     masks lubridate::intersect(), base::intersect()
## x tsibble::interval()      masks lubridate::interval()
## x dplyr::lag()             masks stats::lag()
## x tsibble::setdiff()       masks lubridate::setdiff(), base::setdiff()
## x tsibble::union()         masks lubridate::union(), base::union()

library(dplyr)
library(latex2exp)

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

dim(global_economy)

## [1] 15150     9

str(global_economy)

## tbl_ts [15,150 x 9] (S3: tbl_ts/tbl_df/tbl/data.frame)
##  $ Country   : Factor w/ 263 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Code      : Factor w/ 263 levels "ABW","AFG","AGO",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Year      : num [1:15150] 1960 1961 1962 1963 1964 ...
##  $ GDP       : num [1:15150] 5.38e+08 5.49e+08 5.47e+08 7.51e+08 8.00e+08 ...
##  $ Growth    : num [1:15150] NA NA NA NA NA NA NA NA NA NA ...
##  $ CPI       : num [1:15150] NA NA NA NA NA NA NA NA NA NA ...
##  $ Imports   : num [1:15150] 7.02 8.1 9.35 16.86 18.06 ...
##  $ Exports   : num [1:15150] 4.13 4.45 4.88 9.17 8.89 ...
##  $ Population: num [1:15150] 8996351 9166764 9345868 9533954 9731361 ...
##  - attr(*, "key")= tibble [263 x 2] (S3: tbl_df/tbl/data.frame)
##   ..$ Country: Factor w/ 263 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ .rows  : list<int> [1:263] 
##   .. ..$ : int [1:58] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..$ : int [1:58] 59 60 61 62 63 64 65 66 67 68 ...
##   .. ..$ : int [1:58] 117 118 119 120 121 122 123 124 125 126 ...
##   .. ..$ : int [1:58] 175 176 177 178 179 180 181 182 183 184 ...
##   .. ..$ : int [1:58] 233 234 235 236 237 238 239 240 241 242 ...
##   .. ..$ : int [1:58] 291 292 293 294 295 296 297 298 299 300 ...
##   .. ..$ : int [1:58] 349 350 351 352 353 354 355 356 357 358 ...
##   .. ..$ : int [1:58] 407 408 409 410 411 412 413 414 415 416 ...
##   .. ..$ : int [1:58] 465 466 467 468 469 470 471 472 473 474 ...
##   .. ..$ : int [1:58] 523 524 525 526 527 528 529 530 531 532 ...
##   .. ..$ : int [1:58] 581 582 583 584 585 586 587 588 589 590 ...
##   .. ..$ : int [1:58] 639 640 641 642 643 644 645 646 647 648 ...
##   .. ..$ : int [1:58] 697 698 699 700 701 702 703 704 705 706 ...
##   .. ..$ : int [1:58] 755 756 757 758 759 760 761 762 763 764 ...
##   .. ..$ : int [1:58] 813 814 815 816 817 818 819 820 821 822 ...
##   .. ..$ : int [1:58] 871 872 873 874 875 876 877 878 879 880 ...
##   .. ..$ : int [1:58] 929 930 931 932 933 934 935 936 937 938 ...
##   .. ..$ : int [1:58] 987 988 989 990 991 992 993 994 995 996 ...
##   .. ..$ : int [1:58] 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 ...
##   .. ..$ : int [1:58] 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 ...
##   .. ..$ : int [1:58] 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 ...
##   .. ..$ : int [1:58] 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 ...
##   .. ..$ : int [1:58] 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 ...
##   .. ..$ : int [1:58] 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 ...
##   .. ..$ : int [1:58] 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 ...
##   .. ..$ : int [1:58] 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 ...
##   .. ..$ : int [1:58] 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 ...
##   .. ..$ : int [1:58] 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 ...
##   .. ..$ : int [1:58] 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 ...
##   .. ..$ : int [1:58] 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 ...
##   .. ..$ : int [1:58] 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 ...
##   .. ..$ : int [1:58] 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 ...
##   .. ..$ : int [1:58] 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 ...
##   .. ..$ : int [1:58] 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 ...
##   .. ..$ : int [1:58] 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 ...
##   .. ..$ : int [1:58] 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 ...
##   .. ..$ : int [1:58] 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 ...
##   .. ..$ : int [1:58] 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 ...
##   .. ..$ : int [1:58] 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 ...
##   .. ..$ : int [1:58] 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 ...
##   .. ..$ : int [1:58] 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 ...
##   .. ..$ : int [1:58] 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 ...
##   .. ..$ : int [1:58] 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 ...
##   .. ..$ : int [1:58] 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 ...
##   .. ..$ : int [1:58] 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 ...
##   .. ..$ : int [1:58] 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 ...
##   .. ..$ : int [1:58] 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 ...
##   .. ..$ : int [1:58] 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 ...
##   .. ..$ : int [1:58] 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 ...
##   .. ..$ : int [1:58] 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 ...
##   .. ..$ : int [1:58] 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 ...
##   .. ..$ : int [1:58] 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 ...
##   .. ..$ : int [1:58] 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 ...
##   .. ..$ : int [1:58] 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 ...
##   .. ..$ : int [1:58] 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 ...
##   .. ..$ : int [1:58] 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 ...
##   .. ..$ : int [1:58] 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 ...
##   .. ..$ : int [1:58] 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 ...
##   .. ..$ : int [1:58] 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 ...
##   .. ..$ : int [1:58] 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 ...
##   .. ..$ : int [1:58] 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 ...
##   .. ..$ : int [1:58] 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 ...
##   .. ..$ : int [1:58] 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 ...
##   .. ..$ : int [1:58] 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 ...
##   .. ..$ : int [1:58] 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 ...
##   .. ..$ : int [1:58] 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 ...
##   .. ..$ : int [1:58] 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 ...
##   .. ..$ : int [1:58] 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 ...
##   .. ..$ : int [1:52] 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 ...
##   .. ..$ : int [1:58] 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 ...
##   .. ..$ : int [1:58] 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 ...
##   .. ..$ : int [1:58] 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 ...
##   .. ..$ : int [1:58] 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 ...
##   .. ..$ : int [1:58] 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 ...
##   .. ..$ : int [1:58] 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 ...
##   .. ..$ : int [1:58] 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 ...
##   .. ..$ : int [1:58] 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 ...
##   .. ..$ : int [1:58] 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 ...
##   .. ..$ : int [1:58] 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 ...
##   .. ..$ : int [1:58] 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 ...
##   .. ..$ : int [1:58] 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 ...
##   .. ..$ : int [1:58] 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 ...
##   .. ..$ : int [1:58] 4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 ...
##   .. ..$ : int [1:58] 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 ...
##   .. ..$ : int [1:58] 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 ...
##   .. ..$ : int [1:58] 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 ...
##   .. ..$ : int [1:58] 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 ...
##   .. ..$ : int [1:58] 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 ...
##   .. ..$ : int [1:58] 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 ...
##   .. ..$ : int [1:58] 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 ...
##   .. ..$ : int [1:58] 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 ...
##   .. ..$ : int [1:58] 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 ...
##   .. ..$ : int [1:58] 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 ...
##   .. ..$ : int [1:58] 5389 5390 5391 5392 5393 5394 5395 5396 5397 5398 ...
##   .. ..$ : int [1:58] 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 ...
##   .. ..$ : int [1:58] 5505 5506 5507 5508 5509 5510 5511 5512 5513 5514 ...
##   .. ..$ : int [1:58] 5563 5564 5565 5566 5567 5568 5569 5570 5571 5572 ...
##   .. ..$ : int [1:58] 5621 5622 5623 5624 5625 5626 5627 5628 5629 5630 ...
##   .. ..$ : int [1:58] 5679 5680 5681 5682 5683 5684 5685 5686 5687 5688 ...
##   .. .. [list output truncated]
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE
##  - attr(*, "index")= chr "Year"
##   ..- attr(*, "ordered")= logi TRUE
##  - attr(*, "index2")= chr "Year"
##  - attr(*, "interval")= interval [1:1] 1Y
##   ..@ .regular: logi TRUE

head(global_economy)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

global_economy2 <-global_economy %>%
  mutate (GDP_perC=GDP/Population)

global_economy2%>% distinct(Country)

## # A tibble: 263 x 1
##    Country            
##    <fct>              
##  1 Afghanistan        
##  2 Albania            
##  3 Algeria            
##  4 American Samoa     
##  5 Andorra            
##  6 Angola             
##  7 Antigua and Barbuda
##  8 Arab World         
##  9 Argentina          
## 10 Armenia            
## # ... with 253 more rows

#There are 263 countries in the dataset.  Let's look at highest GDP/capita.
 
test<-global_economy2 %>% as_tibble() %>% group_by(Country) %>% dplyr::summarise(maxGDP = max(GDP_perC))

str(test)

## tibble [263 x 2] (S3: tbl_df/tbl/data.frame)
##  $ Country: Factor w/ 263 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ maxGDP : num [1:263] NA NA 5565 NA NA ...

class(test)

## [1] "tbl_df"     "tbl"        "data.frame"

test%>%arrange(desc(maxGDP))

## # A tibble: 263 x 2
##    Country        maxGDP
##    <fct>           <dbl>
##  1 Luxembourg    119225.
##  2 Norway        103059.
##  3 Iceland        70057.
##  4 Ireland        69331.
##  5 Australia      67990.
##  6 Denmark        64322.
##  7 Sweden         60283.
##  8 United States  59532.
##  9 North America  58070.
## 10 Singapore      57714.
## # ... with 253 more rows

#Feasible to look at top 10:  Luxembourg, Norway, Iceland, Ireland, Australia, Denmark, Sweden, United States, North America, Singapore.

global_economy10<-global_economy2 %>% filter(Country=='Luxembourg'|Country=='Norway'|Country=='Iceland'|Country=='Ireland'|Country=='Australia'|Country=='Denmark'|Country=='Sweden'|Country=='United States'|Country=='North America'|Country=='Singapore') 

global_economy10%>%
  autoplot(GDP_perC)+labs(title="GDP per capita", y="$US")

#A bit difficult to view, let's isolate on top 5
global_economy5<-global_economy2 %>% filter(Country=='Luxembourg'|Country=='Norway'|Country=='Iceland'|Country=='Ireland'|Country=='Australia') 

global_economy5%>%
  autoplot(GDP_perC)+labs(title="GDP per capita", y="$US")

ANSWER: Luxembuorg currently has the highest GDP per capita. Since 1990, Luxembourg has been highest, followed by Norway. ____________________

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_economy2%>%
  filter(Country=='United States') %>%
  autoplot(GDP_perC)+ 
  labs(title="United States GDP per capita", y="$US")

ANSWER:

This data is appropriately population adjusted.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

head(aus_livestock)

## # A tsibble: 6 x 4 [1M]
## # Key:       Animal, State [1]
##      Month Animal                     State                        Count
##      <mth> <fct>                      <fct>                        <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory  2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory  2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory  2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory  1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory  2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory  1800

aus_livestock2<-aus_livestock%>%
  filter(Animal=='Bulls, bullocks and steers') 

aus_livestock2%>%
  autoplot(Count)+ 
  labs(title="Bulls, bullocks and steers", y="Count")

aus_livestock2 %>%
                  ggplot(aes(x=Month, y=Count,colour=State))+geom_line()+
  facet_grid(State ~ ., scales="free_y")

#Northern Territory and Australian Capital Territory appear to have no data after a certain date.  

# Lets focus on Tasmania
aus_livestock3<-aus_livestock2 %>%filter(State=="Tasmania")

aus_livestock3%>%
                  autoplot(Count)

#Lets try a box cox transformation using the guerrero feature.

lambda<-aus_livestock3%>%
  features(Count, features=guerrero)%>%
  pull(lambda_guerrero)

aus_livestock3%>%
  autoplot(box_cox(Count,lambda))+
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed livestock slaughter with $\\lambda$ = ",
         round(lambda,2))))

# the box_cox does temper the variation a bit.

Victorian Electricity Demand from vic_elec.

head(vic_elec)

## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
##   Time                Demand Temperature Date       Holiday
##   <dttm>               <dbl>       <dbl> <date>     <lgl>  
## 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
## 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
## 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
## 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
## 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
## 6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE

vic_elec%>%
  autoplot(Demand)+ 
  labs(title="Australia electricity demand", y="Demand")

DISCUSSION: This plot is overwritten. It is difficult to tell what is going on. It may be beneficial to isolate on 1 year.

vic_elec2 <-vic_elec%>% filter(grepl('2014', Time))

autoplot(vic_elec2)

## Plot variable not specified, automatically selected `.vars = Demand`

Possibly try a calendar adjustment, where we compute average demand per day.

Perhaps 30 minute interval is meaningful, try a log transformation.

#lambda2<-vic_elec2%>%
#  features(Demand, features=guerrero)%>%
#  pull(lambda_guerrero)

#vic_elec2%>%
#  autoplot(box_cox(Demand,lambda2))+
#  labs(y = "",
#       title = latex2exp::TeX(paste0(
#         "Transformed Demand with $\\lambda$ = ",
#         round(lambda2,2))))
vic_elec3<-vic_elec2%>%
  mutate(ln_demand=log(Demand))

autoplot(vic_elec3,ln_demand)

DISCUSSION: There is a slight improvement in the variability

Gas production from aus_production.

head(aus_production)

## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7

autoplot(aus_production,Gas)

Per the book suggestion, box-cox transformation shown below.

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

DISCUSSION:

The box-cox tranformation lessened the variability.

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

head(canadian_gas)

## # A tsibble: 6 x 2 [1M]
##      Month Volume
##      <mth>  <dbl>
## 1 1960 Jan   1.43
## 2 1960 Feb   1.31
## 3 1960 Mar   1.40
## 4 1960 Apr   1.17
## 5 1960 May   1.12
## 6 1960 Jun   1.01

autoplot(canadian_gas)

## Plot variable not specified, automatically selected `.vars = Volume`

DISCUSSION: Try a box-cox with guerrero…

lambda_c <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda_c)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Volume with $\\lambda$ = ",
         round(lambda_c,2))))

DISCUSSION:

The box-cox transformation was not impactful on the variability.

“there will be no need of transformation if max/min is small. Mostly, with max/min small all the observations are away from zero (relatively), so the power transform will be well approximated linearly over a short interval” cited from Stack Exchange on data transformation.

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(1111111)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))

head(myseries)

## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State             Industry                     `Series ID`    Month Turnover
##   <chr>             <chr>                        <chr>          <mth>    <dbl>
## 1 Western Australia Newspaper and book retailing A3349822A   1982 Apr      9.7
## 2 Western Australia Newspaper and book retailing A3349822A   1982 May     11  
## 3 Western Australia Newspaper and book retailing A3349822A   1982 Jun     10.7
## 4 Western Australia Newspaper and book retailing A3349822A   1982 Jul      9  
## 5 Western Australia Newspaper and book retailing A3349822A   1982 Aug      9.1
## 6 Western Australia Newspaper and book retailing A3349822A   1982 Sep     10

autoplot(myseries,.vars=Turnover)

DISCUSSION: Now run the guerrero method to select the appropriate box-cox transformation.

lambda_m <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
myseries %>%
  autoplot(box_cox(Turnover, lambda_m)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Turnover with $\\lambda$ = ",
         round(lambda_m,2))))

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco from aus_production

head(aus_production)

## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7

autoplot(aus_production,Tobacco)

## Warning: Removed 24 row(s) containing missing values (geom_path).

lambda_a <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Tobacco, lambda_a)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Tobacco with $\\lambda$ = ",
         round(lambda_a,2))))

## Warning: Removed 24 row(s) containing missing values (geom_path).

Economy class passengers between Melbourne and Sydney from ansett

head(ansett)

## # A tsibble: 6 x 4 [1W]
## # Key:       Airports, Class [1]
##       Week Airports Class    Passengers
##     <week> <chr>    <chr>         <dbl>
## 1 1989 W28 ADL-PER  Business        193
## 2 1989 W29 ADL-PER  Business        254
## 3 1989 W30 ADL-PER  Business        185
## 4 1989 W31 ADL-PER  Business        254
## 5 1989 W32 ADL-PER  Business        191
## 6 1989 W33 ADL-PER  Business        136

ansett2<-ansett%>%
    filter(Airports=='MEL-SYD' & Class=='Economy')

autoplot(ansett2,Passengers)

lambda_t <- ansett2 %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
ansett2 %>%
  autoplot(box_cox(Passengers, lambda_t)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Passengers with $\\lambda$ = ",
         round(lambda_t,2))))

Pedestrian counts at Southern Cross Station from pedestrian

head(pedestrian)

## # A tsibble: 6 x 5 [1h] <Australia/Melbourne>
## # Key:       Sensor [1]
##   Sensor         Date_Time           Date        Time Count
##   <chr>          <dttm>              <date>     <int> <int>
## 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
## 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
## 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
## 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
## 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
## 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77

pedestrian2<-pedestrian%>%
    filter(Sensor=='Southern Cross Station')

autoplot(pedestrian2,Count)

lambda_p <- pedestrian2 %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)
pedestrian2 %>%
  autoplot(box_cox(Count, lambda_p)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Pedestrians with $\\lambda$ = ",
         round(lambda_p,2))))

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas <- tail(aus_production, 5*4) %>% select(Gas)
head(gas)

## # A tsibble: 6 x 2 [1Q]
##     Gas Quarter
##   <dbl>   <qtr>
## 1   221 2005 Q3
## 2   180 2005 Q4
## 3   171 2006 Q1
## 4   224 2006 Q2
## 5   233 2006 Q3
## 6   192 2006 Q4

autoplot(gas)

## Plot variable not specified, automatically selected `.vars = Gas`

DISCUSSION:

There is an upward trend with a quarterly seasonality.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas2<-gas %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Gas")

gas2

## Warning: Removed 2 row(s) containing missing values (geom_path).

dec<-as_tsibble(gas) %>%
  model(classical_decomposition(Gas ~ season(4), type = "mult")) %>%
  components()

dec

## # A dable: 20 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend * seasonal * random
##    .model                      Quarter   Gas trend seasonal random season_adjust
##    <chr>                         <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
##  1 "classical_decomposition(G~ 2005 Q3   221   NA     1.13  NA              196.
##  2 "classical_decomposition(G~ 2005 Q4   180   NA     0.925 NA              195.
##  3 "classical_decomposition(G~ 2006 Q1   171  200.    0.875  0.974          195.
##  4 "classical_decomposition(G~ 2006 Q2   224  204.    1.07   1.02           209.
##  5 "classical_decomposition(G~ 2006 Q3   233  207     1.13   1.00           207.
##  6 "classical_decomposition(G~ 2006 Q4   192  210.    0.925  0.987          208.
##  7 "classical_decomposition(G~ 2007 Q1   187  213     0.875  1.00           214.
##  8 "classical_decomposition(G~ 2007 Q2   234  216.    1.07   1.01           218.
##  9 "classical_decomposition(G~ 2007 Q3   245  219.    1.13   0.996          218.
## 10 "classical_decomposition(G~ 2007 Q4   205  219.    0.925  1.01           222.
## 11 "classical_decomposition(G~ 2008 Q1   194  219.    0.875  1.01           222.
## 12 "classical_decomposition(G~ 2008 Q2   229  219     1.07   0.974          213.
## 13 "classical_decomposition(G~ 2008 Q3   249  219     1.13   1.01           221.
## 14 "classical_decomposition(G~ 2008 Q4   203  220.    0.925  0.996          219.
## 15 "classical_decomposition(G~ 2009 Q1   196  222.    0.875  1.01           224.
## 16 "classical_decomposition(G~ 2009 Q2   238  223.    1.07   0.993          222.
## 17 "classical_decomposition(G~ 2009 Q3   252  225.    1.13   0.994          224.
## 18 "classical_decomposition(G~ 2009 Q4   210  226     0.925  1.00           227.
## 19 "classical_decomposition(G~ 2010 Q1   205   NA     0.875 NA              234.
## 20 "classical_decomposition(G~ 2010 Q2   236   NA     1.07  NA              220.

DISCUSSION:
The classical decomposition does depict the series components quite eloquently.

The trend is clearly upward positive and the seasonality is quarterly. The remainder is around 1.

Do the results support the graphical interpretation from part a?

DISCUSSION:

The classical decomposition does support the graphical interpretation from parta, upward trend and quarterly seasonality.

Compute and plot the seasonally adjusted data.

autoplot(gas)

## Plot variable not specified, automatically selected `.vars = Gas`

dec %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gass",
       title = "Gas Production") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 row(s) containing missing values (geom_path).

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_o<-gas
   gas_o[1,1]<-gas[1,1]  +300
   head(gas_o)

## # A tsibble: 6 x 2 [1Q]
##     Gas Quarter
##   <dbl>   <qtr>
## 1   521 2005 Q3
## 2   180 2005 Q4
## 3   171 2006 Q1
## 4   224 2006 Q2
## 5   233 2006 Q3
## 6   192 2006 Q4

   gas_o2<-gas_o %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Gas with outlier")

gas_o2

## Warning: Removed 2 row(s) containing missing values (geom_path).

dec_o<-as_tsibble(gas_o) %>%
  model(classical_decomposition(Gas ~ season(4), type = "mult")) %>%
  components()

dec_o

## # A dable: 20 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend * seasonal * random
##    .model                      Quarter   Gas trend seasonal random season_adjust
##    <chr>                         <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
##  1 "classical_decomposition(G~ 2005 Q3   521   NA     1.14  NA              459.
##  2 "classical_decomposition(G~ 2005 Q4   180   NA     0.933 NA              193.
##  3 "classical_decomposition(G~ 2006 Q1   171  238     0.849  0.846          201.
##  4 "classical_decomposition(G~ 2006 Q2   224  204.    1.08   1.02           207.
##  5 "classical_decomposition(G~ 2006 Q3   233  207     1.14   0.992          205.
##  6 "classical_decomposition(G~ 2006 Q4   192  210.    0.933  0.979          206.
##  7 "classical_decomposition(G~ 2007 Q1   187  213     0.849  1.03           220.
##  8 "classical_decomposition(G~ 2007 Q2   234  216.    1.08   1.00           216.
##  9 "classical_decomposition(G~ 2007 Q3   245  219.    1.14   0.987          216.
## 10 "classical_decomposition(G~ 2007 Q4   205  219.    0.933  1.00           220.
## 11 "classical_decomposition(G~ 2008 Q1   194  219.    0.849  1.04           229.
## 12 "classical_decomposition(G~ 2008 Q2   229  219     1.08   0.965          211.
## 13 "classical_decomposition(G~ 2008 Q3   249  219     1.14   1.00           219.
## 14 "classical_decomposition(G~ 2008 Q4   203  220.    0.933  0.987          218.
## 15 "classical_decomposition(G~ 2009 Q1   196  222.    0.849  1.04           231.
## 16 "classical_decomposition(G~ 2009 Q2   238  223.    1.08   0.985          220.
## 17 "classical_decomposition(G~ 2009 Q3   252  225.    1.14   0.986          222.
## 18 "classical_decomposition(G~ 2009 Q4   210  226     0.933  0.996          225.
## 19 "classical_decomposition(G~ 2010 Q1   205   NA     0.849 NA              242.
## 20 "classical_decomposition(G~ 2010 Q2   236   NA     1.08  NA              218.

autoplot(gas_o)

## Plot variable not specified, automatically selected `.vars = Gas`

dec_o %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gass",
       title = "Gas Production end outlier") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 row(s) containing missing values (geom_path).

The outlier was the first datapoint in the time series. It has a profound effect on the seaonally adjusted data.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

DISCUSSION:

Let’s add 300 to the 11th data point, 2008Q1

gas_m<-gas
   gas_m[11,1]<-gas[11,1]  +300
   head(gas_m)

## # A tsibble: 6 x 2 [1Q]
##     Gas Quarter
##   <dbl>   <qtr>
## 1   221 2005 Q3
## 2   180 2005 Q4
## 3   171 2006 Q1
## 4   224 2006 Q2
## 5   233 2006 Q3
## 6   192 2006 Q4

   gas_m2<-gas_m %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Gas with middle outlier")

gas_m2

## Warning: Removed 2 row(s) containing missing values (geom_path).

dec_m<-as_tsibble(gas_m) %>%
  model(classical_decomposition(Gas ~ season(4), type = "mult")) %>%
  components()

dec_m

## # A dable: 20 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend * seasonal * random
##    .model                      Quarter   Gas trend seasonal random season_adjust
##    <chr>                         <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
##  1 "classical_decomposition(G~ 2005 Q3   221   NA     1.05  NA              211.
##  2 "classical_decomposition(G~ 2005 Q4   180   NA     0.868 NA              207.
##  3 "classical_decomposition(G~ 2006 Q1   171  200.    1.08   0.792          159.
##  4 "classical_decomposition(G~ 2006 Q2   224  204.    1.01   1.09           222.
##  5 "classical_decomposition(G~ 2006 Q3   233  207     1.05   1.08           223.
##  6 "classical_decomposition(G~ 2006 Q4   192  210.    0.868  1.05           221.
##  7 "classical_decomposition(G~ 2007 Q1   187  213     1.08   0.815          174.
##  8 "classical_decomposition(G~ 2007 Q2   234  216.    1.01   1.07           232.
##  9 "classical_decomposition(G~ 2007 Q3   245  256.    1.05   0.915          234.
## 10 "classical_decomposition(G~ 2007 Q4   205  294.    0.868  0.804          236.
## 11 "classical_decomposition(G~ 2008 Q1   494  294.    1.08   1.56           459.
## 12 "classical_decomposition(G~ 2008 Q2   229  294     1.01   0.771          227.
## 13 "classical_decomposition(G~ 2008 Q3   249  256.    1.05   0.928          238.
## 14 "classical_decomposition(G~ 2008 Q4   203  220.    0.868  1.06           234.
## 15 "classical_decomposition(G~ 2009 Q1   196  222.    1.08   0.820          182.
## 16 "classical_decomposition(G~ 2009 Q2   238  223.    1.01   1.06           236.
## 17 "classical_decomposition(G~ 2009 Q3   252  225.    1.05   1.07           241.
## 18 "classical_decomposition(G~ 2009 Q4   210  226     0.868  1.07           242.
## 19 "classical_decomposition(G~ 2010 Q1   205   NA     1.08  NA              190.
## 20 "classical_decomposition(G~ 2010 Q2   236   NA     1.01  NA              234.

autoplot(gas_m)

## Plot variable not specified, automatically selected `.vars = Gas`

dec_m %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gass",
       title = "Gas Production end middle outlier") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 row(s) containing missing values (geom_path).

DISCUSSION:

The middle outlier does have a profound effect on the seasonal adjusted data as well as the trend.

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(seasonal)

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

autoplot(myseries)

## Plot variable not specified, automatically selected `.vars = Turnover`

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Newspaper books retail Turnover using X-11.")

x11_dcmp %>%
  ggplot(aes(x = Month)) +
  geom_line(aes(y = Turnover, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Turnover)",
       title = "Turnover in Newspaper, books retail") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

The decomposition addresses the wider variability as time progresses in this time series.

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995. Figure 3.19: Decomposition of the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Seasonal component from the decomposition shown in the previous figure. Figure 3.20: Seasonal component from the decomposition shown in the previous figure.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

DISCUSSION:

The seasonal component changes over time, more variation in later time periods. However, consecutive years are similar.

The trend is upward as time goes on. The recession of 1991/1992 is not captured in the trend component and shows in the remainder. This a deficiency ot the decomposition.

With respect to the scales, the variation in the is smallest when compared to the variation in the data.

Is the recession of 1991/1992 visible in the estimated components?

DISCUSSION:

As stated above, the recession of 1991/1992 is not captured in the trend component and shows in the remainder.

The is a deficiency of the decomposition.

HW2 Chapter 3 624

Lisa Szydziak

9/14/2021