624_Homework

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

## ✔ tibble      3.1.8     ✔ tsibble     1.1.3
## ✔ dplyr       1.1.0     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.0
## ✔ lubridate   1.9.1     ✔ fable       0.3.2
## ✔ ggplot2     3.4.0     ✔ fabletools  0.3.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

Exercise 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_gdp <- global_economy %>%
  mutate(gdp_per_capita=GDP/Population)

global_gdp %>% 
  autoplot(gdp_per_capita,show.legend=FALSE)+
  labs('GDP Per Capita',y='US Dollars')

global_gdp %>%
  filter(gdp_per_capita == max(gdp_per_capita, na.rm = TRUE)) %>%
  select(Country, gdp_per_capita)

## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country gdp_per_capita  Year
##   <fct>            <dbl> <dbl>
## 1 Monaco         185153.  2014

global_gdp %>%
  filter(Country == 'Monaco')%>%
  autoplot(gdp_per_capita)

The country with the highest gdp per capita is Monaco, which peaked at $185,152, in 2014. Long term the Monaco’s gdp per capita trend is positive. As long as Monaco continues to be a tax haven, this upward trend should continue.

Exercise 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

global_gdp %>%
  filter(Country == 'United States')%>%
  autoplot(GDP)

For the most part, the US GDP chart is pretty smooth and it already looks like an exponential graph of x^3. Since a function to map this graph could pretty easily be created for this dataset, I believe doing nothing to it is best.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

The Slaughter of Victoria data has 40 years worth of monthly data. It’s not possible to see any seasonal trends as far as Janaury over January because the graph has 10 year intervals which makes it impossible to judge where it goes from one month to the other. However, judging on how the graph ebbs and flows I would bet that there is seasonality in the data set. There is clearly a downward trend in the overall data. Looking at the minima’s and maxima’s of the graph there appears to be cycles. Using a 24 month moving average that is centered on the 6 month, smooths out the data which makes the cycles more apparent. There seems to be cycles of several years up to several years down. Although, from 1997 to 2009, that cycle is broken and is just an extended downward trend.

aus_livestock %>%
  filter(Animal == 'Bulls, bullocks and steers',State=='Victoria') %>%
  autoplot(Count) +
  labs(title= "Slaughter of Victoria, Australia Bulls, Bullocks, and Steers")

aus_livestock %>%
  filter(Animal == 'Bulls, bullocks and steers',State=='Victoria') %>%
  mutate(ma12 = slider::slide_dbl(Count, mean,
                .before = 11, .after = 12, .complete = TRUE))%>%
  autoplot(ma12,color='#363738') +
  theme(panel.background = element_rect(fill = 'white'),
        panel.grid.major = element_line(color = 'black', linetype = 'dotted')) +
  labs(title= "Slaughter of Victoria, Australia Bulls, Bullocks, and Steers")

Victorian Electricity Demand from vic_elec.

As is, there are 52,608 data points. Which makes the plot look a little noisy. Regardless with a plain plot of those points, clear seasonal trends can be seen. Where demand spikes during the winter and summer months then decreases during the spring and fall months.

By making calendar adjustments of average monthly demand we can see that a cycle appears, where every two years the summer demand spikes.

ve <- vic_elec %>% as_tsibble()
head(ve,5)

## # A tsibble: 5 x 5 [30m] <Australia/Melbourne>
##   Time                Demand Temperature Date       Holiday
##   <dttm>               <dbl>       <dbl> <date>     <lgl>  
## 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
## 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
## 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
## 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
## 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE

autoplot(ve,Demand)

ve_month <- ve %>%
  mutate(Date = yearmonth(Time)) %>%
  group_by_key() %>%
  index_by(Date) %>%
  summarise(Avg_Demand = mean(Demand)) 
autoplot(ve_month,Avg_Demand)+
  labs(title= "Average Monthly Victorian Electricity Demand")

Gas production from aus_production.

In the plot of the original data it’s easy to see how everything grows from year to year. As time goes on the variability grows. The best way to fix this inconsistent variability is by doing a Box-Cox transformation on it, which will make the variability more uniform through the span of time.

The graph of the data with the box-cox transformation shows how the variability is now pretty much uniform throughout the time series.

aus <- aus_production
autoplot(aus,Gas)

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "", title = paste0("Transformed Gas Production with lambda = ",
         round(lambda,2)))

Exercise 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Since lambda(.58) is greater than zero, y to the power of lambda, minus 1, all divided by lambda will be the formula.

\[ \frac{y^\lambda-1}{\lambda} \]

The effect that box-cox has on higher values is much smaller than the effect that box-cox has on smaller values.

maximum <-max(canadian_gas$Volume)
minimum <- min(canadian_gas$Volume)
bc_effect_on_max <- (maximum**.58-1)/.58
bc_effect_on_min <- (minimum**.58-1)/.58

## [1] "The maximum value is 19.5284"

## [1] "Box Cox transforms the maximum value to 7.94"

## [1] "Which reduces the maximum value by 59.342%"

## [1] "The minimum value is 0.966"

## [1] "Box Cox transforms the minimum value to -0.034"

## [1] "Which reduces the minimum value by 103.545%"

autoplot(canadian_gas,Volume)

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "", title = paste0("Transformed Gas Production with lambda = ",
         round(lambda,2)))

In the original data plot we can see that the variance at either ends of the graph is relatively small compared to the variance in the middle section of the graph. Because the box-cox had a smaller effect on the larger values(1990+), the variability between each point didn’t change as much, it actually looks smaller. Whereas, the variability on the smaller points(<1970) looks to have grown bigger, because the change in each data point was bigger. While the variability in the middle looks the same.

Exercise 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

I would use a power transformation, as it would have the biggest effect on the smaller values, with the least variability. Which would increase variability in the time period before 2000, while decreasing the amount of variability from 2000 on.

set.seed(2151)
aus_retail <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(aus_retail,Turnover)

lambda <- aus_retail %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
aus_retail %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "", title = paste0("Australian Retail with lambda = ",
         round(lambda,2)))

Exercise 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

The Guerrero feature makes it quite easy to pick the ’right’alpha for the box-cox transformation. None of these data sets contain negative values so we do not have to use a modified box-cox or a yeo-johnson transformation.

Tobacco from aus_production

autoplot(aus_production,Tobacco)

aus_tob_lambda <- aus_production %>% 
  features(Tobacco,features=guerrero) %>% 
  pull(lambda_guerrero)

aus_production %>% autoplot(box_cox(Tobacco, aus_tob_lambda))+
  labs(y = "", title = paste0("Transformed Tobacco Production with lambda ",
         round(aus_tob_lambda,3)))

Economy class passengers between Melbourne and Sydney from ansett

mel_to_syd <- ansett %>% 
  filter(Airports=='MEL-SYD',Class == 'Economy')
autoplot(mel_to_syd,Passengers)

mel_to_syd_lambda <- mel_to_syd %>% 
  features(Passengers,features=guerrero) %>% 
  pull(lambda_guerrero)

mel_to_syd %>% autoplot(box_cox(Passengers, mel_to_syd_lambda))+
  labs(y = "", title = paste0("Transformed Passengers with lambda ",
         round(mel_to_syd_lambda,3)))

Pedestrian counts at Southern Cross Station from pedestrian

ped <- pedestrian %>% filter(Sensor=='Southern Cross Station') 
ped %>% autoplot(Count)

ped_lambda <- ped %>% 
  features(Count,features=guerrero) %>% 
  pull(lambda_guerrero)

ped %>% autoplot(box_cox(Count, ped_lambda))+
  labs(y = "", title = paste0("Transformed Count with lambda ",
         round(ped_lambda,3)))

Exercise 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

a) Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

There is an upward trend, with a cycle of two up(Q2,Q3) to two down(Q4,Q1)

autoplot(gas,Gas)

b) Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_decomp<- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) 

components(gas_decomp) %>%
  autoplot() +
  labs(title = "Classical Multiplicative Decomposition of Gas")

c) Do the results support the graphical interpretation from part a?

Yes, the decomposition supports the original interpretation.

d) Compute and plot the seasonally adjusted data.

components(gas_decomp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "black") +
  geom_line(aes(y=season_adjust), colour = "red") +
  labs(title = "Seasonally Adjusted Gas")

e) Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

The outlier doesn’t seem to have any effect on the seasonally adjusted data.

components(gas_decomp) %>%
  mutate(Gas = ifelse(Gas ==171,Gas+300,Gas))%>%
  as_tsibble() %>%
  autoplot(Gas, colour = "black") +
  geom_line(aes(y=season_adjust), colour = "red") +
  labs(title = "Seasonally Adjusted Gas Production")

components(gas_decomp) %>%
  mutate(Gas = ifelse(Gas ==196,Gas+300,Gas))%>%
  as_tsibble() %>%
  autoplot(Gas, colour = "black") +
  geom_line(aes(y=season_adjust), colour = "red") +
  labs(title = "Seasonally Adjusted Gas Production")

f) Does it make any difference if the outlier is near the end rather than in the middle of the time series?

No, it does not make a difference whether or not the outlier is near the beginning or near the end of the time series.

Exercise 3.8

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

The irregular chart revealed a lot more possible outliers than I thought there was, in places that I found surprising. The main outlier having happened around the end of 1997 or beginning of 1998 was one that I wouldn’t have thought was an outlier just by looking at the regular chart. Even though that was the biggest jump of any point up until then, it looks like a relatively normal increase compared to all the other increases in the entire data set. Also, some of the decreases from 2010 onward, is surprising to see as possible outliers.

x11_dcmp <- aus_retail |>
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of total AUS retail using X-11.")

Exercise 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a) Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The civilian labour force in Australia has a positive trend. There is a seasonal trend also, of which the increases and decreases of the seasonal trend is growing each year. The remainder for the most part stays in a range of -50 to 50. There are several points where the remainder goes outside of that range which indicates outliers.

b) Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession is visible in the components, specifically the remainder component. In the remainder component there is a drop to roughy -400 during the time period of 1991 to 1992. This drop is greater than 8 times what would normally be expected. The drop is followed by a drastic increase up to roughly 50.

624_Homework_Two

Raymond Fries

2023-02-10