Question 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Question 3.1 Answer

The legend had to be hidden in order to see the actual plot since there are 263 countries within the data set.

global_economy %>%
  autoplot(GDP/Population, show.legend = FALSE)

## Warning: Removed 3242 rows containing missing values (`geom_line()`).

global_economy %>%
  mutate(GDP_per_capita = GDP/Population) %>%
  top_n(n = 1)

## Selecting by GDP_per_capita

## # A tsibble: 1 x 10 [1Y]
## # Key:       Country [1]
##   Country Code   Year         GDP Growth   CPI Imports Exports Populat…¹ GDP_p…²
##   <fct>   <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>     <dbl>   <dbl>
## 1 Monaco  MCO    2014 7060236168.   7.18    NA      NA      NA     38132 185153.
## # … with abbreviated variable names ¹Population, ²GDP_per_capita

global_economy %>%
  filter(Country == "Monaco") %>%
  mutate(GDP_per_capita = GDP/Population,
         `5-MA` = slider::slide_dbl(GDP_per_capita, mean, .before = 2, .after = 2, .complete = TRUE)) %>%
  autoplot(GDP_per_capita) +
  geom_line(aes(y = `5-MA`), colour = "#D55E00") +
  labs(y = "GDP Per Capita",
       title = "Monaco GDP Per Capita Time Series") +
  guides(colour = guide_legend(title = "series"))

## Warning: Removed 11 rows containing missing values (`geom_line()`).

## Warning: Removed 15 rows containing missing values (`geom_line()`).

The output above shows us that Monaco has the highest GDP per capita. The plot that was generated above also shows us that the GDP per capita (black line) of Monaco has steadily risen over the years, with some significant dips in the mid-1980s and early-2000s, but rising back up after several years. The general trend and these dips are also reflected in the 5-MA (red line).

Question 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

Question 3.2 Answer

United States GDP from `global_economy`.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP) +
  labs(title = "GDP of the United States")

Since GDP changes with population, we have to apply a population adjustment, where we divide the GDP by the population.

global_economy %>%
  filter(Country == "United States") %>%
  mutate(GDP_per_capita = GDP/Population) %>%
  autoplot(GDP_per_capita) +
  labs(title = "GDP Per Capita of the United States")

Slaughter of Victorian “Bulls, bullocks and steers” in `aus_livestock`

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
  autoplot(Count) +
  labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers")

It does not look like that the seasonality varies with respect to the level of the series. We see a giant drop that occurred in 1979, but we have too little data before this drop to determine if the seasonality of the data was any different after this drop. Let’s apply a Box-Cox anyways just to see what the graph looks like.

lambda <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Count with $\\lambda$ = ",
         round(lambda,2))))

The guerrero transformation gives us a lambda value of -0.04, which is pretty close to zero. Also, as we can see, the seasonal variation doesn’t look all that different when comparing the two graphs.

Victorian Electricity Demand from `vic_elec`

vic_elec %>%
  autoplot(Demand) %>%
  labs(title = "Victorian Electricity Demand")

## [[1]]

## 
## $title
## [1] "Victorian Electricity Demand"
## 
## attr(,"class")
## [1] "labels"

I don’t think that we have to apply any transformations here.

Gas production from `aus_production`

aus_production %>%
  autoplot(Gas) %>%
  labs(title = "Gas Production")

## [[1]]

## 
## $title
## [1] "Gas Production"
## 
## attr(,"class")
## [1] "labels"

Right off the bat, we should apply a Box-Cox transformation, to account for the increase in variation in the series that we see in the plot above.

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

The output above shows us that a \(\lambda\) value of 0.11 has transformed the data to a point where the size of the seasonal variation is about the same across the entire series.

Question 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Question 3.3 Answer

canadian_gas %>%
  autoplot(Volume)

A Box-Cox transformation would be unhelpful for the canadian_gas data because the size of the seasonal variation is not significant enough to warrant the use of a Box-Cox transformation across the entire data set as shown in the plot above. We see an increase in the seasonal variation between 1970 to 1990, but this variation is not as significant as the variation shown in the gas production data in aus_production.

Question 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

Question 3.4 Answer

set.seed(23987)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = Turnover`

The output above shows us that some sort of transformation must take place in order to reduce the seasonal variability that is present in the plot above. The variability increases as the level of the series increases.

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed turnover with $\\lambda$ = ",
         round(lambda,2))))

By using the guerrero feature, the graph above shows us that a \(\lambda\) of 0.14 gives us a graph where the seasonal variation is consistent across the entire series. Therefore, I would go with a Box-Cox transformation with a \(\lambda\) value of 0.14 for my dataset.

Question 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Question 3.5 Answer

Tobacco from `aus_production`.

aus_production %>%
  autoplot(Tobacco)

## Warning: Removed 24 rows containing missing values (`geom_line()`).

It does not look like a Box-Cox transformation would help this data out that much. There’s a bit of seasonal variation in the beginning, but it almost looks indistinguishable from the rest of the time series.

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Tobacco with $\\lambda$ = ",
         round(lambda,2))))

## Warning: Removed 24 rows containing missing values (`geom_line()`).

The lambda as determined by the guerrero feature is almost 1, and this is reflected in the plot too. The values may have changed when we compare the y-axis between the two plots, but the shape and variation looks more or less the same.

Economy Class Passengers between Melbourne and Sydney from `ansett`

ansett %>%
  filter(Class == "Economy" & Airports == "MEL-SYD") %>%
  autoplot(Passengers)

lambda <- ansett %>%
  filter(Class == "Economy" & Airports == "MEL-SYD") %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

ansett %>%
  filter(Class == "Economy" & Airports == "MEL-SYD") %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Passengers with $\\lambda$ = ",
         round(lambda,2))))

If we compare the two graphs, we can see that a Box-Cox transformation using a \(lambda\) value of 2 that was determined by the guerrero feature has a resulted in a reduction in the y-axis range/a more stable variance.

Pedestrian Counts at Southern Cross Station from `pedestrian`

pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  select(Date_Time, Count) %>%
  autoplot(Count)

The graph above shows us a huge drop in pedestrian traffic in 2016. So it’s possible that a Box-Cox transformation could stabilize the variance shown. The plot above also shows us that a significant amount of the counts never exceed 500.

lambda <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  select(Date_Time, Count)%>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  select(Date_Time, Count) %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Count with $\\lambda$ = ",
         round(lambda,2))))

A Box-Cox transformation using a \(\lambda\) value of -0.25 as determined by the guerrero featured produced the plot above. The huge drop is less noticeable in the transformed Box-Cox plot shown above. Notice that the middle of the graph above is darker than the rest while with the original data, the darker part was on the bottom of the graph, so the Box-Cox transformation has shifted those counts upwards.

Question 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition with type = multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Question 3.7a Answer

gas %>%
  autoplot(Gas)

The plot above exhibits consistent seasonality that does not vary and an upward moving trend-cycle.

Question 3.7b Answer

gas |>
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) |>
  components() |>
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Gas")

## Warning: Removed 2 rows containing missing values (`geom_line()`).

Question 3.7c Answer

The results in part b support the graphical interpretation from part a. As we can see from the part b answer, there is an upward trend, and consistent seasonality that does not vary.

Question 3.7d Answer

gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  ggplot(aes(x = Quarter)) + 
  geom_line(aes(y = Gas, colour = "Original Data")) +
  geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted Data")) +
  scale_colour_manual(
    values = c("grey", "red"),
    breaks = c("Original Data", "Seasonally Adjusted Data")
  )

Question 3.7e Answer

gas_copy <- gas
gas_copy$Gas[10] <- gas_copy$Gas[10] + 300

gas_copy %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  ggplot(aes(x = Quarter)) + 
  geom_line(aes(y = Gas, colour = "Original Data")) +
  geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted Data")) +
  scale_colour_manual(
    values = c("grey", "red"),
    breaks = c("Original Data", "Seasonally Adjusted Data")
  )

The outlier has drastically altered the shape of the “Seasonally Adjusted Data” plot. While the upward trend in the “Seasonally Adjusted Data” plot using the original data is easy to see, the upward trend is much more subdued in the data with the outlier. Also, the “Seasonally Adjusted Data” plot using the data with the outlier has this giant spike where the outlier is in the “Original Data”. We can also see that there is this saw-tooth pattern in the “Seasonally Adjusted Data” plot that repeats each year. It looks like the seasonality has leaked over to the “Seasonally Adjusted Data” plot as a result of the giant outlier.

Question 3.7f Answer

gas_copy <- gas
gas_copy$Gas[length(gas_copy$Gas) - 1] <- gas_copy$Gas[length(gas_copy$Gas) - 1] + 300

gas_copy %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  ggplot(aes(x = Quarter)) + 
  geom_line(aes(y = Gas, colour = "Original Data")) +
  geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted Data")) +
  scale_colour_manual(
    values = c("grey", "red"),
    breaks = c("Original Data", "Seasonally Adjusted Data")
  )

The plot above shows us that it actually does make a difference if the outlier is placed near the end rather than the middle. As we can see, we do not have as pronounced of a sawtooth pattern as we did when we had the outlier in the middle of the data in Question 7e. The “Seasonally Adjusted Data” plot for the first 18 values looks like it closely matches the “Seasonally Adjusted Data” plot when the original data was used in Question 7d.

Question 3.8

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Question 3.8 Answer

set.seed(23987)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

x11_dcmp_myseries <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
  components()

autoplot(x11_dcmp_myseries) +
  labs(title =
    "Decomposition of Turnover Using X-11.")

The trend is moving upwards but we expect to see that if we just look at a plot of the original data. The seasonality is more or less the same across the entire time series. The irregular plot shows us that there may have been an increase in Turnover in 1989, but other than that there is not really anything unusual I can see. I also do not see any outliers across all 4 plots.

Question 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
Is the recession of 1991/1992 visible in the estimated components?

Question 3.9a Answer

Notice how in Figure 3.19, the remainder plot shows us a giant variation that begins sometime in the year 1991. Notice how the grey bar to the left of the remainder part is larger than the grey bar for the value and trend plots and smaller than the season_year plot. The larger the bar, the smaller the variation, which indicates that the remainder has a larger variation than the seasonality. The y-axis range is only 200 for the season_year plot while it is 400 for the remainder plot. There must have been a sharp decrease in the civilian labor force that occurred sometime in 1991. You can see this in the value plot too; there is a small decrease in 1991.

Question 3.9b Answer

The recession of 1991/1992 is definitely visible in the estimated components, especially in the remainder plot. We can definitely see in the remainder plot, a significant period of variation that starts in 1991 and ends in the following year.

DATA 624 - Homework 2

Peter Phung

2023-02-11

Question 3.1

Question 3.1 Answer

Question 3.2

Question 3.2 Answer

United States GDP from `global_economy`.

Slaughter of Victorian “Bulls, bullocks and steers” in `aus_livestock`

Victorian Electricity Demand from `vic_elec`

Gas production from `aus_production`

Question 3.3

Question 3.3 Answer

Question 3.4

Question 3.4 Answer

Question 3.5

Question 3.5 Answer

Tobacco from `aus_production`.

Economy Class Passengers between Melbourne and Sydney from `ansett`

Pedestrian Counts at Southern Cross Station from `pedestrian`

Question 3.7

Question 3.7a Answer

Question 3.7b Answer

Question 3.7c Answer

Question 3.7d Answer

Question 3.7e Answer

Question 3.7f Answer

Question 3.8

Question 3.8 Answer

Question 3.9

Question 3.9a Answer

Question 3.9b Answer

DATA 624 - Homework 2

Peter Phung

2023-02-11

Question 3.1

Question 3.1 Answer

Question 3.2

Question 3.2 Answer

United States GDP from global_economy.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock

Victorian Electricity Demand from vic_elec

Gas production from aus_production

Question 3.3

Question 3.3 Answer

Question 3.4

Question 3.4 Answer

Question 3.5

Question 3.5 Answer

Tobacco from aus_production.

Economy Class Passengers between Melbourne and Sydney from ansett

Pedestrian Counts at Southern Cross Station from pedestrian

Question 3.7

Question 3.7a Answer

Question 3.7b Answer

Question 3.7c Answer

Question 3.7d Answer

Question 3.7e Answer

Question 3.7f Answer

Question 3.8

Question 3.8 Answer

Question 3.9

Question 3.9a Answer

Question 3.9b Answer

United States GDP from `global_economy`.

Slaughter of Victorian “Bulls, bullocks and steers” in `aus_livestock`

Victorian Electricity Demand from `vic_elec`

Gas production from `aus_production`

Tobacco from `aus_production`.

Economy Class Passengers between Melbourne and Sydney from `ansett`

Pedestrian Counts at Southern Cross Station from `pedestrian`