Questions

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

head(global_economy)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

per_cap_gdp_tbl <- global_economy %>%
    as_tsibble(index = Year, key = Country) %>%
    mutate(per_cap_gdp = GDP / Population) %>%
    ggplot(aes(x = Year, y = per_cap_gdp, colour = Country)) +
    geom_line(stat = "identity", show.legend = F)
print(per_cap_gdp_tbl)

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

per_cap_gdp_tbl <- global_economy %>%
    as_tsibble(index = Year, key = Country) %>%
    mutate(per_cap_gdp = GDP / Population) %>% 
    arrange(desc(per_cap_gdp))

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

head(per_cap_gdp_tbl)

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

## # A tsibble: 6 x 10 [1Y]
## # Key:       Country [2]
##   Country Code   Year    GDP Growth   CPI Imports Exports Population per_cap_gdp
##   <fct>   <fct> <dbl>  <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>       <dbl>
## 1 Monaco  MCO    2014 7.06e9  7.18     NA      NA      NA      38132     185153.
## 2 Monaco  MCO    2008 6.48e9  0.732    NA      NA      NA      35853     180640.
## 3 Liecht… LIE    2014 6.66e9 NA        NA      NA      NA      37127     179308.
## 4 Liecht… LIE    2013 6.39e9 NA        NA      NA      NA      36834     173528.
## 5 Monaco  MCO    2013 6.55e9  9.57     NA      NA      NA      37971     172589.
## 6 Monaco  MCO    2016 6.47e9  3.21     NA      NA      NA      38499     168011.

The Country with the highest per capita GDP is Monaco for the top two spots, followed by Liechtenstein in the next two and then Monaco again.There is a general upward trend from Monaco and has remained above almost every other country since 1960.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

1. United States GDP from global_economy.

global_economy %>% filter(Country == "United States") %>% 
    autoplot(GDP)

global_economy %>% filter(Country == "United States") %>% 
    autoplot(Exports - Imports) +
    labs(title= "US Trade Deficit")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

I wanted to look at the yearly change in the US trade deficit against the change in overall GDP. There seems to be an overarching negative relationship between increasing US GDP and a falling trade deficit (more imports than exports). Which kind of falls in line with the shift in the US economy, from a good based ecomony to a services based economy.

2. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>% 
    filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>% 
    autoplot(Count)

yearly_data <- aus_livestock %>%
    filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
    index_by(Year = ~ year(.)) %>%
    summarise(Yearly_Count = sum(Count, na.rm = TRUE))

autoplot(yearly_data, Yearly_Count) +
    ggtitle("Yearly Count of Bulls, Bullocks, and Steers in Victoria") +
    xlab("Year") +
    ylab("Count")

To smooth over the monthly data, I changed the index to a yearly basis.

3. Victorian Electricity Demand from vic_elec.

vic_elec %>% 
    autoplot(Demand)

daily_data <- vic_elec %>%
    mutate(Day = as.Date(Time)) %>%
    index_by(Day) %>%
    summarise(Daily_Demand = sum(Demand, na.rm = TRUE))

autoplot(daily_data, Daily_Demand) +
    ggtitle("Daily Demand in Victoria") +
    xlab("Day") +
    ylab("Demand")

As another attempt to smooth the data, I decided to go with a weekly index over every 30 minutes. Highlighting that demand tends to peak around mid year, with a few outliers in 2014 adn 2013.

4. Gas production from aus_production.

aus_production %>% 
    autoplot(Gas)

growth_rate_gas <- aus_production %>%
    select(Quarter, Gas) %>%
    mutate(growth_rate = (Gas - lag(Gas)) / lag(Gas)*100)


autoplot(growth_rate_gas, growth_rate) +
    ggtitle("Gas Production Growth Rate") +
    xlab("Quarter") +
    ylab("Growth Rate")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

The growth rate calc shows that the historic rates vary between 40% and around -20% likely.

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>% autoplot(Volume)

bc_transformation <- canadian_gas %>% 
    features(Volume, features = guerrero) %>% 
    pull(lambda_guerrero)
canadian_gas %>% 
    autoplot(box_cox(Volume, bc_transformation))

The idea behind box cox transformations is to normalize non-normal data, so when we use it on normal data the pattern does not change, and is thus a useless excersise in this example.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

set.seed(11)

retail_series <- aus_retail %>% 
    filter(`Series ID` == sample(aus_retail$`Series ID`,1))

retail_series %>% 
    autoplot(Turnover)

#### Check Lambda

lambda_val <- retail_series %>% 
    features(Turnover, features = guerrero) %>% 
    pull(lambda_guerrero)
print(lambda_val)

## [1] 0.5054374

retail_series %>% 
    autoplot(box_cox(Turnover, lambda_val))

This non-normal data was transformed, starting around like 2008ish and continues on through the remainder of the dataset.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

aus_production %>% 
    autoplot(Tobacco)

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_lambda_val <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)
print(aus_lambda_val)

## [1] 0.9264636

aus_production %>%
  autoplot(box_cox(Tobacco, aus_lambda_val))

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

Despite having an optimized lambda value, this data did not change at all. Is there possibly a better way to normalize this dataset?

ansett

ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(Passengers)

ans_lambda_val <- ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>% 
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)
print(ans_lambda_val)

## [1] 1.999927

ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(box_cox(Passengers,ans_lambda_val))

This dataset has the same problem as the aus_production box cox plot. The data was not normalized and could use another method to do so.

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>%  dplyr::select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>% 
    autoplot()

## Plot variable not specified, automatically selected `.vars = Gas`

Gas production spikes consistently during the midyear and falls just as fast right after.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

classic_decomp <- gas %>% 
    model(classical_decomposition( Gas, type = "multiplicative")) %>% 
    components()

classic_decomp %>% autoplot() + 
    ggtitle("Classical Decomposition Gas Production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

Do the results support the graphical interpretation from part a?

Yes, there is a underlying upward trend in production from 06 to 10, and the seasonal trends are reflected properly as well.

Compute and plot the seasonally adjusted data.

classic_decomp %>% 
    ggplot(aes(Quarter))+
    geom_line(aes(y = season_adjust))

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

classic_decomp_2 <- gas %>% 
    mutate(Gas = if_else(Quarter ==yearquarter("2010Q1"), Gas + 150, Gas)) %>% 
    model(classical_decomposition(Gas, type = "multiplicative")) %>% 
    components() %>%
    as_tsibble()

classic_decomp_2 %>% 
    ggplot(aes(Quarter))+
    geom_line(aes(y = season_adjust))

The outlier stretches the y axis, making the quarterly seasonality beforehand much more subdued (at least it looks like that).

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

classic_decomp_3 <- gas %>% 
    mutate(Gas = if_else(Quarter ==yearquarter("2008Q1"), Gas + 150, Gas)) %>% 
    model(classical_decomposition(Gas, type = "multiplicative")) %>% 
    components() %>%
    as_tsibble()

classic_decomp_3 %>% 
    ggplot(aes(Quarter))+
    geom_line(aes(y = season_adjust))

An addition to the middle of the graph seems to keep seasonality truer to how it was before the addition.

3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

set.seed(11)
aus_retail_p2 <- aus_retail %>% 
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(aus_retail_p2)

## Plot variable not specified, automatically selected `.vars = Turnover`

decomp_retail_x11 <- aus_retail_p2 %>% 
    model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>% 
    components()
decomp_retail_x11 %>% 
    autoplot()+
    ggtitle("Retail Turnover Decomposition using the X11 Model")

Decomposing this retail dataset using X11 brought out much more data to analyze when compared with the simple autoplot used to get an overall look at the data. For one it shows how turnover is very seasonal for the retail sector but has seemingly become much less volatile as the years have progressed. Could online shopping be at play?

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The STL decomposition graph shows that there is a stable upward trend in the civilian labour force in Australia between the years of 1980 and 1995. It also shows significant seasonality year in and year out. The large decline can be seen in 1992 (aprox.) with employment dropping nearly 400%.

Is the recession of 1991/1992 visible in the estimated components?

As mentioned above the 91/92 recession is very prevalent in the remainder piece of the STL decomp graph and only somewhat present in the overall graph at the top. it can also be seen in the season_year graph (fig 3.19).

Week 3 Homework 2

Jonathan Burns

2024-09-15