DATA 624 Homework 2

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

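# Packages used throughout: fpp3 (data, autoplot) and latex2exp (TeX() in plot titles)
library(fpp3)
library(latex2exp)
global_economy %>%
  autoplot(GDP / Population, show.legend =  FALSE) +
  labs(title= "GDP per capita", y = "$US")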

global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
  select(Country, GDP_per_capita)

## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country GDP_per_capita  Year
##   <fct>            <dbl> <dbl>
## 1 Monaco         185153.  2014

global_economy %>%
  filter(Country == "Monaco") %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per capita for Monaco", y = "$US")

Monaco has the highest GDP per capita on record (about $185,153 in 2014). Over time, its GDP per capita has increased and has stayed above that of most other countries. GDP per capita shows an increasing trend for the majority of countries.

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

No mathematical transformation was applied here, as the increasing population did not seem to have an effect on the shape of the GDP series. The only change was rescaling GDP to trillions of US dollars for readability.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP / 10 ^ 12) +
  labs(title= "GDP, United States", y = "$US (in trillions)")

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

No transformation was applied here, but an overall decreasing trend can be observed.

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria") %>%
  autoplot(Count) +
  labs(title= "Slaughter of Victoria Bulls, Bullocks, and Steers")

Victorian Electricity Demand from vic_elec.

The data were aggregated to reflect daily electricity demand instead of half-hourly demand. This makes the seasonality and the day-to-day variation easier to see. Electricity demand rises noticeably in summer and rises again mid-year around winter, when people are home more. Aggregating further to monthly demand shows the seasonal change even more clearly.

v <- vic_elec %>%
  group_by(Date) %>%
  mutate(Demand = sum(Demand)) %>%
  distinct(Date, Demand)

v %>% 
  as_tsibble(index = Date) %>%
  autoplot(Demand) +
  labs(title= "Daily Victorian Electricity Demand", y = "Demand (MWh)")

v %>%
  mutate(Date = yearmonth(Date)) %>%
  group_by(Date) %>%
  summarise(Demand = sum(Demand)) %>%
  as_tsibble(index = Date) %>%
  autoplot(Demand) +
  labs(title= "Monthly Victorian Electricity Demand", y = "Demand (MWh)")
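
The same daily and monthly totals can also be sketched with index_by() and summarise(), which keeps the result a tsibble throughout. This is an alternative sketch, not the approach used above; the object names daily_demand and monthly_demand are introduced here for illustration only.

```r
library(fpp3)

# Daily totals: group the half-hourly index by calendar date, then sum.
daily_demand <- vic_elec %>%
  index_by(Date = date(Time)) %>%
  summarise(Demand = sum(Demand))

# Monthly totals: regroup the daily series by year-month (illustrative names).
monthly_demand <- daily_demand %>%
  index_by(Month = yearmonth(Date)) %>%
  summarise(Demand = sum(Demand))

autoplot(monthly_demand, Demand) +
  labs(title = "Monthly Victorian Electricity Demand", y = "Demand (MWh)")
```

Because each step returns a tsibble, no call to as_tsibble() is needed before plotting.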

Gas production from aus_production.

Since the seasonal variation changes with the level of the series, a Box-Cox transformation may be useful. It makes the seasonal variation roughly the same across the whole series.
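
For reference, the one-parameter Box-Cox family for a positive series \(y_t\) is usually written as follows. This is the standard textbook form, given as background; the symbols \(w_t\) (transformed value) and \(y_t\) (original value) are notation assumed here.

\[
w_t =
\begin{cases}
\log(y_t) & \text{if } \lambda = 0, \\
\dfrac{y_t^{\lambda} - 1}{\lambda} & \text{otherwise.}
\end{cases}
\]

Values of \(\lambda\) below 1 compress large observations more than small ones, which is why the seasonal swings look more even after the transformation.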

aus_production %>%
  autoplot(Gas) +
  labs(title = "Non-Transformed Gas Production")

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Gas Production with $\\lambda$ = ",
         round(lambda,2))))

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Non-Transformed Gas Production")

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Gas Production with $\\lambda$ = ",
         round(lambda,2))))

The Box-Cox transformation is unhelpful because it does not make the seasonal variation uniform. The variation increases around 1978 and then decreases around 1989, so it does not change steadily with the level of the series, whereas the Australian gas production only showed an increase in variation.

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

autoplot(myseries, Turnover)+
  labs(title = "Retail Turnover", y = "$AUD (in millions)")

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Retail Turnover with $\\lambda$ = ",
         round(lambda,2))))

A Box-Cox transformation with \(\lambda\) = 0.27 would be selected as it helped to make the seasonal variation more uniform.

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

autoplot(aus_production, Tobacco)+
  labs(title = "Tobacco and Cigarette Production in Tonnes")

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Tobacco Production with $\\lambda$ = ",
         round(lambda,2))))

Since \(\lambda\) is close to 1 here, the transformed data is mostly just shifted downwards with little change in the shape of the time series. The Box-Cox transformation is not effective on the tobacco production data.
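
The claim that \(\lambda\) near 1 only shifts the series can be checked with a small base R sketch. The helper box_cox_sketch() is hypothetical (it is not the fpp3 box_cox() function) and the values in y are made up for illustration.

```r
# Hypothetical helper: one-parameter Box-Cox for positive data.
box_cox_sketch <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(10, 20, 40)                 # made-up values, not the Tobacco data
shifted <- box_cox_sketch(y, 1)    # lambda = 1 reduces to y - 1

# Differences between consecutive values are unchanged, so the shape is the same.
diff(shifted)
diff(y)
```

With \(\lambda = 1\) the transformed values are simply the originals minus 1, so the plot keeps its shape and only moves down.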

mel_syd <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")

autoplot(mel_syd, Passengers)+
  labs(title = "Economy class Passengers Between Melbourne and Sydney")

lambda <- mel_syd %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

mel_syd %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Number of Passengers with $\\lambda$ = ",
         round(lambda,2))))

With a \(\lambda\) of 2, the transformation is essentially \(Y^2\), or \(Passengers^2\). It shows the variation a little more clearly.
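
Substituting \(\lambda = 2\) into the usual Box-Cox form gives the worked expression below; the symbols \(w_t\) and \(y_t\) (transformed and original passenger counts) are notation assumed here.

\[
w_t = \frac{y_t^{2} - 1}{2}
\]

Squaring stretches large values more than small ones, so the peaks stand out more. It may also be worth checking whether 2 is simply the upper limit of the search range used by guerrero (assumed here to default to \([-1, 2]\)), in which case the estimate is sitting on the boundary.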

southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") 

autoplot(southern_cross, Count)+
  labs(title = "Hourly Pedestrian Counts at Southern Cross Station")

lambda <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Hourly Pedestrian Counts with $\\lambda$ = ",
         round(lambda,2))))
#--------------------------------------------
southern_cross <- southern_cross %>%
  index_by(Date) %>%
  summarise(Count = sum(Count))

autoplot(southern_cross, Count)+
  labs(title = "Daily Pedestrian Counts at Southern Cross Station")

lambda <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Daily Pedestrian Counts with $\\lambda$ = ",
         round(lambda,2))))
#-------------------------------------------------
southern_cross <- southern_cross %>%
  mutate(Week = yearweek(Date)) %>%
  index_by(Week) %>%
  summarise(Count = sum(Count))

autoplot(southern_cross, Count)+
  labs(title = "Weekly Pedestrian Counts at Southern Cross Station")

lambda <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "", title = TeX(paste0("Transformed Weekly Pedestrian Counts with $\\lambda$ = ",
         round(lambda,2))))

The hourly and daily pedestrian counts at Southern Cross Station were not very readable, but the weekly counts show the variation clearly. Since \(\lambda\) was closer to 1 for the weekly data, the transformed data is mostly just shifted downwards with little change in the shape of the time series.

3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(gas, Gas)

There is an increasing trend and a seasonal pattern with a period of 1 year. Production increases after the first quarter, peaks in the third quarter, and then decreases again.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_dcmp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) 

components(gas_dcmp) %>%
  autoplot() +
  labs(title = "Classical Multiplicative Decomposition of Gas Production")

## Warning: Removed 2 row(s) containing missing values (geom_path).

Do the results support the graphical interpretation from part a?

Yes, the results support the graphical interpretation from part a, as there is an increasing trend and a seasonal pattern that rises after the first quarter and falls after the third quarter.
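
The classical multiplicative decomposition can be sketched in base R as well. This assumes stats::decompose() as a stand-in for classical_decomposition(), and the quarterly values are made up (a linear trend times a fixed seasonal pattern), so it illustrates the mechanics rather than reproducing the Gas results.

```r
# Made-up quarterly series (NOT the Gas data): linear trend times a seasonal pattern.
trend_true  <- 200 + 2 * (1:20)
season_true <- rep(c(0.9, 1.1, 1.2, 0.8), times = 5)
y <- ts(trend_true * season_true, start = c(2005, 1), frequency = 4)

# Base R stand-in for classical_decomposition(type = "multiplicative").
dcmp <- decompose(y, type = "multiplicative")

dcmp$figure                          # one seasonal index per quarter, averaging to 1
season_adjust <- y / dcmp$seasonal   # seasonally adjusted series: data / seasonal index
```

The trend estimate is missing for the first and last two quarters because the centred moving average needs observations on both sides.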

Compute and plot the seasonally adjusted data.

components(gas_dcmp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Gas Production")

The seasonally adjusted data shows that there is an increasing trend in gas production.

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas %>%
  mutate(Gas = ifelse(Gas == 249, Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Gas Production with an Outlier")

Quarter 3 of 2008 became an outlier when 300 was added to it. There is a significant spike there in both the data and the seasonally adjusted data, although the spike is smaller in the seasonally adjusted data. The trend also seems to be disrupted.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas %>%
  mutate(Gas = ifelse(Gas == 236, Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Gas Production with an Outlier at the End")

It does not seem to make a difference whether the outlier is near the end or in the middle, as there is still a spike where the outlier is and the trend is not noticeable.

3.8

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

# myseries was created in 3.4

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()

autoplot(x11_dcmp) +
  labs(title = "Decomposition of Retail Turnover using X-11.")

Unlike the other decomposition methods, X-11 produces lines that are less smooth and more jagged. It captures more of the noise in the early 1990s, which was a recession, and it captures the seasonality better. A few data points also look more irregular than they did before.

3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

There is an increasing trend in the number of persons in the civilian labor force in Australia. There is also a seasonal component, but its scale is much smaller than that of the remainder, which suggests that seasonality is relatively unimportant in the labor force data. The remainder also shows a decrease in the early 1990s, which corresponds to a recession.

Is the recession of 1991/1992 visible in the estimated components?

The recession of 1991/1992 is very visible in the estimated components as there is a sharp decrease in the remainder component.

Orli Khaimova

2/20/2022
