DATA624 Homework 2

3.1) Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

## Warning: Removed 3242 rows containing missing values (`geom_line()`).

## # A tsibble: 262 x 10 [1Y]
## # Key:       Country [262]
##    Country    Code   Year     GDP Growth   CPI Imports Exports Population  gperc
##    <fct>      <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>  <dbl>
##  1 Luxembourg LUX    2017 6.24e10   2.30 111.    194.    230.      599449 1.04e5
##  2 Macao SAR… MAC    2017 5.04e10   9.10 136.     32.0    79.4     622567 8.09e4
##  3 Switzerla… CHE    2017 6.79e11   1.09  98.3    53.9    65.0    8466017 8.02e4
##  4 Norway     NOR    2017 3.99e11   1.92 115.     33.1    35.5    5282223 7.55e4
##  5 Iceland    ISL    2017 2.39e10   3.64 122.     42.8    47.0     341284 7.01e4
##  6 Ireland    IRL    2017 3.34e11   7.80 105.     87.9   120.     4813608 6.93e4
##  7 Qatar      QAT    2017 1.67e11   1.58 116.     37.3    51.0    2639211 6.32e4
##  8 United St… USA    2017 1.94e13   2.27 112.     NA      NA    325719178 5.95e4
##  9 North Ame… NAC    2017 2.10e13   2.35  NA      NA      NA    362492702 5.81e4
## 10 Singapore  SGP    2017 3.24e11   3.62 113.    149.    173.     5612253 5.77e4
## # ℹ 252 more rows

Luxembourg had the highest GDP per capita in 2017, at about 1.04e5 (the gperc column in the table above). As the plot above shows, Luxembourg's GDP per capita grew roughly exponentially at a fairly consistent rate until about 2008. From 2008 to 2017 it fluctuated rather than continuing to climb steadily.
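As a quick check of the gperc column, GDP per capita is simply GDP divided by Population. The sketch below is written in Python rather than the R used to produce the output above, and it re-computes the figure from the rounded values printed in the table, so the results only agree to about three significant figures. The function name gdp_per_capita is a hypothetical helper for illustration.

```python
# Rounded 2017 values copied from the table above: (GDP, Population).
rows = {
    "Luxembourg": (6.24e10, 599449),
    "Switzerland": (6.79e11, 8466017),
    "Norway": (3.99e11, 5282223),
}


def gdp_per_capita(gdp, population):
    """GDP divided by population (hypothetical helper for this sketch)."""
    return gdp / population


per_capita = {
    country: gdp_per_capita(gdp, pop) for country, (gdp, pop) in rows.items()
}

# Country with the largest GDP per capita among the rows listed here.
top_country = max(per_capita, key=per_capita.get)
print(top_country, round(per_capita[top_country]))
```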

3.2) For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

The half-hourly demand series is difficult to interpret, so it is better to aggregate it from 30-minute intervals to daily values over the 3 years by averaging the demand within each day. The plot of average daily demand below is easier to read and stays on a scale similar to the original demand.
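The aggregation step can be sketched as follows. This is a Python illustration of the idea, not the R code behind the plot, and the readings are made-up numbers rather than the real demand data. The helper name daily_mean is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def daily_mean(readings):
    """Average (timestamp, value) readings within each calendar day."""
    by_day = defaultdict(list)
    for stamp, value in readings:
        by_day[stamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Made-up half-hourly demand for two days (48 readings per day).
start = datetime(2012, 1, 1)
readings = [
    (start + timedelta(minutes=30 * i), 4000 + (i % 48) * 10)
    for i in range(96)
]

daily = daily_mean(readings)
for day, value in daily.items():
    print(day, value)
```

Averaging, rather than summing, is what keeps the daily series on a scale comparable to the half-hourly one.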

The trend between 1956 and 1970 is difficult to interpret on the original scale. A Box-Cox transformation improves this. The guerrero feature selected λ = 0.11, which makes the variation more consistent across the series and the plot easier to interpret.
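For reference, the Box-Cox transformation of a positive value y is log(y) when λ = 0 and (y^λ - 1)/λ otherwise. A minimal Python sketch, assuming positive data and using made-up values (the function name box_cox is our own, not a library call), shows how λ = 0.11 compresses the large values much more than the small ones:

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("this sketch only handles positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


# Made-up values spanning two orders of magnitude.
values = [5, 50, 500]
transformed = [box_cox(v, 0.11) for v in values]

# Gaps between neighbours before and after the transformation.
raw_gaps = [values[1] - values[0], values[2] - values[1]]
new_gaps = [transformed[1] - transformed[0], transformed[2] - transformed[1]]
print(raw_gaps[1] / raw_gaps[0])   # gap ratio on the original scale
print(new_gaps[1] / new_gaps[0])   # much closer to 1 after transforming
```

A λ close to 0 behaves almost like a log transformation, which is why the upper gap shrinks so much relative to the lower one.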

3.3) Why is a Box-Cox transformation unhelpful for the canadian_gas data?

A Box-Cox transformation is unhelpful here because the seasonal variation does not change in step with the level of the series, so there is no level-dependent variance for the transformation to stabilise.

3.4) What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

For the series drawn with this specific seed, the guerrero feature selected λ = 0.08. After the transformation, the data in the later seasons are less spread out, so the variance is more stable.

3.5) For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

## Warning: Removed 24 rows containing missing values (`geom_line()`).


3.7) Consider the last five years of the Gas data from aus_production.

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

The trend-cycle increases over the 5 years. The seasonal fluctuations show highs in quarter 3 and lows in quarter 1 consistently each year.

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

## Warning: Removed 2 rows containing missing values (`geom_line()`).

Do the results support the graphical interpretation from part a?

Yes. The estimated trend-cycle increases throughout the period, and the seasonal indices are lowest in Q1 and highest in Q3.
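To make the steps behind a multiplicative classical decomposition concrete, here is a minimal Python sketch for quarterly data. It is an illustration under stated assumptions, not the R code used above: the series is made up (a rising level times a fixed quarterly pattern), and the function name classical_multiplicative is hypothetical. The trend-cycle is a 2x4 moving average, the seasonal indices are the average detrended ratios per quarter, rescaled to sum to 4, and the seasonally adjusted series is the data divided by its seasonal index.

```python
def classical_multiplicative(y, period=4):
    """Classical multiplicative decomposition for an even seasonal period."""
    n = len(y)
    half = period // 2

    # Trend-cycle: 2 x period moving average (half weight on the two ends).
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period

    # Detrended ratios, grouped by position within the seasonal cycle.
    ratios = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            ratios[t % period].append(y[t] / trend[t])

    # Average per season, then rescale so the indices sum to the period.
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)
    seasonal = [s * scale for s in raw]

    # Seasonally adjusted series: data divided by its seasonal index.
    adjusted = [y[t] / seasonal[t % period] for t in range(n)]
    return trend, seasonal, adjusted


# Made-up quarterly series: rising level with a low Q1 and a high Q3.
pattern = [0.8, 1.0, 1.2, 1.0]
gas_like = [(100 + 2 * t) * pattern[t % 4] for t in range(20)]

trend, seasonal, adjusted = classical_multiplicative(gas_like)
print([round(s, 3) for s in seasonal])
```

Note that the moving average cannot be computed for the first two and last two quarters, so the trend-cycle has missing values at both ends.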

Compute and plot the seasonally adjusted data.

## Warning: Removed 4 rows containing missing values (`geom_line()`).

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?

## Warning: Removed 2 rows containing missing values (`geom_line()`).


Wherever the outlier is placed, the trend peaks around it and then returns to its gradual increase. The seasonal lows stay in Q1 regardless of the outlier's position. The seasonal highs, however, are in Q3 when the outlier is near the start or the end of the series, but shift to Q4 when the outlier is in the middle.
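One reason the position of the outlier matters is the 2x4 moving average used for the trend-cycle: an observation in the middle of the series falls inside five moving-average windows, while the final observation falls inside only one. The Python sketch below illustrates this on a made-up linear series (not the Gas data). The helper names centered_ma_2x4 and trend_shift are hypothetical.

```python
def centered_ma_2x4(y):
    """2x4 moving average; undefined (None) for the first and last two points."""
    n = len(y)
    trend = [None] * n
    for t in range(2, n - 2):
        trend[t] = (
            0.5 * y[t - 2] + y[t - 1] + y[t] + y[t + 1] + 0.5 * y[t + 2]
        ) / 4
    return trend


def trend_shift(y, position, bump=300):
    """Change in the trend estimate after adding `bump` to one observation."""
    bumped = list(y)
    bumped[position] += bump
    base = centered_ma_2x4(y)
    new = centered_ma_2x4(bumped)
    return [None if b is None else v - b for b, v in zip(base, new)]


# Made-up quarterly series with a steady increase (20 quarters).
series = [200 + 3 * t for t in range(20)]

middle = trend_shift(series, 10)   # outlier in the middle
end = trend_shift(series, 19)      # outlier at the final observation

# Indices of the trend estimates that moved in each case.
moved_middle = [t for t, d in enumerate(middle) if d is not None and abs(d) > 1e-9]
moved_end = [t for t, d in enumerate(end) if d is not None and abs(d) > 1e-9]
print(moved_middle, moved_end)
```

In this sketch the middle outlier lifts the trend by up to 300/4 = 75 at its own position, whereas the outlier at the end only lifts a single trend value by 300/8 = 37.5, so less of its effect is absorbed by the trend estimate.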

3.8) Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

The seasonal component appears to mirror itself over time: what begins as the downward spike becomes the peak as the months go on. The trend continues to increase over time, with slight “hiccups” along the way.

3.9) Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The seasonal component takes values on a scale similar to that of the remainder. There is, however, a slight dip around 1991, which coincides with the recession at the time. The trend increases with a slope similar to that of the value plot.

Is the recession of 1991/1992 visible in the estimated components?

Yes. The recession of 1991/1992 is visible in the remainder component, which dips to nearly -400.