Time Series Decomposition Homework

Jack Wright

3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9

3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Countries with the highest mean GDP per capita across all years:

Country                 mean_gdp_per
Qatar                       62650.62
Switzerland                 48994.33
Luxembourg                  42046.82
United Arab Emirates        40469.09
Iceland                     34919.83
Macao SAR, China            34561.99

Note that GDP per capita rises sharply in several countries in the mid-1980s, while a few more show drastic increases in the 2000s only because that is when they were first tracked.
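A ranking like the one above can be sketched as follows. This is a minimal sketch assuming the fpp3 package is installed; the names `gdp_pc`, `gdp_per_capita` and `top_mean` are illustrative, and the way missing years are handled here (`na.rm = TRUE`) is an assumption, so the ranking it produces need not match the table above.

```r
library(fpp3)  # attaches tsibble, dplyr, ggplot2 and the global_economy data

# GDP per capita for every country-year
gdp_pc <- global_economy |>
  mutate(gdp_per_capita = GDP / Population)

# One line per country; the legend is dropped because there are too many series
gdp_pc |>
  autoplot(gdp_per_capita, show.legend = FALSE) +
  labs(y = "GDP per capita (US$)")

# Mean GDP per capita across all available years, highest first
top_mean <- gdp_pc |>
  as_tibble() |>
  group_by(Country) |>
  summarise(mean_gdp_per = mean(gdp_per_capita, na.rm = TRUE)) |>
  arrange(desc(mean_gdp_per)) |>
  head()

top_mean
```

Averaging over all years rewards countries with short, recent records, so it may be worth also checking which country leads in each individual year.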

3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

I chose to scale the GDP to trillions of dollars instead of dollars for graph legibility. I see no other need to transform the data.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

I performed a daily-average transformation, dividing each monthly count by the number of days in that month, using the monthdays() function from the forecast package.
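The daily-average adjustment might look like the sketch below. It assumes fpp3 is installed and swaps in `days_in_month()` from lubridate (attached by fpp3) in place of the forecast function named above; `bulls` and `daily_avg` are illustrative names.

```r
library(fpp3)

bulls <- aus_livestock |>
  filter(State == "Victoria", Animal == "Bulls, bullocks and steers") |>
  # Divide each monthly total by the number of days in that month
  mutate(daily_avg = Count / as.integer(days_in_month(as.Date(Month))))

bulls |>
  autoplot(daily_avg) +
  labs(y = "Average animals slaughtered per day")
```

This removes the variation that comes purely from months having different lengths.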

Victorian Electricity Demand from vic_elec

Performing a calendar transformation lets me sum demand by month, so the seasonality is easier to see.
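One way to aggregate the half-hourly demand to monthly totals is sketched below, assuming fpp3 is installed; `monthly_demand` is an illustrative name.

```r
library(fpp3)

monthly_demand <- vic_elec |>
  # Re-index the half-hourly observations by calendar month
  index_by(Month = yearmonth(Time)) |>
  summarise(Demand = sum(Demand))

monthly_demand |>
  autoplot(Demand) +
  labs(y = "Total monthly demand")
```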

Gas production from aus_production

Performing a Box-Cox transformation stabilises the variance, which makes the trend easier to see.
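A sketch of this step, assuming fpp3 is installed and that lambda is chosen with the Guerrero feature (the text does not say how lambda was picked, so that choice is an assumption):

```r
library(fpp3)

# Guerrero's method suggests a lambda that makes the seasonal variation roughly constant
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "Box-Cox transformed gas production")
```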

3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

The Box-Cox transformation isn’t very useful here because the variance does not change steadily with the level of the series: the seasonal variation swells in the middle of the data and is smaller at both ends, giving the series an envelope shape. A power transformation rescales the variation monotonically as the level increases or decreases, so no single value of lambda can capture this kind of pattern.
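The monotonic behaviour can be checked by writing the transformation out by hand. This is a base R sketch; `box_cox_manual` is a hypothetical helper, not a package function, and it assumes strictly positive data.

```r
# Hypothetical helper: Box-Cox transformation for strictly positive y
box_cox_manual <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 10, 100, 1000)

# For each lambda tried, the transformed values keep increasing with the level
is_monotone <- sapply(c(-0.5, 0, 0.5, 1), function(l) {
  all(diff(box_cox_manual(y, l)) > 0)
})
is_monotone

# With lambda = 0 (log), a step of 10 is squeezed far more at a high level
# than at a low level
step_low  <- box_cox_manual(20, 0) - box_cox_manual(10, 0)
step_high <- box_cox_manual(1010, 0) - box_cox_manual(1000, 0)
c(step_low, step_high)
```

Because the amount of squeezing depends only on the level, a series whose variation is largest at mid-range levels cannot be evened out by any one lambda.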

3.4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

When running a Box-Cox transformation on a single sample of the aus_retail data, the results look fairly conclusive.

After investigating the aus_retail dataset further, however, I noticed that the suggested transformation can vary wildly depending on the sample.

## mean: 0.04945226
## sd: 0.2337697

The range of suggested lambdas, depending on which random sample of aus_retail is selected, is very concerning. I would not perform any analysis on a model where a null transformation is within the confidence interval for lambda.
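The spread of suggested lambdas can be explored with a sketch like the one below. The sampling scheme behind the figures above is not shown, so this version makes a different, labelled assumption: it computes one Guerrero lambda for every State/Industry series in aus_retail rather than for repeated random samples, and its numbers should not be expected to match those reported. It assumes fpp3 is installed; the name `output` follows the `cat()` call that appears later in this write-up.

```r
library(fpp3)

# One suggested lambda per series in aus_retail
lambdas <- aus_retail |>
  features(Turnover, features = guerrero)

output <- lambdas$lambda_guerrero
cat('mean:', mean(output), 'sd:', sd(output))
```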

3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian

Suggested values of lambda for the selected data:

Data                Lambda
aus_production   0.9289402
ansett           1.9999268
pedestrian      -0.2255423

3.7

Consider the last five years of the Gas data from aus_production.

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

An X-11 decomposition shows a clear trend and clear seasonality in total Australian gas production over the last five years.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

The classical multiplicative decomposition captures the same trend and seasonality, although the remainder looks far less random.
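The decomposition in part b can be sketched as follows, assuming fpp3 is installed; `gas` and `dcmp` are illustrative names.

```r
library(fpp3)

# Last five years of quarterly data (5 years x 4 quarters)
gas <- tail(aus_production, 5 * 4) |>
  select(Gas)

dcmp <- gas |>
  model(classical_decomposition(Gas, type = "multiplicative")) |>
  components()

# Panels for the data, trend-cycle, seasonal indices and remainder
autoplot(dcmp)

# Seasonally adjusted series (part d) is returned as a column of the components
dcmp |>
  autoplot(season_adjust) +
  labs(y = "Seasonally adjusted gas production")
```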

c. Do the results support the graphical interpretation from part a?

Yes, they do.

d. Compute and plot the seasonally adjusted data.

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the series?

To measure the effect of outliers on the seasonally adjusted data, I want to be able to measure the shape of the trend line.

I will use a linear regression model to measure the trend, because I am interested in how the overall slope of the trend line changes when outliers are introduced.

The largest impact on the slope of the trend line occurs when the outlier is about a quarter of the way from the end of the series. This could be due to the down-weighting applied by the moving average.
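The experiment described above might be set up as in the sketch below. It is an assumption-laden illustration, not the code that produced the reported figure: it uses the classical multiplicative decomposition to obtain the trend, fits a straight line to it, and moves a single outlier of +300 through every position. It assumes fpp3 is installed; `trend_slope`, `baseline` and `slope_change` are hypothetical names.

```r
library(fpp3)

gas <- tail(aus_production, 5 * 4) |>
  select(Gas)

# Hypothetical helper: slope of a straight line fitted to the estimated trend-cycle
trend_slope <- function(data) {
  dcmp <- data |>
    model(classical_decomposition(Gas, type = "multiplicative")) |>
    components()
  trend_df <- data.frame(t = seq_len(nrow(dcmp)), trend = dcmp$trend)
  # lm() drops the NA trend values at both ends of the series
  unname(coef(lm(trend ~ t, data = trend_df))["t"])
}

baseline <- trend_slope(gas)

# Change in slope when observation i is shifted up by 300
slope_change <- sapply(seq_len(nrow(gas)), function(i) {
  shocked <- gas |>
    mutate(Gas = Gas + 300 * (row_number() == i))
  trend_slope(shocked) - baseline
})

# Position of the outlier that moves the slope the most
which.max(abs(slope_change))
```

Plotting `slope_change` against the outlier position gives a direct view of where in the series the trend is most sensitive.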

## [1] 0.007258312

3.8

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Boxplots of Turnover show the large number of outliers in the dataset. Building a model on this data (especially with such a small sample) will lead to poor predictions.

I don’t see the outliers in the trend data, yet they are clear in the boxplots. Perhaps this highlights the importance of EDA before building models.

cat('mean:',mean(output),'sd:',sd(output))
## mean: 0.04945226 sd: 0.2337697

3.9

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

There is a linear trend in the growth of the civilian labor force in Australia from February 1978 to August 1995. Superimposed on this trend is a seasonal cycle. There seem to be two oscillations per seasonal period, with peaks in March and December and troughs in January and August. The former might be due to summer vacations and the latter due to the firing of seasonal workers after Christmas.

b. Is the recession of 1991/1992 visible in the estimated components?

The recession is not visible in the estimated trend or seasonal components; however, it can be seen in the remainder, which captures all the variation not described by the long-run trend or the seasonal pattern.