3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
Countries with the highest mean GDP per capita across all years
| Country | Mean GDP per capita (US$) |
|---|---|
| Qatar | 62650.62 |
| Switzerland | 48994.33 |
| Luxembourg | 42046.82 |
| United Arab Emirates | 40469.09 |
| Iceland | 34919.83 |
| Macao SAR, China | 34561.99 |
Note that GDP per capita rises sharply for several countries in the mid-1980s, while a few others show drastic increases only because they are first tracked in the 2000s.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
I chose to scale the GDP to trillions of dollars instead of dollars for graph legibility. I see no other need to transform the data.
aus_livestock. I have performed a daily average transformation using the monthdays() function from the forecast package, which removes the variation caused by months having different numbers of days.
vic_elec. Performing a calendar transformation allows me to sum demand by month, so we can see the seasonality more clearly.
aus_production. Performing a Box-Cox transformation compresses the variance, allowing us to see the trend more clearly.
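The daily average adjustment described above can be sketched in base R without the forecast package. This is only an illustration under assumptions: `AirPassengers` stands in for the aus_livestock series, and the names `month_starts`, `days_in_month` and `daily_avg` are made up for this example.

```r
# Stand-in monthly series from base R (an assumption; the text uses aus_livestock).
y <- AirPassengers

# First day of every month covered by the series, plus one extra month
# so that differencing also gives the length of the final month.
first_day <- as.Date(sprintf("%d-%02d-01", start(y)[1], start(y)[2]))
month_starts <- seq(first_day, by = "month", length.out = length(y) + 1)

# Number of days in each month, then the average value per day.
days_in_month <- as.numeric(diff(month_starts))
daily_avg <- y / days_in_month

head(cbind(total = y, days = days_in_month, per_day = daily_avg))
```

Dividing by the month length keeps the result a monthly series, so it can be plotted in the same way as the original totals.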
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
The Box-Cox transformation isn’t all that useful in this case because the variance does not change steadily with the level of the series: the seasonal variation forms an envelope that is widest around the middle of the data and narrower at both ends. A power transformation scales the variance as the level of the data increases or decreases, but it isn’t able to capture this type of complexity.
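A small synthetic sketch may make this argument concrete. Everything here is an assumption made for illustration: the two series are simulated rather than taken from canadian_gas, and `box_cox_sketch`, `yearly_range` and `spread_ratio` are hypothetical helper names. The first series has a seasonal swing proportional to its level; the second has a swing that is largest in the middle.

```r
# Box-Cox transformation: log for lambda = 0, scaled power otherwise.
box_cox_sketch <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

t <- 1:120
level <- 100 + 24 * ((t - 1) %/% 12)   # level steps up once every 12 periods
season <- sin(2 * pi * t / 12)         # 12-period seasonal shape

# Case 1: seasonal swing proportional to the level.
y_prop <- level * (1 + 0.2 * season)
# Case 2: seasonal swing largest in the middle of the series.
hump <- exp(-((t - 60) / 25)^2)
y_hump <- level * (1 + 0.2 * hump * season)

# Size of the seasonal swing (max - min) in each block of 12 observations.
yearly_range <- function(y) {
  tapply(y, (seq_along(y) - 1) %/% 12, function(v) diff(range(v)))
}
# Largest swing divided by smallest swing; 1 means perfectly stable.
spread_ratio <- function(y) {
  r <- yearly_range(y)
  max(r) / min(r)
}

spread_ratio(y_prop)                      # unstable on the original scale
spread_ratio(box_cox_sketch(y_prop, 0))   # stabilised by the log
sapply(c(-1, 0, 0.5, 1, 2),
       function(l) spread_ratio(box_cox_sketch(y_hump, l)))
```

In the proportional case a single lambda evens out the swings, while for the hump-shaped case none of the lambdas tried here brings the ratio close to 1.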
What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
When running a Box-Cox on a single sample of the aus_retail data, the results look fairly conclusive.
After investigating the aus_retail dataset, however, I noticed that the suggested transformation can vary wildly depending on the sample.
cat('mean:',mean(output),'sd:',sd(output))
## mean: 0.04945226 sd: 0.2337697
The range of suggested lambdas, which depends on which random sample of aus_retail is selected, is very concerning. I would not rely on any analysis of a model where a null transformation is within the confidence interval for lambda.
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian
Suggested values of lambda for the selected data
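| Series (dataset) | Lambda |
|---|---|
| Tobacco (aus_production) | 0.9289402 |
| Economy class passengers, Melbourne to Sydney (ansett) | 1.9999268 |
| Pedestrian counts at Southern Cross Station (pedestrian) | -0.2255423 |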
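For readers curious how a lambda can be suggested automatically, the following is a rough base R sketch of Guerrero's idea: split the series into blocks of one seasonal period and pick the lambda that makes sd / mean^(1 - lambda) as constant as possible across blocks. It is not the implementation used to produce the table. The function names, the search interval of -1 to 2, and the use of `AirPassengers` as a stand-in series are all assumptions.

```r
# Coefficient of variation of sd / mean^(1 - lambda) across blocks.
guerrero_cv <- function(lambda, x, period) {
  n_blocks <- floor(length(x) / period)
  # Keep only the most recent complete blocks.
  x_used <- tail(as.numeric(x), n_blocks * period)
  blocks <- matrix(x_used, nrow = period, ncol = n_blocks)  # one block per column
  block_mean <- colMeans(blocks)
  block_sd <- apply(blocks, 2, sd)
  ratio <- block_sd / block_mean^(1 - lambda)
  sd(ratio) / mean(ratio)
}

# Search the (assumed) interval for the lambda with the smallest variation.
guerrero_sketch <- function(x, period, lower = -1, upper = 2) {
  optimize(guerrero_cv, interval = c(lower, upper),
           x = x, period = period)$minimum
}

lambda_air <- guerrero_sketch(AirPassengers, period = 12)
lambda_air
```

Because the search is bounded, a result sitting right at an edge of the interval is worth a second look: it may indicate that the optimiser stopped at the bound rather than at a genuine minimum.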
Consider the last five years of the Gas data from aus_production.
a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
An X-11 decomposition shows that there is a clear trend and seasonality in the total production of Australian gas over the last five years.
b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
The classical multiplicative decomposition captures the same trend and seasonality, although the remainder looks far less random.
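As a hedged illustration of what a classical multiplicative decomposition produces, the sketch below uses base R's `decompose()` rather than `classical_decomposition()`. The built-in quarterly `UKgas` series, cut to its last five years, stands in for the Australian gas data, so the numbers will differ from those in this exercise.

```r
# Stand-in data (an assumption): last five years of quarterly UK gas consumption.
gas_uk <- window(UKgas, start = c(1982, 1))

# Classical multiplicative decomposition: data = trend * seasonal * random.
fit <- decompose(gas_uk, type = "multiplicative")

fit$figure   # one seasonal index per quarter, scaled to average 1
fit$trend    # centred moving average; undefined at both ends

# Seasonally adjusted series: divide out the seasonal indices.
seasonally_adjusted <- gas_uk / fit$seasonal

plot(fit)
```

The trend is missing for the first and last two quarters because the centred moving average needs observations on both sides.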
c. Do the results support the graphical interpretation from part a?
Yes, they do: the decomposition shows the same trend and seasonal pattern seen in the plot.
d. Compute and plot the seasonally adjusted data.
e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the series?
To measure the effect of outliers on the seasonally adjusted data, I want to be able to measure the shape of the trend line.
I will fit a linear regression model to the trend data, because I am interested in how the general slope of the trend line changes when outliers are introduced.
The largest impact on the slope of the trend line occurs when the outlier is about a quarter of the way from the end of the series. This could be due to the down-weighting applied by the moving average.
## [1] 0.007258312
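The experiment described above could be sketched along the following lines. This is not the code that produced the number shown. It assumes a classical multiplicative decomposition from base R's `decompose()`, uses the built-in `UKgas` series as a stand-in for the Australian gas data, and introduces hypothetical helpers `seas_adj` and `sa_slope`.

```r
# Stand-in data (an assumption): last five years of quarterly UK gas consumption.
gas_uk <- window(UKgas, start = c(1982, 1))

# Seasonally adjusted series from a classical multiplicative decomposition.
seas_adj <- function(x) x / decompose(x, type = "multiplicative")$seasonal

# Slope of a straight line fitted to the seasonally adjusted series.
sa_slope <- function(x) {
  sa <- as.numeric(seas_adj(x))
  idx <- seq_along(sa)
  coef(lm(sa ~ idx))[["idx"]]
}

base_slope <- sa_slope(gas_uk)

# Add 300 to each observation in turn and record the change in slope.
slope_shift <- sapply(seq_along(gas_uk), function(i) {
  vals <- as.numeric(gas_uk)
  vals[i] <- vals[i] + 300
  bumped <- ts(vals, start = start(gas_uk), frequency = frequency(gas_uk))
  sa_slope(bumped) - base_slope
})

# Position of the outlier that moves the slope the most.
which.max(abs(slope_shift))
plot(slope_shift, type = "h",
     xlab = "Position of outlier", ylab = "Change in slope")
```

An outlier early in the series pulls the fitted slope down and one late in the series pushes it up, so the plot gives a quick view of how sensitive the summary is to where the outlier falls.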
Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
Boxplots of Turnover show the large number of outliers in the dataset. Building a model on this data (especially with such a small sample) is likely to lead to poor predictions.
I don’t see the outliers in the trend data, yet they are clear in the boxplots. Perhaps this highlights the importance of EDA before building models.
a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
There is a linear trend in the growth of the civilian labor force in Australia from February 1978 to August 1995. The series also has a seasonal cycle on top of this trend. There seem to be two oscillations per year, with peaks in March and December, and troughs in January and August. The troughs might be due to summer vacations and to the laying off of seasonal workers after Christmas.
b. Is the recession of 1991/1992 visible in the estimated components?
The recession is not visible in the trend or seasonal components; however, it can be seen in the remainder, which captures all the variation not described by the trend or the seasonal pattern.
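When reading a decomposition plot, comparing the scale of each panel helps judge which component matters most. The sketch below shows one way to do that numerically with base R's `stl()`. It is only an illustration: the built-in `nottem` temperature series stands in for the labour force data, and `s.window = "periodic"` is an assumed setting.

```r
# Stand-in monthly series (an assumption): Nottingham air temperatures.
fit <- stl(nottem, s.window = "periodic")

# Columns are the seasonal, trend and remainder components.
components <- fit$time.series

# Vertical range of each component, comparable to the panel scales in a plot.
component_range <- apply(components, 2, function(v) diff(range(v)))
component_range

# The largest remainder values point at periods the other components miss.
remainder <- components[, "remainder"]
time(nottem)[order(abs(remainder), decreasing = TRUE)[1:3]]
```

A component with a small range relative to the data contributes little, even if its panel looks just as tall as the others in the plot.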