Libraries
library(fpp3)
library(tsibble)Assignment Instructions
Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with your .rmd file.
Exercises
3.1
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
global_economyglobal_economy %>%
autoplot(GDP / Population, show.legend = FALSE) +
labs(title= "GDP per capita", y = "$US")## Warning: Removed 3242 row(s) containing missing values (geom_path).
gdp_per_capita = global_economy %>%
mutate(per_capita = GDP / Population)
gdp_per_capita <- gdp_per_capita[order(-gdp_per_capita$per_capita),]
head(gdp_per_capita, 5)global_economy %>%
filter(Country == "Monaco") %>%
autoplot(GDP/Population) +
labs(title= "Monaco GDP per Capita", y = "$US")## Warning: Removed 11 row(s) containing missing values (geom_path).
3.2
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy.
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP / Population) +
labs(title= "GDP, United States", y = "$US") Because of the changing population over time, it is appropriate to scale the data to a per capita basis to show actual growth over time.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Because this is showing monthly slaughter of livestock, there may be need for a calendar adjustment. We could achieve this be computing the average sales per year.
head(aus_livestock)aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers",
State == "Victoria") %>%
autoplot(Count) +
labs(title= "Slaughters in Australia")Victorian Electricity Demand from vic_elec.
Similar to the above, we can adjust the Victoria Electricity Demand by aggregating values up to the daily level, as opposed to hourly. This is helpful because there is a lot of variation between hours that makes the graph difficult to read.
head(vic_elec)v <- vic_elec %>%
group_by(Date) %>%
mutate(Demand = sum(Demand)) %>%
distinct(Date, Demand) %>%
as_tsibble(index=Date)
vautoplot(v, Demand)Gas production from aus_production
This data shows variation that increases or decreases with the level of the series, so a mathematical transformation would be appropriate.
aus_production %>%
autoplot(Gas) +
labs(title = "Non-Transformed Gas Production")lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) +
labs(y = "", title = "Transformed Gas Production")3.3
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
canadian_gas %>%
autoplot(Volume) +
labs(title = "Non-Transformed Gas Production")lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume, lambda)) +
labs(y = "", title = "Transformed Gas Production")The Box-Cox transformation is unhelpful because it does not standardize the cyclical variation in the series. Looking between roughly 1975 and 1990 we can see a larger than normal variation.
3.4
What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
aus_retail %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)## [1] 0.505437358 0.455334875 -0.001548504 -0.142211494 0.239724139
## [6] -0.026472613 0.302765950 -0.347859628 -0.079800145 0.488891084
## [11] 0.170177523 0.008546483 0.258373687 -0.278913616 0.085422753
## [16] 0.216735747 -0.186680060 0.121172855 0.349057788 0.358373811
## [21] 0.194644303 0.097852391 0.014794694 -0.029183970 0.219320743
## [26] 0.015030705 0.087866199 -0.088668882 0.233295271 0.794736203
## [31] 0.409200236 -0.034414504 -0.233638227 -0.402606263 0.048725122
## [36] 0.170462269 -0.271677387 0.088105269 0.157152361 0.002144737
## [41] 0.265927881 0.224713740 0.284977808 0.083036312 0.127694817
## [46] -0.062802562 -0.162564723 0.346569384 0.434893851 0.338398250
## [51] 0.260980929 -0.026468501 0.066365207 -0.071665181 -0.022160487
## [56] 0.137501308 0.142266795 -0.009186879 -0.078532522 0.121434869
## [61] 0.022846394 0.076369281 -0.216330466 0.221481127 0.480753465
## [66] 0.231063552 0.132569588 0.043833072 -0.202826332 0.111909872
## [71] 0.205315650 0.285948322 0.113040418 0.209030359 0.197041795
## [76] 0.441087014 0.371315460 -0.102361315 -0.154344946 0.035435040
## [81] -0.022985108 0.084733215 -0.187787216 0.142978026 0.466262913
## [86] 0.166374070 0.016104851 -0.360022768 -0.639897411 0.083802504
## [91] 0.097715830 -0.342658577 0.149896211 0.155555534 0.189590792
## [96] 0.325699726 0.319622078 0.001893373 -0.093951032 0.155626888
## [101] 0.090231492 -0.376863610 0.159984903 0.444917891 0.287535970
## [106] 0.704385146 0.128944957 -0.532982237 -0.899926773 -0.037340959
## [111] 0.222986858 0.330815717 0.173409612 0.176110328 0.030210246
## [116] 0.004465470 0.172748557 -0.033916722 0.050680165 -0.078768077
## [121] 0.083552351 0.554245382 0.215164121 -0.014119612 -0.353784988
## [126] -0.265806646 0.034427160 0.116131283 -0.354627576 0.134290108
## [131] 0.153561415 0.088721960 0.312135155 0.271087477 0.078757767
## [136] 0.002190358 0.081555041 -0.033484940 0.117911536 -0.069445135
## [141] 0.031003706 0.464098968 0.203250692 0.037661525 -0.053178552
## [146] -0.210144461 0.143890246 0.223362268 -0.371484018 0.140920582
## [151] 0.205193769 0.129945930
A good selection would be .5
3.5
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
autoplot(aus_production, Tobacco)+
labs(title = "Tobacco Production")## Warning: Removed 24 row(s) containing missing values (geom_path).
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco, lambda)) +
labs(y = "", title = "Transformed Tobacco Production")## Warning: Removed 24 row(s) containing missing values (geom_path).
lambda## [1] 0.9264636
filtered <- ansett %>%
filter(Class == "Economy",
Airports == "MEL-SYD")
autoplot(filtered, Passengers)+
labs(title = "Economy Passengers")lambda <- filtered %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
filtered %>%
autoplot(box_cox(Passengers, lambda)) +
labs(y = "", title = "Transformed Economy Passengers")lambda## [1] 1.999927
# filter for southern cross
southern_cross <- pedestrian %>%
filter(Sensor == "Southern Cross Station")
# aggregate on weekly basis
southern_cross <- southern_cross %>%
mutate(Week = yearweek(Date)) %>%
index_by(Week) %>%
summarise(Count = sum(Count))
autoplot(southern_cross, Count)+
labs(title = "Weekly Ped Counts")lambda <- southern_cross %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
southern_cross %>%
autoplot(box_cox(Count, lambda)) +
labs(y = "", title = "Transformed Weekly Ped Counts")lambda## [1] -0.1108714
3.7
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 20) %>% select(Gas)autoplot(gas, Gas)From the plot above, we see an positive trend with 1-Year seasonality.
gas_classical_decomp <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative"))
components(gas_classical_decomp) %>%
autoplot()## Warning: Removed 2 row(s) containing missing values (geom_path).
The above decomposition supports the hypothesis above that the gas series has positive trend and annual seasonality.
components(gas_classical_decomp) %>%
as_tsibble() %>%
autoplot(Gas, colour = "purple") +
geom_line(aes(y=season_adjust), colour = "blue") +
labs(title = "Seasonally Adjusted Production")gas$Gas[1] = gas$Gas[1] + 300gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(Gas, colour = "purple") +
geom_line(aes(y=season_adjust), colour = "blue") +
labs(title = "Seasonally Adjusted Gas Production with an Outlier")When adding 300 to the first entry, we see that data and seasonally adjusted data rise significantly.
gas <- tail(aus_production, 20) %>% select(Gas)gas$Gas[20] = gas$Gas[20] + 300gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(Gas, colour = "purple") +
geom_line(aes(y=season_adjust), colour = "blue") +
labs(title = "Seasonally Adjusted Gas Production with an Outlier")
When adding the outlier to the end of the series, there is not much
difference between adding to the first.
3.8
Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) x11_decomposition <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_decomposition) +
labs(title = "X11 Retail Decomposition.")X-11 decomposition seems to be able to capture more noise in the early 1990s. This is interesting because of the external factors affecting growth in that time. There was a recession. Trend is still positive and seasonality is much more defined using x11.
3.9
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Q: Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
A: There is an increasing trend, and we also see constant seasonality between 100 and -100. Conversely, we can see that the scale of the remainder is larger than seasonality which suggests that seasonality is not a significant factor in the data.
Q: Is the recession of 1991/1992 visible in the estimated components?
A: Yes, you can notice this recession by looking at the remainder in these years. There is a lot of negative growth not accounted by the model decompostion.