This document contains the homework problems for the Data 624 course. Link: https://otexts.com/fpp3/graphics-exercises.html
Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from Forecasting: Principles and Practice book. Please submit both your Rpubs link as well as attach the .rmd file with your code.
if (!require('fpp3')) (install.packages('fpp3'))
if (!require('latex2exp')) (install.packages('latex2exp'))
if (!require('dplyr')) (install.packages('dplyr'))
if (!require('forecast')) (install.packages('forecast'))
if (!require('seasonal')) (install.packages('seasonal'))Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
head(global_economy)## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
global_economy %>%
autoplot(GDP/Population, show.legend = FALSE) +
labs(title = "GDP per Capita",
subtitle = "1960-2017",
x = "Year",
y = "GDP per capita")global_economy <- global_economy %>%
mutate(GDP_per_capita = GDP/Population)
global_economy %>%
filter(GDP_per_capita > 100000) %>%
autoplot(GDP_per_capita) +
labs(title= "GDP per capita",
subtitle = "1960-2017",
y = "USD")z <- global_economy %>%
group_by(Country, GDP, Population) %>%
summarise(GD_PoP = GDP/Population) %>%
arrange(desc(GD_PoP))
head(z)## # A tsibble: 6 x 5 [1Y]
## # Key: Country, GDP, Population [6]
## # Groups: Country, GDP [6]
## Country GDP Population Year GD_PoP
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco 7060236168. 38132 2014 185153.
## 2 Monaco 6476490406. 35853 2008 180640.
## 3 Liechtenstein 6657170923. 37127 2014 179308.
## 4 Liechtenstein 6391735894. 36834 2013 173528.
## 5 Monaco 6553372278. 37971 2013 172589.
## 6 Monaco 6468252212. 38499 2016 168011.
global_economy %>%
tsibble(key = Code, index = Year)%>%
filter(Country=="Monaco") %>%
autoplot(GDP/Population)Monaco has the highest GDP per capita.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
global_economy.aus_livestock.vic_elec.aus_production.United States GDP from global_economy
global_economy %>%
filter(Country == "United States") %>%
autoplot() +
labs(title= "United States GDP",
subtitle = "1960-2017",
y = "USD")Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
autoplot(Count) +
labs(title= "Slaughter of Victorian “Bulls, bulls and steers [Monthly]", y = "The Total Count")The trend has come down over time in the last 40 years where there is downward trend in slaughter of Victorian bulls, bulls, and steers. Interesting to see peculiar trend during 2015 where there was reversal of trend again.
Victorian Electricity Demand from vic_elec.
head(vic_elec)## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
autoplot(vic_elec)+
labs(title= "Electricity Demand [30 Minute]",
subtitle= "Victoria, Australia",
y = "MW") when viewing the vic_elec demand plot we witness seasonality. The spikes seen are likely summer or winter [extreme months].
vic_elec %>%
index_by(Date) %>%
summarise(Demand = sum(Demand)) -> daily_demand
daily_demand %>% autoplot(Demand) +
labs(title= "Daily Electricity Demand",
subtitle= "Victoria, Australia",
y = "MW")There is spike in the first few months of each year and another smaller peak towards the middle of each year.
Gas production from aus_production
head(aus_production)## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
autoplot(aus_production, Gas) +
labs(title= "Australian Gas Production")lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title= "Australian Gas Production",
subtitle = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))+
theme_replace()+
geom_line(col = "#69b3a2") Using the Guerrero feature, an optimal value of lambda = 0.12 was obtained. As seen above, the transformation has minimized the seasonal variation across the whole series and has provided an almost consistent throughout. Guerrero’s (1993) method to select the lambda which minimises the coefficient of variation for subseries of x.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
head(canadian_gas)## # A tsibble: 6 x 2 [1M]
## Month Volume
## <mth> <dbl>
## 1 1960 Jan 1.43
## 2 1960 Feb 1.31
## 3 1960 Mar 1.40
## 4 1960 Apr 1.17
## 5 1960 May 1.12
## 6 1960 Jun 1.01
canadian_gas %>%
autoplot(Volume) +
labs(title = "Canadian Gas Production")canadian_gas%>%
summarise(Volume = sum(Volume)) %>%
gg_subseries(Volume)It doesn’t look like the variation increases/decreases with the level of the series.
What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
set.seed(123)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
head(myseries)## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Household goods retailing A3349643V 1982 Apr 173.
## 2 Victoria Household goods retailing A3349643V 1982 May 180.
## 3 Victoria Household goods retailing A3349643V 1982 Jun 167.
## 4 Victoria Household goods retailing A3349643V 1982 Jul 174.
## 5 Victoria Household goods retailing A3349643V 1982 Aug 178.
## 6 Victoria Household goods retailing A3349643V 1982 Sep 180.
autoplot(myseries, Turnover)lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries %>%
autoplot(box_cox(Turnover, lambda)) +
labs(y = "",
title = paste("Transformation with lambda = ", round(lambda,2)))The above shows the transformed retail turn over with the \(\lambda\) parameter of 0.22 chosen using the Guerrero method. The transformation shows a more tamed and consistent amplitude in Turnover throughout in comparison to the original plot of the Turnover.
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.
aus_productionansettpedestrianTobacco from aus_production
head(aus_production)## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
autoplot(aus_production, (Tobacco)) +
labs(title = "Tobacco Production")aus_tobacco <- aus_production %>%
select(Quarter, Tobacco)
lambda <- aus_tobacco %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_tobacco %>%
autoplot(box_cox(Tobacco, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tobacco Production with $\\lambda$ = ",
round(lambda,2))))Given our \(\lambda\) is near 1 with value of 0.93, there is no substantive transformation resulting from the Box-Cox transformation.
Economy class passengers between Melbourne and Sydney from ansett
eco_mel_syd <- ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD")
eco_mel_syd %>% autoplot(Passengers)lambda <- eco_mel_syd %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
eco_mel_syd %>%
autoplot(box_cox(Passengers, lambda)) +
labs(y = "",
title = paste("Box-Cox Transformation with lambda = ", round(lambda,2)))Pedestrian counts at Southern Cross Station from pedestrian
head(pedestrian)## # A tsibble: 6 x 5 [1h] <Australia/Melbourne>
## # Key: Sensor [1]
## Sensor Date_Time Date Time Count
## <chr> <dttm> <date> <int> <int>
## 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630
## 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826
## 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567
## 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264
## 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139
## 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77
View unique Sensor categories:
unique(pedestrian$Sensor)## [1] "Birrarung Marr" "Bourke Street Mall (North)"
## [3] "QV Market-Elizabeth St (West)" "Southern Cross Station"
scs_pedestrian <- pedestrian %>%
filter(Sensor == "Southern Cross Station")sct_count <- pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
group_by(Sensor) %>%
index_by(Week = yearweek(Date_Time)) %>%
summarise(Count = sum(Count))
sct_count %>% autoplot(Count)lambda <- sct_count %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
sct_count %>%
autoplot(box_cox(Count, lambda)) +
labs(y = "",
title = paste("Box-Cox Transformation with lambda = ", round(lambda,2))) ### 3.7 Consider the last five years of the Gas data from
aus_production.
gas <- tail(aus_production, 5*4) %>% select(Gas)head(gas)## # A tsibble: 6 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 221 2005 Q3
## 2 180 2005 Q4
## 3 171 2006 Q1
## 4 224 2006 Q2
## 5 233 2006 Q3
## 6 192 2006 Q4
gas <- tail(aus_production, 5*4) %>% select(Gas)
autoplot(gas, Gas)+
labs(title = "Quarterly Australian Gas Production",
subtitle = "Q3 2005 - Q2 2010")We can observe there is seasonality as evident by a decrease around Q1 and increase around Q3. There is also an upward trend-cycle in gas production.
classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of total AU Gas Production")Do the results support the graphical interpretation from part a?
Yes, one can clearly see the upward trend and the seasonality in the decomposition components.
Compute and plot the seasonally adjusted data.
class_decomp <- gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
class_decomp %>% autoplot() +
labs(title = "Classical multiplicative decomposition of Australia
Gas Production")as_tsibble(class_decomp) %>%
autoplot(season_adjust) +
labs(title = "Seasonally Adjusted Data")gas %>%
mutate(Gas = if_else(Quarter==yearquarter("2007Q2"), Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust) +
labs(title = 'Seasonally Adjusted Data with 300 added to "2007 Q2"')Interestingly a new seasonal pattern emerged post adding the outlier. It did have a great impact on the overal trend though.
#change one observation to be an outlier
gas_outlier_2 <- gas
gas_outlier_2$Gas[10] <- gas_outlier_2$Gas[10] + 1000
#recompute the seasonally adjusted data
# STL decomposition
dcmp_2 <- gas_outlier_2 %>%
model(stl = STL(Gas))
#Compute and plot the seasonally adjusted data
components(dcmp_2) %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas Production",
title = "Seasonally Adjusted Australian Gas Production",
subtitle = "Purposely skewed with one outlier")Adding the outlier to the end vs. adding it to the middle of the series have small difference from one another. The outlier in both cases eliminates the original increasing trend shown in 3.7.d.
Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp)+
labs(title =
"Decomposition of Australian Department Stores Turnover using X-11.")Seasonality was much stronger or more variable during the years of 1982 to around 1990, which is quite the opposite of what was previously observed. There appears to be an outlier in terms of retail data when it is usually towards the end of the year that you see bigger turnover.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Based on the plots for the number of persons in the civilian labor force in Australia, it presents an overall increasing trend over time, and the seasonal plot says it contains certain amount of seasonal fluctuations.However, the growth does not appear to be a straight line, as an obvious drop shows at around 1992 ~ 1993; the remainder plot also reflected this drop as an outlier pops downward. We could acknowledge from such a situation that there was a large decrease in labor force during that time, and the causation might be national recession or the revolution of gender distribution needed by the labor market.