library(tsibble)
library(fpp3)
library(readxl)
library(seasonal)
Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
per_cap_gdp_tbl <- global_economy %>%
as_tsibble(index = Year, key = Country) %>%
mutate(per_cap_gdp = GDP / Population) %>%
ggplot(aes(x = Year, y = per_cap_gdp, colour = Country)) +
geom_line(stat = "identity", show.legend = F)
print(per_cap_gdp_tbl)
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
per_cap_gdp_tbl <- global_economy %>%
as_tsibble(index = Year, key = Country) %>%
mutate(per_cap_gdp = GDP / Population) %>%
arrange(desc(per_cap_gdp))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
head(per_cap_gdp_tbl)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
## # A tsibble: 6 x 10 [1Y]
## # Key: Country [2]
## Country Code Year GDP Growth CPI Imports Exports Population per_cap_gdp
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7.06e9 7.18 NA NA NA 38132 185153.
## 2 Monaco MCO 2008 6.48e9 0.732 NA NA NA 35853 180640.
## 3 Liecht… LIE 2014 6.66e9 NA NA NA NA 37127 179308.
## 4 Liecht… LIE 2013 6.39e9 NA NA NA NA 36834 173528.
## 5 Monaco MCO 2013 6.55e9 9.57 NA NA NA 37971 172589.
## 6 Monaco MCO 2016 6.47e9 3.21 NA NA NA 38499 168011.
The Country with the highest per capita GDP is Monaco for the top two spots, followed by Liechtenstein in the next two and then Monaco again.There is a general upward trend from Monaco and has remained above almost every other country since 1960.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
global_economy %>% filter(Country == "United States") %>%
autoplot(GDP)
global_economy %>% filter(Country == "United States") %>%
autoplot(Exports - Imports) +
labs(title= "US Trade Deficit")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
I wanted to look at the yearly change in the US trade deficit against the change in overall GDP. There seems to be an overarching negative relationship between increasing US GDP and a falling trade deficit (more imports than exports). Which kind of falls in line with the shift in the US economy, from a good based ecomony to a services based economy.
aus_livestock %>%
filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
autoplot(Count)
yearly_data <- aus_livestock %>%
filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
index_by(Year = ~ year(.)) %>%
summarise(Yearly_Count = sum(Count, na.rm = TRUE))
autoplot(yearly_data, Yearly_Count) +
ggtitle("Yearly Count of Bulls, Bullocks, and Steers in Victoria") +
xlab("Year") +
ylab("Count")
To smooth over the monthly data, I changed the index to a yearly
basis.
vic_elec %>%
autoplot(Demand)
daily_data <- vic_elec %>%
mutate(Day = as.Date(Time)) %>%
index_by(Day) %>%
summarise(Daily_Demand = sum(Demand, na.rm = TRUE))
autoplot(daily_data, Daily_Demand) +
ggtitle("Daily Demand in Victoria") +
xlab("Day") +
ylab("Demand")
As another attempt to smooth the data, I decided to go with a weekly
index over every 30 minutes. Highlighting that demand tends to peak
around mid year, with a few outliers in 2014 adn 2013.
aus_production %>%
autoplot(Gas)
growth_rate_gas <- aus_production %>%
select(Quarter, Gas) %>%
mutate(growth_rate = (Gas - lag(Gas)) / lag(Gas)*100)
autoplot(growth_rate_gas, growth_rate) +
ggtitle("Gas Production Growth Rate") +
xlab("Quarter") +
ylab("Growth Rate")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
The growth rate calc shows that the historic rates vary between 40% and around -20% likely.
canadian_gas %>% autoplot(Volume)
bc_transformation <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume, bc_transformation))
The idea behind box cox transformations is to normalize non-normal data,
so when we use it on normal data the pattern does not change, and is
thus a useless excersise in this example.
set.seed(11)
retail_series <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
retail_series %>%
autoplot(Turnover)
#### Check Lambda
lambda_val <- retail_series %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
print(lambda_val)
## [1] 0.5054374
retail_series %>%
autoplot(box_cox(Turnover, lambda_val))
This non-normal data was transformed, starting around like 2008ish and
continues on through the remainder of the dataset.
aus_production %>%
autoplot(Tobacco)
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_lambda_val <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
print(aus_lambda_val)
## [1] 0.9264636
aus_production %>%
autoplot(box_cox(Tobacco, aus_lambda_val))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
Despite having an optimized lambda value, this data did not change at all. Is there possibly a better way to normalize this dataset?
ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
autoplot(Passengers)
ans_lambda_val <- ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
print(ans_lambda_val)
## [1] 1.999927
ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
autoplot(box_cox(Passengers,ans_lambda_val))
This dataset has the same problem as the aus_production box cox plot.
The data was not normalized and could use another method to do so.
gas <- tail(aus_production, 5*4) %>% dplyr::select(Gas)
gas %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Gas`
Gas production spikes consistently during the midyear and falls just as
fast right after.
classic_decomp <- gas %>%
model(classical_decomposition( Gas, type = "multiplicative")) %>%
components()
classic_decomp %>% autoplot() +
ggtitle("Classical Decomposition Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
Yes, there is a underlying upward trend in production from 06 to 10, and the seasonal trends are reflected properly as well.
classic_decomp %>%
ggplot(aes(Quarter))+
geom_line(aes(y = season_adjust))
classic_decomp_2 <- gas %>%
mutate(Gas = if_else(Quarter ==yearquarter("2010Q1"), Gas + 150, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble()
classic_decomp_2 %>%
ggplot(aes(Quarter))+
geom_line(aes(y = season_adjust))
The outlier stretches the y axis, making the quarterly seasonality
beforehand much more subdued (at least it looks like that).
classic_decomp_3 <- gas %>%
mutate(Gas = if_else(Quarter ==yearquarter("2008Q1"), Gas + 150, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble()
classic_decomp_3 %>%
ggplot(aes(Quarter))+
geom_line(aes(y = season_adjust))
An addition to the middle of the graph seems to keep seasonality truer
to how it was before the addition.
set.seed(11)
aus_retail_p2 <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(aus_retail_p2)
## Plot variable not specified, automatically selected `.vars = Turnover`
decomp_retail_x11 <- aus_retail_p2 %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
decomp_retail_x11 %>%
autoplot()+
ggtitle("Retail Turnover Decomposition using the X11 Model")
Decomposing this retail dataset using X11 brought out much more data to
analyze when compared with the simple autoplot used to get an overall
look at the data. For one it shows how turnover is very seasonal for the
retail sector but has seemingly become much less volatile as the years
have progressed. Could online shopping be at play?
The STL decomposition graph shows that there is a stable upward trend in the civilian labour force in Australia between the years of 1980 and 1995. It also shows significant seasonality year in and year out. The large decline can be seen in 1992 (aprox.) with employment dropping nearly 400%.
As mentioned above the 91/92 recession is very prevalent in the remainder piece of the STL decomp graph and only somewhat present in the overall graph at the top. it can also be seen in the season_year graph (fig 3.19).