library(fpp3)
library(latex2exp)
Consider the GDP information in global_economy
. Plot the
GDP per capita for each country over time. Which country has the highest
GDP per capita? How has this changed over time?
Monaco has the highest GDP per capita followed by Liechtenstein. Over the years, Monaco and Liechtenstein has maintained their position in have the highest GDP per capita. All countries’ GDP per capita following the overall same trend (upwards) and the similar dips corresponding to economic crisis (like the financial crisis in 2008)
global_economy %>%
autoplot(GDP/Population, show.legend = FALSE) +
labs(title = "GDP per capita", x = "Year", y = "$US")
# Must include `show.legend = FALSE`
gdp_per_capita <- global_economy %>%
select(Country, Year, GDP, Population) %>%
mutate(gdp_per_capita = GDP/Population) %>%
slice_max(order_by = gdp_per_capita, n = 20)
gdp_per_capita
## # A tsibble: 20 x 5 [1Y]
## # Key: Country [2]
## Country Year GDP Population gdp_per_capita
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco 2014 7060236168. 38132 185153.
## 2 Monaco 2008 6476490406. 35853 180640.
## 3 Liechtenstein 2014 6657170923. 37127 179308.
## 4 Liechtenstein 2013 6391735894. 36834 173528.
## 5 Monaco 2013 6553372278. 37971 172589.
## 6 Monaco 2016 6468252212. 38499 168011.
## 7 Liechtenstein 2015 6268391521. 37403 167591.
## 8 Monaco 2007 5867916781. 35111 167125.
## 9 Liechtenstein 2016 6214633651. 37666 164993.
## 10 Monaco 2015 6258178995. 38307 163369.
## 11 Monaco 2011 6080344732. 37497 162155.
## 12 Liechtenstein 2011 5739977477. 36264 158283.
## 13 Monaco 2012 5743029680. 37783 152000.
## 14 Liechtenstein 2012 5456009385. 36545 149296.
## 15 Monaco 2009 5451653237. 36534 149221.
## 16 Monaco 2010 5362649007. 37094 144569.
## 17 Liechtenstein 2008 5081432924. 35541 142974.
## 18 Liechtenstein 2010 5082366478. 36003 141165.
## 19 Monaco 2006 4582988333. 34408 133195.
## 20 Liechtenstein 2007 4601299567. 35322 130267.
global_economy %>%
filter(Country %in% c("Monaco", "Liechtenstein"))%>%
autoplot(GDP/Population) +
labs(title = "GDP per capita", x = "Year", y = "$US")
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy
. Slaughter of
Victorian “Bulls, bullocks and steers” in aus_livestock
.
Victorian Electricity Demand from vic_elec
. Gas production
from aus_production
.
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP/Population, show.legend = FALSE) +
labs(title = "GDP per capita", x = "Year", y = "$US")
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria")%>%
autoplot(Count) +
labs(title = "Number of Victorian Bulls, Bullocks and Steers Slaughtered ")
The original data shows Half-hourly electricity demand for Victoria, Australia. The variation seems even for the most part with an exception to some unusual peaks. Transformation may not be necessary. We can explore trends of electricity demand by yearly, monthly, and daily.
vic_elec %>%
autoplot(Demand) +
labs(title = "Victorian Electricity Demand", y = "MWh")
Plotting the electricity demand yearly, we see the demand has decrease over time.
demand <- vic_elec%>%
mutate(Month_Year = yearmonth(Time),
Year = year(Time))
annual_demand <- demand %>%
group_by(Year) %>%
index_by(Year) %>%
summarise(total = sum(Demand), .groups = "drop")
annual_demand %>%
autoplot(total) +
labs(title = "Yearly Victorian Electricity Demand", y = "MWh")
Plotting the electricity demand monthly, we see the seasonal patterns. Peaks in demands in the summer and winter. Dips in demand when temperatures are comfortable.
monthly_demand <- demand %>%
group_by(Month_Year) %>%
index_by(Month_Year) %>%
summarise(total = sum(Demand), .groups = "drop")
monthly_demand %>%
autoplot(total) +
labs(title = "Monthly Victorian Electricity Demand", y = "MWh")
The variation seems even for the most part with an exception to some unusual peaks like early 2014.
daily_demand <- demand %>%
group_by(Date) %>%
index_by(Date) %>%
summarise(total = sum(Demand), .groups = "drop")
daily_demand %>%
autoplot(total) +
labs(title = "Daily Victorian Electricity Demand", y = "MWh")
In the plot of the original data, the variations is not homogeneous. There are more variation in the demand in 2000s than in 1960s. We should use transformation to make the variation more homogeneous.
aus_production %>%
autoplot(Gas) +
labs(title = "Orginal Victorian Electricity Demand Data", y = "MWh")
After a square root transformation, the variations look more even but there are still more variations in the 2000s data.
aus_production %>%
autoplot(sqrt(Gas)) +
labs(title = "Square Root Victorian Electricity Demand Data", y = "MWh")
After a cube root transformation, the variations look more even than the last transformation but there are still more variations in the 2000s data.
aus_production %>%
autoplot((Gas)^(1/3)) +
labs(title = "Cube Root Victorian Electricity Demand Data", y = "MWh")
After a cube root transformation, the variations look more even but there are still more variations in the late 1900s data instead.
aus_production %>%
autoplot(log(Gas)) +
labs(title = "Log Victorian Electricity Demand Data", y = "MWh")
The box cox transformation is the best by far in evening out the seasonal variation.
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.1095171
aus_production %>%
autoplot(box_cox(Gas, lambda))+
labs(y= "",
title = latex2exp::TeX(paste0(
"Transformed Gas with $\\lambda = ",
round(lambda,2)
)))
Why is a Box-Cox transformation unhelpful for the
canadian_gas
data?
Box Cox transformation is not helpful for data that has different
periods of changing variance. The transformation will overcorrect or
undercorrect the periods. In the canadian_gas
data, it
shows an increasing trend from 1960 to 1970 with season patterns, a
constant trend with seasonal patterns from 1970 to 1990, a slight
increasing trend from 1990 to 2000, and finally constant.
canadian_gas %>%
autoplot() +
labs(title = "Orginal Canadian Gas Data " )
## Plot variable not specified, automatically selected `.vars = Volume`
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.5767648
canadian_gas %>%
autoplot(box_cox(Volume, lambda))+
labs(y= "",
title = latex2exp::TeX(paste0(
"Transformed Gas with $\\lambda = ",
round(lambda,2)
)))
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)
The optimal lambda value is 0.08303631, which is close to 0. This indicates that the appropriate transformation for this data is natural logarithm.
\(\lambda\) | Transformation |
---|---|
2 | Square |
1 | None |
0.5 | square root |
-0.5 | inverse square root |
-1 | reciprocal |
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
autoplot()+
labs(title = "Monthly Australian Retail Trade Turnover",
x = "Monthly",
y = "Turnover ($Million AUD)")
## Plot variable not specified, automatically selected `.vars = Turnover`
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.08303631
myseries %>%
autoplot(box_cox(Turnover, lambda))+
labs(y= "",
title = latex2exp::TeX(paste0(
"Transformed Australian Retail Trade Turnover with $\\lambda = ",
round(lambda,2)
)))
For the following series, find an appropriate Box-Cox transformation
in order to stabilise the variance. Tobacco from
aus_production
, Economy class passengers between Melbourne
and Sydney from ansett
, and Pedestrian counts at Southern
Cross Station from pedestrian
.
\(\lambda\) is 0.9264636, which is close to 1. This indicates that there is almost none transformation is needed for this dataset.
aus_production %>%
autoplot(Tobacco) +
labs(title = "Original Tobacco data ")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.9264636
aus_production %>%
autoplot(box_cox(Tobacco, lambda))+
labs(y= "",
title = latex2exp::TeX(paste0(
"Transformed Tobacco with $\\lambda = ",
round(lambda,2)
)))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
The data shows the total number of air passengers traveling with Ansett weekly. The optimal lambda value is 1.999927, which is close to 2. This indicates that the appropriate transformation for this data is square transformation.
economy_class <- ansett %>%
filter(Class == "Economy",
Airports == "MEL-SYD")
economy_class %>%
autoplot(Passengers) +
labs(title = "Weekly Economy class passengers between Melbourne and Sydney")
lambda <- economy_class %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 1.999927
economy_class %>%
autoplot(box_cox(Passengers, lambda))+
labs(y= "",
title = latex2exp::TeX(paste0(
"Transformed Economy class passengers Numbers with $\\lambda = ",
round(lambda,2)
)))
This dataset contains the hourly pedestrian counts from 2015 to 2016 at 4 sensors (“Birrarung Marr”, “Bourke Street Mall (North)”, “QV Market-Elizabeth St (West)” “Southern Cross Station”) in the city of Melbourne.
The optimal lambda value is -0.2501616, which is close to -0.25. This indicates that the appropriate transformation for this data is inverse square root transformation.
southern_cross <- pedestrian %>%
filter(Sensor == "Southern Cross Station")
southern_cross %>%
autoplot(Count) +
labs(title = "Hourly Pedestrian counts in the city of Melbourne")
pedestrian
## # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne>
## # Key: Sensor [4]
## Sensor Date_Time Date Time Count
## <chr> <dttm> <date> <int> <int>
## 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630
## 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826
## 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567
## 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264
## 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139
## 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77
## 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01 6 44
## 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01 7 56
## 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01 8 113
## 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01 9 166
## # ℹ 66,027 more rows
lambda <- southern_cross %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] -0.2501616
southern_cross %>%
autoplot(box_cox(Count, lambda))+
labs(y= "",
title = latex2exp::TeX(paste0(
"Transformed Pedestrian Ccounts in Melbourne with $\\lambda = ",
round(lambda,2)
)))
Consider the last five years of the Gas data from
aus_production
.
gas <- tail(aus_production, 5*4) |> select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Seasonal Fluctuations: There is a repeating pattern each year. Each year gas production rises to a peak in Q3 and then declines to a trough in Q1. This make senses as the winter months in Austrailia is June to August.
Trend Cycle: Overall, it is a upward trend with each year’s peak higher than the previous year’s.
gas %>%
autoplot() +
labs(title = "Last Five Years of the Gas Production in Australia" )
## Plot variable not specified, automatically selected `.vars = Gas`
Use classical_decomposition
with
type=multiplicative
to calculate the trend-cycle and
seasonal indices.
The first panel shows the plot of the original data.
The second panel shows the trend-cyle of the data to be overally increasing. Gas production increases from 2006 to mid 2007, constant (no trend) from mid 2007 to mid 2008, and increases after.
The third panel shows the seasonal trend. There is a annual pattern of rise and fall in gas production. Gas production peaks in Q3 and troughs in Q1.
gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot() +
labs(title = "Classical Multiplicative decomposition of Last Five Years of the Gas Production in Australia ")
Do the results support the graphical interpretation from part a?
Yes, the results support the graphical interpretation from part a. There was a yearly seasonal trend with a overall upward trend in gas production.
Compute and plot the seasonally adjusted data.
gas_mult_decomp <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
gas_mult_decomp %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Gas",
title = "Last Five Years of the Gas Production in Australia ") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
# There are only 20 observations.
outlier <- gas
outlier$Gas[5] <- outlier$Gas[5] + 300
outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot() +
labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
subtitle = "(Introduced Outlier)")
outlier_mult_decomp <- outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
outlier_mult_decomp %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Gas",
title = "Last Five Years of the Gas Production in Australia ") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Introducing the outlier in the beginning, middle, or end has little affect on the seasonal trend.
outlier <- gas
outlier$Gas[1] <- outlier$Gas[1] + 300
outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot() +
labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
subtitle = "(Outlier in the Beginning)")
outlier_mult_decomp <- outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
outlier_mult_decomp %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Gas",
title = "Last Five Years of the Gas Production in Australia ",
subtitle = "(Outlier in the Beginning)") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
outlier <- gas
outlier$Gas[10] <- outlier$Gas[10] + 300
outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot() +
labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
subtitle = "(Outlier in the Middle)")
outlier_mult_decomp <- outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
outlier_mult_decomp %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Gas",
title = "Last Five Years of the Gas Production in Australia ",
subtitle = "(Outlier in the Middle)") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
outlier <- gas
outlier$Gas[20] <- outlier$Gas[20] + 300
outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
autoplot() +
labs(title = "Classical Multiplicative decomposition of Gas Production in Australia",
subtitle = "Outlier in the End")
outlier_mult_decomp <- outlier %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
outlier_mult_decomp %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Gas",
title = "Last Five Years of the Gas Production in Australia ",
subtitle = "Outlier in the End") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?