Consider the GDP information in global_economy. Plot the
GDP per capita for each country over time. Which country has the highest
GDP per capita? How has this changed over time?
gdp <- global_economy |>
mutate(GDP_pc = GDP / Population) |>
select(Country, Year, GDP_pc)
gdp |>
autoplot(GDP_pc, show.legend = FALSE) +
labs(title = 'GDP per Capita',
y = 'GDP ($US)')top_countries <- gdp |>
group_by(Country) |>
mutate(maxGDP = max(GDP_pc, na.rm = TRUE)) |>
distinct(Country, maxGDP) |>
arrange(desc(maxGDP)) |>
head(5) |>
select(Country)
gdp |>
inner_join(top_countries, by = join_by(Country)) |>
autoplot(GDP_pc) +
labs(title = 'GDP per Capita',
subtitle = 'Top 5 Countries',
y = 'GDP ($US)')Monaco has the highest GDP per Capita in the dataset. In the last several years Lichenstein has been very close to Monoco, and was higher in some years.
global_economy.aus_livestock.vic_elec.aus_production.global_economy |>
filter(Country == 'United States') |>
autoplot(GDP / Population) +
labs(title = 'US GDP per capita',
y = '$US')Transformed GDP to a per capita value.
aus_livestock |>
filter(Animal == 'Bulls, bullocks and steers',
State == 'Victoria') |>
autoplot(Count)The half-hour time scale makes the graph difficult to read. We can transform the data into a daily demand chart.
vic_elec |>
group_by(Date) |>
mutate(Demand = sum(Demand)) |>
distinct(Date, Demand) |>
as_tsibble(index = Date) |>
autoplot(Demand) +
labs(title = 'Daily Victorian Electricity Demand',
y = 'Demand (MWh)')There are still considerable fluctuations in the daily data and may be better to view as a monthly summary since we expect most electricity demand to come from cooling.
vic_elec |>
mutate(Date = yearmonth(Date)) |>
group_by(Date) |>
mutate(Demand = sum(Demand)) |>
distinct(Date, Demand) |>
as_tsibble(index = Date) |>
autoplot(Demand) +
labs(title = 'Monthly Victorian Electricity Demand',
y = 'Demand (MWh)')It is easier to see here that peak demand is occuring in the summer months and periods of low demand occur during the winter months.
aus_production |>
autoplot(Gas) +
labs(title = 'Autstrailian Gas Production',
y = 'Production (petajoules)')
lambda <- aus_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Gas, lambda)) +
labs(title = 'Austrailian Gas Production',
subtitle = TeX(paste0('Transformed with $\\lambda$ = ', round(lambda, 2))))Transformed the Gas production data using \(\lambda = 0.11\) to standardize the seasonal variation.
Why is a Box-Cox transformation unhelpful for the
canadian_gas data?
A Box-Cox transformation is unhelpful with the Canadian gas data because the seasonal variability between 1975 and 1990 is significantly larger than earlier and later periods. The Box-Cox transformation is good when the seasonal variation increases or decreases over time.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(31415)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries |>
autoplot(Turnover)myseries_lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries_lambda## [1] -0.0264685
With a small negative lambda value calculated using the guerrero feature, I’d select \(\lambda = 0\), and apply a log transformation.
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.
- Tobacco from
aus_production- Economy class passangers between Melbourne and Sydney from
ansett- Pedestrian counts at Southern Cross Station from
pedestrian
## Warning: Removed 24 rows containing missing values (`geom_line()`).
## [1] 0.9264636
With a \(\lambda = 0.93\), the seasonality of the Tobacco data is relatively stable and probably makes the most sense to leave without a transformation.
lambda <- ansett |>
filter(Airports == 'MEL-SYD',
Class == 'Economy') |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
lambda## [1] 1.999927
Applying a box-cox transformation with \(\lambda = 2\).
The hourly pedestrian counts is too busy to determine the pattern and also contains a lot of near zero intervals.
weekly_ped <- pedestrian |>
mutate(Week = yearweek(Date)) |>
group_by(Week) |>
mutate(Count = sum(Count)) |>
distinct(Week, Count) |>
as_tsibble(index = Week)
weekly_ped |>
autoplot(Count)## [1] 1.999927
Applying Box-Cox transformation with \(\lambda = 2\).
There appears to be significantly less pedestrian traffic on the new year’s day holidays shown within the dataset.
Consider the last five years of the Gas data from
aus_production.
- Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Natural gas is primarily used for heating. As expected, we see seasonal peaks during the winter and lows during the summer. The overall trend is increasing.
- Use
classical_decompositionwithtype = multiplicativeto calculate the trend-cycly and seasonal indices.
gas_dcmp <- gas |>
model(classical_decomposition(Gas, type = 'multiplicative'))
gas_dcmp |>
components() |>
autoplot()
- Do the results support the graphical interpretation from part a?
Yes, the trend is increasing and the seasonal components follow the pattern as previously described.
- Compute and plot the seasonally adjusted data.
gas |>
autoplot(Gas, colour = 'gray') +
autolayer(components(gas_dcmp), season_adjust, colour = '#0072B2') +
labs(title = 'Seasonally Adjusted Gas Production')
- Change one observation to be an outlier (e.g., add 300 to one ovservation), and recompute the seaonally adjusted data. What is the effect of the outlier?
gas_outlier <- gas |>
mutate(Gas = ifelse(Quarter == yearquarter('2007-Q2'),Gas + 300, Gas))
dcmp_outlier <- gas_outlier |>
model(classical_decomposition(Gas, type = 'multiplicative'))
gas_outlier |>
autoplot(Gas, colour = 'gray') +
autolayer(components(dcmp_outlier), season_adjust, colour = '#0072B2') +
labs(title = 'Seasonally Adjusted Gas Production with Outlier')
- Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas_outlier2 <- gas |>
mutate(Gas = ifelse(Quarter == yearquarter('2010-Q2'),Gas + 300, Gas))
dcmp_outlier2 <- gas_outlier2 |>
model(classical_decomposition(Gas, type = 'multiplicative'))
gas_outlier2 |>
autoplot(Gas, colour = 'gray') +
autolayer(components(dcmp_outlier2), season_adjust, colour = '#0072B2') +
labs(title = 'Seasonally Adjusted Gas Production with Outlier')An outlier in the middle of the time series has more of an impact on the periods on either side of the outlier.
x11_dcmp <- myseries |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11()))
x11_dcmp |>
components() |>
autoplot()The seasonal component of the series is on the same scale as the irregular component.
Figures 3.19 and 3.20 show the result of decompsing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
- Write about 3-5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
There is an overall increasing trend in the civilian labor force in Australia. There is a seasonality to the data, but it is on a very small scale relative to the trend and remainder, which indicates that it’s not a strong influence.
- Is the recession of 1991/1992 visible in the estimated components?
The recession of 1991/1992 is can be seen with a flattening of the trend compoenent and significant varaitions in the remainder component.