global_economy %>%
group_by(Country) %>%
autoplot(GDP/Population, show.legend = FALSE) +
labs(title = 'Global GDP Per Capita',
y = '$',
x = 'Year')
## `mutate_if()` ignored the following grouping variables:
## • Column `Country`
## Warning: Removed 3242 row(s) containing missing values (geom_path).
global_economy %>%
mutate(gpd_per_cap = GDP/Population) %>%
arrange(desc(gpd_per_cap)) %>%
slice(1:100) %>%
autoplot(GDP/Population)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
Monaco holds the top spot for GDP per capita (though Liechtenstein is close). There are fluctuations but this value has generally increased.
global_economy %>%
group_by(Country) %>%
filter(Country == 'United States') %>%
autoplot(GDP/Population, show.legend = FALSE) +
labs(title = 'US GDP Per Capita',
y = '$',
x = 'Year')
This plot is already adjusted by population but it might be interesting to also adjust for “real GDP.”
global_economy %>%
group_by(Country) %>%
filter(Country == 'United States') %>%
mutate(RealGDP = GDP/CPI*100) %>%
autoplot(RealGDP/Population, show.legend = FALSE) +
labs(title = 'US Real GDP Per Capita',
y = "$",
x = 'Year')
Using the Real GDP formula of GDP / CPI * 100 we find a plot with more variation that may show when periods of high inflation effectively lowered the GDP even though the nominal value increased.
aus_livestock %>%
filter(Animal == 'Bulls, bullocks and steers') %>%
group_by(Animal) %>%
summarize(Count = sum(Count)/1000) %>%
autoplot() +
labs(title = 'Count of bulls, bullocks and steers slaughtered',
y = "('000)",
x = 'Year Month')
## Plot variable not specified, automatically selected `.vars = Count`
These data appear to have consistent variation so a mathematical transformation seems unnecessary. But seasonality might be affecting the data if slaughters tend to happen at a certain time every year. We can try a calendar adjustment and sum by year.
aus_livestock %>%
filter(Animal == 'Bulls, bullocks and steers') %>%
index_by(Year = ~ year(.)) %>%
group_by(Animal) %>%
summarize(Count = sum(Count)/1000) %>%
autoplot() +
labs(title = 'Count of bulls, bullocks and steers slaughtered',
y = "('000)",
x = 'Year')
## Plot variable not specified, automatically selected `.vars = Count`
This plot does a better job of showing the yearly fluctuations in slaughter counts.
vic_elec %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Demand`
Demand has uneven variation but it doesn’t increase or decrease consistently based on the level of the series – which rules out a mathematical transformation. Lets try a calendar adjustment that looks at average demand per day.
vic_elec %>%
index_by(Date) %>%
summarize(Daily_Avg_Demand = mean(Demand)) %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Daily_Avg_Demand`
It’s a bit easier to see how average demand changes because every half hour datapoints have been aggregated to a daily average. Aggregating by weekly or monthly averages would yield a cleaner line.
aus_production %>%
autoplot(Gas)
Gas production has clear variation that increases as the series level increases. This makes a mathematical transformation appropriate.
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda))
By using the Guerrero package feature function to automatically select a lambda value we can perform a box-cox transformation to ensure a consistent variation in the data.
canadian_gas %>%
autoplot(Volume)
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume,lambda))
It appears that even with the optimal lambda the Box Cox transformation cannot produce a desirable result. This is because variation does no increase or decrease consistently as the series level increases – it increases in the middle of the series and then decreases again.
# seed from HW1 to get same series from retail data
set.seed(81023948)
aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) %>%
autoplot(Turnover)
The variation is increasing over the course of the series, so we’ll try the Box Cox transformation again.
lambda <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) %>%
autoplot(box_cox(Turnover,lambda))
I would use a lambda function from a library (like the features() function above) to get the optimal lambda and then I would use either a power transformation or a logarithmic transformation depending on the value of lambda.
aus_production %>%
autoplot(Tobacco)
## Warning: Removed 24 row(s) containing missing values (geom_path).
Increasing variation over time.
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco,lambda))
## Warning: Removed 24 row(s) containing missing values (geom_path).
More even variation.
ansett %>%
filter(Class == 'Economy') %>%
summarize(Passengers = sum(Passengers)) %>%
autoplot(Passengers)
Some increasing variation over time but nothing major.
lambda <- ansett %>%
filter(Class == 'Economy') %>%
summarize(Passengers = sum(Passengers)) %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
ansett %>%
filter(Class == 'Economy') %>%
summarize(Passengers = sum(Passengers)) %>%
autoplot(box_cox(Passengers,lambda))
More even variation.
pedestrian %>%
filter(Sensor == 'Southern Cross Station') %>%
autoplot(Count)
Not sure if this data requires a Box-Cox transformation as much as maybe a different adjustment to make the visual more readable. In any case—
lambda <- pedestrian %>%
filter(Sensor == 'Southern Cross Station') %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
pedestrian %>%
filter(Sensor == 'Southern Cross Station') %>%
autoplot(box_cox(Count,lambda))
We can see a more dense band of counts around the 2 value now.
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Gas`
# b.
gas %>%
model(
classical_decomposition(Gas, type='multiplicative')
) %>%
components() %>%
autoplot()
## Warning: Removed 2 row(s) containing missing values (geom_path).
# d.
classic_decomp_gas <- gas %>%
model(classical_decomposition(Gas, type='multiplicative')) %>%
components()
classic_decomp_gas %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted"))
# e.
classic_decomp_gas <- gas %>%
mutate(Gas = if_else(Quarter == yearquarter("2007Q4"), Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type='multiplicative')) %>%
components()
classic_decomp_gas %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted"))
With the addition of the outlier the seasonally adjusted data is thrown out of whack. The adjusted data looks like it has some seasonality with a large spike for the outlier.
# f.
classic_decomp_gas <- gas %>%
mutate(Gas = if_else(Quarter == yearquarter("2010Q2"), Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type='multiplicative')) %>%
components()
classic_decomp_gas %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = season_adjust, colour = "Seasonally Adjusted"))
If the outlier is at the end then more of a trend is visible rather than seasonality in the seasonally adjusted data.
# seed from HW1 to get same series from retail data
set.seed(81023948)
x11_dcmp <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp)
There is an increasing trend which is a bit more clear since in the last homework I said there appeared to be one but it was murky. It also looks like the seasonal variation is decreasing with time.
There is a clear seasonal pattern to the number of people in the civilian labour force, which looks like three distinct spikes and subsequent drops every year in the season_year component. The total number of people in the civilian labour force is trending upward over time as evidenced by the trend and value components. The trend component also flattened out after 1991, while the remainder component shows that little else is impacting this data, save for the obvious exception in 1991 which will be discussed in part b.
The recession of 1991/1992 is visible in the remainder component as a sharp drop in value, recovery to approximately the baseline, and then another sharp drop and recovery. It is clear that more significant factors other than trend and seasonality were affecting the data here, and so their impact was relegated to the remainder component.