Question 2.8:
1) Total Private Employes from us_employement
There is some seasonality in the data, whereby the number employed is at its lowest in January, increases to peak in June - August, then decreases again till the end of the year. Additionally, The number of people employed was on an increasing trend, despite decreasing in some years, which were during the big recessions. The recession years also went against the seasonality rules, for example, in 2009 the number of people employed in January was the highest during the year.
The autocorrelation graph shows very high correlation across all of the years. It starts almost equal to one at a lag of 1 month, and continues to decrease as the lags increase.
library('fpp3')
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(ggplot2)
us_emp <- us_employment %>% filter(Title == "Total Private")
us_emp |> autoplot(Employed)

us_emp |> gg_season(Employed, labels="both")

us_emp |> gg_subseries(Employed)

us_emp |> gg_lag(Employed)

us_emp |> ACF(Employed) |> autoplot()
2) Bricks from aus_production
The general trend is one where the production increased from 1960 till 1980, but then started to decrease. There is a seasonal trend whereby the first quarter has slightly lower values, and then the 2nd - 4th quarter of the year are higher. However, the seasonality is not consistent across all of the years, and is quite variable year-on-year. As a result of the seasonality, the autocorrelation graph has a general decreasing trend but peaks at the multiples of 4.
aus_production |> autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |> gg_season(Bricks, labels="both")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text()`).

aus_production |> gg_subseries(Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |> gg_lag(Bricks, geom="point")
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production |> ACF(Bricks) |> autoplot()

3) Hare from pelt
There is seasonality in the data, whereby the value of Hare is increasing dramatically in certain years, but then dropping in others. However, there is no general trend in the data, it’s fluctuating around a similar mean across the years, except for the years around 1865 and 1885 where the increase was much more dramatic.
The autocorrelation graph shows the seasonality of the data, where the correlation is highly positive, then becomes highly negative, and then repeats.
pelt |> autoplot(Hare)

#pelt |> gg_season(Hare, labels="both")
pelt |> gg_subseries(Hare)

pelt |> gg_lag(Hare, geom="point")

pelt |> ACF(Hare) |> autoplot()
4) “H02” Cost from PBS
There is a general increasing trend in the data as well as clear seasonality, whereby the cost in January is high then decreases significantly in February and starts to gradually increase till December. All of the years follow the general trend and seasonality with some fluctuation in the year 2008.
The seasonality is also clear in the autocorrelation plot, which remains positive across the years, but continues to increase and decrease.
PBS_filter <- PBS %>% filter(ATC2 == "H02") %>% summarise(Cost=sum(Cost))
PBS_filter |> autoplot(Cost)

PBS_filter |> gg_season(Cost, labels="both")

PBS_filter |> gg_subseries(Cost)

PBS_filter |> gg_lag(Cost, geom="point")

PBS_filter |> ACF(Cost) |> autoplot()
5) Barrels from us_gasline
The general trend of the data was that it was increasing steadily over the years, but the trend shifted around 2008/2009 and decreased, but it looks to be in a general increasing trend again. There is seasonality in the data, where the summer months are slightly higher than the winter months, but the values fluctuate heavily week on week. The autocorrelation plot is positive across the years, but has a decreasing trend, with slight peaks at some lags due to seasonality.
us_gasoline |> autoplot(Barrels)
us_gasoline |> gg_season(Barrels, labels="both")
us_gasoline |> gg_subseries(Barrels)
us_gasoline |> gg_lag(Barrels, geom="point")
us_gasoline |> ACF(Barrels) |> autoplot()
Question 7.4:
souvenirs |> autoplot()
## Plot variable not specified, automatically selected `.vars = Sales`
souvenirs_log <- souvenirs %>%
mutate(log_sales = log(Sales))
souvenirs_log |> autoplot(log_sales)
#Mutating the dataset: took the log of sales and created a new dummy variable called festival that assigned 1 if the month is Jan, Feb, or Mar
souvenirs_question <- souvenirs %>%
mutate(
log_sales = log(Sales),
month = as.character(month(Month, label=TRUE)),
festival = if_else(month %in% c("Jan", "Feb", "Mar", "Apr"), 1, 0),
winter = if_else(month %in% c("Dec", "Jan", "Feb"), 1, 0),
spring = if_else(month %in% c("Mar", "Apr", "May"), 1, 0),
summer = if_else(month %in% c("Jun", "Jul", "Aug"), 1, 0),
fall = if_else(month %in% c("Sep", "Oct", "Nov"), 1, 0)
)
souvenirs_question
#fitted the model to the data
model <- souvenirs_question %>% model(lm = TSLM(log_sales ~ Month + winter + spring + summer + fall + festival))
#plotting the fitted line
augment(model) |>
ggplot(aes(x = Month)) +
geom_line(aes(y = log_sales)) +
geom_line(aes(y = .fitted, colour = "Fitted"))
#Modeling the residulas against the fitted valeus
augment(model) |>
ggplot(aes(x = .fitted, y = .resid)) +
geom_point() + labs(x = "Fitted", y = "Residuals")
#assessment of the residuals
model |> gg_tsresiduals()
augmented_data <- augment(model)
augmented_data$residuals <- augmented_data$.resid
augmented_data$Month <- as.factor(augmented_data$Month)
ggplot(augmented_data, aes(x = Month, y = residuals)) +
geom_boxplot() +
theme_minimal() +
labs(title = "Boxplot of Residuals by Month", x = "Month", y = "Residuals")
report(model)
## Series: log_sales
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.00284 -0.26948 0.01035 0.22375 0.83185
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.698e+00 4.741e-01 7.800 2.29e-11 ***
## Month 7.665e-04 6.142e-05 12.480 < 2e-16 ***
## winter 4.334e-01 1.558e-01 2.781 0.00678 **
## spring 1.943e-01 1.559e-01 1.247 0.21612
## summer -3.209e-01 1.273e-01 -2.520 0.01376 *
## fall NA NA NA NA
## festival -8.455e-01 1.354e-01 -6.245 2.07e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4122 on 78 degrees of freedom
## Multiple R-squared: 0.7442, Adjusted R-squared: 0.7278
## F-statistic: 45.39 on 5 and 78 DF, p-value: < 2.22e-16
resids <- residuals(model)$.resid
Box.test(resids, lag = 10, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: resids
## X-squared = 107.61, df = 10, p-value < 2.2e-16
#forecast(model, h = 1)
** Question 8.1:**
aus_pigs <- aus_livestock |> filter(Animal=="Pigs") |> summarise(Count = sum(Count))
aus_pigs |> autoplot(aus_pigs$Count)
## Warning: Use of `aus_pigs$Count` is discouraged.
## ℹ Use `Count` instead.
model_ets <- aus_pigs |> model(ETS(Count))
model_ets |> report()
## Series: Count
## Model: ETS(M,Ad,M)
## Smoothing parameters:
## alpha = 0.09821581
## beta = 0.08632586
## gamma = 0.144023
## phi = 0.8000286
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5]
## 422028 -4053.052 1.043194 1.068446 0.9533266 0.9723318 0.8927822 0.8661272
## s[-6] s[-7] s[-8] s[-9] s[-10] s[-11]
## 1.038676 1.0529 1.040113 0.9965679 1.019755 1.05578
##
## sigma^2: 0.0027
##
## AIC AICc BIC
## 14615.56 14616.83 14693.40
model_ets |> forecast(h=4, level=.95) |> autoplot(aus_pigs)
forecast <- forecast(model_ets, h=4, level=.95)
forecast
forecast_1 <- forecast$.mean[1]
residuals <- residuals(model_ets)
residual_sd <- sd(residuals$.resid)
residual_sd
## [1] 0.05103259
lower_bound <- forecast_1 - (1.96 * residual_sd)
upper_bound <- forecast_1 + (1.96 * residual_sd)
cat(lower_bound, forecast_1, upper_bound)
## 419019.7 419019.8 419019.9