Homework 3

5.1, 5.2, 5.3, 5.4 and 5.7

1.

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Naïve Bayes Classification is a type of probability classifier that applies Bayes’ theorem, which assumes independence between predictors. Naive Bayes classification remains a popular method for the problem of determining a document into one of several categories (e.g. spam, sports, politics) by being used in text classification. In Naise Bayes classification, Bayes’ theorem is used and each data is assumed to be an independent event.
The snaive() function assumes the last value of the same season in the previous year for future forecast values.
Random walk forecast may be appropriate when the data is in a random walk pattern, is relatively stable, and long-term trends are observed.

a.

Australian Population (global_economy)

Utilize Random Walk forecasting due to the necessity of observing long-term trends.

australia_data <- global_economy %>%
  filter(Country == "Australia")

missing <- any(is.na(australia_data))
missing

## [1] TRUE

australia_data %>%
  model(RW(Population ~ drift())) %>%
  forecast(h = 10) %>%
  autoplot(australia_data) +
  labs(title = "Australia Population Growth using Random Walk Forcast")

b.

Bricks (aus_production)

The snaive() function is used because the data exhibits seasonality.

#aus_production
bricks_data <- aus_production %>%
  select(Bricks)

missing <- any(is.na(bricks_data))
missing

## [1] TRUE

bricks_data %>%
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks)) %>%
  forecast(h = 10) %>%
  autoplot(bricks_data) +
  labs(title = "Bricks Production Forecast using SNAIVE Model")

c.

NSW Lambs (aus_livestock)

Using the NAIVE method seems appropriate since each data point is independent.

aus_livestock

## # A tsibble: 29,364 x 4 [1M]
## # Key:       Animal, State [54]
##       Month Animal                     State                        Count
##       <mth> <fct>                      <fct>                        <dbl>
##  1 1976 Jul Bulls, bullocks and steers Australian Capital Territory  2300
##  2 1976 Aug Bulls, bullocks and steers Australian Capital Territory  2100
##  3 1976 Sep Bulls, bullocks and steers Australian Capital Territory  2100
##  4 1976 Oct Bulls, bullocks and steers Australian Capital Territory  1900
##  5 1976 Nov Bulls, bullocks and steers Australian Capital Territory  2100
##  6 1976 Dec Bulls, bullocks and steers Australian Capital Territory  1800
##  7 1977 Jan Bulls, bullocks and steers Australian Capital Territory  1800
##  8 1977 Feb Bulls, bullocks and steers Australian Capital Territory  1900
##  9 1977 Mar Bulls, bullocks and steers Australian Capital Territory  2700
## 10 1977 Apr Bulls, bullocks and steers Australian Capital Territory  2300
## # ℹ 29,354 more rows

Lambs_data <- aus_livestock %>%
  filter(State == "New South Wales", Animal  == "Lambs")
Lambs_data

## # A tsibble: 558 x 4 [1M]
## # Key:       Animal, State [1]
##       Month Animal State            Count
##       <mth> <fct>  <fct>            <dbl>
##  1 1972 Jul Lambs  New South Wales 587600
##  2 1972 Aug Lambs  New South Wales 553700
##  3 1972 Sep Lambs  New South Wales 494900
##  4 1972 Oct Lambs  New South Wales 533500
##  5 1972 Nov Lambs  New South Wales 574300
##  6 1972 Dec Lambs  New South Wales 517500
##  7 1973 Jan Lambs  New South Wales 562600
##  8 1973 Feb Lambs  New South Wales 426900
##  9 1973 Mar Lambs  New South Wales 496300
## 10 1973 Apr Lambs  New South Wales 496000
## # ℹ 548 more rows

Lambs_data %>%
  model(NAIVE(Count)) %>%
  forecast(h = 10) %>%
  autoplot(Lambs_data) +
  labs(title = "Lambs Count inNew South Wales")

d.

Household wealth (hh_budget).

Random Walk forecasting is used to observe long-term trends.

hh_budget

## # A tsibble: 88 x 8 [1Y]
## # Key:       Country [4]
##    Country    Year  Debt     DI Expenditure Savings Wealth Unemployment
##    <chr>     <dbl> <dbl>  <dbl>       <dbl>   <dbl>  <dbl>        <dbl>
##  1 Australia  1995  95.7 3.72          3.40   5.24    315.         8.47
##  2 Australia  1996  99.5 3.98          2.97   6.47    315.         8.51
##  3 Australia  1997 108.  2.52          4.95   3.74    323.         8.36
##  4 Australia  1998 115.  4.02          5.73   1.29    339.         7.68
##  5 Australia  1999 121.  3.84          4.26   0.638   354.         6.87
##  6 Australia  2000 126.  3.77          3.18   1.99    350.         6.29
##  7 Australia  2001 132.  4.36          3.10   3.24    348.         6.74
##  8 Australia  2002 149.  0.0218        4.03  -1.15    349.         6.37
##  9 Australia  2003 159.  6.06          5.04  -0.413   360.         5.93
## 10 Australia  2004 170.  5.53          4.54   0.657   379.         5.40
## # ℹ 78 more rows

wealth_data <- hh_budget %>%
  select(Wealth)

missing <- any(is.na(wealth_data))
missing

## [1] FALSE

wealth_data %>%
  model(RW(Wealth ~ drift())) %>%
  forecast(h = 10) %>%
  autoplot(wealth_data) +
  labs(title = "Household Wealth using Random Walk Forcast")

e.

Australian takeaway food turnover (aus_retail).

Random Walk forecasting is used to observe long-term trends.

#aus_retail

turnover_data <- aus_retail %>%
  filter(Industry == "Cafes, restaurants and takeaway food services")
turnover_data

## # A tsibble: 3,456 x 5 [1M]
## # Key:       State, Industry [8]
##    State                        Industry           `Series ID`    Month Turnover
##    <chr>                        <chr>              <chr>          <mth>    <dbl>
##  1 Australian Capital Territory Cafes, restaurant… A3349606J   1982 Apr      7.6
##  2 Australian Capital Territory Cafes, restaurant… A3349606J   1982 May      6.7
##  3 Australian Capital Territory Cafes, restaurant… A3349606J   1982 Jun      7.1
##  4 Australian Capital Territory Cafes, restaurant… A3349606J   1982 Jul      7.5
##  5 Australian Capital Territory Cafes, restaurant… A3349606J   1982 Aug      7.3
##  6 Australian Capital Territory Cafes, restaurant… A3349606J   1982 Sep      8.1
##  7 Australian Capital Territory Cafes, restaurant… A3349606J   1982 Oct      8.9
##  8 Australian Capital Territory Cafes, restaurant… A3349606J   1982 Nov      9.6
##  9 Australian Capital Territory Cafes, restaurant… A3349606J   1982 Dec     11.2
## 10 Australian Capital Territory Cafes, restaurant… A3349606J   1983 Jan      7.7
## # ℹ 3,446 more rows

forecasted_data <- turnover_data %>%
  model(RW(Turnover ~ drift())) %>%
  forecast(h = 20)

forecasted_data %>%
  autoplot(turnover_data) +
  labs(title = "Australian Takeaway Food Turnover Forecast") +
  facet_wrap(~State, scales = "free")

2.

Use the Facebook stock price (data set gafa_stock) to do the following:

a.

Produce a time plot of the series.

gafa_stock

## # A tsibble: 5,032 x 8 [!]
## # Key:       Symbol [4]
##    Symbol Date        Open  High   Low Close Adj_Close    Volume
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
##  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
##  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
##  3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
##  4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
##  5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
##  6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
##  7 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
##  8 AAPL   2014-01-13  75.7  77.5  75.7  76.5      64.9  94623200
##  9 AAPL   2014-01-14  76.9  78.1  76.8  78.1      66.1  83140400
## 10 AAPL   2014-01-15  79.1  80.0  78.8  79.6      67.5  97909700
## # ℹ 5,022 more rows

facebook_stock <- gafa_stock %>%
  filter(Symbol == "FB")

autoplot(facebook_stock, Close) +
    labs(title = "Facebook Close Price")

b.

Produce forecasts using the drift method and plot them.

tsibble is a useful package for handling time series data. The following code demonstrates the use of tsibble to manipulate Facebook stock price data, showcasing its ease of use in managing and analyzing time series data, particularly stock prices over time. tsibble provides a structured approach to working with time series data, enhancing data management and analysis capabilities.

The key point here is that when converting the facebook_stock dataframe to a tsibble using the as_tsibble() function, an error occurs due to missing dates in the original data. This is because tsibbles require regular intervals. Setting regular = TRUE will ensure that the tsibble has regular intervals and prevents errors. However, missing values are still not filled in. Therefore, using the fill_gaps() function fills in the missing values, resulting in a complete time series dataset without errors.

facebook_stock2 <- as_tsibble(facebook_stock, key = "Symbol", index = "Date", regular = TRUE) %>% fill_gaps()

drift_model <- facebook_stock2 %>%
  model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 90)%>%
  autoplot(facebook_stock) +
    labs(title = "Facebook Close Price Forcast")


print(drift_model)

c.

Show that the forecasts are identical to extending the line drawn between the first and last observations.

first_close <- facebook_stock$Close[1]
last_close <- tail(facebook_stock$Close, 1)

first_close

## [1] 54.71

last_close

## [1] 131.09

line_df <- data.frame(x1 = as.Date('2014-01-02'), x2 = as.Date('2018-12-31'), y1 = 54.71, y2 = 131.09)

drift_model <- facebook_stock2 %>%
  model(Drift = RW(Close ~ drift())) %>%
  forecast(h = 90)%>%
  autoplot(facebook_stock) +
    labs(title = "Facebook Close Price Forcast") +
  geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, colour = "segment"), data = line_df)

print(drift_model)

d.

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

auto.arima function & forecast from forecast package.

When comparing drift methods and ARIMA models, ARIMA models are considered superior because they can capture more complex time series structures. The drift method, on the other hand, is used to model a trend that increases or decreases at a constant rate over time. While the drift method can be useful for time series data with relatively simple trends, it may not be suitable for data with more complex patterns that ARIMA models can handle.

auto.arima(facebook_stock2$Close)

## Series: facebook_stock2$Close 
## ARIMA(0,1,0) 
## 
## sigma^2 = 5.112:  log likelihood = -2961.13
## AIC=5924.26   AICc=5924.27   BIC=5929.77

facebook_stock2.arima <- arima(facebook_stock2$Close, order=c(1,1,1))

The selected model, ARIMA(1,1,1), will be used to forecast future numerical values.

facebook_stock2.forecast <- forecast(facebook_stock2.arima, h = 90)

Using the selected ARIMA model, future numerical values are predicted using the forecast function from the forecast package. The predictions are presented as a range of values. Refer to the graph for a visual representation.

plot(facebook_stock2.forecast)

The dark blue line represents the regression line, which is the point estimate. The widest light blue area (80%) and gray area (95%) correspond to the interval prediction values, indicating confidence intervals for the predictions.

3.

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

The seasonal naive method sets the future forecast value to the last observed value from the same season in the previous year, similar to the naive method.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))

# Look at the residuals
fit |> gg_tsresiduals()

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

auto_arima_Beer <- auto.arima(recent_production$Beer, trace=TRUE, test="adf", ic="aicc")

## 
##  ARIMA(2,0,2) with non-zero mean : Inf
##  ARIMA(0,0,0) with non-zero mean : 770.5697
##  ARIMA(1,0,0) with non-zero mean : 771.9605
##  ARIMA(0,0,1) with non-zero mean : 760.2114
##  ARIMA(0,0,0) with zero mean     : 1111.425
##  ARIMA(1,0,1) with non-zero mean : 761.0401
##  ARIMA(0,0,2) with non-zero mean : 754.1458
##  ARIMA(1,0,2) with non-zero mean : 756.0566
##  ARIMA(0,0,3) with non-zero mean : Inf
##  ARIMA(1,0,3) with non-zero mean : 739.417
##  ARIMA(2,0,3) with non-zero mean : Inf
##  ARIMA(1,0,4) with non-zero mean : 720.5997
##  ARIMA(0,0,4) with non-zero mean : 718.1571
##  ARIMA(0,0,5) with non-zero mean : Inf
##  ARIMA(1,0,5) with non-zero mean : Inf
##  ARIMA(0,0,4) with zero mean     : Inf
## 
##  Best model: ARIMA(0,0,4) with non-zero mean

summary(auto_arima_Beer)

## Series: recent_production$Beer 
## ARIMA(0,0,4) with non-zero mean 
## 
## Coefficients:
##           ma1      ma2      ma3     ma4      mean
##       -0.2380  -0.3937  -0.2247  0.6434  434.3864
## s.e.   0.1064   0.1218   0.1599  0.1104    2.5155
## 
## sigma^2 = 833.3:  log likelihood = -352.45
## AIC=716.9   AICc=718.16   BIC=730.73
## 
## Training set error measures:
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.8259476 27.87514 22.51016 -0.7114097 5.112844 0.4188737
##                     ACF1
## Training set -0.01507376

checkresiduals(auto_arima_Beer)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,4) with non-zero mean
## Q* = 70.178, df = 6, p-value = 3.758e-13
## 
## Model df: 4.   Total lags used: 10

What do you conclude?

Checking the ACF of the residuals reveals no autocorrelation among them.
The distribution of the residuals appears to follow a normal distribution.
Based on the residual diagnostics, the model appears to be very good, satisfying assumptions of normality and homoscedasticity.

4.

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

In the ACF plot, the blue dotted line indicates the significance level for the correlation coefficient, indicating whether it is significantly different from zero. Normality is considered to be achieved when the correlation coefficient crosses this line. Looking at the ACF plot of the Exports series in the global_economy_aus dataset, we can infer that the time series data is close to a steady state and is predictable.

global_economy

## # A tsibble: 15,150 x 9 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # ℹ 15,140 more rows

# Extract data of interest
global_economy_aus <- global_economy %>%
  filter(Country == "Australia")

# Define and estimate a model
fit <- global_economy_aus %>% model(NAIVE(Exports))

# Look at the residuals
fit %>% gg_tsresiduals() +
  ggtitle("Residual Plots for Australian Exports")

# Look at some forecasts
fit %>% 
  forecast() %>% 
  autoplot(global_economy_aus) +
  ggtitle("Australian Exports Annually")

AR 모형

Autocorrelation is a concept in time series analysis where a time series model predicts future values of a variable by linearly combining its past observed values.
Previous observations influence subsequent observations.

auto_arima_global_economy_aus <- auto.arima(global_economy_aus$Exports, trace=TRUE, test="adf", ic="aicc")

## 
##  ARIMA(2,1,2) with drift         : Inf
##  ARIMA(0,1,0) with drift         : 189.2663
##  ARIMA(1,1,0) with drift         : 185.7032
##  ARIMA(0,1,1) with drift         : 182.1347
##  ARIMA(0,1,0)                    : 187.9099
##  ARIMA(1,1,1) with drift         : 181.6751
##  ARIMA(2,1,1) with drift         : Inf
##  ARIMA(1,1,2) with drift         : Inf
##  ARIMA(0,1,2) with drift         : 182.2434
##  ARIMA(2,1,0) with drift         : 186.0136
##  ARIMA(1,1,1)                    : 184.32
## 
##  Best model: ARIMA(1,1,1) with drift

summary(auto_arima_global_economy_aus)

## Series: global_economy_aus$Exports 
## ARIMA(1,1,1) with drift 
## 
## Coefficients:
##          ar1      ma1   drift
##       0.3931  -0.8459  0.1523
## s.e.  0.2449   0.1790  0.0457
## 
## sigma^2 = 1.27:  log likelihood = -86.45
## AIC=180.91   AICc=181.68   BIC=189.08
## 
## Training set error measures:
##                       ME     RMSE       MAE        MPE     MAPE      MASE
## Training set -0.02521336 1.087279 0.8844992 -0.6367124 5.362766 0.8983888
##                     ACF1
## Training set -0.01851169

checkresiduals(auto_arima_global_economy_aus)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,1) with drift
## Q* = 6.1328, df = 8, p-value = 0.6324
## 
## Model df: 2.   Total lags used: 10

Null hypothesis: There is no autocorrelation among the residuals.
Since it is greater than the significance level of 0.05, it means that the residuals satisfy the assumption of a stationary time series.

aus_production

## # A tsibble: 218 x 7 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
##  1 1956 Q1   284    5225    189    465        3923     5
##  2 1956 Q2   213    5178    204    532        4436     6
##  3 1956 Q3   227    5297    208    561        4806     7
##  4 1956 Q4   308    5681    197    570        4418     6
##  5 1957 Q1   262    5577    187    529        4339     5
##  6 1957 Q2   228    5651    214    604        4811     7
##  7 1957 Q3   236    5317    227    603        5259     7
##  8 1957 Q4   320    6152    222    582        4735     6
##  9 1958 Q1   272    5758    199    554        4608     5
## 10 1958 Q2   233    5641    229    620        5196     7
## # ℹ 208 more rows

# Extract data of interest
bricks_data <- aus_production %>%
  filter(!is.na(Bricks)) %>%
  select(Bricks)

# Define and estimate a model
fit <- bricks_data %>% model(SNAIVE(Bricks))

# Look at the residuals
fit %>% gg_tsresiduals() +
  ggtitle("Residual Plots for Bricks Production")

# Look at some forecasts
fit %>% 
  forecast() %>% 
  autoplot(bricks_data) +
  ggtitle("Bricks Production")

auto_arima_bricks_data <- auto.arima(bricks_data$Bricks, trace=TRUE, test="adf", ic="aicc")

## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,0,2) with non-zero mean : Inf
##  ARIMA(0,0,0) with non-zero mean : 2373.113
##  ARIMA(1,0,0) with non-zero mean : 2018.901
##  ARIMA(0,0,1) with non-zero mean : 2216.674
##  ARIMA(0,0,0) with zero mean     : 2952.723
##  ARIMA(2,0,0) with non-zero mean : 2021.714
##  ARIMA(1,0,1) with non-zero mean : 2016.569
##  ARIMA(2,0,1) with non-zero mean : Inf
##  ARIMA(1,0,2) with non-zero mean : 2044.152
##  ARIMA(0,0,2) with non-zero mean : 2092.951
##  ARIMA(1,0,1) with zero mean     : Inf
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(1,0,1) with non-zero mean : 2023.215
## 
##  Best model: ARIMA(1,0,1) with non-zero mean

summary(auto_arima_global_economy_aus)

## Series: global_economy_aus$Exports 
## ARIMA(1,1,1) with drift 
## 
## Coefficients:
##          ar1      ma1   drift
##       0.3931  -0.8459  0.1523
## s.e.  0.2449   0.1790  0.0457
## 
## sigma^2 = 1.27:  log likelihood = -86.45
## AIC=180.91   AICc=181.68   BIC=189.08
## 
## Training set error measures:
##                       ME     RMSE       MAE        MPE     MAPE      MASE
## Training set -0.02521336 1.087279 0.8844992 -0.6367124 5.362766 0.8983888
##                     ACF1
## Training set -0.01851169

checkresiduals(auto_arima_bricks_data )

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,1) with non-zero mean
## Q* = 301.56, df = 8, p-value < 2.2e-16
## 
## Model df: 2.   Total lags used: 10

Null hypothesis: There is autocorrelation among the residuals.
The residual does not satisfy the assumption of a stationary time series, as the significance level is less than 0.05.

7.

For your retail time series (from Exercise 7 in Section 2.10):

set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

a.

Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

b,

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c.

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train %>%
  model(SNAIVE(Turnover))

d. Check the residuals.

fit %>% gg_tsresiduals()

Do the residuals appear to be uncorrelated and normally distributed?

This indicates that there is initially strong autocorrelation among the residuals, but over time, the autocorrelation weakens and eventually disappears. This suggests that the assumption of a stationary time series is not met. It could imply that the model initially did not explain the data well, but its predictions became more accurate over time. Therefore, in this scenario, the residuals can be considered uncorrelated and to follow a normal distribution..

e.

Produce forecasts for the test data

fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc %>% autoplot(myseries)

f.

Compare the accuracy of your forecasts against the actual values.

fit %>% accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… SNAIV… Trai… 0.439  1.21 0.915  5.23  12.4     1     1 0.768

fc %>% accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Clothin… Test  0.836  1.55  1.24  5.94  9.06  1.36  1.28 0.601

How sensitive are the accuracy measures to the amount of training data used?

Accuracy metrics can be sensitive to the amount of training data used. Generally, using more training data allows for more stable and reliable accuracy measures.

The predictive power of these models is assessed using metrics such as RMSE (smaller is better), MAE (smaller is better), MAPE (smaller is better), MASE (smaller is better), RMSSE (smaller than 1 is better), etc. These metrics are important indicators for evaluation.

By comparing the two tables, it is apparent that the model performs relatively better on the training data but has poorer prediction performance on the test data. This suggests that the model may be overfitting the training data, meaning it is overly tuned to the training data and performs poorly on new data.

Homework 3

Seung Min Song

2024-02-18

1.

a.

b.

c.

d.

e.

2.

a.

b.

c.

d.

3.

4.

7.

a.

b,

c.

d. Check the residuals.

e.

f.