Assignment: Homework Hyndman Chapter 3 & 7 Exercises

Exercises Chapter 3

Exercise 1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy %>%
  tsibble(key = Code, index = Year)%>%
  autoplot(GDP/Population, show.legend =  FALSE) +
  labs(title= "GDP per capita",
       y = "USD")

## Warning: Removed 3242 row(s) containing missing values (geom_path).

maximun_value  <- global_economy %>%
              mutate(sum = GDP/Population)
grouped <- setkey(setDT(maximun_value), Country)[,list(sum=sum(sum)),by=list(Country)]
maximun_value <- grouped[which.max(grouped$sum),]
maximun_value

##       Country     sum
## 1: Luxembourg 2398911

Luxembourgh has the highest GDP per capita considering the sum of whole historic data, now in the last year (2017).

a<-as.data.frame(global_economy)
a$percapita <- a$GDP/a$Population
just_2017<-a[a$Year=="2017",]
just_2017_ordered=just_2017[order(just_2017$percapita, decreasing = TRUE), ]
head(just_2017_ordered)

##                Country Code Year          GDP   Growth       CPI   Imports
## 8404        Luxembourg  LUX 2017  62404461275 2.297941 111.41323 193.96982
## 8462  Macao SAR, China  MAC 2017  50361201096 9.096104 136.09909  32.00998
## 13440      Switzerland  CHE 2017 678887336848 1.086792  98.26686  53.92747
## 10492           Norway  NOR 2017 398831956478 1.918746 114.55072  33.05191
## 6084           Iceland  ISL 2017  23909289979 3.642227 121.95696  42.84878
## 6606           Ireland  IRL 2017 333730764773 7.802382 105.07959  87.87832
##         Exports Population percapita
## 8404  230.01643     599449 104103.04
## 8462   79.39530     622567  80892.82
## 13440  64.97673    8466017  80189.70
## 10492  35.47242    5282223  75504.57
## 6084   46.96383     341284  70056.87
## 6606  120.01253    4813608  69330.69

Now in 2016.

a<-as.data.frame(global_economy)
a$percapita <- a$GDP/a$Population
just_2016<-a[a$Year=="2016",]
just_2016_ordered=just_2016[order(just_2016$percapita, decreasing = TRUE), ]
head(just_2016_ordered)

##                Country Code Year          GDP     Growth      CPI   Imports
## 9505            Monaco  MCO 2016   6468252212  3.2138488       NA        NA
## 8113     Liechtenstein  LIE 2016   6214633651         NA       NA        NA
## 8403        Luxembourg  LUX 2016  58631324559  3.0826433 109.5177 186.16333
## 13439      Switzerland  CHE 2016 668745279605  1.3758845  97.7451  54.58890
## 6663       Isle of Man  IMN 2016   6592627599  7.4000000       NA        NA
## 8461  Macao SAR, China  MAC 2016  45310877913 -0.8631188 134.4500  34.51534
##         Exports Population percapita
## 9505         NA      38499 168010.91
## 8113         NA      37666 164993.19
## 8403  221.26778     582014 100738.68
## 13439  65.81131    8373338  79866.03
## 6663         NA      83737  78730.16
## 8461   76.07221     612167  74017.18

The world leadership in terms of GDP per capita has been a competition between 4 countries,Monaco, Liechtenstein, Luxembourg and Switzerland.

Exercise 2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

-United States GDP from global_economy. -Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. -Victorian Electricity Demand from vic_elec. -Gas production from aus_production.

United States GDP from global_economy

First we plot the GDP without any adjustment, then we adjust by population

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP)+
  labs(title = "United States GDP", y = "USD")+
  theme_replace()+
  geom_line(col = "#8DC4E9")

global_economy %>%
  filter(Country == "United States") %>%
  tsibble(key = Code, index = Year)%>%
  autoplot(GDP/Population, show.legend =  FALSE)+
  labs(title = "United States GDP per capita", y = "USD")+
  theme_replace()+
  geom_line(col = "#8DC4E9")

As we are just graphing US is not important the population adjustment, but we applied it because it lets to compare the value directly with other counstries.

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock

aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria")%>%
  autoplot(Count)+
  labs(title = "Slaughter of Victorian “Bulls, bullocks and steers”")+
  theme_replace()+
  geom_line(col = "#1B89D3")

No adjustment was considered.

Victorian Electricity Demand from vic_elec.

vic_elec %>%
  autoplot(Demand)+
  labs(title = "Victorian Electricity Demand",
       y = "Demand")+
  theme_replace()+
  geom_line(col = "#D63B22")

No adjustment was considered.

Gas production from aus_production.

aus_production %>%
  autoplot(Gas)+
  labs(title = "Gas production")+
  theme_replace()+
  geom_line(col = "#30D622")

Here, as the book says, it will be useful if we apply a transformation because the variation increases with the level of the serie.

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

Here it is being applied the Guerrero Method to select the lambda, which minimises the coefficient of variation for subseries of x. We can see that the transformation made the variation steady along the serie.

Exercise 3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
  autoplot(Volume)+
  labs(title = "Canadian Gas Production",
       y = "Monthly Gas Production (billions of cubic meter)")+
  theme_replace()+
  geom_line(col = "#CE22D6")

The Gas production shows a seasonal variance that is low between 1960 and 1978, then is large between 1978 and 1988 and is low again between 1988 and 2005, so is expected that the Box Cox transformation will be unhelpful. Here the result of the transformation.

lambda_cangas <- canadian_gas %>%
                  features(Volume, features = guerrero) %>%
                  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda = lambda_cangas))+
  labs(title = latex2exp::TeX(paste0(
         "Box Cox Transformation of Gas Production with $\\lambda$ = ",
         round(lambda_cangas,2))))+
  theme_replace()+
  geom_line(col = "#1B89D3")

The graph shows that the transformation was unhelpful.

Exercise 4

What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(569973)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot(Turnover)+
  labs(title = "Australian Retail Trade Turnover")+
  theme_replace()+
  geom_line(col = "#C70039")

The graph shows that the seasonal variation increases with time, so it will be expected that a Box Cox transformation will be helpful in this case. Here the result.

lambda_retail <- myseries %>%
                  features(Turnover, features = guerrero) %>%
                  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda_retail))+
  labs(title = latex2exp::TeX(paste0(
         "Box Cox Transformation of Australian Retail Trade Turnover with $\\lambda$ = ",
         round(lambda_retail,2))))+
  theme_replace()+
  geom_line(col = "#C70039")

Exercise 5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco

aus_production %>%
  autoplot(Tobacco)+
  labs(title = "Tobacco")+
  theme_replace()+
  geom_line(col = "#FFC300")

## Warning: Removed 24 row(s) containing missing values (geom_path).
## Removed 24 row(s) containing missing values (geom_path).

Box Cox transformation

lambda_tobacco <- aus_production %>%
                   features(Tobacco, features = guerrero) %>%
                   pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Tobacco, lambda_tobacco)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Tobacco with $\\lambda$ = ",
         round(lambda_tobacco,2))))+
  theme_replace()+
  geom_line(col = "#FFC300")

## Warning: Removed 24 row(s) containing missing values (geom_path).
## Removed 24 row(s) containing missing values (geom_path).

Economy class passengers

ansett %>%filter(Class == "Economy",Airports == "MEL-SYD")%>%
  mutate(Passengers = Passengers/1000)%>%
  autoplot(Passengers)+
  labs(title = "Ansett Airlines Economy Class Passengers")+
  theme_replace()+
  geom_line(col = "#FFC300")

Box Cox transformation

lambda_class <- ansett %>%
                 filter(Class == "Economy",
                        Airports == "MEL-SYD")%>%
                 features(Passengers, features = guerrero) %>%
                 pull(lambda_guerrero)
ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")%>%
  mutate(Passengers = Passengers/1000) %>%
  autoplot(box_cox(Passengers, lambda = lambda_class)) +
  labs(y = "Passengers ('000)",
       title = latex2exp::TeX(paste0(
         "Transformed Ansett Airlines Economy Class with $\\lambda$ = ",
         round(lambda_class,2))),
       subtitle = "Melbourne-Sydney")+
  theme_replace()+
  geom_line(col = "#FFC300")

Pedestrian counts

pedestrian %>%filter(Sensor == "Southern Cross Station")%>%

  autoplot(Count)+
  labs(title = "Pedestrian Counts at Southern Cross Station")+
  theme_replace()+
  geom_line(col = "#FFC300")

Box Cox Transformation

lambda_count <- pedestrian %>%
                filter(Sensor == "Southern Cross Station") %>%
                 features(Count, features = guerrero) %>%
                 pull(lambda_guerrero)
pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(box_cox(Count,lambda_count))+
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Pedestrian Counts at Southern Cross Station with $\\lambda$ = ",
         round(lambda_count,2))))+
  theme_replace()+
  geom_line(col = "#FFC300")

Exercise 6

Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.

The demonstration is as follows.

Exercise 7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% dplyr::select(Gas)
head(gas)

## # A tsibble: 6 x 2 [1Q]
##     Gas Quarter
##   <dbl>   <qtr>
## 1   221 2005 Q3
## 2   180 2005 Q4
## 3   171 2006 Q1
## 4   224 2006 Q2
## 5   233 2006 Q3
## 6   192 2006 Q4

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>%
  autoplot()+
  labs(title = "Last Five Years of The Gas Data")+
  theme_replace()+
  geom_line(col = "#581845")

## Plot variable not specified, automatically selected `.vars = Gas`

There is a increasing trend and a seasonality of 1 year.

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas %>%
  model(classical_decomposition(Gas,type = "multiplicative")) %>%
  components() %>%
  autoplot() + 
  ggtitle("Last Five Years of The Gas Data")

## Warning: Removed 2 row(s) containing missing values (geom_path).

descom<-gas %>%
  model(classical_decomposition(Gas,type = "multiplicative")) %>%
  components()

a=mean(descom$seasonal)

b =mean(descom$trend,na.rm=TRUE)
  
cat(" The calculated seasonality is ", a, " and the trend is", b)

##  The calculated seasonality is  1  and the trend is 216.3203

c. Do the results support the graphical interpretation from part a?

Yes, with the graphical interpretation we saw a seasonality of 1 the same that we obtained in the numerical decomposition. Also we obtained a positive trend, the same that we saw in the graphical interpretation.

d. Compute and plot the seasonally adjusted data.

gas_decom <- gas %>%
             model(classical_decomposition(Gas,type = "multiplicative")) %>%
             components()
gas_decom %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "",
       title = "Last Five Years of The Gas Data") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 row(s) containing missing values (geom_path).

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas2 <- gas
gas2$Gas[11] <- gas2$Gas[11]+300

gas2 %>%
  model(classical_decomposition(Gas,type = "multiplicative")) %>%
  components() %>%
  autoplot() + 
  ggtitle("Last Five Years of The Gas Data with outlier")

## Warning: Removed 2 row(s) containing missing values (geom_path).

gas2 %>%
  model(classical_decomposition(Gas,type = "multiplicative")) %>%
  components() %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "",
       title = "Last Five Years of The Gas Data with 300 added ") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 row(s) containing missing values (geom_path).

It is observed that the outlier caused a spike in the seasonally adjusted data.The addition of 300 to the 11th observation has not effect in the seasonality of the sere, this is because the seasonal component is uniform for each year and only one data point has changed.

In terms of the trend, it caused a decreasing trend from early 2008 until middle 2008

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas3 <- gas
gas3$Gas[20] <- gas3$Gas[10]+300

gas3 %>%
  model(classical_decomposition(Gas,type = "multiplicative")) %>%
  components() %>%
  autoplot() + 
  ggtitle("Last Five Years of The Gas Data with outlier ")

## Warning: Removed 2 row(s) containing missing values (geom_path).

The seasonal data is not affected, also the trend is more like the graph without any outlier, but it shows an increasing trend at the end of the serie.

Exercise 8.

Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Multiplicative decomposition.

myseries %>%
  model(classical_decomposition(Turnover,type = "multiplicative")) %>%
  components() %>%
  autoplot() + 
  ggtitle("Multiplicative decomposition of my retail time series")

## Warning: Removed 6 row(s) containing missing values (geom_path).

#### X11 decomposition.

Here is the X11 decomposition, the ggfortify package is detached since it is over-writing the autoplot() function.

data<-ts(myseries$Turnover,start=c(1982,4), end=c(2000,12), frequency=12)

#detach(ggfortify)
library(quantmod)

## Warning: package 'quantmod' was built under R version 4.0.5

## Loading required package: xts

## Warning: package 'xts' was built under R version 4.0.5

## Loading required package: zoo

## Warning: package 'zoo' was built under R version 4.0.5

## 
## Attaching package: 'zoo'

## The following object is masked from 'package:tsibble':
## 
##     index

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'xts'

## The following objects are masked from 'package:data.table':
## 
##     first, last

## The following objects are masked from 'package:dplyr':
## 
##     first, last

## Loading required package: TTR

## Warning: package 'TTR' was built under R version 4.0.5

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(forecast)

## Warning: package 'forecast' was built under R version 4.0.5

## Registered S3 methods overwritten by 'forecast':
##   method                 from     
##   autoplot.Arima         ggfortify
##   autoplot.acf           ggfortify
##   autoplot.ar            ggfortify
##   autoplot.bats          ggfortify
##   autoplot.decomposed.ts ggfortify
##   autoplot.ets           ggfortify
##   autoplot.forecast      ggfortify
##   autoplot.stl           ggfortify
##   autoplot.ts            ggfortify
##   fitted.ar              ggfortify
##   fortify.ts             ggfortify
##   residuals.ar           ggfortify

## 
## Attaching package: 'forecast'

## The following object is masked _by_ '.GlobalEnv':
## 
##     gas

library(ggplot2)
library(seasonal)

## Warning: package 'seasonal' was built under R version 4.0.5

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

data %>% seas(x11="") -> fit

autoplot(fit) +
  ggtitle("X11 decomposition of my retail time series")

In terms of time, the multiplicative was less computational expense than the X11 decomposition. On the other hand, the X 11 decomposition trend has captured the sudden fall in the 2000-2010, and also a little change in the seasonality.

Exercise 9.

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Is the recession of 1991/1992 visible in the estimated components?

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

1.The trend component shows that the trend has increased thought the majority of the time. 2. The seasonal component shows that it is an increase employment at the end of the year and a decrease behavior at the beginning. 3. There is a random valley in 1991.

b. Is the recession of 1991/1992 visible in the estimated components?

Yes, we see a decrease in the serie of employment during 1991/1992 that is not explained by seasonality or the positive trend, because it is explained by the random component.

Exercise 10.

This exercise uses the canadian_gas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).

a. Plot the data using autoplot(), gg_subseries() and gg_season() to look at the effect of the changing seasonality over time.

autoplot()

canadian_gas %>%
  autoplot(Volume)+
  labs(title = "Monthly Canadian Gas Production",
       subtitle = "autoplot()",
       y = "billions of cubic meter")+
  theme_replace()+
  geom_line(col = "#C94FC7")

gg_subseries()

canadian_gas %>%
  gg_subseries(Volume)+
  labs(title = "Monthly Canadian Gas Production",
       subtitle = "gg_subseries()",
       y = "billions of cubic meter")

gg_season

canadian_gas %>%
  gg_season(Volume)+
  labs(title = "Monthly Canadian Gas Production",
       subtitle = "gg_season()",
       y = "billions of cubic meter")

The plots show a clear seasonality and a trend, Season plot shows this behavior decreasing between march and June and increasing between June and February.

b. Do an STL decomposition of the data. You will need to choose a seasonal window to allow for the changing shape of the seasonal component.

canadian_gas %>%
  model(
    STL(Volume ~ trend(window = 21) +
                   season(window = 13),
    robust = TRUE)) %>%
  components() %>%
  autoplot()+
  labs(title = "STL decomposition of Canadian Gas Production")

The trend represent the the trend of the original data. Between 1975 and 1985 the seasonal component increases and then decreases.

c. How does the seasonal shape change over time? [Hint: Try plotting the seasonal component using gg_season().]

As shown above, the seasonal shape is flat from beginning and then as the time goes by the seasonal shape increases. Between 1980 and 1990 it increases, and then decrease but changing the timing.

d.Can you produce a plausible seasonally adjusted series?

canadian_gas %>%
 model(
    STL(Volume ~ trend(window = 21) +
                   season(window = 13),
    robust = TRUE)) %>%
  components() %>%
  ggplot(aes(x = Month)) +
  geom_line(aes(y = Volume, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(title = "STL decomposition of Canadian Gas Production") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

e.Compare the results with those obtained using SEATS and X-11. How are they different?

canadian_gas %>%
  model(x11 = X_13ARIMA_SEATS(Volume ~ x11())) %>%
  components() %>%
  autoplot()+
  labs(title = "X-11 decomposition of Canadian Gas Production")

canadian_gas %>%
  model(seats = X_13ARIMA_SEATS(Volume ~ seats())) %>%
  components() %>%
autoplot() +
  labs(title ="SEATS Decomposition of Canadian Gas Production")

The decomposed trend and seasonal components are similar. The random component of SEAT is greater than of the X11.

Exercises Chapter 7

Exercise 1

Half-hourly electricity demand for Victoria, Australia is contained in vic_elec. Extract the January 2014 electricity demand, and aggregate this data to daily with daily total demands and maximum temperatures.

jan14_vic_elec <- vic_elec %>%
  filter(yearmonth(Time) == yearmonth("2014 Jan")) %>%
  index_by(Date = as_date(Time)) %>%
  summarise(
    Demand = sum(Demand),
    Temperature = max(Temperature)
  )

a.Plot the data and find the regression model for Demand with temperature as a predictor variable. Why is there a positive relationship

jan14_vic_elec %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) + 
  ggtitle("Electricity Demand for Victoria, Australia (2014)") +
  ylab("Demand") + 
  xlab("Temperature (Celsius)") + 
  geom_point() + 
  geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

jan14_vic_elec %>%
  model(TSLM(Demand~Temperature)) %>%
  report()

## Series: Demand 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -49978.2 -10218.9   -121.3  18533.2  35440.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  59083.9    17424.8   3.391  0.00203 ** 
## Temperature   6154.3      601.3  10.235 3.89e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24540 on 29 degrees of freedom
## Multiple R-squared: 0.7832,  Adjusted R-squared: 0.7757
## F-statistic: 104.7 on 1 and 29 DF, p-value: 3.8897e-11

The report displays a positive coefficient for temperature, this means that Demand demand grows if temperature grows, therefore there is a positive relationship. This happens because when you have high temperature people tend to use more air conditioners and are ofter more in home.

b.Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?.

jan14_vic_elec %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
    ylab("Electricity Demand") +
    xlab("Temperature") +
    geom_point() +
    geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

tslm_Dem_Temp<-jan14_vic_elec %>%
  model(TSLM(Demand~Temperature)) %>%
  report()

## Series: Demand 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -49978.2 -10218.9   -121.3  18533.2  35440.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  59083.9    17424.8   3.391  0.00203 ** 
## Temperature   6154.3      601.3  10.235 3.89e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24540 on 29 degrees of freedom
## Multiple R-squared: 0.7832,  Adjusted R-squared: 0.7757
## F-statistic: 104.7 on 1 and 29 DF, p-value: 3.8897e-11

tslm_Dem_Temp %>% gg_tsresiduals()

The variability of the residual increases as time goes by. Also, The majority of residual are around 0, but also is shown some outliers with values of -30000 and 30000.

c.Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was 15C and compare it with the forecast if the with maximum temperature was 35C. Do you believe these forecasts? The following R code will get you started:

#scenarios creation.
new_cons <- scenarios(
  "Temperature of 15C" = new_data(jan14_vic_elec, 1) %>%
    mutate(Temperature = 15),
  "Temperature of 35C" = new_data(jan14_vic_elec, 1) %>%
    mutate(Temperature = 35),
  names_to = "Scenario"
)
#forescast.
fcast <- forecast(tslm_Dem_Temp, new_cons)
fcast$.mean

## [1] 151398.4 274484.2

print("Demand")

## [1] "Demand"

print(mean(jan14_vic_elec$Demand))

## [1] 231622.6

print("Temperature")

## [1] "Temperature"

print(mean(jan14_vic_elec$Temperature))

## [1] 28.03548

Here we apply a graphic approach.

jan14_vic_elec %>%
  model(TSLM(Demand ~ Temperature)) %>%
  forecast(
    new_data(jan14_vic_elec, 1) %>%
      mutate(Temperature = 15)
  ) %>%
  autoplot(jan14_vic_elec)

jan14_vic_elec %>%
  model(TSLM(Demand ~ Temperature)) %>%
  forecast(
    new_data(jan14_vic_elec, 1) %>%
      mutate(Temperature = 35)
  ) %>%
  autoplot(jan14_vic_elec)

I trust the forecast of the demand with a temperature of 35, but I do not trust the forecast of the demand with a temperature of 15. In the image you can see that the demands less than 20000 are outliers.

plot(jan14_vic_elec$Temperature)

And plotting the temperature we can see that a temperature of 15C is an outlier, thus I partially believe in the model.

d. Give prediction intervals for your forecasts.

#scenarios creation.
fit <- lm(Demand ~ Temperature, data=jan14_vic_elec)

forecast(fit, newdata=data.frame(Temperature=c(15,35)))

##           Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 151398.35       151398.4 117127.2 185669.5  97951.22 204845.5
## 274484.25       274484.2 241333.0 307635.5 222783.69 326184.8

e. Plot Demand vs Temperature for all of the available data in vic_elec aggregated to daily total demand and maximum temperature. What does this say about your model?

#scenarios creation.
plot(Demand~Temperature, data=vic_elec, main= "Demand vs. Temperature for All Data in vic_elec")

There is a clear relationship between Temperature and Demand. More temperature more demand from 20C, before the relationship seems neutral or a little bit negative.

Exercise 2

Data set olympic_running contains the winning times (in seconds) in each Olympic Games sprint, middle-distance and long-distance track events from 1896 to 2016.

a. Plot the winning time against the year for each event. Describe the main features of the plot

autoplot(olympic_running)

## Plot variable not specified, automatically selected `.vars = Time`

How is expected the winning time is decreasing as the time goes. Also, between some year there is not data so we can assume that it was for world wars.

b. Fit a regression line to the data for each event. Obviously the winning times have been decreasing, but at what average rate per year?

Here we are going to plot 3 events. This because the events have the same behaviour.

mens=olympic_running[olympic_running$Sex=="men",]
women2=olympic_running[olympic_running$Sex=="women",]

mens100=mens[mens$Length=="100",]
women100=women2[women2$Length=="100",]
mens800=mens[mens$Length=="800",]

Mens100

mens100<-ts(mens100$Time,start=c(1896), end=c(2015), frequency=1)
#Interpolate missing values (No transformation is required)
df <- na.interp(mens100, lambda = NULL)
t <- time(df)

fit.mens100 <- tslm(df~t, data = mens100)

autoplot(mens100, series = "Data") + 
  autolayer(fitted(fit.mens100), series = "Fitted") + ggtitle({"Winning Times in Olympic Men's 100m Track Final (1896-2016)"}) + xlab("Time") + ylab("Seconds") + scale_color_manual(values = c("black", "red"))

Mens800

mens800<-ts(mens800$Time,start=c(1896), end=c(2015), frequency=1)
#Interpolate missing values (No transformation is required)
df <- na.interp(mens800, lambda = NULL)
t <- time(df)

fit.mens800 <- tslm(df~t, data = mens800)

autoplot(mens800, series = "Data") + 
  autolayer(fitted(fit.mens800), series = "Fitted") + ggtitle({"Winning Times in Olympic Men's 800m Track Final (1896-2016)"}) + xlab("Time") + ylab("Seconds") + scale_color_manual(values = c("black", "red"))

Women100

Women100<-ts(women100$Time,start=c(1896), end=c(2015), frequency=1)
#Interpolate missing values (No transformation is required)
df <- na.interp(Women100, lambda = NULL)
t <- time(df)

fit.women100 <- tslm(df~t, data = women100)

autoplot(women100, series = "Data") + 
  autolayer(fitted(fit.women100), series = "Fitted") + ggtitle({"Winning Times in Olympic Women 100m Track Final (1896-2016)"}) + xlab("Time") + ylab("Seconds") + scale_color_manual(values = c("black", "red"))

## Plot variable not specified, automatically selected `.vars = Time`

## Warning: Ignoring unknown parameters: series

This three events show a negative slope of the linear model, so this demonstrate that in each year the winning time is decreasing.

c.Plot the residuals against the year. What does this indicate about the suitability of the fitted lines?

Here we are going to see the residual plots of the three events that were used previously.

Mens100

fit <- olympic_running %>%
  filter(Sex=="men") %>%
  filter(Length=="100") %>%
  model(TSLM(Time ~ Year))
fit %>% gg_tsresiduals()

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing non-finite values (stat_bin).

Mens800

fit <- olympic_running %>%
  filter(Sex=="men") %>%
  filter(Length=="800") %>%
  model(TSLM(Time ~ Year))
fit %>% gg_tsresiduals()

## Warning: Removed 4 rows containing missing values (geom_point).

## Warning: Removed 4 rows containing non-finite values (stat_bin).

Women100

fit <- olympic_running %>%
  filter(Sex=="women") %>%
  filter(Length=="100") %>%
  model(TSLM(Time ~ Year))
fit %>% gg_tsresiduals()

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing non-finite values (stat_bin).

d.Predict the winning time for each race in the 2020 Olympics. Give a prediction interval for your forecasts. What assumptions have you made in these calculations?

Mens100

olympic_running %>%
  filter(Sex=="men") %>%
  filter(Length=="100") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(
    new_data(olympic_running, 1) %>%
      mutate(Year = 2020)
  ) %>%
  autoplot(olympic_running)

women100

olympic_running %>%
  filter(Sex=="women") %>%
  filter(Length=="100") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(
    new_data(olympic_running, 1) %>%
      mutate(Year = 2020)
  ) %>%
  autoplot(olympic_running)

Men800

olympic_running %>%
  filter(Sex=="men") %>%
  filter(Length=="800") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(
    new_data(olympic_running, 1) %>%
      mutate(Year = 2020)
  ) %>%
  autoplot(olympic_running)

The result shows that the forecasted winning time will less for for each event in 2020.the assumption is that whole events will have the same beahivour orrelatively similar.

Exercise 3.

An elasticity coefficient is the ratio of the percentage change in the forecast variable (y) to the percentage change in the predictor variable (x).Mathematically, the elasticity is defined as (dy/dx) X (x/y). Consider the log-log model.

Express y as a function of x and show that coefficient B1 is the elasticity coefficient.

### Exercise 4.

The data set souvenirs concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

####a.Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.

autoplot(souvenirs)

## Plot variable not specified, automatically selected `.vars = Sales`

There is a clear trend and seasonality, in terms of unexpected fluctuations, there is a fluctuation between 1993 and 1994 that seems to be unusual.

b.Explain why it is necessary to take logarithms of these data before fitting a model.

Variability increases with time, so applying a logarithmic transformation can give good results trying to control this variation and normalizing it

c.Fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.

souvenirs2<- ts(souvenirs$Sales,start=c(1987,1), end=c(1993,6), frequency=12)
log.souvenirs = log(souvenirs2)
dummy.fest = rep(0, length(souvenirs2))
dummy.fest[seq_along(dummy.fest)%%12 == 3] = 1
dummy.fest[3] = 0
dummy.fest = ts(dummy.fest, freq = 12, start=c(1987,1))
new.data = data.frame(log.souvenirs, dummy.fest)

fit = tslm(log.souvenirs ~ trend + season + dummy.fest, data=new.data)

fore.data = data.frame(dummy.fest = rep(0, 12))
fore.data[3,] = 1
forecast(fit, newdata=fore.data)

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jul 1993       9.883136  9.637329 10.128943  9.503919 10.262353
## Aug 1993       9.867772  9.621965 10.113579  9.488555 10.246989
## Sep 1993      10.531942 10.191069 10.872815 10.006062 11.057822
## Oct 1993      10.092605  9.846798 10.338412  9.713388 10.471822
## Nov 1993      10.585189 10.339382 10.830995 10.205971 10.964406
## Dec 1993      11.357556 11.111749 11.603363 10.978339 11.736773
## Jan 1994       9.430933  9.186093  9.675774  9.053207  9.808659
## Feb 1994       9.704370  9.459530  9.949210  9.326644 10.082096
## Mar 1994       9.695742  9.365756 10.025727  9.186658 10.204825
## Apr 1994       9.881046  9.636206 10.125887  9.503320 10.258772
## May 1994       9.928500  9.683659 10.173340  9.550774 10.306226
## Jun 1994       9.989861  9.745020 10.234701  9.612135 10.367587

d.Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?

autoplot(fit$residuals)

The plot shows that the residuals are around 0, therefore the logarithmic transformation was a useful transformation.

e.Do boxplots of the residuals for each month. Does this reveal any problems with the model?

boxplot(residuals(fit)~cycle(residuals(fit)))

Yes, some months have more residual variance so the model is not fit to certain months, for example month 8 and 10.

f.What do the values of the coefficients tell you about each variable?

fit$coefficients

## (Intercept)       trend     season2     season3     season4     season5 
##   7.6662400   0.0207611   0.2526756   0.2232860   0.3878297   0.4145219 
##     season6     season7     season8     season9    season10    season11 
##   0.4551219   0.5767690   0.5406444   0.6296713   0.7239552   1.1957774 
##    season12  dummy.fest 
##   1.9473841   0.5543818

There is a positive relationship of all season and there are significant jumps in season 11 and 12.

g.What does the Ljung-Box test tell you about your model?

checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals from Linear regression model
## LM test = 31.535, df = 17, p-value = 0.01718

There is a p-value lower than 0.05 , which implies that the data can be modeled with a logarithm.

h.Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.

fore.data = data.frame(dummy.fest = rep(0, 36))
preds = forecast(fit, newdata=fore.data)
preds

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jul 1993       9.883136  9.637329 10.128943  9.503919 10.262353
## Aug 1993       9.867772  9.621965 10.113579  9.488555 10.246989
## Sep 1993       9.977560  9.731753 10.223367  9.598343 10.356777
## Oct 1993      10.092605  9.846798 10.338412  9.713388 10.471822
## Nov 1993      10.585189 10.339382 10.830995 10.205971 10.964406
## Dec 1993      11.357556 11.111749 11.603363 10.978339 11.736773
## Jan 1994       9.430933  9.186093  9.675774  9.053207  9.808659
## Feb 1994       9.704370  9.459530  9.949210  9.326644 10.082096
## Mar 1994       9.695742  9.365756 10.025727  9.186658 10.204825
## Apr 1994       9.881046  9.636206 10.125887  9.503320 10.258772
## May 1994       9.928500  9.683659 10.173340  9.550774 10.306226
## Jun 1994       9.989861  9.745020 10.234701  9.612135 10.367587
## Jul 1994      10.132269  9.883394 10.381144  9.748319 10.516219
## Aug 1994      10.116905  9.868031 10.365780  9.732955 10.500856
## Sep 1994      10.226693  9.977819 10.475568  9.842743 10.610644
## Oct 1994      10.341738 10.092864 10.590613  9.957788 10.725689
## Nov 1994      10.834322 10.585447 11.083197 10.450372 11.218272
## Dec 1994      11.606690 11.357815 11.855564 11.222739 11.990640
## Jan 1995       9.680067  9.431764  9.928369  9.296999 10.063134
## Feb 1995       9.953503  9.705201 10.201806  9.570436 10.336570
## Mar 1995       9.944875  9.610605 10.279144  9.429182 10.460567
## Apr 1995      10.130180  9.881877 10.378482  9.747112 10.513247
## May 1995      10.177633  9.929330 10.425935  9.794566 10.560700
## Jun 1995      10.238994  9.990691 10.487296  9.855927 10.622061
## Jul 1995      10.381402 10.128745 10.634059  9.991617 10.771188
## Aug 1995      10.366039 10.113381 10.618696  9.976253 10.755824
## Sep 1995      10.475827 10.223169 10.728484 10.086041 10.865612
## Oct 1995      10.590872 10.338214 10.843529 10.201086 10.980657
## Nov 1995      11.083455 10.830798 11.336112 10.693669 11.473240
## Dec 1995      11.855823 11.603165 12.108480 11.466037 12.245608
## Jan 1996       9.929200  9.676730 10.181669  9.539704 10.318696
## Feb 1996      10.202636  9.950167 10.455106  9.813141 10.592132
## Mar 1996      10.194008  9.854949 10.533067  9.670926 10.717090
## Apr 1996      10.379313 10.126843 10.631782  9.989817 10.768809
## May 1996      10.426766 10.174296 10.679236 10.037270 10.816262
## Jun 1996      10.488127 10.235658 10.740597 10.098631 10.877623

i.How could you improve these predictions by modifying the model?

As was presented, doing a logarithmic transformation. Because twe have to create a model that has the capacity to increase the variability over the time.

Exercise 5

The us_gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day”. Consider only the data to the end of 2004.

autoplot(us_gasoline)

## Plot variable not specified, automatically selected `.vars = Barrels`

head(us_gasoline)

## # A tsibble: 6 x 2 [1W]
##       Week Barrels
##     <week>   <dbl>
## 1 1991 W06    6.62
## 2 1991 W07    6.43
## 3 1991 W08    6.58
## 4 1991 W09    7.22
## 5 1991 W10    6.88
## 6 1991 W11    6.95

gas <- ts(us_gasoline$Barrels, start=(1991-1), end=(2017-1),frequency = 52)

a. Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.

#y <- ts(z, start=2003, frequency=12)
gas <- window(gas, end = 2005)
for(num in c(1, 2, 3, 5, 10)){
  var_name <- paste("fit",
                    as.character(num),
                    "_gasoline_2004",
                    sep = "")
  
  assign(var_name,
         tslm(gas ~ trend + fourier(
           gas,
           K = num
         ))
  )
  
  print(
    autoplot(gas) +
      autolayer(get(var_name)$fitted.values,
                series = as.character(num)) +
      ggtitle(var_name) +
      ylab("gasoline") +
      guides(colour = guide_legend(title = "Number of Fourier Transform pairs"))
  )
}

Using higher K values in the fourier transformationthe result displays that the lines are more fitted.

b.Select the appropriate number of Fourier terms to include by minimising the AICc or CV value

As the book says we have to minimize AICc and CV so we will calculate it.

for(i in c(1, 2, 3, 5, 10)){
  fit_gasoline_2004.name <- paste(
    "fit", as.character(i), "_gasoline_2004",
    sep = ""
  )
  writeLines(
    paste(
      "\n", fit_gasoline_2004.name, "\n"
    )
  )
  print(CV(get(fit_gasoline_2004.name)))
}

## 
##  fit1_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  8.012058e-02 -1.969129e+03 -1.969052e+03 -1.945827e+03  8.427750e-01 
## 
##  fit2_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.919464e-02 -1.978217e+03 -1.978072e+03 -1.945593e+03  8.449887e-01 
## 
##  fit3_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.470266e-02 -2.023838e+03 -2.023604e+03 -1.981892e+03  8.541546e-01 
## 
##  fit5_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.365185e-02 -2.034993e+03 -2.034518e+03 -1.974405e+03  8.569479e-01 
## 
##  fit10_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.173572e-02 -2.056054e+03 -2.054595e+03 -1.948861e+03  8.624864e-01

Fourier K value of 10 minimize CV and AICc.

c. Plot the residuals of the final model using the gg_tsresiduals() function and comment on these. Use a Ljung-Box test to check for residual autocorrelation.

checkresiduals(fit10_gasoline_2004)

## 
##  Breusch-Godfrey test for serial correlation of order up to 104
## 
## data:  Residuals from Linear regression model
## LM test = 128.54, df = 104, p-value = 0.05168

d.Generate forecasts for the next year of data and plot these along with the actual data for 2005. Comment on the forecasts

Using the best model we forecast the next data

fc_gasoline_2005 <- forecast(
  fit10_gasoline_2004,
  newdata=data.frame(fourier(
    gas, K = 10, h = 52)
  )
)
fc_gasoline_2005

##          Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 2005.019       8.901276 8.557367  9.245186 8.374931  9.427622
## 2005.038       8.940132 8.596126  9.284138 8.413638  9.466626
## 2005.058       8.982049 8.638038  9.326060 8.455548  9.508551
## 2005.077       9.056949 8.712952  9.400946 8.530469  9.583429
## 2005.096       9.143925 8.799930  9.487920 8.617448  9.670401
## 2005.115       9.192581 8.848582  9.536579 8.666099  9.719062
## 2005.135       9.185356 8.841349  9.529363 8.658862  9.711851
## 2005.154       9.160913 8.816906  9.504919 8.634419  9.687406
## 2005.173       9.169559 8.825559  9.513560 8.643075  9.696044
## 2005.192       9.216866 8.872868  9.560865 8.690384  9.743348
## 2005.212       9.264222 8.920221  9.608223 8.737736  9.790708
## 2005.231       9.281951 8.937946  9.625957 8.755458  9.808444
## 2005.250       9.285826 8.941821  9.629831 8.759334  9.812318
## 2005.269       9.312227 8.968226  9.656228 8.785741  9.838713
## 2005.288       9.367159 9.023159  9.711159 8.840675  9.893643
## 2005.308       9.416957 9.072955  9.760958 8.890470  9.943444
## 2005.327       9.433266 9.089261  9.777272 8.906774  9.959758
## 2005.346       9.433725 9.089720  9.777729 8.907234  9.960216
## 2005.365       9.463951 9.119950  9.807952 8.937465  9.990437
## 2005.385       9.540677 9.196677  9.884678 9.014193 10.067162
## 2005.404       9.625983 9.281981  9.969985 9.099496 10.152471
## 2005.423       9.665636 9.321631 10.009640 9.139144 10.192127
## 2005.442       9.647059 9.303055  9.991063 9.120569 10.173550
## 2005.462       9.610131 9.266130  9.954132 9.083645 10.136617
## 2005.481       9.601794 9.257794  9.945795 9.075310 10.128279
## 2005.500       9.629756 9.285754  9.973759 9.103268 10.156244
## 2005.519       9.664433 9.320428 10.008438 9.137941 10.190924
## 2005.538       9.676302 9.332298 10.020306 9.149812 10.202792
## 2005.558       9.658713 9.314712 10.002714 9.132227 10.185199
## 2005.577       9.615773 9.271772  9.959773 9.089288 10.142258
## 2005.596       9.544821 9.200818  9.888823 9.018333 10.071309
## 2005.615       9.445303 9.101298  9.789309 8.918811  9.971795
## 2005.635       9.341433 8.997429  9.685437 8.814943  9.867923
## 2005.654       9.279412 8.935411  9.623412 8.752926  9.805897
## 2005.673       9.290336 8.946336  9.634336 8.763851  9.816820
## 2005.692       9.357646 9.013643  9.701648 8.831157  9.884134
## 2005.712       9.428370 9.084364  9.772376 8.901877  9.954863
## 2005.731       9.457460 9.113457  9.801464 8.930971  9.983950
## 2005.750       9.437719 9.093720  9.781719 8.911235  9.964203
## 2005.769       9.389996 9.045997  9.733996 8.863513  9.916480
## 2005.788       9.337513 8.993510  9.681516 8.811024  9.864001
## 2005.808       9.298876 8.954869  9.642883 8.772382  9.825371
## 2005.827       9.296233 8.952230  9.640237 8.769744  9.822723
## 2005.846       9.346559 9.002561  9.690557 8.820078  9.873040
## 2005.865       9.430551 9.086553  9.774549 8.904070  9.957032
## 2005.885       9.480414 9.136411  9.824418 8.953925 10.006904
## 2005.904       9.423219 9.079208  9.767229 8.896719  9.949719
## 2005.923       9.251762 8.907762  9.595762 8.725278  9.778246
## 2005.942       9.047458 8.703470  9.391446 8.520993  9.573924
## 2005.962       8.918927 8.574940  9.262915 8.392462  9.445392
## 2005.981       8.911344 8.567381  9.255307 8.384917  9.437771
## 2006.000       8.978870 8.634891  9.322849 8.452418  9.505323

We see that the gasolin is increasing but we are not considering that periods that gasoline is affected by geopolitical problems.

Exercise 6

The annual population of Afghanistan is available in the global_economy data set.

a. Plot the data and comment on its features. Can you observe the effect of the Soviet-Afghan war?

global_economy %>%
  filter(Country=="Afghanistan") %>%
  tsibble(key = Code, index = Year)%>%
  autoplot(Population, show.legend =  FALSE) +
  labs(title= "Afghanistan Population",
       y = "Population")

There is a valley produced by the Soviet-Afghan war in the total population in Afghanistan.

b. Fit a linear trend model and compare this to a piecewise linear trend model with knots at 1980 and 1989.

global_economy %>%
  filter(Country=="Afghanistan")%>%
  model(TSLM(Population ~ Year)) %>%
  report()

## Series: Population 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -5794518 -2582559   744761  2259222  6036280 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -829292529   49730866  -16.68   <2e-16 ***
## Year            425774      25008   17.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3188000 on 56 degrees of freedom
## Multiple R-squared: 0.8381,  Adjusted R-squared: 0.8352
## F-statistic: 289.9 on 1 and 56 DF, p-value: < 2.22e-16

global_economy %>%
  filter(Country=="Afghanistan")%>%
  filter(Year<1980)%>%
  model(TSLM(Population ~ Year)) %>%
  report()

## Series: Population 
## Model: TSLM 
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -146380.8 -110290.6    -451.2  105877.8  202881.3 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -470734527    8787062  -53.57   <2e-16 ***
## Year            244657       4462   54.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 115100 on 18 degrees of freedom
## Multiple R-squared: 0.994,   Adjusted R-squared: 0.9937
## F-statistic:  3007 on 1 and 18 DF, p-value: < 2.22e-16

global_economy %>%
  filter(Country=="Afghanistan")%>%
  filter(Year>1989)%>%
  model(TSLM(Population ~ Year)) %>%
  report()

## Series: Population 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -619234 -212927    6598  234280  612277 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.670e+09  1.640e+07  -101.8   <2e-16 ***
## Year         8.451e+05  8.184e+03   103.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 349800 on 26 degrees of freedom
## Multiple R-squared: 0.9976,  Adjusted R-squared: 0.9975
## F-statistic: 1.066e+04 on 1 and 26 DF, p-value: < 2.22e-16

The reports display that the coefficient of the second model is higher.

c. Generate forecasts from these two models for the five years after the end of the data, and comment on the results.

m1<-global_economy %>%
  filter(Country=="Afghanistan")%>%
  model(TSLM(Population ~ Year))

m2<-global_economy %>%
  filter(Country=="Afghanistan")%>%
  filter(Year>1989)%>%
  model(TSLM(Population ~ Year))

forecast(m1, h=5)

## # A fable: 5 x 5 [1Y]
## # Key:     Country, .model [1]
##   Country     .model                   Year          Population     .mean
##   <fct>       <chr>                   <dbl>              <dist>     <dbl>
## 1 Afghanistan TSLM(Population ~ Year)  2018   N(3e+07, 1.1e+13) 29919575.
## 2 Afghanistan TSLM(Population ~ Year)  2019   N(3e+07, 1.1e+13) 30345349.
## 3 Afghanistan TSLM(Population ~ Year)  2020 N(3.1e+07, 1.1e+13) 30771123.
## 4 Afghanistan TSLM(Population ~ Year)  2021 N(3.1e+07, 1.1e+13) 31196897.
## 5 Afghanistan TSLM(Population ~ Year)  2022 N(3.2e+07, 1.1e+13) 31622671.

forecast(m2, h=5)

## # A fable: 5 x 5 [1Y]
## # Key:     Country, .model [1]
##   Country     .model                   Year          Population     .mean
##   <fct>       <chr>                   <dbl>              <dist>     <dbl>
## 1 Afghanistan TSLM(Population ~ Year)  2018 N(3.6e+07, 1.4e+11) 35925602.
## 2 Afghanistan TSLM(Population ~ Year)  2019 N(3.7e+07, 1.4e+11) 36770747.
## 3 Afghanistan TSLM(Population ~ Year)  2020 N(3.8e+07, 1.4e+11) 37615892.
## 4 Afghanistan TSLM(Population ~ Year)  2021 N(3.8e+07, 1.5e+11) 38461037.
## 5 Afghanistan TSLM(Population ~ Year)  2022 N(3.9e+07, 1.5e+11) 39306182.

With the second model, where we did not consider this data that was part of the valley caused by the Soviet-Afghan war, we saw that the mean forecast population is higher. Therefore we can conclude that ignoring the data of certain events can improve our model.

Assignment: Homework Hyndman Chapter 3 & 7 Exercises

Miguel Angel Méndez Hernández

27/3/2022

Exercises Chapter 3

Exercise 1

Exercise 2

United States GDP from global_economy

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock

Victorian Electricity Demand from vic_elec.

Gas production from aus_production.

Exercise 3

Exercise 4

Exercise 5

Tobacco

Economy class passengers

Pedestrian counts

Exercise 6

Exercise 7

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

b. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

c. Do the results support the graphical interpretation from part a?

d. Compute and plot the seasonally adjusted data.

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Exercise 8.

Multiplicative decomposition.

Exercise 9.

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

b. Is the recession of 1991/1992 visible in the estimated components?

Exercise 10.

a. Plot the data using autoplot(), gg_subseries() and gg_season() to look at the effect of the changing seasonality over time.

b. Do an STL decomposition of the data. You will need to choose a seasonal window to allow for the changing shape of the seasonal component.

c. How does the seasonal shape change over time? [Hint: Try plotting the seasonal component using gg_season().]

d.Can you produce a plausible seasonally adjusted series?

e.Compare the results with those obtained using SEATS and X-11. How are they different?

Exercises Chapter 7

Exercise 1

a.Plot the data and find the regression model for Demand with temperature as a predictor variable. Why is there a positive relationship

b.Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?.

c.Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was 15C and compare it with the forecast if the with maximum temperature was 35C. Do you believe these forecasts? The following R code will get you started:

d. Give prediction intervals for your forecasts.

e. Plot Demand vs Temperature for all of the available data in vic_elec aggregated to daily total demand and maximum temperature. What does this say about your model?

Exercise 2

a. Plot the winning time against the year for each event. Describe the main features of the plot

b. Fit a regression line to the data for each event. Obviously the winning times have been decreasing, but at what average rate per year?

c.Plot the residuals against the year. What does this indicate about the suitability of the fitted lines?

d.Predict the winning time for each race in the 2020 Olympics. Give a prediction interval for your forecasts. What assumptions have you made in these calculations?

Exercise 3.

b.Explain why it is necessary to take logarithms of these data before fitting a model.

c.Fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.

d.Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?

e.Do boxplots of the residuals for each month. Does this reveal any problems with the model?

f.What do the values of the coefficients tell you about each variable?

g.What does the Ljung-Box test tell you about your model?

h.Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.

i.How could you improve these predictions by modifying the model?

Exercise 5

a. Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.

b.Select the appropriate number of Fourier terms to include by minimising the AICc or CV value

c. Plot the residuals of the final model using the gg_tsresiduals() function and comment on these. Use a Ljung-Box test to check for residual autocorrelation.

d.Generate forecasts for the next year of data and plot these along with the actual data for 2005. Comment on the forecasts

Exercise 6

a. Plot the data and comment on its features. Can you observe the effect of the Soviet-Afghan war?

b. Fit a linear trend model and compare this to a piecewise linear trend model with knots at 1980 and 1989.

c. Generate forecasts from these two models for the five years after the end of the data, and comment on the results.