When I try to knit it, it always says it can't find the package, even though the code runs fine in R, so I think it's better for me to submit the .Rmd file. Sorry for the inconvenience. Although this week's assignment really drove me crazy, I tried my best and never gave up.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(fable)
## Loading required package: fabletools
library(tsibble)
## 
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
library(feasts)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:tsibble':
## 
##     interval
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(tsibbledata)
library(fpp3)
library(ggplot2)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

##Chapter 3 Exercise

##Q1:

#Plot the GDP per capita for each country over time
global_economy %>%
  autoplot(GDP/Population, show.legend =  FALSE) +
  labs(title= "GDP per capita of each country", y = "$US")
## Warning: Removed 3242 row(s) containing missing values (geom_path).

#Which country has the highest GDP per capita?
global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
  select(Country, GDP_per_capita)
## # A tsibble: 1 x 3 [1Y]
## # Key:       Country [1]
##   Country GDP_per_capita  Year
##   <fct>            <dbl> <dbl>
## 1 Monaco         185153.  2014
#How has Monaco's GDP per capita changed over time?
global_economy %>%
  filter(Country == "Monaco") %>%
  autoplot(GDP/Population) +
  labs(title= "Monaco's GDP per capita", y = "$US")
## Warning: Removed 11 row(s) containing missing values (geom_path).

Monaco's GDP per capita has followed an oscillating upward trend, with the fastest growth in the decade between 2000 and 2010 and a slight decline around 2008 due to the financial crisis.

##Q2:

#United States GDP from global_economy.
global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP / 10 ^ 12) +
  labs(title= "GDP, United States", y = "$US (in trillions)") 

I expressed GDP in trillions to make the plot more readable, but there are no logarithm or power transformations here. I don't think we need a population adjustment in this case, because the U.S. population is neither exploding nor shrinking, so population changes have very little impact on the shape of the series.
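
If a per-capita adjustment were wanted anyway, a minimal sketch (reusing the same global_economy columns as in Q1) would simply divide GDP by Population before plotting:

#US GDP per capita, for comparison with the raw GDP plot above
global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP / Population) +
  labs(title = "GDP per capita, United States", y = "$US")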

#Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers",
         State == "Victoria") %>%
  autoplot(Count) +
  labs(title= "Slaughter of Victoria Bulls, Bullocks, and Steers") 

There are no transformations here. The plot shows an overall oscillating downward trend, with the greatest decline starting around January 1980, probably due to animal disease or a shift in people's eating habits and religious beliefs.

#Victorian Electricity Demand from vic_elec.
Victorian <- vic_elec %>%
  group_by(Date) %>%
  mutate(Demand = sum(Demand)) %>%
  distinct(Date, Demand)

Victorian %>% 
  as_tsibble(index = Date) %>%
  autoplot(Demand) +
  labs(title= "Daily Victorian Electricity Demand", y = "$US (in trillions)") 

Victorian %>%
  mutate(Date = yearmonth(Date)) %>%
  group_by(Date) %>%
  summarise(Demand = sum(Demand)) %>%
  as_tsibble(index = Date) %>%
  autoplot(Demand) +
  labs(title= "Monthly Victorian Electricity Demand", y = "$US (in trillions)")

I aggregated the half-hourly electricity demand to daily and monthly totals to better observe the seasonal pattern. The plots show very clear seasonality, with a significant increase in electricity demand during the winter and summer months.
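
A more compact alternative (just a sketch of the same daily aggregation) is to re-index the half-hourly tsibble directly with index_by(), which avoids the separate mutate()/distinct() step:

#Aggregate half-hourly demand to daily totals in one step
vic_elec %>%
  index_by(Date = as_date(Time)) %>%
  summarise(Demand = sum(Demand)) %>%
  autoplot(Demand) +
  labs(title = "Daily Victorian Electricity Demand", y = "Demand (MWh)")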

#Gas production from aus_production.
aus_production %>%
  autoplot(Gas) +
  labs(title = "Non-Transformed Gas Production")

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "", title = latex2exp::TeX(paste0("Transformed Gas Production with $\\lambda$ = ",
                                  round(lambda,2))))

I made graphs of both the transformed and non-transformed series; I think the transformed plot is more readable and makes the seasonality easier to observe, because the Box-Cox transformation makes the size of the seasonal variation roughly constant across the series.
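
For reference, the transformation that box_cox() applies (for positive data, which is the case here) is

$$
w_t =
\begin{cases}
\log(y_t) & \text{if } \lambda = 0,\\
(y_t^{\lambda} - 1)/\lambda & \text{otherwise,}
\end{cases}
$$

and the Guerrero feature used above picks the value of $\lambda$ that makes the seasonal variation roughly the same across the whole series.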

##Q3

canadian_gas %>%
  autoplot(Volume) +
  labs(title = "Non-Transformed Gas Production")

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "", title = latex2exp::TeX(paste0("Transformed Gas Production with $\\lambda$ = ",
                                  round(lambda,2))))

I think the reason a Box-Cox transformation is unhelpful for the canadian_gas data is that the seasonal variation does not change consistently with the level of the series: the variation first increases and then decreases again, so no single value of λ can make it constant across the whole series.

##Q4

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

autoplot(myseries, Turnover)+
  labs(title = "Retail Turnover", y = "$AUD (in millions)")

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "", title = latex2exp::TeX(paste0("Transformed Retail Turnover with $\\lambda$ = ",
                                  round(lambda,2))))

To sum up, λ = 0.27 would be a suitable Box-Cox transformation.

##Q5

autoplot(aus_production, Tobacco)+
  labs(title = "Tobacco Production in AUS")
## Warning: Removed 24 row(s) containing missing values (geom_path).

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "", title = latex2exp::TeX(paste0("Transformed Tobacco Production with $\\lambda$ = ",
                                  round(lambda,2))))
## Warning: Removed 24 row(s) containing missing values (geom_path).

The selected Box-Cox transformation for tobacco production uses λ = 0.93. Because λ is close to 1, the transformed data is only shifted downwards, with little change in the shape of the time series, so the Box-Cox transformation is not very useful for the tobacco production data.

mel_syd <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")

autoplot(mel_syd, Passengers)+
  labs(title = "Economy class Passengers Between Melbourne and Sydney")

lambda <- mel_syd %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

mel_syd %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "", title = latex2exp::TeX(paste0("Transformed Number of Passengers with $\\lambda$ = ",
                                  round(lambda,2))))

For Economy class passengers between Melbourne and Sydney, the selected Box-Cox transformation uses λ = 2, which is essentially a Passengers² transformation; this makes the variation in the plot a little easier to observe.

southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") 

southern_cross <- southern_cross %>%
  mutate(Week = yearweek(Date)) %>%
  index_by(Week) %>%
  summarise(Count = sum(Count))

autoplot(southern_cross, Count)+
  labs(title = "Weekly Pedestrian Counts at Southern Cross Station")

lambda <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "", title = latex2exp::TeX(paste0("Transformed Weekly Pedestrian Counts with $\\lambda$ = ",
                                  round(lambda,2))))

For the weekly pedestrian counts, the selected Box-Cox transformation uses λ = 0.8. The transformed data is shifted downwards with little change in the shape of the time series.

##Q6

A 3×5 MA first takes 5-term moving averages and then averages three of them:

$$
3\times 5\text{-MA} = \frac{1}{3}\left[\frac{y_1+\dots+y_5}{5} + \frac{y_2+\dots+y_6}{5} + \frac{y_3+\dots+y_7}{5}\right]
= \frac{y_1 + 2y_2 + 3y_3 + 3y_4 + 3y_5 + 2y_6 + y_7}{15},
$$

so it is equivalent to a 7-term weighted moving average with weights $\tfrac{1}{15}, \tfrac{2}{15}, \tfrac{3}{15}, \tfrac{3}{15}, \tfrac{3}{15}, \tfrac{2}{15}, \tfrac{1}{15} \approx 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, 0.067$.
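
A quick numerical check of this identity (just a sketch on simulated data; stats::filter is used explicitly because dplyr::filter masks it):

set.seed(1)
y <- rnorm(20)
ma5   <- stats::filter(y, rep(1/5, 5))                   #centred 5-MA
ma3x5 <- stats::filter(ma5, rep(1/3, 3))                 #3-MA of the 5-MA
wma7  <- stats::filter(y, c(1, 2, 3, 3, 3, 2, 1) / 15)   #7-term weighted MA
all.equal(c(ma3x5), c(wma7))                             #TRUE (the NAs at the ends match)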

##Q7

#a)
gas <- tail(aus_production, 5*4) %>% select(Gas)
autoplot(gas, Gas)

I think it has an increasing trend. Within each year it rises over the first two quarters and falls over the last two, peaking in the middle of the year.

#b)
gas_dcmp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) 

components(gas_dcmp) %>%
  autoplot() +
  labs(title = "Classical Multiplicative Decomposition of Gas Production")
## Warning: Removed 2 row(s) containing missing values (geom_path).

  1. Yes. The overall trend is increasing, and the seasonal component rises through the first half of the year, peaks in the middle of the year (the second and third quarters), and then falls, which supports the graphical interpretation from part a).
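
For reference, the multiplicative decomposition used here writes the series as the product of its components,

$$
y_t = \hat{T}_t \times \hat{S}_t \times \hat{R}_t,
$$

so the seasonally adjusted series plotted in part d) is simply $y_t / \hat{S}_t$.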
#d)
components(gas_dcmp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "pink") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Gas Production")

#e)
gas %>%
  mutate(Gas = ifelse(Gas == 249, Gas + 400, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "pink") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Gas Production with an Outlier")

As we can see from the plot, the modified observation near the middle of the series becomes an obvious outlier. After adding 400, the original shape is changed and the seasonally adjusted series is no longer as regular as the original.

#f)
gas %>%
  mutate(Gas = ifelse(Gas == 236, Gas + 400, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "pink") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(title = "Seasonally Adjusted Gas Production with an Outlier at the End")

Whether the outlier is in the middle or at the end, the resulting pattern is not as clear as the original one, and the outlier still stands out and distorts the seasonally adjusted series.

##Q8

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()

autoplot(x11_dcmp) +
  labs(title = "Decomposition of Retail Turnover using X-11.")

In the X-11 decomposition, the trend component follows the data more closely, so some small movements that were smoothed out before are now visible, and the gradually changing seasonality is easier to capture in this model.

##Q9

  1. There is a very clear upward trend in the Australian labour force as a whole, with a very regular seasonal pattern. The recession of 1991/1992 led to noticeably low values, which are also clearly visible in the second graph. Compared with the scale of the remainder, the scale of the seasonal component is much smaller.

  2. Yes. The recession is visible in the estimated components: the remainder shows a sharp decrease around 1991/1992.

##Q10

#a)
autoplot(canadian_gas) +
  labs( x = "Month",
         y = "Gas Production (billions of cubic metres)",
         title = "Canadian Gas Production (Jan 1960 - Feb 2005)" )
## Plot variable not specified, automatically selected `.vars = Volume`

gg_subseries(canadian_gas) +
  labs( x = "Month",
        y = "Gas Production (billions of cubic metres)")
## Plot variable not specified, automatically selected `y = Volume`

gg_season(canadian_gas)+
  labs( x = "Month",
        y = "Gas Production (billions of cubic metres)")
## Plot variable not specified, automatically selected `y = Volume`

Based on the plots, there is a clear increasing trend and strong seasonality in the production of natural gas. Prior to about 1989, production generally fell through the first three quarters of the year and rose in the fourth quarter. In later years, production increased further and the differences between seasons became smaller than they were before 1989.

#b)
stl_dcmp <- canadian_gas %>%
  model(STL(Volume ~ season(window = 13), robust = TRUE)) %>%
  components()
autoplot(stl_dcmp) +
  labs(x = "Month",
       title = "STL Decomposition of Canadian Gas Production")

I first tried to call the base stl() function on the tsibble directly, but autoplot() does not support its output, and converting to a ts object did not help either. Fitting STL() from the feasts package inside model() and then taking components(), as above, works on a tsibble and can be plotted with autoplot().

#c)
canadian_gas %>% 
  gg_season(Volume)

From the graph, we can see that over time the gap between the seasonal peak and trough within each year has narrowed, but the basic pattern of higher production in particular months remains the same.

#d)
plauseasonally <- canadian_gas %>%
  model(stl = STL(Volume)) %>%
  components()

plauseasonally %>%
  as_tsibble() %>%
  autoplot(Volume, color = "pink") +
  geom_line(aes(y = season_adjust), color = "blue") +
  labs(title = "Seasonally Adjusted Canadian Gas Volume")

#e)
#x11
x11_dcmp <- canadian_gas %>%
  model(x11 = X_13ARIMA_SEATS(Volume ~ x11())) %>%
  components()
autoplot(x11_dcmp) + labs(title = "Decomposition of Canadian Gas using X-11")

#seats
seats_dcmp <- canadian_gas %>%
  model(seats = X_13ARIMA_SEATS(Volume ~ seats())) %>%
  components()
autoplot(seats_dcmp) + labs(title = "Decomposition of Canadian Gas using SEATS")

Compared with the first decomposition, the SEATS decomposition shows more component series. From both graphs we can see that the seasonal and irregular components are centred around 1 (the components combine multiplicatively) and that their movements largely correspond to each other.

##Chapter 7 Exercise

##Q1

jan14_vic_elec <- vic_elec %>%
  filter(yearmonth(Time) == yearmonth("2014 Jan")) %>%
  index_by(Date = as_date(Time)) %>%
  summarise(
    Demand = sum(Demand),
    Temperature = max(Temperature)
  )
#a)
jan14_vic_elec %>%
  gather("Variable", "Value", Demand, Temperature) %>%
  ggplot(aes(x = Date, y = Value)) +
  geom_line() +
  facet_grid(vars(Variable), scales = "free_y")

jan14_vic_elec %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'

As can be seen from the plots, as the temperature increases, the demand for electricity also increases. This is because higher temperatures lead to greater use of high-powered cooling appliances such as air conditioners.

#b)
fit <- jan14_vic_elec %>% model(TSLM(Demand ~ Temperature))
fit %>% gg_tsresiduals()

The residual plots show no strong autocorrelation, so I think the model is adequate. However, there are some outliers: a few residuals are very large in either direction.

#c)
jan14_vic_elec %>%
  model(TSLM(Demand ~ Temperature)) %>%
  forecast(
    new_data(jan14_vic_elec, 1) %>%
      mutate(Temperature = 15)
  ) %>%
  autoplot(jan14_vic_elec)

Although the forecast does not continue the upward movement at the end of the series, I believe the prediction is plausible, since a maximum temperature of only 15°C would bring demand back down towards the levels seen on the cooler days in January.

#d)
fit <- lm(Demand ~ Temperature, data=jan14_vic_elec)

forecast(fit, newdata=data.frame(Temperature=c(15,35)))
##           Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 151398.35       151398.4 117127.2 185669.5  97951.22 204845.5
## 274484.25       274484.2 241333.0 307635.5 222783.69 326184.8
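
For comparison, the same two scenario forecasts (maximum temperatures of 15°C and 35°C) can also be produced entirely within the fable framework; this is just a sketch reusing the TSLM specification from part b), with hilo() showing the 80% and 95% intervals:

jan14_vic_elec %>%
  model(TSLM(Demand ~ Temperature)) %>%
  forecast(
    new_data(jan14_vic_elec, 2) %>%
      mutate(Temperature = c(15, 35))
  ) %>%
  hilo()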
#e)
vic_elec %>%
  as.data.frame() %>%
  ggplot(aes(x = Temperature, y = Demand)) +
  labs(x = "Temperature",
       y ="Electricity Demand" )+
  geom_point()

When the weather becomes colder or warmer than usual, the demand for electricity increases: heating drives demand up in winter and cooling appliances drive it up in summer. This U-shaped relationship also suggests that a model fitted only to January data, with a single linear temperature term, would not be appropriate for the whole year.

##Q2

#a)
autoplot(olympic_running)+
  labs(y = "Time")
## Plot variable not specified, automatically selected `.vars = Time`

From the graph we can see clear gaps in two periods, corresponding to the Games cancelled during the two world wars. Women's events were added progressively from around 1950 onwards, and the winning times show an obvious decreasing trend.

#b)
fit.olympic_running <- olympic_running %>%
  model(TSLM(Time ~ Year))
fit.olympic_running
autoplot(olympic_running)+
  labs(y = "Time")+
  geom_smooth(method="lm", se=FALSE)
## Plot variable not specified, automatically selected `.vars = Time`
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 31 rows containing non-finite values (stat_smooth).

Above I show the fitted regression lines for each event, and below (part c) I show residual plots for several of the women's events. The fitted lines follow the actual data reasonably well; the sketch after this paragraph extracts the average rate of change per year for each event.
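
To get the average rate of change per year for each event, the slope on Year can be pulled out of a TSLM fit for every series; a sketch, assuming TSLM drops the missing war-year observations in the same way it does in part c):

#Average change in winning time (seconds per year) for each event;
#using Year as the regressor makes the slope directly interpretable per year
olympic_running %>%
  model(TSLM(Time ~ Year)) %>%
  tidy() %>%
  filter(term == "Year") %>%
  select(Length, Sex, estimate)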

#c)
fit <- olympic_running %>%
  filter(Sex=="women") %>%
  filter(Length=="100") %>%
  model(TSLM(Time ~ Year))
fit %>% gg_tsresiduals()
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing non-finite values (stat_bin).

#c)
fit <- olympic_running %>%
  filter(Sex=="women") %>%
  filter(Length=="400") %>%
  model(TSLM(Time ~ Year))
fit %>% gg_tsresiduals()

#c)
fit <- olympic_running %>%
  filter(Sex=="women") %>%
  filter(Length=="800") %>%
  model(TSLM(Time ~ Year))
fit %>% gg_tsresiduals()
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing non-finite values (stat_bin).

#c)
fit <- olympic_running %>%
  filter(Sex=="women") %>%
  filter(Length=="1500") %>%
  model(TSLM(Time ~ Year))
fit %>% gg_tsresiduals()

#d
olympic_running %>%
  filter(Sex=="men") %>%
  filter(Length=="100") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(
    new_data(olympic_running, 1) %>%
      mutate(Year = 2020)
  ) %>%
  autoplot(olympic_running)

#d
olympic_running %>%
  filter(Sex=="women") %>%
  filter(Length=="100") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(
    new_data(olympic_running, 1) %>%
      mutate(Year = 2020)
  ) %>%
  autoplot(olympic_running)

My assumption here is that the linear trend continues into 2020 and that all internal and external factors remain the same, including training patterns, athlete quality, and so on.

##Q3

The model is

$$
\log y = \beta_0 + \beta_1 \log x + \varepsilon .
$$

Differentiating with respect to $x$ gives

$$
\frac{1}{y}\frac{dy}{dx} = \frac{\beta_1}{x}
\quad\Longrightarrow\quad
\beta_1 = \frac{dy/y}{dx/x},
$$

so $\beta_1$ is the ratio of the relative (percentage) change in the forecast variable $y$ to the relative (percentage) change in the predictor $x$: a 1% change in $x$ is associated with approximately a $\beta_1$% change in $y$, which is exactly the definition of an elasticity.
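
A toy numerical check of this interpretation (all values below are made up purely for illustration): with hypothetical coefficients b0 and b1, increasing x by 1% changes y by roughly b1 percent.

#Toy check of the elasticity interpretation of a log-log model
b0 <- 1; b1 <- 0.6                      #hypothetical coefficients
y  <- function(x) exp(b0 + b1 * log(x)) #implied relationship on the original scale
x  <- 50
100 * (y(1.01 * x) / y(x) - 1)          #approximately 0.6, i.e. about b1 percent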

##Q4

#a)
autoplot(souvenirs, Sales) +
  labs(title = "Sales Volume over Time", y = "Sales Volume")

gg_season(souvenirs, Sales) +
  labs(title = "Seasonal Sales Volume over Time", y = "Sales Volume")

From the two graphs we can see that sales increase year by year and show a clear seasonal pattern. Sales generally dip around the middle of the year and then rise rapidly in the fourth quarter, which has the fastest growth in sales volume of the whole year.

  1. Because the seasonal variation (and hence the variance of the residuals) grows with the level of sales, which would violate the constant-variance assumption of the regression; taking logarithms stabilises the variance.
#c)
#Monthly ts of log sales (the series here runs from Jan 1987 to Jun 1993)
souvenirsf <- ts(souvenirs$Sales, start = c(1987, 1), end = c(1993, 6), frequency = 12)
log.souvenirs <- log(souvenirsf)

#"Surfing festival" dummy: 1 in March of each year, except March 1987
#(the festival only began in 1988)
dummy.fest <- rep(0, length(souvenirsf))
dummy.fest[seq_along(dummy.fest) %% 12 == 3] <- 1
dummy.fest[3] <- 0
dummy.fest <- ts(dummy.fest, freq = 12, start = c(1987, 1))
new.data <- data.frame(log.souvenirs, dummy.fest)

#Regression of log sales on a linear trend, seasonal dummies and the festival dummy
fit <- tslm(log.souvenirs ~ trend + season + dummy.fest, data = new.data)

#Forecast the next 12 months (the festival dummy is set to 1 in the third month ahead)
fore.data <- data.frame(dummy.fest = rep(0, 12))
fore.data[3, ] <- 1
forecast(fit, newdata = fore.data)
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jul 1993       9.883136  9.637329 10.128943  9.503919 10.262353
## Aug 1993       9.867772  9.621965 10.113579  9.488555 10.246989
## Sep 1993      10.531942 10.191069 10.872815 10.006062 11.057822
## Oct 1993      10.092605  9.846798 10.338412  9.713388 10.471822
## Nov 1993      10.585189 10.339382 10.830995 10.205971 10.964406
## Dec 1993      11.357556 11.111749 11.603363 10.978339 11.736773
## Jan 1994       9.430933  9.186093  9.675774  9.053207  9.808659
## Feb 1994       9.704370  9.459530  9.949210  9.326644 10.082096
## Mar 1994       9.695742  9.365756 10.025727  9.186658 10.204825
## Apr 1994       9.881046  9.636206 10.125887  9.503320 10.258772
## May 1994       9.928500  9.683659 10.173340  9.550774 10.306226
## Jun 1994       9.989861  9.745020 10.234701  9.612135 10.367587
#d)
autoplot(fit$residuals) + labs(x = "Year", y = "Residuals", title = "Residuals of the Log-Sales Regression")

From the graph we can see that although the residuals fluctuate up and down, they stay fairly close to zero, which suggests that the log transformation is doing its job.

#e)
boxplot(residuals(fit)~cycle(residuals(fit)))

It appears that the model does not fit equally well in every month: the residuals in some months are systematically larger than in others, so some seasonal structure remains that the model has not captured.

#f)
fit$coefficients
## (Intercept)       trend     season2     season3     season4     season5 
##   7.6662400   0.0207611   0.2526756   0.2232860   0.3878297   0.4145219 
##     season6     season7     season8     season9    season10    season11 
##   0.4551219   0.5767690   0.5406444   0.6296713   0.7239552   1.1957774 
##    season12  dummy.fest 
##   1.9473841   0.5543818

As can be seen from the table, the seasonal coefficients increase towards the end of the year, indicating that later months (especially November and December) have much higher sales than January, the baseline month. The trend coefficient of about 0.021 corresponds to steady month-on-month growth, and the festival dummy adds roughly 0.55 to log sales in the festival month.

#g)
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals from Linear regression model
## LM test = 31.535, df = 17, p-value = 0.01718

The Breusch-Godfrey test has a p-value of about 0.017, so there is still some significant autocorrelation left in the residuals, although the histogram shows that the residuals are roughly normally distributed.

#h)
fore.data = data.frame(dummy.fest = rep(0, 36))
preds = forecast(fit, newdata=fore.data)
preds
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jul 1993       9.883136  9.637329 10.128943  9.503919 10.262353
## Aug 1993       9.867772  9.621965 10.113579  9.488555 10.246989
## Sep 1993       9.977560  9.731753 10.223367  9.598343 10.356777
## Oct 1993      10.092605  9.846798 10.338412  9.713388 10.471822
## Nov 1993      10.585189 10.339382 10.830995 10.205971 10.964406
## Dec 1993      11.357556 11.111749 11.603363 10.978339 11.736773
## Jan 1994       9.430933  9.186093  9.675774  9.053207  9.808659
## Feb 1994       9.704370  9.459530  9.949210  9.326644 10.082096
## Mar 1994       9.695742  9.365756 10.025727  9.186658 10.204825
## Apr 1994       9.881046  9.636206 10.125887  9.503320 10.258772
## May 1994       9.928500  9.683659 10.173340  9.550774 10.306226
## Jun 1994       9.989861  9.745020 10.234701  9.612135 10.367587
## Jul 1994      10.132269  9.883394 10.381144  9.748319 10.516219
## Aug 1994      10.116905  9.868031 10.365780  9.732955 10.500856
## Sep 1994      10.226693  9.977819 10.475568  9.842743 10.610644
## Oct 1994      10.341738 10.092864 10.590613  9.957788 10.725689
## Nov 1994      10.834322 10.585447 11.083197 10.450372 11.218272
## Dec 1994      11.606690 11.357815 11.855564 11.222739 11.990640
## Jan 1995       9.680067  9.431764  9.928369  9.296999 10.063134
## Feb 1995       9.953503  9.705201 10.201806  9.570436 10.336570
## Mar 1995       9.944875  9.610605 10.279144  9.429182 10.460567
## Apr 1995      10.130180  9.881877 10.378482  9.747112 10.513247
## May 1995      10.177633  9.929330 10.425935  9.794566 10.560700
## Jun 1995      10.238994  9.990691 10.487296  9.855927 10.622061
## Jul 1995      10.381402 10.128745 10.634059  9.991617 10.771188
## Aug 1995      10.366039 10.113381 10.618696  9.976253 10.755824
## Sep 1995      10.475827 10.223169 10.728484 10.086041 10.865612
## Oct 1995      10.590872 10.338214 10.843529 10.201086 10.980657
## Nov 1995      11.083455 10.830798 11.336112 10.693669 11.473240
## Dec 1995      11.855823 11.603165 12.108480 11.466037 12.245608
## Jan 1996       9.929200  9.676730 10.181669  9.539704 10.318696
## Feb 1996      10.202636  9.950167 10.455106  9.813141 10.592132
## Mar 1996      10.194008  9.854949 10.533067  9.670926 10.717090
## Apr 1996      10.379313 10.126843 10.631782  9.989817 10.768809
## May 1996      10.426766 10.174296 10.679236 10.037270 10.816262
## Jun 1996      10.488127 10.235658 10.740597 10.098631 10.877623
  1. The forecasts above are still on the log scale, so they should be back-transformed (see the sketch below); beyond that, I think applying some further adjustments to the data and model, such as accounting for the remaining autocorrelation in the residuals, might improve the predictions.
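
Since the model was fitted to log(sales), a minimal sketch of back-transforming the point forecasts in preds to the original sales scale:

#Back-transform the log-scale point forecasts to sales dollars
#(exp() gives the median on the original scale; a bias adjustment could be added)
exp(preds$mean)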

##Q5

#a)
gas <- ts(us_gasoline$Barrels, start = c(1991, 1), frequency = 52)
gas <- window(gas, end = 2005)
for(num in c(1, 2, 3, 5, 10)){
  var_name <- paste("fit",
                    as.character(num),
                    "_gasoline_2004",
                    sep = "")
  
  assign(var_name,
         tslm(gas ~ trend + fourier(
           gas,
           K = num
         ))
  )
  
  print(
    autoplot(gas) +
      autolayer(get(var_name)$fitted.values,
                series = as.character(num)) +
      ggtitle(var_name) +
      ylab("gasoline") +
      guides(colour = guide_legend(title = "Number of Fourier Transform pairs"))
  )
}

From the figures, we can observe that the larger the value of K, the more closely the fitted values follow the actual data.
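
For reference, each Fourier "pair" adds one sine and one cosine regressor, so with K pairs and a seasonal period of m ≈ 52 the fitted model is

$$
y_t = \beta_0 + \beta_1 t + \sum_{k=1}^{K}\left[\alpha_k \sin\!\left(\frac{2\pi k t}{m}\right) + \gamma_k \cos\!\left(\frac{2\pi k t}{m}\right)\right] + \varepsilon_t ,
$$

which is why larger K gives a more flexible seasonal shape at the cost of more parameters.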

#b)
for(i in c(1, 2, 3, 5, 10)){
  fit_gasoline_2004.name <- paste(
    "fit", as.character(i), "_gasoline_2004",
    sep = ""
  )
  writeLines(
    paste(
      "\n", fit_gasoline_2004.name, "\n"
    )
  )
  print(CV(get(fit_gasoline_2004.name)))
}
## 
##  fit1_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  8.012058e-02 -1.969129e+03 -1.969052e+03 -1.945827e+03  8.427750e-01 
## 
##  fit2_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.919464e-02 -1.978217e+03 -1.978072e+03 -1.945593e+03  8.449887e-01 
## 
##  fit3_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.470266e-02 -2.023838e+03 -2.023604e+03 -1.981892e+03  8.541546e-01 
## 
##  fit5_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.365185e-02 -2.034993e+03 -2.034518e+03 -1.974405e+03  8.569479e-01 
## 
##  fit10_gasoline_2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.173572e-02 -2.056054e+03 -2.054595e+03 -1.948861e+03  8.624864e-01

When K is equal to 10, the model has the minimum AICc and CV values, so that model is selected.

#c)
checkresiduals(fit10_gasoline_2004)

## 
##  Breusch-Godfrey test for serial correlation of order up to 104
## 
## data:  Residuals from Linear regression model
## LM test = 128.54, df = 104, p-value = 0.05168
#d)
fc_gasoline_2005 <- forecast(
  fit10_gasoline_2004,
  newdata=data.frame(fourier(
    gas, K = 10, h = 52)
  )
)
fc_gasoline_2005
##          Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 2005.019       8.901276 8.557367  9.245186 8.374931  9.427622
## 2005.038       8.940132 8.596126  9.284138 8.413638  9.466626
## 2005.058       8.982049 8.638038  9.326060 8.455548  9.508551
## 2005.077       9.056949 8.712952  9.400946 8.530469  9.583429
## 2005.096       9.143925 8.799930  9.487920 8.617448  9.670401
## 2005.115       9.192581 8.848582  9.536579 8.666099  9.719062
## 2005.135       9.185356 8.841349  9.529363 8.658862  9.711851
## 2005.154       9.160913 8.816906  9.504919 8.634419  9.687406
## 2005.173       9.169559 8.825559  9.513560 8.643075  9.696044
## 2005.192       9.216866 8.872868  9.560865 8.690384  9.743348
## 2005.212       9.264222 8.920221  9.608223 8.737736  9.790708
## 2005.231       9.281951 8.937946  9.625957 8.755458  9.808444
## 2005.250       9.285826 8.941821  9.629831 8.759334  9.812318
## 2005.269       9.312227 8.968226  9.656228 8.785741  9.838713
## 2005.288       9.367159 9.023159  9.711159 8.840675  9.893643
## 2005.308       9.416957 9.072955  9.760958 8.890470  9.943444
## 2005.327       9.433266 9.089261  9.777272 8.906774  9.959758
## 2005.346       9.433725 9.089720  9.777729 8.907234  9.960216
## 2005.365       9.463951 9.119950  9.807952 8.937465  9.990437
## 2005.385       9.540677 9.196677  9.884678 9.014193 10.067162
## 2005.404       9.625983 9.281981  9.969985 9.099496 10.152471
## 2005.423       9.665636 9.321631 10.009640 9.139144 10.192127
## 2005.442       9.647059 9.303055  9.991063 9.120569 10.173550
## 2005.462       9.610131 9.266130  9.954132 9.083645 10.136617
## 2005.481       9.601794 9.257794  9.945795 9.075310 10.128279
## 2005.500       9.629756 9.285754  9.973759 9.103268 10.156244
## 2005.519       9.664433 9.320428 10.008438 9.137941 10.190924
## 2005.538       9.676302 9.332298 10.020306 9.149812 10.202792
## 2005.558       9.658713 9.314712 10.002714 9.132227 10.185199
## 2005.577       9.615773 9.271772  9.959773 9.089288 10.142258
## 2005.596       9.544821 9.200818  9.888823 9.018333 10.071309
## 2005.615       9.445303 9.101298  9.789309 8.918811  9.971795
## 2005.635       9.341433 8.997429  9.685437 8.814943  9.867923
## 2005.654       9.279412 8.935411  9.623412 8.752926  9.805897
## 2005.673       9.290336 8.946336  9.634336 8.763851  9.816820
## 2005.692       9.357646 9.013643  9.701648 8.831157  9.884134
## 2005.712       9.428370 9.084364  9.772376 8.901877  9.954863
## 2005.731       9.457460 9.113457  9.801464 8.930971  9.983950
## 2005.750       9.437719 9.093720  9.781719 8.911235  9.964203
## 2005.769       9.389996 9.045997  9.733996 8.863513  9.916480
## 2005.788       9.337513 8.993510  9.681516 8.811024  9.864001
## 2005.808       9.298876 8.954869  9.642883 8.772382  9.825371
## 2005.827       9.296233 8.952230  9.640237 8.769744  9.822723
## 2005.846       9.346559 9.002561  9.690557 8.820078  9.873040
## 2005.865       9.430551 9.086553  9.774549 8.904070  9.957032
## 2005.885       9.480414 9.136411  9.824418 8.953925 10.006904
## 2005.904       9.423219 9.079208  9.767229 8.896719  9.949719
## 2005.923       9.251762 8.907762  9.595762 8.725278  9.778246
## 2005.942       9.047458 8.703470  9.391446 8.520993  9.573924
## 2005.962       8.918927 8.574940  9.262915 8.392462  9.445392
## 2005.981       8.911344 8.567381  9.255307 8.384917  9.437771
## 2006.000       8.978870 8.634891  9.322849 8.452418  9.505323

From the table we can see that the point forecasts show a clear upward movement over 2005 as a whole, together with the usual within-year seasonal pattern.
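
To judge these forecasts, a sketch that plots them against the actual 2005 observations; gas_full is just an illustrative name for the un-truncated weekly series, rebuilt the same way as gas above but without the window() cut:

#Rebuild the full weekly series (the gas object above stops at the start of 2005)
gas_full <- ts(us_gasoline$Barrels, start = c(1991, 1), frequency = 52)

autoplot(window(gas_full, start = 2004, end = 2006)) +
  autolayer(fc_gasoline_2005, series = "Harmonic regression") +
  labs(y = "gasoline", title = "2005 Forecasts vs Actual Data")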

##Q6

#a)
autoplot(global_economy, show.legend =  FALSE)
## Plot variable not specified, automatically selected `.vars = GDP`
## Warning: Removed 3242 row(s) containing missing values (geom_path).

global_economy %>%
  filter(Country == "Afghanistan") %>%
  autoplot(Population) +
  labs(title = "Afghanistan Population",
       y = "Population")

From the first plot we can see rapid GDP growth in many countries from the 1980s onwards. For Afghanistan, the population fell during the 1980s (the Soviet-Afghan war) and then grew dramatically from around 1989.

#b)
global_economy %>%
  filter(Country=="Afghanistan")%>%
  model(TSLM(Population ~ Year)) %>%
  report()
## Series: Population 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -5794518 -2582559   744761  2259222  6036280 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -829292529   49730866  -16.68   <2e-16 ***
## Year            425774      25008   17.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3188000 on 56 degrees of freedom
## Multiple R-squared: 0.8381,  Adjusted R-squared: 0.8352
## F-statistic: 289.9 on 1 and 56 DF, p-value: < 2.22e-16

Model I (fitted to the full period) has the largest residual standard error.

global_economy %>%
  filter(Country=="Afghanistan")%>%
  filter(Year<1980)%>%
  model(TSLM(Population ~ Year)) %>%
  report()
## Series: Population 
## Model: TSLM 
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -146380.8 -110290.6    -451.2  105877.8  202881.3 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -470734527    8787062  -53.57   <2e-16 ***
## Year            244657       4462   54.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 115100 on 18 degrees of freedom
## Multiple R-squared: 0.994,   Adjusted R-squared: 0.9937
## F-statistic:  3007 on 1 and 18 DF, p-value: < 2.22e-16

Model II (fitted to the years before 1980) has the smallest residual standard error.

global_economy %>%
  filter(Country=="Afghanistan")%>%
  filter(Year>1989)%>%
  model(TSLM(Population ~ Year)) %>%
  report()
## Series: Population 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -619234 -212927    6598  234280  612277 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.670e+09  1.640e+07  -101.8   <2e-16 ***
## Year         8.451e+05  8.184e+03   103.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 349800 on 26 degrees of freedom
## Multiple R-squared: 0.9976,  Adjusted R-squared: 0.9975
## F-statistic: 1.066e+04 on 1 and 26 DF, p-value: < 2.22e-16

Model III (fitted to the years after 1989) has the largest R-squared.

#c)
mafg<-global_economy %>%
  filter(Country=="Afghanistan")%>%
  model(TSLM(Population ~ Year)) 

mafgy<-global_economy %>%
  filter(Country=="Afghanistan")%>%
  filter(Year>1989)%>%
  model(TSLM(Population ~ Year)) 

forecast(mafg, h=5) 
## # A fable: 5 x 5 [1Y]
## # Key:     Country, .model [1]
##   Country     .model                   Year          Population     .mean
##   <fct>       <chr>                   <dbl>              <dist>     <dbl>
## 1 Afghanistan TSLM(Population ~ Year)  2018   N(3e+07, 1.1e+13) 29919575.
## 2 Afghanistan TSLM(Population ~ Year)  2019   N(3e+07, 1.1e+13) 30345349.
## 3 Afghanistan TSLM(Population ~ Year)  2020 N(3.1e+07, 1.1e+13) 30771123.
## 4 Afghanistan TSLM(Population ~ Year)  2021 N(3.1e+07, 1.1e+13) 31196897.
## 5 Afghanistan TSLM(Population ~ Year)  2022 N(3.2e+07, 1.1e+13) 31622671.
forecast(mafgy, h=5)
## # A fable: 5 x 5 [1Y]
## # Key:     Country, .model [1]
##   Country     .model                   Year          Population     .mean
##   <fct>       <chr>                   <dbl>              <dist>     <dbl>
## 1 Afghanistan TSLM(Population ~ Year)  2018 N(3.6e+07, 1.4e+11) 35925602.
## 2 Afghanistan TSLM(Population ~ Year)  2019 N(3.7e+07, 1.4e+11) 36770747.
## 3 Afghanistan TSLM(Population ~ Year)  2020 N(3.8e+07, 1.4e+11) 37615892.
## 4 Afghanistan TSLM(Population ~ Year)  2021 N(3.8e+07, 1.5e+11) 38461037.
## 5 Afghanistan TSLM(Population ~ Year)  2022 N(3.9e+07, 1.5e+11) 39306182.

Comparing the two tables, the model fitted only to the post-1989 data gives noticeably higher forecasts (means around 36-39 million rather than 30-32 million) and much narrower prediction intervals (forecast variances around 1.4e+11 rather than 1.1e+13), so restricting the fit to the recent, steadily growing period makes a significant difference.
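
An alternative worth noting (a sketch only; fit_piecewise is a hypothetical name): instead of three separate regressions, a single piecewise linear trend with knots at 1980 and 1989 can be fitted using fable's trend() special and then forecast directly.

#Piecewise linear trend with knots at 1980 and 1989
fit_piecewise <- global_economy %>%
  filter(Country == "Afghanistan") %>%
  model(piecewise = TSLM(Population ~ trend(knots = c(1980, 1989))))

fit_piecewise %>%
  forecast(h = 5) %>%
  autoplot(global_economy %>% filter(Country == "Afghanistan"))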

##Q7

This question is hard for me; I haven't figured out how to do it yet.