#1.Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
ge<-global_economy
ge$gdppc<-ge$GDP/ge$Population
ge %>%
tsibble(key = Code, index = Year)%>%
autoplot(gdppc, show.legend = FALSE) +
labs(y = "GDP Per Capita")
## Warning: Removed 3242 row(s) containing missing values (geom_path).
In the most recent year, Luxembourg had the highest GDP per capita. In year 2000, Monaco took the prize. Going back to 1965, the petro-state of Kuwait was the highest, with the United States in second. The United States is somewhat unique to appear at the top of these rankigns, with a significantly larger populaton than the Luxembourgh, Kuwait, Monaco, Lichtenstein, and UAE, other top-ranking countries.
ge_2017 <- ge[ge$Year == 2017,]
ge_2017[order(-ge_2017$gdppc),]
## # A tsibble: 262 x 10 [1Y]
## # Key: Country [262]
## Country Code Year GDP Growth CPI Imports Exports Population gdppc
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Luxembourg LUX 2017 6.24e10 2.30 111. 194. 230. 599449 1.04e5
## 2 Macao SAR… MAC 2017 5.04e10 9.10 136. 32.0 79.4 622567 8.09e4
## 3 Switzerla… CHE 2017 6.79e11 1.09 98.3 53.9 65.0 8466017 8.02e4
## 4 Norway NOR 2017 3.99e11 1.92 115. 33.1 35.5 5282223 7.55e4
## 5 Iceland ISL 2017 2.39e10 3.64 122. 42.8 47.0 341284 7.01e4
## 6 Ireland IRL 2017 3.34e11 7.80 105. 87.9 120. 4813608 6.93e4
## 7 Qatar QAT 2017 1.67e11 1.58 116. 37.3 51.0 2639211 6.32e4
## 8 United St… USA 2017 1.94e13 2.27 112. NA NA 325719178 5.95e4
## 9 North Ame… NAC 2017 2.10e13 2.35 NA NA NA 362492702 5.81e4
## 10 Singapore SGP 2017 3.24e11 3.62 113. 149. 173. 5612253 5.77e4
## # … with 252 more rows
ge_2000 <- ge[ge$Year == 2000,]
ge_2000[order(-ge_2000$gdppc),]
## # A tsibble: 263 x 10 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population gdppc
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2000 2.65e 9 3.91 NA NA NA 32082 82535.
## 2 Liechtens… LIE 2000 2.48e 9 3.22 NA NA NA 33286 74625.
## 3 Bermuda BMU 2000 3.48e 9 9.32 NA NA NA 61833 56284.
## 4 Luxembourg LUX 2000 2.13e10 8.24 80.1 123. 148. 436300 48736.
## 5 Channel I… CHI 2000 6.44e 9 5.83 NA NA NA 148725 43299.
## 6 San Marino SMR 2000 1.10e 9 2.21 NA NA NA 27418 40189.
## 7 Japan JPN 2000 4.89e12 2.78 103. 9.20 10.6 126843000 38532.
## 8 Norway NOR 2000 1.71e11 3.21 81.9 28.9 45.7 4490967 38147.
## 9 Switzerla… CHE 2000 2.72e11 3.94 91.7 46.0 52.1 7184250 37868.
## 10 United St… USA 2000 1.03e13 4.09 79.0 14.3 10.7 282162411 36450.
## # … with 253 more rows
ge_1965 <- ge[ge$Year == 1965,]
ge_1965[order(-ge_1965$gdppc),]
## # A tsibble: 260 x 10 [1Y]
## # Key: Country [260]
## Country Code Year GDP Growth CPI Imports Exports Population gdppc
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Kuwait KWT 1965 2.10e 9 NA NA 23.1 67.7 473554 4429.
## 2 United … USA 1965 7.44e11 6.40 14.4 4.24 4.99 194303000 3828.
## 3 North A… NAC 1965 7.98e11 6.42 NA 5.54 6.22 214031100 3727.
## 4 Sweden SWE 1965 2.33e10 3.82 11.0 22.5 21.9 7733853 3008.
## 5 Luxembo… LUX 1965 9.22e 8 -0.735 19.0 75.1 83.0 331500 2780.
## 6 Canada CAN 1965 5.39e10 6.64 14.4 18.7 18.5 19678000 2740.
## 7 Iceland ISL 1965 5.24e 8 7.37 0.0699 35.0 36.2 192286 2724.
## 8 Switzer… CHE 1965 1.53e10 NA 28.2 27.0 27.5 5856472 2620.
## 9 Bermuda BMU 1965 1.14e 8 4.76 NA NA NA 50100 2282.
## 10 Austral… AUS 1965 2.59e10 5.98 8.69 15.3 13.2 11388000 2277.
## # … with 250 more rows
In order to better portray the US GDP data, I opted to scale it simply into Billions, which keeps the same shape but better scales the data.
us<- ge[ge$Country == 'United States',]
us %>%
tsibble(index = Year)%>%
autoplot(GDP/1000000000, show.legend = FALSE) +
labs(y = "GDP in Billions($)")
I opted for no transformation on the Bulls, bullocks, and steers in Victoria, since the data clearly show variation across time and a seasonality effect.
aus<-aus_livestock
aus_bbc<- aus[aus$Animal == 'Bulls, bullocks and steers',]
aus_bbc_v<-aus_bbc[aus_bbc$State == 'Victoria',]
aus_bbc_v %>%
tsibble(index = Month)%>%
autoplot(Count, show.legend = FALSE) +
labs(title = "Slaughter of Bulls, Bollucks, and Steers in Victoria, Australia",
y = "Total Number",
x= "Month")
Due to the difficulty of viewing the plot,I considering a log transformation and a box-cox transformation, but neither made it easier to view.
elec<-vic_elec
elec
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
## 7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
## 8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
## 9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
## 10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
## # … with 52,598 more rows
elec$logd<-log(elec$Demand)
elec %>%
tsibble(index = Time) %>%
autoplot(Demand, show.legend = FALSE) +
labs(title = "Victorian Electricity Demand",
y = "KW Used",
x= "Time")
lambda <- elec %>%
features(Demand, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.09993089
elec %>%
autoplot(box_cox(Demand, lambda)) +
labs(title = "Victorian Electricity Demand",
y = "KW Used",
x= "Time")
No Change was considered because the data shows clear seasonality and
variation.
aus_p<-aus_production
aus_production %>% autoplot(Gas)
##3. Why is a Box-Cox transformation unhelpful for the canadian_gas data?
The box-cox transformation converts the plot to a near-normal distribution. In this case, since our variance is not continuous over the period, a boxcox will not be appropriate, and we can see in the two plots below, that the boxcox does not improve the graph.
cg<-canadian_gas
cg %>% autoplot(Volume)
lambda2 <- cg %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
cg %>% autoplot(box_cox(Volume, lambda2))
##4. What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
Below are the plots for turnover in Australian retail. I chose to use to guerrero feature to determine the lambda, which assigned a value of -.04. A good value of lambda “is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler”(Hyndman 3.1), which this achieves.
set.seed(45678)
austs <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
austs %>% autoplot(Turnover)
lambda3 <- austs %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda3
## [1] -0.04252935
austs %>% autoplot(box_cox(Turnover, lambda3))
##5. For the following series, find an appropriate Box-Cox
transformation in order to stabilise the variance. Tobacco from
aus_production, Economy class passengers between Melbourne and Sydney
from ansett, and Pedestrian counts at Southern Cross Station from
pedestrian.
ap<-aus_production
ap %>% autoplot(Tobacco)
## Warning: Removed 24 row(s) containing missing values (geom_path).
lambda4 <- ap %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
ap %>% autoplot(box_cox(Tobacco, lambda4))
## Warning: Removed 24 row(s) containing missing values (geom_path).
ans<-ansett
ansettms<- ans[ans$Airports == 'MEL-SYD',]
anseco<- ansettms[ansettms$Class == 'Economy',]
anseco %>% autoplot(Passengers)
lambda5 <- anseco%>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
anseco %>% autoplot(box_cox(Passengers, lambda5))
peds<-pedestrian
pedsx<- peds[peds$Sensor == 'Southern Cross Station',]
pedsx %>% autoplot(Count)
lambda6 <- pedsx%>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
pedsx %>% autoplot(box_cox(Count, lambda6))
##6.Show that a 3×5 MA is equivalent to a 7-term weighted moving average
with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
A 3x5 MA averages the first five terms and then obtaining three observation set. Since a 3X5 MA will lead to a 1/15(y + 2y + 3y + 3y + 3y + 2y + y), it follows the weighted average of 1/15, 2/15, 3/15, 3/15 which are equivalent to the decimal values in the prompt.
##7. Consider the last five years of the Gas data from aus_production. The plot below shows the seasonal fluctuations and a general upwards trend.
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas %>% autoplot(Gas)
gts<- ts(gas$Gas, frequency = 4, start = c(2005))
plot(gts)
The decomposition below confirms the overall trend, with a clear upwards
trend and clear seasonal fluctuations, as one would expect with gas
usage.
dec1<-decompose(gts, type="multiplicative")
plot(dec1)
#borrowed this code from Juan Esteban Rincon Poveda
gas2<-gas%>% model(classical_decomposition(Gas, type="multiplicative")) %>% components()
gas2 %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Gas")) +
geom_line(aes(y = season_adjust)) +
geom_line(aes(y = trend, colour = "Trend"))
## Warning: Removed 4 row(s) containing missing values (geom_path).
e. Change one observation to be an outlier (e.g., add 300 to one
observation), and recompute the seasonally adjusted data. What is the
effect of the outlier?
The outlier has a notible disruptive impact on the trend, but does very little to the seasonal interpretation.
gas3<-gas
gas3$Gas[14]<-gas$Gas[14]+300
gas3 %>%
model(classical_decomposition(Gas,type = "multiplicative")) %>%
components() %>%
autoplot()
## Warning: Removed 2 row(s) containing missing values (geom_path).
Yes. If the added quantity is near the end of the time series, it will be in line with the overall trend and just make the effect more drastic. Alternatively, if it is at the beginning of the series, then it will disrupt the overall trend.
##Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
There are clearly some bumps from outliers that mess with the oeverall trend,. That said, there is clearly a seasonal trend here.
set.seed(3)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
model(classical_decomposition(Turnover ,type = "multiplicative")) %>%
components() %>%
autoplot()
## Warning: Removed 6 row(s) containing missing values (geom_path).
##9. a. Write about 3–5 sentences describing the results of the
decomposition. Pay particular attention to the scales of the graphs in
making your interpretation.
The decomposition in figures 3.19 and 3.20 show the decomposition of
the monthly civilian labor force from 1979 to 1995. The data show a
clear upward trend, with just a small change from near-perfect linearity
from 1990 to 1993ish, but then there is a clear bump up. There is a
clear season trend, as it humps up in the summer (our winter) and goes
down in the winter.
###b. Is the recession of 1991/1992 visible in the estimated components?
The recession of 1991-1992 is visible only in the ‘remainder,’ and
causes a slight but noticeable slowing of growth in the trend line.
THere is nothing discernible inhte seasonality about the recession,
which makes sense.
cg <- canadian_gas
cg %>%
autoplot(Volume)
cg %>%
gg_season(Volume)
cg %>%
gg_subseries(Volume)
#Chapter 7 Excercises
##1. Half-hourly electricity demand for Victoria, Australia is contained in vic_elec. Extract the January 2014 electricity demand, and aggregate this data to daily with daily total demands and maximum temperatures.
jan14_vic_elec <- vic_elec %>%
filter(yearmonth(Time) == yearmonth("2014 Jan")) %>%
index_by(Date = as_date(Time)) %>%
summarise(
Demand = sum(Demand),
Temperature = max(Temperature)
)
jan14_vic_elec
## # A tsibble: 31 x 3 [1D]
## Date Demand Temperature
## <date> <dbl> <dbl>
## 1 2014-01-01 175185. 26
## 2 2014-01-02 188351. 23
## 3 2014-01-03 189086. 22.2
## 4 2014-01-04 173798. 20.3
## 5 2014-01-05 169733. 26.1
## 6 2014-01-06 195241. 19.6
## 7 2014-01-07 199770. 20
## 8 2014-01-08 205339. 27.4
## 9 2014-01-09 227334. 32.4
## 10 2014-01-10 258111. 34
## # … with 21 more rows
There is a positive relationship because the average temperature in Victoria in January is about 80 degrees, so as the heat goes up, demand for AC goes up, hence an increase in demand for electricity.
mod1<- lm(Demand ~ Temperature, jan14_vic_elec)
summary(mod1)
##
## Call:
## lm(formula = Demand ~ Temperature, data = jan14_vic_elec)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49978 -10219 -121 18533 35441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59083.9 17424.8 3.391 0.00203 **
## Temperature 6154.3 601.3 10.235 0.0000000000389 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24540 on 29 degrees of freedom
## Multiple R-squared: 0.7832, Adjusted R-squared: 0.7757
## F-statistic: 104.7 on 1 and 29 DF, p-value: 0.0000000000389
plot(jan14_vic_elec$Temperature,jan14_vic_elec$Demand)
b. Produce a residual plot. Is the model adequate? Are there any
outliers or influential observations?
As shown below,from inspection there is not a discernable trend between temperature and residuals. The residuals are skewed right, but positive and negative residuals exist at both ends up the spectrum.
res.mod1<-resid(mod1)
plot(jan14_vic_elec$Temperature, res.mod1,
ylab="Residuals", xlab="Temperature")
c. Use the model to forecast the electricity demand that you would
expect for the next day if the maximum temperature was 15 C and compare
it with the forecast if the with maximum temperature was 35. Do you
believe these forecasts?
I do not trust these forecasts. This forecast is not very reliable since the input data is only for Jan 15.
jan14_vic_elec %>%
model(TSLM(Demand ~ Temperature)) %>%
forecast(
new_data(jan14_vic_elec, 1) %>%
mutate(Temperature = 15)
) %>%
autoplot(jan14_vic_elec)
jan14_vic_elec %>%
model(TSLM(Demand ~ Temperature)) %>%
forecast(
new_data(jan14_vic_elec, 1) %>%
mutate(Temperature = 35)
) %>%
autoplot(jan14_vic_elec)
d. Give prediction intervals for your forecasts.
Upper and lower levels printed below
temp<- data.frame(Temperature=c(15,35))
mod2<- forecast(mod1, temp)
mod2
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 151398.35 151398.4 117127.2 185669.5 97951.22 204845.5
## 274484.25 274484.2 241333.0 307635.5 222783.69 326184.8
mod2$lower[,1]
## 1 2
## 117127.2 241333.0
mod2$upper[,1]
## 1 2
## 185669.5 307635.5
The model shows that the data does not follow a linear trend. Thus, our linear model may not be a good approximation of electricity demand. This likely stems from the interchanging relationship between cold and hot weather and the need for electricity. For example, higher temps does not mean more electricity unless it is the hotter months.
ve <- vic_elec
plot(ve$Temperature,ve$Demand,
xlab="Temp",ylab="Demand")
##2. Data set olympic_running contains the winning times (in seconds) in each Olympic Games sprint, middle-distance and long-distance track events from 1896 to 2016.
or<-olympic_running
or %>% autoplot(Time, show.legend = TRUE)
Fit a regression line to the data for each event. Obviously the winning
times have been decreasing, but at what average rate per year?
According to the model, the average times are decreasing by an average of 4.82 seconds per year on average. This does little to show us about the sprint models because the impact of the larger models is so much larger.
orts<-ts(or, frequency = 1)
ormod1 <- tslm(Time ~ trend + Year, data = orts)
summary(ormod1)
##
## Call:
## tslm(formula = Time ~ trend + Year, data = orts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -479.91 -291.19 -9.71 212.57 893.18
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2236.9636 1261.4729 1.773 0.0773 .
## trend 4.8262 0.2384 20.246 <0.0000000000000002 ***
## Year -1.3415 0.6440 -2.083 0.0382 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 354.9 on 278 degrees of freedom
## (31 observations deleted due to missingness)
## Multiple R-squared: 0.5986, Adjusted R-squared: 0.5957
## F-statistic: 207.3 on 2 and 278 DF, p-value: < 0.00000000000000022
Plot the residuals against the year. What does this indicate about the suitability of the fitted lines?
The relationship between residuals and year is a bit odd. Mostly, the relationship seems to be constant with some oddity at the higher residua;ls
resormod1<-resid(ormod1)
plot(or$Year, resormod1,
ylab="Residuals", xlab="Year")
#olympicforecast<- forecast(tslm, newdata=data.frame(year=2020))
An elasticity coefficient is the ratio of the percentage change in the forecast variable (y) as a percent change in the forecast variable.
##4. The data set souvenirs concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.
The plot below shows the time plot for souvenier sales. Two oddities exist: In general the data follow an upward but seasonal trend, with the exception of January 1991 where there is a lesser increase. Additionally, the growth in 1993 and 1994 exceeds that of prior years.
souv<-souvenirs
souv %>% autoplot(Sales)
b. Explain why it is necessary to take logarithms of these data before
fitting a model.
Logarithms accomplish useful transformation for these data. First, it allows us to scale the data to reduce the role of major fluctuations. Additionally, when we run a regression, we get a % on % interpretation which helps with general interpretability and statistic and practical significance interpretations.
c.Fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
sts<- ts(souv$Sales,start=c(1987,1), end=c(1993,6), frequency=12)
#borrowed from Juan ERP
logsales = log(sts)
sts
## Jan Feb Mar Apr May Jun Jul Aug
## 1987 1664.81 2397.53 2840.71 3547.29 3752.96 3714.74 4349.61 3566.34
## 1988 2499.81 5198.24 7225.14 4806.03 5900.88 4951.34 6179.12 4752.15
## 1989 4717.02 5702.63 9957.58 5304.78 6492.43 6630.80 7349.62 8176.62
## 1990 5921.10 5814.58 12421.25 6369.77 7609.12 7224.75 8121.22 7979.25
## 1991 4826.64 6470.23 9638.77 8821.17 8722.37 10209.48 11276.55 12552.22
## 1992 7615.03 9849.69 14558.40 11587.33 9332.56 13082.09 16732.78 19888.61
## 1993 10243.24 11266.88 21826.84 17357.33 15997.79 18601.53
## Sep Oct Nov Dec
## 1987 5021.82 6423.48 7600.60 19756.21
## 1988 5496.43 5835.10 12600.08 28541.72
## 1989 8573.17 9690.50 15151.84 34061.01
## 1990 8093.06 8476.70 17914.66 30114.41
## 1991 11637.39 13606.89 21822.11 45060.69
## 1992 23933.38 25391.35 36024.80 80721.71
## 1993
surfing_festival <- c()
for(i in 1:length(sts)){
month <- round(12*(sts[i] - floor(sts[i]))) + 1
year <- floor(sts[i])
if(year >= 1988 & month == 3){
surfing_festival[i] <- 1
} else {
surfing_festival[i] <- 0
}
}
#borrowed code from Joseph Sansevero (previous class rpub posting)
surfmodel = tslm(logsales ~ trend + season + surfing_festival)
summary(surfmodel)
##
## Call:
## tslm(formula = logsales ~ trend + season + surfing_festival)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45174 -0.11922 -0.00074 0.10175 0.35933
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6464607 0.0773528 98.852 < 0.0000000000000002 ***
## trend 0.0212957 0.0009292 22.919 < 0.0000000000000002 ***
## season2 0.2521410 0.0979991 2.573 0.012414 *
## season3 0.6932048 0.0990785 6.997 0.00000000189 ***
## season4 0.3820296 0.0991135 3.854 0.000271 ***
## season5 0.4123836 0.0980652 4.205 0.00008271735 ***
## season6 0.4524491 0.0981048 4.612 0.00001963521 ***
## season7 0.5767690 0.1019961 5.655 0.00000039077 ***
## season8 0.5352140 0.1033731 5.177 0.00000242988 ***
## season9 0.6237063 0.1034004 6.032 0.00000008934 ***
## season10 0.7223515 0.1020342 7.080 0.00000000135 ***
## season11 1.1936391 0.1020638 11.695 < 0.0000000000000002 ***
## season12 1.9447112 0.1021019 19.047 < 0.0000000000000002 ***
## surfing_festival 0.0293746 0.1001947 0.293 0.770337
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1833 on 64 degrees of freedom
## Multiple R-squared: 0.9444, Adjusted R-squared: 0.9331
## F-statistic: 83.67 on 13 and 64 DF, p-value: < 0.00000000000000022
The residuals show a clear decrease during the recession of 1991. Additionally, there is a clear pattern suggesting a relationship between residuals and time.
autoplot(surfmodel$residuals)
The larger residuals in March suggest that the significance around the surf model may be more variable than the model can predict.
boxplot(residuals(surfmodel)~cycle(residuals(surfmodel)))
f. What do the values of the coefficients tell you about each variable?
The model suggests that November and December cause the biggest jump in
sales – over 1% each. Additionally, even when accounting for the surfing
festival, March causes a .6% increase in sales, while, the surfing
festival accounts for another .3% increase.
checkresiduals(surfmodel)
##
## Breusch-Godfrey test for serial correlation of order up to 17
##
## data: Residuals from Linear regression model
## LM test = 32.739, df = 17, p-value = 0.01216
surffuture<- rep(0, 36)
for(i in 1:36){
if(i %% 12 == 3){
surffuture[i] <- 1
}
}
surffuture<-ts(data=surffuture,start=1994,
end=c(1996,12),
frequency=12)
surfforecast<-forecast(surfmodel, newdata=surffuture)
## Warning in forecast.lm(surfmodel, newdata = surffuture): newdata column names
## not specified, defaulting to first variable required.
autoplot(surfforecast)
Predicton intervalus below
surfforecast$upper
## [,1] [,2]
## Jul 1993 10.166941 10.308788
## Aug 1993 10.147163 10.289272
## Sep 1993 10.309227 10.463765
## Oct 1993 10.376410 10.518258
## Nov 1993 10.868994 11.010841
## Dec 1993 11.641361 11.783209
## Jan 1994 9.716872 9.858137
## Feb 1994 9.990309 10.131574
## Mar 1994 10.452922 10.594325
## Apr 1994 10.163042 10.304445
## May 1994 10.214438 10.355703
## Jun 1994 10.275800 10.417064
## Jul 1994 10.425660 10.569229
## Aug 1994 10.405760 10.549524
## Sep 1994 10.568224 10.724635
## Oct 1994 10.635130 10.778699
## Nov 1994 11.127713 11.271282
## Dec 1994 11.900081 12.043650
## Jan 1995 9.976000 10.119208
## Feb 1995 10.249437 10.392644
## Mar 1995 10.711946 10.855236
## Apr 1995 10.422067 10.565356
## May 1995 10.473566 10.616774
## Jun 1995 10.534927 10.678135
## Jul 1995 10.685120 10.830812
## Aug 1995 10.665100 10.810922
## Sep 1995 10.827895 10.986543
## Oct 1995 10.894590 11.040282
## Nov 1995 11.387173 11.532865
## Dec 1995 12.159541 12.305233
## Jan 1996 10.235858 10.381406
## Feb 1996 10.509295 10.654842
## Mar 1996 10.971704 11.117278
## Apr 1996 10.681824 10.827398
## May 1996 10.733425 10.878972
## Jun 1996 10.794786 10.940333
surfforecast$lower
## [,1] [,2]
## Jul 1993 9.644235 9.502387
## Aug 1993 9.623495 9.481386
## Sep 1993 9.739756 9.585217
## Oct 1993 9.853705 9.711857
## Nov 1993 10.346288 10.204440
## Dec 1993 11.118656 10.976808
## Jan 1994 9.196314 9.055049
## Feb 1994 9.469751 9.328486
## Mar 1994 9.931856 9.790454
## Apr 1994 9.641977 9.500574
## May 1994 9.693880 9.552615
## Jun 1994 9.755241 9.613976
## Jul 1994 9.896612 9.753043
## Aug 1994 9.875993 9.732229
## Sep 1994 9.991855 9.835444
## Oct 1994 10.106081 9.962512
## Nov 1994 10.598664 10.455096
## Dec 1994 11.371032 11.227463
## Jan 1995 9.448282 9.305074
## Feb 1995 9.721719 9.578511
## Mar 1995 10.183928 10.040639
## Apr 1995 9.894049 9.750759
## May 1995 9.945849 9.802641
## Jun 1995 10.007210 9.864002
## Jul 1995 10.148248 10.002556
## Aug 1995 10.127750 9.981928
## Sep 1995 10.243280 10.084632
## Oct 1995 10.357718 10.212026
## Nov 1995 10.850301 10.704609
## Dec 1995 11.622669 11.476977
## Jan 1996 9.699520 9.553973
## Feb 1996 9.972957 9.827409
## Mar 1996 10.435267 10.289693
## Apr 1996 10.145387 9.999813
## May 1996 10.197086 10.051539
## Jun 1996 10.258447 10.112900
The model could be improved by incuding other factors that account for sales. For example, average temperature in a given month, the overeall state of the economy, and other features.
The us_gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day”. Consider only the data to the end of 2004.
#Could not get fourier series to run properly
#usgas <- ts(us_gasoline$Barrels, start=(1991-1), end=(2017-1),frequency = 52)
#ugh<-us_gasoline
#fourier_gas<- recent_barrells %>%
#model(TSLM(Barrels~ trend() + fourier(K = 2)))
#report(fourier_beer)
##6. The annual population of Afghanistan is available in the global_economy data set.
The population of Afghanistan shrunk after 1980 (soviet-afghan war) but appears to have grown at a continuous rate since 1990.
ge_afg<-filter(ge, Code=='AFG')
ge_afg%>%
autoplot(Population)
bFit a linear trend model and compare this to a piecewise linear trend
model with knots at 1980 and 1989. c.Generate forecasts from these two
models for the five years after the end of the data, and comment on the
results.
The following graphs show the predicted values for both piecewise and linear functions. By applying a piecewise function, the predicted values are much closer to the recent trend, compared to the linear model which is too heavily influenced by the losses from an earlier war.
afg_trends <- ge_afg %>%
model(
linear = TSLM(Population ~ trend()),
piecewise = TSLM(Population ~ trend(knots = c(1980, 1989)))
)
afg_forecast <- afg_trends %>% forecast(h = 5)
ge_afg %>%
autoplot(Population) +
geom_line(data = fitted(afg_trends),
aes(y = .fitted, colour = .model)) +
autolayer(afg_forecast, alpha = 0.5, level = 95) +
labs(y = "Population",
title = "Afghan Population")