Author: Farhod Ibragimov
library(fpp3)
Explore the following four time series: Bricks from
aus_production, Lynx from pelt,
Close from gafa_stock, Demand
from vic_elec.
Use ? (or help()) to find out about the
data in each series.
What is the time interval of each series?
Use autoplot() to produce a time plot of each
series.
For the last plot, modify the axis labels and title.
#?aus_production
bricks_production <- aus_production |>
# filter(year(Quarter) >= 1980) |>
select(Bricks)
bricks_production
The bricks_production series consists of 218 quarterly observations
of Australian brick production. The time interval for this series is
one quarter.
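The interval can also be read programmatically instead of from the printed header. The following is a minimal sketch, assuming the fpp3 meta-package (which attaches tsibble and tsibbledata) is installed. The list name `series` and the result names are illustrative only.

```r
library(fpp3)

# The four data sets named in the exercise, collected in a named list
series <- list(
  aus_production = aus_production,
  pelt           = pelt,
  gafa_stock     = gafa_stock,
  vic_elec       = vic_elec
)

# tsibble::interval() returns the interval of a tsibble; format() turns it into a short label.
# The namespace is written out because lubridate also exports a function called interval().
intervals <- vapply(series, function(x) format(tsibble::interval(x)), character(1))

# is_regular() reports whether the index has a fixed interval at all
regular <- vapply(series, is_regular, logical(1))

data.frame(interval = intervals, regular = regular)
```

A series flagged as not regular has no fixed interval, which is worth checking before using any tool that assumes evenly spaced observations.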
bricks_production |>
autoplot() +
labs(title = 'Bricks Production in Australia (millions)')
The brick production plot reveals the following patterns and features.
Trend: There is a strong increasing
production trend until 1975, after which production
started to decline. There are also large dips in 1975 and 1983, which occurred
during recessions in the Australian economy.
After 1985 there is a steady decrease in overall brick
production.
Seasonality: Frequent spikes and drops suggest the presence of seasonality in this data. Demand usually increases with construction work during the summer and other warm periods of the year and decreases in winter, causing a decline in brick production.
Cyclic: After 1983 the data exhibit cycles of rising and falling production.
#?pelt
lynx <- pelt |>
select(Lynx)
lynx
The lynx series contains yearly totals of Canadian lynx pelts traded.
The time interval for this series is one year.
lynx |>
autoplot() +
labs(title = 'Canadian Lynx trades')
The lynx plot reveals the following patterns and features.
Trend: There is no visible long-term increase or decrease in the data. The trade peaks increase until 1885-86 and decrease after that.
Seasonality: There is no visible evidence of seasonality, and none can be expected: seasonality occurs over periods shorter than a year, and this data is annual.
Cyclic: This time series appears to be cyclic, with rises and falls every 5 - 7 years.
stock_series <- gafa_stock |>
select(Close)
stock_series
We can see that this dataset has a Key variable, meaning there are 4 subseries.
stock_series |>
distinct(Symbol)
The stock_series data has 5032 observations of daily
closing stock prices for 4 companies (AAPL, AMZN, FB, GOOG) over the
period 2014-2018. The observations are closing prices on
trading days, so the data do not have a fixed time interval,
because the stock markets are closed on weekends and holidays.
In this case we can use the row numbers as the index for
analysis.
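The row-number idea can be sketched as follows. This is one possible approach, assuming fpp3 is installed. The column name `trading_day` and the object name `stock_by_trading_day` are made up for illustration and are not part of the data set.

```r
library(fpp3)

# Number the trading days separately within each stock,
# then declare that counter as a regular index
stock_by_trading_day <- gafa_stock |>
  group_by(Symbol) |>
  mutate(trading_day = row_number()) |>
  ungroup() |>
  update_tsibble(index = trading_day, regular = TRUE)

stock_by_trading_day
```

The original Date column is kept as an ordinary variable, so calendar dates are still available for labelling even though the index is now the trading-day counter.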
stock_series |>
autoplot()+
labs(x = "Trading day",
y = 'Closing price',
title = 'Daily closing stock prices')
## Plot variable not specified, automatically selected `.vars = Close`
The data show the following patterns.
Seasonality: The heavy fluctuations do not by themselves indicate a seasonal pattern in any of the series. More analysis is needed to see whether seasonal patterns exist.
Trend: There is a strong rising trend in GOOG and AMZN stock prices from 2014 until the 3rd quarter of 2018, followed by a fall. There is also a slight rising trend in both FB and AAPL prices over the same period, followed by a slight fall.
Cyclic: I don’t see strong evidence of cycles in this data, even though the rises are followed by declines in prices in the 3rd quarter of 2018. Since there is no data beyond 2018, I cannot conclude that these time series have cyclic patterns.
elec_demand <- vic_elec |>
select(Demand)
elec_demand
This dataset has 52608 observations of electricity demand, one for every 30 minutes of the day, so the time interval is 30 minutes.
elec_demand |>
autoplot() +
labs(title = "Electricity usage in every 30 minutes",
x = 'Time period (30 minutes)',
y = 'Electricity usage (megawatts)')
## Plot variable not specified, automatically selected `.vars = Demand`
Here are the patterns and features of electricity demand.
Trend: There is no visible long-term increase or decrease in demand in this data.
Seasonal: There is a clear seasonal pattern. Demand rises at the start of the year and then falls. It rises again towards the middle of the year and falls again, before rising once more towards the end of the year.
Cyclic: I don’t see a cyclic pattern in this data.
There is also a noticeable increase in the maximum demand spikes during
January-February of each year.
We can see that:
in January-February 2012 the maximum spike was just about 8000 MW
in the same period in 2013 the maximum spike is almost 9000 MW, and in 2014 it is almost 9400 - 9500 MW
This increasing pattern could be affected by climate change and by growth in population and infrastructure.
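Values read off a plot are approximate, so the January-February maxima can also be computed directly. This is a small sketch, assuming fpp3 is installed. The names `summer_peaks` and `max_demand` are illustrative.

```r
library(fpp3)

# Largest half-hourly demand observed in January-February of each year
summer_peaks <- vic_elec |>
  as_tibble() |>
  filter(month(Time) %in% 1:2) |>
  group_by(Year = year(Time)) |>
  summarise(max_demand = max(Demand))

summer_peaks
```

Comparing the three rows shows whether the peaks really increase from year to year, without relying on the eye.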
Use filter() to find what days corresponded to the peak
closing price for each of the four stocks in
gafa_stock.
stock_series |>
group_by(Symbol) |>
filter(Close == max(Close))
There are some interesting facts about the dates of the peaks.
Facebook (FB) and Google (GOOG) had peak closing prices just one day
apart. The Apple (AAPL) and Amazon (AMZN) peaks occurred one month apart.
The stock price peaks of all four companies happened within a span of 3 calendar
months (fewer if we count only
trading days), followed by declines in stock prices. We need more data
beyond 2018 to see whether there are other patterns for these stocks.
Download tute1.csv from the book website, open it in Excel
(or some other spreadsheet application), and review its contents. You
should find four columns of information. Columns B through D each
contain a quarterly series, labelled Sales, AdBudget and GDP. Sales
contains the quarterly sales for a small company over the period
1981-2005. AdBudget is the advertising budget and GDP is the gross
domestic product. All series have been adjusted for inflation.
tute1 <- readr::read_csv('tute1.csv')
timeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
timeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(
x= Quarter,
y = value,
colour = name
))+
geom_line()+
labs(title = "'Spaghetti plot' of time series")
timeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(
x= Quarter,
y = value,
colour = name
))+
geom_line() +
facet_grid(name ~ ., scales = 'free_y')+
labs(title = "Time series plots using facet_grid")
I can tell that the plots using
facet_grid are much better.
It gives a clean y-axis scale for each time series, compared with putting all of them on the same y-scale.
Each time series is scaled according to its own variation.
For example, we can see the GDP fluctuations much better than when they are plotted on the ‘spaghetti plot’.
In facet_grid there is a clear pattern of positive
correlation between the Sales and
AdBudget features.
At the same time, there is a negative correlation between
GDP and the other two features. It’s hard to see this
correlation on the ‘spaghetti plot’.
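A correlation matrix gives a numeric check of what the facets suggest visually. The sketch below uses a small hypothetical data frame, `toy`, whose values are invented to mimic the direction of the relationships. It is not the real tute1 data.

```r
# Hypothetical stand-in for the three tute1 columns (invented values, not the real data)
toy <- data.frame(
  Sales    = c(1000, 900, 950, 1100, 980, 1020),
  AdBudget = c(605, 535, 573, 657, 590, 610),
  GDP      = c(280, 283, 281, 277, 281, 279)
)

# Pairwise Pearson correlations between the columns
cor_matrix <- cor(toy)
round(cor_matrix, 2)

# With the real data the same call would be:
# cor(tute1[, c("Sales", "AdBudget", "GDP")])
```

A positive entry for Sales and AdBudget and negative entries in the GDP row would agree with the reading of the faceted plot.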
The USgas package contains data on the demand for
natural gas in the US.
library(USgas)
#?us_total
usgas_consumption <- us_total |>
as_tsibble(index = year, key = state)
usgas_consumption
ne_consumption <- usgas_consumption |>
filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')) |>
mutate(total = y/1000)
ne_consumption |>
ggplot(aes(
x = year,
y = total,
colour = state
))+
geom_line() +
facet_grid(state ~., scales = 'free_y')
Findings from the natural gas consumption data for the New England states.
Since the data has yearly observations of gas consumption, we cannot see seasonal patterns in any of the time series.
Connecticut: We can see a rising trend in natural gas consumption. There is also evidence of cycles right after 2006.
Maine: There is a strong increase in consumption from 1997 to 2003. A shift in preference among industrial energy sources was a cause of this huge rise. After 2003 we can see a declining trend with no cycles.
Massachusetts: The data show an overall increasing trend with the presence of cycles.
New Hampshire: There is a big increase from 2002 to 2005. After 2005 there is a slightly decreasing trend with cycles.
Rhode Island: We can see a sharp decline until 2005, followed by a slightly increasing trend with the presence of cycles.
Vermont: There is a spike towards the end of 2000, followed by a decline and a steady consumption trend. From 2002 there is a rising trend with the presence of cycles.
2.5 Download tourism.xlsx from the book website and read it into R
using readxl::read_excel().
2.5.1 Create a tsibble which is identical to the tourism
tsibble from the tsibble package.
tourism1 <- readxl::read_excel('tourism.xlsx') |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter,
key = c('Region', 'State','Purpose'))
tourism1
The tourism1 tsibble is a time series dataset with a quarterly
time interval.
2.5.2 Find what combination of Region and
Purpose had the maximum number of overnight trips on
average.
mean_data <- aggregate(Trips ~ Region + Purpose,
data = tourism1,
FUN = mean)
mean_data[which.max(mean_data$Trips), ]
The ‘Sydney - Visiting’ combination has the maximum average number of overnight trips (747.27).
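The same question can be answered with dplyr verbs instead of aggregate(). This sketch runs on the built-in `tourism` tsibble, on the assumption (taken from the exercise wording) that it matches the spreadsheet. The names `top_combo` and `mean_trips` are illustrative.

```r
library(fpp3)

# Average trips for every Region/Purpose pair, then keep the largest
top_combo <- tourism |>
  as_tibble() |>                      # drop the time index so rows are averaged over all quarters
  group_by(Region, Purpose) |>
  summarise(mean_trips = mean(Trips), .groups = "drop") |>
  slice_max(mean_trips, n = 1)

top_combo
```

Converting to a tibble first matters: summarising a tsibble directly keeps the Quarter index and would return one row per quarter rather than one overall average.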
2.5.3 Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
total_trips <- tourism1 |>
group_by(State) |>
summarise(Total_trips = sum(Trips)) |>
ungroup()
total_trips
The new time series total_trips aggregates the data by
summing Trips across all regions and purposes for each
state.
2.8 Use the following graphics
functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and
explore features from the following time series: “Total
Private” Employed from us_employment, Bricks from aus_production, Hare from pelt,
“H02” Cost from PBS,
and Barrels from us_gasoline.
us_employment
private_employment <- us_employment |>
filter(Title == 'Total Private')
private_employment
This time series consists of 969 monthly observations of US total
private employment from the end of 1939 to January 2020.
The time interval is monthly.
private_employment |> autoplot(Employed)+
labs(title = "US Monthly Private Employment")
The trend is positive, with strong evidence of seasonality and longer-term fluctuations that are probably driven by economic cycles. There are noticeable drops in the trend; in particular, we can see a big drop in private employment during 2007 - 2010.
private_employment |>
gg_season(Employed,
alpha = 0.3)+
labs(title = '')
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
From this plot I can see a consistent, slight overall
increase in the monthly level of private employment within most years.
There is evidence of a seasonal pattern, as employment increases
during the spring and summer. Growth flattens towards the
end of the year.
private_employment |>
gg_subseries(Employed)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
This plot shows clear differences between the monthly means. Average employment tends to rise from March, peak in August, and stabilize towards the end of the year.
private_employment |> gg_lag(Employed,
geom = 'point',
lags = 1:24)+
labs(
title = 'Lag Plots for monthly US private employment'
)
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Overall, it is difficult to notice seasonality in this plot. The strong
trend in this time series makes every lag look similar to the
others.
When I zoomed out of this plot, I noticed a stronger similarity between
lags 12 and 24. I can also see variation in the dispersion
of the points: it spreads slightly towards the middle of the year and
tightens towards the end of the year. This could indicate a possible
seasonal pattern but does not confirm it.
private_employment |>
ACF(Employed, lag_max = 60) |>
autoplot()
The ACF plot reveals a dominant, strong trend pattern. The strong
trend makes it difficult to identify a seasonal pattern from the ACF
and lag plots.
Even though the frequent fluctuations in the overall time plot suggest a seasonal pattern, it is difficult to see seasonality in the lag and ACF plots when there is a strong trend. For example, two observations a few lags apart sit at nearly the same level of the long-term growth, so their values are similar even when a seasonal difference exists between them. So a strong trend can overlap and hide seasonality in ACF and lag plots.
My conclusion is that ACF and lag plots may not always reveal seasonal
patterns when a strong trend is present.
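The masking effect can be reproduced on a synthetic series. The sketch below uses invented data (a steep linear trend plus a small 12-month sine wave), not the employment series, and base R's acf(). Differencing is a technique swapped in here for illustration and is not used elsewhere in this report.

```r
# Synthetic monthly series: steep linear trend plus a small 12-month cycle (invented data)
month_index <- 1:240
x <- 0.5 * month_index + 2 * sin(2 * pi * month_index / 12)

# acf() stores lag 0 in the first position, so lag k sits at position k + 1
raw_acf  <- acf(x, lag.max = 24, plot = FALSE)$acf[, 1, 1]
diff_acf <- acf(diff(x), lag.max = 24, plot = FALSE)$acf[, 1, 1]

# Compare half a season (lag 6) with a full season (lag 12)
round(data.frame(lag         = c(6, 12),
                 raw         = raw_acf[c(7, 13)],
                 differenced = diff_acf[c(7, 13)]), 2)
```

In the raw series both lags are strongly positive because the trend dominates. After first differencing, lag 6 turns negative and lag 12 stays strongly positive, which is the seasonal signature. For the employment data an analogous call would be `ACF(difference(Employed), lag_max = 60)`.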
?aus_production
bricks_production <- aus_production |>
select(Bricks) |>
filter(!is.na(Bricks))
bricks_production
This time series has 198 quarterly observations of Australian
brick production during 1956 - 2005.
The time interval is quarterly.
bricks_production |>
autoplot(Bricks)+
labs(title = 'Brick production in Australia 1956 - 2005 (millions)')
This plot reveals a rising trend from the beginning of the series until 1983,
followed by a declining trend. There are also sharp drops in production in
1974 and 1983 due to economic recessions in Australia.
A cyclic pattern is present, especially from 1985
onward.
Frequent fluctuations may suggest the presence of a seasonal pattern.
bricks_production |>
gg_season(Bricks, alpha = 0.6)+
labs(title = 'Seasonality plot')
Production levels are generally low in Q1. Production increases
from Q1 through Q3 and then stays stable or
declines slightly.
However, there are a few years in which declines start in Q2 and continue towards Q4.
The plot is a little noisy and cannot provide clarity, but these declines
are possibly due to the economic recessions of the 1970s and 1980s.
bricks_production |>
gg_subseries(Bricks)
As we see in this subseries plot, the average production level is lowest in Q1, and the average levels then increase, reaching their highest in Q3. Q4 shows a lower average level than Q3. This suggests the presence of seasonality in brick production.
bricks_production |>
# filter(year(Quarter) >= 1990) |>
gg_lag(geom = 'point', lags = 1:9)
## Plot variable not specified, automatically selected `y = Bricks`
If we look carefully at this lag plot, we can notice that the spread of the points increases from lag 1 and then tightens towards lag 4. There is a similar, slightly weaker tendency from lag 4 to lag 8. This may suggest the presence of a seasonal pattern.
bricks_production |> ACF(Bricks, lag_max = 48) |>
autoplot()
Because this series has both rising and declining trends, there is a
slow overall decrease in the autocorrelations as the lags increase.
The strong trends also overlap and mask the seasonality, but if we
look at lags that are multiples of 4 (4, 8, 12, …, 36), we see that they are
larger than the lags between them. Lags beyond lag 36 do not have
significant autocorrelation. This creates a “scalloped” shape,
suggesting that the series has a quarterly seasonal pattern.
?pelt
hare <- pelt |>
select(Hare)
hare
This time series has 91 observations of the annual number of hare pelts
traded during 1845 - 1935.
The time interval is annual.
hare |>
autoplot(Hare)+
labs(title = 'Hare pelts traded, annual numbers 1845 - 1935')
I don’t see evidence of an overall trend in this series. There are
significant rises in trades at approximately 1863 and
1887.
There is evidence of a cyclic pattern.
Seasonal patterns cannot be observed, because seasonality occurs at
periods shorter than one year and this data is annual.
hare |>
gg_subseries()
## Plot variable not specified, automatically selected `y = Hare`
hare |>
gg_lag(geom = 'point', lags = 1:20)
## Plot variable not specified, automatically selected `y = Hare`
I found that the gg_subseries and
gg_season functions are not helpful here. They are useful for detecting
seasonal patterns with periods shorter than a year.
hare |> ACF(Hare,lag_max = 30) |>
autoplot()
The ACF plot reveals an interesting insight: it shows peaks
recurring approximately every 10-11 years, which suggests a cyclic
pattern in this time series.
The plot shows both significant positive and significant negative autocorrelations,
suggesting that cycles dominate rather than trend.
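The approximate cycle length can be estimated from the autocorrelations rather than read off the plot. This is a rough sketch using base R's acf() on the Hare column. The search window of lags 5 to 15 and the names `hare_acf`, `candidate_lags` and `peak_lag` are assumptions chosen for illustration.

```r
library(fpp3)

# Autocorrelations of the annual hare series (position 1 is lag 0)
hare_acf <- acf(pelt$Hare, lag.max = 30, plot = FALSE)$acf[, 1, 1]

# Look for the first positive peak after the initial decay,
# searching an assumed window of 5 to 15 years
candidate_lags <- 5:15
peak_lag <- candidate_lags[which.max(hare_acf[candidate_lags + 1])]

peak_lag
```

The lag with the largest autocorrelation in that window is a simple estimate of the cycle length in years. It depends on the chosen window, so it should be treated as indicative only.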
#?PBS
HO2_cost <- PBS |>
filter(ATC2 == 'H02') |>
summarise(Cost = sum(Cost))
HO2_cost
The time series has 204 monthly observations of the total cost of
H02 scripts.
The time interval is monthly.
HO2_cost |>
autoplot(Cost) +
labs(title = 'H02 monthly costs time series')
The trend rises steadily until 2005; after 2005 the series levels
off and shows a slight decline toward the end.
Stable, regular peaks and drops suggest the presence of seasonality.
There is no visible evidence of cyclic behavior.
HO2_cost |>
gg_season(Cost, labels = 'both') +
labs(title = 'Seasonal plot: H02 drug sales')
We can see large jumps in December and January over the years. There are also unusually low sales in December 2000 and March 2008.
HO2_cost |>
gg_subseries(Cost)+
labs(title = 'Subseries: H02 monthly cost averages')
This plot shows that the highest average H02 sales were in
December. There is a huge drop in average sales in February,
followed by a steady rise in average levels towards the end of the
year.
These findings confirm the presence of a seasonal pattern.
HO2_cost |>
gg_lag(Cost,
lags = 1:24,
geom = 'point')+
labs(title = 'Lags plot: H02 sales')
Lags 12 and 24 show a strong positive
relationship, which indicates a strong seasonal
pattern in the series.
There are no visible strong negative relationships at any lag, suggesting
the presence of a trend in the data.
HO2_cost |>
ACF(Cost, lag_max = 96) |>
autoplot()
This ACF correlogram confirms a strong seasonal pattern. Lags at multiples of 12 (such as 12, 24, …, 72) have positive autocorrelation coefficients above the significance boundary. There is also a trend, because the autocorrelation coefficients slowly decrease as the lags increase.
Starting from lag 78, the coefficients become negative and fall below the significance boundary. This may suggest that the series is not dominated by a stable trend.
#?us_gasoline
us_gasoline |> autoplot(Barrels)+
labs(title = "US gasoline supply plot",
y = 'Barrels (millions per day)')
This data consists of weekly observations of average US gasoline supply.
The time interval is weekly.
There is a strong positive trend from 1991 until approximately 2007.
After that the trend declines slightly, probably due to the economic recession
in 2006 - 2010. Starting from 2011 the trend appears to be positive through
the end of the data.
us_gasoline |>
gg_season(Barrels, alpha = 0.6)+
labs(title = 'Seasonal plot: US weekly gasoline supply')
The gg_season plot is dense and noisy. However, I can
see an overall increase in supply levels from early April
towards the end of August, followed by a decline toward the end of the
year. This may suggest the presence of seasonality in the data.
Five years have a 53rd-week observation at the end of their
lines; these reflect calendar effects rather than seasonality.
us_gasoline |>
gg_subseries(Barrels)
Looking carefully at the plot, we can see spikes in the average supply level in weeks 21, 26, 31, 33, 43 and 51. These weeks correspond to periods around the Memorial Day, July 4th, Thanksgiving and Christmas holidays. Travel is also usually higher during the summer months, so demand for gasoline is usually higher then.
us_gasoline |>
gg_lag(Barrels,
lags = 1:104)
The gg_lag plot is very dense and unreadable, so
it is not helpful for this series.
us_gasoline |>
ACF(Barrels, lag_max = 208) |>
autoplot()
The ACF plot strongly suggests the presence of seasonality in the data.
Lags at multiples of 52 (such as 52, 104 and 208) have
positive autocorrelation coefficients above the significance boundary. The
“scalloped” shape confirms a strong seasonal pattern in the
series.
There is also a trend, because the autocorrelation
coefficients slowly decrease as the lags increase.