DATA 624 Assignment 1

Author: Farhod Ibragimov

library(fpp3)
  1. Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

    • Use ? (or help()) to find out about the data in each series.

    • What is the time interval of each series?

    • Use autoplot() to produce a time plot of each series.

    • For the last plot, modify the axis labels and title.

1.1 Bricks data

#?aus_production
bricks_production <- aus_production |>
 # filter(year(Quarter) >= 1980) |> 
  select(Bricks) 
  

bricks_production

The bricks_production series consists of 218 quarterly observations of Australian brick production. The time interval of this series is one quarter.

bricks_production |> 
  autoplot() + 
  labs(title = 'Bricks Production in Australia (millions)')

The brick production plot reveals the following patterns and features.

  • Trend: There is a strong increasing trend in production until 1975, after which production starts to decline. There are also sharp dips in 1975 and 1983, which occurred during recessions in the Australian economy.
    After 1985 the overall trend in brick production decreases steadily.

  • Seasonality: The frequent spikes and drops suggest the presence of seasonality in this data. Demand usually increases with construction activity during summer and the warmer periods of the year, and decreases in winter, causing brick production to decline.

  • Cyclic: After 1983 the data exhibit cycles of rising and falling production.

1.2 Lynx data

#?pelt
lynx <- pelt |> 
  select(Lynx)
lynx 

The lynx series contains yearly totals of Canadian lynx trades. The time interval of this series is one year.

lynx |> 
  autoplot() + 
  labs(title = 'Canadian Lynx trades')

The lynx data plot reveals the following patterns and features.

  • Trend: There is no visible long-term increase or decrease in the data. The peaks in trades increase until 1885-86 and decrease after that.

  • Seasonality: There is no visible evidence of seasonality in the data. Seasonality can only occur at periods shorter than a year, and this series is annual.

  • Cyclic: This time series appears to be cyclic, with rises and falls recurring every 5 - 7 years.

stock_series <- gafa_stock |> 
  select(Close)
stock_series

We can see that this dataset has a Key variable, meaning there are 4 subseries.

stock_series |> 
  distinct(Symbol)

The stock_series data has 5032 observations of daily closing stock prices for 4 companies (AAPL, AMZN, FB, GOOG) over the period 2014 - 2018. The observations are closing prices on trading days only, so the data do not have a fixed time interval: stock markets are closed on holidays and weekends. In this case we can use the row numbers as the index for analysis.
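
The row-number indexing described above can be sketched as follows. This is a minimal sketch for a single stock, assuming the fpp3 package is installed; the names aapl_by_trading_day and trading_day are introduced here only for illustration and are not part of the original analysis.

```r
library(fpp3)

# Keep one stock, then replace the irregular calendar index with a
# consecutive trading-day counter so the series has a regular interval.
aapl_by_trading_day <- gafa_stock |>
  filter(Symbol == "AAPL") |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE)

aapl_by_trading_day
```

With a regular index, functions that need evenly spaced observations (such as ACF()) can be applied without first filling the weekend and holiday gaps.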

stock_series |> 
  autoplot()+
  labs(x = "Trading day",
       y = 'Closing price',
       title = 'Daily closing stock prices')
## Plot variable not specified, automatically selected `.vars = Close`

The following patterns can be found in this data.

  • Seasonality: The heavy fluctuations do not by themselves indicate a seasonal pattern in any of the time series. More analysis is needed to see whether there are seasonal patterns.

  • Trend: There is a strong rising trend in GOOG and AMZN stock prices from 2014 until the 3rd quarter of 2018, followed by a fall. There are also slight rising trends in FB and AAPL stock prices over the same period, followed by a slight fall.

  • Cyclic: I don’t see strong evidence of cycles in this data, even though rises are followed by declines in prices in the 3rd quarter of 2018. Since there is no data beyond 2018, I cannot conclude that these time series have cyclic patterns.

1.3 Electricity demand

elec_demand <- vic_elec |> 
  select(Demand)
elec_demand

This dataset has 52608 observations of electricity demand, one for each 30-minute period of the day. The time interval of this series is 30 minutes.

elec_demand |> 
  autoplot() + 
  labs(title = "Electricity usage in every 30 minutes",
       x = 'Time period (30 minutes)',
       y = 'Electricity usage (megawatts)')
## Plot variable not specified, automatically selected `.vars = Demand`

Here are the patterns and features of electricity demand.

  • Trend: There is no visible long-term increase or decrease of demand in this data.

  • Seasonal: There is a clear seasonal pattern. Demand rises at the start of the year and then falls. It rises again towards the middle of the year, falls once more, and then starts rising towards the end of the year.

  • Cyclic: I don’t see a cyclic pattern in this data.

There is also a noticeable increase from year to year in the maximum demand spikes during January-February.
We can see that:

  • in January-February 2012 the max spike was just about 8000 MW

  • in the same periods in 2013 the max spike is almost 9000 MW, and in 2014 it is almost 9400 - 9500 MW

  • This increasing pattern could be driven by climate change, population growth and infrastructure growth.
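
The January-February peaks read off the plot above can be checked numerically. The following is a minimal sketch, assuming the fpp3 package is installed; summer_peaks and max_demand are illustrative names that do not appear in the original analysis.

```r
library(fpp3)

# Maximum half-hourly demand observed in January-February of each year.
# as_tibble() drops the tsibble structure so an ordinary grouped summary works.
summer_peaks <- vic_elec |>
  as_tibble() |>
  filter(month(Time) %in% 1:2) |>
  group_by(Year = year(Time)) |>
  summarise(max_demand = max(Demand))

summer_peaks
```

The table has one row per year, so the three values can be compared directly with the approximate readings from the plot.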

  1. Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
stock_series |> 
  group_by(Symbol) |> 
  filter(Close == max(Close))

There are some interesting facts about the dates of the peaks.
Facebook (FB) and Google (GOOG) had their peak closing prices just one day apart. The Apple (AAPL) and Amazon (AMZN) peaks occurred one month apart. All four companies’ stock price peaks happened within a span of 3 calendar months (which is actually less than 3 months if we count only trading days), followed by declines in stock prices. We need more data beyond 2018 to see whether there are other patterns for these stocks.
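
The spacing between the peak dates can also be computed rather than read by eye. This is a minimal sketch, assuming the fpp3 package is installed; it uses slice_max() instead of the filter() call above, and the name peaks is introduced here only for illustration.

```r
library(fpp3)

# One row per stock: the trading day with the highest closing price.
peaks <- gafa_stock |>
  as_tibble() |>
  group_by(Symbol) |>
  slice_max(Close, n = 1, with_ties = FALSE) |>
  ungroup() |>
  select(Symbol, Date, Close) |>
  arrange(Date)

peaks

# Calendar days between the earliest and the latest of the four peaks.
as.numeric(diff(range(peaks$Date)))
```

Sorting by Date puts the four peaks in chronological order, which makes the gaps between neighbouring peaks easy to see.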

  1. Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- readr::read_csv('tute1.csv')
timeseries <- tute1 |> 
  mutate(Quarter = yearquarter(Quarter)) |> 
  as_tsibble(index = Quarter)
timeseries |> 
  pivot_longer(-Quarter) |> 
  ggplot(aes(
    x= Quarter,
    y = value,
    colour = name
  ))+
  geom_line()+
  labs(title = "'Spaghetti plot' of time series")

timeseries |> 
  pivot_longer(-Quarter) |> 
  ggplot(aes(
    x= Quarter,
    y = value,
    colour = name
  ))+
  geom_line() +
  facet_grid(name ~ ., scales = 'free_y')+
  labs(title = "Time series plots using facet_grid")

The plots made with facet_grid are much better.

  • It gives a clean y-axis scale for each time series, instead of putting all of them on the same y-scale.

  • Each time series is scaled according to its own variation.

  • For example, we can see the GDP fluctuations much better than when they are plotted on the ‘spaghetti plot’.

  • In the facet_grid plot there is a clear pattern of positive correlation between the Sales and AdBudget features.
    At the same time, there is a negative correlation between GDP and the other two features. It’s hard to see this correlation on the ‘spaghetti plot’.
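
The correlations read from the faceted plot can be checked with a correlation matrix. The following is a minimal sketch in base R; correlation_table is a hypothetical helper introduced here, and the code assumes tute1.csv is in the working directory with the columns Sales, AdBudget and GDP described in the exercise.

```r
# Pairwise Pearson correlations between the three series.
# `correlation_table` is a hypothetical helper, not part of the original analysis.
correlation_table <- function(df) {
  cor(df[, c("Sales", "AdBudget", "GDP")])
}

# Assumes tute1.csv sits in the working directory, as in the code above.
if (file.exists("tute1.csv")) {
  tute1_raw <- read.csv("tute1.csv")
  print(round(correlation_table(tute1_raw), 2))
}
```

A positive entry for Sales and AdBudget and negative entries in the GDP row would agree with what the faceted plot suggests.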

  1. The USgas package contains data on the demand for natural gas in the US.
library(USgas)
#?us_total
usgas_consumption <- us_total |> 
  as_tsibble(index = year, key = state)
usgas_consumption
ne_consumption <- usgas_consumption |> 
  filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')) |> 
  mutate(total = y/1000)

ne_consumption |> 
  ggplot(aes(
    x = year,
    y = total,
    colour = state
  ))+
  geom_line() +
  facet_grid(state ~., scales = 'free_y')

Findings from the data on natural gas consumption in the New England states.

  • Since the data consist of yearly observations of gas consumption, we cannot see seasonal patterns in any of the time series.

  • Connecticut: We can see a rising trend in natural gas consumption. There is also evidence of cycles right after 2006.

  • Maine: There is a strong increase in consumption from 1997 to 2003. A shift in the preferred industrial energy sources was likely a cause of this huge rise. After 2003 we can see a declining trend with no cycles.

  • Massachusetts: The data show an overall increasing trend with the presence of cycles.

  • New Hampshire: There is a big increase from 2002 to 2005. After 2005 there is a slightly decreasing trend with cycles.

  • Rhode Island: We can see a sharp decline until 2005, followed by a slightly increasing trend with the presence of cycles.

  • Vermont: There is a spike towards the end of 2000, followed by a decline and a steady consumption trend. From 2002 there is a rising trend with the presence of cycles.

2.5 Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

2.5.1 Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism1 <- readxl::read_excel('tourism.xlsx') |> 
  mutate(Quarter = yearquarter(Quarter)) |> 
  as_tsibble(index = Quarter,
             key = c('Region', 'State','Purpose')) 
tourism1

The tourism1 object is a time series dataset with a quarterly time interval.

2.5.2 Find what combination of Region and Purpose had the maximum number of overnight trips on average.

mean_data <- aggregate(Trips ~ Region + Purpose, 
                        data = tourism1,
                        FUN = mean)
mean_data[which.max(mean_data$Trips), ]

The ‘Sydney - Visiting’ combination has the maximum number (747.27) of overnight trips on average.
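
The same result can be obtained with a dplyr pipeline in place of aggregate(). This is a minimal sketch, assuming the fpp3 package is installed; it uses the built-in tourism tsibble rather than tourism1 so that it runs without the Excel file, and mean_trips is an illustrative name.

```r
library(fpp3)

# Average overnight trips for every Region/Purpose combination,
# then keep only the combination with the highest average.
mean_trips <- tourism |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(mean_trips = mean(Trips), .groups = "drop") |>
  slice_max(mean_trips, n = 1)

mean_trips
```

Converting to a tibble first matters here: summarise() on a tsibble always keeps the time index, so it would average within each quarter instead of over the whole period.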

2.5.3 Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

total_trips <- tourism1 |> 
  group_by(State) |> 
  summarise(Total_trips = sum(Trips)) |> 
  ungroup()

total_trips

The new time series total_trips aggregates the data by summing Trips across all regions and purposes for each state.

2.8 Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

2.8.1 Total private employment time series

us_employment
private_employment <- us_employment |> 
  filter(Title == 'Total Private')
private_employment

This time series consists of 969 monthly observations of US total private employment from the end of 1939 to January 2020.
The time period is monthly.

private_employment |> autoplot(Employed)+
  labs(title = "US Monthly Private Employment")

The trend is positive, with strong evidence of seasonality and longer-term fluctuations that are probably driven by economic cycles. There are noticeable drops in the trend. In particular, there is a big drop in private employment during 2007 - 2010.

private_employment |> 
  gg_season(Employed, 
            alpha = 0.3)+ 
  labs(title = '')
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

From this plot I can see a consistent, slight overall increase in the monthly levels of private employment within most years. There is evidence of a seasonal pattern, as employment increases during the spring and summer seasons. Growth flattens towards the end of the year.

private_employment |> 
  gg_subseries(Employed)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

This plot shows clear differences between the monthly means. Average employment levels tend to rise from March, reach a peak in August, and stabilize towards the end of the year.

private_employment |> gg_lag(Employed, 
                             geom = 'point',
                             lags = 1:24)+
  labs(
    title = 'Lag Plots for monthly US private employment'
  )
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Overall, it is difficult to notice seasonality from this plot. The strong trend in this time series makes every lag panel look similar to the others.
When I zoomed out on this plot I noticed a stronger similarity between lags 12 and 24. I can also see variation in the dispersion of the points: it spreads out slightly towards the middle of the year and tightens towards the end of the year. This could indicate a possible seasonal pattern but does not confirm it.

private_employment |> 
  ACF(Employed, lag_max = 60) |> 
  autoplot()

The ACF plot reveals a dominant, strong trend pattern. The strong trend makes it difficult to identify a seasonal pattern from the ACF and lag plots.

Even though the frequent fluctuations in the overall time series plot suggest a seasonal pattern, it is difficult to see seasonality in lag and ACF plots when there is a strong trend. For example, if we compare quarter 1 with quarter 2 separated by several years, the values in these quarters may be similar due to the long-term growth, even if seasonal differences exist. So a strong trend can dominate and hide seasonality in ACF and lag plots.


My conclusion is that ACF and lag plots may not always reveal seasonal patterns when a strong trend is present.
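
The masking effect described above can be illustrated with a small simulation. This is a minimal sketch in base R on synthetic data, not on the employment series: the trend slope, seasonal amplitude and noise level are assumptions chosen only to make the effect visible.

```r
set.seed(624)

# Synthetic monthly series (an assumption for illustration, not the employment data):
# a steep linear trend plus a 12-month seasonal wave plus noise.
n <- 240
month_index <- seq_len(n)
y <- 0.5 * month_index + 5 * sin(2 * pi * month_index / 12) + rnorm(n)

# ACF of the raw series: dominated by the trend, so every lag looks alike.
acf_raw <- as.numeric(acf(y, lag.max = 24, plot = FALSE)$acf)[-1]

# ACF after first differencing, which removes the linear trend.
acf_diff <- as.numeric(acf(diff(y), lag.max = 24, plot = FALSE)$acf)[-1]

# Compare the two at lags 6, 12, 18 and 24.
round(rbind(raw = acf_raw, differenced = acf_diff)[, c(6, 12, 18, 24)], 2)
```

In the raw series all autocorrelations are positive and decay slowly, while the differenced series alternates between negative values at lags 6 and 18 and positive values at lags 12 and 24. On the employment data, the analogous check would be to pass difference(Employed) to ACF(), where difference() is the tsibble helper for lagged differences.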

2.8.2 Bricks time series

#?aus_production
bricks_production <- aus_production |> 
  select(Bricks) |> 
  filter(!is.na(Bricks))
bricks_production

This time series has 198 quarterly observations of Australian brick production during 1956 - 2005.
The time period is quarterly.

bricks_production |> 
  autoplot(Bricks)+
  labs(title = 'Brick production in Australia 1956 - 2005 (millions)')

This plot reveals a rising trend from the beginning of the series until 1983, followed by a declining trend. There are also sharp drops in production in 1974 and 1983 due to economic recessions in Australia.
There is a cyclic pattern, especially from 1985 onwards.
The frequent fluctuations may suggest the presence of a seasonal pattern.

bricks_production |> 
  gg_season(Bricks, alpha = 0.6)+
  labs(title = 'Seasonality plot')

Production levels are generally low in Q1. Production increases from Q1 through Q3, followed by stable or slightly declining levels.
However, there are a few years with declines from Q2 towards Q4. The plot is a little noisy and does not make this clear, but these declines are possibly due to the economic recessions of the 1970s and 1980s.

bricks_production |> 
  gg_subseries(Bricks)

As this subseries plot shows, the average production level is lowest in Q1, and the averages then increase, reaching their highest level in Q3. Q4 shows a lower average level than Q3. This suggests the presence of seasonality in brick production.

bricks_production |>
 # filter(year(Quarter) >= 1990) |> 
  gg_lag(geom = 'point', lags = 1:9)
## Plot variable not specified, automatically selected `y = Bricks`

If we look carefully at this lag plot, we can notice that the spread of the points increases after lag 1 and tightens towards lag 4. There is a similar, slightly weaker tendency from lag 4 to lag 8. This may suggest the presence of a seasonal pattern.

bricks_production |> ACF(Bricks, lag_max = 48) |> 
  autoplot()

Because this series has both rising and declining trends, there is a slow overall decrease in the autocorrelations as the lags increase.
The strong trends also overlap with and mask the seasonality, but if we look at the lags that are multiples of 4 (4, 8, 12, …, 36), we see that their autocorrelations are larger than those at the lags between them. Lags beyond lag 36 do not have significant autocorrelation. This creates a “scalloped” shape, suggesting that the series has a quarterly seasonal pattern.

2.8.3 Hare pelts trades time series

#?pelt
hare <- pelt |> 
  select(Hare)
hare

This time series has 91 annual observations of the number of hare pelts traded during 1845 - 1935.
The time period is annual.

hare |> 
  autoplot(Hare)+
  labs(title = 'Hare pelts trades annual numbers 1845 - 1935')


I don’t see evidence of an overall trend in this series. There are significant rises in trades in approximately 1863 and 1887.
There is evidence of a cyclic pattern.
Seasonal patterns cannot be observed, because seasonality occurs at periods shorter than one year and this series is annual.

hare |> 
  gg_subseries()
## Plot variable not specified, automatically selected `y = Hare`

hare |> 
  gg_lag(geom = 'point', lags = 1:20)
## Plot variable not specified, automatically selected `y = Hare`

I found the gg_subseries and gg_season functions unhelpful here. They are useful for detecting seasonal patterns with periods shorter than a year.

hare |> ACF(Hare,lag_max = 30) |> 
  autoplot()

The ACF plot reveals an interesting insight: it shows a cyclic pattern recurring approximately every 10-11 years.
The plot shows significant positive and negative autocorrelations, suggesting that cycles dominate over trend.
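
The cycle length can be estimated from the autocorrelations themselves rather than by counting bars on the plot. This is a minimal sketch, assuming the fpp3 package is installed; it uses base R acf() on the raw vector, and the search window of lags 5 to 15 is an assumption chosen to skip the initial decay.

```r
library(fpp3)

# Sample autocorrelations of the annual hare series (base R acf on the raw vector).
hare_acf <- acf(pelt$Hare, lag.max = 30, plot = FALSE)

# Drop lag 0, then look for the largest autocorrelation among lags 5 to 15,
# i.e. the first positive peak after the initial decay.
r <- as.numeric(hare_acf$acf)[-1]
candidate_lags <- 5:15
peak_lag <- candidate_lags[which.max(r[candidate_lags])]
peak_lag
```

The lag of the first positive peak is a rough estimate of the cycle length in years.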

2.8.4 H02 cost of scripts time series

#?PBS
HO2_cost <- PBS |> 
  filter(ATC2 == 'H02') |> 
  summarise(Cost = sum(Cost))
HO2_cost

This time series has 204 monthly observations of the total cost of H02 scripts.
The time period is monthly.

HO2_cost |> 
  autoplot(Cost) +
  labs(title = 'H02 monthly costs time series')

The trend rises steadily until 2005, after which the series levels off and shows a slight decline towards the end.
Stable, regular peaks and drops suggest the presence of seasonality.
There is no visible evidence of cyclic behavior.

HO2_cost |> 
  gg_season(Cost, labels = 'both') +
  labs(title = 'Seasonal plot: H02 drug sales')

We can see large jumps in December and January across the years. There are also unusually low sales in December 2000 and March 2008.

HO2_cost |> 
  gg_subseries(Cost)+
  labs(title = 'Subseries: H02 monthly cost averages')

This plot shows that the highest average H02 sales were in December. There is a huge drop in average sales in February, followed by a steady rise in average levels towards the end of the year.
These findings confirm the presence of a seasonal pattern.

HO2_cost |> 
  gg_lag(Cost, 
         lags = 1:24,
         geom = 'point')+
  labs(title = 'Lags plot: H02 sales')

Lags 12 and 24 show a strong positive relationship, which indicates a strong seasonal pattern in the series.
There are no visible strong negative relationships at any lag, suggesting the presence of a trend in the data.

HO2_cost |> 
  ACF(Cost, lag_max = 96) |> 
  autoplot()

This ACF correlogram confirms a strong seasonal pattern. Lags that are multiples of 12 (such as 12, 24, …, 72) have positive autocorrelation coefficients above the significance boundary. There is also a trend, because the autocorrelation coefficients slowly decrease as the lags increase.

Starting from lag 78, the coefficients become negative and fall below the significance boundary. This may suggest that the series is not dominated by a stable trend.

2.8.5 Time series of US weekly gasoline supplies

#?us_gasoline
us_gasoline |> autoplot(Barrels)+
  labs(title = "US gasoline supply plot",
       y = 'Barrels  (millions per day)')

This data consists of weekly observations of average US gasoline supply.
The time period is weekly.
There is a strong positive trend from 1991 until approximately 2007. After that the trend declines slightly, probably due to the economic recession in 2006 - 2010. Starting from 2011 the trend appears to be positive through to the end of the data.

us_gasoline |> 
  gg_season(Barrels, alpha = 0.6)+
  labs(title = 'Seasonal plot: US weekly gasoline supply')

The gg_season plot is dense and noisy. However, I can see an overall increase in supply levels from early April towards the end of August, followed by a decline towards the end of the year. This may suggest the presence of seasonality in the data.
Five years have a 53rd-week observation at the end of their lines; these reflect calendar effects rather than seasonality.

us_gasoline |> 
  gg_subseries(Barrels)

Looking carefully at the plot, we can see spikes in the average supply levels in weeks 21, 26, 31, 33, 43 and 51. These weeks correspond to periods around the Memorial Day, July 4th, Thanksgiving and Christmas holidays. Travel is also usually higher during the summer months, so demand for gasoline is usually higher during those months.

us_gasoline |> 
  gg_lag(Barrels,
         lags = 1:104)

The gg_lag plot is very dense and unreadable. Therefore it is not helpful for this series.

us_gasoline |> 
  ACF(Barrels, lag_max = 208) |> 
  autoplot()

The ACF plot strongly suggests the presence of seasonality in the data. Lags that are multiples of 52 (such as 52, 104 and 208) have positive autocorrelation coefficients above the significance boundary. The “scalloped” shape confirms a strong seasonal pattern in the series.
There is also a trend, because the autocorrelation coefficients slowly decrease as the lags increase.